Re: [PATCH 05/10] UBIFS: fix shrinker object count reports

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 05/10] UBIFS: fix shrinker object count reports
  2011-06-03 13:49 ` [PATCH 05/10] UBIFS: fix shrinker object count reports Artem Bityutskiy
@ 2011-06-03 13:48   ` Artem Bityutskiy
  2011-06-03 14:30     ` Artem Bityutskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:48 UTC (permalink / raw)
  To: linux-mtd

On Fri, 2011-06-03 at 16:49 +0300, Artem Bityutskiy wrote:
> +		/*
> +		 * Due to the way UBIFS updates the clean znode counter it may
> +		 * temporarily be negative.
> +		 */
> +		return clean_zn_cnt > 0 ?: 1;

Oh, this should be  clean_zn_cnt >= 0 ?: 1;

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 00/10] UBIFS: for Linux 3.0
@ 2011-06-03 13:49 Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 01/10] UBIFS: supress false error messages Artem Bityutskiy
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

Hi,

I'm going to send these patches to Linus later today.

Artem.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 01/10] UBIFS: supress false error messages
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
@ 2011-06-03 13:49 ` Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 02/10] UBIFS: introduce a "grouped" journal head flag Artem Bityutskiy
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Commit ab51afe05273741f72383529ef488aa1ea598ec6 was a good clean-up, but
it introduced a regression - now UBIFS prints scary error messages during
recovery on all corrupted nodes, even though the corruptions are expected
(due to a power cut). This patch fixes the issue.

Additionally fix a typo in a commentary introduced by the same commit.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/recovery.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index 731d9e2..95e2418 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -635,7 +635,7 @@ struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
 		 * Scan quietly until there is an error from which we cannot
 		 * recover
 		 */
-		ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 0);
+		ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 1);
 		if (ret == SCANNED_A_NODE) {
 			/* A valid node, and not a padding node */
 			struct ubifs_ch *ch = buf;
@@ -701,7 +701,7 @@ struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
 	 * While we are in the middle of the same min. I/O unit keep dropping
 	 * nodes. So basically, what we want is to make sure that the last min.
 	 * I/O unit where we saw the corruption is dropped completely with all
-	 * the uncorrupted node which may possibly sit there.
+	 * the uncorrupted nodes which may possibly sit there.
 	 *
 	 * In other words, let's name the min. I/O unit where the corruption
 	 * starts B, and the previous min. I/O unit A. The below code tries to
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 02/10] UBIFS: introduce a "grouped" journal head flag
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 01/10] UBIFS: supress false error messages Artem Bityutskiy
@ 2011-06-03 13:49 ` Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 03/10] UBIFS: amend ubifs_recover_leb interface Artem Bityutskiy
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Journal heads are different in a way how UBIFS writes nodes there. All normal
journal heads receive grouped nodes, while the GC journal heads receives
ungrouped nodes. This patch adds a 'grouped' flag to 'struct ubifs_jhead' which
describes this property.

This patch is a preparation to a further recovery fix.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/super.c |    5 ++++-
 fs/ubifs/ubifs.h |    2 ++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 1ab0d22..1e40db7 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -811,15 +811,18 @@ static int alloc_wbufs(struct ubifs_info *c)
 
 		c->jheads[i].wbuf.sync_callback = &bud_wbuf_callback;
 		c->jheads[i].wbuf.jhead = i;
+		c->jheads[i].grouped = 1;
 	}
 
 	c->jheads[BASEHD].wbuf.dtype = UBI_SHORTTERM;
 	/*
 	 * Garbage Collector head likely contains long-term data and
-	 * does not need to be synchronized by timer.
+	 * does not need to be synchronized by timer. Also GC head nodes are
+	 * not grouped.
 	 */
 	c->jheads[GCHD].wbuf.dtype = UBI_LONGTERM;
 	c->jheads[GCHD].wbuf.no_timer = 1;
+	c->jheads[GCHD].grouped = 0;
 
 	return 0;
 }
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index a70d7b4..adeca14 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -722,12 +722,14 @@ struct ubifs_bud {
  * struct ubifs_jhead - journal head.
  * @wbuf: head's write-buffer
  * @buds_list: list of bud LEBs belonging to this journal head
+ * @grouped: non-zero if UBIFS groups nodes when writing to this journal head
  *
  * Note, the @buds list is protected by the @c->buds_lock.
  */
 struct ubifs_jhead {
 	struct ubifs_wbuf wbuf;
 	struct list_head buds_list;
+	unsigned int grouped:1;
 };
 
 /**
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 03/10] UBIFS: amend ubifs_recover_leb interface
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 01/10] UBIFS: supress false error messages Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 02/10] UBIFS: introduce a "grouped" journal head flag Artem Bityutskiy
@ 2011-06-03 13:49 ` Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 04/10] UBIFS: fix recovery broken by the previous recovery fix Artem Bityutskiy
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Instead of passing "grouped" parameter to 'ubifs_recover_leb()' which tells
whether the nodes are grouped in the LEB to recover, pass the journal head
number and let 'ubifs_recover_leb()' look at the journal head's 'grouped' flag.

This patch is a preparation to a further fix where we'll need to know the
journal head number for other purposes.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/orphan.c   |    2 +-
 fs/ubifs/recovery.c |   10 ++++++----
 fs/ubifs/replay.c   |    3 +--
 fs/ubifs/ubifs.h    |    2 +-
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index bd644bf..a5422ff 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -674,7 +674,7 @@ static int kill_orphans(struct ubifs_info *c)
 		if (IS_ERR(sleb)) {
 			if (PTR_ERR(sleb) == -EUCLEAN)
 				sleb = ubifs_recover_leb(c, lnum, 0,
-							 c->sbuf, 0);
+							 c->sbuf, -1);
 			if (IS_ERR(sleb)) {
 				err = PTR_ERR(sleb);
 				break;
diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index 95e2418..6adb532 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -604,7 +604,8 @@ static int drop_last_node(struct ubifs_scan_leb *sleb, int *offs, int grouped)
  * @lnum: LEB number
  * @offs: offset
  * @sbuf: LEB-sized buffer to use
- * @grouped: nodes may be grouped for recovery
+ * @jhead: journal head number this LEB belongs to (%-1 if the LEB does not
+ *         belong to any journal head)
  *
  * This function does a scan of a LEB, but caters for errors that might have
  * been caused by the unclean unmount from which we are attempting to recover.
@@ -612,13 +613,14 @@ static int drop_last_node(struct ubifs_scan_leb *sleb, int *offs, int grouped)
  * found, and a negative error code in case of failure.
  */
 struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
-					 int offs, void *sbuf, int grouped)
+					 int offs, void *sbuf, int jhead)
 {
 	int ret = 0, err, len = c->leb_size - offs, start = offs, min_io_unit;
+	int grouped = jhead == -1 ? 0 : c->jheads[jhead].grouped;
 	struct ubifs_scan_leb *sleb;
 	void *buf = sbuf + offs;
 
-	dbg_rcvry("%d:%d", lnum, offs);
+	dbg_rcvry("%d:%d, jhead %d, grouped %d", lnum, offs, jhead, grouped);
 
 	sleb = ubifs_start_scan(c, lnum, offs, sbuf);
 	if (IS_ERR(sleb))
@@ -881,7 +883,7 @@ struct ubifs_scan_leb *ubifs_recover_log_leb(struct ubifs_info *c, int lnum,
 		}
 		ubifs_scan_destroy(sleb);
 	}
-	return ubifs_recover_leb(c, lnum, offs, sbuf, 0);
+	return ubifs_recover_leb(c, lnum, offs, sbuf, -1);
 }
 
 /**
diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 6617280..5e97161 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -557,8 +557,7 @@ static int replay_bud(struct ubifs_info *c, struct bud_entry *b)
 		 * these LEBs could possibly be written to at the power cut
 		 * time.
 		 */
-		sleb = ubifs_recover_leb(c, lnum, offs, c->sbuf,
-					 b->bud->jhead != GCHD);
+		sleb = ubifs_recover_leb(c, lnum, offs, c->sbuf, b->bud->jhead);
 	else
 		sleb = ubifs_scan(c, lnum, offs, c->sbuf, 0);
 	if (IS_ERR(sleb))
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index adeca14..f79983d 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1744,7 +1744,7 @@ struct inode *ubifs_iget(struct super_block *sb, unsigned long inum);
 int ubifs_recover_master_node(struct ubifs_info *c);
 int ubifs_write_rcvrd_mst_node(struct ubifs_info *c);
 struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
-					 int offs, void *sbuf, int grouped);
+					 int offs, void *sbuf, int jhead);
 struct ubifs_scan_leb *ubifs_recover_log_leb(struct ubifs_info *c, int lnum,
 					     int offs, void *sbuf);
 int ubifs_recover_inl_heads(const struct ubifs_info *c, void *sbuf);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 04/10] UBIFS: fix recovery broken by the previous recovery fix
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
                   ` (2 preceding siblings ...)
  2011-06-03 13:49 ` [PATCH 03/10] UBIFS: amend ubifs_recover_leb interface Artem Bityutskiy
@ 2011-06-03 13:49 ` Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 05/10] UBIFS: fix shrinker object count reports Artem Bityutskiy
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Unfortunately, the recovery fix d1606a59b6be4ea392eabd40d1250aa1eeb19efb
(UBIFS: fix extremely rare mount failure) broke recovery. This commit make
UBIFS drop the last min. I/O unit in all journal heads, but this is needed only
for the GC head. And this does not work for non-GC heads. For example, if
suppose we have min. I/O units A and B, and A contains a valid node X, which
was fsynced, and then a group of nodes Y which spans the rest of A and B. In
this case we'll drop not only Y, but also X, which is obviously incorrect.

This patch fixes the issue and additionally makes recovery to drop last min.
I/O unit only for the GC head, and leave things as they have been for ages for
the other heads - this is safer.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/recovery.c |  152 +++++++++++++++++++++++++++++----------------------
 1 files changed, 87 insertions(+), 65 deletions(-)

diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index 6adb532..783d8e0 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -564,19 +564,15 @@ static int fix_unclean_leb(struct ubifs_info *c, struct ubifs_scan_leb *sleb,
 }
 
 /**
- * drop_last_node - drop the last node or group of nodes.
+ * drop_last_group - drop the last group of nodes.
  * @sleb: scanned LEB information
  * @offs: offset of dropped nodes is returned here
- * @grouped: non-zero if whole group of nodes have to be dropped
  *
  * This is a helper function for 'ubifs_recover_leb()' which drops the last
- * node of the scanned LEB or the last group of nodes if @grouped is not zero.
- * This function returns %1 if a node was dropped and %0 otherwise.
+ * group of nodes of the scanned LEB.
  */
-static int drop_last_node(struct ubifs_scan_leb *sleb, int *offs, int grouped)
+static void drop_last_group(struct ubifs_scan_leb *sleb, int *offs)
 {
-	int dropped = 0;
-
 	while (!list_empty(&sleb->nodes)) {
 		struct ubifs_scan_node *snod;
 		struct ubifs_ch *ch;
@@ -585,17 +581,40 @@ static int drop_last_node(struct ubifs_scan_leb *sleb, int *offs, int grouped)
 				  list);
 		ch = snod->node;
 		if (ch->group_type != UBIFS_IN_NODE_GROUP)
-			return dropped;
-		dbg_rcvry("dropping node at %d:%d", sleb->lnum, snod->offs);
+			break;
+
+		dbg_rcvry("dropping grouped node at %d:%d",
+			  sleb->lnum, snod->offs);
+		*offs = snod->offs;
+		list_del(&snod->list);
+		kfree(snod);
+		sleb->nodes_cnt -= 1;
+	}
+}
+
+/**
+ * drop_last_node - drop the last node.
+ * @sleb: scanned LEB information
+ * @offs: offset of dropped nodes is returned here
+ * @grouped: non-zero if whole group of nodes have to be dropped
+ *
+ * This is a helper function for 'ubifs_recover_leb()' which drops the last
+ * node of the scanned LEB.
+ */
+static void drop_last_node(struct ubifs_scan_leb *sleb, int *offs)
+{
+	struct ubifs_scan_node *snod;
+
+	if (!list_empty(&sleb->nodes)) {
+		snod = list_entry(sleb->nodes.prev, struct ubifs_scan_node,
+				  list);
+
+		dbg_rcvry("dropping last node at %d:%d", sleb->lnum, snod->offs);
 		*offs = snod->offs;
 		list_del(&snod->list);
 		kfree(snod);
 		sleb->nodes_cnt -= 1;
-		dropped = 1;
-		if (!grouped)
-			break;
 	}
-	return dropped;
 }
 
 /**
@@ -697,59 +716,62 @@ struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
 		 * If nodes are grouped, always drop the incomplete group at
 		 * the end.
 		 */
-		drop_last_node(sleb, &offs, 1);
+		drop_last_group(sleb, &offs);
 
-	/*
-	 * While we are in the middle of the same min. I/O unit keep dropping
-	 * nodes. So basically, what we want is to make sure that the last min.
-	 * I/O unit where we saw the corruption is dropped completely with all
-	 * the uncorrupted nodes which may possibly sit there.
-	 *
-	 * In other words, let's name the min. I/O unit where the corruption
-	 * starts B, and the previous min. I/O unit A. The below code tries to
-	 * deal with a situation when half of B contains valid nodes or the end
-	 * of a valid node, and the second half of B contains corrupted data or
-	 * garbage. This means that UBIFS had been writing to B just before the
-	 * power cut happened. I do not know how realistic is this scenario
-	 * that half of the min. I/O unit had been written successfully and the
-	 * other half not, but this is possible in our 'failure mode emulation'
-	 * infrastructure at least.
-	 *
-	 * So what is the problem, why we need to drop those nodes? Whey can't
-	 * we just clean-up the second half of B by putting a padding node
-	 * there? We can, and this works fine with one exception which was
-	 * reproduced with power cut emulation testing and happens extremely
-	 * rarely. The description follows, but it is worth noting that that is
-	 * only about the GC head, so we could do this trick only if the bud
-	 * belongs to the GC head, but it does not seem to be worth an
-	 * additional "if" statement.
-	 *
-	 * So, imagine the file-system is full, we run GC which is moving valid
-	 * nodes from LEB X to LEB Y (obviously, LEB Y is the current GC head
-	 * LEB). The @c->gc_lnum is -1, which means that GC will retain LEB X
-	 * and will try to continue. Imagine that LEB X is currently the
-	 * dirtiest LEB, and the amount of used space in LEB Y is exactly the
-	 * same as amount of free space in LEB X.
-	 *
-	 * And a power cut happens when nodes are moved from LEB X to LEB Y. We
-	 * are here trying to recover LEB Y which is the GC head LEB. We find
-	 * the min. I/O unit B as described above. Then we clean-up LEB Y by
-	 * padding min. I/O unit. And later 'ubifs_rcvry_gc_commit()' function
-	 * fails, because it cannot find a dirty LEB which could be GC'd into
-	 * LEB Y! Even LEB X does not match because the amount of valid nodes
-	 * there does not fit the free space in LEB Y any more! And this is
-	 * because of the padding node which we added to LEB Y. The
-	 * user-visible effect of this which I once observed and analysed is
-	 * that we cannot mount the file-system with -ENOSPC error.
-	 *
-	 * So obviously, to make sure that situation does not happen we should
-	 * free min. I/O unit B in LEB Y completely and the last used min. I/O
-	 * unit in LEB Y should be A. This is basically what the below code
-	 * tries to do.
-	 */
-	while (min_io_unit == round_down(offs, c->min_io_size) &&
-	       min_io_unit != offs &&
-	       drop_last_node(sleb, &offs, grouped));
+	if (jhead == GCHD) {
+		/*
+		 * If this LEB belongs to the GC head then while we are in the
+		 * middle of the same min. I/O unit keep dropping nodes. So
+		 * basically, what we want is to make sure that the last min.
+		 * I/O unit where we saw the corruption is dropped completely
+		 * with all the uncorrupted nodes which may possibly sit there.
+		 *
+		 * In other words, let's name the min. I/O unit where the
+		 * corruption starts B, and the previous min. I/O unit A. The
+		 * below code tries to deal with a situation when half of B
+		 * contains valid nodes or the end of a valid node, and the
+		 * second half of B contains corrupted data or garbage. This
+		 * means that UBIFS had been writing to B just before the power
+		 * cut happened. I do not know how realistic is this scenario
+		 * that half of the min. I/O unit had been written successfully
+		 * and the other half not, but this is possible in our 'failure
+		 * mode emulation' infrastructure at least.
+		 *
+		 * So what is the problem, why we need to drop those nodes? Why
+		 * can't we just clean-up the second half of B by putting a
+		 * padding node there? We can, and this works fine with one
+		 * exception which was reproduced with power cut emulation
+		 * testing and happens extremely rarely.
+		 *
+		 * Imagine the file-system is full, we run GC which starts
+		 * moving valid nodes from LEB X to LEB Y (obviously, LEB Y is
+		 * the current GC head LEB). The @c->gc_lnum is -1, which means
+		 * that GC will retain LEB X and will try to continue. Imagine
+		 * that LEB X is currently the dirtiest LEB, and the amount of
+		 * used space in LEB Y is exactly the same as amount of free
+		 * space in LEB X.
+		 *
+		 * And a power cut happens when nodes are moved from LEB X to
+		 * LEB Y. We are here trying to recover LEB Y which is the GC
+		 * head LEB. We find the min. I/O unit B as described above.
+		 * Then we clean-up LEB Y by padding min. I/O unit. And later
+		 * 'ubifs_rcvry_gc_commit()' function fails, because it cannot
+		 * find a dirty LEB which could be GC'd into LEB Y! Even LEB X
+		 * does not match because the amount of valid nodes there does
+		 * not fit the free space in LEB Y any more! And this is
+		 * because of the padding node which we added to LEB Y. The
+		 * user-visible effect of this which I once observed and
+		 * analysed is that we cannot mount the file-system with
+		 * -ENOSPC error.
+		 *
+		 * So obviously, to make sure that situation does not happen we
+		 * should free min. I/O unit B in LEB Y completely and the last
+		 * used min. I/O unit in LEB Y should be A. This is basically
+		 * what the below code tries to do.
+		 */
+		while (offs > min_io_unit)
+			drop_last_node(sleb, &offs);
+	}
 
 	buf = sbuf + offs;
 	len = c->leb_size - offs;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 05/10] UBIFS: fix shrinker object count reports
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
                   ` (3 preceding siblings ...)
  2011-06-03 13:49 ` [PATCH 04/10] UBIFS: fix recovery broken by the previous recovery fix Artem Bityutskiy
@ 2011-06-03 13:49 ` Artem Bityutskiy
  2011-06-03 13:48   ` Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 06/10] UBIFS: fix memory leak on error path Artem Bityutskiy
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Sometimes VM asks the shrinker to return amount of objects it can shrink,
and we return the ubifs_clean_zn_cnt in that case. However, it is possible
that this counter is negative for a short period of time, due to the way
UBIFS TNC code updates it. And I can observe the following warnings sometimes:

shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788

This patch makes sure UBIFS never returns negative count of objects.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
---
 fs/ubifs/shrinker.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/ubifs/shrinker.c b/fs/ubifs/shrinker.c
index ca953a9..dfc105a 100644
--- a/fs/ubifs/shrinker.c
+++ b/fs/ubifs/shrinker.c
@@ -284,7 +284,11 @@ int ubifs_shrinker(struct shrinker *shrink, struct shrink_control *sc)
 	long clean_zn_cnt = atomic_long_read(&ubifs_clean_zn_cnt);
 
 	if (nr == 0)
-		return clean_zn_cnt;
+		/*
+		 * Due to the way UBIFS updates the clean znode counter it may
+		 * temporarily be negative.
+		 */
+		return clean_zn_cnt > 0 ?: 1;
 
 	if (!clean_zn_cnt) {
 		/*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 06/10] UBIFS: fix memory leak on error path
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
                   ` (4 preceding siblings ...)
  2011-06-03 13:49 ` [PATCH 05/10] UBIFS: fix shrinker object count reports Artem Bityutskiy
@ 2011-06-03 13:49 ` Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 07/10] UBIFS: fix clean znode counter corruption in error cases Artem Bityutskiy
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write
failure because it forgets to free the 'struct ubifs_dent_node *dent' object.
Although the object is small, the alignment can make it large - e.g., 2KiB
if the min. I/O unit is 2KiB.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
---
 fs/ubifs/journal.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index 34b1679..cef0460 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -669,6 +669,7 @@ out_free:
 
 out_release:
 	release_head(c, BASEHD);
+	kfree(dent);
 out_ro:
 	ubifs_ro_mode(c, err);
 	if (last_reference)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 07/10] UBIFS: fix clean znode counter corruption in error cases
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
                   ` (5 preceding siblings ...)
  2011-06-03 13:49 ` [PATCH 06/10] UBIFS: fix memory leak on error path Artem Bityutskiy
@ 2011-06-03 13:49 ` Artem Bityutskiy
  2011-06-03 13:49 ` [PATCH 08/10] UBIFS: assert no fixup when writing a node Artem Bityutskiy
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

UBIFS maintains per-filesystem and global clean znode counters
('c->clean_zn_cnt' and 'ubifs_clean_zn_cnt'). It is important to maintain
correct values there since the shrinker relies on 'ubifs_clean_zn_cnt'.

However, in case of failures during commit the counters were corrupted. E.g.,
if a failure happens in the middle of 'write_index()', then some nodes in the
commit list ('c->cnext') are marked as clean, and some are marked as dirty. And
the 'ubifs_destroy_tnc_subtree()' frees does not retrun correct count, and we
end up with non-zero 'c->clean_zn_cnt' when unmounting. This means that if we
have 2 file-sytem and one of them fails, and we unmount it,
'ubifs_clean_zn_cnt' stays incorrect and confuses the shrinker.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/tnc.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
index 8119b1f..91b4213 100644
--- a/fs/ubifs/tnc.c
+++ b/fs/ubifs/tnc.c
@@ -2876,12 +2876,13 @@ static void tnc_destroy_cnext(struct ubifs_info *c)
  */
 void ubifs_tnc_close(struct ubifs_info *c)
 {
-	long clean_freed;
-
 	tnc_destroy_cnext(c);
 	if (c->zroot.znode) {
-		clean_freed = ubifs_destroy_tnc_subtree(c->zroot.znode);
-		atomic_long_sub(clean_freed, &ubifs_clean_zn_cnt);
+		long n;
+
+		ubifs_destroy_tnc_subtree(c->zroot.znode);
+		n = atomic_long_read(&c->clean_zn_cnt);
+		atomic_long_sub(n, &ubifs_clean_zn_cnt);
 	}
 	kfree(c->gap_lebs);
 	kfree(c->ilebs);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 08/10] UBIFS: assert no fixup when writing a node
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
                   ` (6 preceding siblings ...)
  2011-06-03 13:49 ` [PATCH 07/10] UBIFS: fix clean znode counter corruption in error cases Artem Bityutskiy
@ 2011-06-03 13:49 ` Artem Bityutskiy
  2011-06-03 13:50 ` [PATCH 09/10] UBIFS: intialize LPT earlier Artem Bityutskiy
  2011-06-03 13:50 ` [PATCH 10/10] UBIFS: fix-up free space earlier Artem Bityutskiy
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:49 UTC (permalink / raw)
  To: linux-mtd

From: Ben Gardiner <bengardiner@nanometrics.ca>

The current free space fixup can result in some writing to the UBI volume
when the space_fixup flag is set.

To catch instances where UBIFS is writing to the NAND while the space_fixup
flag is set, add an assert to ubifs_write_node().

Artem: tweaked the patch, added similar assertion to the write buffer
       write path.

Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/io.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
index 166951e..3be645e 100644
--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -581,6 +581,7 @@ int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
 	ubifs_assert(wbuf->size % c->min_io_size == 0);
 	ubifs_assert(mutex_is_locked(&wbuf->io_mutex));
 	ubifs_assert(!c->ro_media && !c->ro_mount);
+	ubifs_assert(!c->space_fixup);
 	if (c->leb_size - wbuf->offs >= c->max_write_size)
 		ubifs_assert(!((wbuf->offs + wbuf->size) % c->max_write_size));
 
@@ -759,6 +760,7 @@ int ubifs_write_node(struct ubifs_info *c, void *buf, int len, int lnum,
 	ubifs_assert(lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
 	ubifs_assert(offs % c->min_io_size == 0 && offs < c->leb_size);
 	ubifs_assert(!c->ro_media && !c->ro_mount);
+	ubifs_assert(!c->space_fixup);
 
 	if (c->ro_error)
 		return -EROFS;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 09/10] UBIFS: intialize LPT earlier
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
                   ` (7 preceding siblings ...)
  2011-06-03 13:49 ` [PATCH 08/10] UBIFS: assert no fixup when writing a node Artem Bityutskiy
@ 2011-06-03 13:50 ` Artem Bityutskiy
  2011-06-03 13:50 ` [PATCH 10/10] UBIFS: fix-up free space earlier Artem Bityutskiy
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:50 UTC (permalink / raw)
  To: linux-mtd

From: Ben Gardiner <bengardiner@nanometrics.ca>

The current 'mount_ubifs()' implementation does not initialize the LPT until the
the master node is marked dirty. Move the LPT initialization to before marking
the master node dirty. This is a preparation for the next patch which will move
the free-space-fixup check to before marking the master node dirty, because we
have to fix-up the free space before doing any writes.

Artem: massaged the patch and commit message.

Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/super.c |   29 ++++++++++++++++-------------
 1 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 1e40db7..6d357fd 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1282,17 +1282,24 @@ static int mount_ubifs(struct ubifs_info *c)
 	if (err)
 		goto out_master;
 
-	init_constants_master(c);
-
 	if ((c->mst_node->flags & cpu_to_le32(UBIFS_MST_DIRTY)) != 0) {
 		ubifs_msg("recovery needed");
 		c->need_recovery = 1;
-		if (!c->ro_mount) {
-			err = ubifs_recover_inl_heads(c, c->sbuf);
-			if (err)
-				goto out_master;
-		}
-	} else if (!c->ro_mount) {
+	}
+
+	init_constants_master(c);
+
+	if (c->need_recovery && !c->ro_mount) {
+		err = ubifs_recover_inl_heads(c, c->sbuf);
+		if (err)
+			goto out_master;
+	}
+
+	err = ubifs_lpt_init(c, 1, !c->ro_mount);
+	if (err)
+		goto out_master;
+
+	if (!c->ro_mount) {
 		/*
 		 * Set the "dirty" flag so that if we reboot uncleanly we
 		 * will notice this immediately on the next mount.
@@ -1300,13 +1307,9 @@ static int mount_ubifs(struct ubifs_info *c)
 		c->mst_node->flags |= cpu_to_le32(UBIFS_MST_DIRTY);
 		err = ubifs_write_master(c);
 		if (err)
-			goto out_master;
+			goto out_lpt;
 	}
 
-	err = ubifs_lpt_init(c, 1, !c->ro_mount);
-	if (err)
-		goto out_lpt;
-
 	err = dbg_check_idx_size(c, c->bi.old_idx_sz);
 	if (err)
 		goto out_lpt;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 10/10] UBIFS: fix-up free space earlier
  2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
                   ` (8 preceding siblings ...)
  2011-06-03 13:50 ` [PATCH 09/10] UBIFS: intialize LPT earlier Artem Bityutskiy
@ 2011-06-03 13:50 ` Artem Bityutskiy
  9 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 13:50 UTC (permalink / raw)
  To: linux-mtd

From: Ben Gardiner <bengardiner@nanometrics.ca>

The free space fixup is currently initiated during mount after the call to
ubifs_write_master() which results in a write to PEBs; this has been observed
with the patch 'assert no fixup when writing a node' applied:

Move the free space fixup on mount to before the calls to
ubifs_recover_inl_heads() and ubifs_write_master(). This results in no
assertions with the previously mentioned patch applied.

Artem: tweaked the patch a bit

Signed-off-by: Ben Gardiner <bengardiner@nanometrics>
Reviewed-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 fs/ubifs/super.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 6d357fd..b5aeb5a 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1282,13 +1282,13 @@ static int mount_ubifs(struct ubifs_info *c)
 	if (err)
 		goto out_master;
 
+	init_constants_master(c);
+
 	if ((c->mst_node->flags & cpu_to_le32(UBIFS_MST_DIRTY)) != 0) {
 		ubifs_msg("recovery needed");
 		c->need_recovery = 1;
 	}
 
-	init_constants_master(c);
-
 	if (c->need_recovery && !c->ro_mount) {
 		err = ubifs_recover_inl_heads(c, c->sbuf);
 		if (err)
@@ -1299,6 +1299,12 @@ static int mount_ubifs(struct ubifs_info *c)
 	if (err)
 		goto out_master;
 
+	if (!c->ro_mount && c->space_fixup) {
+		err = ubifs_fixup_free_space(c);
+		if (err)
+			goto out_master;
+	}
+
 	if (!c->ro_mount) {
 		/*
 		 * Set the "dirty" flag so that if we reboot uncleanly we
@@ -1402,12 +1408,6 @@ static int mount_ubifs(struct ubifs_info *c)
 	} else
 		ubifs_assert(c->lst.taken_empty_lebs > 0);
 
-	if (!c->ro_mount && c->space_fixup) {
-		err = ubifs_fixup_free_space(c);
-		if (err)
-			goto out_infos;
-	}
-
 	err = dbg_check_filesystem(c);
 	if (err)
 		goto out_infos;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 05/10] UBIFS: fix shrinker object count reports
  2011-06-03 13:48   ` Artem Bityutskiy
@ 2011-06-03 14:30     ` Artem Bityutskiy
  2011-06-03 15:06       ` Artem Bityutskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 14:30 UTC (permalink / raw)
  To: linux-mtd

On Fri, 2011-06-03 at 16:48 +0300, Artem Bityutskiy wrote:
> On Fri, 2011-06-03 at 16:49 +0300, Artem Bityutskiy wrote:
> > +		/*
> > +		 * Due to the way UBIFS updates the clean znode counter it may
> > +		 * temporarily be negative.
> > +		 */
> > +		return clean_zn_cnt > 0 ?: 1;
> 
> Oh, this should be  clean_zn_cnt >= 0 ?: 1;

Crap, sorry, stupid me :-)

return clean_zn_cnt < 0 ?: clean_zn_cnt;

But yeah, it is saner to not use "?:" at all:

return clean_zn_cnt >= 0 ? clean_zn_cnt: 1;

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 05/10] UBIFS: fix shrinker object count reports
  2011-06-03 14:30     ` Artem Bityutskiy
@ 2011-06-03 15:06       ` Artem Bityutskiy
  0 siblings, 0 replies; 14+ messages in thread
From: Artem Bityutskiy @ 2011-06-03 15:06 UTC (permalink / raw)
  To: linux-mtd

So,

>From cf610bf4199770420629d3bc273494bd27ad6c1d Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Tue, 31 May 2011 07:03:21 +0300
Subject: [PATCH] UBIFS: fix shrinker object count reports

Sometimes VM asks the shrinker to return amount of objects it can shrink,
and we return the ubifs_clean_zn_cnt in that case. However, it is possible
that this counter is negative for a short period of time, due to the way
UBIFS TNC code updates it. And I can observe the following warnings sometimes:

shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788

This patch makes sure UBIFS never returns negative count of objects.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
---
 fs/ubifs/shrinker.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/ubifs/shrinker.c b/fs/ubifs/shrinker.c
index ca953a9..9e1d056 100644
--- a/fs/ubifs/shrinker.c
+++ b/fs/ubifs/shrinker.c
@@ -284,7 +284,11 @@ int ubifs_shrinker(struct shrinker *shrink, struct shrink_control *sc)
 	long clean_zn_cnt = atomic_long_read(&ubifs_clean_zn_cnt);
 
 	if (nr == 0)
-		return clean_zn_cnt;
+		/*
+		 * Due to the way UBIFS updates the clean znode counter it may
+		 * temporarily be negative.
+		 */
+		return clean_zn_cnt >= 0 ? clean_zn_cnt : 1;
 
 	if (!clean_zn_cnt) {
 		/*
-- 
1.7.2.3

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-06-03 15:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-03 13:49 [PATCH 00/10] UBIFS: for Linux 3.0 Artem Bityutskiy
2011-06-03 13:49 ` [PATCH 01/10] UBIFS: supress false error messages Artem Bityutskiy
2011-06-03 13:49 ` [PATCH 02/10] UBIFS: introduce a "grouped" journal head flag Artem Bityutskiy
2011-06-03 13:49 ` [PATCH 03/10] UBIFS: amend ubifs_recover_leb interface Artem Bityutskiy
2011-06-03 13:49 ` [PATCH 04/10] UBIFS: fix recovery broken by the previous recovery fix Artem Bityutskiy
2011-06-03 13:49 ` [PATCH 05/10] UBIFS: fix shrinker object count reports Artem Bityutskiy
2011-06-03 13:48   ` Artem Bityutskiy
2011-06-03 14:30     ` Artem Bityutskiy
2011-06-03 15:06       ` Artem Bityutskiy
2011-06-03 13:49 ` [PATCH 06/10] UBIFS: fix memory leak on error path Artem Bityutskiy
2011-06-03 13:49 ` [PATCH 07/10] UBIFS: fix clean znode counter corruption in error cases Artem Bityutskiy
2011-06-03 13:49 ` [PATCH 08/10] UBIFS: assert no fixup when writing a node Artem Bityutskiy
2011-06-03 13:50 ` [PATCH 09/10] UBIFS: intialize LPT earlier Artem Bityutskiy
2011-06-03 13:50 ` [PATCH 10/10] UBIFS: fix-up free space earlier Artem Bityutskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.