Linux-mtd Archive on lore.kernel.org
* [PATCH v3 0/2] fix potential race between ubifs_tnc_locate() and GC
@ 2020-03-05  9:22 Hou Tao
  2020-03-05  9:22 ` [PATCH v3 1/2] ubifs: factor out helper ubifs_check_node_buf() Hou Tao
  2020-03-05  9:22 ` [PATCH v3 2/2] ubifs: read node from wbuf when it fully sits in wbuf Hou Tao
  0 siblings, 2 replies; 6+ messages in thread
From: Hou Tao @ 2020-03-05  9:22 UTC
  To: linux-mtd, Richard Weinberger; +Cc: Carson.Li1, Adrian Hunter, houtao1

Hi,

The patchset aims to fix the problem reported by 李傲傲 (Carson Li) [1]. It is
caused by a race between ubifs_tnc_locate() and GC when ubifs_tnc_locate()
finds that the target LEB is being used as a write-buffer or bud. The
patchset fixes it by reading the node from the write-buffer only when the
node is fully contained in the write-buffer.
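The "fully contained" condition is a plain range check on the write-buffer window. The following is a minimal user-space sketch, assuming a hypothetical `struct wbuf_model` that mirrors only the `lnum`, `offs` and `used` fields compared in patch 2/2. It is not the kernel structure.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for the three struct ubifs_wbuf
 * fields the check needs; it is not the kernel structure.
 */
struct wbuf_model {
	int lnum; /* LEB the write-buffer currently belongs to */
	int offs; /* LEB offset of the first buffered byte */
	int used; /* number of bytes held in the buffer */
};

/*
 * Returns true only if the whole node [offs, offs + len) of LEB @lnum is
 * held in the write-buffer, i.e. nothing would have to be read from flash.
 */
static bool node_fully_in_wbuf(const struct wbuf_model *wbuf, int lnum,
			       int offs, int len)
{
	assert(len > 0);
	return wbuf->lnum == lnum && wbuf->offs <= offs &&
	       offs + len <= wbuf->offs + wbuf->used;
}
```

With a wbuf on LEB 5 starting at offset 2048 with 512 bytes used, a 160-byte node at offset 2400 qualifies, while a 200-byte node at the same offset spills past the buffered data and has to be read from flash instead.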

Comments are welcome.

Regards,
Tao

--
v3:
 -- add Link: tag
 -- add UBIFS_CHK_FORCE_DUMP_BAD_NODE flag for error message control

[1]: https://www.spinics.net/lists/linux-mtd/msg10771.html

Hou Tao (2):
  ubifs: factor out helper ubifs_check_node_buf()
  ubifs: read node from wbuf when it fully sits in wbuf

 fs/ubifs/io.c    | 109 +++++++++++++++++++++++------------------------
 fs/ubifs/tnc.c   |  81 +++++++++++++++++++++++++++++++++--
 fs/ubifs/ubifs.h |   5 +++
 3 files changed, 136 insertions(+), 59 deletions(-)


-- 
2.25.0.4.g0ad7144999


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

* [PATCH v3 1/2] ubifs: factor out helper ubifs_check_node_buf()
  2020-03-05  9:22 [PATCH v3 0/2] fix potential race between ubifs_tnc_locate() and GC Hou Tao
@ 2020-03-05  9:22 ` Hou Tao
  2020-03-05  9:22 ` [PATCH v3 2/2] ubifs: read node from wbuf when it fully sits in wbuf Hou Tao
  1 sibling, 0 replies; 6+ messages in thread
From: Hou Tao @ 2020-03-05  9:22 UTC
  To: linux-mtd, Richard Weinberger; +Cc: Carson.Li1, Adrian Hunter, houtao1

The helper will be used by the following patch to check the
validity of a node held in a buffer.

A UBIFS_CHK_FORCE_DUMP_BAD_NODE flag is also added to control
error output: without the flag, dumping of bad nodes is
suppressed while the file system is being probed.
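The effect of the new flag on whether a bad node gets dumped fits in a few lines. This is a user-space model of the condition added to ubifs_check_node_buf(), with `c->probing` passed in as a plain `bool`; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the flag this patch adds to fs/ubifs/ubifs.h. */
#define UBIFS_CHK_FORCE_DUMP_BAD_NODE 1

/*
 * User-space model of the dump decision in ubifs_check_node_buf():
 * a bad node is dumped unless the file system is being probed and the
 * caller did not pass UBIFS_CHK_FORCE_DUMP_BAD_NODE.
 */
static bool should_dump_bad_node(int flags, bool probing)
{
	/* Only one flag is defined so far. */
	assert((flags & ~UBIFS_CHK_FORCE_DUMP_BAD_NODE) == 0);
	return (flags & UBIFS_CHK_FORCE_DUMP_BAD_NODE) || !probing;
}
```

In the diff below, ubifs_read_node_wbuf() passes the flag, keeping its previous unconditional dump, while ubifs_read_node() passes 0 and therefore still stays quiet while probing.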

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/ubifs/io.c    | 109 +++++++++++++++++++++++------------------------
 fs/ubifs/ubifs.h |   5 +++
 2 files changed, 58 insertions(+), 56 deletions(-)

diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
index 8ceb51478800..c174303274ae 100644
--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -946,6 +946,55 @@ int ubifs_write_node(struct ubifs_info *c, void *buf, int len, int lnum,
 	return ubifs_write_node_hmac(c, buf, len, lnum, offs, -1);
 }
 
+/**
+ * ubifs_check_node_buf - check node buffer
+ * @c: UBIFS file-system description object
+ * @buf: the buffer holding the node
+ * @type: the expected type of node
+ * @len: the expected length of node
+ * @lnum: logical eraseblock number
+ * @offs: offset within the logical eraseblock
+ * @flags: flags for error message control
+ *
+ * Returns 0 in case of success and %-EINVAL or %-EUCLEAN in case of failure.
+ */
+int ubifs_check_node_buf(const struct ubifs_info *c, void *buf, int type,
+			 int len, int lnum, int offs, int flags)
+
+{
+	const struct ubifs_ch *ch = buf;
+	int err;
+	int l;
+
+	if (type != ch->node_type) {
+		ubifs_err(c, "bad node type (%d but expected %d)",
+			  ch->node_type, type);
+		goto out;
+	}
+
+	err = ubifs_check_node(c, buf, lnum, offs, 0, 0);
+	if (err) {
+		ubifs_err(c, "expected node type %d", type);
+		return err;
+	}
+
+	l = le32_to_cpu(ch->len);
+	if (l != len) {
+		ubifs_err(c, "bad node length %d, expected %d", l, len);
+		goto out;
+	}
+
+	return 0;
+out:
+	ubifs_errc(c, "bad node at LEB %d:%d, LEB mapping status %d", lnum,
+		   offs, ubi_is_mapped(c->ubi, lnum));
+	if ((flags & UBIFS_CHK_FORCE_DUMP_BAD_NODE) || !c->probing) {
+		ubifs_dump_node(c, buf);
+		dump_stack();
+	}
+	return -EINVAL;
+}
+
 /**
  * ubifs_read_node_wbuf - read node from the media or write-buffer.
  * @wbuf: wbuf to check for un-written data
@@ -966,7 +1015,6 @@ int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
 {
 	const struct ubifs_info *c = wbuf->c;
 	int err, rlen, overlap;
-	struct ubifs_ch *ch = buf;
 
 	dbg_io("LEB %d:%d, %s, length %d, jhead %s", lnum, offs,
 	       dbg_ntype(type), len, dbg_jhead(wbuf->jhead));
@@ -998,31 +1046,8 @@ int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
 			return err;
 	}
 
-	if (type != ch->node_type) {
-		ubifs_err(c, "bad node type (%d but expected %d)",
-			  ch->node_type, type);
-		goto out;
-	}
-
-	err = ubifs_check_node(c, buf, lnum, offs, 0, 0);
-	if (err) {
-		ubifs_err(c, "expected node type %d", type);
-		return err;
-	}
-
-	rlen = le32_to_cpu(ch->len);
-	if (rlen != len) {
-		ubifs_err(c, "bad node length %d, expected %d", rlen, len);
-		goto out;
-	}
-
-	return 0;
-
-out:
-	ubifs_err(c, "bad node at LEB %d:%d", lnum, offs);
-	ubifs_dump_node(c, buf);
-	dump_stack();
-	return -EINVAL;
+	return ubifs_check_node_buf(c, buf, type, len, lnum, offs,
+				    UBIFS_CHK_FORCE_DUMP_BAD_NODE);
 }
 
 /**
@@ -1041,8 +1066,7 @@ int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
 int ubifs_read_node(const struct ubifs_info *c, void *buf, int type, int len,
 		    int lnum, int offs)
 {
-	int err, l;
-	struct ubifs_ch *ch = buf;
+	int err;
 
 	dbg_io("LEB %d:%d, %s, length %d", lnum, offs, dbg_ntype(type), len);
 	ubifs_assert(c, lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
@@ -1054,34 +1078,7 @@ int ubifs_read_node(const struct ubifs_info *c, void *buf, int type, int len,
 	if (err && err != -EBADMSG)
 		return err;
 
-	if (type != ch->node_type) {
-		ubifs_errc(c, "bad node type (%d but expected %d)",
-			   ch->node_type, type);
-		goto out;
-	}
-
-	err = ubifs_check_node(c, buf, lnum, offs, 0, 0);
-	if (err) {
-		ubifs_errc(c, "expected node type %d", type);
-		return err;
-	}
-
-	l = le32_to_cpu(ch->len);
-	if (l != len) {
-		ubifs_errc(c, "bad node length %d, expected %d", l, len);
-		goto out;
-	}
-
-	return 0;
-
-out:
-	ubifs_errc(c, "bad node at LEB %d:%d, LEB mapping status %d", lnum,
-		   offs, ubi_is_mapped(c->ubi, lnum));
-	if (!c->probing) {
-		ubifs_dump_node(c, buf);
-		dump_stack();
-	}
-	return -EINVAL;
+	return ubifs_check_node_buf(c, buf, type, len, lnum, offs, 0);
 }
 
 /**
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index bff682309fbe..7b412494b3b6 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -155,6 +155,9 @@
 #define UBIFS_HMAC_ARR_SZ 0
 #endif
 
+/* Dump bad node when node checking fails */
+#define UBIFS_CHK_FORCE_DUMP_BAD_NODE 1
+
 /*
  * Lockdep classes for UBIFS inode @ui_mutex.
  */
@@ -1710,6 +1713,8 @@ int ubifs_is_mapped(const struct ubifs_info *c, int lnum);
 int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len);
 int ubifs_wbuf_seek_nolock(struct ubifs_wbuf *wbuf, int lnum, int offs);
 int ubifs_wbuf_init(struct ubifs_info *c, struct ubifs_wbuf *wbuf);
+int ubifs_check_node_buf(const struct ubifs_info *c, void *buf, int type,
+			 int len, int lnum, int offs, int flags);
 int ubifs_read_node(const struct ubifs_info *c, void *buf, int type, int len,
 		    int lnum, int offs);
 int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
-- 
2.25.0.4.g0ad7144999

* [PATCH v3 2/2] ubifs: read node from wbuf when it fully sits in wbuf
  2020-03-05  9:22 [PATCH v3 0/2] fix potential race between ubifs_tnc_locate() and GC Hou Tao
  2020-03-05  9:22 ` [PATCH v3 1/2] ubifs: factor out helper ubifs_check_node_buf() Hou Tao
@ 2020-03-05  9:22 ` Hou Tao
  2020-03-16 22:21   ` Richard Weinberger
  2020-03-16 22:33   ` Richard Weinberger
  1 sibling, 2 replies; 6+ messages in thread
From: Hou Tao @ 2020-03-05  9:22 UTC
  To: linux-mtd, Richard Weinberger; +Cc: Carson.Li1, Adrian Hunter, houtao1

Carson Li reports the following error:

 UBIFS error: ubifs_read_node_wbuf: expected node type 0
 Not a node, first 24 bytes:
 Kernel panic - not syncing
 CPU: 1 PID: 943 Comm: http-thread 4.4.83 #1
   panic+0x70/0x1e4
   ubifs_dump_node+0x6c/0x9a0
   ubifs_read_node_wbuf+0x350/0x384
   ubifs_tnc_read_node+0x54/0x214
   ubifs_tnc_locate+0x118/0x1b4
   ubifs_iget+0xb8/0x68c
   ubifs_lookup+0x1b4/0x258
   lookup_real+0x30/0x4c
   __lookup_hash+0x34/0x3c
   walk_component+0xec/0x2a0
   path_lookupat+0x80/0xfc
   filename_lookup+0x5c/0xfc
   vfs_fstatat+0x4c/0x9c
   SyS_stat64+0x14/0x30
   ret_fast_syscall+0x0/0x34

At first it seems that the LEB used as the DATA journal head was GC'ed and
ubifs_tnc_locate() then read an invalid node. However, journal head LEBs
now have the LPROPS_TAKEN flag set in their properties, and GC skips them.

What actually happens is that the LEB is GC'ed, freed and then reused as
a journal head, so ubifs_tnc_locate() finally reads an invalid node. The
problem can be reproduced with the following steps:
* create 128 empty files
* overwrite 8 files in the background repeatedly to trigger GC
* drop the inode cache and stat these 128 empty files repeatedly
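The steps above could be scripted roughly as follows. This is a hypothetical user-space sketch, not the reporter's or the author's reproducer. It assumes that `dir` is a directory on a UBIFS mount, that repeatedly rewriting 8 files with 64 KiB each produces enough dirty space for GC to run, and that it runs as root so the inode cache can be dropped through `/proc/sys/vm/drop_caches` (writing `2` frees reclaimable dentries and inodes). A hit is expected to show up in the kernel log, as in the report above, rather than necessarily as a failing `stat()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NR_EMPTY 128   /* empty files that get stat()ed */
#define NR_OVERWRITE 8 /* files rewritten in the background to trigger GC */

static int make_path(char *buf, size_t sz, const char *dir,
		     const char *prefix, int i)
{
	int n = snprintf(buf, sz, "%s/%s%03d", dir, prefix, i);

	return (n < 0 || (size_t)n >= sz) ? -1 : 0;
}

/* Best effort: needs root, so failures are deliberately ignored. */
static void drop_inode_cache(void)
{
	int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);

	if (fd >= 0) {
		ssize_t n = write(fd, "2", 1); /* 2 = dentries and inodes */

		(void)n;
		close(fd);
	}
}

/* Child process: rewrite NR_OVERWRITE files forever to create dirty LEBs. */
static void overwrite_loop(const char *dir)
{
	static char data[4096];
	char path[256];

	memset(data, 0xa5, sizeof(data));
	for (;;) {
		for (int i = 0; i < NR_OVERWRITE; i++) {
			if (make_path(path, sizeof(path), dir, "w", i))
				_exit(1);
			int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
			if (fd < 0)
				_exit(1);
			for (int j = 0; j < 16; j++)
				if (write(fd, data, sizeof(data)) < 0)
					_exit(1);
			fsync(fd);
			close(fd);
		}
	}
}

/* Returns 0 if every stat() succeeded for @rounds rounds, -1 otherwise. */
static int run_repro(const char *dir, int rounds)
{
	char path[256];
	int ret = 0;

	for (int i = 0; i < NR_EMPTY; i++) {
		if (make_path(path, sizeof(path), dir, "e", i))
			return -1;
		int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
		if (fd < 0)
			return -1;
		close(fd);
	}

	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		overwrite_loop(dir); /* never returns */

	for (int r = 0; r < rounds && ret == 0; r++) {
		drop_inode_cache();
		for (int i = 0; i < NR_EMPTY; i++) {
			struct stat st;

			if (make_path(path, sizeof(path), dir, "e", i) ||
			    stat(path, &st) != 0) {
				ret = -1;
				break;
			}
		}
	}

	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return ret;
}
```

On a real setup `rounds` would be large (the race window is narrow); the follow-up in this thread suggests widening it on the kernel side with an extra delay.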

We could simply fix the problem by removing the optimization of reading
from the wbuf when possible. However, taking a spin lock and memcpy()ing
from the wbuf is much less time-consuming than reading from the MTD
device, so we fix the wbuf-reading logic instead.

If the node is not fully contained in the write-buffer, the remaining
part of the node is read from MTD without holding any lock. In the
meantime the journal head may be switched and GC'ed, and we get invalid
node data. So we only read from the wbuf if the node fully sits in the
write-buffer.

We also need to check whether the current LEB has been GC'ed and reused
as a journal head.
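That check reuses the GC sequence number that ubifs_tnc_locate() saves (gc_seq1) while holding tnc_mutex, and delegates the decision to the existing maybe_leb_gced() helper. Below is a simplified, single-threaded sketch of that decision logic as I read it. The hypothetical `struct gc_state` stands in for the `gc_seq` and `gced_lnum` fields of `struct ubifs_info`. The kernel helper also re-reads `gced_lnum` around read barriers, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct ubifs_info fields involved. */
struct gc_state {
	int gc_seq;    /* GC sequence number, bumped after GC processes a LEB */
	int gced_lnum; /* LEB most recently processed by GC */
};

/*
 * Simplified model of maybe_leb_gced(): @seq_at_lookup is the gc_seq value
 * saved under tnc_mutex during the TNC lookup. Returns true if @lnum may
 * have been garbage-collected since then.
 */
static bool leb_maybe_gced_model(const struct gc_state *s, int lnum,
				 int seq_at_lookup)
{
	assert(seq_at_lookup <= s->gc_seq); /* the sequence only moves forward */

	if (s->gc_seq == seq_at_lookup)
		return false;               /* no GC at all since the lookup */
	if (seq_at_lookup + 1 != s->gc_seq)
		return true;                /* several GC runs: assume the worst */
	return s->gced_lnum == lnum;        /* exactly one run: was it this LEB? */
}
```

In the new ubifs_check_and_read_wbuf(), a true result for a node whose range sits in a wbuf sets `*retry`, and ubifs_tnc_locate() then repeats the lookup with `safely = 1`, i.e. reading while tnc_mutex is held.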

Link: https://www.spinics.net/lists/linux-mtd/msg10771.html
Fixes: 601c0bc46753 ("UBIFS: allow for racing between GC and TNC")
Reported-and-analyzed-by: 李傲傲 (Carson Li1/9542) <Carson.Li1@unisoc.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/ubifs/tnc.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
index e8e7b0e9532e..d4c0435d0276 100644
--- a/fs/ubifs/tnc.c
+++ b/fs/ubifs/tnc.c
@@ -1425,6 +1425,74 @@ static int maybe_leb_gced(struct ubifs_info *c, int lnum, int gc_seq1)
 	return 0;
 }
 
+/**
+ * ubifs_check_and_read_wbuf - read node from write-buffer if possible
+ * @c: UBIFS file-system description object
+ * @zbr: the zbranch describing the node to read
+ * @gc_seq: the saved GC sequence used for GC checking
+ * @buf: buffer to read to
+ * @retry: whether to look up the TNC again
+ *
+ * The function checks whether the node fully sits in a write-buffer and
+ * whether the LEB of that write-buffer has not been GC'ed recently; if so,
+ * it copies the node into @buf and checks it.
+ *
+ * Returns 1 in case of success, 0 in case of not found, and a negative
+ * error code in case of failure.
+ *
+ * @retry is set to true if the node's location lies in a write-buffer but
+ * its LEB may have been GC'ed recently; otherwise it is set to false.
+ */
+static int ubifs_check_and_read_wbuf(struct ubifs_info *c,
+				     const struct ubifs_zbranch *zbr,
+				     int gc_seq, void *buf, bool *retry)
+{
+	bool found = false;
+	int lnum = zbr->lnum;
+	int offs = zbr->offs;
+	int len = zbr->len;
+	int type;
+	int i;
+	int err;
+
+	*retry = false;
+	for (i = 0; i < c->jhead_cnt; i++) {
+		struct ubifs_wbuf *wbuf = &c->jheads[i].wbuf;
+
+		/* Check whether the node is fully included in wbuf */
+		spin_lock(&wbuf->lock);
+		if (wbuf->lnum == lnum && wbuf->offs <= offs &&
+		    offs + len <= wbuf->offs + wbuf->used) {
+			/*
+			 * lnum is GC'ed and reused as journal head,
+			 * we need to lookup TNC again.
+			 */
+			if (maybe_leb_gced(c, lnum, gc_seq)) {
+				spin_unlock(&wbuf->lock);
+				*retry = true;
+				break;
+			}
+
+			memcpy(buf, wbuf->buf + offs - wbuf->offs, len);
+			spin_unlock(&wbuf->lock);
+			found = true;
+			break;
+		}
+		spin_unlock(&wbuf->lock);
+	}
+
+	if (!found)
+		return 0;
+
+	type = key_type(c, &zbr->key);
+	err = ubifs_check_node_buf(c, buf, type, len, lnum, offs,
+				   UBIFS_CHK_FORCE_DUMP_BAD_NODE);
+	if (err)
+		return err;
+
+	return 1;
+}
+
 /**
  * ubifs_tnc_locate - look up a file-system node and return it and its location.
  * @c: UBIFS file-system description object
@@ -1444,6 +1512,7 @@ int ubifs_tnc_locate(struct ubifs_info *c, const union ubifs_key *key,
 	int found, n, err, safely = 0, gc_seq1;
 	struct ubifs_znode *znode;
 	struct ubifs_zbranch zbr, *zt;
+	bool retry;
 
 again:
 	mutex_lock(&c->tnc_mutex);
@@ -1477,10 +1546,16 @@ int ubifs_tnc_locate(struct ubifs_info *c, const union ubifs_key *key,
 	gc_seq1 = c->gc_seq;
 	mutex_unlock(&c->tnc_mutex);
 
-	if (ubifs_get_wbuf(c, zbr.lnum)) {
-		/* We do not GC journal heads */
-		err = ubifs_tnc_read_node(c, &zbr, node);
+	err = ubifs_check_and_read_wbuf(c, &zbr, gc_seq1, node, &retry);
+	if (err < 0)
 		return err;
+	/* Found a valid node */
+	if (err > 0)
+		return 0;
+	/* The node may have been GC'ed, so look it up again */
+	if (retry) {
+		safely = 1;
+		goto again;
 	}
 
 	err = fallible_read_node(c, key, &zbr, node);
-- 
2.25.0.4.g0ad7144999

* Re: [PATCH v3 2/2] ubifs: read node from wbuf when it fully sits in wbuf
  2020-03-05  9:22 ` [PATCH v3 2/2] ubifs: read node from wbuf when it fully sits in wbuf Hou Tao
@ 2020-03-16 22:21   ` Richard Weinberger
  2020-03-16 22:33   ` Richard Weinberger
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Weinberger @ 2020-03-16 22:21 UTC
  To: Hou Tao; +Cc: Richard Weinberger, linux-mtd, Adrian Hunter, Carson.Li1

Hou Tao,

On Thu, Mar 5, 2020 at 10:15 AM Hou Tao <houtao1@huawei.com> wrote:
>
> Carson Li Reports the following error:
>
>  UBIFS error: ubifs_read_node_wbuf: expected node type 0
>  Not a node, first 24 bytes:
>  Kernel panic - not syncing
>  CPU: 1 PID: 943 Comm: http-thread 4.4.83 #1
>    panic+0x70/0x1e4
>    ubifs_dump_node+0x6c/0x9a0
>    ubifs_read_node_wbuf+0x350/0x384
>    ubifs_tnc_read_node+0x54/0x214
>    ubifs_tnc_locate+0x118/0x1b4
>    ubifs_iget+0xb8/0x68c
>    ubifs_lookup+0x1b4/0x258
>    lookup_real+0x30/0x4c
>    __lookup_hash+0x34/0x3c
>    walk_component+0xec/0x2a0
>    path_lookupat+0x80/0xfc
>    filename_lookup+0x5c/0xfc
>    vfs_fstatat+0x4c/0x9c
>    SyS_stat64+0x14/0x30
>    ret_fast_syscall+0x0/0x34
>
> It seems the LEB used as DATA journal head is GC'ed, and ubifs_tnc_locate()
> read an invalid node. But now the property of journal head LEB has
> LPROPS_TAKEN flag set and GC will skip these LEBs.
>
> The actual situation of the problem is the LEB is GCed, freed and then
> reused as journal head, and finally ubifs_tnc_locate() reads
> an invalid node. And it can be reproduced by the following steps:
> * create 128 empty files
> * overwrite 8 files in backgroup repeatedly to trigger GC
> * drop inode cache and stat these 128 empty files repeatedly
>
> We can simply fix the problem by removing the optimization of reading
> wbuf when possible. But because taking spin lock and memcpying from
> wbuf is much less time-consuming than reading from MTD device, so we fix
> the logic of wbuf reading instead.

I'm digging into that issue now. Did you also experiment with reading while
tnc_mutex is held, i.e. with safely = 1 by default, so there is no race at all?
Just to make sure we don't fix a no-longer-needed optimization.

The code is already anything but trivial and adding more code makes me
nervous.

-- 
Thanks,
//richard

* Re: [PATCH v3 2/2] ubifs: read node from wbuf when it fully sits in wbuf
  2020-03-05  9:22 ` [PATCH v3 2/2] ubifs: read node from wbuf when it fully sits in wbuf Hou Tao
  2020-03-16 22:21   ` Richard Weinberger
@ 2020-03-16 22:33   ` Richard Weinberger
  2020-03-18  1:57     ` Hou Tao
  1 sibling, 1 reply; 6+ messages in thread
From: Richard Weinberger @ 2020-03-16 22:33 UTC
  To: Hou Tao; +Cc: Richard Weinberger, linux-mtd, Adrian Hunter, Carson.Li1

On Thu, Mar 5, 2020 at 10:15 AM Hou Tao <houtao1@huawei.com> wrote:
> The actual situation of the problem is the LEB is GCed, freed and then
> reused as journal head, and finally ubifs_tnc_locate() reads
> an invalid node. And it can be reproduced by the following steps:
> * create 128 empty files
> * overwrite 8 files in backgroup repeatedly to trigger GC
> * drop inode cache and stat these 128 empty files repeatedly

So far I have failed to reproduce it. Do you have a script?
Or even better, an xfstest?

-- 
Thanks,
//richard

* Re: [PATCH v3 2/2] ubifs: read node from wbuf when it fully sits in wbuf
  2020-03-16 22:33   ` Richard Weinberger
@ 2020-03-18  1:57     ` Hou Tao
  0 siblings, 0 replies; 6+ messages in thread
From: Hou Tao @ 2020-03-18  1:57 UTC
  To: Richard Weinberger
  Cc: Richard Weinberger, linux-mtd, Adrian Hunter, Carson.Li1

Hi,

On 2020/3/17 6:33, Richard Weinberger wrote:
> On Thu, Mar 5, 2020 at 10:15 AM Hou Tao <houtao1@huawei.com> wrote:
>> The actual situation of the problem is the LEB is GCed, freed and then
>> reused as journal head, and finally ubifs_tnc_locate() reads
>> an invalid node. And it can be reproduced by the following steps:
>> * create 128 empty files
>> * overwrite 8 files in backgroup repeatedly to trigger GC
>> * drop inode cache and stat these 128 empty files repeatedly
> 
> So far I failed to reproduce. Do you have a script?
> Or even better, a xfstest?
> 
You can increase the probability by adding an extra delay (e.g. msleep(1))
between the unlock of tnc_mutex and the call to ubifs_get_wbuf().
I will also try to write an xfstest for the problem.

Regards,
Tao