linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees
@ 2019-10-04  9:31 Qu Wenruo
  2019-10-04  9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-10-04  9:31 UTC (permalink / raw)
  To: linux-btrfs

There is a false alerts of tree-checker when running fstests/btrfs/063
in a loop.

The bug is caused by commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect
missing INODE_ITEM").
For the full error analyse, please check the first patch.

The first patch will give it a quick fix, so that it can be addressed in
v5.4 release cycle.

The 2nd patch is a more proper patch, with refactor to reduce duplicated
code and add the check to INODE_REF item.
But it's pretty large (+72, -41), not sure if it's suitbale for late
-rc.

Also current write-time tree checker error message is too silent, can't
be caught by fstests nor a quick glance of dmesg. And it doesn't contain
enough info to debug.

So to enhance the error message, and make it more noisy, the 3rd patch
will enhance the error message.

Qu Wenruo (3):
  btrfs: tree-checker: Fix false alerts on log trees
  btrfs: tree-checker: Refactor prev_key check for ino into a function
  btrfs: Enhance the error outputting for write time tree checker

 fs/btrfs/disk-io.c      |   2 +
 fs/btrfs/tree-checker.c | 111 ++++++++++++++++++++++++++--------------
 2 files changed, 74 insertions(+), 39 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees
  2019-10-04  9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo
@ 2019-10-04  9:31 ` Qu Wenruo
  2019-10-04 13:52   ` Nikolay Borisov
  2019-10-04 14:15   ` Filipe Manana
  2019-10-04  9:31 ` [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-10-04  9:31 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

[BUG]
When running btrfs/063 in a loop, we got the following random write time
tree checker error:

  BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307
  BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610
  BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176
          item 0 key (305 1 0) itemoff 16123 itemsize 160
                  inode generation 0 size 0 mode 40777
          item 1 key (305 12 257) itemoff 16111 itemsize 12
          item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
                  extent data disk bytenr 0 nr 0
                  extent data offset 0 nr 614400 ram 671744
          item 3 key (307 108 614400) itemoff 16005 itemsize 53
                  extent data disk bytenr 195342336 nr 57344
                  extent data offset 0 nr 53248 ram 57344
          item 4 key (307 108 667648) itemoff 15952 itemsize 53
                  extent data disk bytenr 194048000 nr 4096
                  extent data offset 0 nr 4096 ram 4096
	  [...]
  BTRFS error (device dm-4): block=33095680 write time tree block corruption detected
  BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction)
  BTRFS info (device dm-4): forced readonly
  BTRFS warning (device dm-4): Skipping commit of aborted transaction.
  BTRFS info (device dm-4): use zlib compression, level 3
  BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure

[CAUSE]
Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
should have previous key with the same objectid as ino.

But it's only true for fs trees. For log-tree, we can get above log tree
block where an EXTENT_DATA item has no previous key with the same ino.
As log tree only records modified items, it won't record unmodified
items like INODE_ITEM.

So this triggers write time tree check warning.

[FIX]
As a quick fix, check header owner to skip the previous key if it's not
fs tree (log tree doesn't count as fs tree).

This fix is only to be merged as a quick fix.
There will be a more comprehensive fix to refactor the common check into
one function.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tree-checker.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index b8f82d9be9f0..5e34cd5e3e2e 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf,
 	 * But if objectids mismatch, it means we have a missing
 	 * INODE_ITEM.
 	 */
-	if (slot > 0 && prev_key->objectid != key->objectid) {
+	if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
+	    prev_key->objectid != key->objectid) {
 		file_extent_err(leaf, slot,
 		"invalid previous key objectid, have %llu expect %llu",
 				prev_key->objectid, key->objectid);
@@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
 	u32 cur = 0;
 
 	/* Same check as in check_extent_data_item() */
-	if (slot > 0 && prev_key->objectid != key->objectid) {
+	if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
+	    prev_key->objectid != key->objectid) {
 		dir_item_err(leaf, slot,
 		"invalid previous key objectid, have %llu expect %llu",
 			     prev_key->objectid, key->objectid);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function
  2019-10-04  9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo
  2019-10-04  9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo
@ 2019-10-04  9:31 ` Qu Wenruo
  2019-10-04  9:31 ` [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker Qu Wenruo
  2019-10-07 16:46 ` [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees David Sterba
  3 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-10-04  9:31 UTC (permalink / raw)
  To: linux-btrfs

Refactor the check for prev_key->objectid of the following key types
into one function, check_prev_ino():
- EXTENT_DATA
- INODE_REF
- DIR_INDEX
- DIR_ITEM
- XATTR_ITEM

Despite the refactor, also add the check of prev_key for INODE_REF.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tree-checker.c | 113 +++++++++++++++++++++++++---------------
 1 file changed, 72 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 5e34cd5e3e2e..73678393340a 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -125,6 +125,74 @@ static u64 file_extent_end(struct extent_buffer *leaf,
 	return end;
 }
 
+/*
+ * Customized reported for dir_item, only important new info is key->objectid,
+ * which represents inode number
+ */
+__printf(3, 4)
+__cold
+static void dir_item_err(const struct extent_buffer *eb, int slot,
+			 const char *fmt, ...)
+{
+	const struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct btrfs_key key;
+	struct va_format vaf;
+	va_list args;
+
+	btrfs_item_key_to_cpu(eb, &key, slot);
+	va_start(args, fmt);
+
+	vaf.fmt = fmt;
+	vaf.va = &args;
+
+	btrfs_crit(fs_info,
+	"corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV",
+		btrfs_header_level(eb) == 0 ? "leaf" : "node",
+		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
+		key.objectid, &vaf);
+	va_end(args);
+}
+
+/*
+ * This functions checks prev_key->objectid, to ensure current key and prev_key
+ * shares the same objectid as ino.
+ *
+ * This is to detect missing INODE_ITEM in subvolume trees.
+ *
+ * Return true if everything is OK or we don't need to check.
+ * Return false if anything is wrong.
+ */
+static bool check_prev_ino(struct extent_buffer *leaf,
+			   struct btrfs_key *key, int slot,
+			   struct btrfs_key *prev_key)
+{
+	/* No prev key, skip check */
+	if (slot == 0)
+		return true;
+
+	/* Only these key->types needs to be checked */
+	ASSERT(key->type == BTRFS_XATTR_ITEM_KEY ||
+	       key->type == BTRFS_INODE_REF_KEY ||
+	       key->type == BTRFS_DIR_INDEX_KEY ||
+	       key->type == BTRFS_DIR_ITEM_KEY ||
+	       key->type == BTRFS_EXTENT_DATA_KEY);
+
+	/*
+	 * Only subvolume trees along with their reloc trees needs this check.
+	 * Things like log tree doesn't follow this ino requirement.
+	 */
+	if (!is_fstree(btrfs_header_owner(leaf)))
+		return true;
+
+	if (key->objectid == prev_key->objectid)
+		return true;
+
+	/* Error found */
+	dir_item_err(leaf, slot,
+		"invalid previous key objectid, have %llu expect %llu",
+		prev_key->objectid, key->objectid);
+	return false;
+}
 static int check_extent_data_item(struct extent_buffer *leaf,
 				  struct btrfs_key *key, int slot,
 				  struct btrfs_key *prev_key)
@@ -148,13 +216,8 @@ static int check_extent_data_item(struct extent_buffer *leaf,
 	 * But if objectids mismatch, it means we have a missing
 	 * INODE_ITEM.
 	 */
-	if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
-	    prev_key->objectid != key->objectid) {
-		file_extent_err(leaf, slot,
-		"invalid previous key objectid, have %llu expect %llu",
-				prev_key->objectid, key->objectid);
+	if (!check_prev_ino(leaf, key, slot, prev_key))
 		return -EUCLEAN;
-	}
 
 	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
 
@@ -285,34 +348,6 @@ static int check_csum_item(struct extent_buffer *leaf, struct btrfs_key *key,
 	return 0;
 }
 
-/*
- * Customized reported for dir_item, only important new info is key->objectid,
- * which represents inode number
- */
-__printf(3, 4)
-__cold
-static void dir_item_err(const struct extent_buffer *eb, int slot,
-			 const char *fmt, ...)
-{
-	const struct btrfs_fs_info *fs_info = eb->fs_info;
-	struct btrfs_key key;
-	struct va_format vaf;
-	va_list args;
-
-	btrfs_item_key_to_cpu(eb, &key, slot);
-	va_start(args, fmt);
-
-	vaf.fmt = fmt;
-	vaf.va = &args;
-
-	btrfs_crit(fs_info,
-	"corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV",
-		btrfs_header_level(eb) == 0 ? "leaf" : "node",
-		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
-		key.objectid, &vaf);
-	va_end(args);
-}
-
 static int check_dir_item(struct extent_buffer *leaf,
 			  struct btrfs_key *key, struct btrfs_key *prev_key,
 			  int slot)
@@ -322,14 +357,8 @@ static int check_dir_item(struct extent_buffer *leaf,
 	u32 item_size = btrfs_item_size_nr(leaf, slot);
 	u32 cur = 0;
 
-	/* Same check as in check_extent_data_item() */
-	if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
-	    prev_key->objectid != key->objectid) {
-		dir_item_err(leaf, slot,
-		"invalid previous key objectid, have %llu expect %llu",
-			     prev_key->objectid, key->objectid);
+	if (!check_prev_ino(leaf, key, slot, prev_key))
 		return -EUCLEAN;
-	}
 	di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
 	while (cur < item_size) {
 		u32 name_len;
@@ -1266,6 +1295,8 @@ static int check_inode_ref(struct extent_buffer *leaf,
 	unsigned long ptr;
 	unsigned long end;
 
+	if (!check_prev_ino(leaf, key, slot, prev_key))
+		return -EUCLEAN;
 	/* namelen can't be 0, so item_size == sizeof() is also invalid */
 	if (btrfs_item_size_nr(leaf, slot) <= sizeof(*iref)) {
 		inode_ref_err(fs_info, leaf, slot,
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker
  2019-10-04  9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo
  2019-10-04  9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo
  2019-10-04  9:31 ` [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function Qu Wenruo
@ 2019-10-04  9:31 ` Qu Wenruo
  2019-10-07 16:46 ` [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees David Sterba
  3 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2019-10-04  9:31 UTC (permalink / raw)
  To: linux-btrfs

Unlike read time tree checker error, write time error can't be inspected
by "btrfs ins dump-tree", so we need extra info to determine what's
going wrong.

The patch will add the following output for write time tree checker
error:
- The content of the offending tree block
  To help determining if it's a false alert.

- Kernel WARN_ON() for debug build
  This is helpful for us to detect unexpected write time tree checker
  error, especially fstests could catch the dmesg.
  Since the WARN_ON() is only triggered for write time tree checker,
  test cases utilizing dm-error won't trigger this WARN_ON(), thus no
  extra noise.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 16dc60b4966d..a0925b4e00af 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -545,9 +545,11 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
 		ret = btrfs_check_leaf_full(eb);
 
 	if (ret < 0) {
+		btrfs_print_tree(eb, 0);
 		btrfs_err(fs_info,
 		"block=%llu write time tree block corruption detected",
 			  eb->start);
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
 		return ret;
 	}
 	write_extent_buffer(eb, result, 0, csum_size);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees
  2019-10-04  9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo
@ 2019-10-04 13:52   ` Nikolay Borisov
  2019-10-04 14:13     ` Filipe Manana
  2019-10-04 14:15   ` Filipe Manana
  1 sibling, 1 reply; 10+ messages in thread
From: Nikolay Borisov @ 2019-10-04 13:52 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba, Filipe Manana



On 4.10.19 г. 12:31 ч., Qu Wenruo wrote:
> [BUG]
> When running btrfs/063 in a loop, we got the following random write time
> tree checker error:
> 
>   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307
>   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610
>   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176
>           item 0 key (305 1 0) itemoff 16123 itemsize 160
>                   inode generation 0 size 0 mode 40777
>           item 1 key (305 12 257) itemoff 16111 itemsize 12
>           item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
>                   extent data disk bytenr 0 nr 0
>                   extent data offset 0 nr 614400 ram 671744
>           item 3 key (307 108 614400) itemoff 16005 itemsize 53
>                   extent data disk bytenr 195342336 nr 57344
>                   extent data offset 0 nr 53248 ram 57344
>           item 4 key (307 108 667648) itemoff 15952 itemsize 53
>                   extent data disk bytenr 194048000 nr 4096
>                   extent data offset 0 nr 4096 ram 4096
> 	  [...]
>   BTRFS error (device dm-4): block=33095680 write time tree block corruption detected
>   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction)
>   BTRFS info (device dm-4): forced readonly
>   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
>   BTRFS info (device dm-4): use zlib compression, level 3
>   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure
> 
> [CAUSE]
> Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
> should have previous key with the same objectid as ino.
> 
> But it's only true for fs trees. For log-tree, we can get above log tree
> block where an EXTENT_DATA item has no previous key with the same ino.
> As log tree only records modified items, it won't record unmodified
> items like INODE_ITEM.
> 
> So this triggers write time tree check warning.
> 
> [FIX]
> As a quick fix, check header owner to skip the previous key if it's not
> fs tree (log tree doesn't count as fs tree).
> 
> This fix is only to be merged as a quick fix.
> There will be a more comprehensive fix to refactor the common check into
> one function.
> 
> Reported-by: David Sterba <dsterba@suse.com>
> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> Signed-off-by: Qu Wenruo <wqu@suse.com>


It's not entirely clear why this bug manifests. My tests show that when
we write extents we always update the inode's c/m time so it's always
dirtied hence it's logged. OTOH when punching a hole the same thing is
valid.

Filipe, under what conditions should it be possible to log an
EXTENT_DATA item without first logging the inode it belongs to? It seems
using the usual write paths (e.g. buffered write and punchole) that's
impossible?

> ---
>  fs/btrfs/tree-checker.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index b8f82d9be9f0..5e34cd5e3e2e 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf,
>  	 * But if objectids mismatch, it means we have a missing
>  	 * INODE_ITEM.
>  	 */
> -	if (slot > 0 && prev_key->objectid != key->objectid) {
> +	if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> +	    prev_key->objectid != key->objectid) {
>  		file_extent_err(leaf, slot,
>  		"invalid previous key objectid, have %llu expect %llu",
>  				prev_key->objectid, key->objectid);
> @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
>  	u32 cur = 0;
>  
>  	/* Same check as in check_extent_data_item() */
> -	if (slot > 0 && prev_key->objectid != key->objectid) {
> +	if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> +	    prev_key->objectid != key->objectid) {
>  		dir_item_err(leaf, slot,
>  		"invalid previous key objectid, have %llu expect %llu",
>  			     prev_key->objectid, key->objectid);
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees
  2019-10-04 13:52   ` Nikolay Borisov
@ 2019-10-04 14:13     ` Filipe Manana
  2019-10-04 14:19       ` Nikolay Borisov
  0 siblings, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2019-10-04 14:13 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Qu Wenruo, linux-btrfs, David Sterba

On Fri, Oct 4, 2019 at 2:54 PM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 4.10.19 г. 12:31 ч., Qu Wenruo wrote:
> > [BUG]
> > When running btrfs/063 in a loop, we got the following random write time
> > tree checker error:
> >
> >   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307
> >   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610
> >   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176
> >           item 0 key (305 1 0) itemoff 16123 itemsize 160
> >                   inode generation 0 size 0 mode 40777
> >           item 1 key (305 12 257) itemoff 16111 itemsize 12
> >           item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
> >                   extent data disk bytenr 0 nr 0
> >                   extent data offset 0 nr 614400 ram 671744
> >           item 3 key (307 108 614400) itemoff 16005 itemsize 53
> >                   extent data disk bytenr 195342336 nr 57344
> >                   extent data offset 0 nr 53248 ram 57344
> >           item 4 key (307 108 667648) itemoff 15952 itemsize 53
> >                   extent data disk bytenr 194048000 nr 4096
> >                   extent data offset 0 nr 4096 ram 4096
> >         [...]
> >   BTRFS error (device dm-4): block=33095680 write time tree block corruption detected
> >   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction)
> >   BTRFS info (device dm-4): forced readonly
> >   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
> >   BTRFS info (device dm-4): use zlib compression, level 3
> >   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure
> >
> > [CAUSE]
> > Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> > assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
> > should have previous key with the same objectid as ino.
> >
> > But it's only true for fs trees. For log-tree, we can get above log tree
> > block where an EXTENT_DATA item has no previous key with the same ino.
> > As log tree only records modified items, it won't record unmodified
> > items like INODE_ITEM.
> >
> > So this triggers write time tree check warning.
> >
> > [FIX]
> > As a quick fix, check header owner to skip the previous key if it's not
> > fs tree (log tree doesn't count as fs tree).
> >
> > This fix is only to be merged as a quick fix.
> > There will be a more comprehensive fix to refactor the common check into
> > one function.
> >
> > Reported-by: David Sterba <dsterba@suse.com>
> > Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
>
>
> It's not entirely clear why this bug manifests. My tests show that when
> we write extents we always update the inode's c/m time so it's always
> dirtied hence it's logged. OTOH when punching a hole the same thing is
> valid.
>
> Filipe, under what conditions should it be possible to log an
> EXTENT_DATA item without first logging the inode it belongs to? It seems
> using the usual write paths (e.g. buffered write and punchole) that's
> impossible?

The tests you did are pointless, none of those operations write to a
log tree, only fsync does that.

This change is perfectly fine. Logging (fsync) always logs the inode
item since commit [1] (2015),
however it might do so after logging extents and other items, and in
between that, if writeback for
the log tree leaf happens we get that error from the tree-checker.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4545de5b035c7debb73d260c78377dbb69cbfb5

>
> > ---
> >  fs/btrfs/tree-checker.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> > index b8f82d9be9f0..5e34cd5e3e2e 100644
> > --- a/fs/btrfs/tree-checker.c
> > +++ b/fs/btrfs/tree-checker.c
> > @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf,
> >        * But if objectids mismatch, it means we have a missing
> >        * INODE_ITEM.
> >        */
> > -     if (slot > 0 && prev_key->objectid != key->objectid) {
> > +     if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> > +         prev_key->objectid != key->objectid) {
> >               file_extent_err(leaf, slot,
> >               "invalid previous key objectid, have %llu expect %llu",
> >                               prev_key->objectid, key->objectid);
> > @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
> >       u32 cur = 0;
> >
> >       /* Same check as in check_extent_data_item() */
> > -     if (slot > 0 && prev_key->objectid != key->objectid) {
> > +     if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> > +         prev_key->objectid != key->objectid) {
> >               dir_item_err(leaf, slot,
> >               "invalid previous key objectid, have %llu expect %llu",
> >                            prev_key->objectid, key->objectid);
> >



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees
  2019-10-04  9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo
  2019-10-04 13:52   ` Nikolay Borisov
@ 2019-10-04 14:15   ` Filipe Manana
  2019-10-07 15:31     ` David Sterba
  1 sibling, 1 reply; 10+ messages in thread
From: Filipe Manana @ 2019-10-04 14:15 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, David Sterba

On Fri, Oct 4, 2019 at 11:27 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> When running btrfs/063 in a loop, we got the following random write time
> tree checker error:
>
>   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307
>   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610
>   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176
>           item 0 key (305 1 0) itemoff 16123 itemsize 160
>                   inode generation 0 size 0 mode 40777
>           item 1 key (305 12 257) itemoff 16111 itemsize 12
>           item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
>                   extent data disk bytenr 0 nr 0
>                   extent data offset 0 nr 614400 ram 671744
>           item 3 key (307 108 614400) itemoff 16005 itemsize 53
>                   extent data disk bytenr 195342336 nr 57344
>                   extent data offset 0 nr 53248 ram 57344
>           item 4 key (307 108 667648) itemoff 15952 itemsize 53
>                   extent data disk bytenr 194048000 nr 4096
>                   extent data offset 0 nr 4096 ram 4096
>           [...]
>   BTRFS error (device dm-4): block=33095680 write time tree block corruption detected
>   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction)
>   BTRFS info (device dm-4): forced readonly
>   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
>   BTRFS info (device dm-4): use zlib compression, level 3
>   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure
>
> [CAUSE]
> Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
> should have previous key with the same objectid as ino.
>
> But it's only true for fs trees. For log-tree, we can get above log tree
> block where an EXTENT_DATA item has no previous key with the same ino.
> As log tree only records modified items, it won't record unmodified
> items like INODE_ITEM.
>
> So this triggers write time tree check warning.
>
> [FIX]
> As a quick fix, check header owner to skip the previous key if it's not
> fs tree (log tree doesn't count as fs tree).
>
> This fix is only to be merged as a quick fix.
> There will be a more comprehensive fix to refactor the common check into
> one function.
>
> Reported-by: David Sterba <dsterba@suse.com>
> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")

So this is bogus, since that commit is not in Linus' tree, and once it
gets there its ID changes.
More likely, this will get squashed into that commit in misc-next
since we are still far from the 5.5 merge window.

> Signed-off-by: Qu Wenruo <wqu@suse.com>

Anyway, the change looks fine to me.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
>  fs/btrfs/tree-checker.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index b8f82d9be9f0..5e34cd5e3e2e 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf,
>          * But if objectids mismatch, it means we have a missing
>          * INODE_ITEM.
>          */
> -       if (slot > 0 && prev_key->objectid != key->objectid) {
> +       if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> +           prev_key->objectid != key->objectid) {
>                 file_extent_err(leaf, slot,
>                 "invalid previous key objectid, have %llu expect %llu",
>                                 prev_key->objectid, key->objectid);
> @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
>         u32 cur = 0;
>
>         /* Same check as in check_extent_data_item() */
> -       if (slot > 0 && prev_key->objectid != key->objectid) {
> +       if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
> +           prev_key->objectid != key->objectid) {
>                 dir_item_err(leaf, slot,
>                 "invalid previous key objectid, have %llu expect %llu",
>                              prev_key->objectid, key->objectid);
> --
> 2.23.0
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees
  2019-10-04 14:13     ` Filipe Manana
@ 2019-10-04 14:19       ` Nikolay Borisov
  0 siblings, 0 replies; 10+ messages in thread
From: Nikolay Borisov @ 2019-10-04 14:19 UTC (permalink / raw)
  To: fdmanana; +Cc: David Sterba, WenRuo Qu, linux-btrfs



On 4.10.19 г. 17:13 ч., Filipe Manana wrote:
> On Fri, Oct 4, 2019 at 2:54 PM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>>
>>
>> On 4.10.19 г. 12:31 ч., Qu Wenruo wrote:
>>> [BUG]
>>> When running btrfs/063 in a loop, we got the following random write time
>>> tree checker error:
>>>
>>>   BTRFS critical (device dm-4): corrupt leaf: root=18446744073709551610 block=33095680 slot=2 ino=307 file_offset=0, invalid previous key objectid, have 305 expect 307
>>>   BTRFS info (device dm-4): leaf 33095680 gen 7 total ptrs 47 free space 12146 owner 18446744073709551610
>>>   BTRFS info (device dm-4): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 26176
>>>           item 0 key (305 1 0) itemoff 16123 itemsize 160
>>>                   inode generation 0 size 0 mode 40777
>>>           item 1 key (305 12 257) itemoff 16111 itemsize 12
>>>           item 2 key (307 108 0) itemoff 16058 itemsize 53 <<<
>>>                   extent data disk bytenr 0 nr 0
>>>                   extent data offset 0 nr 614400 ram 671744
>>>           item 3 key (307 108 614400) itemoff 16005 itemsize 53
>>>                   extent data disk bytenr 195342336 nr 57344
>>>                   extent data offset 0 nr 53248 ram 57344
>>>           item 4 key (307 108 667648) itemoff 15952 itemsize 53
>>>                   extent data disk bytenr 194048000 nr 4096
>>>                   extent data offset 0 nr 4096 ram 4096
>>>         [...]
>>>   BTRFS error (device dm-4): block=33095680 write time tree block corruption detected
>>>   BTRFS: error (device dm-4) in btrfs_commit_transaction:2332: errno=-5 IO failure (Error while writing out transaction)
>>>   BTRFS info (device dm-4): forced readonly
>>>   BTRFS warning (device dm-4): Skipping commit of aborted transaction.
>>>   BTRFS info (device dm-4): use zlib compression, level 3
>>>   BTRFS: error (device dm-4) in cleanup_transaction:1890: errno=-5 IO failure
>>>
>>> [CAUSE]
>>> Commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
>>> assumes all XATTR_ITEM/DIR_INDEX/DIR_ITEM/INODE_REF/EXTENT_DATA items
>>> should have previous key with the same objectid as ino.
>>>
>>> But it's only true for fs trees. For log-tree, we can get above log tree
>>> block where an EXTENT_DATA item has no previous key with the same ino.
>>> As log tree only records modified items, it won't record unmodified
>>> items like INODE_ITEM.
>>>
>>> So this triggers write time tree check warning.
>>>
>>> [FIX]
>>> As a quick fix, check header owner to skip the previous key if it's not
>>> fs tree (log tree doesn't count as fs tree).
>>>
>>> This fix is only to be merged as a quick fix.
>>> There will be a more comprehensive fix to refactor the common check into
>>> one function.
>>>
>>> Reported-by: David Sterba <dsterba@suse.com>
>>> Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>
>>
>> It's not entirely clear why this bug manifests. My tests show that when
>> we write extents we always update the inode's c/m time so it's always
>> dirtied hence it's logged. OTOH when punching a hole the same thing is
>> valid.
>>
>> Filipe, under what conditions should it be possible to log an
>> EXTENT_DATA item without first logging the inode it belongs to? It seems
>> using the usual write paths (e.g. buffered write and punchole) that's
>> impossible?
> 
> The tests you did are pointless, none of those operations write to a
> log tree, only fsync does that.

You were quick to judge, I tried:
xfs_io -f -c "fpunch 1m 4k" -c "fsync" /media/foo (foo was a 4m, fully
sycned file)

Similar command with the just writing in the middle of the file i.e not
changing isize.

> 
> This change is perfectly fine. Logging (fsync) always logs the inode
> item since commit [1] (2015),
> however it might do so after logging extents and other items, and in
> between that, if writeback for
> the log tree leaf happens we get that error from the tree-checker.

Fair enough, however that clarification about the sequence of events
should be in the changelog.

> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4545de5b035c7debb73d260c78377dbb69cbfb5
> 
>>
>>> ---
>>>  fs/btrfs/tree-checker.c | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>>> index b8f82d9be9f0..5e34cd5e3e2e 100644
>>> --- a/fs/btrfs/tree-checker.c
>>> +++ b/fs/btrfs/tree-checker.c
>>> @@ -148,7 +148,8 @@ static int check_extent_data_item(struct extent_buffer *leaf,
>>>        * But if objectids mismatch, it means we have a missing
>>>        * INODE_ITEM.
>>>        */
>>> -     if (slot > 0 && prev_key->objectid != key->objectid) {
>>> +     if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
>>> +         prev_key->objectid != key->objectid) {
>>>               file_extent_err(leaf, slot,
>>>               "invalid previous key objectid, have %llu expect %llu",
>>>                               prev_key->objectid, key->objectid);
>>> @@ -322,7 +323,8 @@ static int check_dir_item(struct extent_buffer *leaf,
>>>       u32 cur = 0;
>>>
>>>       /* Same check as in check_extent_data_item() */
>>> -     if (slot > 0 && prev_key->objectid != key->objectid) {
>>> +     if (slot > 0 && is_fstree(btrfs_header_owner(leaf)) &&
>>> +         prev_key->objectid != key->objectid) {
>>>               dir_item_err(leaf, slot,
>>>               "invalid previous key objectid, have %llu expect %llu",
>>>                            prev_key->objectid, key->objectid);
>>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] btrfs: tree-checker: Fix false alerts on log trees
  2019-10-04 14:15   ` Filipe Manana
@ 2019-10-07 15:31     ` David Sterba
  0 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2019-10-07 15:31 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Qu Wenruo, linux-btrfs, David Sterba

On Fri, Oct 04, 2019 at 03:15:51PM +0100, Filipe Manana wrote:
> On Fri, Oct 4, 2019 at 11:27 AM Qu Wenruo <wqu@suse.com> wrote:
> > Reported-by: David Sterba <dsterba@suse.com>
> > Fixes: 59b0d030fb30 ("btrfs: tree-checker: Try to detect missing INODE_ITEM")
> 
> So this is bogus, since that commit is not in Linus' tree, and once it
> gets there its ID changes.
> More likely, this will get squashed into that commit in misc-next
> since we are still far from the 5.5 merge window.

You're right, squashing it in is preferred in this case. Split fixes
have bitten us in the past so if we can afford to rebase the devel
queue a single complete patch is preferred.

> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Anyway, the change looks fine to me.
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks, I can add rev-by to "btrfs: tree-checker: Try to detect missing
INODE_ITEM" as well if you want.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees
  2019-10-04  9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-10-04  9:31 ` [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker Qu Wenruo
@ 2019-10-07 16:46 ` David Sterba
  3 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2019-10-07 16:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Oct 04, 2019 at 05:31:30PM +0800, Qu Wenruo wrote:
> There is a false alerts of tree-checker when running fstests/btrfs/063
> in a loop.
> 
> The bug is caused by commit 59b0d030fb30 ("btrfs: tree-checker: Try to detect
> missing INODE_ITEM").
> For the full error analyse, please check the first patch.
> 
> The first patch will give it a quick fix, so that it can be addressed in
> v5.4 release cycle.
> 
> The 2nd patch is a more proper patch, with refactor to reduce duplicated
> code and add the check to INODE_REF item.
> But it's pretty large (+72, -41), not sure if it's suitbale for late
> -rc.
> 
> Also current write-time tree checker error message is too silent, can't
> be caught by fstests nor a quick glance of dmesg. And it doesn't contain
> enough info to debug.
> 
> So to enhance the error message, and make it more noisy, the 3rd patch
> will enhance the error message.
> 
> Qu Wenruo (3):
>   btrfs: tree-checker: Fix false alerts on log trees
>   btrfs: tree-checker: Refactor prev_key check for ino into a function
>   btrfs: Enhance the error outputting for write time tree checker

Patch 1 folded to the original patch and 2 and 2 now in misc-next,
thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-10-07 16:46 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-04  9:31 [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees Qu Wenruo
2019-10-04  9:31 ` [PATCH 1/3] btrfs: tree-checker: Fix false alerts on " Qu Wenruo
2019-10-04 13:52   ` Nikolay Borisov
2019-10-04 14:13     ` Filipe Manana
2019-10-04 14:19       ` Nikolay Borisov
2019-10-04 14:15   ` Filipe Manana
2019-10-07 15:31     ` David Sterba
2019-10-04  9:31 ` [PATCH 2/3] btrfs: tree-checker: Refactor prev_key check for ino into a function Qu Wenruo
2019-10-04  9:31 ` [PATCH 3/3] btrfs: Enhance the error outputting for write time tree checker Qu Wenruo
2019-10-07 16:46 ` [PATCH 0/3] btrfs: tree-checker: False alerts fixes for log trees David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).