Linux-BTRFS Archive on lore.kernel.org
* [PATCH v3 0/5] btrfs: Enhancement to tree block validation
@ 2019-01-18  2:19 Qu Wenruo
  2019-01-18  2:19 ` [PATCH v3 1/5] btrfs: Always output error message when key/level verification fails Qu Wenruo
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-01-18  2:19 UTC (permalink / raw)
  To: linux-btrfs

Patchset can be fetched from github:
https://github.com/adam900710/linux/tree/write_time_tree_checker
Which is based on v5.0-rc1 tag.

This patchset has the following two features:
- Tree block validation output enhancement
  * Output validation failure timing (write time or read time)
  * Always output tree block level/key mismatch error message
    This part is already submitted and reviewed.

- Write time tree block validation check
  To catch memory corruption either from hardware or kernel.
  Example output would be:

    BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
    BTRFS error (device dm-3): write time tree block corruption detected
    BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
    BTRFS error (device dm-3): write time tree block corruption detected
    BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
    BTRFS info (device dm-3): forced readonly
    BTRFS warning (device dm-3): Skipping commit of aborted transaction.
    BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
    BTRFS info (device dm-3): delayed_refs has NO entry

Changelog:
v2:
- Unlock locked pages in lock_extent_buffer_for_io() for error handling.
- Added Reviewed-by tags.

v3:
- Remove duplicated error message.
- Use IS_ENABLED() macro to replace #ifdef.
- Added Reviewed-by tags.

Qu Wenruo (5):
  btrfs: Always output error message when key/level verification fails
  btrfs: extent_io: Kill the forward declaration of flush_write_bio()
  btrfs: extent_io: Kill the BUG_ON() in flush_write_bio()
  btrfs: disk-io: Show the timing of corrupted tree block explicitly
  btrfs: Do mandatory tree block check before submitting bio

 fs/btrfs/disk-io.c   |  21 +++++--
 fs/btrfs/extent_io.c | 135 +++++++++++++++++++++++++++----------------
 2 files changed, 101 insertions(+), 55 deletions(-)

-- 
2.20.1


* [PATCH v3 1/5] btrfs: Always output error message when key/level verification fails
  2019-01-18  2:19 [PATCH v3 0/5] btrfs: Enhancement to tree block validation Qu Wenruo
@ 2019-01-18  2:19 ` Qu Wenruo
  2019-01-18  7:38   ` Johannes Thumshirn
  2019-01-18  2:19 ` [PATCH v3 2/5] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-01-18  2:19 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

We have an internal report of a strange transaction abort due to EUCLEAN
without any error message.

Since the error message inside verify_level_key() is only enabled for
CONFIG_BTRFS_DEBUG, it is never printed on most distro kernels.

This patch makes the error message mandatory, so when the problem
happens we know what is causing it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
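
The behaviour change can be modelled in userspace. The sketch below is
not the kernel code: MODEL_DEBUG stands in for CONFIG_BTRFS_DEBUG, and
model_warn(), model_err() and model_verify_level() are hypothetical
names used only for illustration. It shows the intended split: the
error line is emitted unconditionally, while the noisier warning stays
tied to the debug option.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_BTRFS_DEBUG; 0 models a distro build. */
#define MODEL_DEBUG 0

static int warnings_emitted;	/* number of debug-only warnings printed */
static int errors_emitted;	/* number of mandatory error lines printed */

/* Model of a conditional warning: prints only when cond is non-zero. */
static void model_warn(int cond, const char *msg)
{
	if (cond) {
		warnings_emitted++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

/* Model of the mandatory error message: always prints. */
static void model_err(const char *msg)
{
	errors_emitted++;
	fprintf(stderr, "error: %s\n", msg);
}

/* Returns 0 when the levels match, -EIO otherwise. */
static int model_verify_level(int found_level, int expected_level)
{
	assert(expected_level >= 0);

	if (found_level != expected_level) {
		model_warn(MODEL_DEBUG, "tree level check failed");
		model_err("tree level mismatch detected");
		return -EIO;
	}
	return 0;
}
```

With MODEL_DEBUG set to 0, a mismatch still produces the error line but
no warning; setting it to 1 adds the warning on top.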

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8da2f380d3c0..794d5bb7fe33 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -423,12 +423,11 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 
 	found_level = btrfs_header_level(eb);
 	if (found_level != level) {
-#ifdef CONFIG_BTRFS_DEBUG
-		WARN_ON(1);
+		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+		     KERN_ERR "BTRFS: tree level check failed\n");
 		btrfs_err(fs_info,
 "tree level mismatch detected, bytenr=%llu level expected=%u has=%u",
 			  eb->start, level, found_level);
-#endif
 		return -EIO;
 	}
 
@@ -449,9 +448,9 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 		btrfs_item_key_to_cpu(eb, &found_key, 0);
 	ret = btrfs_comp_cpu_keys(first_key, &found_key);
 
-#ifdef CONFIG_BTRFS_DEBUG
 	if (ret) {
-		WARN_ON(1);
+		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+		     KERN_ERR "BTRFS: tree first key check failed\n");
 		btrfs_err(fs_info,
 "tree first key mismatch detected, bytenr=%llu parent_transid=%llu key expected=(%llu,%u,%llu) has=(%llu,%u,%llu)",
 			  eb->start, parent_transid, first_key->objectid,
@@ -459,7 +458,6 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 			  found_key.objectid, found_key.type,
 			  found_key.offset);
 	}
-#endif
 	return ret;
 }
 
-- 
2.20.1


* [PATCH v3 2/5] btrfs: extent_io: Kill the forward declaration of flush_write_bio()
  2019-01-18  2:19 [PATCH v3 0/5] btrfs: Enhancement to tree block validation Qu Wenruo
  2019-01-18  2:19 ` [PATCH v3 1/5] btrfs: Always output error message when key/level verification fails Qu Wenruo
@ 2019-01-18  2:19 ` Qu Wenruo
  2019-01-18  2:19 ` [PATCH v3 3/5] btrfs: extent_io: Kill the BUG_ON() in flush_write_bio() Qu Wenruo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-01-18  2:19 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, Johannes Thumshirn

There is no need to forward declare flush_write_bio(), as it only
depends on submit_one_bio().

Both functions are small, so just move them above their first user to
remove the forward declaration.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/extent_io.c | 66 +++++++++++++++++++++-----------------------
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 52abe4082680..8a2335713a2d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -147,7 +147,38 @@ static int add_extent_changeset(struct extent_state *state, unsigned bits,
 	return ret;
 }
 
-static void flush_write_bio(struct extent_page_data *epd);
+static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
+				       unsigned long bio_flags)
+{
+	blk_status_t ret = 0;
+	struct bio_vec *bvec = bio_last_bvec_all(bio);
+	struct page *page = bvec->bv_page;
+	struct extent_io_tree *tree = bio->bi_private;
+	u64 start;
+
+	start = page_offset(page) + bvec->bv_offset;
+
+	bio->bi_private = NULL;
+
+	if (tree->ops)
+		ret = tree->ops->submit_bio_hook(tree->private_data, bio,
+					   mirror_num, bio_flags, start);
+	else
+		btrfsic_submit_bio(bio);
+
+	return blk_status_to_errno(ret);
+}
+
+static void flush_write_bio(struct extent_page_data *epd)
+{
+	if (epd->bio) {
+		int ret;
+
+		ret = submit_one_bio(epd->bio, 0, 0);
+		BUG_ON(ret < 0); /* -ENOMEM */
+		epd->bio = NULL;
+	}
+}
 
 int __init extent_io_init(void)
 {
@@ -2692,28 +2723,6 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
 	return bio;
 }
 
-static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
-				       unsigned long bio_flags)
-{
-	blk_status_t ret = 0;
-	struct bio_vec *bvec = bio_last_bvec_all(bio);
-	struct page *page = bvec->bv_page;
-	struct extent_io_tree *tree = bio->bi_private;
-	u64 start;
-
-	start = page_offset(page) + bvec->bv_offset;
-
-	bio->bi_private = NULL;
-
-	if (tree->ops)
-		ret = tree->ops->submit_bio_hook(tree->private_data, bio,
-					   mirror_num, bio_flags, start);
-	else
-		btrfsic_submit_bio(bio);
-
-	return blk_status_to_errno(ret);
-}
-
 /*
  * @opf:	bio REQ_OP_* and REQ_* flags as one value
  * @tree:	tree so we can call our merge_bio hook
@@ -4007,17 +4016,6 @@ static int extent_write_cache_pages(struct address_space *mapping,
 	return ret;
 }
 
-static void flush_write_bio(struct extent_page_data *epd)
-{
-	if (epd->bio) {
-		int ret;
-
-		ret = submit_one_bio(epd->bio, 0, 0);
-		BUG_ON(ret < 0); /* -ENOMEM */
-		epd->bio = NULL;
-	}
-}
-
 int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 {
 	int ret;
-- 
2.20.1


* [PATCH v3 3/5] btrfs: extent_io: Kill the BUG_ON() in flush_write_bio()
  2019-01-18  2:19 [PATCH v3 0/5] btrfs: Enhancement to tree block validation Qu Wenruo
  2019-01-18  2:19 ` [PATCH v3 1/5] btrfs: Always output error message when key/level verification fails Qu Wenruo
  2019-01-18  2:19 ` [PATCH v3 2/5] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
@ 2019-01-18  2:19 ` Qu Wenruo
  2019-01-22 17:38   ` David Sterba
  2019-01-18  2:19 ` [PATCH v3 4/5] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-01-18  2:19 UTC (permalink / raw)
  To: linux-btrfs

This BUG_ON() is really just a crappy way to work around the
__must_check attribute of submit_one_bio().

Now kill the BUG_ON() and allow flush_write_bio() to return an error
number.

Also add the __must_check attribute to flush_write_bio(), and modify all
callers to handle the possible error returned.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 77 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 58 insertions(+), 19 deletions(-)
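
The caller-side pattern used at the end of the writeback functions can
be sketched in userspace as follows. This is a simplified model, not
the kernel code: struct model_epd, model_flush_write_bio() and
model_write_and_flush() are hypothetical names, and the "bio" is
reduced to a flag plus the result its submission would return. The
point is that the final flush always runs, and the earliest error is
the one reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct extent_page_data. */
struct model_epd {
	int has_bio;		/* non-zero while a bio is still pending */
	int submit_result;	/* what submitting the pending bio returns */
};

/* Submit the pending bio, if any, and pass its result to the caller. */
static int model_flush_write_bio(struct model_epd *epd)
{
	int ret = 0;

	assert(epd != NULL);

	if (epd->has_bio) {
		ret = epd->submit_result;
		epd->has_bio = 0;
	}
	return ret;
}

/*
 * Tail of a writeback function: flush unconditionally so no bio is
 * left behind, then report the write error if there was one, and the
 * flush error otherwise.
 */
static int model_write_and_flush(int write_ret, struct model_epd *epd)
{
	int flush_ret = model_flush_write_bio(epd);

	if (write_ret)
		return write_ret;
	return flush_ret;
}
```

Keeping a separate flush_ret avoids overwriting an earlier failure with
a later success, which a plain "ret = flush_write_bio()" would do.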

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8a2335713a2d..a60f3ec22053 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -169,15 +169,15 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 	return blk_status_to_errno(ret);
 }
 
-static void flush_write_bio(struct extent_page_data *epd)
+static int __must_check flush_write_bio(struct extent_page_data *epd)
 {
-	if (epd->bio) {
-		int ret;
+	int ret = 0;
 
+	if (epd->bio) {
 		ret = submit_one_bio(epd->bio, 0, 0);
-		BUG_ON(ret < 0); /* -ENOMEM */
 		epd->bio = NULL;
 	}
+	return ret;
 }
 
 int __init extent_io_init(void)
@@ -3504,13 +3504,15 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 			  struct btrfs_fs_info *fs_info,
 			  struct extent_page_data *epd)
 {
-	int i, num_pages;
+	int i, num_pages, failed_page_nr;
 	int flush = 0;
 	int ret = 0;
 
 	if (!btrfs_try_tree_write_lock(eb)) {
+		ret = flush_write_bio(epd);
+		if (ret < 0)
+			return ret;
 		flush = 1;
-		flush_write_bio(epd);
 		btrfs_tree_lock(eb);
 	}
 
@@ -3519,7 +3521,9 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 		if (!epd->sync_io)
 			return 0;
 		if (!flush) {
-			flush_write_bio(epd);
+			ret = flush_write_bio(epd);
+			if (ret < 0)
+				return ret;
 			flush = 1;
 		}
 		while (1) {
@@ -3560,7 +3564,11 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 
 		if (!trylock_page(p)) {
 			if (!flush) {
-				flush_write_bio(epd);
+				ret = flush_write_bio(epd);
+				if (ret < 0) {
+					failed_page_nr = i;
+					goto err_unlock;
+				}
 				flush = 1;
 			}
 			lock_page(p);
@@ -3568,6 +3576,15 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 	}
 
 	return ret;
+
+err_unlock:
+	/* Unlock these already locked pages */
+	for (i = 0; i < failed_page_nr; i++) {
+		struct page *p = eb->pages[i];
+
+		unlock_page(p);
+	}
+	return ret;
 }
 
 static void end_extent_buffer_writeback(struct extent_buffer *eb)
@@ -3751,6 +3768,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 		.sync_io = wbc->sync_mode == WB_SYNC_ALL,
 	};
 	int ret = 0;
+	int flush_ret;
 	int done = 0;
 	int nr_to_write_done = 0;
 	struct pagevec pvec;
@@ -3818,6 +3836,11 @@ int btree_write_cache_pages(struct address_space *mapping,
 
 			prev_eb = eb;
 			ret = lock_extent_buffer_for_io(eb, fs_info, &epd);
+			if (ret < 0) {
+				free_extent_buffer(eb);
+				done = 1;
+				break;
+			}
 			if (!ret) {
 				free_extent_buffer(eb);
 				continue;
@@ -3850,8 +3873,10 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	flush_write_bio(&epd);
-	return ret;
+	flush_ret = flush_write_bio(&epd);
+	if (ret)
+		return ret;
+	return flush_ret;
 }
 
 /**
@@ -3947,7 +3972,9 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			 * tmpfs file mapping
 			 */
 			if (!trylock_page(page)) {
-				flush_write_bio(epd);
+				ret = flush_write_bio(epd);
+				if (ret < 0)
+					break;
 				lock_page(page);
 			}
 
@@ -3957,8 +3984,11 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			}
 
 			if (wbc->sync_mode != WB_SYNC_NONE) {
-				if (PageWriteback(page))
-					flush_write_bio(epd);
+				if (PageWriteback(page)) {
+					ret = flush_write_bio(epd);
+					if (ret < 0)
+						break;
+				}
 				wait_on_page_writeback(page);
 			}
 
@@ -4019,6 +4049,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 {
 	int ret;
+	int flush_ret;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = &BTRFS_I(page->mapping->host)->io_tree,
@@ -4028,14 +4059,17 @@ int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 
 	ret = __extent_writepage(page, wbc, &epd);
 
-	flush_write_bio(&epd);
-	return ret;
+	flush_ret = flush_write_bio(&epd);
+	if (ret)
+		return ret;
+	return flush_ret;
 }
 
 int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 			      int mode)
 {
 	int ret = 0;
+	int flush_ret;
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct page *page;
@@ -4068,14 +4102,17 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 		start += PAGE_SIZE;
 	}
 
-	flush_write_bio(&epd);
-	return ret;
+	flush_ret = flush_write_bio(&epd);
+	if (ret)
+		return ret;
+	return flush_ret;
 }
 
 int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc)
 {
 	int ret = 0;
+	int flush_ret;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = &BTRFS_I(mapping->host)->io_tree,
@@ -4084,8 +4121,10 @@ int extent_writepages(struct address_space *mapping,
 	};
 
 	ret = extent_write_cache_pages(mapping, wbc, &epd);
-	flush_write_bio(&epd);
-	return ret;
+	flush_ret = flush_write_bio(&epd);
+	if (ret)
+		return ret;
+	return flush_ret;
 }
 
 int extent_readpages(struct address_space *mapping, struct list_head *pages,
-- 
2.20.1


* [PATCH v3 4/5] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-18  2:19 [PATCH v3 0/5] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-01-18  2:19 ` [PATCH v3 3/5] btrfs: extent_io: Kill the BUG_ON() in flush_write_bio() Qu Wenruo
@ 2019-01-18  2:19 ` Qu Wenruo
  2019-01-18  7:39   ` Johannes Thumshirn
  2019-01-18  2:19 ` [PATCH v3 5/5] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-01-18  2:19 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Just add one extra line to show when the corruption is detected.
Currently only read time detection is possible.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 794d5bb7fe33..426e9f450f70 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -658,6 +658,8 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 
 	if (!ret)
 		set_extent_buffer_uptodate(eb);
+	else
+		btrfs_err(fs_info, "read time tree block corruption detected");
 err:
 	if (reads_done &&
 	    test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
-- 
2.20.1


* [PATCH v3 5/5] btrfs: Do mandatory tree block check before submitting bio
  2019-01-18  2:19 [PATCH v3 0/5] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (3 preceding siblings ...)
  2019-01-18  2:19 ` [PATCH v3 4/5] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
@ 2019-01-18  2:19 ` Qu Wenruo
  2019-01-18  7:48   ` Johannes Thumshirn
  2019-01-22 17:47 ` [PATCH v3 0/5] btrfs: Enhancement to tree block validation David Sterba
  2019-01-23 17:16 ` David Sterba
  6 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-01-18  2:19 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Leonard Lausen

There are at least 2 reports of memory bit flips sneaking into on-disk
data.

Currently we only have a relaxed check triggered at
btrfs_mark_buffer_dirty() time, and it is not mandatory: it is only
built in when CONFIG_BTRFS_FS_CHECK_INTEGRITY is enabled.

This patch addresses the hole by triggering a comprehensive check on
tree blocks before writing them back to disk.

The check is placed in csum_tree_block() where @verify == 0.
At that point we are generating the csum for tree blocks before
submitting the metadata bio, so we avoid all the unnecessary calls at
btrfs_mark_buffer_dirty() time but still catch enough errors.

The example error output will be something like:
  BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
  BTRFS error (device dm-3): write time tree block corruption detected
  BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
  BTRFS error (device dm-3): write time tree block corruption detected
  BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
  BTRFS info (device dm-3): forced readonly
  BTRFS warning (device dm-3): Skipping commit of aborted transaction.
  BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
  BTRFS info (device dm-3): delayed_refs has NO entry

Reported-by: Leonard Lausen <leonard@lausen.nl>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 9 +++++++++
 1 file changed, 9 insertions(+)
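
The idea of validating before checksumming can be sketched in
userspace. Everything below is a simplified model with hypothetical
names (struct model_key, model_check_leaf(), model_csum(),
model_prepare_write()): the real checks cover much more than key
order, the toy checksum is not the one btrfs uses, and returning
-EUCLEAN from the model is an assumption. It only shows the ordering:
a block that fails validation is rejected before a checksum is ever
generated for it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, for platforms that lack it */
#endif

/* Simplified key, compared as (objectid, type, offset). */
struct model_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Returns <0, 0 or >0 when a sorts before, equal to or after b. */
static int model_comp_keys(const struct model_key *a,
			   const struct model_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Keys in a leaf must be strictly ascending. */
static int model_check_leaf(const struct model_key *keys, size_t nritems)
{
	size_t i;

	assert(keys != NULL || nritems == 0);

	for (i = 1; i < nritems; i++) {
		if (model_comp_keys(&keys[i - 1], &keys[i]) >= 0)
			return -EUCLEAN;
	}
	return 0;
}

/* Toy checksum, only here so there is something to "generate". */
static uint32_t model_csum(const struct model_key *keys, size_t nritems)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < nritems; i++)
		sum = sum * 31 +
		      (uint32_t)(keys[i].objectid ^ keys[i].offset) +
		      keys[i].type;
	return sum;
}

/* Write path: validate first, checksum only if the block is sane. */
static int model_prepare_write(const struct model_key *keys, size_t nritems,
			       uint32_t *csum_out)
{
	int err = model_check_leaf(keys, nritems);

	if (err < 0)
		return err;	/* csum_out is left untouched */
	*csum_out = model_csum(keys, nritems);
	return 0;
}
```

Feeding it the two keys from the example output, (10510212874240 169 0)
followed by (1714119868416 169 0), makes model_prepare_write() fail
because the first objectid is larger than the second.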

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 426e9f450f70..68d75a3b15c5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -313,6 +313,15 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
 			return -EUCLEAN;
 		}
 	} else {
+		if (btrfs_header_level(buf))
+			err = btrfs_check_node(fs_info, buf);
+		else
+			err = btrfs_check_leaf_full(fs_info, buf);
+		if (err < 0) {
+			btrfs_err(fs_info,
+				  "write time tree block corruption detected");
+			return err;
+		}
 		write_extent_buffer(buf, result, 0, csum_size);
 	}
 
-- 
2.20.1


* Re: [PATCH v3 1/5] btrfs: Always output error message when key/level verification fails
  2019-01-18  2:19 ` [PATCH v3 1/5] btrfs: Always output error message when key/level verification fails Qu Wenruo
@ 2019-01-18  7:38   ` Johannes Thumshirn
  0 siblings, 0 replies; 16+ messages in thread
From: Johannes Thumshirn @ 2019-01-18  7:38 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: [PATCH v3 4/5] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-18  2:19 ` [PATCH v3 4/5] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
@ 2019-01-18  7:39   ` Johannes Thumshirn
  0 siblings, 0 replies; 16+ messages in thread
From: Johannes Thumshirn @ 2019-01-18  7:39 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

* Re: [PATCH v3 5/5] btrfs: Do mandatory tree block check before submitting bio
  2019-01-18  2:19 ` [PATCH v3 5/5] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
@ 2019-01-18  7:48   ` Johannes Thumshirn
  0 siblings, 0 replies; 16+ messages in thread
From: Johannes Thumshirn @ 2019-01-18  7:48 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Leonard Lausen

Looks good,
Johannes Thumshirn <jthumshirn@suse.de>

* Re: [PATCH v3 3/5] btrfs: extent_io: Kill the BUG_ON() in flush_write_bio()
  2019-01-18  2:19 ` [PATCH v3 3/5] btrfs: extent_io: Kill the BUG_ON() in flush_write_bio() Qu Wenruo
@ 2019-01-22 17:38   ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2019-01-22 17:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 18, 2019 at 10:19:54AM +0800, Qu Wenruo wrote:
> This BUG_ON() is really just a crappy way to work around the
> __must_check attribute of submit_one_bio().
> 
> Now kill the BUG_ON() and allow flush_write_bio() to return an error
> number.
> 
> Also add the __must_check attribute to flush_write_bio(), and modify all
> callers to handle the possible error returned.

Can you please split that into several steps:

1. handle errors in submit_one_bio or pass it to the callers
   ie. drop the BUG_ON and move it to all callers

2. in all callers do
   ret = flush_write_bio(...)
   BUG_ON(ret)

So now it's one level up in the call chain and up to all callers to
handle the errors properly. The code is equivalent to the previous
state, though there are more BUG_ONs.

3. one patch per function that handles errors of flush_write_bio, ie.
   actual replacement of BUG_ON with if (ret < 0) etc

As there are several different functions, each has its own things to
clean up, and it's easier to review them one by one. Sometimes it's
necessary to check more callers, and keeping multiple contexts in mind
at once does not work very well.

Counting all the affected functions:

lock_extent_buffer_for_io(struct extent_buffer *eb,
btree_write_cache_pages(struct address_space *mapping,
extent_write_cache_pages(struct address_space *mapping,
extent_write_full_page(struct page *page, struct writeback_control *wbc)
extent_write_locked_range(struct inode *inode, u64 start, u64 end,
extent_writepages(struct address_space *mapping,

the 3rd point would produce 6 patches.

* Re: [PATCH v3 0/5] btrfs: Enhancement to tree block validation
  2019-01-18  2:19 [PATCH v3 0/5] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (4 preceding siblings ...)
  2019-01-18  2:19 ` [PATCH v3 5/5] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
@ 2019-01-22 17:47 ` David Sterba
  2019-01-22 22:53   ` Qu Wenruo
  2019-01-23 17:16 ` David Sterba
  6 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2019-01-22 17:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 18, 2019 at 10:19:51AM +0800, Qu Wenruo wrote:
> Patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/write_time_tree_checker
> Which is based on v5.0-rc1 tag.
> 
> This patchset has the following two features:
> - Tree block validation output enhancement
>   * Output validation failure timing (write time or read time)
>   * Always output tree block level/key mismatch error message
>     This part is already submitted and reviewed.
> 
> - Write time tree block validation check
>   To catch memory corruption either from hardware or kernel.
>   Example output would be:
> 
>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>     BTRFS error (device dm-3): write time tree block corruption detected
>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>     BTRFS error (device dm-3): write time tree block corruption detected

Would it be possible to print this message only once? It's correct when
it's the last message after all the corruptions are detected, as there
could be a lot of text that scrolls off the screen. The real reason
for the failure would stay visible and give enough of a clue about what
happened.

>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>     BTRFS info (device dm-3): forced readonly
>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>     BTRFS info (device dm-3): delayed_refs has NO entry

That's not from your patches, but now that I see it, it's not suitable
for the 'info' level; maybe 'debug', or not printed at all.

The extra checks will cause some slowdown, do we have an estimate how
much? 

* Re: [PATCH v3 0/5] btrfs: Enhancement to tree block validation
  2019-01-22 17:47 ` [PATCH v3 0/5] btrfs: Enhancement to tree block validation David Sterba
@ 2019-01-22 22:53   ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-01-22 22:53 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On 2019/1/23 1:47 AM, David Sterba wrote:
> On Fri, Jan 18, 2019 at 10:19:51AM +0800, Qu Wenruo wrote:
>> Patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/write_time_tree_checker
>> Which is based on v5.0-rc1 tag.
>>
>> This patchset has the following two features:
>> - Tree block validation output enhancement
>>   * Output validation failure timing (write time or read time)
>>   * Always output tree block level/key mismatch error message
>>     This part is already submitted and reviewed.
>>
>> - Write time tree block validation check
>>   To catch memory corruption either from hardware or kernel.
>>   Example output would be:
>>
>>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>>     BTRFS error (device dm-3): write time tree block corruption detected
>>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>>     BTRFS error (device dm-3): write time tree block corruption detected
> 
> Would it be possible to print this message only once?

I think it's caused by the DUP profile.

I'll change to make it abort.

> It's correct when
> it's the last message after all the corruptions are detected, as there
> could be a lot of text that scrolls off the screen. The real reason
> for the failure would stay visible and give enough of a clue about what
> happened.
> 
>>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>>     BTRFS info (device dm-3): forced readonly
>>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>>     BTRFS info (device dm-3): delayed_refs has NO entry
> 
> That's not from your patches, but now that I see it, it's not suitable
> for the 'info' level; maybe 'debug', or not printed at all.
> 
> The extra checks will cause some slowdown, do we have an estimate how
> much?

In theory it should be no slower than an extra csum run for tree blocks.

Thanks,
Qu



* Re: [PATCH v3 0/5] btrfs: Enhancement to tree block validation
  2019-01-18  2:19 [PATCH v3 0/5] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (5 preceding siblings ...)
  2019-01-22 17:47 ` [PATCH v3 0/5] btrfs: Enhancement to tree block validation David Sterba
@ 2019-01-23 17:16 ` David Sterba
  2019-01-24  0:08   ` Qu Wenruo
  2019-01-24  3:04   ` Qu Wenruo
  6 siblings, 2 replies; 16+ messages in thread
From: David Sterba @ 2019-01-23 17:16 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 18, 2019 at 10:19:51AM +0800, Qu Wenruo wrote:
> Patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/write_time_tree_checker
> Which is based on v5.0-rc1 tag.
> 
> This patchset has the following two features:
> - Tree block validation output enhancement
>   * Output validation failure timing (write time or read time)
>   * Always output tree block level/key mismatch error message
>     This part is already submitted and reviewed.
> 
> - Write time tree block validation check
>   To catch memory corruption either from hardware or kernel.
>   Example output would be:
> 
>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>     BTRFS error (device dm-3): write time tree block corruption detected
>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>     BTRFS error (device dm-3): write time tree block corruption detected
>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>     BTRFS info (device dm-3): forced readonly
>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>     BTRFS info (device dm-3): delayed_refs has NO entry

Two tests complain:

btrfs/139
[ 7064.718943] run fstests btrfs/139 at 2019-01-23 15:56:29
[ 7065.345503] BTRFS info (device vda): disk space caching is enabled
[ 7065.347577] BTRFS info (device vda): has skinny extents
[ 7065.666692] BTRFS: device fsid bc4c99e7-906a-44fd-b797-66eb7c9592d7 devid 1 transid 5 /dev/vdb
[ 7065.684618] BTRFS info (device vdb): disk space caching is enabled
[ 7065.687310] BTRFS info (device vdb): has skinny extents
[ 7065.688887] BTRFS info (device vdb): flagging fs with big metadata feature
[ 7065.693432] BTRFS info (device vdb): checking UUID tree
[ 7065.743019] BTRFS warning (device vdb): qgroup rescan is already in progress
[ 7065.746433] BTRFS info (device vdb): qgroup scan completed (inconsistency flag cleared)
[ 7075.359872] BTRFS critical (device vdb): corrupt leaf: root=7 block=31653888 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
[ 7075.363155] BTRFS error (device vdb): write time tree block corruption detected
[ 7077.578167] BTRFS: error (device vdb) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
[ 7077.583615] BTRFS info (device vdb): forced readonly
[ 7077.585797] BTRFS warning (device vdb): Skipping commit of aborted transaction.
[ 7077.588813] BTRFS: error (device vdb) in cleanup_transaction:1839: errno=-5 IO failure

generic/344

[11246.567119] run fstests generic/344 at 2019-01-23 17:06:11
[11246.652295] BTRFS info (device vda): disk space caching is enabled
[11246.654434] BTRFS info (device vda): has skinny extents
[11246.730069] BTRFS: device fsid b82bc1cd-b380-4eff-a40a-1924c3bf0580 devid 1 transid 5 /dev/vdb
[11246.740955] BTRFS info (device vdb): disk space caching is enabled
[11246.742694] BTRFS info (device vdb): has skinny extents
[11246.744173] BTRFS info (device vdb): flagging fs with big metadata feature
[11246.772934] BTRFS info (device vdb): checking UUID tree
[11300.857850] BTRFS critical (device vdb): corrupt leaf: root=7 block=31080448 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
[11300.861739] BTRFS error (device vdb): write time tree block corruption detected
[11300.864232] BTRFS: error (device vdb) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
[11300.867704] BTRFS info (device vdb): forced readonly
[11300.869355] BTRFS warning (device vdb): Skipping commit of aborted transaction.
[11300.871857] BTRFS: error (device vdb) in cleanup_transaction:1839: errno=-5 IO failure
[11300.874292] BTRFS info (device vdb): delayed_refs has NO entry
[11301.261978] BTRFS info (device vdb): disk space caching is enabled
[11301.264639] BTRFS info (device vdb): has skinny extents
[11301.266467] BTRFS info (device vdb): flagging fs with big metadata feature
[11301.299713] BTRFS info (device vdb): checking UUID tree
 [17:07:06]- output mismatch (see /tmp/fstests/results//generic/344.out.bad)
    --- tests/generic/344.out   2018-04-12 16:57:00.652225551 +0000
    +++ /tmp/fstests/results//generic/344.out.bad       2019-01-23 17:07:06.424000000 +0000
    @@ -17,6 +17,7 @@
     INFO: thread 0 created
     INFO: thread 1 created
     INFO: 0 error(s) detected
    +unlink(): Read-only file system

     INFO: zero-filled test...
     INFO: sz = 268435456
    ...
    (Run 'diff -u /tmp/fstests/tests/generic/344.out /tmp/fstests/results//generic/344.out.bad'  to see the entire diff)

* Re: [PATCH v3 0/5] btrfs: Enhancement to tree block validation
  2019-01-23 17:16 ` David Sterba
@ 2019-01-24  0:08   ` Qu Wenruo
  2019-01-24  3:04   ` Qu Wenruo
  1 sibling, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2019-01-24  0:08 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On 2019/1/24 1:16 AM, David Sterba wrote:
> On Fri, Jan 18, 2019 at 10:19:51AM +0800, Qu Wenruo wrote:
>> Patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/write_time_tree_checker
>> Which is based on v5.0-rc1 tag.
>>
>> This patchset has the following two features:
>> - Tree block validation output enhancement
>>   * Output validation failure timing (write time or read time)
>>   * Always output tree block level/key mismatch error message
>>     This part is already submitted and reviewed.
>>
>> - Write time tree block validation check
>>   To catch memory corruption either from hardware or kernel.
>>   Example output would be:
>>
>>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>>     BTRFS error (device dm-3): write time tree block corruption detected
>>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>>     BTRFS error (device dm-3): write time tree block corruption detected
>>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>>     BTRFS info (device dm-3): forced readonly
>>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>>     BTRFS info (device dm-3): delayed_refs has NO entry
> 
> Two tests complain:
> 
> btrfs/139
> [ 7064.718943] run fstests btrfs/139 at 2019-01-23 15:56:29
> [ 7065.345503] BTRFS info (device vda): disk space caching is enabled
> [ 7065.347577] BTRFS info (device vda): has skinny extents
> [ 7065.666692] BTRFS: device fsid bc4c99e7-906a-44fd-b797-66eb7c9592d7 devid 1 transid 5 /dev/vdb
> [ 7065.684618] BTRFS info (device vdb): disk space caching is enabled
> [ 7065.687310] BTRFS info (device vdb): has skinny extents
> [ 7065.688887] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 7065.693432] BTRFS info (device vdb): checking UUID tree
> [ 7065.743019] BTRFS warning (device vdb): qgroup rescan is already in progress
> [ 7065.746433] BTRFS info (device vdb): qgroup scan completed (inconsistency flag cleared)
> [ 7075.359872] BTRFS critical (device vdb): corrupt leaf: root=7 block=31653888 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
> [ 7075.363155] BTRFS error (device vdb): write time tree block corruption detected

This looks strange.

At transaction commit time, we should be able to get the correct csum root
node, but the check evidently still failed to recognize this leaf as the root.

I'll address it.
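
For readers following the thread, the rule behind the "invalid nritems"
message can be sketched roughly as below. This is a simplified userspace
illustration, not the kernel code: struct sketch_leaf, struct sketch_root and
sketch_leaf_nritems_ok() are hypothetical names, and the real check involves
more conditions than are shown here. The assumption is only that an empty leaf
is acceptable when it is the current root node of its owner tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_leaf {
	uint64_t bytenr;   /* logical address, "block=" in the log */
	uint64_t owner;    /* owning tree, "root=" in the log */
	uint32_t nritems;  /* number of items stored in the leaf */
};

struct sketch_root {
	uint64_t objectid;              /* tree id, e.g. 7 for the csum tree */
	const struct sketch_leaf *node; /* current root node of that tree */
};

/*
 * Return true if the leaf passes the emptiness rule: a leaf with zero
 * items is only acceptable when it is the root node of its owner tree.
 */
static bool sketch_leaf_nritems_ok(const struct sketch_leaf *leaf,
				   const struct sketch_root *root)
{
	assert(leaf != NULL && root != NULL);

	if (leaf->nritems != 0)
		return true;
	/* Empty leaf: fine only if it is the tree root itself. */
	return root->node == leaf;
}
```

In this sketch, an empty leaf is rejected whenever the root pointer used for
the comparison refers to a different block, which is why getting the correct
csum root node at write time matters for the reports above.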

Thanks,
Qu

> [ 7077.578167] BTRFS: error (device vdb) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
> [ 7077.583615] BTRFS info (device vdb): forced readonly
> [ 7077.585797] BTRFS warning (device vdb): Skipping commit of aborted transaction.
> [ 7077.588813] BTRFS: error (device vdb) in cleanup_transaction:1839: errno=-5 IO failure
> 
> generic/344
> 
> [11246.567119] run fstests generic/344 at 2019-01-23 17:06:11
> [11246.652295] BTRFS info (device vda): disk space caching is enabled
> [11246.654434] BTRFS info (device vda): has skinny extents
> [11246.730069] BTRFS: device fsid b82bc1cd-b380-4eff-a40a-1924c3bf0580 devid 1 transid 5 /dev/vdb
> [11246.740955] BTRFS info (device vdb): disk space caching is enabled
> [11246.742694] BTRFS info (device vdb): has skinny extents
> [11246.744173] BTRFS info (device vdb): flagging fs with big metadata feature
> [11246.772934] BTRFS info (device vdb): checking UUID tree
> [11300.857850] BTRFS critical (device vdb): corrupt leaf: root=7 block=31080448 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
> [11300.861739] BTRFS error (device vdb): write time tree block corruption detected
> [11300.864232] BTRFS: error (device vdb) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
> [11300.867704] BTRFS info (device vdb): forced readonly
> [11300.869355] BTRFS warning (device vdb): Skipping commit of aborted transaction.
> [11300.871857] BTRFS: error (device vdb) in cleanup_transaction:1839: errno=-5 IO failure
> [11300.874292] BTRFS info (device vdb): delayed_refs has NO entry
> [11301.261978] BTRFS info (device vdb): disk space caching is enabled
> [11301.264639] BTRFS info (device vdb): has skinny extents
> [11301.266467] BTRFS info (device vdb): flagging fs with big metadata feature
> [11301.299713] BTRFS info (device vdb): checking UUID tree
>  [17:07:06]- output mismatch (see /tmp/fstests/results//generic/344.out.bad)
>     --- tests/generic/344.out   2018-04-12 16:57:00.652225551 +0000
>     +++ /tmp/fstests/results//generic/344.out.bad       2019-01-23 17:07:06.424000000 +0000
>     @@ -17,6 +17,7 @@
>      INFO: thread 0 created
>      INFO: thread 1 created
>      INFO: 0 error(s) detected
>     +unlink(): Read-only file system
> 
>      INFO: zero-filled test...
>      INFO: sz = 268435456
>     ...
>     (Run 'diff -u /tmp/fstests/tests/generic/344.out /tmp/fstests/results//generic/344.out.bad'  to see the entire diff)
> 



* Re: [PATCH v3 0/5] btrfs: Enhancement to tree block validation
  2019-01-23 17:16 ` David Sterba
  2019-01-24  0:08   ` Qu Wenruo
@ 2019-01-24  3:04   ` Qu Wenruo
  2019-01-24 15:20     ` David Sterba
  1 sibling, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2019-01-24  3:04 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On 2019/1/24 1:16 AM, David Sterba wrote:
> On Fri, Jan 18, 2019 at 10:19:51AM +0800, Qu Wenruo wrote:
>> Patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/write_time_tree_checker
>> Which is based on v5.0-rc1 tag.
>>
>> This patchset has the following two features:
>> - Tree block validation output enhancement
>>   * Output validation failure timing (write time or read time)
>>   * Always output tree block level/key mismatch error message
>>     This part is already submitted and reviewed.
>>
>> - Write time tree block validation check
>>   To catch memory corruption either from hardware or kernel.
>>   Example output would be:
>>
>>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>>     BTRFS error (device dm-3): write time tree block corruption detected
>>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>>     BTRFS error (device dm-3): write time tree block corruption detected
>>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>>     BTRFS info (device dm-3): forced readonly
>>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>>     BTRFS info (device dm-3): delayed_refs has NO entry
> 
> Two tests complain:

Any info about reproducibility and the VM/hardware setup? Especially the
VM RAM size.

I have looped these two test cases 32 times and still cannot trigger it.

Thanks,
Qu

> 
> btrfs/139
> [ 7064.718943] run fstests btrfs/139 at 2019-01-23 15:56:29
> [ 7065.345503] BTRFS info (device vda): disk space caching is enabled
> [ 7065.347577] BTRFS info (device vda): has skinny extents
> [ 7065.666692] BTRFS: device fsid bc4c99e7-906a-44fd-b797-66eb7c9592d7 devid 1 transid 5 /dev/vdb
> [ 7065.684618] BTRFS info (device vdb): disk space caching is enabled
> [ 7065.687310] BTRFS info (device vdb): has skinny extents
> [ 7065.688887] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 7065.693432] BTRFS info (device vdb): checking UUID tree
> [ 7065.743019] BTRFS warning (device vdb): qgroup rescan is already in progress
> [ 7065.746433] BTRFS info (device vdb): qgroup scan completed (inconsistency flag cleared)
> [ 7075.359872] BTRFS critical (device vdb): corrupt leaf: root=7 block=31653888 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
> [ 7075.363155] BTRFS error (device vdb): write time tree block corruption detected
> [ 7077.578167] BTRFS: error (device vdb) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
> [ 7077.583615] BTRFS info (device vdb): forced readonly
> [ 7077.585797] BTRFS warning (device vdb): Skipping commit of aborted transaction.
> [ 7077.588813] BTRFS: error (device vdb) in cleanup_transaction:1839: errno=-5 IO failure
> 
> generic/344
> 
> [11246.567119] run fstests generic/344 at 2019-01-23 17:06:11
> [11246.652295] BTRFS info (device vda): disk space caching is enabled
> [11246.654434] BTRFS info (device vda): has skinny extents
> [11246.730069] BTRFS: device fsid b82bc1cd-b380-4eff-a40a-1924c3bf0580 devid 1 transid 5 /dev/vdb
> [11246.740955] BTRFS info (device vdb): disk space caching is enabled
> [11246.742694] BTRFS info (device vdb): has skinny extents
> [11246.744173] BTRFS info (device vdb): flagging fs with big metadata feature
> [11246.772934] BTRFS info (device vdb): checking UUID tree
> [11300.857850] BTRFS critical (device vdb): corrupt leaf: root=7 block=31080448 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
> [11300.861739] BTRFS error (device vdb): write time tree block corruption detected
> [11300.864232] BTRFS: error (device vdb) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
> [11300.867704] BTRFS info (device vdb): forced readonly
> [11300.869355] BTRFS warning (device vdb): Skipping commit of aborted transaction.
> [11300.871857] BTRFS: error (device vdb) in cleanup_transaction:1839: errno=-5 IO failure
> [11300.874292] BTRFS info (device vdb): delayed_refs has NO entry
> [11301.261978] BTRFS info (device vdb): disk space caching is enabled
> [11301.264639] BTRFS info (device vdb): has skinny extents
> [11301.266467] BTRFS info (device vdb): flagging fs with big metadata feature
> [11301.299713] BTRFS info (device vdb): checking UUID tree
>  [17:07:06]- output mismatch (see /tmp/fstests/results//generic/344.out.bad)
>     --- tests/generic/344.out   2018-04-12 16:57:00.652225551 +0000
>     +++ /tmp/fstests/results//generic/344.out.bad       2019-01-23 17:07:06.424000000 +0000
>     @@ -17,6 +17,7 @@
>      INFO: thread 0 created
>      INFO: thread 1 created
>      INFO: 0 error(s) detected
>     +unlink(): Read-only file system
> 
>      INFO: zero-filled test...
>      INFO: sz = 268435456
>     ...
>     (Run 'diff -u /tmp/fstests/tests/generic/344.out /tmp/fstests/results//generic/344.out.bad'  to see the entire diff)
> 



* Re: [PATCH v3 0/5] btrfs: Enhancement to tree block validation
  2019-01-24  3:04   ` Qu Wenruo
@ 2019-01-24 15:20     ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2019-01-24 15:20 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Jan 24, 2019 at 11:04:47AM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/1/24 1:16 AM, David Sterba wrote:
> > On Fri, Jan 18, 2019 at 10:19:51AM +0800, Qu Wenruo wrote:
> >> Patchset can be fetched from github:
> >> https://github.com/adam900710/linux/tree/write_time_tree_checker
> >> Which is based on v5.0-rc1 tag.
> >>
> >> This patchset has the following two features:
> >> - Tree block validation output enhancement
> >>   * Output validation failure timing (write time or read time)
> >>   * Always output tree block level/key mismatch error message
> >>     This part is already submitted and reviewed.
> >>
> >> - Write time tree block validation check
> >>   To catch memory corruption either from hardware or kernel.
> >>   Example output would be:
> >>
> >>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
> >>     BTRFS error (device dm-3): write time tree block corruption detected
> >>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
> >>     BTRFS error (device dm-3): write time tree block corruption detected
> >>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
> >>     BTRFS info (device dm-3): forced readonly
> >>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
> >>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
> >>     BTRFS info (device dm-3): delayed_refs has NO entry
> > 
> > Two tests complain:
> 
> Any info about reproducibility and the VM/hardware setup? Especially the
> VM RAM size.

2G of RAM with 4 CPUs. This was the first run of the patchset, so I can't
say how reliable it is yet.

Thread overview: 16+ messages
2019-01-18  2:19 [PATCH v3 0/5] btrfs: Enhancement to tree block validation Qu Wenruo
2019-01-18  2:19 ` [PATCH v3 1/5] btrfs: Always output error message when key/level verification fails Qu Wenruo
2019-01-18  7:38   ` Johannes Thumshirn
2019-01-18  2:19 ` [PATCH v3 2/5] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
2019-01-18  2:19 ` [PATCH v3 3/5] btrfs: extent_io: Kill the BUG_ON() in flush_write_bio() Qu Wenruo
2019-01-22 17:38   ` David Sterba
2019-01-18  2:19 ` [PATCH v3 4/5] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
2019-01-18  7:39   ` Johannes Thumshirn
2019-01-18  2:19 ` [PATCH v3 5/5] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
2019-01-18  7:48   ` Johannes Thumshirn
2019-01-22 17:47 ` [PATCH v3 0/5] btrfs: Enhancement to tree block validation David Sterba
2019-01-22 22:53   ` Qu Wenruo
2019-01-23 17:16 ` David Sterba
2019-01-24  0:08   ` Qu Wenruo
2019-01-24  3:04   ` Qu Wenruo
2019-01-24 15:20     ` David Sterba
