All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker
@ 2019-03-20  6:27 Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 01/11] btrfs: Always output error message when key/level verification fails Qu Wenruo
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs

Patchset can be fetched from github:
https://github.com/adam900710/linux/tree/write_time_tree_checker
Which is based on v5.1-rc1 tag.

This patchset has the following 3 features:
- Tree block validation output enhancement
  * Output validation failure timing (write time or read time)
  * Always output tree block level/key mismatch error message
    This part is already submitted and reviewed.

- Write time tree block validation check
  To catch memory corruption either from hardware or kernel.
  Example output would be:

    BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
    BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected
    BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
    BTRFS info (device dm-3): forced readonly
    BTRFS warning (device dm-3): Skipping commit of aborted transaction.
    BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
    BTRFS info (device dm-3): delayed_refs has NO entry

- Better error handling before calling flush_write_bio()
  One hidden reason of calling flush_write_bio() under all cases is,
  flush_write_bio() will trigger endio function and endio function of
  epd->bio will free the bio under all cases.
  So we're in fact abusing flush_write_bio() as cleanup.

  Since now flush_write_bio() has its own return value, we shouldn't call
  flush_write_bio() no-brain, here we introduce proper cleanup helper,
  end_write_bio(). Now we call flush_write_bio() like:
              New                 |           Old
  --------------------------------------------------------------
  ret = do_some_evil(&epd);       | ret = do_some_evil(&epd);
  if (ret < 0) {                  | flush_write_bio(&epd);
  	end_write_bio(&epd, ret); | ^^^ submitting half-backed epd->bio?
  	return ret;               | return ret;
  }                               |
  ret = flush_write_bio(&epd);    |
  return ret;                     |

  Above code should be more streamline for the error handling part.

Changelog:
v2:
- Unlock locked pages in lock_extent_buffer_for_io() for error handling.
- Added Reviewed-by tags.

v3:
- Remove duplicated error message.
- Use IS_ENABLED() macro to replace #ifdef.
- Added Reviewed-by tags.

v4:
- Re-organized patch split
  Now each BUG_ON() cleanup has its own patch
- Dig much further into the call sites to eliminate unexpected >0 return
  May be a little paranoid and abuse some ASSERT(), but it should be
  much safer against further code change.
- Fix the false alert caused by balance and memory pressure
  The fix it skip owner checker for non-essential tree at write time.
  Since owner root can't always be reliable, either due to commit root
  created in current transaction or balance + memory pressure.

v5:
- Do proper error-out handling other than relying on flush_write_bio()
  to clean up.
  This has a side effect that no Reviewed-by tags for modified patches.
- New comment for why we don't need to do anything about ebp->bio when
  submit_one_bio() fails.
- Add some Reviewed-by tag.

v5.1:
- Add "block=%llu " output for write/read time error line.
- Also output read time error message for fsid/start/level check.

v5.2:
- Fix a missing page_unlock() in error hanlding

v5.3:
- Rebase to v5.1-rc1 tag

Qu Wenruo (11):
  btrfs: Always output error message when key/level verification fails
  btrfs: disk-io: Show the timing of corrupted tree block explicitly
  btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  btrfs: extent_io: Handle error better in extent_write_full_page()
  btrfs: extent_io: Handle error better in btree_write_cache_pages()
  btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
  btrfs: extent_io: Handle error better in extent_write_locked_range()
  btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
  btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
  btrfs: extent_io: Handle error better in extent_writepages()
  btrfs: Do mandatory tree block check before submitting bio

 fs/btrfs/disk-io.c      |  24 ++++++---
 fs/btrfs/extent_io.c    | 109 +++++++++++++++++++++++++++++++++-------
 fs/btrfs/tree-checker.c |  24 +++++++--
 fs/btrfs/tree-checker.h |   8 +++
 4 files changed, 137 insertions(+), 28 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5.3 01/11] btrfs: Always output error message when key/level verification fails
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 02/11] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, Johannes Thumshirn

We have internal report of strange transaction abort due to EUCLEAN
without any error message.

Since error message inside verify_level_key() is only enabled for
CONFIG_BTRFS_DEBUG, the error message won't output for most distro.

This patch will make the error message mandatory, so when problem
happens we know what's causing the problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/disk-io.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6fe9197f6ee4..6a7a1e168169 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -424,12 +424,11 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 
 	found_level = btrfs_header_level(eb);
 	if (found_level != level) {
-#ifdef CONFIG_BTRFS_DEBUG
-		WARN_ON(1);
+		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+		     KERN_ERR "BTRFS: tree level check failed\n");
 		btrfs_err(fs_info,
 "tree level mismatch detected, bytenr=%llu level expected=%u has=%u",
 			  eb->start, level, found_level);
-#endif
 		return -EIO;
 	}
 
@@ -450,9 +449,9 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 		btrfs_item_key_to_cpu(eb, &found_key, 0);
 	ret = btrfs_comp_cpu_keys(first_key, &found_key);
 
-#ifdef CONFIG_BTRFS_DEBUG
 	if (ret) {
-		WARN_ON(1);
+		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+		     KERN_ERR "BTRFS: tree first key check failed\n");
 		btrfs_err(fs_info,
 "tree first key mismatch detected, bytenr=%llu parent_transid=%llu key expected=(%llu,%u,%llu) has=(%llu,%u,%llu)",
 			  eb->start, parent_transid, first_key->objectid,
@@ -460,7 +459,6 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 			  found_key.objectid, found_key.type,
 			  found_key.offset);
 	}
-#endif
 	return ret;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 02/11] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 01/11] btrfs: Always output error message when key/level verification fails Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-29 14:11   ` David Sterba
  2019-03-20  6:27 ` [PATCH v5.3 03/11] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, Johannes Thumshirn

Just add one extra line to show when the corruption is detected.
Currently only read time detection is possible.

The planned distinguish line would be:
  read time:
    <detail report>
    block=XXXXX read time tree block corruption detected

  write time:
    <detail report>
    block=XXXXX write time tree block corruption detected

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/disk-io.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6a7a1e168169..e8d4e238f832 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -659,6 +659,10 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 
 	if (!ret)
 		set_extent_buffer_uptodate(eb);
+	else
+		btrfs_err(fs_info,
+			  "block=%llu read time tree block corruption detected",
+			  eb->start);
 err:
 	if (reads_done &&
 	    test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 03/11] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 01/11] btrfs: Always output error message when key/level verification fails Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 02/11] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-20 19:39   ` David Sterba
  2019-03-20  6:27 ` [PATCH v5.3 04/11] btrfs: extent_io: Handle error better in extent_write_full_page() Qu Wenruo
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

We have a BUG_ON() in flush_write_bio() to handle the return value of
submit_one_bio().

Move the BUG_ON() one level up to all its callers.

This patch will introduce temporary variable, @flush_ret to keep code
change minimal in this patch. That variable will be cleaned up when
enhancing the error handling later.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/extent_io.c | 55 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ca8b8e785cf3..58bf6ae29319 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -170,15 +170,28 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 	return blk_status_to_errno(ret);
 }
 
-static void flush_write_bio(struct extent_page_data *epd)
+/*
+ * A wrapper for submit_one_bio().
+ *
+ * Return 0 if everything is OK.
+ * Return <0 for error.
+ */
+static int __must_check flush_write_bio(struct extent_page_data *epd)
 {
-	if (epd->bio) {
-		int ret;
+	int ret = 0;
 
+	if (epd->bio) {
 		ret = submit_one_bio(epd->bio, 0, 0);
-		BUG_ON(ret < 0); /* -ENOMEM */
+		/*
+		 * Clean up of epd->bio is handled by its endio function.
+		 * And endio is either triggered by successful bio execution
+		 * or the error handler of submit bio hook.
+		 * So at this point, no matter what happened, we don't need
+		 * to clean up epd->bio.
+		 */
 		epd->bio = NULL;
 	}
+	return ret;
 }
 
 int __init extent_io_init(void)
@@ -3511,7 +3524,8 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 
 	if (!btrfs_try_tree_write_lock(eb)) {
 		flush = 1;
-		flush_write_bio(epd);
+		ret = flush_write_bio(epd);
+		BUG_ON(ret < 0);
 		btrfs_tree_lock(eb);
 	}
 
@@ -3520,7 +3534,8 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 		if (!epd->sync_io)
 			return 0;
 		if (!flush) {
-			flush_write_bio(epd);
+			ret = flush_write_bio(epd);
+			BUG_ON(ret < 0);
 			flush = 1;
 		}
 		while (1) {
@@ -3561,7 +3576,8 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 
 		if (!trylock_page(p)) {
 			if (!flush) {
-				flush_write_bio(epd);
+				ret = flush_write_bio(epd);
+				BUG_ON(ret < 0);
 				flush = 1;
 			}
 			lock_page(p);
@@ -3753,6 +3769,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 		.sync_io = wbc->sync_mode == WB_SYNC_ALL,
 	};
 	int ret = 0;
+	int flush_ret;
 	int done = 0;
 	int nr_to_write_done = 0;
 	struct pagevec pvec;
@@ -3852,7 +3869,8 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	flush_write_bio(&epd);
+	flush_ret = flush_write_bio(&epd);
+	BUG_ON(flush_ret < 0);
 	return ret;
 }
 
@@ -3949,7 +3967,8 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			 * tmpfs file mapping
 			 */
 			if (!trylock_page(page)) {
-				flush_write_bio(epd);
+				ret = flush_write_bio(epd);
+				BUG_ON(ret < 0);
 				lock_page(page);
 			}
 
@@ -3959,8 +3978,10 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			}
 
 			if (wbc->sync_mode != WB_SYNC_NONE) {
-				if (PageWriteback(page))
-					flush_write_bio(epd);
+				if (PageWriteback(page)) {
+					ret = flush_write_bio(epd);
+					BUG_ON(ret < 0);
+				}
 				wait_on_page_writeback(page);
 			}
 
@@ -4021,6 +4042,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 {
 	int ret;
+	int flush_ret;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = &BTRFS_I(page->mapping->host)->io_tree,
@@ -4030,7 +4052,8 @@ int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 
 	ret = __extent_writepage(page, wbc, &epd);
 
-	flush_write_bio(&epd);
+	flush_ret = flush_write_bio(&epd);
+	BUG_ON(flush_ret < 0);
 	return ret;
 }
 
@@ -4038,6 +4061,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 			      int mode)
 {
 	int ret = 0;
+	int flush_ret;
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct page *page;
@@ -4070,7 +4094,8 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 		start += PAGE_SIZE;
 	}
 
-	flush_write_bio(&epd);
+	flush_ret = flush_write_bio(&epd);
+	BUG_ON(flush_ret < 0);
 	return ret;
 }
 
@@ -4078,6 +4103,7 @@ int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc)
 {
 	int ret = 0;
+	int flush_ret;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = &BTRFS_I(mapping->host)->io_tree,
@@ -4086,7 +4112,8 @@ int extent_writepages(struct address_space *mapping,
 	};
 
 	ret = extent_write_cache_pages(mapping, wbc, &epd);
-	flush_write_bio(&epd);
+	flush_ret = flush_write_bio(&epd);
+	BUG_ON(flush_ret < 0);
 	return ret;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 04/11] btrfs: extent_io: Handle error better in extent_write_full_page()
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 03/11] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 05/11] btrfs: extent_io: Handle error better in btree_write_cache_pages() Qu Wenruo
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs

Since now flush_write_bio() could return error, kill the BUG_ON() first.

Then don't call flush_write_bio() unconditionally, instead we check the
return value from __extent_writepage() first.

If __extent_writepage() fails, we do cleanup, and return error without
submitting the possible corrupted or half-baked bio.

If __extent_writepage() successes, then we call flush_write_bio() and
return the result.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 58bf6ae29319..2a51627d7f24 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -170,6 +170,16 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 	return blk_status_to_errno(ret);
 }
 
+/* A wrapper for bio_endio() to cleanup unsubmitted bios */
+static void end_write_bio(struct extent_page_data *epd, int ret)
+{
+	if (epd->bio) {
+		epd->bio->bi_status = errno_to_blk_status(ret);
+		bio_endio(epd->bio);
+		epd->bio = NULL;
+	}
+}
+
 /*
  * A wrapper for submit_one_bio().
  *
@@ -3432,6 +3442,9 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
  * records are inserted to lock ranges in the tree, and as dirty areas
  * are found, they are marked writeback.  Then the lock bits are removed
  * and the end_io handler clears the writeback ranges
+ *
+ * Return 0 if everything goes well.
+ * Return <0 for error.
  */
 static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 			      struct extent_page_data *epd)
@@ -3501,6 +3514,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		end_extent_writepage(page, ret, start, page_end);
 	}
 	unlock_page(page);
+	ASSERT(ret <= 0);
 	return ret;
 
 done_unlocked:
@@ -4042,7 +4056,6 @@ static int extent_write_cache_pages(struct address_space *mapping,
 int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 {
 	int ret;
-	int flush_ret;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = &BTRFS_I(page->mapping->host)->io_tree,
@@ -4051,9 +4064,14 @@ int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 	};
 
 	ret = __extent_writepage(page, wbc, &epd);
+	ASSERT(ret <= 0);
+	if (ret < 0) {
+		end_write_bio(&epd, ret);
+		return ret;
+	}
 
-	flush_ret = flush_write_bio(&epd);
-	BUG_ON(flush_ret < 0);
+	ret = flush_write_bio(&epd);
+	ASSERT(ret <= 0);
 	return ret;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 05/11] btrfs: extent_io: Handle error better in btree_write_cache_pages()
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (3 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 04/11] btrfs: extent_io: Handle error better in extent_write_full_page() Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 06/11] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs

In btree_write_cache_pages(), we can only get @ret <= 0.
Add an ASSERT() for it just in case.

Then instead of submitting the write bio even we got some error, check
the return value first.
If we have already hit some error, just clean up the corrupted or
half-baked bio, and return error.

If there is no error so far, then call flush_write_bio() and return the
result.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2a51627d7f24..36e9dcd764fd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3783,7 +3783,6 @@ int btree_write_cache_pages(struct address_space *mapping,
 		.sync_io = wbc->sync_mode == WB_SYNC_ALL,
 	};
 	int ret = 0;
-	int flush_ret;
 	int done = 0;
 	int nr_to_write_done = 0;
 	struct pagevec pvec;
@@ -3883,8 +3882,12 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	flush_ret = flush_write_bio(&epd);
-	BUG_ON(flush_ret < 0);
+	ASSERT(ret <= 0);
+	if (ret < 0) {
+		end_write_bio(&epd, ret);
+		return ret;
+	}
+	ret = flush_write_bio(&epd);
 	return ret;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 06/11] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (4 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 05/11] btrfs: extent_io: Handle error better in btree_write_cache_pages() Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 07/11] btrfs: extent_io: Handle error better in extent_write_locked_range() Qu Wenruo
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

Since __extent_writepage() will no longer return >0 value,
(ret == AOP_WRITEPAGE_ACTIVATE) will never be true.

Kill that dead branch.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/extent_io.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 36e9dcd764fd..4ad7c6afe623 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4009,11 +4009,6 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			}
 
 			ret = __extent_writepage(page, wbc, epd);
-
-			if (unlikely(ret == AOP_WRITEPAGE_ACTIVATE)) {
-				unlock_page(page);
-				ret = 0;
-			}
 			if (ret < 0) {
 				/*
 				 * done_index is set past this page,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 07/11] btrfs: extent_io: Handle error better in extent_write_locked_range()
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (5 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 06/11] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-21 13:19   ` David Sterba
  2019-03-20  6:27 ` [PATCH v5.3 08/11] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs

Do proper cleanup if we hit any error in extent_write_locked_range(),
and check the return value of flush_write_bio().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4ad7c6afe623..c688049ccfc3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4077,7 +4077,6 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 			      int mode)
 {
 	int ret = 0;
-	int flush_ret;
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct page *page;
@@ -4110,8 +4109,12 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 		start += PAGE_SIZE;
 	}
 
-	flush_ret = flush_write_bio(&epd);
-	BUG_ON(flush_ret < 0);
+	ASSERT(ret <= 0);
+	if (ret < 0) {
+		end_write_bio(&epd, ret);
+		return ret;
+	}
+	ret = flush_write_bio(&epd);
 	return ret;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 08/11] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (6 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 07/11] btrfs: extent_io: Handle error better in extent_write_locked_range() Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-21 13:30   ` David Sterba
  2019-03-20  6:27 ` [PATCH v5.3 09/11] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs

This function needs some extra check on locked pages and eb.

For error handling we need to unlock locked pages and the eb.

Also add comment for possible return values of lock_extent_buffer_for_io().

There is a rare >0 return value branch, where all pages get locked
while write bio is not flushed.

Thankfully it's handled by the only caller, btree_write_cache_pages(),
as later write_one_eb() call will trigger submit_one_bio().
So there shouldn't be any problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c688049ccfc3..5de18900a3c3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3527,19 +3527,25 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 		       TASK_UNINTERRUPTIBLE);
 }
 
+/*
+ * Return 0 if nothing went wrong, its pages get locked and submitted.
+ * Return >0 is mostly the same as 0, except bio is not submitted.
+ * Return <0 if something went wrong, no page get locked.
+ */
 static noinline_for_stack int
 lock_extent_buffer_for_io(struct extent_buffer *eb,
 			  struct btrfs_fs_info *fs_info,
 			  struct extent_page_data *epd)
 {
-	int i, num_pages;
+	int i, num_pages, failed_page_nr;
 	int flush = 0;
 	int ret = 0;
 
 	if (!btrfs_try_tree_write_lock(eb)) {
-		flush = 1;
 		ret = flush_write_bio(epd);
-		BUG_ON(ret < 0);
+		if (ret < 0)
+			return ret;
+		flush = 1;
 		btrfs_tree_lock(eb);
 	}
 
@@ -3549,7 +3555,8 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 			return 0;
 		if (!flush) {
 			ret = flush_write_bio(epd);
-			BUG_ON(ret < 0);
+			if (ret < 0)
+				return ret;
 			flush = 1;
 		}
 		while (1) {
@@ -3591,7 +3598,10 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 		if (!trylock_page(p)) {
 			if (!flush) {
 				ret = flush_write_bio(epd);
-				BUG_ON(ret < 0);
+				if (ret < 0) {
+					failed_page_nr = i;
+					goto err_unlock;
+				}
 				flush = 1;
 			}
 			lock_page(p);
@@ -3599,6 +3609,11 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 	}
 
 	return ret;
+err_unlock:
+	/* Unlock these already locked pages */
+	for (i = 0; i < failed_page_nr; i++)
+		unlock_page(eb->pages[i]);
+	return ret;
 }
 
 static void end_extent_buffer_writeback(struct extent_buffer *eb)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 09/11] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (7 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 08/11] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-21 14:14   ` David Sterba
  2019-03-20  6:27 ` [PATCH v5.3 10/11] btrfs: extent_io: Handle error better in extent_writepages() Qu Wenruo
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

In extent_write_cache_pages() since flush_write_bio() can return error,
we need some extra error handling.

For the first flush_write_bio() since we haven't locked the page, we
only need to exit the loop.

For the seconds flush_write_bio() call, we have the page locked, despite
that there is nothing special need to handle.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/extent_io.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5de18900a3c3..c38058398d75 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4000,7 +4000,10 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			 */
 			if (!trylock_page(page)) {
 				ret = flush_write_bio(epd);
-				BUG_ON(ret < 0);
+				if (ret < 0) {
+					done = 1;
+					break;
+				}
 				lock_page(page);
 			}
 
@@ -4012,7 +4015,11 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			if (wbc->sync_mode != WB_SYNC_NONE) {
 				if (PageWriteback(page)) {
 					ret = flush_write_bio(epd);
-					BUG_ON(ret < 0);
+					if (ret < 0) {
+						unlock_page(page);
+						done = 1;
+						break;
+					}
 				}
 				wait_on_page_writeback(page);
 			}
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 10/11] btrfs: extent_io: Handle error better in extent_writepages()
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (8 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 09/11] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-03-20  6:27 ` [PATCH v5.3 11/11] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
  2019-03-21 14:34 ` [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker David Sterba
  11 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs

Do proper cleanup if we hit any error in extent_writepages(),
and check the return value of flush_write_bio().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c38058398d75..527825e64f0f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4144,7 +4144,6 @@ int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc)
 {
 	int ret = 0;
-	int flush_ret;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = &BTRFS_I(mapping->host)->io_tree,
@@ -4153,8 +4152,12 @@ int extent_writepages(struct address_space *mapping,
 	};
 
 	ret = extent_write_cache_pages(mapping, wbc, &epd);
-	flush_ret = flush_write_bio(&epd);
-	BUG_ON(flush_ret < 0);
+	ASSERT(ret <= 0);
+	if (ret < 0) {
+		end_write_bio(&epd, ret);
+		return ret;
+	}
+	ret = flush_write_bio(&epd);
 	return ret;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5.3 11/11] btrfs: Do mandatory tree block check before submitting bio
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (9 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 10/11] btrfs: extent_io: Handle error better in extent_writepages() Qu Wenruo
@ 2019-03-20  6:27 ` Qu Wenruo
  2019-04-03  9:07   ` Qu Wenruo
  2019-03-21 14:34 ` [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker David Sterba
  11 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2019-03-20  6:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Leonard Lausen

There are at least 2 reports about memory bit flip sneaking into on-disk
data.

Currently we only have a relaxed check triggered at
btrfs_mark_buffer_dirty() time, as it's not mandatory and only for
CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build, it doesn't help user to
detect such problem.

This patch will address the hole by triggering comprehensive check on
tree blocks before writing it back to disk.

The design points are:
- Timing of the check: Tree block write hook
  This timing is chosen to reduce the overhead.
  The comprehensive check should be as expensive as csum.
  Doing full check at btrfs_mark_buffer_dirty() is too expensive for end
  user.

- Loose empty leaf check
  Originally for empty leaf, tree-checker will report error if it's not
  a tree root.
  The problem for such check at write time is:
  * False alert for tree root created in current transaction
    In that case, the commit root still needs to be written to disk.
    And since current root can differ from commit root, then it will
    cause false alert.
    This happens for log tree.

  * False alert for relocated tree block
    Relocated tree block can be written to disk due to memory pressure,
    in that case an empty csum tree root can be written to disk and
    cause false alert, since csum root node hasn't been updated.

  Although some more reliable empty leaf check is still kept as is.
  Namely essential trees (e.g. extent, chunk) should never be empty.

The example error output will be something like:
  BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
  BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected
  BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
  BTRFS info (device dm-3): forced readonly
  BTRFS warning (device dm-3): Skipping commit of aborted transaction.
  BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
  BTRFS info (device dm-3): delayed_refs has NO entry

Reported-by: Leonard Lausen <leonard@lausen.nl>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c      | 10 ++++++++++
 fs/btrfs/tree-checker.c | 24 +++++++++++++++++++++---
 fs/btrfs/tree-checker.h |  8 ++++++++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e8d4e238f832..327a98ca0d14 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -314,6 +314,16 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
 			return -EUCLEAN;
 		}
 	} else {
+		if (btrfs_header_level(buf))
+			err = btrfs_check_node(fs_info, buf);
+		else
+			err = btrfs_check_leaf_write(fs_info, buf);
+		if (err < 0) {
+			btrfs_err(fs_info,
+			"block=%llu write time tree block corruption detected",
+				  buf->start);
+			return err;
+		}
 		write_extent_buffer(buf, result, 0, csum_size);
 	}
 
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a62e1e837a89..b8cdaf472031 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -477,7 +477,7 @@ static int check_leaf_item(struct btrfs_fs_info *fs_info,
 }
 
 static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
-		      bool check_item_data)
+		      bool check_item_data, bool check_empty_leaf)
 {
 	/* No valid key type is 0, so all key should be larger than this key */
 	struct btrfs_key prev_key = {0, 0, 0};
@@ -516,6 +516,18 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 				    owner);
 			return -EUCLEAN;
 		}
+
+		/*
+		 * Skip empty leaf check, mostly for write time tree block
+		 *
+		 * Such skip mostly happens for tree block write time, as
+		 * we can't use @owner as accurate owner indicator.
+		 * Case like balance and new tree block created for commit root
+		 * can break owner check easily.
+		 */
+		if (!check_empty_leaf)
+			return 0;
+
 		key.objectid = owner;
 		key.type = BTRFS_ROOT_ITEM_KEY;
 		key.offset = (u64)-1;
@@ -636,13 +648,19 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
 			  struct extent_buffer *leaf)
 {
-	return check_leaf(fs_info, leaf, true);
+	return check_leaf(fs_info, leaf, true, true);
 }
 
 int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
 			     struct extent_buffer *leaf)
 {
-	return check_leaf(fs_info, leaf, false);
+	return check_leaf(fs_info, leaf, false, true);
+}
+
+int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *leaf)
+{
+	return check_leaf(fs_info, leaf, false, false);
 }
 
 int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node)
diff --git a/fs/btrfs/tree-checker.h b/fs/btrfs/tree-checker.h
index ff043275b784..6f8d1b627c53 100644
--- a/fs/btrfs/tree-checker.h
+++ b/fs/btrfs/tree-checker.h
@@ -23,6 +23,14 @@ int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
  */
 int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
 			     struct extent_buffer *leaf);
+
+/*
+ * Write time specific leaf checker.
+ * Don't check if the empty leaf belongs to a tree root. Mostly for balance
+ * and new tree created in current transaction.
+ */
+int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *leaf);
 int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node);
 
 #endif
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 03/11] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-03-20  6:27 ` [PATCH v5.3 03/11] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
@ 2019-03-20 19:39   ` David Sterba
  0 siblings, 0 replies; 24+ messages in thread
From: David Sterba @ 2019-03-20 19:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Johannes Thumshirn

On Wed, Mar 20, 2019 at 02:27:41PM +0800, Qu Wenruo wrote:
> We have a BUG_ON() in flush_write_bio() to handle the return value of
> submit_one_bio().
> 
> Move the BUG_ON() one level up to all its callers.
> 
> This patch will introduce temporary variable, @flush_ret to keep code
> change minimal in this patch. That variable will be cleaned up when
> enhancing the error handling later.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  fs/btrfs/extent_io.c | 55 +++++++++++++++++++++++++++++++++-----------
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index ca8b8e785cf3..58bf6ae29319 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -170,15 +170,28 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
>  	return blk_status_to_errno(ret);
>  }
>  
> -static void flush_write_bio(struct extent_page_data *epd)
> +/*
> + * A wrapper for submit_one_bio().

Such comments are not very useful, changed to "Submit bio from extent
page data via submit_one_bio".

> + *
> + * Return 0 if everything is OK.
> + * Return <0 for error.
> + */
> +static int __must_check flush_write_bio(struct extent_page_data *epd)
>  {
> -	if (epd->bio) {
> -		int ret;
> +	int ret = 0;
>  
> +	if (epd->bio) {
>  		ret = submit_one_bio(epd->bio, 0, 0);
> -		BUG_ON(ret < 0); /* -ENOMEM */
> +		/*
> +		 * Clean up of epd->bio is handled by its endio function.
> +		 * And endio is either triggered by successful bio execution
> +		 * or the error handler of submit bio hook.
> +		 * So at this point, no matter what happened, we don't need
> +		 * to clean up epd->bio.
> +		 */

That one is great.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 07/11] btrfs: extent_io: Handle error better in extent_write_locked_range()
  2019-03-20  6:27 ` [PATCH v5.3 07/11] btrfs: extent_io: Handle error better in extent_write_locked_range() Qu Wenruo
@ 2019-03-21 13:19   ` David Sterba
  2019-03-21 13:45     ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: David Sterba @ 2019-03-21 13:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Mar 20, 2019 at 02:27:45PM +0800, Qu Wenruo wrote:
> Do proper cleanup if we hit any error in extent_write_locked_range(),
> and check the return value of flush_write_bio().

Yes that's what the code does, but the changelog should explain why this
is correct. Same for "btrfs: extent_io: Handle error better in
extent_writepages()". You do that in other patches, why not in this one
too? If the reason is same for several patches, then copy it, eg. btrfs:
"extent_io: Handle errors better in btree_write_cache_pages()" is ok.

But don't resend, I'll fix it here as I made some other fixups to
changelogs and am finishing the whole series.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 08/11] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
  2019-03-20  6:27 ` [PATCH v5.3 08/11] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
@ 2019-03-21 13:30   ` David Sterba
  0 siblings, 0 replies; 24+ messages in thread
From: David Sterba @ 2019-03-21 13:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Mar 20, 2019 at 02:27:46PM +0800, Qu Wenruo wrote:
> This function needs some extra check on locked pages and eb.
> 
> For error handling we need to unlock locked pages and the eb.
> 
> Also add comment for possible return values of lock_extent_buffer_for_io().
> 
> There is a rare >0 return value branch, where all pages get locked
> while write bio is not flushed.
> 
> Thankfully it's handled by the only caller, btree_write_cache_pages(),
> as later write_one_eb() call will trigger submit_one_bio().
> So there shouldn't be any problem.

Right, I'd suggest to add a comment to btree_write_cache_pages
explaining that and possibly changing the condition to make it more
explicit that all 3 types of values are really handled.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 07/11] btrfs: extent_io: Handle error better in extent_write_locked_range()
  2019-03-21 13:19   ` David Sterba
@ 2019-03-21 13:45     ` Qu Wenruo
  0 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-21 13:45 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1094 bytes --]



On 2019/3/21 下午9:19, David Sterba wrote:
> On Wed, Mar 20, 2019 at 02:27:45PM +0800, Qu Wenruo wrote:
>> Do proper cleanup if we hit any error in extent_write_locked_range(),
>> and check the return value of flush_write_bio().
> 
> Yes that's what the code does, but the changelog should explain why this
> is correct. Same for "btrfs: extent_io: Handle error better in
> extent_writepages()". You do that in other patches, why not in this one
> too?

My bad, I thought the patch implementing end_write_bio() explains why
the cleanup function is doing the same work as previous
flush_write_bio(), but skipping the bio submitting.

So I skipped the reason why calling end_write_bio() here is enough for
the error case.

> If the reason is same for several patches, then copy it, eg. btrfs:
> "extent_io: Handle errors better in btree_write_cache_pages()" is ok.
> 
> But don't resend, I'll fix it here as I made some other fixups to
> changelogs and am finishing the whole series.

Thank you for the effort and sorry for the inconvenience.

Thanks,
Qu

> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 09/11] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
  2019-03-20  6:27 ` [PATCH v5.3 09/11] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
@ 2019-03-21 14:14   ` David Sterba
  0 siblings, 0 replies; 24+ messages in thread
From: David Sterba @ 2019-03-21 14:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Johannes Thumshirn

On Wed, Mar 20, 2019 at 02:27:47PM +0800, Qu Wenruo wrote:
> In extent_write_cache_pages() since flush_write_bio() can return error,
> we need some extra error handling.
> 
> For the first flush_write_bio() since we haven't locked the page, we
> only need to exit the loop.
> 
> For the seconds flush_write_bio() call, we have the page locked, despite
> that there is nothing special need to handle.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

I'm leaving this patch out for now.

> ---
>  fs/btrfs/extent_io.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 5de18900a3c3..c38058398d75 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4000,7 +4000,10 @@ static int extent_write_cache_pages(struct address_space *mapping,
>  			 */
>  			if (!trylock_page(page)) {
>  				ret = flush_write_bio(epd);
> -				BUG_ON(ret < 0);
> +				if (ret < 0) {
> +					done = 1;
> +					break;
> +				}
>  				lock_page(page);
>  			}
>  
> @@ -4012,7 +4015,11 @@ static int extent_write_cache_pages(struct address_space *mapping,
>  			if (wbc->sync_mode != WB_SYNC_NONE) {
>  				if (PageWriteback(page)) {
>  					ret = flush_write_bio(epd);
> -					BUG_ON(ret < 0);
> +					if (ret < 0) {
> +						unlock_page(page);
> +						done = 1;

I'm not sure about the semantics of 'done' here, in the normal case it
clear, but error handling needs to be taken to context of the whole
writeback and the state.

Your patch looks correct though, so it's a matter of making sure we're
not missing something subtle.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker
  2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
                   ` (10 preceding siblings ...)
  2019-03-20  6:27 ` [PATCH v5.3 11/11] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
@ 2019-03-21 14:34 ` David Sterba
  2019-03-21 23:56   ` Qu Wenruo
  11 siblings, 1 reply; 24+ messages in thread
From: David Sterba @ 2019-03-21 14:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Mar 20, 2019 at 02:27:38PM +0800, Qu Wenruo wrote:
> Patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/write_time_tree_checker
> Which is based on v5.1-rc1 tag.
> 
> This patchset has the following 3 features:
> - Tree block validation output enhancement
>   * Output validation failure timing (write time or read time)
>   * Always output tree block level/key mismatch error message
>     This part is already submitted and reviewed.
> 
> - Write time tree block validation check
>   To catch memory corruption either from hardware or kernel.
>   Example output would be:
> 
>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>     BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected
>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>     BTRFS info (device dm-3): forced readonly
>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>     BTRFS info (device dm-3): delayed_refs has NO entry
> 
> - Better error handling before calling flush_write_bio()
>   One hidden reason of calling flush_write_bio() under all cases is,
>   flush_write_bio() will trigger endio function and endio function of
>   epd->bio will free the bio under all cases.
>   So we're in fact abusing flush_write_bio() as cleanup.
> 
>   Since now flush_write_bio() has its own return value, we shouldn't call
>   flush_write_bio() no-brain, here we introduce proper cleanup helper,
>   end_write_bio(). Now we call flush_write_bio() like:
>               New                 |           Old
>   --------------------------------------------------------------
>   ret = do_some_evil(&epd);       | ret = do_some_evil(&epd);
>   if (ret < 0) {                  | flush_write_bio(&epd);
>   	end_write_bio(&epd, ret); | ^^^ submitting half-backed epd->bio?
>   	return ret;               | return ret;
>   }                               |
>   ret = flush_write_bio(&epd);    |
>   return ret;                     |
> 
>   Above code should be more streamline for the error handling part.
> 
> Changelog:
> v2:
> - Unlock locked pages in lock_extent_buffer_for_io() for error handling.
> - Added Reviewed-by tags.
> 
> v3:
> - Remove duplicated error message.
> - Use IS_ENABLED() macro to replace #ifdef.
> - Added Reviewed-by tags.
> 
> v4:
> - Re-organized patch split
>   Now each BUG_ON() cleanup has its own patch
> - Dig much further into the call sites to eliminate unexpected >0 return
>   May be a little paranoid and abuse some ASSERT(), but it should be
>   much safer against further code change.
> - Fix the false alert caused by balance and memory pressure
>   The fix it skip owner checker for non-essential tree at write time.
>   Since owner root can't always be reliable, either due to commit root
>   created in current transaction or balance + memory pressure.
> 
> v5:
> - Do proper error-out handling other than relying on flush_write_bio()
>   to clean up.
>   This has a side effect that no Reviewed-by tags for modified patches.
> - New comment for why we don't need to do anything about ebp->bio when
>   submit_one_bio() fails.
> - Add some Reviewed-by tag.
> 
> v5.1:
> - Add "block=%llu " output for write/read time error line.
> - Also output read time error message for fsid/start/level check.
> 
> v5.2:
> - Fix a missing page_unlock() in error hanlding
> 
> v5.3:
> - Rebase to v5.1-rc1 tag
> 
> Qu Wenruo (11):
>   btrfs: Always output error message when key/level verification fails
>   btrfs: disk-io: Show the timing of corrupted tree block explicitly
>   btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
>   btrfs: extent_io: Handle error better in extent_write_full_page()
>   btrfs: extent_io: Handle error better in btree_write_cache_pages()
>   btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
>   btrfs: extent_io: Handle error better in extent_write_locked_range()
>   btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
>   btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
>   btrfs: extent_io: Handle error better in extent_writepages()
>   btrfs: Do mandatory tree block check before submitting bio

All except 9 and 11 are going to misc-next after a final test. The two
patches are only postponed, one for review and the other one due to
conflict with another patch in misc-next.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker
  2019-03-21 14:34 ` [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker David Sterba
@ 2019-03-21 23:56   ` Qu Wenruo
  2019-03-22 17:52     ` David Sterba
  0 siblings, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2019-03-21 23:56 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 4970 bytes --]



On 2019/3/21 下午10:34, David Sterba wrote:
> On Wed, Mar 20, 2019 at 02:27:38PM +0800, Qu Wenruo wrote:
>> Patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/write_time_tree_checker
>> Which is based on v5.1-rc1 tag.
>>
>> This patchset has the following 3 features:
>> - Tree block validation output enhancement
>>   * Output validation failure timing (write time or read time)
>>   * Always output tree block level/key mismatch error message
>>     This part is already submitted and reviewed.
>>
>> - Write time tree block validation check
>>   To catch memory corruption either from hardware or kernel.
>>   Example output would be:
>>
>>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>>     BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected
>>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>>     BTRFS info (device dm-3): forced readonly
>>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>>     BTRFS info (device dm-3): delayed_refs has NO entry
>>
>> - Better error handling before calling flush_write_bio()
>>   One hidden reason of calling flush_write_bio() under all cases is,
>>   flush_write_bio() will trigger endio function and endio function of
>>   epd->bio will free the bio under all cases.
>>   So we're in fact abusing flush_write_bio() as cleanup.
>>
>>   Since now flush_write_bio() has its own return value, we shouldn't call
>>   flush_write_bio() no-brain, here we introduce proper cleanup helper,
>>   end_write_bio(). Now we call flush_write_bio() like:
>>               New                 |           Old
>>   --------------------------------------------------------------
>>   ret = do_some_evil(&epd);       | ret = do_some_evil(&epd);
>>   if (ret < 0) {                  | flush_write_bio(&epd);
>>   	end_write_bio(&epd, ret); | ^^^ submitting half-backed epd->bio?
>>   	return ret;               | return ret;
>>   }                               |
>>   ret = flush_write_bio(&epd);    |
>>   return ret;                     |
>>
>>   Above code should be more streamline for the error handling part.
>>
>> Changelog:
>> v2:
>> - Unlock locked pages in lock_extent_buffer_for_io() for error handling.
>> - Added Reviewed-by tags.
>>
>> v3:
>> - Remove duplicated error message.
>> - Use IS_ENABLED() macro to replace #ifdef.
>> - Added Reviewed-by tags.
>>
>> v4:
>> - Re-organized patch split
>>   Now each BUG_ON() cleanup has its own patch
>> - Dig much further into the call sites to eliminate unexpected >0 return
>>   May be a little paranoid and abuse some ASSERT(), but it should be
>>   much safer against further code change.
>> - Fix the false alert caused by balance and memory pressure
>>   The fix it skip owner checker for non-essential tree at write time.
>>   Since owner root can't always be reliable, either due to commit root
>>   created in current transaction or balance + memory pressure.
>>
>> v5:
>> - Do proper error-out handling other than relying on flush_write_bio()
>>   to clean up.
>>   This has a side effect that no Reviewed-by tags for modified patches.
>> - New comment for why we don't need to do anything about ebp->bio when
>>   submit_one_bio() fails.
>> - Add some Reviewed-by tag.
>>
>> v5.1:
>> - Add "block=%llu " output for write/read time error line.
>> - Also output read time error message for fsid/start/level check.
>>
>> v5.2:
>> - Fix a missing page_unlock() in error hanlding
>>
>> v5.3:
>> - Rebase to v5.1-rc1 tag
>>
>> Qu Wenruo (11):
>>   btrfs: Always output error message when key/level verification fails
>>   btrfs: disk-io: Show the timing of corrupted tree block explicitly
>>   btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
>>   btrfs: extent_io: Handle error better in extent_write_full_page()
>>   btrfs: extent_io: Handle error better in btree_write_cache_pages()
>>   btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
>>   btrfs: extent_io: Handle error better in extent_write_locked_range()
>>   btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
>>   btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
>>   btrfs: extent_io: Handle error better in extent_writepages()
>>   btrfs: Do mandatory tree block check before submitting bio
> 
> All except 9 and 11 are going to misc-next after a final test. The two
> patches are only postponed, one for review and the other one due to
> conflict with another patch in misc-next.

Do I need to resend the last patch to solve the conflict by myself?

Thanks,
Qu

> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker
  2019-03-21 23:56   ` Qu Wenruo
@ 2019-03-22 17:52     ` David Sterba
  2019-03-25  4:32       ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: David Sterba @ 2019-03-22 17:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Mar 22, 2019 at 07:56:37AM +0800, Qu Wenruo wrote:
> >>   btrfs: Do mandatory tree block check before submitting bio
> > 
> > All except 9 and 11 are going to misc-next after a final test. The two
> > patches are only postponed, one for review and the other one due to
> > conflict with another patch in misc-next.
> 
> Do I need to resend the last patch to solve the conflict by myself?

Please check that I've resoved the conflict correctly, the updated patch
is not top commit in misc-next. Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker
  2019-03-22 17:52     ` David Sterba
@ 2019-03-25  4:32       ` Qu Wenruo
  0 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-25  4:32 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 668 bytes --]



On 2019/3/23 上午1:52, David Sterba wrote:
> On Fri, Mar 22, 2019 at 07:56:37AM +0800, Qu Wenruo wrote:
>>>>   btrfs: Do mandatory tree block check before submitting bio
>>>
>>> All except 9 and 11 are going to misc-next after a final test. The two
>>> patches are only postponed, one for review and the other one due to
>>> conflict with another patch in misc-next.
>>
>> Do I need to resend the last patch to solve the conflict by myself?
> 
> Please check that I've resoved the conflict correctly, the updated patch
> is not top commit in misc-next. Thanks.

Thank you for the work.

I find nothing wrong, so it should be OK.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 02/11] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-03-20  6:27 ` [PATCH v5.3 02/11] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
@ 2019-03-29 14:11   ` David Sterba
  2019-03-29 14:18     ` Qu Wenruo
  0 siblings, 1 reply; 24+ messages in thread
From: David Sterba @ 2019-03-29 14:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov, Johannes Thumshirn

On Wed, Mar 20, 2019 at 02:27:40PM +0800, Qu Wenruo wrote:
> Just add one extra line to show when the corruption is detected.
> Currently only read time detection is possible.
> 
> The planned distinguish line would be:
>   read time:
>     <detail report>
>     block=XXXXX read time tree block corruption detected
> 
>   write time:
>     <detail report>
>     block=XXXXX write time tree block corruption detected
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  fs/btrfs/disk-io.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 6a7a1e168169..e8d4e238f832 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -659,6 +659,10 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>  
>  	if (!ret)
>  		set_extent_buffer_uptodate(eb);
> +	else
> +		btrfs_err(fs_info,
> +			  "block=%llu read time tree block corruption detected",

This showed up in test btrfs/124, the logs below do not contain the
selftests outpt, that runs twice, only the last line is kept for
reference. That's the only message from the read/write time checks, the branch
is misc-next.

[ 5432.522317] BTRFS: selftest: running extent_map tests
[ 5432.573684] BTRFS: device fsid 519acff4-bdb4-4459-87b6-8f64ab1360cf devid 1 transid 7 /dev/vdb
[ 5432.578232] BTRFS info (device vdb): allowing degraded mounts
[ 5432.579703] BTRFS info (device vdb): disk space caching is enabled
[ 5432.581348] BTRFS info (device vdb): has skinny extents
[ 5432.583483] BTRFS warning (device vdb): devid 2 uuid dc83669e-dff7-4dee-bcc6-dbef1f470896 is missing
[ 5432.586115] BTRFS warning (device vdb): devid 2 uuid dc83669e-dff7-4dee-bcc6-dbef1f470896 is missing
[ 5513.939435] BTRFS: device fsid af58e3f0-8011-4d2a-b2bd-fc956d458a75 devid 1 transid 238 /dev/vda
[ 5513.992006] BTRFS info (device vdb): disk space caching is enabled
[ 5513.994398] BTRFS info (device vdb): has skinny extents
[ 5514.022526] BTRFS critical (device vdb): corrupt leaf: root=7 block=30474240 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf
[ 5514.028271] BTRFS error (device vdb): block=30474240 read time tree block corruption detected
[ 5514.031529] BTRFS info (device vdb): read error corrected: ino 0 off 30474240 (dev /dev/vdc sector 18560)
[ 5514.034133] BTRFS info (device vdb): read error corrected: ino 0 off 30478336 (dev /dev/vdc sector 18568)
[ 5514.036662] BTRFS info (device vdb): read error corrected: ino 0 off 30482432 (dev /dev/vdc sector 18576)
[ 5514.039784] BTRFS info (device vdb): read error corrected: ino 0 off 30486528 (dev /dev/vdc sector 18584)
[ 5514.042383] BTRFS error (device vdb): bad tree block start, want 32210944 have 651061555542690057
[ 5514.044666] BTRFS info (device vdb): read error corrected: ino 0 off 32210944 (dev /dev/vdc sector 21952)
[ 5514.047146] BTRFS info (device vdb): read error corrected: ino 0 off 32215040 (dev /dev/vdc sector 21960)
[ 5514.049608] BTRFS info (device vdb): read error corrected: ino 0 off 32219136 (dev /dev/vdc sector 21968)
[ 5514.052034] BTRFS info (device vdb): read error corrected: ino 0 off 32223232 (dev /dev/vdc sector 21976)
[ 5514.056223] BTRFS error (device vdb): bad tree block start, want 32276480 have 0
[ 5514.058632] BTRFS info (device vdb): read error corrected: ino 0 off 32276480 (dev /dev/vdc sector 22080)
[ 5514.060448] BTRFS info (device vdb): read error corrected: ino 0 off 32280576 (dev /dev/vdc sector 22088)
[ 5514.064681] BTRFS info (device vdb): balance: start -d -m -s
[ 5514.066309] BTRFS info (device vdb): relocating block group 2479882240 flags data
[ 5514.068870] BTRFS error (device vdb): bad tree block start, want 31539200 have 723401728380766730
[ 5514.087183] BTRFS error (device vdb): bad tree block start, want 31457280 have 578721382704613384
[ 5514.087208] BTRFS error (device vdb): bad tree block start, want 31555584 have 723401728380766730
[ 5514.087244] BTRFS error (device vdb): bad tree block start, want 31522816 have 723401728380766730
[ 5514.107976] BTRFS error (device vdb): bad tree block start, want 31555584 have 723401728380766730
[ 5514.131715] BTRFS error (device vdb): bad tree block start, want 31440896 have 506381209866536711
[ 5514.131739] BTRFS error (device vdb): bad tree block start, want 31457280 have 578721382704613384
[ 5514.155830] BTRFS error (device vdb): bad tree block start, want 31457280 have 578721382704613384
[ 5527.886754] repair_io_failure: 54 callbacks suppressed
[ 5527.886761] BTRFS info (device vdb): read error corrected: ino 0 off 32145408 (dev /dev/vdc sector 21824)
[ 5527.895976] BTRFS info (device vdb): read error corrected: ino 0 off 32149504 (dev /dev/vdc sector 21832)
[ 5527.899446] BTRFS info (device vdb): read error corrected: ino 0 off 32153600 (dev /dev/vdc sector 21840)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 02/11] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-03-29 14:11   ` David Sterba
@ 2019-03-29 14:18     ` Qu Wenruo
  0 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-03-29 14:18 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Nikolay Borisov, Johannes Thumshirn



On 2019/3/29 下午10:11, David Sterba wrote:
> On Wed, Mar 20, 2019 at 02:27:40PM +0800, Qu Wenruo wrote:
>> Just add one extra line to show when the corruption is detected.
>> Currently only read time detection is possible.
>>
>> The planned distinguish line would be:
>>   read time:
>>     <detail report>
>>     block=XXXXX read time tree block corruption detected
>>
>>   write time:
>>     <detail report>
>>     block=XXXXX write time tree block corruption detected
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>> ---
>>  fs/btrfs/disk-io.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 6a7a1e168169..e8d4e238f832 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -659,6 +659,10 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>>
>>  	if (!ret)
>>  		set_extent_buffer_uptodate(eb);
>> +	else
>> +		btrfs_err(fs_info,
>> +			  "block=%llu read time tree block corruption detected",
>
> This showed up in test btrfs/124, the logs below do not contain the
> selftests outpt, that runs twice, only the last line is kept for
> reference. That's the only message from the read/write time checks, the branch
> is misc-next.
>
> [ 5432.522317] BTRFS: selftest: running extent_map tests
> [ 5432.573684] BTRFS: device fsid 519acff4-bdb4-4459-87b6-8f64ab1360cf devid 1 transid 7 /dev/vdb
> [ 5432.578232] BTRFS info (device vdb): allowing degraded mounts
> [ 5432.579703] BTRFS info (device vdb): disk space caching is enabled
> [ 5432.581348] BTRFS info (device vdb): has skinny extents
> [ 5432.583483] BTRFS warning (device vdb): devid 2 uuid dc83669e-dff7-4dee-bcc6-dbef1f470896 is missing
> [ 5432.586115] BTRFS warning (device vdb): devid 2 uuid dc83669e-dff7-4dee-bcc6-dbef1f470896 is missing
> [ 5513.939435] BTRFS: device fsid af58e3f0-8011-4d2a-b2bd-fc956d458a75 devid 1 transid 238 /dev/vda
> [ 5513.992006] BTRFS info (device vdb): disk space caching is enabled
> [ 5513.994398] BTRFS info (device vdb): has skinny extents
> [ 5514.022526] BTRFS critical (device vdb): corrupt leaf: root=7 block=30474240 slot=0, invalid nritems, have 0 should not be 0 for non-root leaf

This is csum tree, which can be empty, but the old code I kept to do the
comprehensive owner check is not doing correct thing.

Considering it's doing replace, it maybe involved in relocation code.

Anyway, I'll investigate it and possibly remove the comprehensive owner
check, just using the owner checker for empty leaves.

Thanks,
Qu

> [ 5514.028271] BTRFS error (device vdb): block=30474240 read time tree block corruption detected
> [ 5514.031529] BTRFS info (device vdb): read error corrected: ino 0 off 30474240 (dev /dev/vdc sector 18560)
> [ 5514.034133] BTRFS info (device vdb): read error corrected: ino 0 off 30478336 (dev /dev/vdc sector 18568)
> [ 5514.036662] BTRFS info (device vdb): read error corrected: ino 0 off 30482432 (dev /dev/vdc sector 18576)
> [ 5514.039784] BTRFS info (device vdb): read error corrected: ino 0 off 30486528 (dev /dev/vdc sector 18584)
> [ 5514.042383] BTRFS error (device vdb): bad tree block start, want 32210944 have 651061555542690057
> [ 5514.044666] BTRFS info (device vdb): read error corrected: ino 0 off 32210944 (dev /dev/vdc sector 21952)
> [ 5514.047146] BTRFS info (device vdb): read error corrected: ino 0 off 32215040 (dev /dev/vdc sector 21960)
> [ 5514.049608] BTRFS info (device vdb): read error corrected: ino 0 off 32219136 (dev /dev/vdc sector 21968)
> [ 5514.052034] BTRFS info (device vdb): read error corrected: ino 0 off 32223232 (dev /dev/vdc sector 21976)
> [ 5514.056223] BTRFS error (device vdb): bad tree block start, want 32276480 have 0
> [ 5514.058632] BTRFS info (device vdb): read error corrected: ino 0 off 32276480 (dev /dev/vdc sector 22080)
> [ 5514.060448] BTRFS info (device vdb): read error corrected: ino 0 off 32280576 (dev /dev/vdc sector 22088)
> [ 5514.064681] BTRFS info (device vdb): balance: start -d -m -s
> [ 5514.066309] BTRFS info (device vdb): relocating block group 2479882240 flags data
> [ 5514.068870] BTRFS error (device vdb): bad tree block start, want 31539200 have 723401728380766730
> [ 5514.087183] BTRFS error (device vdb): bad tree block start, want 31457280 have 578721382704613384
> [ 5514.087208] BTRFS error (device vdb): bad tree block start, want 31555584 have 723401728380766730
> [ 5514.087244] BTRFS error (device vdb): bad tree block start, want 31522816 have 723401728380766730
> [ 5514.107976] BTRFS error (device vdb): bad tree block start, want 31555584 have 723401728380766730
> [ 5514.131715] BTRFS error (device vdb): bad tree block start, want 31440896 have 506381209866536711
> [ 5514.131739] BTRFS error (device vdb): bad tree block start, want 31457280 have 578721382704613384
> [ 5514.155830] BTRFS error (device vdb): bad tree block start, want 31457280 have 578721382704613384
> [ 5527.886754] repair_io_failure: 54 callbacks suppressed
> [ 5527.886761] BTRFS info (device vdb): read error corrected: ino 0 off 32145408 (dev /dev/vdc sector 21824)
> [ 5527.895976] BTRFS info (device vdb): read error corrected: ino 0 off 32149504 (dev /dev/vdc sector 21832)
> [ 5527.899446] BTRFS info (device vdb): read error corrected: ino 0 off 32153600 (dev /dev/vdc sector 21840)
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5.3 11/11] btrfs: Do mandatory tree block check before submitting bio
  2019-03-20  6:27 ` [PATCH v5.3 11/11] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
@ 2019-04-03  9:07   ` Qu Wenruo
  0 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-04-03  9:07 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Leonard Lausen, David Sterba


[-- Attachment #1.1: Type: text/plain, Size: 490 bytes --]



On 2019/3/20 下午2:27, Qu Wenruo wrote:
<snip>
> +int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
> +			   struct extent_buffer *leaf)
> +{
> +	return check_leaf(fs_info, leaf, false, false);
>  }

Well, I got confused by those two bool paramters.

The first bool is whether to check leaf data.
!!OF COURSE WE NEED TO CHECK LEAF DATA!!

So the 3rd parameter should be "true".

David, would you please fold this change into misc-next branch?

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-04-03  9:08 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 01/11] btrfs: Always output error message when key/level verification fails Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 02/11] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
2019-03-29 14:11   ` David Sterba
2019-03-29 14:18     ` Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 03/11] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
2019-03-20 19:39   ` David Sterba
2019-03-20  6:27 ` [PATCH v5.3 04/11] btrfs: extent_io: Handle error better in extent_write_full_page() Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 05/11] btrfs: extent_io: Handle error better in btree_write_cache_pages() Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 06/11] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 07/11] btrfs: extent_io: Handle error better in extent_write_locked_range() Qu Wenruo
2019-03-21 13:19   ` David Sterba
2019-03-21 13:45     ` Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 08/11] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
2019-03-21 13:30   ` David Sterba
2019-03-20  6:27 ` [PATCH v5.3 09/11] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
2019-03-21 14:14   ` David Sterba
2019-03-20  6:27 ` [PATCH v5.3 10/11] btrfs: extent_io: Handle error better in extent_writepages() Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 11/11] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
2019-04-03  9:07   ` Qu Wenruo
2019-03-21 14:34 ` [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker David Sterba
2019-03-21 23:56   ` Qu Wenruo
2019-03-22 17:52     ` David Sterba
2019-03-25  4:32       ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.