linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/12] btrfs: Enhancement to tree block validation
@ 2019-01-25  5:09 Qu Wenruo
  2019-01-25  5:09 ` [PATCH v4 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

Patchset can be fetched from github:
https://github.com/adam900710/linux/tree/write_time_tree_checker
Which is based on v5.0-rc1 tag.

This patchset has the following two features:
- Tree block validation output enhancement
  * Output validation failure timing (write time or read time)
  * Always output tree block level/key mismatch error message
    This part is already submitted and reviewed.

- Write time tree block validation check
  To catch memory corruption either from hardware or kernel.
  Example output would be:

    BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
    BTRFS error (device dm-3): write time tree block corruption detected
    BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
    BTRFS info (device dm-3): forced readonly
    BTRFS warning (device dm-3): Skipping commit of aborted transaction.
    BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
    BTRFS info (device dm-3): delayed_refs has NO entry

Changelog:
v2:
- Unlock locked pages in lock_extent_buffer_for_io() for error handling.
- Added Reviewed-by tags.

v3:
- Remove duplicated error message.
- Use IS_ENABLED() macro to replace #ifdef.
- Added Reviewed-by tags.

v4:
- Re-organized patch split
  Now each BUG_ON() cleanup has its own patch
- Dig much further into the call sites to eliminate unexpected >0 return
  May be a little paranoid and abuse some ASSERT(), but it should be
  much safer against further code change.
- Fix the false alert caused by balance and memory pressure
  The fix it skip owner checker for non-essential tree at write time.
  Since owner root can't always be reliable, either due to commit root
  created in current transaction or balance + memory pressure.

Qu Wenruo (12):
  btrfs: Always output error message when key/level verification fails
  btrfs: extent_io: Kill the forward declaration of flush_write_bio()
  btrfs: disk-io: Show the timing of corrupted tree block explicitly
  btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  btrfs: extent_io: Kill the BUG_ON() in extent_write_full_page()
  btrfs: extent_io: Kill the BUG_ON() in btree_write_cache_pages()
  btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
  btrfs: extent_io: Kill the BUG_ON() in extent_write_locked_range()
  btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
  btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
  btrfs: extent_io: Kill the BUG_ON() in extent_writepages()
  btrfs: Do mandatory tree block check before submitting bio

 fs/btrfs/disk-io.c      |  21 ++++--
 fs/btrfs/extent_io.c    | 154 ++++++++++++++++++++++++++--------------
 fs/btrfs/tree-checker.c |  24 ++++++-
 fs/btrfs/tree-checker.h |   8 +++
 4 files changed, 144 insertions(+), 63 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v4 01/12] btrfs: Always output error message when key/level verification fails
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  9:02   ` Johannes Thumshirn
  2019-01-25  5:09 ` [PATCH v4 02/12] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

We have internal report of strange transaction abort due to EUCLEAN
without any error message.

Since error message inside verify_level_key() is only enabled for
CONFIG_BTRFS_DEBUG, the error message won't output for most distro.

This patch will make the error message mandatory, so when problem
happens we know what's causing the problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8da2f380d3c0..794d5bb7fe33 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -423,12 +423,11 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 
 	found_level = btrfs_header_level(eb);
 	if (found_level != level) {
-#ifdef CONFIG_BTRFS_DEBUG
-		WARN_ON(1);
+		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+		     KERN_ERR "BTRFS: tree level check failed\n");
 		btrfs_err(fs_info,
 "tree level mismatch detected, bytenr=%llu level expected=%u has=%u",
 			  eb->start, level, found_level);
-#endif
 		return -EIO;
 	}
 
@@ -449,9 +448,9 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 		btrfs_item_key_to_cpu(eb, &found_key, 0);
 	ret = btrfs_comp_cpu_keys(first_key, &found_key);
 
-#ifdef CONFIG_BTRFS_DEBUG
 	if (ret) {
-		WARN_ON(1);
+		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+		     KERN_ERR "BTRFS: tree first key check failed\n");
 		btrfs_err(fs_info,
 "tree first key mismatch detected, bytenr=%llu parent_transid=%llu key expected=(%llu,%u,%llu) has=(%llu,%u,%llu)",
 			  eb->start, parent_transid, first_key->objectid,
@@ -459,7 +458,6 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 			  found_key.objectid, found_key.type,
 			  found_key.offset);
 	}
-#endif
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 02/12] btrfs: extent_io: Kill the forward declaration of flush_write_bio()
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
  2019-01-25  5:09 ` [PATCH v4 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  5:09 ` [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, Johannes Thumshirn

There is no need to forward declare flush_write_bio(), as it only
depends on submit_one_bio().

Both of them are pretty small, just move them to kill the forward
declaration.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/extent_io.c | 66 +++++++++++++++++++++-----------------------
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 52abe4082680..8a2335713a2d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -147,7 +147,38 @@ static int add_extent_changeset(struct extent_state *state, unsigned bits,
 	return ret;
 }
 
-static void flush_write_bio(struct extent_page_data *epd);
+static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
+				       unsigned long bio_flags)
+{
+	blk_status_t ret = 0;
+	struct bio_vec *bvec = bio_last_bvec_all(bio);
+	struct page *page = bvec->bv_page;
+	struct extent_io_tree *tree = bio->bi_private;
+	u64 start;
+
+	start = page_offset(page) + bvec->bv_offset;
+
+	bio->bi_private = NULL;
+
+	if (tree->ops)
+		ret = tree->ops->submit_bio_hook(tree->private_data, bio,
+					   mirror_num, bio_flags, start);
+	else
+		btrfsic_submit_bio(bio);
+
+	return blk_status_to_errno(ret);
+}
+
+static void flush_write_bio(struct extent_page_data *epd)
+{
+	if (epd->bio) {
+		int ret;
+
+		ret = submit_one_bio(epd->bio, 0, 0);
+		BUG_ON(ret < 0); /* -ENOMEM */
+		epd->bio = NULL;
+	}
+}
 
 int __init extent_io_init(void)
 {
@@ -2692,28 +2723,6 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
 	return bio;
 }
 
-static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
-				       unsigned long bio_flags)
-{
-	blk_status_t ret = 0;
-	struct bio_vec *bvec = bio_last_bvec_all(bio);
-	struct page *page = bvec->bv_page;
-	struct extent_io_tree *tree = bio->bi_private;
-	u64 start;
-
-	start = page_offset(page) + bvec->bv_offset;
-
-	bio->bi_private = NULL;
-
-	if (tree->ops)
-		ret = tree->ops->submit_bio_hook(tree->private_data, bio,
-					   mirror_num, bio_flags, start);
-	else
-		btrfsic_submit_bio(bio);
-
-	return blk_status_to_errno(ret);
-}
-
 /*
  * @opf:	bio REQ_OP_* and REQ_* flags as one value
  * @tree:	tree so we can call our merge_bio hook
@@ -4007,17 +4016,6 @@ static int extent_write_cache_pages(struct address_space *mapping,
 	return ret;
 }
 
-static void flush_write_bio(struct extent_page_data *epd)
-{
-	if (epd->bio) {
-		int ret;
-
-		ret = submit_one_bio(epd->bio, 0, 0);
-		BUG_ON(ret < 0); /* -ENOMEM */
-		epd->bio = NULL;
-	}
-}
-
 int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 {
 	int ret;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
  2019-01-25  5:09 ` [PATCH v4 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
  2019-01-25  5:09 ` [PATCH v4 02/12] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  9:03   ` Johannes Thumshirn
  2019-01-30 14:57   ` David Sterba
  2019-01-25  5:09 ` [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Just add one extra line to show when the corruption is detected.
Currently only read time detection is possible.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 794d5bb7fe33..426e9f450f70 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -658,6 +658,8 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 
 	if (!ret)
 		set_extent_buffer_uptodate(eb);
+	else
+		btrfs_err(fs_info, "read time tree block corrupted detected");
 err:
 	if (reads_done &&
 	    test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  9:04   ` Johannes Thumshirn
                     ` (2 more replies)
  2019-01-25  5:09 ` [PATCH v4 05/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_full_page() Qu Wenruo
                   ` (7 subsequent siblings)
  11 siblings, 3 replies; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

We have a BUG_ON() in flush_write_bio() to handle the return value of
submit_one_bio().

Move the BUG_ON() one level up to all its callers.

No functional change, just to make later BUG_ON() cleanup more obvious.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 48 +++++++++++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8a2335713a2d..6f1982f8ad5c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -169,15 +169,21 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 	return blk_status_to_errno(ret);
 }
 
-static void flush_write_bio(struct extent_page_data *epd)
+/*
+ * A wrapper for submit_one_bio().
+ *
+ * Return 0 if everything is OK.
+ * Return <0 for error.
+ */
+static int __must_check flush_write_bio(struct extent_page_data *epd)
 {
-	if (epd->bio) {
-		int ret;
+	int ret = 0;
 
+	if (epd->bio) {
 		ret = submit_one_bio(epd->bio, 0, 0);
-		BUG_ON(ret < 0); /* -ENOMEM */
 		epd->bio = NULL;
 	}
+	return ret;
 }
 
 int __init extent_io_init(void)
@@ -3510,7 +3516,8 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 
 	if (!btrfs_try_tree_write_lock(eb)) {
 		flush = 1;
-		flush_write_bio(epd);
+		ret = flush_write_bio(epd);
+		BUG_ON(ret < 0);
 		btrfs_tree_lock(eb);
 	}
 
@@ -3519,7 +3526,8 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 		if (!epd->sync_io)
 			return 0;
 		if (!flush) {
-			flush_write_bio(epd);
+			ret = flush_write_bio(epd);
+			BUG_ON(ret < 0);
 			flush = 1;
 		}
 		while (1) {
@@ -3560,7 +3568,8 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 
 		if (!trylock_page(p)) {
 			if (!flush) {
-				flush_write_bio(epd);
+				ret = flush_write_bio(epd);
+				BUG_ON(ret < 0);
 				flush = 1;
 			}
 			lock_page(p);
@@ -3751,6 +3760,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 		.sync_io = wbc->sync_mode == WB_SYNC_ALL,
 	};
 	int ret = 0;
+	int flush_ret;
 	int done = 0;
 	int nr_to_write_done = 0;
 	struct pagevec pvec;
@@ -3850,7 +3860,8 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	flush_write_bio(&epd);
+	flush_ret = flush_write_bio(&epd);
+	BUG_ON(flush_ret < 0);
 	return ret;
 }
 
@@ -3947,7 +3958,8 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			 * tmpfs file mapping
 			 */
 			if (!trylock_page(page)) {
-				flush_write_bio(epd);
+				ret = flush_write_bio(epd);
+				BUG_ON(ret < 0);
 				lock_page(page);
 			}
 
@@ -3957,8 +3969,10 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			}
 
 			if (wbc->sync_mode != WB_SYNC_NONE) {
-				if (PageWriteback(page))
-					flush_write_bio(epd);
+				if (PageWriteback(page)) {
+					ret = flush_write_bio(epd);
+					BUG_ON(ret < 0);
+				}
 				wait_on_page_writeback(page);
 			}
 
@@ -4019,6 +4033,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 {
 	int ret;
+	int flush_ret;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = &BTRFS_I(page->mapping->host)->io_tree,
@@ -4028,7 +4043,8 @@ int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 
 	ret = __extent_writepage(page, wbc, &epd);
 
-	flush_write_bio(&epd);
+	flush_ret = flush_write_bio(&epd);
+	BUG_ON(flush_ret < 0);
 	return ret;
 }
 
@@ -4036,6 +4052,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 			      int mode)
 {
 	int ret = 0;
+	int flush_ret;
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct page *page;
@@ -4068,7 +4085,8 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 		start += PAGE_SIZE;
 	}
 
-	flush_write_bio(&epd);
+	flush_ret = flush_write_bio(&epd);
+	BUG_ON(flush_ret < 0);
 	return ret;
 }
 
@@ -4076,6 +4094,7 @@ int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc)
 {
 	int ret = 0;
+	int flush_ret;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.tree = &BTRFS_I(mapping->host)->io_tree,
@@ -4084,7 +4103,8 @@ int extent_writepages(struct address_space *mapping,
 	};
 
 	ret = extent_write_cache_pages(mapping, wbc, &epd);
-	flush_write_bio(&epd);
+	flush_ret = flush_write_bio(&epd);
+	BUG_ON(flush_ret < 0);
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 05/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_full_page()
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (3 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  9:05   ` Johannes Thumshirn
  2019-01-25  5:09 ` [PATCH v4 06/12] btrfs: extent_io: Kill the BUG_ON() in btree_write_cache_pages() Qu Wenruo
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

This one is pretty straightforward, __extent_writepage() can only return
<0 or 0.

So if we hit error from __extent_writepage(), then return the error.
Or return the value from flush_write_bio().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6f1982f8ad5c..bc3426dff5a3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3424,6 +3424,9 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
  * records are inserted to lock ranges in the tree, and as dirty areas
  * are found, they are marked writeback.  Then the lock bits are removed
  * and the end_io handler clears the writeback ranges
+ *
+ * Return 0 if everything goes well.
+ * Return <0 for error.
  */
 static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 			      struct extent_page_data *epd)
@@ -3493,6 +3496,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		end_extent_writepage(page, ret, start, page_end);
 	}
 	unlock_page(page);
+	ASSERT(ret <= 0);
 	return ret;
 
 done_unlocked:
@@ -4044,8 +4048,10 @@ int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 	ret = __extent_writepage(page, wbc, &epd);
 
 	flush_ret = flush_write_bio(&epd);
-	BUG_ON(flush_ret < 0);
-	return ret;
+	ASSERT(ret <= 0);
+	if (ret)
+		return ret;
+	return flush_ret;
 }
 
 int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 06/12] btrfs: extent_io: Kill the BUG_ON() in btree_write_cache_pages()
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (4 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 05/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_full_page() Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  9:11   ` Johannes Thumshirn
  2019-01-25  5:09 ` [PATCH v4 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

The @ret from the original function can only be 0 or <0.
<0 case is from write_one_eb() which can only return 0 or <0.

So just return @ret if something went wrong. For anything else, return
@flush_ret.

Also add a ASSERT() to catch unexpected >0 return value from
write_one_eb().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bc3426dff5a3..5573c2355e05 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3865,8 +3865,10 @@ int btree_write_cache_pages(struct address_space *mapping,
 		goto retry;
 	}
 	flush_ret = flush_write_bio(&epd);
-	BUG_ON(flush_ret < 0);
-	return ret;
+	ASSERT(ret <= 0);
+	if (ret)
+		return ret;
+	return flush_ret;
 }
 
 /**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (5 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 06/12] btrfs: extent_io: Kill the BUG_ON() in btree_write_cache_pages() Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  9:11   ` Johannes Thumshirn
  2019-01-25  5:09 ` [PATCH v4 08/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_locked_range() Qu Wenruo
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

Since __extent_writepage() will no longer return >0 value,
(ret == AOP_WRITEPAGE_ACTIVATE) will never be true.

Kill that dead branch.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5573c2355e05..866972ecf70d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3989,11 +3989,6 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			}
 
 			ret = __extent_writepage(page, wbc, epd);
-
-			if (unlikely(ret == AOP_WRITEPAGE_ACTIVATE)) {
-				unlock_page(page);
-				ret = 0;
-			}
 			if (ret < 0) {
 				/*
 				 * done_index is set past this page,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 08/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_locked_range()
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (6 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  5:09 ` [PATCH v4 09/12] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 866972ecf70d..fba5d4a0b194 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4089,8 +4089,10 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 	}
 
 	flush_ret = flush_write_bio(&epd);
-	BUG_ON(flush_ret < 0);
-	return ret;
+	ASSERT(ret <= 0);
+	if (ret)
+		return ret;
+	return flush_ret;
 }
 
 int extent_writepages(struct address_space *mapping,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 09/12] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (7 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 08/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_locked_range() Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  5:09 ` [PATCH v4 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

This function needs some extra check on locked pages and eb.

For error handling we need to unlock locked pages and the eb.

Also add comment for possible return values of lock_extent_buffer_for_io().

There is a rare >0 return value branch, where all pages get locked
while write bio is not flushed.

Thankfully it's handled by the only caller, btree_write_cache_pages(),
as later write_one_eb() call will trigger submit_one_bio().
So there shouldn't be any problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fba5d4a0b194..52c1b0ce6bfd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3509,19 +3509,25 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 		       TASK_UNINTERRUPTIBLE);
 }
 
+/*
+ * Return 0 if nothing went wrong, its pages get locked and submitted.
+ * Return >0 is mostly the same as 0, except bio is not submitted.
+ * Return <0 if something went wrong, no page get locked.
+ */
 static noinline_for_stack int
 lock_extent_buffer_for_io(struct extent_buffer *eb,
 			  struct btrfs_fs_info *fs_info,
 			  struct extent_page_data *epd)
 {
-	int i, num_pages;
+	int i, num_pages, failed_page_nr;
 	int flush = 0;
 	int ret = 0;
 
 	if (!btrfs_try_tree_write_lock(eb)) {
-		flush = 1;
 		ret = flush_write_bio(epd);
-		BUG_ON(ret < 0);
+		if (ret < 0)
+			return ret;
+		flush = 1;
 		btrfs_tree_lock(eb);
 	}
 
@@ -3531,7 +3537,8 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 			return 0;
 		if (!flush) {
 			ret = flush_write_bio(epd);
-			BUG_ON(ret < 0);
+			if (ret < 0)
+				return ret;
 			flush = 1;
 		}
 		while (1) {
@@ -3573,7 +3580,10 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 		if (!trylock_page(p)) {
 			if (!flush) {
 				ret = flush_write_bio(epd);
-				BUG_ON(ret < 0);
+				if (ret < 0) {
+					failed_page_nr = i;
+					goto err_unlock;
+				}
 				flush = 1;
 			}
 			lock_page(p);
@@ -3581,6 +3591,11 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 	}
 
 	return ret;
+err_unlock:
+	/* Unlock these already locked pages */
+	for (i = 0; i < failed_page_nr; i++)
+		unlock_page(eb->pages[i]);
+	return ret;
 }
 
 static void end_extent_buffer_writeback(struct extent_buffer *eb)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (8 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 09/12] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  9:16   ` Johannes Thumshirn
  2019-01-25  5:09 ` [PATCH v4 11/12] btrfs: extent_io: Kill the BUG_ON() in extent_writepages() Qu Wenruo
  2019-01-25  5:09 ` [PATCH v4 12/12] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
  11 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 52c1b0ce6bfd..0d1f09b41421 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3980,7 +3980,10 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			 */
 			if (!trylock_page(page)) {
 				ret = flush_write_bio(epd);
-				BUG_ON(ret < 0);
+				if (ret < 0) {
+					done = 1;
+					break;
+				}
 				lock_page(page);
 			}
 
@@ -3992,7 +3995,10 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			if (wbc->sync_mode != WB_SYNC_NONE) {
 				if (PageWriteback(page)) {
 					ret = flush_write_bio(epd);
-					BUG_ON(ret < 0);
+					if (ret < 0) {
+						done = 1;
+						break;
+					}
 				}
 				wait_on_page_writeback(page);
 			}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 11/12] btrfs: extent_io: Kill the BUG_ON() in extent_writepages()
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (9 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  2019-01-25  9:16   ` Johannes Thumshirn
  2019-01-25  5:09 ` [PATCH v4 12/12] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
  11 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs

The @ret from extent_write_cache_pages() can only be 0 or <0, so we only
need to return @ret if something went wrong, or just return @flush_ret.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0d1f09b41421..ec03f4727786 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4130,8 +4130,10 @@ int extent_writepages(struct address_space *mapping,
 
 	ret = extent_write_cache_pages(mapping, wbc, &epd);
 	flush_ret = flush_write_bio(&epd);
-	BUG_ON(flush_ret < 0);
-	return ret;
+	ASSERT(ret <= 0);
+	if (ret)
+		return ret;
+	return flush_ret;
 }
 
 int extent_readpages(struct address_space *mapping, struct list_head *pages,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 12/12] btrfs: Do mandatory tree block check before submitting bio
  2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
                   ` (10 preceding siblings ...)
  2019-01-25  5:09 ` [PATCH v4 11/12] btrfs: extent_io: Kill the BUG_ON() in extent_writepages() Qu Wenruo
@ 2019-01-25  5:09 ` Qu Wenruo
  11 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2019-01-25  5:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Leonard Lausen

There are at least 2 reports about memory bit flip sneaking into on-disk
data.

Currently we only have a relaxed check triggered at
btrfs_mark_buffer_dirty() time, as it's not mandatory and only for
CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build, it doesn't help user to
detect such problem.

This patch will address the hole by triggering comprehensive check on
tree blocks before writing it back to disk.

The design points are:
- Timing of the check: Tree block write hook
  This timing is chosen to reduce the overhead.
  The comprehensive check should be as expensive as csum.
  Doing full check at btrfs_mark_buffer_dirty() is too expensive for end
  user.

- Loose empty leaf check
  Originally for empty leaf, tree-checker will report error if it's not
  a tree root.
  The problem for such check at write time is:
  * False alert for tree root created in current transaction
    In that case, the commit root still needs to be written to disk.
    And since current root can differ from commit root, then it will
    cause false alert.
    This happens for log tree.

  * False alert for relocated tree block
    Relocated tree block can be written to disk due to memory pressure,
    in that case an empty csum tree root can be written to disk and
    cause false alert, since csum root node hasn't been updated.

  Although some more reliable empty leaf check is still kept as is.
  Namely essential trees (e.g. extent, chunk) should never be empty.

The example error output will be something like:
  BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
  BTRFS error (device dm-3): write time tree block corruption detected
  BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
  BTRFS info (device dm-3): forced readonly
  BTRFS warning (device dm-3): Skipping commit of aborted transaction.
  BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
  BTRFS info (device dm-3): delayed_refs has NO entry

Reported-by: Leonard Lausen <leonard@lausen.nl>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c      |  9 +++++++++
 fs/btrfs/tree-checker.c | 24 +++++++++++++++++++++---
 fs/btrfs/tree-checker.h |  8 ++++++++
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 426e9f450f70..135d84ed72eb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -313,6 +313,15 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
 			return -EUCLEAN;
 		}
 	} else {
+		if (btrfs_header_level(buf))
+			err = btrfs_check_node(fs_info, buf);
+		else
+			err = btrfs_check_leaf_write(fs_info, buf);
+		if (err < 0) {
+			btrfs_err(fs_info,
+				  "write time tree block corruption detected");
+			return err;
+		}
 		write_extent_buffer(buf, result, 0, csum_size);
 	}
 
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a62e1e837a89..b8cdaf472031 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -477,7 +477,7 @@ static int check_leaf_item(struct btrfs_fs_info *fs_info,
 }
 
 static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
-		      bool check_item_data)
+		      bool check_item_data, bool check_empty_leaf)
 {
 	/* No valid key type is 0, so all key should be larger than this key */
 	struct btrfs_key prev_key = {0, 0, 0};
@@ -516,6 +516,18 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 				    owner);
 			return -EUCLEAN;
 		}
+
+		/*
+		 * Skip empty leaf check, mostly for write time tree block
+		 *
+		 * Such skip mostly happens for tree block write time, as
+		 * we can't use @owner as accurate owner indicator.
+		 * Case like balance and new tree block created for commit root
+		 * can break owner check easily.
+		 */
+		if (!check_empty_leaf)
+			return 0;
+
 		key.objectid = owner;
 		key.type = BTRFS_ROOT_ITEM_KEY;
 		key.offset = (u64)-1;
@@ -636,13 +648,19 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
 			  struct extent_buffer *leaf)
 {
-	return check_leaf(fs_info, leaf, true);
+	return check_leaf(fs_info, leaf, true, true);
 }
 
 int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
 			     struct extent_buffer *leaf)
 {
-	return check_leaf(fs_info, leaf, false);
+	return check_leaf(fs_info, leaf, false, true);
+}
+
+int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *leaf)
+{
+	return check_leaf(fs_info, leaf, false, false);
 }
 
 int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node)
diff --git a/fs/btrfs/tree-checker.h b/fs/btrfs/tree-checker.h
index ff043275b784..6f8d1b627c53 100644
--- a/fs/btrfs/tree-checker.h
+++ b/fs/btrfs/tree-checker.h
@@ -23,6 +23,14 @@ int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
  */
 int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
 			     struct extent_buffer *leaf);
+
+/*
+ * Write time specific leaf checker.
+ * Don't check if the empty leaf belongs to a tree root. Mostly for balance
+ * and new tree created in current transaction.
+ */
+int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *leaf);
 int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node);
 
 #endif
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 01/12] btrfs: Always output error message when key/level verification fails
  2019-01-25  5:09 ` [PATCH v4 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
@ 2019-01-25  9:02   ` Johannes Thumshirn
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Thumshirn @ 2019-01-25  9:02 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-25  5:09 ` [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
@ 2019-01-25  9:03   ` Johannes Thumshirn
  2019-01-30 14:57   ` David Sterba
  1 sibling, 0 replies; 33+ messages in thread
From: Johannes Thumshirn @ 2019-01-25  9:03 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-01-25  5:09 ` [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
@ 2019-01-25  9:04   ` Johannes Thumshirn
  2019-01-30 15:08   ` David Sterba
  2019-01-30 15:19   ` David Sterba
  2 siblings, 0 replies; 33+ messages in thread
From: Johannes Thumshirn @ 2019-01-25  9:04 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 05/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_full_page()
  2019-01-25  5:09 ` [PATCH v4 05/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_full_page() Qu Wenruo
@ 2019-01-25  9:05   ` Johannes Thumshirn
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Thumshirn @ 2019-01-25  9:05 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 06/12] btrfs: extent_io: Kill the BUG_ON() in btree_write_cache_pages()
  2019-01-25  5:09 ` [PATCH v4 06/12] btrfs: extent_io: Kill the BUG_ON() in btree_write_cache_pages() Qu Wenruo
@ 2019-01-25  9:11   ` Johannes Thumshirn
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Thumshirn @ 2019-01-25  9:11 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
  2019-01-25  5:09 ` [PATCH v4 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
@ 2019-01-25  9:11   ` Johannes Thumshirn
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Thumshirn @ 2019-01-25  9:11 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
  2019-01-25  5:09 ` [PATCH v4 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
@ 2019-01-25  9:16   ` Johannes Thumshirn
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Thumshirn @ 2019-01-25  9:16 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 11/12] btrfs: extent_io: Kill the BUG_ON() in extent_writepages()
  2019-01-25  5:09 ` [PATCH v4 11/12] btrfs: extent_io: Kill the BUG_ON() in extent_writepages() Qu Wenruo
@ 2019-01-25  9:16   ` Johannes Thumshirn
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Thumshirn @ 2019-01-25  9:16 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-25  5:09 ` [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
  2019-01-25  9:03   ` Johannes Thumshirn
@ 2019-01-30 14:57   ` David Sterba
  2019-01-30 14:59     ` Nikolay Borisov
  1 sibling, 1 reply; 33+ messages in thread
From: David Sterba @ 2019-01-30 14:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov

On Fri, Jan 25, 2019 at 01:09:16PM +0800, Qu Wenruo wrote:
> Just add one extra line to show when the corruption is detected.
> Currently only read time detection is possible.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/disk-io.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 794d5bb7fe33..426e9f450f70 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -658,6 +658,8 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>  
>  	if (!ret)
>  		set_extent_buffer_uptodate(eb);
> +	else
> +		btrfs_err(fs_info, "read time tree block corrupted detected");

I'm not sure the 'read time' is clear enoug, my suggestion is to use
'post-read' (and pre-write analogicaly). What do you think?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-30 14:57   ` David Sterba
@ 2019-01-30 14:59     ` Nikolay Borisov
  2019-01-31  0:03       ` Qu Wenruo
  0 siblings, 1 reply; 33+ messages in thread
From: Nikolay Borisov @ 2019-01-30 14:59 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 30.01.19 г. 16:57 ч., David Sterba wrote:
> On Fri, Jan 25, 2019 at 01:09:16PM +0800, Qu Wenruo wrote:
>> Just add one extra line to show when the corruption is detected.
>> Currently only read time detection is possible.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  fs/btrfs/disk-io.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 794d5bb7fe33..426e9f450f70 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -658,6 +658,8 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>>  
>>  	if (!ret)
>>  		set_extent_buffer_uptodate(eb);
>> +	else
>> +		btrfs_err(fs_info, "read time tree block corrupted detected");
> 
> I'm not sure the 'read time' is clear enoug, my suggestion is to use
> 'post-read' (and pre-write analogicaly). What do you think?


How about "error during tree block reading" or "error reading treeblock"?

> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-01-25  5:09 ` [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
  2019-01-25  9:04   ` Johannes Thumshirn
@ 2019-01-30 15:08   ` David Sterba
  2019-01-30 15:19   ` David Sterba
  2 siblings, 0 replies; 33+ messages in thread
From: David Sterba @ 2019-01-30 15:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 25, 2019 at 01:09:17PM +0800, Qu Wenruo wrote:
> We have a BUG_ON() in flush_write_bio() to handle the return value of
> submit_one_bio().
> 
> Move the BUG_ON() one level up to all its callers.
> 
> No functional change, just to make later BUG_ON() cleanup more obvious.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 48 +++++++++++++++++++++++++++++++-------------
>  1 file changed, 34 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 8a2335713a2d..6f1982f8ad5c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -169,15 +169,21 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
>  	return blk_status_to_errno(ret);
>  }
>  
> -static void flush_write_bio(struct extent_page_data *epd)
> +/*
> + * A wrapper for submit_one_bio().
> + *
> + * Return 0 if everything is OK.
> + * Return <0 for error.
> + */
> +static int __must_check flush_write_bio(struct extent_page_data *epd)
>  {
> -	if (epd->bio) {
> -		int ret;
> +	int ret = 0;
>  
> +	if (epd->bio) {
>  		ret = submit_one_bio(epd->bio, 0, 0);
> -		BUG_ON(ret < 0); /* -ENOMEM */
>  		epd->bio = NULL;
>  	}
> +	return ret;

Moving the BUG_ON one leve up is ok. I checked the callees of
submit_one_bio and it looks like the return value of btrfsic_submit_bio
is somehow missing. It should at least propagate return value of
submit_bio. This is is for another patchset though.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-01-25  5:09 ` [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
  2019-01-25  9:04   ` Johannes Thumshirn
  2019-01-30 15:08   ` David Sterba
@ 2019-01-30 15:19   ` David Sterba
  2019-01-31  0:45     ` Qu Wenruo
  2 siblings, 1 reply; 33+ messages in thread
From: David Sterba @ 2019-01-30 15:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Jan 25, 2019 at 01:09:17PM +0800, Qu Wenruo wrote:
> +static int __must_check flush_write_bio(struct extent_page_data *epd)
>  {
> -	if (epd->bio) {
> -		int ret;
> +	int ret = 0;
>  
> +	if (epd->bio) {
>  		ret = submit_one_bio(epd->bio, 0, 0);
> -		BUG_ON(ret < 0); /* -ENOMEM */
>  		epd->bio = NULL;

I'm not sure if resetting epd->bio to NULL is all that needs to be done
here. With the BUG_ON the error case never happens so if all goes fine
it's also ok to set it to NULL and continue. But the callers might need
to send the flush again.

It's not easy to trace the bio here, it's set indirectly in
submit_extent_page, submit_one_bio is another indirection, switching by
type and sometimes bio_endio is called in case of an error.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-30 14:59     ` Nikolay Borisov
@ 2019-01-31  0:03       ` Qu Wenruo
  2019-01-31 14:20         ` David Sterba
  0 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-01-31  0:03 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba, Qu Wenruo, linux-btrfs



On 2019/1/30 下午10:59, Nikolay Borisov wrote:
> 
> 
> On 30.01.19 г. 16:57 ч., David Sterba wrote:
>> On Fri, Jan 25, 2019 at 01:09:16PM +0800, Qu Wenruo wrote:
>>> Just add one extra line to show when the corruption is detected.
>>> Currently only read time detection is possible.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>>> ---
>>>  fs/btrfs/disk-io.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 794d5bb7fe33..426e9f450f70 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -658,6 +658,8 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>>>  
>>>  	if (!ret)
>>>  		set_extent_buffer_uptodate(eb);
>>> +	else
>>> +		btrfs_err(fs_info, "read time tree block corrupted detected");
>>
>> I'm not sure the 'read time' is clear enoug, my suggestion is to use
>> 'post-read' (and pre-write analogicaly). What do you think?
> 
> 
> How about "error during tree block reading" or "error reading treeblock"?

Nikolay's suggestion looks more straightforward to me.

+1 for his idea.

The 'post-read' still could confuse end-user IMHO.

Thanks,
Qu

> 
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-01-30 15:19   ` David Sterba
@ 2019-01-31  0:45     ` Qu Wenruo
  2019-01-31 14:36       ` David Sterba
  0 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-01-31  0:45 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1019 bytes --]



On 2019/1/30 下午11:19, David Sterba wrote:
> On Fri, Jan 25, 2019 at 01:09:17PM +0800, Qu Wenruo wrote:
>> +static int __must_check flush_write_bio(struct extent_page_data *epd)
>>  {
>> -	if (epd->bio) {
>> -		int ret;
>> +	int ret = 0;
>>  
>> +	if (epd->bio) {
>>  		ret = submit_one_bio(epd->bio, 0, 0);
>> -		BUG_ON(ret < 0); /* -ENOMEM */
>>  		epd->bio = NULL;
> 
> I'm not sure if resetting epd->bio to NULL is all that needs to be done
> here. With the BUG_ON the error case never happens so if all goes fine
> it's also ok to set it to NULL and continue. But the callers might need
> to send the flush again.

If flush_write_bio() get called again on the failed one, it will just
get skipped as  epd->bio is NULL, submit_one_bio() will not be triggered.

Thanks,
Qu

> 
> It's not easy to trace the bio here, it's set indirectly in
> submit_extent_page, submit_one_bio is another indirection, switching by
> type and sometimes bio_endio is called in case of an error.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-31  0:03       ` Qu Wenruo
@ 2019-01-31 14:20         ` David Sterba
  2019-01-31 14:22           ` Nikolay Borisov
  0 siblings, 1 reply; 33+ messages in thread
From: David Sterba @ 2019-01-31 14:20 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Nikolay Borisov, dsterba, Qu Wenruo, linux-btrfs

On Thu, Jan 31, 2019 at 08:03:36AM +0800, Qu Wenruo wrote:
> On 2019/1/30 下午10:59, Nikolay Borisov wrote:
> > On 30.01.19 г. 16:57 ч., David Sterba wrote:
> >> On Fri, Jan 25, 2019 at 01:09:16PM +0800, Qu Wenruo wrote:
> >>> Just add one extra line to show when the corruption is detected.
> >>> Currently only read time detection is possible.
> >>>
> >>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> >>> ---
> >>>  fs/btrfs/disk-io.c | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >>> index 794d5bb7fe33..426e9f450f70 100644
> >>> --- a/fs/btrfs/disk-io.c
> >>> +++ b/fs/btrfs/disk-io.c
> >>> @@ -658,6 +658,8 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> >>>  
> >>>  	if (!ret)
> >>>  		set_extent_buffer_uptodate(eb);
> >>> +	else
> >>> +		btrfs_err(fs_info, "read time tree block corrupted detected");
> >>
> >> I'm not sure the 'read time' is clear enoug, my suggestion is to use
> >> 'post-read' (and pre-write analogicaly). What do you think?
> > 
> > 
> > How about "error during tree block reading" or "error reading treeblock"?
> 
> Nikolay's suggestion looks more straightforward to me.
> 
> +1 for his idea.
> 
> The 'post-read' still could confuse end-user IMHO.

The idea is to distinguish if the error was because the block can't be
read or because the data it contains are wrong.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-31 14:20         ` David Sterba
@ 2019-01-31 14:22           ` Nikolay Borisov
  2019-02-07 17:27             ` David Sterba
  0 siblings, 1 reply; 33+ messages in thread
From: Nikolay Borisov @ 2019-01-31 14:22 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs



On 31.01.19 г. 16:20 ч., David Sterba wrote:
> On Thu, Jan 31, 2019 at 08:03:36AM +0800, Qu Wenruo wrote:
>> On 2019/1/30 下午10:59, Nikolay Borisov wrote:
>>> On 30.01.19 г. 16:57 ч., David Sterba wrote:
>>>> On Fri, Jan 25, 2019 at 01:09:16PM +0800, Qu Wenruo wrote:
>>>>> Just add one extra line to show when the corruption is detected.
>>>>> Currently only read time detection is possible.
>>>>>
>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>>>>> ---
>>>>>  fs/btrfs/disk-io.c | 2 ++
>>>>>  1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>> index 794d5bb7fe33..426e9f450f70 100644
>>>>> --- a/fs/btrfs/disk-io.c
>>>>> +++ b/fs/btrfs/disk-io.c
>>>>> @@ -658,6 +658,8 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>>>>>  
>>>>>  	if (!ret)
>>>>>  		set_extent_buffer_uptodate(eb);
>>>>> +	else
>>>>> +		btrfs_err(fs_info, "read time tree block corrupted detected");
>>>>
>>>> I'm not sure the 'read time' is clear enoug, my suggestion is to use
>>>> 'post-read' (and pre-write analogicaly). What do you think?
>>>
>>>
>>> How about "error during tree block reading" or "error reading treeblock"?
>>
>> Nikolay's suggestion looks more straightforward to me.
>>
>> +1 for his idea.
>>
>> The 'post-read' still could confuse end-user IMHO.
> 
> The idea is to distinguish if the error was because the block can't be
> read or because the data it contains are wrong.

In this case we can say "Read corrupted block"

> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-01-31  0:45     ` Qu Wenruo
@ 2019-01-31 14:36       ` David Sterba
  2019-02-15  7:18         ` Qu Wenruo
  0 siblings, 1 reply; 33+ messages in thread
From: David Sterba @ 2019-01-31 14:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Jan 31, 2019 at 08:45:42AM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/1/30 下午11:19, David Sterba wrote:
> > On Fri, Jan 25, 2019 at 01:09:17PM +0800, Qu Wenruo wrote:
> >> +static int __must_check flush_write_bio(struct extent_page_data *epd)
> >>  {
> >> -	if (epd->bio) {
> >> -		int ret;
> >> +	int ret = 0;
> >>  
> >> +	if (epd->bio) {
> >>  		ret = submit_one_bio(epd->bio, 0, 0);
> >> -		BUG_ON(ret < 0); /* -ENOMEM */
> >>  		epd->bio = NULL;
> > 
> > I'm not sure if resetting epd->bio to NULL is all that needs to be done
> > here. With the BUG_ON the error case never happens so if all goes fine
> > it's also ok to set it to NULL and continue. But the callers might need
> > to send the flush again.
> 
> If flush_write_bio() get called again on the failed one, it will just
> get skipped as  epd->bio is NULL, submit_one_bio() will not be triggered.

Yes that's clear, but what is the state of the bio that went to
flush_write_bio and submit_one_bio failed?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly
  2019-01-31 14:22           ` Nikolay Borisov
@ 2019-02-07 17:27             ` David Sterba
  0 siblings, 0 replies; 33+ messages in thread
From: David Sterba @ 2019-02-07 17:27 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Thu, Jan 31, 2019 at 04:22:23PM +0200, Nikolay Borisov wrote:
> 
> 
> On 31.01.19 г. 16:20 ч., David Sterba wrote:
> > On Thu, Jan 31, 2019 at 08:03:36AM +0800, Qu Wenruo wrote:
> >> On 2019/1/30 下午10:59, Nikolay Borisov wrote:
> >>> On 30.01.19 г. 16:57 ч., David Sterba wrote:
> >>>> On Fri, Jan 25, 2019 at 01:09:16PM +0800, Qu Wenruo wrote:
> >>>>> Just add one extra line to show when the corruption is detected.
> >>>>> Currently only read time detection is possible.
> >>>>>
> >>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >>>>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> >>>>> ---
> >>>>>  fs/btrfs/disk-io.c | 2 ++
> >>>>>  1 file changed, 2 insertions(+)
> >>>>>
> >>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >>>>> index 794d5bb7fe33..426e9f450f70 100644
> >>>>> --- a/fs/btrfs/disk-io.c
> >>>>> +++ b/fs/btrfs/disk-io.c
> >>>>> @@ -658,6 +658,8 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> >>>>>  
> >>>>>  	if (!ret)
> >>>>>  		set_extent_buffer_uptodate(eb);
> >>>>> +	else
> >>>>> +		btrfs_err(fs_info, "read time tree block corrupted detected");
> >>>>
> >>>> I'm not sure the 'read time' is clear enoug, my suggestion is to use
> >>>> 'post-read' (and pre-write analogicaly). What do you think?
> >>>
> >>>
> >>> How about "error during tree block reading" or "error reading treeblock"?
> >>
> >> Nikolay's suggestion looks more straightforward to me.
> >>
> >> +1 for his idea.
> >>
> >> The 'post-read' still could confuse end-user IMHO.
> > 
> > The idea is to distinguish if the error was because the block can't be
> > read or because the data it contains are wrong.
> 
> In this case we can say "Read corrupted block"

That's IMHO not better, so I'll stick with the first version 'read time'
and 'write time'.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-01-31 14:36       ` David Sterba
@ 2019-02-15  7:18         ` Qu Wenruo
  2019-02-15  9:58           ` Qu Wenruo
  0 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2019-02-15  7:18 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1297 bytes --]



On 2019/1/31 下午10:36, David Sterba wrote:
> On Thu, Jan 31, 2019 at 08:45:42AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/1/30 下午11:19, David Sterba wrote:
>>> On Fri, Jan 25, 2019 at 01:09:17PM +0800, Qu Wenruo wrote:
>>>> +static int __must_check flush_write_bio(struct extent_page_data *epd)
>>>>  {
>>>> -	if (epd->bio) {
>>>> -		int ret;
>>>> +	int ret = 0;
>>>>  
>>>> +	if (epd->bio) {
>>>>  		ret = submit_one_bio(epd->bio, 0, 0);
>>>> -		BUG_ON(ret < 0); /* -ENOMEM */
>>>>  		epd->bio = NULL;
>>>
>>> I'm not sure if resetting epd->bio to NULL is all that needs to be done
>>> here. With the BUG_ON the error case never happens so if all goes fine
>>> it's also ok to set it to NULL and continue. But the callers might need
>>> to send the flush again.
>>
>> If flush_write_bio() get called again on the failed one, it will just
>> get skipped as  epd->bio is NULL, submit_one_bio() will not be triggered.
> 
> Yes that's clear, but what is the state of the bio that went to
> flush_write_bio and submit_one_bio failed?
> 

You got me.

In fact current simple "epd->bio = NULL" on failure will just leak epd->bio.

We need to at least call "bio_put(epd->bio);" before setting "epd->bio =
NULL;"

Thanks for pointing this problem out,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  2019-02-15  7:18         ` Qu Wenruo
@ 2019-02-15  9:58           ` Qu Wenruo
  0 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2019-02-15  9:58 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1884 bytes --]



On 2019/2/15 下午3:18, Qu Wenruo wrote:
> 
> 
> On 2019/1/31 下午10:36, David Sterba wrote:
>> On Thu, Jan 31, 2019 at 08:45:42AM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2019/1/30 下午11:19, David Sterba wrote:
>>>> On Fri, Jan 25, 2019 at 01:09:17PM +0800, Qu Wenruo wrote:
>>>>> +static int __must_check flush_write_bio(struct extent_page_data *epd)
>>>>>  {
>>>>> -	if (epd->bio) {
>>>>> -		int ret;
>>>>> +	int ret = 0;
>>>>>  
>>>>> +	if (epd->bio) {
>>>>>  		ret = submit_one_bio(epd->bio, 0, 0);
>>>>> -		BUG_ON(ret < 0); /* -ENOMEM */
>>>>>  		epd->bio = NULL;
>>>>
>>>> I'm not sure if resetting epd->bio to NULL is all that needs to be done
>>>> here. With the BUG_ON the error case never happens so if all goes fine
>>>> it's also ok to set it to NULL and continue. But the callers might need
>>>> to send the flush again.
>>>
>>> If flush_write_bio() get called again on the failed one, it will just
>>> get skipped as  epd->bio is NULL, submit_one_bio() will not be triggered.
>>
>> Yes that's clear, but what is the state of the bio that went to
>> flush_write_bio and submit_one_bio failed?
>>
> 
> You got me.
> 
> In fact current simple "epd->bio = NULL" on failure will just leak epd->bio.
> 
> We need to at least call "bio_put(epd->bio);" before setting "epd->bio =
> NULL;"

Facepalm, I'm always too quick to draw a conclusion.

With proper test, the epd->bio is freed by its endio function.
So at flush_write_bio() call time, no matter whether submit_one_bio()
returns error or not, endio of epd->bio will be or has already been
triggered thus free the bio.

Either by successful bio submission, or by manual bio_endio(bio) call in
submit_bio_hook.

So in fact just setting "epd->bio = NULL" is completely fine in this
context.

Thanks,
Qu

> 
> Thanks for pointing this problem out,
> Qu
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2019-02-15  9:58 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
2019-01-25  9:02   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 02/12] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
2019-01-25  9:03   ` Johannes Thumshirn
2019-01-30 14:57   ` David Sterba
2019-01-30 14:59     ` Nikolay Borisov
2019-01-31  0:03       ` Qu Wenruo
2019-01-31 14:20         ` David Sterba
2019-01-31 14:22           ` Nikolay Borisov
2019-02-07 17:27             ` David Sterba
2019-01-25  5:09 ` [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
2019-01-25  9:04   ` Johannes Thumshirn
2019-01-30 15:08   ` David Sterba
2019-01-30 15:19   ` David Sterba
2019-01-31  0:45     ` Qu Wenruo
2019-01-31 14:36       ` David Sterba
2019-02-15  7:18         ` Qu Wenruo
2019-02-15  9:58           ` Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 05/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_full_page() Qu Wenruo
2019-01-25  9:05   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 06/12] btrfs: extent_io: Kill the BUG_ON() in btree_write_cache_pages() Qu Wenruo
2019-01-25  9:11   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
2019-01-25  9:11   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 08/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_locked_range() Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 09/12] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
2019-01-25  9:16   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 11/12] btrfs: extent_io: Kill the BUG_ON() in extent_writepages() Qu Wenruo
2019-01-25  9:16   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 12/12] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).