All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56
@ 2016-05-14  0:06 Liu Bo
  2016-05-14  0:06 ` [PATCH 2/7] Btrfs: replace BUG_ON with WARN_ONCE in cow_file_range Liu Bo
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Liu Bo @ 2016-05-14  0:06 UTC (permalink / raw)
  To: linux-btrfs

This BUG() has been triggered by a fuzz testing image, but in fact
btrfs can handle this gracefully by returning -EIO.

Thus, use WARN_ONCE for warning purpose and don't leave a possible
kernel panic.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/raid56.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0b7792e..863f7fe 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2139,7 +2139,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
 	if (rbio->faila == -1) {
-		BUG();
+		WARN_ONCE(1, KERN_WARNING "rbio->faila is -1\n");
 		if (generic_io)
 			btrfs_put_bbio(bbio);
 		kfree(rbio);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/7] Btrfs: replace BUG_ON with WARN_ONCE in cow_file_range
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
@ 2016-05-14  0:06 ` Liu Bo
  2016-05-14  0:06 ` [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize Liu Bo
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Liu Bo @ 2016-05-14  0:06 UTC (permalink / raw)
  To: linux-btrfs

This BUG_ON is more like a warning since an invalid
btrfs_super_total_bytes() doesn't affect other stuff.

Thus, use WARN_ONCE for warning purpose and don't leave a possible
kernel panic here.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/inode.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2aaba58..5874562 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -974,8 +974,11 @@ static noinline int cow_file_range(struct inode *inode,
 		}
 	}
 
-	BUG_ON(disk_num_bytes >
-	       btrfs_super_total_bytes(root->fs_info->super_copy));
+	WARN_ONCE(disk_num_bytes >
+		  btrfs_super_total_bytes(root->fs_info->super_copy),
+	  KERN_WARNING "disk_num_bytes(%llu) > btrfs_super_total_bytes(%llu)\n",
+		  disk_num_bytes,
+		  btrfs_super_total_bytes(root->fs_info->super_copy));
 
 	alloc_hint = get_extent_allocation_hint(inode, start, num_bytes);
 	btrfs_drop_extent_cache(inode, start, start + num_bytes - 1, 0);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
  2016-05-14  0:06 ` [PATCH 2/7] Btrfs: replace BUG_ON with WARN_ONCE in cow_file_range Liu Bo
@ 2016-05-14  0:06 ` Liu Bo
  2016-05-14 10:30   ` Qu Wenruo
  2016-05-14  0:06 ` [PATCH 4/7] Btrfs: free sys_array eb as soon as possible Liu Bo
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-05-14  0:06 UTC (permalink / raw)
  To: linux-btrfs

Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
via alloc_extent_buffer().  An unaligned eb can have more pages than it
should have, which ends up extent buffer's leak or some corrupted content
in extent buffer.

This adds a warning to let us quickly know what was happening.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d247fc0..e601e0f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4868,6 +4868,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	int uptodate = 1;
 	int ret;
 
+	WARN_ONCE(!IS_ALIGNED(start, fs_info->tree_root->sectorsize),
+		  KERN_WARNING "eb->start(%llu) is not aligned to root->sectorsize(%u)\n",
+		  start, fs_info->tree_root->sectorsize);
+
 	eb = find_extent_buffer(fs_info, start);
 	if (eb)
 		return eb;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/7] Btrfs: free sys_array eb as soon as possible
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
  2016-05-14  0:06 ` [PATCH 2/7] Btrfs: replace BUG_ON with WARN_ONCE in cow_file_range Liu Bo
  2016-05-14  0:06 ` [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize Liu Bo
@ 2016-05-14  0:06 ` Liu Bo
  2016-05-16  8:45   ` David Sterba
  2016-05-14  0:07 ` [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio Liu Bo
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-05-14  0:06 UTC (permalink / raw)
  To: linux-btrfs

While reading sys_chunk_array in superblock, btrfs creates a temporary
extent buffer.  Since we don't use it after finishing reading
 sys_chunk_array, we don't need to keep it in memory.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/volumes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bd0f45f..1331606 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6586,13 +6586,13 @@ int btrfs_read_sys_array(struct btrfs_root *root)
 		sb_array_offset += len;
 		cur_offset += len;
 	}
-	free_extent_buffer(sb);
+	free_extent_buffer_stale(sb);
 	return ret;
 
 out_short_read:
 	printk(KERN_ERR "BTRFS: sys_array too short to read %u bytes at offset %u\n",
 			len, cur_offset);
-	free_extent_buffer(sb);
+	free_extent_buffer_stale(sb);
 	return -EIO;
 }
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
                   ` (2 preceding siblings ...)
  2016-05-14  0:06 ` [PATCH 4/7] Btrfs: free sys_array eb as soon as possible Liu Bo
@ 2016-05-14  0:07 ` Liu Bo
  2016-05-16  8:44   ` David Sterba
  2016-05-14  0:07 ` [PATCH 6/7] Btrfs: fix eb memory leak due to readpage failure Liu Bo
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-05-14  0:07 UTC (permalink / raw)
  To: linux-btrfs

We have two BUG_ON in merge_bio, but since it can gracefully return errors
to callers, use WARN_ONCE to give error information and don't leave a
possible panic.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c | 1 -
 fs/btrfs/inode.c     | 6 ++++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e601e0f..99286d1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2746,7 +2746,6 @@ static int merge_bio(int rw, struct extent_io_tree *tree, struct page *page,
 	if (tree->ops && tree->ops->merge_bio_hook)
 		ret = tree->ops->merge_bio_hook(rw, page, offset, size, bio,
 						bio_flags);
-	BUG_ON(ret < 0);
 	return ret;
 
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5874562..3a989e3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1827,8 +1827,10 @@ int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
 	map_length = length;
 	ret = btrfs_map_block(root->fs_info, rw, logical,
 			      &map_length, NULL, 0);
-	/* Will always return 0 with map_multi == NULL */
-	BUG_ON(ret < 0);
+	if (ret < 0) {
+		WARN_ONCE(ret < 0, KERN_ERR "ret = %d\n", ret);
+		return ret;
+	}
 	if (map_length < length + size)
 		return 1;
 	return 0;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/7] Btrfs: fix eb memory leak due to readpage failure
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
                   ` (3 preceding siblings ...)
  2016-05-14  0:07 ` [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio Liu Bo
@ 2016-05-14  0:07 ` Liu Bo
  2016-05-18 19:38   ` Josef Bacik
  2016-05-14  0:07 ` [PATCH 7/7] Btrfs: fix memory leak due to invalid btree height Liu Bo
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-05-14  0:07 UTC (permalink / raw)
  To: linux-btrfs

eb->io_pages is set in read_extent_buffer_pages().

In case of readpage failure, for pages that have been added to bio,
it calls bio_endio and later readpage_io_failed_hook() does the work.

When this eb's page (couldn't be the 1st page) fails to add itself to bio
due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
 and ends up with a memory leak eventually.

This adds the 'atomic_dec(&eb->io_pages)' to the readpage error handling.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 99286d1..2327200 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3069,6 +3069,30 @@ static int __do_readpage(struct extent_io_tree *tree,
 			*bio_flags = this_bio_flag;
 		} else {
 			SetPageError(page);
+			/*
+			 * Only metadata io request has this issue, for data it
+			 * just unlocks extent and releases page's lock.
+			 *
+			 * eb->io_pages is set in read_extent_buffer_pages().
+			 *
+			 * When this eb's page fails to add itself to bio,
+			 * it cannot decrease eb->io_pages via bio_endio, and
+			 * ends up with extent_buffer_under_io() always being
+			 * true, because of that, eb won't be freed and we have
+			 * a memory leak eventually.
+			 *
+			 * Here we still hold this page's lock, and other tasks
+			 * who're also reading this eb are blocked.
+			 */
+			if (rw & REQ_META) {
+				struct extent_buffer *eb;
+
+				WARN_ON_ONCE(!PagePrivate(page));
+				eb = (struct extent_buffer *)page->private;
+
+				WARN_ON_ONCE(atomic_read(&eb->io_pages) < 1);
+				atomic_dec(&eb->io_pages);
+			}
 			unlock_extent(tree, cur, cur + iosize - 1);
 		}
 		cur = cur + iosize;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 7/7] Btrfs: fix memory leak due to invalid btree height
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
                   ` (4 preceding siblings ...)
  2016-05-14  0:07 ` [PATCH 6/7] Btrfs: fix eb memory leak due to readpage failure Liu Bo
@ 2016-05-14  0:07 ` Liu Bo
  2016-09-06 16:50   ` David Sterba
  2016-05-14 10:42 ` [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Qu Wenruo
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-05-14  0:07 UTC (permalink / raw)
  To: linux-btrfs

Thanks to fuzz testing, we can have invalid btree root node height.
Btrfs limits btree height to 7 and if the given height is 9, then btrfs
will have problems in both releasing root node's lock and freeing the node.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/ctree.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index ec7928a..3fccbcc 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2756,6 +2756,13 @@ again:
 			}
 		}
 	}
+	if (level > BTRFS_MAX_LEVEL - 1 || level < 0) {
+		WARN_ONCE(1, KERN_WARNING "Invalid btree height %d\n", level);
+		if (!p->skip_locking)
+			btrfs_tree_unlock_rw(b, root_lock);
+		free_extent_buffer(b);
+		return -EINVAL;
+	}
 	p->nodes[level] = b;
 	if (!p->skip_locking)
 		p->locks[level] = root_lock;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize
  2016-05-14  0:06 ` [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize Liu Bo
@ 2016-05-14 10:30   ` Qu Wenruo
  2016-05-16 18:01     ` Liu Bo
  0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-05-14 10:30 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs

Hi Liu,

Thanks for your patch first.

On 05/14/2016 08:06 AM, Liu Bo wrote:
> Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
> via alloc_extent_buffer().  An unaligned eb can have more pages than it
> should have, which ends up extent buffer's leak or some corrupted content
> in extent buffer.
>
> This adds a warning to let us quickly know what was happening.
>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/extent_io.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d247fc0..e601e0f 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4868,6 +4868,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  	int uptodate = 1;
>  	int ret;
>
> +	WARN_ONCE(!IS_ALIGNED(start, fs_info->tree_root->sectorsize),
> +		  KERN_WARNING "eb->start(%llu) is not aligned to root->sectorsize(%u)\n",
> +		  start, fs_info->tree_root->sectorsize);
> +

IMHO this is a quite big problem. As almost all other things rely on the 
assumption that extent buffer are at least sectorsize aligned.

What about warning and returning NULL? WARN_ONCE() only won't info user 
quick enough.

BTW, after a quick glance into __alloc_extent_buffer(), it seems that we 
didn't check the return pointer of kmem_cache_zalloc(), since you're 
fixing things around that code, would you mind to fix it too?

Thanks,
Qu

>  	eb = find_extent_buffer(fs_info, start);
>  	if (eb)
>  		return eb;
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
                   ` (5 preceding siblings ...)
  2016-05-14  0:07 ` [PATCH 7/7] Btrfs: fix memory leak due to invalid btree height Liu Bo
@ 2016-05-14 10:42 ` Qu Wenruo
  2016-05-15 14:19 ` Holger Hoffstätte
  2016-06-30  0:57 ` [PATCH v2] Btrfs: remove BUG() " Liu Bo
  8 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2016-05-14 10:42 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs



On 05/14/2016 08:06 AM, Liu Bo wrote:
> This BUG() has been triggered by a fuzz testing image, but in fact
> btrfs can handle this gracefully by returning -EIO.
>
> Thus, use WARN_ONCE for warning purpose and don't leave a possible
> kernel panic.
>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/raid56.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 0b7792e..863f7fe 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -2139,7 +2139,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
>
>  	rbio->faila = find_logical_bio_stripe(rbio, bio);
>  	if (rbio->faila == -1) {
> -		BUG();
> +		WARN_ONCE(1, KERN_WARNING "rbio->faila is -1\n");

Just a personal idea, it would be much better if we can use 
btrfs_info/error/warn() to output the warning message.
This would help to locate the fs causing the problem, if there are 
several btrfs mounted.

Like:
        btrfs_warn(root->fs_info, "rbio->faia is -1");
        WARN_ON(1);

Or adds a new helper like btrfs_dump_warn()?

Thanks,
Qu

>  		if (generic_io)
>  			btrfs_put_bbio(bbio);
>  		kfree(rbio);
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
                   ` (6 preceding siblings ...)
  2016-05-14 10:42 ` [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Qu Wenruo
@ 2016-05-15 14:19 ` Holger Hoffstätte
  2016-05-16  8:32   ` David Sterba
  2016-06-30  0:57 ` [PATCH v2] Btrfs: remove BUG() " Liu Bo
  8 siblings, 1 reply; 32+ messages in thread
From: Holger Hoffstätte @ 2016-05-15 14:19 UTC (permalink / raw)
  To: linux-btrfs

On 05/14/16 02:06, Liu Bo wrote:
> This BUG() has been triggered by a fuzz testing image, but in fact
> btrfs can handle this gracefully by returning -EIO.
> 
> Thus, use WARN_ONCE for warning purpose and don't leave a possible
> kernel panic.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/raid56.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 0b7792e..863f7fe 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -2139,7 +2139,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
>  
>  	rbio->faila = find_logical_bio_stripe(rbio, bio);
>  	if (rbio->faila == -1) {
> -		BUG();
> +		WARN_ONCE(1, KERN_WARNING "rbio->faila is -1\n");

I'm generally in favor of not BUGing out for no good reason, but what
is e.g. an admin (or user) supposed to do when he sees this message?
Same for the other rather cryptic WARNs - they contain no actionable
information, and are most likely going to be ignored as "debug spam".
IMHO things that can be ignored can be deleted.

thanks,
Holger


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56
  2016-05-15 14:19 ` Holger Hoffstätte
@ 2016-05-16  8:32   ` David Sterba
  2016-10-12 15:06     ` David Sterba
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2016-05-16  8:32 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: linux-btrfs

On Sun, May 15, 2016 at 04:19:28PM +0200, Holger Hoffstätte wrote:
> On 05/14/16 02:06, Liu Bo wrote:
> > This BUG() has been triggered by a fuzz testing image, but in fact
> > btrfs can handle this gracefully by returning -EIO.
> > 
> > Thus, use WARN_ONCE for warning purpose and don't leave a possible
> > kernel panic.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/raid56.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > index 0b7792e..863f7fe 100644
> > --- a/fs/btrfs/raid56.c
> > +++ b/fs/btrfs/raid56.c
> > @@ -2139,7 +2139,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
> >  
> >  	rbio->faila = find_logical_bio_stripe(rbio, bio);
> >  	if (rbio->faila == -1) {
> > -		BUG();
> > +		WARN_ONCE(1, KERN_WARNING "rbio->faila is -1\n");
> 
> I'm generally in favor of not BUGing out for no good reason, but what
> is e.g. an admin (or user) supposed to do when he sees this message?
> Same for the other rather cryptic WARNs - they contain no actionable
> information, and are most likely going to be ignored as "debug spam".
> IMHO things that can be ignored can be deleted.

Agreed, the way this patchset repalces BUG on is very confusing.
WARN_ONCE is a global state, the message does not even print on which
filesystem the error happened. The only way to reset the state is to
unload the module.

This should be handled as a corruption, no matter if it's fuzzed or not,
report more details about what is corrupted or what was expected.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio
  2016-05-14  0:07 ` [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio Liu Bo
@ 2016-05-16  8:44   ` David Sterba
  2016-05-16 17:24     ` Liu Bo
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2016-05-16  8:44 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Fri, May 13, 2016 at 05:07:00PM -0700, Liu Bo wrote:
> We have two BUG_ON in merge_bio, but since it can gracefully return errors
> to callers, use WARN_ONCE to give error information and don't leave a
> possible panic.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/extent_io.c | 1 -
>  fs/btrfs/inode.c     | 6 ++++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e601e0f..99286d1 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2746,7 +2746,6 @@ static int merge_bio(int rw, struct extent_io_tree *tree, struct page *page,
>  	if (tree->ops && tree->ops->merge_bio_hook)
>  		ret = tree->ops->merge_bio_hook(rw, page, offset, size, bio,
>  						bio_flags);
> -	BUG_ON(ret < 0);
>  	return ret;
>  
>  }
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5874562..3a989e3 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1827,8 +1827,10 @@ int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
>  	map_length = length;
>  	ret = btrfs_map_block(root->fs_info, rw, logical,
>  			      &map_length, NULL, 0);
> -	/* Will always return 0 with map_multi == NULL */
> -	BUG_ON(ret < 0);
> +	if (ret < 0) {
> +		WARN_ONCE(ret < 0, KERN_ERR "ret = %d\n", ret);

btrfs_map_block is quite verbose about the errors so it's not needed to
print it here.

Otherwise I'm not sure if all paths that go through the merge hook
handle errors, eg. in btrfs_submit_compressed_read or _write. Some code
is skipped if merge hook returns nonzero. But, the code expects either 0
or 1, and when the BUG_ON(ret < 0) is removed suddenly the 'ret < 0' can
be returned. Unexpected.

It's IMO better to push up the BUG_ON error handling only one caller at
a time. That way it's easier to review the callgraph and call paths.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/7] Btrfs: free sys_array eb as soon as possible
  2016-05-14  0:06 ` [PATCH 4/7] Btrfs: free sys_array eb as soon as possible Liu Bo
@ 2016-05-16  8:45   ` David Sterba
  0 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2016-05-16  8:45 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Fri, May 13, 2016 at 05:06:59PM -0700, Liu Bo wrote:
> While reading sys_chunk_array in superblock, btrfs creates a temporary
> extent buffer.  Since we don't use it after finishing reading
>  sys_chunk_array, we don't need to keep it in memory.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

This patch is independent of the rest, I'll add it to next.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio
  2016-05-16  8:44   ` David Sterba
@ 2016-05-16 17:24     ` Liu Bo
  2016-05-17  9:55       ` David Sterba
  0 siblings, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-05-16 17:24 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On Mon, May 16, 2016 at 10:44:38AM +0200, David Sterba wrote:
> On Fri, May 13, 2016 at 05:07:00PM -0700, Liu Bo wrote:
> > We have two BUG_ON in merge_bio, but since it can gracefully return errors
> > to callers, use WARN_ONCE to give error information and don't leave a
> > possible panic.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/extent_io.c | 1 -
> >  fs/btrfs/inode.c     | 6 ++++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index e601e0f..99286d1 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -2746,7 +2746,6 @@ static int merge_bio(int rw, struct extent_io_tree *tree, struct page *page,
> >  	if (tree->ops && tree->ops->merge_bio_hook)
> >  		ret = tree->ops->merge_bio_hook(rw, page, offset, size, bio,
> >  						bio_flags);
> > -	BUG_ON(ret < 0);
> >  	return ret;
> >  
> >  }
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 5874562..3a989e3 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -1827,8 +1827,10 @@ int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
> >  	map_length = length;
> >  	ret = btrfs_map_block(root->fs_info, rw, logical,
> >  			      &map_length, NULL, 0);
> > -	/* Will always return 0 with map_multi == NULL */
> > -	BUG_ON(ret < 0);
> > +	if (ret < 0) {
> > +		WARN_ONCE(ret < 0, KERN_ERR "ret = %d\n", ret);
> 
> btrfs_map_block is quite verbose about the errors so it's not needed to
> print it here.

OK.

> 
> Otherwise I'm not sure if all paths that go through the merge hook
> handle errors, eg. in btrfs_submit_compressed_read or _write. Some code
> is skipped if merge hook returns nonzero. But, the code expects either 0
> or 1, and when the BUG_ON(ret < 0) is removed suddenly the 'ret < 0' can
> be returned. Unexpected.

Right now btrfs_merge_bio_hook() only returns 1 or 0 and marks (ret < 0) as
 BUG_ON.

But compress code is ready to handle the error,

btrfs_submit_compressed_read/write() {
	...
	ret = merge_bio_hook()
	if (ret || bio_add_page()) {
		ret = btrfs_map_bio();
		BUG_ON(ret);
		...
	}
	...
}

So ret < 0 is handled, if there's any errors from merge_bio_hook(), it submits
 the current bio, so the way is sane to me.

The other consumer of merge_bio is submit_extent_page(), where it's OK
to return errors and callers are ready to handle them.

> 
> It's IMO better to push up the BUG_ON error handling only one caller at
> a time. That way it's easier to review the callgraph and call paths.

merge_bio() is a wrapper for tree->ops->merge_bio_hook(), which is the
same thing with btrfs_merge_bio_hook().  It makes no sense if we just
look at BUG_ON in btrfs_merge_bio_hook but keep BUG_ON() in merge_bio().

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize
  2016-05-14 10:30   ` Qu Wenruo
@ 2016-05-16 18:01     ` Liu Bo
  2016-05-17  9:39       ` David Sterba
  0 siblings, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-05-16 18:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Sat, May 14, 2016 at 06:30:52PM +0800, Qu Wenruo wrote:
> Hi Liu,
> 
> Thanks for your patch first.
> 
> On 05/14/2016 08:06 AM, Liu Bo wrote:
> > Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
> > via alloc_extent_buffer().  An unaligned eb can have more pages than it
> > should have, which ends up extent buffer's leak or some corrupted content
> > in extent buffer.
> > 
> > This adds a warning to let us quickly know what was happening.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/extent_io.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index d247fc0..e601e0f 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -4868,6 +4868,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
> >  	int uptodate = 1;
> >  	int ret;
> > 
> > +	WARN_ONCE(!IS_ALIGNED(start, fs_info->tree_root->sectorsize),
> > +		  KERN_WARNING "eb->start(%llu) is not aligned to root->sectorsize(%u)\n",
> > +		  start, fs_info->tree_root->sectorsize);
> > +
> 
> IMHO this is a quite big problem. As almost all other things rely on the
> assumption that extent buffer are at least sectorsize aligned.
> 

It won't cause too much trouble as reading eb's page can prevent btrfs
using this eb.

> What about warning and returning NULL? WARN_ONCE() only won't info user
> quick enough.

I'm OK with warning, but I just realized that warning doesn't show which
filesystem has problems, so btrfs_crit and -EINVAL is preferable.

> 
> BTW, after a quick glance into __alloc_extent_buffer(), it seems that we
> didn't check the return pointer of kmem_cache_zalloc(), since you're fixing
> things around that code, would you mind to fix it too?

It's not necessary to do that since it's using __GFP_NOFAIL.

Thanks,

-liubo

> 
> Thanks,
> Qu
> 
> >  	eb = find_extent_buffer(fs_info, start);
> >  	if (eb)
> >  		return eb;
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize
  2016-05-16 18:01     ` Liu Bo
@ 2016-05-17  9:39       ` David Sterba
  2016-05-17 17:38         ` Liu Bo
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2016-05-17  9:39 UTC (permalink / raw)
  To: Liu Bo; +Cc: Qu Wenruo, linux-btrfs

On Mon, May 16, 2016 at 11:01:41AM -0700, Liu Bo wrote:
> On Sat, May 14, 2016 at 06:30:52PM +0800, Qu Wenruo wrote:
> > Hi Liu,
> > 
> > Thanks for your patch first.
> > 
> > On 05/14/2016 08:06 AM, Liu Bo wrote:
> > > Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
> > > via alloc_extent_buffer().  An unaligned eb can have more pages than it
> > > should have, which ends up extent buffer's leak or some corrupted content
> > > in extent buffer.
> > > 
> > > This adds a warning to let us quickly know what was happening.
> > > 
> > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > ---
> > >  fs/btrfs/extent_io.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > index d247fc0..e601e0f 100644
> > > --- a/fs/btrfs/extent_io.c
> > > +++ b/fs/btrfs/extent_io.c
> > > @@ -4868,6 +4868,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
> > >  	int uptodate = 1;
> > >  	int ret;
> > > 
> > > +	WARN_ONCE(!IS_ALIGNED(start, fs_info->tree_root->sectorsize),
> > > +		  KERN_WARNING "eb->start(%llu) is not aligned to root->sectorsize(%u)\n",
> > > +		  start, fs_info->tree_root->sectorsize);
> > > +
> > 
> > IMHO this is a quite big problem. As almost all other things rely on the
> > assumption that extent buffer are at least sectorsize aligned.
> 
> It won't cause too much trouble as reading eb's page can prevent btrfs
> using this eb.
> 
> > What about warning and returning NULL? WARN_ONCE() only won't info user
> > quick enough.
> 
> I'm OK with warning, but I just realized that warning doesn't show which
> filesystem has problems, so btrfs_crit and -EINVAL is preferable.

NULL means it's allocation error, so please convert it to IS_ERR and
return more fine grained errors so we can distinguish the problems. An
unaligned 'start' almost always means a corruption or other problem in
the callers of alloc_extent_buffer().

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio
  2016-05-16 17:24     ` Liu Bo
@ 2016-05-17  9:55       ` David Sterba
  2016-05-17 17:30         ` Liu Bo
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2016-05-17  9:55 UTC (permalink / raw)
  To: Liu Bo; +Cc: dsterba, linux-btrfs

On Mon, May 16, 2016 at 10:24:01AM -0700, Liu Bo wrote:
> On Mon, May 16, 2016 at 10:44:38AM +0200, David Sterba wrote:
> > On Fri, May 13, 2016 at 05:07:00PM -0700, Liu Bo wrote:
> > > We have two BUG_ON in merge_bio, but since it can gracefully return errors
> > > to callers, use WARN_ONCE to give error information and don't leave a
> > > possible panic.
> > > 
> > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > ---
> > >  fs/btrfs/extent_io.c | 1 -
> > >  fs/btrfs/inode.c     | 6 ++++--
> > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > index e601e0f..99286d1 100644
> > > --- a/fs/btrfs/extent_io.c
> > > +++ b/fs/btrfs/extent_io.c
> > > @@ -2746,7 +2746,6 @@ static int merge_bio(int rw, struct extent_io_tree *tree, struct page *page,
> > >  	if (tree->ops && tree->ops->merge_bio_hook)
> > >  		ret = tree->ops->merge_bio_hook(rw, page, offset, size, bio,
> > >  						bio_flags);
> > > -	BUG_ON(ret < 0);
> > >  	return ret;
> > >  
> > >  }
> > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > > index 5874562..3a989e3 100644
> > > --- a/fs/btrfs/inode.c
> > > +++ b/fs/btrfs/inode.c
> > > @@ -1827,8 +1827,10 @@ int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
> > >  	map_length = length;
> > >  	ret = btrfs_map_block(root->fs_info, rw, logical,
> > >  			      &map_length, NULL, 0);
> > > -	/* Will always return 0 with map_multi == NULL */
> > > -	BUG_ON(ret < 0);
> > > +	if (ret < 0) {
> > > +		WARN_ONCE(ret < 0, KERN_ERR "ret = %d\n", ret);
> > 
> > btrfs_map_block is quite verbose about the errors so it's not needed to
> > print it here.
> 
> OK.
> 
> > 
> > Otherwise I'm not sure if all paths that go through the merge hook
> > handle errors, eg. in btrfs_submit_compressed_read or _write. Some code
> > is skipped if merge hook returns nonzero. But, the code expects either 0
> > or 1, and when the BUG_ON(ret < 0) is removed suddenly the 'ret < 0' can
> > be returned. Unexpected.
> 
> Right now btrfs_merge_bio_hook() only returns 1 or 0 and marks (ret < 0) as
>  BUG_ON.

IOW, "ret < 0" never leaves btrfs_merge_bio_hook in current code.

> But compress code is ready to handle the error,
> 
> btrfs_submit_compressed_read/write() {
> 	...
> 	ret = merge_bio_hook()
> 	if (ret || bio_add_page()) {

This code expects 0 or 1, < 0 is not expected here and has to be
considered after the change in merge_bio_hook.

> 		ret = btrfs_map_bio();
> 		BUG_ON(ret);
> 		...
> 	}
> 	...
> }
> 
> So ret < 0 is handled, if there's any errors from merge_bio_hook(), it submits
>  the current bio, so the way is sane to me.

Well, that's what I don't call 'handled'. 'ret < 0' will get to the 'if'
but the question is whether the code really expects to see < 0 there and
if it does the right thing then.

> The other consumer of merge_bio is submit_extent_page(), where it's OK
> to return errors and callers are ready to handle them.

Same argument.

'if (... || merge_bio() || ...)'

returns 1 "we can merge, continue with submit_bio" 
returns <0 "there was an error, we cannot continue" ... yet will still
try to do submit_bio.

> > It's IMO better to push up the BUG_ON error handling only one caller at
> > a time. That way it's easier to review the callgraph and call paths.
> 
> merge_bio() is a wrapper for tree->ops->merge_bio_hook(), which is the
> same thing with btrfs_merge_bio_hook().  It makes no sense if we just
> look at BUG_ON in btrfs_merge_bio_hook but keep BUG_ON() in merge_bio().

So, looking at the code again: the BUG_ON in btrfs_merge_bio_hook can be
replaced by return ret (no warning needed IMO).

If merge_bio gets rid of the BUG_ON, the calles must explicitly handle
'ret < 0' unless it's provably not a problem.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio
  2016-05-17  9:55       ` David Sterba
@ 2016-05-17 17:30         ` Liu Bo
  2016-05-18 13:54           ` David Sterba
  0 siblings, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-05-17 17:30 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On Tue, May 17, 2016 at 11:55:19AM +0200, David Sterba wrote:
> On Mon, May 16, 2016 at 10:24:01AM -0700, Liu Bo wrote:
> > On Mon, May 16, 2016 at 10:44:38AM +0200, David Sterba wrote:
> > > On Fri, May 13, 2016 at 05:07:00PM -0700, Liu Bo wrote:
> > > > We have two BUG_ON in merge_bio, but since it can gracefully return errors
> > > > to callers, use WARN_ONCE to give error information and don't leave a
> > > > possible panic.
> > > > 
> > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > > ---
> > > >  fs/btrfs/extent_io.c | 1 -
> > > >  fs/btrfs/inode.c     | 6 ++++--
> > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > > index e601e0f..99286d1 100644
> > > > --- a/fs/btrfs/extent_io.c
> > > > +++ b/fs/btrfs/extent_io.c
> > > > @@ -2746,7 +2746,6 @@ static int merge_bio(int rw, struct extent_io_tree *tree, struct page *page,
> > > >  	if (tree->ops && tree->ops->merge_bio_hook)
> > > >  		ret = tree->ops->merge_bio_hook(rw, page, offset, size, bio,
> > > >  						bio_flags);
> > > > -	BUG_ON(ret < 0);
> > > >  	return ret;
> > > >  
> > > >  }
> > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > > > index 5874562..3a989e3 100644
> > > > --- a/fs/btrfs/inode.c
> > > > +++ b/fs/btrfs/inode.c
> > > > @@ -1827,8 +1827,10 @@ int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
> > > >  	map_length = length;
> > > >  	ret = btrfs_map_block(root->fs_info, rw, logical,
> > > >  			      &map_length, NULL, 0);
> > > > -	/* Will always return 0 with map_multi == NULL */
> > > > -	BUG_ON(ret < 0);
> > > > +	if (ret < 0) {
> > > > +		WARN_ONCE(ret < 0, KERN_ERR "ret = %d\n", ret);
> > > 
> > > btrfs_map_block is quite verbose about the errors so it's not needed to
> > > print it here.
> > 
> > OK.
> > 
> > > 
> > > Otherwise I'm not sure if all paths that go through the merge hook
> > > handle errors, eg. in btrfs_submit_compressed_read or _write. Some code
> > > is skipped if merge hook returns nonzero. But, the code expects either 0
> > > or 1, and when the BUG_ON(ret < 0) is removed suddenly the 'ret < 0' can
> > > be returned. Unexpected.
> > 
> > Right now btrfs_merge_bio_hook() only returns 1 or 0 and marks (ret < 0) as
> >  BUG_ON.
> 
> IOW, "ret < 0" never leaves btrfs_merge_bio_hook in current code.

OK.

> 
> > But compress code is ready to handle the error,
> > 
> > btrfs_submit_compressed_read/write() {
> > 	...
> > 	ret = merge_bio_hook()
> > 	if (ret || bio_add_page()) {
> 
> This code expects 0 or 1, < 0 is not expected here and has to be
> considered after the change in merge_bio_hook.

OK.

> 
> > 		ret = btrfs_map_bio();
> > 		BUG_ON(ret);
> > 		...
> > 	}
> > 	...
> > }
> > 
> > So ret < 0 is handled, if there's any errors from merge_bio_hook(), it submits
> >  the current bio, so the way is sane to me.
> 
> Well, that's what I don't call 'handled'. 'ret < 0' will get to the 'if'
> but the question is whether the code really expects to see < 0 there and
> if it does the right thing then.

OK.

> 
> > The other consumer of merge_bio is submit_extent_page(), where it's OK
> > to return errors and callers are ready to handle them.
> 
> Same argument.
> 
> 'if (... || merge_bio() || ...)'
> 
> returns 1 "we can merge, continue with submit_bio" 

return 1 means "we _cannot_ merge" ;-)

> returns <0 "there was an error, we cannot continue" ... yet will still
> try to do submit_bio.
> 
> > > It's IMO better to push up the BUG_ON error handling only one caller at
> > > a time. That way it's easier to review the callgraph and call paths.
> > 
> > merge_bio() is a wrapper for tree->ops->merge_bio_hook(), which is the
> > same thing with btrfs_merge_bio_hook().  It makes no sense if we just
> > look at BUG_ON in btrfs_merge_bio_hook but keep BUG_ON() in merge_bio().
> 
> So, looking at the code again: the BUG_ON in btrfs_merge_bio_hook can be
> replaced by return ret (no warning needed IMO).

OK.

> 
> If merge_bio gets rid of the BUG_ON, the calles must explicitly handle
> 'ret < 0' unless it's provably not a problem.

If merge_bio() returns < 0, then it must be __btrfs_map_block() returns < 0,
so even if we continue with submiting this bio, it'd fail at
__btrfs_map_block() again because merge_bio and submit_bio are using the
same bio.  Because of that we don't bother to submit it, instead we can
just return in case of (ret < 0), this applies to both
submit_extent_page() and btrfs_submit_compressed_read/write().

What do you think?

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize
  2016-05-17  9:39       ` David Sterba
@ 2016-05-17 17:38         ` Liu Bo
  0 siblings, 0 replies; 32+ messages in thread
From: Liu Bo @ 2016-05-17 17:38 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Tue, May 17, 2016 at 11:39:52AM +0200, David Sterba wrote:
> On Mon, May 16, 2016 at 11:01:41AM -0700, Liu Bo wrote:
> > On Sat, May 14, 2016 at 06:30:52PM +0800, Qu Wenruo wrote:
> > > Hi Liu,
> > > 
> > > Thanks for your patch first.
> > > 
> > > On 05/14/2016 08:06 AM, Liu Bo wrote:
> > > > Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
> > > > via alloc_extent_buffer().  An unaligned eb can have more pages than it
> > > > should have, which ends up extent buffer's leak or some corrupted content
> > > > in extent buffer.
> > > > 
> > > > This adds a warning to let us quickly know what was happening.
> > > > 
> > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > > ---
> > > >  fs/btrfs/extent_io.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > > 
> > > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > > index d247fc0..e601e0f 100644
> > > > --- a/fs/btrfs/extent_io.c
> > > > +++ b/fs/btrfs/extent_io.c
> > > > @@ -4868,6 +4868,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
> > > >  	int uptodate = 1;
> > > >  	int ret;
> > > > 
> > > > +	WARN_ONCE(!IS_ALIGNED(start, fs_info->tree_root->sectorsize),
> > > > +		  KERN_WARNING "eb->start(%llu) is not aligned to root->sectorsize(%u)\n",
> > > > +		  start, fs_info->tree_root->sectorsize);
> > > > +
> > > 
> > > IMHO this is a quite big problem. As almost all other things rely on the
> > > assumption that extent buffer are at least sectorsize aligned.
> > 
> > It won't cause too much trouble as reading eb's page can prevent btrfs
> > using this eb.
> > 
> > > What about warning and returning NULL? WARN_ONCE() only won't info user
> > > quick enough.
> > 
> > I'm OK with warning, but I just realized that warning doesn't show which
> > filesystem has problems, so btrfs_crit and -EINVAL is preferable.
> 
> NULL means it's allocation error, so please convert it to IS_ERR and
> return more fine grained errors so we can distinguish the problems. An
> unaligned 'start' almost always means a corruption or other problem in
> the callers of alloc_extent_buffer().

OK, sounds good.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio
  2016-05-17 17:30         ` Liu Bo
@ 2016-05-18 13:54           ` David Sterba
  0 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2016-05-18 13:54 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Tue, May 17, 2016 at 10:30:47AM -0700, Liu Bo wrote:
> > If merge_bio gets rid of the BUG_ON, the calles must explicitly handle
> > 'ret < 0' unless it's provably not a problem.
> 
> If merge_bio() returns < 0, then it must be __btrfs_map_block() returns < 0,
> so even if we continue with submiting this bio, it'd fail at
> __btrfs_map_block() again because merge_bio and submit_bio are using the
> same bio.  Because of that we don't bother to submit it, instead we can
> just return in case of (ret < 0), this applies to both
> submit_extent_page() and btrfs_submit_compressed_read/write().
> 
> What do you think?

Makes sense to me. All the partial processing (allocations and bios)
need to be cleaned up, which does not seem trivial from the first look.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/7] Btrfs: fix eb memory leak due to readpage failure
  2016-05-14  0:07 ` [PATCH 6/7] Btrfs: fix eb memory leak due to readpage failure Liu Bo
@ 2016-05-18 19:38   ` Josef Bacik
  0 siblings, 0 replies; 32+ messages in thread
From: Josef Bacik @ 2016-05-18 19:38 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs

On 05/13/2016 08:07 PM, Liu Bo wrote:
> eb->io_pages is set in read_extent_buffer_pages().
>
> In case of readpage failure, for pages that have been added to bio,
> it calls bio_endio and later readpage_io_failed_hook() does the work.
>
> When this eb's page (couldn't be the 1st page) fails to add itself to bio
> due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio,
>  and ends up with a memory leak eventually.
>
> This adds the 'atomic_dec(&eb->io_pages)' to the readpage error handling.

Wait why can't this be done in
>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/extent_io.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 99286d1..2327200 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3069,6 +3069,30 @@ static int __do_readpage(struct extent_io_tree *tree,
>  			*bio_flags = this_bio_flag;
>  		} else {
>  			SetPageError(page);
> +			/*
> +			 * Only metadata io request has this issue, for data it
> +			 * just unlocks extent and releases page's lock.
> +			 *
> +			 * eb->io_pages is set in read_extent_buffer_pages().
> +			 *
> +			 * When this eb's page fails to add itself to bio,
> +			 * it cannot decrease eb->io_pages via bio_endio, and
> +			 * ends up with extent_buffer_under_io() always being
> +			 * true, because of that, eb won't be freed and we have
> +			 * a memory leak eventually.
> +			 *
> +			 * Here we still hold this page's lock, and other tasks
> +			 * who're also reading this eb are blocked.
> +			 */
> +			if (rw & REQ_META) {
> +				struct extent_buffer *eb;
> +
> +				WARN_ON_ONCE(!PagePrivate(page));
> +				eb = (struct extent_buffer *)page->private;
> +
> +				WARN_ON_ONCE(atomic_read(&eb->io_pages) < 1);
> +				atomic_dec(&eb->io_pages);
> +			}
>  			unlock_extent(tree, cur, cur + iosize - 1);
>  		}
>  		cur = cur + iosize;
>

This isn't the right way to do this.  It looks like we don't propagate 
up errors from __do_readpage, which we need to in order to clean up 
properly.  So do that and then change the error stuff to decrement the 
io_pages for the remaining, you can see write_one_eb for how to deal 
with that properly.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2] Btrfs: remove BUG() in raid56
  2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
                   ` (7 preceding siblings ...)
  2016-05-15 14:19 ` Holger Hoffstätte
@ 2016-06-30  0:57 ` Liu Bo
  2016-07-26 16:58   ` David Sterba
  2016-07-27 18:56   ` [PATCH v3] " Liu Bo
  8 siblings, 2 replies; 32+ messages in thread
From: Liu Bo @ 2016-06-30  0:57 UTC (permalink / raw)
  To: linux-btrfs

This BUG() has been triggered by a fuzz testing image, but in fact
btrfs can handle this gracefully by returning -EIO.

Thus, use btrfs_warn to give us more debugging information than a 
single BUG() and return error properly.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: - use btrfs_warn with more debugging information instead of WARN_ONCE.
    - change the patch title.

 fs/btrfs/raid56.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index f8b6d41..5f4712c 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2139,7 +2139,10 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
 	if (rbio->faila == -1) {
-		BUG();
+		btrfs_warn(root->fs_info,
+	"rbio->faila is -1: (bio has logical %llu len %llu, bbio has map_type %llu)",
+			   (u64)bio->bi_iter.bi_sector << 9,
+			   (u64)bio->bi_iter.bi_size, bbio->map_type);
 		if (generic_io)
 			btrfs_put_bbio(bbio);
 		kfree(rbio);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] Btrfs: remove BUG() in raid56
  2016-06-30  0:57 ` [PATCH v2] Btrfs: remove BUG() " Liu Bo
@ 2016-07-26 16:58   ` David Sterba
  2016-07-27  5:11     ` Liu Bo
  2016-07-27 18:56   ` [PATCH v3] " Liu Bo
  1 sibling, 1 reply; 32+ messages in thread
From: David Sterba @ 2016-07-26 16:58 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Wed, Jun 29, 2016 at 05:57:56PM -0700, Liu Bo wrote:
> This BUG() has been triggered by a fuzz testing image, but in fact
> btrfs can handle this gracefully by returning -EIO.
> 
> Thus, use btrfs_warn to give us more debugging information than a 
> single BUG() and return error properly.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> v2: - use btrfs_warn with more debugging information instead of WARN_ONCE.
>     - change the patch title.
> 
>  fs/btrfs/raid56.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index f8b6d41..5f4712c 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -2139,7 +2139,10 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
>  
>  	rbio->faila = find_logical_bio_stripe(rbio, bio);
>  	if (rbio->faila == -1) {
> -		BUG();
> +		btrfs_warn(root->fs_info,
> +	"rbio->faila is -1: (bio has logical %llu len %llu, bbio has map_type %llu)",

That's rather cryptic message for a casual user, can it be prepended by
a short summary what actually happened? Like "bad stripe for parity" or
whatever seems more appropriate to you. Also the changelog could
describe the error condition.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] Btrfs: remove BUG() in raid56
  2016-07-26 16:58   ` David Sterba
@ 2016-07-27  5:11     ` Liu Bo
  0 siblings, 0 replies; 32+ messages in thread
From: Liu Bo @ 2016-07-27  5:11 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs

On Tue, Jul 26, 2016 at 06:58:42PM +0200, David Sterba wrote:
> On Wed, Jun 29, 2016 at 05:57:56PM -0700, Liu Bo wrote:
> > This BUG() has been triggered by a fuzz testing image, but in fact
> > btrfs can handle this gracefully by returning -EIO.
> > 
> > Thus, use btrfs_warn to give us more debugging information than a 
> > single BUG() and return error properly.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> > v2: - use btrfs_warn with more debugging information instead of WARN_ONCE.
> >     - change the patch title.
> > 
> >  fs/btrfs/raid56.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > index f8b6d41..5f4712c 100644
> > --- a/fs/btrfs/raid56.c
> > +++ b/fs/btrfs/raid56.c
> > @@ -2139,7 +2139,10 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
> >  
> >  	rbio->faila = find_logical_bio_stripe(rbio, bio);
> >  	if (rbio->faila == -1) {
> > -		BUG();
> > +		btrfs_warn(root->fs_info,
> > +	"rbio->faila is -1: (bio has logical %llu len %llu, bbio has map_type %llu)",
> 
> That's rather cryptic message for a casual user, can it be prepended by
> a short summary what actually happened? Like "bad stripe for parity" or
> whatever seems more appropriate to you. Also the changelog could
> describe the error condition.

Good point, I'll update it.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3] Btrfs: remove BUG() in raid56
  2016-06-30  0:57 ` [PATCH v2] Btrfs: remove BUG() " Liu Bo
  2016-07-26 16:58   ` David Sterba
@ 2016-07-27 18:56   ` Liu Bo
  2016-07-29 16:53     ` David Sterba
  2016-07-29 17:57     ` [PATCH v4] " Liu Bo
  1 sibling, 2 replies; 32+ messages in thread
From: Liu Bo @ 2016-07-27 18:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This BUG() has been triggered by a fuzz testing image, which contains
an invalid chunk type, ie. a single stripe chunk has the raid6 type.

Btrfs can handle this gracefully by returning -EIO, so besides using
btrfs_warn to give us more debugging information rather than a single
BUG(), we can return error properly.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: use btrfs_warn with more debugging information instead of WARN_ONCE.
v3: - give a short summary about what happens when the error occurs.
    - add a ASSERT() which only takes effect for developers.

 fs/btrfs/raid56.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index f8b6d41..6f8addf 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2139,7 +2139,11 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
 	if (rbio->faila == -1) {
-		BUG();
+		btrfs_warn(root->fs_info,
+	"%s could not find the bad stripe in raid56 so that we cannot recover any more (bio has logical %llu len %llu, bbio has map_type %llu)",
+			   __func__, (u64)bio->bi_iter.bi_sector << 9,
+			   (u64)bio->bi_iter.bi_size, bbio->map_type);
+		ASSERT(0);
 		if (generic_io)
 			btrfs_put_bbio(bbio);
 		kfree(rbio);
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3] Btrfs: remove BUG() in raid56
  2016-07-27 18:56   ` [PATCH v3] " Liu Bo
@ 2016-07-29 16:53     ` David Sterba
  2016-07-29 17:57     ` [PATCH v4] " Liu Bo
  1 sibling, 0 replies; 32+ messages in thread
From: David Sterba @ 2016-07-29 16:53 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Wed, Jul 27, 2016 at 11:56:47AM -0700, Liu Bo wrote:
> This BUG() has been triggered by a fuzz testing image, which contains
> an invalid chunk type, ie. a single stripe chunk has the raid6 type.
> 
> Btrfs can handle this gracefully by returning -EIO, so besides using
> btrfs_warn to give us more debugging information rather than a single
> BUG(), we can return error properly.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> v2: use btrfs_warn with more debugging information instead of WARN_ONCE.
> v3: - give a short summary about what happens when the error occurs.
>     - add a ASSERT() which only takes effect for developers.
> 
>  fs/btrfs/raid56.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index f8b6d41..6f8addf 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -2139,7 +2139,11 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
>  
>  	rbio->faila = find_logical_bio_stripe(rbio, bio);
>  	if (rbio->faila == -1) {
> -		BUG();
> +		btrfs_warn(root->fs_info,
> +	"%s could not find the bad stripe in raid56 so that we cannot recover any more (bio has logical %llu len %llu, bbio has map_type %llu)",
> +			   __func__, (u64)bio->bi_iter.bi_sector << 9,
> +			   (u64)bio->bi_iter.bi_size, bbio->map_type);
> +		ASSERT(0);

Why is this here? Say we'll test a fuzzed image, this would stop any
further testing. The message and EIO should be fine.

>  		if (generic_io)
>  			btrfs_put_bbio(bbio);
>  		kfree(rbio);
> -- 
> 2.5.5
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4] Btrfs: remove BUG() in raid56
  2016-07-27 18:56   ` [PATCH v3] " Liu Bo
  2016-07-29 16:53     ` David Sterba
@ 2016-07-29 17:57     ` Liu Bo
  2016-08-24 12:11       ` David Sterba
  1 sibling, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-07-29 17:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This BUG() has been triggered by a fuzz testing image, which contains
an invalid chunk type, ie. a single stripe chunk has the raid6 type.

Btrfs can handle this gracefully by returning -EIO, so besides using
btrfs_warn to give us more debugging information rather than a single
BUG(), we can return error properly.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: use btrfs_warn with more debugging information instead of WARN_ONCE.
v3: give a short summary about what happens when the error occurs and
    add a ASSERT(0).
v4: remove ASSERT(0).

 fs/btrfs/raid56.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index f8b6d41..bb0c14f 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2139,7 +2139,10 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
 	if (rbio->faila == -1) {
-		BUG();
+		btrfs_warn(root->fs_info,
+	"%s could not find the bad stripe in raid56 so that we cannot recover any more (bio has logical %llu len %llu, bbio has map_type %llu)",
+			   __func__, (u64)bio->bi_iter.bi_sector << 9,
+			   (u64)bio->bi_iter.bi_size, bbio->map_type);
 		if (generic_io)
 			btrfs_put_bbio(bbio);
 		kfree(rbio);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v4] Btrfs: remove BUG() in raid56
  2016-07-29 17:57     ` [PATCH v4] " Liu Bo
@ 2016-08-24 12:11       ` David Sterba
  0 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2016-08-24 12:11 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, David Sterba

On Fri, Jul 29, 2016 at 10:57:55AM -0700, Liu Bo wrote:
> This BUG() has been triggered by a fuzz testing image, which contains
> an invalid chunk type, ie. a single stripe chunk has the raid6 type.
> 
> Btrfs can handle this gracefully by returning -EIO, so besides using
> btrfs_warn to give us more debugging information rather than a single
> BUG(), we can return error properly.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/7] Btrfs: fix memory leak due to invalid btree height
  2016-05-14  0:07 ` [PATCH 7/7] Btrfs: fix memory leak due to invalid btree height Liu Bo
@ 2016-09-06 16:50   ` David Sterba
  2016-09-06 22:04     ` Liu Bo
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2016-09-06 16:50 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Fri, May 13, 2016 at 05:07:02PM -0700, Liu Bo wrote:
> Thanks to fuzz testing, we can have invalid btree root node height.

Shouldn't we do this kind of sanity checks earlier? Not at the search
slot time but when it's read from disk. The check that you're adding can
stay, but without the early check we could hit it very often thus making
it very noisy.

> Btrfs limits btree height to 7 and if the given height is 9, then btrfs
> will have problems in both releasing root node's lock and freeing the node.


> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/ctree.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index ec7928a..3fccbcc 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2756,6 +2756,13 @@ again:
>  			}
>  		}
>  	}
> +	if (level > BTRFS_MAX_LEVEL - 1 || level < 0) {
> +		WARN_ONCE(1, KERN_WARNING "Invalid btree height %d\n", level);
> +		if (!p->skip_locking)
> +			btrfs_tree_unlock_rw(b, root_lock);
> +		free_extent_buffer(b);
> +		return -EINVAL;
> +	}
>  	p->nodes[level] = b;
>  	if (!p->skip_locking)
>  		p->locks[level] = root_lock;
> -- 
> 2.5.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/7] Btrfs: fix memory leak due to invalid btree height
  2016-09-06 16:50   ` David Sterba
@ 2016-09-06 22:04     ` Liu Bo
  0 siblings, 0 replies; 32+ messages in thread
From: Liu Bo @ 2016-09-06 22:04 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs

On Tue, Sep 06, 2016 at 06:50:19PM +0200, David Sterba wrote:
> On Fri, May 13, 2016 at 05:07:02PM -0700, Liu Bo wrote:
> > Thanks to fuzz testing, we can have invalid btree root node height.
> 
> Shouldn't we do this kind of sanity checks earlier? Not at the search
> slot time but when it's read from disk. The check that you're adding can
> stay, but without the early check we could hit it very often thus making
> it very noisy.

We do have such an early check when it's read from disk
(btree_readpage_end_io_hook) and this can protect us from 99.9% cases,
the only corner case is that the fuzz image changes our chunk root node
to superblock bytenr, so we firstly reads superblock into a dummy eb, and when
we get to read chunk root, we firstly search eb tree and find one eb
matching the bytenr, then we take this invalid eb to do
btrfs_search_slot() and we come cross this surprise.

Anyway, this patch was made before I found we could actually free
superblock's eb immediately after use.  Now with freeing that eb I don't
think we can have the above problem.

Thanks,

-liubo

> 
> > Btrfs limits btree height to 7 and if the given height is 9, then btrfs
> > will have problems in both releasing root node's lock and freeing the node.
> 
> 
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/ctree.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index ec7928a..3fccbcc 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -2756,6 +2756,13 @@ again:
> >  			}
> >  		}
> >  	}
> > +	if (level > BTRFS_MAX_LEVEL - 1 || level < 0) {
> > +		WARN_ONCE(1, KERN_WARNING "Invalid btree height %d\n", level);
> > +		if (!p->skip_locking)
> > +			btrfs_tree_unlock_rw(b, root_lock);
> > +		free_extent_buffer(b);
> > +		return -EINVAL;
> > +	}
> >  	p->nodes[level] = b;
> >  	if (!p->skip_locking)
> >  		p->locks[level] = root_lock;
> > -- 
> > 2.5.5
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56
  2016-05-16  8:32   ` David Sterba
@ 2016-10-12 15:06     ` David Sterba
  2016-10-12 19:14       ` Liu Bo
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2016-10-12 15:06 UTC (permalink / raw)
  To: dsterba, Holger Hoffstätte, linux-btrfs

On Mon, May 16, 2016 at 10:32:48AM +0200, David Sterba wrote:
> On Sun, May 15, 2016 at 04:19:28PM +0200, Holger Hoffstätte wrote:
> > On 05/14/16 02:06, Liu Bo wrote:
> > > This BUG() has been triggered by a fuzz testing image, but in fact
> > > btrfs can handle this gracefully by returning -EIO.
> > > 
> > > Thus, use WARN_ONCE for warning purpose and don't leave a possible
> > > kernel panic.
> > > 
> > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > ---
> > >  fs/btrfs/raid56.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > > index 0b7792e..863f7fe 100644
> > > --- a/fs/btrfs/raid56.c
> > > +++ b/fs/btrfs/raid56.c
> > > @@ -2139,7 +2139,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
> > >  
> > >  	rbio->faila = find_logical_bio_stripe(rbio, bio);
> > >  	if (rbio->faila == -1) {
> > > -		BUG();
> > > +		WARN_ONCE(1, KERN_WARNING "rbio->faila is -1\n");
> > 
> > I'm generally in favor of not BUGing out for no good reason, but what
> > is e.g. an admin (or user) supposed to do when he sees this message?
> > Same for the other rather cryptic WARNs - they contain no actionable
> > information, and are most likely going to be ignored as "debug spam".
> > IMHO things that can be ignored can be deleted.
> 
> Agreed, the way this patchset repalces BUG on is very confusing.
> WARN_ONCE is a global state, the message does not even print on which
> filesystem the error happened. The only way to reset the state is to
> unload the module.
> 
> This should be handled as a corruption, no matter if it's fuzzed or not,
> report more details about what is corrupted or what was expected.

Looking again at the patch, it compares an inode property (a range to
cow) against a global filesystem size, stored in superblock. This does
not IMO belong here, either we'd have to do such check everywhere (and
expect that it could really happen) or it should be removed completely.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56
  2016-10-12 15:06     ` David Sterba
@ 2016-10-12 19:14       ` Liu Bo
  0 siblings, 0 replies; 32+ messages in thread
From: Liu Bo @ 2016-10-12 19:14 UTC (permalink / raw)
  To: dsterba; +Cc: Holger Hoffstätte, linux-btrfs

On Wed, Oct 12, 2016 at 05:06:55PM +0200, David Sterba wrote:
> On Mon, May 16, 2016 at 10:32:48AM +0200, David Sterba wrote:
> > On Sun, May 15, 2016 at 04:19:28PM +0200, Holger Hoffstätte wrote:
> > > On 05/14/16 02:06, Liu Bo wrote:
> > > > This BUG() has been triggered by a fuzz testing image, but in fact
> > > > btrfs can handle this gracefully by returning -EIO.
> > > > 
> > > > Thus, use WARN_ONCE for warning purpose and don't leave a possible
> > > > kernel panic.
> > > > 
> > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > > ---
> > > >  fs/btrfs/raid56.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > > > index 0b7792e..863f7fe 100644
> > > > --- a/fs/btrfs/raid56.c
> > > > +++ b/fs/btrfs/raid56.c
> > > > @@ -2139,7 +2139,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,
> > > >  
> > > >  	rbio->faila = find_logical_bio_stripe(rbio, bio);
> > > >  	if (rbio->faila == -1) {
> > > > -		BUG();
> > > > +		WARN_ONCE(1, KERN_WARNING "rbio->faila is -1\n");
> > > 
> > > I'm generally in favor of not BUGing out for no good reason, but what
> > > is e.g. an admin (or user) supposed to do when he sees this message?
> > > Same for the other rather cryptic WARNs - they contain no actionable
> > > information, and are most likely going to be ignored as "debug spam".
> > > IMHO things that can be ignored can be deleted.
> > 
> > Agreed, the way this patchset repalces BUG on is very confusing.
> > WARN_ONCE is a global state, the message does not even print on which
> > filesystem the error happened. The only way to reset the state is to
> > unload the module.
> > 
> > This should be handled as a corruption, no matter if it's fuzzed or not,
> > report more details about what is corrupted or what was expected.
> 
> Looking again at the patch, it compares an inode property (a range to
> cow) against a global filesystem size, stored in superblock. This does
> not IMO belong here, either we'd have to do such check everywhere (and
> expect that it could really happen) or it should be removed completely.

(Are we talking about "[PATCH 2/7] Btrfs: replace BUG_ON with WARN_ONCE in cow_file_range"?)

In theory we don't need to do such a check because we've gone through
the reservation part which ensures that we have enough space,
whether super::total_bytes is valid can be verified during the mount
stage.  I prefer to removing it.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-10-12 19:10 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-14  0:06 [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Liu Bo
2016-05-14  0:06 ` [PATCH 2/7] Btrfs: replace BUG_ON with WARN_ONCE in cow_file_range Liu Bo
2016-05-14  0:06 ` [PATCH 3/7] Btrfs: check if extent buffer is aligned to sectorsize Liu Bo
2016-05-14 10:30   ` Qu Wenruo
2016-05-16 18:01     ` Liu Bo
2016-05-17  9:39       ` David Sterba
2016-05-17 17:38         ` Liu Bo
2016-05-14  0:06 ` [PATCH 4/7] Btrfs: free sys_array eb as soon as possible Liu Bo
2016-05-16  8:45   ` David Sterba
2016-05-14  0:07 ` [PATCH 5/7] Btrfs: replace BUG_ON with WARN in merge_bio Liu Bo
2016-05-16  8:44   ` David Sterba
2016-05-16 17:24     ` Liu Bo
2016-05-17  9:55       ` David Sterba
2016-05-17 17:30         ` Liu Bo
2016-05-18 13:54           ` David Sterba
2016-05-14  0:07 ` [PATCH 6/7] Btrfs: fix eb memory leak due to readpage failure Liu Bo
2016-05-18 19:38   ` Josef Bacik
2016-05-14  0:07 ` [PATCH 7/7] Btrfs: fix memory leak due to invalid btree height Liu Bo
2016-09-06 16:50   ` David Sterba
2016-09-06 22:04     ` Liu Bo
2016-05-14 10:42 ` [PATCH 1/7] Btrfs: replace BUG() with WARN_ONCE in raid56 Qu Wenruo
2016-05-15 14:19 ` Holger Hoffstätte
2016-05-16  8:32   ` David Sterba
2016-10-12 15:06     ` David Sterba
2016-10-12 19:14       ` Liu Bo
2016-06-30  0:57 ` [PATCH v2] Btrfs: remove BUG() " Liu Bo
2016-07-26 16:58   ` David Sterba
2016-07-27  5:11     ` Liu Bo
2016-07-27 18:56   ` [PATCH v3] " Liu Bo
2016-07-29 16:53     ` David Sterba
2016-07-29 17:57     ` [PATCH v4] " Liu Bo
2016-08-24 12:11       ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.