* [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in
@ 2020-08-04  7:25 Qu Wenruo
  2020-08-04  7:25 ` [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Qu Wenruo @ 2020-08-04  7:25 UTC (permalink / raw)
  To: linux-btrfs

This is an attempt to remove the inode_need_compress() call in
compress_file_extent().

Since compress_file_extent() can race with inode ioctls or a bad
compression ratio, causing a NULL pointer dereference for @pages, it's
natural to try removing that inode_need_compress() call to eliminate
the race completely.

However, that's not so easy; we have the following problems:

- We still need to check @pages anyway
  That @pages check is for kcalloc() failure, so all we really gain is
  removing one level of indentation from the if (inode_need_compress())
  block.
  Everything else is still the same (in fact, even worse; see the
  problems below)

- Behavior change
  Before this change, every async_chunk does its own check of the
  INODE_NO_COMPRESS flag.
  If we hit any bad compression ratio, all incoming async_chunks will
  fall back to plain, uncompressed writes.

  But if we remove that inode_need_compress() check, we still try to
  compress, potentially wasting CPU time.

- Still a race between disabling compression and the NULL pointer
  dereference
  There is a hidden race, mostly exposed by the btrfs/071 test case: we
  have "compress_type = fs_info->compress_type", so we can still hit the
  case where compress_type is NONE (caused by remount -o nocompress),
  and then btrfs_compress_pages() will return -E2BIG without modifying
  @nr_pages.

  Later, when we clean up @pages, we try to access pages[i]->mapping,
  triggering the NULL pointer dereference (see the sketch below).

  This is addressed in the first patch, though.
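
For illustration, a minimal sketch of the failing sequence, condensed
from the snippets quoted in the first patch (not the literal kernel
code; unrelated arguments and error handling are omitted):

	pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);

	/* remount -o nocompress can change fs_info->compress_type here */
	compress_type = fs_info->compress_type;	/* now BTRFS_COMPRESS_NONE */

	/* hits the default: case, returns -E2BIG, @nr_pages unchanged */
	ret = btrfs_compress_pages(compress_type, inode->i_mapping, start,
				   pages, &nr_pages, &total_in,
				   &total_compressed);

	/* cleanup path: every pages[i] is still NULL */
	for (i = 0; i < nr_pages; i++) {
		WARN_ON(pages[i]->mapping);	/* NULL pointer dereference */
		put_page(pages[i]);
	}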

Changelog:
v2:
- Fix a bad commit merge, which mangled the commit message of the first
  patch (the content itself was still correct)

Qu Wenruo (2):
  btrfs: prevent NULL pointer dereference in compress_file_range() when
    btrfs_compress_pages() hits default case
  btrfs: inode: don't re-evaluate inode_need_compress() in
    compress_file_extent()

 fs/btrfs/compression.c |  10 ++--
 fs/btrfs/inode.c       | 106 ++++++++++++++++++++---------------------
 2 files changed, 59 insertions(+), 57 deletions(-)

-- 
2.28.0



* [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case
  2020-08-04  7:25 [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in Qu Wenruo
@ 2020-08-04  7:25 ` Qu Wenruo
  2020-08-11 18:57   ` Josef Bacik
  2021-03-09  1:02   ` Su Yue
  2020-08-04  7:25 ` [PATCH v2 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent() Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2020-08-04  7:25 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
When running btrfs/071 with inode_need_compress() removed from
compress_file_range(), we got the following crash:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
  RIP: 0010:compress_file_range+0x476/0x7b0 [btrfs]
  Call Trace:
   ? submit_compressed_extents+0x450/0x450 [btrfs]
   async_cow_start+0x16/0x40 [btrfs]
   btrfs_work_helper+0xf2/0x3e0 [btrfs]
   process_one_work+0x278/0x5e0
   worker_thread+0x55/0x400
   ? process_one_work+0x5e0/0x5e0
   kthread+0x168/0x190
   ? kthread_create_worker_on_cpu+0x70/0x70
   ret_from_fork+0x22/0x30
  ---[ end trace 65faf4eae941fa7d ]---

This happens even with the patch "btrfs: inode: fix NULL pointer
dereference if inode doesn't need compression" applied.

[CAUSE]
@pages is first allocated by kcalloc() in compress_file_extent():
                pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);

Then passed to btrfs_compress_pages() to be utilized there:

                ret = btrfs_compress_pages(...
                                           pages,
                                           &nr_pages,
                                           ...);

btrfs_compress_pages() fills in each page as output; in
zlib_compress_pages() we have:

                        pages[nr_pages] = out_page;
                        nr_pages++;

Normally this is completely fine, but there is a special case in
btrfs_compress_pages() itself:
        switch (type) {
        default:
                return -E2BIG;
        }

In this case we modify neither @pages nor @out_pages, leaving them
untouched. Then, when we clean up the pages, we can hit the NULL
pointer dereference again:

        if (pages) {
                for (i = 0; i < nr_pages; i++) {
                        WARN_ON(pages[i]->mapping);
                        put_page(pages[i]);
                }
        ...
        }

Since the pages[i] pointers are all initialized to zero and
btrfs_compress_pages() doesn't change them at all, accessing
pages[i]->mapping leads to a NULL pointer dereference.

This is not possible in the current kernel, as we check
inode_need_compress() before allocating the pages.
But if we're going to remove that inode_need_compress() call from
compress_file_extent(), it becomes a problem.

[FIX]
When btrfs_compress_pages() hits its default case, set @out_pages to 0
to prevent such a problem from happening.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1ab56a734e70..17c27edd804b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -115,10 +115,14 @@ static int compression_compress_pages(int type, struct list_head *ws,
 	case BTRFS_COMPRESS_NONE:
 	default:
 		/*
-		 * This can't happen, the type is validated several times
-		 * before we get here. As a sane fallback, return what the
-		 * callers will understand as 'no compression happened'.
+		 * This happens when compression races with a remount that
+		 * disables compression, while the caller doesn't call
+		 * inode_need_compress() to check if we really need to.
+		 *
+		 * Not a big deal, we just need to inform the caller that
+		 * we haven't allocated any pages yet.
 		 */
+		*out_pages = 0;
 		return -E2BIG;
 	}
 }
-- 
2.28.0



* [PATCH v2 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent()
  2020-08-04  7:25 [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in Qu Wenruo
  2020-08-04  7:25 ` [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case Qu Wenruo
@ 2020-08-04  7:25 ` Qu Wenruo
  2020-08-11 18:59   ` Josef Bacik
  2020-08-31 21:25 ` [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in David Sterba
  2021-04-19 19:34 ` David Sterba
  3 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2020-08-04  7:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

The extra inode_need_compress() call has already caused problems with
page releasing.
We had a hotfix to solve that problem; now it's time to fix it at the
root.

This patch removes the extra inode_need_compress() call to address the
problem.

This leads to the following behavior change:
- Worse bad-compression-ratio detection
  Now if one async_chunk hits a bad compression ratio, other
  async_chunks will still try to compress.

  Only newer delalloc ranges will then honor the new INODE_NO_COMPRESS
  flag.
  Although one could argue that, if only part of the file content
  compresses badly, we should still try the other ranges.

Despite the behavior change, the code cleanup isn't that elegant
either: kcalloc() can still fail for @pages, so from the cont: tag
onwards we still need to check @pages manually. Thus the cleanup
doesn't bring much benefit, just one level of indentation removed and
comments reformatted.

Suggested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 106 +++++++++++++++++++++++------------------------
 1 file changed, 52 insertions(+), 54 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 96064eb41d55..37d9cff0b1b8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -549,67 +549,65 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	ret = 0;
 
 	/*
-	 * we do compression for mount -o compress and when the
-	 * inode has not been flagged as nocompress.  This flag can
-	 * change at any time if we discover bad compression ratios.
+	 * We're in compress_file_range() because run_delalloc_range() has
+	 * already evaluated inode_need_compress().
+	 * So don't re-check it here, to avoid racing with ioctls.
+	 * This makes bad compression ratio detection less effective, as
+	 * we will only start skipping compression at the next
+	 * run_delalloc_range() call.
 	 */
-	if (inode_need_compress(BTRFS_I(inode), start, end)) {
-		WARN_ON(pages);
-		pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
-		if (!pages) {
-			/* just bail out to the uncompressed code */
-			nr_pages = 0;
-			goto cont;
-		}
+	WARN_ON(pages);
+	pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
+	if (!pages) {
+		/* just bail out to the uncompressed code */
+		nr_pages = 0;
+		goto cont;
+	}
 
-		if (BTRFS_I(inode)->defrag_compress)
-			compress_type = BTRFS_I(inode)->defrag_compress;
-		else if (BTRFS_I(inode)->prop_compress)
-			compress_type = BTRFS_I(inode)->prop_compress;
+	if (BTRFS_I(inode)->defrag_compress)
+		compress_type = BTRFS_I(inode)->defrag_compress;
+	else if (BTRFS_I(inode)->prop_compress)
+		compress_type = BTRFS_I(inode)->prop_compress;
 
-		/*
-		 * we need to call clear_page_dirty_for_io on each
-		 * page in the range.  Otherwise applications with the file
-		 * mmap'd can wander in and change the page contents while
-		 * we are compressing them.
-		 *
-		 * If the compression fails for any reason, we set the pages
-		 * dirty again later on.
-		 *
-		 * Note that the remaining part is redirtied, the start pointer
-		 * has moved, the end is the original one.
-		 */
-		if (!redirty) {
-			extent_range_clear_dirty_for_io(inode, start, end);
-			redirty = 1;
-		}
+	/*
+	 * We need to call clear_page_dirty_for_io on each page in the range.
+	 * Otherwise applications with the file mmap'd can wander in and
+	 * change the page contents while we are compressing them.
+	 *
+	 * If the compression fails for any reason, we set the pages dirty
+	 * again later on.
+	 *
+	 * Note that the remaining part is redirtied, the start pointer has
+	 * moved, the end is the original one.
+	 */
+	if (!redirty) {
+		extent_range_clear_dirty_for_io(inode, start, end);
+		redirty = 1;
+	}
 
-		/* Compression level is applied here and only here */
-		ret = btrfs_compress_pages(
-			compress_type | (fs_info->compress_level << 4),
-					   inode->i_mapping, start,
-					   pages,
-					   &nr_pages,
-					   &total_in,
-					   &total_compressed);
+	/* Compression level is applied here and only here */
+	ret = btrfs_compress_pages(
+		compress_type | (fs_info->compress_level << 4),
+				   inode->i_mapping, start, pages, &nr_pages,
+				   &total_in, &total_compressed);
 
-		if (!ret) {
-			unsigned long offset = offset_in_page(total_compressed);
-			struct page *page = pages[nr_pages - 1];
-			char *kaddr;
+	if (!ret) {
+		unsigned long offset = offset_in_page(total_compressed);
+		struct page *page = pages[nr_pages - 1];
+		char *kaddr;
 
-			/* zero the tail end of the last page, we might be
-			 * sending it down to disk
-			 */
-			if (offset) {
-				kaddr = kmap_atomic(page);
-				memset(kaddr + offset, 0,
-				       PAGE_SIZE - offset);
-				kunmap_atomic(kaddr);
-			}
-			will_compress = 1;
+		/* zero the tail end of the last page, we might be
+		 * sending it down to disk
+		 */
+		if (offset) {
+			kaddr = kmap_atomic(page);
+			memset(kaddr + offset, 0,
+			       PAGE_SIZE - offset);
+			kunmap_atomic(kaddr);
 		}
+		will_compress = 1;
 	}
+
 cont:
 	if (start == 0) {
 		/* lets try to make an inline extent */
@@ -656,7 +654,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 			/*
 			 * Ensure we only free the compressed pages if we have
 			 * them allocated, as we can still reach here with
-			 * inode_need_compress() == false.
+			 * previous kcalloc() failure.
 			 */
 			if (pages) {
 				for (i = 0; i < nr_pages; i++) {
-- 
2.28.0



* Re: [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case
  2020-08-04  7:25 ` [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case Qu Wenruo
@ 2020-08-11 18:57   ` Josef Bacik
  2021-03-09  1:02   ` Su Yue
  1 sibling, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2020-08-11 18:57 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 8/4/20 3:25 AM, Qu Wenruo wrote:
> [BUG]
> When running btrfs/071 with inode_need_compress() removed from
> compress_file_range(), we got the following crash:
> 
>    BUG: kernel NULL pointer dereference, address: 0000000000000018
>    #PF: supervisor read access in kernel mode
>    #PF: error_code(0x0000) - not-present page
>    Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
>    RIP: 0010:compress_file_range+0x476/0x7b0 [btrfs]
>    Call Trace:
>     ? submit_compressed_extents+0x450/0x450 [btrfs]
>     async_cow_start+0x16/0x40 [btrfs]
>     btrfs_work_helper+0xf2/0x3e0 [btrfs]
>     process_one_work+0x278/0x5e0
>     worker_thread+0x55/0x400
>     ? process_one_work+0x5e0/0x5e0
>     kthread+0x168/0x190
>     ? kthread_create_worker_on_cpu+0x70/0x70
>     ret_from_fork+0x22/0x30
>    ---[ end trace 65faf4eae941fa7d ]---
> 
> This happens even with the patch "btrfs: inode: fix NULL pointer
> dereference if inode doesn't need compression" applied.
> 
> [CAUSE]
> @pages is first allocated by kcalloc() in compress_file_extent():
>                  pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
> 
> Then passed to btrfs_compress_pages() to be utilized there:
> 
>                  ret = btrfs_compress_pages(...
>                                             pages,
>                                             &nr_pages,
>                                             ...);
> 
> btrfs_compress_pages() fills in each page as output; in
> zlib_compress_pages() we have:
> 
>                          pages[nr_pages] = out_page;
>                          nr_pages++;
> 
> Normally this is completely fine, but there is a special case in
> btrfs_compress_pages() itself:
>          switch (type) {
>          default:
>                  return -E2BIG;
>          }
> 
> In this case we modify neither @pages nor @out_pages, leaving them
> untouched. Then, when we clean up the pages, we can hit the NULL
> pointer dereference again:
> 
>          if (pages) {
>                  for (i = 0; i < nr_pages; i++) {
>                          WARN_ON(pages[i]->mapping);
>                          put_page(pages[i]);
>                  }
>          ...
>          }
> 
> Since the pages[i] pointers are all initialized to zero and
> btrfs_compress_pages() doesn't change them at all, accessing
> pages[i]->mapping leads to a NULL pointer dereference.
> 
> This is not possible in the current kernel, as we check
> inode_need_compress() before allocating the pages.
> But if we're going to remove that inode_need_compress() call from
> compress_file_extent(), it becomes a problem.
> 
> [FIX]
> When btrfs_compress_pages() hits its default case, set @out_pages to 0
> to prevent such a problem from happening.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH v2 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent()
  2020-08-04  7:25 ` [PATCH v2 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent() Qu Wenruo
@ 2020-08-11 18:59   ` Josef Bacik
  0 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2020-08-11 18:59 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

On 8/4/20 3:25 AM, Qu Wenruo wrote:
> The extra inode_need_compress() call has already caused problems with
> page releasing.
> We had a hotfix to solve that problem; now it's time to fix it at the
> root.
> 
> This patch removes the extra inode_need_compress() call to address the
> problem.
> 
> This leads to the following behavior change:
> - Worse bad-compression-ratio detection
>    Now if one async_chunk hits a bad compression ratio, other
>    async_chunks will still try to compress.
> 
>    Only newer delalloc ranges will then honor the new INODE_NO_COMPRESS
>    flag.
>    Although one could argue that, if only part of the file content
>    compresses badly, we should still try the other ranges.
> 
> Despite the behavior change, the code cleanup isn't that elegant
> either: kcalloc() can still fail for @pages, so from the cont: tag
> onwards we still need to check @pages manually. Thus the cleanup
> doesn't bring much benefit, just one level of indentation removed and
> comments reformatted.
> 
> Suggested-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


* Re: [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in
  2020-08-04  7:25 [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in Qu Wenruo
  2020-08-04  7:25 ` [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case Qu Wenruo
  2020-08-04  7:25 ` [PATCH v2 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent() Qu Wenruo
@ 2020-08-31 21:25 ` David Sterba
  2021-04-19 19:34 ` David Sterba
  3 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2020-08-31 21:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 04, 2020 at 03:25:46PM +0800, Qu Wenruo wrote:
> This is an attempt to remove the inode_need_compress() call in
> compress_file_extent().
> 
> Since compress_file_extent() can race with inode ioctls or a bad
> compression ratio, causing a NULL pointer dereference for @pages, it's
> natural to try removing that inode_need_compress() call to eliminate
> the race completely.
> 
> However, that's not so easy; we have the following problems:
> 
> - We still need to check @pages anyway
>   That @pages check is for kcalloc() failure, so all we really gain is
>   removing one level of indentation from the if (inode_need_compress())
>   block.
>   Everything else is still the same (in fact, even worse; see the
>   problems below)
> 
> - Behavior change
>   Before this change, every async_chunk does its own check of the
>   INODE_NO_COMPRESS flag.
>   If we hit any bad compression ratio, all incoming async_chunks will
>   fall back to plain, uncompressed writes.
> 
>   But if we remove that inode_need_compress() check, we still try to
>   compress, potentially wasting CPU time.

Hm, any way I read it, this does not follow the intention of how the
compression decisions should work. The whole idea is to have 2 modes:
force-compress, which overrides everything, potentially wasting CPU
cycles. This is the fallback for when the automatic nocompress logic
bails out too early.

To fix that, the normal 'compress' mode plus heuristics should decide
whether compression should happen at all and, when it does not, flip
the nocompress bit. This is still not ideal, as there's no "learning"
that some file ranges compress worse while we still want to keep the
file compression on and just skip the bad ranges, as opposed to one bad
range causing compression to be skipped on the whole file.

And your code throws away the location where the heuristic can give the
hint.

As we have several ways to request compression (mount option, defrag
ioctl, property), the decision needs to be more fine-grained.

1. btrfs_run_delalloc_range

- check if compression is allowed at all (it isn't if the nodatacow or
  nodatasum bits are set)
- check if there's explicit request (defrag, compress-force)
- skip it if the nocompress bit is set
- the rest are the weak tests, either mount option or inode flags/props

At this point we don't need to call the heuristics.

2. compress_file_range

The range is split into 128k chunks and each is evaluated separately.
Here we know we want to compress the range because all the checks in
1 passed, so the only remaining decisions are based on:

- heuristics: try the given range and bail out eventually, or be more
  clever and gather some longer-term statistics before flipping the
  nocompress bit
- check the nocompress bit that could have been set in a previous
  iteration

So this is the plan; you can count how many things are not implemented.
I have a prototype for the heuristic learning, but that's not the
important part; first the test conditions need to be shuffled around to
follow the above logic.
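
As a purely illustrative sketch of that split (flag and mount option
names follow the existing code; heuristic_test() and
set_nocompress_bit() are hypothetical stand-ins for the parts that are
not implemented yet):

	/* 1. btrfs_run_delalloc_range(): decide whether to try at all */
	if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
				     BTRFS_INODE_NODATASUM))
		goto uncompressed;	/* compression not allowed at all */
	if (BTRFS_I(inode)->defrag_compress ||
	    btrfs_test_opt(fs_info, FORCE_COMPRESS))
		goto compress;		/* explicit request, overrides all */
	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
		goto uncompressed;	/* learned "bad ratio" bit */
	if (btrfs_test_opt(fs_info, COMPRESS) ||
	    BTRFS_I(inode)->prop_compress)
		goto compress;		/* the weak requests */
	goto uncompressed;

	/* 2. compress_file_range(): per-128k-chunk decision */
	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
		goto uncompressed;	/* flipped by a previous chunk */
	if (!heuristic_test(inode, start, end)) {
		set_nocompress_bit(inode);	/* or keep longer-term stats */
		goto uncompressed;
	}
	ret = btrfs_compress_pages(...);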

> - Still a race between disabling compression and the NULL pointer
>   dereference
>   There is a hidden race, mostly exposed by the btrfs/071 test case: we
>   have "compress_type = fs_info->compress_type", so we can still hit the
>   case where compress_type is NONE (caused by remount -o nocompress),
>   and then btrfs_compress_pages() will return -E2BIG without modifying
>   @nr_pages.

Maybe we should add some last sanity check before btrfs_compress_pages()
is called, in case the defrag or prop settings change after the checks.
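
As a hypothetical sketch of such a check in compress_file_range(),
reusing the existing cont: fallback for the bail-out (not actual kernel
code):

	/*
	 * Last sanity check: the type may have changed since
	 * run_delalloc_range() evaluated inode_need_compress().
	 */
	if (compress_type == BTRFS_COMPRESS_NONE) {
		nr_pages = 0;
		goto cont;	/* fall back to the uncompressed path */
	}
	ret = btrfs_compress_pages(
		compress_type | (fs_info->compress_level << 4), ...);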

>   Later, when we clean up @pages, we try to access pages[i]->mapping,
>   triggering the NULL pointer dereference.
> 
>   This is addressed in the first patch, though.

The patch makes it more robust, so this is OK. Unexpected changes of the
compress type due to remount or prop changes could be avoided by saving
the type + level at the beginning of compress_file_range().
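
For example, a minimal sketch of that idea (a local snapshot taken once
at function entry; an assumption, not the actual implementation):

	/* snapshot once; later remount/prop/ioctl changes cannot race */
	unsigned int compress_type = fs_info->compress_type;
	unsigned int compress_level = fs_info->compress_level;

	if (BTRFS_I(inode)->defrag_compress)
		compress_type = BTRFS_I(inode)->defrag_compress;
	else if (BTRFS_I(inode)->prop_compress)
		compress_type = BTRFS_I(inode)->prop_compress;

	/* ... every later use reads only the local copies ... */
	ret = btrfs_compress_pages(compress_type | (compress_level << 4),
				   inode->i_mapping, start, pages, &nr_pages,
				   &total_in, &total_compressed);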


* Re: [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case
  2020-08-04  7:25 ` [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case Qu Wenruo
  2020-08-11 18:57   ` Josef Bacik
@ 2021-03-09  1:02   ` Su Yue
  1 sibling, 0 replies; 8+ messages in thread
From: Su Yue @ 2021-03-09  1:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 4, 2020 at 3:29 PM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> When running btrfs/071 with inode_need_compress() removed from
> compress_file_range(), we got the following crash:
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000018
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
>   RIP: 0010:compress_file_range+0x476/0x7b0 [btrfs]
>   Call Trace:
>    ? submit_compressed_extents+0x450/0x450 [btrfs]
>    async_cow_start+0x16/0x40 [btrfs]
>    btrfs_work_helper+0xf2/0x3e0 [btrfs]
>    process_one_work+0x278/0x5e0
>    worker_thread+0x55/0x400
>    ? process_one_work+0x5e0/0x5e0
>    kthread+0x168/0x190
>    ? kthread_create_worker_on_cpu+0x70/0x70
>    ret_from_fork+0x22/0x30
>   ---[ end trace 65faf4eae941fa7d ]---
>
Ping?
The bug occurred while running btrfs/074 a few days ago, and the fix
looks reasonable.

--
Su


* Re: [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in
  2020-08-04  7:25 [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-08-31 21:25 ` [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in David Sterba
@ 2021-04-19 19:34 ` David Sterba
  3 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2021-04-19 19:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 04, 2020 at 03:25:46PM +0800, Qu Wenruo wrote:
> This is an attempt to remove the inode_need_compress() call in
> compress_file_extent().
> 
> Since compress_file_extent() can race with inode ioctls or a bad
> compression ratio, causing a NULL pointer dereference for @pages, it's
> natural to try removing that inode_need_compress() call to eliminate
> the race completely.
> 
> However, that's not so easy; we have the following problems:
> 
> - We still need to check @pages anyway
>   That @pages check is for kcalloc() failure, so all we really gain is
>   removing one level of indentation from the if (inode_need_compress())
>   block.
>   Everything else is still the same (in fact, even worse; see the
>   problems below)
> 
> - Behavior change
>   Before this change, every async_chunk does its own check of the
>   INODE_NO_COMPRESS flag.
>   If we hit any bad compression ratio, all incoming async_chunks will
>   fall back to plain, uncompressed writes.
> 
>   But if we remove that inode_need_compress() check, we still try to
>   compress, potentially wasting CPU time.
> 
> - Still a race between disabling compression and the NULL pointer
>   dereference
>   There is a hidden race, mostly exposed by the btrfs/071 test case: we
>   have "compress_type = fs_info->compress_type", so we can still hit the
>   case where compress_type is NONE (caused by remount -o nocompress),
>   and then btrfs_compress_pages() will return -E2BIG without modifying
>   @nr_pages.
> 
>   Later, when we clean up @pages, we try to access pages[i]->mapping,
>   triggering the NULL pointer dereference.
> 
>   This is addressed in the first patch, though.
> 
> Changelog:
> v2:
> - Fix a bad commit merge, which mangled the commit message of the first
>   patch (the content itself was still correct)
> 
> Qu Wenruo (2):
>   btrfs: prevent NULL pointer dereference in compress_file_range() when
>     btrfs_compress_pages() hits default case

Patch 1 added to misc-next, due to this bug report
https://bugzilla.kernel.org/show_bug.cgi?id=212331


Thread overview: 8+ messages
2020-08-04  7:25 [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in Qu Wenruo
2020-08-04  7:25 ` [PATCH v2 1/2] btrfs: prevent NULL pointer dereference in compress_file_range() when btrfs_compress_pages() hits default case Qu Wenruo
2020-08-11 18:57   ` Josef Bacik
2021-03-09  1:02   ` Su Yue
2020-08-04  7:25 ` [PATCH v2 2/2] btrfs: inode: don't re-evaluate inode_need_compress() in compress_file_extent() Qu Wenruo
2020-08-11 18:59   ` Josef Bacik
2020-08-31 21:25 ` [PATCH v2 0/2] btrfs: remove the inode_need_compress() call in David Sterba
2021-04-19 19:34 ` David Sterba
