All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] btrfs: inode: btrfs_invalidatepage() related refactor and fix for subpage
@ 2020-12-17  4:57 Qu Wenruo
  2020-12-17  4:57 ` [PATCH 1/4] btrfs: inode: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-12-17  4:57 UTC (permalink / raw)
  To: linux-btrfs

This small patchset contains 3 refactors which are subpage independent.

And the last patch is RFC where I'm not certain about the existing code,
but it solves the problem for subpage during test.

Thus I'm here asking for help on the btrfs_invalidatepage() behavior.

Qu Wenruo (4):
  btrfs: inode: use min() to replace open-code in btrfs_invalidatepage()
  btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  btrfs: inode: move the timing of TestClearPagePrivate() in
    btrfs_invalidatepage()
  btrfs: inode: make btrfs_invalidatepage() to be subpage compatible

 fs/btrfs/inode.c | 45 +++++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 18 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] btrfs: inode: use min() to replace open-code in btrfs_invalidatepage()
  2020-12-17  4:57 [PATCH 0/4] btrfs: inode: btrfs_invalidatepage() related refactor and fix for subpage Qu Wenruo
@ 2020-12-17  4:57 ` Qu Wenruo
  2020-12-17  4:57 ` [PATCH 2/4] btrfs: inode: remove variable shadowing " Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-12-17  4:57 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_invalidatepage() we introduce a temporary variable, new_len, to
update ordered->truncated_len.

But we can use min() to replace it completely and no need for the
variable.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index af3439b21bc4..dced71bccaac 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8219,15 +8219,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		 */
 		if (TestClearPagePrivate2(page)) {
 			struct btrfs_ordered_inode_tree *tree;
-			u64 new_len;
 
 			tree = &inode->ordered_tree;
 
 			spin_lock_irq(&tree->lock);
 			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
-			new_len = start - ordered->file_offset;
-			if (new_len < ordered->truncated_len)
-				ordered->truncated_len = new_len;
+			ordered->truncated_len = min(ordered->truncated_len,
+					start - ordered->file_offset);
 			spin_unlock_irq(&tree->lock);
 
 			ASSERT(end - start + 1 < U32_MAX);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  2020-12-17  4:57 [PATCH 0/4] btrfs: inode: btrfs_invalidatepage() related refactor and fix for subpage Qu Wenruo
  2020-12-17  4:57 ` [PATCH 1/4] btrfs: inode: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
@ 2020-12-17  4:57 ` Qu Wenruo
  2020-12-17  5:38   ` Su Yue
  2020-12-17  5:55   ` Nikolay Borisov
  2020-12-17  4:57 ` [PATCH 3/4] btrfs: inode: move the timing of TestClearPagePrivate() " Qu Wenruo
  2020-12-17  4:57 ` [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible Qu Wenruo
  3 siblings, 2 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-12-17  4:57 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_invalidatepage() we re-declare @tree variable as
btrfs_ordered_inode_tree.

Remove such variable shadowing which can be very confusing.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index dced71bccaac..b4d36d138008 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8169,6 +8169,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 				 unsigned int length)
 {
 	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
+	struct btrfs_ordered_inode_tree *ordered_tree = &inode->ordered_tree;
 	struct extent_io_tree *tree = &inode->io_tree;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
@@ -8218,15 +8219,11 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		 * for the finish_ordered_io
 		 */
 		if (TestClearPagePrivate2(page)) {
-			struct btrfs_ordered_inode_tree *tree;
-
-			tree = &inode->ordered_tree;
-
-			spin_lock_irq(&tree->lock);
+			spin_lock_irq(&ordered_tree->lock);
 			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
 			ordered->truncated_len = min(ordered->truncated_len,
 					start - ordered->file_offset);
-			spin_unlock_irq(&tree->lock);
+			spin_unlock_irq(&ordered_tree->lock);
 
 			ASSERT(end - start + 1 < U32_MAX);
 			if (btrfs_dec_test_ordered_pending(inode, &ordered,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] btrfs: inode: move the timing of TestClearPagePrivate() in btrfs_invalidatepage()
  2020-12-17  4:57 [PATCH 0/4] btrfs: inode: btrfs_invalidatepage() related refactor and fix for subpage Qu Wenruo
  2020-12-17  4:57 ` [PATCH 1/4] btrfs: inode: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
  2020-12-17  4:57 ` [PATCH 2/4] btrfs: inode: remove variable shadowing " Qu Wenruo
@ 2020-12-17  4:57 ` Qu Wenruo
  2020-12-17  4:57 ` [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible Qu Wenruo
  3 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-12-17  4:57 UTC (permalink / raw)
  To: linux-btrfs

For current sectorsize == PAGE_ZIE case, there will only be at most one
ordered extent for one page.

But for subpage case, one page can contain several ordered extents, thus
current TestClearPagePrivate2() call will only finish ordered IO for the
first ordered extent, the remaining ones will be skipped, and cause
never-end ordered io.

This patch will move the TestClearPagePrivate2 before the loop, and save
the result, so that we can finish all ordered extents.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b4d36d138008..eb493fbb65f9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8178,6 +8178,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	u64 start;
 	u64 end;
 	int inode_evicting = inode->vfs_inode.i_state & I_FREEING;
+	bool cleared_private2;
 	bool found_ordered = false;
 	bool completed_ordered = false;
 
@@ -8197,6 +8198,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
 	if (!inode_evicting)
 		lock_extent_bits(tree, page_start, page_end, &cached_state);
+
+	cleared_private2 = TestClearPagePrivate2(page);
 again:
 	start = page_start;
 	ordered = btrfs_lookup_ordered_range(inode, start, page_end - start + 1);
@@ -8214,11 +8217,12 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 					 EXTENT_DELALLOC |
 					 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
 					 EXTENT_DEFRAG, 1, 0, &cached_state);
+
 		/*
 		 * whoever cleared the private bit is responsible
 		 * for the finish_ordered_io
 		 */
-		if (TestClearPagePrivate2(page)) {
+		if (cleared_private2) {
 			spin_lock_irq(&ordered_tree->lock);
 			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
 			ordered->truncated_len = min(ordered->truncated_len,
@@ -8233,6 +8237,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 				completed_ordered = true;
 			}
 		}
+
 		btrfs_put_ordered_extent(ordered);
 		if (!inode_evicting) {
 			cached_state = NULL;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible
  2020-12-17  4:57 [PATCH 0/4] btrfs: inode: btrfs_invalidatepage() related refactor and fix for subpage Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-12-17  4:57 ` [PATCH 3/4] btrfs: inode: move the timing of TestClearPagePrivate() " Qu Wenruo
@ 2020-12-17  4:57 ` Qu Wenruo
  2020-12-17 11:20   ` Filipe Manana
  2020-12-17 14:51   ` Josef Bacik
  3 siblings, 2 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-12-17  4:57 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With current subpage RW patchset, the following script can lead to
filesystem hang:
  # mkfs.btrfs -f -s 4k $dev
  # mount $dev -o nospace_cache $mnt
  # fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt

The file system will hang at wait_event() of
btrfs_start_ordered_extent().

[CAUSE]
The root cause is, btrfs_invalidatepage() is freeing page::private which
still has subpage dirty bit set.

The offending situation happens like this:
btrfs_fllocate()
|- btrfs_zero_range()
   |- btrfs_punch_hole_lock_range()
      |- truncate_pagecache_range()
         |- btrfs_invalidatepage()

The involved range looks like:

0	32K	64K	96K	128K
	|///////||//////|
	| Range to drop |

For the [32K, 64K) range, since the offset is 32K, the page won't be
invalidated.

But for the [64K, 96K) range, the offset is 0, current
btrfs_invalidatepage() will call clear_page_extent_mapped() which will
detach page::private, making the subpage dirty bitmap being cleared.

This prevents later __extent_writepage_io() to locate any range to
write, thus no way to wake up the ordered extents.

[FIX]
To fix the problem this patch will:
- Only clear page status and detach page private when the full page
  is invalidated

- Change how we handle unfinished ordered extent
  If there is any ordered extent unfinished in the page range, we can't
  call clear_extent_bit() with delete == true.

[REASON FOR RFC]
There is still uncertainty around the btrfs_releasepage() call.

1. Why we need btrfs_releasepage() call for non-full-page condition?
   Other fs (aka. xfs) just exit without doing special handling if
   invalidatepage() is called with part of the page.

   Thus I didn't completely understand why btrfs_releasepage() here is
   needed for non-full page call.

2. Why "if (offset)" is not causing problem for current code?
   This existing if (offset) call can be skipped for cases like
   offset == 0 length == 2K.
   As MM layer can call invalidatepage() with unaligned offset/length,
   for cases like truncate_inode_pages_range().
   This will make btrfs_invalidatepage() to truncate the whole page when
   we only need to zero part of the page.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eb493fbb65f9..872c5309b4ca 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8180,7 +8180,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	int inode_evicting = inode->vfs_inode.i_state & I_FREEING;
 	bool cleared_private2;
 	bool found_ordered = false;
-	bool completed_ordered = false;
+	bool incompleted_ordered = false;
 
 	/*
 	 * we have the page locked, so new writeback can't start,
@@ -8191,7 +8191,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	 */
 	wait_on_page_writeback(page);
 
-	if (offset) {
+	/*
+	 * The range doesn't cover the full page, just let btrfs_releasepage() to
+	 * check if we can release the extent mapping.
+	 * Any locked/pinned/logged extent map would prevent us freeing the
+	 * extent mapping.
+	 */
+	if (!(offset == 0 && length == PAGE_SIZE)) {
 		btrfs_releasepage(page, GFP_NOFS);
 		return;
 	}
@@ -8208,9 +8214,10 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		end = min(page_end,
 			  ordered->file_offset + ordered->num_bytes - 1);
 		/*
-		 * IO on this page will never be started, so we need to account
-		 * for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
-		 * here, must leave that up for the ordered extent completion.
+		 * IO on this ordered extent will never be started, so we need
+		 * to account for any ordered extents now. Don't clear
+		 * EXTENT_DELALLOC_NEW here, must leave that up for the
+		 * ordered extent completion.
 		 */
 		if (!inode_evicting)
 			clear_extent_bit(tree, start, end,
@@ -8234,7 +8241,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 							   start,
 							   end - start + 1, 1)) {
 				btrfs_finish_ordered_io(ordered);
-				completed_ordered = true;
+			} else {
+				incompleted_ordered = true;
 			}
 		}
 
@@ -8276,7 +8284,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		 * is cleared if we don't delete, otherwise it can lead to
 		 * corruptions if the i_size is extented later.
 		 */
-		if (found_ordered && !completed_ordered)
+		if (found_ordered && incompleted_ordered)
 			delete = false;
 		clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED |
 				 EXTENT_DELALLOC | EXTENT_UPTODATE |
@@ -8286,6 +8294,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		__btrfs_releasepage(page, GFP_NOFS);
 	}
 
+	ClearPagePrivate2(page);
 	ClearPageChecked(page);
 	clear_page_extent_mapped(page);
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  2020-12-17  4:57 ` [PATCH 2/4] btrfs: inode: remove variable shadowing " Qu Wenruo
@ 2020-12-17  5:38   ` Su Yue
  2020-12-17  5:42     ` Qu Wenruo
  2020-12-17  5:55   ` Nikolay Borisov
  1 sibling, 1 reply; 16+ messages in thread
From: Su Yue @ 2020-12-17  5:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


On Thu 17 Dec 2020 at 12:57, Qu Wenruo <wqu@suse.com> wrote:

> In btrfs_invalidatepage() we re-declare @tree variable as
> btrfs_ordered_inode_tree.
>
> Remove such variable shadowing which can be very confusing.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index dced71bccaac..b4d36d138008 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8169,6 +8169,7 @@ static void btrfs_invalidatepage(struct 
> page *page, unsigned int offset,
>  				 unsigned int length)
>  {
>  	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
> +	struct btrfs_ordered_inode_tree *ordered_tree = 
> &inode->ordered_tree;
>
Any reason for the declaration here? I didn't find that patch[3/4] 
use it.

>  	struct extent_io_tree *tree = &inode->io_tree;
>  	struct btrfs_ordered_extent *ordered;
>  	struct extent_state *cached_state = NULL;
> @@ -8218,15 +8219,11 @@ static void btrfs_invalidatepage(struct 
> page *page, unsigned int offset,
>  		 * for the finish_ordered_io
>  		 */
>  		if (TestClearPagePrivate2(page)) {
> -			struct btrfs_ordered_inode_tree *tree;
> -
Better to just rename the @tree to @ordered_tree.

> -			tree = &inode->ordered_tree;
> -
> -			spin_lock_irq(&tree->lock);
> +			spin_lock_irq(&ordered_tree->lock);
>  			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
>  			ordered->truncated_len = min(ordered->truncated_len,
>  					start - ordered->file_offset);
> -			spin_unlock_irq(&tree->lock);
> +			spin_unlock_irq(&ordered_tree->lock);
>
>  			ASSERT(end - start + 1 < U32_MAX);
>  			if (btrfs_dec_test_ordered_pending(inode, &ordered,


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  2020-12-17  5:38   ` Su Yue
@ 2020-12-17  5:42     ` Qu Wenruo
  2020-12-17  6:08       ` Su Yue
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-12-17  5:42 UTC (permalink / raw)
  To: Su Yue, Qu Wenruo; +Cc: linux-btrfs



On 2020/12/17 下午1:38, Su Yue wrote:
> 
> On Thu 17 Dec 2020 at 12:57, Qu Wenruo <wqu@suse.com> wrote:
> 
>> In btrfs_invalidatepage() we re-declare @tree variable as
>> btrfs_ordered_inode_tree.
>>
>> Remove such variable shadowing which can be very confusing.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/inode.c | 9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index dced71bccaac..b4d36d138008 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -8169,6 +8169,7 @@ static void btrfs_invalidatepage(struct page 
>> *page, unsigned int offset,
>>                   unsigned int length)
>>  {
>>      struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
>> +    struct btrfs_ordered_inode_tree *ordered_tree = 
>> &inode->ordered_tree;
>>
> Any reason for the declaration here? I didn't find that patch[3/4] use it.

Didn't that ordered_tree get used lines below?

> 
>>      struct extent_io_tree *tree = &inode->io_tree;
>>      struct btrfs_ordered_extent *ordered;
>>      struct extent_state *cached_state = NULL;
>> @@ -8218,15 +8219,11 @@ static void btrfs_invalidatepage(struct page 
>> *page, unsigned int offset,
>>           * for the finish_ordered_io
>>           */
>>          if (TestClearPagePrivate2(page)) {
>> -            struct btrfs_ordered_inode_tree *tree;
>> -
> Better to just rename the @tree to @ordered_tree.

Isn't that exactly what I did?

Thanks,
Qu
> 
>> -            tree = &inode->ordered_tree;
>> -
>> -            spin_lock_irq(&tree->lock);
>> +            spin_lock_irq(&ordered_tree->lock);
>>              set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
>>              ordered->truncated_len = min(ordered->truncated_len,
>>                      start - ordered->file_offset);
>> -            spin_unlock_irq(&tree->lock);
>> +            spin_unlock_irq(&ordered_tree->lock);
>>
>>              ASSERT(end - start + 1 < U32_MAX);
>>              if (btrfs_dec_test_ordered_pending(inode, &ordered,
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  2020-12-17  4:57 ` [PATCH 2/4] btrfs: inode: remove variable shadowing " Qu Wenruo
  2020-12-17  5:38   ` Su Yue
@ 2020-12-17  5:55   ` Nikolay Borisov
  2020-12-17  5:59     ` Nikolay Borisov
  1 sibling, 1 reply; 16+ messages in thread
From: Nikolay Borisov @ 2020-12-17  5:55 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 17.12.20 г. 6:57 ч., Qu Wenruo wrote:
> In btrfs_invalidatepage() we re-declare @tree variable as
> btrfs_ordered_inode_tree.
> 
> Remove such variable shadowing which can be very confusing.

You can't do that, because lock_extent_bits expects extent_io_tree !


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  2020-12-17  5:55   ` Nikolay Borisov
@ 2020-12-17  5:59     ` Nikolay Borisov
  2020-12-17  6:13       ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Borisov @ 2020-12-17  5:59 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 17.12.20 г. 7:55 ч., Nikolay Borisov wrote:
> 
> 
> On 17.12.20 г. 6:57 ч., Qu Wenruo wrote:
>> In btrfs_invalidatepage() we re-declare @tree variable as
>> btrfs_ordered_inode_tree.
>>
>> Remove such variable shadowing which can be very confusing.
> 
> You can't do that, because lock_extent_bits expects extent_io_tree !
> 

Ok, nvm, you just factored the var at the beginning of the functions.
OTOH since the ordered tree is used just for lock/unlock why not do
spin_(un)lock(&inode->ordered_tree->lock);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  2020-12-17  5:42     ` Qu Wenruo
@ 2020-12-17  6:08       ` Su Yue
  0 siblings, 0 replies; 16+ messages in thread
From: Su Yue @ 2020-12-17  6:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs


On Thu 17 Dec 2020 at 13:42, Qu Wenruo <quwenruo.btrfs@gmx.com> 
wrote:

> On 2020/12/17 下午1:38, Su Yue wrote:
>>
>> On Thu 17 Dec 2020 at 12:57, Qu Wenruo <wqu@suse.com> wrote:
>>
>>> In btrfs_invalidatepage() we re-declare @tree variable as
>>> btrfs_ordered_inode_tree.
>>>
>>> Remove such variable shadowing which can be very confusing.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/inode.c | 9 +++------
>>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index dced71bccaac..b4d36d138008 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -8169,6 +8169,7 @@ static void btrfs_invalidatepage(struct 
>>> page
>>> *page, unsigned int offset,
>>>                   unsigned int length)
>>>  {
>>>      struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
>>> +    struct btrfs_ordered_inode_tree *ordered_tree =
>>> &inode->ordered_tree;
>>>
>> Any reason for the declaration here? I didn't find that 
>> patch[3/4] use it.
>
> Didn't that ordered_tree get used lines below?
>
>>
>>>      struct extent_io_tree *tree = &inode->io_tree;
>>>      struct btrfs_ordered_extent *ordered;
>>>      struct extent_state *cached_state = NULL;
>>> @@ -8218,15 +8219,11 @@ static void 
>>> btrfs_invalidatepage(struct page
>>> *page, unsigned int offset,
>>>           * for the finish_ordered_io
>>>           */
>>>          if (TestClearPagePrivate2(page)) {
>>> -            struct btrfs_ordered_inode_tree *tree;
>>> -
>> Better to just rename the @tree to @ordered_tree.
>
> Isn't that exactly what I did?

What I mean is that keep the declaration in the block since no 
further use of it.

>
> Thanks,
> Qu
>>
>>> -            tree = &inode->ordered_tree;
>>> -
>>> -            spin_lock_irq(&tree->lock);
>>> +            spin_lock_irq(&ordered_tree->lock);
>>>              set_bit(BTRFS_ORDERED_TRUNCATED, 
>>>              &ordered->flags);
>>>              ordered->truncated_len = 
>>>              min(ordered->truncated_len,
>>>                      start - ordered->file_offset);
>>> -            spin_unlock_irq(&tree->lock);
>>> +            spin_unlock_irq(&ordered_tree->lock);
>>>
>>>              ASSERT(end - start + 1 < U32_MAX);
>>>              if (btrfs_dec_test_ordered_pending(inode, 
>>>              &ordered,
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  2020-12-17  5:59     ` Nikolay Borisov
@ 2020-12-17  6:13       ` Qu Wenruo
  2020-12-17 12:29         ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-12-17  6:13 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2020/12/17 下午1:59, Nikolay Borisov wrote:
> 
> 
> On 17.12.20 г. 7:55 ч., Nikolay Borisov wrote:
>>
>>
>> On 17.12.20 г. 6:57 ч., Qu Wenruo wrote:
>>> In btrfs_invalidatepage() we re-declare @tree variable as
>>> btrfs_ordered_inode_tree.
>>>
>>> Remove such variable shadowing which can be very confusing.
>>
>> You can't do that, because lock_extent_bits expects extent_io_tree !
>>
> 
> Ok, nvm, you just factored the var at the beginning of the functions.
> OTOH since the ordered tree is used just for lock/unlock why not do
> spin_(un)lock(&inode->ordered_tree->lock);
> 
Oh, that indeed looks better and since Su is also complaining about the 
declaration at the beginning of the function, I guess that's the better 
way to go.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible
  2020-12-17  4:57 ` [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible Qu Wenruo
@ 2020-12-17 11:20   ` Filipe Manana
  2020-12-22  4:38     ` Qu Wenruo
  2020-12-17 14:51   ` Josef Bacik
  1 sibling, 1 reply; 16+ messages in thread
From: Filipe Manana @ 2020-12-17 11:20 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Dec 17, 2020 at 5:03 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> With current subpage RW patchset, the following script can lead to
> filesystem hang:
>   # mkfs.btrfs -f -s 4k $dev
>   # mount $dev -o nospace_cache $mnt
>   # fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt
>
> The file system will hang at wait_event() of
> btrfs_start_ordered_extent().
>
> [CAUSE]
> The root cause is, btrfs_invalidatepage() is freeing page::private which
> still has subpage dirty bit set.
>
> The offending situation happens like this:
> btrfs_fllocate()
> |- btrfs_zero_range()
>    |- btrfs_punch_hole_lock_range()
>       |- truncate_pagecache_range()
>          |- btrfs_invalidatepage()
>
> The involved range looks like:
>
> 0       32K     64K     96K     128K
>         |///////||//////|
>         | Range to drop |
>
> For the [32K, 64K) range, since the offset is 32K, the page won't be
> invalidated.
>
> But for the [64K, 96K) range, the offset is 0, current
> btrfs_invalidatepage() will call clear_page_extent_mapped() which will
> detach page::private, making the subpage dirty bitmap being cleared.
>
> This prevents later __extent_writepage_io() to locate any range to
> write, thus no way to wake up the ordered extents.
>
> [FIX]
> To fix the problem this patch will:
> - Only clear page status and detach page private when the full page
>   is invalidated
>
> - Change how we handle unfinished ordered extent
>   If there is any ordered extent unfinished in the page range, we can't
>   call clear_extent_bit() with delete == true.
>
> [REASON FOR RFC]
> There is still uncertainty around the btrfs_releasepage() call.
>
> 1. Why we need btrfs_releasepage() call for non-full-page condition?
>    Other fs (aka. xfs) just exit without doing special handling if
>    invalidatepage() is called with part of the page.
>
>    Thus I didn't completely understand why btrfs_releasepage() here is
>    needed for non-full page call.
>
> 2. Why "if (offset)" is not causing problem for current code?
>    This existing if (offset) call can be skipped for cases like
>    offset == 0 length == 2K.
>    As MM layer can call invalidatepage() with unaligned offset/length,
>    for cases like truncate_inode_pages_range().
>    This will make btrfs_invalidatepage() to truncate the whole page when
>    we only need to zero part of the page.

I don't think you can ever get offset == 0 and length < PAGE_SIZE
unless this is the last page in the file, the one containing eof, in
which it's perfectly valid to invalidate the whole page.

>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index eb493fbb65f9..872c5309b4ca 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8180,7 +8180,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>         int inode_evicting = inode->vfs_inode.i_state & I_FREEING;
>         bool cleared_private2;
>         bool found_ordered = false;
> -       bool completed_ordered = false;
> +       bool incompleted_ordered = false;
>
>         /*
>          * we have the page locked, so new writeback can't start,
> @@ -8191,7 +8191,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>          */
>         wait_on_page_writeback(page);
>
> -       if (offset) {
> +       /*
> +        * The range doesn't cover the full page, just let btrfs_releasepage() to
> +        * check if we can release the extent mapping.
> +        * Any locked/pinned/logged extent map would prevent us freeing the
> +        * extent mapping.
> +        */
> +       if (!(offset == 0 && length == PAGE_SIZE)) {
>                 btrfs_releasepage(page, GFP_NOFS);
>                 return;
>         }
> @@ -8208,9 +8214,10 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>                 end = min(page_end,
>                           ordered->file_offset + ordered->num_bytes - 1);
>                 /*
> -                * IO on this page will never be started, so we need to account
> -                * for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
> -                * here, must leave that up for the ordered extent completion.
> +                * IO on this ordered extent will never be started, so we need
> +                * to account for any ordered extents now. Don't clear

So this comment update states that "IO on this ordered extent will
never be started".
That is not true, unless some other patch not in misc-next changed
something and I missed it (like splitting ordered extents).

If you have a 1M ordered extent, for file range [0, 1M[ for example,
and then truncate the file to 512K or punch a hole for the range
[512K, 1M[, then IO for the first 512K of the ordered extent is still
done.

So I think what you wanted to say is more like "IO for this subpage
will never be started ...".

> +                * EXTENT_DELALLOC_NEW here, must leave that up for the
> +                * ordered extent completion.
>                  */
>                 if (!inode_evicting)
>                         clear_extent_bit(tree, start, end,
> @@ -8234,7 +8241,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>                                                            start,
>                                                            end - start + 1, 1)) {
>                                 btrfs_finish_ordered_io(ordered);
> -                               completed_ordered = true;
> +                       } else {
> +                               incompleted_ordered = true;
>                         }
>                 }
>
> @@ -8276,7 +8284,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>                  * is cleared if we don't delete, otherwise it can lead to
>                  * corruptions if the i_size is extented later.
>                  */
> -               if (found_ordered && !completed_ordered)
> +               if (found_ordered && incompleted_ordered)

I find this naming, "incompleted_ordered" confusing, I think
"incompleted" is not even a valid english word.

What you mean is that if there is some ordered extent for the page
range that we could not complete ourselves.
I would suggest naming it to "completed_all_ordered", initialize it to
true and then set it to false when we can't complete an ordered extent
ourselves.

Then it would just be "if (found_ordered && !completed_all_ordered)
delete = false;".

Also, I haven't checked the other patchsets for subpage support, but
from looking only at this patchset, I'm assuming we can't set ranges
in the io tree smaller than page size, is that correct?
Otherwise this would be calling clear_extent_bit for each subpage range.

Thanks.

>                         delete = false;
>                 clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED |
>                                  EXTENT_DELALLOC | EXTENT_UPTODATE |
> @@ -8286,6 +8294,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>                 __btrfs_releasepage(page, GFP_NOFS);
>         }
>
> +       ClearPagePrivate2(page);
>         ClearPageChecked(page);
>         clear_page_extent_mapped(page);
>  }
> --
> 2.29.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: inode: remove variable shadowing in btrfs_invalidatepage()
  2020-12-17  6:13       ` Qu Wenruo
@ 2020-12-17 12:29         ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2020-12-17 12:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Nikolay Borisov, linux-btrfs

On Thu, Dec 17, 2020 at 02:13:37PM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/12/17 下午1:59, Nikolay Borisov wrote:
> > 
> > 
> > On 17.12.20 г. 7:55 ч., Nikolay Borisov wrote:
> >>
> >>
> >> On 17.12.20 г. 6:57 ч., Qu Wenruo wrote:
> >>> In btrfs_invalidatepage() we re-declare @tree variable as
> >>> btrfs_ordered_inode_tree.
> >>>
> >>> Remove such variable shadowing which can be very confusing.
> >>
> >> You can't do that, because lock_extent_bits expects extent_io_tree !
> >>
> > 
> > Ok, nvm, you just factored the var at the beginning of the functions.
> > OTOH since the ordered tree is used just for lock/unlock why not do
> > spin_(un)lock(&inode->ordered_tree->lock);
> > 
> Oh, that indeed looks better and since Su is also complaining about the 
> declaration at the beginning of the function, I guess that's the better 
> way to go.

The preferred style is to declare variables in the closest scope, so
there's not a huge blob of declarations that are randomly used inside
nested blocks. It's more like a recommendation, eg. when the function is
short and there are a few variables  used inside a for/while.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible
  2020-12-17  4:57 ` [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible Qu Wenruo
  2020-12-17 11:20   ` Filipe Manana
@ 2020-12-17 14:51   ` Josef Bacik
  2020-12-18  0:42     ` Qu Wenruo
  1 sibling, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-12-17 14:51 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12/16/20 11:57 PM, Qu Wenruo wrote:
> [BUG]
> With current subpage RW patchset, the following script can lead to
> filesystem hang:
>    # mkfs.btrfs -f -s 4k $dev
>    # mount $dev -o nospace_cache $mnt
>    # fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt
> 
> The file system will hang at wait_event() of
> btrfs_start_ordered_extent().
> 
> [CAUSE]
> The root cause is, btrfs_invalidatepage() is freeing page::private which
> still has subpage dirty bit set.
> 
> The offending situation happens like this:
> btrfs_fllocate()
> |- btrfs_zero_range()
>     |- btrfs_punch_hole_lock_range()
>        |- truncate_pagecache_range()
>           |- btrfs_invalidatepage()
> 
> The involved range looks like:
> 
> 0	32K	64K	96K	128K
> 	|///////||//////|
> 	| Range to drop |
> 
> For the [32K, 64K) range, since the offset is 32K, the page won't be
> invalidated.
> 
> But for the [64K, 96K) range, the offset is 0, current
> btrfs_invalidatepage() will call clear_page_extent_mapped() which will
> detach page::private, making the subpage dirty bitmap being cleared.
> 
> This prevents later __extent_writepage_io() to locate any range to
> write, thus no way to wake up the ordered extents.
> 
> [FIX]
> To fix the problem this patch will:
> - Only clear page status and detach page private when the full page
>    is invalidated
> 
> - Change how we handle unfinished ordered extent
>    If there is any ordered extent unfinished in the page range, we can't
>    call clear_extent_bit() with delete == true.
> 
> [REASON FOR RFC]
> There is still uncertainty around the btrfs_releasepage() call.
> 
> 1. Why we need btrfs_releasepage() call for non-full-page condition?
>     Other fs (aka. xfs) just exit without doing special handling if
>     invalidatepage() is called with part of the page.
> 
>     Thus I didn't completely understand why btrfs_releasepage() here is
>     needed for non-full page call.
> 
> 2. Why "if (offset)" is not causing problem for current code?
>     This existing if (offset) call can be skipped for cases like
>     offset == 0 length == 2K.
>     As MM layer can call invalidatepage() with unaligned offset/length,
>     for cases like truncate_inode_pages_range().
>     This will make btrfs_invalidatepage() to truncate the whole page when
>     we only need to zero part of the page.
> 

Are we ever calling with a different length when pagesize == sectorsize?  That's 
probably why it works fine now.

But I think we should follow what all the other file systems do, if len != 
PAGE_SIZE || offset != 0 then just skip it, that would probably be easier and 
work for you as well?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible
  2020-12-17 14:51   ` Josef Bacik
@ 2020-12-18  0:42     ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-12-18  0:42 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs



On 2020/12/17 下午10:51, Josef Bacik wrote:
> On 12/16/20 11:57 PM, Qu Wenruo wrote:
>> [BUG]
>> With current subpage RW patchset, the following script can lead to
>> filesystem hang:
>>    # mkfs.btrfs -f -s 4k $dev
>>    # mount $dev -o nospace_cache $mnt
>>    # fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt
>>
>> The file system will hang at wait_event() of
>> btrfs_start_ordered_extent().
>>
>> [CAUSE]
>> The root cause is, btrfs_invalidatepage() is freeing page::private which
>> still has subpage dirty bit set.
>>
>> The offending situation happens like this:
>> btrfs_fllocate()
>> |- btrfs_zero_range()
>>     |- btrfs_punch_hole_lock_range()
>>        |- truncate_pagecache_range()
>>           |- btrfs_invalidatepage()
>>
>> The involved range looks like:
>>
>> 0    32K    64K    96K    128K
>>     |///////||//////|
>>     | Range to drop |
>>
>> For the [32K, 64K) range, since the offset is 32K, the page won't be
>> invalidated.
>>
>> But for the [64K, 96K) range, the offset is 0, current
>> btrfs_invalidatepage() will call clear_page_extent_mapped() which will
>> detach page::private, making the subpage dirty bitmap being cleared.
>>
>> This prevents later __extent_writepage_io() to locate any range to
>> write, thus no way to wake up the ordered extents.
>>
>> [FIX]
>> To fix the problem this patch will:
>> - Only clear page status and detach page private when the full page
>>    is invalidated
>>
>> - Change how we handle unfinished ordered extent
>>    If there is any ordered extent unfinished in the page range, we can't
>>    call clear_extent_bit() with delete == true.
>>
>> [REASON FOR RFC]
>> There is still uncertainty around the btrfs_releasepage() call.
>>
>> 1. Why we need btrfs_releasepage() call for non-full-page condition?
>>     Other fs (aka. xfs) just exit without doing special handling if
>>     invalidatepage() is called with part of the page.
>>
>>     Thus I didn't completely understand why btrfs_releasepage() here is
>>     needed for non-full page call.
>>
>> 2. Why "if (offset)" is not causing problem for current code?
>>     This existing if (offset) call can be skipped for cases like
>>     offset == 0 length == 2K.
>>     As MM layer can call invalidatepage() with unaligned offset/length,
>>     for cases like truncate_inode_pages_range().
>>     This will make btrfs_invalidatepage() to truncate the whole page when
>>     we only need to zero part of the page.
>>
>
> Are we ever calling with a different length when pagesize ==
> sectorsize?  That's probably why it works fine now.

The range passed in can be unaligned at all.

MM layer functions like truncate_inode_pages_range() relies on that.

That's why I'm wondering why the current code is working.

As for start == 0 and length != PAGE_SIZE case it may clear the
Private2/Checked bit unintentionally.

Or is that CoW fixup saving the problem?
>
> But I think we should follow what all the other file systems do, if len
> != PAGE_SIZE || offset != 0 then just skip it, that would probably be
> easier and work for you as well?  Thanks,

Definitely it would work for me.

Thanks,
Qu

>
> Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible
  2020-12-17 11:20   ` Filipe Manana
@ 2020-12-22  4:38     ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-12-22  4:38 UTC (permalink / raw)
  To: fdmanana, Qu Wenruo; +Cc: linux-btrfs



On 2020/12/17 下午7:20, Filipe Manana wrote:
> On Thu, Dec 17, 2020 at 5:03 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [BUG]
>> With current subpage RW patchset, the following script can lead to
>> filesystem hang:
>>    # mkfs.btrfs -f -s 4k $dev
>>    # mount $dev -o nospace_cache $mnt
>>    # fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt
>>
>> The file system will hang at wait_event() of
>> btrfs_start_ordered_extent().
>>
>> [CAUSE]
>> The root cause is, btrfs_invalidatepage() is freeing page::private which
>> still has subpage dirty bit set.
>>
>> The offending situation happens like this:
>> btrfs_fllocate()
>> |- btrfs_zero_range()
>>     |- btrfs_punch_hole_lock_range()
>>        |- truncate_pagecache_range()
>>           |- btrfs_invalidatepage()
>>
>> The involved range looks like:
>>
>> 0       32K     64K     96K     128K
>>          |///////||//////|
>>          | Range to drop |
>>
>> For the [32K, 64K) range, since the offset is 32K, the page won't be
>> invalidated.
>>
>> But for the [64K, 96K) range, the offset is 0, current
>> btrfs_invalidatepage() will call clear_page_extent_mapped() which will
>> detach page::private, making the subpage dirty bitmap being cleared.
>>
>> This prevents later __extent_writepage_io() to locate any range to
>> write, thus no way to wake up the ordered extents.
>>
>> [FIX]
>> To fix the problem this patch will:
>> - Only clear page status and detach page private when the full page
>>    is invalidated
>>
>> - Change how we handle unfinished ordered extent
>>    If there is any ordered extent unfinished in the page range, we can't
>>    call clear_extent_bit() with delete == true.
>>
>> [REASON FOR RFC]
>> There is still uncertainty around the btrfs_releasepage() call.
>>
>> 1. Why we need btrfs_releasepage() call for non-full-page condition?
>>     Other fs (aka. xfs) just exit without doing special handling if
>>     invalidatepage() is called with part of the page.
>>
>>     Thus I didn't completely understand why btrfs_releasepage() here is
>>     needed for non-full page call.
>>
>> 2. Why "if (offset)" is not causing problem for current code?
>>     This existing if (offset) call can be skipped for cases like
>>     offset == 0 length == 2K.
>>     As MM layer can call invalidatepage() with unaligned offset/length,
>>     for cases like truncate_inode_pages_range().
>>     This will make btrfs_invalidatepage() to truncate the whole page when
>>     we only need to zero part of the page.
>
> I don't think you can ever get offset == 0 and length < PAGE_SIZE
> unless this is the last page in the file, the one containing eof, in
> which it's perfectly valid to invalidate the whole page.

You're right.

After more testing, it indeed shows except setsize call which could pass
unaligned range, all other call sites are using sector aligned ranges.

Thus the existing code won't cause any problem for current code base.

Either we're invalidating the last page of the inode, or we're
invalidating sector (PAGE) aligned range.


But for subpage support, we can have cases like
btrfs_punch_hole_lock_range() which only passes sector aligned range in,
and since sector size is smaller than page size, we can have offset == 0
while length < PAGE_SIZE and it's not the last page.

Further more, the page can have dirty range not covered by the
invalidatepage range, causing the problem.

I'll update the commit message to explain the case more, and only put
the fix into the subpage series, other than sending it out without
subpage context.

>
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/inode.c | 23 ++++++++++++++++-------
>>   1 file changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index eb493fbb65f9..872c5309b4ca 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -8180,7 +8180,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>>          int inode_evicting = inode->vfs_inode.i_state & I_FREEING;
>>          bool cleared_private2;
>>          bool found_ordered = false;
>> -       bool completed_ordered = false;
>> +       bool incompleted_ordered = false;
>>
>>          /*
>>           * we have the page locked, so new writeback can't start,
>> @@ -8191,7 +8191,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>>           */
>>          wait_on_page_writeback(page);
>>
>> -       if (offset) {
>> +       /*
>> +        * The range doesn't cover the full page, just let btrfs_releasepage() to
>> +        * check if we can release the extent mapping.
>> +        * Any locked/pinned/logged extent map would prevent us freeing the
>> +        * extent mapping.
>> +        */
>> +       if (!(offset == 0 && length == PAGE_SIZE)) {
>>                  btrfs_releasepage(page, GFP_NOFS);
>>                  return;
>>          }
>> @@ -8208,9 +8214,10 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>>                  end = min(page_end,
>>                            ordered->file_offset + ordered->num_bytes - 1);
>>                  /*
>> -                * IO on this page will never be started, so we need to account
>> -                * for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
>> -                * here, must leave that up for the ordered extent completion.
>> +                * IO on this ordered extent will never be started, so we need
>> +                * to account for any ordered extents now. Don't clear
>
> So this comment update states that "IO on this ordered extent will
> never be started".
> That is not true, unless some other patch not in misc-next changed
> something and I missed it (like splitting ordered extents).
>
> If you have a 1M ordered extent, for file range [0, 1M[ for example,
> and then truncate the file to 512K or punch a hole for the range
> [512K, 1M[, then IO for the first 512K of the ordered extent is still
> done.
>
> So I think what you wanted to say is more like "IO for this subpage
> will never be started ...".

This is indeed much better.

>
>> +                * EXTENT_DELALLOC_NEW here, must leave that up for the
>> +                * ordered extent completion.
>>                   */
>>                  if (!inode_evicting)
>>                          clear_extent_bit(tree, start, end,
>> @@ -8234,7 +8241,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>>                                                             start,
>>                                                             end - start + 1, 1)) {
>>                                  btrfs_finish_ordered_io(ordered);
>> -                               completed_ordered = true;
>> +                       } else {
>> +                               incompleted_ordered = true;
>>                          }
>>                  }
>>
>> @@ -8276,7 +8284,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>>                   * is cleared if we don't delete, otherwise it can lead to
>>                   * corruptions if the i_size is extented later.
>>                   */
>> -               if (found_ordered && !completed_ordered)
>> +               if (found_ordered && incompleted_ordered)
>
> I find this naming, "incompleted_ordered" confusing, I think
> "incompleted" is not even a valid english word.
>
> What you mean is that if there is some ordered extent for the page
> range that we could not complete ourselves.
> I would suggest naming it to "completed_all_ordered", initialize it to
> true and then set it to false when we can't complete an ordered extent
> ourselves.
>
> Then it would just be "if (found_ordered && !completed_all_ordered)
> delete = false;".
>
> Also, I haven't checked the other patchsets for subpage support, but
> from looking only at this patchset, I'm assuming we can't set ranges
> in the io tree smaller than page size, is that correct?
> Otherwise this would be calling clear_extent_bit for each subpage range.

For current subpage implementation, we have to support sector aligned
writeback.

The requirement comes from data balance, we have to be able to write new
data extents exactly the same size as the originals.

Thus here we support range smaller than page size for subpage.
The extent io tree itself can support it without problem.

Thanks,
Qu
>
> Thanks.
>
>>                          delete = false;
>>                  clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED |
>>                                   EXTENT_DELALLOC | EXTENT_UPTODATE |
>> @@ -8286,6 +8294,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>>                  __btrfs_releasepage(page, GFP_NOFS);
>>          }
>>
>> +       ClearPagePrivate2(page);
>>          ClearPageChecked(page);
>>          clear_page_extent_mapped(page);
>>   }
>> --
>> 2.29.2
>>
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-12-22  4:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-17  4:57 [PATCH 0/4] btrfs: inode: btrfs_invalidatepage() related refactor and fix for subpage Qu Wenruo
2020-12-17  4:57 ` [PATCH 1/4] btrfs: inode: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
2020-12-17  4:57 ` [PATCH 2/4] btrfs: inode: remove variable shadowing " Qu Wenruo
2020-12-17  5:38   ` Su Yue
2020-12-17  5:42     ` Qu Wenruo
2020-12-17  6:08       ` Su Yue
2020-12-17  5:55   ` Nikolay Borisov
2020-12-17  5:59     ` Nikolay Borisov
2020-12-17  6:13       ` Qu Wenruo
2020-12-17 12:29         ` David Sterba
2020-12-17  4:57 ` [PATCH 3/4] btrfs: inode: move the timing of TestClearPagePrivate() " Qu Wenruo
2020-12-17  4:57 ` [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible Qu Wenruo
2020-12-17 11:20   ` Filipe Manana
2020-12-22  4:38     ` Qu Wenruo
2020-12-17 14:51   ` Josef Bacik
2020-12-18  0:42     ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.