All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: make Private2 lifespan more consistent
@ 2021-01-22  6:00 Qu Wenruo
  2021-01-22  8:04 ` Nikolay Borisov
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2021-01-22  6:00 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs uses page Private2 bit to incidate if we have ordered
extent for the page range.

But the lifespan of it is not consistent, during regular writeback path,
there are two locations to clear the same PagePrivate2:

    T ----- Page marked Dirty
    |
    + ----- Page marked Private2, through btrfs_run_dealloc_range()
    |
    + ----- Page cleared Private2, through btrfs_writepage_cow_fixup()
    |       in __extent_writepage_io()
    |       ^^^ Private2 cleared for the first time
    |
    + ----- Page marked Writeback, through btrfs_set_range_writeback()
    |       in __extent_writepage_io().
    |
    + ----- Page cleared Private2, through
    |       btrfs_writepage_endio_finish_ordered()
    |       ^^^ Private2 cleared for the second time.
    |
    + ----- Page cleared Writeback, through
            btrfs_writepage_endio_finish_ordered()

Currently PagePrivate2 is mostly to prevent ordered extent accounting
being executed for both endio and invalidatepage.
Thus only the one who cleared page Private2 is responsible for ordered
extent accounting.

But the fact is, in btrfs_writepage_endio_finish_ordered(), page
Private2 is cleared and ordered extent accounting is executed
unconditionally.

The race prevention only happens through btrfs_invalidatepage(), where
we wait the page writeback first, before checking the Private2 bit.

This means, Private2 is also protected by Writeback bit, and there is no
need for btrfs_writepage_cow_fixup() to clear Priavte2.

This patch will change btrfs_writepage_cow_fixup() to just
check PagePrivate2, not to clear it.
The clear will happen either in btrfs_invalidatepage() or
btrfs_writepage_endio_finish_ordered().

This makes the Private2 bit easier to understand, just meaning the page
has unfinished ordered extent attached to it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ef6cb7b620d0..fd06a2a1ee8f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2539,7 +2539,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end)
 	struct btrfs_writepage_fixup *fixup;
 
 	/* this page is properly in the ordered list */
-	if (TestClearPagePrivate2(page))
+	if (PagePrivate2(page))
 		return 0;
 
 	/*
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: make Private2 lifespan more consistent
  2021-01-22  6:00 [PATCH] btrfs: make Private2 lifespan more consistent Qu Wenruo
@ 2021-01-22  8:04 ` Nikolay Borisov
  2021-01-22  8:24   ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Nikolay Borisov @ 2021-01-22  8:04 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 22.01.21 г. 8:00 ч., Qu Wenruo wrote:
> Currently btrfs uses page Private2 bit to incidate if we have ordered
> extent for the page range.
> 
> But the lifespan of it is not consistent, during regular writeback path,
> there are two locations to clear the same PagePrivate2:
> 
>     T ----- Page marked Dirty
>     |
>     + ----- Page marked Private2, through btrfs_run_dealloc_range()
>     |
>     + ----- Page cleared Private2, through btrfs_writepage_cow_fixup()
>     |       in __extent_writepage_io()
>     |       ^^^ Private2 cleared for the first time
>     |
>     + ----- Page marked Writeback, through btrfs_set_range_writeback()
>     |       in __extent_writepage_io().
>     |
>     + ----- Page cleared Private2, through
>     |       btrfs_writepage_endio_finish_ordered()
>     |       ^^^ Private2 cleared for the second time.>     |
>     + ----- Page cleared Writeback, through
>             btrfs_writepage_endio_finish_ordered()

Where exactly is page writeback cleared in btrfs_writepage_endio_finish
or  finish_ordered_fn?

> 
> Currently PagePrivate2 is mostly to prevent ordered extent accounting
> being executed for both endio and invalidatepage.
> Thus only the one who cleared page Private2 is responsible for ordered
> extent accounting.

SO this patch likely fixes the race and double accounting you've seen on
the subpage branch, however it's still not clear how the race occurs.
IIUC PagePrivate must ensure that invalidatepage and endio don't run
concurrently. To that effect invalidatepage indeed checks to see if it's
the one which cleared pageprivate and if so it will run
btrfs_dec_test_ordered_pending and btrfs_finish_ordered_io. However, in
__extent_writepage_io btrfs_writepage_cow_fixup clears it
unconditionally and calls btrfs_writepage_endio_finish_ordered for hole
extents, right?

But in this case invalidate invalidatepage can never have
cleared_private2 set to true. IMO the actual problem this could lead
warrants more explanation.

> 
> But the fact is, in btrfs_writepage_endio_finish_ordered(), page
> Private2 is cleared and ordered extent accounting is executed
> unconditionally.
> 
> The race prevention only happens through btrfs_invalidatepage(), where
> we wait the page writeback first, before checking the Private2 bit.
> 
> This means, Private2 is also protected by Writeback bit, and there is no
> need for btrfs_writepage_cow_fixup() to clear Priavte2.
> 
> This patch will change btrfs_writepage_cow_fixup() to just
> check PagePrivate2, not to clear it.
> The clear will happen either in btrfs_invalidatepage() or
> btrfs_writepage_endio_finish_ordered().
> 

<snip>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: make Private2 lifespan more consistent
  2021-01-22  8:04 ` Nikolay Borisov
@ 2021-01-22  8:24   ` Qu Wenruo
  2021-01-22  8:40     ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2021-01-22  8:24 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2021/1/22 下午4:04, Nikolay Borisov wrote:
>
>
> On 22.01.21 г. 8:00 ч., Qu Wenruo wrote:
>> Currently btrfs uses page Private2 bit to incidate if we have ordered
>> extent for the page range.
>>
>> But the lifespan of it is not consistent, during regular writeback path,
>> there are two locations to clear the same PagePrivate2:
>>
>>      T ----- Page marked Dirty
>>      |
>>      + ----- Page marked Private2, through btrfs_run_dealloc_range()
>>      |
>>      + ----- Page cleared Private2, through btrfs_writepage_cow_fixup()
>>      |       in __extent_writepage_io()
>>      |       ^^^ Private2 cleared for the first time
>>      |
>>      + ----- Page marked Writeback, through btrfs_set_range_writeback()
>>      |       in __extent_writepage_io().
>>      |
>>      + ----- Page cleared Private2, through
>>      |       btrfs_writepage_endio_finish_ordered()
>>      |       ^^^ Private2 cleared for the second time.>     |
>>      + ----- Page cleared Writeback, through
>>              btrfs_writepage_endio_finish_ordered()
>
> Where exactly is page writeback cleared in btrfs_writepage_endio_finish
> or  finish_ordered_fn?

My bad. It's in finish_ordered_io().

>
>>
>> Currently PagePrivate2 is mostly to prevent ordered extent accounting
>> being executed for both endio and invalidatepage.
>> Thus only the one who cleared page Private2 is responsible for ordered
>> extent accounting.
>
> SO this patch likely fixes the race and double accounting you've seen on
> the subpage branch,

Nope, it's unrelated at all.

The subpage problem is in the patch where I convert
btrfs_writepage_endio_finish_ordered_io() to support subpage.

In that patch I wrongly moved the timing of ClearPagePrivate2() after we
queued the ordered extents.

Thus there will no be specific patch for that fix, just update that
patch to solve the problem.

> however it's still not clear how the race occurs.

There is no race in current code base.

The invalidatepage() will wait writeback, thus it means there are the
following possible combinations:

- Page Writeback | Private2
   Then invalidatepage() will wait for Writeback, and during endio,
   Private2 will be cleared.

   Accounting is done in endio.

- Page Writeback but NO Private2
   The same as previous cases

- Page Private2 but NO Writeback
   Invalidatepage() will just clear Private2 and do the ordered extent
   accounting.

   Accounting is done in invalidagepage()

- Page without Private2 nor Writeback
   Do nothing.



> IIUC PagePrivate must ensure that invalidatepage and endio don't run
> concurrently. To that effect invalidatepage indeed checks to see if it's
> the one which cleared pageprivate and if so it will run
> btrfs_dec_test_ordered_pending and btrfs_finish_ordered_io. However, in
> __extent_writepage_io btrfs_writepage_cow_fixup clears it
> unconditionally and calls btrfs_writepage_endio_finish_ordered for hole
> extents, right?
>
> But in this case invalidate invalidatepage can never have
> cleared_private2 set to true. IMO the actual problem this could lead
> warrants more explanation.
Your understanding is correct and it matches the correct code and my
understanding too.

Thus this patch is really just to make the life span easier to read, no
functional change at all.

Thanks,
Qu

>
>>
>> But the fact is, in btrfs_writepage_endio_finish_ordered(), page
>> Private2 is cleared and ordered extent accounting is executed
>> unconditionally.
>>
>> The race prevention only happens through btrfs_invalidatepage(), where
>> we wait the page writeback first, before checking the Private2 bit.
>>
>> This means, Private2 is also protected by Writeback bit, and there is no
>> need for btrfs_writepage_cow_fixup() to clear Priavte2.
>>
>> This patch will change btrfs_writepage_cow_fixup() to just
>> check PagePrivate2, not to clear it.
>> The clear will happen either in btrfs_invalidatepage() or
>> btrfs_writepage_endio_finish_ordered().
>>
>
> <snip>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: make Private2 lifespan more consistent
  2021-01-22  8:24   ` Qu Wenruo
@ 2021-01-22  8:40     ` Qu Wenruo
  0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2021-01-22  8:40 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2021/1/22 下午4:24, Qu Wenruo wrote:
>
>
> On 2021/1/22 下午4:04, Nikolay Borisov wrote:
>>
>>
>> On 22.01.21 г. 8:00 ч., Qu Wenruo wrote:
>>> Currently btrfs uses page Private2 bit to incidate if we have ordered
>>> extent for the page range.
>>>
>>> But the lifespan of it is not consistent, during regular writeback path,
>>> there are two locations to clear the same PagePrivate2:
>>>
>>>      T ----- Page marked Dirty
>>>      |
>>>      + ----- Page marked Private2, through btrfs_run_dealloc_range()
>>>      |
>>>      + ----- Page cleared Private2, through btrfs_writepage_cow_fixup()
>>>      |       in __extent_writepage_io()
>>>      |       ^^^ Private2 cleared for the first time
>>>      |
>>>      + ----- Page marked Writeback, through btrfs_set_range_writeback()
>>>      |       in __extent_writepage_io().
>>>      |
>>>      + ----- Page cleared Private2, through
>>>      |       btrfs_writepage_endio_finish_ordered()
>>>      |       ^^^ Private2 cleared for the second time.>     |
>>>      + ----- Page cleared Writeback, through
>>>              btrfs_writepage_endio_finish_ordered()
>>
>> Where exactly is page writeback cleared in btrfs_writepage_endio_finish
>> or  finish_ordered_fn?
>
> My bad. It's in finish_ordered_io().

Oh no, it's in end_bio_extent_writepage().

This means my original subpage plan to fix is not corrrect at all.
The Private2 is cleared before clearing page Writeback anyway...

The fix should be reworked now...
>
>>
>>>
>>> Currently PagePrivate2 is mostly to prevent ordered extent accounting
>>> being executed for both endio and invalidatepage.
>>> Thus only the one who cleared page Private2 is responsible for ordered
>>> extent accounting.
>>
>> SO this patch likely fixes the race and double accounting you've seen on
>> the subpage branch,
>
> Nope, it's unrelated at all.
>
> The subpage problem is in the patch where I convert
> btrfs_writepage_endio_finish_ordered_io() to support subpage.
>
> In that patch I wrongly moved the timing of ClearPagePrivate2() after we
> queued the ordered extents.
>
> Thus there will no be specific patch for that fix, just update that
> patch to solve the problem.
>
>> however it's still not clear how the race occurs.
>
> There is no race in current code base.
>
> The invalidatepage() will wait writeback, thus it means there are the
> following possible combinations:
>
> - Page Writeback | Private2
>    Then invalidatepage() will wait for Writeback, and during endio,
>    Private2 will be cleared.
>
>    Accounting is done in endio.
>
> - Page Writeback but NO Private2
>    The same as previous cases
>
> - Page Private2 but NO Writeback
>    Invalidatepage() will just clear Private2 and do the ordered extent
>    accounting.
>
>    Accounting is done in invalidagepage()
>
> - Page without Private2 nor Writeback
>    Do nothing.
>
>
>
>> IIUC PagePrivate must ensure that invalidatepage and endio don't run
>> concurrently. To that effect invalidatepage indeed checks to see if it's
>> the one which cleared pageprivate and if so it will run
>> btrfs_dec_test_ordered_pending and btrfs_finish_ordered_io. However, in
>> __extent_writepage_io btrfs_writepage_cow_fixup clears it
>> unconditionally and calls btrfs_writepage_endio_finish_ordered for hole
>> extents, right?
>>
>> But in this case invalidate invalidatepage can never have
>> cleared_private2 set to true. IMO the actual problem this could lead
>> warrants more explanation.
> Your understanding is correct and it matches the correct code and my
> understanding too.
>
> Thus this patch is really just to make the life span easier to read, no
> functional change at all.
>
> Thanks,
> Qu
>
>>
>>>
>>> But the fact is, in btrfs_writepage_endio_finish_ordered(), page
>>> Private2 is cleared and ordered extent accounting is executed
>>> unconditionally.
>>>
>>> The race prevention only happens through btrfs_invalidatepage(), where
>>> we wait the page writeback first, before checking the Private2 bit.
>>>
>>> This means, Private2 is also protected by Writeback bit, and there is no
>>> need for btrfs_writepage_cow_fixup() to clear Priavte2.
>>>
>>> This patch will change btrfs_writepage_cow_fixup() to just
>>> check PagePrivate2, not to clear it.
>>> The clear will happen either in btrfs_invalidatepage() or
>>> btrfs_writepage_endio_finish_ordered().
>>>
>>
>> <snip>
>>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-01-22  8:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-22  6:00 [PATCH] btrfs: make Private2 lifespan more consistent Qu Wenruo
2021-01-22  8:04 ` Nikolay Borisov
2021-01-22  8:24   ` Qu Wenruo
2021-01-22  8:40     ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.