All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path
@ 2022-08-15  2:42 ethanlien
  2022-08-16  8:49 ` Filipe Manana
  0 siblings, 1 reply; 6+ messages in thread
From: ethanlien @ 2022-08-15  2:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: cunankimo, Ethan Lien

From: Ethan Lien <ethanlien@synology.com>

After we copied data to page cache in buffered I/O, we
1. Insert a EXTENT_UPTODATE state into inode's io_tree, by
   endio_readpage_release_extent(), set_extent_delalloc() or
   set_extent_defrag().
2. Set page uptodate before we unlock the page.

But the only place we check io_tree's EXTENT_UPTODATE state is in
btrfs_do_readpage(). We know we enter btrfs_do_readpage() only when we
have a non-uptodate page, so it is unnecessary to set EXTENT_UPTODATE.

For example, when performing a buffered random read:

	fio --rw=randread --ioengine=libaio --direct=0 --numjobs=4 \
		--filesize=32G --size=4G --bs=4k --name=job \
		--filename=/mnt/file --name=job

Then check how many extent_state in io_tree:

	cat /proc/slabinfo | grep btrfs_extent_state | awk '{print $2}'

w/o this patch, we got 640567 btrfs_extent_state.
w/  this patch, we got    204 btrfs_extent_state.

Maintaining such a big tree brings overhead since every I/O needs to insert
EXTENT_LOCKED, insert EXTENT_UPTODATE, then remove EXTENT_LOCKED. And in
every insert or remove, we need to lock io_tree, do tree search, alloc or
dealloc extent states. By removing unnecessary EXTENT_UPTODATE, we keep
io_tree in a minimal size and reduce overhead when performing buffered I/O.

Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/extent-io-tree.h | 4 ++--
 fs/btrfs/extent_io.c      | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index c3eb52dbe61c..53ae849d0248 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -211,7 +211,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
 				      struct extent_state **cached_state)
 {
 	return set_extent_bit(tree, start, end,
-			      EXTENT_DELALLOC | EXTENT_UPTODATE | extra_bits,
+			      EXTENT_DELALLOC | extra_bits,
 			      0, NULL, cached_state, GFP_NOFS, NULL);
 }
 
@@ -219,7 +219,7 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
 		u64 end, struct extent_state **cached_state)
 {
 	return set_extent_bit(tree, start, end,
-			      EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
+			      EXTENT_DELALLOC | EXTENT_DEFRAG,
 			      0, NULL, cached_state, GFP_NOFS, NULL);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bfae67c593c5..e0f0a39cd6eb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2924,9 +2924,6 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 	 * Now we don't have range contiguous to the processed range, release
 	 * the processed range now.
 	 */
-	if (processed->uptodate && tree->track_uptodate)
-		set_extent_uptodate(tree, processed->start, processed->end,
-				    &cached, GFP_ATOMIC);
 	unlock_extent_cached_atomic(tree, processed->start, processed->end,
 				    &cached);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path
  2022-08-15  2:42 [PATCH] btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path ethanlien
@ 2022-08-16  8:49 ` Filipe Manana
  2022-08-17  6:30   ` Ethan Lien
  0 siblings, 1 reply; 6+ messages in thread
From: Filipe Manana @ 2022-08-16  8:49 UTC (permalink / raw)
  To: ethanlien; +Cc: linux-btrfs, cunankimo

On Mon, Aug 15, 2022 at 4:16 AM ethanlien <ethanlien@synology.com> wrote:
>
> From: Ethan Lien <ethanlien@synology.com>
>
> After we copied data to page cache in buffered I/O, we
> 1. Insert a EXTENT_UPTODATE state into inode's io_tree, by
>    endio_readpage_release_extent(), set_extent_delalloc() or
>    set_extent_defrag().
> 2. Set page uptodate before we unlock the page.
>
> But the only place we check io_tree's EXTENT_UPTODATE state is in
> btrfs_do_readpage(). We know we enter btrfs_do_readpage() only when we
> have a non-uptodate page, so it is unnecessary to set EXTENT_UPTODATE.
>
> For example, when performing a buffered random read:
>
>         fio --rw=randread --ioengine=libaio --direct=0 --numjobs=4 \
>                 --filesize=32G --size=4G --bs=4k --name=job \
>                 --filename=/mnt/file --name=job
>
> Then check how many extent_state in io_tree:
>
>         cat /proc/slabinfo | grep btrfs_extent_state | awk '{print $2}'
>
> w/o this patch, we got 640567 btrfs_extent_state.
> w/  this patch, we got    204 btrfs_extent_state.

Did fio also report increased throughput?

>
> Maintaining such a big tree brings overhead since every I/O needs to insert
> EXTENT_LOCKED, insert EXTENT_UPTODATE, then remove EXTENT_LOCKED. And in
> every insert or remove, we need to lock io_tree, do tree search, alloc or
> dealloc extent states. By removing unnecessary EXTENT_UPTODATE, we keep
> io_tree in a minimal size and reduce overhead when performing buffered I/O.

The idea is sound, and I don't see a reason to keep using
EXTENT_UPTODATE as well.

>
> Signed-off-by: Ethan Lien <ethanlien@synology.com>
> Reviewed-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/extent-io-tree.h | 4 ++--
>  fs/btrfs/extent_io.c      | 3 ---
>  2 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> index c3eb52dbe61c..53ae849d0248 100644
> --- a/fs/btrfs/extent-io-tree.h
> +++ b/fs/btrfs/extent-io-tree.h
> @@ -211,7 +211,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
>                                       struct extent_state **cached_state)
>  {
>         return set_extent_bit(tree, start, end,
> -                             EXTENT_DELALLOC | EXTENT_UPTODATE | extra_bits,
> +                             EXTENT_DELALLOC | extra_bits,
>                               0, NULL, cached_state, GFP_NOFS, NULL);
>  }
>
> @@ -219,7 +219,7 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
>                 u64 end, struct extent_state **cached_state)
>  {
>         return set_extent_bit(tree, start, end,
> -                             EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
> +                             EXTENT_DELALLOC | EXTENT_DEFRAG,
>                               0, NULL, cached_state, GFP_NOFS, NULL);
>  }
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index bfae67c593c5..e0f0a39cd6eb 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2924,9 +2924,6 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
>          * Now we don't have range contiguous to the processed range, release
>          * the processed range now.
>          */
> -       if (processed->uptodate && tree->track_uptodate)
> -               set_extent_uptodate(tree, processed->start, processed->end,
> -                                   &cached, GFP_ATOMIC);

This is another good thing, to get rid of a GFP_ATOMIC allocation.

Why didn't you remove the set_extent_uptodate() call at btrfs_get_extent() too?
It can only be set during a page read at btrfs_do_readpage(), for an
inline extent.

Also, having the tests for EXTENT_UPTODATE at btrfs_do_readpage() now become
useless too, don't they? Why have you kept them?

Thanks.

>         unlock_extent_cached_atomic(tree, processed->start, processed->end,
>                                     &cached);
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path
  2022-08-16  8:49 ` Filipe Manana
@ 2022-08-17  6:30   ` Ethan Lien
  2022-08-17 12:35     ` Filipe Manana
  0 siblings, 1 reply; 6+ messages in thread
From: Ethan Lien @ 2022-08-17  6:30 UTC (permalink / raw)
  To: Filipe Manana; +Cc: ethanlien, linux-btrfs

Filipe Manana <fdmanana@kernel.org> 於 2022年8月16日 週二 下午4:49寫道:
>
> On Mon, Aug 15, 2022 at 4:16 AM ethanlien <ethanlien@synology.com> wrote:
> >
> > From: Ethan Lien <ethanlien@synology.com>
> >
> > After we copied data to page cache in buffered I/O, we
> > 1. Insert a EXTENT_UPTODATE state into inode's io_tree, by
> >    endio_readpage_release_extent(), set_extent_delalloc() or
> >    set_extent_defrag().
> > 2. Set page uptodate before we unlock the page.
> >
> > But the only place we check io_tree's EXTENT_UPTODATE state is in
> > btrfs_do_readpage(). We know we enter btrfs_do_readpage() only when we
> > have a non-uptodate page, so it is unnecessary to set EXTENT_UPTODATE.
> >
> > For example, when performing a buffered random read:
> >
> >         fio --rw=randread --ioengine=libaio --direct=0 --numjobs=4 \
> >                 --filesize=32G --size=4G --bs=4k --name=job \
> >                 --filename=/mnt/file --name=job
> >
> > Then check how many extent_state in io_tree:
> >
> >         cat /proc/slabinfo | grep btrfs_extent_state | awk '{print $2}'
> >
> > w/o this patch, we got 640567 btrfs_extent_state.
> > w/  this patch, we got    204 btrfs_extent_state.
>
> Did fio also report increased throughput?
>

Yes, but only when we have a lot of memory (32GB or above).
In a read benchmark, most of the memory can be used as page cache,
so there are no ways we can free those UPTODATE extent states unless
we explicitly call drop cache.
We have observed millions of useless EXTENT_UPTODATE extent states
in inode's io_tree, if w/o this patch.

> >
> > Maintaining such a big tree brings overhead since every I/O needs to insert
> > EXTENT_LOCKED, insert EXTENT_UPTODATE, then remove EXTENT_LOCKED. And in
> > every insert or remove, we need to lock io_tree, do tree search, alloc or
> > dealloc extent states. By removing unnecessary EXTENT_UPTODATE, we keep
> > io_tree in a minimal size and reduce overhead when performing buffered I/O.
>
> The idea is sound, and I don't see a reason to keep using
> EXTENT_UPTODATE as well.
>
> >
> > Signed-off-by: Ethan Lien <ethanlien@synology.com>
> > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > ---
> >  fs/btrfs/extent-io-tree.h | 4 ++--
> >  fs/btrfs/extent_io.c      | 3 ---
> >  2 files changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> > index c3eb52dbe61c..53ae849d0248 100644
> > --- a/fs/btrfs/extent-io-tree.h
> > +++ b/fs/btrfs/extent-io-tree.h
> > @@ -211,7 +211,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
> >                                       struct extent_state **cached_state)
> >  {
> >         return set_extent_bit(tree, start, end,
> > -                             EXTENT_DELALLOC | EXTENT_UPTODATE | extra_bits,
> > +                             EXTENT_DELALLOC | extra_bits,
> >                               0, NULL, cached_state, GFP_NOFS, NULL);
> >  }
> >
> > @@ -219,7 +219,7 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
> >                 u64 end, struct extent_state **cached_state)
> >  {
> >         return set_extent_bit(tree, start, end,
> > -                             EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
> > +                             EXTENT_DELALLOC | EXTENT_DEFRAG,
> >                               0, NULL, cached_state, GFP_NOFS, NULL);
> >  }
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index bfae67c593c5..e0f0a39cd6eb 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -2924,9 +2924,6 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
> >          * Now we don't have range contiguous to the processed range, release
> >          * the processed range now.
> >          */
> > -       if (processed->uptodate && tree->track_uptodate)
> > -               set_extent_uptodate(tree, processed->start, processed->end,
> > -                                   &cached, GFP_ATOMIC);
>
> This is another good thing, to get rid of a GFP_ATOMIC allocation.
>
> Why didn't you remove the set_extent_uptodate() call at btrfs_get_extent() too?
> It can only be set during a page read at btrfs_do_readpage(), for an
> inline extent.
>
> Also, having the tests for EXTENT_UPTODATE at btrfs_do_readpage() now become
> useless too, don't they? Why have you kept them?
>

Currently if we found a inline extent in btrfs_get_extent(), we set
page uptodate
or page error in btrfs_do_readpage().
So we still need EXTENT_UPTODATE to let btrfs_do_readpage() knows what to do.

Or do you suggest we set page uptodate or error in btrfs_get_extent(),
for inline extent?

Thanks.

> Thanks.
>
> >         unlock_extent_cached_atomic(tree, processed->start, processed->end,
> >                                     &cached);
> >
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path
  2022-08-17  6:30   ` Ethan Lien
@ 2022-08-17 12:35     ` Filipe Manana
  2022-08-18  7:32       ` Ethan Lien
  0 siblings, 1 reply; 6+ messages in thread
From: Filipe Manana @ 2022-08-17 12:35 UTC (permalink / raw)
  To: Ethan Lien; +Cc: ethanlien, linux-btrfs

On Wed, Aug 17, 2022 at 02:30:25PM +0800, Ethan Lien wrote:
> Filipe Manana <fdmanana@kernel.org> 於 2022年8月16日 週二 下午4:49寫道:
> >
> > On Mon, Aug 15, 2022 at 4:16 AM ethanlien <ethanlien@synology.com> wrote:
> > >
> > > From: Ethan Lien <ethanlien@synology.com>
> > >
> > > After we copied data to page cache in buffered I/O, we
> > > 1. Insert a EXTENT_UPTODATE state into inode's io_tree, by
> > >    endio_readpage_release_extent(), set_extent_delalloc() or
> > >    set_extent_defrag().
> > > 2. Set page uptodate before we unlock the page.
> > >
> > > But the only place we check io_tree's EXTENT_UPTODATE state is in
> > > btrfs_do_readpage(). We know we enter btrfs_do_readpage() only when we
> > > have a non-uptodate page, so it is unnecessary to set EXTENT_UPTODATE.
> > >
> > > For example, when performing a buffered random read:
> > >
> > >         fio --rw=randread --ioengine=libaio --direct=0 --numjobs=4 \
> > >                 --filesize=32G --size=4G --bs=4k --name=job \
> > >                 --filename=/mnt/file --name=job
> > >
> > > Then check how many extent_state in io_tree:
> > >
> > >         cat /proc/slabinfo | grep btrfs_extent_state | awk '{print $2}'
> > >
> > > w/o this patch, we got 640567 btrfs_extent_state.
> > > w/  this patch, we got    204 btrfs_extent_state.
> >
> > Did fio also report increased throughput?
> >
> 
> Yes, but only when we have a lot of memory (32GB or above).
> In a read benchmark, most of the memory can be used as page cache,
> so there are no ways we can free those UPTODATE extent states unless
> we explicitly call drop cache.
> We have observed millions of useless EXTENT_UPTODATE extent states
> in inode's io_tree, if w/o this patch.
> 
> > >
> > > Maintaining such a big tree brings overhead since every I/O needs to insert
> > > EXTENT_LOCKED, insert EXTENT_UPTODATE, then remove EXTENT_LOCKED. And in
> > > every insert or remove, we need to lock io_tree, do tree search, alloc or
> > > dealloc extent states. By removing unnecessary EXTENT_UPTODATE, we keep
> > > io_tree in a minimal size and reduce overhead when performing buffered I/O.
> >
> > The idea is sound, and I don't see a reason to keep using
> > EXTENT_UPTODATE as well.
> >
> > >
> > > Signed-off-by: Ethan Lien <ethanlien@synology.com>
> > > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > > ---
> > >  fs/btrfs/extent-io-tree.h | 4 ++--
> > >  fs/btrfs/extent_io.c      | 3 ---
> > >  2 files changed, 2 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> > > index c3eb52dbe61c..53ae849d0248 100644
> > > --- a/fs/btrfs/extent-io-tree.h
> > > +++ b/fs/btrfs/extent-io-tree.h
> > > @@ -211,7 +211,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
> > >                                       struct extent_state **cached_state)
> > >  {
> > >         return set_extent_bit(tree, start, end,
> > > -                             EXTENT_DELALLOC | EXTENT_UPTODATE | extra_bits,
> > > +                             EXTENT_DELALLOC | extra_bits,
> > >                               0, NULL, cached_state, GFP_NOFS, NULL);
> > >  }
> > >
> > > @@ -219,7 +219,7 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
> > >                 u64 end, struct extent_state **cached_state)
> > >  {
> > >         return set_extent_bit(tree, start, end,
> > > -                             EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
> > > +                             EXTENT_DELALLOC | EXTENT_DEFRAG,
> > >                               0, NULL, cached_state, GFP_NOFS, NULL);
> > >  }
> > >
> > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > index bfae67c593c5..e0f0a39cd6eb 100644
> > > --- a/fs/btrfs/extent_io.c
> > > +++ b/fs/btrfs/extent_io.c
> > > @@ -2924,9 +2924,6 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
> > >          * Now we don't have range contiguous to the processed range, release
> > >          * the processed range now.
> > >          */
> > > -       if (processed->uptodate && tree->track_uptodate)
> > > -               set_extent_uptodate(tree, processed->start, processed->end,
> > > -                                   &cached, GFP_ATOMIC);
> >
> > This is another good thing, to get rid of a GFP_ATOMIC allocation.
> >
> > Why didn't you remove the set_extent_uptodate() call at btrfs_get_extent() too?
> > It can only be set during a page read at btrfs_do_readpage(), for an
> > inline extent.
> >
> > Also, having the tests for EXTENT_UPTODATE at btrfs_do_readpage() now become
> > useless too, don't they? Why have you kept them?
> >
> 
> Currently if we found a inline extent in btrfs_get_extent(), we set
> page uptodate
> or page error in btrfs_do_readpage().
> So we still need EXTENT_UPTODATE to let btrfs_do_readpage() knows what to do.
> 
> Or do you suggest we set page uptodate or error in btrfs_get_extent(),
> for inline extent?

My idea was like this:

1) Remove the set_extent_uptodate() call at btrfs_get_extent();

2) At btrfs_do_readpage(), if we get a inline extent (em->block_start == EXTENT_MAP_INLINE),
   then we set the page up to date (at btrfs_do_readpage()).

For step 2, we could also set the page up to date at btrfs_get_extent(), as
you said.

The only case where we get a page passed to btrfs_get_extent() is through
btrfs_do_readpage(), so I think either approach should work, and then remove
the remaining cases where we test for the EXTENT_UPTODATE state at
btrfs_do_readpage().

Thanks.

> 
> Thanks.
> 
> > Thanks.
> >
> > >         unlock_extent_cached_atomic(tree, processed->start, processed->end,
> > >                                     &cached);
> > >
> > > --
> > > 2.17.1
> > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path
  2022-08-17 12:35     ` Filipe Manana
@ 2022-08-18  7:32       ` Ethan Lien
  2022-08-18  8:00         ` Ethan Lien
  0 siblings, 1 reply; 6+ messages in thread
From: Ethan Lien @ 2022-08-18  7:32 UTC (permalink / raw)
  To: Filipe Manana; +Cc: ethanlien, linux-btrfs

Filipe Manana <fdmanana@kernel.org> 於 2022年8月17日 週三 晚上8:36寫道:
>
> On Wed, Aug 17, 2022 at 02:30:25PM +0800, Ethan Lien wrote:
> > Filipe Manana <fdmanana@kernel.org> 於 2022年8月16日 週二 下午4:49寫道:
> > >
> > > On Mon, Aug 15, 2022 at 4:16 AM ethanlien <ethanlien@synology.com> wrote:
> > > >
> > > > From: Ethan Lien <ethanlien@synology.com>
> > > >
> > > > After we copied data to page cache in buffered I/O, we
> > > > 1. Insert a EXTENT_UPTODATE state into inode's io_tree, by
> > > >    endio_readpage_release_extent(), set_extent_delalloc() or
> > > >    set_extent_defrag().
> > > > 2. Set page uptodate before we unlock the page.
> > > >
> > > > But the only place we check io_tree's EXTENT_UPTODATE state is in
> > > > btrfs_do_readpage(). We know we enter btrfs_do_readpage() only when we
> > > > have a non-uptodate page, so it is unnecessary to set EXTENT_UPTODATE.
> > > >
> > > > For example, when performing a buffered random read:
> > > >
> > > >         fio --rw=randread --ioengine=libaio --direct=0 --numjobs=4 \
> > > >                 --filesize=32G --size=4G --bs=4k --name=job \
> > > >                 --filename=/mnt/file --name=job
> > > >
> > > > Then check how many extent_state in io_tree:
> > > >
> > > >         cat /proc/slabinfo | grep btrfs_extent_state | awk '{print $2}'
> > > >
> > > > w/o this patch, we got 640567 btrfs_extent_state.
> > > > w/  this patch, we got    204 btrfs_extent_state.
> > >
> > > Did fio also report increased throughput?
> > >
> >
> > Yes, but only when we have a lot of memory (32GB or above).
> > In a read benchmark, most of the memory can be used as page cache,
> > so there are no ways we can free those UPTODATE extent states unless
> > we explicitly call drop cache.
> > We have observed millions of useless EXTENT_UPTODATE extent states
> > in inode's io_tree, if w/o this patch.
> >
> > > >
> > > > Maintaining such a big tree brings overhead since every I/O needs to insert
> > > > EXTENT_LOCKED, insert EXTENT_UPTODATE, then remove EXTENT_LOCKED. And in
> > > > every insert or remove, we need to lock io_tree, do tree search, alloc or
> > > > dealloc extent states. By removing unnecessary EXTENT_UPTODATE, we keep
> > > > io_tree in a minimal size and reduce overhead when performing buffered I/O.
> > >
> > > The idea is sound, and I don't see a reason to keep using
> > > EXTENT_UPTODATE as well.
> > >
> > > >
> > > > Signed-off-by: Ethan Lien <ethanlien@synology.com>
> > > > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > > > ---
> > > >  fs/btrfs/extent-io-tree.h | 4 ++--
> > > >  fs/btrfs/extent_io.c      | 3 ---
> > > >  2 files changed, 2 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> > > > index c3eb52dbe61c..53ae849d0248 100644
> > > > --- a/fs/btrfs/extent-io-tree.h
> > > > +++ b/fs/btrfs/extent-io-tree.h
> > > > @@ -211,7 +211,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
> > > >                                       struct extent_state **cached_state)
> > > >  {
> > > >         return set_extent_bit(tree, start, end,
> > > > -                             EXTENT_DELALLOC | EXTENT_UPTODATE | extra_bits,
> > > > +                             EXTENT_DELALLOC | extra_bits,
> > > >                               0, NULL, cached_state, GFP_NOFS, NULL);
> > > >  }
> > > >
> > > > @@ -219,7 +219,7 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
> > > >                 u64 end, struct extent_state **cached_state)
> > > >  {
> > > >         return set_extent_bit(tree, start, end,
> > > > -                             EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
> > > > +                             EXTENT_DELALLOC | EXTENT_DEFRAG,
> > > >                               0, NULL, cached_state, GFP_NOFS, NULL);
> > > >  }
> > > >
> > > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > > index bfae67c593c5..e0f0a39cd6eb 100644
> > > > --- a/fs/btrfs/extent_io.c
> > > > +++ b/fs/btrfs/extent_io.c
> > > > @@ -2924,9 +2924,6 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
> > > >          * Now we don't have range contiguous to the processed range, release
> > > >          * the processed range now.
> > > >          */
> > > > -       if (processed->uptodate && tree->track_uptodate)
> > > > -               set_extent_uptodate(tree, processed->start, processed->end,
> > > > -                                   &cached, GFP_ATOMIC);
> > >
> > > This is another good thing, to get rid of a GFP_ATOMIC allocation.
> > >
> > > Why didn't you remove the set_extent_uptodate() call at btrfs_get_extent() too?
> > > It can only be set during a page read at btrfs_do_readpage(), for an
> > > inline extent.
> > >
> > > Also, having the tests for EXTENT_UPTODATE at btrfs_do_readpage() now become
> > > useless too, don't they? Why have you kept them?
> > >
> >
> > Currently if we found a inline extent in btrfs_get_extent(), we set
> > page uptodate
> > or page error in btrfs_do_readpage().
> > So we still need EXTENT_UPTODATE to let btrfs_do_readpage() knows what to do.
> >
> > Or do you suggest we set page uptodate or error in btrfs_get_extent(),
> > for inline extent?
>
> My idea was like this:
>
> 1) Remove the set_extent_uptodate() call at btrfs_get_extent();
>
> 2) At btrfs_do_readpage(), if we get a inline extent (em->block_start == EXTENT_MAP_INLINE),
>    then we set the page up to date (at btrfs_do_readpage()).
>
> For step 2, we could also set the page up to date at btrfs_get_extent(), as
> you said.
>

It is possible we found a compressed inline extent and got error in
uncompress_inline().
So we can't unconditionally set page uptodate in btrfs_do_readpage(),
if we found em->block_start == EXTENT_MAP_INLINE.
I think the only solution is we set page uptodate in btrfs_get_extent()?

Thanks.

> The only case where we get a page passed to btrfs_get_extent() is through
> btrfs_do_readpage(), so I think either approach should work, and then remove
> the remaining cases where we test for the EXTENT_UPTODATE state at
> btrfs_do_readpage().
>
> Thanks.
>
> >
> > Thanks.
> >
> > > Thanks.
> > >
> > > >         unlock_extent_cached_atomic(tree, processed->start, processed->end,
> > > >                                     &cached);
> > > >
> > > > --
> > > > 2.17.1
> > > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path
  2022-08-18  7:32       ` Ethan Lien
@ 2022-08-18  8:00         ` Ethan Lien
  0 siblings, 0 replies; 6+ messages in thread
From: Ethan Lien @ 2022-08-18  8:00 UTC (permalink / raw)
  To: Filipe Manana; +Cc: ethanlien, linux-btrfs

Ethan Lien <cunankimo@gmail.com> 於 2022年8月18日 週四 下午3:32寫道:
>
> Filipe Manana <fdmanana@kernel.org> 於 2022年8月17日 週三 晚上8:36寫道:
> >
> > On Wed, Aug 17, 2022 at 02:30:25PM +0800, Ethan Lien wrote:
> > > Filipe Manana <fdmanana@kernel.org> 於 2022年8月16日 週二 下午4:49寫道:
> > > >
> > > > On Mon, Aug 15, 2022 at 4:16 AM ethanlien <ethanlien@synology.com> wrote:
> > > > >
> > > > > From: Ethan Lien <ethanlien@synology.com>
> > > > >
> > > > > After we copied data to page cache in buffered I/O, we
> > > > > 1. Insert a EXTENT_UPTODATE state into inode's io_tree, by
> > > > >    endio_readpage_release_extent(), set_extent_delalloc() or
> > > > >    set_extent_defrag().
> > > > > 2. Set page uptodate before we unlock the page.
> > > > >
> > > > > But the only place we check io_tree's EXTENT_UPTODATE state is in
> > > > > btrfs_do_readpage(). We know we enter btrfs_do_readpage() only when we
> > > > > have a non-uptodate page, so it is unnecessary to set EXTENT_UPTODATE.
> > > > >
> > > > > For example, when performing a buffered random read:
> > > > >
> > > > >         fio --rw=randread --ioengine=libaio --direct=0 --numjobs=4 \
> > > > >                 --filesize=32G --size=4G --bs=4k --name=job \
> > > > >                 --filename=/mnt/file --name=job
> > > > >
> > > > > Then check how many extent_state in io_tree:
> > > > >
> > > > >         cat /proc/slabinfo | grep btrfs_extent_state | awk '{print $2}'
> > > > >
> > > > > w/o this patch, we got 640567 btrfs_extent_state.
> > > > > w/  this patch, we got    204 btrfs_extent_state.
> > > >
> > > > Did fio also report increased throughput?
> > > >
> > >
> > > Yes, but only when we have a lot of memory (32GB or above).
> > > In a read benchmark, most of the memory can be used as page cache,
> > > so there are no ways we can free those UPTODATE extent states unless
> > > we explicitly call drop cache.
> > > We have observed millions of useless EXTENT_UPTODATE extent states
> > > in inode's io_tree, if w/o this patch.
> > >
> > > > >
> > > > > Maintaining such a big tree brings overhead since every I/O needs to insert
> > > > > EXTENT_LOCKED, insert EXTENT_UPTODATE, then remove EXTENT_LOCKED. And in
> > > > > every insert or remove, we need to lock io_tree, do tree search, alloc or
> > > > > dealloc extent states. By removing unnecessary EXTENT_UPTODATE, we keep
> > > > > io_tree in a minimal size and reduce overhead when performing buffered I/O.
> > > >
> > > > The idea is sound, and I don't see a reason to keep using
> > > > EXTENT_UPTODATE as well.
> > > >
> > > > >
> > > > > Signed-off-by: Ethan Lien <ethanlien@synology.com>
> > > > > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > > > > ---
> > > > >  fs/btrfs/extent-io-tree.h | 4 ++--
> > > > >  fs/btrfs/extent_io.c      | 3 ---
> > > > >  2 files changed, 2 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> > > > > index c3eb52dbe61c..53ae849d0248 100644
> > > > > --- a/fs/btrfs/extent-io-tree.h
> > > > > +++ b/fs/btrfs/extent-io-tree.h
> > > > > @@ -211,7 +211,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
> > > > >                                       struct extent_state **cached_state)
> > > > >  {
> > > > >         return set_extent_bit(tree, start, end,
> > > > > -                             EXTENT_DELALLOC | EXTENT_UPTODATE | extra_bits,
> > > > > +                             EXTENT_DELALLOC | extra_bits,
> > > > >                               0, NULL, cached_state, GFP_NOFS, NULL);
> > > > >  }
> > > > >
> > > > > @@ -219,7 +219,7 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
> > > > >                 u64 end, struct extent_state **cached_state)
> > > > >  {
> > > > >         return set_extent_bit(tree, start, end,
> > > > > -                             EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
> > > > > +                             EXTENT_DELALLOC | EXTENT_DEFRAG,
> > > > >                               0, NULL, cached_state, GFP_NOFS, NULL);
> > > > >  }
> > > > >
> > > > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > > > index bfae67c593c5..e0f0a39cd6eb 100644
> > > > > --- a/fs/btrfs/extent_io.c
> > > > > +++ b/fs/btrfs/extent_io.c
> > > > > @@ -2924,9 +2924,6 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
> > > > >          * Now we don't have range contiguous to the processed range, release
> > > > >          * the processed range now.
> > > > >          */
> > > > > -       if (processed->uptodate && tree->track_uptodate)
> > > > > -               set_extent_uptodate(tree, processed->start, processed->end,
> > > > > -                                   &cached, GFP_ATOMIC);
> > > >
> > > > This is another good thing, to get rid of a GFP_ATOMIC allocation.
> > > >
> > > > Why didn't you remove the set_extent_uptodate() call at btrfs_get_extent() too?
> > > > It can only be set during a page read at btrfs_do_readpage(), for an
> > > > inline extent.
> > > >
> > > > Also, having the tests for EXTENT_UPTODATE at btrfs_do_readpage() now become
> > > > useless too, don't they? Why have you kept them?
> > > >
> > >
> > > Currently if we found a inline extent in btrfs_get_extent(), we set
> > > page uptodate
> > > or page error in btrfs_do_readpage().
> > > So we still need EXTENT_UPTODATE to let btrfs_do_readpage() knows what to do.
> > >
> > > Or do you suggest we set page uptodate or error in btrfs_get_extent(),
> > > for inline extent?
> >
> > My idea was like this:
> >
> > 1) Remove the set_extent_uptodate() call at btrfs_get_extent();
> >
> > 2) At btrfs_do_readpage(), if we get a inline extent (em->block_start == EXTENT_MAP_INLINE),
> >    then we set the page up to date (at btrfs_do_readpage()).
> >
> > For step 2, we could also set the page up to date at btrfs_get_extent(), as
> > you said.
> >
>
> It is possible we found a compressed inline extent and got error in
> uncompress_inline().
> So we can't unconditionally set page uptodate in btrfs_do_readpage(),
> if we found em->block_start == EXTENT_MAP_INLINE.
> I think the only solution is we set page uptodate in btrfs_get_extent()?
>

Ooops I got this wrong.
If we failed in uncompress_inline(), btrfs_do_readpage() will error out.
That means as long as we found block_start == EXTENT_MAP_INLINE in
btrfs_do_readpage(), we can safely set page uptodate, without the help
of EXTENT_UPTODATE.

I'll send a v2 patch, thanks.

> Thanks.
>
> > The only case where we get a page passed to btrfs_get_extent() is through
> > btrfs_do_readpage(), so I think either approach should work, and then remove
> > the remaining cases where we test for the EXTENT_UPTODATE state at
> > btrfs_do_readpage().
> >
> > Thanks.
> >
> > >
> > > Thanks.
> > >
> > > > Thanks.
> > > >
> > > > >         unlock_extent_cached_atomic(tree, processed->start, processed->end,
> > > > >                                     &cached);
> > > > >
> > > > > --
> > > > > 2.17.1
> > > > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-18  8:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-15  2:42 [PATCH] btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path ethanlien
2022-08-16  8:49 ` Filipe Manana
2022-08-17  6:30   ` Ethan Lien
2022-08-17 12:35     ` Filipe Manana
2022-08-18  7:32       ` Ethan Lien
2022-08-18  8:00         ` Ethan Lien

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.