All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Drop some mis-uses of READA
@ 2020-03-13 21:09 Josef Bacik
  2020-03-13 21:09 ` [PATCH 1/2] btrfs: do not use READA for running delayed refs Josef Bacik
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Josef Bacik @ 2020-03-13 21:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

In debugging Zygo's huge commit delays I noticed we were burning a bunch of time
doing READA in cases where we don't need to.  The way READA works in btrfs is
we'll load up adjacent nodes and leaves as we walk down.  This is useful for
operations where we're going to be reading sequentially across the tree.

But for delayed refs we're looking up one bytenr, and then another one which
could be elsewhere in the tree.  With large enough extent trees this results in
a lot of unneeded latency.

The same applies to build_backref_tree, but that's even worse because we're
looking up backrefs, which are essentially randomly spread out across the extent
root.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] btrfs: do not use READA for running delayed refs
  2020-03-13 21:09 [PATCH 0/2] Drop some mis-uses of READA Josef Bacik
@ 2020-03-13 21:09 ` Josef Bacik
  2020-03-14  0:18   ` Qu Wenruo
  2020-03-13 21:09 ` [PATCH 2/2] btrfs: do not READA in build_backref_tree Josef Bacik
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Josef Bacik @ 2020-03-13 21:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

READA will generate a lot of extra reads for adjacent nodes, but when
running delayed refs we have no idea if the next ref is going to be
adjacent or not, so this potentially just generates a lot of extra IO.
To make matters worse each ref is truly just looking for one item, it
doesn't generally search forward, so we simply don't need it here.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a24ef1cef9fa..8e5b49baad98 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1469,7 +1469,6 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 	if (!path)
 		return -ENOMEM;
 
-	path->reada = READA_FORWARD;
 	path->leave_spinning = 1;
 	/* this will setup the path even if it fails to insert the back ref */
 	ret = insert_inline_extent_backref(trans, path, bytenr, num_bytes,
@@ -1494,7 +1493,6 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
-	path->reada = READA_FORWARD;
 	path->leave_spinning = 1;
 	/* now insert the actual backref */
 	ret = insert_extent_backref(trans, path, bytenr, parent, root_objectid,
@@ -1604,7 +1602,6 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
 	}
 
 again:
-	path->reada = READA_FORWARD;
 	path->leave_spinning = 1;
 	ret = btrfs_search_slot(trans, fs_info->extent_root, &key, path, 0, 1);
 	if (ret < 0) {
@@ -2999,7 +2996,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	if (!path)
 		return -ENOMEM;
 
-	path->reada = READA_FORWARD;
 	path->leave_spinning = 1;
 
 	is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] btrfs: do not READA in build_backref_tree
  2020-03-13 21:09 [PATCH 0/2] Drop some mis-uses of READA Josef Bacik
  2020-03-13 21:09 ` [PATCH 1/2] btrfs: do not use READA for running delayed refs Josef Bacik
@ 2020-03-13 21:09 ` Josef Bacik
  2020-03-14  0:19   ` Qu Wenruo
  2020-03-14  2:56 ` [PATCH 0/2] Drop some mis-uses of READA Qu Wenruo
  2020-03-18 14:40 ` David Sterba
  3 siblings, 1 reply; 8+ messages in thread
From: Josef Bacik @ 2020-03-13 21:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Here we are just searching down to the bytenr we're building the backref
tree for, and all of it's paths to the roots.  These bytenrs are not
guaranteed to be anywhere near each other, so the READA just generates
extra latency.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 4fb7e3cc2aca..3ccc126d0df3 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -759,8 +759,6 @@ struct backref_node *build_backref_tree(struct reloc_control *rc,
 		err = -ENOMEM;
 		goto out;
 	}
-	path1->reada = READA_FORWARD;
-	path2->reada = READA_FORWARD;
 
 	node = alloc_backref_node(cache);
 	if (!node) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] btrfs: do not use READA for running delayed refs
  2020-03-13 21:09 ` [PATCH 1/2] btrfs: do not use READA for running delayed refs Josef Bacik
@ 2020-03-14  0:18   ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2020-03-14  0:18 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team


[-- Attachment #1.1: Type: text/plain, Size: 2019 bytes --]



On 2020/3/14 上午5:09, Josef Bacik wrote:
> READA will generate a lot of extra reads for adjacent nodes, but when
> running delayed refs we have no idea if the next ref is going to be
> adjacent or not, so this potentially just generates a lot of extra IO.
> To make matters worse each ref is truly just looking for one item, it
> doesn't generally search forward, so we simply don't need it here.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>  fs/btrfs/extent-tree.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a24ef1cef9fa..8e5b49baad98 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1469,7 +1469,6 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  	if (!path)
>  		return -ENOMEM;
>  
> -	path->reada = READA_FORWARD;
>  	path->leave_spinning = 1;
>  	/* this will setup the path even if it fails to insert the back ref */
>  	ret = insert_inline_extent_backref(trans, path, bytenr, num_bytes,
> @@ -1494,7 +1493,6 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  	btrfs_mark_buffer_dirty(leaf);
>  	btrfs_release_path(path);
>  
> -	path->reada = READA_FORWARD;
>  	path->leave_spinning = 1;
>  	/* now insert the actual backref */
>  	ret = insert_extent_backref(trans, path, bytenr, parent, root_objectid,
> @@ -1604,7 +1602,6 @@ static int run_delayed_extent_op(struct btrfs_trans_handle *trans,
>  	}
>  
>  again:
> -	path->reada = READA_FORWARD;
>  	path->leave_spinning = 1;
>  	ret = btrfs_search_slot(trans, fs_info->extent_root, &key, path, 0, 1);
>  	if (ret < 0) {
> @@ -2999,7 +2996,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  	if (!path)
>  		return -ENOMEM;
>  
> -	path->reada = READA_FORWARD;
>  	path->leave_spinning = 1;
>  
>  	is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] btrfs: do not READA in build_backref_tree
  2020-03-13 21:09 ` [PATCH 2/2] btrfs: do not READA in build_backref_tree Josef Bacik
@ 2020-03-14  0:19   ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2020-03-14  0:19 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team


[-- Attachment #1.1: Type: text/plain, Size: 919 bytes --]



On 2020/3/14 上午5:09, Josef Bacik wrote:
> Here we are just searching down to the bytenr we're building the backref
> tree for, and all of it's paths to the roots.  These bytenrs are not
> guaranteed to be anywhere near each other, so the READA just generates
> extra latency.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>  fs/btrfs/relocation.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 4fb7e3cc2aca..3ccc126d0df3 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -759,8 +759,6 @@ struct backref_node *build_backref_tree(struct reloc_control *rc,
>  		err = -ENOMEM;
>  		goto out;
>  	}
> -	path1->reada = READA_FORWARD;
> -	path2->reada = READA_FORWARD;
>  
>  	node = alloc_backref_node(cache);
>  	if (!node) {
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Drop some mis-uses of READA
  2020-03-13 21:09 [PATCH 0/2] Drop some mis-uses of READA Josef Bacik
  2020-03-13 21:09 ` [PATCH 1/2] btrfs: do not use READA for running delayed refs Josef Bacik
  2020-03-13 21:09 ` [PATCH 2/2] btrfs: do not READA in build_backref_tree Josef Bacik
@ 2020-03-14  2:56 ` Qu Wenruo
  2020-03-18 14:44   ` David Sterba
  2020-03-18 14:40 ` David Sterba
  3 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2020-03-14  2:56 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team


[-- Attachment #1.1: Type: text/plain, Size: 1091 bytes --]



On 2020/3/14 上午5:09, Josef Bacik wrote:
> In debugging Zygo's huge commit delays I noticed we were burning a bunch of time
> doing READA in cases where we don't need to.  The way READA works in btrfs is
> we'll load up adjacent nodes and leaves as we walk down.  This is useful for
> operations where we're going to be reading sequentially across the tree.
> 
> But for delayed refs we're looking up one bytenr, and then another one which
> could be elsewhere in the tree.  With large enough extent trees this results in
> a lot of unneeded latency.
> 
> The same applies to build_backref_tree, but that's even worse because we're
> looking up backrefs, which are essentially randomly spread out across the extent
> root.  Thanks,

There are quite some other locations abusing READA.

E.g. btrfs_read_block_groups(), where we're just searching for block
group items. There is no guarantee that next block group item is in next
a few leaves.

I guess it's a good time to review all READA abuse. Or would you mind me
to do that?

Thanks,
Qu
> 
> Josef
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Drop some mis-uses of READA
  2020-03-13 21:09 [PATCH 0/2] Drop some mis-uses of READA Josef Bacik
                   ` (2 preceding siblings ...)
  2020-03-14  2:56 ` [PATCH 0/2] Drop some mis-uses of READA Qu Wenruo
@ 2020-03-18 14:40 ` David Sterba
  3 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2020-03-18 14:40 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Mar 13, 2020 at 05:09:52PM -0400, Josef Bacik wrote:
> In debugging Zygo's huge commit delays I noticed we were burning a bunch of time
> doing READA in cases where we don't need to.  The way READA works in btrfs is
> we'll load up adjacent nodes and leaves as we walk down.  This is useful for
> operations where we're going to be reading sequentially across the tree.
> 
> But for delayed refs we're looking up one bytenr, and then another one which
> could be elsewhere in the tree.  With large enough extent trees this results in
> a lot of unneeded latency.
> 
> The same applies to build_backref_tree, but that's even worse because we're
> looking up backrefs, which are essentially randomly spread out across the extent
> root.  Thanks,

Makes sense, I'll add it to misc-next. Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Drop some mis-uses of READA
  2020-03-14  2:56 ` [PATCH 0/2] Drop some mis-uses of READA Qu Wenruo
@ 2020-03-18 14:44   ` David Sterba
  0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2020-03-18 14:44 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Sat, Mar 14, 2020 at 10:56:06AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/3/14 上午5:09, Josef Bacik wrote:
> > In debugging Zygo's huge commit delays I noticed we were burning a bunch of time
> > doing READA in cases where we don't need to.  The way READA works in btrfs is
> > we'll load up adjacent nodes and leaves as we walk down.  This is useful for
> > operations where we're going to be reading sequentially across the tree.
> > 
> > But for delayed refs we're looking up one bytenr, and then another one which
> > could be elsewhere in the tree.  With large enough extent trees this results in
> > a lot of unneeded latency.
> > 
> > The same applies to build_backref_tree, but that's even worse because we're
> > looking up backrefs, which are essentially randomly spread out across the extent
> > root.  Thanks,
> 
> There are quite some other locations abusing READA.
> 
> E.g. btrfs_read_block_groups(), where we're just searching for block
> group items. There is no guarantee that next block group item is in next
> a few leaves.
> 
> I guess it's a good time to review all READA abuse. Or would you mind me
> to do that?

If you find some clear example where the items are scattered over the
tree then yes. For the rest it would be good to put a comment that the
readahead really helps. Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-03-18 14:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13 21:09 [PATCH 0/2] Drop some mis-uses of READA Josef Bacik
2020-03-13 21:09 ` [PATCH 1/2] btrfs: do not use READA for running delayed refs Josef Bacik
2020-03-14  0:18   ` Qu Wenruo
2020-03-13 21:09 ` [PATCH 2/2] btrfs: do not READA in build_backref_tree Josef Bacik
2020-03-14  0:19   ` Qu Wenruo
2020-03-14  2:56 ` [PATCH 0/2] Drop some mis-uses of READA Qu Wenruo
2020-03-18 14:44   ` David Sterba
2020-03-18 14:40 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.