All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: Simplify join_running_log_trans
@ 2019-05-20  8:11 Nikolay Borisov
  2019-05-20  8:18 ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2019-05-20  8:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana, Nikolay Borisov

This patch removes stray smp_mb before root->log_root from join_running_log_trans
as well as the unlocked check for root->log_root. log_root is only set in
btrfs_add_log_tree, called from start_log_trans under root->log_mutex.
Furthermore, log_root is freed in btrfs_free_log, called from commit_fs_root,
which in turn is called from transaction's critical section (TRANS_COMMIT_DOING).
Those 2 locking invariants ensure join_running_log_trans don't see invalid
values of ->log_root.

Additionally this results in around 26% improvement when deleting 500k files/dir.
All values are in seconds. 

	With Patch (real)	With patch (sys)	Without patch (real)	Without patch (sys)
	    80					78						91						90
		63					62						93						91
		65					64						92						90
		67					65						93						90
		75					73						90						88
		75					73						91						89
		75					73						93						90
		74					73						89						87
		76					74						91						89
stddev	5.76146200581454	5.45690184791497	1.42400062421959	1.22474487139159
mean	72.2222222222222	70.5555555555556	91.4444444444444	89.3333333333333
median  75					73						91						90

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---

This passed full xfstest run and the performance results were obtained with the 
following testcase: 

#!/bin/bash
for i in {1..10}; do
    echo "Testun run : %i"
    ./ltp/fsstress -z -d /media/scratch/ -f creat=100 -n 500000
    sync
    time rm -rf /media/scratch/*
    sync
done

 fs/btrfs/tree-log.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6adcd8a2c5c7..61744d8af106 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -188,10 +188,6 @@ static int join_running_log_trans(struct btrfs_root *root)
 {
 	int ret = -ENOENT;
 
-	smp_mb();
-	if (!root->log_root)
-		return -ENOENT;
-
 	mutex_lock(&root->log_mutex);
 	if (root->log_root) {
 		ret = 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: Simplify join_running_log_trans
  2019-05-20  8:11 [PATCH] btrfs: Simplify join_running_log_trans Nikolay Borisov
@ 2019-05-20  8:18 ` Nikolay Borisov
  2019-05-20  9:10   ` Filipe Manana
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2019-05-20  8:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana



On 20.05.19 г. 11:11 ч., Nikolay Borisov wrote:
> This patch removes stray smp_mb before root->log_root from join_running_log_trans
> as well as the unlocked check for root->log_root. log_root is only set in
> btrfs_add_log_tree, called from start_log_trans under root->log_mutex.
> Furthermore, log_root is freed in btrfs_free_log, called from commit_fs_root,
> which in turn is called from transaction's critical section (TRANS_COMMIT_DOING).
> Those 2 locking invariants ensure join_running_log_trans don't see invalid
> values of ->log_root.
> 
> Additionally this results in around 26% improvement when deleting 500k files/dir.
> All values are in seconds. 
> 
> 	With Patch (real)	With patch (sys)	Without patch (real)	Without patch (sys)
> 	    80					78						91						90
> 		63					62						93						91
> 		65					64						92						90
> 		67					65						93						90
> 		75					73						90						88
> 		75					73						91						89
> 		75					73						93						90
> 		74					73						89						87
> 		76					74						91						89
> stddev	5.76146200581454	5.45690184791497	1.42400062421959	1.22474487139159
> mean	72.2222222222222	70.5555555555556	91.4444444444444	89.3333333333333
> median  75					73						91						90
>                                                                                                                                    
    With Patch (real)   With patch (sys)    Without patch (real)    Without patch (sys)
        80                   78                     91                      90  
        63                   62                     93                      91  
        65                   64                     92                      90  
        67                   65                     93                      90  
        75                   73                     90                      88  
        75                   73                     91                      89  
        75                   73                     93                      90  
        74                   73                     89                      87  
        76                   74                     91                      89  
stddev  5.76146200581454    5.45690184791497    1.42400062421959    1.22474487139159
mean    72.2222222222222    70.5555555555556    91.4444444444444    89.3333333333333
median  75                   73                     91                      90


Here's the perf data without being butchered.

> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
> 
> This passed full xfstest run and the performance results were obtained with the 
> following testcase: 
> 
> #!/bin/bash
> for i in {1..10}; do
>     echo "Testun run : %i"
>     ./ltp/fsstress -z -d /media/scratch/ -f creat=100 -n 500000
>     sync
>     time rm -rf /media/scratch/*
>     sync
> done
> 
>  fs/btrfs/tree-log.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 6adcd8a2c5c7..61744d8af106 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -188,10 +188,6 @@ static int join_running_log_trans(struct btrfs_root *root)
>  {
>  	int ret = -ENOENT;
>  
> -	smp_mb();
> -	if (!root->log_root)
> -		return -ENOENT;
> -
>  	mutex_lock(&root->log_mutex);
>  	if (root->log_root) {
>  		ret = 0;
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: Simplify join_running_log_trans
  2019-05-20  8:18 ` Nikolay Borisov
@ 2019-05-20  9:10   ` Filipe Manana
  2019-05-20  9:13     ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: Filipe Manana @ 2019-05-20  9:10 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, Filipe David Borba Manana

On Mon, May 20, 2019 at 9:23 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 20.05.19 г. 11:11 ч., Nikolay Borisov wrote:
> > This patch removes stray smp_mb before root->log_root from join_running_log_trans
> > as well as the unlocked check for root->log_root. log_root is only set in
> > btrfs_add_log_tree, called from start_log_trans under root->log_mutex.
> > Furthermore, log_root is freed in btrfs_free_log, called from commit_fs_root,
> > which in turn is called from transaction's critical section (TRANS_COMMIT_DOING).
> > Those 2 locking invariants ensure join_running_log_trans don't see invalid
> > values of ->log_root.
> >
> > Additionally this results in around 26% improvement when deleting 500k files/dir.
> > All values are in seconds.
> >
> >       With Patch (real)       With patch (sys)        Without patch (real)    Without patch (sys)
> >           80                                  78                                              91                                              90
> >               63                                      62                                              93                                              91
> >               65                                      64                                              92                                              90
> >               67                                      65                                              93                                              90
> >               75                                      73                                              90                                              88
> >               75                                      73                                              91                                              89
> >               75                                      73                                              93                                              90
> >               74                                      73                                              89                                              87
> >               76                                      74                                              91                                              89
> > stddev        5.76146200581454        5.45690184791497        1.42400062421959        1.22474487139159
> > mean  72.2222222222222        70.5555555555556        91.4444444444444        89.3333333333333
> > median  75                                    73                                              91                                              90
> >
>     With Patch (real)   With patch (sys)    Without patch (real)    Without patch (sys)
>         80                   78                     91                      90
>         63                   62                     93                      91
>         65                   64                     92                      90
>         67                   65                     93                      90
>         75                   73                     90                      88
>         75                   73                     91                      89
>         75                   73                     93                      90
>         74                   73                     89                      87
>         76                   74                     91                      89
> stddev  5.76146200581454    5.45690184791497    1.42400062421959    1.22474487139159
> mean    72.2222222222222    70.5555555555556    91.4444444444444    89.3333333333333
> median  75                   73                     91                      90

Great.

How was that test done?
Simply deleting the files with nothing else running in parallel?

How does it behave if while the files are being deleted,  there are
concurrent fsyncs on other files of the same subvolume, that is, while
the mutex is held?

Because that check without holding the mutex, is likely there for
performance reasons.

Thanks.


>
>
> Here's the perf data without being butchered.
>
> > Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> > ---
> >
> > This passed full xfstest run and the performance results were obtained with the
> > following testcase:
> >
> > #!/bin/bash
> > for i in {1..10}; do
> >     echo "Testun run : %i"
> >     ./ltp/fsstress -z -d /media/scratch/ -f creat=100 -n 500000
> >     sync
> >     time rm -rf /media/scratch/*
> >     sync
> > done
> >
> >  fs/btrfs/tree-log.c | 4 ----
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 6adcd8a2c5c7..61744d8af106 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -188,10 +188,6 @@ static int join_running_log_trans(struct btrfs_root *root)
> >  {
> >       int ret = -ENOENT;
> >
> > -     smp_mb();
> > -     if (!root->log_root)
> > -             return -ENOENT;
> > -
> >       mutex_lock(&root->log_mutex);
> >       if (root->log_root) {
> >               ret = 0;
> >



--
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: Simplify join_running_log_trans
  2019-05-20  9:10   ` Filipe Manana
@ 2019-05-20  9:13     ` Nikolay Borisov
  2019-05-20 14:42       ` Filipe Manana
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2019-05-20  9:13 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, Filipe David Borba Manana



On 20.05.19 г. 12:10 ч., Filipe Manana wrote:
> On Mon, May 20, 2019 at 9:23 AM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>>
>>
>> On 20.05.19 г. 11:11 ч., Nikolay Borisov wrote:
>>> This patch removes stray smp_mb before root->log_root from join_running_log_trans
>>> as well as the unlocked check for root->log_root. log_root is only set in
>>> btrfs_add_log_tree, called from start_log_trans under root->log_mutex.
>>> Furthermore, log_root is freed in btrfs_free_log, called from commit_fs_root,
>>> which in turn is called from transaction's critical section (TRANS_COMMIT_DOING).
>>> Those 2 locking invariants ensure join_running_log_trans don't see invalid
>>> values of ->log_root.
>>>
>>> Additionally this results in around 26% improvement when deleting 500k files/dir.
>>> All values are in seconds.
>>>
>>>       With Patch (real)       With patch (sys)        Without patch (real)    Without patch (sys)
>>>           80                                  78                                              91                                              90
>>>               63                                      62                                              93                                              91
>>>               65                                      64                                              92                                              90
>>>               67                                      65                                              93                                              90
>>>               75                                      73                                              90                                              88
>>>               75                                      73                                              91                                              89
>>>               75                                      73                                              93                                              90
>>>               74                                      73                                              89                                              87
>>>               76                                      74                                              91                                              89
>>> stddev        5.76146200581454        5.45690184791497        1.42400062421959        1.22474487139159
>>> mean  72.2222222222222        70.5555555555556        91.4444444444444        89.3333333333333
>>> median  75                                    73                                              91                                              90
>>>
>>     With Patch (real)   With patch (sys)    Without patch (real)    Without patch (sys)
>>         80                   78                     91                      90
>>         63                   62                     93                      91
>>         65                   64                     92                      90
>>         67                   65                     93                      90
>>         75                   73                     90                      88
>>         75                   73                     91                      89
>>         75                   73                     93                      90
>>         74                   73                     89                      87
>>         76                   74                     91                      89
>> stddev  5.76146200581454    5.45690184791497    1.42400062421959    1.22474487139159
>> mean    72.2222222222222    70.5555555555556    91.4444444444444    89.3333333333333
>> median  75                   73                     91                      90
> 
> Great.
> 
> How was that test done?
> Simply deleting the files with nothing else running in parallel?

Yes

> 
> How does it behave if while the files are being deleted,  there are
> concurrent fsyncs on other files of the same subvolume, that is, while
> the mutex is held?
> 
> Because that check without holding the mutex, is likely there for
> performance reasons.

So I will repeat the test when there are concurrent fsyncs running and
also with no concurrent fsyncs running but just removing the smp_mb, I
think the performance gain should be from removing the smp_mb and not
necessarily the check. In the worst case I can leave the check intact
and just remove the smp_mb because it doesn't add any value
correctness-wise.

> 
> Thanks.
> 
> 
>>
>>
>> Here's the perf data without being butchered.
>>
>>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>>> ---
>>>
>>> This passed full xfstest run and the performance results were obtained with the
>>> following testcase:
>>>
>>> #!/bin/bash
>>> for i in {1..10}; do
>>>     echo "Testun run : %i"
>>>     ./ltp/fsstress -z -d /media/scratch/ -f creat=100 -n 500000
>>>     sync
>>>     time rm -rf /media/scratch/*
>>>     sync
>>> done
>>>
>>>  fs/btrfs/tree-log.c | 4 ----
>>>  1 file changed, 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
>>> index 6adcd8a2c5c7..61744d8af106 100644
>>> --- a/fs/btrfs/tree-log.c
>>> +++ b/fs/btrfs/tree-log.c
>>> @@ -188,10 +188,6 @@ static int join_running_log_trans(struct btrfs_root *root)
>>>  {
>>>       int ret = -ENOENT;
>>>
>>> -     smp_mb();
>>> -     if (!root->log_root)
>>> -             return -ENOENT;
>>> -
>>>       mutex_lock(&root->log_mutex);
>>>       if (root->log_root) {
>>>               ret = 0;
>>>
> 
> 
> 
> --
> Filipe David Manana,
> 
> “Whether you think you can, or you think you can't — you're right.”
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: Simplify join_running_log_trans
  2019-05-20  9:13     ` Nikolay Borisov
@ 2019-05-20 14:42       ` Filipe Manana
  0 siblings, 0 replies; 5+ messages in thread
From: Filipe Manana @ 2019-05-20 14:42 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, Filipe David Borba Manana

On Mon, May 20, 2019 at 10:13 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 20.05.19 г. 12:10 ч., Filipe Manana wrote:
> > On Mon, May 20, 2019 at 9:23 AM Nikolay Borisov <nborisov@suse.com> wrote:
> >>
> >>
> >>
> >> On 20.05.19 г. 11:11 ч., Nikolay Borisov wrote:
> >>> This patch removes stray smp_mb before root->log_root from join_running_log_trans
> >>> as well as the unlocked check for root->log_root. log_root is only set in
> >>> btrfs_add_log_tree, called from start_log_trans under root->log_mutex.
> >>> Furthermore, log_root is freed in btrfs_free_log, called from commit_fs_root,
> >>> which in turn is called from transaction's critical section (TRANS_COMMIT_DOING).
> >>> Those 2 locking invariants ensure join_running_log_trans don't see invalid
> >>> values of ->log_root.
> >>>
> >>> Additionally this results in around 26% improvement when deleting 500k files/dir.
> >>> All values are in seconds.
> >>>
> >>>       With Patch (real)       With patch (sys)        Without patch (real)    Without patch (sys)
> >>>           80                                  78                                              91                                              90
> >>>               63                                      62                                              93                                              91
> >>>               65                                      64                                              92                                              90
> >>>               67                                      65                                              93                                              90
> >>>               75                                      73                                              90                                              88
> >>>               75                                      73                                              91                                              89
> >>>               75                                      73                                              93                                              90
> >>>               74                                      73                                              89                                              87
> >>>               76                                      74                                              91                                              89
> >>> stddev        5.76146200581454        5.45690184791497        1.42400062421959        1.22474487139159
> >>> mean  72.2222222222222        70.5555555555556        91.4444444444444        89.3333333333333
> >>> median  75                                    73                                              91                                              90
> >>>
> >>     With Patch (real)   With patch (sys)    Without patch (real)    Without patch (sys)
> >>         80                   78                     91                      90
> >>         63                   62                     93                      91
> >>         65                   64                     92                      90
> >>         67                   65                     93                      90
> >>         75                   73                     90                      88
> >>         75                   73                     91                      89
> >>         75                   73                     93                      90
> >>         74                   73                     89                      87
> >>         76                   74                     91                      89
> >> stddev  5.76146200581454    5.45690184791497    1.42400062421959    1.22474487139159
> >> mean    72.2222222222222    70.5555555555556    91.4444444444444    89.3333333333333
> >> median  75                   73                     91                      90
> >
> > Great.
> >
> > How was that test done?
> > Simply deleting the files with nothing else running in parallel?
>
> Yes
>
> >
> > How does it behave if while the files are being deleted,  there are
> > concurrent fsyncs on other files of the same subvolume, that is, while
> > the mutex is held?
> >
> > Because that check without holding the mutex, is likely there for
> > performance reasons.
>
> So I will repeat the test when there are concurrent fsyncs running and
> also with no concurrent fsyncs running but just removing the smp_mb, I
> think the performance gain should be from removing the smp_mb and not
> necessarily the check. In the worst case I can leave the check intact
> and just remove the smp_mb because it doesn't add any value
> correctness-wise.

So the barrier is likely there to make sure we see non-null log_root
after it was set, concurrently by someone calling start_log_trans().
It pairs, implicitly, with the mutex_unlock there, at
start_log_trans(), since that implies a memory barrier.

With your patch, we always end up doing a mutex_lock(), which also
implies a memory barrier.

Sorry I missed this before, but your test doesn't make any sense. How
can a memory barrier have such a big impact on a code path (unlink)
that does lot of much heavier stuff? Like btree searches/deletes,
deleting  delayed items, etc.
Your test case is even flawed because join_running_log_trans() can
never be called.

Look at its only two callers, btrfs_del_inode_ref_in_log() and
btrfs_del_dir_entries_in_log(), they both do:

if (dir->logged_trans < trans->transid)
return 0;

ret = join_running_log_trans(root);
if (ret)
return 0;

and

if (inode->logged_trans < trans->transid)
return 0;

ret = join_running_log_trans(root);
if (ret)
return 0;

Since you never fsync the files nor the directory containing them in
that test, we never end up calling join_running_log_trans().

So I don't know where you got the 26%...

Thanks.

>
> >
> > Thanks.
> >
> >
> >>
> >>
> >> Here's the perf data without being butchered.
> >>
> >>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> >>> ---
> >>>
> >>> This passed full xfstest run and the performance results were obtained with the
> >>> following testcase:
> >>>
> >>> #!/bin/bash
> >>> for i in {1..10}; do
> >>>     echo "Testun run : %i"
> >>>     ./ltp/fsstress -z -d /media/scratch/ -f creat=100 -n 500000
> >>>     sync
> >>>     time rm -rf /media/scratch/*
> >>>     sync
> >>> done
> >>>
> >>>  fs/btrfs/tree-log.c | 4 ----
> >>>  1 file changed, 4 deletions(-)
> >>>
> >>> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> >>> index 6adcd8a2c5c7..61744d8af106 100644
> >>> --- a/fs/btrfs/tree-log.c
> >>> +++ b/fs/btrfs/tree-log.c
> >>> @@ -188,10 +188,6 @@ static int join_running_log_trans(struct btrfs_root *root)
> >>>  {
> >>>       int ret = -ENOENT;
> >>>
> >>> -     smp_mb();
> >>> -     if (!root->log_root)
> >>> -             return -ENOENT;
> >>> -
> >>>       mutex_lock(&root->log_mutex);
> >>>       if (root->log_root) {
> >>>               ret = 0;
> >>>
> >
> >
> >
> > --
> > Filipe David Manana,
> >
> > “Whether you think you can, or you think you can't — you're right.”
> >



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-05-20 14:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-20  8:11 [PATCH] btrfs: Simplify join_running_log_trans Nikolay Borisov
2019-05-20  8:18 ` Nikolay Borisov
2019-05-20  9:10   ` Filipe Manana
2019-05-20  9:13     ` Nikolay Borisov
2019-05-20 14:42       ` Filipe Manana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.