linux-kernel.vger.kernel.org archive mirror
* [PATCH] btrfs: optimize barrier usage for Rmw atomics
@ 2020-01-29 18:03 Davidlohr Bueso
  2020-01-29 19:07 ` Nikolay Borisov
  2020-01-29 19:14 ` David Sterba
  0 siblings, 2 replies; 6+ messages in thread
From: Davidlohr Bueso @ 2020-01-29 18:03 UTC (permalink / raw)
  To: dsterba; +Cc: nborisov, linux-btrfs, linux-kernel, dave, Davidlohr Bueso

Use smp_mb__after_atomic() instead of smp_mb() and avoid the
unnecessary barrier on non-LL/SC architectures, such as x86.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/btrfs/btrfs_inode.h | 2 +-
 fs/btrfs/file.c        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e12a477d32e..54e0d2ae22cc 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -325,7 +325,7 @@ struct btrfs_dio_private {
 static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode)
 {
 	set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-	smp_mb();
+	smp_mb__after_atomic();
 }
 
 static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a16da274c9aa..ea79ab068079 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2143,7 +2143,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	}
 	atomic_inc(&root->log_batch);
 
-	smp_mb();
+	smp_mb__after_atomic();
 	if (btrfs_inode_in_log(BTRFS_I(inode), fs_info->generation) ||
 	    BTRFS_I(inode)->last_trans <= fs_info->last_trans_committed) {
 		/*
-- 
2.16.4



* Re: [PATCH] btrfs: optimize barrier usage for Rmw atomics
  2020-01-29 18:03 [PATCH] btrfs: optimize barrier usage for Rmw atomics Davidlohr Bueso
@ 2020-01-29 19:07 ` Nikolay Borisov
  2020-01-29 19:14 ` David Sterba
  1 sibling, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2020-01-29 19:07 UTC (permalink / raw)
  To: Davidlohr Bueso, dsterba; +Cc: linux-btrfs, linux-kernel, Davidlohr Bueso


On 29.01.20 20:03, Davidlohr Bueso wrote:
> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
> unnecessary barrier for non LL/SC architectures, such as x86.
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>


While we're on the topic, I've been sitting on the following local
patch for about a year; care to review the barriers?





From e659e5db649be01aec20515aef8ca48143e10c0b Mon Sep 17 00:00:00 2001
From: Nikolay Borisov <nborisov@suse.com>
Date: Wed, 7 Mar 2018 17:19:12 +0200
Subject: [PATCH] btrfs: Fix memory ordering of unlocked dio reads vs truncate

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/btrfs_inode.h | 17 -----------------
 fs/btrfs/inode.c       | 41 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e12a477d32e..e84f58cca02e 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -317,23 +317,6 @@ struct btrfs_dio_private {
 			blk_status_t);
 };
 
-/*
- * Disable DIO read nolock optimization, so new dio readers will be forced
- * to grab i_mutex. It is used to avoid the endless truncate due to
- * nonlocked dio read.
- */
-static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode)
-{
-	set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-	smp_mb();
-}
-
-static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode)
-{
-	smp_mb__before_atomic();
-	clear_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-}
-
 /* Array of bytes with variable length, hexadecimal format 0x1234 */
 #define CSUM_FMT				"0x%*phN"
 #define CSUM_FMT_VALUE(size, bytes)		size, bytes
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6d2bb58d277a..d64600268c3a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4626,10 +4626,29 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
 		truncate_setsize(inode, newsize);
 
-		/* Disable nonlocked read DIO to avoid the endless truncate */
-		btrfs_inode_block_unlocked_dio(BTRFS_I(inode));
+		/*
+		 * This code is very subtle. It is essentially a lock of its
+		 * own type. BTRFS allows multiple DIO readers to race with
+		 * writers so long as they don't read beyond EOF of an inode.
+		 * However, if we have a pending truncate we'd like to signal
+		 * DIO readers they should fall back to DIO_LOCKING semantics.
+		 * This ensures that multiple aggressive DIO readers cannot
+		 * starve the truncating thread.
+		 *
+		 * These semantics are achieved by the use of the below flag. If
+		 * new readers come after the flag has been cleared then the
+		 * state is still consistent, since the RELEASE semantics of
+		 * clear_bit_unlock ensure the truncate inode size will be
+		 * visible and DIO readers will bail out.
+		 *
+		 * The memory barrier implied by inode_dio_wait() is paired with
+		 * smp_mb__before_atomic() in btrfs_direct_IO.
+		 */
+		set_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+			&BTRFS_I(inode)->runtime_flags);
 		inode_dio_wait(inode);
-		btrfs_inode_resume_unlocked_dio(BTRFS_I(inode));
+		clear_bit_unlock(BTRFS_INODE_READDIO_NEED_LOCK,
+				 &BTRFS_I(inode)->runtime_flags);
 
 		ret = btrfs_truncate(inode, newsize == oldsize);
 		if (ret && inode->i_nlink) {
@@ -8070,11 +8089,19 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		dio_data.unsubmitted_oe_range_end = (u64)offset;
 		current->journal_info = &dio_data;
 		down_read(&BTRFS_I(inode)->dio_sem);
-	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+	} else {
+		/*
+		 * This barrier is paired with the implied barrier in
+		 * inode_dio_wait. It ensures that READDIO_NEED_LOCK is
+		 * visible if we have a pending truncate.
+		 */
+		smp_mb__before_atomic();
+		if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 				     &BTRFS_I(inode)->runtime_flags)) {
-		inode_dio_end(inode);
-		flags = DIO_LOCKING | DIO_SKIP_HOLES;
-		wakeup = false;
+			inode_dio_end(inode);
+			flags = DIO_LOCKING | DIO_SKIP_HOLES;
+			wakeup = false;
+		}
 	}
 
 	ret = __blockdev_direct_IO(iocb, inode,
-- 
2.17.1



* Re: [PATCH] btrfs: optimize barrier usage for Rmw atomics
  2020-01-29 18:03 [PATCH] btrfs: optimize barrier usage for Rmw atomics Davidlohr Bueso
  2020-01-29 19:07 ` Nikolay Borisov
@ 2020-01-29 19:14 ` David Sterba
  2020-01-29 19:25   ` Davidlohr Bueso
  1 sibling, 1 reply; 6+ messages in thread
From: David Sterba @ 2020-01-29 19:14 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: dsterba, nborisov, linux-btrfs, linux-kernel, Davidlohr Bueso

On Wed, Jan 29, 2020 at 10:03:24AM -0800, Davidlohr Bueso wrote:
> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
> unnecessary barrier for non LL/SC architectures, such as x86.

So that's conflicting advice from what we got when discussing which
barriers to use in 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d, and the
memory is still fresh. My first idea was to take the
smp_mb__after_atomic and __before_atomic variants, but after discussion
with various people the plain smp_wmb/smp_rmb were suggested and used in
the end.

I can dig the email threads and excerpts from irc conversations, maybe
Nik has them at hand too. We do want to get rid of all unnecessary and
uncommented barriers in btrfs code, so I appreciate your patch.

> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  fs/btrfs/btrfs_inode.h | 2 +-
>  fs/btrfs/file.c        | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 4e12a477d32e..54e0d2ae22cc 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -325,7 +325,7 @@ struct btrfs_dio_private {
>  static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode)
>  {
>  	set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
> -	smp_mb();
> +	smp_mb__after_atomic();

In this case I think we should use the smp_wmb/smp_rmb pattern rather
than the full barrier.

>  }
>  
>  static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode)
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a16da274c9aa..ea79ab068079 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2143,7 +2143,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  	}
>  	atomic_inc(&root->log_batch);
>  
> -	smp_mb();
> +	smp_mb__after_atomic();

That's the problem with uncommented barriers: it's not clear what
they are related to. In this case it's not the atomic_inc above that
would justify __after_atomic. The patch that added it is years old, so
any change to that barrier would require deeper analysis.

>  	if (btrfs_inode_in_log(BTRFS_I(inode), fs_info->generation) ||
>  	    BTRFS_I(inode)->last_trans <= fs_info->last_trans_committed) {
>  		/*


* Re: [PATCH] btrfs: optimize barrier usage for Rmw atomics
  2020-01-29 19:14 ` David Sterba
@ 2020-01-29 19:25   ` Davidlohr Bueso
  2020-01-29 23:55     ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: Davidlohr Bueso @ 2020-01-29 19:25 UTC (permalink / raw)
  To: dsterba, dsterba, nborisov, linux-btrfs, linux-kernel, Davidlohr Bueso

On Wed, 29 Jan 2020, David Sterba wrote:

>On Wed, Jan 29, 2020 at 10:03:24AM -0800, Davidlohr Bueso wrote:
>> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
>> unnecessary barrier for non LL/SC architectures, such as x86.
>
>So that's conflicting advice from what we got when discussing which
>barriers to use in 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d, and the
>memory is still fresh. My first idea was to take the
>smp_mb__after_atomic and __before_atomic variants, but after discussion
>with various people the plain smp_wmb/smp_rmb were suggested and used in
>the end.

So the patch you mention deals with test_bit(), which is outside the scope
of smp_mb__{before,after}_atomic() as it's not an RMW operation. atomic_inc()
and set_bit() are, however, meant to use these barriers.

>
>I can dig the email threads and excerpts from irc conversations, maybe
>Nik has them at hand too. We do want to get rid of all unnecessary and
>uncommented barriers in btrfs code, so I appreciate your patch.

Yeah, I struggled with the amount of undocumented barriers, and decided
not to go down that rabbit hole. This patch is only an equivalent of
what is currently there. When possible, getting rid of barriers is of
course better.

Thanks,
Davidlohr


* Re: [PATCH] btrfs: optimize barrier usage for Rmw atomics
  2020-01-29 19:25   ` Davidlohr Bueso
@ 2020-01-29 23:55     ` Qu Wenruo
  2020-01-30  8:18       ` Nikolay Borisov
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2020-01-29 23:55 UTC (permalink / raw)
  To: Davidlohr Bueso, dsterba, dsterba, nborisov, linux-btrfs,
	linux-kernel, Davidlohr Bueso



On 2020/1/30 3:25 AM, Davidlohr Bueso wrote:
> On Wed, 29 Jan 2020, David Sterba wrote:
>
>> On Wed, Jan 29, 2020 at 10:03:24AM -0800, Davidlohr Bueso wrote:
>>> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
>>> unnecessary barrier for non LL/SC architectures, such as x86.
>>
>> So that's conflicting advice from what we got when discussing which
>> barriers to use in 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d, and the
>> memory is still fresh. My first idea was to take the
>> smp_mb__after_atomic and __before_atomic variants, but after discussion
>> with various people the plain smp_wmb/smp_rmb were suggested and used in
>> the end.
>
> So the patch you mention deals with test_bit(), which is outside the scope
> of smp_mb__{before,after}_atomic() as it's not an RMW operation.
> atomic_inc()
> and set_bit() are, however, meant to use these barriers.

Exactly!
I'm still not convinced we should use a full barrier for test_bit(), and
I see no reason to use any barrier for test_bit() itself.
A barrier is only needed between two or more memory accesses, so it
should sit between set/clear_bit() and the other operations, not around
test_bit().

>
>>
>> I can dig the email threads and excerpts from irc conversations, maybe
>> Nik has them at hand too. We do want to get rid of all unnecessary and
>> uncommented barriers in btrfs code, so I appreciate your patch.
>
> Yeah, I struggled with the amount of undocumented barriers, and decided
> not to go down that rabbit hole. This patch is only an equivalent of
> what is currently there. When possible, getting rid of barriers is of
> course better.

BTW, is there any convincing method for properly examining memory barriers?

I find it really hard to convince others, or even myself, when barriers
are involved.

Thanks,
Qu

>
> Thanks,
> Davidlohr


* Re: [PATCH] btrfs: optimize barrier usage for Rmw atomics
  2020-01-29 23:55     ` Qu Wenruo
@ 2020-01-30  8:18       ` Nikolay Borisov
  0 siblings, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2020-01-30  8:18 UTC (permalink / raw)
  To: Qu Wenruo, Davidlohr Bueso, dsterba, dsterba, linux-btrfs,
	linux-kernel, Davidlohr Bueso



On 30.01.20 1:55, Qu Wenruo wrote:
> 
> 
> On 2020/1/30 上午3:25, Davidlohr Bueso wrote:
>> On Wed, 29 Jan 2020, David Sterba wrote:
>>
>>> On Wed, Jan 29, 2020 at 10:03:24AM -0800, Davidlohr Bueso wrote:
>>>> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
>>>> unnecessary barrier for non LL/SC architectures, such as x86.
>>>
>>> So that's conflicting advice from what we got when discussing which
>>> barriers to use in 6282675e6708ec78518cc0e9ad1f1f73d7c5c53d, and the
>>> memory is still fresh. My first idea was to take the
>>> smp_mb__after_atomic and __before_atomic variants, but after discussion
>>> with various people the plain smp_wmb/smp_rmb were suggested and used in
>>> the end.
>>
>> So the patch you mention deals with test_bit(), which is outside the scope
>> of smp_mb__{before,after}_atomic() as it's not an RMW operation.
>> atomic_inc()
>> and set_bit() are, however, meant to use these barriers.
> 
> Exactly!
> I'm still not convinced we should use a full barrier for test_bit(), and
> I see no reason to use any barrier for test_bit() itself.
> A barrier is only needed between two or more memory accesses, so it
> should sit between set/clear_bit() and the other operations, not around
> test_bit().
> 
>>
>>>
>>> I can dig the email threads and excerpts from irc conversations, maybe
>>> Nik has them at hand too. We do want to get rid of all unnecessary and
>>> uncommented barriers in btrfs code, so I appreciate your patch.
>>
>> Yeah, I struggled with the amount of undocumented barriers, and decided
>> not to go down that rabbit hole. This patch is only an equivalent of
>> what is currently there. When possible, getting rid of barriers is of
>> course better.
> 
> BTW, is there any convincing method for properly examining memory barriers?
> 
> I find it really hard to convince others, or even myself, when barriers
> are involved.

Yes there is - the LKMM; you can write a litmus test. Check out
tools/memory-model in the kernel tree.
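For example, the wmb/rmb pairing discussed earlier in this thread is the classic message-passing shape. A C-litmus sketch (herd7 syntax, assuming the kernel's linux-kernel.cfg from tools/memory-model):

```
C MP+wmb+rmb

{}

P0(int *flag, int *data)
{
	WRITE_ONCE(*data, 1);
	smp_wmb();
	WRITE_ONCE(*flag, 1);
}

P1(int *flag, int *data)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*flag);
	smp_rmb();
	r1 = READ_ONCE(*data);
}

exists (1:r0=1 /\ 1:r1=0)
```

herd7 reports whether the `exists` clause (the reader sees the flag but stale data) is reachable; with both barriers it should be "Never", and dropping either one makes the stale outcome possible.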
> 
> Thanks,
> Qu
> 
>>
>> Thanks,
>> Davidlohr

