* [PATCH 2/4] fs: btrfs: fix data races in extent_write_cache_pages()
@ 2020-05-09  5:27 Jia-Ju Bai
  2020-05-12 21:56 ` David Sterba
  0 siblings, 1 reply; 3+ messages in thread
From: Jia-Ju Bai @ 2020-05-09  5:27 UTC (permalink / raw)
  To: clm, josef, dsterba; +Cc: linux-btrfs, linux-kernel, Jia-Ju Bai

The function extent_write_cache_pages is concurrently executed with
itself at runtime in the following call contexts:

Thread 1:
  btrfs_sync_file()
    start_ordered_ops()
      btrfs_fdatawrite_range()
        btrfs_writepages() [via function pointer]
          extent_writepages()
            extent_write_cache_pages()

Thread 2:
  btrfs_writepages() 
    extent_writepages()
      extent_write_cache_pages()

In extent_write_cache_pages():
  index = mapping->writeback_index;
  ...
  mapping->writeback_index = done_index;

The accesses to mapping->writeback_index are not synchronized, and thus
data races for this value can occur.
These data races were found and actually reproduced by our concurrency 
fuzzer.

To fix these races, the spinlock mapping->private_lock is used to
protect the accesses to mapping->writeback_index.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 fs/btrfs/extent_io.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 39e45b8a5031..8c33a60bde1d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4160,7 +4160,9 @@ static int extent_write_cache_pages(struct address_space *mapping,
 
 	pagevec_init(&pvec);
 	if (wbc->range_cyclic) {
+		spin_lock(&mapping->private_lock);
 		index = mapping->writeback_index; /* Start from prev offset */
+		spin_unlock(&mapping->private_lock);
 		end = -1;
 		/*
 		 * Start from the beginning does not need to cycle over the
@@ -4271,8 +4273,11 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			goto retry;
 	}
 
-	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
+	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole)) {
+		spin_lock(&mapping->private_lock);
 		mapping->writeback_index = done_index;
+		spin_unlock(&mapping->private_lock);
+	}
 
 	btrfs_add_delayed_iput(inode);
 	return ret;
-- 
2.17.1



* Re: [PATCH 2/4] fs: btrfs: fix data races in extent_write_cache_pages()
  2020-05-09  5:27 [PATCH 2/4] fs: btrfs: fix data races in extent_write_cache_pages() Jia-Ju Bai
@ 2020-05-12 21:56 ` David Sterba
  2020-05-13  2:17   ` Jia-Ju Bai
  0 siblings, 1 reply; 3+ messages in thread
From: David Sterba @ 2020-05-12 21:56 UTC (permalink / raw)
  To: Jia-Ju Bai; +Cc: clm, josef, dsterba, linux-btrfs, linux-kernel

On Sat, May 09, 2020 at 01:27:01PM +0800, Jia-Ju Bai wrote:
> The function extent_write_cache_pages is concurrently executed with
> itself at runtime in the following call contexts:
> 
> Thread 1:
>   btrfs_sync_file()
>     start_ordered_ops()
>       btrfs_fdatawrite_range()
>         btrfs_writepages() [via function pointer]
>           extent_writepages()
>             extent_write_cache_pages()
> 
> Thread 2:
>   btrfs_writepages() 
>     extent_writepages()
>       extent_write_cache_pages()
> 
> In extent_write_cache_pages():
>   index = mapping->writeback_index;
>   ...
>   mapping->writeback_index = done_index;
> 
> The accesses to mapping->writeback_index are not synchronized, and thus
> data races for this value can occur.
> These data races were found and actually reproduced by our concurrency 
> fuzzer.
> 
> To fix these races, the spinlock mapping->private_lock is used to
> protect the accesses to mapping->writeback_index.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> ---
>  fs/btrfs/extent_io.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 39e45b8a5031..8c33a60bde1d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4160,7 +4160,9 @@ static int extent_write_cache_pages(struct address_space *mapping,
>  
>  	pagevec_init(&pvec);
>  	if (wbc->range_cyclic) {
> +		spin_lock(&mapping->private_lock);
>  		index = mapping->writeback_index; /* Start from prev offset */
> +		spin_unlock(&mapping->private_lock);
>  		end = -1;
>  		/*
>  		 * Start from the beginning does not need to cycle over the
> @@ -4271,8 +4273,11 @@ static int extent_write_cache_pages(struct address_space *mapping,
>  			goto retry;
>  	}
>  
> -	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
> +	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole)) {
> +		spin_lock(&mapping->private_lock);
>  		mapping->writeback_index = done_index;
> +		spin_unlock(&mapping->private_lock);

I'm more and more curious what exactly your fuzzer tool is actually
reporting, because adding the locks around the writeback index does not
make any sense.

The variable is of type unsigned long and is written atomically, so the
only theoretical problem is on an architecture that is not capable of
storing it in one go, which would mean a lot more problems, e.g. because
pointers are assumed to be the same width as unsigned long.

So a torn write is not possible, and the lock leads to the same result
as if it wasn't there: the read and the write end up serialized not by
the spinlock but somewhere on the way from CPU caches to memory.

CPU1                                   CPU2

lock
index = mapping->writeback_index
unlock
                                       lock
                                       m->writeback_index = index;
                                       unlock

Is the same as

CPU1                                   CPU2

index = mapping->writeback_index
                                       m->writeback_index = index;

So maybe this makes your tool happy but there's no change from the
correctness point of view, only added overhead from the lock/unlock
calls.

Lockless synchronization is a thing, using memory barriers etc.; this
was the case with some other patch. I think your tool needs to take
that into account to give sensible results.
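
For illustration only, a minimal sketch of what annotating the accesses
instead of locking them could look like (READ_ONCE/WRITE_ONCE are used
here as an assumed annotation, this is a sketch and not a tested change
for writeback_index):

	/*
	 * Hypothetical sketch, not a tested change: mark the lockless
	 * accesses explicitly instead of taking a spinlock.  READ_ONCE()
	 * and WRITE_ONCE() keep the compiler from tearing or fusing the
	 * word-sized load/store and document that the race is intentional;
	 * they do not order the two threads against each other.
	 */
	if (wbc->range_cyclic) {
		index = READ_ONCE(mapping->writeback_index); /* Start from prev offset */
		...
	}
	...
	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
		WRITE_ONCE(mapping->writeback_index, done_index);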


* Re: [PATCH 2/4] fs: btrfs: fix data races in extent_write_cache_pages()
  2020-05-12 21:56 ` David Sterba
@ 2020-05-13  2:17   ` Jia-Ju Bai
  0 siblings, 0 replies; 3+ messages in thread
From: Jia-Ju Bai @ 2020-05-13  2:17 UTC (permalink / raw)
  To: dsterba, clm, josef, dsterba, linux-btrfs, linux-kernel



On 2020/5/13 5:56, David Sterba wrote:
> On Sat, May 09, 2020 at 01:27:01PM +0800, Jia-Ju Bai wrote:
>> The function extent_write_cache_pages is concurrently executed with
>> itself at runtime in the following call contexts:
>>
>> Thread 1:
>>    btrfs_sync_file()
>>      start_ordered_ops()
>>        btrfs_fdatawrite_range()
>>          btrfs_writepages() [via function pointer]
>>            extent_writepages()
>>              extent_write_cache_pages()
>>
>> Thread 2:
>>    btrfs_writepages()
>>      extent_writepages()
>>        extent_write_cache_pages()
>>
>> In extent_write_cache_pages():
>>    index = mapping->writeback_index;
>>    ...
>>    mapping->writeback_index = done_index;
>>
>> The accesses to mapping->writeback_index are not synchronized, and thus
>> data races for this value can occur.
>> These data races were found and actually reproduced by our concurrency
>> fuzzer.
>>
>> To fix these races, the spinlock mapping->private_lock is used to
>> protect the accesses to mapping->writeback_index.
>>
>> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
>> ---
>>   fs/btrfs/extent_io.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 39e45b8a5031..8c33a60bde1d 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -4160,7 +4160,9 @@ static int extent_write_cache_pages(struct address_space *mapping,
>>   
>>   	pagevec_init(&pvec);
>>   	if (wbc->range_cyclic) {
>> +		spin_lock(&mapping->private_lock);
>>   		index = mapping->writeback_index; /* Start from prev offset */
>> +		spin_unlock(&mapping->private_lock);
>>   		end = -1;
>>   		/*
>>   		 * Start from the beginning does not need to cycle over the
>> @@ -4271,8 +4273,11 @@ static int extent_write_cache_pages(struct address_space *mapping,
>>   			goto retry;
>>   	}
>>   
>> -	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
>> +	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole)) {
>> +		spin_lock(&mapping->private_lock);
>>   		mapping->writeback_index = done_index;
>> +		spin_unlock(&mapping->private_lock);
> I'm more and more curious what exactly your fuzzer tool is actually
> reporting, because adding the locks around the writeback index does not
> make any sense.
>
> The variable is of type unsigned long and is written atomically, so the
> only theoretical problem is on an architecture that is not capable of
> storing it in one go, which would mean a lot more problems, e.g. because
> pointers are assumed to be the same width as unsigned long.
>
> So a torn write is not possible, and the lock leads to the same result
> as if it wasn't there: the read and the write end up serialized not by
> the spinlock but somewhere on the way from CPU caches to memory.
>
> CPU1                                   CPU2
>
> lock
> index = mapping->writeback_index
> unlock
>                                        lock
>                                        m->writeback_index = index;
>                                        unlock
>
> Is the same as
>
> CPU1                                   CPU2
>
> index = mapping->writeback_index
>                                        m->writeback_index = index;
>
> So maybe this makes your tool happy but there's no change from the
> correctness point of view, only added overhead from the lock/unlock
> calls.
>
> Lockless synchronization is a thing, using memory barriers etc.; this
> was the case with some other patch. I think your tool needs to take
> that into account to give sensible results.

Thanks for the reply and explanation :)
I agree that only adding locks here makes no sense, because "index =
mapping->writeback_index" can still be executed before or after
"m->writeback_index = index" is executed.
So what is the expected order of the two statements? Read after write,
or write after read?


Best wishes,
Jia-Ju Bai

