From: Jia-Ju Bai <baijiaju1990@gmail.com>
To: dsterba@suse.cz, clm@fb.com, josef@toxicpanda.com,
dsterba@suse.com, linux-btrfs@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/4] fs: btrfs: fix data races in extent_write_cache_pages()
Date: Wed, 13 May 2020 10:17:10 +0800 [thread overview]
Message-ID: <06d64482-a263-668d-7ab1-9f411eb1c794@gmail.com> (raw)
In-Reply-To: <20200512215625.GE18421@twin.jikos.cz>
On 2020/5/13 5:56, David Sterba wrote:
> On Sat, May 09, 2020 at 01:27:01PM +0800, Jia-Ju Bai wrote:
>> The function extent_write_cache_pages is concurrently executed with
>> itself at runtime in the following call contexts:
>>
>> Thread 1:
>> btrfs_sync_file()
>> start_ordered_ops()
>> btrfs_fdatawrite_range()
>> btrfs_writepages() [via function pointer]
>> extent_writepages()
>> extent_write_cache_pages()
>>
>> Thread 2:
>> btrfs_writepages()
>> extent_writepages()
>> extent_write_cache_pages()
>>
>> In extent_write_cache_pages():
>> index = mapping->writeback_index;
>> ...
>> mapping->writeback_index = done_index;
>>
>> The accesses to mapping->writeback_index are not synchronized, and thus
>> data races for this value can occur.
>> These data races were found and actually reproduced by our concurrency
>> fuzzer.
>>
>> To fix these races, the spinlock mapping->private_lock is used to
>> protect the accesses to mapping->writeback_index.
>>
>> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
>> ---
>> fs/btrfs/extent_io.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 39e45b8a5031..8c33a60bde1d 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -4160,7 +4160,9 @@ static int extent_write_cache_pages(struct address_space *mapping,
>>
>> pagevec_init(&pvec);
>> if (wbc->range_cyclic) {
>> + spin_lock(&mapping->private_lock);
>> index = mapping->writeback_index; /* Start from prev offset */
>> + spin_unlock(&mapping->private_lock);
>> end = -1;
>> /*
>> * Start from the beginning does not need to cycle over the
>> @@ -4271,8 +4273,11 @@ static int extent_write_cache_pages(struct address_space *mapping,
>> goto retry;
>> }
>>
>> - if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
>> + if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole)) {
>> + spin_lock(&mapping->private_lock);
>> mapping->writeback_index = done_index;
>> + spin_unlock(&mapping->private_lock);
> I'm more and more curious what exactly is your fuzzer tool actualy
> reporting. Because adding the locks around the writeback index does not
> make any sense.
>
> The variable is of type unsigned long, this is written atomically so the
> only theoretical problem is on an achritecture that is not capable of
> storing that in one go, which means a lot more problems eg. because
> pointers are assumed to be the same width as unsigned long.
>
> So torn write is not possible and the lock leads to the same result as
> if it wasn't there and the read and write would happen not serialized by
> the spinlock but somewhere on the way from CPU caches to memory.
>
> CPU1 CPU2
>
> lock
> index = mapping->writeback_index
> unlock
> lock
> m->writeback_index = index;
> unlock
>
> Is the same as
>
> CPU1 CPU2
>
>
> index = mapping->writeback_index
> m->writeback_index = index;
>
> So maybe this makes your tool happy but there's no change from the
> correctness point of view, only added overhead from the lock/unlock
> calls.
>
> Lockless synchronization is a thing, using memory barriers etc., this
> was the case of some other patch, I think your tool needs to take that
> into account to give sensible results.
Thanks for the reply and explanation :)
I agree that only adding locks here makes no sense, because "index =
mapping->writeback_index" can be still executed before or after
"m->writeback_index = index" is executed.
So what is the expected order of the two statements? Read after write or
write after read?
Best wishes,
Jia-Ju Bai
prev parent reply other threads:[~2020-05-13 2:17 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-09 5:27 [PATCH 2/4] fs: btrfs: fix data races in extent_write_cache_pages() Jia-Ju Bai
2020-05-12 21:56 ` David Sterba
2020-05-13 2:17 ` Jia-Ju Bai [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=06d64482-a263-668d-7ab1-9f411eb1c794@gmail.com \
--to=baijiaju1990@gmail.com \
--cc=clm@fb.com \
--cc=dsterba@suse.com \
--cc=dsterba@suse.cz \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).