From: David Sterba <dsterba@suse.cz>
To: Nikolay Borisov <nborisov@suse.com>
Cc: dsterba@suse.cz, David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 4/5] btrfs: serialize blocking_writers updates
Date: Tue, 29 Oct 2019 22:15:05 +0100
Message-ID: <20191029211505.GB3001@twin.jikos.cz>
In-Reply-To: <1bff6395-5f2e-f11b-55ba-01a4f59b4881@suse.com>

On Tue, Oct 29, 2019 at 08:48:15PM +0200, Nikolay Borisov wrote:
> On 29.10.19 г. 19:51 ч., David Sterba wrote:
> > On Wed, Oct 23, 2019 at 12:57:00PM +0300, Nikolay Borisov wrote:
> >> On 17.10.19 г. 22:39 ч., David Sterba wrote:
> >>> There's one potential memory ordering problem regarding
> >>> eb::blocking_writers, although this hasn't been hit in practice. On TSO
> >>> (total store ordering) architectures, the writes will always be seen in
> >>> the right order.
> >>>
> >>> In case the calls in btrfs_set_lock_blocking_write are seen in this
> >>> (wrong) order on another CPU:
> >>>
> >>> 0:  /* blocking_writers == 0 */
> >>> 1:  write_unlock()
> >>> 2:  blocking_writers = 1
> >>>
> >>> btrfs_try_tree_read_lock, btrfs_tree_read_lock_atomic or
> >>> btrfs_try_tree_write_lock would have to squeeze all of this between 1:
> >>> and 2:
> >>
> >> This is only a problem for the unlocked (optimistic) accesses in those
> >> functions, simply because from their POV the eb->lock doesn't play any
> >> role, i.e. they don't know about it at all.
> > 
> > No, the lock is important here. Let's take btrfs_try_tree_read_lock as
> > an example for the 2nd thread:
> > 
> > T1:                            T2:
> > write_unlock
> >                                blocking_writers == 0
> >                                try_readlock (success, locked)
> >                                if (blocking_writers) return -> not taken
> > blocking_writers = 1

Ok, it follows from what you write below that I got it wrong: the above
can't happen because of the simple unlock/lock sequence. I don't know
why I missed that; the original changelog has enough clues to see it
too.
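
For reference, the relevant part of btrfs_set_lock_blocking_write looks
roughly like this (a simplified sketch, the nesting and assertion
helpers are omitted):

    void btrfs_set_lock_blocking_write(struct extent_buffer *eb)
    {
            if (eb->blocking_writers == 0) {
                    /*
                     * The store happens while eb->lock is still held
                     * for write, so anybody who takes the lock
                     * afterwards is guaranteed to observe
                     * blocking_writers == 1.
                     */
                    eb->blocking_writers = 1;
                    write_unlock(&eb->lock);
            }
    }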

> > Same for btrfs_tree_read_lock_atomic (with read_lock) and
> > btrfs_try_tree_write_lock (with write_lock)
> > 
> > IMO it's the other way around than you say, the optimistic lock is
> > irrelevant. We need the blocking_writers update sneak into a locked
> > section.
> 
> But write_unlock is a release operation, as per memory-barriers.txt:
> 
> (6) RELEASE operations.
> 
>      This also acts as a one-way permeable barrier.  It guarantees that
>      all memory operations before the RELEASE operation will appear to
>      happen before the RELEASE operation with respect to the other
>      components of the system.  RELEASE operations include UNLOCK
>      operations and smp_store_release() operations.
> 
> ...
> 
> The use of ACQUIRE and RELEASE operations generally precludes the need
> for other sorts of memory barrier (but note the exceptions mentioned in
> the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
> pair is -not- guaranteed to act as a full memory barrier.  However,
> after an ACQUIRE on a given variable, all memory accesses preceding any
> prior RELEASE on that same variable are guaranteed to be visible.
> 
> try_readlock is an ACQUIRE operation w.r.t. the lock. So if it
> succeeds, then all previous accesses, e.g. blocking_writers = 1, are
> guaranteed to be visible, because we have RELEASE (write_unlock) +
> ACQUIRE (try_readlock in case of success).
> 
> So the blocking_writers check AFTER try_readlock has succeeded is
> guaranteed to see the update to blocking_writers in
> btrfs_set_lock_blocking_write.

Yeah. As long as the slow paths re-check the value inside the locked
section, we're safe. And this is done in all the functions AFAICS.
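
E.g. the pattern in btrfs_try_tree_read_lock, roughly (a simplified
sketch, not the exact source):

    int btrfs_try_tree_read_lock(struct extent_buffer *eb)
    {
            /* Optimistic check, may race and see a stale 0. */
            if (eb->blocking_writers)
                    return 0;

            if (!read_trylock(&eb->lock))
                    return 0;

            /*
             * Re-check under the lock: the ACQUIRE of try_readlock
             * pairs with the RELEASE of the write_unlock in
             * btrfs_set_lock_blocking_write, so the store to
             * blocking_writers must be visible here.
             */
            if (eb->blocking_writers) {
                    read_unlock(&eb->lock);
                    return 0;
            }
            return 1;
    }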

So far we haven't found a problem with the locks on TSO and non-TSO
arches. Thanks for the barrier reasoning exercise, I'll drop the patch.

Thread overview: 20+ messages
2019-10-17 19:38 [PATCH 0/5] Extent buffer locking and documentation David Sterba
2019-10-17 19:38 ` [PATCH 1/5] btrfs: merge blocking_writers branches in btrfs_tree_read_lock David Sterba
2019-10-17 19:38 ` [PATCH 2/5] btrfs: set blocking_writers directly, no increment or decrement David Sterba
2019-10-18 12:08   ` Nikolay Borisov
2019-10-18 17:31     ` David Sterba
2019-10-17 19:38 ` [PATCH 3/5] btrfs: access eb::blocking_writers according to ACCESS_ONCE policies David Sterba
2019-10-23  8:42   ` Nikolay Borisov
2019-10-29 21:33     ` David Sterba
2019-10-17 19:39 ` [PATCH 4/5] btrfs: serialize blocking_writers updates David Sterba
2019-10-23  9:57   ` Nikolay Borisov
2019-10-29 17:51     ` David Sterba
2019-10-29 18:48       ` Nikolay Borisov
2019-10-29 21:15         ` David Sterba [this message]
2019-10-17 19:39 ` [PATCH 5/5] btrfs: document extent buffer locking David Sterba
2019-10-18  0:17   ` Qu Wenruo
2019-10-18 11:56     ` David Sterba
2019-10-22  9:53   ` Nikolay Borisov
2019-10-22 10:41     ` David Sterba
2019-10-23 11:16   ` Nikolay Borisov
2019-10-29 17:33     ` David Sterba
