Re: dcache locking question

From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Eric Biggers <ebiggers@kernel.org>,
	"Tobin C. Harding" <me@tobin.cc>,
	linux-fsdevel@vger.kernel.org
Subject: Re: dcache locking question
Date: Mon, 18 Mar 2019 10:11:06 -0700
Message-ID: <20190318171106.GK4102@linux.ibm.com>
In-Reply-To: <1552926378.3203.13.camel@HansenPartnership.com>

On Mon, Mar 18, 2019 at 09:26:18AM -0700, James Bottomley wrote:
> On Sun, 2019-03-17 at 17:35 -0700, Paul E. McKenney wrote:
> > On Sat, Mar 16, 2019 at 09:23:16PM -0700, James Bottomley wrote:
> > > On Sun, 2019-03-17 at 03:06 +0000, Al Viro wrote:
> > > > On Sat, Mar 16, 2019 at 07:20:20PM -0700, James Bottomley wrote:
> > > > > On Sat, 2019-03-16 at 17:50 -0700, Paul E. McKenney wrote:
> > > > > [...]
> > > > > >  I -have- seen stores of constant values be torn, but not
> > > > > > stores of runtime-variable values and not loads.  Still, such
> > > > > > > tearing is permitted, and including the READ_ONCE() makes
> > > > > > > things easier for tools such as thread sanitizers.  In addition,
> > > > > > the READ_ONCE() makes it clear that the value being loaded is
> > > > > > unstable, which can be useful documentation.
> > > > > 
> > > > > Um, just so I'm clear, because this assumption permeates all
> > > > > our code: load or store tearing can never occur if we're doing
> > > > > load or store of a 32 bit value which is naturally
> > > > > aligned.  Where naturally aligned is within the gift of the CPU
> > > > > to determine but which the compiler or kernel will always
> > > > > ensure for us unless we pack the structure or deliberately
> > > > > misalign the allocation.
> > 
> > A non-volatile store of certain 32-bit constants can and does tear
> > on some architectures.  These architectures would be the ones with a
> > store-immediate instruction with a small immediate field, and where
> > the 32-bit constant is such that a pair of 16-bit immediate store
> > instructions can store that value.
> 
> Understood: PA-RISC is one such architecture: our ldil (load immediate
> long) can only take 21 bits of immediate data and you have to use a
> second instruction (ldo) to get the remaining 11 bits. However, the
> compiler guarantees no tearing in memory visibility for PA by doing the
> ldil/ldo sequence on a register and then writing the register to memory
> which I believe is an architectural guarantee.

Good to know, thank you!
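
For anyone following along, here is a rough userspace illustration of
why a split store matters.  It is only a sketch: torn_store_u32() is a
made-up helper that emulates a compiler emitting two 16-bit stores for
one 32-bit store, and "midpoint" stands in for what a concurrent reader
might observe between the two halves.  Real tearing is a code-generation
decision by the compiler, not something this code triggers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: emulate a 32-bit store that was emitted as two
 * separate 16-bit stores.  *midpoint receives the value visible in
 * memory after the first half has been written but before the second.
 */
static void torn_store_u32(uint32_t *p, uint32_t val, uint32_t *midpoint)
{
	uint16_t cur[2];
	uint16_t new[2];

	assert(p != NULL && midpoint != NULL);

	memcpy(cur, p, sizeof(cur));
	memcpy(new, &val, sizeof(new));

	cur[0] = new[0];		/* first 16-bit store */
	memcpy(p, cur, sizeof(cur));
	*midpoint = *p;			/* what a racing reader could see */

	cur[1] = new[1];		/* second 16-bit store */
	memcpy(p, cur, sizeof(cur));
}
```

Starting from 0x11112222 and storing 0x33334444, the midpoint is a mix
of the two (which mix depends on endianness) and matches neither the
old value nor the new one.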

> > There was a bug in an old version of GCC where even volatile 32-bit
> > stores of these constants would tear.  They did fix the bug, but it
> > took some time to find a GCC person who understood that this was in
> > fact a bug.
> > 
> > Hence my preference for READ_ONCE() and WRITE_ONCE() for data-racing
> > loads and stores.
> 
> OK, but didn't everyone eventually agree this was a compiler bug?

They did agree, but only in the case where the store was volatile,
as in WRITE_ONCE(), and -not- in the case of a plain store.
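
To make the distinction concrete, a minimal userspace sketch follows.
The SKETCH_* macros are simplified stand-ins that I am assuming for
illustration (the kernel's READ_ONCE() and WRITE_ONCE() are more
elaborate), __typeof__ is a GCC/Clang extension, and shared_flag and
FLAG_MAGIC are hypothetical names.  The constant is merely an example
of a value whose two halves would each fit a small immediate field.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins: force the access through a volatile lvalue. */
#define SKETCH_WRITE_ONCE(x, val) \
	(*(volatile __typeof__(x) *)&(x) = (val))
#define SKETCH_READ_ONCE(x) \
	(*(const volatile __typeof__(x) *)&(x))

#define FLAG_MAGIC 0x00010001u	/* hypothetical "tearable" constant */

static uint32_t shared_flag;	/* hypothetical data-racing variable */

/* Plain store: the compiler is permitted to split this one. */
static void set_flag_plain(void)
{
	shared_flag = FLAG_MAGIC;
}

/* Volatile store: tearing here was agreed to be a compiler bug. */
static void set_flag_once(void)
{
	SKETCH_WRITE_ONCE(shared_flag, FLAG_MAGIC);
}

/* Volatile load: also documents that the value is unstable. */
static uint32_t get_flag_once(void)
{
	return SKETCH_READ_ONCE(shared_flag);
}

/* Single-threaded sanity check of the two marked accessors. */
static void flag_selftest(void)
{
	set_flag_once();
	assert(get_flag_once() == FLAG_MAGIC);
}
```

Both setters produce the same result when run single-threaded.  The
difference is only in what the compiler is allowed to emit for the
store when other CPUs may be reading shared_flag concurrently.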

At least the kernel doesn't make general use of vector instructions.
If it did, I would not be surprised to see compilers use three 32-bit
vector stores to store to a 32-bit int adjacent to a 64-bit pointer.  :-/

							Thanx, Paul

> > > > Wait a sec; are there any 64bit architectures where the same is
> > > > not guaranteed for dereferencing properly aligned void **?
> > > 
> > > Yes, naturally aligned void * dereference shouldn't tear
> > > either.  I was just using 32 bit as my example because 64 bit
> > > accesses will tear on 32 bit architectures but 64 bit naturally
> > > aligned accesses shouldn't tear on 64 bit architectures.  However,
> > > since we can't guarantee the 64 bitness of the architecture 32 bit
> > > or void * is our gold standard for not tearing.
> > 
> > For stores of quantities not known at compile time, agreed.  But
> > that same store-immediate situation could happen on 64-bit systems.
> > 
> > > James
> > > 
> > > 
> > > > If that's the case, I can think of quite a few places that are
> > > > rather dubious, and I don't see how READ_ONCE() could help in
> > > > those - e.g. if an architecture only has 32bit loads, rcu list
> > > > traversals are not going to be doable without one hell of an
> > > > extra headache.
> > 
> > All the 64-bit systems that run the Linux kernel do have 64-bit load
> > instructions and rcu_dereference() uses READ_ONCE() internally, so we
> > should be fine with RCU list traversals.
> 
> I really don't think it's possible to get the same immediate constant
> tearing bug on 64 bit.  If you look at PA, we have no 64 bit
> equivalent of the ldil/ldo pair so all 64 bit immediate stores come
> straight from the global data table via a register, so no tearing.  I
> bet every 64 bit architecture has a similar approach because 64 bit
> immediate data just requires too many bits to stuff into an instruction
> pair.
> 
> James
>