All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [patch] spinlocks: remove 'volatile'
@ 2006-07-07 10:21 Chuck Ebbert
  2006-07-07 17:09 ` Linus Torvalds
  0 siblings, 1 reply; 119+ messages in thread
From: Chuck Ebbert @ 2006-07-07 10:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Ingo Molnar, Arjan van de Ven

In-Reply-To: <Pine.LNX.4.64.0607061333190.3869@g5.osdl.org>

On Thu, 6 Jul 2006 13:34:14 -0700, Linus Torvalds wrote:

> > So I _think_ that we should change the "=m" to the much more correct "+m" 
> > at the same time (or before - it's really a bug-fix regardless) as 
> > removing the "volatile".
> 
> Here's a first cut (UNTESTED!) for x86. I didn't check any other 
> architectures, I bet they have similar problems.

>  #define __raw_spin_unlock_string \
>       "movb $1,%0" \
> -             :"=m" (lock->slock) : : "memory"
> +             :"+m" (lock->slock) : : "memory"
 
This really is just an overwrite of whatever is there.  OTOH I can't see
how this change could hurt anything..

-- 
Chuck
 "You can't read a newspaper if you can't read."  --George W. Bush

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 10:21 [patch] spinlocks: remove 'volatile' Chuck Ebbert
@ 2006-07-07 17:09 ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-07 17:09 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, Ingo Molnar, Arjan van de Ven



On Fri, 7 Jul 2006, Chuck Ebbert wrote:
> 
> >  #define __raw_spin_unlock_string \
> >       "movb $1,%0" \
> > -             :"=m" (lock->slock) : : "memory"
> > +             :"+m" (lock->slock) : : "memory"
>  
> This really is just an overwrite of whatever is there.  OTOH I can't see
> how this change could hurt anything..

Yeah, I don't think any non-buggy sequence can make it matter.

In theory, you could have something like

 - create a spinlock already locked
 - add the object that contains the spinlock to some global queues
 - do some op on the object to finalize it
 - unlock the spinlock 

and then the "unlock" had better not optimize away the previous 
initialization.

HOWEVER, it can't really do that anyway, since anything that made the 
spinlock visible to anybody else would have had to serialize the lock 
state for _that_, so if the "+m" vs "=m" makes a difference, you'd have 
had a serious bug already for other reasons.

I'll leave it with a "+m". I've tested it locally, and looking at the 
patch it definitely fixes real bugs in the inline asms, but I still hope 
other people will take a look before I commit it, in case there are any 
other subtle cases.

(A "+m" should always be safer than a "=m", so the patch should be very 
safe, but hey, bugs happen to "obvious" code too)

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 13:46     ` Chase Venters
  2006-07-09  4:39       ` Benjamin Herrenschmidt
@ 2006-07-14  3:30       ` Steven Rostedt
  1 sibling, 0 replies; 119+ messages in thread
From: Steven Rostedt @ 2006-07-14  3:30 UTC (permalink / raw)
  To: Chase Venters
  Cc: trajce nedev, torvalds, acahalan, linux-kernel, linux-os, khc,
	mingo, akpm, arjan

On Sat, 2006-07-08 at 08:46 -0500, Chase Venters wrote:
> On Saturday 08 July 2006 01:44, trajce nedev wrote:
> > On Sat, 8 Jul 2006, Chase Venters wrote:
> > >Perhaps you should have followed this thread closely before composing your
> > >assault on Linus. We're not talking about "asm volatile". We're talking
> > >about
> > >the "volatile" keyword as applied to variables. 'volatile' as applied to
> > >inline ASM is of course necessary in many cases -- no one is disputing
> > >that.
> >
> > Ok, let's port a spinlock macro that spins instead of context switches
> > instead of using the pthread garbage on IA64 or AMD64:
> >
> > #if ((defined (_M_IA64) || defined (_M_AMD64)) && !defined(NT_INTEREX))
> > #include <windows.h>
> > #pragma intrinsic (_InterlockedExchange)
> >
> > typedef volatile LONG lock_t[1];
> >
> > #define LockInit(v)	((v)[0] = 0)
> > #define LockFree(v)	((v)[0] = 0)
> > #define Unlock(v)	((v)[0] = 0)
> >
> > __forceinline void Lock(volatile LONG *hPtr)
> > {
> > 	int iValue;
> >
> > 	for (;;) {
> > 		iValue = _InterlockedExchange((LPLONG)hPtr, 1);
> > 		if (iValue == 0)
> > 			return;
> > 		while (*hPtr);
> > 	}
> > }
> >
> > Please show me how I can write this to spinlock without using volatile.
> 
> Please show me how that lock is safe without a compiler memory barrier! What's 
> to stop your compiler from moving loads and stores across your inlined lock 
> code?
> 
> When you add the missing compiler memory barrier, the "volatile" classifier 
> becomes unnecessary.
> 
> Actually, please just read the thread. We've been over this already. It's 
> starting to get really old.

Actually it was good that he posted.  It just proved Linus's assumption
that those that use volatile usually don't understand exactly what is
going on.  I certainly learned a lot in this thread ;)

[ /me goes off to fix some of my broken "volatile" code ... ]

-- Steve



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09 21:10             ` Pavel Machek
  2006-07-09 22:18               ` Linus Torvalds
@ 2006-07-10 16:25               ` marty fouts
  1 sibling, 0 replies; 119+ messages in thread
From: marty fouts @ 2006-07-10 16:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Albert Cahalan, tglx, joe.korty, linux-kernel,
	linux-os, khc, mingo, akpm, arjan

On 7/9/06, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
> > Btw, I think that the whole standard definition of "volatile" is pretty
> > weak and useless. The standard could be improved, and a way to improve the
> > definition of volatile would actually be to say something like
> >
> >       "volatile" implies that the access to that entity can alias with
> >       any other access.
> >
> > That's actually a lot simpler for a compiler writer (a C compiler already
> > has to know about the notion of data aliasing), and gives a lot more
> > useful (and strict) semantics to the whole concept.
> >
> > So to look at the previous example of
> >
> >       extern int a;
> >       extern int volatile b;
> >
> >       void testfn(void)
> >       {
> >               a++;
> >               b++;
> >       }
> >
> > _my_ definition of "volatile" is actually totally unambiguous, and not
> > just simpler than the current standard, it is also stronger. It would make
> > it clearly invalid to read the value of "b" until the value of "a" has
> > been written, because (by my definition), "b" may actually alias the value
> > of "a", so you clearly cannot read "b" until "a" has been updated.
> ...
> > In contrast, the current C standard definition of "volatile" is not only
> > cumbersome and inconvenient, it's also badly defined when it comes to
> > accesses to _other_ data, making it clearly less useful.
> >
> > I personally think that my simpler definition of volatile is actually a
> > perfectly valid implementation of the current definition of volatile, and
> > I suggested it to some gcc people as a better way to handle "volatile"
> > inside gcc while still being standards-conforming (ie the "can alias
> > anything" thing is not just clearer and simpler, it's strictly a subset of
> > what the C standard allows, meaning that I think you can adopt my
> > definition _without_ breaking any old programs or standards).
>
> Are you sure?
>
> volatile int a; a=1; a=2;
>
> ...under old definition, there's nothing to optimize but AFAICT, your
> definition allows optimizing out a=1.
>
>                                                Pavel
> --

You don't have to go that far.  "Alias with anything" is too vague,
because it basically means that all external variables are volatile if
any are.  In the original example, b shouldn't be allowed as an alias
for a, because a is not "volatile". Otherwise, once you've added any
volatile variables you've effectively said "all variables are
volatile".

Anyway, this approach is the opposite of the original intent of
volatile.  Originally, the idea was that volatile meant "anything
might have changed b since the last time you referenced it".  Here,
you're saying "changing b could change anything".

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09 21:10             ` Pavel Machek
@ 2006-07-09 22:18               ` Linus Torvalds
  2006-07-10 16:25               ` marty fouts
  1 sibling, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-09 22:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Albert Cahalan, tglx, joe.korty, linux-kernel, linux-os, khc,
	mingo, akpm, arjan



On Sun, 9 Jul 2006, Pavel Machek wrote:
> 
> volatile int a; a=1; a=2;
> 
> ...under old definition, there's nothing to optimize but AFAICT, your
> definition allows optimizing out a=1.

If 'a' can alias anything, then by definition the first 'a=1' could have 
changed something else than the second one. Otherwise, it couldn't have 
aliased "anything", it would have aliased only something _particular_. 

IOW, you can think of "aliasing anything" as being equivalent to saying 
"the address is indeterminate". Two writes could literally go to two 
different things.

But yeah, maybe that's not really perfect either. It leaves the 
read-vs-read orderign still open.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 20:11           ` Linus Torvalds
@ 2006-07-09 21:10             ` Pavel Machek
  2006-07-09 22:18               ` Linus Torvalds
  2006-07-10 16:25               ` marty fouts
  0 siblings, 2 replies; 119+ messages in thread
From: Pavel Machek @ 2006-07-09 21:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Albert Cahalan, tglx, joe.korty, linux-kernel, linux-os, khc,
	mingo, akpm, arjan

Hi!

> Btw, I think that the whole standard definition of "volatile" is pretty 
> weak and useless. The standard could be improved, and a way to improve the 
> definition of volatile would actually be to say something like
> 
> 	"volatile" implies that the access to that entity can alias with 
> 	any other access.
> 
> That's actually a lot simpler for a compiler writer (a C compiler already 
> has to know about the notion of data aliasing), and gives a lot more 
> useful (and strict) semantics to the whole concept.
> 
> So to look at the previous example of
> 
> 	extern int a;
> 	extern int volatile b;
> 
> 	void testfn(void)
> 	{
> 		a++;
> 		b++;
> 	}
> 
> _my_ definition of "volatile" is actually totally unambiguous, and not 
> just simpler than the current standard, it is also stronger. It would make 
> it clearly invalid to read the value of "b" until the value of "a" has 
> been written, because (by my definition), "b" may actually alias the value 
> of "a", so you clearly cannot read "b" until "a" has been updated.
...
> In contrast, the current C standard definition of "volatile" is not only 
> cumbersome and inconvenient, it's also badly defined when it comes to 
> accesses to _other_ data, making it clearly less useful.
> 
> I personally think that my simpler definition of volatile is actually a 
> perfectly valid implementation of the current definition of volatile, and 
> I suggested it to some gcc people as a better way to handle "volatile" 
> inside gcc while still being standards-conforming (ie the "can alias 
> anything" thing is not just clearer and simpler, it's strictly a subset of 
> what the C standard allows, meaning that I think you can adopt my 
> definition _without_ breaking any old programs or standards).

Are you sure?

volatile int a; a=1; a=2;

...under old definition, there's nothing to optimize but AFAICT, your
definition allows optimizing out a=1.

						Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09 19:16                                           ` David Schwartz
@ 2006-07-09 19:51                                             ` Theodore Tso
  0 siblings, 0 replies; 119+ messages in thread
From: Theodore Tso @ 2006-07-09 19:51 UTC (permalink / raw)
  To: David Schwartz; +Cc: Linux-Kernel@Vger. Kernel. Org

On Sun, Jul 09, 2006 at 12:16:15PM -0700, David Schwartz wrote:
> > Volatile is useful for non device driver work, for example VJ-style
> > channels.  A portable volatile can help to code such things in a
> > compiler-neutral and platform-neutral way.  Linux doesn't care about
> > compiler neutrality, being coded in GNU C, and about platform
> > neutrality, having a per-arch abstraction layer, but other programs may
> > wish to run on multiple compilers and multiple platforms without
> > per-platform glue layers.
> 
> 	There is a portable volatile, it's called 'pthread_mutex_lock'. It allows
> you to code such things in a compiler-neutral and platform-neutral way. You
> don't have to worry about what the compiler might do, what the hardware
> might do, what atomic operations the CPU supports, or anything like that.
> The atomicity issues I've mentioned in my other posts make any attempt at
> creating a 'portable volatile' for shared memory more or less doomed from
> the start.

The other thing to add here is that if you're outside of the kernel,
you *don't* want to be implementing your own spinlock if for no other
reason than it's a performance disaster.  The scheduler doesn't know
that you are spinning waiting for a lock, so you will be scheduled and
burning cpu time (and energy, and heat budget) while waiting for the
the lock.  I once spent over a week on-site at a customer complaining
about Linux performance problems, and it turned out the reason why was
because they thought they were being "smart" by implementing their own
spinlock in userspace.  Bad bad bad.

So if a userspace progam ever uses volatile, it's almost certainly a
bug, one way or another.

						- Ted

^ permalink raw reply	[flat|nested] 119+ messages in thread

* RE: [patch] spinlocks: remove 'volatile'
  2006-07-09 12:47                                         ` Avi Kivity
@ 2006-07-09 19:16                                           ` David Schwartz
  2006-07-09 19:51                                             ` Theodore Tso
  0 siblings, 1 reply; 119+ messages in thread
From: David Schwartz @ 2006-07-09 19:16 UTC (permalink / raw)
  To: Linux-Kernel@Vger. Kernel. Org


> Volatile is useful for non device driver work, for example VJ-style
> channels.  A portable volatile can help to code such things in a
> compiler-neutral and platform-neutral way.  Linux doesn't care about
> compiler neutrality, being coded in GNU C, and about platform
> neutrality, having a per-arch abstraction layer, but other programs may
> wish to run on multiple compilers and multiple platforms without
> per-platform glue layers.

	There is a portable volatile, it's called 'pthread_mutex_lock'. It allows
you to code such things in a compiler-neutral and platform-neutral way. You
don't have to worry about what the compiler might do, what the hardware
might do, what atomic operations the CPU supports, or anything like that.
The atomicity issues I've mentioned in my other posts make any attempt at
creating a 'portable volatile' for shared memory more or less doomed from
the start.

	DS



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09 14:28                             ` Roman Zippel
@ 2006-07-09 15:27                               ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-09 15:27 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Keith Owens, Andi Kleen, Jan Hubicka, David S. Miller,
	Richard Henderson, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, arjan



On Sun, 9 Jul 2006, Roman Zippel wrote:
> 
> On Sat, 8 Jul 2006, Linus Torvalds wrote:
> > 
> > but we've been using that "+m" format for a long time already (and I 
> > _think_ we did so at the suggestion of gcc people), and it would be much 
> > better if the gcc documentation was just fixed here.
> 
> IIRC early 4.0 had problems with "+m" and did what the documentation says, 
> but that got quickly fixed.
> Here is the old thread http://marc.theaimsgroup.com/?t=108384517900001&r=1&w=2

Thanks. That one implies that we had some strange warnings (but it 
apparently still worked) with 3.4, and it's since gotten fixed. Goodie. 

I'll leave the "+m" alone then, both the old and the new ones.

		Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09  3:58                           ` Linus Torvalds
  2006-07-09  6:13                             ` David Miller
@ 2006-07-09 14:28                             ` Roman Zippel
  2006-07-09 15:27                               ` Linus Torvalds
  1 sibling, 1 reply; 119+ messages in thread
From: Roman Zippel @ 2006-07-09 14:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Keith Owens, Andi Kleen, Jan Hubicka, David S. Miller,
	Richard Henderson, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, arjan

Hi,

On Sat, 8 Jul 2006, Linus Torvalds wrote:

> but we've been using that "+m" format for a long time already (and I 
> _think_ we did so at the suggestion of gcc people), and it would be much 
> better if the gcc documentation was just fixed here.

IIRC early 4.0 had problems with "+m" and did what the documentation says, 
but that got quickly fixed.
Here is the old thread http://marc.theaimsgroup.com/?t=108384517900001&r=1&w=2

bye, Roman

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09  4:19                                       ` Benjamin Herrenschmidt
@ 2006-07-09 12:47                                         ` Avi Kivity
  2006-07-09 19:16                                           ` David Schwartz
  0 siblings, 1 reply; 119+ messages in thread
From: Avi Kivity @ 2006-07-09 12:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Arjan van de Ven, Linus Torvalds, Mark Lord,
	linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Benjamin Herrenschmidt wrote:
>
> > I didn't suggest the compiler could or should do it, just that it would
> > be possible (for the _user_) to write portable ISO C code to access PCI
> > mmio registers, if volatile's implementation serialized access.
>
> That isn't possible neither. How to actually serialize access can
> require different set of primitives depending on the storage class of
> the memory you are accessing, which the C compiler has 0 knowledge
> about. (For example, cacheable storage vs. write-through vs.
> non-cacheable guarded vs. non-cacheable non-guarded on powerpc, there
> are different issues on other architectures).
>

Okay.  I guess this limits portable volatile to main memory.  Thanks for 
the clarification.

> > With the current implementation of volatile in gcc, it is impossible -
> > you need to resort to inline assembly for some architectures, which is
> > not an ISO C feature.
>
> ISO C has never been about writing device drivers. There is simply no
> choice here. You need an atchitecture specific set of accessors. If you
> want portable code, then pick a library like libpci and make sure it
> contains all you need on all the architectures you need. Then write
> portable code on top of it.
>

Indeed, I see no other way now.

> > And I'm not suggesting that it would be a good idea to use volatile 
> even
> > if it was corrected - it would have to take a worst-case approach and
> > thus would generate very bad code.
>
> So what is the point ?
>

Volatile is useful for non device driver work, for example VJ-style 
channels.  A portable volatile can help to code such things in a 
compiler-neutral and platform-neutral way.  Linux doesn't care about 
compiler neutrality, being coded in GNU C, and about platform 
neutrality, having a per-arch abstraction layer, but other programs may 
wish to run on multiple compilers and multiple platforms without 
per-platform glue layers.

Adding barriers to volatile can take it from dangerously useless to 
somewhat* useful.  Not for Linux, but other projects do exist.

* possibly barriers are still required, if volatile data points to 
non-volatile data.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 20:40                                             ` Chase Venters
  2006-07-08 20:47                                               ` Chase Venters
@ 2006-07-09 10:57                                               ` Krzysztof Halasa
  1 sibling, 0 replies; 119+ messages in thread
From: Krzysztof Halasa @ 2006-07-09 10:57 UTC (permalink / raw)
  To: Chase Venters
  Cc: linux-os \\(Dick Johnson\\),
	Linus Torvalds, Ingo Molnar, Andrew Morton, Linux kernel, arjan

Chase Venters <chase.venters@clientec.com> writes:

>> Sure, but a barrier alone isn't enough. You have to use assembler and
>> it's beyond scope of C volatile.
>
> Right, which is why volatile is wrong.

In this case (and not only this). Of course. But not always.

> You need the barrier for both the CPU and the compiler.

For spinlocks, yes.

For other things... Sometimes you need a barrier for the compiler
only. Sometimes you don't need any general barrier, you only need
to make sure a single variable isn't being cached (by the compiler).
That's what volatile is for.

Saying that volatile is always wrong looks to me like saying that goto
is always wrong :-) And yes, there are people who say that every single
goto in the universe is wrong.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09  3:58                           ` Linus Torvalds
@ 2006-07-09  6:13                             ` David Miller
  2006-07-09 14:28                             ` Roman Zippel
  1 sibling, 0 replies; 119+ messages in thread
From: David Miller @ 2006-07-09  6:13 UTC (permalink / raw)
  To: torvalds; +Cc: kaos, ak, jh, rth, mingo, akpm, linux-kernel, arjan

From: Linus Torvalds <torvalds@osdl.org>
Date: Sat, 8 Jul 2006 20:58:37 -0700 (PDT)

> We can certainly write
> 
> 	...
> 	:"=m" (*ptr)
> 	:"m" (*ptr)
> 	...
> 
> instead of the much simpler
> 
> 	:"+m" (*ptr)
> 
> but we've been using that "+m" format for a long time already (and I 
> _think_ we did so at the suggestion of gcc people), and it would be much 
> better if the gcc documentation was just fixed here.

I honestly don't know why that language is there about '+' saying to
only use it when the constraints allow a register.  Perhaps there are
some implications for reload.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 13:46     ` Chase Venters
@ 2006-07-09  4:39       ` Benjamin Herrenschmidt
  2006-07-14  3:30       ` Steven Rostedt
  1 sibling, 0 replies; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2006-07-09  4:39 UTC (permalink / raw)
  To: Chase Venters
  Cc: trajce nedev, torvalds, acahalan, linux-kernel, linux-os, khc,
	mingo, akpm, arjan


> Please show me how that lock is safe without a compiler memory barrier! What's 
> to stop your compiler from moving loads and stores across your inlined lock 
> code?
> 
> When you add the missing compiler memory barrier, the "volatile" classifier 
> becomes unnecessary.
> 
> Actually, please just read the thread. We've been over this already. It's 
> starting to get really old.

This is also a very good example of why pretty much every time somebody
tries to implement lock-like primitives in userland without using
libpthread, they get them wrong... we recently had to debug for a
customer a library that typically had that sort of home-made "I know how
this works" kind of locks that were actually subtley broken on machines
with deep out of order store queues. Of course, we didn't have the
source to the library and the customer was blaming the processors and/or
linux for his bugs.

There should be a real big flashing warning at the very beginning of
every computer programming course, book, whatever, compiler etc...
"Don't ever try to do locking & atomic primitives yourselves" or
something like that... In fact, I would suggest gcc to warn every time
somebody writes a function that has "lock" in the name :) Ok... maybe
not, but heh.

>From my experience, this is a very common source of bugs especially in
threaded apps. People have fantasies that they can do locks faster than
the system libraries, or have this religious-class beleif that
libpthread is bad, or whatever and they get it wrong pretty much every
single time. They are often saved by x86 which is very much in order
still but even if that is changing.

In fact, in a former life when I was still porting windows
applications/drivers to MacOS, I think pretty much ever single time I
had to port some threaded app, it happend. Ugly. Drivers writers
occasionally got smarter and uses whatever primitives the NT kernel
provides (can't remember), possibly because that's what the sample code
does :) Apps were hopeless. Every single of them.

Ben.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 10:18                                     ` Avi Kivity
  2006-07-08 10:28                                       ` Thomas Gleixner
@ 2006-07-09  4:19                                       ` Benjamin Herrenschmidt
  2006-07-09 12:47                                         ` Avi Kivity
  1 sibling, 1 reply; 119+ messages in thread
From: Benjamin Herrenschmidt @ 2006-07-09  4:19 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Arjan van de Ven, Linus Torvalds, Mark Lord,
	linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel


> I didn't suggest the compiler could or should do it, just that it would 
> be possible (for the _user_) to write portable ISO C code to access PCI 
> mmio registers, if volatile's implementation serialized access.

That isn't possible neither. How to actually serialize access can
require different set of primitives depending on the storage class of
the memory you are accessing, which the C compiler has 0 knowledge
about. (For example, cacheable storage vs. write-through vs.
non-cacheable guarded vs. non-cacheable non-guarded on powerpc, there
are different issues on other architectures).

> With the current implementation of volatile in gcc, it is impossible - 
> you need to resort to inline assembly for some architectures, which is 
> not an ISO C feature.

ISO C has never been about writing device drivers. There is simply no
choice here. You need an atchitecture specific set of accessors. If you
want portable code, then pick a library like libpci and make sure it
contains all you need on all the architectures you need. Then write
portable code on top of it. 

> And I'm not suggesting that it would be a good idea to use volatile even 
> if it was corrected - it would have to take a worst-case approach and 
> thus would generate very bad code.

So what is the point ?

Ben.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09  3:07                         ` Keith Owens
  2006-07-09  3:29                           ` Linus Torvalds
@ 2006-07-09  3:58                           ` Linus Torvalds
  2006-07-09  6:13                             ` David Miller
  2006-07-09 14:28                             ` Roman Zippel
  1 sibling, 2 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-09  3:58 UTC (permalink / raw)
  To: Keith Owens, Andi Kleen, Jan Hubicka, David S. Miller, Richard Henderson
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List, arjan



On Sun, 9 Jul 2006, Keith Owens wrote:
> 
>   			 "... Extended asm supports input-output or
>   read-write operands.  Use the constraint character `+' to indicate
>   such an operand and list it with the output operands.  You should
>   only use read-write operands when the constraints for the operand (or
>   the operand in which only some of the bits are to be changed) allow a
>   register."

Btw, gcc-4.1.1 docs seem to also have this language, although when you 
actually go to the "Constraint Modifier Characters" section, that thing 
doesn't actually say anything about "only for registers".

It would be good to have the gcc docs fixed. As mentioned, we've been 
using "+m" for at least a year (most of our current "+m" usage was there 
in 2.6.13), and some of those uses have actually been added by people that 
are at least active on the gcc development lists (eg Andi Kleen).

But let's add a few more people who are more deeply involved with gcc. 
Jan? Richard? Davem? Who would be the right person to check this out?

We can certainly write

	...
	:"=m" (*ptr)
	:"m" (*ptr)
	...

instead of the much simpler

	:"+m" (*ptr)

but we've been using that "+m" format for a long time already (and I 
_think_ we did so at the suggestion of gcc people), and it would be much 
better if the gcc documentation was just fixed here.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09  3:29                           ` Linus Torvalds
@ 2006-07-09  3:43                             ` Keith Owens
  0 siblings, 0 replies; 119+ messages in thread
From: Keith Owens @ 2006-07-09  3:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan

Linus Torvalds (on Sat, 8 Jul 2006 20:29:12 -0700 (PDT)) wrote:
>
>
>On Sun, 9 Jul 2006, Keith Owens wrote:
>> 
>> This disagrees with the gcc (4.1.0) docs.  info --index-search='Extended Asm' gcc
>> 
>>   The ordinary output operands must be write-only; GCC will assume
>>   that the values in these operands before the instruction are dead and
>>   need not be generated.  Extended asm supports input-output or
>>   read-write operands.  Use the constraint character `+' to indicate
>>   such an operand and list it with the output operands.  You should
>>   only use read-write operands when the constraints for the operand (or
>>   the operand in which only some of the bits are to be changed) allow a
>>   register.
>
>I'm fairly sure the docs are outdated (but may well be correct for older 
>gcc versions - as I already discussed elsewhere, that "+" thing was not 
>historically useful).
>
>We've been using "+m" for some time in the kernel on several 
>architectures.
>
>git aficionados can do
>
>	git grep -1 '"+m"' v2.6.17
>
>to see the pre-existing usage of this (most of them go back a lot further, 
>although some of them are newer - the <asm-i386/bitops.h> ones were added 
>in January.
>
>So if "+m" didn't work, we've been in trouble for at least the last year.

The oldest gcc I can access quickly is 3.2.3 (May 2005) which has
significantly different wording.

  The ordinary output operands must be write-only; GCC will assume that
  the values in these operands before the instruction are dead and need
  not be generated.  Extended asm supports input-output or read-write
  operands.  Use the constraint character `+' to indicate such an
  operand and list it with the output operands.

  When the constraints for the read-write operand (or the operand in
  which only some of the bits are to be changed) allows a register, you
  may, as an alternative, logically split its function into two
  separate operands, one input operand and one write-only output
  operand.  The connection between them is expressed by constraints
  which say they need to be in the same location when the instruction
  executes.

In 3.2.3, the '+' form allows any constraint, only the split notation
requires that the operand be in a register.  In 4.1.0, the docs now say
that the '+' form requires a register.  I suspect that you are right
and the gcc docs are incorrect, but it does not hurt to check.  Would
anybody from the gcc team like to pipe up and give an opinion?


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-09  3:07                         ` Keith Owens
@ 2006-07-09  3:29                           ` Linus Torvalds
  2006-07-09  3:43                             ` Keith Owens
  2006-07-09  3:58                           ` Linus Torvalds
  1 sibling, 1 reply; 119+ messages in thread
From: Linus Torvalds @ 2006-07-09  3:29 UTC (permalink / raw)
  To: Keith Owens; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan



On Sun, 9 Jul 2006, Keith Owens wrote:
> 
> This disagrees with the gcc (4.1.0) docs.  info --index-search='Extended Asm' gcc
> 
>   The ordinary output operands must be write-only; GCC will assume
>   that the values in these operands before the instruction are dead and
>   need not be generated.  Extended asm supports input-output or
>   read-write operands.  Use the constraint character `+' to indicate
>   such an operand and list it with the output operands.  You should
>   only use read-write operands when the constraints for the operand (or
>   the operand in which only some of the bits are to be changed) allow a
>   register.

I'm fairly sure the docs are outdated (but may well be correct for older 
gcc versions - as I already discussed elsewhere, that "+" thing was not 
historically useful).

We've been using "+m" for some time in the kernel on several 
architectures.

git aficionados can do

	git grep -1 '"+m"' v2.6.17

to see the pre-existing usage of this (most of them go back a lot further, 
although some of them are newer - the <asm-i386/bitops.h> ones were added 
in January.

So if "+m" didn't work, we've been in trouble for at least the last year.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 22:50                         ` Ralf Baechle
@ 2006-07-09  3:09                           ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-09  3:09 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan



On Sat, 8 Jul 2006, Ralf Baechle wrote:
> 
> I tried the same on MIPS, for lazyness sake at first only in atomic.h.  With
> gcc 3.3 the code size is exactly the same with both "=m" and "+m", so I
> didn't look into details of the generated code.  With gcc 4.1 "+m" results
> in a size increase of about 1K for the ip27_defconfig kernel.  For example:
> 
> <unlock_kernel>:
>        df830000        ld      v1,0(gp)
>        8c620028        lw      v0,40(v1)
>        04400014        bltz    v0,a80000000029944c <unlock_kernel+0x5c>
>        00000000        nop
>        2442ffff        subiu   v0,v0,1
>        ac620028        sw      v0,40(v1)	# current->lock_depth
>        8c630028        lw      v1,40(v1)	# current->lock_depth
>        0461000b        bgez    v1,a80000000029943c <unlock_kernel+0x4c>
> 
> The poinless load isn't generated with "=m".  The interesting thing is
> that in all the instances of bloat I looked at it was actually happening
> not as part of the asm statement itself, so maybe gcc's reload is getting
> a little confused.

Indeed, that looks like gcc confusion, because that pointless load is 
literally just re-loading the "task->lock_depth" that is all part of 
perfecly normal C code _before_ the inline assembly even is reached.

Of course, if a "+m" causes gcc confusion, that's bad in itself, and 
indicates that "=m" + "m" may actually be preferable due to some internal 
gcc buglet.

I do _not_ see the same extra load on ppc64 or indeed x86 (gcc-4.1.1 in 
both cases), so there seems to be something MIPS-specific here.

		Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 20:34                       ` Linus Torvalds
  2006-07-08 22:50                         ` Ralf Baechle
@ 2006-07-09  3:07                         ` Keith Owens
  2006-07-09  3:29                           ` Linus Torvalds
  2006-07-09  3:58                           ` Linus Torvalds
  1 sibling, 2 replies; 119+ messages in thread
From: Keith Owens @ 2006-07-09  3:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan

Linus Torvalds (on Thu, 6 Jul 2006 13:34:14 -0700 (PDT)) wrote:
>
>
>On Thu, 6 Jul 2006, Linus Torvalds wrote:
>> 
>> So I _think_ that we should change the "=m" to the much more correct "+m" 
>> at the same time (or before - it's really a bug-fix regardless) as 
>> removing the "volatile".
>
>Here's a first cut (UNTESTED!) for x86. I didn't check any other 
>architectures, I bet they have similar problems.
>
>		Linus
>
>----
>diff --git a/include/asm-i386/atomic.h b/include/asm-i386/atomic.h
>index 4f061fa..51a1662 100644
>--- a/include/asm-i386/atomic.h
>+++ b/include/asm-i386/atomic.h
>@@ -46,8 +46,8 @@ static __inline__ void atomic_add(int i,
> {
> 	__asm__ __volatile__(
> 		LOCK_PREFIX "addl %1,%0"
>-		:"=m" (v->counter)
>-		:"ir" (i), "m" (v->counter));
>+		:"+m" (v->counter)
>+		:"ir" (i));
> }

This disagrees with the gcc (4.1.0) docs.  info --index-search='Extended Asm' gcc

  The ordinary output operands must be write-only; GCC will assume
  that the values in these operands before the instruction are dead and
  need not be generated.  Extended asm supports input-output or
  read-write operands.  Use the constraint character `+' to indicate
  such an operand and list it with the output operands.  You should
  only use read-write operands when the constraints for the operand (or
  the operand in which only some of the bits are to be changed) allow a
  register.

All spinlocks must be in memory, which conflicts with the "allow a
register" above.  I hope that the gcc docs are wrong, and read/write
operands can be used on memory as well as registers.  I would hate for
a future gcc to be more strict about this check and break the spinlock
code.  It is interesting that the next paragraph has no such
restriction.

  You may, as an alternative, logically split its function into two
  separate operands, one input operand and one write-only output
  operand.  The connection between them is expressed by constraints
  which say they need to be in the same location when the instruction
  executes.  You can use the same C expression for both operands, or
  different expressions.  For example, here we write the (fictitious)
  `combine' instruction with `bar' as its read-only source operand and
  `foo' as its read-write destination:

     asm ("combine %2,%0" : "=r" (foo) : "0" (foo), "g" (bar));

  The constraint `"0"' for operand 1 says that it must occupy the same
  location as operand 0.  A number in constraint is allowed only in an
  input operand and it must refer to an output operand.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* RE: [patch] spinlocks: remove 'volatile'
  2006-07-08 19:33         ` Albert Cahalan
  2006-07-08 20:11           ` Linus Torvalds
@ 2006-07-08 23:10           ` David Schwartz
  1 sibling, 0 replies; 119+ messages in thread
From: David Schwartz @ 2006-07-08 23:10 UTC (permalink / raw)
  To: Linux-Kernel@Vger. Kernel. Org


> Damn right. This is the C standard requirement.
> Not all code has Linux-like performance requirements,
> and in any case, standards are standards.

	Umm, no it's not a C standard requirement. The C standard requires
'volatile' to work in two specific cases (signals and longjmp) and further
defines certain side-effects of 'volatile' accesses that must not be removed
under any 'as-if' optimizations. GCC fully respects these three
requirements.

	For anything else, if you cannot detect it in a compliant program, it is
not a violation of the standard. Programs compliant with the C standard
cannot access shared memory that may be modified by another process nor can
they have multiple threads concurrently executing. So any proposed violation
that can only be exposed under those circumstances is not actually a
violation.

	POSIX, for several good reasons, did not extend 'volatile' to provide any
guarantees when used in conjunction with things like 'mmap' and pthreads.
For one thing, it would have added senseless expensive overhead to programs
that used 'volatile' legitimately.

	Worse, it would have opened up the atomicity Pandora's box.

	It's hard to imagine how 'volatile' could possibly be useful in an SMP
context if it has no atomicity guarantees, and it has no atomicity
guarantees. No standard, as far as I know, has ever expanded 'volatile' to
include atomicity guarantees, so it's totally and completely useless in an
SMP context. So can this argument die now?

	DS



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 20:34                       ` Linus Torvalds
@ 2006-07-08 22:50                         ` Ralf Baechle
  2006-07-09  3:09                           ` Linus Torvalds
  2006-07-09  3:07                         ` Keith Owens
  1 sibling, 1 reply; 119+ messages in thread
From: Ralf Baechle @ 2006-07-08 22:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan

On Thu, Jul 06, 2006 at 01:34:14PM -0700, Linus Torvalds wrote:

> On Thu, 6 Jul 2006, Linus Torvalds wrote:
> > 
> > So I _think_ that we should change the "=m" to the much more correct "+m" 
> > at the same time (or before - it's really a bug-fix regardless) as 
> > removing the "volatile".
> 
> Here's a first cut (UNTESTED!) for x86. I didn't check any other 
> architectures, I bet they have similar problems.

I tried the same on MIPS, for lazyness sake at first only in atomic.h.  With
gcc 3.3 the code size is exactly the same with both "=m" and "+m", so I
didn't look into details of the generated code.  With gcc 4.1 "+m" results
in a size increase of about 1K for the ip27_defconfig kernel.  For example:

<unlock_kernel>:
       df830000        ld      v1,0(gp)
       8c620028        lw      v0,40(v1)
       04400014        bltz    v0,a80000000029944c <unlock_kernel+0x5c>
       00000000        nop
       2442ffff        subiu   v0,v0,1
       ac620028        sw      v0,40(v1)	# current->lock_depth
       8c630028        lw      v1,40(v1)	# current->lock_depth
       0461000b        bgez    v1,a80000000029943c <unlock_kernel+0x4c>

The poinless load isn't generated with "=m".  The interesting thing is
that in all the instances of bloat I looked at it was actually happening
not as part of the asm statement itself, so maybe gcc's reload is getting
a little confused.

  Ralf

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 20:49                                       ` Pavel Machek
@ 2006-07-08 21:43                                         ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-08 21:43 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-os (Dick Johnson),
	Krzysztof Halasa, Ingo Molnar, Andrew Morton, Linux kernel,
	arjan



On Sat, 8 Jul 2006, Pavel Machek wrote:
> 
> Actually, because volatile is big hammer, it can be used to work
> around compiler bugs. If compiler dies at internal error in function
> foo, just sprinkle few volatiles into it, and you can usually work
> around that compiler problem.

Heh. That's probably an even better use of "volatile" than using it for 
hiding bugs in the sources. The bugs in the sources you'd be better off 
just _fixing_, while the compiler problems you may have a much harder time 
working around..

		Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 21:59                                     ` Linus Torvalds
  2006-07-07 22:06                                       ` Linus Torvalds
@ 2006-07-08 20:49                                       ` Pavel Machek
  2006-07-08 21:43                                         ` Linus Torvalds
  1 sibling, 1 reply; 119+ messages in thread
From: Pavel Machek @ 2006-07-08 20:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-os (Dick Johnson),
	Krzysztof Halasa, Ingo Molnar, Andrew Morton, Linux kernel,
	arjan

Hi!

> > This is a bait and switch argument. The code was displayed to show
> > the compiler output, not an example of good coding practice.
> 
> NO IT IS NOT.
> 
> The whole point of my argument is simple:
> 
> > > 	"'volatile' is useless. The things it did 30 years ago are much
> > > 	 more complex these days, and need to be tied to much more
> > > 	 detailed rules that depend on the actual particular problem,
> > > 	 rather than one keyword to the compiler that doesn't actually
> > > 	 give enough information for the compiler to do anything useful"
> 
> And dammit, if you cannot admit that, then you're not worth discussing 
> with.
> 
> "volatile" is useless. It's a big hammer in a world where the nails aren't 
> nails any more, they are screws, thumb-tacks, and spotwelding.

Actually, because volatile is big hammer, it can be used to work
around compiler bugs. If compiler dies at internal error in function
foo, just sprinkle few volatiles into it, and you can usually work
around that compiler problem.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 20:40                                             ` Chase Venters
@ 2006-07-08 20:47                                               ` Chase Venters
  2006-07-09 10:57                                               ` Krzysztof Halasa
  1 sibling, 0 replies; 119+ messages in thread
From: Chase Venters @ 2006-07-08 20:47 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: linux-os \\(Dick Johnson\\),
	Linus Torvalds, Ingo Molnar, Andrew Morton, Linux kernel, arjan

On Saturday 08 July 2006 15:40, Chase Venters wrote:
> You need the barrier for both the CPU and the compiler. The CPU barrier
> comes from an instruction like '*fence' on x86 (or a locked bus op), while
> the compiler barrier comes from the memory clobber. Because the spinlocks
> already _must_ have both of these (including the other constraints in the
> inline asm), 'volatile' on the spinlock ctr is useless.

Btw, perhaps what is going on here is a misunderstanding of 
terminology? "Barrier" or "Memory barrier" can refer to both a hardware or 
compiler barrier, which is why Documentation/memory-barriers.txt speaks of 
both in the same file. Indeed, you often have both in the same spot, and the 
names are even similar:

barrier() -> compiler memory barrier
wmb() -> write memory barrier
...

Sometimes you'll see "optimization barrier", but you should remember that 
barrier() boils down to:

asm volatile ("" ::: "memory")

...because it's preventing memory caching/reordering across the unpredictable 
memory clobber.

See?

>
> Thanks,
> Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 20:09                                           ` Krzysztof Halasa
@ 2006-07-08 20:40                                             ` Chase Venters
  2006-07-08 20:47                                               ` Chase Venters
  2006-07-09 10:57                                               ` Krzysztof Halasa
  0 siblings, 2 replies; 119+ messages in thread
From: Chase Venters @ 2006-07-08 20:40 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: linux-os \\(Dick Johnson\\),
	Linus Torvalds, Ingo Molnar, Andrew Morton, Linux kernel, arjan

On Saturday 08 July 2006 15:08, Krzysztof Halasa wrote:
> Chase Venters <chase.venters@clientec.com> writes:
> > You're mincing my words. The reason "memory" is on the clobber list is
> > because
> > a lock is supposed to synchronize all memory accesses up to that point.
> > It's a fence / a barrier, because if the compiler re-orders your
> > loads/stores across the lock, you're in trouble. That's exactly what I
> > was pointing out.
>
> Sure, but a barrier alone isn't enough. You have to use assembler and
> it's beyond scope of C volatile.

Right, which is why volatile is wrong. Which is what we've all been saying. 
Constantly.

You need the barrier for both the CPU and the compiler. The CPU barrier comes 
from an instruction like '*fence' on x86 (or a locked bus op), while the 
compiler barrier comes from the memory clobber. Because the spinlocks already 
_must_ have both of these (including the other constraints in the inline 
asm), 'volatile' on the spinlock ctr is useless.

> > A volatile cast lets you prevent the compiler from always treating the
> > variable as volatile.
>
> Yes, if that's what you really want.
>
> >> If the "volatile" is used the wrong way (which is probably true for most
> >> cases), then volatile cast and barrier() will be wrong as well. You need
> >> locks or atomic access, meaningful on hardware level.
> >
> > No. Linus already described what some example invalid uses of "volatile"
> > are.
> > One example is the very one this whole thread is about - using 'volatile'
> > on the declaration of the spinlock counter. That usage is _wrong_, and
> > barrier()
> > would not be.
>
> That's a special case, because you want to invalidate all variables,
> but you still need locking. I.e., barrier() alone doesn't buy you
> anything WRT to hardware.

Special case? It's the topic of this thread.

As for barrier() being a compiler boundary versus a hardware boundary, I don't 
think anyone is disputing that. Which is why I speak of both compiler and 
hardware barriers:

> > Volatile originally existed to tell the compiler a variable could change
> > at will. Because of reordering, it's almost never sufficient with our
> > modern compilers and CPUs. That's precisely where barrier() (and/or its
> > hardware equivalents) help in places where 'volatile' is wrong.
>
> How does barrier() help here? Some example, maybe?
> What do you consider a barrier() hardware equivalent?
> Don't you think you're mixing compiler optimization and operation of
> the hardware?

Documentation/memory-barriers.txt covers barriers pretty thoroughly. barrier() 
is a Linux macro that is part of a suite of macros to implement various kinds 
of barriers in various situations. Hardware equivalent? How about mb()?

> > Your statement is
> > additionally wrong because one use-case of memory barriers is to safely
> > write
> > lock-free code.
>
> You can't safely write lock-free code in C, if you have to deal
> with hardware or SMP. C don't know about hardware.

Documentation/memory-barriers.txt lists situations like the ones I describe. 
It's not pure C -- you end up having to insert a hardware barrier too, but 
Linux makes many macros available to do so. (Of course, these macros are 
using inline asm). Notice I specifically mention hardware barriers in the 
paragraph above. When I say one use-case of memory barriers is to safely 
write lock-free code, I'm talking about the use of both hardware and compiler 
barriers (indeed, _both_ types of barriers are documented in 
Documentation/memory-barriers.txt, along with examples of lock-free C code 
supported by Linux barrier macros).

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 19:33         ` Albert Cahalan
@ 2006-07-08 20:11           ` Linus Torvalds
  2006-07-09 21:10             ` Pavel Machek
  2006-07-08 23:10           ` David Schwartz
  1 sibling, 1 reply; 119+ messages in thread
From: Linus Torvalds @ 2006-07-08 20:11 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: tglx, joe.korty, linux-kernel, linux-os, khc, mingo, akpm, arjan



On Sat, 8 Jul 2006, Albert Cahalan wrote:
> > 
> > 1. The volatile implementation of gcc is correct. The standard does not
> > talk about busses, not even about SMP.
> 
> The standard need not. An implementation must deal
> with whatever odd hardware happens to be in use.

Not really.

The fact is, "volatile" simply doesn't inform the compiler enough about 
what the effect of an access could be under various different situations.

So the compiler really has no choice. It has to balance the fact that the 
standard requires it to do _something_ different, with the fact that 
there's really not a lot of information that the user gave, apart from the 
one bit of "it's volatile".

So the compiler really has no choice.

Btw, I think that the whole standard definition of "volatile" is pretty 
weak and useless. The standard could be improved, and a way to improve the 
definition of volatile would actually be to say something like

	"volatile" implies that the access to that entity can alias with 
	any other access.

That's actually a lot simpler for a compiler writer (a C compiler already 
has to know about the notion of data aliasing), and gives a lot more 
useful (and strict) semantics to the whole concept.

So to look at the previous example of

	extern int a;
	extern int volatile b;

	void testfn(void)
	{
		a++;
		b++;
	}

_my_ definition of "volatile" is actually totally unambiguous, and not 
just simpler than the current standard, it is also stronger. It would make 
it clearly invalid to read the value of "b" until the value of "a" has 
been written, because (by my definition), "b" may actually alias the value 
of "a", so you clearly cannot read "b" until "a" has been updated.

At the same time, there's no question that

	addl $1,a
	addl $1,b

is a clearly valid instruction sequence by my simpler definition of 
volatile. The fact that "b" can alias with itself is a tautology, and is 
true of normal variables too, so any combination of ops on one variable 
(any variable always aliases _itself_) is by definition clearly always 
valid on a "volatile" variable too, and thus a compiler that can do the 
combination of "load + increment + store" on a normal variable should 
always do so on a volatile one too.

In contrast, the current C standard definition of "volatile" is not only 
cumbersome and inconvenient, it's also badly defined when it comes to 
accesses to _other_ data, making it clearly less useful.

I personally think that my simpler definition of volatile is actually a 
perfectly valid implementation of the current definition of volatile, and 
I suggested it to some gcc people as a better way to handle "volatile" 
inside gcc while still being standards-conforming (ie the "can alias 
anything" thing is not just clearer and simpler, it's strictly a subset of 
what the C standard allows, meaning that I think you can adopt my 
definition _without_ breaking any old programs or standards).

But there really is no way to "fix" volatile. You will always invariably 
need other things too (inline assembly with "lock" prefixes etc) to 
actually create true lock primitives. The suggested "can alias anything" 
semantics just clarify what it means, and thus make it less ambiguous. It 
doesn't make it fundamentally more useful in general.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 13:41                                         ` Chase Venters
@ 2006-07-08 20:09                                           ` Krzysztof Halasa
  2006-07-08 20:40                                             ` Chase Venters
  0 siblings, 1 reply; 119+ messages in thread
From: Krzysztof Halasa @ 2006-07-08 20:09 UTC (permalink / raw)
  To: Chase Venters
  Cc: linux-os \\(Dick Johnson\\),
	Linus Torvalds, Ingo Molnar, Andrew Morton, Linux kernel, arjan

Chase Venters <chase.venters@clientec.com> writes:

> You're mincing my words. The reason "memory" is on the clobber list is
> because 
> a lock is supposed to synchronize all memory accesses up to that point. It's 
> a fence / a barrier, because if the compiler re-orders your loads/stores 
> across the lock, you're in trouble. That's exactly what I was pointing out.

Sure, but a barrier alone isn't enough. You have to use assembler and
it's beyond scope of C volatile.

> A volatile cast lets you prevent the compiler from always treating the 
> variable as volatile.

Yes, if that's what you really want.

>> If the "volatile" is used the wrong way (which is probably true for most
>> cases), then volatile cast and barrier() will be wrong as well. You need
>> locks or atomic access, meaningful on hardware level.
>
> No. Linus already described what some example invalid uses of "volatile"
> are. 
> One example is the very one this whole thread is about - using 'volatile' on 
> the declaration of the spinlock counter. That usage is _wrong_, and
> barrier() 
> would not be.

That's a special case, because you want to invalidate all variables,
but you still need locking. I.e., barrier() alone doesn't buy you
anything WRT to hardware.

> Volatile originally existed to tell the compiler a variable could change at 
> will. Because of reordering, it's almost never sufficient with our modern 
> compilers and CPUs. That's precisely where barrier() (and/or its hardware 
> equivalents) help in places where 'volatile' is wrong.

How does barrier() help here? Some example, maybe?
What do you consider a barrier() hardware equivalent?
Don't you think you're mixing compiler optimization and operation of
the hardware?

> Your statement is 
> additionally wrong because one use-case of memory barriers is to safely
> write 
> lock-free code.

You can't safely write lock-free code in C, if you have to deal
with hardware or SMP. C don't know about hardware.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 18:31       ` Thomas Gleixner
@ 2006-07-08 19:33         ` Albert Cahalan
  2006-07-08 20:11           ` Linus Torvalds
  2006-07-08 23:10           ` David Schwartz
  0 siblings, 2 replies; 119+ messages in thread
From: Albert Cahalan @ 2006-07-08 19:33 UTC (permalink / raw)
  To: tglx
  Cc: joe.korty, linux-kernel, Linus Torvalds, linux-os, khc, mingo,
	akpm, arjan

On 7/8/06, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 2006-07-08 at 11:49 -0400, Albert Cahalan wrote:
> > On 7/8/06, Thomas Gleixner <tglx@linutronix.de> wrote:

> > > In low level system programming there is no fscking way for the compiler
> > > to figure out if this is in context of a peripheral bus, cross CPU
> > > memory or whatever. All those things have hardware dependend semantics
> > > and the only way to get them straight is to enforce the correct handling
> > > with handcrafted assembler code.
> >
> > This can work. The compiler CALLS the assembly code.
> > Nothing new here: see all the libgcc functions if you aren't
> > used to the idea of the compiler calling functions behind
> > your back.
> >
> > So we have assembly functions somewhat like this:
> >
> > __volatile_read(void*dst, void*src, size_t size);
> > __volatile_write(void*dst, void*src, size_t size);
> >
> > They probably have to look up the memory address to
> > determine if it belongs to a PCI device or not, etc.
> > For userspace code, they could even be system calls.
> >
> > Without that, gcc just isn't correct on normal hardware.
>
> 1. The volatile implementation of gcc is correct. The standard does not
> talk about busses, not even about SMP.

The standard need not. An implementation must deal
with whatever odd hardware happens to be in use.

> 2. Who is going to define what normal hardware is and what not ?
> A committee perhaps, that would at least guarantee that this feature is
> never implemented.

Oh please. Normal: the stuff we have problems with.
I could as well have said "gcc just isn't correct on any
PC currently sold by eMachines, HP, or Dell".

> > I'm not suggesting this is fast, of course. Probably the
> > right answer is something like this:
> >
> > -fvolatile=smp   # Add locks
> > -fvolatile=call   # Call custom functions
>
> What a great idea! I can imagine the necessary framework, where you need
> to register the address spaces and related access functions and make
> every access to such variables burn thousands of CPU cycles.

Damn right. This is the C standard requirement.
Not all code has Linux-like performance requirements,
and in any case, standards are standards.

Don't like the standard? Don't claim to support it.
Maybe you could even propose some improvements
and get them voted into the standard for next time.

> Again: The only thing volatile guarantees is that is does not optimize
> seemingly static variables. It reads back from memory.

"memory", yes, not just the CPU's queue of writes
and/or pending reads. Not just the nearest bus.

If memory is out over the PCI bus, then it sure isn't
a simple matter to make the access occur. Tough luck.

> Nothing else.
> It's a punctual disabling of optimizations. And this is not something
> restricted to gcc. There are _NO_ compilers which solve something else
> than this.

Perhaps the wording of the standard should be changed.
Until then, the compilers are clearly not following the
requirements of the standard. (unless perhaps they
document a "do not map PCI or similar" restriction)

> There is no reasonable automated solution for this problem, otherwise
> the compiler geeks would have implemented the
> --coded_with_brain_disabled option already.

For certain values of "reasonable", yes.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 13:57   ` Andi Kleen
@ 2006-07-08 19:13     ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-08 19:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: joe.korty, linux-kernel, linux-os, khc, mingo, akpm, arjan



On Sat, 8 Jul 2006, Andi Kleen wrote:
> 
> How do you define reference? While you could do locked mem++ you can't
> do locked mem *= 2 (or any other non trivial operation that doesn't
> direct map to an memory operand x86 instruction which allows lock
> prefixes)

Actually, this is one reason why I don't like "volatile".

The compiler has exactly the same problems saying "what can you do", so if 
you do

	extern int a;
	extern volatile int b;

	void testfn(void)
	{
		a++;
		b++;
	}

the compiler will generally generate _worse_ code for "b++" than for 
"a++".

There's really no _reason_ not to do

	addl $1,a
	addl $1,b

for the above on x86 (or any other architecture that does memory 
operations), but the compiler will almost certainly be totally paranoid 
about the fact that "b" was marked volatile, and it won't dare combine 
_any_ operations on it, even though they are obviously equivalent.

So try the above on gcc, and you'll see something like this:

	testfn:
	        movl    b, %eax
	        addl    $1, a
	        addl    $1, %eax
	        movl    %eax, b
	        ret

where the "volatile" on b means that the code generated was 
_fundamentally_ more expensive, for absolutely zero gain.

Btw, this also shows another weakness with "volatile", which is very 
fundamental, and very much a problem: look where the read-vs-write of "b" 
takes place.

It _reads_ the old value of "b" before it reads and updates "a", and 
_writes_ the new value of "b" after it has updated "a".

Why? It's perfectly correct code as far as the compiler is concerned: the 
"volatile" was only on the "b", and accesses to "a" were not constrained, 
so the compiler is perfectly correct in moving the access to "a" around, 
since that one doesn't have any "side effects" as far as the definition of 
the C language is concerned.

Now, people should realize what this means when it comes to things like 
locking: even in the _absense_ of the CPU re-ordering accesses, and even 
in a single-threaded program with signals, there is some real re-ordering 
by the compiler going on.

> IMHO the "barrier model" used by Linux and implemented by gcc is much
> more useful for multithreaded programming, and MMIO needs other
> mechanisms anyways.
> 
> I wish the C standard would adopt barriers.

The problem ends up being that you really can have a _lot_ of different 
kinds of barriers.

For example, if you really want to implement spinlocks in some kinf of 
"smart C" without needing inline assembly, different architectures will 
want different barriers. In many architectures the barriers are not 
stand-alone things, but are tied to the accesses themselves.

That's _not_ just the x86 kind of "locked instructions", btw, and it's not 
even just an "ugly special case". Some locking models are _defined_ as 
having the barriers not be separate from the operations, and in fact, 
those locking models tend to be the BEST ONES.

So you either need to do barriers on a high level (ie "spinlock", "mutex" 
etc) or you need to be able to describe things on a very low level indeed. 
The low level is very hard to come to grips with, but the high level has 
the problem that different classes of programs need totally different 
kinds of high-level barriers, so it's almost impossible to do them as 
somethng that the compiler can generate directly.

For example, the kernel kind of spinlocks are basically useless in user 
space, because signals (the equivalent of an "interrupt" in user space) 
are hard to block, so "spin_lock_irq()" is hard. Also, since there is some 
external entity that does scheduling, the "I'm holding a spinlock, don't 
schedule me away" kind of thing doesn't work. In the kernel, we just 
increment the preemption count. In user space, you need to do something 
different.

So the power of C is really that it makes the easy things easy, and the 
hard things are _possible_ using valid ways to break the model. Notably, 
you have type-casts to break the type model (and hey, many of the things 
you can do when you break the type model may not be portable any more), 
and you have inline assembly when you need to break the "abstract CPU 
model".

Yeah, inline assembly is a "cop-out" in the sense that it's just admitting 
that the language doesn't cover every possibility. But hey, _no_ language 
can cover every single possibility, so the fact that C _does_ have a 
cop-out is actually one of its biggest strengths. Many other languages 
don't have that capability at all, and are strictly weaker as a result.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  6:39   ` Nick Piggin
@ 2006-07-08 18:53     ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-08 18:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Albert Cahalan, linux-kernel, linux-os, khc, mingo, akpm, arjan



On Sat, 8 Jul 2006, Nick Piggin wrote:
> 
> The volatile casting in atomic_* and *_bit seems to be a good idea
> (now that I think about it) [1].
> 
> Because if you had a barrier there, you'd have to reload everything
> used after an atomic_read or set_bit, etc.

Yes and no. The problem _there_ is that we use the wrong inline asm 
constraints.

The "+m" constraint is basically undocumented, and while I think it has 
existed _internally_ in gcc for a long time (gcc inline asms are actually 
fairly closely related to the internal gcc code generation templates, but 
the internal templates generally have capabilities that the inline asms 
don't have), I don't think it has been actually supported for inline asms 
all that time.

But "+m" means "I will both read and write from this memory location", 
which is exactly what we want to have for things like atomic_add() and 
friends.

However, because it is badly documented, and because it didn't even exist 
long ago, we have lots of code (and lots of people) that doesn't even 
_know_ about "+m". So we have code that fakes out the "both read and 
write" part by marking things either volatile, or doing 

	...
	:"=m" (*ptr)
	:"m" (*ptr)
	...

in the constraints (or, in many cases, both).

> [1] Even though I still can't tell the exact semantics of these
>     operations eg. why do we need volatile at all? why do we have
>     volatile in the double underscore (non-atomic) versions?

So we tend to have "volatile" for a couple of different reasons:

 - the above kind of "we couldn't tell the inline assembly well enough 
   what the instruction actually _does_, so we just tell gcc to not mess 
   with it".

   This _generally_ should be replaced with using "+m", so that gcc just 
   knows that we both read and write to the location, and that allows gcc 
   to generate the best possible code, while still generating _correct_ 
   code because gcc knows what is going on and doesn't think the write 
   totally clobbers the old value.

 - many functions are used with random data, and if the caller has a 
   "volatile long array[]" (bad - but hey, it happens), then a function 
   that _acts_ on that array, like the bitops functions, need to take a
   an argument like "volatile long *".

   So for example, "test_bit()", which can act both on volatile arrays 
   _and_ on const arrays, will look like

	int test_bit(int nr, const volatile void * addr);

   which some people think is strange ("Both 'const' _and_ 'volatile'? 
   Isn't that a contradiction in terms?"), but the fact is, it reflects 
   the different callers, not necessarily test_bit() itself.

 - some functions actually really want the volatile access. The x86 IO 
   functions are the best actual example of this:

	static inline unsigned int readl(const volatile void __iomem *addr)
	{
		return *(volatile unsigned int __force *) addr;
	}

   which actually has a _combination_ of the above reason (the incoming 
   argument is already marked "volatile" just because the _caller_ may 
   have marked it that way) and the cast to volatile would be there 
   _regardless_ of the calling convention "volatile", because in this case 
   we actually use it as a way to avoid inline assembly (which a number of
   other architectures need to do, and which x86 too needs to do for the 
   PIO accesses, but we can avoid it in this case)

So those are all real reasons to use volatile, although the first one is 
obviously a case where we no longer should (but at least we have 
reasonably good historical reasons for why we did).

The thing to note that is all of the above reasons are basically 
"volatile" markers on the _code_. We haven't really marked any _data_ 
volatile, we're just saying that certain _code_ will want to act on 
the data in a certain way.

And I think that's generally a sign of "good" use of volatile - it's 
making it obvious that certain specific _use_ of a data may have certain 
rules. 

As mentioned, there is _one_ case where it is valid to use "volatile" on 
real data: it's when you have a "I don't care about locking, and I'm not 
accessign IO memory or something else that may need special care" 
situation.

In the kernel, that _one_ special case used to basically be the "jiffies" 
counter. There's nothing to "lock" there - it just keeps ticking. And it's 
still obviously normal memory, so there's no question about any special 
rules for accesses. And there are no SMP memory ordering issues for 
reading it (for the low bits), since the "jiffies" counter is not really 
tied to anything else, so there are no larger "coherency" rules either.

So in that ONE case, "volatile" is actually fine. We really don't care if 
we read the old value or the new value when it changes, and there's no 
reason to try to synchronize with anythign else. 

There _may_ be some other cases where that would be true, but quite 
frankly, I can't think of any. If the CPU were to have a built-in random 
number generator mapped into memory, that would fall under the same kind 
of rules, but that's basically it.

One final word: in user space, because of how signal handlers work, 
"volatile" can still make sense for exactly the same reasons that 
"jiffies" makes sense in the kernel. You may, for example, have a signal 
handler that updates some flag in memory, and that would basically look 
exactly like the "jiffies" case for your program. 

(In fact, because signals are very expensive to block, you may have more 
of a reason to use a "jiffies" like flag in user space than you have in 
kernel. In the kernel, you'd tend to use a spinlock to protect things. In 
user space, with signals, you may have to use some non-locking algorithm, 
where the generation count etc migth well look like "jiffies").

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 15:49     ` Albert Cahalan
@ 2006-07-08 18:31       ` Thomas Gleixner
  2006-07-08 19:33         ` Albert Cahalan
  0 siblings, 1 reply; 119+ messages in thread
From: Thomas Gleixner @ 2006-07-08 18:31 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: joe.korty, linux-kernel, Linus Torvalds, linux-os, khc, mingo,
	akpm, arjan

On Sat, 2006-07-08 at 11:49 -0400, Albert Cahalan wrote:
> On 7/8/06, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Sat, 2006-07-08 at 05:45 -0400, Joe Korty wrote:
> > > On Fri, Jul 07, 2006 at 11:54:10PM -0400, Albert Cahalan wrote:
> > > > That's all theoretical though. Today, gcc's volatile does
> > > > not follow the C standard on modern hardware. Bummer.
> > > > It'd be low-performance anyway though.
> > >
> > > But gcc would follow the standard if it emitted a 'lock'
> > > insn on every volatile reference.  It should at least
> > > have an option to do that.
> 
> That would do for x86 without MMIO.

Great. And it still is not following the standard. The standard tells
nothing about PCI, MMIO or whatever. volatile is _NOT_ about
serialization.

> > volatile works fine on trivial microcontrollers and for the basic C
> > course lesson, but there is no way for the compiler to decide which of
> > the 'lock' mechanisms should be used in complex situations.
> >
> > In low level system programming there is no fscking way for the compiler
> > to figure out if this is in context of a peripheral bus, cross CPU
> > memory or whatever. All those things have hardware dependend semantics
> > and the only way to get them straight is to enforce the correct handling
> > with handcrafted assembler code.
> 
> This can work. The compiler CALLS the assembly code.
> Nothing new here: see all the libgcc functions if you aren't
> used to the idea of the compiler calling functions behind
> your back.
> 
> So we have assembly functions somewhat like this:
> 
> __volatile_read(void*dst, void*src, size_t size);
> __volatile_write(void*dst, void*src, size_t size);
> 
> They probably have to look up the memory address to
> determine if it belongs to a PCI device or not, etc.
> For userspace code, they could even be system calls.
> 
> Without that, gcc just isn't correct on normal hardware.

1. The volatile implementation of gcc is correct. The standard does not
talk about busses, not even about SMP.

2. Who is going to define what normal hardware is and what not ?
A committee perhaps, that would at least guarantee that this feature is
never implemented.

> I'm not suggesting this is fast, of course. Probably the
> right answer is something like this:
> 
> -fvolatile=smp   # Add locks
> -fvolatile=call   # Call custom functions

What a great idea! I can imagine the necessary framework, where you need
to register the address spaces and related access functions and make
every access to such variables burn thousands of CPU cycles.

Again: The only thing volatile guarantees is that is does not optimize
seemingly static variables. It reads back from memory. Nothing else.
It's a punctual disabling of optimizations. And this is not something
restricted to gcc. There are _NO_ compilers which solve something else
than this.

There is no reasonable automated solution for this problem, otherwise
the compiler geeks would have implemented the
--coded_with_brain_disabled option already.

	tglx



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  6:12 trajce nedev
  2006-07-08  6:19 ` Chase Venters
@ 2006-07-08 18:23 ` Linus Torvalds
  1 sibling, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-08 18:23 UTC (permalink / raw)
  To: trajce nedev; +Cc: acahalan, linux-kernel, linux-os, khc, mingo, akpm, arjan



On Fri, 7 Jul 2006, trajce nedev wrote:
> 
> Incorrect.  I haven't been following this thread very closely [...]

Right. And maybe you should have followed it a bit more closely.

We're not talking about "asm volatile", which is a totally different use 
of the same word. 

We're not talking about pointers to volatile as arguments, which can be 
required for a generic function to not complain about it's argument types.

We're not even talking about code like

	#define writel(data, offset) \
		*(volatile int *)(offset) = (data)

which is perfectly fine on some architectures (but realize that on other 
archtiectures, you may need a _lot_ more than a single memory access to do 
an IO write, so if you don't abstract it like the above, you're broken by 
design.

In short, we're not talking about "volatile" in _code_. That's usually 
fine. We're talkign about "volatile" on data. IT'S WRONG.

Btw, your spinlock (that uses "volatile") is _totally_ and _utterly_ 
broken, exactly because it doesn't take things like memory ordering into 
account. In other words, your spinlock WON'T WORK. It won't actually 
protect the data accesses you have inside the spinlock.

Which proves my point: people who think that "volatile" is good are 
usually ignorant about the real needs of the code. To do a spinlock on 
_any_ modern CPU, you need inline assembly. End of story. You need it to 
make sure that you have told the CPU the right ordering constraints, 
something that "volatile" simply does not (and _can_not) do.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 10:24   ` Thomas Gleixner
@ 2006-07-08 15:49     ` Albert Cahalan
  2006-07-08 18:31       ` Thomas Gleixner
  0 siblings, 1 reply; 119+ messages in thread
From: Albert Cahalan @ 2006-07-08 15:49 UTC (permalink / raw)
  To: tglx
  Cc: joe.korty, linux-kernel, Linus Torvalds, linux-os, khc, mingo,
	akpm, arjan

On 7/8/06, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 2006-07-08 at 05:45 -0400, Joe Korty wrote:
> > On Fri, Jul 07, 2006 at 11:54:10PM -0400, Albert Cahalan wrote:
> > > That's all theoretical though. Today, gcc's volatile does
> > > not follow the C standard on modern hardware. Bummer.
> > > It'd be low-performance anyway though.
> >
> > But gcc would follow the standard if it emitted a 'lock'
> > insn on every volatile reference.  It should at least
> > have an option to do that.

That would do for x86 without MMIO.

> volatile works fine on trivial microcontrollers and for the basic C
> course lesson, but there is no way for the compiler to decide which of
> the 'lock' mechanisms should be used in complex situations.
>
> In low level system programming there is no fscking way for the compiler
> to figure out if this is in context of a peripheral bus, cross CPU
> memory or whatever. All those things have hardware dependend semantics
> and the only way to get them straight is to enforce the correct handling
> with handcrafted assembler code.

This can work. The compiler CALLS the assembly code.
Nothing new here: see all the libgcc functions if you aren't
used to the idea of the compiler calling functions behind
your back.

So we have assembly functions somewhat like this:

__volatile_read(void*dst, void*src, size_t size);
__volatile_write(void*dst, void*src, size_t size);

They probably have to look up the memory address to
determine if it belongs to a PCI device or not, etc.
For userspace code, they could even be system calls.

Without that, gcc just isn't correct on normal hardware.

I'm not suggesting this is fast, of course. Probably the
right answer is something like this:

-fvolatile=smp   # Add locks
-fvolatile=call   # Call custom functions

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  9:45 ` Joe Korty
                     ` (2 preceding siblings ...)
  2006-07-08 10:24   ` Thomas Gleixner
@ 2006-07-08 13:57   ` Andi Kleen
  2006-07-08 19:13     ` Linus Torvalds
  3 siblings, 1 reply; 119+ messages in thread
From: Andi Kleen @ 2006-07-08 13:57 UTC (permalink / raw)
  To: joe.korty; +Cc: linux-kernel, Linus Torvalds, linux-os, khc, mingo, akpm, arjan

Joe Korty <joe.korty@ccur.com> writes:

> On Fri, Jul 07, 2006 at 11:54:10PM -0400, Albert Cahalan wrote:
> > That's all theoretical though. Today, gcc's volatile does
> > not follow the C standard on modern hardware. Bummer.
> > It'd be low-performance anyway though.
> 
> But gcc would follow the standard if it emitted a 'lock'
> insn on every volatile reference.  It should at least
> have an option to do that.

How do you define reference? While you could do locked mem++ you can't
do locked mem *= 2 (or any other non trivial operation that doesn't
direct map to an memory operand x86 instruction which allows lock
prefixes)

IMHO the "barrier model" used by Linux and implemented by gcc is much
more useful for multithreaded programming, and MMIO needs other
mechanisms anyways.

I wish the C standard would adopt barriers.

-Andi

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  6:45   ` trajce nedev
  2006-07-08  6:58     ` Arjan van de Ven
  2006-07-08  7:02     ` Vadim Lobanov
@ 2006-07-08 13:46     ` Chase Venters
  2006-07-09  4:39       ` Benjamin Herrenschmidt
  2006-07-14  3:30       ` Steven Rostedt
  2 siblings, 2 replies; 119+ messages in thread
From: Chase Venters @ 2006-07-08 13:46 UTC (permalink / raw)
  To: trajce nedev
  Cc: torvalds, acahalan, linux-kernel, linux-os, khc, mingo, akpm, arjan

On Saturday 08 July 2006 01:44, trajce nedev wrote:
> On Sat, 8 Jul 2006, Chase Venters wrote:
> >Perhaps you should have followed this thread closely before composing your
> >assault on Linus. We're not talking about "asm volatile". We're talking
> >about
> >the "volatile" keyword as applied to variables. 'volatile' as applied to
> >inline ASM is of course necessary in many cases -- no one is disputing
> >that.
>
> Ok, let's port a spinlock macro that spins instead of context switches
> instead of using the pthread garbage on IA64 or AMD64:
>
> #if ((defined (_M_IA64) || defined (_M_AMD64)) && !defined(NT_INTEREX))
> #include <windows.h>
> #pragma intrinsic (_InterlockedExchange)
>
> typedef volatile LONG lock_t[1];
>
> #define LockInit(v)	((v)[0] = 0)
> #define LockFree(v)	((v)[0] = 0)
> #define Unlock(v)	((v)[0] = 0)
>
> __forceinline void Lock(volatile LONG *hPtr)
> {
> 	int iValue;
>
> 	for (;;) {
> 		iValue = _InterlockedExchange((LPLONG)hPtr, 1);
> 		if (iValue == 0)
> 			return;
> 		while (*hPtr);
> 	}
> }
>
> Please show me how I can write this to spinlock without using volatile.

Please show me how that lock is safe without a compiler memory barrier! What's 
to stop your compiler from moving loads and stores across your inlined lock 
code?

When you add the missing compiler memory barrier, the "volatile" classifier 
becomes unnecessary.

Actually, please just read the thread. We've been over this already. It's 
starting to get really old.

> 		Trajce Nedev
> 		tnedev@mail.ru

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 10:00                                       ` Krzysztof Halasa
@ 2006-07-08 13:41                                         ` Chase Venters
  2006-07-08 20:09                                           ` Krzysztof Halasa
  0 siblings, 1 reply; 119+ messages in thread
From: Chase Venters @ 2006-07-08 13:41 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: linux-os \\(Dick Johnson\\),
	Linus Torvalds, Ingo Molnar, Andrew Morton, Linux kernel, arjan

On Saturday 08 July 2006 04:59, Krzysztof Halasa wrote:
> Chase Venters <chase.venters@clientec.com> writes:
> > Locks are supposed to be syncronization points, which is why they
> > ALREADY HAVE "memory" on the clobber list! "memory" IS NECESSARY. The
> > fact that "=m" is changing to "+m" in Linus's patches is because "=m"
> > is in fact insufficient, because it would let the compiler believe
> > we're just going to over-write the value. "volatile" merely hides that
> > bug -- once that bug is _fixed_ (by going to "+m"), "volatile" is no
> > longer useful.
>
> This is a completely different story. "volatile", barrier() and "+m"/"=m"
> aren't sync points. If the variable access isn't atomic you must use
> locking even with volatiles, barriers etc.

You're mincing my words. The reason "memory" is on the clobber list is because 
a lock is supposed to synchronize all memory accesses up to that point. It's 
a fence / a barrier, because if the compiler re-orders your loads/stores 
across the lock, you're in trouble. That's exactly what I was pointing out. 

> > If "volatile" is in use elsewhere (other than locks), it's still
> > probably wrong. In these cases, you can use a barrier, a volatile
> > cast, or an inline asm with a specific clobber.
>
> A volatile cast is just a volatile, moved from data declaration to
> all access points. It doesn't buy you anything.
> barrier() is basically "all-volatile". All of them operate on the same,
> compiler level.

A volatile cast lets you prevent the compiler from always treating the 
variable as volatile.

> If the "volatile" is used the wrong way (which is probably true for most
> cases), then volatile cast and barrier() will be wrong as well. You need
> locks or atomic access, meaningful on hardware level.

No. Linus already described what some example invalid uses of "volatile" are. 
One example is the very one this whole thread is about - using 'volatile' on 
the declaration of the spinlock counter. That usage is _wrong_, and barrier() 
would not be. (Of course, it doesn't directly apply here, because barrier() 
already existed, by necessity, due to the "memory" clobber. And the lock's 
not done in pure C.)

Volatile originally existed to tell the compiler a variable could change at 
will. Because of reordering, it's almost never sufficient with our modern 
compilers and CPUs. That's precisely where barrier() (and/or its hardware 
equivalents) help in places where 'volatile' is wrong. Your statement is 
additionally wrong because one use-case of memory barriers is to safely write 
lock-free code.

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  3:54 Albert Cahalan
  2006-07-08  5:25 ` Linus Torvalds
  2006-07-08  9:45 ` Joe Korty
@ 2006-07-08 10:45 ` Krzysztof Halasa
  2 siblings, 0 replies; 119+ messages in thread
From: Krzysztof Halasa @ 2006-07-08 10:45 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel, Linus Torvalds, linux-os, mingo, akpm, arjan

"Albert Cahalan" <acahalan@gmail.com> writes:

> That's all theoretical though. Today, gcc's volatile does
> not follow the C standard on modern hardware. Bummer.
> It'd be low-performance anyway though.

I think gcc's volatile does in fact follow the standard. The standard
simply doesn't say anything about hardware accesses, SMP, PCI etc.
You must know what you're doing, and for some things you need a bit
of assembler.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08 10:18                                     ` Avi Kivity
@ 2006-07-08 10:28                                       ` Thomas Gleixner
  2006-07-09  4:19                                       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 119+ messages in thread
From: Thomas Gleixner @ 2006-07-08 10:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Arjan van de Ven, Linus Torvalds, Mark Lord,
	linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

On Sat, 2006-07-08 at 13:18 +0300, Avi Kivity wrote:

> I didn't suggest the compiler could or should do it, just that it would 
> be possible (for the _user_) to write portable ISO C code to access PCI 
> mmio registers, if volatile's implementation serialized access.

And how is this portable on complex multibus architectures, where the
PCI access goes through a couple of transports before hitting the PCI
hardware ?

volatile has no notion of serialization. volatile guarantees that the
compiler does not optimize and cache seemingly static values, i.e. it
disables compiler optimizations.

	tglx



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  9:45 ` Joe Korty
  2006-07-08  9:52   ` Arjan van de Ven
  2006-07-08  9:59   ` David Schwartz
@ 2006-07-08 10:24   ` Thomas Gleixner
  2006-07-08 15:49     ` Albert Cahalan
  2006-07-08 13:57   ` Andi Kleen
  3 siblings, 1 reply; 119+ messages in thread
From: Thomas Gleixner @ 2006-07-08 10:24 UTC (permalink / raw)
  To: joe.korty
  Cc: Albert Cahalan, linux-kernel, Linus Torvalds, linux-os, khc,
	mingo, akpm, arjan

On Sat, 2006-07-08 at 05:45 -0400, Joe Korty wrote:
> On Fri, Jul 07, 2006 at 11:54:10PM -0400, Albert Cahalan wrote:
> > That's all theoretical though. Today, gcc's volatile does
> > not follow the C standard on modern hardware. Bummer.
> > It'd be low-performance anyway though.
> 
> But gcc would follow the standard if it emitted a 'lock'
> insn on every volatile reference.  It should at least
> have an option to do that.

Wrong. The only thing which is guaranteed by "volatile" according to the
standard is that the compiler does not optimize and cache seemingly
static values, which is just sufficient for the usual C for dummies
example: 

	while(stop == 0);

volatile works fine on trivial microcontrollers and for the basic C
course lesson, but there is no way for the compiler to decide which of
the 'lock' mechanisms should be used in complex situations.

In low level system programming there is no fscking way for the compiler
to figure out if this is in context of a peripheral bus, cross CPU
memory or whatever. All those things have hardware dependend semantics
and the only way to get them straight is to enforce the correct handling
with handcrafted assembler code.

	tglx



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  9:51                                   ` Arjan van de Ven
@ 2006-07-08 10:18                                     ` Avi Kivity
  2006-07-08 10:28                                       ` Thomas Gleixner
  2006-07-09  4:19                                       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 119+ messages in thread
From: Avi Kivity @ 2006-07-08 10:18 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Mark Lord, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Arjan van de Ven wrote:
>
> >
> > > with PCI, and the PCI posting rules, there is no "one" serializing
> > > instruction, you need to know the specifics of the device in 
> question to
> > > cause the flush. So at least there is no universal possible
> > > implementation of volatile as you suggest ;-)
> > >
> >
> > A serializing volatile makes it possible to write portable code to
> > access pci mmio.  You'd just follow a write with a read or whatever the
> > rules say.
>
> yeah except that the compiler cannot know what to read; reading back the
> same memory location is NOT correct nor safe. It's device specific, for
> some devices it'll be safe, for others you have to read some OTHER
> location.
>

I didn't suggest the compiler could or should do it, just that it would 
be possible (for the _user_) to write portable ISO C code to access PCI 
mmio registers, if volatile's implementation serialized access.

With the current implementation of volatile in gcc, it is impossible - 
you need to resort to inline assembly for some architectures, which is 
not an ISO C feature.

And I'm not suggesting that it would be a good idea to use volatile even 
if it was corrected - it would have to take a worst-case approach and 
thus would generate very bad code.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 21:48                                     ` Chase Venters
@ 2006-07-08 10:00                                       ` Krzysztof Halasa
  2006-07-08 13:41                                         ` Chase Venters
  0 siblings, 1 reply; 119+ messages in thread
From: Krzysztof Halasa @ 2006-07-08 10:00 UTC (permalink / raw)
  To: Chase Venters
  Cc: linux-os \\(Dick Johnson\\),
	Linus Torvalds, Ingo Molnar, Andrew Morton, Linux kernel, arjan

Chase Venters <chase.venters@clientec.com> writes:

> Locks are supposed to be syncronization points, which is why they
> ALREADY HAVE "memory" on the clobber list! "memory" IS NECESSARY. The
> fact that "=m" is changing to "+m" in Linus's patches is because "=m"
> is in fact insufficient, because it would let the compiler believe
> we're just going to over-write the value. "volatile" merely hides that
> bug -- once that bug is _fixed_ (by going to "+m"), "volatile" is no
> longer useful.

This is a completely different story. "volatile", barrier() and "+m"/"=m"
aren't sync points. If the variable access isn't atomic you must use
locking even with volatiles, barriers etc.

> If "volatile" is in use elsewhere (other than locks), it's still
> probably wrong. In these cases, you can use a barrier, a volatile
> cast, or an inline asm with a specific clobber.

A volatile cast is just a volatile, moved from data declaration to
all access points. It doesn't buy you anything.
barrier() is basically "all-volatile". All of them operate on the same,
compiler level.

If the "volatile" is used the wrong way (which is probably true for most
cases), then volatile cast and barrier() will be wrong as well. You need
locks or atomic access, meaningful on hardware level.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 119+ messages in thread

* RE: [patch] spinlocks: remove 'volatile'
  2006-07-08  9:45 ` Joe Korty
  2006-07-08  9:52   ` Arjan van de Ven
@ 2006-07-08  9:59   ` David Schwartz
  2006-07-08 10:24   ` Thomas Gleixner
  2006-07-08 13:57   ` Andi Kleen
  3 siblings, 0 replies; 119+ messages in thread
From: David Schwartz @ 2006-07-08  9:59 UTC (permalink / raw)
  To: Linux-Kernel@Vger. Kernel. Org


> On Fri, Jul 07, 2006 at 11:54:10PM -0400, Albert Cahalan wrote:
> > That's all theoretical though. Today, gcc's volatile does
> > not follow the C standard on modern hardware. Bummer.
> > It'd be low-performance anyway though.

> But gcc would follow the standard if it emitted a 'lock'
> insn on every volatile reference.  It should at least
> have an option to do that.

	The premise is wrong, therefore so is the conclusion. GCC does follow the C
standard on modern hardware. The 'volatile' keyword imposes the minimum
amount of pain necessary to get the two guarantees that 'volatile' is
supposed to provide. It has no special semantics on multiple CPUs, and was
never intended to have any.

	The problem is that the C standard's definition of volatile is meaningless
on modern hardware (because there is no one clear true right place for
accesses to be visible, accesses occur in more than one place). All you can
do is make the cases where 'volatile has guaranteed semantics work.

	There would be absolutely no point in having GCC have an option to emit a
locked instruction on every volatile reference because there would still be
no way to know what the right thing to lock is. For example:

volatile int i;
i=i+1;

	Should this be a locked read, an increment in a register, followed by a
locked write? Or must this be a locked increment?

	If you cannot explain in platform-independent terms the *semantics* the
option is supposed to give, then it's probably wrong. What happens when the
next CPU requires something different to get the same effect?

	If you need atomic 32-bit loads, you know how to get them. If you need
atomic increments, you know how to get them. If you need memory optimization
barriers, you have them. Why ask for something new with vague,
poorly-defined semantics based on what 'lock' happens to do on current
processors? That just makes it easier to write non-portable code.

	Now if you had asked for portable atomic loads, increments, stores, and the
like, that would be a sensible thing. It makes sense because you define what
behavior that function is supposed to give.

	DS



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  9:45 ` Joe Korty
@ 2006-07-08  9:52   ` Arjan van de Ven
  2006-07-08  9:59   ` David Schwartz
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-08  9:52 UTC (permalink / raw)
  To: joe.korty
  Cc: Albert Cahalan, linux-kernel, Linus Torvalds, linux-os, khc, mingo, akpm

On Sat, 2006-07-08 at 05:45 -0400, Joe Korty wrote:
> On Fri, Jul 07, 2006 at 11:54:10PM -0400, Albert Cahalan wrote:
> > That's all theoretical though. Today, gcc's volatile does
> > not follow the C standard on modern hardware. Bummer.
> > It'd be low-performance anyway though.
> 
> But gcc would follow the standard if it emitted a 'lock'
> insn on every volatile reference. 

nope that's not nearly enough for pci MMIO space for example, nor for
any of the weakly ordered architectures.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  9:20                                 ` Avi Kivity
@ 2006-07-08  9:51                                   ` Arjan van de Ven
  2006-07-08 10:18                                     ` Avi Kivity
  0 siblings, 1 reply; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-08  9:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linus Torvalds, Mark Lord, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

On Sat, 2006-07-08 at 12:20 +0300, Avi Kivity wrote:
> Arjan van de Ven wrote:
> >
> > >
> > > It could be argued that gcc's implementation of volatile is wrong, and
> > > that gcc should add the appropriate serializing instructions before and
> > > after volatile accesses.
> > >
> > > Of course, that would make volatile even more suboptimal, but at least
> > > correct.
> >
> > with PCI, and the PCI posting rules, there is no "one" serializing
> > instruction, you need to know the specifics of the device in question to
> > cause the flush. So at least there is no universal possible
> > implementation of volatile as you suggest ;-)
> >
> 
> A serializing volatile makes it possible to write portable code to 
> access pci mmio.  You'd just follow a write with a read or whatever the 
> rules say.

yeah except that the compiler cannot know what to read; reading back the
same memory location is NOT correct nor safe. It's device specific, for
some devices it'll be safe, for others you have to read some OTHER
location.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  3:54 Albert Cahalan
  2006-07-08  5:25 ` Linus Torvalds
@ 2006-07-08  9:45 ` Joe Korty
  2006-07-08  9:52   ` Arjan van de Ven
                     ` (3 more replies)
  2006-07-08 10:45 ` Krzysztof Halasa
  2 siblings, 4 replies; 119+ messages in thread
From: Joe Korty @ 2006-07-08  9:45 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: linux-kernel, Linus Torvalds, linux-os, khc, mingo, akpm, arjan

On Fri, Jul 07, 2006 at 11:54:10PM -0400, Albert Cahalan wrote:
> That's all theoretical though. Today, gcc's volatile does
> not follow the C standard on modern hardware. Bummer.
> It'd be low-performance anyway though.

But gcc would follow the standard if it emitted a 'lock'
insn on every volatile reference.  It should at least
have an option to do that.

Joe

^ permalink raw reply	[flat|nested] 119+ messages in thread

* RE: [patch] spinlocks: remove 'volatile'
  2006-07-07 22:37                                         ` J.A. Magallón
@ 2006-07-08  9:33                                           ` David Schwartz
  0 siblings, 0 replies; 119+ messages in thread
From: David Schwartz @ 2006-07-08  9:33 UTC (permalink / raw)
  To: jamagallon; +Cc: linux-kernel


> > This is _totally_ incorrect. Your "lock" functions are broken, because 
> > they do not introduce syncronization points or locked bus operations. 
 
> But why should I do that ? I write C, a high level language, and I don't
> mind about buses or whatever. I just have to know that a 32 bit store is
> atomic in a 32bit arch. There is a nice high level and portable language
> feature to say 'reload this variable here'. And it lets the 
> compiler do its
> optimization work as it best can asuming it has to reload that variable.
> Instead, you prefer to lock all the bus for that.
> Kernel developers spent a full release to get rid of
> the 'big kernel lock'. Perhaps there is another release needed to get
> rid of the 'big memory barrier'.

	Wow, no. Not even close.

	If you need a full memory barrier because you want general lock/mutex semantics, then you need it and can't avoid it. If you just need atomic 32-bit operations, and you know that the platform has them, just use them directly with inline assembly.

	The proof that 'volatile' does not solve the problem is:

volatile int i;
i=i+1;

	Is this an atomic operation or not? Just because the platform has an atomic operation that will do this, does 'volatile' guarantee that I get it?

	If you are programming in a high-level language and need the atomicity guarantees that particular assembly instructions are known to give you, then you *must* specify those instructions. The 'volatile' keyword has *never* guaranteed atomicity even where such atomicity is possible.

	So both parts of your argument is wrong. The alternative to 'volatile' is not invalidating memory, we only invalidate memory when those are the specific semantics we need. And volatile is neither necessary nor sufficient to get atomicity when that is possible on the platform.

	You are actively advocating for coding practices that are known (from years of painful experience) to be disastrous.

> BTW, I really don't mind if a given architecnture has to lock the bus or
> say a prayer to Budha to reload a variable. I want it to be reloaded at
> every (or a certain, in case of a (volatile)mtx cast) usage. The compiler
> is the responsible of knowing what to do. What if nextgen P4 Xeon do not
> need a bus lock ? Will you rewrite the kernel ?

	What if the nextgen P4 Xeon needs something else in addition to what's needed to get the guaranteed semantics of 'volatile' (which are just signals and longjmp on a single CPU). Suppose it required something very expensive when accesses might occur on another CPU that wasn't needed for any of the defined uses of 'volatile'. Why would gcc provide that and slow down all the programs that use 'volatile' correctly?

	The whole problem with your use of 'volatile' that it's in the 'just happens to work' category. It just happens to work because what's needed for signals and longjmp is also what's needed for SMP. If something extre were needed for SMP that wasn't needed for the defined uses of 'volatile', you'd be screwed. That's why 'volatile' does *not* put in memory barriers by default, even though they're arguably needed if 'volatile' were supposed to prevent reordering of operations (which for locks is what we need!).

	DS



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  8:51                               ` Arjan van de Ven
@ 2006-07-08  9:20                                 ` Avi Kivity
  2006-07-08  9:51                                   ` Arjan van de Ven
  0 siblings, 1 reply; 119+ messages in thread
From: Avi Kivity @ 2006-07-08  9:20 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Mark Lord, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Arjan van de Ven wrote:
>
> >
> > It could be argued that gcc's implementation of volatile is wrong, and
> > that gcc should add the appropriate serializing instructions before and
> > after volatile accesses.
> >
> > Of course, that would make volatile even more suboptimal, but at least
> > correct.
>
> with PCI, and the PCI posting rules, there is no "one" serializing
> instruction, you need to know the specifics of the device in question to
> cause the flush. So at least there is no universal possible
> implementation of volatile as you suggest ;-)
>

A serializing volatile makes it possible to write portable code to 
access pci mmio.  You'd just follow a write with a read or whatever the 
rules say.

Of course, it would still generate horrible code, and would still be 
unable to express notions like atomic accesses, so there is not much 
point in it.

One point which was raised, is that optimization barriers also somewhat 
pessimize the code. I wonder how useful a partial memory clobber could be:

  #define spin_lock_data(lock, lock_protected_data...) \
      do { __spin_lock(lock); asm volatile ( "" : : : 
"memory"(lock_protected_data) ); } while(0)

Where __spin_lock has a partial memory clobber only on the lock variable 
itself.

It would take a lot of work, but it can eliminate the instructions to 
save and reload registers.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  8:40                             ` Avi Kivity
@ 2006-07-08  8:51                               ` Arjan van de Ven
  2006-07-08  9:20                                 ` Avi Kivity
  0 siblings, 1 reply; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-08  8:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linus Torvalds, Mark Lord, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

On Sat, 2006-07-08 at 11:40 +0300, Avi Kivity wrote:
> Linus Torvalds wrote:
> > On Thu, 6 Jul 2006, Mark Lord wrote:
> >
> > >
> > > I'm still browsing a copy here, but so far have only really found this:
> > >
> > >  A volatile declaration may be used to describe an object corresponding
> > >  to a memory-mapped input/output port or an object accessed by an
> > >  aysnchronously interrupting function.  Actions on objects so declared
> > >  shall not be "optimized out" by an implementation or reordered except
> > >  as permitted by the rules for evaluating expressions.
> >
> > Note that the "reordered" is totally pointless.
> >
> > The _hardware_ will re-order accesses. Which is the whole point.
> > "volatile" is basically never sufficient in itself.
> >
> > So the definition of "volatile" literally made sense three or four 
> > decades
> > ago. It's not sensible any more.            
> >
> 
> It could be argued that gcc's implementation of volatile is wrong, and 
> that gcc should add the appropriate serializing instructions before and 
> after volatile accesses.
> 
> Of course, that would make volatile even more suboptimal, but at least 
> correct.

with PCI, and the PCI posting rules, there is no "one" serializing
instruction, you need to know the specifics of the device in question to
cause the flush. So at least there is no universal possible
implementation of volatile as you suggest ;-)



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:15                           ` Linus Torvalds
  2006-07-06 19:33                             ` Chris Friesen
@ 2006-07-08  8:40                             ` Avi Kivity
  2006-07-08  8:51                               ` Arjan van de Ven
  1 sibling, 1 reply; 119+ messages in thread
From: Avi Kivity @ 2006-07-08  8:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Lord, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Linus Torvalds wrote:
> On Thu, 6 Jul 2006, Mark Lord wrote:
>
> >
> > I'm still browsing a copy here, but so far have only really found this:
> >
> >  A volatile declaration may be used to describe an object corresponding
> >  to a memory-mapped input/output port or an object accessed by an
> >  aysnchronously interrupting function.  Actions on objects so declared
> >  shall not be "optimized out" by an implementation or reordered except
> >  as permitted by the rules for evaluating expressions.
>
> Note that the "reordered" is totally pointless.
>
> The _hardware_ will re-order accesses. Which is the whole point.
> "volatile" is basically never sufficient in itself.
>
> So the definition of "volatile" literally made sense three or four 
> decades
> ago. It's not sensible any more.            
>

It could be argued that gcc's implementation of volatile is wrong, and 
that gcc should add the appropriate serializing instructions before and 
after volatile accesses.

Of course, that would make volatile even more suboptimal, but at least 
correct.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 19:51                               ` linux-os (Dick Johnson)
                                                   ` (2 preceding siblings ...)
  2006-07-07 23:06                                 ` Björn Steinbrink
@ 2006-07-08  8:36                                 ` Avi Kivity
  3 siblings, 0 replies; 119+ messages in thread
From: Avi Kivity @ 2006-07-08  8:36 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Krzysztof Halasa, Linus Torvalds, Ingo Molnar, Andrew Morton,
	linux-kernel, arjan

linux-os (Dick Johnson) wrote:
>
> >> The 'C' compiler has no choice but to actually make that memory access
> >> and read the variable because the variable is in another module 
> (a.k.a.
> >> file).
>

[code showing the compiler may cache the access]

> So read the code; you have "10:   jne 10", jumping to itself
> forever, without even doing anything else to set the flags, much
> less reading a variable.
>

Short attention span, eh? He's proven you wrong and you go on and talk 
about something else.

> >
> >> However, if I have the same code, but the variable is visible during
> >> compile time, i.e.,
> >>
> >> int spinner=0;
> >>
> >> funct(){
> >>      while(spinner)
> >>          ;
> >>
> >> ... the compiler may eliminate that code altogether because it
> >> 'knows' that spinner is FALSE, having initialized it to zero
> >> itself.
> >
>

[code showing that defining the variable in the same translation unit 
makes no difference]

> Then, you have exactly the same thing here:
>    10:   75 fe                   jne    10 <funct+0x10>
>
> Same bad code.
>

You seem to have forgotten that you claimed different code would be 
generated.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 22:09                                     ` Ingo Molnar
@ 2006-07-08  7:36                                       ` Ingo Molnar
  0 siblings, 0 replies; 119+ messages in thread
From: Ingo Molnar @ 2006-07-08  7:36 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Linus Torvalds, Krzysztof Halasa, Andrew Morton, Linux kernel, arjan


* Ingo Molnar <mingo@elte.hu> wrote:

> > [...] In fact, your spin-lock code already inserts "rep nops" and I 
> > never implied that they should be removed. I said only that "volatile" 
> > still needs to be used, not some macro that tells the compiler that 
> > everything in memory probably got trashed. [...]
> 
> your position here does seem to make much sense to me, so please help me 
                         ^--- not
> understand it. You suggest that the assembly code should be left alone. 
> But then why do you need the volatile keyword to begin with?

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  6:45   ` trajce nedev
  2006-07-08  6:58     ` Arjan van de Ven
@ 2006-07-08  7:02     ` Vadim Lobanov
  2006-07-08 13:46     ` Chase Venters
  2 siblings, 0 replies; 119+ messages in thread
From: Vadim Lobanov @ 2006-07-08  7:02 UTC (permalink / raw)
  To: trajce nedev
  Cc: chase.venters, torvalds, acahalan, linux-kernel, linux-os, khc,
	mingo, akpm, arjan

On Fri, 7 Jul 2006, trajce nedev wrote:

> On Sat, 8 Jul 2006, Chase Venters wrote:
> >
> >Perhaps you should have followed this thread closely before composing your
> >assault on Linus. We're not talking about "asm volatile". We're talking
> >about
> >the "volatile" keyword as applied to variables. 'volatile' as applied to
> >inline ASM is of course necessary in many cases -- no one is disputing
> >that.
> >
>
> Ok, let's port a spinlock macro that spins instead of context switches
> instead of using the pthread garbage on IA64 or AMD64:
>
> #if ((defined (_M_IA64) || defined (_M_AMD64)) && !defined(NT_INTEREX))
> #include <windows.h>
> #pragma intrinsic (_InterlockedExchange)
>
> typedef volatile LONG lock_t[1];
>
> #define LockInit(v)	((v)[0] = 0)
> #define LockFree(v)	((v)[0] = 0)
> #define Unlock(v)	((v)[0] = 0)
>
> __forceinline void Lock(volatile LONG *hPtr)
> {
> 	int iValue;
>
> 	for (;;) {
> 		iValue = _InterlockedExchange((LPLONG)hPtr, 1);
> 		if (iValue == 0)
> 			return;
> 		while (*hPtr);
> 	}
> }
>
> Please show me how I can write this to spinlock without using volatile.

See how Linux implements them. After Linus's patch, there will be no
volatile data qualifiers. In general, if you're concerned about the
compiler turning
	while (*hPtr);
into an infinite loop, then you should look at compiler barriers
(amongst other things). To find details, search for "barrier()".

> 		Trajce Nedev
> 		tnedev@mail.ru

-- Vadim Lobanov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  6:45   ` trajce nedev
@ 2006-07-08  6:58     ` Arjan van de Ven
  2006-07-08  7:02     ` Vadim Lobanov
  2006-07-08 13:46     ` Chase Venters
  2 siblings, 0 replies; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-08  6:58 UTC (permalink / raw)
  To: trajce nedev
  Cc: chase.venters, torvalds, acahalan, linux-kernel, linux-os, khc,
	mingo, akpm


> 
> __forceinline void Lock(volatile LONG *hPtr)
> {
> 	int iValue;
> 
> 	for (;;) {
> 		iValue = _InterlockedExchange((LPLONG)hPtr, 1);
> 		if (iValue == 0)
> 			return;
> 		while (*hPtr);
> 	}
> }
> 
> Please show me how I can write this to spinlock without using volatile.

this code is broken, at the very minimum that while (*hPtr); needs to be
"while (*hPtr) cpu_relax();" for hardware reasons.
At which point you can drop the volatile entirely without any change.




^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  6:19 ` Chase Venters
@ 2006-07-08  6:45   ` trajce nedev
  2006-07-08  6:58     ` Arjan van de Ven
                       ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: trajce nedev @ 2006-07-08  6:45 UTC (permalink / raw)
  To: chase.venters
  Cc: torvalds, acahalan, linux-kernel, linux-os, khc, mingo, akpm, arjan

On Sat, 8 Jul 2006, Chase Venters wrote:
>
>Perhaps you should have followed this thread closely before composing your
>assault on Linus. We're not talking about "asm volatile". We're talking 
>about
>the "volatile" keyword as applied to variables. 'volatile' as applied to
>inline ASM is of course necessary in many cases -- no one is disputing 
>that.
>

Ok, let's port a spinlock macro that spins instead of context switches 
instead of using the pthread garbage on IA64 or AMD64:

#if ((defined (_M_IA64) || defined (_M_AMD64)) && !defined(NT_INTEREX))
#include <windows.h>
#pragma intrinsic (_InterlockedExchange)

typedef volatile LONG lock_t[1];

#define LockInit(v)	((v)[0] = 0)
#define LockFree(v)	((v)[0] = 0)
#define Unlock(v)	((v)[0] = 0)

__forceinline void Lock(volatile LONG *hPtr)
{
	int iValue;

	for (;;) {
		iValue = _InterlockedExchange((LPLONG)hPtr, 1);
		if (iValue == 0)
			return;
		while (*hPtr);
	}
}

Please show me how I can write this to spinlock without using volatile.

		Trajce Nedev
		tnedev@mail.ru

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today - it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  5:25 ` Linus Torvalds
@ 2006-07-08  6:39   ` Nick Piggin
  2006-07-08 18:53     ` Linus Torvalds
  0 siblings, 1 reply; 119+ messages in thread
From: Nick Piggin @ 2006-07-08  6:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Albert Cahalan, linux-kernel, linux-os, khc, mingo, akpm, arjan

Linus Torvalds wrote:

> It's not that "volatile" is the "portable way". It's that "volatile" is 
> fundamentally not sufficient for the job.

However it does seem to be good for some things, as you say.

The volatile casting in atomic_* and *_bit seems to be a good idea
(now that I think about it) [1].

Because if you had a barrier there, you'd have to reload everything
used after an atomic_read or set_bit, etc.

But it might be nice to wrap that in something rather than use
volatile directly. force_reload(wordptr); maybe?

[1] Even though I still can't tell the exact semantics of these
     operations eg. why do we need volatile at all? why do we have
     volatile in the double underscore (non-atomic) versions?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  6:12 trajce nedev
@ 2006-07-08  6:19 ` Chase Venters
  2006-07-08  6:45   ` trajce nedev
  2006-07-08 18:23 ` Linus Torvalds
  1 sibling, 1 reply; 119+ messages in thread
From: Chase Venters @ 2006-07-08  6:19 UTC (permalink / raw)
  To: trajce nedev
  Cc: torvalds, acahalan, linux-kernel, linux-os, khc, mingo, akpm, arjan

On Saturday 08 July 2006 01:12, trajce nedev wrote:
> On Fri, 7 Jul 2006, Linus Torvalds wrote:
> >No.
> >
> >"volatile" simply CANNOT get the job done. It fundamentally does _nothing_
> >for all the issues that are fundamental today: CPU memory ordering in SMP,
> >special IO synchronization requirements for memory-mapped IO registers etc
> >etc.
> >
> >It's not that "volatile" is the "portable way". It's that "volatile" is
> >fundamentally not sufficient for the job.
>
> Incorrect.  I haven't been following this thread very closely but your
> assault on volatile is inappropriate.  There are several present day uses
> in assember instructions.

Perhaps you should have followed this thread closely before composing your 
assault on Linus. We're not talking about "asm volatile". We're talking about 
the "volatile" keyword as applied to variables. 'volatile' as applied to 
inline ASM is of course necessary in many cases -- no one is disputing that.

> For example, if you're going to clobber hard registers specifically,
> volatile is required because you cannot write a clobber that overlaps input
> or output operands:
> 	asm volatile ("movc3 %0,%1,%2"
>
> 		: /* none */
> 		: "g" (from), "g" (to), "g" (count)
> 		: "r0", "r1", "r2", "r3", "r4", "r5");
>
> You cannot have an operand that describes a register class with a single
> member if that register is in the clobber list; there is simply no way to
> specify that an input operand is modified without explicitly specifying it
> as output.  If all the output operands are for that purpose, they are
> considered unused.  volatile is _necessary_ for the asm to prevent the
> compiler from deleting it for this reason.
>
> Furthermore, if the asm modifies memory unpredictably, you must add
> `memory` to the list of clobbers.  The compiler will not keep the memory
> value cached in registers any longer across asm instructions; volatile is
> added if the memory you're touching is not listed in the inputs or outputs
> since the `memory` clobber does not count as a side-effect of issuing the
> asm. volatile indicates the asm has side effects that are _important_ and
> will not delete it (if it's reachable, otherwise it is fair game).
>
> If your asm has output operands, gcc currently assumes that the instruction
> has no side effects except changing these output operands (for
> optimization).  The compiler is free to eliminate or move out of loops any
> instructions with side effects if the output operands aren't used.  Even
> more dangerously, if the asm has a side effect on a variable that doesn't
> otherwise appear to change, the old value _may_ be reused later if it's
> found in a register.  volatile is used to prevent the asm from being
> deleted in this case (or more dangerously, combined):
> 	#define get_and_set_priority(new)		\
> 	({ 	int __old;				\
> 		asm volatile ("get_and_set_priority %0, %1"	\
>
> 			: "=g" (__old) : "g" (new));	\
>
> 		__old; })
> The alternative is to write an asm with no outputs so that the compiler
> knows the instruction has side effects and won't eliminate or move it.
>
> Also, gcc will not reschedule instructions across volatile asm's which is
> often necessary:
> 	*(volatile int *)addr = foo;
> 	asm volatile ("eieio" : : );
> If addr contains the address of a memory-mapped device register, then the
> PowerPC eieio instruction informs the processor that it is necessary to
> store to that device register before issuing any other I/O.
>
> 		Trajce Nedev
> 		tnedev@mail.ru
>

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
@ 2006-07-08  6:12 trajce nedev
  2006-07-08  6:19 ` Chase Venters
  2006-07-08 18:23 ` Linus Torvalds
  0 siblings, 2 replies; 119+ messages in thread
From: trajce nedev @ 2006-07-08  6:12 UTC (permalink / raw)
  To: torvalds, acahalan; +Cc: linux-kernel, linux-os, khc, mingo, akpm, arjan

On Fri, 7 Jul 2006, Linus Torvalds wrote:
>
>No.
>
>"volatile" simply CANNOT get the job done. It fundamentally does _nothing_ 
>for all the issues that are fundamental today: CPU memory ordering in SMP, 
>special IO synchronization requirements for memory-mapped IO registers etc 
>etc.
>
>It's not that "volatile" is the "portable way". It's that "volatile" is 
>fundamentally not sufficient for the job.
>

Incorrect.  I haven't been following this thread very closely but your 
assault on volatile is inappropriate.  There are several present day uses in 
assember instructions.

For example, if you're going to clobber hard registers specifically, 
volatile is required because you cannot write a clobber that overlaps input 
or output operands:
	asm volatile ("movc3 %0,%1,%2"
		: /* none */
		: "g" (from), "g" (to), "g" (count)
		: "r0", "r1", "r2", "r3", "r4", "r5");
You cannot have an operand that describes a register class with a single 
member if that register is in the clobber list; there is simply no way to 
specify that an input operand is modified without explicitly specifying it 
as output.  If all the output operands are for that purpose, they are 
considered unused.  volatile is _necessary_ for the asm to prevent the 
compiler from deleting it for this reason.

Furthermore, if the asm modifies memory unpredictably, you must add `memory` 
to the list of clobbers.  The compiler will not keep the memory value cached 
in registers any longer across asm instructions; volatile is added if the 
memory you're touching is not listed in the inputs or outputs since the 
`memory` clobber does not count as a side-effect of issuing the asm.  
volatile indicates the asm has side effects that are _important_ and will 
not delete it (if it's reachable, otherwise it is fair game).

If your asm has output operands, gcc currently assumes that the instruction 
has no side effects except changing these output operands (for 
optimization).  The compiler is free to eliminate or move out of loops any 
instructions with side effects if the output operands aren't used.  Even 
more dangerously, if the asm has a side effect on a variable that doesn't 
otherwise appear to change, the old value _may_ be reused later if it's 
found in a register.  volatile is used to prevent the asm from being deleted 
in this case (or more dangerously, combined):
	#define get_and_set_priority(new)		\
	({ 	int __old;				\
		asm volatile ("get_and_set_priority %0, %1"	\
			: "=g" (__old) : "g" (new));	\
		__old; })
The alternative is to write an asm with no outputs so that the compiler 
knows the instruction has side effects and won't eliminate or move it.

Also, gcc will not reschedule instructions across volatile asm's which is 
often necessary:
	*(volatile int *)addr = foo;
	asm volatile ("eieio" : : );
If addr contains the address of a memory-mapped device register, then the 
PowerPC eieio instruction informs the processor that it is necessary to 
store to that device register before issuing any other I/O.

		Trajce Nedev
		tnedev@mail.ru

_________________________________________________________________
Is your PC infected? Get a FREE online computer virus scan from McAfee® 
Security. http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-08  3:54 Albert Cahalan
@ 2006-07-08  5:25 ` Linus Torvalds
  2006-07-08  6:39   ` Nick Piggin
  2006-07-08  9:45 ` Joe Korty
  2006-07-08 10:45 ` Krzysztof Halasa
  2 siblings, 1 reply; 119+ messages in thread
From: Linus Torvalds @ 2006-07-08  5:25 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel, linux-os, khc, mingo, akpm, arjan



On Fri, 7 Jul 2006, Albert Cahalan wrote:
> 
> Key thing being "language environment", meaning gcc.

No.

The key thing is "language environment", meaning THE HARDWARE.

> By the language spec, volatile is a low-performance way to
> get the job done.

No.

"volatile" simply CANNOT get the job done. It fundamentally does _nothing_ 
for all the issues that are fundamental today: CPU memory ordering in SMP, 
special IO synchronization requirements for memory-mapped IO registers etc 
etc.

It's not that "volatile" is the "portable way". It's that "volatile" is 
fundamentally not sufficient for the job.

		Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
@ 2006-07-08  3:54 Albert Cahalan
  2006-07-08  5:25 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: Albert Cahalan @ 2006-07-08  3:54 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds, linux-os, khc, mingo, akpm, arjan

Linus Torvalds writes:

> AND I POINTED OUT THAT EVEN IN YOUR TRIVIAL EXAMPLE, VOLATILE WAS
> ACTUALLY THE WRONG THING TO DO.
>
> And that's _exactly_ because the language environment (in this case
> the CPU itself) has evolved past the point it was 30 years ago.

Key thing being "language environment", meaning gcc.

By the language spec, volatile is a low-performance way to
get the job done. Following the language spec is damn hard
these days, considering some of the evil things PCI does in
the name of performance. The compiler probably should offer
an option to call a function for read/write to volatile mem.
Such a function could take a lock, determine if a PCI read
is required to enforce ordering, perform the operation, and
then release the lock.

That's all theoretical though. Today, gcc's volatile does
not follow the C standard on modern hardware. Bummer.
It'd be low-performance anyway though.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 22:05                                     ` J.A. Magallón
  2006-07-07 22:22                                       ` Chase Venters
@ 2006-07-07 23:36                                       ` Davide Libenzi
  1 sibling, 0 replies; 119+ messages in thread
From: Davide Libenzi @ 2006-07-07 23:36 UTC (permalink / raw)
  To: J.A. Magallón
  Cc: linux-os (Dick Johnson), Linus Torvalds, Linux Kernel Mailing List

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; CHARSET=X-UNKNOWN; format=flowed, Size: 1869 bytes --]

On Sat, 8 Jul 2006, J.A. Magallón wrote:

> On Fri, 7 Jul 2006 17:22:31 -0400, "linux-os \(Dick Johnson\)" <linux-os@analogic.com> wrote:
>
>>
>> On Fri, 7 Jul 2006, Linus Torvalds wrote:
>>
>>>
>>> On Fri, 7 Jul 2006, linux-os (Dick Johnson) wrote:
>>>>
>>>> Now Linus declares that instead of declaring an object volatile
>>>> so that it is actually accessed every time it is referenced, he wants
>>>> to use a GNU-ism with assembly that tells the compiler to re-read
>>>> __every__ variable existing im memory, instead of just one. Go figure!
>>>
>>> Actually, it's not just me.
>>>
>>> Read things like the Intel CPU documentation.
>>>
>>> IT IS ACTIVELY WRONG to busy-loop on a variable. It will make the CPU
>>> potentially over-heat, causing degreaded performance, and you're simply
>>> not supposed to do it.
>>
>> This is a bait and switch argument. The code was displayed to show
>> the compiler output, not an example of good coding practice.
>>
>
> volatile means what it means, is usefull and is right. If it is used
> in kernel for other things apart from what it was designed for it is
> kernel or programmer responsibility. It does not mention nothing about
> locking.

(looking at your code ...)
I think you guys mixed the concepts about *if* a memory access happens 
(volatile), and *where* the memory access happens (barrier).
As far as kernel coding goes (or MT userspace), if you happen to care *if*
a memory access happens, you probably want to care even *where* the memory 
access happens. And modern CPUs and compilers do not respect the WYSIWYG 
property ;)
This is not always true (*if* -> *where*), but it's very frequent.
And using "volatile" can make your code work in some cases, and misbehave 
in others.
Can we now all move on to a more refreshing "C++ kernel rewrite" thread :)



- Davide

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 22:49                                         ` J.A. Magallón
  2006-07-07 22:59                                           ` Vadim Lobanov
@ 2006-07-07 23:18                                           ` Chase Venters
  1 sibling, 0 replies; 119+ messages in thread
From: Chase Venters @ 2006-07-07 23:18 UTC (permalink / raw)
  To: J.A. Magallón
  Cc: linux-os \(Dick Johnson\), Linus Torvalds, linux-kernel

On Friday 07 July 2006 17:49, J.A. Magallón wrote:
> On Fri, 7 Jul 2006 17:22:22 -0500 (CDT), Chase Venters 
<chase.venters@clientec.com> wrote:
> > > .L7:
> > >    movl    $1, mtx    <=========
> > >    movl    spinvar, %eax
> > >    movl    $0, mtx    <=========
> > >    testl   %eax, %eax
> > >    jne .L7
> > >    popl    %ebp
> > >    ret
> >
> > NO! It's not better. You're still not syncing or locking the bus! If you
> > refer to the fact that the "movl $1" has magically appeared, that's
> > because you've just PAPERED OVER THE PROBLEM WITH "volatile", which is
> > _exactly_ what Linus is telling you NOT TO DO.
>
> BTW, I really don't mind if a given architecnture has to lock the bus or
> say a prayer to Budha to reload a variable. I want it to be reloaded at
> every (or a certain, in case of a (volatile)mtx cast) usage. The compiler
> is the responsible of knowing what to do. What if nextgen P4 Xeon do not
> need a bus lock ? Will you rewrite the kernel ?

You are missing the point. Your code is obviously trying to implement locks, 
which is why you've called your functions lock() and unlock(). It is 
impossible, on any architecture, to implement correct locks in pure C. Your 
locks are incapable of working in the kernel _or_ in user space.

Barriers are a different but related point. If you want to know more about 
barriers, and why they exist, please read Documentation/memory-barriers.txt 
in full. (FYI, Locked bus operations on x86 imply memory barriers.)

Locks are supposed to prevent sections of your code from being interrupted by 
something that touches the same data. Because both the compiler and the 
processor reserve the right to re-order your loads and stores, any lock that 
does not form a memory barrier both for the CPU and for the compiler is 
broken. Because you cannot form a CPU memory barrier from C (much less the 
ATOMIC COMPARE AND SWAP you need to implement a lock), any lock implemented 
in pure C is broken by definition. 

Your code is wrong. Barring some fundamental change to the way things work, 
locks and/or barriers are always going to be necessary.

Your usage of 'volatile' of course changes the generated code output. It will 
force the compiler to reload the variable. The problem with your lock code is 
that reloading the variable _IS NOT ENOUGH_. Hence your usage of volatile is 
a broken way to paper over the problem.

'volatile' in general, for the kernel, is wrong because it isn't specific 
enough and it causes the compiler to generate bad code.

But hey, maybe some day we'll all be proven wrong. Some magical new computer 
architecture will appear, and on that day, we will rewrite the kernel.

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 19:51                               ` linux-os (Dick Johnson)
  2006-07-07 20:25                                 ` Linus Torvalds
  2006-07-07 20:39                                 ` Krzysztof Halasa
@ 2006-07-07 23:06                                 ` Björn Steinbrink
  2006-07-08  8:36                                 ` Avi Kivity
  3 siblings, 0 replies; 119+ messages in thread
From: Björn Steinbrink @ 2006-07-07 23:06 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Krzysztof Halasa, Linus Torvalds, Ingo Molnar, Andrew Morton,
	linux-kernel, arjan

On 2006.07.07 15:51:11 -0400, linux-os (Dick Johnson) wrote:
> 
> On Fri, 7 Jul 2006, Krzysztof Halasa wrote:
> 
> > "linux-os \(Dick Johnson\)" <linux-os@analogic.com> writes:
> >
> >> extern int spinner;
> >>
> >> funct(){
> >>      while(spinner)
> >>          ;
> >>
> >> The 'C' compiler has no choice but to actually make that memory access
> >> and read the variable because the variable is in another module (a.k.a.
> >> file).
> >
> > defiant:/tmp/khc$ gcc --version
> > gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
> > defiant:/tmp/khc$ cat test.c
> > extern int spinner;
> >
> > void funct(void)
> > {
> >     while(spinner)
> >         ;
> > }
> > defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
> > defiant:/tmp/khc$ objdump -d test.o
> >
> > test.o:     file format elf32-i386
> >
> > Disassembly of section .text:
> >
> > 00000000 <funct>:
> >   0:   a1 00 00 00 00          mov    0x0,%eax
> >   5:   55                      push   %ebp
> >   6:   89 e5                   mov    %esp,%ebp
> >   8:   85 c0                   test   %eax,%eax
> >   a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
> >  10:   75 fe                   jne    10 <funct+0x10>
> >  12:   5d                      pop    %ebp
> >  13:   c3                      ret
> >
> > "0x0" is, of course, for relocation.
> 
> 
> So read the code; you have "10:   jne 10", jumping to itself
> forever, without even doing anything else to set the flags, much
> less reading a variable.

Well, you said that the compiler is forced to make the memory access,
given the topic of this discussion probably noone assumed that you meant
exactly one memory access. Of course that code is broken, but you seemed
to say that it is not.

> >> However, if I have the same code, but the variable is visible during
> >> compile time, i.e.,
> >>
> >> int spinner=0;
> >>
> >> funct(){
> >>      while(spinner)
> >>          ;
> >>
> >> ... the compiler may eliminate that code altogether because it
> >> 'knows' that spinner is FALSE, having initialized it to zero
> >> itself.
> >
> > defiant:/tmp/khc$ cat test.c
> > int spinner = 0;
> >
> > void funct(void)
> > {
> >     while(spinner)
> >         ;
> > }
> > defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
> > defiant:/tmp/khc$ objdump -d test.o
> >
> > 00000000 <funct>:
> >   0:   a1 00 00 00 00          mov    0x0,%eax
> >   5:   55                      push   %ebp
> >   6:   89 e5                   mov    %esp,%ebp
> >   8:   85 c0                   test   %eax,%eax
> >   a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
> >  10:   75 fe                   jne    10 <funct+0x10>
> >  12:   5d                      pop    %ebp
> >  13:   c3                      ret
> >
> 
> Then, you have exactly the same thing here:
>    10:   75 fe                   jne    10 <funct+0x10>
> 
> Same bad code.

Which proves you wrong, you said the compiler would optimize it away o.O

> >> Since spinner is global in scope, somebody surely could have
> >> changed it before funct() was even called, but the current gcc
> >> 'C' compiler doesn't care and may optimize it away entirely.
> >
> > Personally I don't think such C compiler even existed. HISOFT C
> > on ZX Spectrum could be a good candidate but I think it didn't
> > have any optimizer :-)
> >
> >> To
> >> prevent this, you must declare the variable volatile. To do
> >> otherwise is a bug.
> >
> > Nope. Volatile just means that every read (and write) must actually
> > access the variable. Note that the compiler optimized out accesses
> > to the variable in the loop - while it has to check at the beginning
> > of funct(), it knows that the variable is constant through funct().
> >
> > Note that "volatile" is not exactly what we usually want, but it
> > does the job (i.e., the program doesn't crash, but the code is
> > probably suboptimal).
> >
> >> That said, I think that the current
> >> implementation of 'volatile' is broken because the compiler
> >> seems to believe that the variable has moved! It recalculates
> >> the address of the variable as well as accessing its value.
> >> This is what makes the code generation problematical.
> >
> > You must be using a heavily broken compiler:
> >
> > defiant:/tmp/khc$ cat test.c
> > volatile int spinner = 0;
> >
> > void funct(void)
> > {
> >     while(spinner)
> >         ;
> > }
> > defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
> > defiant:/tmp/khc$ objdump -d test.o
> >
> > 00000000 <funct>:
> >   0:   55                      push   %ebp
> >   1:   89 e5                   mov    %esp,%ebp
> >   3:   a1 00 00 00 00          mov    0x0,%eax
> >   8:   85 c0                   test   %eax,%eax
> >   a:   75 f7                   jne    3 <funct+0x3>
> >   c:   5d                      pop    %ebp
> >   d:   c3                      ret
> 
> This is the only code that works. Guess why it worked? Because
> you declared the variable volatile.

The inline assembly works as well, it is made volatile and gcc will not
mess with it.

> Now Linus declares that instead of declaring an object volatile
> so that it is actually accessed every time it is referenced, he wants
> to use a GNU-ism with assembly that tells the compiler to re-read
> __every__ variable existing im memory, instead of just one. Go figure!

That wasn't Linus. Arjan suggested using barrier() in the right places.
What Linus did was just pointing out that volatile is the wrong thing to
use and that some inline assembly code had "=m" outputs instead of "+m".

The memory clobber was already there for the locking primitives and
AFAICT it is required.

lock(foo_lock);
foo = 5;
unlock(foo_lock);

// do something else

lock(foo_lock);
while (foo = 5);
unlock(foo_lock);

foo might have changed while the cpu was doing something else, but
without the memory clobber, the compiler might assume that foo is
unchanged. Making foo_lock volatile won't help here and making
everything that requires a lock volatile will get you nothing but crappy
code  (but hey, you can avoid the memory clobber!).

> Reference:
> /usr/src/linux-2.6.16.4/include/linux/compiler-gcc.h:
> #define barrier() __asm__ __volatile__("": : :"memory")

Björn

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 22:49                                         ` J.A. Magallón
@ 2006-07-07 22:59                                           ` Vadim Lobanov
  2006-07-07 23:18                                           ` Chase Venters
  1 sibling, 0 replies; 119+ messages in thread
From: Vadim Lobanov @ 2006-07-07 22:59 UTC (permalink / raw)
  To: J.A. Magallón
  Cc: Chase Venters, linux-os \\(Dick Johnson\\), Linus Torvalds, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN, Size: 1535 bytes --]

On Sat, 8 Jul 2006, J.A. [UTF-8] Magallón wrote:

> On Fri, 7 Jul 2006 17:22:22 -0500 (CDT), Chase Venters <chase.venters@clientec.com> wrote:
>
> > > .L7:
> > >    movl    $1, mtx    <=========
> > >    movl    spinvar, %eax
> > >    movl    $0, mtx    <=========
> > >    testl   %eax, %eax
> > >    jne .L7
> > >    popl    %ebp
> > >    ret
> >
> > NO! It's not better. You're still not syncing or locking the bus! If you
> > refer to the fact that the "movl $1" has magically appeared, that's
> > because you've just PAPERED OVER THE PROBLEM WITH "volatile", which is
> > _exactly_ what Linus is telling you NOT TO DO.
> >
>
> BTW, I really don't mind if a given architecnture has to lock the bus or
> say a prayer to Budha to reload a variable. I want it to be reloaded at
> every (or a certain, in case of a (volatile)mtx cast) usage. The compiler
> is the responsible of knowing what to do. What if nextgen P4 Xeon do not
> need a bus lock ? Will you rewrite the kernel ?

Looks like you need to read Documentation/memory-barriers.txt. That file
explains why the above assembly code is not correct.

Bonus question: what stops the processor from coalescing or rearranging
the three movl instructions in that assembly?

> --
> J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
>                                          \         It's better when it's free
> Mandriva Linux release 2007.0 (Cooker) for i586
> Linux 2.6.17-jam01 (gcc 4.1.1 20060518 (prerelease)) #2 SMP PREEMPT Wed

-- Vadim Lobanov

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 22:22                                       ` Chase Venters
  2006-07-07 22:37                                         ` J.A. Magallón
@ 2006-07-07 22:49                                         ` J.A. Magallón
  2006-07-07 22:59                                           ` Vadim Lobanov
  2006-07-07 23:18                                           ` Chase Venters
  1 sibling, 2 replies; 119+ messages in thread
From: J.A. Magallón @ 2006-07-07 22:49 UTC (permalink / raw)
  To: Chase Venters; +Cc: linux-os \(Dick Johnson\), Linus Torvalds, linux-kernel

On Fri, 7 Jul 2006 17:22:22 -0500 (CDT), Chase Venters <chase.venters@clientec.com> wrote:

> > .L7:
> >    movl    $1, mtx    <=========
> >    movl    spinvar, %eax
> >    movl    $0, mtx    <=========
> >    testl   %eax, %eax
> >    jne .L7
> >    popl    %ebp
> >    ret
> 
> NO! It's not better. You're still not syncing or locking the bus! If you 
> refer to the fact that the "movl $1" has magically appeared, that's 
> because you've just PAPERED OVER THE PROBLEM WITH "volatile", which is 
> _exactly_ what Linus is telling you NOT TO DO.
> 

BTW, I really don't mind if a given architecnture has to lock the bus or
say a prayer to Budha to reload a variable. I want it to be reloaded at
every (or a certain, in case of a (volatile)mtx cast) usage. The compiler
is the responsible of knowing what to do. What if nextgen P4 Xeon do not
need a bus lock ? Will you rewrite the kernel ?

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.0 (Cooker) for i586
Linux 2.6.17-jam01 (gcc 4.1.1 20060518 (prerelease)) #2 SMP PREEMPT Wed

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 22:22                                       ` Chase Venters
@ 2006-07-07 22:37                                         ` J.A. Magallón
  2006-07-08  9:33                                           ` David Schwartz
  2006-07-07 22:49                                         ` J.A. Magallón
  1 sibling, 1 reply; 119+ messages in thread
From: J.A. Magallón @ 2006-07-07 22:37 UTC (permalink / raw)
  To: Chase Venters; +Cc: linux-os \(Dick Johnson\), Linus Torvalds, linux-kernel

On Fri, 7 Jul 2006 17:22:22 -0500 (CDT), Chase Venters <chase.venters@clientec.com> wrote:

> On Sat, 8 Jul 2006, J.A. Magallón wrote:
> 
> > #include <stdint.h>
> >
> > //volatile
> > uint32_t spinvar = 1;
> > uint32_t mtx;
> >
> > void lock(uint32_t* l)
> > {
> >    *l = 1;
> > }
> >
> > void unlock(uint32_t* l)
> > {
> >    *l = 0;
> > }
> >
> > void spin()
> > {
> >    uint32_t local;
> >
> >    for (;;)
> >    {
> >        lock(&mtx);
> >        local = spinvar;
> >        unlock(&mtx);
> >        if (!local)
> >            break;
> >    }
> > }
> 
> This is _totally_ incorrect. Your "lock" functions are broken, because 
> they do not introduce syncronization points or locked bus operations. 

But why should I do that ? I write C, a high level language, and I don't
mind about buses or whatever. I just have to know that a 32 bit store is
atomic in a 32bit arch. There is a nice high level and portable language
feature to say 'reload this variable here'. And it lets the compiler do its
optimization work as it best can asuming it has to reload that variable.
Instead, you prefer to lock all the bus for that.
Kernel developers spent a full release to get rid of
the 'big kernel lock'. Perhaps there is another release needed to get
rid of the 'big memory barrier'.

> Due 
> to this huge failure, the compiler and/or processor is free to re-order 
> your loads and stores, resulting in totally unpredictable runtime 
> behavior.

Nope. If I want it not to reorder, put the volatile in the parameters.
So I just invalidate that variable, not everything.

> 
> > without the volatile:
> >
> > spin:
> >    pushl   %ebp
> >    movl    spinvar, %eax
> >    movl    %esp, %ebp
> >    testl   %eax, %eax
> >    je  .L7
> > .L10:
> >    jmp .L10
> > .L7:
> >    movl    $0, mtx
> >    popl    %ebp
> >    ret
> >
> > so the compiler did something like
> >
> >   local = spinvar;
> >   if (local)
> > 	for (;;);
> >
> > (notice the dead lock/unlock inlined code elimination).
> 
> ...which indicates that your code is wrong.

My code probably is wrong without the volatile, but the compiler did the
right(tm) thing.

> 
> > With the volatile, the code is correct:
> >
> > spin:
> >    pushl   %ebp
> >    movl    %esp, %ebp
> >    .p2align 4,,7
> > .L7:
> >    movl    spinvar, %eax
> >    testl   %eax, %eax
> >    jne .L7
> >    movl    $0, mtx
> >    popl    %ebp
> >    ret
> 
> Actually, it's not. It's never setting "mtx" to 1,

What is correct regarding the code I gave it.
Inlined code is:

    for (;;)
    {
        mtx = 1;
        local = spinvar;
        mtx = 0;
        if (!local)
            break;
    }

I just change mtx and don't use it for anything. So the compiler is free to
optimize out it.
If I don't want the compiler to kill it, I put the volatile.

> and it's certainly not 
> doing any sync or locked ops.
> 
> > So think about all you inlined spinlocks, mutexes and so on.
> 
> Yes, you got it wrong, and the current code gets it right. (Linus's patch 
> of =m to +m, combined with -volatile, is best)

Nope, I did the same thing with a high level and portable way.
Its like the choice between writing inlined SSE assembler or using the
vector and +,-,* builtins in gcc.

> 
> > And if you do
> >
> > void lock(volatile uint32_t* l)
> > ...
> > void unlock(volatile uint32_t* l)
> > ...
> >
> > the code is even better:
> >
> > spin:
> >    pushl   %ebp
> >    movl    %esp, %ebp
> >    .p2align 4,,7
> > .L7:
> >    movl    $1, mtx    <=========
> >    movl    spinvar, %eax
> >    movl    $0, mtx    <=========
> >    testl   %eax, %eax
> >    jne .L7
> >    popl    %ebp
> >    ret
> 
> NO! It's not better. You're still not syncing or locking the bus! If you 
> refer to the fact that the "movl $1" has magically appeared, that's 
> because you've just PAPERED OVER THE PROBLEM WITH "volatile", which is 
> _exactly_ what Linus is telling you NOT TO DO.
> 
> > So volatile just means 'dont trust this does not change even you don't see
> > why'.
> >
> 
> No.
> 
> Thanks,
> Chase


--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.0 (Cooker) for i586
Linux 2.6.17-jam01 (gcc 4.1.1 20060518 (prerelease)) #2 SMP PREEMPT Wed

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 22:05                                     ` J.A. Magallón
@ 2006-07-07 22:22                                       ` Chase Venters
  2006-07-07 22:37                                         ` J.A. Magallón
  2006-07-07 22:49                                         ` J.A. Magallón
  2006-07-07 23:36                                       ` Davide Libenzi
  1 sibling, 2 replies; 119+ messages in thread
From: Chase Venters @ 2006-07-07 22:22 UTC (permalink / raw)
  To: J.A. Magallón
  Cc: linux-os \(Dick Johnson\), Linus Torvalds, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed, Size: 2380 bytes --]

On Sat, 8 Jul 2006, J.A. Magallón wrote:

> #include <stdint.h>
>
> //volatile
> uint32_t spinvar = 1;
> uint32_t mtx;
>
> void lock(uint32_t* l)
> {
>    *l = 1;
> }
>
> void unlock(uint32_t* l)
> {
>    *l = 0;
> }
>
> void spin()
> {
>    uint32_t local;
>
>    for (;;)
>    {
>        lock(&mtx);
>        local = spinvar;
>        unlock(&mtx);
>        if (!local)
>            break;
>    }
> }

This is _totally_ incorrect. Your "lock" functions are broken, because 
they do not introduce syncronization points or locked bus operations. Due 
to this huge failure, the compiler and/or processor is free to re-order 
your loads and stores, resulting in totally unpredictable runtime 
behavior.

> without the volatile:
>
> spin:
>    pushl   %ebp
>    movl    spinvar, %eax
>    movl    %esp, %ebp
>    testl   %eax, %eax
>    je  .L7
> .L10:
>    jmp .L10
> .L7:
>    movl    $0, mtx
>    popl    %ebp
>    ret
>
> so the compiler did something like
>
>   local = spinvar;
>   if (local)
> 	for (;;);
>
> (notice the dead lock/unlock inlined code elimination).

...which indicates that your code is wrong.

> With the volatile, the code is correct:
>
> spin:
>    pushl   %ebp
>    movl    %esp, %ebp
>    .p2align 4,,7
> .L7:
>    movl    spinvar, %eax
>    testl   %eax, %eax
>    jne .L7
>    movl    $0, mtx
>    popl    %ebp
>    ret

Actually, it's not. It's never setting "mtx" to 1, and it's certainly not 
doing any sync or locked ops.

> So think about all you inlined spinlocks, mutexes and so on.

Yes, you got it wrong, and the current code gets it right. (Linus's patch 
of =m to +m, combined with -volatile, is best)

> And if you do
>
> void lock(volatile uint32_t* l)
> ...
> void unlock(volatile uint32_t* l)
> ...
>
> the code is even better:
>
> spin:
>    pushl   %ebp
>    movl    %esp, %ebp
>    .p2align 4,,7
> .L7:
>    movl    $1, mtx    <=========
>    movl    spinvar, %eax
>    movl    $0, mtx    <=========
>    testl   %eax, %eax
>    jne .L7
>    popl    %ebp
>    ret

NO! It's not better. You're still not syncing or locking the bus! If you 
refer to the fact that the "movl $1" has magically appeared, that's 
because you've just PAPERED OVER THE PROBLEM WITH "volatile", which is 
_exactly_ what Linus is telling you NOT TO DO.

> So volatile just means 'dont trust this does not change even you don't see
> why'.
>

No.

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 21:22                                   ` linux-os (Dick Johnson)
                                                       ` (2 preceding siblings ...)
  2006-07-07 22:05                                     ` J.A. Magallón
@ 2006-07-07 22:09                                     ` Ingo Molnar
  2006-07-08  7:36                                       ` Ingo Molnar
  3 siblings, 1 reply; 119+ messages in thread
From: Ingo Molnar @ 2006-07-07 22:09 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Linus Torvalds, Krzysztof Halasa, Andrew Morton, Linux kernel, arjan


* linux-os (Dick Johnson) <linux-os@analogic.com> wrote:

> >> Now Linus declares that instead of declaring an object volatile
> >> so that it is actually accessed every time it is referenced, he wants
> >> to use a GNU-ism with assembly that tells the compiler to re-read
> >> __every__ variable existing im memory, instead of just one. Go figure!
> >
> > Actually, it's not just me.
> >
> > Read things like the Intel CPU documentation.
> >
> > IT IS ACTIVELY WRONG to busy-loop on a variable. It will make the CPU
> > potentially over-heat, causing degreaded performance, and you're simply
> > not supposed to do it.
> 
> This is a bait and switch argument. [...]

how could it possibly be? Linus has simply replied to your point:

> > > If I have some code that does:
> > > 
> > > extern int spinner;
> > > 
> > > funct(){
> > >      while(spinner)
> > >            ;
> > >
> > > The 'C' compiler has no choice but to actually make that memory 
> > > access and read the variable [...]

Linus observed that the only possible purpose of the code you cited, 
busy-looping, makes no sense.

If you dont intend to busy-loop, why does the code above look like a 
busy-loop, and what do you need the volatile keyword for?

> [...] In fact, your spin-lock code already inserts "rep nops" and I 
> never implied that they should be removed. I said only that "volatile" 
> still needs to be used, not some macro that tells the compiler that 
> everything in memory probably got trashed. [...]

your position here does seem to make much sense to me, so please help me 
understand it. You suggest that the assembly code should be left alone. 
But then why do you need the volatile keyword to begin with?

> Well, in fact I do know what I'm talking about. Your bait-and-switch 
> arguments just won't work with me.

you seem to be quite self-confident. That is a nice thing to have for 
say a pro boxer, but it can be a disadvantage when dealing with a 
complex OS. Just curious, if you turn out to be wrong will you apologize 
for wasting Linus' (and others') time and for insulting him like that? 
(Or is this a totally impossible scenario not even worth discussing?)

> > Repeat after me (or just shut up about things that you not only don't know
> > about, but are apparently not willing to even learn):
> >
> > 	"'volatile' is useless. The things it did 30 years ago are much
> > 	 more complex these days, and need to be tied to much more
> > 	 detailed rules that depend on the actual particular problem,
> > 	 rather than one keyword to the compiler that doesn't actually
> > 	 give enough information for the compiler to do anything useful"
> >
> > Comprende?

here Linus claims that 'volatile' is useless and that the use of 
'volatile' is almost always a sign of bugs. That includes your 
code-samples too. So a proper answer on a technical forum like lkml 
would _not_ be to ignore Linus' point like you did, but to answer it 
directly. There are a couple of ways to answer it:

- you could explain why your sample code _does_ make sense. That's a 
  powerful way of making your point and you'll sure see an answer 
  either rebutting your point or conceding the point.

- or if you know that it makes no sense (minor nit: in which case you 
  should have pointed that out when writing them) then you could either
  correct it, or cite actual kernel code to which your point applies.

- an even better (and by far my most favorite) method of reply would be 
  if you sent an actual kernel patch (ontop of 2.6.18-rc1 plus my 
  volatile-removal patch) that implements your suggestions! That
  would be a perfect way of making your point - one line of patch is 
  worth a thousand words.

	Ingo

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 21:59                                     ` Linus Torvalds
@ 2006-07-07 22:06                                       ` Linus Torvalds
  2006-07-08 20:49                                       ` Pavel Machek
  1 sibling, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-07 22:06 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Krzysztof Halasa, Ingo Molnar, Andrew Morton, Linux kernel, arjan



On Fri, 7 Jul 2006, Linus Torvalds wrote:
> 
> It still makes a difference for code generation, OF COURSE. But it's the 
> wrong thing to use.

And if you don't realize that my argument wasn't "bait-and-switch", it's 
exactly the same thing. You pointed out a place where "volatile" would 
make the code "work".

AND I POINTED OUT THAT EVEN IN YOUR TRIVIAL EXAMPLE, VOLATILE WAS 
ACTUALLY THE WRONG THING TO DO.

And that's _exactly_ because the language environment (in this case the 
CPU itself) has evolved past the point it was 30 years ago.

And my point is that this is _always_ true. There are basically no valid 
uses where you can use "volatile" today, where there isn't some reason why 
you _should_ have done it another way entirely. Either you should have 
used proper locking, or you should have used the proper IO accessor 
functions, or you should have used something like "cpu_relax()", OR ANY 
NUMBER OF OTHER MECHANISMS.

(I did give a few examples of where "volatile" can be valid, but they are 
very limited)

Yes, "volatile" is convenient - at the cost of making the compiler 
generate crap code even when it shouldn't need to. Yes, "volatile" 
sometimes allows you to be lazy and not do the proper thing. Yes, 
"volatile" can hide bugs when you _tried_ to do the proper thing but 
screwed up.

But can't you see that _none_ of those 'Yes, "volatile" ...' actually 
means that you should _use_ "volatile". 

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 21:22                                   ` linux-os (Dick Johnson)
  2006-07-07 21:48                                     ` Chase Venters
  2006-07-07 21:59                                     ` Linus Torvalds
@ 2006-07-07 22:05                                     ` J.A. Magallón
  2006-07-07 22:22                                       ` Chase Venters
  2006-07-07 23:36                                       ` Davide Libenzi
  2006-07-07 22:09                                     ` Ingo Molnar
  3 siblings, 2 replies; 119+ messages in thread
From: J.A. Magallón @ 2006-07-07 22:05 UTC (permalink / raw)
  To: linux-os (Dick Johnson), Linus Torvalds, linux-kernel

On Fri, 7 Jul 2006 17:22:31 -0400, "linux-os \(Dick Johnson\)" <linux-os@analogic.com> wrote:

> 
> On Fri, 7 Jul 2006, Linus Torvalds wrote:
> 
> >
> >
> > On Fri, 7 Jul 2006, linux-os (Dick Johnson) wrote:
> >>
> >> Now Linus declares that instead of declaring an object volatile
> >> so that it is actually accessed every time it is referenced, he wants
> >> to use a GNU-ism with assembly that tells the compiler to re-read
> >> __every__ variable existing im memory, instead of just one. Go figure!
> >
> > Actually, it's not just me.
> >
> > Read things like the Intel CPU documentation.
> >
> > IT IS ACTIVELY WRONG to busy-loop on a variable. It will make the CPU
> > potentially over-heat, causing degreaded performance, and you're simply
> > not supposed to do it.
> 
> This is a bait and switch argument. The code was displayed to show
> the compiler output, not an example of good coding practice.
> 

volatile means what it means, is usefull and is right. If it is used
in kernel for other things apart from what it was designed for it is
kernel or programmer responsibility. It does not mention nothing about
locking.

A more real example:

#include <stdint.h>

//volatile
uint32_t spinvar = 1;
uint32_t mtx;

void lock(uint32_t* l)
{   
    *l = 1;
}

void unlock(uint32_t* l)
{
    *l = 0;
}

void spin()
{
    uint32_t local;

    for (;;)
    {
        lock(&mtx);
        local = spinvar;
        unlock(&mtx);
        if (!local)
            break;
    }
}

without the volatile:

spin:
    pushl   %ebp
    movl    spinvar, %eax
    movl    %esp, %ebp
    testl   %eax, %eax
    je  .L7
.L10:
    jmp .L10
.L7:
    movl    $0, mtx
    popl    %ebp
    ret

so the compiler did something like

   local = spinvar;
   if (local)
	for (;;);

(notice the dead lock/unlock inlined code elimination).

With the volatile, the code is correct:

spin:
    pushl   %ebp
    movl    %esp, %ebp
    .p2align 4,,7
.L7:
    movl    spinvar, %eax
    testl   %eax, %eax
    jne .L7
    movl    $0, mtx
    popl    %ebp
    ret

So think about all you inlined spinlocks, mutexes and so on.

And if you do

void lock(volatile uint32_t* l)
...
void unlock(volatile uint32_t* l)
...

the code is even better:

spin:
    pushl   %ebp
    movl    %esp, %ebp
    .p2align 4,,7
.L7:
    movl    $1, mtx    <=========
    movl    spinvar, %eax
    movl    $0, mtx    <=========
    testl   %eax, %eax
    jne .L7
    popl    %ebp
    ret

So volatile just means 'dont trust this does not change even you don't see
why'.

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.0 (Cooker) for i586
Linux 2.6.17-jam01 (gcc 4.1.1 20060518 (prerelease)) #2 SMP PREEMPT Wed

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 21:22                                   ` linux-os (Dick Johnson)
  2006-07-07 21:48                                     ` Chase Venters
@ 2006-07-07 21:59                                     ` Linus Torvalds
  2006-07-07 22:06                                       ` Linus Torvalds
  2006-07-08 20:49                                       ` Pavel Machek
  2006-07-07 22:05                                     ` J.A. Magallón
  2006-07-07 22:09                                     ` Ingo Molnar
  3 siblings, 2 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-07 21:59 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Krzysztof Halasa, Ingo Molnar, Andrew Morton, Linux kernel, arjan



On Fri, 7 Jul 2006, linux-os (Dick Johnson) wrote:
>
> This is a bait and switch argument. The code was displayed to show
> the compiler output, not an example of good coding practice.

NO IT IS NOT.

The whole point of my argument is simple:

> > 	"'volatile' is useless. The things it did 30 years ago are much
> > 	 more complex these days, and need to be tied to much more
> > 	 detailed rules that depend on the actual particular problem,
> > 	 rather than one keyword to the compiler that doesn't actually
> > 	 give enough information for the compiler to do anything useful"

And dammit, if you cannot admit that, then you're not worth discussing 
with.

"volatile" is useless. It's a big hammer in a world where the nails aren't 
nails any more, they are screws, thumb-tacks, and spotwelding.

It still makes a difference for code generation, OF COURSE. But it's the 
wrong thing to use.

		Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 21:22                                   ` linux-os (Dick Johnson)
@ 2006-07-07 21:48                                     ` Chase Venters
  2006-07-08 10:00                                       ` Krzysztof Halasa
  2006-07-07 21:59                                     ` Linus Torvalds
                                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 119+ messages in thread
From: Chase Venters @ 2006-07-07 21:48 UTC (permalink / raw)
  To: linux-os \\(Dick Johnson\\)
  Cc: Linus Torvalds, Krzysztof Halasa, Ingo Molnar, Andrew Morton,
	Linux kernel, arjan

On Fri, 7 Jul 2006, linux-os \(Dick Johnson\) wrote:

> Again, I didn't propose to do that. In fact, your spin-lock
> code already inserts "rep nops" and I never implied that they
> should be removed. I said only that "volatile" still needs to
> be used, not some macro that tells the compiler that everything
> in memory probably got trashed. Read what I said, not what you
> think some idiot might have said.
>

Dude, are you even paying attention? "volatile" very much does not need to 
be used (and as Linus points out, it is _wrong_). Since we're using GCC's 
inline asm syntax _already_, it is perfectly sufficient to use the same 
syntax to tell GCC that memory should be considered invalid.

Locks are supposed to be syncronization points, which is why they ALREADY 
HAVE "memory" on the clobber list! "memory" IS NECESSARY. The fact 
that "=m" is changing to "+m" in Linus's patches is because "=m" is in 
fact insufficient, because it would let the compiler believe we're just 
going to over-write the value. "volatile" merely hides that bug -- once 
that bug is _fixed_ (by going to "+m"), "volatile" is no longer useful. 
(It wasn't useful before, it just _papered over_ a problem).

If "volatile" is in use elsewhere (other than locks), it's still 
probably wrong. In these cases, you can use a barrier, a volatile cast, or 
an inline asm with a specific clobber.

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 20:25                                 ` Linus Torvalds
@ 2006-07-07 21:22                                   ` linux-os (Dick Johnson)
  2006-07-07 21:48                                     ` Chase Venters
                                                       ` (3 more replies)
  0 siblings, 4 replies; 119+ messages in thread
From: linux-os (Dick Johnson) @ 2006-07-07 21:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Krzysztof Halasa, Ingo Molnar, Andrew Morton, Linux kernel, arjan


On Fri, 7 Jul 2006, Linus Torvalds wrote:

>
>
> On Fri, 7 Jul 2006, linux-os (Dick Johnson) wrote:
>>
>> Now Linus declares that instead of declaring an object volatile
>> so that it is actually accessed every time it is referenced, he wants
>> to use a GNU-ism with assembly that tells the compiler to re-read
>> __every__ variable existing im memory, instead of just one. Go figure!
>
> Actually, it's not just me.
>
> Read things like the Intel CPU documentation.
>
> IT IS ACTIVELY WRONG to busy-loop on a variable. It will make the CPU
> potentially over-heat, causing degreaded performance, and you're simply
> not supposed to do it.

This is a bait and switch argument. The code was displayed to show
the compiler output, not an example of good coding practice.

Furthermore, I'm still waiting for the new spin-friction that's
supposed to cause the increased heat. It doesn't exist and furthermore
no Intel processor instruction manual (that is available for public
inspection) claims that busy-waiting increases any heating, only that
some processors that __can__ fall back into a low-power mode will not
do so if they are spinning (naturally). But this is just another
bait-and-switch as well because I only wrote about volatile and
I even provided references.

>
>> Reference:
>> /usr/src/linux-2.6.16.4/include/linux/compiler-gcc.h:
>> #define barrier() __asm__ __volatile__("": : :"memory")
>
> Actually, for your kind of stupid loop, you should use
>
> - include/asm-i386/processor.h:
> 	#define cpu_relax()        rep_nop()
>

Again, I didn't propose to do that. In fact, your spin-lock
code already inserts "rep nops" and I never implied that they
should be removed. I said only that "volatile" still needs to
be used, not some macro that tells the compiler that everything
in memory probably got trashed. Read what I said, not what you
think some idiot might have said.

> where rep_nop() is
>
> 	/* REP NOP (PAUSE) is a a good thing to insert into busy-wait loops. */
> 	static inline void rep_nop(void)
> 	{
> 		__asm__ __volatile__("rep;nop": : :"memory");
> 	}
>
> on x86, and can be other things on other CPU's. On ppc64 it is
>
> 	#define cpu_relax()     do { HMT_low(); HMT_medium(); barrier(); } while (0)
>
> where those HMT macros adjust thread priority.
>

Not either of these things have anything to do with the topic and
I never even implied that they did. Again, I spoke only of the
well known volatile keyword. Nothing else.

> In other words, you just don't know what you're talking about. "volatile"
> is simply not useful, and the fact that you cannot seem to grasp that is
> _your_ problem.
>

Well, in fact I do know what I'm talking about. Your bait-and-switch
arguments just won't work with me.

> Repeat after me (or just shut up about things that you not only don't know
> about, but are apparently not willing to even learn):
>
> 	"'volatile' is useless. The things it did 30 years ago are much
> 	 more complex these days, and need to be tied to much more
> 	 detailed rules that depend on the actual particular problem,
> 	 rather than one keyword to the compiler that doesn't actually
> 	 give enough information for the compiler to do anything useful"
>
> Comprende?
>
> 		Linus
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
New book: http://www.AbominableFirebug.com/
_
\x1a\x04

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 19:51                               ` linux-os (Dick Johnson)
  2006-07-07 20:25                                 ` Linus Torvalds
@ 2006-07-07 20:39                                 ` Krzysztof Halasa
  2006-07-07 23:06                                 ` Björn Steinbrink
  2006-07-08  8:36                                 ` Avi Kivity
  3 siblings, 0 replies; 119+ messages in thread
From: Krzysztof Halasa @ 2006-07-07 20:39 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel, arjan

"linux-os \(Dick Johnson\)" <linux-os@analogic.com> writes:

>> 00000000 <funct>:
>>   0:   a1 00 00 00 00          mov    0x0,%eax
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>   5:   55                      push   %ebp
>>   6:   89 e5                   mov    %esp,%ebp
>>   8:   85 c0                   test   %eax,%eax
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>   a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
>>  10:   75 fe                   jne    10 <funct+0x10>
>>  12:   5d                      pop    %ebp
>>  13:   c3                      ret
>>
>> "0x0" is, of course, for relocation.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> So read the code; you have "10:   jne 10", jumping to itself
> forever, without even doing anything else to set the flags, much
> less reading a variable.

The variable is tested before entering the loop, of course. But
it _is_ tested and the initial state doesn't matter.
The "0x0" may be misleading so I added the note about relocation.

It _is_ BTW correct machine code.

>> 00000000 <funct>:
>>   0:   55                      push   %ebp
>>   1:   89 e5                   mov    %esp,%ebp
>>   3:   a1 00 00 00 00          mov    0x0,%eax
>>   8:   85 c0                   test   %eax,%eax
>>   a:   75 f7                   jne    3 <funct+0x3>
>>   c:   5d                      pop    %ebp
>>   d:   c3                      ret
>
> This is the only code that works. Guess why it worked? Because
> you declared the variable volatile.

Of course. But I don't see any address recalculations.

> Now Linus declares that instead of declaring an object volatile
> so that it is actually accessed every time it is referenced, he wants
> to use a GNU-ism with assembly that tells the compiler to re-read
> __every__ variable existing im memory, instead of just one. Go figure!

That's probably overkill, I can think of more cases where "volatile"
actually makes sense. Most are probably broken anyway, especially
those that aren't (guaranteed to be) atomic but the author uses them
as if they were.

BTW: barrier() isn't a lock.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 19:51                               ` linux-os (Dick Johnson)
@ 2006-07-07 20:25                                 ` Linus Torvalds
  2006-07-07 21:22                                   ` linux-os (Dick Johnson)
  2006-07-07 20:39                                 ` Krzysztof Halasa
                                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 119+ messages in thread
From: Linus Torvalds @ 2006-07-07 20:25 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Krzysztof Halasa, Ingo Molnar, Andrew Morton, linux-kernel, arjan



On Fri, 7 Jul 2006, linux-os (Dick Johnson) wrote:
> 
> Now Linus declares that instead of declaring an object volatile
> so that it is actually accessed every time it is referenced, he wants
> to use a GNU-ism with assembly that tells the compiler to re-read
> __every__ variable existing im memory, instead of just one. Go figure!

Actually, it's not just me.

Read things like the Intel CPU documentation.

IT IS ACTIVELY WRONG to busy-loop on a variable. It will make the CPU 
potentially over-heat, causing degreaded performance, and you're simply 
not supposed to do it.

> Reference:
> /usr/src/linux-2.6.16.4/include/linux/compiler-gcc.h:
> #define barrier() __asm__ __volatile__("": : :"memory")

Actually, for your kind of stupid loop, you should use

 - include/asm-i386/processor.h:
	#define cpu_relax()        rep_nop()

where rep_nop() is

	/* REP NOP (PAUSE) is a a good thing to insert into busy-wait loops. */
	static inline void rep_nop(void)
	{
		__asm__ __volatile__("rep;nop": : :"memory");
	}

on x86, and can be other things on other CPU's. On ppc64 it is

	#define cpu_relax()     do { HMT_low(); HMT_medium(); barrier(); } while (0)

where those HMT macros adjust thread priority.

In other words, you just don't know what you're talking about. "volatile" 
is simply not useful, and the fact that you cannot seem to grasp that is 
_your_ problem.

Repeat after me (or just shut up about things that you not only don't know 
about, but are apparently not willing to even learn):

	"'volatile' is useless. The things it did 30 years ago are much 
	 more complex these days, and need to be tied to much more 
	 detailed rules that depend on the actual particular problem, 
	 rather than one keyword to the compiler that doesn't actually 
	 give enough information for the compiler to do anything useful"

Comprende?

		Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-07 18:16                             ` Krzysztof Halasa
@ 2006-07-07 19:51                               ` linux-os (Dick Johnson)
  2006-07-07 20:25                                 ` Linus Torvalds
                                                   ` (3 more replies)
  0 siblings, 4 replies; 119+ messages in thread
From: linux-os (Dick Johnson) @ 2006-07-07 19:51 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel, arjan


On Fri, 7 Jul 2006, Krzysztof Halasa wrote:

> "linux-os \(Dick Johnson\)" <linux-os@analogic.com> writes:
>
>> extern int spinner;
>>
>> funct(){
>>      while(spinner)
>>          ;
>>
>> The 'C' compiler has no choice but to actually make that memory access
>> and read the variable because the variable is in another module (a.k.a.
>> file).
>
> defiant:/tmp/khc$ gcc --version
> gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
> defiant:/tmp/khc$ cat test.c
> extern int spinner;
>
> void funct(void)
> {
>     while(spinner)
>         ;
> }
> defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
> defiant:/tmp/khc$ objdump -d test.o
>
> test.o:     file format elf32-i386
>
> Disassembly of section .text:
>
> 00000000 <funct>:
>   0:   a1 00 00 00 00          mov    0x0,%eax
>   5:   55                      push   %ebp
>   6:   89 e5                   mov    %esp,%ebp
>   8:   85 c0                   test   %eax,%eax
>   a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
>  10:   75 fe                   jne    10 <funct+0x10>
>  12:   5d                      pop    %ebp
>  13:   c3                      ret
>
> "0x0" is, of course, for relocation.


So read the code; you have "10:   jne 10", jumping to itself
forever, without even doing anything else to set the flags, much
less reading a variable.


>
>> However, if I have the same code, but the variable is visible during
>> compile time, i.e.,
>>
>> int spinner=0;
>>
>> funct(){
>>      while(spinner)
>>          ;
>>
>> ... the compiler may eliminate that code altogether because it
>> 'knows' that spinner is FALSE, having initialized it to zero
>> itself.
>
> defiant:/tmp/khc$ cat test.c
> int spinner = 0;
>
> void funct(void)
> {
>     while(spinner)
>         ;
> }
> defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
> defiant:/tmp/khc$ objdump -d test.o
>
> 00000000 <funct>:
>   0:   a1 00 00 00 00          mov    0x0,%eax
>   5:   55                      push   %ebp
>   6:   89 e5                   mov    %esp,%ebp
>   8:   85 c0                   test   %eax,%eax
>   a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
>  10:   75 fe                   jne    10 <funct+0x10>
>  12:   5d                      pop    %ebp
>  13:   c3                      ret
>

Then, you have exactly the same thing here:
   10:   75 fe                   jne    10 <funct+0x10>

Same bad code.

>> Since spinner is global in scope, somebody surely could have
>> changed it before funct() was even called, but the current gcc
>> 'C' compiler doesn't care and may optimize it away entirely.
>
> Personally I don't think such C compiler even existed. HISOFT C
> on ZX Spectrum could be a good candidate but I think it didn't
> have any optimizer :-)
>
>> To
>> prevent this, you must declare the variable volatile. To do
>> otherwise is a bug.
>
> Nope. Volatile just means that every read (and write) must actually
> access the variable. Note that the compiler optimized out accesses
> to the variable in the loop - while it has to check at the beginning
> of funct(), it knows that the variable is constant through funct().
>
> Note that "volatile" is not exactly what we usually want, but it
> does the job (i.e., the program doesn't crash, but the code is
> probably suboptimal).
>
>> That said, I think that the current
>> implementation of 'volatile' is broken because the compiler
>> seems to believe that the variable has moved! It recalculates
>> the address of the variable as well as accessing its value.
>> This is what makes the code generation problematical.
>
> You must be using a heavily broken compiler:
>
> defiant:/tmp/khc$ cat test.c
> volatile int spinner = 0;
>
> void funct(void)
> {
>     while(spinner)
>         ;
> }
> defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
> defiant:/tmp/khc$ objdump -d test.o
>
> 00000000 <funct>:
>   0:   55                      push   %ebp
>   1:   89 e5                   mov    %esp,%ebp
>   3:   a1 00 00 00 00          mov    0x0,%eax
>   8:   85 c0                   test   %eax,%eax
>   a:   75 f7                   jne    3 <funct+0x3>
>   c:   5d                      pop    %ebp
>   d:   c3                      ret

This is the only code that works. Guess why it worked? Because
you declared the variable volatile.

Now Linus declares that instead of declaring an object volatile
so that it is actually accessed every time it is referenced, he wants
to use a GNU-ism with assembly that tells the compiler to re-read
__every__ variable existing im memory, instead of just one. Go figure!

Reference:
/usr/src/linux-2.6.16.4/include/linux/compiler-gcc.h:
#define barrier() __asm__ __volatile__("": : :"memory")


> --
> Krzysztof Halasa
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
New book: http://www.AbominableFirebug.com/
_
\x1a\x04

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 17:52                           ` linux-os (Dick Johnson)
                                               ` (2 preceding siblings ...)
  2006-07-06 18:16                             ` Chase Venters
@ 2006-07-07 18:16                             ` Krzysztof Halasa
  2006-07-07 19:51                               ` linux-os (Dick Johnson)
  3 siblings, 1 reply; 119+ messages in thread
From: Krzysztof Halasa @ 2006-07-07 18:16 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel, arjan

"linux-os \(Dick Johnson\)" <linux-os@analogic.com> writes:

> extern int spinner;
>
> funct(){
>      while(spinner)
>          ;
>
> The 'C' compiler has no choice but to actually make that memory access
> and read the variable because the variable is in another module (a.k.a.
> file).

defiant:/tmp/khc$ gcc --version
gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
defiant:/tmp/khc$ cat test.c 
extern int spinner;

void funct(void)
{
     while(spinner)
         ;
}
defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
defiant:/tmp/khc$ objdump -d test.o 

test.o:     file format elf32-i386

Disassembly of section .text:

00000000 <funct>:
   0:   a1 00 00 00 00          mov    0x0,%eax
   5:   55                      push   %ebp
   6:   89 e5                   mov    %esp,%ebp
   8:   85 c0                   test   %eax,%eax
   a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  10:   75 fe                   jne    10 <funct+0x10>
  12:   5d                      pop    %ebp
  13:   c3                      ret    

"0x0" is, of course, for relocation.

> However, if I have the same code, but the variable is visible during
> compile time, i.e.,
>
> int spinner=0;
>
> funct(){
>      while(spinner)
>          ;
>
> ... the compiler may eliminate that code altogether because it
> 'knows' that spinner is FALSE, having initialized it to zero
> itself.

defiant:/tmp/khc$ cat test.c 
int spinner = 0;

void funct(void)
{
     while(spinner)
         ;
}
defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
defiant:/tmp/khc$ objdump -d test.o 

00000000 <funct>:
   0:   a1 00 00 00 00          mov    0x0,%eax
   5:   55                      push   %ebp
   6:   89 e5                   mov    %esp,%ebp
   8:   85 c0                   test   %eax,%eax
   a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
  10:   75 fe                   jne    10 <funct+0x10>
  12:   5d                      pop    %ebp
  13:   c3                      ret    

> Since spinner is global in scope, somebody surely could have
> changed it before funct() was even called, but the current gcc
> 'C' compiler doesn't care and may optimize it away entirely.

Personally I don't think such C compiler even existed. HISOFT C
on ZX Spectrum could be a good candidate but I think it didn't
have any optimizer :-)

> To
> prevent this, you must declare the variable volatile. To do
> otherwise is a bug.

Nope. Volatile just means that every read (and write) must actually
access the variable. Note that the compiler optimized out accesses
to the variable in the loop - while it has to check at the beginning
of funct(), it knows that the variable is constant through funct().

Note that "volatile" is not exactly what we usually want, but it
does the job (i.e., the program doesn't crash, but the code is
probably suboptimal).

> That said, I think that the current
> implementation of 'volatile' is broken because the compiler
> seems to believe that the variable has moved! It recalculates
> the address of the variable as well as accessing its value.
> This is what makes the code generation problematical.

You must be using a heavily broken compiler:

defiant:/tmp/khc$ cat test.c 
volatile int spinner = 0;

void funct(void)
{
     while(spinner)
         ;
}
defiant:/tmp/khc$ gcc -Wall -O2 -c test.c
defiant:/tmp/khc$ objdump -d test.o 

00000000 <funct>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   a1 00 00 00 00          mov    0x0,%eax
   8:   85 c0                   test   %eax,%eax
   a:   75 f7                   jne    3 <funct+0x3>
   c:   5d                      pop    %ebp
   d:   c3                      ret    
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 119+ messages in thread

* RE: [patch] spinlocks: remove 'volatile'
  2006-07-06 12:29                         ` linux-os (Dick Johnson)
  2006-07-06 12:39                           ` Arjan van de Ven
  2006-07-06 16:40                           ` Nick Piggin
@ 2006-07-06 23:19                           ` David Schwartz
  2 siblings, 0 replies; 119+ messages in thread
From: David Schwartz @ 2006-07-06 23:19 UTC (permalink / raw)
  To: linux-os (Dick Johnson), Arjan van de Ven; +Cc: linux-kernel


> Look at:
>
>  	http://en.wikipedia.org/wiki/Volatile_variable
>
> This is just what is needed to prevent the compiler from making
> non working
> code during optimization.

	That article is totally and completely wrong, in fact it's so wrong it's
harmful. For example, it says:

... [A] variable that might be concurrently modified by multiple
threads (without locks or a similar form of mutual exclusion) should be
declared volatile.

	Without pointing out that the use of 'volatile' is neither required nor
sufficient, this is an utterly false statement. The reference to "mutual
exclusion" is puzzling, since the problem is cached data, not concurrent
accesses.

	It talks about controlling compiler optimizations. What difference does it
make to me whether an optimization that breaks my code is made by the
compiler or the processor?

	The most serious problem with the article is that it does not point out
what is guaranteed behavior and what happens to be true for some particular
platforms. In fact, the only platform I know of where the behavior the
article implies is guaranteed is (at least arguably) actually guaranteed is
Win32. (And I'm not sure of what value a guarantee is that you have to argue
is implied mostly by omission.)

	Sadly, it omits any mention of the *actual* legitimate use of 'volatile'.
That is, the cases where it has guaranteed semantics and actually is both
necessary and sufficient.

	DS



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 21:02                               ` J.A. Magallón
@ 2006-07-06 21:12                                 ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 21:12 UTC (permalink / raw)
  To: J.A. Magallón
  Cc: linux-os (Dick Johnson), Ingo Molnar, Andrew Morton, linux-kernel, arjan

[-- Attachment #1: Type: TEXT/PLAIN, Size: 418 bytes --]



On Thu, 6 Jul 2006, J.A. Magallón wrote:
> 
> I think you are mixing apples and oranges. Using volatile to control o-o-o
> memory accesses is sure wrong.

.. and there _is_ no right way to use it, except the two I've already 
mentioned. 

Why is that so hard to accept?

The fact is, "volatile" was designed in a different era, and tough, it's 
not one of the better parts of the C language.

Get over it.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 20:55                               ` Jan Engelhardt
@ 2006-07-06 21:07                                 ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 21:07 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Jeremy Fitzhardinge, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel, arjan



On Thu, 6 Jul 2006, Jan Engelhardt wrote:
> >
> > You need to exclude "asm volatile", which is a completely different thing.
> 
> 10077.

Yeah, way too many. 

That said, at least _some_ of them are:

 - casts to volatile inside arch-specific code serquences (ie the _good_ 
   kind of volatile - associated with _code_ rather than data). 

   See for example include/asm-i386/io.h for 100% valid examples of this
   kind of usage.

 - function argument values for functions that need to be able to take an 
   arbitrary pointer ("const volatile void *" is the most permissive 
   argument type - anything else the compiler can complain about you 
   dropping qualifiers)

   See include/asm-i386/bitops.h for examples of this kind of volatile.

So I'd expect that maybe one percent of them are actually valid ;)

And I suspect that a huge majority of the truly crapola ones are in 
drivers. Oh, well..

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 18:00                             ` Linus Torvalds
@ 2006-07-06 21:02                               ` J.A. Magallón
  2006-07-06 21:12                                 ` Linus Torvalds
  0 siblings, 1 reply; 119+ messages in thread
From: J.A. Magallón @ 2006-07-06 21:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-os (Dick Johnson), Ingo Molnar, Andrew Morton, linux-kernel, arjan

On Thu, 6 Jul 2006 11:00:43 -0700 (PDT), Linus Torvalds <torvalds@osdl.org> wrote:

> 
> 
> On Thu, 6 Jul 2006, linux-os (Dick Johnson) wrote:
> > 
> > Linus, you may have been reading too many novels.
> > 
> > If I have some code that does:
> > 
> > extern int spinner;
> > 
> > funct(){
> >      while(spinner)
> >          ;
> > 
> > The 'C' compiler has no choice but to actually make that memory access
> > and read the variable because the variable is in another module (a.k.a.
> > file).
> 
> You don't know how C works, do you?
> 
> You also have no idea of what out-of-order memory accesses do to OS code, 
> right?
> 

I think you are mixing apples and oranges. Using volatile to control o-o-o
memory accesses is sure wrong.
It just means "don't assume this variable has not changed because you don't
see any access to it" to the compiler. It is designed to prevent high level
optimizations like code movement or dead code elimination, but a _high_ level.
It has nothing to do with memory barriers and so on. That is surely a
misuse of volatile.

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.0 (Cooker) for i586
Linux 2.6.17-jam01 (gcc 4.1.1 20060518 (prerelease)) #2 SMP PREEMPT Wed

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 20:40                                 ` Chris Friesen
@ 2006-07-06 21:00                                   ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 21:00 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Mark Lord, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel



On Thu, 6 Jul 2006, Chris Friesen wrote:
> 
> But for someone coding to POSIX, is there any reason why they would use
> barriers?  If so, what API would they use?

The POSIX thread api? 

Anyway, who cares? The final word on the kernel is that "volatile" is 
wrong. Arguing against that standpoint is pointless, especially since 
other real kernel developers have already stood up to back me up on this.

In user programs, you can do whatever you damn well please. I don't care, 
it's not my problem.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 20:26                             ` Jeremy Fitzhardinge
@ 2006-07-06 20:55                               ` Jan Engelhardt
  2006-07-06 21:07                                 ` Linus Torvalds
  0 siblings, 1 reply; 119+ messages in thread
From: Jan Engelhardt @ 2006-07-06 20:55 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel, arjan

>> $ find linux-2.6.17 -type f -iname '*.[ch]' -print0 | xargs -0 grep
>> volatile | wc -l
>> 13948
>> 
>> Tough job.
>
> You need to exclude "asm volatile", which is a completely different thing.

10077.


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:58                               ` Linus Torvalds
  2006-07-06 20:27                                 ` Mark Lord
@ 2006-07-06 20:40                                 ` Chris Friesen
  2006-07-06 21:00                                   ` Linus Torvalds
  1 sibling, 1 reply; 119+ messages in thread
From: Chris Friesen @ 2006-07-06 20:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Lord, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 6 Jul 2006, Chris Friesen wrote:
> 
>>The C standard requires the use of volatile for signal handlers and setjmp.
> 
> 
> Actually, the C standard requires "sigatomic_t".


 From ISO/IEC 9899:TC2, both of these specifically mention volatile:


7.13.2.1 The longjmp function

3 All accessible objects have values, and all other components of the 
abstract machine212) have state, as of the time the longjmp function was 
called, except that the values of objects of automatic storage duration 
that are local to the function containing the invocation of the 
corresponding setjmp macro that do not have volatile-qualified type and 
have been changed between the setjmp invocation and longjmp call are 
indeterminate.


7.14.1.1 The signal function

5 If the signal occurs other than as the result of calling the abort or 
raise function, the behavior is undefined if the signal handler refers 
to any object with static storage duration other than by assigning a 
value to an object declared as volatile sig_atomic_t, ...


>>For userspace at least the whole discussion of "barriers" is sort of
>>moot--there are no memory barriers defined in the C language, which makes it
>>kind of hard to write portable code that uses them.
> 
> 
> Any locking primitive BY DEFINITION has a barrier in it.
> 
> If it doesn't, it's not a locking primitive, it's a random sequence of 
> code that does something pointless.

Sure, so the implementation of the locking primitives requires 
hardware-specific knowledge.

But for someone coding to POSIX, is there any reason why they would use 
barriers?  If so, what API would they use?

Chris

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:56                     ` Linus Torvalds
@ 2006-07-06 20:34                       ` Linus Torvalds
  2006-07-08 22:50                         ` Ralf Baechle
  2006-07-09  3:07                         ` Keith Owens
  0 siblings, 2 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 20:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, arjan



On Thu, 6 Jul 2006, Linus Torvalds wrote:
> 
> So I _think_ that we should change the "=m" to the much more correct "+m" 
> at the same time (or before - it's really a bug-fix regardless) as 
> removing the "volatile".

Here's a first cut (UNTESTED!) for x86. I didn't check any other 
architectures, I bet they have similar problems.

		Linus

----
diff --git a/include/asm-i386/atomic.h b/include/asm-i386/atomic.h
index 4f061fa..51a1662 100644
--- a/include/asm-i386/atomic.h
+++ b/include/asm-i386/atomic.h
@@ -46,8 +46,8 @@ static __inline__ void atomic_add(int i,
 {
 	__asm__ __volatile__(
 		LOCK_PREFIX "addl %1,%0"
-		:"=m" (v->counter)
-		:"ir" (i), "m" (v->counter));
+		:"+m" (v->counter)
+		:"ir" (i));
 }
 
 /**
@@ -61,8 +61,8 @@ static __inline__ void atomic_sub(int i,
 {
 	__asm__ __volatile__(
 		LOCK_PREFIX "subl %1,%0"
-		:"=m" (v->counter)
-		:"ir" (i), "m" (v->counter));
+		:"+m" (v->counter)
+		:"ir" (i));
 }
 
 /**
@@ -80,8 +80,8 @@ static __inline__ int atomic_sub_and_tes
 
 	__asm__ __volatile__(
 		LOCK_PREFIX "subl %2,%0; sete %1"
-		:"=m" (v->counter), "=qm" (c)
-		:"ir" (i), "m" (v->counter) : "memory");
+		:"+m" (v->counter), "=qm" (c)
+		:"ir" (i) : "memory");
 	return c;
 }
 
@@ -95,8 +95,7 @@ static __inline__ void atomic_inc(atomic
 {
 	__asm__ __volatile__(
 		LOCK_PREFIX "incl %0"
-		:"=m" (v->counter)
-		:"m" (v->counter));
+		:"+m" (v->counter));
 }
 
 /**
@@ -109,8 +108,7 @@ static __inline__ void atomic_dec(atomic
 {
 	__asm__ __volatile__(
 		LOCK_PREFIX "decl %0"
-		:"=m" (v->counter)
-		:"m" (v->counter));
+		:"+m" (v->counter));
 }
 
 /**
@@ -127,8 +125,8 @@ static __inline__ int atomic_dec_and_tes
 
 	__asm__ __volatile__(
 		LOCK_PREFIX "decl %0; sete %1"
-		:"=m" (v->counter), "=qm" (c)
-		:"m" (v->counter) : "memory");
+		:"+m" (v->counter), "=qm" (c)
+		: : "memory");
 	return c != 0;
 }
 
@@ -146,8 +144,8 @@ static __inline__ int atomic_inc_and_tes
 
 	__asm__ __volatile__(
 		LOCK_PREFIX "incl %0; sete %1"
-		:"=m" (v->counter), "=qm" (c)
-		:"m" (v->counter) : "memory");
+		:"+m" (v->counter), "=qm" (c)
+		: : "memory");
 	return c != 0;
 }
 
@@ -166,8 +164,8 @@ static __inline__ int atomic_add_negativ
 
 	__asm__ __volatile__(
 		LOCK_PREFIX "addl %2,%0; sets %1"
-		:"=m" (v->counter), "=qm" (c)
-		:"ir" (i), "m" (v->counter) : "memory");
+		:"+m" (v->counter), "=qm" (c)
+		:"ir" (i) : "memory");
 	return c;
 }
 
diff --git a/include/asm-i386/futex.h b/include/asm-i386/futex.h
index 7b8ceef..946d97c 100644
--- a/include/asm-i386/futex.h
+++ b/include/asm-i386/futex.h
@@ -20,8 +20,8 @@ #define __futex_atomic_op1(insn, ret, ol
 	.align	8\n\
 	.long	1b,3b\n\
 	.previous"						\
-	: "=r" (oldval), "=r" (ret), "=m" (*uaddr)		\
-	: "i" (-EFAULT), "m" (*uaddr), "0" (oparg), "1" (0))
+	: "=r" (oldval), "=r" (ret), "+m" (*uaddr)		\
+	: "i" (-EFAULT), "0" (oparg), "1" (0))
 
 #define __futex_atomic_op2(insn, ret, oldval, uaddr, oparg) \
   __asm__ __volatile (						\
@@ -38,9 +38,9 @@ #define __futex_atomic_op2(insn, ret, ol
 	.align	8\n\
 	.long	1b,4b,2b,4b\n\
 	.previous"						\
-	: "=&a" (oldval), "=&r" (ret), "=m" (*uaddr),		\
+	: "=&a" (oldval), "=&r" (ret), "+m" (*uaddr),		\
 	  "=&r" (tem)						\
-	: "r" (oparg), "i" (-EFAULT), "m" (*uaddr), "1" (0))
+	: "r" (oparg), "i" (-EFAULT), "1" (0))
 
 static inline int
 futex_atomic_op_inuser (int encoded_op, int __user *uaddr)
@@ -123,7 +123,7 @@ futex_atomic_cmpxchg_inatomic(int __user
 		"	.long   1b,3b				\n"
 		"	.previous				\n"
 
-		: "=a" (oldval), "=m" (*uaddr)
+		: "=a" (oldval), "+m" (*uaddr)
 		: "i" (-EFAULT), "r" (newval), "0" (oldval)
 		: "memory"
 	);
diff --git a/include/asm-i386/local.h b/include/asm-i386/local.h
index 3b4998c..12060e2 100644
--- a/include/asm-i386/local.h
+++ b/include/asm-i386/local.h
@@ -17,32 +17,30 @@ static __inline__ void local_inc(local_t
 {
 	__asm__ __volatile__(
 		"incl %0"
-		:"=m" (v->counter)
-		:"m" (v->counter));
+		:"+m" (v->counter));
 }
 
 static __inline__ void local_dec(local_t *v)
 {
 	__asm__ __volatile__(
 		"decl %0"
-		:"=m" (v->counter)
-		:"m" (v->counter));
+		:"+m" (v->counter));
 }
 
 static __inline__ void local_add(long i, local_t *v)
 {
 	__asm__ __volatile__(
 		"addl %1,%0"
-		:"=m" (v->counter)
-		:"ir" (i), "m" (v->counter));
+		:"+m" (v->counter)
+		:"ir" (i));
 }
 
 static __inline__ void local_sub(long i, local_t *v)
 {
 	__asm__ __volatile__(
 		"subl %1,%0"
-		:"=m" (v->counter)
-		:"ir" (i), "m" (v->counter));
+		:"+m" (v->counter)
+		:"ir" (i));
 }
 
 /* On x86, these are no better than the atomic variants. */
diff --git a/include/asm-i386/posix_types.h b/include/asm-i386/posix_types.h
index 4e47ed0..133e31e 100644
--- a/include/asm-i386/posix_types.h
+++ b/include/asm-i386/posix_types.h
@@ -51,12 +51,12 @@ #if defined(__KERNEL__) || !defined(__GL
 #undef	__FD_SET
 #define __FD_SET(fd,fdsetp) \
 		__asm__ __volatile__("btsl %1,%0": \
-			"=m" (*(__kernel_fd_set *) (fdsetp)):"r" ((int) (fd)))
+			"+m" (*(__kernel_fd_set *) (fdsetp)):"r" ((int) (fd)))
 
 #undef	__FD_CLR
 #define __FD_CLR(fd,fdsetp) \
 		__asm__ __volatile__("btrl %1,%0": \
-			"=m" (*(__kernel_fd_set *) (fdsetp)):"r" ((int) (fd)))
+			"+m" (*(__kernel_fd_set *) (fdsetp)):"r" ((int) (fd)))
 
 #undef	__FD_ISSET
 #define __FD_ISSET(fd,fdsetp) (__extension__ ({ \
diff --git a/include/asm-i386/rwlock.h b/include/asm-i386/rwlock.h
index 94f0019..96b0bef 100644
--- a/include/asm-i386/rwlock.h
+++ b/include/asm-i386/rwlock.h
@@ -37,7 +37,7 @@ #define __build_read_lock_const(rw, help
 			"popl %%eax\n\t" \
 			"1:\n", \
 			"subl $1,%0\n\t", \
-			"=m" (*(volatile int *)rw) : : "memory")
+			"+m" (*(volatile int *)rw) : : "memory")
 
 #define __build_read_lock(rw, helper)	do { \
 						if (__builtin_constant_p(rw)) \
@@ -63,7 +63,7 @@ #define __build_write_lock_const(rw, hel
 			"popl %%eax\n\t" \
 			"1:\n", \
 			"subl $" RW_LOCK_BIAS_STR ",%0\n\t", \
-			"=m" (*(volatile int *)rw) : : "memory")
+			"+m" (*(volatile int *)rw) : : "memory")
 
 #define __build_write_lock(rw, helper)	do { \
 						if (__builtin_constant_p(rw)) \
diff --git a/include/asm-i386/rwsem.h b/include/asm-i386/rwsem.h
index 2f07601..43113f5 100644
--- a/include/asm-i386/rwsem.h
+++ b/include/asm-i386/rwsem.h
@@ -111,8 +111,8 @@ LOCK_PREFIX	"  incl      (%%eax)\n\t" /*
 		"  jmp       1b\n"
 		LOCK_SECTION_END
 		"# ending down_read\n\t"
-		: "=m"(sem->count)
-		: "a"(sem), "m"(sem->count)
+		: "+m" (sem->count)
+		: "a" (sem)
 		: "memory", "cc");
 }
 
@@ -133,8 +133,8 @@ LOCK_PREFIX	"  cmpxchgl  %2,%0\n\t"
 		"  jnz	     1b\n\t"
 		"2:\n\t"
 		"# ending __down_read_trylock\n\t"
-		: "+m"(sem->count), "=&a"(result), "=&r"(tmp)
-		: "i"(RWSEM_ACTIVE_READ_BIAS)
+		: "+m" (sem->count), "=&a" (result), "=&r" (tmp)
+		: "i" (RWSEM_ACTIVE_READ_BIAS)
 		: "memory", "cc");
 	return result>=0 ? 1 : 0;
 }
@@ -161,8 +161,8 @@ LOCK_PREFIX	"  xadd      %%edx,(%%eax)\n
 		"  jmp       1b\n"
 		LOCK_SECTION_END
 		"# ending down_write"
-		: "=m"(sem->count), "=d"(tmp)
-		: "a"(sem), "1"(tmp), "m"(sem->count)
+		: "+m" (sem->count), "=d" (tmp)
+		: "a" (sem), "1" (tmp)
 		: "memory", "cc");
 }
 
@@ -205,8 +205,8 @@ LOCK_PREFIX	"  xadd      %%edx,(%%eax)\n
 		"  jmp       1b\n"
 		LOCK_SECTION_END
 		"# ending __up_read\n"
-		: "=m"(sem->count), "=d"(tmp)
-		: "a"(sem), "1"(tmp), "m"(sem->count)
+		: "+m" (sem->count), "=d" (tmp)
+		: "a" (sem), "1" (tmp)
 		: "memory", "cc");
 }
 
@@ -231,8 +231,8 @@ LOCK_PREFIX	"  xaddl     %%edx,(%%eax)\n
 		"  jmp       1b\n"
 		LOCK_SECTION_END
 		"# ending __up_write\n"
-		: "=m"(sem->count)
-		: "a"(sem), "i"(-RWSEM_ACTIVE_WRITE_BIAS), "m"(sem->count)
+		: "+m" (sem->count)
+		: "a" (sem), "i" (-RWSEM_ACTIVE_WRITE_BIAS)
 		: "memory", "cc", "edx");
 }
 
@@ -256,8 +256,8 @@ LOCK_PREFIX	"  addl      %2,(%%eax)\n\t"
 		"  jmp       1b\n"
 		LOCK_SECTION_END
 		"# ending __downgrade_write\n"
-		: "=m"(sem->count)
-		: "a"(sem), "i"(-RWSEM_WAITING_BIAS), "m"(sem->count)
+		: "+m" (sem->count)
+		: "a" (sem), "i" (-RWSEM_WAITING_BIAS)
 		: "memory", "cc");
 }
 
@@ -268,8 +268,8 @@ static inline void rwsem_atomic_add(int 
 {
 	__asm__ __volatile__(
 LOCK_PREFIX	"addl %1,%0"
-		: "=m"(sem->count)
-		: "ir"(delta), "m"(sem->count));
+		: "+m" (sem->count)
+		: "ir" (delta));
 }
 
 /*
@@ -280,10 +280,9 @@ static inline int rwsem_atomic_update(in
 	int tmp = delta;
 
 	__asm__ __volatile__(
-LOCK_PREFIX	"xadd %0,(%2)"
-		: "+r"(tmp), "=m"(sem->count)
-		: "r"(sem), "m"(sem->count)
-		: "memory");
+LOCK_PREFIX	"xadd %0,%1"
+		: "+r" (tmp), "+m" (sem->count)
+		: : "memory");
 
 	return tmp+delta;
 }
diff --git a/include/asm-i386/semaphore.h b/include/asm-i386/semaphore.h
index f7a0f31..d51e800 100644
--- a/include/asm-i386/semaphore.h
+++ b/include/asm-i386/semaphore.h
@@ -107,7 +107,7 @@ static inline void down(struct semaphore
 		"call __down_failed\n\t"
 		"jmp 1b\n"
 		LOCK_SECTION_END
-		:"=m" (sem->count)
+		:"+m" (sem->count)
 		:
 		:"memory","ax");
 }
@@ -132,7 +132,7 @@ static inline int down_interruptible(str
 		"call __down_failed_interruptible\n\t"
 		"jmp 1b\n"
 		LOCK_SECTION_END
-		:"=a" (result), "=m" (sem->count)
+		:"=a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
@@ -157,7 +157,7 @@ static inline int down_trylock(struct se
 		"call __down_failed_trylock\n\t"
 		"jmp 1b\n"
 		LOCK_SECTION_END
-		:"=a" (result), "=m" (sem->count)
+		:"=a" (result), "+m" (sem->count)
 		:
 		:"memory");
 	return result;
@@ -182,7 +182,7 @@ static inline void up(struct semaphore *
 		"jmp 1b\n"
 		LOCK_SECTION_END
 		".subsection 0\n"
-		:"=m" (sem->count)
+		:"+m" (sem->count)
 		:
 		:"memory","ax");
 }
diff --git a/include/asm-i386/spinlock.h b/include/asm-i386/spinlock.h
index 87c40f8..d816c62 100644
--- a/include/asm-i386/spinlock.h
+++ b/include/asm-i386/spinlock.h
@@ -65,7 +65,7 @@ static inline void __raw_spin_lock(raw_s
 	alternative_smp(
 		__raw_spin_lock_string,
 		__raw_spin_lock_string_up,
-		"=m" (lock->slock) : : "memory");
+		"+m" (lock->slock) : : "memory");
 }
 
 /*
@@ -79,7 +79,7 @@ static inline void __raw_spin_lock_flags
 	alternative_smp(
 		__raw_spin_lock_string_flags,
 		__raw_spin_lock_string_up,
-		"=m" (lock->slock) : "r" (flags) : "memory");
+		"+m" (lock->slock) : "r" (flags) : "memory");
 }
 #endif
 
@@ -88,7 +88,7 @@ static inline int __raw_spin_trylock(raw
 	char oldval;
 	__asm__ __volatile__(
 		"xchgb %b0,%1"
-		:"=q" (oldval), "=m" (lock->slock)
+		:"=q" (oldval), "+m" (lock->slock)
 		:"0" (0) : "memory");
 	return oldval > 0;
 }
@@ -104,7 +104,7 @@ #if !defined(CONFIG_X86_OOSTORE) && !def
 
 #define __raw_spin_unlock_string \
 	"movb $1,%0" \
-		:"=m" (lock->slock) : : "memory"
+		:"+m" (lock->slock) : : "memory"
 
 
 static inline void __raw_spin_unlock(raw_spinlock_t *lock)
@@ -118,7 +118,7 @@ #else
 
 #define __raw_spin_unlock_string \
 	"xchgb %b0, %1" \
-		:"=q" (oldval), "=m" (lock->slock) \
+		:"=q" (oldval), "+m" (lock->slock) \
 		:"0" (oldval) : "memory"
 
 static inline void __raw_spin_unlock(raw_spinlock_t *lock)
@@ -199,13 +199,13 @@ static inline int __raw_write_trylock(ra
 
 static inline void __raw_read_unlock(raw_rwlock_t *rw)
 {
-	asm volatile(LOCK_PREFIX "incl %0" :"=m" (rw->lock) : : "memory");
+	asm volatile(LOCK_PREFIX "incl %0" :"+m" (rw->lock) : : "memory");
 }
 
 static inline void __raw_write_unlock(raw_rwlock_t *rw)
 {
 	asm volatile(LOCK_PREFIX "addl $" RW_LOCK_BIAS_STR ", %0"
-				 : "=m" (rw->lock) : : "memory");
+				 : "+m" (rw->lock) : : "memory");
 }
 
 #endif /* __ASM_SPINLOCK_H */

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 20:28                                 ` Chris Friesen
@ 2006-07-06 20:32                                   ` Linus Torvalds
  0 siblings, 0 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 20:32 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Mark Lord, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel



On Thu, 6 Jul 2006, Chris Friesen wrote:
> 
> As long as you're not talking to external devices, each cpu must be coherent
> with respect to itself, no?  It's allowed to execute out-of-order, but it
> needs to make sure that by doing so it doesn't cause changes that are visible
> to software.

Right. But then "volatile" won't really matter either, unless you have 
some _ordering_ constraint, in which case "volatile" is not enough unless 
you're guaranteed to be single-threaded.

In other words, again, "volatile" is almost always the wrong thing to 
have, and just makes you _think_ your code is correct.

		Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:37                               ` Mark Lord
@ 2006-07-06 20:28                                 ` Chris Friesen
  2006-07-06 20:32                                   ` Linus Torvalds
  0 siblings, 1 reply; 119+ messages in thread
From: Chris Friesen @ 2006-07-06 20:28 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linus Torvalds, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Mark Lord wrote:
> Chris Friesen wrote:
> 
>>
>> The "reordered" thing really only matters on SMP machines, no?
> 
> 
> Also (very much!) for device drivers.

Certainly...but I was coming at it from the perspective of userspace code.

As long as you're not talking to external devices, each cpu must be 
coherent with respect to itself, no?  It's allowed to execute 
out-of-order, but it needs to make sure that by doing so it doesn't 
cause changes that are visible to software.

Chris

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:58                               ` Linus Torvalds
@ 2006-07-06 20:27                                 ` Mark Lord
  2006-07-06 20:40                                 ` Chris Friesen
  1 sibling, 0 replies; 119+ messages in thread
From: Mark Lord @ 2006-07-06 20:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Friesen, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 6 Jul 2006, Chris Friesen wrote:
>> The C standard requires the use of volatile for signal handlers and setjmp.
> 
> Actually, the C standard requires "sigatomic_t".

That's sig_atomic_t, which simply guarantees that the data item
can be read or written indivisibly with respect to signal handlers.

The standard goes on to suggest using it in combination with volatile
as needed.

Cheers

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:32                           ` Jan Engelhardt
@ 2006-07-06 20:26                             ` Jeremy Fitzhardinge
  2006-07-06 20:55                               ` Jan Engelhardt
  0 siblings, 1 reply; 119+ messages in thread
From: Jeremy Fitzhardinge @ 2006-07-06 20:26 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Linus Torvalds, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel, arjan

Jan Engelhardt wrote:
> $ find linux-2.6.17 -type f -iname '*.[ch]' -print0 | xargs -0 grep 
> volatile | wc -l
> 13948
>
> Tough job.
You need to exclude "asm volatile", which is a completely different thing.

    J

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:33                             ` Chris Friesen
                                                 ` (3 preceding siblings ...)
  2006-07-06 19:42                               ` Jeff Garzik
@ 2006-07-06 19:58                               ` Linus Torvalds
  2006-07-06 20:27                                 ` Mark Lord
  2006-07-06 20:40                                 ` Chris Friesen
  4 siblings, 2 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 19:58 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Mark Lord, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel



On Thu, 6 Jul 2006, Chris Friesen wrote:
> 
> The C standard requires the use of volatile for signal handlers and setjmp.

Actually, the C standard requires "sigatomic_t".

> For userspace at least the whole discussion of "barriers" is sort of
> moot--there are no memory barriers defined in the C language, which makes it
> kind of hard to write portable code that uses them.

Any locking primitive BY DEFINITION has a barrier in it.

If it doesn't, it's not a locking primitive, it's a random sequence of 
code that does something pointless.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06  8:16                   ` [patch] spinlocks: remove 'volatile' Ingo Molnar
                                       ` (2 preceding siblings ...)
  2006-07-06 11:59                     ` linux-os (Dick Johnson)
@ 2006-07-06 19:56                     ` Linus Torvalds
  2006-07-06 20:34                       ` Linus Torvalds
  3 siblings, 1 reply; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 19:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, arjan



On Thu, 6 Jul 2006, Ingo Molnar wrote:
> 
> yeah. I tried this and it indeed slashed 42K off text size (0.2%):
> 
>  text            data    bss     dec             filename
>  20779489        6073834 3075176 29928499        vmlinux.volatile
>  20736884        6073834 3075176 29885894        vmlinux.non-volatile
> 
> i booted the resulting allyesconfig bzImage and everything seems to be 
> working fine. Find patch below.

Ok, the patch itself looks fine, but looking at some of the usage of the 
spinlocks, I worry that the 'volatile' may actually have hidden a real 
bug.

Here's what __raw_spin_lock() looks like on x86:

        alternative_smp(
                __raw_spin_lock_string,
                __raw_spin_lock_string_up,
                "=m" (lock->slock) : : "memory");

where the actual spinlock sequence itself isn't important.

What's important is what we tell the compiler, and the "=m" in particular.

The compiler will validly think that the spinlock assembly wil overwrite 
the "slock" location _WITHOUT_ caring about the old value. 

And that's dangerous, because it means that in theory a sequence like

	spin_lock_init(&lock);
	spin_lock(&lock);

might end up having the compiler optimize away the "unnecessary" 
initialization code because the compiler (correctly, as far as we've told 
it) thinks that the spin_lock() will overwrite the initial value.

And the "volatile" would have hidden that bug, for totally stupid and 
unrelated reasons..

Now, admittedly, the above kind of code is insane, and possibly doesn't 
exist, but still..

So I _think_ that we should change the "=m" to the much more correct "+m" 
at the same time (or before - it's really a bug-fix regardless) as 
removing the "volatile".

It would be interesting to hear whether that actually changes any code 
generation (hopefully it doesn't, but if it does, that in itself is 
interesting).

Btw, grepping for "=m" shows that semaphores may have the same bug, and in 
fact, we seem to have that "volatile" there too (perhaps, in fact, because 
somebody hit the bug and decided to fix it the simple way with "volatile"? 
Who knows, it might even have been me, back before I realized how badly 
gcc messes up volatiles)

Interestingly (or perhaps not), "atomic_add()" and friends (along with the 
local_t stuff that seems to largely have been copied from the atomic code) 
use a combination of "=m" and "m" in the assembler dst/src to avoid the 
bug. They too would probably be better off using "+m".

I think that's because the whole "+m" thing is fairly new (and not well 
documented either).

Appended is my previous test-program extended to show the difference 
between "=m" and "+m"..

For me, I get:

	horrid:
	        subl    $16, %esp
	        movl    $1, 12(%esp)
	        movl    12(%esp), %eax
	        movl    %eax, a1
	        movl    $1, b1
	#APP
	        # overwrite a1
	        # modify b1
	#NO_APP
	        addl    $16, %esp
	        ret

	notbad:
	        movl    $1, b2
	#APP
	        # overwrite a2
	        # modify b2
	#NO_APP
	        ret

which shows that "horrid" (with volatile) generates truly crapola code due 
to the volatile, but that the "overwrite a1" will not optimize away the 
initializer.

And "notbad" has good code, but exactly because it's good code, the 
compiler has also noticed that the "=m" means that the initialization of 
"a2" is unnecessary, and has thus promptly removed it.

(Removing the initializer is _correct_ for that particular source code - 
it's just that with the current x86 spinlock() macros, it would be 
disastrous, and shows that your "just remove volatile" patch is dangerous 
because of our incorrect inline assembly)

		Linus
---
struct duh {
	volatile int i;
};

void horrid(void)
{
	extern struct duh a1;
	extern struct duh b1;

	a1 = (struct duh) { 1 };
	b1.i = 1;
	asm("# overwrite %0":"=m" (a1.i));
	asm("# modify %0":"+m" (b1.i));
}

struct gaah {
	int i;
};

void notbad(void)
{
	extern struct gaah a2;
	extern struct gaah b2;

	a2 = (struct gaah) { 1 };
	b2.i = 1;
	asm("# overwrite %0":"=m" (a2.i));
	asm("# modify %0":"+m" (b2.i));
}

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:33                             ` Chris Friesen
                                                 ` (2 preceding siblings ...)
  2006-07-06 19:41                               ` Måns Rullgård
@ 2006-07-06 19:42                               ` Jeff Garzik
  2006-07-06 19:58                               ` Linus Torvalds
  4 siblings, 0 replies; 119+ messages in thread
From: Jeff Garzik @ 2006-07-06 19:42 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Linus Torvalds, Mark Lord, Arjan van de Ven,
	linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Chris Friesen wrote:
> The "reordered" thing really only matters on SMP machines, no?  In which 

No.

Pentiums have had out-of-order execution for a while now.  Started with 
the Pentium Pro, I think.

	Jeff



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:33                             ` Chris Friesen
  2006-07-06 19:37                               ` Mark Lord
  2006-07-06 19:38                               ` Arjan van de Ven
@ 2006-07-06 19:41                               ` Måns Rullgård
  2006-07-06 19:42                               ` Jeff Garzik
  2006-07-06 19:58                               ` Linus Torvalds
  4 siblings, 0 replies; 119+ messages in thread
From: Måns Rullgård @ 2006-07-06 19:41 UTC (permalink / raw)
  To: linux-kernel

"Chris Friesen" <cfriesen@nortel.com> writes:

> Linus Torvalds wrote:
>
>> On Thu, 6 Jul 2006, Mark Lord wrote:
>
>>> A volatile declaration may be used to describe an object corresponding
>>> to a memory-mapped input/output port or an object accessed by an
>>> aysnchronously interrupting function.  Actions on objects so declared
>>> shall not be "optimized out" by an implementation or reordered except
>>> as permitted by the rules for evaluating expressions.
>> Note that the "reordered" is totally pointless.
>> The _hardware_ will re-order accesses. Which is the whole
>> point. "volatile" is basically never sufficient in itself.
>
> The "reordered" thing really only matters on SMP machines, no?

No, each CPU does write combining and write merging all on its own.

-- 
Måns Rullgård
mru@inprovide.com


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:33                             ` Chris Friesen
  2006-07-06 19:37                               ` Mark Lord
@ 2006-07-06 19:38                               ` Arjan van de Ven
  2006-07-06 19:41                               ` Måns Rullgård
                                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-06 19:38 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Linus Torvalds, Mark Lord, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

On Thu, 2006-07-06 at 13:33 -0600, Chris Friesen wrote:
> Linus Torvalds wrote:
> 
> > On Thu, 6 Jul 2006, Mark Lord wrote:
> 
> >> A volatile declaration may be used to describe an object corresponding
> >> to a memory-mapped input/output port or an object accessed by an
> >> aysnchronously interrupting function.  Actions on objects so declared
> >> shall not be "optimized out" by an implementation or reordered except
> >> as permitted by the rules for evaluating expressions.
> > 
> > 
> > Note that the "reordered" is totally pointless.
> > 
> > The _hardware_ will re-order accesses. Which is the whole point. 
> > "volatile" is basically never sufficient in itself.
> 
> The "reordered" thing really only matters on SMP machines, no?  In which 
> case (for userspace) the locking mechanisms (mutexes, etc.) should do 
> The Right Thing to ensure visibility between cpus.
> 
> The C standard requires the use of volatile for signal handlers and setjmp.
> 
> For userspace at least the whole discussion of "barriers" is sort of 
> moot--there are no memory barriers defined in the C language, which 
> makes it kind of hard to write portable code that uses them.

You're falling into RBJ's trap. I did not say *MEMORY BARRIER*. While
for some uses of "volatile" that is the right substitute, for others it
is *optimization barrier* which matters. 



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:33                             ` Chris Friesen
@ 2006-07-06 19:37                               ` Mark Lord
  2006-07-06 20:28                                 ` Chris Friesen
  2006-07-06 19:38                               ` Arjan van de Ven
                                                 ` (3 subsequent siblings)
  4 siblings, 1 reply; 119+ messages in thread
From: Mark Lord @ 2006-07-06 19:37 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Linus Torvalds, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Chris Friesen wrote:
>
> The "reordered" thing really only matters on SMP machines, no?

Also (very much!) for device drivers.

Cheers

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 19:15                           ` Linus Torvalds
@ 2006-07-06 19:33                             ` Chris Friesen
  2006-07-06 19:37                               ` Mark Lord
                                                 ` (4 more replies)
  2006-07-08  8:40                             ` Avi Kivity
  1 sibling, 5 replies; 119+ messages in thread
From: Chris Friesen @ 2006-07-06 19:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Lord, Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel

Linus Torvalds wrote:

> On Thu, 6 Jul 2006, Mark Lord wrote:

>> A volatile declaration may be used to describe an object corresponding
>> to a memory-mapped input/output port or an object accessed by an
>> aysnchronously interrupting function.  Actions on objects so declared
>> shall not be "optimized out" by an implementation or reordered except
>> as permitted by the rules for evaluating expressions.
> 
> 
> Note that the "reordered" is totally pointless.
> 
> The _hardware_ will re-order accesses. Which is the whole point. 
> "volatile" is basically never sufficient in itself.

The "reordered" thing really only matters on SMP machines, no?  In which 
case (for userspace) the locking mechanisms (mutexes, etc.) should do 
The Right Thing to ensure visibility between cpus.

The C standard requires the use of volatile for signal handlers and setjmp.

For userspace at least the whole discussion of "barriers" is sort of 
moot--there are no memory barriers defined in the C language, which 
makes it kind of hard to write portable code that uses them.

Only a couple months ago Dave Miller mentioned setting variables 
modified in signal handlers as "volatile", and you didn't complain.  If 
fact you added that they should be of type "sigatomic_t".

(Pardon the long URL)

http://groups.google.com/group/linux.kernel/browse_frm/thread/18a59e3c9d8f6310/84881a7e53038b0e?lnk=st&q=sigatomic_t&rnum=8&hl=en#84881a7e53038b0e


Chris


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 16:13                         ` Linus Torvalds
  2006-07-06 17:04                           ` Jeff Garzik
  2006-07-06 17:52                           ` linux-os (Dick Johnson)
@ 2006-07-06 19:32                           ` Jan Engelhardt
  2006-07-06 20:26                             ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 119+ messages in thread
From: Jan Engelhardt @ 2006-07-06 19:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-os (Dick Johnson), Ingo Molnar, Andrew Morton, linux-kernel, arjan

>> Any other use of "volatile" is almost certainly a bug, or just useless. 
>
>That's not just a theoretical notion, btw. We had _tons_ of these kinds of 
>"volatile"s in the original old networking code. They were _all_ wrong. 
>Every single one.
>

$ find linux-2.6.17 -type f -iname '*.[ch]' -print0 | xargs -0 grep 
volatile | wc -l
13948

Tough job.


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 18:15                         ` Mark Lord
@ 2006-07-06 19:15                           ` Linus Torvalds
  2006-07-06 19:33                             ` Chris Friesen
  2006-07-08  8:40                             ` Avi Kivity
  0 siblings, 2 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 19:15 UTC (permalink / raw)
  To: Mark Lord
  Cc: Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Andrew Morton, linux-kernel



On Thu, 6 Jul 2006, Mark Lord wrote:
>
> I'm still browsing a copy here, but so far have only really found this:
> 
>  A volatile declaration may be used to describe an object corresponding
>  to a memory-mapped input/output port or an object accessed by an
>  aysnchronously interrupting function.  Actions on objects so declared
>  shall not be "optimized out" by an implementation or reordered except
>  as permitted by the rules for evaluating expressions.

Note that the "reordered" is totally pointless.

The _hardware_ will re-order accesses. Which is the whole point. 
"volatile" is basically never sufficient in itself.

So the definition of "volatile" literally made sense three or four decades 
ago. It's not sensible any more.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 17:52                           ` linux-os (Dick Johnson)
  2006-07-06 18:00                             ` Linus Torvalds
  2006-07-06 18:10                             ` Michael Buesch
@ 2006-07-06 18:16                             ` Chase Venters
  2006-07-07 18:16                             ` Krzysztof Halasa
  3 siblings, 0 replies; 119+ messages in thread
From: Chase Venters @ 2006-07-06 18:16 UTC (permalink / raw)
  To: linux-os \\(Dick Johnson\\)
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel, arjan

On Thu, 6 Jul 2006, linux-os \(Dick Johnson\) wrote:

>
> On Thu, 6 Jul 2006, Linus Torvalds wrote:
>
>>
>>
>> On Thu, 6 Jul 2006, Linus Torvalds wrote:
>>>
>>> Any other use of "volatile" is almost certainly a bug, or just useless.
>>
>> Side note: it's also totally possible that a volatiles _hides_ a bug, ie
>> removing the volatile ends up having bad effects, but that's because the
>> software itself isn't actually following the rules (or, more commonly, the
>> rules are broken, and somebody added "volatile" to hide the problem).
>>
>> That's not just a theoretical notion, btw. We had _tons_ of these kinds of
>> "volatile"s in the original old networking code. They were _all_ wrong.
>> Every single one.
>>
>> 			Linus
>>
>
> Linus, you may have been reading too many novels.
>
> If I have some code that does:
>
> extern int spinner;
>
> funct(){
>     while(spinner)
>         ;
>
> The 'C' compiler has no choice but to actually make that memory access
> and read the variable because the variable is in another module (a.k.a.
> file).
>
> However, if I have the same code, but the variable is visible during
> compile time, i.e.,
>
> int spinner=0;
>
> funct(){
>     while(spinner)
>         ;
>
> ... the compiler may eliminate that code altogether because it
> 'knows' that spinner is FALSE, having initialized it to zero
> itself.
>
> Since spinner is global in scope, somebody surely could have
> changed it before funct() was even called, but the current gcc
> 'C' compiler doesn't care and may optimize it away entirely. To
> prevent this, you must declare the variable volatile. To do
> otherwise is a bug.

No. The compiler can only optimize away that loop if spinner is provably 
const. Otherwise, _all_ non-const global variables would need to be 
declared 'volatile' because (as you imply) the compiler could forever assume their 
initial state has not changed.

Try it. Examine the -s. The loop is always present for me, even with -O3. 
Now, you'll notice that without a barrier, the value first loaded into the register
is never reloaded, making an infinite loop. But we know the compiler makes 
such optimizations, which is why we tell the compiler that the value 
should not be cached across the magic (barrier()) line.

> Reading between the lines of your text, you are trying to say
> that object 'spinner' should remain an integer, but any access
> should be cast, like:
>
> funct() {
>     while((volatile)spinner)
>         ;
>
> This is just a matter of style. It substitutes a number of casts
> for a simple declaration. That said, I think that the current
> implementation of 'volatile' is broken because the compiler
> seems to believe that the variable has moved! It recalculates
> the address of the variable as well as accessing its value.
> This is what makes the code generation problematical.
>

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 12:01                       ` Arjan van de Ven
  2006-07-06 12:29                         ` linux-os (Dick Johnson)
@ 2006-07-06 18:15                         ` Mark Lord
  2006-07-06 19:15                           ` Linus Torvalds
  1 sibling, 1 reply; 119+ messages in thread
From: Mark Lord @ 2006-07-06 18:15 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-os (Dick Johnson),
	Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

Arjan van de Ven wrote:
>
> this is not really what the C standard says.

..and now half of LKML scurries to find a copy of the ANSI C specs..

I'm still browsing a copy here, but so far have only really found this:

  A volatile declaration may be used to describe an object corresponding
  to a memory-mapped input/output port or an object accessed by an
  aysnchronously interrupting function.  Actions on objects so declared
  shall not be "optimized out" by an implementation or reordered except
  as permitted by the rules for evaluating expressions.

Still lots of document left to go through, so I suppose it might
contradict itself later on.

And of course this is only a language spec, not Real Life (tm).

Cheers


^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 17:52                           ` linux-os (Dick Johnson)
  2006-07-06 18:00                             ` Linus Torvalds
@ 2006-07-06 18:10                             ` Michael Buesch
  2006-07-06 18:16                             ` Chase Venters
  2006-07-07 18:16                             ` Krzysztof Halasa
  3 siblings, 0 replies; 119+ messages in thread
From: Michael Buesch @ 2006-07-06 18:10 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan, Linus Torvalds

On Thursday 06 July 2006 19:52, linux-os (Dick Johnson) wrote:
> int spinner=0;
> 
> funct(){
>      while(spinner)
		barrier();

-- 
Greetings Michael.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 17:52                           ` linux-os (Dick Johnson)
@ 2006-07-06 18:00                             ` Linus Torvalds
  2006-07-06 21:02                               ` J.A. Magallón
  2006-07-06 18:10                             ` Michael Buesch
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 18:00 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan



On Thu, 6 Jul 2006, linux-os (Dick Johnson) wrote:
> 
> Linus, you may have been reading too many novels.
> 
> If I have some code that does:
> 
> extern int spinner;
> 
> funct(){
>      while(spinner)
>          ;
> 
> The 'C' compiler has no choice but to actually make that memory access
> and read the variable because the variable is in another module (a.k.a.
> file).

You don't know how C works, do you?

You also have no idea of what out-of-order memory accesses do to OS code, 
right?

THE FACT IS, "volatile" IS USELESS, BADLY DEFINED, AND AN ALMOST 
COMPLETELY SURE SIGN OF BUGS.

Go on, do your own OS, and try to use "volatile" in it as the 
serialization abstraction. I personally will guarantee that you will fail. 
But hey, you can prove me wrong.

In the meantime, in Linux, "volatile" is considered a bug in any but the 
two special cases I already mentioned.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 16:13                         ` Linus Torvalds
  2006-07-06 17:04                           ` Jeff Garzik
@ 2006-07-06 17:52                           ` linux-os (Dick Johnson)
  2006-07-06 18:00                             ` Linus Torvalds
                                               ` (3 more replies)
  2006-07-06 19:32                           ` Jan Engelhardt
  2 siblings, 4 replies; 119+ messages in thread
From: linux-os (Dick Johnson) @ 2006-07-06 17:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan


On Thu, 6 Jul 2006, Linus Torvalds wrote:

>
>
> On Thu, 6 Jul 2006, Linus Torvalds wrote:
>>
>> Any other use of "volatile" is almost certainly a bug, or just useless.
>
> Side note: it's also totally possible that a volatiles _hides_ a bug, ie
> removing the volatile ends up having bad effects, but that's because the
> software itself isn't actually following the rules (or, more commonly, the
> rules are broken, and somebody added "volatile" to hide the problem).
>
> That's not just a theoretical notion, btw. We had _tons_ of these kinds of
> "volatile"s in the original old networking code. They were _all_ wrong.
> Every single one.
>
> 			Linus
>

Linus, you may have been reading too many novels.

If I have some code that does:

extern int spinner;

funct(){
     while(spinner)
         ;

The 'C' compiler has no choice but to actually make that memory access
and read the variable because the variable is in another module (a.k.a.
file).

However, if I have the same code, but the variable is visible during
compile time, i.e.,

int spinner=0;

funct(){
     while(spinner)
         ;

... the compiler may eliminate that code altogether because it
'knows' that spinner is FALSE, having initialized it to zero
itself.

Since spinner is global in scope, somebody surely could have
changed it before funct() was even called, but the current gcc
'C' compiler doesn't care and may optimize it away entirely. To
prevent this, you must declare the variable volatile. To do
otherwise is a bug.

Reading between the lines of your text, you are trying to say
that object 'spinner' should remain an integer, but any access
should be cast, like:

funct() {
     while((volatile)spinner)
         ;

This is just a matter of style. It substitutes a number of casts
for a simple declaration. That said, I think that the current
implementation of 'volatile' is broken because the compiler
seems to believe that the variable has moved! It recalculates
the address of the variable as well as accessing its value.
This is what makes the code generation problematical.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.88 BogoMips).
New book: http://www.AbominableFirebug.com/
_
\x1a\x04

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 16:13                         ` Linus Torvalds
@ 2006-07-06 17:04                           ` Jeff Garzik
  2006-07-06 17:52                           ` linux-os (Dick Johnson)
  2006-07-06 19:32                           ` Jan Engelhardt
  2 siblings, 0 replies; 119+ messages in thread
From: Jeff Garzik @ 2006-07-06 17:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-os (Dick Johnson), Ingo Molnar, Andrew Morton, linux-kernel, arjan

Linus Torvalds wrote:
> 
> On Thu, 6 Jul 2006, Linus Torvalds wrote:
>> Any other use of "volatile" is almost certainly a bug, or just useless. 
> 
> Side note: it's also totally possible that a volatiles _hides_ a bug, ie 
> removing the volatile ends up having bad effects, but that's because the 
> software itself isn't actually following the rules (or, more commonly, the 
> rules are broken, and somebody added "volatile" to hide the problem).
> 
> That's not just a theoretical notion, btw. We had _tons_ of these kinds of 
> "volatile"s in the original old networking code. They were _all_ wrong. 
> Every single one.

I see precisely what you describe in newly submitted network _drivers_, 
too.  People use volatile to cover up missing barriers; to attempt to 
cover up missing flushes (needing readl after a writel); to hide the 
fact that the driver sometimes uses writel() and sometimes just does a 
direct de-ref into MMIO space.

To my view, seeing "volatile" in code is often a "I was too lazy to 
debug the code" or "I was too lazy to make my code portable" situation.

	Jeff




^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 12:29                         ` linux-os (Dick Johnson)
  2006-07-06 12:39                           ` Arjan van de Ven
@ 2006-07-06 16:40                           ` Nick Piggin
  2006-07-06 23:19                           ` David Schwartz
  2 siblings, 0 replies; 119+ messages in thread
From: Nick Piggin @ 2006-07-06 16:40 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Arjan van de Ven, Ingo Molnar, Linus Torvalds, Andrew Morton,
	linux-kernel

linux-os (Dick Johnson) wrote:

>  	http://en.wikipedia.org/wiki/Memory_barrier
> 
> This is used to prevent out-of-order execution, not at all what is
> necessary.

I don't think memory barriers prevent out of order execution,
just out of order loads and stores (which could happen on CPUs
that do in-order execution).

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 16:07                       ` Linus Torvalds
@ 2006-07-06 16:13                         ` Linus Torvalds
  2006-07-06 17:04                           ` Jeff Garzik
                                             ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 16:13 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan



On Thu, 6 Jul 2006, Linus Torvalds wrote:
> 
> Any other use of "volatile" is almost certainly a bug, or just useless. 

Side note: it's also totally possible that a volatiles _hides_ a bug, ie 
removing the volatile ends up having bad effects, but that's because the 
software itself isn't actually following the rules (or, more commonly, the 
rules are broken, and somebody added "volatile" to hide the problem).

That's not just a theoretical notion, btw. We had _tons_ of these kinds of 
"volatile"s in the original old networking code. They were _all_ wrong. 
Every single one.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 11:59                     ` linux-os (Dick Johnson)
  2006-07-06 12:01                       ` Arjan van de Ven
@ 2006-07-06 16:07                       ` Linus Torvalds
  2006-07-06 16:13                         ` Linus Torvalds
  1 sibling, 1 reply; 119+ messages in thread
From: Linus Torvalds @ 2006-07-06 16:07 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Ingo Molnar, Andrew Morton, linux-kernel, arjan



On Thu, 6 Jul 2006, linux-os (Dick Johnson) wrote:
> 
> Then GCC must be fixed. The keyword volatile is correct. It should
> force the compiler to read the variable every time it's used.

No.

"volatile" really _is_ misdesigned. The semantics of it are so unclear as 
to be totally useless. The only thing "volatile" can ever do is generate 
worse code, WITH NO UPSIDES.

Historically (and from the standpoint of the C standard), the definition 
of "volatile" is that any access is "visible" in the machine, and it 
really kind of makes sense for hardware accesses, except these days 
hardware accesses have other rules that are _not_ covered by "volatile", 
so you can't actually use them for that.

And for accesses that have some software rules (ie not IO devices etc), 
the rules for "volatile" are too vague to be useful. 

So if you actually have rules about how to access a particular piece of 
memory, just make those rules _explicit_. Use the real rules. Not 
volatile, because volatile will always do the wrong thing.

Also, more importantly, "volatile" is on the wrong _part_ of the whole 
system. In C, it's "data" that is volatile, but that is insane. Data 
isn't volatile - _accesses_ are volatile. So it may make sense to say 
"make this particular _access_ be careful", but not "make all accesses to 
this data use some random strategy".

So the only thing "volatile" is potentially useful for is:

 - actual accessor functions can use it in a _cast_ to make one particular 
   access follow the rules of "don't cache this one dereference". That is 
   useful as part of a _bigger_ set of rules about that access (ie it 
   might be the internal implementation of a "readb()", for example).

 - for "random number generation" data locations, where you literally 
   don't _have_ any rules except "it's a random number". The only really 
   valid example of this is the "jiffy" timer tick.

Any other use of "volatile" is almost certainly a bug, or just useless. 

It's a bug if the volatile means that you don't follow the proper protocol 
for accessing the data, and it's useless (and generally generates worse 
code) if you already do.

So just say NO! to volatile except under the above circumstances.

			Linus

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 13:39                             ` J.A. Magallón
  2006-07-06 13:43                               ` Arjan van de Ven
  2006-07-06 14:05                               ` Chase Venters
@ 2006-07-06 14:26                               ` Andreas Schwab
  2 siblings, 0 replies; 119+ messages in thread
From: Andreas Schwab @ 2006-07-06 14:26 UTC (permalink / raw)
  To: J.A. Magallón
  Cc: Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

"J.A. Magallón" <jamagallon@ono.com> writes:

> The barrier prevents the compiler of translating this to:
>
> for (int i=0; i<10; i++)
> {
> 	b[i] = *inb;
> 	a[i] = *ina;
> }
>
> or even to:
>
> for (int i=0; i<10; i++)
> 	a[i] = *ina;
> for (int i=0; i<10; i++)
> 	b[i] = *inb;
>
> but does not prevent it to do this:
>
> register int tmp_a = *ina;
> register int tmp_b = *inb;
>
> for (int i=0; i<10; i++)
> {
> 	a[i] = tmp_a;
> 	b[i] = tmp_b;
> }
>
> because nor 'ina' nor 'inb' change under what the compiler sees inside
> the loop. 'volatile' prevents the compiler of do a high level cache of
> *ina or *inb.

Actually the compiler may not do any of these transformations because *ina
or *inb may alias any of a[i] or b[i].

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 13:39                             ` J.A. Magallón
  2006-07-06 13:43                               ` Arjan van de Ven
@ 2006-07-06 14:05                               ` Chase Venters
  2006-07-06 14:26                               ` Andreas Schwab
  2 siblings, 0 replies; 119+ messages in thread
From: Chase Venters @ 2006-07-06 14:05 UTC (permalink / raw)
  To: J.A. Magallón
  Cc: Arjan van de Ven, linux-os (Dick Johnson),
	Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

On Thursday 06 July 2006 08:39, J.A. Magallón wrote:
>
> // Read 10 samples from 2 A/D converters.
>
> int*	ina;
> int	a[10];
> int*	inb;
> int	b[10];
>
> for (int i=0; i<10; i++)
> {
> 	a[i] = *ina;
> 	barrier();
> 	b[i] = *inb;
> }
>
> The barrier prevents the compiler of translating this to:
>
> for (int i=0; i<10; i++)
> {
> 	b[i] = *inb;
> 	a[i] = *ina;
> }
>
> or even to:
>
> for (int i=0; i<10; i++)
> 	a[i] = *ina;
> for (int i=0; i<10; i++)
> 	b[i] = *inb;
>
> but does not prevent it to do this:
>
> register int tmp_a = *ina;
> register int tmp_b = *inb;
>
> for (int i=0; i<10; i++)
> {
> 	a[i] = tmp_a;
> 	b[i] = tmp_b;
> }
>
> because nor 'ina' nor 'inb' change under what the compiler sees inside
> the loop. 'volatile' prevents the compiler of do a high level cache of
> *ina or *inb.
>

Check the GCC documentation:

http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Extended-Asm.html
 
> If your assembler instructions access memory in an unpredictable fashion,
> add `memory' to the list of clobbered registers. This will cause GCC to not
> keep memory values cached in registers across the assembler instruction and
> not optimize stores or loads to that memory. You will also want to add the
> volatile keyword if the memory affected is not listed in the inputs or
> outputs of the asm, as the `memory' clobber does not count as a side-effect
> of the asm. If you know how large the accessed memory is, you can add it as
> input or output but if this is not known, you should add `memory'. As an
> example, if you access ten bytes of a string, you can use a memory input
> like:         

The reference to the volatile keyword here is of course talking about asm 
volatile usage to keep the compiler from optimizing out the seemingly 
pointless assembly (not usage on a variable).

Thanks,
Chase

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 13:39                             ` J.A. Magallón
@ 2006-07-06 13:43                               ` Arjan van de Ven
  2006-07-06 14:05                               ` Chase Venters
  2006-07-06 14:26                               ` Andreas Schwab
  2 siblings, 0 replies; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-06 13:43 UTC (permalink / raw)
  To: J.A. Magallón
  Cc: linux-os (Dick Johnson),
	Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel


> > I did not talk about memory barriers. In fact, barrier() is NOT a memory
> > barrier. It's a compiler optimization barrier!
> > 
> 
> // Read 10 samples from 2 A/D converters.
> 
> int*	ina;
> int	a[10];
> int*	inb;
> int	b[10];
> 
> for (int i=0; i<10; i++)
> {
> 	a[i] = *ina;
> 	barrier();
> 	b[i] = *inb;
> }
> 
> The barrier prevents the compiler of translating this to:
> 
> for (int i=0; i<10; i++)
> {
> 	b[i] = *inb;
> 	a[i] = *ina;
> }
> 
> or even to:
> 
> for (int i=0; i<10; i++)
> 	a[i] = *ina;
> for (int i=0; i<10; i++)
> 	b[i] = *inb;
> 
> but does not prevent it to do this:

yes it does. It's a full optimization barrier; the compiler assumes all
register and memory content has changed from before the barrier(), and
it will start "fresh".



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 12:39                           ` Arjan van de Ven
@ 2006-07-06 13:39                             ` J.A. Magallón
  2006-07-06 13:43                               ` Arjan van de Ven
                                                 ` (2 more replies)
  0 siblings, 3 replies; 119+ messages in thread
From: J.A. Magallón @ 2006-07-06 13:39 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-os (Dick Johnson),
	Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

On Thu, 06 Jul 2006 14:39:43 +0200, Arjan van de Ven <arjan@infradead.org> wrote:

> On Thu, 2006-07-06 at 08:29 -0400, linux-os (Dick Johnson) wrote:
> > On Thu, 6 Jul 2006, Arjan van de Ven wrote:
> > 
> > > On Thu, 2006-07-06 at 07:59 -0400, linux-os (Dick Johnson) wrote:
> > >> On Thu, 6 Jul 2006, Ingo Molnar wrote:
> > >>
> > >>>
> > >>> * Linus Torvalds <torvalds@osdl.org> wrote:
> > >>>
> > >>>> I wonder if we should remove the "volatile". There really isn't
> > >>>> anything _good_ that gcc can do with it, but we've seen gcc code
> > >>>> generation do stupid things before just because "volatile" seems to
> > >>>> just disable even proper normal working.
> > >>
> > >> Then GCC must be fixed. The keyword volatile is correct. It should
> > >> force the compiler to read the variable every time it's used.
> > >
> > > this is not really what the C standard says.
> > >
> > >
> > >
> > >> This is not pointless. If GCC generates bad code, tell the
> > >> GCC people. The volatile keyword is essential.
> > >
> > > no the "volatile" semantics are vague, trecherous and evil. It's a LOT
> > > better to insert the well defined "barrier()" in the right places.
> > 
> > Look at:
> > 
> >  	http://en.wikipedia.org/wiki/Volatile_variable
> > 
> > This is just what is needed to prevent the compiler from making non working
> > code during optimization.
> 
> and an entry level document at wikipedia is more important than the C
> standard ;)
> 
> > 
> > Also look at:
> > 
> >  	http://en.wikipedia.org/wiki/Memory_barrier
> > 
> > This is used to prevent out-of-order execution, not at all what is
> > necessary.
> 
> I did not talk about memory barriers. In fact, barrier() is NOT a memory
> barrier. It's a compiler optimization barrier!
> 

// Read 10 samples from 2 A/D converters.

int*	ina;
int	a[10];
int*	inb;
int	b[10];

for (int i=0; i<10; i++)
{
	a[i] = *ina;
	barrier();
	b[i] = *inb;
}

The barrier prevents the compiler of translating this to:

for (int i=0; i<10; i++)
{
	b[i] = *inb;
	a[i] = *ina;
}

or even to:

for (int i=0; i<10; i++)
	a[i] = *ina;
for (int i=0; i<10; i++)
	b[i] = *inb;

but does not prevent it to do this:

register int tmp_a = *ina;
register int tmp_b = *inb;

for (int i=0; i<10; i++)
{
	a[i] = tmp_a;
	b[i] = tmp_b;
}

because nor 'ina' nor 'inb' change under what the compiler sees inside
the loop. 'volatile' prevents the compiler of do a high level cache of
*ina or *inb.

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.0 (Cooker) for i586
Linux 2.6.17-jam01 (gcc 4.1.1 20060518 (prerelease)) #2 SMP PREEMPT Wed

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 12:29                         ` linux-os (Dick Johnson)
@ 2006-07-06 12:39                           ` Arjan van de Ven
  2006-07-06 13:39                             ` J.A. Magallón
  2006-07-06 16:40                           ` Nick Piggin
  2006-07-06 23:19                           ` David Schwartz
  2 siblings, 1 reply; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-06 12:39 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

On Thu, 2006-07-06 at 08:29 -0400, linux-os (Dick Johnson) wrote:
> On Thu, 6 Jul 2006, Arjan van de Ven wrote:
> 
> > On Thu, 2006-07-06 at 07:59 -0400, linux-os (Dick Johnson) wrote:
> >> On Thu, 6 Jul 2006, Ingo Molnar wrote:
> >>
> >>>
> >>> * Linus Torvalds <torvalds@osdl.org> wrote:
> >>>
> >>>> I wonder if we should remove the "volatile". There really isn't
> >>>> anything _good_ that gcc can do with it, but we've seen gcc code
> >>>> generation do stupid things before just because "volatile" seems to
> >>>> just disable even proper normal working.
> >>
> >> Then GCC must be fixed. The keyword volatile is correct. It should
> >> force the compiler to read the variable every time it's used.
> >
> > this is not really what the C standard says.
> >
> >
> >
> >> This is not pointless. If GCC generates bad code, tell the
> >> GCC people. The volatile keyword is essential.
> >
> > no the "volatile" semantics are vague, trecherous and evil. It's a LOT
> > better to insert the well defined "barrier()" in the right places.
> 
> Look at:
> 
>  	http://en.wikipedia.org/wiki/Volatile_variable
> 
> This is just what is needed to prevent the compiler from making non working
> code during optimization.

and an entry level document at wikipedia is more important than the C
standard ;)

> 
> Also look at:
> 
>  	http://en.wikipedia.org/wiki/Memory_barrier
> 
> This is used to prevent out-of-order execution, not at all what is
> necessary.

I did not talk about memory barriers. In fact, barrier() is NOT a memory
barrier. It's a compiler optimization barrier!



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 12:01                       ` Arjan van de Ven
@ 2006-07-06 12:29                         ` linux-os (Dick Johnson)
  2006-07-06 12:39                           ` Arjan van de Ven
                                             ` (2 more replies)
  2006-07-06 18:15                         ` Mark Lord
  1 sibling, 3 replies; 119+ messages in thread
From: linux-os (Dick Johnson) @ 2006-07-06 12:29 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel


On Thu, 6 Jul 2006, Arjan van de Ven wrote:

> On Thu, 2006-07-06 at 07:59 -0400, linux-os (Dick Johnson) wrote:
>> On Thu, 6 Jul 2006, Ingo Molnar wrote:
>>
>>>
>>> * Linus Torvalds <torvalds@osdl.org> wrote:
>>>
>>>> I wonder if we should remove the "volatile". There really isn't
>>>> anything _good_ that gcc can do with it, but we've seen gcc code
>>>> generation do stupid things before just because "volatile" seems to
>>>> just disable even proper normal working.
>>
>> Then GCC must be fixed. The keyword volatile is correct. It should
>> force the compiler to read the variable every time it's used.
>
> this is not really what the C standard says.
>
>
>
>> This is not pointless. If GCC generates bad code, tell the
>> GCC people. The volatile keyword is essential.
>
> no the "volatile" semantics are vague, trecherous and evil. It's a LOT
> better to insert the well defined "barrier()" in the right places.

Look at:

 	http://en.wikipedia.org/wiki/Volatile_variable

This is just what is needed to prevent the compiler from making non working
code during optimization.

Also look at:

 	http://en.wikipedia.org/wiki/Memory_barrier

This is used to prevent out-of-order execution, not at all what is
necessary.

Also, just because it 'works' means nothing. When SMP processors
came about, we had many drivers with no spin-locks at all. Sometimes
some files had a byte or two changed. That was the only obvious
problem. If this had continued, the entire file system would be
trashed, but fortunately we learned about spin-locks and the
volatile keyword. You must not destroy the spin locks by removing
the volatile keyword.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.88 BogoMips).
New book: http://www.AbominableFirebug.com/
_
\x1a\x04

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06 11:59                     ` linux-os (Dick Johnson)
@ 2006-07-06 12:01                       ` Arjan van de Ven
  2006-07-06 12:29                         ` linux-os (Dick Johnson)
  2006-07-06 18:15                         ` Mark Lord
  2006-07-06 16:07                       ` Linus Torvalds
  1 sibling, 2 replies; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-06 12:01 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

On Thu, 2006-07-06 at 07:59 -0400, linux-os (Dick Johnson) wrote:
> On Thu, 6 Jul 2006, Ingo Molnar wrote:
> 
> >
> > * Linus Torvalds <torvalds@osdl.org> wrote:
> >
> >> I wonder if we should remove the "volatile". There really isn't
> >> anything _good_ that gcc can do with it, but we've seen gcc code
> >> generation do stupid things before just because "volatile" seems to
> >> just disable even proper normal working.
> 
> Then GCC must be fixed. The keyword volatile is correct. It should
> force the compiler to read the variable every time it's used.

this is not really what the C standard says.



> This is not pointless. If GCC generates bad code, tell the
> GCC people. The volatile keyword is essential.

no the "volatile" semantics are vague, trecherous and evil. It's a LOT
better to insert the well defined "barrier()" in the right places.



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06  8:16                   ` [patch] spinlocks: remove 'volatile' Ingo Molnar
  2006-07-06  8:23                     ` Ingo Molnar
  2006-07-06  9:27                     ` Heiko Carstens
@ 2006-07-06 11:59                     ` linux-os (Dick Johnson)
  2006-07-06 12:01                       ` Arjan van de Ven
  2006-07-06 16:07                       ` Linus Torvalds
  2006-07-06 19:56                     ` Linus Torvalds
  3 siblings, 2 replies; 119+ messages in thread
From: linux-os (Dick Johnson) @ 2006-07-06 11:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Andrew Morton, linux-kernel, arjan


On Thu, 6 Jul 2006, Ingo Molnar wrote:

>
> * Linus Torvalds <torvalds@osdl.org> wrote:
>
>> I wonder if we should remove the "volatile". There really isn't
>> anything _good_ that gcc can do with it, but we've seen gcc code
>> generation do stupid things before just because "volatile" seems to
>> just disable even proper normal working.

Then GCC must be fixed. The keyword volatile is correct. It should
force the compiler to read the variable every time it's used.

>
> yeah. I tried this and it indeed slashed 42K off text size (0.2%):
>
> text            data    bss     dec             filename
> 20779489        6073834 3075176 29928499        vmlinux.volatile
> 20736884        6073834 3075176 29885894        vmlinux.non-volatile
>
> i booted the resulting allyesconfig bzImage and everything seems to be
> working fine. Find patch below.
>
> 	Ingo
>
> ------------------>
> Subject: spinlocks: remove 'volatile'
> From: Ingo Molnar <mingo@elte.hu>
>
> remove 'volatile' from the spinlock types, it causes gcc to
> generate really bad code. (and it's pointless anyway)
>

This is not pointless. If GCC generates bad code, tell the
GCC people. The volatile keyword is essential.


> this reduces the non-debug SMP kernel's size by 0.2% (!).
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> include/asm-i386/spinlock_types.h   |    4 ++--
> include/asm-x86_64/spinlock_types.h |    4 ++--
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> Index: linux/include/asm-i386/spinlock_types.h
> ===================================================================
> --- linux.orig/include/asm-i386/spinlock_types.h
> +++ linux/include/asm-i386/spinlock_types.h
> @@ -6,13 +6,13 @@
> #endif
>
> typedef struct {
> -	volatile unsigned int slock;
> +	unsigned int slock;
> } raw_spinlock_t;
>
> #define __RAW_SPIN_LOCK_UNLOCKED	{ 1 }
>
> typedef struct {
> -	volatile unsigned int lock;
> +	unsigned int lock;
> } raw_rwlock_t;
>
> #define __RAW_RW_LOCK_UNLOCKED		{ RW_LOCK_BIAS }
> Index: linux/include/asm-x86_64/spinlock_types.h
> ===================================================================
> --- linux.orig/include/asm-x86_64/spinlock_types.h
> +++ linux/include/asm-x86_64/spinlock_types.h
> @@ -6,13 +6,13 @@
> #endif
>
> typedef struct {
> -	volatile unsigned int slock;
> +	unsigned int slock;
> } raw_spinlock_t;
>
> #define __RAW_SPIN_LOCK_UNLOCKED	{ 1 }
>
> typedef struct {
> -	volatile unsigned int lock;
> +	unsigned int lock;
> } raw_rwlock_t;
>
> #define __RAW_RW_LOCK_UNLOCKED		{ RW_LOCK_BIAS }
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.88 BogoMips).
New book: http://www.AbominableFirebug.com/
_
\x1a\x04

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06  9:27                     ` Heiko Carstens
@ 2006-07-06  9:34                       ` Arjan van de Ven
  0 siblings, 0 replies; 119+ messages in thread
From: Arjan van de Ven @ 2006-07-06  9:34 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel


> 
> Shouldn't the __raw_read_can_lock and __raw_write_can_lock macros be changed too, just
> to make sure the value gets read every single time if it's used in a loop?
> Just like the __raw_spin_is_locked already has a (volatile signed char * cast)?
> -

it shouldn't get a volatile case imo, just a barrier().
A barrier() at least has well defined semantics, unlike volatile...



^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06  8:16                   ` [patch] spinlocks: remove 'volatile' Ingo Molnar
  2006-07-06  8:23                     ` Ingo Molnar
@ 2006-07-06  9:27                     ` Heiko Carstens
  2006-07-06  9:34                       ` Arjan van de Ven
  2006-07-06 11:59                     ` linux-os (Dick Johnson)
  2006-07-06 19:56                     ` Linus Torvalds
  3 siblings, 1 reply; 119+ messages in thread
From: Heiko Carstens @ 2006-07-06  9:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, Andrew Morton, linux-kernel, arjan

> Subject: spinlocks: remove 'volatile'
> From: Ingo Molnar <mingo@elte.hu>
> 
> remove 'volatile' from the spinlock types, it causes gcc to
> generate really bad code. (and it's pointless anyway)
> 
> this reduces the non-debug SMP kernel's size by 0.2% (!).
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  include/asm-i386/spinlock_types.h   |    4 ++--
>  include/asm-x86_64/spinlock_types.h |    4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux/include/asm-i386/spinlock_types.h
> ===================================================================
> --- linux.orig/include/asm-i386/spinlock_types.h
> +++ linux/include/asm-i386/spinlock_types.h
> @@ -6,13 +6,13 @@
>  #endif
>  
>  typedef struct {
> -	volatile unsigned int slock;
> +	unsigned int slock;
>  } raw_spinlock_t;
>  
>  #define __RAW_SPIN_LOCK_UNLOCKED	{ 1 }
>  
>  typedef struct {
> -	volatile unsigned int lock;
> +	unsigned int lock;
>  } raw_rwlock_t;

Shouldn't the __raw_read_can_lock and __raw_write_can_lock macros be changed too, just
to make sure the value gets read every single time if it's used in a loop?
Just like the __raw_spin_is_locked already has a (volatile signed char * cast)?

^ permalink raw reply	[flat|nested] 119+ messages in thread

* Re: [patch] spinlocks: remove 'volatile'
  2006-07-06  8:16                   ` [patch] spinlocks: remove 'volatile' Ingo Molnar
@ 2006-07-06  8:23                     ` Ingo Molnar
  2006-07-06  9:27                     ` Heiko Carstens
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 119+ messages in thread
From: Ingo Molnar @ 2006-07-06  8:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, arjan


* Ingo Molnar <mingo@elte.hu> wrote:

> * Linus Torvalds <torvalds@osdl.org> wrote:
> 
> > I wonder if we should remove the "volatile". There really isn't 
> > anything _good_ that gcc can do with it, but we've seen gcc code 
> > generation do stupid things before just because "volatile" seems to 
> > just disable even proper normal working.
> 
> yeah. I tried this and it indeed slashed 42K off text size (0.2%):
> 
>  text            data    bss     dec             filename
>  20779489        6073834 3075176 29928499        vmlinux.volatile
>  20736884        6073834 3075176 29885894        vmlinux.non-volatile
> 
> i booted the resulting allyesconfig bzImage and everything seems to be 
> working fine. Find patch below.

btw., this effect accounted for roughly half of the per-callsite win of 
the wait.h uninlining patch. That still leaves 18 bytes of per-callsite 
win - i'll send that patch next.

	Ingo

^ permalink raw reply	[flat|nested] 119+ messages in thread

* [patch] spinlocks: remove 'volatile'
  2006-07-05 23:11                 ` Linus Torvalds
@ 2006-07-06  8:16                   ` Ingo Molnar
  2006-07-06  8:23                     ` Ingo Molnar
                                       ` (3 more replies)
  0 siblings, 4 replies; 119+ messages in thread
From: Ingo Molnar @ 2006-07-06  8:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, arjan


* Linus Torvalds <torvalds@osdl.org> wrote:

> I wonder if we should remove the "volatile". There really isn't 
> anything _good_ that gcc can do with it, but we've seen gcc code 
> generation do stupid things before just because "volatile" seems to 
> just disable even proper normal working.

yeah. I tried this and it indeed slashed 42K off text size (0.2%):

 text            data    bss     dec             filename
 20779489        6073834 3075176 29928499        vmlinux.volatile
 20736884        6073834 3075176 29885894        vmlinux.non-volatile

i booted the resulting allyesconfig bzImage and everything seems to be 
working fine. Find patch below.

	Ingo

------------------>
Subject: spinlocks: remove 'volatile'
From: Ingo Molnar <mingo@elte.hu>

remove 'volatile' from the spinlock types, it causes gcc to
generate really bad code. (and it's pointless anyway)

this reduces the non-debug SMP kernel's size by 0.2% (!).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/asm-i386/spinlock_types.h   |    4 ++--
 include/asm-x86_64/spinlock_types.h |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux/include/asm-i386/spinlock_types.h
===================================================================
--- linux.orig/include/asm-i386/spinlock_types.h
+++ linux/include/asm-i386/spinlock_types.h
@@ -6,13 +6,13 @@
 #endif
 
 typedef struct {
-	volatile unsigned int slock;
+	unsigned int slock;
 } raw_spinlock_t;
 
 #define __RAW_SPIN_LOCK_UNLOCKED	{ 1 }
 
 typedef struct {
-	volatile unsigned int lock;
+	unsigned int lock;
 } raw_rwlock_t;
 
 #define __RAW_RW_LOCK_UNLOCKED		{ RW_LOCK_BIAS }
Index: linux/include/asm-x86_64/spinlock_types.h
===================================================================
--- linux.orig/include/asm-x86_64/spinlock_types.h
+++ linux/include/asm-x86_64/spinlock_types.h
@@ -6,13 +6,13 @@
 #endif
 
 typedef struct {
-	volatile unsigned int slock;
+	unsigned int slock;
 } raw_spinlock_t;
 
 #define __RAW_SPIN_LOCK_UNLOCKED	{ 1 }
 
 typedef struct {
-	volatile unsigned int lock;
+	unsigned int lock;
 } raw_rwlock_t;
 
 #define __RAW_RW_LOCK_UNLOCKED		{ RW_LOCK_BIAS }

^ permalink raw reply	[flat|nested] 119+ messages in thread

end of thread, other threads:[~2006-07-14  3:31 UTC | newest]

Thread overview: 119+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-07-07 10:21 [patch] spinlocks: remove 'volatile' Chuck Ebbert
2006-07-07 17:09 ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2006-07-08  6:12 trajce nedev
2006-07-08  6:19 ` Chase Venters
2006-07-08  6:45   ` trajce nedev
2006-07-08  6:58     ` Arjan van de Ven
2006-07-08  7:02     ` Vadim Lobanov
2006-07-08 13:46     ` Chase Venters
2006-07-09  4:39       ` Benjamin Herrenschmidt
2006-07-14  3:30       ` Steven Rostedt
2006-07-08 18:23 ` Linus Torvalds
2006-07-08  3:54 Albert Cahalan
2006-07-08  5:25 ` Linus Torvalds
2006-07-08  6:39   ` Nick Piggin
2006-07-08 18:53     ` Linus Torvalds
2006-07-08  9:45 ` Joe Korty
2006-07-08  9:52   ` Arjan van de Ven
2006-07-08  9:59   ` David Schwartz
2006-07-08 10:24   ` Thomas Gleixner
2006-07-08 15:49     ` Albert Cahalan
2006-07-08 18:31       ` Thomas Gleixner
2006-07-08 19:33         ` Albert Cahalan
2006-07-08 20:11           ` Linus Torvalds
2006-07-09 21:10             ` Pavel Machek
2006-07-09 22:18               ` Linus Torvalds
2006-07-10 16:25               ` marty fouts
2006-07-08 23:10           ` David Schwartz
2006-07-08 13:57   ` Andi Kleen
2006-07-08 19:13     ` Linus Torvalds
2006-07-08 10:45 ` Krzysztof Halasa
2006-07-05 11:46 [patch] uninline init_waitqueue_*() functions Ingo Molnar
2006-07-05 17:10 ` Andrew Morton
2006-07-05 19:35   ` Ingo Molnar
2006-07-05 20:18     ` Andrew Morton
2006-07-05 20:36       ` Linus Torvalds
2006-07-05 20:47         ` Ingo Molnar
2006-07-05 21:21           ` Linus Torvalds
2006-07-05 21:45             ` Ingo Molnar
2006-07-05 22:09               ` Linus Torvalds
2006-07-05 23:11                 ` Linus Torvalds
2006-07-06  8:16                   ` [patch] spinlocks: remove 'volatile' Ingo Molnar
2006-07-06  8:23                     ` Ingo Molnar
2006-07-06  9:27                     ` Heiko Carstens
2006-07-06  9:34                       ` Arjan van de Ven
2006-07-06 11:59                     ` linux-os (Dick Johnson)
2006-07-06 12:01                       ` Arjan van de Ven
2006-07-06 12:29                         ` linux-os (Dick Johnson)
2006-07-06 12:39                           ` Arjan van de Ven
2006-07-06 13:39                             ` J.A. Magallón
2006-07-06 13:43                               ` Arjan van de Ven
2006-07-06 14:05                               ` Chase Venters
2006-07-06 14:26                               ` Andreas Schwab
2006-07-06 16:40                           ` Nick Piggin
2006-07-06 23:19                           ` David Schwartz
2006-07-06 18:15                         ` Mark Lord
2006-07-06 19:15                           ` Linus Torvalds
2006-07-06 19:33                             ` Chris Friesen
2006-07-06 19:37                               ` Mark Lord
2006-07-06 20:28                                 ` Chris Friesen
2006-07-06 20:32                                   ` Linus Torvalds
2006-07-06 19:38                               ` Arjan van de Ven
2006-07-06 19:41                               ` Måns Rullgård
2006-07-06 19:42                               ` Jeff Garzik
2006-07-06 19:58                               ` Linus Torvalds
2006-07-06 20:27                                 ` Mark Lord
2006-07-06 20:40                                 ` Chris Friesen
2006-07-06 21:00                                   ` Linus Torvalds
2006-07-08  8:40                             ` Avi Kivity
2006-07-08  8:51                               ` Arjan van de Ven
2006-07-08  9:20                                 ` Avi Kivity
2006-07-08  9:51                                   ` Arjan van de Ven
2006-07-08 10:18                                     ` Avi Kivity
2006-07-08 10:28                                       ` Thomas Gleixner
2006-07-09  4:19                                       ` Benjamin Herrenschmidt
2006-07-09 12:47                                         ` Avi Kivity
2006-07-09 19:16                                           ` David Schwartz
2006-07-09 19:51                                             ` Theodore Tso
2006-07-06 16:07                       ` Linus Torvalds
2006-07-06 16:13                         ` Linus Torvalds
2006-07-06 17:04                           ` Jeff Garzik
2006-07-06 17:52                           ` linux-os (Dick Johnson)
2006-07-06 18:00                             ` Linus Torvalds
2006-07-06 21:02                               ` J.A. Magallón
2006-07-06 21:12                                 ` Linus Torvalds
2006-07-06 18:10                             ` Michael Buesch
2006-07-06 18:16                             ` Chase Venters
2006-07-07 18:16                             ` Krzysztof Halasa
2006-07-07 19:51                               ` linux-os (Dick Johnson)
2006-07-07 20:25                                 ` Linus Torvalds
2006-07-07 21:22                                   ` linux-os (Dick Johnson)
2006-07-07 21:48                                     ` Chase Venters
2006-07-08 10:00                                       ` Krzysztof Halasa
2006-07-08 13:41                                         ` Chase Venters
2006-07-08 20:09                                           ` Krzysztof Halasa
2006-07-08 20:40                                             ` Chase Venters
2006-07-08 20:47                                               ` Chase Venters
2006-07-09 10:57                                               ` Krzysztof Halasa
2006-07-07 21:59                                     ` Linus Torvalds
2006-07-07 22:06                                       ` Linus Torvalds
2006-07-08 20:49                                       ` Pavel Machek
2006-07-08 21:43                                         ` Linus Torvalds
2006-07-07 22:05                                     ` J.A. Magallón
2006-07-07 22:22                                       ` Chase Venters
2006-07-07 22:37                                         ` J.A. Magallón
2006-07-08  9:33                                           ` David Schwartz
2006-07-07 22:49                                         ` J.A. Magallón
2006-07-07 22:59                                           ` Vadim Lobanov
2006-07-07 23:18                                           ` Chase Venters
2006-07-07 23:36                                       ` Davide Libenzi
2006-07-07 22:09                                     ` Ingo Molnar
2006-07-08  7:36                                       ` Ingo Molnar
2006-07-07 20:39                                 ` Krzysztof Halasa
2006-07-07 23:06                                 ` Björn Steinbrink
2006-07-08  8:36                                 ` Avi Kivity
2006-07-06 19:32                           ` Jan Engelhardt
2006-07-06 20:26                             ` Jeremy Fitzhardinge
2006-07-06 20:55                               ` Jan Engelhardt
2006-07-06 21:07                                 ` Linus Torvalds
2006-07-06 19:56                     ` Linus Torvalds
2006-07-06 20:34                       ` Linus Torvalds
2006-07-08 22:50                         ` Ralf Baechle
2006-07-09  3:09                           ` Linus Torvalds
2006-07-09  3:07                         ` Keith Owens
2006-07-09  3:29                           ` Linus Torvalds
2006-07-09  3:43                             ` Keith Owens
2006-07-09  3:58                           ` Linus Torvalds
2006-07-09  6:13                             ` David Miller
2006-07-09 14:28                             ` Roman Zippel
2006-07-09 15:27                               ` Linus Torvalds

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.