linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-09 13:14 [PATCH 0/24] make atomic_read() behave consistently across all architectures Chris Snook
@ 2007-08-09 12:41 ` Arnd Bergmann
  2007-08-09 14:29   ` Chris Snook
  2007-08-14 22:31 ` Christoph Lameter
  1 sibling, 1 reply; 657+ messages in thread
From: Arnd Bergmann @ 2007-08-09 12:41 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thursday 09 August 2007, Chris Snook wrote:
> This patchset makes the behavior of atomic_read uniform by removing the
> volatile keyword from all atomic_t and atomic64_t definitions that currently
> have it, and instead explicitly casts the variable as volatile in
> atomic_read().  This leaves little room for creative optimization by the
> compiler, and is in keeping with the principles behind "volatile considered
> harmful".
> 

Just an idea: since all architectures already include asm-generic/atomic.h,
why not move the definitions of atomic_t and atomic64_t, as well as anything
that does not involve architecture specific inline assembly into the generic
header?
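
Just to illustrate the idea (a rough sketch only; the exact contents and
names are of course up for discussion):

/* include/asm-generic/atomic.h -- shared definitions, no inline asm */
typedef struct {
	int counter;
} atomic_t;

#define ATOMIC_INIT(i)		{ (i) }
#define atomic_read(v)		(*(volatile int *)&(v)->counter)
#define atomic_set(v, i)	(((v)->counter) = (i))

/* asm/atomic.h would then keep only what genuinely needs architecture
 * specific inline assembly, e.g. atomic_add_return() and friends. */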

	Arnd <><

^ permalink raw reply	[flat|nested] 657+ messages in thread

* [PATCH 0/24] make atomic_read() behave consistently across all architectures
@ 2007-08-09 13:14 Chris Snook
  2007-08-09 12:41 ` Arnd Bergmann
  2007-08-14 22:31 ` Christoph Lameter
  0 siblings, 2 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-09 13:14 UTC (permalink / raw)
  To: linux-kernel, linux-arch, torvalds
  Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl

As recent discussions[1] and bugs[2] have shown, there is a great deal of
confusion about the expected behavior of atomic_read(), compounded by the
fact that it is not the same on all architectures.  Since users expect calls
to atomic_read() to actually perform a read, it is not desirable to allow
the compiler to optimize this away.  Requiring the use of barrier() in this
case is inefficient, since we only want to re-load the atomic_t variable,
not everything else in scope.

This patchset makes the behavior of atomic_read uniform by removing the
volatile keyword from all atomic_t and atomic64_t definitions that currently
have it, and instead explicitly casts the variable as volatile in
atomic_read().  This leaves little room for creative optimization by the
compiler, and is in keeping with the principles behind "volatile considered
harmful".

Busy-waiters should still use cpu_relax(), but fast paths may be able to
reduce their use of barrier() between some atomic_read() calls.

	-- Chris

1)	http://lkml.org/lkml/2007/7/1/52
2)	http://lkml.org/lkml/2007/8/8/122

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-09 12:41 ` Arnd Bergmann
@ 2007-08-09 14:29   ` Chris Snook
  2007-08-09 15:30     ` Arnd Bergmann
  0 siblings, 1 reply; 657+ messages in thread
From: Chris Snook @ 2007-08-09 14:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Arnd Bergmann wrote:
> On Thursday 09 August 2007, Chris Snook wrote:
>> This patchset makes the behavior of atomic_read uniform by removing the
>> volatile keyword from all atomic_t and atomic64_t definitions that currently
>> have it, and instead explicitly casts the variable as volatile in
>> atomic_read().  This leaves little room for creative optimization by the
>> compiler, and is in keeping with the principles behind "volatile considered
>> harmful".
>>
> 
> Just an idea: since all architectures already include asm-generic/atomic.h,
> why not move the definitions of atomic_t and atomic64_t, as well as anything
> that does not involve architecture specific inline assembly into the generic
> header?
> 
> 	Arnd <><

a) chicken and egg: asm-generic/atomic.h depends on definitions in asm/atomic.h

If you can find a way to reshuffle the code and make it simpler, I personally am 
all for it.  I'm skeptical that you'll get much to show for the effort.

b) The definitions aren't precisely identical between all architectures, so it 
would be a mess of special cases, which gets us right back to where we are now.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-09 14:29   ` Chris Snook
@ 2007-08-09 15:30     ` Arnd Bergmann
  0 siblings, 0 replies; 657+ messages in thread
From: Arnd Bergmann @ 2007-08-09 15:30 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thursday 09 August 2007, Chris Snook wrote:
> a) chicken and egg: asm-generic/atomic.h depends on definitions in asm/atomic.h

Ok, I see.
 
> If you can find a way to reshuffle the code and make it simpler, I personally am 
> all for it. I'm skeptical that you'll get much to show for the effort. 

I guess it could be done using more macros or new headers, but I don't see
a way that would actually improve the situation.

> b) The definitions aren't precisely identical between all architectures, so it 
> would be a mess of special cases, which gets us right back to where we are now.

Why are they not identical? Anything beyond the 32/64 bit difference should
be the same afaics, or it might cause more bugs.

	Arnd <><

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-09 13:14 [PATCH 0/24] make atomic_read() behave consistently across all architectures Chris Snook
  2007-08-09 12:41 ` Arnd Bergmann
@ 2007-08-14 22:31 ` Christoph Lameter
  2007-08-14 22:45   ` Chris Snook
  2007-08-14 23:08   ` Satyam Sharma
  1 sibling, 2 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-14 22:31 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Thu, 9 Aug 2007, Chris Snook wrote:

> This patchset makes the behavior of atomic_read uniform by removing the
> volatile keyword from all atomic_t and atomic64_t definitions that currently
> have it, and instead explicitly casts the variable as volatile in
> atomic_read().  This leaves little room for creative optimization by the
> compiler, and is in keeping with the principles behind "volatile considered
> harmful".

volatile is generally harmful even in atomic_read(). Barriers control
visibility and AFAICT things are fine.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-14 22:31 ` Christoph Lameter
@ 2007-08-14 22:45   ` Chris Snook
  2007-08-14 22:51     ` Christoph Lameter
  2007-08-14 23:08   ` Satyam Sharma
  1 sibling, 1 reply; 657+ messages in thread
From: Chris Snook @ 2007-08-14 22:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

Christoph Lameter wrote:
> On Thu, 9 Aug 2007, Chris Snook wrote:
> 
>> This patchset makes the behavior of atomic_read uniform by removing the
>> volatile keyword from all atomic_t and atomic64_t definitions that currently
>> have it, and instead explicitly casts the variable as volatile in
>> atomic_read().  This leaves little room for creative optimization by the
>> compiler, and is in keeping with the principles behind "volatile considered
>> harmful".
> 
> volatile is generally harmful even in atomic_read(). Barriers control
> visibility and AFAICT things are fine.

But barriers force a flush of *everything* in scope, which we generally don't 
want.  On the other hand, we pretty much always want to flush atomic_* 
operations.  One way or another, we should be restricting the volatile behavior 
to the thing that needs it.  On most architectures, this patch set just moves 
that from the declaration, where it is considered harmful, to the use, where it 
is considered an occasional necessary evil.
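
To spell out the difference (illustrative, simplified definitions):

/* barrier() makes gcc assume that *all* memory may have changed, so every
 * value it was holding in registers gets re-loaded afterwards: */
#define barrier()	__asm__ __volatile__("" : : : "memory")

/* a volatile cast in atomic_read() only forces a fresh load of the one
 * object it names, leaving everything else cached: */
#define atomic_read(v)	(*(volatile int *)&(v)->counter)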

See the resubmitted patchset, which also puts a cast in the atomic[64]_set 
operations.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-14 22:45   ` Chris Snook
@ 2007-08-14 22:51     ` Christoph Lameter
  0 siblings, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-14 22:51 UTC (permalink / raw)
  To: Chris Snook
  Cc: linux-kernel, linux-arch, torvalds, netdev, akpm, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl

On Tue, 14 Aug 2007, Chris Snook wrote:

> But barriers force a flush of *everything* in scope, which we generally don't
> want.  On the other hand, we pretty much always want to flush atomic_*
> operations.  One way or another, we should be restricting the volatile
> behavior to the thing that needs it.  On most architectures, this patch set
> just moves that from the declaration, where it is considered harmful, to the
> use, where it is considered an occasional necessary evil.

Then we would need

	atomic_read()

and

	atomic_read_volatile()

atomic_read_volatile() would imply an object sized memory barrier before 
and after?

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-14 23:08   ` Satyam Sharma
@ 2007-08-14 23:04     ` Chris Snook
  2007-08-14 23:14       ` Christoph Lameter
  2007-08-15  6:49       ` Herbert Xu
  2007-08-14 23:26     ` Paul E. McKenney
  2007-08-15 10:35     ` Stefan Richter
  2 siblings, 2 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-14 23:04 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Linux Kernel Mailing List, linux-arch,
	torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

Satyam Sharma wrote:
> 
> On Tue, 14 Aug 2007, Christoph Lameter wrote:
> 
>> On Thu, 9 Aug 2007, Chris Snook wrote:
>>
>>> This patchset makes the behavior of atomic_read uniform by removing the
>>> volatile keyword from all atomic_t and atomic64_t definitions that currently
>>> have it, and instead explicitly casts the variable as volatile in
>>> atomic_read().  This leaves little room for creative optimization by the
>>> compiler, and is in keeping with the principles behind "volatile considered
>>> harmful".
>> volatile is generally harmful even in atomic_read(). Barriers control
>> visibility and AFAICT things are fine.
> 
> Frankly, I don't see the need for this series myself either. Personal
> opinion (others may differ), but I consider "volatile" to be a sad /
> unfortunate wart in C (numerous threads on this list and on the gcc
> lists/bugzilla over the years stand testimony to this) and if we _can_
> steer clear of it, then why not -- why use this ill-defined primitive
> whose implementation has often differed over compiler versions and
> platforms? Granted, barrier() _is_ heavy-handed in that it makes the
> optimizer forget _everything_, but then somebody did post a forget()
> macro on this thread itself ...
> 
> [ BTW, why do we want the compiler to not optimize atomic_read()'s in
>   the first place? Atomic ops guarantee atomicity, which has nothing
>   to do with "volatility" -- users that expect "volatility" from
>   atomic ops are the ones who must be fixed instead, IMHO. ]

Because atomic operations are generally used for synchronization, which requires 
volatile behavior.  Most such codepaths currently use an inefficient barrier(). 
  Some forget to and we get bugs, because people assume that atomic_read() 
actually reads something, and atomic_write() actually writes something.  Worse, 
these are architecture-specific, even compiler version-specific bugs that are 
often difficult to track down.
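
A hypothetical example of the kind of breakage I mean (the flag name is
made up; this isn't from any particular file):

	/* with a non-volatile atomic_read() and no barrier in the empty
	 * loop body, gcc is allowed to hoist the load out of the loop ... */
	while (!atomic_read(&flag))
		;
	/* ... turning it into the equivalent of:
	 *
	 *	if (!atomic_read(&flag))
	 *		for (;;)
	 *			;
	 *
	 * which spins forever on a stale register value. */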

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-14 22:31 ` Christoph Lameter
  2007-08-14 22:45   ` Chris Snook
@ 2007-08-14 23:08   ` Satyam Sharma
  2007-08-14 23:04     ` Chris Snook
                       ` (2 more replies)
  1 sibling, 3 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-14 23:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Chris Snook, Linux Kernel Mailing List, linux-arch, torvalds,
	netdev, Andrew Morton, ak, heiko.carstens, davem, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher



On Tue, 14 Aug 2007, Christoph Lameter wrote:

> On Thu, 9 Aug 2007, Chris Snook wrote:
> 
> > This patchset makes the behavior of atomic_read uniform by removing the
> > volatile keyword from all atomic_t and atomic64_t definitions that currently
> > have it, and instead explicitly casts the variable as volatile in
> > atomic_read().  This leaves little room for creative optimization by the
> > compiler, and is in keeping with the principles behind "volatile considered
> > harmful".
> 
> volatile is generally harmful even in atomic_read(). Barriers control
> visibility and AFAICT things are fine.

Frankly, I don't see the need for this series myself either. Personal
opinion (others may differ), but I consider "volatile" to be a sad /
unfortunate wart in C (numerous threads on this list and on the gcc
lists/bugzilla over the years stand testimony to this) and if we _can_
steer clear of it, then why not -- why use this ill-defined primitive
whose implementation has often differed over compiler versions and
platforms? Granted, barrier() _is_ heavy-handed in that it makes the
optimizer forget _everything_, but then somebody did post a forget()
macro on this thread itself ...

[ BTW, why do we want the compiler to not optimize atomic_read()'s in
  the first place? Atomic ops guarantee atomicity, which has nothing
  to do with "volatility" -- users that expect "volatility" from
  atomic ops are the ones who must be fixed instead, IMHO. ]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-14 23:04     ` Chris Snook
@ 2007-08-14 23:14       ` Christoph Lameter
  2007-08-15  6:49       ` Herbert Xu
  1 sibling, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-14 23:14 UTC (permalink / raw)
  To: Chris Snook
  Cc: Satyam Sharma, Linux Kernel Mailing List, linux-arch, torvalds,
	netdev, Andrew Morton, ak, heiko.carstens, davem, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

On Tue, 14 Aug 2007, Chris Snook wrote:

> Because atomic operations are generally used for synchronization, which
> requires volatile behavior.  Most such codepaths currently use an inefficient
> barrier().  Some forget to and we get bugs, because people assume that
> atomic_read() actually reads something, and atomic_write() actually writes
> something.  Worse, these are architecture-specific, even compiler
> version-specific bugs that are often difficult to track down.

Looks like we need to have lock and unlock semantics?

atomic_read()

which has no barrier or volatile implications.

atomic_read_for_lock

	Acquire semantics?


atomic_read_for_unlock

	Release semantics?


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-14 23:08   ` Satyam Sharma
  2007-08-14 23:04     ` Chris Snook
@ 2007-08-14 23:26     ` Paul E. McKenney
  2007-08-15 10:35     ` Stefan Richter
  2 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-14 23:26 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, torvalds, netdev, Andrew Morton, ak, heiko.carstens,
	davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx,
	rpjday, jesper.juhl, segher

On Wed, Aug 15, 2007 at 04:38:54AM +0530, Satyam Sharma wrote:
> 
> 
> On Tue, 14 Aug 2007, Christoph Lameter wrote:
> 
> > On Thu, 9 Aug 2007, Chris Snook wrote:
> > 
> > > This patchset makes the behavior of atomic_read uniform by removing the
> > > volatile keyword from all atomic_t and atomic64_t definitions that currently
> > > have it, and instead explicitly casts the variable as volatile in
> > > atomic_read().  This leaves little room for creative optimization by the
> > > compiler, and is in keeping with the principles behind "volatile considered
> > > harmful".
> > 
> > volatile is generally harmful even in atomic_read(). Barriers control
> > visibility and AFAICT things are fine.
> 
> Frankly, I don't see the need for this series myself either. Personal
> opinion (others may differ), but I consider "volatile" to be a sad /
> unfortunate wart in C (numerous threads on this list and on the gcc
> lists/bugzilla over the years stand testimony to this) and if we _can_
> steer clear of it, then why not -- why use this ill-defined primitive
> whose implementation has often differed over compiler versions and
> platforms? Granted, barrier() _is_ heavy-handed in that it makes the
> optimizer forget _everything_, but then somebody did post a forget()
> macro on this thread itself ...
> 
> [ BTW, why do we want the compiler to not optimize atomic_read()'s in
>   the first place? Atomic ops guarantee atomicity, which has nothing
>   to do with "volatility" -- users that expect "volatility" from
>   atomic ops are the ones who must be fixed instead, IMHO. ]

Interactions between mainline code and interrupt/NMI handlers on the same
CPU (for example, when both are using per-CPU variables).  See the examples
previously posted in this thread, or look at the rcu_read_lock() and
rcu_read_unlock() implementations in http://lkml.org/lkml/2007/8/7/280.
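
One shape such an interaction can take (a hypothetical sketch, not the RCU
code itself):

static DEFINE_PER_CPU(atomic_t, in_critical);

static void mainline_path(void)
{
	atomic_set(&__get_cpu_var(in_critical), 1);
	/* short, fully inlined critical section: no external calls here,
	 * so nothing acts as a compiler barrier */
	atomic_set(&__get_cpu_var(in_critical), 0);
}

/* An interrupt handler running on the same CPU checks the flag with
 * atomic_read().  If the compiler merges the two plain stores above
 * (legal for non-volatile accesses), the handler can never observe the
 * value 1 and misjudges whether it interrupted the critical section. */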

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-14 23:04     ` Chris Snook
  2007-08-14 23:14       ` Christoph Lameter
@ 2007-08-15  6:49       ` Herbert Xu
  2007-08-15  8:18         ` Heiko Carstens
  2007-08-15 16:13         ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Chris Snook
  1 sibling, 2 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-15  6:49 UTC (permalink / raw)
  To: Chris Snook
  Cc: satyam, clameter, linux-kernel, linux-arch, torvalds, netdev,
	akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Chris Snook <csnook@redhat.com> wrote:
> 
> Because atomic operations are generally used for synchronization, which requires 
> volatile behavior.  Most such codepaths currently use an inefficient barrier(). 
>  Some forget to and we get bugs, because people assume that atomic_read() 
> actually reads something, and atomic_write() actually writes something.  Worse, 
> these are architecture-specific, even compiler version-specific bugs that are 
> often difficult to track down.

I'm yet to see a single example from the current tree where
this patch series is the correct solution.  So far the only
example has been a buggy piece of code which has since been
fixed with a cpu_relax.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15  6:49       ` Herbert Xu
@ 2007-08-15  8:18         ` Heiko Carstens
  2007-08-15 13:53           ` Stefan Richter
  2007-08-16  0:39           ` [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert() Satyam Sharma
  2007-08-15 16:13         ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Chris Snook
  1 sibling, 2 replies; 657+ messages in thread
From: Heiko Carstens @ 2007-08-15  8:18 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Chris Snook, satyam, clameter, linux-kernel, linux-arch,
	torvalds, netdev, akpm, ak, davem, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Wed, Aug 15, 2007 at 02:49:03PM +0800, Herbert Xu wrote:
> Chris Snook <csnook@redhat.com> wrote:
> > 
> > Because atomic operations are generally used for synchronization, which requires 
> > volatile behavior.  Most such codepaths currently use an inefficient barrier(). 
> >  Some forget to and we get bugs, because people assume that atomic_read() 
> > actually reads something, and atomic_write() actually writes something.  Worse, 
> > these are architecture-specific, even compiler version-specific bugs that are 
> > often difficult to track down.
> 
> I'm yet to see a single example from the current tree where
> this patch series is the correct solution.  So far the only
> example has been a buggy piece of code which has since been
> fixed with a cpu_relax.

Btw.: we still have

include/asm-i386/mach-es7000/mach_wakecpu.h:  while (!atomic_read(deassert));
include/asm-i386/mach-default/mach_wakecpu.h: while (!atomic_read(deassert));

Looks like they need to be fixed as well.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-14 23:08   ` Satyam Sharma
  2007-08-14 23:04     ` Chris Snook
  2007-08-14 23:26     ` Paul E. McKenney
@ 2007-08-15 10:35     ` Stefan Richter
  2007-08-15 12:04       ` Herbert Xu
                         ` (2 more replies)
  2 siblings, 3 replies; 657+ messages in thread
From: Stefan Richter @ 2007-08-15 10:35 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, torvalds, netdev, Andrew Morton, ak, heiko.carstens,
	davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx,
	rpjday, jesper.juhl, segher, Herbert Xu, Paul E. McKenney

Satyam Sharma wrote:
> [ BTW, why do we want the compiler to not optimize atomic_read()'s in
>   the first place? Atomic ops guarantee atomicity, which has nothing
>   to do with "volatility" -- users that expect "volatility" from
>   atomic ops are the ones who must be fixed instead, IMHO. ]

LDD3 says on page 125:  "The following operations are defined for the
type [atomic_t] and are guaranteed to be atomic with respect to all
processors of an SMP computer."

Doesn't "atomic WRT all processors" require volatility?
-- 
Stefan Richter
-=====-=-=== =--- -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 10:35     ` Stefan Richter
@ 2007-08-15 12:04       ` Herbert Xu
  2007-08-15 12:31       ` Satyam Sharma
  2007-08-15 19:59       ` Christoph Lameter
  2 siblings, 0 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-15 12:04 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Satyam Sharma, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Paul E. McKenney

On Wed, Aug 15, 2007 at 12:35:31PM +0200, Stefan Richter wrote:
> 
> LDD3 says on page 125:  "The following operations are defined for the
> type [atomic_t] and are guaranteed to be atomic with respect to all
> processors of an SMP computer."
> 
> Doesn't "atomic WRT all processors" require volatility?

Not at all.  We also require this to be atomic without any
hint of volatility.

	extern int foo;
	foo = 4;

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 10:35     ` Stefan Richter
  2007-08-15 12:04       ` Herbert Xu
@ 2007-08-15 12:31       ` Satyam Sharma
  2007-08-15 13:08         ` Stefan Richter
                           ` (3 more replies)
  2007-08-15 19:59       ` Christoph Lameter
  2 siblings, 4 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 12:31 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher, Herbert Xu,
	Paul E. McKenney



On Wed, 15 Aug 2007, Stefan Richter wrote:

> Satyam Sharma wrote:
> > [ BTW, why do we want the compiler to not optimize atomic_read()'s in
> >   the first place? Atomic ops guarantee atomicity, which has nothing
> >   to do with "volatility" -- users that expect "volatility" from
> >   atomic ops are the ones who must be fixed instead, IMHO. ]
> 
> LDD3 says on page 125:  "The following operations are defined for the
> type [atomic_t] and are guaranteed to be atomic with respect to all
> processors of an SMP computer."
> 
> Doesn't "atomic WRT all processors" require volatility?

No, it definitely doesn't. Why should it?

"Atomic w.r.t. all processors" is just your normal, simple "atomicity"
for SMP systems (ensure that that object is modified / set / replaced
in main memory atomically) and has nothing to do with "volatile"
behaviour.

"Volatile behaviour" itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms), but it is /expected/ to mean something like: "ensure that
every such access actually goes all the way to memory, and is not
re-ordered w.r.t. other accesses, as far as the compiler can take
care of these". The last "as far as compiler can take care" disclaimer
comes about due to CPUs doing their own re-ordering nowadays.

For example (say on i386):

(A)
$ cat tp1.c
int a;

void func(void)
{
	a = 10;
	a = 20;
}
$ gcc -Os -S tp1.c
$ cat tp1.s
...
movl    $20, a
...

(B)
$ cat tp2.c
volatile int a;

void func(void)
{
	a = 10;
	a = 20;
}
$ gcc -Os -S tp2.c
$ cat tp2.s
...
movl    $10, a
movl    $20, a
...

(C)
$ cat tp3.c
int a;

void func(void)
{
	*(volatile int *)&a = 10;
	*(volatile int *)&a = 20;
}
$ gcc -Os -S tp3.c
$ cat tp3.s
...
movl    $10, a
movl    $20, a
...

In (A) the compiler optimized "a = 10;" away, but the actual store
of the final value "20" to "a" was still "atomic". (B) and (C) also
exhibit "volatile" behaviour apart from the "atomicity".

But as others replied, it seems some callers out there depend upon
atomic ops exhibiting "volatile" behaviour as well, so that answers
my initial question, actually. I haven't looked at the code Paul
pointed me at, but I wonder if that "forget(x)" macro would help
those cases. I'd wish to avoid the "volatile" primitive, personally.


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 12:31       ` Satyam Sharma
@ 2007-08-15 13:08         ` Stefan Richter
  2007-08-15 13:11           ` Stefan Richter
  2007-08-15 13:47           ` Satyam Sharma
  2007-08-15 18:31         ` Segher Boessenkool
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 657+ messages in thread
From: Stefan Richter @ 2007-08-15 13:08 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher, Herbert Xu,
	Paul E. McKenney

Satyam Sharma wrote:
> On Wed, 15 Aug 2007, Stefan Richter wrote:
>> Doesn't "atomic WRT all processors" require volatility?
> 
> No, it definitely doesn't. Why should it?
> 
> "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> for SMP systems (ensure that that object is modified / set / replaced
> in main memory atomically) and has nothing to do with "volatile"
> behaviour.
> 
> "Volatile behaviour" itself isn't consistently defined (at least
> definitely not consistently implemented in various gcc versions across
> platforms), but it is /expected/ to mean something like: "ensure that
> every such access actually goes all the way to memory, and is not
> re-ordered w.r.t. other accesses, as far as the compiler can take
> care of these". The last "as far as compiler can take care" disclaimer
> comes about due to CPUs doing their own re-ordering nowadays.
> 
> For example (say on i386):

[...]

> In (A) the compiler optimized "a = 10;" away, but the actual store
> of the final value "20" to "a" was still "atomic". (B) and (C) also
> exhibit "volatile" behaviour apart from the "atomicity".
> 
> But as others replied, it seems some callers out there depend upon
> atomic ops exhibiting "volatile" behaviour as well, so that answers
> my initial question, actually. I haven't looked at the code Paul
> pointed me at, but I wonder if that "forget(x)" macro would help
> those cases. I'd wish to avoid the "volatile" primitive, personally.

So, looking at load instead of store, do I understand correctly that in
your opinion

	int b;

	b = atomic_read(&a);
	if (b)
		do_something_time_consuming();

	b = atomic_read(&a);
	if (b)
		do_something_more();

should be changed to explicitly forget(&a) after
do_something_time_consuming?

If so, how about the following:

static inline void A(atomic_t *a)
{
	int b = atomic_read(a);
	if (b)
		do_something_time_consuming();
}

static inline void B(atomic_t *a)
{
	int b = atomic_read(a);
	if (b)
		do_something_more();
}

static void C(atomic_t *a)
{
	A(a);
	B(b);
}

Would this need forget(a) after A(a)?

(Is the latter actually answered in C99 or is it compiler-dependent?)
-- 
Stefan Richter
-=====-=-=== =--- -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 13:08         ` Stefan Richter
@ 2007-08-15 13:11           ` Stefan Richter
  2007-08-15 13:47           ` Satyam Sharma
  1 sibling, 0 replies; 657+ messages in thread
From: Stefan Richter @ 2007-08-15 13:11 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher, Herbert Xu,
	Paul E. McKenney

I wrote:
> static inline void A(atomic_t *a)
> {
> 	int b = atomic_read(a);
> 	if (b)
> 		do_something_time_consuming();
> }
> 
> static inline void B(atomic_t *a)
> {
> 	int b = atomic_read(a);
> 	if (b)
> 		do_something_more();
> }
> 
> static void C(atomic_t *a)
> {
> 	A(a);
> 	B(b);
	/* ^ typo */
	B(a);
> }
> 
> Would this need forget(a) after A(a)?
> 
> (Is the latter actually answered in C99 or is it compiler-dependent?)


-- 
Stefan Richter
-=====-=-=== =--- -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 13:08         ` Stefan Richter
  2007-08-15 13:11           ` Stefan Richter
@ 2007-08-15 13:47           ` Satyam Sharma
  2007-08-15 14:25             ` Paul E. McKenney
  1 sibling, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 13:47 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher, Herbert Xu,
	Paul E. McKenney



On Wed, 15 Aug 2007, Stefan Richter wrote:

> Satyam Sharma wrote:
> > On Wed, 15 Aug 2007, Stefan Richter wrote:
> >> Doesn't "atomic WRT all processors" require volatility?
> > 
> > No, it definitely doesn't. Why should it?
> > 
> > "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> > for SMP systems (ensure that that object is modified / set / replaced
> > in main memory atomically) and has nothing to do with "volatile"
> > behaviour.
> > 
> > "Volatile behaviour" itself isn't consistently defined (at least
> > definitely not consistently implemented in various gcc versions across
> > platforms), but it is /expected/ to mean something like: "ensure that
> > every such access actually goes all the way to memory, and is not
> > re-ordered w.r.t. other accesses, as far as the compiler can take
> > care of these". The last "as far as compiler can take care" disclaimer
> > comes about due to CPUs doing their own re-ordering nowadays.
> > 
> > For example (say on i386):
> 
> [...]
> 
> > In (A) the compiler optimized "a = 10;" away, but the actual store
> > of the final value "20" to "a" was still "atomic". (B) and (C) also
> > exhibit "volatile" behaviour apart from the "atomicity".
> > 
> > But as others replied, it seems some callers out there depend upon
> > atomic ops exhibiting "volatile" behaviour as well, so that answers
> > my initial question, actually. I haven't looked at the code Paul
> > pointed me at, but I wonder if that "forget(x)" macro would help
> > those cases. I'd wish to avoid the "volatile" primitive, personally.
> 
> So, looking at load instead of store, do I understand correctly that in
> your opinion
> 
> 	int b;
> 
> 	b = atomic_read(&a);
> 	if (b)
> 		do_something_time_consuming();
> 
> 	b = atomic_read(&a);
> 	if (b)
> 		do_something_more();
> 
> should be changed to explicitly forget(&a) after
> do_something_time_consuming?

No, I'd actually prefer something like what Christoph Lameter suggested,
i.e. users (such as above) who want "volatile"-like behaviour from atomic
ops can use alternative functions. How about something like:

#define atomic_read_volatile(v)			\
	({					\
		forget(&(v)->counter);		\
		((v)->counter);			\
	})
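
Here forget() would be an object-sized compiler barrier along the lines of
the one mentioned earlier in this thread; written to take a pointer (since
atomic_read_volatile() passes &(v)->counter), something like:

#define forget(p)	__asm__ __volatile__ ("" : "=m" (*(p)) : "m" (*(p)))

which tells the compiler that only this particular object may have changed,
instead of clobbering all of memory the way barrier() does.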

Or possibly, implement these "volatile" atomic ops variants in inline asm
like the patch that Sebastian Siewior has submitted on another thread just
a while back.

Of course, if we find there are more callers in the kernel who want the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions to
something else, say "atomic_read_nonvolatile" for all I care.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15  8:18         ` Heiko Carstens
@ 2007-08-15 13:53           ` Stefan Richter
  2007-08-15 14:35             ` Satyam Sharma
  2007-08-16  0:39           ` [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert() Satyam Sharma
  1 sibling, 1 reply; 657+ messages in thread
From: Stefan Richter @ 2007-08-15 13:53 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Herbert Xu, Chris Snook, satyam, clameter, linux-kernel,
	linux-arch, torvalds, netdev, akpm, ak, davem, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

On 8/15/2007 10:18 AM, Heiko Carstens wrote:
> On Wed, Aug 15, 2007 at 02:49:03PM +0800, Herbert Xu wrote:
>> Chris Snook <csnook@redhat.com> wrote:
>> > 
>> > Because atomic operations are generally used for synchronization, which requires 
>> > volatile behavior.  Most such codepaths currently use an inefficient barrier(). 
>> >  Some forget to and we get bugs, because people assume that atomic_read() 
>> > actually reads something, and atomic_write() actually writes something.  Worse, 
>> > these are architecture-specific, even compiler version-specific bugs that are 
>> > often difficult to track down.
>> 
>> I'm yet to see a single example from the current tree where
>> this patch series is the correct solution.  So far the only
>> example has been a buggy piece of code which has since been
>> fixed with a cpu_relax.
> 
> Btw.: we still have
> 
> include/asm-i386/mach-es7000/mach_wakecpu.h:  while (!atomic_read(deassert));
> include/asm-i386/mach-default/mach_wakecpu.h: while (!atomic_read(deassert));
> 
> Looks like they need to be fixed as well.


I don't know if this here is affected:

/* drivers/ieee1394/ieee1394_core.h */
static inline unsigned int get_hpsb_generation(struct hpsb_host *host)
{
	return atomic_read(&host->generation);
}

/* drivers/ieee1394/nodemgr.c */
static int nodemgr_host_thread(void *__hi)
{
	[...]

	for (;;) {
		[... sleep until bus reset event ...]

		/* Pause for 1/4 second in 1/16 second intervals,
		 * to make sure things settle down. */
		g = get_hpsb_generation(host);
		for (i = 0; i < 4 ; i++) {
			if (msleep_interruptible(63) ||
			    kthread_should_stop())
				goto exit;

	/* Now get the generation in which the node ID's we collect
	 * are valid.  During the bus scan we will use this generation
	 * for the read transactions, so that if another reset occurs
	 * during the scan the transactions will fail instead of
	 * returning bogus data. */

			generation = get_hpsb_generation(host);

	/* If we get a reset before we are done waiting, then
	 * start the waiting over again */

			if (generation != g)
				g = generation, i = 0;
		}

		[... scan bus, using generation ...]

	}
exit:
[...]
}



-- 
Stefan Richter
-=====-=-=== =--- -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 13:47           ` Satyam Sharma
@ 2007-08-15 14:25             ` Paul E. McKenney
  2007-08-15 15:33               ` Herbert Xu
                                 ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 14:25 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, Aug 15, 2007 at 07:17:29PM +0530, Satyam Sharma wrote:
> On Wed, 15 Aug 2007, Stefan Richter wrote:
> > Satyam Sharma wrote:
> > > On Wed, 15 Aug 2007, Stefan Richter wrote:
> > >> Doesn't "atomic WRT all processors" require volatility?
> > > 
> > > No, it definitely doesn't. Why should it?
> > > 
> > > "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> > > for SMP systems (ensure that that object is modified / set / replaced
> > > in main memory atomically) and has nothing to do with "volatile"
> > > behaviour.
> > > 
> > > "Volatile behaviour" itself isn't consistently defined (at least
> > > definitely not consistently implemented in various gcc versions across
> > > platforms), but it is /expected/ to mean something like: "ensure that
> > > every such access actually goes all the way to memory, and is not
> > > re-ordered w.r.t. other accesses, as far as the compiler can take
> > > care of these". The last "as far as compiler can take care" disclaimer
> > > comes about due to CPUs doing their own re-ordering nowadays.
> > > 
> > > For example (say on i386):
> > 
> > [...]
> > 
> > > In (A) the compiler optimized "a = 10;" away, but the actual store
> > > of the final value "20" to "a" was still "atomic". (B) and (C) also
> > > exhibit "volatile" behaviour apart from the "atomicity".
> > > 
> > > But as others replied, it seems some callers out there depend upon
> > > atomic ops exhibiting "volatile" behaviour as well, so that answers
> > > my initial question, actually. I haven't looked at the code Paul
> > > pointed me at, but I wonder if that "forget(x)" macro would help
> > > those cases. I'd wish to avoid the "volatile" primitive, personally.
> > 
> > So, looking at load instead of store, do I understand correctly that in
> > your opinion
> > 
> > 	int b;
> > 
> > 	b = atomic_read(&a);
> > 	if (b)
> > 		do_something_time_consuming();
> > 
> > 	b = atomic_read(&a);
> > 	if (b)
> > 		do_something_more();
> > 
> > should be changed to explicitly forget(&a) after
> > do_something_time_consuming?
> 
> No, I'd actually prefer something like what Christoph Lameter suggested,
> i.e. users (such as above) who want "volatile"-like behaviour from atomic
> ops can use alternative functions. How about something like:
> 
> #define atomic_read_volatile(v)			\
> 	({					\
> 		forget(&(v)->counter);		\
> 		((v)->counter);			\
> 	})

Wouldn't the above "forget" the value, throw it away, then forget
that it forgot it, giving non-volatile semantics?

> Or possibly, implement these "volatile" atomic ops variants in inline asm
> like the patch that Sebastian Siewior has submitted on another thread just
> a while back.

Given that you are advocating a change (please keep in mind that
atomic_read() and atomic_set() had volatile semantics on almost all
platforms), care to give some example where these historical volatile
semantics are causing a problem?

> Of course, if we find there are more callers in the kernel who want the
> volatility behaviour than those who don't care, we can re-define the
> existing ops to such variants, and re-name the existing definitions to
> something else, say "atomic_read_nonvolatile" for all I care.

Do we really need another set of APIs?  Can you give even one example
where the pre-existing volatile semantics are causing enough of a problem
to justify adding yet more atomic_*() APIs?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 13:53           ` Stefan Richter
@ 2007-08-15 14:35             ` Satyam Sharma
  2007-08-15 14:52               ` Herbert Xu
  2007-08-15 19:58               ` Stefan Richter
  0 siblings, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 14:35 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Heiko Carstens, Herbert Xu, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Hi Stefan,


On Wed, 15 Aug 2007, Stefan Richter wrote:

> On 8/15/2007 10:18 AM, Heiko Carstens wrote:
> > On Wed, Aug 15, 2007 at 02:49:03PM +0800, Herbert Xu wrote:
> >> Chris Snook <csnook@redhat.com> wrote:
> >> > 
> >> > Because atomic operations are generally used for synchronization, which requires 
> >> > volatile behavior.  Most such codepaths currently use an inefficient barrier(). 
> >> >  Some forget to and we get bugs, because people assume that atomic_read() 
> >> > actually reads something, and atomic_write() actually writes something.  Worse, 
> >> > these are architecture-specific, even compiler version-specific bugs that are 
> >> > often difficult to track down.
> >> 
> >> I'm yet to see a single example from the current tree where
> >> this patch series is the correct solution.  So far the only
> >> example has been a buggy piece of code which has since been
> >> fixed with a cpu_relax.
> > 
> > Btw.: we still have
> > 
> > include/asm-i386/mach-es7000/mach_wakecpu.h:  while (!atomic_read(deassert));
> > include/asm-i386/mach-default/mach_wakecpu.h: while (!atomic_read(deassert));
> > 
> > Looks like they need to be fixed as well.
> 
> 
> I don't know if this here is affected:

Yes, I think it is. You're clearly expecting the read to actually happen
when you call get_hpsb_generation(). It's clearly not a busy-loop, so
cpu_relax() sounds pointless / wrong solution for this case, so I'm now
somewhat beginning to appreciate the motivation behind this series :-)

But as I said, there are ways to achieve the same goals of this series
without using "volatile".

I think I'll submit an RFC/patch or two on this myself (will also fix
the code pieces listed here).


> /* drivers/ieee1394/ieee1394_core.h */
> static inline unsigned int get_hpsb_generation(struct hpsb_host *host)
> {
> 	return atomic_read(&host->generation);
> }
> 
> /* drivers/ieee1394/nodemgr.c */
> static int nodemgr_host_thread(void *__hi)
> {
> 	[...]
> 
> 	for (;;) {
> 		[... sleep until bus reset event ...]
> 
> 		/* Pause for 1/4 second in 1/16 second intervals,
> 		 * to make sure things settle down. */
> 		g = get_hpsb_generation(host);
> 		for (i = 0; i < 4 ; i++) {
> 			if (msleep_interruptible(63) ||
> 			    kthread_should_stop())
> 				goto exit;

Totally unrelated, but this looks weird. IMHO you actually wanted:

	msleep_interruptible(63);
	if (kthread_should_stop())
		goto exit;

here, didn't you? Otherwise the thread will exit even when
kthread_should_stop() != TRUE (just because it received a signal),
and it is not good for a kthread to exit on its own if it uses
kthread_should_stop() or if some other piece of kernel code could
eventually call kthread_stop(tsk) on it.

Ok, probably the thread will never receive a signal in the first
place because it's spawned off kthreadd which ignores all signals
beforehand, but still ...

[PATCH] ieee1394: Fix kthread stopping in nodemgr_host_thread

The nodemgr host thread can exit on its own even when kthread_should_stop
is not true, on receiving a signal (might never happen in practice, as
it ignores signals). But considering kthread_stop() must not be mixed with
kthreads that can exit on their own, I think changing the code like this
is clearer. This change means the thread can cut its sleep short when it
receives a signal, but looking at the code around it, that sounds okay (and
again, it might never actually receive a signal in practice).

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 drivers/ieee1394/nodemgr.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/ieee1394/nodemgr.c b/drivers/ieee1394/nodemgr.c
index 2ffd534..981a7da 100644
--- a/drivers/ieee1394/nodemgr.c
+++ b/drivers/ieee1394/nodemgr.c
@@ -1721,7 +1721,8 @@ static int nodemgr_host_thread(void *__hi)
 		 * to make sure things settle down. */
 		g = get_hpsb_generation(host);
 		for (i = 0; i < 4 ; i++) {
-			if (msleep_interruptible(63) || kthread_should_stop())
+			msleep_interruptible(63);
+			if (kthread_should_stop())
 				goto exit;
 
 			/* Now get the generation in which the node ID's we collect

^ permalink raw reply related	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 14:35             ` Satyam Sharma
@ 2007-08-15 14:52               ` Herbert Xu
  2007-08-15 16:09                 ` Stefan Richter
  2007-08-15 19:58               ` Stefan Richter
  1 sibling, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-15 14:52 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, Heiko Carstens, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
>
> > I don't know if this here is affected:
> 
> Yes, I think it is. You're clearly expecting the read to actually happen
> when you call get_hpsb_generation(). It's clearly not a busy-loop, so
> cpu_relax() sounds pointless / wrong solution for this case, so I'm now
> somewhat beginning to appreciate the motivation behind this series :-)

Nope, we're calling schedule which is a rather heavy-weight
barrier.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 14:25             ` Paul E. McKenney
@ 2007-08-15 15:33               ` Herbert Xu
  2007-08-15 16:08                 ` Paul E. McKenney
  2007-08-15 17:55               ` Satyam Sharma
  2007-08-15 18:19               ` David Howells
  2 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-15 15:33 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Satyam Sharma, Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> 
> Do we really need another set of APIs?  Can you give even one example
> where the pre-existing volatile semantics are causing enough of a problem
> to justify adding yet more atomic_*() APIs?

Let's turn this around.  Can you give a single example where
the volatile semantics is needed in a legitimate way?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 15:33               ` Herbert Xu
@ 2007-08-15 16:08                 ` Paul E. McKenney
  2007-08-15 17:18                   ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 16:08 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Satyam Sharma, Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Wed, Aug 15, 2007 at 11:33:36PM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> > 
> > Do we really need another set of APIs?  Can you give even one example
> > where the pre-existing volatile semantics are causing enough of a problem
> > to justify adding yet more atomic_*() APIs?
> 
> Let's turn this around.  Can you give a single example where
> the volatile semantics is needed in a legitimate way?

Sorry, but you are the one advocating for the change.

Nice try, though!  ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 14:52               ` Herbert Xu
@ 2007-08-15 16:09                 ` Stefan Richter
  2007-08-15 16:27                   ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Stefan Richter @ 2007-08-15 16:09 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Satyam Sharma, Heiko Carstens, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
>>> I don't know if this here is affected:

[...something like]
	b = atomic_read(a);
	for (i = 0; i < 4; i++) {
		msleep_interruptible(63);
		c = atomic_read(a);
		if (c != b) {
			b = c;
			i = 0;
		}
	}

> Nope, we're calling schedule which is a rather heavy-weight
> barrier.

How does the compiler know that msleep() has got barrier()s?
-- 
Stefan Richter
-=====-=-=== =--- -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15  6:49       ` Herbert Xu
  2007-08-15  8:18         ` Heiko Carstens
@ 2007-08-15 16:13         ` Chris Snook
  2007-08-15 23:40           ` Herbert Xu
  1 sibling, 1 reply; 657+ messages in thread
From: Chris Snook @ 2007-08-15 16:13 UTC (permalink / raw)
  To: Herbert Xu
  Cc: satyam, clameter, linux-kernel, linux-arch, torvalds, netdev,
	akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu wrote:
> Chris Snook <csnook@redhat.com> wrote:
>> Because atomic operations are generally used for synchronization, which requires 
>> volatile behavior.  Most such codepaths currently use an inefficient barrier(). 
>>  Some forget to and we get bugs, because people assume that atomic_read() 
>> actually reads something, and atomic_write() actually writes something.  Worse, 
>> these are architecture-specific, even compiler version-specific bugs that are 
>> often difficult to track down.
> 
> I'm yet to see a single example from the current tree where
> this patch series is the correct solution.  So far the only
> example has been a buggy piece of code which has since been
> fixed with a cpu_relax.

Part of the motivation here is to fix heisenbugs.  If I knew where they 
were, I'd be posting patches for them.  Unlike most bugs, where we want 
to expose them as obviously as possible, these can be extremely 
difficult to track down, and are often due to people assuming that the 
atomic_* operations have the same semantics they've historically had. 
Remember that until recently, all SMP architectures except s390 (which 
very few kernel developers outside of IBM, Red Hat, and SuSE do much 
work on) had volatile declarations for atomic_t.  Removing the volatile 
declarations from i386 and x86_64 may have created heisenbugs that won't 
manifest themselves until GCC 6.0 comes out and people start compiling 
kernels with -O5.  We should have consistent semantics for atomic_* 
operations.

The other motivation is to reduce the need for the barriers used to 
prevent/fix such problems, which clobber all your registers, and instead 
force atomic_* operations to behave in the way they're actually used. 
After the (resubmitted) patchset is merged, we'll be able to remove a 
whole bunch of barriers, shrinking our source and our binaries, and 
improving performance.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 16:09                 ` Stefan Richter
@ 2007-08-15 16:27                   ` Paul E. McKenney
  2007-08-15 17:13                     ` Satyam Sharma
  2007-08-15 18:31                     ` Segher Boessenkool
  0 siblings, 2 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 16:27 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Herbert Xu, Satyam Sharma, Heiko Carstens, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Wed, Aug 15, 2007 at 06:09:35PM +0200, Stefan Richter wrote:
> Herbert Xu wrote:
> > On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
> >>> I don't know if this here is affected:
> 
> [...something like]
> 	b = atomic_read(a);
> 	for (i = 0; i < 4; i++) {
> 		msleep_interruptible(63);
> 		c = atomic_read(a);
> 		if (c != b) {
> 			b = c;
> 			i = 0;
> 		}
> 	}
> 
> > Nope, we're calling schedule which is a rather heavy-weight
> > barrier.
> 
> How does the compiler know that msleep() has got barrier()s?

Because msleep_interruptible() is in a separate compilation unit,
the compiler has to assume that it might modify any arbitrary global.
In many cases, the compiler also has to assume that msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.
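
A tiny illustration (the names are made up):

extern void do_other_stuff(void);	/* defined in another compilation unit */
int flag;				/* plain global, no volatile anywhere */

void poll_flag(void)
{
	while (!flag)
		do_other_stuff();	/* opaque call: gcc must assume it might
					 * modify "flag", so the load is redone
					 * on every iteration */
}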

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 16:27                   ` Paul E. McKenney
@ 2007-08-15 17:13                     ` Satyam Sharma
  2007-08-15 18:31                     ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 17:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Stefan Richter, Herbert Xu, Heiko Carstens, Chris Snook,
	clameter, Linux Kernel Mailing List, linux-arch, Linus Torvalds,
	netdev, Andrew Morton, ak, davem, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 06:09:35PM +0200, Stefan Richter wrote:
> > Herbert Xu wrote:
> > > On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
> > >>> I don't know if this here is affected:
> > 
> > [...something like]
> > 	b = atomic_read(a);
> > 	for (i = 0; i < 4; i++) {
> > 		msleep_interruptible(63);
> > 		c = atomic_read(a);
> > 		if (c != b) {
> > 			b = c;
> > 			i = 0;
> > 		}
> > 	}
> > 
> > > Nope, we're calling schedule which is a rather heavy-weight
> > > barrier.
> > 
> > How does the compiler know that msleep() has got barrier()s?
> 
> Because msleep_interruptible() is in a separate compilation unit,
> the compiler has to assume that it might modify any arbitrary global.
> In many cases, the compiler also has to assume that msleep_interruptible()
> might call back into a function in the current compilation unit, thus
> possibly modifying global static variables.

Yup, I've just verified this with a testcase. So a call to any function
outside of the current compilation unit acts as a compiler barrier. Cool.
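
(A minimal sketch of such a testcase -- external_func() here is just a
hypothetical stand-in for any function living in another .c file, e.g.
msleep_interruptible():)

	extern void external_func(void);	/* defined elsewhere */

	int a;

	int func(void)
	{
		int b, c, i;

		b = a;
		for (i = 0; i < 4; i++) {
			external_func();	/* opaque call: might write 'a' */
			c = a;			/* so gcc must re-load 'a' here */
			if (c != b) {
				b = c;
				i = 0;
			}
		}
		return b;
	}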

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 16:08                 ` Paul E. McKenney
@ 2007-08-15 17:18                   ` Satyam Sharma
  2007-08-15 17:33                     ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 17:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Herbert Xu, Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 11:33:36PM +0800, Herbert Xu wrote:
> > On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> > > 
> > > Do we really need another set of APIs?  Can you give even one example
> > > where the pre-existing volatile semantics are causing enough of a problem
> > > to justify adding yet more atomic_*() APIs?
> > 
> > Let's turn this around.  Can you give a single example where
> > the volatile semantics is needed in a legitimate way?
> 
> Sorry, but you are the one advocating for the change.

Not for i386 and x86_64 -- those have atomic ops without any "volatile"
semantics (currently as per existing definitions).

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 17:18                   ` Satyam Sharma
@ 2007-08-15 17:33                     ` Paul E. McKenney
  2007-08-15 18:05                       ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 17:33 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Herbert Xu, Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Wed, Aug 15, 2007 at 10:48:28PM +0530, Satyam Sharma wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> > On Wed, Aug 15, 2007 at 11:33:36PM +0800, Herbert Xu wrote:
> > > On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> > > > 
> > > > Do we really need another set of APIs?  Can you give even one example
> > > > where the pre-existing volatile semantics are causing enough of a problem
> > > > to justify adding yet more atomic_*() APIs?
> > > 
> > > Let's turn this around.  Can you give a single example where
> > > the volatile semantics is needed in a legitimate way?
> > 
> > Sorry, but you are the one advocating for the change.
> 
> Not for i386 and x86_64 -- those have atomic ops without any "volatile"
> semantics (currently as per existing definitions).

I claim unit volumes with arm, and the majority of the architectures, but
I cannot deny the popularity of i386 and x86_64 with many developers.  ;-)

However, I am not aware of code in the kernel that would benefit
from the compiler coalescing multiple atomic_set() and atomic_read()
invocations, thus I don't see the downside to volatility in this case.
Are there some performance-critical code fragments that I am missing?

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 14:25             ` Paul E. McKenney
  2007-08-15 15:33               ` Herbert Xu
@ 2007-08-15 17:55               ` Satyam Sharma
  2007-08-15 19:07                 ` Paul E. McKenney
  2007-08-15 20:58                 ` Segher Boessenkool
  2007-08-15 18:19               ` David Howells
  2 siblings, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 17:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

Hi Paul,


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 07:17:29PM +0530, Satyam Sharma wrote:
> > [...]
> > No, I'd actually prefer something like what Christoph Lameter suggested,
> > i.e. users (such as above) who want "volatile"-like behaviour from atomic
> > ops can use alternative functions. How about something like:
> > 
> > #define atomic_read_volatile(v)			\
> > 	({					\
> > 		forget(&(v)->counter);		\
> > 		((v)->counter);			\
> > 	})
> 
> Wouldn't the above "forget" the value, throw it away, then forget
> that it forgot it, giving non-volatile semantics?

Nope, I don't think so. I wrote the following trivial testcases:
[ See especially tp4.c and tp4.s (last example). ]

==============================================================================
$ cat tp1.c # Using volatile access casts

#define atomic_read(a)	(*(volatile int *)&a)

int a;

void func(void)
{
	a = 0;
	while (atomic_read(a))
		;
}
==============================================================================
$ gcc -Os -S tp1.c; cat tp1.s

func:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$0, a
.L2:
	movl	a, %eax
	testl	%eax, %eax
	jne	.L2
	popl	%ebp
	ret
==============================================================================
$ cat tp2.c # Using nothing; gcc will optimize the whole loop away

#define forget(x)

#define atomic_read(a)		\
	({			\
		forget(&(a));	\
		(a);		\
	})

int a;

void func(void)
{
	a = 0;
	while (atomic_read(a))
		;
}
==============================================================================
$ gcc -Os -S tp2.c; cat tp2.s

func:
	pushl	%ebp
	movl	%esp, %ebp
	popl	%ebp
	movl	$0, a
	ret
==============================================================================
$ cat tp3.c # Using a full memory clobber barrier

#define forget(x)	asm volatile ("":::"memory")

#define atomic_read(a)		\
	({			\
		forget(&(a));	\
		(a);		\
	})

int a;

void func(void)
{
	a = 0;
	while (atomic_read(a))
		;
}
==============================================================================
$ gcc -Os -S tp3.c; cat tp3.s

func:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$0, a
.L2:
	cmpl	$0, a
	jne	.L2
	popl	%ebp
	ret
==============================================================================
$ cat tp4.c # Using a forget(var) macro

#define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))

#define atomic_read(a)		\
	({			\
		forget(a);	\
		(a);		\
	})

int a;

void func(void)
{
	a = 0;
	while (atomic_read(a))
		;
}
==============================================================================
$ gcc -Os -S tp4.c; cat tp4.s

func:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$0, a
.L2:
	cmpl	$0, a
	jne	.L2
	popl	%ebp
	ret
==============================================================================


Possibly these were too trivial to expose any potential problems that you
may have been referring to, so would be helpful if you could write a more
concrete example / sample code.


> > Or possibly, implement these "volatile" atomic ops variants in inline asm
> > like the patch that Sebastian Siewior has submitted on another thread just
> > a while back.
> 
> Given that you are advocating a change (please keep in mind that
> atomic_read() and atomic_set() had volatile semantics on almost all
> platforms), care to give some example where these historical volatile
> semantics are causing a problem?
> [...]
> Can you give even one example
> where the pre-existing volatile semantics are causing enough of a problem
> to justify adding yet more atomic_*() APIs?

Will take this to the other sub-thread ...


> > Of course, if we find there are more callers in the kernel who want the
> > volatility behaviour than those who don't care, we can re-define the
> > existing ops to such variants, and re-name the existing definitions to
> > something else, say "atomic_read_nonvolatile" for all I care.
> 
> Do we really need another set of APIs?

Well, if there's one set of users who do care about volatile behaviour,
and another set that doesn't, it only sounds correct to provide both
those APIs, instead of forcing one behaviour on all users.


Thanks,
Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 17:33                     ` Paul E. McKenney
@ 2007-08-15 18:05                       ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 18:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Herbert Xu, Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 10:48:28PM +0530, Satyam Sharma wrote:
> > [...]
> > Not for i386 and x86_64 -- those have atomic ops without any "volatile"
> > semantics (currently as per existing definitions).
> 
> I claim unit volumes with arm, and the majority of the architectures, but
> I cannot deny the popularity of i386 and x86_64 with many developers.  ;-)

Hmm, does arm really need that "volatile int counter;"? Hopefully RMK will
take a patch removing that "volatile" ... ;-)


> However, I am not aware of code in the kernel that would benefit
> from the compiler coalescing multiple atomic_set() and atomic_read()
> invocations, thus I don't see the downside to volatility in this case.
> Are there some performance-critical code fragments that I am missing?

I don't know, and yes, code with multiple atomic_set's and atomic_read's
getting optimized or coalesced does sound strange to start with. Anyway,
I'm not against "volatile semantics" per se. As replied elsewhere, I do
appreciate the motivation behind this series (to _avoid_ gotchas, not to
fix existing ones). Just that I'd like to avoid using "volatile", for
aforementioned reasons, especially given that there are perfectly
reasonable alternatives to achieve the same desired behaviour.


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 14:25             ` Paul E. McKenney
  2007-08-15 15:33               ` Herbert Xu
  2007-08-15 17:55               ` Satyam Sharma
@ 2007-08-15 18:19               ` David Howells
  2007-08-15 18:45                 ` Paul E. McKenney
  2 siblings, 1 reply; 657+ messages in thread
From: David Howells @ 2007-08-15 18:19 UTC (permalink / raw)
  To: Herbert Xu
  Cc: dhowells, Paul E. McKenney, Satyam Sharma, Stefan Richter,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Let's turn this around.  Can you give a single example where
> the volatile semantics is needed in a legitimate way?

Accessing H/W registers?  But apart from that...
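
(Something like this sketch -- hypothetical helpers, not any particular
driver's code -- being the obviously legitimate case:)

	static inline unsigned int mmio_read32(const volatile void *addr)
	{
		/* the volatile-qualified access keeps the compiler from
		   caching or eliding the device register read */
		return *(const volatile unsigned int *)addr;
	}

	static inline void mmio_write32(unsigned int val, volatile void *addr)
	{
		*(volatile unsigned int *)addr = val;
	}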

David

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 16:27                   ` Paul E. McKenney
  2007-08-15 17:13                     ` Satyam Sharma
@ 2007-08-15 18:31                     ` Segher Boessenkool
  2007-08-15 18:57                       ` Paul E. McKenney
  1 sibling, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-15 18:31 UTC (permalink / raw)
  To: paulmck
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

>> How does the compiler know that msleep() has got barrier()s?
>
> Because msleep_interruptible() is in a separate compilation unit,
> the compiler has to assume that it might modify any arbitrary global.

No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.

What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)

> In many cases, the compiler also has to assume that 
> msleep_interruptible()
> might call back into a function in the current compilation unit, thus
> possibly modifying global static variables.

It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 12:31       ` Satyam Sharma
  2007-08-15 13:08         ` Stefan Richter
@ 2007-08-15 18:31         ` Segher Boessenkool
  2007-08-15 19:40           ` Satyam Sharma
  2007-08-15 23:22         ` Paul Mackerras
  2007-08-16  3:37         ` Bill Fink
  3 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-15 18:31 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, netdev, ak,
	cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

> "Volatile behaviour" itself isn't consistently defined (at least
> definitely not consistently implemented in various gcc versions across
> platforms),

It should be consistent across platforms; if not, file a bug please.

> but it is /expected/ to mean something like: "ensure that
> every such access actually goes all the way to memory, and is not
> re-ordered w.r.t. to other accesses, as far as the compiler can take
> care of these". The last "as far as compiler can take care" disclaimer
> comes about due to CPUs doing their own re-ordering nowadays.

You can *expect* whatever you want, but this isn't in line with
reality at all.

volatile _does not_ make accesses go all the way to memory.
volatile _does not_ prevent reordering wrt other accesses.

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.

If you want stuff to go all the way to memory, you need some
architecture-specific flush sequence; to make a store globally
visible before another store, you need mb(); before some following
read, you need mb(); to prevent reordering you need a barrier.
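
(A tiny hypothetical illustration: only the ordering of the two volatile
accesses is guaranteed by the compiler; the plain store may legally be
moved before the first or after the second of them:)

	volatile int vreg;
	int ready;			/* plain, non-volatile */

	void writer(void)
	{
		vreg = 1;		/* volatile store            */
		ready = 1;		/* plain store: may be       */
					/* reordered across either   */
					/* volatile access           */
		vreg = 2;		/* stays after "vreg = 1"    */
	}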


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 18:19               ` David Howells
@ 2007-08-15 18:45                 ` Paul E. McKenney
  2007-08-15 23:41                   ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 18:45 UTC (permalink / raw)
  To: David Howells
  Cc: Herbert Xu, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Wed, Aug 15, 2007 at 07:19:57PM +0100, David Howells wrote:
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > Let's turn this around.  Can you give a single example where
> > the volatile semantics is needed in a legitimate way?
> 
> Accessing H/W registers?  But apart from that...

Communicating between process context and interrupt/NMI handlers using
per-CPU variables.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 18:31                     ` Segher Boessenkool
@ 2007-08-15 18:57                       ` Paul E. McKenney
  2007-08-15 19:54                         ` Satyam Sharma
  2007-08-15 21:05                         ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Segher Boessenkool
  0 siblings, 2 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 18:57 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> >>How does the compiler know that msleep() has got barrier()s?
> >
> >Because msleep_interruptible() is in a separate compilation unit,
> >the compiler has to assume that it might modify any arbitrary global.
> 
> No; compilation units have nothing to do with it, GCC can optimise
> across compilation unit boundaries just fine, if you tell it to
> compile more than one compilation unit at once.

Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.

> What you probably mean is that the compiler has to assume any code
> it cannot currently see can do anything (insofar as allowed by the
> relevant standards etc.)

Indeed.

> >In many cases, the compiler also has to assume that 
> >msleep_interruptible()
> >might call back into a function in the current compilation unit, thus
> >possibly modifying global static variables.
> 
> It most often is smart enough to see what compilation-unit-local
> variables might be modified that way, though :-)

Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.
At least given that gcc doesn't know about multiple threads of execution!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 17:55               ` Satyam Sharma
@ 2007-08-15 19:07                 ` Paul E. McKenney
  2007-08-15 21:07                   ` Segher Boessenkool
  2007-08-15 20:58                 ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 19:07 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, Aug 15, 2007 at 11:25:05PM +0530, Satyam Sharma wrote:
> Hi Paul,
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > On Wed, Aug 15, 2007 at 07:17:29PM +0530, Satyam Sharma wrote:
> > > [...]
> > > No, I'd actually prefer something like what Christoph Lameter suggested,
> > > i.e. users (such as above) who want "volatile"-like behaviour from atomic
> > > ops can use alternative functions. How about something like:
> > > 
> > > #define atomic_read_volatile(v)			\
> > > 	({					\
> > > 		forget(&(v)->counter);		\
> > > 		((v)->counter);			\
> > > 	})
> > 
> > Wouldn't the above "forget" the value, throw it away, then forget
> > that it forgot it, giving non-volatile semantics?
> 
> Nope, I don't think so. I wrote the following trivial testcases:
> [ See especially tp4.c and tp4.s (last example). ]

Right.  I should have said "wouldn't the compiler be within its rights
to forget the value, throw it away, then forget that it forgot it".
The value coming out of the #define above is an unadorned ((v)->counter),
which has no volatile semantics.

> ==============================================================================
> $ cat tp1.c # Using volatile access casts
> 
> #define atomic_read(a)	(*(volatile int *)&a)
> 
> int a;
> 
> void func(void)
> {
> 	a = 0;
> 	while (atomic_read(a))
> 		;
> }
> ==============================================================================
> $ gcc -Os -S tp1.c; cat tp1.s
> 
> func:
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	movl	$0, a
> .L2:
> 	movl	a, %eax
> 	testl	%eax, %eax
> 	jne	.L2
> 	popl	%ebp
> 	ret
> ==============================================================================
> $ cat tp2.c # Using nothing; gcc will optimize the whole loop away
> 
> #define forget(x)
> 
> #define atomic_read(a)		\
> 	({			\
> 		forget(&(a));	\
> 		(a);		\
> 	})
> 
> int a;
> 
> void func(void)
> {
> 	a = 0;
> 	while (atomic_read(a))
> 		;
> }
> ==============================================================================
> $ gcc -Os -S tp2.c; cat tp2.s
> 
> func:
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	popl	%ebp
> 	movl	$0, a
> 	ret
> ==============================================================================
> $ cat tp3.c # Using a full memory clobber barrier
> 
> #define forget(x)	asm volatile ("":::"memory")
> 
> #define atomic_read(a)		\
> 	({			\
> 		forget(&(a));	\
> 		(a);		\
> 	})
> 
> int a;
> 
> void func(void)
> {
> 	a = 0;
> 	while (atomic_read(a))
> 		;
> }
> ==============================================================================
> $ gcc -Os -S tp3.c; cat tp3.s
> 
> func:
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	movl	$0, a
> .L2:
> 	cmpl	$0, a
> 	jne	.L2
> 	popl	%ebp
> 	ret
> ==============================================================================
> $ cat tp4.c # Using a forget(var) macro
> 
> #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> 
> #define atomic_read(a)		\
> 	({			\
> 		forget(a);	\
> 		(a);		\
> 	})
> 
> int a;
> 
> void func(void)
> {
> 	a = 0;
> 	while (atomic_read(a))
> 		;
> }
> ==============================================================================
> $ gcc -Os -S tp4.c; cat tp4.s
> 
> func:
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	movl	$0, a
> .L2:
> 	cmpl	$0, a
> 	jne	.L2
> 	popl	%ebp
> 	ret
> ==============================================================================
> 
> Possibly these were too trivial to expose any potential problems that you
> may have been referring to, so would be helpful if you could write a more
> concrete example / sample code.

The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.  If the value is non-volatile,
it will refetch it (and expect it not to have changed, possibly being
disappointed by an interrupt handler running on that same CPU).
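
(A hypothetical illustration of what I mean:)

	int a;				/* think: an atomic counter without
					   volatile semantics */

	int func(int p0, int p1, int p2, int p3, int p4, int p5)
	{
		int tmp = a;		/* intended: sample 'a' exactly once */
		int x0 = p0 * tmp, x1 = p1 * tmp, x2 = p2 * tmp;
		int x3 = p3 * tmp, x4 = p4 * tmp, x5 = p5 * tmp;

		/* with enough such intermediates the register allocator
		   runs out of registers, and the compiler is then free to
		   re-load 'a' for the later uses of 'tmp' instead of
		   spilling it -- if an interrupt handler changed 'a' in
		   the meantime, the "single" snapshot is now inconsistent */
		return x0 + x1 + x2 + x3 + x4 + x5 + tmp;
	}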

> > > Or possibly, implement these "volatile" atomic ops variants in inline asm
> > > like the patch that Sebastian Siewior has submitted on another thread just
> > > a while back.
> > 
> > Given that you are advocating a change (please keep in mind that
> > atomic_read() and atomic_set() had volatile semantics on almost all
> > platforms), care to give some example where these historical volatile
> > semantics are causing a problem?
> > [...]
> > Can you give even one example
> > where the pre-existing volatile semantics are causing enough of a problem
> > to justify adding yet more atomic_*() APIs?
> 
> Will take this to the other sub-thread ...

OK.

> > > Of course, if we find there are more callers in the kernel who want the
> > > volatility behaviour than those who don't care, we can re-define the
> > > existing ops to such variants, and re-name the existing definitions to
> > > something else, say "atomic_read_nonvolatile" for all I care.
> > 
> > Do we really need another set of APIs?
> 
> Well, if there's one set of users who do care about volatile behaviour,
> and another set that doesn't, it only sounds correct to provide both
> those APIs, instead of forcing one behaviour on all users.

Well, if the second set doesn't care, they should be OK with the volatile
behavior in this case.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 18:31         ` Segher Boessenkool
@ 2007-08-15 19:40           ` Satyam Sharma
  2007-08-15 20:42             ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 19:40 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, netdev, ak,
	cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang



On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> > "Volatile behaviour" itself isn't consistently defined (at least
> > definitely not consistently implemented in various gcc versions across
> > platforms),
> 
> It should be consistent across platforms; if not, file a bug please.
> 
> > but it is /expected/ to mean something like: "ensure that
> > every such access actually goes all the way to memory, and is not
> > re-ordered w.r.t. to other accesses, as far as the compiler can take
                              ^
                              (volatile)

(or, alternatively, "other accesses to the same volatile object" ...)

> > care of these". The last "as far as compiler can take care" disclaimer
> > comes about due to CPUs doing their own re-ordering nowadays.
> 
> You can *expect* whatever you want, but this isn't in line with
> reality at all.
> 
> volatile _does not_ prevent reordering wrt other accesses.
> [...]
> What volatile does are a) never optimise away a read (or write)
> to the object, since the data can change in ways the compiler
> cannot see; and b) never move stores to the object across a
> sequence point.  This does not mean other accesses cannot be
> reordered wrt the volatile access.
> 
> If the abstract machine would do an access to a volatile-
> qualified object, the generated machine code will do that
> access too.  But, for example, it can still be optimised
> away by the compiler, if it can prove it is allowed to.

As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.

BTW:

#define atomic_read(a)	(*(volatile int *)&(a))
#define atomic_set(a,i)	(*(volatile int *)&(a) = (i))

int a;

void func(void)
{
	int b;

	b = atomic_read(a);
	atomic_set(a, 20);
	b = atomic_read(a);
}

gives:

func:
	pushl	%ebp
	movl	a, %eax
	movl	%esp, %ebp
	movl	$20, a
	movl	a, %eax
	popl	%ebp
	ret

so the first atomic_read() wasn't optimized away.


> volatile _does not_ make accesses go all the way to memory.
> [...]
> If you want stuff to go all the way to memory, you need some
> architecture-specific flush sequence; to make a store globally
> visible before another store, you need mb(); before some following
> read, you need mb(); to prevent reordering you need a barrier.

Sure, which explains the "as far as the compiler can take care" bit.
Poor phrase / choice of words, probably.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 18:57                       ` Paul E. McKenney
@ 2007-08-15 19:54                         ` Satyam Sharma
  2007-08-15 20:17                           ` Paul E. McKenney
  2007-08-15 20:47                           ` Segher Boessenkool
  2007-08-15 21:05                         ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Segher Boessenkool
  1 sibling, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-15 19:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Segher Boessenkool, horms, Stefan Richter,
	Linux Kernel Mailing List, rpjday, netdev, ak, cfriesen,
	Heiko Carstens, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang

[ The Cc: list scares me. Should probably trim it. ]


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> > >>How does the compiler know that msleep() has got barrier()s?
> > >
> > >Because msleep_interruptible() is in a separate compilation unit,
> > >the compiler has to assume that it might modify any arbitrary global.
> > 
> > No; compilation units have nothing to do with it, GCC can optimise
> > across compilation unit boundaries just fine, if you tell it to
> > compile more than one compilation unit at once.
> 
> Last I checked, the Linux kernel build system did compile each .c file
> as a separate compilation unit.
> 
> > What you probably mean is that the compiler has to assume any code
> > it cannot currently see can do anything (insofar as allowed by the
> > relevant standards etc.)

I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",
and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit" (in gcc lingo, possibly?)

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 14:35             ` Satyam Sharma
  2007-08-15 14:52               ` Herbert Xu
@ 2007-08-15 19:58               ` Stefan Richter
  1 sibling, 0 replies; 657+ messages in thread
From: Stefan Richter @ 2007-08-15 19:58 UTC (permalink / raw)
  To: Satyam Sharma; +Cc: Linux Kernel Mailing List, Linus Torvalds

(trimmed Cc)

Satyam Sharma wrote:
> [PATCH] ieee1394: Fix kthread stopping in nodemgr_host_thread
> 
> The nodemgr host thread can exit on its own even when kthread_should_stop
> is not true, on receiving a signal (might never happen in practice, as
> it ignores signals). But considering kthread_stop() must not be mixed with
> kthreads that can exit on their own, I think changing the code like this
is clearer. This change means the thread can cut its sleep short when
receiving a signal, but looking at the code around it, that sounds okay (and
again, it might never actually receive a signal in practice).

Thanks, committed to linux1394-2.6.git.
-- 
Stefan Richter
-=====-=-=== =--- -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 10:35     ` Stefan Richter
  2007-08-15 12:04       ` Herbert Xu
  2007-08-15 12:31       ` Satyam Sharma
@ 2007-08-15 19:59       ` Christoph Lameter
  2 siblings, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-15 19:59 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Satyam Sharma, Chris Snook, Linux Kernel Mailing List,
	linux-arch, torvalds, netdev, Andrew Morton, ak, heiko.carstens,
	davem, schwidefsky, wensong, horms, wjiang, cfriesen, zlynx,
	rpjday, jesper.juhl, segher, Herbert Xu, Paul E. McKenney

On Wed, 15 Aug 2007, Stefan Richter wrote:

> LDD3 says on page 125:  "The following operations are defined for the
> type [atomic_t] and are guaranteed to be atomic with respect to all
> processors of an SMP computer."
> 
> Doesn't "atomic WRT all processors" require volatility?

Atomic operations only require exclusive access to the cacheline while the 
value is modified.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 19:54                         ` Satyam Sharma
@ 2007-08-15 20:17                           ` Paul E. McKenney
  2007-08-15 20:52                             ` Segher Boessenkool
  2007-08-15 20:47                           ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 20:17 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Segher Boessenkool, horms, Stefan Richter,
	Linux Kernel Mailing List, rpjday, netdev, ak, cfriesen,
	Heiko Carstens, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang

On Thu, Aug 16, 2007 at 01:24:42AM +0530, Satyam Sharma wrote:
> [ The Cc: list scares me. Should probably trim it. ]

Trim away!  ;-)

> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> > > >>How does the compiler know that msleep() has got barrier()s?
> > > >
> > > >Because msleep_interruptible() is in a separate compilation unit,
> > > >the compiler has to assume that it might modify any arbitrary global.
> > > 
> > > No; compilation units have nothing to do with it, GCC can optimise
> > > across compilation unit boundaries just fine, if you tell it to
> > > compile more than one compilation unit at once.
> > 
> > Last I checked, the Linux kernel build system did compile each .c file
> > as a separate compilation unit.
> > 
> > > What you probably mean is that the compiler has to assume any code
> > > it cannot currently see can do anything (insofar as allowed by the
> > > relevant standards etc.)
> 
> I think this was just terminology confusion here again. Isn't "any code
> that it cannot currently see" the same as "another compilation unit",
> and wouldn't the "compilation unit" itself expand if we ask gcc to
> compile more than one unit at once? Or is there some more specific
> "definition" for "compilation unit" (in gcc lingo, possibly?)

This is indeed my understanding -- "compilation unit" is whatever the
compiler looks at in one go.  I have heard the word "module" used for
the minimal compilation unit covering a single .c file and everything
that it #includes, but there might be a better name for this.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 19:40           ` Satyam Sharma
@ 2007-08-15 20:42             ` Segher Boessenkool
  2007-08-16  1:23               ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-15 20:42 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, netdev, ak,
	cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

>> What volatile does are a) never optimise away a read (or write)
>> to the object, since the data can change in ways the compiler
>> cannot see; and b) never move stores to the object across a
>> sequence point.  This does not mean other accesses cannot be
>> reordered wrt the volatile access.
>>
>> If the abstract machine would do an access to a volatile-
>> qualified object, the generated machine code will do that
>> access too.  But, for example, it can still be optimised
>> away by the compiler, if it can prove it is allowed to.
>
> As (now) indicated above, I had meant multiple volatile accesses to
> the same object, obviously.

Yes, accesses to volatile objects are never reordered with
respect to each other.

> BTW:
>
> #define atomic_read(a)	(*(volatile int *)&(a))
> #define atomic_set(a,i)	(*(volatile int *)&(a) = (i))
>
> int a;
>
> void func(void)
> {
> 	int b;
>
> 	b = atomic_read(a);
> 	atomic_set(a, 20);
> 	b = atomic_read(a);
> }
>
> gives:
>
> func:
> 	pushl	%ebp
> 	movl	a, %eax
> 	movl	%esp, %ebp
> 	movl	$20, a
> 	movl	a, %eax
> 	popl	%ebp
> 	ret
>
> so the first atomic_read() wasn't optimized away.

Of course.  It is executed by the abstract machine, so
it will be executed by the actual machine.  On the other
hand, try

	b = 0;
	if (b)
		b = atomic_read(a);

or similar.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 19:54                         ` Satyam Sharma
  2007-08-15 20:17                           ` Paul E. McKenney
@ 2007-08-15 20:47                           ` Segher Boessenkool
  2007-08-16  0:36                             ` Satyam Sharma
  1 sibling, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-15 20:47 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: horms, Stefan Richter, Linux Kernel Mailing List,
	Paul E. McKenney, ak, netdev, cfriesen, Heiko Carstens, rpjday,
	jesper.juhl, linux-arch, Andrew Morton, zlynx, clameter,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

>>> What you probably mean is that the compiler has to assume any code
>>> it cannot currently see can do anything (insofar as allowed by the
>>> relevant standards etc.)
>
> I think this was just terminology confusion here again. Isn't "any code
> that it cannot currently see" the same as "another compilation unit",

It is not; try  gcc -combine  or the upcoming link-time optimisation
stuff, for example.

> and wouldn't the "compilation unit" itself expand if we ask gcc to
> compile more than one unit at once? Or is there some more specific
> "definition" for "compilation unit" (in gcc lingo, possibly?)

"compilation unit" is a C standard term.  It typically boils down
to "single .c file".


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 20:17                           ` Paul E. McKenney
@ 2007-08-15 20:52                             ` Segher Boessenkool
  2007-08-15 22:42                               ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-15 20:52 UTC (permalink / raw)
  To: paulmck
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

>> I think this was just terminology confusion here again. Isn't "any 
>> code
>> that it cannot currently see" the same as "another compilation unit",
>> and wouldn't the "compilation unit" itself expand if we ask gcc to
>> compile more than one unit at once? Or is there some more specific
>> "definition" for "compilation unit" (in gcc lingo, possibly?)
>
> This is indeed my understanding -- "compilation unit" is whatever the
> compiler looks at in one go.  I have heard the word "module" used for
> the minimal compilation unit covering a single .c file and everything
> that it #includes, but there might be a better name for this.

Yes, that's what's called "compilation unit" :-)

[/me double checks]

Erm, the C standard actually calls it "translation unit".

To be exact, to avoid any more confusion:

5.1.1.1/1:
A C program need not all be translated at the same time. The
text of the program is kept in units called source files, (or
preprocessing files) in this International Standard. A source
file together with all the headers and source files included
via the preprocessing directive #include is known as a
preprocessing translation unit. After preprocessing, a
preprocessing translation unit is called a translation unit.



Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 17:55               ` Satyam Sharma
  2007-08-15 19:07                 ` Paul E. McKenney
@ 2007-08-15 20:58                 ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-15 20:58 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, netdev, ak,
	cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

>>> Of course, if we find there are more callers in the kernel who want 
>>> the
>>> volatility behaviour than those who don't care, we can re-define the
>>> existing ops to such variants, and re-name the existing definitions 
>>> to
>>> something else, say "atomic_read_nonvolatile" for all I care.
>>
>> Do we really need another set of APIs?
>
> Well, if there's one set of users who do care about volatile behaviour,
> and another set that doesn't, it only sounds correct to provide both
> those APIs, instead of forcing one behaviour on all users.

But since there currently is only one such API, and there are
users expecting the stronger behaviour, the only sane thing to
do is let the API provide that behaviour.  You can always add
a new API with weaker behaviour later, and move users that are
okay with it over to that new API.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 18:57                       ` Paul E. McKenney
  2007-08-15 19:54                         ` Satyam Sharma
@ 2007-08-15 21:05                         ` Segher Boessenkool
  2007-08-15 22:44                           ` Paul E. McKenney
  1 sibling, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-15 21:05 UTC (permalink / raw)
  To: paulmck
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

>> No; compilation units have nothing to do with it, GCC can optimise
>> across compilation unit boundaries just fine, if you tell it to
>> compile more than one compilation unit at once.
>
> Last I checked, the Linux kernel build system did compile each .c file
> as a separate compilation unit.

I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.

>>> In many cases, the compiler also has to assume that
>>> msleep_interruptible()
>>> might call back into a function in the current compilation unit, thus
>>> possibly modifying global static variables.
>>
>> It most often is smart enough to see what compilation-unit-local
>> variables might be modified that way, though :-)
>
> Yep.  For example, if it knows the current value of a given such local
> variable, and if all code paths that would change some other variable
> cannot be reached given that current value of the first variable.

Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can "escape" from that translation unit, nothing outside
the translation unit can write to the variable.

> At least given that gcc doesn't know about multiple threads of 
> execution!

Heh, only about the threads it creates itself (not relevant to
the kernel, for sure :-) )


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 19:07                 ` Paul E. McKenney
@ 2007-08-15 21:07                   ` Segher Boessenkool
  0 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-15 21:07 UTC (permalink / raw)
  To: paulmck
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Satyam Sharma, Linux Kernel Mailing List, rpjday, netdev, ak,
	cfriesen, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

>> Possibly these were too trivial to expose any potential problems that 
>> you
>> may have been referring to, so would be helpful if you could write a 
>> more
>> concrete example / sample code.
>
> The trick is to have a sufficiently complicated expression to force
> the compiler to run out of registers.

You can use -ffixed-XXX to keep the testcase simple.
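
(For example -- a hypothetical invocation, i386 register names assumed:

	$ gcc -Os -S -ffixed-ebx -ffixed-esi -ffixed-edi tp4.c

which takes three registers away from the allocator, so even a small
testcase starts spilling or refetching.)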


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 20:52                             ` Segher Boessenkool
@ 2007-08-15 22:42                               ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 22:42 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

On Wed, Aug 15, 2007 at 10:52:53PM +0200, Segher Boessenkool wrote:
> >>I think this was just terminology confusion here again. Isn't "any 
> >>code
> >>that it cannot currently see" the same as "another compilation unit",
> >>and wouldn't the "compilation unit" itself expand if we ask gcc to
> >>compile more than one unit at once? Or is there some more specific
> >>"definition" for "compilation unit" (in gcc lingo, possibly?)
> >
> >This is indeed my understanding -- "compilation unit" is whatever the
> >compiler looks at in one go.  I have heard the word "module" used for
> >the minimal compilation unit covering a single .c file and everything
> >that it #includes, but there might be a better name for this.
> 
> Yes, that's what's called "compilation unit" :-)
> 
> [/me double checks]
> 
> Erm, the C standard actually calls it "translation unit".
> 
> To be exact, to avoid any more confusion:
> 
> 5.1.1.1/1:
> A C program need not all be translated at the same time. The
> text of the program is kept in units called source files, (or
> preprocessing files) in this International Standard. A source
> file together with all the headers and source files included
> via the preprocessing directive #include is known as a
> preprocessing translation unit. After preprocessing, a
> preprocessing translation unit is called a translation unit.

I am OK with "translation" and "compilation" being near-synonyms.  ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 21:05                         ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Segher Boessenkool
@ 2007-08-15 22:44                           ` Paul E. McKenney
  2007-08-16  1:23                             ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 22:44 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

On Wed, Aug 15, 2007 at 11:05:35PM +0200, Segher Boessenkool wrote:
> >>No; compilation units have nothing to do with it, GCC can optimise
> >>across compilation unit boundaries just fine, if you tell it to
> >>compile more than one compilation unit at once.
> >
> >Last I checked, the Linux kernel build system did compile each .c file
> >as a separate compilation unit.
> 
> I have some patches to use -combine -fwhole-program for Linux.
> Highly experimental, you need a patched bleeding edge toolchain.
> If there's interest I'll clean it up and put it online.
> 
> David Woodhouse had some similar patches about a year ago.

Sounds exciting...  ;-)

> >>>In many cases, the compiler also has to assume that
> >>>msleep_interruptible()
> >>>might call back into a function in the current compilation unit, thus
> >>>possibly modifying global static variables.
> >>
> >>It most often is smart enough to see what compilation-unit-local
> >>variables might be modified that way, though :-)
> >
> >Yep.  For example, if it knows the current value of a given such local
> >variable, and if all code paths that would change some other variable
> >cannot be reached given that current value of the first variable.
> 
> Or the most common thing: if neither the address of the translation-
> unit local variable nor the address of any function writing to that
> variable can "escape" from that translation unit, nothing outside
> the translation unit can write to the variable.

But there is usually at least one externally callable function in
a .c file.

> >At least given that gcc doesn't know about multiple threads of 
> >execution!
> 
> Heh, only about the threads it creates itself (not relevant to
> the kernel, for sure :-) )

;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 12:31       ` Satyam Sharma
  2007-08-15 13:08         ` Stefan Richter
  2007-08-15 18:31         ` Segher Boessenkool
@ 2007-08-15 23:22         ` Paul Mackerras
  2007-08-16  0:26           ` Christoph Lameter
  2007-08-24 12:50           ` Denys Vlasenko
  2007-08-16  3:37         ` Bill Fink
  3 siblings, 2 replies; 657+ messages in thread
From: Paul Mackerras @ 2007-08-15 23:22 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

Satyam Sharma writes:

> > Doesn't "atomic WRT all processors" require volatility?
> 
> No, it definitely doesn't. Why should it?
> 
> "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> for SMP systems (ensure that that object is modified / set / replaced
> in main memory atomically) and has nothing to do with "volatile"
> behaviour.

Atomic variables are "volatile" in the sense that they are liable to
be changed at any time by mechanisms that are outside the knowledge of
the C compiler, namely, other CPUs, or this CPU executing an interrupt
routine.

In the kernel we use atomic variables in precisely those situations
where a variable is potentially accessed concurrently by multiple
CPUs, and where each CPU needs to see updates done by other CPUs in a
timely fashion.  That is what they are for.  Therefore the compiler
must not cache values of atomic variables in registers; each
atomic_read must result in a load and each atomic_set must result in a
store.  Anything else will just lead to subtle bugs.

I have no strong opinion about whether or not the best way to achieve
this is through the use of the "volatile" C keyword.  Segher's idea of
using asm instead seems like a good one to me.
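
(For the record, a sketch of the sort of thing I understand that to mean
-- i386-only, with atomic_t spelled out locally, and not an actual
proposed patch:)

	typedef struct { int counter; } atomic_t;	/* no volatile */

	static inline int atomic_read(const atomic_t *v)
	{
		int ret;

		/* "asm volatile" with a memory operand: gcc will neither
		   delete this load nor merge two such reads into one,
		   and nothing else is clobbered */
		asm volatile("movl %1, %0" : "=r" (ret) : "m" (v->counter));
		return ret;
	}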

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 16:13         ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Chris Snook
@ 2007-08-15 23:40           ` Herbert Xu
  2007-08-15 23:51             ` Paul E. McKenney
  2007-08-16  1:26             ` Segher Boessenkool
  0 siblings, 2 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-15 23:40 UTC (permalink / raw)
  To: Chris Snook
  Cc: satyam, clameter, linux-kernel, linux-arch, torvalds, netdev,
	akpm, ak, heiko.carstens, davem, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Wed, Aug 15, 2007 at 12:13:12PM -0400, Chris Snook wrote:
>
> Part of the motivation here is to fix heisenbugs.  If I knew where they 

By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 18:45                 ` Paul E. McKenney
@ 2007-08-15 23:41                   ` Herbert Xu
  2007-08-15 23:53                     ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-15 23:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Wed, Aug 15, 2007 at 11:45:20AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 15, 2007 at 07:19:57PM +0100, David Howells wrote:
> > Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > 
> > > Let's turn this around.  Can you give a single example where
> > > the volatile semantics is needed in a legitimate way?
> > 
> > Accessing H/W registers?  But apart from that...
> 
> Communicating between process context and interrupt/NMI handlers using
> per-CPU variables.

Remember we're talking about atomic_read/atomic_set.  Please
cite the actual file/function name you have in mind.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 23:40           ` Herbert Xu
@ 2007-08-15 23:51             ` Paul E. McKenney
  2007-08-16  1:30               ` Segher Boessenkool
  2007-08-16  1:26             ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 23:51 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Chris Snook, satyam, clameter, linux-kernel, linux-arch,
	torvalds, netdev, akpm, ak, heiko.carstens, davem, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

On Thu, Aug 16, 2007 at 07:40:21AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 12:13:12PM -0400, Chris Snook wrote:
> >
> > Part of the motivation here is to fix heisenbugs.  If I knew where they 
> 
> By the same token we should probably disable optimisations
> altogether since that too can create heisenbugs.

Precisely the point -- use of volatile (whether in casts or on asms)
in these cases are intended to disable those optimizations likely to
result in heisenbugs.  But they are also intended to leave other
valuable optimizations in force.
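
[ For reference, a minimal sketch of the cast-based form under discussion
  (an illustration, not the literal patch text): the volatile qualifier is
  applied only at the point of the access, so this one load is forced
  while the surrounding code remains optimizable. ]

        #define atomic_read(v)  (*(volatile int *)&(v)->counter)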

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 23:41                   ` Herbert Xu
@ 2007-08-15 23:53                     ` Paul E. McKenney
  2007-08-16  0:12                       ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-15 23:53 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 07:41:46AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 11:45:20AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 15, 2007 at 07:19:57PM +0100, David Howells wrote:
> > > Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > > 
> > > > Let's turn this around.  Can you give a single example where
> > > > the volatile semantics is needed in a legitimate way?
> > > 
> > > Accessing H/W registers?  But apart from that...
> > 
> > Communicating between process context and interrupt/NMI handlers using
> > per-CPU variables.
> 
> Remember we're talking about atomic_read/atomic_set.  Please
> cite the actual file/function name you have in mind.

Yep, we are indeed talking about atomic_read()/atomic_set().

We have been through this issue already in this thread.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 23:53                     ` Paul E. McKenney
@ 2007-08-16  0:12                       ` Herbert Xu
  2007-08-16  0:23                         ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  0:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
>
> > > Communicating between process context and interrupt/NMI handlers using
> > > per-CPU variables.
> > 
> > Remember we're talking about atomic_read/atomic_set.  Please
> > cite the actual file/function name you have in mind.
> 
> Yep, we are indeed talking about atomic_read()/atomic_set().
> 
> We have been through this issue already in this thread.

Sorry, but I must've missed it.  Could you cite the file or
function for my benefit?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:12                       ` Herbert Xu
@ 2007-08-16  0:23                         ` Paul E. McKenney
  2007-08-16  0:30                           ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  0:23 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 08:12:48AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
> >
> > > > Communicating between process context and interrupt/NMI handlers using
> > > > per-CPU variables.
> > > 
> > > Remember we're talking about atomic_read/atomic_set.  Please
> > > cite the actual file/function name you have in mind.
> > 
> > Yep, we are indeed talking about atomic_read()/atomic_set().
> > 
> > We have been through this issue already in this thread.
> 
> Sorry, but I must've missed it.  Could you cite the file or
> function for my benefit?

I might summarize the thread if there is interest, but I am not able to
do so right this minute.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 23:22         ` Paul Mackerras
@ 2007-08-16  0:26           ` Christoph Lameter
  2007-08-16  0:34             ` Paul Mackerras
  2007-08-16  0:39             ` Paul E. McKenney
  2007-08-24 12:50           ` Denys Vlasenko
  1 sibling, 2 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16  0:26 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> In the kernel we use atomic variables in precisely those situations
> where a variable is potentially accessed concurrently by multiple
> CPUs, and where each CPU needs to see updates done by other CPUs in a
> timely fashion.  That is what they are for.  Therefore the compiler
> must not cache values of atomic variables in registers; each
> atomic_read must result in a load and each atomic_set must result in a
> store.  Anything else will just lead to subtle bugs.

This may have been the intent. However, today the visibility is controlled 
using barriers. And we have barriers that we use with atomic operations. 
Having volatile be the default just leads to confusion. Atomic read should 
just read with no extras. Extras can be added by using variants like 
atomic_read_volatile or so.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:23                         ` Paul E. McKenney
@ 2007-08-16  0:30                           ` Herbert Xu
  2007-08-16  0:49                             ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  0:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Wed, Aug 15, 2007 at 05:23:10PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 16, 2007 at 08:12:48AM +0800, Herbert Xu wrote:
> > On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
> > >
> > > > > Communicating between process context and interrupt/NMI handlers using
> > > > > per-CPU variables.
> > > > 
> > > > Remember we're talking about atomic_read/atomic_set.  Please
> > > > cite the actual file/function name you have in mind.
> > > 
> > > Yep, we are indeed talking about atomic_read()/atomic_set().
> > > 
> > > We have been through this issue already in this thread.
> > 
> > Sorry, but I must've missed it.  Could you cite the file or
> > function for my benefit?
> 
> I might summarize the thread if there is interest, but I am not able to
> do so right this minute.

Thanks.  But I don't need a summary of the thread, I'm asking
for an extant code snippet in our kernel that benefits from
the volatile change and is not part of a busy-wait.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-08-16  0:36                             ` Satyam Sharma
@ 2007-08-16  0:32                               ` Herbert Xu
  2007-08-16  0:58                                 ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Satyam Sharma
  2007-08-16  1:38                               ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  0:32 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Segher Boessenkool, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, ak, netdev,
	cfriesen, Heiko Carstens, rpjday, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, clameter, schwidefsky, Chris Snook, davem,
	Linus Torvalds, wensong, wjiang

On Thu, Aug 16, 2007 at 06:06:00AM +0530, Satyam Sharma wrote:
> 
> that are:
> 
> 	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> 		mdelay(1);
> 		msecs--;
> 	}
> 
> where mdelay() becomes __const_udelay() which happens to be in another
> translation unit (arch/i386/lib/delay.c) and hence saves this callsite
> from being a bug :-)

The udelay itself certainly should have some form of cpu_relax in it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:26           ` Christoph Lameter
@ 2007-08-16  0:34             ` Paul Mackerras
  2007-08-16  0:40               ` Christoph Lameter
  2007-08-16  0:39             ` Paul E. McKenney
  1 sibling, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  0:34 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

Christoph Lameter writes:

> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> 
> > In the kernel we use atomic variables in precisely those situations
> > where a variable is potentially accessed concurrently by multiple
> > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > timely fashion.  That is what they are for.  Therefore the compiler
> > must not cache values of atomic variables in registers; each
> > atomic_read must result in a load and each atomic_set must result in a
> > store.  Anything else will just lead to subtle bugs.
> 
> This may have been the intent. However, today the visibility is controlled 
> using barriers. And we have barriers that we use with atomic operations. 

Those barriers are for when we need ordering between atomic variables
and other memory locations.  An atomic variable by itself doesn't and
shouldn't need any barriers for other CPUs to be able to see what's
happening to it.
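
[ A sketch of the distinction being drawn (made-up variable names): the
  barrier orders the atomic update against ordinary memory, which is a
  separate question from whether the atomic access itself hits memory. ]

        extern int shared_data;         /* hypothetical ordinary variable */
        extern atomic_t ready;          /* hypothetical atomic counter */

        shared_data = 1;                /* ordinary store */
        smp_mb();                       /* barrier: order the store above against
                                           the atomic update below */
        atomic_inc(&ready);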

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* (no subject)
  2007-08-15 20:47                           ` Segher Boessenkool
@ 2007-08-16  0:36                             ` Satyam Sharma
  2007-08-16  0:32                               ` your mail Herbert Xu
  2007-08-16  1:38                               ` Segher Boessenkool
  0 siblings, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  0:36 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: horms, Stefan Richter, Linux Kernel Mailing List,
	Paul E. McKenney, ak, netdev, cfriesen, Heiko Carstens, rpjday,
	jesper.juhl, linux-arch, Andrew Morton, zlynx, clameter,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang



On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> > > > What you probably mean is that the compiler has to assume any code
> > > > it cannot currently see can do anything (insofar as allowed by the
> > > > relevant standards etc.)
> > 
> > I think this was just terminology confusion here again. Isn't "any code
> > that it cannot currently see" the same as "another compilation unit",
> 
> It is not; try  gcc -combine  or the upcoming link-time optimisation
> stuff, for example.
> 
> > and wouldn't the "compilation unit" itself expand if we ask gcc to
> > compile more than one unit at once? Or is there some more specific
> > "definition" for "compilation unit" (in gcc lingo, possibly?)
> 
> "compilation unit" is a C standard term.  It typically boils down
> to "single .c file".

As you mentioned later, "single .c file with all the other files (headers
or other .c files) that it pulls in via #include" is actually "translation
unit", both in the C standard as well as gcc docs. "Compilation unit"
doesn't seem to be nearly as standard a term, though in most places it
is indeed meant to be the same as "translation unit", but with the new gcc
inter-module-analysis stuff that you referred to above, I suspect one may
reasonably want to call a "compilation unit" all that the compiler sees
at a given instant.

BTW I did some auditing (only inside include/asm-{i386,x86_64}/ and
arch/{i386,x86_64}/) and found a couple more callsites that don't use
cpu_relax():

arch/i386/kernel/crash.c:101
arch/x86_64/kernel/crash.c:97

that are:

	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
		mdelay(1);
		msecs--;
	}

where mdelay() becomes __const_udelay() which happens to be in another
translation unit (arch/i386/lib/delay.c) and hence saves this callsite
from being a bug :-)

Curiously, __const_udelay() is still marked as "inline" where it is
implemented in lib/delay.c, which is weird considering it won't ever
be inlined, will it? With the kernel presently being compiled one
translation unit at a time, I don't see how the implementation would
be visible to any callsite out there to be able to inline it.
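
[ A sketch of why the cross-unit call matters (hypothetical names): gcc
  cannot see the callee's body, so it must assume the call may touch any
  memory, which forces the atomic_t to be re-read on the next iteration. ]

        extern void opaque_fn(void);    /* defined in another .c file */
        extern atomic_t waiting;        /* hypothetical counter */

        static void wait_for_zero(void)
        {
                while (atomic_read(&waiting) > 0)
                        opaque_fn();    /* opaque call: compiler must assume it
                                           can change waiting, so it re-reads */
        }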


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-15  8:18         ` Heiko Carstens
  2007-08-15 13:53           ` Stefan Richter
@ 2007-08-16  0:39           ` Satyam Sharma
  2007-08-24 11:59             ` Denys Vlasenko
  1 sibling, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  0:39 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Herbert Xu, Chris Snook, clameter, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher



On Wed, 15 Aug 2007, Heiko Carstens wrote:

> [...]
> Btw.: we still have
> 
> include/asm-i386/mach-es7000/mach_wakecpu.h:  while (!atomic_read(deassert));
> include/asm-i386/mach-default/mach_wakecpu.h: while (!atomic_read(deassert));
> 
> Looks like they need to be fixed as well.


[PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

Use cpu_relax() in the busy loops, as atomic_read() doesn't automatically
imply volatility for i386 and x86_64. x86_64 doesn't have this issue because
it open-codes the while loop in smpboot.c:smp_callin() itself that already
uses cpu_relax().

For i386, however, smpboot.c:smp_callin() calls wait_for_init_deassert()
which is buggy for mach-default and mach-es7000 cases.

[ I test-built a kernel -- smp_callin() itself got inlined in its only
  callsite, smpboot.c:start_secondary() -- and the relevant piece of
  code disassembles to the following:

0xc1019704 <start_secondary+12>:        mov    0xc144c4c8,%eax
0xc1019709 <start_secondary+17>:        test   %eax,%eax
0xc101970b <start_secondary+19>:        je     0xc1019709 <start_secondary+17>

  init_deasserted (at 0xc144c4c8) gets fetched into %eax only once and
  then we loop over the test of the stale value in the register only,
  so these look like real bugs to me. With the fix below, this becomes:

0xc1019706 <start_secondary+14>:        pause
0xc1019708 <start_secondary+16>:        cmpl   $0x0,0xc144c4c8
0xc101970f <start_secondary+23>:        je     0xc1019706 <start_secondary+14>

  which looks nice and healthy. ]

Thanks to Heiko Carstens for noticing this.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 include/asm-i386/mach-default/mach_wakecpu.h |    3 ++-
 include/asm-i386/mach-es7000/mach_wakecpu.h  |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/asm-i386/mach-default/mach_wakecpu.h b/include/asm-i386/mach-default/mach_wakecpu.h
index 673b85c..3ebb178 100644
--- a/include/asm-i386/mach-default/mach_wakecpu.h
+++ b/include/asm-i386/mach-default/mach_wakecpu.h
@@ -15,7 +15,8 @@
 
 static inline void wait_for_init_deassert(atomic_t *deassert)
 {
-	while (!atomic_read(deassert));
+	while (!atomic_read(deassert))
+		cpu_relax();
 	return;
 }
 
diff --git a/include/asm-i386/mach-es7000/mach_wakecpu.h b/include/asm-i386/mach-es7000/mach_wakecpu.h
index efc903b..84ff583 100644
--- a/include/asm-i386/mach-es7000/mach_wakecpu.h
+++ b/include/asm-i386/mach-es7000/mach_wakecpu.h
@@ -31,7 +31,8 @@ wakeup_secondary_cpu(int phys_apicid, unsigned long start_eip)
 static inline void wait_for_init_deassert(atomic_t *deassert)
 {
 #ifdef WAKE_SECONDARY_VIA_INIT
-	while (!atomic_read(deassert));
+	while (!atomic_read(deassert))
+		cpu_relax();
 #endif
 	return;
 }

^ permalink raw reply related	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:26           ` Christoph Lameter
  2007-08-16  0:34             ` Paul Mackerras
@ 2007-08-16  0:39             ` Paul E. McKenney
  2007-08-16  0:42               ` Christoph Lameter
  1 sibling, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  0:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, Aug 15, 2007 at 05:26:34PM -0700, Christoph Lameter wrote:
> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> 
> > In the kernel we use atomic variables in precisely those situations
> > where a variable is potentially accessed concurrently by multiple
> > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > timely fashion.  That is what they are for.  Therefore the compiler
> > must not cache values of atomic variables in registers; each
> > atomic_read must result in a load and each atomic_set must result in a
> > store.  Anything else will just lead to subtle bugs.
> 
> This may have been the intent. However, today the visibility is controlled 
> using barriers. And we have barriers that we use with atomic operations. 
> Having volatile be the default just leads to confusion. Atomic read should 
> just read with no extras. Extras can be added by using variants like 
> atomic_read_volatile or so.

Seems to me that we face a greater chance of confusion without the
volatile than with, particularly as compiler optimizations become
more aggressive.  Yes, we could simply disable optimization, but
optimization can be quite helpful.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:34             ` Paul Mackerras
@ 2007-08-16  0:40               ` Christoph Lameter
  0 siblings, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16  0:40 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> Those barriers are for when we need ordering between atomic variables
> and other memory locations.  An atomic variable by itself doesn't and
> shouldn't need any barriers for other CPUs to be able to see what's
> happening to it.

It does not need any barriers. As soon as one cpu acquires the 
cacheline for write it will be invalidated in the caches of the others. So 
the other cpu will have to refetch. No need for volatile.

The issue here may be that the compiler has fetched the atomic variable 
earlier and put it into a register. However, that prefetching is limited 
because it cannot cross function calls etc. The only problem could be 
loops where the compiler does not refetch the variable since it assumes 
that it does not change and there are no function calls in the body of the 
loop. But AFAIK these loops need cpu_relax and other measures anyway to 
avoid bad effects from busy waiting.
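
[ A minimal sketch of the loop case meant here (hypothetical code): with a
  non-volatile atomic_read() and an empty body, the compiler may hoist the
  load out of the loop; cpu_relax() also acts as a compiler barrier, so the
  second form reloads the value on every pass. ]

        extern atomic_t flag;           /* hypothetical */

        while (!atomic_read(&flag))
                ;                       /* load may be hoisted: spins on a register */

        while (!atomic_read(&flag))
                cpu_relax();            /* compiler barrier: flag reloaded each pass */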


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:39             ` Paul E. McKenney
@ 2007-08-16  0:42               ` Christoph Lameter
  2007-08-16  0:53                 ` Paul E. McKenney
  2007-08-16  1:51                 ` Paul Mackerras
  0 siblings, 2 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16  0:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> Seems to me that we face a greater chance of confusion without the
> volatile than with, particularly as compiler optimizations become
> more aggressive.  Yes, we could simply disable optimization, but
> optimization can be quite helpful.

A volatile default would disable optimizations for atomic_read. 
atomic_read without volatile would allow for full optimization by the 
compiler. Seems that this is what one wants in many cases.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:30                           ` Herbert Xu
@ 2007-08-16  0:49                             ` Paul E. McKenney
  2007-08-16  0:53                               ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  0:49 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 05:23:10PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 16, 2007 at 08:12:48AM +0800, Herbert Xu wrote:
> > > On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
> > > >
> > > > > > Communicating between process context and interrupt/NMI handlers using
> > > > > > per-CPU variables.
> > > > > 
> > > > > Remember we're talking about atomic_read/atomic_set.  Please
> > > > > cite the actual file/function name you have in mind.
> > > > 
> > > > Yep, we are indeed talking about atomic_read()/atomic_set().
> > > > 
> > > > We have been through this issue already in this thread.
> > > 
> > > Sorry, but I must've missed it.  Could you cite the file or
> > > function for my benefit?
> > 
> > I might summarize the thread if there is interest, but I am not able to
> > do so right this minute.
> 
> Thanks.  But I don't need a summary of the thread, I'm asking
> for an extant code snippet in our kernel that benefits from
> the volatile change and is not part of a busy-wait.

Sorry, can't help you there.  I really do believe that the information
you need (as opposed to the specific item you are asking for) really
has been put forth in this thread.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:58                                 ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Satyam Sharma
@ 2007-08-16  0:51                                   ` Herbert Xu
  2007-08-16  1:18                                     ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  0:51 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Segher Boessenkool, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, ak, netdev,
	cfriesen, Heiko Carstens, rpjday, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, clameter, schwidefsky, Chris Snook, davem,
	Linus Torvalds, wensong, wjiang

On Thu, Aug 16, 2007 at 06:28:42AM +0530, Satyam Sharma wrote:
>
> > The udelay itself certainly should have some form of cpu_relax in it.
> 
> Yes, a form of barrier() must be present in mdelay() or udelay() itself
> as you say, having it in __const_udelay() is *not* enough (superfluous
> actually, considering it is already a separate translation unit and
> invisible to the compiler).

As long as __const_udelay does something which has the same
effect as barrier it is enough even if it's in the same unit.
As a matter of fact it does on i386 where __delay either uses
rep_nop or asm volatile.
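
[ For reference, roughly what that i386 rep_nop()/cpu_relax() looks like
  (from memory, so treat the exact spelling as an assumption): the "memory"
  clobber is what provides the compiler-barrier effect referred to here. ]

        static inline void rep_nop(void)
        {
                asm volatile("rep; nop" ::: "memory");
        }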

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:49                             ` Paul E. McKenney
@ 2007-08-16  0:53                               ` Herbert Xu
  2007-08-16  1:14                                 ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  0:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Wed, Aug 15, 2007 at 05:49:50PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
>
> > Thanks.  But I don't need a summary of the thread, I'm asking
> > for an extant code snippet in our kernel that benefits from
> > the volatile change and is not part of a busy-wait.
> 
> Sorry, can't help you there.  I really do believe that the information
> you need (as opposed to the specific item you are asking for) really
> has been put forth in this thread.

That only leads me to believe that such a code snippet simply
does not exist.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:42               ` Christoph Lameter
@ 2007-08-16  0:53                 ` Paul E. McKenney
  2007-08-16  0:59                   ` Christoph Lameter
  2007-08-16  1:51                 ` Paul Mackerras
  1 sibling, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  0:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, Aug 15, 2007 at 05:42:07PM -0700, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > Seems to me that we face a greater chance of confusion without the
> > volatile than with, particularly as compiler optimizations become
> > more aggressive.  Yes, we could simply disable optimization, but
> > optimization can be quite helpful.
> 
> A volatile default would disable optimizations for atomic_read. 
> atomic_read without volatile would allow for full optimization by the 
> compiler. Seems that this is what one wants in many cases.

The volatile cast should not disable all that many optimizations,
for example, it is much less hurtful than barrier().  Furthermore,
the main optimizations disabled (pulling atomic_read() and atomic_set()
out of loops) really do need to be disabled.
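
[ A sketch of the difference in scope (hypothetical code, made-up names): ]

        x = some_global;
        barrier();              /* forces everything cached, including some_global,
                                   to be reloaded from memory afterwards */
        y = some_global;        /* second load is emitted */

        x = some_global;
        n = atomic_read(&v);    /* volatile cast: only v's counter is forced
                                   to be re-read */
        y = some_global;        /* some_global may stay cached in a register */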

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:32                               ` your mail Herbert Xu
@ 2007-08-16  0:58                                 ` Satyam Sharma
  2007-08-16  0:51                                   ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  0:58 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Segher Boessenkool, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, ak, netdev,
	cfriesen, Heiko Carstens, rpjday, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, clameter, schwidefsky, Chris Snook, davem,
	Linus Torvalds, wensong, wjiang

[ Sorry for the empty subject line in the previous mail. I intended to
  make a patch, so I cleared the subject to change it, but ultimately
  neither made a patch nor restored the subject line. Done that now. ]


On Thu, 16 Aug 2007, Herbert Xu wrote:

> On Thu, Aug 16, 2007 at 06:06:00AM +0530, Satyam Sharma wrote:
> > 
> > that are:
> > 
> > 	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> > 		mdelay(1);
> > 		msecs--;
> > 	}
> > 
> > where mdelay() becomes __const_udelay() which happens to be in another
> > translation unit (arch/i386/lib/delay.c) and hence saves this callsite
> > from being a bug :-)
> 
> The udelay itself certainly should have some form of cpu_relax in it.

Yes, a form of barrier() must be present in mdelay() or udelay() itself
as you say, having it in __const_udelay() is *not* enough (superfluous
actually, considering it is already a separate translation unit and
invisible to the compiler).

However, there are no compiler barriers on the macro-definition-path
between mdelay(1) and __const_udelay(), so the only thing that saves this
from being a bug here is indeed the different-translation-unit concept.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:53                 ` Paul E. McKenney
@ 2007-08-16  0:59                   ` Christoph Lameter
  2007-08-16  1:14                     ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16  0:59 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> The volatile cast should not disable all that many optimizations,
> for example, it is much less hurtful than barrier().  Furthermore,
> the main optimizations disabled (pulling atomic_read() and atomic_set()
> out of loops) really do need to be disabled.

In many cases you do not need a barrier. Having volatile there *will* 
impact optimization because the compiler cannot use a register that may 
contain the value that was fetched earlier. And the compiler cannot choose 
freely when to fetch the value. The order of memory accesses is fixed if 
you use volatile. If the variable is not volatile then the compiler can 
arrange memory accesses any way they fit and thus generate better code.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:59                   ` Christoph Lameter
@ 2007-08-16  1:14                     ` Paul E. McKenney
  2007-08-16  1:41                       ` Christoph Lameter
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  1:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, Aug 15, 2007 at 05:59:41PM -0700, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > The volatile cast should not disable all that many optimizations,
> > for example, it is much less hurtful than barrier().  Furthermore,
> > the main optimizations disabled (pulling atomic_read() and atomic_set()
> > out of loops) really do need to be disabled.
> 
> In many cases you do not need a barrier. Having volatile there *will* 
> impact optimization because the compiler cannot use a register that may 
> contain the value that was fetched earlier. And the compiler cannot choose 
> freely when to fetch the value. The order of memory accesses is fixed if 
> you use volatile. If the variable is not volatile then the compiler can 
> arrange memory accesses any way they fit and thus generate better code.

Understood.  My point is not that the impact is precisely zero, but
rather that the impact on optimization is much less hurtful than the
problems that could arise otherwise, particularly as compilers become
more aggressive in their optimizations.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:53                               ` Herbert Xu
@ 2007-08-16  1:14                                 ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  1:14 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Howells, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 08:53:16AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 05:49:50PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
> >
> > > Thanks.  But I don't need a summary of the thread, I'm asking
> > > for an extant code snippet in our kernel that benefits from
> > > the volatile change and is not part of a busy-wait.
> > 
> > Sorry, can't help you there.  I really do believe that the information
> > you need (as opposed to the specific item you are asking for) really
> > has been put forth in this thread.
> 
> That only leads me to believe that such a code snippet simply
> does not exist.

Whatever...

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:51                                   ` Herbert Xu
@ 2007-08-16  1:18                                     ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  1:18 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Segher Boessenkool, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, ak, netdev,
	cfriesen, Heiko Carstens, rpjday, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, clameter, schwidefsky, Chris Snook, davem,
	Linus Torvalds, wensong, wjiang

Hi Herbert,


On Thu, 16 Aug 2007, Herbert Xu wrote:

> On Thu, Aug 16, 2007 at 06:28:42AM +0530, Satyam Sharma wrote:
> >
> > > The udelay itself certainly should have some form of cpu_relax in it.
> > 
> > Yes, a form of barrier() must be present in mdelay() or udelay() itself
> > as you say, having it in __const_udelay() is *not* enough (superfluous
> > actually, considering it is already a separate translation unit and
> > invisible to the compiler).
> 
> As long as __const_udelay does something which has the same
> effect as barrier it is enough even if it's in the same unit.

Only if __const_udelay() is inlined. But as I said, __const_udelay()
-- although marked "inline" -- will never be inlined anywhere in the
kernel in reality. It's an exported symbol, and never inlined from
modules. Even from built-in targets, the definition of __const_udelay
is invisible when gcc is compiling the compilation units of those
callsites. The compiler has no idea that that function has barriers
or not, so we're saved here _only_ by the lucky fact that
__const_udelay() is in a different compilation unit.


> As a matter of fact it does on i386 where __delay either uses
> rep_nop or asm volatile.

__delay() can be either delay_tsc() or delay_loop() on i386.

delay_tsc() uses the rep_nop() there for its own little busy
loop, actually. But for a call site that inlines __const_udelay()
-- if it were ever moved to a .h file and marked inline -- the
call to __delay() will _still_ be across compilation units. So,
again for this case, it does not matter if the callee function
has compiler barriers or not (it would've been a different story
if we were discussing real/CPU barriers, I think), what saves us
here is just the fact that a call is made to a function from a
different compilation unit, which is invisible to the compiler
when compiling the callsite, and hence acts as the compiler
barrier.

Regarding delay_loop(), it uses "volatile" for the "asm" which
has quite different semantics from the C language "volatile"
type-qualifier keyword and does not imply any compiler barrier
at all.
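
[ A sketch of that distinction (illustration only): the kernel's barrier()
  is an empty asm whose "memory" clobber, not its volatile, is what stops
  the compiler from caching values across it. ]

        asm volatile("");                       /* kept in place, but not a compiler barrier */
        asm volatile("" : : : "memory");        /* essentially what barrier() expands to */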


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 22:44                           ` Paul E. McKenney
@ 2007-08-16  1:23                             ` Segher Boessenkool
  2007-08-16  2:22                               ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16  1:23 UTC (permalink / raw)
  To: paulmck
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

>>>> No; compilation units have nothing to do with it, GCC can optimise
>>>> across compilation unit boundaries just fine, if you tell it to
>>>> compile more than one compilation unit at once.
>>>
>>> Last I checked, the Linux kernel build system did compile each .c 
>>> file
>>> as a separate compilation unit.
>>
>> I have some patches to use -combine -fwhole-program for Linux.
>> Highly experimental, you need a patched bleeding edge toolchain.
>> If there's interest I'll clean it up and put it online.
>>
>> David Woodhouse had some similar patches about a year ago.
>
> Sounds exciting...  ;-)

Yeah, the breakage is *quite* spectacular :-)

>>>>> In many cases, the compiler also has to assume that
>>>>> msleep_interruptible()
>>>>> might call back into a function in the current compilation unit, 
>>>>> thus
>>>>> possibly modifying global static variables.
>>>>
>>>> It most often is smart enough to see what compilation-unit-local
>>>> variables might be modified that way, though :-)
>>>
>>> Yep.  For example, if it knows the current value of a given such 
>>> local
>>> variable, and if all code paths that would change some other variable
>>> cannot be reached given that current value of the first variable.
>>
>> Or the most common thing: if neither the address of the translation-
>> unit local variable nor the address of any function writing to that
>> variable can "escape" from that translation unit, nothing outside
>> the translation unit can write to the variable.
>
> But there is usually at least one externally callable function in
> a .c file.

Of course, but often none of those will (indirectly) write a certain
static variable.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 20:42             ` Segher Boessenkool
@ 2007-08-16  1:23               ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  1:23 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, netdev, ak,
	cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang



On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> [...]
> > BTW:
> > 
> > #define atomic_read(a)	(*(volatile int *)&(a))
> > #define atomic_set(a,i)	(*(volatile int *)&(a) = (i))
> > 
> > int a;
> > 
> > void func(void)
> > {
> > 	int b;
> > 
> > 	b = atomic_read(a);
> > 	atomic_set(a, 20);
> > 	b = atomic_read(a);
> > }
> > 
> > gives:
> > 
> > func:
> > 	pushl	%ebp
> > 	movl	a, %eax
> > 	movl	%esp, %ebp
> > 	movl	$20, a
> > 	movl	a, %eax
> > 	popl	%ebp
> > 	ret
> > 
> > so the first atomic_read() wasn't optimized away.
> 
> Of course.  It is executed by the abstract machine, so
> it will be executed by the actual machine.  On the other
> hand, try
> 
> 	b = 0;
> 	if (b)
> 		b = atomic_read(a);
> 
> or similar.

Yup, obviously. Volatile accesses (or any access to volatile objects),
or even "__volatile__ asms" (which gcc normally promises never to elid)
can always be optimized for cases such as these where the compiler can
trivially determine that the code in question is not reachable.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 23:40           ` Herbert Xu
  2007-08-15 23:51             ` Paul E. McKenney
@ 2007-08-16  1:26             ` Segher Boessenkool
  2007-08-16  2:23               ` Nick Piggin
  1 sibling, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16  1:26 UTC (permalink / raw)
  To: Herbert Xu
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam,
	clameter, schwidefsky, Chris Snook, davem, wensong, wjiang

>> Part of the motivation here is to fix heisenbugs.  If I knew where 
>> they
>
> By the same token we should probably disable optimisations
> altogether since that too can create heisenbugs.

Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 23:51             ` Paul E. McKenney
@ 2007-08-16  1:30               ` Segher Boessenkool
  2007-08-16  2:30                 ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16  1:30 UTC (permalink / raw)
  To: paulmck
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

>>> Part of the motivation here is to fix heisenbugs.  If I knew where 
>>> they
>>
>> By the same token we should probably disable optimisations
>> altogether since that too can create heisenbugs.
>
> Precisely the point -- use of volatile (whether in casts or on asms)
> in these cases is intended to disable those optimizations likely to
> result in heisenbugs.

The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".

It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re:
  2007-08-16  0:36                             ` Satyam Sharma
  2007-08-16  0:32                               ` your mail Herbert Xu
@ 2007-08-16  1:38                               ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16  1:38 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: horms, Stefan Richter, Linux Kernel Mailing List,
	Paul E. McKenney, ak, netdev, cfriesen, Heiko Carstens, rpjday,
	jesper.juhl, linux-arch, Andrew Morton, zlynx, clameter,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

>> "compilation unit" is a C standard term.  It typically boils down
>> to "single .c file".
>
> As you mentioned later, "single .c file with all the other files 
> (headers
> or other .c files) that it pulls in via #include" is actually 
> "translation
> unit", both in the C standard as well as gcc docs.

Yeah.  "single .c file after preprocessing".  Same thing :-)

> "Compilation unit"
> doesn't seem to be nearly as standard a term, though in most places it
> is indeed meant to be same as "translation unit", but with the new gcc
> inter-module-analysis stuff that you referred to above, I suspect one 
> may
> reasonably want to call a "compilation unit" as all that the compiler 
> sees
> at a given instant.

That would be a bit confusing, would it not?  They'd better find
some better name for that if they want to name it at all (remember,
none of these optimisations should have any effect on the semantics
of the program, you just get fewer .o files etc.).


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  1:14                     ` Paul E. McKenney
@ 2007-08-16  1:41                       ` Christoph Lameter
  2007-08-16  2:15                         ` Satyam Sharma
  2007-08-16  2:32                         ` Paul E. McKenney
  0 siblings, 2 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16  1:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> Understood.  My point is not that the impact is precisely zero, but
> rather that the impact on optimization is much less hurtful than the
> problems that could arise otherwise, particularly as compilers become
> more aggressive in their optimizations.

The problems arise because barriers are not used as required. Volatile 
has wishy-washy semantics and somehow marries memory barriers with data 
access. It is clearer to separate the two. Conceptual cleanness usually 
translates into better code. If one really wants the volatile then let's 
make it explicit and use

	atomic_read_volatile()
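
[ To make the proposal concrete, a sketch of what the two variants could
  look like (hypothetical definitions, purely for illustration): ]

        /* always performs a real load */
        #define atomic_read_volatile(v) (*(volatile int *)&(v)->counter)

        /* plain read; the compiler may cache or coalesce it like any other int */
        #define atomic_read(v)          ((v)->counter)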

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  0:42               ` Christoph Lameter
  2007-08-16  0:53                 ` Paul E. McKenney
@ 2007-08-16  1:51                 ` Paul Mackerras
  2007-08-16  2:00                   ` Herbert Xu
  2007-08-16  2:07                   ` Segher Boessenkool
  1 sibling, 2 replies; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  1:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

Christoph Lameter writes:

> A volatile default would disable optimizations for atomic_read. 
> atomic_read without volatile would allow for full optimization by the 
> compiler. Seems that this is what one wants in many cases.

Name one such case.

An atomic_read should do a load from memory.  If the programmer puts
an atomic_read() in the code then the compiler should emit a load for
it, not re-use a value returned by a previous atomic_read.  I do not
believe it would ever be useful for the compiler to collapse two
atomic_read statements into a single load.
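
[ In other words (a hypothetical snippet, made-up names): ]

        extern atomic_t v;

        int a = atomic_read(&v);
        int b = atomic_read(&v);        /* with a plain, non-volatile definition the
                                           compiler may reuse the first load here */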

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  1:51                 ` Paul Mackerras
@ 2007-08-16  2:00                   ` Herbert Xu
  2007-08-16  2:05                     ` Paul Mackerras
  2007-08-16  2:07                   ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  2:00 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Paul E. McKenney, Satyam Sharma,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 11:51:42AM +1000, Paul Mackerras wrote:
> 
> Name one such case.

See sk_stream_mem_schedule in net/core/stream.c:

        /* Under limit. */
        if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
                if (*sk->sk_prot->memory_pressure)
                        *sk->sk_prot->memory_pressure = 0;
                return 1;
        }

        /* Over hard limit. */
        if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
                sk->sk_prot->enter_memory_pressure();
                goto suppress_allocation;
        }

We don't need to reload sk->sk_prot->memory_allocated here.

Now where is your example again?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:00                   ` Herbert Xu
@ 2007-08-16  2:05                     ` Paul Mackerras
  2007-08-16  2:11                       ` Herbert Xu
                                         ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  2:05 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Christoph Lameter, Paul E. McKenney, Satyam Sharma,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> See sk_stream_mem_schedule in net/core/stream.c:
> 
>         /* Under limit. */
>         if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
>                 if (*sk->sk_prot->memory_pressure)
>                         *sk->sk_prot->memory_pressure = 0;
>                 return 1;
>         }
> 
>         /* Over hard limit. */
>         if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
>                 sk->sk_prot->enter_memory_pressure();
>                 goto suppress_allocation;
>         }
> 
> We don't need to reload sk->sk_prot->memory_allocated here.

Are you sure?  How do you know some other CPU hasn't changed the value
in between?

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  1:51                 ` Paul Mackerras
  2007-08-16  2:00                   ` Herbert Xu
@ 2007-08-16  2:07                   ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16  2:07 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Satyam Sharma, Linux Kernel Mailing List, Paul E. McKenney,
	netdev, ak, cfriesen, rpjday, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, schwidefsky, Chris Snook, Herbert Xu,
	davem, Linus Torvalds, wensong, wjiang

>> A volatile default would disable optimizations for atomic_read.
>> atomic_read without volatile would allow for full optimization by the
>> compiler. Seems that this is what one wants in many cases.
>
> Name one such case.
>
> An atomic_read should do a load from memory.  If the programmer puts
> an atomic_read() in the code then the compiler should emit a load for
> it, not re-use a value returned by a previous atomic_read.  I do not
> believe it would ever be useful for the compiler to collapse two
> atomic_read statements into a single load.

An atomic_read() implemented as a "normal" C variable read would
allow that read to be combined with another "normal" read from
that variable.  This could perhaps be marginally useful, although
I'd bet you cannot see it unless counting cycles on a simulator
or counting bits in the binary size.

With an asm() implementation, the compiler cannot do this; with
a "volatile" implementation (either volatile variable or volatile-cast),
this invokes undefined behaviour (in both C and GCC).
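
[ For concreteness, one way such an asm()-based read might look on i386
  (a sketch with a hypothetical name, not an existing kernel definition):
  the volatile keeps each call from being removed or merged, and the "m"
  operand makes the load come from memory rather than a cached register. ]

        static inline int atomic_read_asm(const atomic_t *v)
        {
                int ret;

                asm volatile("movl %1, %0" : "=r" (ret) : "m" (v->counter));
                return ret;
        }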


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:15                         ` Satyam Sharma
@ 2007-08-16  2:08                           ` Herbert Xu
  2007-08-16  2:18                             ` Christoph Lameter
  2007-08-16  2:18                             ` Chris Friesen
  0 siblings, 2 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  2:08 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Paul E. McKenney, Paul Mackerras,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 07:45:44AM +0530, Satyam Sharma wrote:
>
> Completely agreed, again. To summarize again (had done so about ~100 mails
> earlier in this thread too :-) ...
> 
> atomic_{read,set}_volatile() -- guarantees volatility also along with
> atomicity (the two _are_ different concepts after all, irrespective of
> whether callsites normally want one with the other or not)
> 
> atomic_{read,set}_nonvolatile() -- only guarantees atomicity, compiler
> free to elide / coalesce / optimize such accesses, can keep the object
> in question cached in a local register, leads to smaller text, etc.
> 
> As to which one should be the default atomic_read() is a question of
> whether majority of callsites (more weightage to important / hot
> codepaths, lesser to obscure callsites) want a particular behaviour.
> 
> Do we have a consensus here? (hoping against hope, probably :-)

I can certainly agree with this.

But I have to say that I still don't know of a single place
where one would actually use the volatile variant.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:05                     ` Paul Mackerras
@ 2007-08-16  2:11                       ` Herbert Xu
  2007-08-16  2:35                         ` Paul E. McKenney
  2007-08-16  3:15                         ` Paul Mackerras
  2007-08-16  2:15                       ` Christoph Lameter
  2007-08-16  2:33                       ` Satyam Sharma
  2 siblings, 2 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  2:11 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Paul E. McKenney, Satyam Sharma,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 12:05:56PM +1000, Paul Mackerras wrote:
> Herbert Xu writes:
> 
> > See sk_stream_mem_schedule in net/core/stream.c:
> > 
> >         /* Under limit. */
> >         if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> >                 if (*sk->sk_prot->memory_pressure)
> >                         *sk->sk_prot->memory_pressure = 0;
> >                 return 1;
> >         }
> > 
> >         /* Over hard limit. */
> >         if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
> >                 sk->sk_prot->enter_memory_pressure();
> >                 goto suppress_allocation;
> >         }
> > 
> > We don't need to reload sk->sk_prot->memory_allocated here.
> 
> Are you sure?  How do you know some other CPU hasn't changed the value
> in between?

Yes I'm sure, because we don't care if others have increased
the reservation.

Note that even if we did we'd be using barriers so volatile
won't do us any good here.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  1:41                       ` Christoph Lameter
@ 2007-08-16  2:15                         ` Satyam Sharma
  2007-08-16  2:08                           ` Herbert Xu
  2007-08-16  2:32                         ` Paul E. McKenney
  1 sibling, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  2:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, Paul Mackerras, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu



On Wed, 15 Aug 2007, Christoph Lameter wrote:

> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > Understood.  My point is not that the impact is precisely zero, but
> > rather that the impact on optimization is much less hurtful than the
> > problems that could arise otherwise, particularly as compilers become
> > more aggressive in their optimizations.
> 
> The problems arise because barriers are not used as required. Volatile 
> has wishy washy semantics and somehow marries memory barriers with data 
> access. It is clearer to separate the two. Conceptual cleanness usually 
> translates into better code. If one really wants the volatile then lets 
> make it explicit and use
> 
> 	atomic_read_volatile()

Completely agreed, again. To summarize again (had done so about ~100 mails
earlier in this thread too :-) ...

atomic_{read,set}_volatile() -- guarantees volatility also along with
atomicity (the two _are_ different concepts after all, irrespective of
whether callsites normally want one with the other or not)

atomic_{read,set}_nonvolatile() -- only guarantees atomicity, compiler
free to elide / coalesce / optimize such accesses, can keep the object
in question cached in a local register, leads to smaller text, etc.

As to which one should be the default atomic_read() is a question of
whether majority of callsites (more weightage to important / hot
codepaths, lesser to obscure callsites) want a particular behaviour.

Do we have a consensus here? (hoping against hope, probably :-)
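
For concreteness, a minimal sketch of what the two variants could look
like (hypothetical helpers, not in the tree, assuming the usual
struct { int counter; } layout of atomic_t):

	#define atomic_read_volatile(v)		(*(volatile int *)&(v)->counter)
	#define atomic_read_nonvolatile(v)	((v)->counter)

The _set variants would follow the same pattern for stores.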


[ This thread has gotten completely out of hand ... for my mail client
  alpine as well, it now seems. Reminds of that 1000+ GPLv3 fest :-) ]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:05                     ` Paul Mackerras
  2007-08-16  2:11                       ` Herbert Xu
@ 2007-08-16  2:15                       ` Christoph Lameter
  2007-08-16  2:17                         ` Christoph Lameter
  2007-08-16  2:33                       ` Satyam Sharma
  2 siblings, 1 reply; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16  2:15 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Herbert Xu, Paul E. McKenney, Satyam Sharma, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> > We don't need to reload sk->sk_prot->memory_allocated here.
> 
> Are you sure?  How do you know some other CPU hasn't changed the value
> in between?

The cpu knows because the cacheline was not invalidated.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:15                       ` Christoph Lameter
@ 2007-08-16  2:17                         ` Christoph Lameter
  0 siblings, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16  2:17 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Herbert Xu, Paul E. McKenney, Satyam Sharma, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Wed, 15 Aug 2007, Christoph Lameter wrote:

> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> 
> > > We don't need to reload sk->sk_prot->memory_allocated here.
> > 
> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> The cpu knows because the cacheline was not invalidated.

Crap, my statement above is wrong... We do not care that the
value was changed; otherwise we would have put a barrier in there.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:08                           ` Herbert Xu
@ 2007-08-16  2:18                             ` Christoph Lameter
  2007-08-16  3:23                               ` Paul Mackerras
  2007-08-16  2:18                             ` Chris Friesen
  1 sibling, 1 reply; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16  2:18 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Satyam Sharma, Paul E. McKenney, Paul Mackerras, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, 16 Aug 2007, Herbert Xu wrote:

> > Do we have a consensus here? (hoping against hope, probably :-)
> 
> I can certainly agree with this.

I agree too.

> But I have to say that I still don't know of a single place
> where one would actually use the volatile variant.

I suspect that what you say is true after we have looked at all callers.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:08                           ` Herbert Xu
  2007-08-16  2:18                             ` Christoph Lameter
@ 2007-08-16  2:18                             ` Chris Friesen
  1 sibling, 0 replies; 657+ messages in thread
From: Chris Friesen @ 2007-08-16  2:18 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Satyam Sharma, Christoph Lameter, Paul E. McKenney,
	Paul Mackerras, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, zlynx, rpjday, jesper.juhl, segher

Herbert Xu wrote:

> But I have to say that I still don't know of a single place
> where one would actually use the volatile variant.

Given that many of the existing users do currently have "volatile", are 
you comfortable simply removing that behaviour from them?  Are you sure 
that you will not introduce any issues?

Forcing a re-read is only a performance penalty.  Removing it can cause 
behavioural changes.

I would be more comfortable making the default match the majority of the 
current implementations (ie: volatile semantics).  Then, if someone 
cares about performance they can explicitly validate the call path and 
convert it over to the non-volatile version.

Correctness before speed...

Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  1:23                             ` Segher Boessenkool
@ 2007-08-16  2:22                               ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  2:22 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

On Thu, Aug 16, 2007 at 03:23:28AM +0200, Segher Boessenkool wrote:
> >>>>No; compilation units have nothing to do with it, GCC can optimise
> >>>>across compilation unit boundaries just fine, if you tell it to
> >>>>compile more than one compilation unit at once.
> >>>
> >>>Last I checked, the Linux kernel build system did compile each .c 
> >>>file
> >>>as a separate compilation unit.
> >>
> >>I have some patches to use -combine -fwhole-program for Linux.
> >>Highly experimental, you need a patched bleeding edge toolchain.
> >>If there's interest I'll clean it up and put it online.
> >>
> >>David Woodhouse had some similar patches about a year ago.
> >
> >Sounds exciting...  ;-)
> 
> Yeah, the breakage is *quite* spectacular :-)

I bet!!!  ;-)

> >>>>>In many cases, the compiler also has to assume that
> >>>>>msleep_interruptible()
> >>>>>might call back into a function in the current compilation unit, 
> >>>>>thus
> >>>>>possibly modifying global static variables.
> >>>>
> >>>>It most often is smart enough to see what compilation-unit-local
> >>>>variables might be modified that way, though :-)
> >>>
> >>>Yep.  For example, if it knows the current value of a given such 
> >>>local
> >>>variable, and if all code paths that would change some other variable
> >>>cannot be reached given that current value of the first variable.
> >>
> >>Or the most common thing: if neither the address of the translation-
> >>unit local variable nor the address of any function writing to that
> >>variable can "escape" from that translation unit, nothing outside
> >>the translation unit can write to the variable.
> >
> >But there is usually at least one externally callable function in
> >a .c file.
> 
> Of course, but often none of those will (indirectly) write a certain
> static variable.

But there has to be some path to the static functions, assuming that
they are not dead code.  Yes, there can be cases where the compiler
knows enough about the state of the variables to rule out some of code
paths to them, but I have no idea how often this happens in kernel
code.
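
For reference, the simplest shape of the case Segher describes would be
something like this (hypothetical file-local sketch):

	static int refcnt;		/* address never escapes this file */

	static void get_ref(void)	/* static, never called via pointer */
	{
		refcnt++;
	}

	int public_entry(void)		/* the externally callable path */
	{
		get_ref();
		/* GCC can see every write to refcnt in this unit, so it
		 * is free to keep refcnt cached in a register here. */
		return refcnt;
	}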

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  1:26             ` Segher Boessenkool
@ 2007-08-16  2:23               ` Nick Piggin
  2007-08-16 19:32                 ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-16  2:23 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Herbert Xu, heiko.carstens, horms, linux-kernel, rpjday, ak,
	netdev, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx,
	satyam, clameter, schwidefsky, Chris Snook, davem, wensong,
	wjiang

Segher Boessenkool wrote:
>>> Part of the motivation here is to fix heisenbugs.  If I knew where they
>>
>>
>> By the same token we should probably disable optimisations
>> altogether since that too can create heisenbugs.
> 
> 
> Almost everything is a tradeoff; and so is this.  I don't
> believe most people would find disabling all compiler
> optimisations an acceptable price to pay for some peace
> of mind.

So why is this a good tradeoff?

I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?

On the actual proposal to make the atomic operators volatile: I think the
better approach in the long term, for both maintainability of the
code and education of coders, is to make the use of barriers _more_
explicit rather than sprinkling these "just in case" ones around.

You may get rid of a few atomic_read heisenbugs (in noise when
compared to all bugs), but if the coder was using a regular atomic
load, or a test_bit (which is also atomic), etc. then they're going
to have problems.

It would be better for Linux if everyone was to have better awareness
of barriers than to hide some of the cases where they're required.
A pretty large number of bugs I see in lock free code in the VM is
due to memory ordering problems. It's hard to find those bugs, or
even be aware when you're writing buggy code if you don't have some
feel for barriers.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  1:30               ` Segher Boessenkool
@ 2007-08-16  2:30                 ` Paul E. McKenney
  2007-08-16 19:33                   ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  2:30 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

On Thu, Aug 16, 2007 at 03:30:44AM +0200, Segher Boessenkool wrote:
> >>>Part of the motivation here is to fix heisenbugs.  If I knew where 
> >>>they
> >>
> >>By the same token we should probably disable optimisations
> >>altogether since that too can create heisenbugs.
> >
> >Precisely the point -- use of volatile (whether in casts or on asms)
> >in these cases are intended to disable those optimizations likely to
> >result in heisenbugs.
> 
> The only thing volatile on an asm does is create a side effect
> on the asm statement; in effect, it tells the compiler "do not
> remove this asm even if you don't need any of its outputs".
> 
> It's not disabling optimisation likely to result in bugs,
> heisen- or otherwise; _not_ putting the volatile on an asm
> that needs it simply _is_ a bug :-)

Yep.  And the reason it is a bug is that it fails to disable
the relevant compiler optimizations.  So I suspect that we might
actually be saying the same thing here.
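
For anyone following along, a small illustration of that distinction
(hypothetical x86 sketch, not code from the series):

	static inline unsigned int swab32_example(unsigned int x)
	{
		/* Output is used, so the asm is kept; no volatile needed. */
		asm("bswap %0" : "+r" (x));
		return x;
	}

	static inline void compiler_barrier_example(void)
	{
		/* No outputs at all: without volatile this asm could be
		 * discarded.  This is essentially the kernel's barrier(). */
		asm volatile("" ::: "memory");
	}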

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  1:41                       ` Christoph Lameter
  2007-08-16  2:15                         ` Satyam Sharma
@ 2007-08-16  2:32                         ` Paul E. McKenney
  1 sibling, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  2:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu

On Wed, Aug 15, 2007 at 06:41:40PM -0700, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > Understood.  My point is not that the impact is precisely zero, but
> > rather that the impact on optimization is much less hurtful than the
> > problems that could arise otherwise, particularly as compilers become
> > more aggressive in their optimizations.
> 
> The problems arise because barriers are not used as required. Volatile 
> has wishy washy semantics and somehow marries memory barriers with data 
> access. It is clearer to separate the two. Conceptual cleanness usually 
> translates into better code. If one really wants the volatile then lets 
> make it explicit and use
> 
> 	atomic_read_volatile()

There are indeed architectures where you can cause gcc to emit memory
barriers in response to volatile.  I am assuming that we are -not-
making gcc do this.  Given this, then volatiles and memory barrier
instructions are orthogonal -- one controls the compiler, the other
controls the CPU.
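
A sketch of that split, purely for illustration (hypothetical
producer/consumer, not code from the series):

	static int data, flag;

	void producer(void)
	{
		data = 42;
		smp_wmb();			/* CPU: order the two stores */
		*(volatile int *)&flag = 1;	/* compiler: emit the store */
	}

	int consumer(void)
	{
		/* The volatile cast forces a reload each pass; cpu_relax()
		 * also acts as a compiler barrier. */
		while (!*(volatile int *)&flag)
			cpu_relax();
		smp_rmb();			/* CPU: order flag load before data load */
		return data;
	}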

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:05                     ` Paul Mackerras
  2007-08-16  2:11                       ` Herbert Xu
  2007-08-16  2:15                       ` Christoph Lameter
@ 2007-08-16  2:33                       ` Satyam Sharma
  2007-08-16  3:01                         ` Satyam Sharma
  2007-08-16  3:05                         ` Paul Mackerras
  2 siblings, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  2:33 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Herbert Xu, Christoph Lameter, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher



On Thu, 16 Aug 2007, Paul Mackerras wrote:

> Herbert Xu writes:
> 
> > See sk_stream_mem_schedule in net/core/stream.c:
> > 
> >         /* Under limit. */
> >         if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> >                 if (*sk->sk_prot->memory_pressure)
> >                         *sk->sk_prot->memory_pressure = 0;
> >                 return 1;
> >         }
> > 
> >         /* Over hard limit. */
> >         if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
> >                 sk->sk_prot->enter_memory_pressure();
> >                 goto suppress_allocation;
> >         }
> > 
> > We don't need to reload sk->sk_prot->memory_allocated here.
> 
> Are you sure?  How do you know some other CPU hasn't changed the value
> in between?

I can't speak for this particular case, but there could be similar code
examples elsewhere, where we do the atomic ops on an atomic_t object
inside a higher-level locking scheme that would take care of the kind of
problem you're referring to here. It would be useful for such or similar
code if the compiler kept the value of that atomic object in a register.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:11                       ` Herbert Xu
@ 2007-08-16  2:35                         ` Paul E. McKenney
  2007-08-16  3:15                         ` Paul Mackerras
  1 sibling, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16  2:35 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Paul Mackerras, Christoph Lameter, Satyam Sharma, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 10:11:05AM +0800, Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 12:05:56PM +1000, Paul Mackerras wrote:
> > Herbert Xu writes:
> > 
> > > See sk_stream_mem_schedule in net/core/stream.c:
> > > 
> > >         /* Under limit. */
> > >         if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> > >                 if (*sk->sk_prot->memory_pressure)
> > >                         *sk->sk_prot->memory_pressure = 0;
> > >                 return 1;
> > >         }
> > > 
> > >         /* Over hard limit. */
> > >         if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
> > >                 sk->sk_prot->enter_memory_pressure();
> > >                 goto suppress_allocation;
> > >         }
> > > 
> > > We don't need to reload sk->sk_prot->memory_allocated here.
> > 
> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> Yes I'm sure, because we don't care if others have increased
> the reservation.
> 
> Note that even if we did we'd be using barriers so volatile
> won't do us any good here.

If the load-coalescing is important to performance, why not load into
a local variable?
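
Something like this, say (hypothetical rework of the snippet quoted
above, untested):

	int allocated = atomic_read(sk->sk_prot->memory_allocated);

	/* Under limit. */
	if (allocated < sk->sk_prot->sysctl_mem[0]) {
		if (*sk->sk_prot->memory_pressure)
			*sk->sk_prot->memory_pressure = 0;
		return 1;
	}

	/* Over hard limit. */
	if (allocated > sk->sk_prot->sysctl_mem[2]) {
		sk->sk_prot->enter_memory_pressure();
		goto suppress_allocation;
	}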

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:33                       ` Satyam Sharma
@ 2007-08-16  3:01                         ` Satyam Sharma
  2007-08-16  4:11                           ` Paul Mackerras
  2007-08-16  3:05                         ` Paul Mackerras
  1 sibling, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  3:01 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Herbert Xu, Christoph Lameter, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher



On Thu, 16 Aug 2007, Satyam Sharma wrote:

> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> > Herbert Xu writes:
> > 
> > > See sk_stream_mem_schedule in net/core/stream.c:
> > > 
> > >         /* Under limit. */
> > >         if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> > >                 if (*sk->sk_prot->memory_pressure)
> > >                         *sk->sk_prot->memory_pressure = 0;
> > >                 return 1;
> > >         }
> > > 
> > >         /* Over hard limit. */
> > >         if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
> > >                 sk->sk_prot->enter_memory_pressure();
> > >                 goto suppress_allocation;
> > >         }
> > > 
> > > We don't need to reload sk->sk_prot->memory_allocated here.
> > 
> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> I can't speak for this particular case, but there could be similar code
> examples elsewhere, where we do the atomic ops on an atomic_t object
> inside a higher-level locking scheme that would take care of the kind of
> problem you're referring to here. It would be useful for such or similar
> code if the compiler kept the value of that atomic object in a register.

We might not be using atomic_t (and ops) if we already have a higher-level
locking scheme, actually. So as Herbert mentioned, such cases might just
not care. [ Too much of this thread, too little sleep, sorry! ]

Anyway, the problem, of course, is that this conversion to a stronger /
safer-by-default behaviour doesn't happen with zero cost to performance.
Converting atomic ops to "volatile" behaviour did add ~2K to kernel text
for archs such as i386 (possibly to important codepaths) that didn't have
those semantics already so it would be constructive to actually look at
those differences and see if there were really any heisenbugs that got
rectified. Or if there were legitimate optimizations that got wrongly
disabled. Onus lies on those proposing the modifications, I'd say ;-)

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:33                       ` Satyam Sharma
  2007-08-16  3:01                         ` Satyam Sharma
@ 2007-08-16  3:05                         ` Paul Mackerras
  2007-08-16 19:39                           ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  3:05 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Herbert Xu, Christoph Lameter, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

Satyam Sharma writes:

> I can't speak for this particular case, but there could be similar code
> examples elsewhere, where we do the atomic ops on an atomic_t object
> inside a higher-level locking scheme that would take care of the kind of
> problem you're referring to here. It would be useful for such or similar
> code if the compiler kept the value of that atomic object in a register.

If there is a higher-level locking scheme then there is no point to
using atomic_t variables.  Atomic_t is specifically for the situation
where multiple CPUs are updating a variable without locking.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:11                       ` Herbert Xu
  2007-08-16  2:35                         ` Paul E. McKenney
@ 2007-08-16  3:15                         ` Paul Mackerras
  2007-08-16  3:43                           ` Herbert Xu
  1 sibling, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  3:15 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Christoph Lameter, Paul E. McKenney, Satyam Sharma,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> Yes I'm sure, because we don't care if others have increased
> the reservation.

But others can also reduce the reservation.  Also, the code sets and
clears *sk->sk_prot->memory_pressure nonatomically with respect to the
reads of sk->sk_prot->memory_allocated, so in fact the code doesn't
guarantee any particular relationship between the two.

That code looks like a beautiful example of buggy, racy code where
someone has sprinkled magic fix-the-races dust (otherwise known as
atomic_t) around in a vain attempt to fix the races.

That's assuming that all that stuff actually performs any useful
purpose, of course, and that there isn't some lock held by the
callers.  In the latter case it is pointless using atomic_t.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:18                             ` Christoph Lameter
@ 2007-08-16  3:23                               ` Paul Mackerras
  2007-08-16  3:33                                 ` Herbert Xu
                                                   ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  3:23 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Herbert Xu, Satyam Sharma, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

Christoph Lameter writes:

> > But I have to say that I still don't know of a single place
> > where one would actually use the volatile variant.
> 
> I suspect that what you say is true after we have looked at all callers.

It seems that there could be a lot of places where atomic_t is used in
a non-atomic fashion, and that those uses are either buggy, or there
is some lock held at the time which guarantees that other CPUs aren't
changing the value.  In both cases there is no point in using
atomic_t; we might as well just use an ordinary int.

In particular, atomic_read seems to lend itself to buggy uses.  People
seem to do things like:

	atomic_add(something, &v);
	if (atomic_read(&v) > something_else) ...

and expect that there is some relationship between the value that the
atomic_add stored and the value that the atomic_read will return,
which there isn't.  People seem to think that using atomic_t magically
gets rid of races.  It doesn't.
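
For that particular pattern, one race-free alternative (sketch only) is
to use the value returned by one's own update instead of re-reading:

	if (atomic_add_return(something, &v) > something_else)
		...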

I'd go so far as to say that anywhere where you want a non-"volatile"
atomic_read, either your code is buggy, or else an int would work just
as well.

Paul.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:23                               ` Paul Mackerras
@ 2007-08-16  3:33                                 ` Herbert Xu
  2007-08-16  3:48                                   ` Paul Mackerras
  2007-08-16 18:48                                 ` Christoph Lameter
  2007-08-16 19:44                                 ` Segher Boessenkool
  2 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  3:33 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Satyam Sharma, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 01:23:06PM +1000, Paul Mackerras wrote:
>
> In particular, atomic_read seems to lend itself to buggy uses.  People
> seem to do things like:
> 
> 	atomic_add(something, &v);
> 	if (atomic_read(&v) > something_else) ...

If you're referring to the code in sk_stream_mem_schedule
then it's working as intended.  The atomicity guarantees
that the atomic_add/atomic_sub won't be seen in parts by
other readers.

We certainly do not need to see other atomic_add/atomic_sub
operations immediately.

If you're referring to another code snippet please cite.

> I'd go so far as to say that anywhere where you want a non-"volatile"
> atomic_read, either your code is buggy, or else an int would work just
> as well.

An int won't work here because += and -= do not have the
atomicity guarantees that atomic_add/atomic_sub do.  In
particular, this may cause an atomic_read on another CPU
to give a bogus reading.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 12:31       ` Satyam Sharma
                           ` (2 preceding siblings ...)
  2007-08-15 23:22         ` Paul Mackerras
@ 2007-08-16  3:37         ` Bill Fink
  2007-08-16  5:20           ` Satyam Sharma
  3 siblings, 1 reply; 657+ messages in thread
From: Bill Fink @ 2007-08-16  3:37 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

On Wed, 15 Aug 2007, Satyam Sharma wrote:

> (C)
> $ cat tp3.c
> int a;
> 
> void func(void)
> {
> 	*(volatile int *)&a = 10;
> 	*(volatile int *)&a = 20;
> }
> $ gcc -Os -S tp3.c
> $ cat tp3.s
> ...
> movl    $10, a
> movl    $20, a
> ...

I'm curious about one minor tangential point.  Why, instead of:

	b = *(volatile int *)&a;

why can't this just be expressed as:

	b = (volatile int)a;

Isn't it the contents of a that's volatile, i.e. its value can change
invisibly to the compiler, and that's why you want to force a read from
memory?  Why do you need the "*(volatile int *)&" construct?

						-Bill

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:15                         ` Paul Mackerras
@ 2007-08-16  3:43                           ` Herbert Xu
  0 siblings, 0 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  3:43 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Paul E. McKenney, Satyam Sharma,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 01:15:05PM +1000, Paul Mackerras wrote:
> 
> But others can also reduce the reservation.  Also, the code sets and
> clears *sk->sk_prot->memory_pressure nonatomically with respect to the
> reads of sk->sk_prot->memory_allocated, so in fact the code doesn't
> guarantee any particular relationship between the two.

Yes others can reduce the reservation, but the point of this
is that the code doesn't care.  We'll either see the value
before or after the reduction and in either case we'll do
something sensible.

The worst that can happen is when we're just below the hard
limit and multiple CPUs fail to allocate but that's not really
a problem because if the machine is making progress at all
then we will eventually scale back and allow these allocations
to succeed.

As to the non-atomic operation on memory_pressure, that's OK
because we only ever assign values to it and never do other
operations such as += or -=.  Remember that int/long assignments
must be atomic or Linux won't run on your architecture.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:33                                 ` Herbert Xu
@ 2007-08-16  3:48                                   ` Paul Mackerras
  2007-08-16  4:03                                     ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  3:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Christoph Lameter, Satyam Sharma, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> If you're referring to the code in sk_stream_mem_schedule
> then it's working as intended.  The atomicity guarantees

You mean it's intended that *sk->sk_prot->memory_pressure can end up
as 1 when sk->sk_prot->memory_allocated is small (less than
->sysctl_mem[0]), or as 0 when ->memory_allocated is large (greater
than ->sysctl_mem[2])?  Because that's the effect of the current code.
If so I wonder why you bother computing it.

> that the atomic_add/atomic_sub won't be seen in parts by
> other readers.
> 
> We certainly do not need to see other atomic_add/atomic_sub
> operations immediately.
> 
> If you're referring to another code snippet please cite.
> 
> > I'd go so far as to say that anywhere where you want a non-"volatile"
> > atomic_read, either your code is buggy, or else an int would work just
> > as well.
> 
> An int won't work here because += and -= do not have the
> atomicity guarantees that atomic_add/atomic_sub do.  In
> particular, this may cause an atomic_read on another CPU
> to give a bogus reading.

The point is that guaranteeing the atomicity of the increment or
decrement does not suffice to make the code race-free.  In this case
the race arises from the fact that reading ->memory_allocated and
setting *->memory_pressure are separate operations.  To make that code
work properly you need a lock.  And once you have the lock an ordinary
int would suffice for ->memory_allocated.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:48                                   ` Paul Mackerras
@ 2007-08-16  4:03                                     ` Herbert Xu
  2007-08-16  4:34                                       ` Paul Mackerras
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  4:03 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Satyam Sharma, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 01:48:32PM +1000, Paul Mackerras wrote:
> Herbert Xu writes:
> 
> > If you're referring to the code in sk_stream_mem_schedule
> > then it's working as intended.  The atomicity guarantees
> 
> You mean it's intended that *sk->sk_prot->memory_pressure can end up
> as 1 when sk->sk_prot->memory_allocated is small (less than
> ->sysctl_mem[0]), or as 0 when ->memory_allocated is large (greater
> than ->sysctl_mem[2])?  Because that's the effect of the current code.
> If so I wonder why you bother computing it.

You need to remember that there are three different limits:
minimum, pressure, and maximum.  By default we should never
be in a situation where what you say can occur.

If you set all three limits to the same thing, then yes it
won't work as intended but it's still well-behaved.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:01                         ` Satyam Sharma
@ 2007-08-16  4:11                           ` Paul Mackerras
  2007-08-16  5:39                             ` Herbert Xu
  2007-08-16 18:54                             ` Christoph Lameter
  0 siblings, 2 replies; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  4:11 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Herbert Xu, Christoph Lameter, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

Satyam Sharma writes:

> Anyway, the problem, of course, is that this conversion to a stronger /
> safer-by-default behaviour doesn't happen with zero cost to performance.
> Converting atomic ops to "volatile" behaviour did add ~2K to kernel text
> for archs such as i386 (possibly to important codepaths) that didn't have
> those semantics already so it would be constructive to actually look at
> those differences and see if there were really any heisenbugs that got
> rectified. Or if there were legitimate optimizations that got wrongly
> disabled. Onus lies on those proposing the modifications, I'd say ;-)

The uses of atomic_read where one might want it to allow caching of
the result seem to me to fall into 3 categories:

1. Places that are buggy because of a race arising from the way it's
   used.

2. Places where there is a race but it doesn't matter because we're
   doing some clever trick.

3. Places where there is some locking in place that eliminates any
   potential race.

In case 1, adding volatile won't solve the race, of course, but it's
hard to argue that we shouldn't do something because it will slow down
buggy code.  Case 2 is hopefully pretty rare and accompanied by large
comment blocks, and in those cases caching the result of atomic_read
explicitly in a local variable would probably make the code clearer.
And in case 3 there is no reason to use atomic_t at all; we might as
well just use an int.

So I don't see any good reason to make the atomic API more complex by
having "volatile" and "non-volatile" versions of atomic_read.  It
should just have the "volatile" behaviour.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  4:03                                     ` Herbert Xu
@ 2007-08-16  4:34                                       ` Paul Mackerras
  2007-08-16  5:37                                         ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  4:34 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Christoph Lameter, Satyam Sharma, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> > You mean it's intended that *sk->sk_prot->memory_pressure can end up
> > as 1 when sk->sk_prot->memory_allocated is small (less than
> > ->sysctl_mem[0]), or as 0 when ->memory_allocated is large (greater
> > than ->sysctl_mem[2])?  Because that's the effect of the current code.
> > If so I wonder why you bother computing it.
> 
> You need to remember that there are three different limits:
> minimum, pressure, and maximum.  By default we should never
> be in a situation where what you say can occur.
> 
> If you set all three limits to the same thing, then yes it
> won't work as intended but it's still well-behaved.

I'm not talking about setting all three limits to the same thing.

I'm talking about this situation:

CPU 0 comes into __sk_stream_mem_reclaim, reads memory_allocated, but
then before it can do the store to *memory_pressure, CPUs 1-1023 all
go through sk_stream_mem_schedule, collectively increase
memory_allocated to more than sysctl_mem[2] and set *memory_pressure.
Finally CPU 0 gets to do its store and it sets *memory_pressure back
to 0, but by this stage memory_allocated is way larger than
sysctl_mem[2].

Yes, it's unlikely, but that is the nature of race conditions - they
are unlikely, and only show up at inconvenient times, never when
someone who could fix the bug is watching. :)

Similarly it would be possible for other CPUs to decrease
memory_allocated from greater than sysctl_mem[2] to less than
sysctl_mem[0] in the interval between when we read memory_allocated
and set *memory_pressure to 1.  And it's quite possible for their
setting of *memory_pressure to 0 to happen before our setting of it to
1, so that it ends up at 1 when it should be 0.

Now, maybe it's the case that it doesn't really matter whether
*->memory_pressure is 0 or 1.  But if so, why bother computing it at
all?

People seem to think that using atomic_t means they don't need to use
a spinlock.  That's fine if there is only one variable involved, but
as soon as there's more than one, there's the possibility of a race,
whether or not you use atomic_t, and whether or not atomic_read has
"volatile" behaviour.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:37         ` Bill Fink
@ 2007-08-16  5:20           ` Satyam Sharma
  2007-08-16  5:57             ` Satyam Sharma
  2007-08-16 20:50             ` Segher Boessenkool
  0 siblings, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  5:20 UTC (permalink / raw)
  To: Bill Fink
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

Hi Bill,


On Wed, 15 Aug 2007, Bill Fink wrote:

> On Wed, 15 Aug 2007, Satyam Sharma wrote:
> 
> > (C)
> > $ cat tp3.c
> > int a;
> > 
> > void func(void)
> > {
> > 	*(volatile int *)&a = 10;
> > 	*(volatile int *)&a = 20;
> > }
> > $ gcc -Os -S tp3.c
> > $ cat tp3.s
> > ...
> > movl    $10, a
> > movl    $20, a
> > ...
> 
> I'm curious about one minor tangential point.  Why, instead of:
> 
> 	b = *(volatile int *)&a;
> 
> why can't this just be expressed as:
> 
> 	b = (volatile int)a;
> 
> Isn't it the contents of a that's volatile, i.e. its value can change
> invisibly to the compiler, and that's why you want to force a read from
> memory?  Why do you need the "*(volatile int *)&" construct?

"b = (volatile int)a;" doesn't help us because a cast to a qualified type
has the same effect as a cast to an unqualified version of that type, as
mentioned in 6.5.4:4 (footnote 86) of the standard. Note that "volatile"
is a type-qualifier, not a type itself, so a cast of the _object_ itself
to a qualified-type i.e. (volatile int) would not make the access itself
volatile-qualified.

To serve our purposes, it is necessary for us to take the address of this
(non-volatile) object, cast the resulting _pointer_ to the corresponding
volatile-qualified pointer-type, and then dereference it. This makes that
particular _access_ be volatile-qualified, without the object itself being
such. Also note that the (dereferenced) result is also a valid lvalue and
hence can be used in "*(volatile int *)&a = b;" kind of construction
(which we use for the atomic_set case).
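
Putting the two forms side by side (illustrative only):

	int a, b;

	void demo(void)
	{
		b = (volatile int)a;		/* qualifier dropped by the cast:
						 * an ordinary, optimizable read */
		b = *(volatile int *)&a;	/* this access is volatile-qualified:
						 * a real load must be emitted */
		*(volatile int *)&a = 10;	/* the same construct is an lvalue,
						 * so it works for stores too */
	}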


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  4:34                                       ` Paul Mackerras
@ 2007-08-16  5:37                                         ` Herbert Xu
  2007-08-16  6:00                                           ` Paul Mackerras
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  5:37 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Satyam Sharma, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 02:34:25PM +1000, Paul Mackerras wrote:
>
> I'm talking about this situation:
> 
> CPU 0 comes into __sk_stream_mem_reclaim, reads memory_allocated, but
> then before it can do the store to *memory_pressure, CPUs 1-1023 all
> go through sk_stream_mem_schedule, collectively increase
> memory_allocated to more than sysctl_mem[2] and set *memory_pressure.
> Finally CPU 0 gets to do its store and it sets *memory_pressure back
> to 0, but by this stage memory_allocated is way larger than
> sysctl_mem[2].

It doesn't matter.  The memory pressure flag is an *advisory*
flag.  If we get it wrong the worst that'll happen is that we'd
waste some time doing work that'll be thrown away.

Please look at the places where it's used before jumping to
conclusions.

> Now, maybe it's the case that it doesn't really matter whether
> *->memory_pressure is 0 or 1.  But if so, why bother computing it at
> all?

As long as we get it right most of the time (and I think you
would agree that we do get it right most of the time), then
this flag has achieved its purpose.

> People seem to think that using atomic_t means they don't need to use
> a spinlock.  That's fine if there is only one variable involved, but
> as soon as there's more than one, there's the possibility of a race,
> whether or not you use atomic_t, and whether or not atomic_read has
> "volatile" behaviour.

In any case, this actually illustrates why the addition of
volatile is completely pointless.  Even if this code was
broken, which it definitely is not, having the volatile
there wouldn't have helped at all.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  4:11                           ` Paul Mackerras
@ 2007-08-16  5:39                             ` Herbert Xu
  2007-08-16  6:56                               ` Paul Mackerras
  2007-08-16 18:54                             ` Christoph Lameter
  1 sibling, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  5:39 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Satyam Sharma, Christoph Lameter, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 02:11:43PM +1000, Paul Mackerras wrote:
>
> The uses of atomic_read where one might want it to allow caching of
> the result seem to me to fall into 3 categories:
> 
> 1. Places that are buggy because of a race arising from the way it's
>    used.
> 
> 2. Places where there is a race but it doesn't matter because we're
>    doing some clever trick.
> 
> 3. Places where there is some locking in place that eliminates any
>    potential race.

Agreed.

> In case 1, adding volatile won't solve the race, of course, but it's
> hard to argue that we shouldn't do something because it will slow down
> buggy code.  Case 2 is hopefully pretty rare and accompanied by large
> comment blocks, and in those cases caching the result of atomic_read
> explicitly in a local variable would probably make the code clearer.
> And in case 3 there is no reason to use atomic_t at all; we might as
> well just use an int.

Since adding volatile doesn't help any of the 3 cases, and
takes away optimisations from both 2 and 3, I wonder what
is the point of the addition after all?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  5:20           ` Satyam Sharma
@ 2007-08-16  5:57             ` Satyam Sharma
  2007-08-16  9:25               ` Satyam Sharma
  2007-08-16 21:00               ` Segher Boessenkool
  2007-08-16 20:50             ` Segher Boessenkool
  1 sibling, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  5:57 UTC (permalink / raw)
  To: Bill Fink
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney



On Thu, 16 Aug 2007, Satyam Sharma wrote:

> Hi Bill,
> 
> 
> On Wed, 15 Aug 2007, Bill Fink wrote:
> 
> > On Wed, 15 Aug 2007, Satyam Sharma wrote:
> > 
> > > (C)
> > > $ cat tp3.c
> > > int a;
> > > 
> > > void func(void)
> > > {
> > > 	*(volatile int *)&a = 10;
> > > 	*(volatile int *)&a = 20;
> > > }
> > > $ gcc -Os -S tp3.c
> > > $ cat tp3.s
> > > ...
> > > movl    $10, a
> > > movl    $20, a
> > > ...
> > 
> > I'm curious about one minor tangential point.  Why, instead of:
> > 
> > 	b = *(volatile int *)&a;
> > 
> > why can't this just be expressed as:
> > 
> > 	b = (volatile int)a;
> > 
> > Isn't it the contents of a that's volatile, i.e. its value can change
> > invisibly to the compiler, and that's why you want to force a read from
> > memory?  Why do you need the "*(volatile int *)&" construct?
> 
> "b = (volatile int)a;" doesn't help us because a cast to a qualified type
> has the same effect as a cast to an unqualified version of that type, as
> mentioned in 6.5.4:4 (footnote 86) of the standard. Note that "volatile"
> is a type-qualifier, not a type itself, so a cast of the _object_ itself
> to a qualified-type i.e. (volatile int) would not make the access itself
> volatile-qualified.
> 
> To serve our purposes, it is necessary for us to take the address of this
> (non-volatile) object, cast the resulting _pointer_ to the corresponding
> volatile-qualified pointer-type, and then dereference it. This makes that
> particular _access_ be volatile-qualified, without the object itself being
> such. Also note that the (dereferenced) result is also a valid lvalue and
> hence can be used in "*(volatile int *)&a = b;" kind of construction
> (which we use for the atomic_set case).

Here, I should obviously admit that the semantics of *(volatile int *)&
aren't any neater or well-defined in the _language standard_ at all. The
standard does say (verbatim) "what constitutes an access to an object that
has volatile-qualified type is implementation-defined", but GCC
does help us out here by doing the right thing. Accessing the non-volatile
object there using the volatile-qualified pointer-type cast makes GCC
treat the object stored at that memory address itself as if it were a 
volatile object, thus making the access end up having what we're calling
"volatility" semantics here.

Honestly, given such confusion, and the propensity of the "volatile"
type-qualifier keyword to be ill-defined (or at least poorly understood,
often inconsistently implemented), I'd (again) express my opinion that it
would be best to avoid its usage, given other alternatives do exist.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  5:37                                         ` Herbert Xu
@ 2007-08-16  6:00                                           ` Paul Mackerras
  2007-08-16 18:50                                             ` Christoph Lameter
  0 siblings, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  6:00 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Christoph Lameter, Satyam Sharma, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> It doesn't matter.  The memory pressure flag is an *advisory*
> flag.  If we get it wrong the worst that'll happen is that we'd
> waste some time doing work that'll be thrown away.

Ah, so it's the "racy but I don't care because it's only an
optimization" case.  That's fine.  Somehow I find it hard to believe
that all the racy uses of atomic_read in the kernel are like that,
though. :)

> In any case, this actually illustrates why the addition of
> volatile is completely pointless.  Even if this code was
> broken, which it definitely is not, having the volatile
> there wouldn't have helped at all.

Yes, adding volatile to racy code doesn't somehow make it race-free.
Neither does using atomic_t, despite what some seem to believe.

I have actually started going through all the uses of atomic_read in
the kernel.  So far out of the first 100 I have found none where we
have two atomic_reads of the same variable and the compiler could
usefully use the value from the first as the result of the second.
But there's still > 2500 to go...

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  5:39                             ` Herbert Xu
@ 2007-08-16  6:56                               ` Paul Mackerras
  2007-08-16  7:09                                 ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-16  6:56 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Satyam Sharma, Christoph Lameter, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> On Thu, Aug 16, 2007 at 02:11:43PM +1000, Paul Mackerras wrote:
> >
> > The uses of atomic_read where one might want it to allow caching of
> > the result seem to me to fall into 3 categories:
> > 
> > 1. Places that are buggy because of a race arising from the way it's
> >    used.
> > 
> > 2. Places where there is a race but it doesn't matter because we're
> >    doing some clever trick.
> > 
> > 3. Places where there is some locking in place that eliminates any
> >    potential race.
> 
> Agreed.
> 
> > In case 1, adding volatile won't solve the race, of course, but it's
> > hard to argue that we shouldn't do something because it will slow down
> > buggy code.  Case 2 is hopefully pretty rare and accompanied by large
> > comment blocks, and in those cases caching the result of atomic_read
> > explicitly in a local variable would probably make the code clearer.
> > And in case 3 there is no reason to use atomic_t at all; we might as
> > well just use an int.
> 
> Since adding volatile doesn't help any of the 3 cases, and
> takes away optimisations from both 2 and 3, I wonder what
> is the point of the addition after all?

Note that I said these are the cases _where one might want to allow
caching_, so of course adding volatile doesn't help _these_ cases.
There are of course other cases where one definitely doesn't want to
allow the compiler to cache the value, such as when polling an atomic
variable waiting for another CPU to change it, and from my inspection
so far these cases seem to be the majority.
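
A sketch of the kind of polling loop I mean, written deliberately without
cpu_relax() so the failure mode is visible (the flag name is invented):

	extern atomic_t other_cpu_done;

	while (!atomic_read(&other_cpu_done))
		;	/* if the load may be cached, the compiler can hoist
			   it out of the loop and we spin here forever */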

The reasons for having "volatile" behaviour of atomic_read (whether or
not that is achieved by use of the "volatile" C keyword) are

- It matches the normal expectation based on the name "atomic_read"
- It matches the behaviour of the other atomic_* primitives
- It avoids bugs in the cases where "volatile" behaviour is required

To my mind these outweigh the small benefit for some code of the
non-volatile (caching-allowed) behaviour.  In fact it's pretty minor
either way, and since x86[-64] has this behaviour, one can expect the
potential bugs in generic code to have mostly been found, although
perhaps not all of them since x86[-64] has less aggressive reordering
of memory accesses and fewer registers in which to cache things than
some other architectures.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  6:56                               ` Paul Mackerras
@ 2007-08-16  7:09                                 ` Herbert Xu
  2007-08-16  8:06                                   ` Stefan Richter
  2007-08-16 14:48                                   ` Ilpo Järvinen
  0 siblings, 2 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  7:09 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Satyam Sharma, Christoph Lameter, Paul E. McKenney,
	Stefan Richter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 04:56:21PM +1000, Paul Mackerras wrote:
>
> Note that I said these are the cases _where one might want to allow
> caching_, so of course adding volatile doesn't help _these_ cases.
> There are of course other cases where one definitely doesn't want to
> allow the compiler to cache the value, such as when polling an atomic
> variable waiting for another CPU to change it, and from my inspection
> so far these cases seem to be the majority.

We've been through that already.  If it's a busy-wait it
should use cpu_relax.  If it's scheduling away that already
forces the compiler to reread anyway.

Do you have an actual example where volatile is needed?

> - It matches the normal expectation based on the name "atomic_read"
> - It matches the behaviour of the other atomic_* primitives

Can't argue since you left out what those expectations
or properties are.

> - It avoids bugs in the cases where "volatile" behaviour is required

Do you (or anyone else for that matter) have an example of this?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  7:09                                 ` Herbert Xu
@ 2007-08-16  8:06                                   ` Stefan Richter
  2007-08-16  8:10                                     ` Herbert Xu
  2007-08-16 14:48                                   ` Ilpo Järvinen
  1 sibling, 1 reply; 657+ messages in thread
From: Stefan Richter @ 2007-08-16  8:06 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 04:56:21PM +1000, Paul Mackerras wrote:
>>
>> Note that I said these are the cases _where one might want to allow
>> caching_, so of course adding volatile doesn't help _these_ cases.
>> There are of course other cases where one definitely doesn't want to
>> allow the compiler to cache the value, such as when polling an atomic
>> variable waiting for another CPU to change it, and from my inspection
>> so far these cases seem to be the majority.
> 
> We've been through that already.  If it's a busy-wait it
> should use cpu_relax.  If it's scheduling away that already
> forces the compiler to reread anyway.
> 
> Do you have an actual example where volatile is needed?
> 
>> - It matches the normal expectation based on the name "atomic_read"
>> - It matches the behaviour of the other atomic_* primitives
> 
> Can't argue since you left out what those expectations
> or properties are.

We use atomic_t for data that is concurrently locklessly written and
read at arbitrary times.  My naive expectation as driver author (driver
maintainer) is that all atomic_t accessors, including atomic_read, (and
atomic bitops) work with the then current value of the atomic data.

>> - It avoids bugs in the cases where "volatile" behaviour is required
> 
> Do you (or anyone else for that matter) have an example of this?

The only code I somewhat know, the ieee1394 subsystem, was perhaps
authored and is currently maintained with the expectation that each
occurrence of atomic_read actually results in a load operation, i.e. is
not optimized away.  This means all atomic_t (bus generation, packet and
buffer refcounts, and some other state variables)* and likewise all
atomic bitops in that subsystem.

If that assumption is wrong, then what is the API or language primitive
to force a load operation to occur?


*)  Interesting what a quick LXR session in search for all atomic_t
usages in 'my' subsystem brings to light.  I now noticed an apparently
unused struct member in the bitrotting pcilynx driver, and more
importantly, a pairing of two atomic_t variables in raw1394 that should
be audited for race conditions and for possible replacement by plain int.
-- 
Stefan Richter
-=====-=-=== =--- =----
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  8:06                                   ` Stefan Richter
@ 2007-08-16  8:10                                     ` Herbert Xu
  2007-08-16  9:54                                       ` Stefan Richter
                                                         ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-16  8:10 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote:
> > 
> > Do you (or anyone else for that matter) have an example of this?
> 
> The only code I somewhat know, the ieee1394 subsystem, was perhaps
> authored and is currently maintained with the expectation that each
> occurrence of atomic_read actually results in a load operation, i.e. is
> not optimized away.  This means all atomic_t (bus generation, packet and
> buffer refcounts, and some other state variables)* and likewise all
> atomic bitops in that subsystem.

Can you find an actual atomic_read code snippet there that is
broken without the volatile modifier?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  5:57             ` Satyam Sharma
@ 2007-08-16  9:25               ` Satyam Sharma
  2007-08-16 21:00               ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-16  9:25 UTC (permalink / raw)
  To: Bill Fink
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

[ Bill tells me in private communication he gets this already, but I
  think it's more complicated than the shoddy explanation I'd made
  earlier, so I would like to make this clearer in detail one last time,
  for the benefit of others listening in or reading the archives. ]


On Thu, 16 Aug 2007, Satyam Sharma wrote:

> On Thu, 16 Aug 2007, Satyam Sharma wrote:
> [...]
> > On Wed, 15 Aug 2007, Bill Fink wrote:
> > > [...]
> > > I'm curious about one minor tangential point.  Why, instead of:
> > > 
> > > 	b = *(volatile int *)&a;
> > > 
> > > why can't this just be expressed as:
> > > 
> > > 	b = (volatile int)a;
> > > 
> > > Isn't it the contents of a that's volatile, i.e. its value can change
> > > invisibly to the compiler, and that's why you want to force a read from
> > > memory?  Why do you need the "*(volatile int *)&" construct?
> > 
> > "b = (volatile int)a;" doesn't help us because a cast to a qualified type
> > has the same effect as a cast to an unqualified version of that type, as
> > mentioned in 6.5.4:4 (footnote 86) of the standard. Note that "volatile"
> > is a type-qualifier, not a type itself, so a cast of the _object_ itself
> > to a qualified-type i.e. (volatile int) would not make the access itself
> > volatile-qualified.

Casts don't produce lvalues, and the cast ((volatile int)a) does not
produce the object-int-a-qualified-as-"volatile" -- in fact, the
result of the above cast is whatever is the _value_ of "int a", with
the access to that object having _already_ taken place, as per the
actual type-qualification of the object (that was originally declared
as being _non-volatile_, in fact). Hence, defining atomic_read() as:

#define atomic_read(v)          ((volatile int)((v)->counter))

would be buggy and not give "volatility" semantics at all, unless the
"counter" object itself were volatile-qualified already (which it
isn't).

The result of the cast itself being the _value_ of the int object, and
not the object itself (i.e., not an lvalue), is thereby independent of
type-qualification in that cast itself (it just wouldn't make any
difference), hence the "cast to a qualified type has the same effect
as a cast to an unqualified version of that type" bit in section 6.5.4:4
of the standard.


> > To serve our purposes, it is necessary for us to take the address of this
> > (non-volatile) object, cast the resulting _pointer_ to the corresponding
> > volatile-qualified pointer-type, and then dereference it. This makes that
> > particular _access_ be volatile-qualified, without the object itself being
> > such. Also note that the (dereferenced) result is also a valid lvalue and
> > hence can be used in "*(volatile int *)&a = b;" kind of construction
> > (which we use for the atomic_set case).

Dereferencing using the *(pointer-type-cast)& construct, OTOH, serves
us well:

#define atomic_read(v)          (*(volatile int *)&(v)->counter)

Firstly, note that the cast here being (volatile int *) and not
(int * volatile) qualifies the type of the _object_ being pointed to
by the pointer in question as being volatile-qualified, and not the
pointer itself (6.2.5:27 of the standard, and 6.3.2.3:2 allows us to
convert from a pointer-to-non-volatile-qualified-int to a pointer-to-
volatile-qualified-int, which suits us just fine) -- but note that
the _access_ to that address itself has not yet occurred.

_After_ specifying the memory address as containing a volatile-qualified-
int-type object, (and GCC co-operates as mentioned below), we proceed to
dereference it, which is when the _actual access_ occurs, therefore with
"volatility" semantics this time.

Interesting.
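
To put the two forms side by side (toy variable "a", purely for
illustration):

	int a = 0, b;

	b = (volatile int)a;		/* cast of the value: an ordinary read,
					   which GCC is free to reuse or elide */
	b = *(volatile int *)&a;	/* access through a volatile lvalue:
					   GCC emits a real load every time    */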


> Here, I should obviously admit that the semantics of *(volatile int *)&
> aren't any neater or well-defined in the _language standard_ at all. The
> standard does say (verbatim) "what constitutes an access to an object
> that has volatile-qualified type is implementation-defined", but GCC
> does help us out here by doing the right thing. Accessing the non-volatile
> object there through the volatile-qualified pointer-type cast makes GCC
> treat the object stored at that memory address as if it were a volatile
> object, thus giving the access what we're calling "volatility" semantics
> here.
> 
> Honestly, given such confusion, and the propensity of the "volatile"
> type-qualifier keyword to be ill-defined (or at least poorly understood,
> often inconsistently implemented), I'd (again) express my opinion that it
> would be best to avoid its usage, given other alternatives do exist.


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  8:10                                     ` Herbert Xu
@ 2007-08-16  9:54                                       ` Stefan Richter
  2007-08-16 10:31                                         ` Stefan Richter
  2007-08-16 10:35                                         ` Herbert Xu
  2007-08-16 19:48                                       ` Chris Snook
  2007-08-17  5:09                                       ` Paul Mackerras
  2 siblings, 2 replies; 657+ messages in thread
From: Stefan Richter @ 2007-08-16  9:54 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote:
>> > 
>> > Do you (or anyone else for that matter) have an example of this?
>> 
>> The only code I somewhat know, the ieee1394 subsystem, was perhaps
>> authored and is currently maintained with the expectation that each
>> occurrence of atomic_read actually results in a load operation, i.e. is
>> not optimized away.  This means all atomic_t (bus generation, packet and
>> buffer refcounts, and some other state variables)* and likewise all
>> atomic bitops in that subsystem.
> 
> Can you find an actual atomic_read code snippet there that is
> broken without the volatile modifier?

What do I have to look for?  atomic_read after another read or write
access to the same variable, in the same function scope?  Or in the sum
of scopes of functions that could be merged by function inlining?

One example was discussed here earlier:  The for (;;) loop in
nodemgr_host_thread.  There an msleep_interruptible implicitly acted as a
barrier (at the moment because it's in a different translation unit; if
it were in the same one, then because it hopefully has its own barriers).
So that happens to work, although such an implicit barrier is bad style:
Better to enforce the desired behaviour (== a guaranteed load operation)
*explicitly*.
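
(The shape of that loop, heavily simplified and with invented names, is
roughly:)

	for (;;) {
		if (atomic_read(&host->shutdown))
			break;
		/* ... rescan the bus ... */
		msleep_interruptible(scan_interval);	/* the implicit barrier */
	}
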
-- 
Stefan Richter
-=====-=-=== =--- =----
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  9:54                                       ` Stefan Richter
@ 2007-08-16 10:31                                         ` Stefan Richter
  2007-08-16 10:42                                           ` Herbert Xu
  2007-08-16 10:35                                         ` Herbert Xu
  1 sibling, 1 reply; 657+ messages in thread
From: Stefan Richter @ 2007-08-16 10:31 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

I wrote:
> Herbert Xu wrote:
>> On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote:
[...]
>>> expectation that each
>>> occurrence of atomic_read actually results in a load operation, i.e. is
>>> not optimized away.
[...]
>> Can you find an actual atomic_read code snippet there that is
>> broken without the volatile modifier?

PS:  Just to clarify, I'm not speaking for the volatile modifier.  I'm
not speaking for any particular implementation of atomic_t and its
accessors at all.  All I am saying is that
  - we use atomically accessed data types because we concurrently but
    locklessly access this data,
  - hence a read access to this data that could be optimized away
    makes *no sense at all*.

The only sensible read accessor to an atomic datatype is a read accessor
that will not be optimized away.

So, the architecture guys can implement atomic_read however they want
--- as long as it cannot be optimized away.*

PPS:  If somebody has code where he can afford to let the compiler
coalesce atomic_read with a previous access to the same data, i.e.
doesn't need and doesn't want all guarantees that the atomic_read API
makes (or IMO should make), then he can replace the atomic_read by a
local temporary variable.


*) Exceptions:
	if (known_to_be_false)
		read_access(a);
and the like.
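
To spell out the PPS above (all names invented):

	/* one guaranteed load, then the value is reused deliberately */
	int pending = atomic_read(&dev->pending);

	if (pending > DEV_HIGH_WATERMARK)
		dev_throttle(dev);
	else if (!pending)
		wake_up(&dev->waitq);
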
-- 
Stefan Richter
-=====-=-=== =--- =----
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  9:54                                       ` Stefan Richter
  2007-08-16 10:31                                         ` Stefan Richter
@ 2007-08-16 10:35                                         ` Herbert Xu
  1 sibling, 0 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-16 10:35 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 11:54:44AM +0200, Stefan Richter wrote:
> 
> One example was discussed here earlier:  The for (;;) loop in
> nodemgr_host_thread.  There an msleep_interruptible implicitly acted as
> barrier (at the moment because it's in a different translation unit; if
> it were the same, then because it hopefully has own barriers).  So that
> happens to work, although such an implicit barrier is bad style:  Better
> enforce the desired behaviour (== guaranteed load operation) *explicitly*.

Hmm, it's not bad style at all.  Let's assume that everything
is in the same scope.  Such a loop must either call a function
that busy-waits, which should always have a cpu_relax or
something equivalent, or it'll call a function that schedules
away, which immediately invalidates any values the compiler might
have cached for the atomic_read.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 10:31                                         ` Stefan Richter
@ 2007-08-16 10:42                                           ` Herbert Xu
  2007-08-16 16:34                                             ` Paul E. McKenney
  2007-08-17  5:04                                             ` Paul Mackerras
  0 siblings, 2 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-16 10:42 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 12:31:03PM +0200, Stefan Richter wrote:
> 
> PS:  Just to clarify, I'm not speaking for the volatile modifier.  I'm
> not speaking for any particular implementation of atomic_t and its
> accessors at all.  All I am saying is that
>   - we use atomically accessed data types because we concurrently but
>     locklessly access this data,
>   - hence a read access to this data that could be optimized away
>     makes *no sense at all*.

No sane compiler can optimise away an atomic_read per se.
That's only possible if there's a preceding atomic_set or
atomic_read, with no barriers in the middle.

If that's the case, then one has to conclude that doing
away with the second read is acceptable, as otherwise
a memory (or at least a compiler) barrier should have been
used.
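
Spelled out, the only foldable pattern is something like this (hypothetical
code):

	atomic_set(&v, 1);
	/* no barrier, no function call the compiler can't see through */
	if (atomic_read(&v))		/* may legitimately become if (1) */
		do_something();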

In fact, volatile doesn't guarantee that the memory gets
read anyway.  You might be reading some stale value out
of the cache.  Granted this doesn't happen on x86 but
when you're coding for the kernel you can't make such
assumptions.

So the point here is that if you don't mind getting a stale
value from the CPU cache when doing an atomic_read, then
surely you won't mind getting a stale value from the compiler
"cache".

> So, the architecture guys can implement atomic_read however they want
> --- as long as it cannot be optimized away.*

They can implement it however they want as long as it stays
atomic.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  7:09                                 ` Herbert Xu
  2007-08-16  8:06                                   ` Stefan Richter
@ 2007-08-16 14:48                                   ` Ilpo Järvinen
  2007-08-16 16:19                                     ` Stefan Richter
                                                       ` (2 more replies)
  1 sibling, 3 replies; 657+ messages in thread
From: Ilpo Järvinen @ 2007-08-16 14:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

On Thu, 16 Aug 2007, Herbert Xu wrote:

> We've been through that already.  If it's a busy-wait it
> should use cpu_relax. 

I looked around a bit by using some command lines and ended up wondering 
if these are equal to busy-wait case (and should be fixed) or not:

./drivers/telephony/ixj.c
6674:   while (atomic_read(&j->DSPWrite) > 0)
6675-           atomic_dec(&j->DSPWrite);

...besides that, there are couple of more similar cases in the same file 
(with braces)...


-- 
 i.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 14:48                                   ` Ilpo Järvinen
@ 2007-08-16 16:19                                     ` Stefan Richter
  2007-08-16 19:55                                     ` Chris Snook
  2007-08-16 19:55                                     ` Chris Snook
  2 siblings, 0 replies; 657+ messages in thread
From: Stefan Richter @ 2007-08-16 16:19 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Herbert Xu, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Ilpo Järvinen wrote:
> I looked around a bit by using some command lines and ended up wondering 
> if these are equal to busy-wait case (and should be fixed) or not:
> 
> ./drivers/telephony/ixj.c
> 6674:   while (atomic_read(&j->DSPWrite) > 0)
> 6675-           atomic_dec(&j->DSPWrite);
> 
> ...besides that, there are couple of more similar cases in the same file 
> (with braces)...

Generally, ixj.c has several occurrences of paired atomic writes and
atomic reads which potentially do not do what the author wanted.
-- 
Stefan Richter
-=====-=-=== =--- =----
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 10:42                                           ` Herbert Xu
@ 2007-08-16 16:34                                             ` Paul E. McKenney
  2007-08-16 23:59                                               ` Herbert Xu
  2007-08-17  3:15                                               ` Nick Piggin
  2007-08-17  5:04                                             ` Paul Mackerras
  1 sibling, 2 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16 16:34 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stefan Richter, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 06:42:50PM +0800, Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 12:31:03PM +0200, Stefan Richter wrote:
> > 
> > PS:  Just to clarify, I'm not speaking for the volatile modifier.  I'm
> > not speaking for any particular implementation of atomic_t and its
> > accessors at all.  All I am saying is that
> >   - we use atomically accessed data types because we concurrently but
> >     locklessly access this data,
> >   - hence a read access to this data that could be optimized away
> >     makes *no sense at all*.
> 
> No sane compiler can optimise away an atomic_read per se.
> That's only possible if there's a preceding atomic_set or
> atomic_read, with no barriers in the middle.
> 
> If that's the case, then one has to conclude that doing
> away with the second read is acceptable, as otherwise
> a memory (or at least a compiler) barrier should have been
> used.

The compiler can also reorder non-volatile accesses.  For an example
patch that cares about this, please see:

	http://lkml.org/lkml/2007/8/7/280

This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
rcu_read_unlock() to ensure that accesses aren't reordered with respect
to interrupt handlers and NMIs/SMIs running on that same CPU.
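
(For readers not following the link: one plausible shape of such a macro --
my paraphrase, the patch above is authoritative -- is simply a
volatile-qualified access:

	#define ORDERED_WRT_IRQ(x)	(*(volatile typeof(x) *)&(x))

which the compiler can neither elide nor reorder against other such
accesses on the same CPU.)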

> In fact, volatile doesn't guarantee that the memory gets
> read anyway.  You might be reading some stale value out
> of the cache.  Granted this doesn't happen on x86 but
> when you're coding for the kernel you can't make such
> assumptions.
> 
> So the point here is that if you don't mind getting a stale
> value from the CPU cache when doing an atomic_read, then
> surely you won't mind getting a stale value from the compiler
> "cache".

Absolutely disagree.  An interrupt/NMI/SMI handler running on the CPU
will see the same value (whether in cache or in store buffer) that
the mainline code will see.  In this case, we don't care about CPU
misordering, only about compiler misordering.  It is easy to see
other uses that combine communication with handlers on the current
CPU with communication among CPUs -- again, see prior messages in
this thread.

> > So, the architecture guys can implement atomic_read however they want
> > --- as long as it cannot be optimized away.*
> 
> They can implement it however they want as long as it stays
> atomic.

Precisely.  And volatility is a key property of "atomic".  Let's please
not throw it away.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:23                               ` Paul Mackerras
  2007-08-16  3:33                                 ` Herbert Xu
@ 2007-08-16 18:48                                 ` Christoph Lameter
  2007-08-16 19:44                                 ` Segher Boessenkool
  2 siblings, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16 18:48 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Herbert Xu, Satyam Sharma, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> 
> It seems that there could be a lot of places where atomic_t is used in
> a non-atomic fashion, and that those uses are either buggy, or there
> is some lock held at the time which guarantees that other CPUs aren't
> changing the value.  In both cases there is no point in using
> atomic_t; we might as well just use an ordinary int.

The point of atomic_t is to do atomic *changes* to the variable.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  6:00                                           ` Paul Mackerras
@ 2007-08-16 18:50                                             ` Christoph Lameter
  0 siblings, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16 18:50 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Herbert Xu, Satyam Sharma, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> Herbert Xu writes:
> 
> > It doesn't matter.  The memory pressure flag is an *advisory*
> > flag.  If we get it wrong the worst that'll happen is that we'd
> > waste some time doing work that'll be thrown away.
> 
> Ah, so it's the "racy but I don't care because it's only an
> optimization" case.  That's fine.  Somehow I find it hard to believe
> that all the racy uses of atomic_read in the kernel are like that,
> though. :)

My use of atomic_read in SLUB is like that. Volatile does not magically 
sync up reads somehow.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  4:11                           ` Paul Mackerras
  2007-08-16  5:39                             ` Herbert Xu
@ 2007-08-16 18:54                             ` Christoph Lameter
  2007-08-16 20:07                               ` Paul E. McKenney
  1 sibling, 1 reply; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16 18:54 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Satyam Sharma, Herbert Xu, Paul E. McKenney, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> The uses of atomic_read where one might want it to allow caching of
> the result seem to me to fall into 3 categories:
> 
> 1. Places that are buggy because of a race arising from the way it's
>    used.
> 
> 2. Places where there is a race but it doesn't matter because we're
>    doing some clever trick.
> 
> 3. Places where there is some locking in place that eliminates any
>    potential race.
> 
> In case 1, adding volatile won't solve the race, of course, but it's
> hard to argue that we shouldn't do something because it will slow down
> buggy code.  Case 2 is hopefully pretty rare and accompanied by large
> comment blocks, and in those cases caching the result of atomic_read
> explicitly in a local variable would probably make the code clearer.
> And in case 3 there is no reason to use atomic_t at all; we might as
> well just use an int.

In 2 + 3 you may increment the atomic variable in some places. The value 
of the atomic variable may not matter because you only do optimizations.

Checking an atomic_t for a definite state has to involve either
some side conditions (lock only taken if the refcount is <= 0, or so) or be
done by changing the state (see e.g. atomic_inc_not_zero).
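
For reference, the kind of state-changing check meant here looks roughly
like this (the object and the use()/release_obj() helpers are invented):

	if (atomic_inc_not_zero(&obj->refcnt)) {
		use(obj);			/* guaranteed alive here */
		if (atomic_dec_and_test(&obj->refcnt))
			release_obj(obj);
	}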

> So I don't see any good reason to make the atomic API more complex by
> having "volatile" and "non-volatile" versions of atomic_read.  It
> should just have the "volatile" behaviour.

If you want to make it less complex, then drop volatile, which causes weird
side effects without solving any problems, as you just pointed out.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:23               ` Nick Piggin
@ 2007-08-16 19:32                 ` Segher Boessenkool
  2007-08-17  2:19                   ` Nick Piggin
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16 19:32 UTC (permalink / raw)
  To: Nick Piggin
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

>>>> Part of the motivation here is to fix heisenbugs.  If I knew where 
>>>> they
>>>
>>>
>>> By the same token we should probably disable optimisations
>>> altogether since that too can create heisenbugs.
>> Almost everything is a tradeoff; and so is this.  I don't
>> believe most people would find disabling all compiler
>> optimisations an acceptable price to pay for some peace
>> of mind.
>
> So why is this a good tradeoff?

It certainly is better than disabling all compiler optimisations!

> I also think that just adding things to APIs in the hope it might fix
> up some bugs isn't really a good road to go down. Where do you stop?

I look at it the other way: keeping the "volatile" semantics in
atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;
certainly most people expect that behaviour, and also that behaviour
is *needed* in some places and no other interface provides that
functionality.


[some confusion about barriers wrt atomics snipped]


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  2:30                 ` Paul E. McKenney
@ 2007-08-16 19:33                   ` Segher Boessenkool
  0 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16 19:33 UTC (permalink / raw)
  To: paulmck
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

>> The only thing volatile on an asm does is create a side effect
>> on the asm statement; in effect, it tells the compiler "do not
>> remove this asm even if you don't need any of its outputs".
>>
>> It's not disabling optimisation likely to result in bugs,
>> heisen- or otherwise; _not_ putting the volatile on an asm
>> that needs it simply _is_ a bug :-)
>
> Yep.  And the reason it is a bug is that it fails to disable
> the relevant compiler optimizations.  So I suspect that we might
> actually be saying the same thing here.

We're not saying the same thing, but we do agree :-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:05                         ` Paul Mackerras
@ 2007-08-16 19:39                           ` Segher Boessenkool
  0 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16 19:39 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Satyam Sharma, Linux Kernel Mailing List, Paul E. McKenney,
	netdev, ak, cfriesen, rpjday, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, schwidefsky, Chris Snook, Herbert Xu,
	davem, Linus Torvalds, wensong, wjiang

>> I can't speak for this particular case, but there could be similar code
>> examples elsewhere, where we do the atomic ops on an atomic_t object
>> inside a higher-level locking scheme that would take care of the kind of
>> problem you're referring to here. It would be useful for such or similar
>> code if the compiler kept the value of that atomic object in a register.
>
> If there is a higher-level locking scheme then there is no point to
> using atomic_t variables.  Atomic_t is specifically for the situation
> where multiple CPUs are updating a variable without locking.

And don't forget about the case where it is an I/O device that is
updating the memory (in buffer descriptors or similar).  The driver
needs to do a "volatile" atomic read to get at the most recent version
of that data, which can be important for optimising latency (or even
throughput).  There is no other way the kernel can get that info --
doing an MMIO read is way way too expensive.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  3:23                               ` Paul Mackerras
  2007-08-16  3:33                                 ` Herbert Xu
  2007-08-16 18:48                                 ` Christoph Lameter
@ 2007-08-16 19:44                                 ` Segher Boessenkool
  2 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16 19:44 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Satyam Sharma, Linux Kernel Mailing List, Paul E. McKenney,
	netdev, ak, cfriesen, rpjday, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, schwidefsky, Chris Snook, Herbert Xu,
	davem, Linus Torvalds, wensong, wjiang

> I'd go so far as to say that anywhere where you want a non-"volatile"
> atomic_read, either your code is buggy, or else an int would work just
> as well.

Even, the only way to implement a "non-volatile" atomic_read() is
essentially as a plain int (you can do some tricks so you cannot
assign to the result and stuff like that, but that's not the issue
here).

So if that would be the behaviour we wanted, just get rid of that
whole atomic_read() thing, so no one can misuse it anymore.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  8:10                                     ` Herbert Xu
  2007-08-16  9:54                                       ` Stefan Richter
@ 2007-08-16 19:48                                       ` Chris Snook
  2007-08-17  0:02                                         ` Herbert Xu
  2007-08-17  5:09                                       ` Paul Mackerras
  2 siblings, 1 reply; 657+ messages in thread
From: Chris Snook @ 2007-08-16 19:48 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stefan Richter, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote:
>>> Do you (or anyone else for that matter) have an example of this?
>> The only code I somewhat know, the ieee1394 subsystem, was perhaps
>> authored and is currently maintained with the expectation that each
>> occurrence of atomic_read actually results in a load operation, i.e. is
>> not optimized away.  This means all atomic_t (bus generation, packet and
>> buffer refcounts, and some other state variables)* and likewise all
>> atomic bitops in that subsystem.
> 
> Can you find an actual atomic_read code snippet there that is
> broken without the volatile modifier?

A whole bunch of atomic_read uses will be broken without the volatile 
modifier once we start removing barriers that aren't needed if volatile 
behavior is guaranteed.

barrier() clobbers all your registers.  volatile atomic_read() only 
clobbers one register, and more often than not it's a register you 
wanted to clobber anyway.
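
Roughly, the difference being claimed (fields and helpers invented):

	if (p->a > p->b)
		prepare(p);
	barrier();		/* everything read from memory so far must be
				   re-loaded if it is used again below       */
	if (atomic_read(&p->cnt))
		commit(p);

	/* with a volatile atomic_read() and no barrier() above, only p->cnt
	   is forcibly re-loaded; p->a and p->b may stay cached in registers */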

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 14:48                                   ` Ilpo Järvinen
  2007-08-16 16:19                                     ` Stefan Richter
@ 2007-08-16 19:55                                     ` Chris Snook
  2007-08-16 20:20                                       ` Christoph Lameter
  2007-08-16 21:08                                       ` Luck, Tony
  2007-08-16 19:55                                     ` Chris Snook
  2 siblings, 2 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-16 19:55 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Herbert Xu, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Ilpo Järvinen wrote:
> On Thu, 16 Aug 2007, Herbert Xu wrote:
> 
>> We've been through that already.  If it's a busy-wait it
>> should use cpu_relax. 
> 
> I looked around a bit by using some command lines and ended up wondering 
> if these are equal to busy-wait case (and should be fixed) or not:
> 
> ./drivers/telephony/ixj.c
> 6674:   while (atomic_read(&j->DSPWrite) > 0)
> 6675-           atomic_dec(&j->DSPWrite);
> 
> ...besides that, there are couple of more similar cases in the same file 
> (with braces)...

atomic_dec() already has volatile behavior everywhere, so this is 
semantically okay, but this code (and any like it) should be calling 
cpu_relax() each iteration through the loop, unless there's a compelling 
reason not to.  I'll allow that for some hardware drivers (possibly this 
one) such a compelling reason may exist, but hardware-independent core 
subsystems probably have no excuse.

If the maintainer of this code doesn't see a compelling reason to add 
cpu_relax() in this loop, then it should be patched.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 14:48                                   ` Ilpo Järvinen
  2007-08-16 16:19                                     ` Stefan Richter
  2007-08-16 19:55                                     ` Chris Snook
@ 2007-08-16 19:55                                     ` Chris Snook
  2 siblings, 0 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-16 19:55 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Herbert Xu, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Ilpo Järvinen wrote:
> On Thu, 16 Aug 2007, Herbert Xu wrote:
> 
>> We've been through that already.  If it's a busy-wait it
>> should use cpu_relax. 
> 
> I looked around a bit by using some command lines and ended up wondering 
> if these are equal to busy-wait case (and should be fixed) or not:
> 
> ./drivers/telephony/ixj.c
> 6674:   while (atomic_read(&j->DSPWrite) > 0)
> 6675-           atomic_dec(&j->DSPWrite);
> 
> ...besides that, there are couple of more similar cases in the same file 
> (with braces)...

atomic_dec() already has volatile behavior everywhere, so this is 
semantically okay, but this code (and any like it) should be calling 
cpu_relax() each iteration through the loop, unless there's a compelling 
reason not to.  I'll allow that for some hardware drivers (possibly this 
one) such a compelling reason may exist, but hardware-independent core 
subsystems probably have no excuse.

If the maintainer of this code doesn't see a compelling reason not to 
add cpu_relax() in this loop, then it should be patched.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 18:54                             ` Christoph Lameter
@ 2007-08-16 20:07                               ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-16 20:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Mackerras, Satyam Sharma, Herbert Xu, Stefan Richter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 11:54:54AM -0700, Christoph Lameter wrote:
> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> > So I don't see any good reason to make the atomic API more complex by
> > having "volatile" and "non-volatile" versions of atomic_read.  It
> > should just have the "volatile" behaviour.
> 
> If you want to make it less complex then drop volatile which causes weird 
> side effects without solving any problems as you just pointed out.

The other set of problems is communication between process context
and interrupt/NMI handlers.  Volatile does help here.  And the performance
impact of volatile is pretty near zero, so why have the non-volatile
variant?
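
A sketch of that situation (invented driver state; everything runs on the
same CPU, so only compiler ordering matters, not CPU memory ordering):

	static atomic_t irq_seen = ATOMIC_INIT(0);

	static irqreturn_t my_isr(int irq, void *dev_id)
	{
		atomic_set(&irq_seen, 1);
		return IRQ_HANDLED;
	}

	/* process context, interrupts enabled: without volatile semantics,
	   the re-load of irq_seen relies entirely on the compiler barrier
	   hidden inside cpu_relax() */
	while (!atomic_read(&irq_seen))
		cpu_relax();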

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 19:55                                     ` Chris Snook
@ 2007-08-16 20:20                                       ` Christoph Lameter
  2007-08-17  1:02                                         ` Paul E. McKenney
                                                           ` (2 more replies)
  2007-08-16 21:08                                       ` Luck, Tony
  1 sibling, 3 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-16 20:20 UTC (permalink / raw)
  To: Chris Snook
  Cc: Ilpo Järvinen, Herbert Xu, Paul Mackerras, Satyam Sharma,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, 16 Aug 2007, Chris Snook wrote:

> atomic_dec() already has volatile behavior everywhere, so this is semantically
> okay, but this code (and any like it) should be calling cpu_relax() each
> iteration through the loop, unless there's a compelling reason not to.  I'll
> allow that for some hardware drivers (possibly this one) such a compelling
> reason may exist, but hardware-independent core subsystems probably have no
> excuse.

No, it does not have any volatile semantics. atomic_dec() can be reordered
at will by the compiler within the current basic block if you do not add a
barrier.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  5:20           ` Satyam Sharma
  2007-08-16  5:57             ` Satyam Sharma
@ 2007-08-16 20:50             ` Segher Boessenkool
  2007-08-16 22:40               ` David Schwartz
  2007-08-17  4:24               ` Satyam Sharma
  1 sibling, 2 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16 20:50 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang

> Note that "volatile" is a type-qualifier, not a type itself, so a cast
> of the _object_ itself to a qualified-type i.e. (volatile int) would
> not make the access itself volatile-qualified.

There is no such thing as "volatile-qualified access" defined
anywhere; there only is the concept of a "volatile-qualified
*object*".

> To serve our purposes, it is necessary for us to take the address of
> this (non-volatile) object, cast the resulting _pointer_ to the
> corresponding volatile-qualified pointer-type, and then dereference it.
> This makes that particular _access_ be volatile-qualified, without the
> object itself being such. Also note that the (dereferenced) result is
> also a valid lvalue and hence can be used in "*(volatile int *)&a = b;"
> kind of construction (which we use for the atomic_set case).

There is a quite convincing argument that such an access _is_ an
access to a volatile object; see GCC PR21568 comment #9.  This
probably isn't the last word on the matter though...


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  5:57             ` Satyam Sharma
  2007-08-16  9:25               ` Satyam Sharma
@ 2007-08-16 21:00               ` Segher Boessenkool
  2007-08-17  4:32                 ` Satyam Sharma
  1 sibling, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-16 21:00 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang

> Here, I should obviously admit that the semantics of *(volatile int *)&
> aren't any neater or well-defined in the _language standard_ at all. The
> standard does say (verbatim) "what constitutes an access to an object
> that has volatile-qualified type is implementation-defined", but GCC
> does help us out here by doing the right thing.

Where do you get that idea?  GCC manual, section 6.1, "When
is a Volatile Object Accessed?" doesn't say anything of the
kind.  PR33053 and some others.

> Honestly, given such confusion, and the propensity of the "volatile"
> type-qualifier keyword to be ill-defined (or at least poorly understood,
> often inconsistently implemented), I'd (again) express my opinion that
> it would be best to avoid its usage, given other alternatives do exist.

Yeah.  Or we can have an email thread like this every time
someone proposes a patch that uses an atomic variable ;-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 19:55                                     ` Chris Snook
  2007-08-16 20:20                                       ` Christoph Lameter
@ 2007-08-16 21:08                                       ` Luck, Tony
  1 sibling, 0 replies; 657+ messages in thread
From: Luck, Tony @ 2007-08-16 21:08 UTC (permalink / raw)
  To: Chris Snook, Ilpo Järvinen
  Cc: Herbert Xu, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

>> 6674:   while (atomic_read(&j->DSPWrite) > 0)
>> 6675-           atomic_dec(&j->DSPWrite);
>
> If the maintainer of this code doesn't see a compelling reason to add 
> cpu_relax() in this loop, then it should be patched.

Shouldn't it be just re-written without the loop:

	if ((tmp = atomic_read(&j->DSPWrite)) > 0)
		atomic_sub(tmp, &j->DSPWrite);

Has all the same bugs, but runs much faster :-)

-Tony

^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 20:50             ` Segher Boessenkool
@ 2007-08-16 22:40               ` David Schwartz
  2007-08-17  4:36                 ` Satyam Sharma
  2007-08-17  4:24               ` Satyam Sharma
  1 sibling, 1 reply; 657+ messages in thread
From: David Schwartz @ 2007-08-16 22:40 UTC (permalink / raw)
  To: Linux-Kernel@Vger. Kernel. Org


> There is a quite convincing argument that such an access _is_ an
> access to a volatile object; see GCC PR21568 comment #9.  This
> probably isn't the last word on the matter though...

I find this argument completely convincing and retract the contrary argument
that I've made many times in this forum and others. You learn something new
every day.

Just in case it wasn't clear:
int i;
*(volatile int *)&i=2;

In this case, there *is* an access to a volatile object. This is the end
result of the standard's definition of what it means to apply the
'volatile int *' cast to '&i' and then apply the '*' operator to the result
and use it as an lvalue.

C does not fix the volatility of an access by how the object is defined, but
by how it is accessed!

DS



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 16:34                                             ` Paul E. McKenney
@ 2007-08-16 23:59                                               ` Herbert Xu
  2007-08-17  1:01                                                 ` Paul E. McKenney
  2007-08-17  3:15                                               ` Nick Piggin
  1 sibling, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-16 23:59 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Stefan Richter, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote:
>
> The compiler can also reorder non-volatile accesses.  For an example
> patch that cares about this, please see:
> 
> 	http://lkml.org/lkml/2007/8/7/280
> 
> This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
> rcu_read_unlock() to ensure that accesses aren't reordered with respect
> to interrupt handlers and NMIs/SMIs running on that same CPU.

Good, finally we have some code to discuss (even though it's
not actually in the kernel yet).

First of all, I think this illustrates that what you want
here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
macro occurs a lot more times in your patch than atomic
reads/sets.  So *assuming* that it was necessary at all,
then having an ordered variant of the atomic_read/atomic_set
ops could do just as well.

However, I still don't know which atomic_read/atomic_set in
your patch would be broken if there were no volatile.  Could
you please point them out?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 19:48                                       ` Chris Snook
@ 2007-08-17  0:02                                         ` Herbert Xu
  2007-08-17  2:04                                           ` Chris Snook
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-17  0:02 UTC (permalink / raw)
  To: Chris Snook
  Cc: Stefan Richter, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote:
>
> >Can you find an actual atomic_read code snippet there that is
> >broken without the volatile modifier?
> 
> A whole bunch of atomic_read uses will be broken without the volatile 
> modifier once we start removing barriers that aren't needed if volatile 
> behavior is guaranteed.

Could you please cite the file/function names so we can
see whether removing the barrier makes sense?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 23:59                                               ` Herbert Xu
@ 2007-08-17  1:01                                                 ` Paul E. McKenney
  2007-08-17  7:39                                                   ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17  1:01 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stefan Richter, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote:
> >
> > The compiler can also reorder non-volatile accesses.  For an example
> > patch that cares about this, please see:
> > 
> > 	http://lkml.org/lkml/2007/8/7/280
> > 
> > This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
> > rcu_read_unlock() to ensure that accesses aren't reordered with respect
> > to interrupt handlers and NMIs/SMIs running on that same CPU.
> 
> Good, finally we have some code to discuss (even though it's
> not actually in the kernel yet).

There was some earlier in this thread as well.

> First of all, I think this illustrates that what you want
> here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
> macro occurs a lot more times in your patch than atomic
> reads/sets.  So *assuming* that it was necessary at all,
> then having an ordered variant of the atomic_read/atomic_set
> ops could do just as well.

Indeed.  If I could trust atomic_read()/atomic_set() to cause the compiler
to maintain ordering, then I could just use them instead of having to
create an  ORDERED_WRT_IRQ().  (Or ACCESS_ONCE(), as it is called in a
different patch.)
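
For reference, such a macro boils down to a one-line volatile cast,
something like:

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

which forces the compiler to emit exactly one load or store per use,
while placing no ordering constraints on the CPU itself.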

> However, I still don't know which atomic_read/atomic_set in
> your patch would be broken if there were no volatile.  Could
> you please point them out?

Suppose I tried replacing the ORDERED_WRT_IRQ() calls with
atomic_read() and atomic_set().  Starting with __rcu_read_lock():

o	If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++"
	was ordered by the compiler after
	"ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then
	suppose an NMI/SMI happened after the rcu_read_lock_nesting but
	before the rcu_flipctr.

	Then if there was an rcu_read_lock() in the SMI/NMI
	handler (which is perfectly legal), the nested rcu_read_lock()
	would believe that it could take the then-clause of the
	enclosing "if" statement.  But because the rcu_flipctr per-CPU
	variable had not yet been incremented, an RCU updater would
	be within its rights to assume that there were no RCU reads
	in progress, thus possibly yanking a data structure out from
	under the reader in the SMI/NMI function.

	Fatal outcome.  Note that only one CPU is involved here
	because these are all either per-CPU or per-task variables.

o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1"
	was ordered by the compiler to follow the
	"ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx", and an NMI/SMI
	happened between the two, then an __rcu_read_lock() in the NMI/SMI
	would incorrectly take the "else" clause of the enclosing "if"
	statement.  If some other CPU flipped the rcu_ctrlblk.completed
	in the meantime, then the __rcu_read_lock() would (correctly)
	write the new value into rcu_flipctr_idx.

	Well and good so far.  But the problem arises in
	__rcu_read_unlock(), which then decrements the wrong counter.
	Depending on exactly how subsequent events played out, this could
	result in either prematurely ending grace periods or never-ending
	grace periods, both of which are fatal outcomes.

And the following are not needed in the current version of the
patch, but will be in a future version that either avoids disabling
irqs or that dispenses with the smp_read_barrier_depends() that I
have 99% convinced myself is unneeded:

o	nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting);

o	idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) & 0x1;

Furthermore, in that future version, irq handlers can cause the same
mischief that SMI/NMI handlers can in this version.

Next, looking at __rcu_read_unlock():

o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting - 1"
	was reordered by the compiler to follow the
	"ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])--",
	then if an NMI/SMI containing an rcu_read_lock() occurs between
	the two, this nested rcu_read_lock() would incorrectly believe
	that it was protected by an enclosing RCU read-side critical
	section as described in the first reversal discussed for
	__rcu_read_lock() above.  Again, fatal outcome.

This is what we have now.  It is not hard to imagine situations that
interact with -both- interrupt handlers -and- other CPUs, as described
earlier.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 20:20                                       ` Christoph Lameter
@ 2007-08-17  1:02                                         ` Paul E. McKenney
  2007-08-17  1:28                                           ` Herbert Xu
  2007-08-17  2:16                                         ` Paul Mackerras
  2007-08-17 17:41                                         ` Segher Boessenkool
  2 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17  1:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Chris Snook, Ilpo Järvinen, Herbert Xu, Paul Mackerras,
	Satyam Sharma, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thu, Aug 16, 2007 at 01:20:26PM -0700, Christoph Lameter wrote:
> On Thu, 16 Aug 2007, Chris Snook wrote:
> 
> > atomic_dec() already has volatile behavior everywhere, so this is semantically
> > okay, but this code (and any like it) should be calling cpu_relax() each
> > iteration through the loop, unless there's a compelling reason not to.  I'll
> > allow that for some hardware drivers (possibly this one) such a compelling
> > reason may exist, but hardware-independent core subsystems probably have no
> > excuse.
> 
> No it does not have any volatile semantics. atomic_dec() can be reordered 
> at will by the compiler within the current basic unit if you do not add a 
> barrier.

Yep.  Or you can use atomic_dec_return() instead of using a barrier.
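
Concretely, instead of something like (names made up):

	atomic_dec(&pending);
	smp_mb();	/* explicit barrier to keep surrounding accesses in place */

one can write:

	(void)atomic_dec_return(&pending);	/* value-returning atomic ops imply a full barrier */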

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  1:02                                         ` Paul E. McKenney
@ 2007-08-17  1:28                                           ` Herbert Xu
  2007-08-17  5:07                                             ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-17  1:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, Chris Snook, Ilpo Järvinen,
	Paul Mackerras, Satyam Sharma, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

On Thu, Aug 16, 2007 at 06:02:32PM -0700, Paul E. McKenney wrote:
> 
> Yep.  Or you can use atomic_dec_return() instead of using a barrier.

Or you could use smp_mb__{before,after}_atomic_dec.
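
That is, roughly:

	smp_mb__before_atomic_dec();
	atomic_dec(&pending);		/* made-up variable */
	smp_mb__after_atomic_dec();

which orders accesses around the atomic_dec() itself, and costs nothing
extra on architectures whose atomic_dec() already implies a barrier.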

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  0:02                                         ` Herbert Xu
@ 2007-08-17  2:04                                           ` Chris Snook
  2007-08-17  2:13                                             ` Herbert Xu
  2007-08-17  2:31                                             ` Nick Piggin
  0 siblings, 2 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-17  2:04 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stefan Richter, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote:
>>> Can you find an actual atomic_read code snippet there that is
>>> broken without the volatile modifier?
>> A whole bunch of atomic_read uses will be broken without the volatile 
>> modifier once we start removing barriers that aren't needed if volatile 
>> behavior is guaranteed.
> 
> Could you please cite the file/function names so we can
> see whether removing the barrier makes sense?
> 
> Thanks,

At a glance, several architectures' implementations of smp_call_function() have 
one or more legitimate atomic_read() busy-waits that shouldn't be using 
cpu_relax().  Some of them do work in the loop.
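
The general shape is a wait loop that does real work each iteration
(this is illustrative only, not a quote of any particular architecture's
code):

	while (atomic_read(&data.started) != cpus)
		process_deferred_work();	/* made-up helper for the per-arch work */

so the question is whether the atomic_read() is re-issued each time
around, not whether a cpu_relax() is present.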

I'm sure there are plenty more examples that various maintainers could find in 
their own code.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  2:04                                           ` Chris Snook
@ 2007-08-17  2:13                                             ` Herbert Xu
  2007-08-17  2:31                                             ` Nick Piggin
  1 sibling, 0 replies; 657+ messages in thread
From: Herbert Xu @ 2007-08-17  2:13 UTC (permalink / raw)
  To: Chris Snook
  Cc: Stefan Richter, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Thu, Aug 16, 2007 at 10:04:24PM -0400, Chris Snook wrote:
>
> >Could you please cite the file/function names so we can
> >see whether removing the barrier makes sense?
> 
> At a glance, several architectures' implementations of smp_call_function() 
> have one or more legitimate atomic_read() busy-waits that shouldn't be 
> using cpu_relax().  Some of them do work in the loop.

Care to name one so we can discuss it?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 20:20                                       ` Christoph Lameter
  2007-08-17  1:02                                         ` Paul E. McKenney
@ 2007-08-17  2:16                                         ` Paul Mackerras
  2007-08-17  3:03                                           ` Linus Torvalds
  2007-08-17 17:41                                         ` Segher Boessenkool
  2 siblings, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-17  2:16 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Chris Snook, Ilpo Järvinen, Herbert Xu, Satyam Sharma,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Christoph Lameter writes:

> No it does not have any volatile semantics. atomic_dec() can be reordered 
> at will by the compiler within the current basic unit if you do not add a 
> barrier.

Volatile doesn't mean it can't be reordered; volatile means the
accesses can't be eliminated.
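
A two-line illustration (variable names made up):

	x = *(volatile int *)&v;
	y = *(volatile int *)&v;	/* this second load must still be emitted */

Without the volatile the compiler could fold the two loads into one;
with it, both loads stay, but they can still be reordered relative to
ordinary non-volatile accesses around them.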

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 19:32                 ` Segher Boessenkool
@ 2007-08-17  2:19                   ` Nick Piggin
  2007-08-17  3:16                     ` Paul Mackerras
  2007-08-17 17:37                     ` Segher Boessenkool
  0 siblings, 2 replies; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  2:19 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

Segher Boessenkool wrote:
>>>>> Part of the motivation here is to fix heisenbugs.  If I knew where 
>>>>> they
>>>>
>>>>
>>>>
>>>> By the same token we should probably disable optimisations
>>>> altogether since that too can create heisenbugs.
>>>
>>> Almost everything is a tradeoff; and so is this.  I don't
>>> believe most people would find disabling all compiler
>>> optimisations an acceptable price to pay for some peace
>>> of mind.
>>
>>
>> So why is this a good tradeoff?
> 
> 
> It certainly is better than disabling all compiler optimisations!

It's easy to be better than something really stupid :)

So i386 and x86-64 don't have volatiles there, and it saves them a
few K of kernel text. What you need to justify is why it is a good
tradeoff to make them volatile (which btw, is much harder to go
the other way after we let people make those assumptions).


>> I also think that just adding things to APIs in the hope it might fix
>> up some bugs isn't really a good road to go down. Where do you stop?
> 
> 
> I look at it the other way: keeping the "volatile" semantics in
> atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;

Yeah, but we could add lots of things to help prevent bugs that
would never be included. I would also contend that it helps _hide_
bugs and encourages people to be lazy when thinking about these
things.

Also, you dismiss the fact that we'd actually be *adding* volatile
semantics back to the 2 most widely tested architectures (in terms
of test time, number of testers, variety of configurations, and
coverage of driver code). This is a very important difference from
just keeping volatile semantics because it is basically a one-way
API change.


> certainly most people expect that behaviour, and also that behaviour
> is *needed* in some places and no other interface provides that
> functionality.

I don't know that most people would expect that behaviour. Is there
any documentation anywhere that would suggest this?

Also, barrier() most definitely provides the required functionality.
It is overkill in some situations, but volatile is overkill in _most_
situations. If that's what you're worried about, we should add a new
ordering primitive.


> [some confusion about barriers wrt atomics snipped]

What were you confused about?

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  2:04                                           ` Chris Snook
  2007-08-17  2:13                                             ` Herbert Xu
@ 2007-08-17  2:31                                             ` Nick Piggin
  1 sibling, 0 replies; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  2:31 UTC (permalink / raw)
  To: Chris Snook
  Cc: Herbert Xu, Stefan Richter, Paul Mackerras, Satyam Sharma,
	Christoph Lameter, Paul E. McKenney, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Chris Snook wrote:
> Herbert Xu wrote:
> 
>> On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote:
>>
>>>> Can you find an actual atomic_read code snippet there that is
>>>> broken without the volatile modifier?
>>>
>>> A whole bunch of atomic_read uses will be broken without the volatile 
>>> modifier once we start removing barriers that aren't needed if 
>>> volatile behavior is guaranteed.
>>
>>
>> Could you please cite the file/function names so we can
>> see whether removing the barrier makes sense?
>>
>> Thanks,
> 
> 
> At a glance, several architectures' implementations of 
> smp_call_function() have one or more legitimate atomic_read() busy-waits 
> that shouldn't be using cpu_relax().  Some of them do work in the loop.

sh looks like the only one there that would be broken (and that's only
because they don't have a cpu_relax there, but it should be added anyway).
Sure, if we removed volatile from other architectures, it would be wise
to audit arch code because arch maintainers do sometimes make assumptions
about their implementation details... however we can assume most generic
code is safe without volatile.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  2:16                                         ` Paul Mackerras
@ 2007-08-17  3:03                                           ` Linus Torvalds
  2007-08-17  3:43                                             ` Paul Mackerras
  2007-08-17 22:09                                             ` Segher Boessenkool
  0 siblings, 2 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-08-17  3:03 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Christoph Lameter, Chris Snook, Ilpo Järvinen, Herbert Xu,
	Satyam Sharma, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Paul Mackerras wrote:
>
> Volatile doesn't mean it can't be reordered; volatile means the
> accesses can't be eliminated.

It also does limit re-ordering. 

Of course, since *normal* accesses aren't necessarily limited wrt 
re-ordering, the question then becomes one of "with regard to *what* does 
it limit re-ordering?".

A C compiler that re-orders two different volatile accesses that have a 
sequence point in between them is pretty clearly a buggy compiler. So at a 
minimum, it limits re-ordering wrt other volatiles (assuming sequence 
points exists). It also means that the compiler cannot move it 
speculatively across conditionals, but other than that it's starting to 
get fuzzy.
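
For example, given

	extern volatile int a, b;

	void f(void)
	{
		a = 1;
		b = 2;
	}

the store to "a" must be emitted before the store to "b"; with plain
ints the compiler would be free to emit them in either order.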

In general, I'd *much* rather we used barriers. Anything that "depends" on 
volatile is pretty much set up to be buggy. But I'm certainly also willing 
to have that volatile inside "atomic_read/atomic_set()" if it avoids code 
that would otherwise break - ie if it hides a bug.

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 16:34                                             ` Paul E. McKenney
  2007-08-16 23:59                                               ` Herbert Xu
@ 2007-08-17  3:15                                               ` Nick Piggin
  2007-08-17  4:02                                                 ` Paul Mackerras
                                                                   ` (2 more replies)
  1 sibling, 3 replies; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  3:15 UTC (permalink / raw)
  To: paulmck
  Cc: Herbert Xu, Stefan Richter, Paul Mackerras, Satyam Sharma,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Paul E. McKenney wrote:
> On Thu, Aug 16, 2007 at 06:42:50PM +0800, Herbert Xu wrote:

>>In fact, volatile doesn't guarantee that the memory gets
>>read anyway.  You might be reading some stale value out
>>of the cache.  Granted this doesn't happen on x86 but
>>when you're coding for the kernel you can't make such
>>assumptions.
>>
>>So the point here is that if you don't mind getting a stale
>>value from the CPU cache when doing an atomic_read, then
>>surely you won't mind getting a stale value from the compiler
>>"cache".
> 
> 
> Absolutely disagree.  An interrupt/NMI/SMI handler running on the CPU
> will see the same value (whether in cache or in store buffer) that
> the mainline code will see.  In this case, we don't care about CPU
> misordering, only about compiler misordering.  It is easy to see
> other uses that combine communication with handlers on the current
> CPU with communication among CPUs -- again, see prior messages in
> this thread.

I still don't agree with the underlying assumption that everybody
(or lots of kernel code) treats atomic accesses as volatile.

Nobody that does has managed to explain my logic problem either:
loads and stores to long and ptr have always been considered to be
atomic, and test_bit is atomic; so why should there be another special
subclass of atomic loads and stores? (and yes, it is perfectly legitimate to
want a non-volatile read for a data type that you also want to do
atomic RMW operations on)

Why are people making these undocumented and just plain false
assumptions about atomic_t? If they're using lockless code (ie.
which they must be if using atomics), then they actually need to be
thinking much harder about memory ordering issues. If that is too
much for them, then they can just use locks.


>>>So, the architecture guys can implement atomic_read however they want
>>>--- as long as it cannot be optimized away.*
>>
>>They can implement it however they want as long as it stays
>>atomic.
> 
> 
> Precisely.  And volatility is a key property of "atomic".  Let's please
> not throw it away.

It isn't, though (at least not since i386 and x86-64 don't have it).
_Adding_ it is trivial, and can be done any time. Throwing it away
(ie. making the API weaker) is _hard_. So let's not add it without
really good reasons. It most definitely results in worse code
generation in practice.

I don't know why people would assume volatile of atomics. AFAIK, most
of the documentation is pretty clear that all the atomic stuff can be
reordered etc. except for those that modify and return a value.

It isn't even intuitive: `*lp = value` is like the most fundamental
atomic operation in Linux.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  2:19                   ` Nick Piggin
@ 2007-08-17  3:16                     ` Paul Mackerras
  2007-08-17  3:32                       ` Nick Piggin
  2007-08-17  3:42                       ` Linus Torvalds
  2007-08-17 17:37                     ` Segher Boessenkool
  1 sibling, 2 replies; 657+ messages in thread
From: Paul Mackerras @ 2007-08-17  3:16 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Segher Boessenkool, heiko.carstens, horms, linux-kernel, rpjday,
	ak, netdev, cfriesen, akpm, torvalds, jesper.juhl, linux-arch,
	zlynx, satyam, clameter, schwidefsky, Chris Snook, Herbert Xu,
	davem, wensong, wjiang

Nick Piggin writes:

> So i386 and x86-64 don't have volatiles there, and it saves them a
> few K of kernel text. What you need to justify is why it is a good

I'm really surprised it's as much as a few K.  I tried it on powerpc
and it only saved 40 bytes (10 instructions) for a G5 config.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:16                     ` Paul Mackerras
@ 2007-08-17  3:32                       ` Nick Piggin
  2007-08-17  3:50                         ` Linus Torvalds
  2007-08-17  3:42                       ` Linus Torvalds
  1 sibling, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  3:32 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Segher Boessenkool, heiko.carstens, horms, linux-kernel, rpjday,
	ak, netdev, cfriesen, akpm, torvalds, jesper.juhl, linux-arch,
	zlynx, satyam, clameter, schwidefsky, Chris Snook, Herbert Xu,
	davem, wensong, wjiang

Paul Mackerras wrote:
> Nick Piggin writes:
> 
> 
>>So i386 and x86-64 don't have volatiles there, and it saves them a
>>few K of kernel text. What you need to justify is why it is a good
> 
> 
> I'm really surprised it's as much as a few K.  I tried it on powerpc
> and it only saved 40 bytes (10 instructions) for a G5 config.
> 
> Paul.
> 

I'm surprised too. Numbers were from the "...use asm() like the other
atomic operations already do" thread. According to them,

   text    data     bss     dec     hex filename
3434150  249176  176128 3859454  3ae3fe atomic_normal/vmlinux
3436203  249176  176128 3861507  3aec03 atomic_volatile/vmlinux

The first one is a stock kernel, the second is with atomic_read/set
cast to volatile. gcc-4.1 -- maybe if you have an earlier gcc it
won't optimise as much?

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:16                     ` Paul Mackerras
  2007-08-17  3:32                       ` Nick Piggin
@ 2007-08-17  3:42                       ` Linus Torvalds
  2007-08-17  5:18                         ` Paul E. McKenney
                                           ` (4 more replies)
  1 sibling, 5 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-08-17  3:42 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Nick Piggin, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, satyam, clameter, schwidefsky, Chris Snook,
	Herbert Xu, davem, wensong, wjiang



On Fri, 17 Aug 2007, Paul Mackerras wrote:
> 
> I'm really surprised it's as much as a few K.  I tried it on powerpc
> and it only saved 40 bytes (10 instructions) for a G5 config.

One of the things that "volatile" generally screws up is a simple

	volatile int i;

	i++;

which a compiler will generally get horribly, horribly wrong.

In a reasonable world, gcc should just make that be (on x86)

	addl $1,i(%rip)

on x86-64, which is indeed what it does without the volatile. But with the 
volatile, the compiler gets really nervous, and doesn't dare do it in one 
instruction, and thus generates crap like

        movl    i(%rip), %eax
        addl    $1, %eax
        movl    %eax, i(%rip)

instead. For no good reason, except that "volatile" just doesn't have any 
good/clear semantics for the compiler, so most compilers will just make it 
be "I will not touch this access in any way, shape, or form". Including 
even trivially correct instruction optimization/combination.

This is one of the reasons why we should never use "volatile". It 
pessimises code generation for no good reason - just because compilers 
don't know what the heck it even means! 

Now, people don't do "i++" on atomics (you'd use "atomic_inc()" for that), 
but people *do* do things like

	if (atomic_read(..) <= 1)
		..

On ppc, things like that probably don't much matter. But on x86, it makes 
a *huge* difference whether you do

	movl i(%rip),%eax
	cmpl $1,%eax

or if you can just use the value directly for the operation, like this:

	cmpl $1,i(%rip)

which is again a totally obvious and totally safe optimization, but is 
(again) something that gcc doesn't dare do, since "i" is volatile.

In other words: "volatile" is a horribly horribly bad way of doing things, 
because it generates *worse*code*, for no good reason. You just don't see 
it on powerpc, because it's already a load-store architecture, so there is 
no "good code" for doing direct-to-memory operations.

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:03                                           ` Linus Torvalds
@ 2007-08-17  3:43                                             ` Paul Mackerras
  2007-08-17  3:53                                               ` Herbert Xu
  2007-08-17 22:09                                             ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-17  3:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Chris Snook, Ilpo Järvinen, Herbert Xu,
	Satyam Sharma, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Linus Torvalds writes:

> In general, I'd *much* rather we used barriers. Anything that "depends" on 
> volatile is pretty much set up to be buggy. But I'm certainly also willing 
> to have that volatile inside "atomic_read/atomic_set()" if it avoids code 
> that would otherwise break - ie if it hides a bug.

The cost of doing so seems to me to be well down in the noise - 44
bytes of extra kernel text on a ppc64 G5 config, and I don't believe
the extra few cycles for the occasional extra load would be measurable
(they should all hit in the L1 dcache).  I don't mind if x86[-64] have
atomic_read/set be nonvolatile and find all the missing barriers, but
for now on powerpc, I think that not having to find those missing
barriers is worth the 0.00076% increase in kernel text size.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:32                       ` Nick Piggin
@ 2007-08-17  3:50                         ` Linus Torvalds
  2007-08-17 23:59                           ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Linus Torvalds @ 2007-08-17  3:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Paul Mackerras, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, satyam, clameter, schwidefsky, Chris Snook,
	Herbert Xu, davem, wensong, wjiang



On Fri, 17 Aug 2007, Nick Piggin wrote:
> 
> I'm surprised too. Numbers were from the "...use asm() like the other
> atomic operations already do" thread. According to them,
> 
>   text    data     bss     dec     hex filename
> 3434150  249176  176128 3859454  3ae3fe atomic_normal/vmlinux
> 3436203  249176  176128 3861507  3aec03 atomic_volatile/vmlinux
> 
> The first one is a stock kernel, the second is with atomic_read/set
> cast to volatile. gcc-4.1 -- maybe if you have an earlier gcc it
> won't optimise as much?

No, see my earlier reply. "volatile" really *is* an incredible piece of 
crap.

Just try it yourself:

	volatile int i;
	int j;

	int testme(void)
	{
	        return i <= 1;
	}

	int testme2(void)
	{
	        return j <= 1;
	}

and compile with all the optimizations you can.

I get:

	testme:
	        movl    i(%rip), %eax
	        subl    $1, %eax
	        setle   %al
	        movzbl  %al, %eax
	        ret

vs

	testme2:
	        xorl    %eax, %eax
	        cmpl    $1, j(%rip)
	        setle   %al
	        ret

(now, whether that "xorl + setle" is better than "setle + movzbl", I don't 
really know - maybe it is. But that's not the point. The point is the 
difference between

                movl    i(%rip), %eax
                subl    $1, %eax

and

                cmpl    $1, j(%rip)

and imagine this being done for *every* single volatile access.

Just do a 

	git grep atomic_read

to see how atomics are actually used. A lot of them are exactly the above 
kind of "compare against a value".

			Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:43                                             ` Paul Mackerras
@ 2007-08-17  3:53                                               ` Herbert Xu
  2007-08-17  6:26                                                 ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-17  3:53 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Linus Torvalds, Christoph Lameter, Chris Snook, Ilpo Järvinen,
	Satyam Sharma, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Fri, Aug 17, 2007 at 01:43:27PM +1000, Paul Mackerras wrote:
>
> The cost of doing so seems to me to be well down in the noise - 44
> bytes of extra kernel text on a ppc64 G5 config, and I don't believe
> the extra few cycles for the occasional extra load would be measurable
> (they should all hit in the L1 dcache).  I don't mind if x86[-64] have
> atomic_read/set be nonvolatile and find all the missing barriers, but
> for now on powerpc, I think that not having to find those missing
> barriers is worth the 0.00076% increase in kernel text size.

BTW, the sort of missing barriers that triggered this thread
aren't that subtle.  It'll result in a simple lock-up if the
loop condition holds upon entry.  At which point it's fairly
straightforward to find the culprit.
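
That is, something like

	while (atomic_read(&v))
		;	/* no barrier(), no volatile read */

can be compiled into a single load followed by a branch-to-self, and
the machine visibly hangs right there.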

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:15                                               ` Nick Piggin
@ 2007-08-17  4:02                                                 ` Paul Mackerras
  2007-08-17  4:39                                                   ` Nick Piggin
  2007-08-17  7:25                                                 ` Stefan Richter
  2007-08-17 22:14                                                 ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Segher Boessenkool
  2 siblings, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-17  4:02 UTC (permalink / raw)
  To: Nick Piggin
  Cc: paulmck, Herbert Xu, Stefan Richter, Satyam Sharma,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Nick Piggin writes:

> Why are people making these undocumented and just plain false
> assumptions about atomic_t?

Well, it has only been false since December 2006.  Prior to that
atomics *were* volatile on all platforms.

> If they're using lockless code (ie.
> which they must be if using atomics), then they actually need to be
> thinking much harder about memory ordering issues.

Indeed.  I believe that most uses of atomic_read other than in polling
loops or debug printk statements are actually racy.  In some cases the
race doesn't seem to matter, but I'm sure there are cases where it
does.

> If that is too
> much for them, then they can just use locks.

Why use locks when you can just sprinkle magic fix-the-races dust (aka
atomic_t) over your code? :) :)

> > Precisely.  And volatility is a key property of "atomic".  Let's please
> > not throw it away.
> 
> It isn't, though (at least not since i386 and x86-64 don't have it).

Conceptually it is, because atomic_t is specifically for variables
which are liable to be modified by other CPUs, and volatile _means_
"liable to be changed by mechanisms outside the knowledge of the
compiler".

> _Adding_ it is trivial, and can be done any time. Throwing it away
> (ie. making the API weaker) is _hard_. So let's not add it without

Well, in one sense it's not that hard - Linus did it just 8 months ago
in commit f9e9dcb3. :)

> really good reasons. It most definitely results in worse code
> generation in practice.

0.0008% increase in kernel text size on powerpc according to my
measurement. :)

> I don't know why people would assume volatile of atomics. AFAIK, most

By making something an atomic_t you're saying "other CPUs are going to
be modifying this, so treat it specially".  It's reasonable to assume
that special treatment extends to reading and setting it.

> of the documentation is pretty clear that all the atomic stuff can be
> reordered etc. except for those that modify and return a value.

Volatility isn't primarily about reordering (though as Linus says it
does restrict reordering to some extent).

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 20:50             ` Segher Boessenkool
  2007-08-16 22:40               ` David Schwartz
@ 2007-08-17  4:24               ` Satyam Sharma
  2007-08-17 22:34                 ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  4:24 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang, davids



On Thu, 16 Aug 2007, Segher Boessenkool wrote:

> > Note that "volatile"
> > is a type-qualifier, not a type itself, so a cast of the _object_ itself
> > to a qualified-type i.e. (volatile int) would not make the access itself
> > volatile-qualified.
> 
> There is no such thing as "volatile-qualified access" defined
> anywhere; there only is the concept of a "volatile-qualified
> *object*".

Sure, "volatile-qualified access" was not some standard term I used
there. Just something to mean "an access that would make the compiler
treat the object at that memory as if it were an object with a
volatile-qualified type".

Now the second wording *IS* technically correct, but come on, it's
24 words long whereas the original one was 3 -- and hopefully anybody
reading the shorter phrase *would* have known anyway what was meant,
without having to be pedantic about it :-)


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 21:00               ` Segher Boessenkool
@ 2007-08-17  4:32                 ` Satyam Sharma
  2007-08-17 22:38                   ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  4:32 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang



On Thu, 16 Aug 2007, Segher Boessenkool wrote:

> > Here, I should obviously admit that the semantics of *(volatile int *)&
> > aren't any neater or well-defined in the _language standard_ at all. The
> > standard does say (verbatim) "precisely what constitutes as access to
> > object of volatile-qualified type is implementation-defined", but GCC
> > does help us out here by doing the right thing.
> 
> Where do you get that idea?

Try a testcase (experimentally verify).
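
Something as small as this (function name made up) shows it:

	int i;

	int read_i(void)
	{
		return *(volatile int *)&i;
	}

gcc emits the load even at -O2, just as it would if "i" itself had been
declared volatile.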

> GCC manual, section 6.1, "When
> is a Volatile Object Accessed?" doesn't say anything of the
> kind.

True, "implementation-defined" as per the C standard _is_ supposed to mean
"unspecified behaviour where each implementation documents how the choice
is made". So ok, probably GCC isn't "documenting" this
implementation-defined behaviour which it is supposed to, but can't really
fault them much for this, probably.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 22:40               ` David Schwartz
@ 2007-08-17  4:36                 ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  4:36 UTC (permalink / raw)
  To: David Schwartz; +Cc: Linux-Kernel@Vger. Kernel. Org

[ Your mailer drops Cc: lists, munges headers,
  does all sorts of badness. Please fix that. ]


On Thu, 16 Aug 2007, David Schwartz wrote:

> 
> > There is a quite convincing argument that such an access _is_ an
> > access to a volatile object; see GCC PR21568 comment #9.  This
> > probably isn't the last word on the matter though...
> 
> I find this argument completely convincing and retract the contrary argument
> that I've made many times in this forum and others. You learn something new
> every day.
> 
> Just in case it wasn't clear:
> int i;
> *(volatile int *)&i=2;
> 
> In this case, there *is* an access to a volatile object. This is the end
> result of the standard's definition of what it means to apply the
> 'volatile int *' cast to '&i' and then apply the '*' operator to the result
> and use it as an lvalue.

True, see my last mail in this sub-thread that explains precisely this :-)


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  4:02                                                 ` Paul Mackerras
@ 2007-08-17  4:39                                                   ` Nick Piggin
  0 siblings, 0 replies; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  4:39 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: paulmck, Herbert Xu, Stefan Richter, Satyam Sharma,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Paul Mackerras wrote:
> Nick Piggin writes:
> 
> 
>>Why are people making these undocumented and just plain false
>>assumptions about atomic_t?
> 
> 
> Well, it has only been false since December 2006.  Prior to that
> atomics *were* volatile on all platforms.

Hmm, although I don't think it has ever been guaranteed by the
API documentation (I concede documentation is often not treated
as the authoritative source here, but for atomics it is actually
very good and obviously indispensable as the memory-ordering
reference).


>>If they're using lockless code (ie.
>>which they must be if using atomics), then they actually need to be
>>thinking much harder about memory ordering issues.
> 
> 
> Indeed.  I believe that most uses of atomic_read other than in polling
> loops or debug printk statements are actually racy.  In some cases the
> race doesn't seem to matter, but I'm sure there are cases where it
> does.
> 
> 
>>If that is too
>>much for them, then they can just use locks.
> 
> 
> Why use locks when you can just sprinkle magic fix-the-races dust (aka
> atomic_t) over your code? :) :)

I agree with your skepticism of a lot of lockless code. But I think
a lot of the more subtle race problems will not be fixed with volatile.
The big, dumb infinite loop bugs would be fixed, but they're pretty
trivial to debug and even audit for.


>>>Precisely.  And volatility is a key property of "atomic".  Let's please
>>>not throw it away.
>>
>>It isn't, though (at least not since i386 and x86-64 don't have it).
> 
> 
> Conceptually it is, because atomic_t is specifically for variables
> which are liable to be modified by other CPUs, and volatile _means_
> "liable to be changed by mechanisms outside the knowledge of the
> compiler".

Usually that is the case, yes. But also most of the time we don't
care that it has been changed and don't mind it being reordered or
eliminated.

One of the only places we really care about that at all is for
variables that are modified by the *same* CPU.


>>_Adding_ it is trivial, and can be done any time. Throwing it away
>>(ie. making the API weaker) is _hard_. So let's not add it without
> 
> 
> Well, in one sense it's not that hard - Linus did it just 8 months ago
> in commit f9e9dcb3. :)

Well it would have been harder if the documentation also guaranteed
that atomic_read/atomic_set was ordered. Or it would have been harder
for _me_ to make such a change, anyway ;)


>>really good reasons. It most definitely results in worse code
>>generation in practice.
> 
> 
> 0.0008% increase in kernel text size on powerpc according to my
> measurement. :)

I don't think you're making a bad choice by keeping it volatile on
powerpc and waiting for others to shake out more of the bugs. You
get to fix everybody else's memory ordering bugs :)


>>I don't know why people would assume volatile of atomics. AFAIK, most
> 
> 
> By making something an atomic_t you're saying "other CPUs are going to
> be modifying this, so treat it specially".  It's reasonable to assume
> that special treatment extends to reading and setting it.

But I don't actually know what that "special treatment" is. Well
actually, I do know that operations will never result in a partial
modification being exposed. I also know that the operators that
do not modify and return are not guaranteed to have any sort of
ordering constraints.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 10:42                                           ` Herbert Xu
  2007-08-16 16:34                                             ` Paul E. McKenney
@ 2007-08-17  5:04                                             ` Paul Mackerras
  1 sibling, 0 replies; 657+ messages in thread
From: Paul Mackerras @ 2007-08-17  5:04 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stefan Richter, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> So the point here is that if you don't mind getting a stale
> value from the CPU cache when doing an atomic_read, then
> surely you won't mind getting a stale value from the compiler
> "cache".

No, that particular argument is bogus, because there is a cache
coherency protocol operating to keep the CPU cache coherent with
stores from other CPUs, but there isn't any such protocol (nor should
there be) for a register used as a "cache".

(Linux requires SMP systems to keep any CPU caches coherent as far as
accesses by other CPUs are concerned.  It doesn't support any SMP
systems that are not cache-coherent as far as CPU accesses are
concerned.  It does support systems with non-cache-coherent DMA.)

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  1:28                                           ` Herbert Xu
@ 2007-08-17  5:07                                             ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17  5:07 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Christoph Lameter, Chris Snook, Ilpo Järvinen,
	Paul Mackerras, Satyam Sharma, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

On Fri, Aug 17, 2007 at 09:28:00AM +0800, Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 06:02:32PM -0700, Paul E. McKenney wrote:
> > 
> > Yep.  Or you can use atomic_dec_return() instead of using a barrier.
> 
> Or you could use smp_mb__{before,after}_atomic_dec.

Yep.  That would be an example of a barrier, either in the
atomic_dec() itself or in the smp_mb...().

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16  8:10                                     ` Herbert Xu
  2007-08-16  9:54                                       ` Stefan Richter
  2007-08-16 19:48                                       ` Chris Snook
@ 2007-08-17  5:09                                       ` Paul Mackerras
  2007-08-17  5:32                                         ` Herbert Xu
  2 siblings, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-17  5:09 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stefan Richter, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> Can you find an actual atomic_read code snippet there that is
> broken without the volatile modifier?

There are some in arch-specific code, for example line 1073 of
arch/mips/kernel/smtc.c.  On mips, cpu_relax() is just barrier(), so
the empty loop body is ok provided that atomic_read actually does the
load each time around the loop.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:42                       ` Linus Torvalds
@ 2007-08-17  5:18                         ` Paul E. McKenney
  2007-08-17  5:56                         ` Satyam Sharma
                                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17  5:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Mackerras, Nick Piggin, Segher Boessenkool, heiko.carstens,
	horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm,
	jesper.juhl, linux-arch, zlynx, satyam, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, wensong, wjiang

On Thu, Aug 16, 2007 at 08:42:23PM -0700, Linus Torvalds wrote:
> 
> 
> On Fri, 17 Aug 2007, Paul Mackerras wrote:
> > 
> > I'm really surprised it's as much as a few K.  I tried it on powerpc
> > and it only saved 40 bytes (10 instructions) for a G5 config.
> 
> One of the things that "volatile" generally screws up is a simple
> 
> 	volatile int i;
> 
> 	i++;
> 
> which a compiler will generally get horribly, horribly wrong.
> 
> In a reasonable world, gcc should just make that be (on x86)
> 
> 	addl $1,i(%rip)
> 
> on x86-64, which is indeed what it does without the volatile. But with the 
> volatile, the compiler gets really nervous, and doesn't dare do it in one 
> instruction, and thus generates crap like
> 
>         movl    i(%rip), %eax
>         addl    $1, %eax
>         movl    %eax, i(%rip)

Blech.  Sounds like a chat with some gcc people is in order.  Will
see what I can do.

						Thanx, Paul

> instead. For no good reason, except that "volatile" just doesn't have any 
> good/clear semantics for the compiler, so most compilers will just make it 
> be "I will not touch this access in any way, shape, or form". Including 
> even trivially correct instruction optimization/combination.
> 
> This is one of the reasons why we should never use "volatile". It 
> pessimises code generation for no good reason - just because compilers 
> don't know what the heck it even means! 
> 
> Now, people don't do "i++" on atomics (you'd use "atomic_inc()" for that), 
> but people *do* do things like
> 
> 	if (atomic_read(..) <= 1)
> 		..
> 
> On ppc, things like that probably don't much matter. But on x86, it makes 
> a *huge* difference whether you do
> 
> 	movl i(%rip),%eax
> 	cmpl $1,%eax
> 
> or if you can just use the value directly for the operation, like this:
> 
> 	cmpl $1,i(%rip)
> 
> which is again a totally obvious and totally safe optimization, but is 
> (again) something that gcc doesn't dare do, since "i" is volatile.
> 
> In other words: "volatile" is a horribly horribly bad way of doing things, 
> because it generates *worse*code*, for no good reason. You just don't see 
> it on powerpc, because it's already a load-store architecture, so there is 
> no "good code" for doing direct-to-memory operations.
> 
> 		Linus
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  5:09                                       ` Paul Mackerras
@ 2007-08-17  5:32                                         ` Herbert Xu
  2007-08-17  5:41                                           ` Paul Mackerras
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-17  5:32 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Stefan Richter, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Fri, Aug 17, 2007 at 03:09:57PM +1000, Paul Mackerras wrote:
> Herbert Xu writes:
> 
> > Can you find an actual atomic_read code snippet there that is
> > broken without the volatile modifier?
> 
> There are some in arch-specific code, for example line 1073 of
> arch/mips/kernel/smtc.c.  On mips, cpu_relax() is just barrier(), so
> the empty loop body is ok provided that atomic_read actually does the
> load each time around the loop.

A barrier() is all you need to force the compiler to reread
the value.
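
In other words, a polling loop of the general form

	while (atomic_read(&v) != target)
		barrier();	/* or cpu_relax(), which includes barrier() */

is already safe with a non-volatile atomic_read(), because barrier()
tells the compiler that memory may have changed behind its back.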

The people advocating volatile in this thread are talking
about code that doesn't use barrier()/cpu_relax().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  5:32                                         ` Herbert Xu
@ 2007-08-17  5:41                                           ` Paul Mackerras
  2007-08-17  8:28                                             ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-17  5:41 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stefan Richter, Satyam Sharma, Christoph Lameter,
	Paul E. McKenney, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Herbert Xu writes:

> On Fri, Aug 17, 2007 at 03:09:57PM +1000, Paul Mackerras wrote:
> > Herbert Xu writes:
> > 
> > > Can you find an actual atomic_read code snippet there that is
> > > broken without the volatile modifier?
> > 
> > There are some in arch-specific code, for example line 1073 of
> > arch/mips/kernel/smtc.c.  On mips, cpu_relax() is just barrier(), so
> > the empty loop body is ok provided that atomic_read actually does the
> > load each time around the loop.
> 
> A barrier() is all you need to force the compiler to reread
> the value.
> 
> The people advocating volatile in this thread are talking
> about code that doesn't use barrier()/cpu_relax().

Did you look at it?  Here it is:

	/* Someone else is initializing in parallel - let 'em finish */
	while (atomic_read(&idle_hook_initialized) < 1000)
		;

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:42                       ` Linus Torvalds
  2007-08-17  5:18                         ` Paul E. McKenney
@ 2007-08-17  5:56                         ` Satyam Sharma
  2007-08-17  7:26                           ` Nick Piggin
  2007-08-17 22:49                           ` Segher Boessenkool
  2007-08-17  6:42                         ` Geert Uytterhoeven
                                           ` (2 subsequent siblings)
  4 siblings, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  5:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Mackerras, Nick Piggin, Segher Boessenkool, heiko.carstens,
	horms, Linux Kernel Mailing List, rpjday, ak, netdev, cfriesen,
	Andrew Morton, jesper.juhl, linux-arch, zlynx, clameter,
	schwidefsky, Chris Snook, Herbert Xu, davem, wensong, wjiang

Hi Linus,

[ and others; I think there's a communication gap in a lot of this
  thread, and a little summary would be useful. Hence this posting. ]


On Thu, 16 Aug 2007, Linus Torvalds wrote:

> On Fri, 17 Aug 2007, Paul Mackerras wrote:
> > 
> > I'm really surprised it's as much as a few K.  I tried it on powerpc
> > and it only saved 40 bytes (10 instructions) for a G5 config.
> 
> One of the things that "volatile" generally screws up is a simple
> 
> 	volatile int i;
> 
> 	i++;
> 
> which a compiler will generally get horribly, horribly wrong.
> 
> [...] For no good reason, except that "volatile" just doesn't have any 
> good/clear semantics for the compiler, so most compilers will just make it 
> be "I will not touch this access in any way, shape, or form". Including 
> even trivially correct instruction optimization/combination.
> 
> This is one of the reasons why we should never use "volatile". It 
> pessimises code generation for no good reason - just because compilers 
> don't know what the heck it even means! 
> [...]
> In other words: "volatile" is a horribly horribly bad way of doing things, 
> because it generates *worse*code*, for no good reason. You just don't see 
> it on powerpc, because it's already a load-store architecture, so there is 
> no "good code" for doing direct-to-memory operations.


True, and I bet *everybody* on this list has agreed for a very long
time that using "volatile" to type-qualify the _declaration_ of an
object itself is horribly bad (taste-wise, code-generation-wise, and
often even buggy for situations where real CPU barriers should've been
used instead).

However, the discussion on this thread (IIRC) began with only "giving
volatility semantics" to atomic ops. Now that is different, and may not
require the use of the "volatile" keyword itself (at least not in the
declaration of the object).

Sadly, most archs *still* do type-qualify the declaration of the
"counter" member of atomic_t as "volatile". This is probably a historic
hangover, and I suspect it has not yet been rectified because of lethargy.

Anyway, some of the variants I can think of are:

[1]

#define atomic_read_volatile(v)				\
	({						\
		forget((v)->counter);			\
		((v)->counter);				\
	})

where:

#define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))

[ This is exactly equivalent to using "+m" in the constraints, as recently
  explained on a GCC list somewhere, in response to the patch in my bitops
  series a few weeks back where I thought "+m" was bogus. ]

[2]

#define atomic_read_volatile(v)		(*(volatile int *)&(v)->counter)

This is something that does work. It has reasonably good semantics
guaranteed by the C standard in conjunction with how GCC currently
behaves (and how it has behaved for all supported versions). I haven't
checked whether it generates much different code than the first variant
above (it probably generates code similar to just declaring the object
as volatile, but it is still better in terms of code clarity and
taste, IMHO); in any case, we should pick whichever of these variants
works for us and generates good code.
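
For instance, a sketch of a (hypothetical) caller, just to illustrate
that the volatile-qualified access forces a real load of (v)->counter
on every evaluation:

	/* each iteration re-reads v->counter; the load cannot be hoisted */
	while (atomic_read_volatile(&v) == 0)
		;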

[3]

static inline int atomic_read_volatile(atomic_t *v)
{
	... arch-dependent __asm__ __volatile__ stuff ...
}

I can reasonably bet this variant would often generate worse code than
at least the variant "[1]" above.


Now, why do we even require these "volatility" semantics variants?

Note, "volatility" semantics *know* / assume that it can have a meaning
_only_ as far as the compiler, so atomic_read_volatile() doesn't really
care reading stale values from the cache for certain non-x86 archs, etc.

The first argument is "safety":

Use of atomic_read() (possibly in conjunction with other atomic ops) in
a lot of code out there in the kernel *assumes* the compiler will not
optimize away those ops (which is possible given the current definitions
of atomic_set and atomic_read on archs such as x86 in present code).
An additional argument that builds on this one says that by ensuring
the compiler will not elide or coalesce these ops, we could even avoid
potential heisenbugs in the future.

However, there is a counter-argument:

As Herbert Xu has often pointed out, there is *no* bug out
there involving "atomic_read" in busy-while-loops that should not have
a compiler barrier (or cpu_relax(), in fact) anyway. As for non-busy-loops,
they would invariably call schedule() at some point (possibly directly)
and thus have an "implicit" compiler barrier by virtue of calling out
to a function that is not in scope of the current compilation unit
(although users in sched.c itself would probably require an explicit
compiler barrier).

The second pro-volatility-in-atomic-ops argument is performance:
(surprise!)

Using a full memory clobber compiler barrier in busy loops will disqualify
optimizations for loop invariants, so it probably makes sense to *only*
make the compiler forget *that* particular address of the atomic counter
object, and none other. All 3 variants above would work nicely here.

So the final idea may be to have a cpu_relax_no_barrier() variant as a
rep;nop (pause) *without* an included full memory clobber, and replace
a lot of kernel busy-while-loops out there with:

-	cpu_relax();
+	cpu_relax_no_barrier();
+	forget(x);

or may be just:

-	cpu_relax();
+	cpu_relax_no_barrier();

because the "forget" / "volatility" / specific-variable-compiler-barrier
could be made implicit inside the atomic ops themselves.

This could especially make a difference for register-rich CPUs (probably
not x86) where using a full memory clobber will disqualify a whole lot of
compiler optimizations for loop invariants.

On x86 itself, cpu_relax_no_barrier() could be:

#define cpu_relax_no_barrier()	__asm__ __volatile__ ("rep;nop":::);

and still continue to do its job as it is doing presently.
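
A typical busy-wait could then (hypothetically) look like:

	/* sketch: assumes the atomic_read_volatile() and
	   cpu_relax_no_barrier() primitives suggested above */
	while (!atomic_read_volatile(&v))
		cpu_relax_no_barrier();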

However, there is still a counter-argument:

As Herbert Xu and Christoph Lameter have often been saying, giving
"volatility" semantics to the atomic ops will disqualify compiler
optimizations such as eliding / coalescing of atomic ops, etc, and
probably some sections of code in the kernel (Christoph mentioned code
in SLUB, and I saw such code in sched) benefit from such optimizations.

Paul Mackerras has, OTOH, mentioned that a lot of such places probably
don't need (or shouldn't use) atomic ops in the first place.
Alternatively, such callsites should probably just cache the atomic_read
in a local variable (which the compiler could just as well keep in a
register) explicitly, and repeating atomic_read() isn't really necessary.
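
That is, something like this sketch (do_something() and threshold are
made-up names, purely for illustration):

	/* take one snapshot instead of repeated atomic_read()s */
	int snapshot = atomic_read(&v);

	if (snapshot > threshold)
		do_something(snapshot);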

There could still be legitimate uses of atomic ops that don't care about
them being elided / coalesced, but given the loop-invariant-optimization
benefit, personally, I do see some benefit in defining atomic ops
variants with "volatility" semantics (for only the atomic counter
object) while also having a non-volatile atomic ops API side-by-side for
performance-critical users (probably sched, slub) that may require that.

Possibly, one of the two APIs above could turn out to be redundant, but
that's still very much under debate presently.


Satyam

[ Sorry if I missed anything important, but this thread has been long
  and noisy, although I've tried to keep up ... ]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:53                                               ` Herbert Xu
@ 2007-08-17  6:26                                                 ` Satyam Sharma
  2007-08-17  8:38                                                   ` Nick Piggin
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  6:26 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Paul Mackerras, Linus Torvalds, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Herbert Xu wrote:

> On Fri, Aug 17, 2007 at 01:43:27PM +1000, Paul Mackerras wrote:
> >
> > The cost of doing so seems to me to be well down in the noise - 44
> > bytes of extra kernel text on a ppc64 G5 config, and I don't believe
> > the extra few cycles for the occasional extra load would be measurable
> > (they should all hit in the L1 dcache).  I don't mind if x86[-64] have
> > atomic_read/set be nonvolatile and find all the missing barriers, but
> > for now on powerpc, I think that not having to find those missing
> > barriers is worth the 0.00076% increase in kernel text size.
> 
> BTW, the sort of missing barriers that triggered this thread
> aren't that subtle.  It'll result in a simple lock-up if the
> loop condition holds upon entry.  At which point it's fairly
> straightforward to find the culprit.

Not necessarily. A barrier-less buggy code such as below:

	atomic_set(&v, 0);

	... /* some initial code */

	while (atomic_read(&v))
		;

	... /* code that MUST NOT be executed unless v becomes non-zero */

(where v->counter has no volatile access semantics)

could be generated by the compiler to simply *elide* or *do away* with
the loop itself, thereby making the:

"/* code that MUST NOT be executed unless v becomes non-zero */"

to be executed even when v is zero! That is subtle indeed, and causes
no hard lockups.

Granted, the above IS buggy code. But, the stated objective is to avoid
heisenbugs. And we have driver / subsystem maintainers such as Stefan
coming up and admitting that often a lot of code that's written to use
atomic_read() does assume the read will not be elided by the compiler.

See, I agree, "volatility" semantics != what we often want. However, if
what we want is a compiler barrier for only the object under consideration,
"volatility" semantics aren't really "nonsensical" or anything.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:42                       ` Linus Torvalds
  2007-08-17  5:18                         ` Paul E. McKenney
  2007-08-17  5:56                         ` Satyam Sharma
@ 2007-08-17  6:42                         ` Geert Uytterhoeven
  2007-08-17  8:52                         ` Andi Kleen
  2007-08-17 22:29                         ` Segher Boessenkool
  4 siblings, 0 replies; 657+ messages in thread
From: Geert Uytterhoeven @ 2007-08-17  6:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Mackerras, Nick Piggin, Segher Boessenkool, heiko.carstens,
	horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm,
	jesper.juhl, linux-arch, zlynx, satyam, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, wensong, wjiang

On Thu, 16 Aug 2007, Linus Torvalds wrote:
> On Fri, 17 Aug 2007, Paul Mackerras wrote:
> > I'm really surprised it's as much as a few K.  I tried it on powerpc
> > and it only saved 40 bytes (10 instructions) for a G5 config.
> 
> One of the things that "volatile" generally screws up is a simple
> 
> 	volatile int i;
> 
> 	i++;
> 
> which a compiler will generally get horribly, horribly wrong.
> 
> In a reasonable world, gcc should just make that be (on x86)
> 
> 	addl $1,i(%rip)
> 
> on x86-64, which is indeed what it does without the volatile. But with the 
> volatile, the compiler gets really nervous, and doesn't dare do it in one 
> instruction, and thus generates crap like
> 
>         movl    i(%rip), %eax
>         addl    $1, %eax
>         movl    %eax, i(%rip)
> 
> instead. For no good reason, except that "volatile" just doesn't have any 
> good/clear semantics for the compiler, so most compilers will just make it 
> be "I will not touch this access in any way, shape, or form". Including 
> even trivially correct instruction optimization/combination.

Apart from having to fetch more bytes for the instructions (which does
matter), execution time is probably the same on modern processors, as they
convert the single instruction to RISC-style load, modify, store anyway.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:15                                               ` Nick Piggin
  2007-08-17  4:02                                                 ` Paul Mackerras
@ 2007-08-17  7:25                                                 ` Stefan Richter
  2007-08-17  8:06                                                   ` Nick Piggin
  2007-08-17 22:14                                                 ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Segher Boessenkool
  2 siblings, 1 reply; 657+ messages in thread
From: Stefan Richter @ 2007-08-17  7:25 UTC (permalink / raw)
  To: Nick Piggin
  Cc: paulmck, Herbert Xu, Paul Mackerras, Satyam Sharma,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Nick Piggin wrote:
> I don't know why people would assume volatile of atomics. AFAIK, most
> of the documentation is pretty clear that all the atomic stuff can be
> reordered etc. except for those that modify and return a value.

Which documentation is there?

For driver authors, there is LDD3.  It doesn't specifically cover
effects of optimization on accesses to atomic_t.

For architecture port authors, there is Documentation/atomic_ops.txt.
Driver authors also can learn something from that document, as it
indirectly documents the atomic_t and bitops APIs.

Prompted by this thread, I reread this document, and indeed, the
sentence "Unlike the above routines, it is required that explicit memory
barriers are performed before and after [atomic_{inc,dec}_return]"
indicates that atomic_read (one of the "above routines") is very
different from all other atomic_t accessors that return values.

This is strange.  Why is it that atomic_read stands out that way?  IMO
this API imbalance is quite unexpected by many people.  Wouldn't it be
beneficial to change the atomic_read API to behave the same as all
other atomic_t accessors that return values?

OK, it is also different from the other accessors that return data in so
far as it doesn't modify the data.  But as driver "author", i.e. user of
the API, I can't see much use of an atomic_read that can be reordered
and, more importantly, can be optimized away by the compiler.  Sure, now
that I learned of these properties I can start to audit code and insert
barriers where I believe they are needed, but this simply means that
almost all occurrences of atomic_read will get barriers (unless there
already are implicit but more or less obvious barriers like msleep).
-- 
Stefan Richter
-=====-=-=== =--- =---=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  5:56                         ` Satyam Sharma
@ 2007-08-17  7:26                           ` Nick Piggin
  2007-08-17  8:47                             ` Satyam Sharma
  2007-08-17 22:49                           ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  7:26 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Linus Torvalds, Paul Mackerras, Segher Boessenkool,
	heiko.carstens, horms, Linux Kernel Mailing List, rpjday, ak,
	netdev, cfriesen, Andrew Morton, jesper.juhl, linux-arch, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

Satyam Sharma wrote:

> #define atomic_read_volatile(v)				\
> 	({						\
> 		forget((v)->counter);			\
> 		((v)->counter);				\
> 	})
> 
> where:

*vomit* :)

Not only do I hate the keyword volatile, but the barrier is only a
one-sided affair so it's probable this is going to have slightly
different allowed reorderings than a real volatile access.

Also, why would you want to make these insane accessors for atomic_t
types? Just make sure everybody knows the basics of barriers, and they
can apply that knowledge to atomic_t and all other lockless memory
accesses as well.


> #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))

I like order(x) better, but it's not the most perfect name either.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  1:01                                                 ` Paul E. McKenney
@ 2007-08-17  7:39                                                   ` Satyam Sharma
  2007-08-17 14:31                                                     ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  7:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Herbert Xu, Stefan Richter, Paul Mackerras, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher



On Thu, 16 Aug 2007, Paul E. McKenney wrote:

> On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote:
> > On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote:
> > >
> > > The compiler can also reorder non-volatile accesses.  For an example
> > > patch that cares about this, please see:
> > > 
> > > 	http://lkml.org/lkml/2007/8/7/280
> > > 
> > > This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
> > > rcu_read_unlock() to ensure that accesses aren't reordered with respect
> > > to interrupt handlers and NMIs/SMIs running on that same CPU.
> > 
> > Good, finally we have some code to discuss (even though it's
> > not actually in the kernel yet).
> 
> There was some earlier in this thread as well.

Hmm, I never quite got what all this interrupt/NMI/SMI handling and
RCU business you mentioned earlier was all about, but now that you've
pointed to the actual code and issues with it ...


> > First of all, I think this illustrates that what you want
> > here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
> > macro occurs a lot more times in your patch than atomic
> > reads/sets.  So *assuming* that it was necessary at all,
> > then having an ordered variant of the atomic_read/atomic_set
> > ops could do just as well.
> 
> Indeed.  If I could trust atomic_read()/atomic_set() to cause the compiler
> to maintain ordering, then I could just use them instead of having to
> create an  ORDERED_WRT_IRQ().  (Or ACCESS_ONCE(), as it is called in a
> different patch.)

+#define WHATEVER(x)	(*(volatile typeof(x) *)&(x))

I suppose one could want volatile access semantics for stuff that's
a bit-field too, no?

Also, this gives *zero* of the "re-ordering" guarantees that your code
wants (as you've explained below) -- neither w.r.t. CPU re-ordering
(which you probably don't care about) *nor* w.r.t. compiler re-ordering
(which you definitely _do_ care about).


> > However, I still don't know which atomic_read/atomic_set in
> > your patch would be broken if there were no volatile.  Could
> > you please point them out?
> 
> Suppose I tried replacing the ORDERED_WRT_IRQ() calls with
> atomic_read() and atomic_set().  Starting with __rcu_read_lock():
> 
> o	If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++"
> 	was ordered by the compiler after
> 	"ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then
> 	suppose an NMI/SMI happened after the rcu_read_lock_nesting but
> 	before the rcu_flipctr.
> 
> 	Then if there was an rcu_read_lock() in the SMI/NMI
> 	handler (which is perfectly legal), the nested rcu_read_lock()
> 	would believe that it could take the then-clause of the
> 	enclosing "if" statement.  But because the rcu_flipctr per-CPU
> 	variable had not yet been incremented, an RCU updater would
> 	be within its rights to assume that there were no RCU reads
> 	in progress, thus possibly yanking a data structure out from
> 	under the reader in the SMI/NMI function.
> 
> 	Fatal outcome.  Note that only one CPU is involved here
> 	because these are all either per-CPU or per-task variables.

Ok, so you don't care about CPU re-ordering. Still, I should let you know
that your ORDERED_WRT_IRQ() -- bad name, btw -- is still buggy. What you
want is a full compiler optimization barrier().

[ Your code probably works now, and emits correct code, but that's
  just because of gcc did what it did. Nothing in any standard,
  or in any documented behaviour of gcc, or anything about the real
  (or expected) semantics of "volatile" is protecting the code here. ]


> o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1"
> 	was ordered by the compiler to follow the
> 	"ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx", and an NMI/SMI
> 	happened between the two, then an __rcu_read_lock() in the NMI/SMI
> 	would incorrectly take the "else" clause of the enclosing "if"
> 	statement.  If some other CPU flipped the rcu_ctrlblk.completed
> 	in the meantime, then the __rcu_read_lock() would (correctly)
> 	write the new value into rcu_flipctr_idx.
> 
> 	Well and good so far.  But the problem arises in
> 	__rcu_read_unlock(), which then decrements the wrong counter.
> 	Depending on exactly how subsequent events played out, this could
> 	result in either prematurely ending grace periods or never-ending
> 	grace periods, both of which are fatal outcomes.
> 
> And the following are not needed in the current version of the
> patch, but will be in a future version that either avoids disabling
> irqs or that dispenses with the smp_read_barrier_depends() that I
> have 99% convinced myself is unneeded:
> 
> o	nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting);
> 
> o	idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) & 0x1;
> 
> Furthermore, in that future version, irq handlers can cause the same
> mischief that SMI/NMI handlers can in this version.
> 
> Next, looking at __rcu_read_unlock():
> 
> o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting - 1"
> 	was reordered by the compiler to follow the
> 	"ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])--",
> 	then if an NMI/SMI containing an rcu_read_lock() occurs between
> 	the two, this nested rcu_read_lock() would incorrectly believe
> 	that it was protected by an enclosing RCU read-side critical
> 	section as described in the first reversal discussed for
> 	__rcu_read_lock() above.  Again, fatal outcome.
> 
> This is what we have now.  It is not hard to imagine situations that
> interact with -both- interrupt handlers -and- other CPUs, as described
> earlier.

It's not about interrupt/SMI/NMI handlers at all! What you clearly want,
simply put, is that a certain stream of C statements must be emitted
by the compiler _as they are_ with no re-ordering optimizations! You must
*definitely* use barrier(), IMHO.
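
Concretely -- keeping your accessors exactly as they are, and only
adding an explicit compiler barrier between the two accesses -- a
sketch of what I mean:

	ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1;
	barrier();	/* the compiler may not reorder across this */
	ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++;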


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  7:25                                                 ` Stefan Richter
@ 2007-08-17  8:06                                                   ` Nick Piggin
  2007-08-17  8:58                                                     ` Satyam Sharma
                                                                       ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  8:06 UTC (permalink / raw)
  To: Stefan Richter
  Cc: paulmck, Herbert Xu, Paul Mackerras, Satyam Sharma,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Stefan Richter wrote:
> Nick Piggin wrote:
> 
>>I don't know why people would assume volatile of atomics. AFAIK, most
>>of the documentation is pretty clear that all the atomic stuff can be
>>reordered etc. except for those that modify and return a value.
> 
> 
> Which documentation is there?

Documentation/atomic_ops.txt


> For driver authors, there is LDD3.  It doesn't specifically cover
> effects of optimization on accesses to atomic_t.
> 
> For architecture port authors, there is Documentation/atomic_ops.txt.
> Driver authors also can learn something from that document, as it
> indirectly documents the atomic_t and bitops APIs.
>

"Semantics and Behavior of Atomic and Bitmask Operations" is
pretty direct :)

Sure, it says that it's for arch maintainers, but there is no
reason why users can't make use of it.


> Prompted by this thread, I reread this document, and indeed, the
> sentence "Unlike the above routines, it is required that explicit memory
> barriers are performed before and after [atomic_{inc,dec}_return]"
> indicates that atomic_read (one of the "above routines") is very
> different from all other atomic_t accessors that return values.
> 
> This is strange.  Why is it that atomic_read stands out that way?  IMO

It is not just atomic_read of course. It is atomic_add,sub,inc,dec,set.


> this API imbalance is quite unexpected by many people.  Wouldn't it be
> beneficial to change the atomic_read API to behave the same as all
> other atomic_t accessors that return values?

It is very consistent and well defined. Operations which both modify
the data _and_ return something are defined to have full barriers
before and after.

What do you want to add to the other atomic accessors? Full memory
barriers? Only compiler barriers? It's quite likely that if you think
some barriers will fix bugs, then there are other bugs lurking there
anyway.

Just use spinlocks if you're not absolutely clear about potential
races and memory ordering issues -- they're pretty cheap and simple.


> OK, it is also different from the other accessors that return data in so
> far as it doesn't modify the data.  But as driver "author", i.e. user of
> the API, I can't see much use of an atomic_read that can be reordered
> and, more importantly, can be optimized away by the compiler.

It will return to you an atomic snapshot of the data (loaded from
memory at some point since the last compiler barrier). All you have
to be aware of is compiler barriers and the Linux SMP memory ordering
model, which should be a given if you are writing lockless code.


> Sure, now
> that I learned of these properties I can start to audit code and insert
> barriers where I believe they are needed, but this simply means that
> almost all occurrences of atomic_read will get barriers (unless there
> already are implicit but more or less obvious barriers like msleep).

You might find that these places that appear to need barriers are
buggy for other reasons anyway. Can you point to some in-tree code
we can have a look at?

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  5:41                                           ` Paul Mackerras
@ 2007-08-17  8:28                                             ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  8:28 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Herbert Xu, Stefan Richter, Christoph Lameter, Paul E. McKenney,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher



On Fri, 17 Aug 2007, Paul Mackerras wrote:

> Herbert Xu writes:
> 
> > On Fri, Aug 17, 2007 at 03:09:57PM +1000, Paul Mackerras wrote:
> > > Herbert Xu writes:
> > > 
> > > > Can you find an actual atomic_read code snippet there that is
> > > > broken without the volatile modifier?
> > > 
> > > There are some in arch-specific code, for example line 1073 of
> > > arch/mips/kernel/smtc.c.  On mips, cpu_relax() is just barrier(), so
> > > the empty loop body is ok provided that atomic_read actually does the
> > > load each time around the loop.
> > 
> > A barrier() is all you need to force the compiler to reread
> > the value.
> > 
> > The people advocating volatile in this thread are talking
> > about code that doesn't use barrier()/cpu_relax().
> 
> Did you look at it?  Here it is:
> 
> 	/* Someone else is initializing in parallel - let 'em finish */
> 	while (atomic_read(&idle_hook_initialized) < 1000)
> 		;


Honestly, this thread is suffering from HUGE communication gaps.

What Herbert (obviously) meant there was that "this loop could've
been okay _without_ using volatile-semantics-atomic_read() also, if
only it used cpu_relax()".

That does work, because cpu_relax() is _at least_ barrier() on all
archs (on some it also emits some arch-dependent "pause" kind of
instruction).

Now, saying that "MIPS does not have such an instruction so I won't
use cpu_relax() for arch-dependent-busy-while-loops in arch/mips/"
sounds like a wrong argument, because: tomorrow, such arch's _may_
introduce such an instruction, so naturally, at that time we'd
change cpu_relax() appropriately (in reality, we would actually
*re-define* cpu_relax() and ensure that the correct version gets
pulled in depending on whether the callsite code is legacy or only
for the newer such CPUs of said arch, whatever), but loops such as
this would remain un-changed, because they never used cpu_relax()!

OTOH an argument that said the following would've made a stronger case:

"I don't want to use cpu_relax() because that's a full memory
clobber barrier() and I have loop-invariants / other variables
around in that code that I *don't* want the compiler to forget
just because it used cpu_relax(), and hence I will not use
cpu_relax() but instead make my atomic_read() itself have
"volatility" semantics. Not just that, but I will introduce a
cpu_relax_no_barrier() on MIPS, that would be a no-op #define
for now, but which may not be so forever, and continue to use
that in such busy loops."

In general, please read the thread-summary I've tried to do at:
http://lkml.org/lkml/2007/8/17/25
Feel free to continue / comment / correct stuff from there, there's
too much confusion and circular-arguments happening on this thread
otherwise.

[ I might've made an incorrect statement there about
  "volatile" w.r.t. cache on non-x86 archs, I think. ]


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  6:26                                                 ` Satyam Sharma
@ 2007-08-17  8:38                                                   ` Nick Piggin
  2007-08-17  9:14                                                     ` Satyam Sharma
  2007-08-17 11:08                                                     ` Stefan Richter
  0 siblings, 2 replies; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  8:38 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Herbert Xu, Paul Mackerras, Linus Torvalds, Christoph Lameter,
	Chris Snook, Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Satyam Sharma wrote:
> 
> On Fri, 17 Aug 2007, Herbert Xu wrote:
> 
> 
>>On Fri, Aug 17, 2007 at 01:43:27PM +1000, Paul Mackerras wrote:
>>
>>BTW, the sort of missing barriers that triggered this thread
>>aren't that subtle.  It'll result in a simple lock-up if the
>>loop condition holds upon entry.  At which point it's fairly
>>straightforward to find the culprit.
> 
> 
> Not necessarily. A barrier-less buggy code such as below:
> 
> 	atomic_set(&v, 0);
> 
> 	... /* some initial code */
> 
> 	while (atomic_read(&v))
> 		;
> 
> 	... /* code that MUST NOT be executed unless v becomes non-zero */
> 
> (where v->counter has no volatile access semantics)
> 
> could be generated by the compiler to simply *elide* or *do away* with
> the loop itself, thereby making the:
> 
> "/* code that MUST NOT be executed unless v becomes non-zero */"
> 
> to be executed even when v is zero! That is subtle indeed, and causes
> no hard lockups.

Then I presume you mean

while (!atomic_read(&v))
     ;

Which is just the same old infinite loop bug solved with cpu_relax().
These are pretty trivial to audit and fix, and also to debug, I would
think.


> Granted, the above IS buggy code. But, the stated objective is to avoid
> heisenbugs.

Anyway, why are you making up code snippets that are buggy in other
ways in order to support this assertion being made that lots of kernel
code supposedly depends on volatile semantics. Just reference the
actual code.


> And we have driver / subsystem maintainers such as Stefan
> coming up and admitting that often a lot of code that's written to use
> atomic_read() does assume the read will not be elided by the compiler.

So these are broken on i386 and x86-64?

Are they definitely safe on SMP and weakly ordered machines with
just a simple compiler barrier there? Because I would not be
surprised if there are a lot of developers who don't really know
what to assume when it comes to memory ordering issues.

This is not a dig at driver writers: we still have memory ordering
problems in the VM too (and probably most of the subtle bugs in
lockless VM code are memory ordering ones). Let's not make up a
false sense of security and hope that sprinkling volatile around
will allow people to write bug-free lockless code. If a writer
can't be bothered reading API documentation and learning the Linux
memory model, they can still be productive writing safely locked
code.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  7:26                           ` Nick Piggin
@ 2007-08-17  8:47                             ` Satyam Sharma
  2007-08-17  9:15                               ` Nick Piggin
  2007-08-17  9:48                               ` Paul Mackerras
  0 siblings, 2 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  8:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, Paul Mackerras, Segher Boessenkool,
	heiko.carstens, horms, Linux Kernel Mailing List, rpjday, ak,
	netdev, cfriesen, Andrew Morton, jesper.juhl, linux-arch, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> 
> > #define atomic_read_volatile(v)				\
> > 	({						\
> > 		forget((v)->counter);			\
> > 		((v)->counter);				\
> > 	})
> > 
> > where:
> 
> *vomit* :)

I wonder if this'll generate smaller and better code than _both_ the
other atomic_read_volatile() variants. Would need to build allyesconfig
on lots of diff arch's etc to test the theory though.


> Not only do I hate the keyword volatile, but the barrier is only a
> one-sided affair so it's probable this is going to have slightly
> different allowed reorderings than a real volatile access.

True ...


> Also, why would you want to make these insane accessors for atomic_t
> types? Just make sure everybody knows the basics of barriers, and they
> can apply that knowledge to atomic_t and all other lockless memory
> accesses as well.

Code that looks like:

	while (!atomic_read(&v)) {
		...
		cpu_relax_no_barrier();
		forget(v.counter);
		        ^^^^^^^^
	}

would be uglier. Also think about code such as:

	a = atomic_read();
	if (!a)
		do_something();

	forget();
	a = atomic_read();
	... /* some code that depends on value of a, obviously */

	forget();
	a = atomic_read();
	...

So much explicit sprinkling of "forget()" looks ugly.

	atomic_read_volatile()

on the other hand, looks neater. The "_volatile()" suffix makes it also
no less explicit than an explicit barrier-like macro that this primitive
is something "special", for code clarity purposes.


> > #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> 
> I like order(x) better, but it's not the most perfect name either.

forget(x) is just a stupid-placeholder-for-a-better-name. order(x) sounds
good but we could leave quibbling about function or macro names for later,
this thread is noisy as it is :-)

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:42                       ` Linus Torvalds
                                           ` (2 preceding siblings ...)
  2007-08-17  6:42                         ` Geert Uytterhoeven
@ 2007-08-17  8:52                         ` Andi Kleen
  2007-08-17 10:08                           ` Satyam Sharma
  2007-08-17 22:29                         ` Segher Boessenkool
  4 siblings, 1 reply; 657+ messages in thread
From: Andi Kleen @ 2007-08-17  8:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Mackerras, Nick Piggin, Segher Boessenkool, heiko.carstens,
	horms, linux-kernel, rpjday, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, satyam, clameter, schwidefsky, Chris Snook,
	Herbert Xu, davem, wensong, wjiang

On Friday 17 August 2007 05:42, Linus Torvalds wrote:
> On Fri, 17 Aug 2007, Paul Mackerras wrote:
> > I'm really surprised it's as much as a few K.  I tried it on powerpc
> > and it only saved 40 bytes (10 instructions) for a G5 config.
>
> One of the things that "volatile" generally screws up is a simple
>
> 	volatile int i;
>
> 	i++;

But for atomic_t people use atomic_inc() anyways which does this correctly.
It shouldn't really matter for atomic_t.

I'm worrying a bit that the volatile atomic_t change caused subtle code 
breakage like these delay read loops people here pointed out.
Wouldn't it be safer to just re-add the volatile to atomic_read() 
for 2.6.23? Or alternatively make it asm(), but volatile seems more
proven.

-Andi

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  8:06                                                   ` Nick Piggin
@ 2007-08-17  8:58                                                     ` Satyam Sharma
  2007-08-17  9:15                                                       ` Nick Piggin
  2007-08-17 10:48                                                     ` Stefan Richter
  2007-08-18 14:35                                                     ` LDD3 pitfalls (was Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures) Stefan Richter
  2 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  8:58 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Stefan Richter, paulmck, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Stefan Richter wrote:
> [...]
> Just use spinlocks if you're not absolutely clear about potential
> races and memory ordering issues -- they're pretty cheap and simple.

I fully agree with this. As Paul Mackerras mentioned elsewhere,
a lot of authors sprinkle atomic_t in code thinking they're somehow
done with *locking*. This is sad, and I wonder if it's time for a
Documentation/atomic-considered-dodgy.txt kind of document :-)


> > Sure, now
> > that I learned of these properties I can start to audit code and insert
> > barriers where I believe they are needed, but this simply means that
> > almost all occurrences of atomic_read will get barriers (unless there
> > already are implicit but more or less obvious barriers like msleep).
> 
> You might find that these places that appear to need barriers are
> buggy for other reasons anyway. Can you point to some in-tree code
> we can have a look at?

Such code was mentioned elsewhere (query nodemgr_host_thread in cscope)
that managed to escape the requirement for a barrier only because of
some completely unobvious compilation-unit-scope thing. But I find such
a non-explicit barrier to be in quite bad taste. Stefan, do consider
plunking an explicit call to barrier() there.


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  8:38                                                   ` Nick Piggin
@ 2007-08-17  9:14                                                     ` Satyam Sharma
  2007-08-17  9:31                                                       ` Nick Piggin
  2007-08-17 11:08                                                     ` Stefan Richter
  1 sibling, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17  9:14 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Herbert Xu, Paul Mackerras, Linus Torvalds, Christoph Lameter,
	Chris Snook, Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> [...]
> > Granted, the above IS buggy code. But, the stated objective is to avoid
> > heisenbugs.
    ^^^^^^^^^^

> Anyway, why are you making up code snippets that are buggy in other
> ways in order to support this assertion being made that lots of kernel
> code supposedly depends on volatile semantics. Just reference the
> actual code.

Because the point is *not* about existing bugs in kernel code. At some
point Chris Snook (who started this thread) did write that "If I knew
of the existing bugs in the kernel, I would be sending patches for them,
not this series" or something to that effect.

The point is about *author expectations*. If people do expect atomic_read()
(or a variant thereof) to have volatile semantics, why not give them such
a variant?

And by the way, the point is *also* about the fact that cpu_relax(), as
of today, implies a full memory clobber, which is not what a lot of such
loops want. (due to stuff mentioned elsewhere, summarized in that summary)


> > And we have driver / subsystem maintainers such as Stefan
> > coming up and admitting that often a lot of code that's written to use
> > atomic_read() does assume the read will not be elided by the compiler.
                                                             ^^^^^^^^^^^^^

(so it's about compiler barrier expectations only, though I fully agree
that those who're using atomic_t as if it were some magic thing that lets
them write lockless code are sorely mistaken.)

> So these are broken on i386 and x86-64?

Possibly, but the point is not about existing bugs, as mentioned above.

Some such bugs have been found nonetheless -- reminds me, can somebody
please apply http://www.gossamer-threads.com/lists/linux/kernel/810674 ?


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  8:47                             ` Satyam Sharma
@ 2007-08-17  9:15                               ` Nick Piggin
  2007-08-17 10:12                                 ` Satyam Sharma
  2007-08-17  9:48                               ` Paul Mackerras
  1 sibling, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  9:15 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Linus Torvalds, Paul Mackerras, Segher Boessenkool,
	heiko.carstens, horms, Linux Kernel Mailing List, rpjday, ak,
	netdev, cfriesen, Andrew Morton, jesper.juhl, linux-arch, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

Satyam Sharma wrote:
> 
> On Fri, 17 Aug 2007, Nick Piggin wrote:

>>Also, why would you want to make these insane accessors for atomic_t
>>types? Just make sure everybody knows the basics of barriers, and they
>>can apply that knowledge to atomic_t and all other lockless memory
>>accesses as well.
> 
> 
> Code that looks like:
> 
> 	while (!atomic_read(&v)) {
> 		...
> 		cpu_relax_no_barrier();
> 		forget(v.counter);
> 		        ^^^^^^^^
> 	}
> 
> would be uglier. Also think about code such as:

I think they would both be equally ugly, but the atomic_read_volatile
variant would be more prone to subtle bugs because of the weird
implementation.

And it would be more ugly than introducing an order(x) statement for
all memory operations, and adding an order_atomic() wrapper for it
for atomic types.


> 	a = atomic_read();
> 	if (!a)
> 		do_something();
> 
> 	forget();
> 	a = atomic_read();
> 	... /* some code that depends on value of a, obviously */
> 
> 	forget();
> 	a = atomic_read();
> 	...
> 
> So much explicit sprinkling of "forget()" looks ugly.

Firstly, why is it ugly? It's nice because of those nice explicit
statements there that give us a good heads up and would have some
comments attached to them (also, lack of the word "volatile" is
always a plus).

Secondly, what sort of code would do such a thing? In most cases,
it is probably riddled with bugs anyway (unless it is doing a
really specific sequence of interrupts or something, but in that
case it is very likely to either require locking or busy waits
anyway -> ie. barriers).


> on the other hand, looks neater. The "_volatile()" suffix makes it also
> no less explicit than an explicit barrier-like macro that this primitive
> is something "special", for code clarity purposes.

Just don't use the word volatile, and have barriers both before
and after the memory operation, and I'm OK with it. I don't see
the point though, when you could just have a single barrier(x)
function defined for all memory locations, rather than this odd
thing that only works for atomics (and would have to be duplicated
for atomic_set).
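
Roughly something like this, gcc-specific and purely a sketch:

/* a one-variable compiler barrier: the compiler must assume x is
   read and modified here, without clobbering all of memory */
#define order(x)	__asm__ __volatile__ ("" : "+m" (x))

/* wrapper for atomic_t, alongside plain atomic_read()/atomic_set() */
#define order_atomic(v)	order((v)->counter)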

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  8:58                                                     ` Satyam Sharma
@ 2007-08-17  9:15                                                       ` Nick Piggin
  2007-08-17 10:03                                                         ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  9:15 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, paulmck, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Satyam Sharma wrote:
> 
> On Fri, 17 Aug 2007, Nick Piggin wrote:

>>>Sure, now
>>>that I learned of these properties I can start to audit code and insert
>>>barriers where I believe they are needed, but this simply means that
>>>almost all occurrences of atomic_read will get barriers (unless there
>>>already are implicit but more or less obvious barriers like msleep).
>>
>>You might find that these places that appear to need barriers are
>>buggy for other reasons anyway. Can you point to some in-tree code
>>we can have a look at?
> 
> 
> Such code was mentioned elsewhere (query nodemgr_host_thread in cscope)
> that managed to escape the requirement for a barrier only because of
> some completely unobvious compilation-unit-scope thing. But I find such
> a non-explicit barrier to be in quite bad taste. Stefan, do consider
> plunking an explicit call to barrier() there.

It is very obvious. msleep calls schedule() (ie. sleeps), which is
always a barrier.

The "unobvious" thing is that you wanted to know how the compiler knows
a function is a barrier -- answer is that if it does not *know* it is not
a barrier, it must assume it is a barrier. If the whole msleep call chain
including the scheduler were defined static in the current compilation
unit, then it would still be a barrier because it would actually be able
to see the barriers in schedule(void), if nothing else.
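
For instance (illustrative only, with a made-up atomic_t 'done' flag):

	/* msleep() is defined in another compilation unit, so the compiler
	   must assume it may modify any memory, including 'done' */
	while (!atomic_read(&done))
		msleep(10);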

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  9:14                                                     ` Satyam Sharma
@ 2007-08-17  9:31                                                       ` Nick Piggin
  2007-08-17 10:55                                                         ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-17  9:31 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Herbert Xu, Paul Mackerras, Linus Torvalds, Christoph Lameter,
	Chris Snook, Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Satyam Sharma wrote:
> 
> On Fri, 17 Aug 2007, Nick Piggin wrote:
> 
> 
>>Satyam Sharma wrote:
>>[...]
>>
>>>Granted, the above IS buggy code. But, the stated objective is to avoid
>>>heisenbugs.
> 
>     ^^^^^^^^^^
> 
> 
>>Anyway, why are you making up code snippets that are buggy in other
>>ways in order to support this assertion being made that lots of kernel
>>code supposedly depends on volatile semantics. Just reference the
>>actual code.
> 
> 
> Because the point is *not* about existing bugs in kernel code. At some
> point Chris Snook (who started this thread) did write that "If I knew
> of the existing bugs in the kernel, I would be sending patches for them,
> not this series" or something to that effect.
> 
> The point is about *author expectations*. If people do expect atomic_read()
> (or a variant thereof) to have volatile semantics, why not give them such
> a variant?

Because they should be thinking about them in terms of barriers, over
which the compiler / CPU is not to reorder accesses or cache memory
operations, rather than "special" "volatile" accesses. Linux's whole
memory ordering and locking model is completely geared around the
former.


> And by the way, the point is *also* about the fact that cpu_relax(), as
> of today, implies a full memory clobber, which is not what a lot of such
> loops want. (due to stuff mentioned elsewhere, summarized in that summary)

That's not the point, because as I also mentioned, the logical extension
to Linux's barrier API to handle this is the order(x) macro. Again, not
special volatile accessors.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  8:47                             ` Satyam Sharma
  2007-08-17  9:15                               ` Nick Piggin
@ 2007-08-17  9:48                               ` Paul Mackerras
  2007-08-17 10:23                                 ` Satyam Sharma
  1 sibling, 1 reply; 657+ messages in thread
From: Paul Mackerras @ 2007-08-17  9:48 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Nick Piggin, Linus Torvalds, Segher Boessenkool, heiko.carstens,
	horms, Linux Kernel Mailing List, rpjday, ak, netdev, cfriesen,
	Andrew Morton, jesper.juhl, linux-arch, zlynx, clameter,
	schwidefsky, Chris Snook, Herbert Xu, davem, wensong, wjiang

Satyam Sharma writes:

> I wonder if this'll generate smaller and better code than _both_ the
> other atomic_read_volatile() variants. Would need to build allyesconfig
> on lots of diff arch's etc to test the theory though.

I'm sure it would be a tiny effect.

This whole thread is arguing about effects that are quite
insignificant.  On the one hand we have the non-volatile proponents,
who want to let the compiler do extra optimizations - which amounts to
letting it elide maybe a dozen loads in the whole kernel, loads which
would almost always be L1 cache hits.

On the other hand we have the volatile proponents, who are concerned
that some code somewhere in the kernel might be buggy without the
volatile behaviour, and who also want to be able to remove some
barriers and thus save a few bytes of code and a few loads here and
there (and possibly some stores too).

Either way the effect on code size and execution time is miniscule.

In the end the strongest argument is actually that gcc generates
unnecessarily verbose code on x86[-64] for volatile accesses.  Even
then we're only talking about ~2000 bytes, or less than 1 byte per
instance of atomic_read on average, about 0.06% of the kernel text
size.

The x86[-64] developers seem to be willing to bear the debugging cost
involved in having the non-volatile behaviour for atomic_read, in
order to save the 2kB.  That's fine with me.  Either way I think
somebody should audit all the uses of atomic_read, not just for
missing barriers, but also to find the places where it's used in a
racy manner.  Then we can work out where the races matter and fix them
if they do.

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  9:15                                                       ` Nick Piggin
@ 2007-08-17 10:03                                                         ` Satyam Sharma
  2007-08-17 11:50                                                           ` Nick Piggin
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 10:03 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Stefan Richter, paulmck, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > 
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> 
> > > > Sure, now
> > > > that I learned of these properties I can start to audit code and insert
> > > > barriers where I believe they are needed, but this simply means that
> > > > almost all occurrences of atomic_read will get barriers (unless there
> > > > already are implicit but more or less obvious barriers like msleep).
> > > 
> > > You might find that these places that appear to need barriers are
> > > buggy for other reasons anyway. Can you point to some in-tree code
> > > we can have a look at?
> > 
> > 
> > Such code was mentioned elsewhere (query nodemgr_host_thread in cscope)
> > that managed to escape the requirement for a barrier only because of
> > some completely un-obvious compilation-unit-scope thing. But I find such
> > a non-explicit barrier in quite bad taste. Stefan, do consider plunking an
> > explicit call to barrier() there.
> 
> It is very obvious. msleep calls schedule() (ie. sleeps), which is
> always a barrier.

Probably you didn't mean that, but no, schedule() is not a barrier because
it sleeps. It's a barrier because it's invisible.

> The "unobvious" thing is that you wanted to know how the compiler knows
> a function is a barrier -- answer is that if it does not *know* it is not
> a barrier, it must assume it is a barrier.

True, that's clearly what happens here. But you're definitely joking
that this is "obvious" in terms of code-clarity, right?

Just 5 minutes back you mentioned elsewhere you like seeing lots of
explicit calls to barrier() (with comments, no less, hmm? :-)
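
For reference, the shape of the loop in question is roughly the following
(a simplified sketch, not the actual ieee1394 code; "done" is a placeholder
atomic_t):

	while (!atomic_read(&done)) {
		/* msleep_interruptible() is an out-of-line call, so the
		 * compiler cannot cache done.counter across it and has
		 * to reload it each iteration -- the non-explicit
		 * barrier being discussed. */
		msleep_interruptible(100);
	}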

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  8:52                         ` Andi Kleen
@ 2007-08-17 10:08                           ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 10:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Paul Mackerras, Nick Piggin, Segher Boessenkool,
	heiko.carstens, horms, Linux Kernel Mailing List, rpjday, netdev,
	cfriesen, Andrew Morton, jesper.juhl, linux-arch, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang



On Fri, 17 Aug 2007, Andi Kleen wrote:

> On Friday 17 August 2007 05:42, Linus Torvalds wrote:
> > On Fri, 17 Aug 2007, Paul Mackerras wrote:
> > > I'm really surprised it's as much as a few K.  I tried it on powerpc
> > > and it only saved 40 bytes (10 instructions) for a G5 config.
> >
> > One of the things that "volatile" generally screws up is a simple
> >
> > 	volatile int i;
> >
> > 	i++;
> 
> But for atomic_t people use atomic_inc() anyways which does this correctly.
> It shouldn't really matter for atomic_t.
> 
> I'm worrying a bit that the volatile atomic_t change caused subtle code 
> breakage like these delay read loops people here pointed out.

Umm, I followed most of the thread, but which breakage is this?

> Wouldn't it be safer to just re-add the volatile to atomic_read() 
> for 2.6.23? Or alternatively make it asm(), but volatile seems more
> proven.

The problem with volatile is not just trashy code generation (which also
definitely is a major problem), but definition holes, and implementation
inconsistencies. Making it asm() is not the only other alternative to
volatile either (read another reply to this mail), but considering most
of the thread has been about people not wanting even an
atomic_read_volatile() variant, making atomic_read() itself have volatile
semantics sounds ... strange :-)


PS: http://lkml.org/lkml/2007/8/15/407 was submitted a couple of days back,
any word on whether you saw it?

I have another one for you:


[PATCH] i386, x86_64: __const_udelay() should not be marked inline

Because it can never get inlined at any callsite (each translation unit
is compiled separately for the kernel, so the implementation of
__const_udelay() is invisible to all other callsites). In fact, it turns
out that the correctness of the callsites at arch/x86_64/kernel/crash.c:97
and arch/i386/kernel/crash.c:101 explicitly _depends_ upon it not being
inlined, and it is also an exported symbol (modules may want to call
mdelay() and udelay(), which often become __const_udelay() after some
macro-ing in various headers). So let's not mark it as "inline" either.

Signed-off-by: Satyam Sharma <satyam@infradead.org>

---

 arch/i386/lib/delay.c   |    2 +-
 arch/x86_64/lib/delay.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/i386/lib/delay.c b/arch/i386/lib/delay.c
index f6edb11..0082c99 100644
--- a/arch/i386/lib/delay.c
+++ b/arch/i386/lib/delay.c
@@ -74,7 +74,7 @@ void __delay(unsigned long loops)
 	delay_fn(loops);
 }
 
-inline void __const_udelay(unsigned long xloops)
+void __const_udelay(unsigned long xloops)
 {
 	int d0;
 
diff --git a/arch/x86_64/lib/delay.c b/arch/x86_64/lib/delay.c
index 2dbebd3..d0cd9cd 100644
--- a/arch/x86_64/lib/delay.c
+++ b/arch/x86_64/lib/delay.c
@@ -38,7 +38,7 @@ void __delay(unsigned long loops)
 }
 EXPORT_SYMBOL(__delay);
 
-inline void __const_udelay(unsigned long xloops)
+void __const_udelay(unsigned long xloops)
 {
 	__delay(((xloops * HZ * cpu_data[raw_smp_processor_id()].loops_per_jiffy) >> 32) + 1);
 }

^ permalink raw reply related	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  9:15                               ` Nick Piggin
@ 2007-08-17 10:12                                 ` Satyam Sharma
  2007-08-17 12:14                                   ` Nick Piggin
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 10:12 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, Paul Mackerras, Segher Boessenkool,
	heiko.carstens, horms, Linux Kernel Mailing List, rpjday, ak,
	netdev, cfriesen, Andrew Morton, jesper.juhl, linux-arch, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > 
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> 
> > > Also, why would you want to make these insane accessors for atomic_t
> > > types? Just make sure everybody knows the basics of barriers, and they
> > > can apply that knowledge to atomic_t and all other lockless memory
> > > accesses as well.
> > 
> > 
> > Code that looks like:
> > 
> > 	while (!atomic_read(&v)) {
> > 		...
> > 		cpu_relax_no_barrier();
> > 		forget(v.counter);
> > 		        ^^^^^^^^
> > 	}
> > 
> > would be uglier. Also think about code such as:
> 
> I think they would both be equally ugly,

You think both these are equivalent in terms of "looks":

					|
while (!atomic_read(&v)) {		|	while (!atomic_read_xxx(&v)) {
	...				|		...
	cpu_relax_no_barrier();		|		cpu_relax_no_barrier();
	order_atomic(&v);		|	}
}					|

(where order_atomic() is an atomic_t
specific wrapper as you mentioned below)

?

Well, taste varies, but ...

> but the atomic_read_volatile
> variant would be more prone to subtle bugs because of the weird
> implementation.

What bugs?

> And it would be more ugly than introducing an order(x) statement for
> all memory operations, and adding an order_atomic() wrapper for it
> for atomic types.

Oh, that order() / forget() macro [forget() was named such by Chuck Ebbert
earlier in this thread where he first mentioned it, btw] could definitely
be generically introduced for any memory operations.
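
For reference, a rough sketch of what such a single-location clobber could
look like (this is only what was proposed in the thread, not an existing
kernel primitive):

	/* Make the compiler discard its cached value of just this one
	 * object, instead of clobbering all of memory like barrier(). */
	#define forget(x)	asm volatile("" : "+m" (x))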

> > 	a = atomic_read();
> > 	if (!a)
> > 		do_something();
> > 
> > 	forget();
> > 	a = atomic_read();
> > 	... /* some code that depends on value of a, obviously */
> > 
> > 	forget();
> > 	a = atomic_read();
> > 	...
> > 
> > So much explicit sprinkling of "forget()" looks ugly.
> 
> Firstly, why is it ugly? It's nice because of those nice explicit
> statements there that give us a good heads up and would have some
> comments attached to them

atomic_read_xxx (where xxx = whatever naming sounds nice to you) would
obviously also give a heads up, and could also have some comments
attached to it.

> (also, lack of the word "volatile" is always a plus).

Ok, xxx != volatile.

> Secondly, what sort of code would do such a thing?

See the nodemgr_host_thread() that does something similar, though not
exactly the same.

> > on the other hand, looks neater. The "_volatile()" suffix makes it also
> > no less explicit than an explicit barrier-like macro that this primitive
> > is something "special", for code clarity purposes.
> 
> Just don't use the word volatile,

That sounds amazingly frivolous, but hey, why not. As I said, ok,
xxx != volatile.

> and have barriers both before and after the memory operation,

How could that lead to bugs? (Existing code would be great if you can
point to some, but just a testcase / sample code would be fine as well.)

> [...] I don't see
> the point though, when you could just have a single barrier(x)
> barrier function defined for all memory locations,

As I said, barrier() is too heavy-handed.

> rather than
> this odd thing that only works for atomics

Why would it work only for atomics? You could use that generic macro
for anything you damn well please.

> (and would have to
> be duplicated for atomic_set.

#define atomic_set_xxx for something similar. Big deal ... NOT.
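
A minimal sketch of such wrappers, assuming a forget()-style
single-variable clobber as discussed above (none of these names exist in
the tree):

	#define order_atomic(v)		asm volatile("" : "+m" ((v)->counter))

	#define atomic_read_xxx(v)	({ order_atomic(v); atomic_read(v); })

	#define atomic_set_xxx(v, i)	do { atomic_set((v), (i)); order_atomic(v); } while (0)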

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  9:48                               ` Paul Mackerras
@ 2007-08-17 10:23                                 ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 10:23 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Nick Piggin, Linus Torvalds, Segher Boessenkool, heiko.carstens,
	horms, Linux Kernel Mailing List, rpjday, ak, netdev, cfriesen,
	Andrew Morton, jesper.juhl, linux-arch, zlynx, clameter,
	schwidefsky, Chris Snook, Herbert Xu, davem, wensong, wjiang



On Fri, 17 Aug 2007, Paul Mackerras wrote:

> Satyam Sharma writes:
> 
> > I wonder if this'll generate smaller and better code than _both_ the
> > other atomic_read_volatile() variants. Would need to build allyesconfig
> > on lots of diff arch's etc to test the theory though.
> 
> I'm sure it would be a tiny effect.
> 
> This whole thread is arguing about effects that are quite
> insignificant.

Hmm, the fact that this thread became what it did probably means that
most developers on this list do not mind thinking/arguing about effects
or optimizations that are otherwise "tiny". But yeah, they are tiny
nonetheless.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  8:06                                                   ` Nick Piggin
  2007-08-17  8:58                                                     ` Satyam Sharma
@ 2007-08-17 10:48                                                     ` Stefan Richter
  2007-08-17 10:58                                                       ` Stefan Richter
  2007-08-18 14:35                                                     ` LDD3 pitfalls (was Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures) Stefan Richter
  2 siblings, 1 reply; 657+ messages in thread
From: Stefan Richter @ 2007-08-17 10:48 UTC (permalink / raw)
  To: Nick Piggin
  Cc: paulmck, Herbert Xu, Paul Mackerras, Satyam Sharma,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Nick Piggin wrote:
> Stefan Richter wrote:
>> For architecture port authors, there is Documentation/atomic_ops.txt.
>> Driver authors also can learn something from that document, as it
>> indirectly documents the atomic_t and bitops APIs.
> 
> "Semantics and Behavior of Atomic and Bitmask Operations" is
> pretty direct :)

"Indirect", "pretty direct"... It's subjective.

(It is not an API documentation; it is an implementation specification.)

> Sure, it says that it's for arch maintainers, but there is no
> reason why users can't make use of it.
> 
> 
>> Prompted by this thread, I reread this document, and indeed, the
>> sentence "Unlike the above routines, it is required that explicit memory
>> barriers are performed before and after [atomic_{inc,dec}_return]"
>> indicates that atomic_read (one of the "above routines") is very
>> different from all other atomic_t accessors that return values.
>> 
>> This is strange.  Why is it that atomic_read stands out that way?  IMO
> 
> It is not just atomic_read of course. It is atomic_add,sub,inc,dec,set.

Yes, but unlike these, atomic_read returns a value.

Without me (the API user) providing extra barriers, that value may
become something else whenever someone touches code in the vicinity of
the atomic_read.

>> this API imbalance is quite unexpected by many people.  Wouldn't it be
>> beneficial to change the atomic_read API to behave the same like all
>> other atomic_t accessors that return values?
> 
> It is very consistent and well defined. Operations which both modify
> the data _and_ return something are defined to have full barriers
> before and after.

You are right, atomic_read is not only different from accessors that
don't return values, it is also different from all other accessors that
return values (because they all also modify the value).  There is just
no actual API documentation, which contributes to the issue that some
people (or at least one: me) learn a little bit late how special
atomic_read is.

> What do you want to add to the other atomic accessors? Full memory
> barriers? Only compiler barriers? It's quite likely that if you think
> some barriers will fix bugs, then there are other bugs lurking there
> anyway.

A lot of different though related issues are discussed in this thread,
but I personally am only occupied by one particular thing:  What kind of
return values do I get from atomic_read.

> Just use spinlocks if you're not absolutely clear about potential
> races and memory ordering issues -- they're pretty cheap and simple.

Probably good advice, like generally if driver guys consider lockless
algorithms.

>> OK, it is also different from the other accessors that return data in so
>> far as it doesn't modify the data.  But as driver "author", i.e. user of
>> the API, I can't see much use of an atomic_read that can be reordered
>> and, more importantly, can be optimized away by the compiler.
> 
> It will return to you an atomic snapshot of the data (loaded from
> memory at some point since the last compiler barrier). All you have
> to be aware of compiler barriers and the Linux SMP memory ordering
> model, which should be a given if you are writing lockless code.

OK, that's what I slowly realized during this discussion, and I
appreciate the explanations that were given here.

>> Sure, now
>> that I learned of these properties I can start to audit code and insert
>> barriers where I believe they are needed, but this simply means that
>> almost all occurrences of atomic_read will get barriers (unless there
>> already are implicit but more or less obvious barriers like msleep).
> 
> You might find that these places that appear to need barriers are
> buggy for other reasons anyway. Can you point to some in-tree code
> we can have a look at?

I could, or could not, if I were through with auditing the code.  I
remembered one case and posted it (nodemgr_host_thread) which was safe
because msleep_interruptible provided the necessary barrier there, and
this implicit barrier is not in danger of being removed by future patches.
-- 
Stefan Richter
-=====-=-=== =--- =---=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  9:31                                                       ` Nick Piggin
@ 2007-08-17 10:55                                                         ` Satyam Sharma
  2007-08-17 12:39                                                           ` Nick Piggin
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 10:55 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Herbert Xu, Paul Mackerras, Linus Torvalds, Christoph Lameter,
	Chris Snook, Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > [...]
> > The point is about *author expecations*. If people do expect atomic_read()
> > (or a variant thereof) to have volatile semantics, why not give them such
> > a variant?
> 
> Because they should be thinking about them in terms of barriers, over
> which the compiler / CPU is not to reorder accesses or cache memory
> operations, rather than "special" "volatile" accesses.

This is obviously just a taste thing: whether to have that forget(x)
barrier as something the author explicitly sprinkles in appropriate
places in the code himself, or to use a primitive that includes it.

I'm not saying "taste matters aren't important" (they are), but I'm really
skeptical that most folks would find the former tasteful.

> > And by the way, the point is *also* about the fact that cpu_relax(), as
> > of today, implies a full memory clobber, which is not what a lot of such
> > loops want. (due to stuff mentioned elsewhere, summarized in that summary)
> 
> That's not the point,

That's definitely the point, why not. This is why "barrier()", being
heavy-handed, is not the best option.

> because as I also mentioned, the logical extention
> to Linux's barrier API to handle this is the order(x) macro. Again, not
> special volatile accessors.

Sure, that forget(x) macro _is_ proposed to be made part of the generic
API. That doesn't explain why not to define/use primitives that have
volatility semantics in themselves, though (taste matters aside).

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 10:48                                                     ` Stefan Richter
@ 2007-08-17 10:58                                                       ` Stefan Richter
  0 siblings, 0 replies; 657+ messages in thread
From: Stefan Richter @ 2007-08-17 10:58 UTC (permalink / raw)
  To: Nick Piggin
  Cc: paulmck, Herbert Xu, Paul Mackerras, Satyam Sharma,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

I wrote:
> Nick Piggin wrote:
>> You might find that these places that appear to need barriers are
>> buggy for other reasons anyway. Can you point to some in-tree code
>> we can have a look at?
> 
> I could, or could not, if I were through with auditing the code.  I
> remembered one case and posted it (nodemgr_host_thread) which was safe
> because msleep_interruptible provided the necessary barrier there, and
> this implicit barrier is not in danger of being removed by future patches.

PS, just in case anybody holds his breath for more example code from me,
I don't plan to continue with an actual audit of the drivers I maintain.
It's an important issue, but my current time budget will restrict me to
look at it ad hoc, per case.  (Open bugs have higher priority than
potential bugs.)
-- 
Stefan Richter
-=====-=-=== =--- =---=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  8:38                                                   ` Nick Piggin
  2007-08-17  9:14                                                     ` Satyam Sharma
@ 2007-08-17 11:08                                                     ` Stefan Richter
  1 sibling, 0 replies; 657+ messages in thread
From: Stefan Richter @ 2007-08-17 11:08 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Satyam Sharma, Herbert Xu, Paul Mackerras, Linus Torvalds,
	Christoph Lameter, Chris Snook, Ilpo Jarvinen, Paul E. McKenney,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Nick Piggin wrote:
> Satyam Sharma wrote:
>> And we have driver / subsystem maintainers such as Stefan
>> coming up and admitting that often a lot of code that's written to use
>> atomic_read() does assume the read will not be elided by the compiler.
> 
> So these are broken on i386 and x86-64?

The ieee1394 and firewire subsystems have open, undiagnosed bugs, also
on i386 and x86-64.  But whether there is any bug because of wrong
assumptions about atomic_read among them, I don't know.  I don't know
which assumptions the authors made, I only know that I wasn't aware of
all the properties of atomic_read until now.

> Are they definitely safe on SMP and weakly ordered machines with
> just a simple compiler barrier there? Because I would not be
> surprised if there are a lot of developers who don't really know
> what to assume when it comes to memory ordering issues.
> 
> This is not a dig at driver writers: we still have memory ordering
> problems in the VM too (and probably most of the subtle bugs in
> lockless VM code are memory ordering ones). Let's not make up a
> false sense of security and hope that sprinkling volatile around
> will allow people to write bug-free lockless code. If a writer
> can't be bothered reading API documentation

...or, if there is none, the implementation specification (as in the case
of the atomic ops), or, if there is none, the implementation (as in the
case of some infrastructure code here and there)...

> and learning the Linux memory model, they can still be productive
> writing safely locked code.

Provided they are aware that they might not have the full picture of the
lockless primitives.  :-)
-- 
Stefan Richter
-=====-=-=== =--- =---=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 10:03                                                         ` Satyam Sharma
@ 2007-08-17 11:50                                                           ` Nick Piggin
  2007-08-17 12:50                                                             ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-17 11:50 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, paulmck, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Satyam Sharma wrote:

>
>On Fri, 17 Aug 2007, Nick Piggin wrote:
>
>
>>Satyam Sharma wrote:
>>
>>It is very obvious. msleep calls schedule() (ie. sleeps), which is
>>always a barrier.
>>
>
>Probably you didn't mean that, but no, schedule() is not barrier because
>it sleeps. It's a barrier because it's invisible.
>

Where did I say it is a barrier because it sleeps?

It is always a barrier because, at the lowest level, schedule() (and thus
anything that sleeps) is defined to always be a barrier. Regardless of
whatever obscure means the compiler might need to infer the barrier.

In other words, you can ignore those obscure details because schedule() is
always going to have an explicit barrier in it.


>>The "unobvious" thing is that you wanted to know how the compiler knows
>>a function is a barrier -- answer is that if it does not *know* it is not
>>a barrier, it must assume it is a barrier.
>>
>
>True, that's clearly what happens here. But are you're definitely joking
>that this is "obvious" in terms of code-clarity, right?
>

No. If you accept that barrier() is implemented correctly, and you know
that sleeping is defined to be a barrier, then it's perfectly clear. You
don't have to know how the compiler "knows" that some function contains
a barrier.


>Just 5 minutes back you mentioned elsewhere you like seeing lots of
>explicit calls to barrier() (with comments, no less, hmm? :-)
>

Sure, but there are well-known primitives which contain barriers, and
trivially recognisable code sequences for which you don't need comments.
Waiting loops using sleeps or cpu_relax() are prime examples.
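
For instance, the familiar pattern below needs no extra annotation
("flag" is just a placeholder):

	while (!atomic_read(&flag))
		cpu_relax();	/* contains a compiler barrier, so the flag
				 * is re-read from memory on every iteration */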

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 10:12                                 ` Satyam Sharma
@ 2007-08-17 12:14                                   ` Nick Piggin
  2007-08-17 13:05                                     ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-17 12:14 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Linus Torvalds, Paul Mackerras, Segher Boessenkool,
	heiko.carstens, horms, Linux Kernel Mailing List, rpjday, ak,
	netdev, cfriesen, Andrew Morton, jesper.juhl, linux-arch, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

Satyam Sharma wrote:

>
>On Fri, 17 Aug 2007, Nick Piggin wrote:
>
>>I think they would both be equally ugly,
>>
>
>You think both these are equivalent in terms of "looks":
>
>					|
>while (!atomic_read(&v)) {		|	while (!atomic_read_xxx(&v)) {
>	...				|		...
>	cpu_relax_no_barrier();		|		cpu_relax_no_barrier();
>	order_atomic(&v);		|	}
>}					|
>
>(where order_atomic() is an atomic_t
>specific wrapper as you mentioned below)
>
>?
>

I think the LHS is better if your atomic_read_xxx primitive is using the
crazy one-sided barrier, because with the LHS code you immediately know
what barriers are happening, and with the RHS you have to look at the
atomic_read_xxx definition.

If your atomic_read_xxx implementation were more intuitive, then both are
pretty well equal. More lines != ugly code.


>>but the atomic_read_volatile
>>variant would be more prone to subtle bugs because of the weird
>>implementation.
>>
>
>What bugs?
>

You can't think for yourself? Your atomic_read_volatile contains a compiler
barrier for the atomic variable before the load. Two such reads from
different locations look like this:

asm volatile("" : "+m" (v1));
atomic_read(&v1);
asm volatile("" : "+m" (v2));
atomic_read(&v2);

Which implies that the load of v1 can be reordered to occur after the load
of v2. Bet you didn't expect that?

>>Secondly, what sort of code would do such a thing?
>>
>
>See the nodemgr_host_thread() that does something similar, though not
>exactly same.
>

I'm sorry, all this waffling about made up code which might do this and
that is just a waste of time. Seriously, the thread is bloated enough
and never going to get anywhere with all this handwaving. If someone is
saving up all the really real and actually good arguments for why we
must have a volatile here, now is the time to use them.

>>and have barriers both before and after the memory operation,
>>
>
>How could that lead to bugs? (if you can point to existing code,
>but just some testcase / sample code would be fine as well).
>

See above.

>As I said, barrier() is too heavy-handed.
>

Typo. I meant: defined for a single memory location (ie. order(x)).


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 10:55                                                         ` Satyam Sharma
@ 2007-08-17 12:39                                                           ` Nick Piggin
  2007-08-17 13:36                                                             ` Satyam Sharma
  2007-08-17 16:48                                                             ` Linus Torvalds
  0 siblings, 2 replies; 657+ messages in thread
From: Nick Piggin @ 2007-08-17 12:39 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Herbert Xu, Paul Mackerras, Linus Torvalds, Christoph Lameter,
	Chris Snook, Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Satyam Sharma wrote:

>
>On Fri, 17 Aug 2007, Nick Piggin wrote:
>
>
>>Because they should be thinking about them in terms of barriers, over
>>which the compiler / CPU is not to reorder accesses or cache memory
>>operations, rather than "special" "volatile" accesses.
>>
>
>This is obviously just a taste thing. Whether to have that forget(x)
>barrier as something author should explicitly sprinkle appropriately
>in appropriate places in the code by himself or use a primitive that
>includes it itself.
>

That's not obviously just taste to me. Not when the primitive has many
(perhaps, the majority) of uses that do not require said barriers. And
this is not solely about the code generation (which, as Paul says, is
relatively minor even on x86). I prefer people to think explicitly
about barriers in their lockless code.


>I'm not saying "taste matters aren't important" (they are), but I'm really
>skeptical if most folks would find the former tasteful.
>

So I /do/ have better taste than most folks? Thanks! :-)


>>>And by the way, the point is *also* about the fact that cpu_relax(), as
>>>of today, implies a full memory clobber, which is not what a lot of such
>>>loops want. (due to stuff mentioned elsewhere, summarized in that summary)
>>>
>>That's not the point,
>>
>
>That's definitely the point, why not. This is why "barrier()", being
>heavy-handed, is not the best option.
>

That is _not_ the point (of why a volatile atomic_read is good) because
there has already been an alternative posted that better conforms with
Linux barrier API and is much more widely useful and more usable. If you
are so worried about barrier() being too heavyweight, then you're off to
a poor start by wanting to add a few K of kernel text by making
atomic_read volatile.


>>because as I also mentioned, the logical extention
>>to Linux's barrier API to handle this is the order(x) macro. Again, not
>>special volatile accessors.
>>
>
>Sure, that forget(x) macro _is_ proposed to be made part of the generic
>API. Doesn't explain why not to define/use primitives that has volatility
>semantics in itself, though (taste matters apart).
>

If you follow the discussion.... You were thinking of a reason why the
semantics *should* be changed or added, and I was rebutting your argument
that it must be used when a full barrier() is too heavy (ie. by pointing
out that order() has superior semantics anyway).

Why do I keep repeating the same things? I'll not continue bloating this
thread until a new valid point comes up...

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 11:50                                                           ` Nick Piggin
@ 2007-08-17 12:50                                                             ` Satyam Sharma
  2007-08-17 12:56                                                               ` Nick Piggin
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 12:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Stefan Richter, paulmck, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> > > Satyam Sharma wrote:
> > > 
> > > It is very obvious. msleep calls schedule() (ie. sleeps), which is
> > > always a barrier.
> > 
> > Probably you didn't mean that, but no, schedule() is not barrier because
> > it sleeps. It's a barrier because it's invisible.
> 
> Where did I say it is a barrier because it sleeps?

Just below. What you wrote:

> It is always a barrier because, at the lowest level, schedule() (and thus
> anything that sleeps) is defined to always be a barrier.

"It is always a barrier because, at the lowest level, anything that sleeps
is defined to always be a barrier".


> Regardless of
> whatever obscure means the compiler might need to infer the barrier.
> 
> In other words, you can ignore those obscure details because schedule() is
> always going to have an explicit barrier in it.

I didn't quite understand what you said here, so I'll tell what I think:

* foo() is a compiler barrier if the definition of foo() is invisible to
  the compiler at a callsite.

* foo() is also a compiler barrier if the definition of foo() includes
  a barrier, and it is inlined at the callsite.

If the above is wrong, or if there's something else at play as well,
do let me know.
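
A tiny example of the first case, with made-up names (other_unit_fn()
stands for any function defined in a different translation unit):

	extern void other_unit_fn(void);

	static int example(atomic_t *v)
	{
		int a = atomic_read(v);

		other_unit_fn();		/* opaque call: implicit compiler barrier */
		return a + atomic_read(v);	/* v->counter must be reloaded here */
	}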

> > > The "unobvious" thing is that you wanted to know how the compiler knows
> > > a function is a barrier -- answer is that if it does not *know* it is not
> > > a barrier, it must assume it is a barrier.
> > 
> > True, that's clearly what happens here. But are you're definitely joking
> > that this is "obvious" in terms of code-clarity, right?
> 
> No. If you accept that barrier() is implemented correctly, and you know
> that sleeping is defined to be a barrier,

Curiously, that's the second time you've said "sleeping is defined to
be a (compiler) barrier". How does the compiler even know if foo() is
a function that "sleeps"? Do compilers have some notion of "sleeping"
to ensure they automatically assume a compiler barrier whenever such
a function is called? Or are you saying that the compiler can see the
barrier() inside said function ... nopes, you're saying quite the
opposite below.


> then its perfectly clear. You
> don't have to know how the compiler "knows" that some function contains
> a barrier.

I think I do, why not? I'd appreciate it if you could elaborate on this.


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 12:50                                                             ` Satyam Sharma
@ 2007-08-17 12:56                                                               ` Nick Piggin
  2007-08-18  2:15                                                                 ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Nick Piggin @ 2007-08-17 12:56 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, paulmck, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Satyam Sharma wrote:

>
>On Fri, 17 Aug 2007, Nick Piggin wrote:
>
>
>>Satyam Sharma wrote:
>>
>>>On Fri, 17 Aug 2007, Nick Piggin wrote:
>>>
>>>>Satyam Sharma wrote:
>>>>
>>>>It is very obvious. msleep calls schedule() (ie. sleeps), which is
>>>>always a barrier.
>>>>
>>>Probably you didn't mean that, but no, schedule() is not barrier because
>>>it sleeps. It's a barrier because it's invisible.
>>>
>>Where did I say it is a barrier because it sleeps?
>>
>
>Just below. What you wrote:
>
>
>>It is always a barrier because, at the lowest level, schedule() (and thus
>>anything that sleeps) is defined to always be a barrier.
>>
>
>"It is always a barrier because, at the lowest level, anything that sleeps
>is defined to always be a barrier".
>

... because it must call schedule and schedule is a barrier.


>>Regardless of
>>whatever obscure means the compiler might need to infer the barrier.
>>
>>In other words, you can ignore those obscure details because schedule() is
>>always going to have an explicit barrier in it.
>>
>
>I didn't quite understand what you said here, so I'll tell what I think:
>
>* foo() is a compiler barrier if the definition of foo() is invisible to
>  the compiler at a callsite.
>
>* foo() is also a compiler barrier if the definition of foo() includes
>  a barrier, and it is inlined at the callsite.
>
>If the above is wrong, or if there's something else at play as well,
>do let me know.
>

Right.


>>>>The "unobvious" thing is that you wanted to know how the compiler knows
>>>>a function is a barrier -- answer is that if it does not *know* it is not
>>>>a barrier, it must assume it is a barrier.
>>>>
>>>True, that's clearly what happens here. But are you're definitely joking
>>>that this is "obvious" in terms of code-clarity, right?
>>>
>>No. If you accept that barrier() is implemented correctly, and you know
>>that sleeping is defined to be a barrier,
>>
>
>Curiously, that's the second time you've said "sleeping is defined to
>be a (compiler) barrier".
>

_In Linux,_ sleeping is defined to be a compiler barrier.

>How does the compiler even know if foo() is
>a function that "sleeps"? Do compilers have some notion of "sleeping"
>to ensure they automatically assume a compiler barrier whenever such
>a function is called? Or are you saying that the compiler can see the
>barrier() inside said function ... nopes, you're saying quite the
>opposite below.
>

You're getting too worried about the compiler implementation. Start
by assuming that it does work ;)


>>then its perfectly clear. You
>>don't have to know how the compiler "knows" that some function contains
>>a barrier.
>>
>
>I think I do, why not? Would appreciate if you could elaborate on this.
>

If a function is not completely visible to the compiler (so it can't
determine whether a barrier could be in it or not), then it must always
assume it will contain a barrier so it always does the right thing.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 12:14                                   ` Nick Piggin
@ 2007-08-17 13:05                                     ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 13:05 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linus Torvalds, Paul Mackerras, Segher Boessenkool,
	heiko.carstens, horms, Linux Kernel Mailing List, rpjday, ak,
	netdev, cfriesen, Andrew Morton, jesper.juhl, linux-arch, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> [...]
> > You think both these are equivalent in terms of "looks":
> > 
> > 					|
> > while (!atomic_read(&v)) {		|	while (!atomic_read_xxx(&v)) {
> > 	...				|		...
> > 	cpu_relax_no_barrier();		|
> > cpu_relax_no_barrier();
> > 	order_atomic(&v);		|	}
> > }					|
> > 
> > (where order_atomic() is an atomic_t
> > specific wrapper as you mentioned below)
> > 
> > ?
> 
> I think the LHS is better if your atomic_read_xxx primitive is using the
> crazy one-sided barrier,
  ^^^^^

I'd say it's purposefully one-sided.

> because the LHS code you immediately know what
> barriers are happening, and with the RHS you have to look at the
> atomic_read_xxx
> definition.

No. As I said, the _xxx (whatever the heck you want to name it as) should
give the same heads-up that your "order_atomic" thing is supposed to give.


> If your atomic_read_xxx implementation was more intuitive, then both are
> pretty well equal. More lines != ugly code.
> 
> > [...]
> > What bugs?
> 
> You can't think for yourself? Your atomic_read_volatile contains a compiler
> barrier to the atomic variable before the load. 2 such reads from different
> locations look like this:
> 
> asm volatile("" : "+m" (v1));
> atomic_read(&v1);
> asm volatile("" : "+m" (v2));
> atomic_read(&v2);
> 
> Which implies that the load of v1 can be reordered to occur after the load
> of v2.

And how would that be a bug? (sorry, I really can't think for myself)


> > > Secondly, what sort of code would do such a thing?
> > 
> > See the nodemgr_host_thread() that does something similar, though not
> > exactly same.
> 
> I'm sorry, all this waffling about made up code which might do this and
> that is just a waste of time.

First, you could try looking at the code.

And by the way, as I've already said (why do you *require* people to have to
repeat things to you?) this isn't even about only existing code.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 12:39                                                           ` Nick Piggin
@ 2007-08-17 13:36                                                             ` Satyam Sharma
  2007-08-17 16:48                                                             ` Linus Torvalds
  1 sibling, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 13:36 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Herbert Xu, Paul Mackerras, Linus Torvalds, Christoph Lameter,
	Chris Snook, Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> 
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> > 
> > > Because they should be thinking about them in terms of barriers, over
> > > which the compiler / CPU is not to reorder accesses or cache memory
> > > operations, rather than "special" "volatile" accesses.
> > 
> > This is obviously just a taste thing. Whether to have that forget(x)
> > barrier as something author should explicitly sprinkle appropriately
> > in appropriate places in the code by himself or use a primitive that
> > includes it itself.
> 
> That's not obviously just taste to me. Not when the primitive has many
> (perhaps, the majority) of uses that do not require said barriers. And
> this is not solely about the code generation (which, as Paul says, is
> relatively minor even on x86).

See, you do *require* people to have to repeat the same things to you!

As has been written about enough times already, and if you followed the
discussion on this thread, I am *not* proposing that atomic_read()'s
semantics be changed to have any extra barriers. What is proposed is a
different atomic_read_xxx() variant thereof, that those can use who do
want that.

Now whether to have a kind of barrier ("volatile", whatever) in the
atomic_read_xxx() itself, or whether to make the code writer himself to
explicitly write the order(x) appropriately in appropriate places in the
code _is_ a matter of taste.


> > That's definitely the point, why not. This is why "barrier()", being
> > heavy-handed, is not the best option.
> 
> That is _not_ the point [...]

Again, you're requiring me to repeat things that were already made evident
on this thread (if you follow it).

This _is_ the point, because a lot of loops out there (too many of them,
I WILL NOT bother citing file_name:line_number) end up having to use a
barrier just because they're using a loop-exit-condition that depends
on a value returned by atomic_read(). It would be good for them if they
used an atomic_read_xxx() primitive that gave these "volatility" semantics
without junking compiler optimizations for other memory references.

> because there has already been an alternative posted

Whether that alternative (explicitly using forget(x), or wrappers thereof,
such as the "order_atomic" you proposed) is better than other alternatives
(such as atomic_read_xxx() which includes the volatility behaviour in
itself) is still open, and precisely what we started discussing just one
mail back.

(The above was also mostly stuff I had to repeat for you, sadly.)

> that better conforms with Linux barrier
> API and is much more widely useful and more usable.

I don't think so.

(Now *this* _is_ the "taste-dependent matter" that I mentioned earlier.)

> If you are so worried
> about
> barrier() being too heavyweight, then you're off to a poor start by wanting to
> add a few K of kernel text by making atomic_read volatile.

Repeating myself, for the N'th time, NO, I DON'T want to make atomic_read
have "volatile" semantics.

> > > because as I also mentioned, the logical extention
> > > to Linux's barrier API to handle this is the order(x) macro. Again, not
> > > special volatile accessors.
> > 
> > Sure, that forget(x) macro _is_ proposed to be made part of the generic
> > API. Doesn't explain why not to define/use primitives that has volatility
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > semantics in itself, though (taste matters apart).
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> If you follow the discussion.... You were thinking of a reason why the
> semantics *should* be changed or added, and I was rebutting your argument
> that it must be used when a full barrier() is too heavy (ie. by pointing
> out that order() has superior semantics anyway).

Amazing. Either you have reading comprehension problems, or else, please
try reading this thread (or at least this sub-thread) again. I don't want
_you_ blaming _me_ for having to repeat things to you all over again.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  7:39                                                   ` Satyam Sharma
@ 2007-08-17 14:31                                                     ` Paul E. McKenney
  2007-08-17 18:31                                                       ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17 14:31 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Herbert Xu, Stefan Richter, Paul Mackerras, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote:
> 
> 
> On Thu, 16 Aug 2007, Paul E. McKenney wrote:
> 
> > On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote:
> > > On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote:
> > > >
> > > > The compiler can also reorder non-volatile accesses.  For an example
> > > > patch that cares about this, please see:
> > > > 
> > > > 	http://lkml.org/lkml/2007/8/7/280
> > > > 
> > > > This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
> > > > rcu_read_unlock() to ensure that accesses aren't reordered with respect
> > > > to interrupt handlers and NMIs/SMIs running on that same CPU.
> > > 
> > > Good, finally we have some code to discuss (even though it's
> > > not actually in the kernel yet).
> > 
> > There was some earlier in this thread as well.
> 
> Hmm, I never quite got what all this interrupt/NMI/SMI handling and
> RCU business you mentioned earlier was all about, but now that you've
> pointed to the actual code and issues with it ...

Glad to help...

> > > First of all, I think this illustrates that what you want
> > > here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
> > > macro occurs a lot more times in your patch than atomic
> > > reads/sets.  So *assuming* that it was necessary at all,
> > > then having an ordered variant of the atomic_read/atomic_set
> > > ops could do just as well.
> > 
> > Indeed.  If I could trust atomic_read()/atomic_set() to cause the compiler
> > to maintain ordering, then I could just use them instead of having to
> > create an  ORDERED_WRT_IRQ().  (Or ACCESS_ONCE(), as it is called in a
> > different patch.)
> 
> +#define WHATEVER(x)	(*(volatile typeof(x) *)&(x))
> 
> I suppose one could want volatile access semantics for stuff that's
> a bit-field too, no?

One could, but this is not supported in general.  So if you want that,
you need to use the usual bit-mask tricks and (for setting) atomic
operations.

> Also, this gives *zero* "re-ordering" guarantees that your code wants
> (as you've explained below) -- neither w.r.t. CPU re-ordering (which
> you probably don't care about) *nor* w.r.t. compiler re-ordering
> (which you definitely _do_ care about).

You are correct about CPU re-ordering (and about the fact that this
example doesn't care about it), but not about compiler re-ordering.

The compiler is prohibited from moving a volatile access across a sequence
point.  One example of a sequence point is a statement boundary.  Because
all of the volatile accesses in this code are separated by statement
boundaries, a conforming compiler is prohibited from reordering them.
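
As a small illustration of that argument (the names are made up; this is
not code from the patch):

	static volatile int x, y;

	void example(int *a, int *b)
	{
		*a = x;		/* volatile access; the statement boundary
				 * below is a sequence point ... */
		*b = y;		/* ... so this volatile access may not be
				 * moved above the one before it */
	}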

> > > However, I still don't know which atomic_read/atomic_set in
> > > your patch would be broken if there were no volatile.  Could
> > > you please point them out?
> > 
> > Suppose I tried replacing the ORDERED_WRT_IRQ() calls with
> > atomic_read() and atomic_set().  Starting with __rcu_read_lock():
> > 
> > o	If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++"
> > 	was ordered by the compiler after
> > 	"ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then
> > 	suppose an NMI/SMI happened after the rcu_read_lock_nesting but
> > 	before the rcu_flipctr.
> > 
> > 	Then if there was an rcu_read_lock() in the SMI/NMI
> > 	handler (which is perfectly legal), the nested rcu_read_lock()
> > 	would believe that it could take the then-clause of the
> > 	enclosing "if" statement.  But because the rcu_flipctr per-CPU
> > 	variable had not yet been incremented, an RCU updater would
> > 	be within its rights to assume that there were no RCU reads
> > 	in progress, thus possibly yanking a data structure out from
> > 	under the reader in the SMI/NMI function.
> > 
> > 	Fatal outcome.  Note that only one CPU is involved here
> > 	because these are all either per-CPU or per-task variables.
> 
> Ok, so you don't care about CPU re-ordering. Still, I should let you know
> that your ORDERED_WRT_IRQ() -- bad name, btw -- is still buggy. What you
> want is a full compiler optimization barrier().

No.  See above.

> [ Your code probably works now, and emits correct code, but that's
>   just because gcc did what it did. Nothing in any standard,
>   or in any documented behaviour of gcc, or anything about the real
>   (or expected) semantics of "volatile" is protecting the code here. ]

Really?  Why doesn't the prohibition against moving volatile accesses
across sequence points take care of this?

> > o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1"
> > 	was ordered by the compiler to follow the
> > 	"ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx", and an NMI/SMI
> > 	happened between the two, then an __rcu_read_lock() in the NMI/SMI
> > 	would incorrectly take the "else" clause of the enclosing "if"
> > 	statement.  If some other CPU flipped the rcu_ctrlblk.completed
> > 	in the meantime, then the __rcu_read_lock() would (correctly)
> > 	write the new value into rcu_flipctr_idx.
> > 
> > 	Well and good so far.  But the problem arises in
> > 	__rcu_read_unlock(), which then decrements the wrong counter.
> > 	Depending on exactly how subsequent events played out, this could
> > 	result in either prematurely ending grace periods or never-ending
> > 	grace periods, both of which are fatal outcomes.
> > 
> > And the following are not needed in the current version of the
> > patch, but will be in a future version that either avoids disabling
> > irqs or that dispenses with the smp_read_barrier_depends() that I
> > have 99% convinced myself is unneeded:
> > 
> > o	nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting);
> > 
> > o	idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) & 0x1;
> > 
> > Furthermore, in that future version, irq handlers can cause the same
> > mischief that SMI/NMI handlers can in this version.
> > 
> > Next, looking at __rcu_read_unlock():
> > 
> > o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting - 1"
> > 	was reordered by the compiler to follow the
> > 	"ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])--",
> > 	then if an NMI/SMI containing an rcu_read_lock() occurs between
> > 	the two, this nested rcu_read_lock() would incorrectly believe
> > 	that it was protected by an enclosing RCU read-side critical
> > 	section as described in the first reversal discussed for
> > 	__rcu_read_lock() above.  Again, fatal outcome.
> > 
> > This is what we have now.  It is not hard to imagine situations that
> > interact with -both- interrupt handlers -and- other CPUs, as described
> > earlier.
> 
> It's not about interrupt/SMI/NMI handlers at all! What you clearly want,
> simply put, is that a certain stream of C statements must be emitted
> by the compiler _as they are_ with no re-ordering optimizations! You must
> *definitely* use barrier(), IMHO.

Almost.  I don't care about most of the operations, only about the loads
and stores marked volatile.  Again, although the compiler is free to
reorder volatile accesses that occur -within- a single statement, it
is prohibited by the standard from moving volatile accesses from one
statement to another.  Therefore, this code can legitimately use volatile.

Or am I missing something subtle?

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 12:39                                                           ` Nick Piggin
  2007-08-17 13:36                                                             ` Satyam Sharma
@ 2007-08-17 16:48                                                             ` Linus Torvalds
  2007-08-17 18:50                                                               ` Chris Friesen
                                                                                 ` (2 more replies)
  1 sibling, 3 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-08-17 16:48 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Satyam Sharma, Herbert Xu, Paul Mackerras, Christoph Lameter,
	Chris Snook, Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Nick Piggin wrote:
> 
> That's not obviously just taste to me. Not when the primitive has many
> (perhaps, the majority) of uses that do not require said barriers. And
> this is not solely about the code generation (which, as Paul says, is
> relatively minor even on x86). I prefer people to think explicitly
> about barriers in their lockless code.

Indeed.

I think the important issues are:

 - "volatile" itself is simply a badly/weakly defined issue. The semantics 
   of it as far as the compiler is concerned are really not very good, and 
   in practice tends to boil down to "I will generate so bad code that 
   nobody can accuse me of optimizing anything away".

 - "volatile" - regardless of how well or badly defined it is - is purely 
   a compiler thing. It has absolutely no meaning for the CPU itself, so 
   it at no point implies any CPU barriers. As a result, even if the 
   compiler generates crap code and doesn't re-order anything, there's 
   nothing that says what the CPU will do.

 - in other words, the *only* possible meaning for "volatile" is a purely 
   single-CPU meaning. And if you only have a single CPU involved in the 
   process, the "volatile" is by definition pointless (because even 
   without a volatile, the compiler is required to make the C code appear 
   consistent as far as a single CPU is concerned).

So, let's take the example *buggy* code where we use "volatile" to wait 
for other CPU's:

	atomic_set(&var, 0);
	while (!atomic_read(&var))
		/* nothing */;


which generates an endless loop if we don't have atomic_read() imply 
volatile.

The point here is that it's buggy whether the volatile is there or not! 
Exactly because the user expects multi-processing behaviour, but 
"volatile" doesn't actually give any real guarantees about it. Another CPU 
may have done:

	external_ptr = kmalloc(..);
	/* Setup is now complete, inform the waiter */
	atomic_inc(&var);

but the fact is, since the other CPU isn't serialized in any way, the 
"while-loop" (even in the presense of "volatile") doesn't actually work 
right! Whatever the "atomic_read()" was waiting for may not have 
completed, because we have no barriers!
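
For comparison, a sketch of how that pairing would have to look with
explicit barriers (use() and sz are placeholders; atomic_inc() and
atomic_read() themselves imply no barriers):

	/* writer */
	external_ptr = kmalloc(sz, GFP_KERNEL);
	/* ... set up *external_ptr ... */
	smp_wmb();			/* publish the setup before the flag */
	atomic_inc(&var);

	/* waiter */
	while (!atomic_read(&var))
		cpu_relax();		/* compiler barrier: forces a re-read */
	smp_rmb();			/* order the flag read before the reads
					 * of the data it publishes */
	use(external_ptr);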

So if "volatile" makes a difference, it is invariably a sign of a bug in 
serialization (the one exception is for IO - we use "volatile" to avoid 
having to use inline asm for IO on x86) - and for "random values" like 
jiffies).

So the question should *not* be whether "volatile" actually fixes bugs. It 
*never* fixes a bug. But what it can do is hide the obvious ones. In 
other words, adding a volatile in the above kind of situation with 
"atomic_read()" will certainly turn an obvious bug into something that 
works "practically all of the time".

So anybody who argues for "volatile" fixing bugs is fundamentally 
incorrect. It does NO SUCH THING. By arguing that, such people only show 
that they have no idea what they are talking about.

So the only reason to add back "volatile" to the atomic_read() sequence is 
not to fix bugs, but to _hide_ the bugs better. They're still there, they 
are just a lot harder to trigger, and tend to be a lot subtler.

And hey, sometimes "hiding bugs well enough" is ok. In this case, I'd 
argue that we've successfully *not* had the volatile there for eight 
months on x86-64, and that should tell people something. 

(Does _removing_ the volatile fix bugs? No - callers still need to think 
about barriers etc, and lots of people don't. So I'm not claiming that 
removing volatile fixes any bugs either, but I *am* claiming that:

 - removing volatile makes some bugs easier to see (which is mostly a good 
   thing: they were there before, anyway).

 - removing volatile generates better code (which is a good thing, even if 
   it's just 0.1%)

 - removing volatile removes a huge mental *bug* that lots of people seem 
   to have, as shown by this whole thread. Anybody who thinks that 
   "volatile" actually fixes anything has a gaping hole in their head, and 
   we should remove volatile just to make sure that nobody thinks that it 
   means something that it doesn't mean!

In other words, this whole discussion has just convinced me that we should 
*not* add back "volatile" to "atomic_read()" - I was willing to do it for 
practical and "hide the bugs" reasons, but having seen people argue for 
it, thinking that it actually fixes something, I'm now convinced that the 
*last* thing we should do is to encourage that kind of superstitious 
thinking.

"volatile" is like a black cat crossing the road. Sure, it affects 
*something* (at a minimum: before, the black cat was on one side of the 
road, afterwards it is on the other side of the road), but it has no 
bigger and longer-lasting direct effects. 

People who think "volatile" really matters are just fooling themselves.

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  2:19                   ` Nick Piggin
  2007-08-17  3:16                     ` Paul Mackerras
@ 2007-08-17 17:37                     ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 17:37 UTC (permalink / raw)
  To: Nick Piggin
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx, satyam,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem, wensong,
	wjiang

>>>>>> Part of the motivation here is to fix heisenbugs.  If I knew 
>>>>>> where they
>>>>>
>>>>> By the same token we should probably disable optimisations
>>>>> altogether since that too can create heisenbugs.
>>>>
>>>> Almost everything is a tradeoff; and so is this.  I don't
>>>> believe most people would find disabling all compiler
>>>> optimisations an acceptable price to pay for some peace
>>>> of mind.
>>>
>>>
>>> So why is this a good tradeoff?
>> It certainly is better than disabling all compiler optimisations!
>
> It's easy to be better than something really stupid :)

Sure, it wasn't me who made the comparison though.

> So i386 and x86-64 don't have volatiles there, and it saves them a
> few K of kernel text.

Which has to be investigated.  A few kB is a lot more than expected.

> What you need to justify is why it is a good
> tradeoff to make them volatile (which btw, is much harder to go
> the other way after we let people make those assumptions).

My point is that people *already* made those assumptions.  There
are two ways to clean up this mess:

1) Have the "volatile" semantics by default, change the users
    that don't need it;
2) Have "non-volatile" semantics by default, change the users
    that do need it.

Option 2) randomly breaks stuff all over the place, option 1)
doesn't.  Yeah 1) could cause some extremely minor speed or
code size regression, but only temporarily until everything has
been audited.

>>> I also think that just adding things to APIs in the hope it might fix
>>> up some bugs isn't really a good road to go down. Where do you stop?
>> I look at it the other way: keeping the "volatile" semantics in
>> atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;
>
> Yeah, but we could add lots of things to help prevent bugs and
> would never be included. I would also contend that it helps _hide_
> bugs and encourages people to be lazy when thinking about these
> things.

Sure.  We aren't _adding_ anything here though, not on the platforms
where it is most likely to show up, anyway.

> Also, you dismiss the fact that we'd actually be *adding* volatile
> semantics back to the 2 most widely tested architectures (in terms
> of test time, number of testers, variety of configurations, and
> coverage of driver code).

I'm not dismissing that.  x86 however is one of the few architectures
where mistakenly leaving out a "volatile" will not easily show up on
user testing, since the compiler will very often produce a memory
reference anyway because it has no registers to play with.

> This is a very important difference from
> just keeping volatile semantics because it is basically a one-way
> API change.

That's a good point.  Maybe we should create _two_ new APIs, one
explicitly going each way.
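
(A rough sketch of what I mean -- the names and the i386-style
implementations are made up, purely for illustration:)

	/* always performs the access; cannot be cached in a register */
	#define atomic_read_volatile(v)	(*(volatile int *)&(v)->counter)

	/* plain access; the compiler may merge or hoist it */
	#define atomic_read_plain(v)	((v)->counter)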

>> certainly most people expect that behaviour, and also that behaviour
>> is *needed* in some places and no other interface provides that
>> functionality.
>
> I don't know that most people would expect that behaviour.

I didn't conduct any formal poll either :-)

> Is there any documentation anywhere that would suggest this?

Not really I think, no.  But not the other way around, either.
Most uses of it seem to expect it though.

>> [some confusion about barriers wrt atomics snipped]
>
> What were you confused about?

Me?  Not much.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-16 20:20                                       ` Christoph Lameter
  2007-08-17  1:02                                         ` Paul E. McKenney
  2007-08-17  2:16                                         ` Paul Mackerras
@ 2007-08-17 17:41                                         ` Segher Boessenkool
  2007-08-17 18:38                                           ` Satyam Sharma
  2007-09-10 18:59                                           ` Christoph Lameter
  2 siblings, 2 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 17:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Mackerras, heiko.carstens, horms, Stefan Richter,
	Satyam Sharma, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang

>> atomic_dec() already has volatile behavior everywhere, so this is 
>> semantically
>> okay, but this code (and any like it) should be calling cpu_relax() 
>> each
>> iteration through the loop, unless there's a compelling reason not 
>> to.  I'll
>> allow that for some hardware drivers (possibly this one) such a 
>> compelling
>> reason may exist, but hardware-independent core subsystems probably 
>> have no
>> excuse.
>
> No it does not have any volatile semantics. atomic_dec() can be 
> reordered
> at will by the compiler within the current basic unit if you do not 
> add a
> barrier.

"volatile" has nothing to do with reordering.  atomic_dec() writes
to memory, so it _does_ have "volatile semantics", implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 14:31                                                     ` Paul E. McKenney
@ 2007-08-17 18:31                                                       ` Satyam Sharma
  2007-08-17 18:56                                                         ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 18:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Herbert Xu, Stefan Richter, Paul Mackerras, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher



On Fri, 17 Aug 2007, Paul E. McKenney wrote:

> On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote:
> > 
> > On Thu, 16 Aug 2007, Paul E. McKenney wrote:
> > 
> > > On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote:
> > > > 
> > > > First of all, I think this illustrates that what you want
> > > > here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
> > > > macro occurs a lot more times in your patch than atomic
> > > > reads/sets.  So *assuming* that it was necessary at all,
> > > > then having an ordered variant of the atomic_read/atomic_set
> > > > ops could do just as well.
> > > 
> > > Indeed.  If I could trust atomic_read()/atomic_set() to cause the compiler
> > > to maintain ordering, then I could just use them instead of having to
> > > create an  ORDERED_WRT_IRQ().  (Or ACCESS_ONCE(), as it is called in a
> > > different patch.)
> > 
> > +#define WHATEVER(x)	(*(volatile typeof(x) *)&(x))
> > [...]
> > Also, this gives *zero* "re-ordering" guarantees that your code wants
> > as you've explained it below) -- neither w.r.t. CPU re-ordering (which
> > probably you don't care about) *nor* w.r.t. compiler re-ordering
> > (which you definitely _do_ care about).
> 
> You are correct about CPU re-ordering (and about the fact that this
> example doesn't care about it), but not about compiler re-ordering.
> 
> The compiler is prohibited from moving a volatile access across a sequence
> point.  One example of a sequence point is a statement boundary.  Because
> all of the volatile accesses in this code are separated by statement
> boundaries, a conforming compiler is prohibited from reordering them.

Yes, you're right, and I believe precisely this was discussed elsewhere
as well today.

But I'd call attention to what Herbert mentioned there. You're using
ORDERED_WRT_IRQ() on stuff that is _not_ defined to be an atomic_t at all:

* Member "completed" of struct rcu_ctrlblk is a long.
* Per-cpu variable rcu_flipctr is an array of ints.
* Members "rcu_read_lock_nesting" and "rcu_flipctr_idx" of
  struct task_struct are ints.

So are you saying you're "having to use" this volatile-access macro
because you *couldn't* declare all the above as atomic_t and thus just
expect the right thing to happen by using the atomic ops API by default,
because it lacks volatile access semantics (on x86)?

If so, then I wonder if using the volatile access cast is really the
best way to achieve (at least in terms of code clarity) the kind of
re-ordering guarantees that code wants there. (There could be alternative
solutions, such as using barrier(), or the one at the bottom of this mail.)
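
(To make the comparison concrete -- only a sketch, assuming ORDERED_WRT_IRQ()
is defined as the volatile cast quoted above, and reusing one statement from
the patch:)

	/* volatile-access cast: forces a re-load of just this object */
	#define ORDERED_WRT_IRQ(x)	(*(volatile typeof(x) *)&(x))
	idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) & 0x1;

	/* barrier() alternative: makes the compiler forget *everything*
	   it has cached, not just this one variable */
	barrier();
	idx = rcu_ctrlblk.completed & 0x1;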

What I mean is this: If you switch to atomic_t, and x86 switched to
make atomic_t have "volatile" semantics by default, the statements
would be simply a string of: atomic_inc(), atomic_add(), atomic_set(),
and atomic_read() statements, and nothing in there that clearly makes
it *explicit* that the code is correct (and not buggy) simply because
of the re-ordering guarantees that the C "volatile" type-qualifier
keyword gives us as per the standard. But now we're firmly in
"subjective" territory, so you or anybody could legitimately disagree.


> > > Suppose I tried replacing the ORDERED_WRT_IRQ() calls with
> > > atomic_read() and atomic_set().  Starting with __rcu_read_lock():
> > > 
> > > o	If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++"
> > > 	was ordered by the compiler after
> > > 	"ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then
> > > 	suppose an NMI/SMI happened after the rcu_read_lock_nesting but
> > > 	before the rcu_flipctr.
> > > 
> > > 	Then if there was an rcu_read_lock() in the SMI/NMI
> > > 	handler (which is perfectly legal), the nested rcu_read_lock()
> > > 	would believe that it could take the then-clause of the
> > > 	enclosing "if" statement.  But because the rcu_flipctr per-CPU
> > > 	variable had not yet been incremented, an RCU updater would
> > > 	be within its rights to assume that there were no RCU reads
> > > 	in progress, thus possibly yanking a data structure out from
> > > 	under the reader in the SMI/NMI function.
> > > 
> > > 	Fatal outcome.  Note that only one CPU is involved here
> > > 	because these are all either per-CPU or per-task variables.
> > 
> > Ok, so you don't care about CPU re-ordering. Still, I should let you know
> > that your ORDERED_WRT_IRQ() -- bad name, btw -- is still buggy. What you
> > want is a full compiler optimization barrier().
> 
> No.  See above.

True, *(volatile foo *)& _will_ work for this case.

But multiple calls to barrier() (granted, would invalidate all other
optimizations also) would work as well, would it not?

[ Interestingly, if you declared all those objects mentioned earlier as
  atomic_t, and x86(-64) switched to an __asm__ __volatile__ based variant
  for atomic_{read,set}_volatile(), the bugs you want to avoid would still
  be there. "volatile" the C language type-qualifier does have compiler
  re-ordering semantics you mentioned earlier, but the "volatile" that
  applies to inline asm()s gives no re-ordering guarantees. ]


> > > o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1"
> > > 	was ordered by the compiler to follow the
> > > 	"ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx", and an NMI/SMI
> > > 	happened between the two, then an __rcu_read_lock() in the NMI/SMI
> > > 	would incorrectly take the "else" clause of the enclosing "if"
> > > 	statement.  If some other CPU flipped the rcu_ctrlblk.completed
> > > 	in the meantime, then the __rcu_read_lock() would (correctly)
> > > 	write the new value into rcu_flipctr_idx.
> > > 
> > > 	Well and good so far.  But the problem arises in
> > > 	__rcu_read_unlock(), which then decrements the wrong counter.
> > > 	Depending on exactly how subsequent events played out, this could
> > > 	result in either prematurely ending grace periods or never-ending
> > > 	grace periods, both of which are fatal outcomes.
> > > 
> > > And the following are not needed in the current version of the
> > > patch, but will be in a future version that either avoids disabling
> > > irqs or that dispenses with the smp_read_barrier_depends() that I
> > > have 99% convinced myself is unneeded:
> > > 
> > > o	nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting);
> > > 
> > > o	idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) & 0x1;
> > > 
> > > Furthermore, in that future version, irq handlers can cause the same
> > > mischief that SMI/NMI handlers can in this version.

So don't remove the local_irq_save/restore, which is well-established and
well-understood for such cases (it doesn't help you with SMI/NMI,
admittedly). This isn't really about RCU or per-cpu vars as such, it's
just about racy code where you don't want to get hit by a concurrent
interrupt (it does turn out that doing things in a _particular order_ will
not cause fatal/buggy behaviour, but it's still a race issue, after all).


> > > Next, looking at __rcu_read_unlock():
> > > 
> > > o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting - 1"
> > > 	was reordered by the compiler to follow the
> > > 	"ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])--",
> > > 	then if an NMI/SMI containing an rcu_read_lock() occurs between
> > > 	the two, this nested rcu_read_lock() would incorrectly believe
> > > 	that it was protected by an enclosing RCU read-side critical
> > > 	section as described in the first reversal discussed for
> > > 	__rcu_read_lock() above.  Again, fatal outcome.
> > > 
> > > This is what we have now.  It is not hard to imagine situations that
> > > interact with -both- interrupt handlers -and- other CPUs, as described
> > > earlier.

Unless somebody's going for a lockless implementation, such situations
normally use spin_lock_irqsave() based locking (or local_irq_save for
those who care only for current CPU) -- the problem with the patch in question
is that you want to prevent races with concurrent SMI/NMIs as well, which
is not something that a lot of code needs to consider.

[ Curiously, another thread is discussing something similar also:
  http://lkml.org/lkml/2007/8/15/393 "RFC: do get_rtc_time() correctly" ]

Anyway, I didn't look at the code in that patch very much in detail, but
why couldn't you implement some kind of synchronization variable that lets
rcu_read_lock() or rcu_read_unlock() -- when being called from inside an
NMI or SMI handler -- know that it has concurrently interrupted an ongoing
rcu_read_{un}lock() and so must do things differently ... (?)

I'm also wondering if there's other code that's not using locking in the
kernel that faces similar issues, and what they've done to deal with it
(if anything). Such bugs would be subtle, and difficult to diagnose.


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 17:41                                         ` Segher Boessenkool
@ 2007-08-17 18:38                                           ` Satyam Sharma
  2007-08-17 23:17                                             ` Segher Boessenkool
  2007-09-10 18:59                                           ` Christoph Lameter
  1 sibling, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 18:38 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang



On Fri, 17 Aug 2007, Segher Boessenkool wrote:

> > > atomic_dec() already has volatile behavior everywhere, so this is
> > > semantically
> > > okay, but this code (and any like it) should be calling cpu_relax() each
> > > iteration through the loop, unless there's a compelling reason not to.
> > > I'll
> > > allow that for some hardware drivers (possibly this one) such a compelling
> > > reason may exist, but hardware-independent core subsystems probably have
> > > no
> > > excuse.
> > 
> > No it does not have any volatile semantics. atomic_dec() can be reordered
> > at will by the compiler within the current basic unit if you do not add a
> > barrier.
> 
> "volatile" has nothing to do with reordering.

If you're talking of "volatile" the type-qualifier keyword, then
http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
otherwise.

> atomic_dec() writes
> to memory, so it _does_ have "volatile semantics", implicitly, as
> long as the compiler cannot optimise the atomic variable away
> completely -- any store counts as a side effect.

I don't think an atomic_dec() implemented as an inline "asm volatile"
or one that uses a "forget" macro would have the same re-ordering
guarantees as an atomic_dec() that uses a volatile access cast.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 16:48                                                             ` Linus Torvalds
@ 2007-08-17 18:50                                                               ` Chris Friesen
  2007-08-17 18:54                                                                 ` Arjan van de Ven
  2007-08-17 19:08                                                                 ` Linus Torvalds
  2007-08-20 13:15                                                               ` Chris Snook
  2007-09-09 18:02                                                               ` Denys Vlasenko
  2 siblings, 2 replies; 657+ messages in thread
From: Chris Friesen @ 2007-08-17 18:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Satyam Sharma, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Ilpo Jarvinen, Paul E. McKenney,
	Stefan Richter, Linux Kernel Mailing List, linux-arch, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, zlynx, rpjday, jesper.juhl, segher

Linus Torvalds wrote:

>  - in other words, the *only* possible meaning for "volatile" is a purely 
>    single-CPU meaning. And if you only have a single CPU involved in the 
>    process, the "volatile" is by definition pointless (because even 
>    without a volatile, the compiler is required to make the C code appear 
>    consistent as far as a single CPU is concerned).

I assume you mean "except for IO-related code and 'random' values like 
jiffies" as you mention later on?  I assume other values set in 
interrupt handlers would count as "random" from a volatility perspective?

> So anybody who argues for "volatile" fixing bugs is fundamentally 
> incorrect. It does NO SUCH THING. By arguing that, such people only show 
> that they have no idea what they are talking about.

What about reading values modified in interrupt handlers, as in your 
"random" case?  Or is this a bug where the user of atomic_read() is 
invalidly expecting a read each time it is called?

Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 18:50                                                               ` Chris Friesen
@ 2007-08-17 18:54                                                                 ` Arjan van de Ven
  2007-08-17 19:49                                                                   ` Paul E. McKenney
  2007-08-17 19:08                                                                 ` Linus Torvalds
  1 sibling, 1 reply; 657+ messages in thread
From: Arjan van de Ven @ 2007-08-17 18:54 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Herbert Xu,
	Paul Mackerras, Christoph Lameter, Chris Snook, Ilpo Jarvinen,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, zlynx, rpjday,
	jesper.juhl, segher


On Fri, 2007-08-17 at 12:50 -0600, Chris Friesen wrote:
> Linus Torvalds wrote:
> 
> >  - in other words, the *only* possible meaning for "volatile" is a purely 
> >    single-CPU meaning. And if you only have a single CPU involved in the 
> >    process, the "volatile" is by definition pointless (because even 
> >    without a volatile, the compiler is required to make the C code appear 
> >    consistent as far as a single CPU is concerned).
> 
> I assume you mean "except for IO-related code and 'random' values like 
> jiffies" as you mention later on?  I assume other values set in 
> interrupt handlers would count as "random" from a volatility perspective?
> 
> > So anybody who argues for "volatile" fixing bugs is fundamentally 
> > incorrect. It does NO SUCH THING. By arguing that, such people only show 
> > that they have no idea what they are talking about.
> 
> What about reading values modified in interrupt handlers, as in your 
> "random" case?  Or is this a bug where the user of atomic_read() is 
> invalidly expecting a read each time it is called?

the interrupt handler case is an SMP case since you do not know
beforehand what cpu your interrupt handler will run on.




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 18:31                                                       ` Satyam Sharma
@ 2007-08-17 18:56                                                         ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17 18:56 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Herbert Xu, Stefan Richter, Paul Mackerras, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher

On Sat, Aug 18, 2007 at 12:01:38AM +0530, Satyam Sharma wrote:
> 
> 
> On Fri, 17 Aug 2007, Paul E. McKenney wrote:
> 
> > On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote:
> > > 
> > > On Thu, 16 Aug 2007, Paul E. McKenney wrote:
> > > 
> > > > On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote:
> > > > > 
> > > > > First of all, I think this illustrates that what you want
> > > > > here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
> > > > > macro occurs a lot more times in your patch than atomic
> > > > > reads/sets.  So *assuming* that it was necessary at all,
> > > > > then having an ordered variant of the atomic_read/atomic_set
> > > > > ops could do just as well.
> > > > 
> > > > Indeed.  If I could trust atomic_read()/atomic_set() to cause the compiler
> > > > to maintain ordering, then I could just use them instead of having to
> > > > create an  ORDERED_WRT_IRQ().  (Or ACCESS_ONCE(), as it is called in a
> > > > different patch.)
> > > 
> > > +#define WHATEVER(x)	(*(volatile typeof(x) *)&(x))
> > > [...]
> > > Also, this gives *zero* "re-ordering" guarantees that your code wants
> > > as you've explained it below) -- neither w.r.t. CPU re-ordering (which
> > > probably you don't care about) *nor* w.r.t. compiler re-ordering
> > > (which you definitely _do_ care about).
> > 
> > You are correct about CPU re-ordering (and about the fact that this
> > example doesn't care about it), but not about compiler re-ordering.
> > 
> > The compiler is prohibited from moving a volatile access across a sequence
> > point.  One example of a sequence point is a statement boundary.  Because
> > all of the volatile accesses in this code are separated by statement
> > boundaries, a conforming compiler is prohibited from reordering them.
> 
> Yes, you're right, and I believe precisely this was discussed elsewhere
> as well today.
> 
> But I'd call attention to what Herbert mentioned there. You're using
> ORDERED_WRT_IRQ() on stuff that is _not_ defined to be an atomic_t at all:
> 
> * Member "completed" of struct rcu_ctrlblk is a long.
> * Per-cpu variable rcu_flipctr is an array of ints.
> * Members "rcu_read_lock_nesting" and "rcu_flipctr_idx" of
>   struct task_struct are ints.
> 
> So are you saying you're "having to use" this volatile-access macro
> because you *couldn't* declare all the above as atomic_t and thus just
> expect the right thing to happen by using the atomic ops API by default,
> because it lacks volatile access semantics (on x86)?
> 
> If so, then I wonder if using the volatile access cast is really the
> best way to achieve (at least in terms of code clarity) the kind of
> re-ordering guarantees that code wants there. (There could be alternative
> solutions, such as using barrier(), or the one at the bottom of this mail.)
> 
> What I mean is this: If you switch to atomic_t, and x86 switched to
> make atomic_t have "volatile" semantics by default, the statements
> would be simply a string of: atomic_inc(), atomic_add(), atomic_set(),
> and atomic_read() statements, and nothing in there that clearly makes
> it *explicit* that the code is correct (and not buggy) simply because
> of the re-ordering guarantees that the C "volatile" type-qualifier
> keyword gives us as per the standard. But now we're firmly in
> "subjective" territory, so you or anybody could legitimately disagree.

In any case, given Linus's note, it appears that atomic_read() and
atomic_set() won't consistently have volatile semantics, at least
not while the compiler generates such ugly code for volatile accesses.
So I will continue with my current approach.

In any case, I will not be using atomic_inc() or atomic_add() in this
code, as doing so would more than double the overhead, even on machines
that are the most efficient at implementing atomic operations.

> > > > Suppose I tried replacing the ORDERED_WRT_IRQ() calls with
> > > > atomic_read() and atomic_set().  Starting with __rcu_read_lock():
> > > > 
> > > > o	If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++"
> > > > 	was ordered by the compiler after
> > > > 	"ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then
> > > > 	suppose an NMI/SMI happened after the rcu_read_lock_nesting but
> > > > 	before the rcu_flipctr.
> > > > 
> > > > 	Then if there was an rcu_read_lock() in the SMI/NMI
> > > > 	handler (which is perfectly legal), the nested rcu_read_lock()
> > > > 	would believe that it could take the then-clause of the
> > > > 	enclosing "if" statement.  But because the rcu_flipctr per-CPU
> > > > 	variable had not yet been incremented, an RCU updater would
> > > > 	be within its rights to assume that there were no RCU reads
> > > > 	in progress, thus possibly yanking a data structure out from
> > > > 	under the reader in the SMI/NMI function.
> > > > 
> > > > 	Fatal outcome.  Note that only one CPU is involved here
> > > > 	because these are all either per-CPU or per-task variables.
> > > 
> > > Ok, so you don't care about CPU re-ordering. Still, I should let you know
> > > that your ORDERED_WRT_IRQ() -- bad name, btw -- is still buggy. What you
> > > want is a full compiler optimization barrier().
> > 
> > No.  See above.
> 
> True, *(volatile foo *)& _will_ work for this case.
> 
> But multiple calls to barrier() (granted, would invalidate all other
> optimizations also) would work as well, would it not?

They work, but are a bit slower.  So they do work, but not as well.

> [ Interestingly, if you declared all those objects mentioned earlier as
>   atomic_t, and x86(-64) switched to an __asm__ __volatile__ based variant
>   for atomic_{read,set}_volatile(), the bugs you want to avoid would still
>   be there. "volatile" the C language type-qualifier does have compiler
>   re-ordering semantics you mentioned earlier, but the "volatile" that
>   applies to inline asm()s gives no re-ordering guarantees. ]

Well, that certainly would be a point in favor of "volatile" over inline
asms.  ;-)

> > > > o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1"
> > > > 	was ordered by the compiler to follow the
> > > > 	"ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx", and an NMI/SMI
> > > > 	happened between the two, then an __rcu_read_lock() in the NMI/SMI
> > > > 	would incorrectly take the "else" clause of the enclosing "if"
> > > > 	statement.  If some other CPU flipped the rcu_ctrlblk.completed
> > > > 	in the meantime, then the __rcu_read_lock() would (correctly)
> > > > 	write the new value into rcu_flipctr_idx.
> > > > 
> > > > 	Well and good so far.  But the problem arises in
> > > > 	__rcu_read_unlock(), which then decrements the wrong counter.
> > > > 	Depending on exactly how subsequent events played out, this could
> > > > 	result in either prematurely ending grace periods or never-ending
> > > > 	grace periods, both of which are fatal outcomes.
> > > > 
> > > > And the following are not needed in the current version of the
> > > > patch, but will be in a future version that either avoids disabling
> > > > irqs or that dispenses with the smp_read_barrier_depends() that I
> > > > have 99% convinced myself is unneeded:
> > > > 
> > > > o	nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting);
> > > > 
> > > > o	idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) & 0x1;
> > > > 
> > > > Furthermore, in that future version, irq handlers can cause the same
> > > > mischief that SMI/NMI handlers can in this version.
> 
> So don't remove the local_irq_save/restore, which is well-established and
> well-understood for such cases (it doesn't help you with SMI/NMI,
> admittedly). This isn't really about RCU or per-cpu vars as such, it's
> just about racy code where you don't want to get hit by a concurrent
> interrupt (it does turn out that doing things in a _particular order_ will
> not cause fatal/buggy behaviour, but it's still a race issue, after all).

The local_irq_save/restore are something like 30% of the overhead of
these two functions, so will be looking hard at getting rid of them.
Doing so allows the scheduling-clock interrupt to get into the mix,
and also allows preemption.  The goal would be to find some trick that
suppresses preemption, fends off the grace-period-computation code
invoked from the scheduling-clock interrupt, and otherwise keeps
things on an even keel.

> > > > Next, looking at __rcu_read_unlock():
> > > > 
> > > > o	If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting - 1"
> > > > 	was reordered by the compiler to follow the
> > > > 	"ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])--",
> > > > 	then if an NMI/SMI containing an rcu_read_lock() occurs between
> > > > 	the two, this nested rcu_read_lock() would incorrectly believe
> > > > 	that it was protected by an enclosing RCU read-side critical
> > > > 	section as described in the first reversal discussed for
> > > > 	__rcu_read_lock() above.  Again, fatal outcome.
> > > > 
> > > > This is what we have now.  It is not hard to imagine situations that
> > > > interact with -both- interrupt handlers -and- other CPUs, as described
> > > > earlier.
> 
> Unless somebody's going for a lockless implementation, such situations
> normally use spin_lock_irqsave() based locking (or local_irq_save for
> those who care only for current CPU) -- the problem with the patch in question
> is that you want to prevent races with concurrent SMI/NMIs as well, which
> is not something that a lot of code needs to consider.

Or that needs to resolve similar races with IRQs without disabling them.
One reason to avoid disabling IRQs is to avoid degrading scheduling
latency.  In any case, I do agree that the amount of code that must
worry about this is quite small at the moment.  I believe that it
will become more common, but would imagine that this belief might not
be universal.  Yet, anyway.  ;-)

> [ Curiously, another thread is discussing something similar also:
>   http://lkml.org/lkml/2007/8/15/393 "RFC: do get_rtc_time() correctly" ]
> 
> Anyway, I didn't look at the code in that patch very much in detail, but
> why couldn't you implement some kind of synchronization variable that lets
> rcu_read_lock() or rcu_read_unlock() -- when being called from inside an
> NMI or SMI handler -- know that it has concurrently interrupted an ongoing
> rcu_read_{un}lock() and so must do things differently ... (?)

Given some low-level details of the current implementation, I could
imagine manipulating rcu_read_lock_nesting on entry to and exit from
all NMI/SMI handlers, but would like to avoid that kind of architecture
dependency.  I am not confident of locating all of them, for one thing...

> I'm also wondering if there's other code that's not using locking in the
> kernel that faces similar issues, and what they've done to deal with it
> (if anything). Such bugs would be subtle, and difficult to diagnose.

Agreed!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 18:50                                                               ` Chris Friesen
  2007-08-17 18:54                                                                 ` Arjan van de Ven
@ 2007-08-17 19:08                                                                 ` Linus Torvalds
  1 sibling, 0 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-08-17 19:08 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Nick Piggin, Satyam Sharma, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Ilpo Jarvinen, Paul E. McKenney,
	Stefan Richter, Linux Kernel Mailing List, linux-arch, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Chris Friesen wrote:
> 
> I assume you mean "except for IO-related code and 'random' values like
> jiffies" as you mention later on?

Yes. There *are* valid uses for "volatile", but they have remained the 
same for the last few years:
 - "jiffies"
 - internal per-architecture IO implementations that can do them as normal 
   stores.

> I assume other values set in interrupt handlers would count as "random" 
> from a volatility perspective?

I don't really see any valid case. I can imagine that you have your own 
"jiffy" counter in a driver, but what's the point, really? I'd suggest not 
using volatile, and using barriers instead.

> 
> > So anybody who argues for "volatile" fixing bugs is fundamentally 
> > incorrect. It does NO SUCH THING. By arguing that, such people only 
> > show that they have no idea what they are talking about.

> What about reading values modified in interrupt handlers, as in your 
> "random" case?  Or is this a bug where the user of atomic_read() is 
> invalidly expecting a read each time it is called?

Quite frankly, the biggest reason for using "volatile" on jiffies was 
really historic. So even the "random" case is not really a very strong 
one. You'll notice that anybody who is actually careful will be using 
sequence locks for the jiffy accesses, if only because the *full* jiffy 
count is actually a 64-bit value, and so you cannot get it atomically on a 
32-bit architecture even on a single CPU (ie a timer interrupt might 
happen in between reading the low and the high word, so "volatile" is only 
used for the low 32 bits).
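
(Roughly what the careful 64-bit read looks like -- this is along the
lines of the existing get_jiffies_64()/xtime_lock code, shown only as
a sketch:)

	u64 get_jiffies_64(void)
	{
		unsigned long seq;
		u64 ret;

		do {
			seq = read_seqbegin(&xtime_lock);
			ret = jiffies_64;
		} while (read_seqretry(&xtime_lock, seq));
		return ret;
	}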

So even for jiffies, we actually have:

	extern u64 __jiffy_data jiffies_64;
	extern unsigned long volatile __jiffy_data jiffies;

where the *real* jiffies is not volatile: the volatile one is using linker 
tricks to alias the low 32 bits:

 - arch/i386/kernel/vmlinux.lds.S:

	...
	jiffies = jiffies_64;
	...

and the only reason we do all these games is (a) it works and (b) it's 
legacy.

Note how I do *not* say "(c) it's a good idea".

			Linus


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 19:49                                                                   ` Paul E. McKenney
@ 2007-08-17 19:49                                                                     ` Arjan van de Ven
  2007-08-17 20:12                                                                       ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Arjan van de Ven @ 2007-08-17 19:49 UTC (permalink / raw)
  To: paulmck
  Cc: Chris Friesen, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, zlynx, rpjday,
	jesper.juhl, segher


On Fri, 2007-08-17 at 12:49 -0700, Paul E. McKenney wrote:
> > > What about reading values modified in interrupt handlers, as in your 
> > > "random" case?  Or is this a bug where the user of atomic_read() is 
> > > invalidly expecting a read each time it is called?
> > 
> > the interrupt handler case is an SMP case since you do not know
> > beforehand what cpu your interrupt handler will run on.
> 
> With the exception of per-CPU variables, yes.

if you're spinning waiting for a per-CPU variable to get changed by an
interrupt handler... you have bigger problems than "volatile" ;-)

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 18:54                                                                 ` Arjan van de Ven
@ 2007-08-17 19:49                                                                   ` Paul E. McKenney
  2007-08-17 19:49                                                                     ` Arjan van de Ven
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17 19:49 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Chris Friesen, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, zlynx, rpjday,
	jesper.juhl, segher

On Fri, Aug 17, 2007 at 11:54:33AM -0700, Arjan van de Ven wrote:
> 
> On Fri, 2007-08-17 at 12:50 -0600, Chris Friesen wrote:
> > Linus Torvalds wrote:
> > 
> > >  - in other words, the *only* possible meaning for "volatile" is a purely 
> > >    single-CPU meaning. And if you only have a single CPU involved in the 
> > >    process, the "volatile" is by definition pointless (because even 
> > >    without a volatile, the compiler is required to make the C code appear 
> > >    consistent as far as a single CPU is concerned).
> > 
> > I assume you mean "except for IO-related code and 'random' values like 
> > jiffies" as you mention later on?  I assume other values set in 
> > interrupt handlers would count as "random" from a volatility perspective?
> > 
> > > So anybody who argues for "volatile" fixing bugs is fundamentally 
> > > incorrect. It does NO SUCH THING. By arguing that, such people only show 
> > > that they have no idea what they are talking about.
> > 
> > What about reading values modified in interrupt handlers, as in your 
> > "random" case?  Or is this a bug where the user of atomic_read() is 
> > invalidly expecting a read each time it is called?
> 
> the interrupt handler case is an SMP case since you do not know
> beforehand what cpu your interrupt handler will run on.

With the exception of per-CPU variables, yes.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 19:49                                                                     ` Arjan van de Ven
@ 2007-08-17 20:12                                                                       ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17 20:12 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Chris Friesen, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, zlynx, rpjday,
	jesper.juhl, segher

On Fri, Aug 17, 2007 at 12:49:00PM -0700, Arjan van de Ven wrote:
> 
> On Fri, 2007-08-17 at 12:49 -0700, Paul E. McKenney wrote:
> > > > What about reading values modified in interrupt handlers, as in your 
> > > > "random" case?  Or is this a bug where the user of atomic_read() is 
> > > > invalidly expecting a read each time it is called?
> > > 
> > > the interrupt handler case is an SMP case since you do not know
> > > beforehand what cpu your interrupt handler will run on.
> > 
> > With the exception of per-CPU variables, yes.
> 
> if you're spinning waiting for a per-CPU variable to get changed by an
> interrupt handler... you have bigger problems than "volatile" ;-)

That would be true, if you were doing that.  But you might instead be
simply making sure that the mainline actions were seen in order by the
interrupt handler.  My current example is the NMI-save rcu_read_lock()
implementation for realtime.  Not the common case, I will admit, but
still real.  ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:03                                           ` Linus Torvalds
  2007-08-17  3:43                                             ` Paul Mackerras
@ 2007-08-17 22:09                                             ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 22:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Satyam Sharma, Ilpo Järvinen,
	Linux Kernel Mailing List, David Miller, Paul E. McKenney, ak,
	Netdev, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, wensong, wjiang

> Of course, since *normal* accesses aren't necessarily limited wrt
> re-ordering, the question then becomes one of "with regard to *what* 
> does
> it limit re-ordering?".
>
> A C compiler that re-orders two different volatile accesses that have a
> sequence point in between them is pretty clearly a buggy compiler. So 
> at a
> minimum, it limits re-ordering wrt other volatiles (assuming sequence
> points exists). It also means that the compiler cannot move it
> speculatively across conditionals, but other than that it's starting to
> get fuzzy.

This is actually really well-defined in C, not fuzzy at all.
"Volatile accesses" are a side effect, and no side effects can
be reordered with respect to sequence points.  The side effects
that matter in the kernel environment are: 1) accessing a volatile
object; 2) modifying an object; 3) volatile asm(); 4) calling a
function that does any of these.
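
(Concretely -- just a sketch, with made-up variable names:)

	extern int shared_data, shared_flag;
	int d, f;

	d = *(volatile int *)&shared_data;	/* side effect #1 */
	f = *(volatile int *)&shared_flag;	/* side effect #2: there is a
						   sequence point between the
						   two statements, so the
						   compiler may not swap them */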

We certainly should avoid volatile whenever possible, but "because
it's fuzzy wrt reordering" is not a reason -- all alternatives have
exactly the same issues.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:15                                               ` Nick Piggin
  2007-08-17  4:02                                                 ` Paul Mackerras
  2007-08-17  7:25                                                 ` Stefan Richter
@ 2007-08-17 22:14                                                 ` Segher Boessenkool
  2 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 22:14 UTC (permalink / raw)
  To: Nick Piggin
  Cc: paulmck, Christoph Lameter, Paul Mackerras, heiko.carstens,
	Stefan Richter, horms, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, jesper.juhl, linux-arch,
	Andrew Morton, zlynx, schwidefsky, Chris Snook, Herbert Xu,
	davem, Linus Torvalds, wensong, wjiang

> (and yes, it is perfectly legitimate to
> want a non-volatile read for a data type that you also want to do
> atomic RMW operations on)

...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:42                       ` Linus Torvalds
                                           ` (3 preceding siblings ...)
  2007-08-17  8:52                         ` Andi Kleen
@ 2007-08-17 22:29                         ` Segher Boessenkool
  4 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 22:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Mackerras, heiko.carstens, horms, linux-kernel, rpjday, ak,
	netdev, cfriesen, akpm, Nick Piggin, linux-arch, jesper.juhl,
	satyam, zlynx, clameter, schwidefsky, Chris Snook, Herbert Xu,
	davem, wensong, wjiang

> In a reasonable world, gcc should just make that be (on x86)
>
> 	addl $1,i(%rip)
>
> on x86-64, which is indeed what it does without the volatile. But with 
> the
> volatile, the compiler gets really nervous, and doesn't dare do it in 
> one
> instruction, and thus generates crap like
>
>         movl    i(%rip), %eax
>         addl    $1, %eax
>         movl    %eax, i(%rip)
>
> instead. For no good reason, except that "volatile" just doesn't have 
> any
> good/clear semantics for the compiler, so most compilers will just 
> make it
> be "I will not touch this access in any way, shape, or form". Including
> even trivially correct instruction optimization/combination.

It's just a (target-specific, perhaps) missed-optimisation kind
of bug in GCC.  Care to file a bug report?
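
(A minimal testcase for such a report could be as small as this --
the file name is of course made up:)

	/* vol-inc.c: gcc -O2 -S vol-inc.c */
	volatile int i;

	void inc(void)
	{
		i++;	/* currently load/add/store rather than a single
			   addl $1,i(%rip), as discussed above */
	}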

> but is
> (again) something that gcc doesn't dare do, since "i" is volatile.

Just nobody taught it it can do this; perhaps no one wanted to
add optimisations like that, maybe with a reasoning like "people
who hit the go-slow-in-unspecified-ways button should get what
they deserve" ;-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  4:24               ` Satyam Sharma
@ 2007-08-17 22:34                 ` Segher Boessenkool
  0 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 22:34 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, davids, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang

> Now the second wording *IS* technically correct, but come on, it's
> 24 words long whereas the original one was 3 -- and hopefully anybody
> reading the shorter phrase *would* have known anyway what was meant,
> without having to be pedantic about it :-)

Well you were talking pretty formal (and detailed) stuff, so
IMHO it's good to have that exactly correct.  Sure it's nicer
to use small words most of the time :-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  4:32                 ` Satyam Sharma
@ 2007-08-17 22:38                   ` Segher Boessenkool
  2007-08-18 14:42                     ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 22:38 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang

>>> Here, I should obviously admit that the semantics of *(volatile int 
>>> *)&
>>> aren't any neater or well-defined in the _language standard_ at all. 
>>> The
>>> standard does say (verbatim) "precisely what constitutes as access to
>>> object of volatile-qualified type is implementation-defined", but GCC
>>> does help us out here by doing the right thing.
>>
>> Where do you get that idea?
>
> Try a testcase (experimentally verify).

That doesn't prove anything.  Experiments can only disprove
things.

>> GCC manual, section 6.1, "When
>> is a Volatile Object Accessed?" doesn't say anything of the
>> kind.
>
> True, "implementation-defined" as per the C standard _is_ supposed to 
> mean
> "unspecified behaviour where each implementation documents how the 
> choice
> is made". So ok, probably GCC isn't "documenting" this
> implementation-defined behaviour which it is supposed to, but can't 
> really
> fault them much for this, probably.

GCC _is_ documenting this, namely in this section 6.1.  It doesn't
mention volatile-casted stuff.  Draw your own conclusions.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  5:56                         ` Satyam Sharma
  2007-08-17  7:26                           ` Nick Piggin
@ 2007-08-17 22:49                           ` Segher Boessenkool
  2007-08-17 23:51                             ` Satyam Sharma
  1 sibling, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 22:49 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Paul Mackerras, heiko.carstens, horms, Linux Kernel Mailing List,
	rpjday, ak, netdev, cfriesen, Nick Piggin, linux-arch,
	jesper.juhl, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

> #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
>
> [ This is exactly equivalent to using "+m" in the constraints, as 
> recently
>   explained on a GCC list somewhere, in response to the patch in my 
> bitops
>   series a few weeks back where I thought "+m" was bogus. ]

[It wasn't explained on a GCC list in response to your patch, as
far as I can see -- if I missed it, please point me to an archived
version of it].

One last time: it isn't equivalent on older (but still supported
by Linux) versions of GCC.  Current versions of GCC allow it, but
it has no documented behaviour at all, so use it at your own risk.
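
(For reference, the two spellings being compared -- the first is the
macro quoted above, the second is the "+m" form under discussion; the
second macro's name is made up:)

	#define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
	#define forget_plus(a)	__asm__ __volatile__ ("" :"+m" (a))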


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 18:38                                           ` Satyam Sharma
@ 2007-08-17 23:17                                             ` Segher Boessenkool
  2007-08-17 23:55                                               ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 23:17 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang

>>> No it does not have any volatile semantics. atomic_dec() can be 
>>> reordered
>>> at will by the compiler within the current basic unit if you do not 
>>> add a
>>> barrier.
>>
>> "volatile" has nothing to do with reordering.
>
> If you're talking of "volatile" the type-qualifier keyword, then
> http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
> otherwise.

I'm not sure what in that mail you mean, but anyway...

Yes, of course, the fact that "volatile" creates a side effect
prevents certain things from being reordered wrt the atomic_dec();
but the atomic_dec() has a side effect *already* so the volatile
doesn't change anything.

>> atomic_dec() writes
>> to memory, so it _does_ have "volatile semantics", implicitly, as
>> long as the compiler cannot optimise the atomic variable away
>> completely -- any store counts as a side effect.
>
> I don't think an atomic_dec() implemented as an inline "asm volatile"
> or one that uses a "forget" macro would have the same re-ordering
> guarantees as an atomic_dec() that uses a volatile access cast.

The "asm volatile" implementation does have exactly the same
reordering guarantees as the "volatile cast" thing, if that is
implemented by GCC in the "obvious" way.  Even a "plain" asm()
will do the same.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 22:49                           ` Segher Boessenkool
@ 2007-08-17 23:51                             ` Satyam Sharma
  2007-08-17 23:55                               ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 23:51 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Paul Mackerras, heiko.carstens, horms, Linux Kernel Mailing List,
	rpjday, ak, netdev, cfriesen, Nick Piggin, linux-arch,
	jesper.juhl, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang



On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> > 
> > [ This is exactly equivalent to using "+m" in the constraints, as recently
> >   explained on a GCC list somewhere, in response to the patch in my bitops
> >   series a few weeks back where I thought "+m" was bogus. ]
> 
> [It wasn't explained on a GCC list in response to your patch, as
> far as I can see -- if I missed it, please point me to an archived
> version of it].

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html

is a follow-up in the thread on the gcc-patches@gcc.gnu.org mailing list,
which began with:

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01677.html

that was posted by Jan Hubicka, as he quotes in that initial posting,
after I had submitted:

http://lkml.org/lkml/2007/7/23/252

which was a (wrong) patch to "rectify" what I thought was the "bogus"
"+m" constraint, as per the quoted extract from gcc docs (that was
given in that (wrong) patch's changelog).

That's when _I_ came to know how GCC interprets "+m", but probably
this has been explained on those lists multiple times. Who cares,
anyway?


> One last time: it isn't equivalent on older (but still supported
> by Linux) versions of GCC.  Current versions of GCC allow it, but
> it has no documented behaviour at all, so use it at your own risk.

True.
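
For anyone joining late, the two spellings being compared, side by
side (alternatives, of course, not meant to be defined together):

	/* explicit output/input pair -- works on older GCC too */
	#define forget(a)	__asm__ __volatile__ ("" : "=m" (a) : "m" (a))

	/* read-write shorthand -- accepted by current GCC, but
	   undocumented, as noted above */
	#define forget(a)	__asm__ __volatile__ ("" : "+m" (a))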

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 23:17                                             ` Segher Boessenkool
@ 2007-08-17 23:55                                               ` Satyam Sharma
  2007-08-18  0:04                                                 ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-17 23:55 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang



On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > No it does not have any volatile semantics. atomic_dec() can be
> > > > reordered
> > > > at will by the compiler within the current basic unit if you do not add
> > > > a
> > > > barrier.
> > > 
> > > "volatile" has nothing to do with reordering.
> > 
> > If you're talking of "volatile" the type-qualifier keyword, then
> > http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
> > otherwise.
> 
> I'm not sure what in that mail you mean, but anyway...
> 
> Yes, of course, the fact that "volatile" creates a side effect
> prevents certain things from being reordered wrt the atomic_dec();
> but the atomic_dec() has a side effect *already* so the volatile
> doesn't change anything.

That's precisely what that sub-thread (read down to the last mail
there, and not the first mail only) shows. So yes, "volatile" does
have something to do with re-ordering (as guaranteed by the C
standard).


> > > atomic_dec() writes
> > > to memory, so it _does_ have "volatile semantics", implicitly, as
> > > long as the compiler cannot optimise the atomic variable away
> > > completely -- any store counts as a side effect.
> > 
> > I don't think an atomic_dec() implemented as an inline "asm volatile"
> > or one that uses a "forget" macro would have the same re-ordering
> > guarantees as an atomic_dec() that uses a volatile access cast.
> 
> The "asm volatile" implementation does have exactly the same
> reordering guarantees as the "volatile cast" thing,

I don't think so.

> if that is
> implemented by GCC in the "obvious" way.  Even a "plain" asm()
> will do the same.

Read the relevant GCC documentation.

[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 23:51                             ` Satyam Sharma
@ 2007-08-17 23:55                               ` Segher Boessenkool
  0 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-17 23:55 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Paul Mackerras, heiko.carstens, horms, Linux Kernel Mailing List,
	rpjday, ak, netdev, cfriesen, Nick Piggin, linux-arch,
	jesper.juhl, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

>>> #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
>>>
>>> [ This is exactly equivalent to using "+m" in the constraints, as 
>>> recently
>>>   explained on a GCC list somewhere, in response to the patch in my 
>>> bitops
>>>   series a few weeks back where I thought "+m" was bogus. ]
>>
>> [It wasn't explained on a GCC list in response to your patch, as
>> far as I can see -- if I missed it, please point me to an archived
>> version of it].
>
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html

Ah yes, that old thread, thank you.

> That's when _I_ came to know how GCC interprets "+m", but probably
> this has been explained on those lists multiple times. Who cares,
> anyway?

I just couldn't find the thread you meant, I thought I must have
missed it, that's all :-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17  3:50                         ` Linus Torvalds
@ 2007-08-17 23:59                           ` Paul E. McKenney
  2007-08-18  0:09                             ` Herbert Xu
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-17 23:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Paul Mackerras, Segher Boessenkool, heiko.carstens,
	horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm,
	jesper.juhl, linux-arch, zlynx, satyam, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, wensong, wjiang

On Thu, Aug 16, 2007 at 08:50:30PM -0700, Linus Torvalds wrote:
> Just try it yourself:
> 
> 	volatile int i;
> 	int j;
> 
> 	int testme(void)
> 	{
> 	        return i <= 1;
> 	}
> 
> 	int testme2(void)
> 	{
> 	        return j <= 1;
> 	}
> 
> and compile with all the optimizations you can.
> 
> I get:
> 
> 	testme:
> 	        movl    i(%rip), %eax
> 	        subl    $1, %eax
> 	        setle   %al
> 	        movzbl  %al, %eax
> 	        ret
> 
> vs
> 
> 	testme2:
> 	        xorl    %eax, %eax
> 	        cmpl    $1, j(%rip)
> 	        setle   %al
> 	        ret
> 
> (now, whether that "xorl + setle" is better than "setle + movzbl", I don't 
> really know - maybe it is. But that's not the point. The point is the 
> difference between
> 
>                 movl    i(%rip), %eax
>                 subl    $1, %eax
> 
> and
> 
>                 cmpl    $1, j(%rip)

gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 23:55                                               ` Satyam Sharma
@ 2007-08-18  0:04                                                 ` Segher Boessenkool
  2007-08-18  1:56                                                   ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-18  0:04 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang

>>>> atomic_dec() writes
>>>> to memory, so it _does_ have "volatile semantics", implicitly, as
>>>> long as the compiler cannot optimise the atomic variable away
>>>> completely -- any store counts as a side effect.
>>>
>>> I don't think an atomic_dec() implemented as an inline "asm volatile"
>>> or one that uses a "forget" macro would have the same re-ordering
>>> guarantees as an atomic_dec() that uses a volatile access cast.
>>
>> The "asm volatile" implementation does have exactly the same
>> reordering guarantees as the "volatile cast" thing,
>
> I don't think so.

"asm volatile" creates a side effect.  Side effects aren't
allowed to be reordered wrt sequence points.  This is exactly
the same reason as why "volatile accesses" cannot be reordered.

>> if that is
>> implemented by GCC in the "obvious" way.  Even a "plain" asm()
>> will do the same.
>
> Read the relevant GCC documentation.

I did, yes.

> [ of course, if the (latest) GCC documentation is *yet again*
>   wrong, then alright, not much I can do about it, is there. ]

There was (and is) nothing wrong about the "+m" documentation, if
that is what you are talking about.  It could be extended now, to
allow "+m" -- but that takes more than just "fixing" the documentation.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 23:59                           ` Paul E. McKenney
@ 2007-08-18  0:09                             ` Herbert Xu
  2007-08-18  1:08                               ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-18  0:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Nick Piggin, Paul Mackerras, Segher Boessenkool,
	heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, jesper.juhl, linux-arch, zlynx, satyam, clameter,
	schwidefsky, Chris Snook, davem, wensong, wjiang

On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
>
> gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)

I had totally forgotten that I'd already filed that bug more
than six years ago until they just closed yours as a duplicate
of mine :)

Good luck in getting it fixed!

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  0:09                             ` Herbert Xu
@ 2007-08-18  1:08                               ` Paul E. McKenney
  2007-08-18  1:24                                 ` Christoph Lameter
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-18  1:08 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linus Torvalds, Nick Piggin, Paul Mackerras, Segher Boessenkool,
	heiko.carstens, horms, linux-kernel, rpjday, ak, netdev,
	cfriesen, akpm, jesper.juhl, linux-arch, zlynx, satyam, clameter,
	schwidefsky, Chris Snook, davem, wensong, wjiang

On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
> On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
> >
> > gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
> 
> I had totally forgotten that I'd already filed that bug more
> than six years ago until they just closed yours as a duplicate
> of mine :)
> 
> Good luck in getting it fixed!

Well, just got done re-opening it for the third time.  And a local
gcc community member advised me not to give up too easily.  But I
must admit that I am impressed with the speed that it was identified
as duplicate.

Should be entertaining!  ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  1:08                               ` Paul E. McKenney
@ 2007-08-18  1:24                                 ` Christoph Lameter
  2007-08-18  1:41                                   ` Satyam Sharma
                                                     ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-08-18  1:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Herbert Xu, Linus Torvalds, Nick Piggin, Paul Mackerras,
	Segher Boessenkool, heiko.carstens, horms, linux-kernel, rpjday,
	ak, netdev, cfriesen, akpm, jesper.juhl, linux-arch, zlynx,
	satyam, schwidefsky, Chris Snook, davem, wensong, wjiang

On Fri, 17 Aug 2007, Paul E. McKenney wrote:

> On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
> > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
> > >
> > > gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
> > 
> > I had totally forgotten that I'd already filed that bug more
> > than six years ago until they just closed yours as a duplicate
> > of mine :)
> > 
> > Good luck in getting it fixed!
> 
> Well, just got done re-opening it for the third time.  And a local
> gcc community member advised me not to give up too easily.  But I
> must admit that I am impressed with the speed that it was identified
> as duplicate.
> 
> Should be entertaining!  ;-)

Right. ROTFL... volatile actually breaks atomic_t instead of making it 
safe. x++ becomes a register load, increment and a register store. Without 
volatile we can increment the memory directly. It seems that volatile 
requires that the variable is loaded into a register first and then 
operated upon. Understandable when you think about volatile being used to 
access memory mapped I/O registers where a RMW operation could be 
problematic.

See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3506
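
A quick illustration of that (not taken from the bugzilla entry; the
exact output depends on gcc version and tuning flags, so read the
comments as "typically" rather than "always"):

	volatile int v;
	int p;

	void inc_v(void)
	{
		v++;	/* typically: movl v,%eax ; addl $1,%eax ; movl %eax,v */
	}

	void inc_p(void)
	{
		p++;	/* typically a single RMW: addl $1,p (or incl p) */
	}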

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  1:24                                 ` Christoph Lameter
@ 2007-08-18  1:41                                   ` Satyam Sharma
  2007-08-18  4:13                                     ` Linus Torvalds
  2007-08-18 21:56                                   ` Paul E. McKenney
  2007-08-20 13:31                                   ` Chris Snook
  2 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-18  1:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, Herbert Xu, Linus Torvalds, Nick Piggin,
	Paul Mackerras, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, schwidefsky, Chris Snook, davem, wensong,
	wjiang



On Fri, 17 Aug 2007, Christoph Lameter wrote:

> On Fri, 17 Aug 2007, Paul E. McKenney wrote:
> 
> > On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
> > > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
> > > >
> > > > gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
> > > 
> > > I had totally forgotten that I'd already filed that bug more
> > > than six years ago until they just closed yours as a duplicate
> > > of mine :)
> > > 
> > > Good luck in getting it fixed!
> > 
> > Well, just got done re-opening it for the third time.  And a local
> > gcc community member advised me not to give up too easily.  But I
> > must admit that I am impressed with the speed that it was identified
> > as duplicate.
> > 
> > Should be entertaining!  ;-)
> 
> Right. ROTFL... volatile actually breaks atomic_t instead of making it 
> safe. x++ becomes a register load, increment and a register store. Without 
> volatile we can increment the memory directly.

No code does (or would do, or should do):

	x.counter++;

on an "atomic_t x;" anyway.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  0:04                                                 ` Segher Boessenkool
@ 2007-08-18  1:56                                                   ` Satyam Sharma
  2007-08-18  2:15                                                     ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-18  1:56 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang



On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > > atomic_dec() writes
> > > > > to memory, so it _does_ have "volatile semantics", implicitly, as
> > > > > long as the compiler cannot optimise the atomic variable away
> > > > > completely -- any store counts as a side effect.
> > > > 
> > > > I don't think an atomic_dec() implemented as an inline "asm volatile"
> > > > or one that uses a "forget" macro would have the same re-ordering
> > > > guarantees as an atomic_dec() that uses a volatile access cast.
> > > 
> > > The "asm volatile" implementation does have exactly the same
> > > reordering guarantees as the "volatile cast" thing,
> > 
> > I don't think so.
> 
> "asm volatile" creates a side effect.

Yeah.

> Side effects aren't
> allowed to be reordered wrt sequence points.

Yeah.

> This is exactly
> the same reason as why "volatile accesses" cannot be reordered.

No, the code in that sub-thread I earlier pointed you at *WAS* written
such that there was a sequence point after all the uses of that volatile
access cast macro, and _therefore_ we were safe from re-ordering
(behaviour guaranteed by the C standard).

But you seem to be missing the simple and basic fact that:

	(something_that_has_side_effects || statement)
			!= something_that_is_a_sequence_point

Now, one cannot fantasize that "volatile asms" are also sequence points.
In fact such an argument would be sadly mistaken, because "sequence
points" are defined by the C standard and it'd be horribly wrong to
even _try_ claiming that the C standard knows about "volatile asms".


> > > if that is
> > > implemented by GCC in the "obvious" way.  Even a "plain" asm()
> > > will do the same.
> > 
> > Read the relevant GCC documentation.
> 
> I did, yes.

No, you didn't read:

http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Read the bit about the need for artificial dependencies, and the example
given there:

	asm volatile("mtfsf 255,%0" : : "f" (fpenv));
	sum = x + y;

The docs explicitly say the addition can be moved before the "volatile
asm". Hopefully, as you know, (x + y) is an C "expression" and hence
a "sequence point" as defined by the standard. So the "volatile asm"
should've happened before it, right? Wrong.
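
For completeness, the workaround that same section of the manual gives
is to add an artificial dependency tying the asm to the statement you
don't want moved -- quoting the manual's PowerPC example from memory,
so treat the exact constraint letter as the manual's, not mine:

	asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
	sum = x + y;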

I know there is also stuff written about "side-effects" there which
_could_ give the same semantic w.r.t. sequence points as the volatile
access casts, but hey, it's GCC's own documentation, you obviously can't
find fault with _me_ if there's wrong stuff written in there. Say that
to GCC ...

See, "volatile" C keyword, for all it's ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs). Semantics of "volatile" as applies to inline
asm, OTOH? You're completely relying on the compiler for that ...


> > [ of course, if the (latest) GCC documentation is *yet again*
> >   wrong, then alright, not much I can do about it, is there. ]
> 
> There was (and is) nothing wrong about the "+m" documentation, if
> that is what you are talking about.  It could be extended now, to
> allow "+m" -- but that takes more than just "fixing" the documentation.

No, there was (and is) _everything_ wrong about the "+" documentation as
applies to memory-constrained operands. I don't give a whit if it's
some workaround in their gimplifier, or the other, that makes it possible
to use "+m" (like the current kernel code does). The docs suggest
otherwise, so there's obviously a clear disconnect between the docs and
actual GCC behaviour.


[ You seem to often take issue with _amazingly_ petty and pedantic things,
  by the way :-) ]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 12:56                                                               ` Nick Piggin
@ 2007-08-18  2:15                                                                 ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-18  2:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Stefan Richter, paulmck, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> 
> > I didn't quite understand what you said here, so I'll tell what I think:
> > 
> > * foo() is a compiler barrier if the definition of foo() is invisible to
> >  the compiler at a callsite.
> > 
> > * foo() is also a compiler barrier if the definition of foo() includes
> >  a barrier, and it is inlined at the callsite.
> > 
> > If the above is wrong, or if there's something else at play as well,
> > do let me know.
> 
> [...]
> If a function is not completely visible to the compiler (so it can't
> determine whether a barrier could be in it or not), then it must always
> assume it will contain a barrier so it always does the right thing.

Yup, that's what I'd said just a few sentences above, as you can see. I
was actually asking for "elaboration" on "how a compiler determines that
function foo() (say foo == schedule), even when it cannot see that it has
a barrier(), as you'd mentioned, is a 'sleeping' function", and
whether compilers have a "notion of sleep to automatically assume a
compiler barrier whenever such a sleeping function foo() is called". But
I think you've already qualified the discussion to this kernel, so okay,
I shouldn't nit-pick anymore.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  1:56                                                   ` Satyam Sharma
@ 2007-08-18  2:15                                                     ` Segher Boessenkool
  2007-08-18  3:33                                                       ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-18  2:15 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang

>>>> The "asm volatile" implementation does have exactly the same
>>>> reordering guarantees as the "volatile cast" thing,
>>>
>>> I don't think so.
>>
>> "asm volatile" creates a side effect.
>
> Yeah.
>
>> Side effects aren't
>> allowed to be reordered wrt sequence points.
>
> Yeah.
>
>> This is exactly
>> the same reason as why "volatile accesses" cannot be reordered.
>
> No, the code in that sub-thread I earlier pointed you at *WAS* written
> such that there was a sequence point after all the uses of that 
> volatile
> access cast macro, and _therefore_ we were safe from re-ordering
> (behaviour guaranteed by the C standard).

And exactly the same is true for the "asm" version.

> Now, one cannot fantasize that "volatile asms" are also sequence 
> points.

Sure you can do that.  I don't though.

> In fact such an argument would be sadly mistaken, because "sequence
> points" are defined by the C standard and it'd be horribly wrong to
> even _try_ claiming that the C standard knows about "volatile asms".

That's nonsense.  GCC can extend the C standard any way they
bloody well please -- witness the fact that they added an
extra class of side effects...

>>> Read the relevant GCC documentation.
>>
>> I did, yes.
>
> No, you didn't read:
>
> http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
>
> Read the bit about the need for artificial dependencies, and the 
> example
> given there:
>
> 	asm volatile("mtfsf 255,%0" : : "f" (fpenv));
> 	sum = x + y;
>
> The docs explicitly say the addition can be moved before the "volatile
> asm". Hopefully, as you know, (x + y) is an C "expression" and hence
> a "sequence point" as defined by the standard.

The _end of a full expression_ is a sequence point, not every
expression.  And that is irrelevant here anyway.

It is perfectly fine to compute x+y any time before the
assignment; the C compiler is allowed to compute it _after_
the assignment even, if it could figure out how ;-)

x+y does not contain a side effect, you know.

> I know there is also stuff written about "side-effects" there which
> _could_ give the same semantic w.r.t. sequence points as the volatile
> access casts,

s/could/does/

> but hey, it's GCC's own documentation, you obviously can't
> find fault with _me_ if there's wrong stuff written in there. Say that
> to GCC ...

There's nothing wrong there.

> See, "volatile" C keyword, for all it's ill-definition and dodgy
> semantics, is still at least given somewhat of a treatment in the C
> standard (whose quality is ... ummm, sadly not always good and clear,
> but unsurprisingly, still about 5,482 orders-of-magnitude times
> better than GCC docs).

If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you.

> Semantics of "volatile" as applies to inline
> asm, OTOH? You're completely relying on the compiler for that ...

Yes, and?  GCC promises the behaviour it has documented.

>>> [ of course, if the (latest) GCC documentation is *yet again*
>>>   wrong, then alright, not much I can do about it, is there. ]
>>
>> There was (and is) nothing wrong about the "+m" documentation, if
>> that is what you are talking about.  It could be extended now, to
>> allow "+m" -- but that takes more than just "fixing" the 
>> documentation.
>
> No, there was (and is) _everything_ wrong about the "+" documentation 
> as
> applies to memory-constrained operands. I don't give a whit if it's
> some workaround in their gimplifier, or the other, that makes it 
> possible
> to use "+m" (like the current kernel code does). The docs suggest
> otherwise, so there's obviously a clear disconnect between the docs and
> actual GCC behaviour.

The documentation simply doesn't say "+m" is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for "+m" might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).

> [ You seem to often take issue with _amazingly_ petty and pedantic 
> things,
>   by the way :-) ]

If you're talking details, you better get them right.  Handwaving is
fine with me as long as you're not purporting that it isn't handwaving.

And I simply cannot stand false assertions.

You can always ignore me if _you_ take issue with _that_ :-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  2:15                                                     ` Segher Boessenkool
@ 2007-08-18  3:33                                                       ` Satyam Sharma
  2007-08-18  5:18                                                         ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-18  3:33 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang



On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > > The "asm volatile" implementation does have exactly the same
> > > > > reordering guarantees as the "volatile cast" thing,
> > > > 
> > > > I don't think so.
> > > 
> > > "asm volatile" creates a side effect.
> > 
> > Yeah.
> > 
> > > Side effects aren't
> > > allowed to be reordered wrt sequence points.
> > 
> > Yeah.
> > 
> > > This is exactly
> > > the same reason as why "volatile accesses" cannot be reordered.
> > 
> > No, the code in that sub-thread I earlier pointed you at *WAS* written
> > such that there was a sequence point after all the uses of that volatile
> > access cast macro, and _therefore_ we were safe from re-ordering
> > (behaviour guaranteed by the C standard).
> 
> And exactly the same is true for the "asm" version.
> 
> > Now, one cannot fantasize that "volatile asms" are also sequence points.
> 
> Sure you can do that.  I don't though.
> 
> > In fact such an argument would be sadly mistaken, because "sequence
> > points" are defined by the C standard and it'd be horribly wrong to
> > even _try_ claiming that the C standard knows about "volatile asms".
> 
> That's nonsense.  GCC can extend the C standard any way they
> bloody well please -- witness the fact that they added an
> extra class of side effects...
> 
> > > > Read the relevant GCC documentation.
> > > 
> > > I did, yes.
> > 
> > No, you didn't read:
> > 
> > http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
> > 
> > Read the bit about the need for artificial dependencies, and the example
> > given there:
> > 
> > 	asm volatile("mtfsf 255,%0" : : "f" (fpenv));
> > 	sum = x + y;
> > 
> > The docs explicitly say the addition can be moved before the "volatile
> > asm". Hopefully, as you know, (x + y) is an C "expression" and hence
> > a "sequence point" as defined by the standard.
> 
> The _end of a full expression_ is a sequence point, not every
> expression.  And that is irrelevant here anyway.
> 
> It is perfectly fine to compute x+y any time before the
> assignment; the C compiler is allowed to compute it _after_
> the assignment even, if it could figure out how ;-)
> 
> x+y does not contain a side effect, you know.
> 
> > I know there is also stuff written about "side-effects" there which
> > _could_ give the same semantic w.r.t. sequence points as the volatile
> > access casts,
> 
> s/could/does/
> 
> > but hey, it's GCC's own documentation, you obviously can't
> > find fault with _me_ if there's wrong stuff written in there. Say that
> > to GCC ...
> 
> There's nothing wrong there.
> 
> > See, "volatile" C keyword, for all it's ill-definition and dodgy
> > semantics, is still at least given somewhat of a treatment in the C
> > standard (whose quality is ... ummm, sadly not always good and clear,
> > but unsurprisingly, still about 5,482 orders-of-magnitude times
> > better than GCC docs).
> 
> If you find any problems/shortcomings in the GCC documentation,
> please file a PR, don't go whine on some unrelated mailing lists.
> Thank you.
> 
> > Semantics of "volatile" as applies to inline
> > asm, OTOH? You're completely relying on the compiler for that ...
> 
> Yes, and?  GCC promises the behaviour it has documented.

LOTS there, which obviously isn't correct, but which I'll reply to later,
easier stuff first. (call this "handwaving" if you want, but don't worry,
I /will/ bother myself to reply)


> > > > [ of course, if the (latest) GCC documentation is *yet again*
> > > >   wrong, then alright, not much I can do about it, is there. ]
> > > 
> > > There was (and is) nothing wrong about the "+m" documentation, if
> > > that is what you are talking about.  It could be extended now, to
> > > allow "+m" -- but that takes more than just "fixing" the documentation.
> > 
> > No, there was (and is) _everything_ wrong about the "+" documentation as
> > applies to memory-constrained operands. I don't give a whit if it's
> > some workaround in their gimplifier, or the other, that makes it possible
> > to use "+m" (like the current kernel code does). The docs suggest
> > otherwise, so there's obviously a clear disconnect between the docs and
> > actual GCC behaviour.
> 
> The documentation simply doesn't say "+m" is allowed.  The code to
> allow it was added for the benefit of people who do not read the
> documentation.  Documentation for "+m" might get added later if it
> is decided this [the code, not the documentation] is a sane thing
> to have (which isn't directly obvious).

Huh?

"If the (current) documentation doesn't match up with the (current)
code, then _at least one_ of them has to be (as of current) wrong."

I wonder how you could even try to disagree with that.

And I didn't go whining about this ... you asked me. (I think I'd said
something to the effect of GCC docs are often wrong, which is true,
but probably you feel saying that is "not allowed" on non-gcc lists?)

As for the "PR" you're requesting me to file with GCC for this, that
gcc-patches@ thread did precisely that and more (submitted a patch to
said documentation -- and no, saying "documentation might get added
later" is totally bogus and nonsensical -- documentation exists to
document current behaviour, not past). But come on, this is wholly
petty. I wouldn't have replied, really, if you weren't so provoking.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  1:41                                   ` Satyam Sharma
@ 2007-08-18  4:13                                     ` Linus Torvalds
  2007-08-18 13:36                                       ` Satyam Sharma
                                                         ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-08-18  4:13 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Paul E. McKenney, Herbert Xu, Nick Piggin,
	Paul Mackerras, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, schwidefsky, Chris Snook, davem, wensong,
	wjiang



On Sat, 18 Aug 2007, Satyam Sharma wrote:
> 
> No code does (or would do, or should do):
> 
> 	x.counter++;
> 
> on an "atomic_t x;" anyway.

That's just an example of a general problem.

No, you don't use "x.counter++". But you *do* use

	if (atomic_read(&x) <= 1)

and loading into a register is stupid and pointless, when you could just 
do it as a regular memory-operand to the cmp instruction.

And as far as the compiler is concerned, the problem is the 100% same: 
combining operations with the volatile memop.

The fact is, a compiler that thinks that

	movl mem,reg
	cmpl $val,reg

is any better than

	cmpl $val,mem

is just not a very good compiler. But when talking about "volatile", 
that's exactly what you always get (and always have gotten - this is 
not a regression, and I doubt gcc is alone in this).

			Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  3:33                                                       ` Satyam Sharma
@ 2007-08-18  5:18                                                         ` Segher Boessenkool
  2007-08-18 13:20                                                           ` Satyam Sharma
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-18  5:18 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang

>> The documentation simply doesn't say "+m" is allowed.  The code to
>> allow it was added for the benefit of people who do not read the
>> documentation.  Documentation for "+m" might get added later if it
>> is decided this [the code, not the documentation] is a sane thing
>> to have (which isn't directly obvious).
>
> Huh?
>
> "If the (current) documentation doesn't match up with the (current)
> code, then _at least one_ of them has to be (as of current) wrong."
>
> I wonder how you could even try to disagree with that.

Easy.

The GCC documentation you're referring to is the user's manual.
See the blurb on the first page:

"This manual documents how to use the GNU compilers, as well as their
features and incompatibilities, and how to report bugs.  It corresponds
to GCC version 4.3.0.  The internals of the GNU compilers, including
how to port them to new targets and some information about how to write
front ends for new languages, are documented in a separate manual."

_How to use_.  This documentation doesn't describe in minute detail
everything the compiler does (see the source code for that -- no, it
isn't described in the internals manual either).

If it doesn't tell you how to use "+m", and even tells you _not_ to
use it, maybe that is what it means to say?  It doesn't mean "+m"
doesn't actually do something.  It also doesn't mean it does what
you think it should do.  It might do just that of course.  But treating
writing C code as an empirical science isn't such a smart idea.

> And I didn't go whining about this ... you asked me. (I think I'd said
> something to the effect of GCC docs are often wrong,

No need to guess at what you said, even if you managed to delete
your own mail already, there are plenty of free web-based archives
around.  You said:

> See, "volatile" C keyword, for all it's ill-definition and dodgy
> semantics, is still at least given somewhat of a treatment in the C
> standard (whose quality is ... ummm, sadly not always good and clear,
> but unsurprisingly, still about 5,482 orders-of-magnitude times
> better than GCC docs).

and that to me reads as complaining that the ISO C standard "isn't
very good" and that the GCC documentation is 10**5482 times worse
even.  Which of course is hyperbole and cannot be true.  It also
isn't helpful in any way or form for anyone on this list.  I call
that whining.

> which is true,

Yes, documentation of that size often has shortcomings.  No surprise
there.  However, great effort is made to make it better documentation,
and especially to keep it up to date; if you find any errors or
omissions, please report them.  There are many ways how to do that,
see the GCC homepage.</end-of-marketing-blurb>

> but probably you feel saying that is "not allowed" on non-gcc lists?)

You're allowed to say whatever you want.  Let's have a quote again
shall we?  I said:

> If you find any problems/shortcomings in the GCC documentation,
> please file a PR, don't go whine on some unrelated mailing lists.
> Thank you.

I read that as a friendly request, not a prohibition.  Well maybe
not actually friendly, more a bit angry.  A request, either way.

> As for the "PR"

"Problem report", a bugzilla ticket.  Sorry for using terminology
unknown to you.

> you're requesting me to file with GCC for this, that
> gcc-patches@ thread did precisely that

Actually not -- PRs make sure issues aren't forgotten (although
they might gather dust, sure).  But yes, submitting patches is a
Great Thing(tm).

> and more (submitted a patch to
> said documentation -- and no, saying "documentation might get added
> later" is totally bogus and nonsensical -- documentation exists to
> document current behaviour, not past).

When code like you want to write becomes a supported feature, that
will be reflected in the user manual.  It is completely nonsensical
to expect everything that is *not* a supported feature to be mentioned
there.

> I wouldn't have replied, really, if you weren't so provoking.

Hey, maybe that character trait is good for something, then.
Now to build a business plan around it...


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  5:18                                                         ` Segher Boessenkool
@ 2007-08-18 13:20                                                           ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-18 13:20 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, zlynx, Andrew Morton,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang

[ LOL, you _are_ shockingly petty! ]


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > The documentation simply doesn't say "+m" is allowed.  The code to
> > > allow it was added for the benefit of people who do not read the
> > > documentation.  Documentation for "+m" might get added later if it
> > > is decided this [the code, not the documentation] is a sane thing
> > > to have (which isn't directly obvious).
> > 
> > Huh?
> > 
> > "If the (current) documentation doesn't match up with the (current)
> > code, then _at least one_ of them has to be (as of current) wrong."
> > 
> > I wonder how you could even try to disagree with that.
> 
> Easy.
> 
> The GCC documentation you're referring to is the user's manual.
> See the blurb on the first page:
> 
> "This manual documents how to use the GNU compilers, as well as their
> features and incompatibilities, and how to report bugs.  It corresponds
> to GCC version 4.3.0.  The internals of the GNU compilers, including
> how to port them to new targets and some information about how to write
> front ends for new languages, are documented in a separate manual."
> 
> _How to use_.  This documentation doesn't describe in minute detail
> everything the compiler does (see the source code for that -- no, it
> isn't described in the internals manual either).

Wow, now that's a nice "disclaimer". By your (poor) standards of writing
documentation, one can as well write any factually incorrect stuff that
one wants in a document once you've got such a blurb in place :-)


> If it doesn't tell you how to use "+m", and even tells you _not_ to
> use it, maybe that is what it means to say?  It doesn't mean "+m"
> doesn't actually do something.  It also doesn't mean it does what
> you think it should do.  It might do just that of course.  But treating
> writing C code as an empirical science isn't such a smart idea.

Oh, really? Considering how much is (left out of being) documented, often
one would reasonably have to experimentally see (with testcases) how the
compiler behaves for some given code. Well, at least _I_ do it often
(several others on this list do as well), and I think there's everything
smart about it rather than having to read gcc sources -- I'd be surprised
(unless you have infinite free time on your hands, which does look like
the case actually) if someone actually prefers reading gcc sources first
to know what/how gcc does something for some given code, rather than
simply write it out, compile and look at the generated code (saves time for
those who don't have an infinite amount of it).


> > And I didn't go whining about this ... you asked me. (I think I'd said
> > something to the effect of GCC docs are often wrong,
> 
> No need to guess at what you said, even if you managed to delete
> your own mail already, there are plenty of free web-based archives
> around.  You said:
> 
> > See, "volatile" C keyword, for all it's ill-definition and dodgy
> > semantics, is still at least given somewhat of a treatment in the C
> > standard (whose quality is ... ummm, sadly not always good and clear,
> > but unsurprisingly, still about 5,482 orders-of-magnitude times
> > better than GCC docs).

Try _reading_ what I said there, for a change, dude. I'd originally only
said "unless GCC's docs is yet again wrong" ... then _you_ asked me what,
after which this discussion began and I wrote the above [which I fully
agree with -- so what if I used hyperbole in my sentence (yup, that was
intended, and obviously, exaggeration), am I not even allowed to do that?
Man, you're a Nazi or what ...] I didn't go whining about it on my own,
as you'd suggested earlier, until _you_ asked me.

[ Ick, I somehow managed to reply to this ... this is such a ...
  *disgustingly* petty argument you made here. ]


> > which is true,
> 
> Yes, documentation of that size often has shortcomings.  No surprise
> there.  However, great effort is made to make it better documentation,
> and especially to keep it up to date; if you find any errors or
> omissions, please report them.  There are many ways how to do that,
> see the GCC homepage.</end-of-marketing-blurb>
                         ^^^^^^^^^^^^^^^^^^^^^^

Looks like you even get paid :-)


> > but probably you feel saying that is "not allowed" on non-gcc lists?)
> 
> [amazingly pointless stuff snipped]
> 
> > As for the "PR"
> > you're requesting me to file with GCC for this, that
> > gcc-patches@ thread did precisely that
> 
> [more amazingly pointless stuff snipped]
> 
> > and more (submitted a patch to
> > said documentation -- and no, saying "documentation might get added
> > later" is totally bogus and nonsensical -- documentation exists to
> > document current behaviour, not past).
> 
> When code like you want to write becomes a supported feature, that
> will be reflected in the user manual.  It is completely nonsensical
> to expect everything that is *not* a supported feature to be mentioned
> there.

What crap. It is _perfectly reasonable_ to expect (current) documentation
to keep up with (current) code behaviour. In fact trying to justify such
a state is completely bogus and nonsensical.


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  4:13                                     ` Linus Torvalds
@ 2007-08-18 13:36                                       ` Satyam Sharma
  2007-08-18 21:54                                       ` Paul E. McKenney
  2007-08-24 12:19                                       ` Denys Vlasenko
  2 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-18 13:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Lameter, Paul E. McKenney, Herbert Xu, Nick Piggin,
	Paul Mackerras, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, schwidefsky, Chris Snook, davem, wensong,
	wjiang



On Fri, 17 Aug 2007, Linus Torvalds wrote:

> On Sat, 18 Aug 2007, Satyam Sharma wrote:
> > 
> > No code does (or would do, or should do):
> > 
> > 	x.counter++;
> > 
> > on an "atomic_t x;" anyway.
> 
> That's just an example of a general problem.
> 
> No, you don't use "x.counter++". But you *do* use
> 
> 	if (atomic_read(&x) <= 1)
> 
> and loading into a register is stupid and pointless, when you could just 
> do it as a regular memory-operand to the cmp instruction.

True, but that makes this a bad/poor code generation issue with the
compiler, not something that affects the _correctness_ of atomic ops if
"volatile" is used for that counter object (as was suggested), because
we'd always use the atomic_inc() etc primitives to do increments, which
are always (should be!) implemented to be atomic.
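
To be explicit about the two shapes under discussion here -- a rough
sketch only, the exact per-architecture definitions differ -- it's the
choice between a volatile-qualified counter member and a plain member
with the volatile access cast confined to the read:

	/* volatile on the member: every access becomes a volatile access */
	typedef struct { volatile int counter; } atomic_t;
	#define atomic_read(v)		((v)->counter)

	/* volatile only at the read site, via the access cast */
	typedef struct { int counter; } atomic_t;
	#define atomic_read(v)		(*(volatile int *)&(v)->counter)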


> And as far as the compiler is concerned, the problem is the 100% same: 
> combining operations with the volatile memop.
> 
> The fact is, a compiler that thinks that
> 
> 	movl mem,reg
> 	cmpl $val,reg
> 
> is any better than
> 
> 	cmpl $val,mem
> 
> is just not a very good compiler.

Absolutely, this is definitely a bug report worth opening with gcc. And
what you've said to explain this previously sounds definitely correct --
seeing "volatile" for any access does appear to just scare the hell out
of gcc and makes it generate such (poor) code.


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* LDD3 pitfalls (was Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures)
  2007-08-17  8:06                                                   ` Nick Piggin
  2007-08-17  8:58                                                     ` Satyam Sharma
  2007-08-17 10:48                                                     ` Stefan Richter
@ 2007-08-18 14:35                                                     ` Stefan Richter
  2007-08-20 13:28                                                       ` Chris Snook
  2 siblings, 1 reply; 657+ messages in thread
From: Stefan Richter @ 2007-08-18 14:35 UTC (permalink / raw)
  To: Jonathan Corbet, Greg Kroah-Hartman
  Cc: Nick Piggin, paulmck, Herbert Xu, Paul Mackerras, Satyam Sharma,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

Nick Piggin wrote:
> Stefan Richter wrote:
>> Nick Piggin wrote:
>>
>>> I don't know why people would assume volatile of atomics. AFAIK, most
>>> of the documentation is pretty clear that all the atomic stuff can be
>>> reordered etc. except for those that modify and return a value.
>>
>>
>> Which documentation is there?
> 
> Documentation/atomic_ops.txt
> 
> 
>> For driver authors, there is LDD3.  It doesn't specifically cover
>> effects of optimization on accesses to atomic_t.
>>
>> For architecture port authors, there is Documentation/atomic_ops.txt.
>> Driver authors also can learn something from that document, as it
>> indirectly documents the atomic_t and bitops APIs.
>>
> 
> "Semantics and Behavior of Atomic and Bitmask Operations" is
> pretty direct :)
> 
> Sure, it says that it's for arch maintainers, but there is no
> reason why users can't make use of it.


Note, LDD3 page 238 says:  "It is worth noting that most of the other
kernel primitives dealing with synchronization, such as spinlock and
atomic_t operations, also function as memory barriers."

I don't know about Linux 2.6.10 against which LDD3 was written, but
currently only _some_ atomic_t operations function as memory barriers.

Besides, judging from some posts in this thread, saying that atomic_t
operations dealt with synchronization may not be entirely precise.
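
For driver authors reading along, Documentation/atomic_ops.txt of this
era spells out that the value-returning atomic ops imply full memory
barriers while the void ones imply none, so ordering has to be asked
for explicitly.  A sketch, assuming the then-current helper names and
a placeholder obj with an atomic_t refcount:

	/* plain atomic_dec() implies no barrier of its own */
	smp_mb__before_atomic_dec();
	atomic_dec(&obj->refcount);
	smp_mb__after_atomic_dec();

	/* atomic_dec_and_test() does imply barriers on both sides */
	if (atomic_dec_and_test(&obj->refcount))
		kfree(obj);
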
-- 
Stefan Richter
-=====-=-=== =--- =--=-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 22:38                   ` Segher Boessenkool
@ 2007-08-18 14:42                     ` Satyam Sharma
  0 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-18 14:42 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Bill Fink, Linux Kernel Mailing List, Paul E. McKenney, netdev,
	ak, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang



On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > GCC manual, section 6.1, "When
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > is a Volatile Object Accessed?" doesn't say anything of the
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > kind.
      ^^^^^

> > True, "implementation-defined" as per the C standard _is_ supposed to mean
    ^^^^^

> > "unspecified behaviour where each implementation documents how the choice
> > is made". So ok, probably GCC isn't "documenting" this
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > implementation-defined behaviour which it is supposed to, but can't really
> > fault them much for this, probably.
> 
> GCC _is_ documenting this, namely in this section 6.1.

(Again totally petty, but) Yes, but ...

> It doesn't
  ^^^^^^^^^^
> mention volatile-casted stuff.  Draw your own conclusions.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

... exactly. So that's why I said "GCC isn't documenting _this_".

Man, try _reading_ mails before replying to them ...

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  4:13                                     ` Linus Torvalds
  2007-08-18 13:36                                       ` Satyam Sharma
@ 2007-08-18 21:54                                       ` Paul E. McKenney
  2007-08-18 22:41                                         ` Linus Torvalds
  2007-08-24 12:19                                       ` Denys Vlasenko
  2 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-18 21:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Satyam Sharma, Christoph Lameter, Herbert Xu, Nick Piggin,
	Paul Mackerras, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, schwidefsky, Chris Snook, davem, wensong,
	wjiang

On Fri, Aug 17, 2007 at 09:13:35PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 18 Aug 2007, Satyam Sharma wrote:
> > 
> > No code does (or would do, or should do):
> > 
> > 	x.counter++;
> > 
> > on an "atomic_t x;" anyway.
> 
> That's just an example of a general problem.
> 
> No, you don't use "x.counter++". But you *do* use
> 
> 	if (atomic_read(&x) <= 1)
> 
> and loading into a register is stupid and pointless, when you could just 
> do it as a regular memory-operand to the cmp instruction.
> 
> And as far as the compiler is concerned, the problem is the 100% same: 
> combining operations with the volatile memop.
> 
> The fact is, a compiler that thinks that
> 
> 	movl mem,reg
> 	cmpl $val,reg
> 
> is any better than
> 
> 	cmpl $val,mem
> 
> is just not a very good compiler. But when talking about "volatile", 
> that's exactly what you always get (and always have gotten - this is 
> not a regression, and I doubt gcc is alone in this).

One of the gcc guys claimed that he thought that the two-instruction
sequence would be faster on some x86 machines.  I pointed out that
there might be a concern about code size.  I chose not to point out
that people might also care about the other x86 machines.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  1:24                                 ` Christoph Lameter
  2007-08-18  1:41                                   ` Satyam Sharma
@ 2007-08-18 21:56                                   ` Paul E. McKenney
  2007-08-20 13:31                                   ` Chris Snook
  2 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-18 21:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Herbert Xu, Linus Torvalds, Nick Piggin, Paul Mackerras,
	Segher Boessenkool, heiko.carstens, horms, linux-kernel, rpjday,
	ak, netdev, cfriesen, akpm, jesper.juhl, linux-arch, zlynx,
	satyam, schwidefsky, Chris Snook, davem, wensong, wjiang

On Fri, Aug 17, 2007 at 06:24:15PM -0700, Christoph Lameter wrote:
> On Fri, 17 Aug 2007, Paul E. McKenney wrote:
> 
> > On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
> > > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
> > > >
> > > > gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
> > > 
> > > I had totally forgotten that I'd already filed that bug more
> > > than six years ago until they just closed yours as a duplicate
> > > of mine :)
> > > 
> > > Good luck in getting it fixed!
> > 
> > Well, just got done re-opening it for the third time.  And a local
> > gcc community member advised me not to give up too easily.  But I
> > must admit that I am impressed with the speed that it was identified
> > as duplicate.
> > 
> > Should be entertaining!  ;-)
> 
> Right. ROTFL... volatile actually breaks atomic_t instead of making it 
> safe. x++ becomes a register load, increment and a register store. Without 
> volatile we can increment the memory directly. It seems that volatile 
> requires that the variable is loaded into a register first and then 
> operated upon. Understandable when you think about volatile being used to 
> access memory mapped I/O registers where a RMW operation could be 
> problematic.
> 
> See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3506

Yep.  The initial reaction was in fact to close my bug as a duplicate
of 3506.  But I was not asking for atomicity, but rather for smaller
code to be generated, so I reopened it.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18 21:54                                       ` Paul E. McKenney
@ 2007-08-18 22:41                                         ` Linus Torvalds
  2007-08-18 23:19                                           ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Linus Torvalds @ 2007-08-18 22:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Satyam Sharma, Christoph Lameter, Herbert Xu, Nick Piggin,
	Paul Mackerras, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, schwidefsky, Chris Snook, davem, wensong,
	wjiang



On Sat, 18 Aug 2007, Paul E. McKenney wrote:
> 
> One of the gcc guys claimed that he thought that the two-instruction
> sequence would be faster on some x86 machines.  I pointed out that
> there might be a concern about code size.  I chose not to point out
> that people might also care about the other x86 machines.  ;-)

Some (very few) x86 uarchs do tend to prefer "load-store" like code 
generation, and doing a "mov [mem],reg + op reg" instead of "op [mem]" can 
actually be faster on some of them. Not any that are relevant today, 
though.

Also, that has nothing to do with volatile, and should be controlled by 
optimization flags (like -mtune). In fact, I thought there was a separate 
flag to do that (ie something like "-mload-store"), but I can't find it, 
so maybe that's just my fevered brain..

			Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18 22:41                                         ` Linus Torvalds
@ 2007-08-18 23:19                                           ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-18 23:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Satyam Sharma, Christoph Lameter, Herbert Xu, Nick Piggin,
	Paul Mackerras, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, schwidefsky, Chris Snook, davem, wensong,
	wjiang

On Sat, Aug 18, 2007 at 03:41:13PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 18 Aug 2007, Paul E. McKenney wrote:
> > 
> > One of the gcc guys claimed that he thought that the two-instruction
> > sequence would be faster on some x86 machines.  I pointed out that
> > there might be a concern about code size.  I chose not to point out
> > that people might also care about the other x86 machines.  ;-)
> 
> Some (very few) x86 uarchs do tend to prefer "load-store" like code 
> generation, and doing a "mov [mem],reg + op reg" instead of "op [mem]" can 
> actually be faster on some of them. Not any that are relevant today, 
> though.

;-)

> Also, that has nothing to do with volatile, and should be controlled by 
> optimization flags (like -mtune). In fact, I thought there was a separate 
> flag to do that (ie something like "-mload-store"), but I can't find it, 
> so maybe that's just my fevered brain..

Good point, will suggest this if the need arises.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 16:48                                                             ` Linus Torvalds
  2007-08-17 18:50                                                               ` Chris Friesen
@ 2007-08-20 13:15                                                               ` Chris Snook
  2007-08-20 13:32                                                                 ` Herbert Xu
  2007-08-21  5:46                                                                 ` Linus Torvalds
  2007-09-09 18:02                                                               ` Denys Vlasenko
  2 siblings, 2 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-20 13:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Satyam Sharma, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Ilpo Jarvinen, Paul E. McKenney,
	Stefan Richter, Linux Kernel Mailing List, linux-arch, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

Linus Torvalds wrote:
> So the only reason to add back "volatile" to the atomic_read() sequence is 
> not to fix bugs, but to _hide_ the bugs better. They're still there, they 
> are just a lot harder to trigger, and tend to be a lot subtler.

What about barrier removal?  With consistent semantics we could optimize a fair 
amount of code.  Whether or not that constitutes "premature" optimization is 
open to debate, but there's no question we could reduce our register wiping in 
some places.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: LDD3 pitfalls (was Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures)
  2007-08-18 14:35                                                     ` LDD3 pitfalls (was Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures) Stefan Richter
@ 2007-08-20 13:28                                                       ` Chris Snook
  0 siblings, 0 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-20 13:28 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Jonathan Corbet, Greg Kroah-Hartman, Nick Piggin, paulmck,
	Herbert Xu, Paul Mackerras, Satyam Sharma, Christoph Lameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Stefan Richter wrote:
> Nick Piggin wrote:
>> Stefan Richter wrote:
>>> Nick Piggin wrote:
>>>
>>>> I don't know why people would assume volatile of atomics. AFAIK, most
>>>> of the documentation is pretty clear that all the atomic stuff can be
>>>> reordered etc. except for those that modify and return a value.
>>>
>>> Which documentation is there?
>> Documentation/atomic_ops.txt
>>
>>
>>> For driver authors, there is LDD3.  It doesn't specifically cover
>>> effects of optimization on accesses to atomic_t.
>>>
>>> For architecture port authors, there is Documentation/atomic_ops.txt.
>>> Driver authors also can learn something from that document, as it
>>> indirectly documents the atomic_t and bitops APIs.
>>>
>> "Semantics and Behavior of Atomic and Bitmask Operations" is
>> pretty direct :)
>>
>> Sure, it says that it's for arch maintainers, but there is no
>> reason why users can't make use of it.
> 
> 
> Note, LDD3 page 238 says:  "It is worth noting that most of the other
> kernel primitives dealing with synchronization, such as spinlock and
> atomic_t operations, also function as memory barriers."
> 
> I don't know about Linux 2.6.10 against which LDD3 was written, but
> currently only _some_ atomic_t operations function as memory barriers.
> 
> Besides, judging from some posts in this thread, saying that atomic_t
> operations dealt with synchronization may not be entirely precise.

atomic_t is often used as the basis for implementing more sophisticated 
synchronization mechanisms, such as rwlocks.  Whether or not they are designed 
for that purpose, the atomic_* operations are de facto synchronization primitives.
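
A familiar example is reference counting, where the result of the atomic
RMW decides which CPU does the teardown (obj and free_obj() are just
placeholders here):

	if (atomic_dec_and_test(&obj->refcount))	/* true only for the last reference */
		free_obj(obj);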

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  1:24                                 ` Christoph Lameter
  2007-08-18  1:41                                   ` Satyam Sharma
  2007-08-18 21:56                                   ` Paul E. McKenney
@ 2007-08-20 13:31                                   ` Chris Snook
  2007-08-20 22:04                                     ` Segher Boessenkool
  2 siblings, 1 reply; 657+ messages in thread
From: Chris Snook @ 2007-08-20 13:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, Herbert Xu, Linus Torvalds, Nick Piggin,
	Paul Mackerras, Segher Boessenkool, heiko.carstens, horms,
	linux-kernel, rpjday, ak, netdev, cfriesen, akpm, jesper.juhl,
	linux-arch, zlynx, satyam, schwidefsky, davem, wensong, wjiang

Christoph Lameter wrote:
> On Fri, 17 Aug 2007, Paul E. McKenney wrote:
> 
>> On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
>>> On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
>>>> gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
>>> I had totally forgotten that I'd already filed that bug more
>>> than six years ago until they just closed yours as a duplicate
>>> of mine :)
>>>
>>> Good luck in getting it fixed!
>> Well, just got done re-opening it for the third time.  And a local
>> gcc community member advised me not to give up too easily.  But I
>> must admit that I am impressed with the speed that it was identified
>> as duplicate.
>>
>> Should be entertaining!  ;-)
> 
> Right. ROTFL... volatile actually breaks atomic_t instead of making it 
> safe. x++ becomes a register load, increment and a register store. Without 
> volatile we can increment the memory directly. It seems that volatile 
> requires that the variable is loaded into a register first and then 
> operated upon. Understandable when you think about volatile being used to 
> access memory mapped I/O registers where a RMW operation could be 
> problematic.

So, if we want consistent behavior, we're pretty much screwed unless we use 
inline assembler everywhere?

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 13:15                                                               ` Chris Snook
@ 2007-08-20 13:32                                                                 ` Herbert Xu
  2007-08-20 13:38                                                                   ` Chris Snook
  2007-08-21  5:46                                                                 ` Linus Torvalds
  1 sibling, 1 reply; 657+ messages in thread
From: Herbert Xu @ 2007-08-20 13:32 UTC (permalink / raw)
  To: Chris Snook
  Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Paul Mackerras,
	Christoph Lameter, Ilpo Jarvinen, Paul E. McKenney,
	Stefan Richter, Linux Kernel Mailing List, linux-arch, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

On Mon, Aug 20, 2007 at 09:15:11AM -0400, Chris Snook wrote:
> Linus Torvalds wrote:
> >So the only reason to add back "volatile" to the atomic_read() sequence is 
> >not to fix bugs, but to _hide_ the bugs better. They're still there, they 
> >are just a lot harder to trigger, and tend to be a lot subtler.
> 
> What about barrier removal?  With consistent semantics we could optimize a 
> fair amount of code.  Whether or not that constitutes "premature" 
> optimization is open to debate, but there's no question we could reduce our 
> register wiping in some places.

If you've been reading all of Linus's emails you should be
thinking about adding memory barriers, and not removing
compiler barriers.

He's just told you that code of the kind

	while (!atomic_read(cond))
		;

	do_something()

probably needs a memory barrier (not just compiler) so that
do_something() doesn't see stale cache content that occurred
before cond flipped.
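
A minimal sketch of that fix, assuming cond is an atomic_t flag set by
another CPU and the read barrier pairs with a write barrier on the
producer side:

	while (!atomic_read(cond))
		cpu_relax();
	smp_rmb();	/* order the flag read before the data reads in do_something() */
	do_something();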

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 13:32                                                                 ` Herbert Xu
@ 2007-08-20 13:38                                                                   ` Chris Snook
  2007-08-20 22:07                                                                     ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Chris Snook @ 2007-08-20 13:38 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Paul Mackerras,
	Christoph Lameter, Ilpo Jarvinen, Paul E. McKenney,
	Stefan Richter, Linux Kernel Mailing List, linux-arch, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

Herbert Xu wrote:
> On Mon, Aug 20, 2007 at 09:15:11AM -0400, Chris Snook wrote:
>> Linus Torvalds wrote:
>>> So the only reason to add back "volatile" to the atomic_read() sequence is 
>>> not to fix bugs, but to _hide_ the bugs better. They're still there, they 
>>> are just a lot harder to trigger, and tend to be a lot subtler.
>> What about barrier removal?  With consistent semantics we could optimize a 
>> fair amount of code.  Whether or not that constitutes "premature" 
>> optimization is open to debate, but there's no question we could reduce our 
>> register wiping in some places.
> 
> If you've been reading all of Linus's emails you should be
> thinking about adding memory barriers, and not removing
> compiler barriers.
> 
> He's just told you that code of the kind
> 
> 	while (!atomic_read(cond))
> 		;
> 
> 	do_something()
> 
> probably needs a memory barrier (not just compiler) so that
> do_something() doesn't see stale cache content that occured
> before cond flipped.

Such code generally doesn't care precisely when it gets the update, just that 
the update is atomic, and it doesn't loop forever.  Regardless, I'm convinced we 
just need to do it all in assembly.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 13:31                                   ` Chris Snook
@ 2007-08-20 22:04                                     ` Segher Boessenkool
  2007-08-20 22:48                                       ` Russell King
  0 siblings, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-20 22:04 UTC (permalink / raw)
  To: Chris Snook
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	linux-kernel, Paul E. McKenney, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

>> Right. ROTFL... volatile actually breaks atomic_t instead of making 
>> it safe. x++ becomes a register load, increment and a register store. 
>> Without volatile we can increment the memory directly. It seems that 
>> volatile requires that the variable is loaded into a register first 
>> and then operated upon. Understandable when you think about volatile 
>> being used to access memory mapped I/O registers where a RMW 
>> operation could be problematic.
>
> So, if we want consistent behavior, we're pretty much screwed unless 
> we use inline assembler everywhere?

Nah, this whole argument is flawed -- "without volatile" we still
*cannot* "increment the memory directly".  On x86, you need a lock
prefix; on other archs, some other mechanism to make the memory
increment an *atomic* memory increment.

And no, RMW on MMIO isn't "problematic" at all, either.

An RMW op is a read op, a modify op, and a write op, all rolled
into one opcode.  But three actual operations.


The advantages of asm code for atomic_{read,set} are:
1) all the other atomic ops are implemented that way already;
2) you have full control over the asm insns selected, in particular,
    you can guarantee you *do* get an atomic op;
3) you don't need to use "volatile <data>" which generates
    not-all-that-good code on archs like x86, and we want to get rid
    of it anyway since it is problematic in many ways;
4) you don't need to use *(volatile <type>*)&<data>, which a) doesn't
    exist in C; b) isn't documented or supported in GCC; c) has a recent
    history of bugginess; d) _still uses volatile objects_; e) _still_
    is problematic in almost all those same ways as in 3);
5) you can mix atomic and non-atomic accesses to the atomic_t, which
    you cannot with the other alternatives.

The only disadvantage I know of is potentially slightly worse
instruction scheduling.  This is a generic asm() problem: GCC
cannot see what actual insns are inside the asm() block.
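
For reference, one possible x86 shape of such an asm atomic_read() --
only a sketch, assuming atomic_t keeps a plain (non-volatile) int
counter; whether the asm itself should be volatile is exactly the
follow-up question:

	static inline int atomic_read(const atomic_t *v)
	{
		int ret;

		/* a single aligned 32-bit load is atomic on x86 */
		asm volatile("movl %1, %0" : "=r" (ret) : "m" (v->counter));
		return ret;
	}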


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 13:38                                                                   ` Chris Snook
@ 2007-08-20 22:07                                                                     ` Segher Boessenkool
  0 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-20 22:07 UTC (permalink / raw)
  To: Chris Snook
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens,
	Stefan Richter, horms, Satyam Sharma, Ilpo Jarvinen,
	Linux Kernel Mailing List, David Miller, Paul E. McKenney, ak,
	Netdev, cfriesen, rpjday, jesper.juhl, linux-arch, Andrew Morton,
	zlynx, schwidefsky, Herbert Xu, Linus Torvalds, wensong,
	Nick Piggin, wjiang

> Such code generally doesn't care precisely when it gets the update, 
> just that the update is atomic, and it doesn't loop forever.

Yes, it _does_ care that it gets the update _at all_, and preferably
as early as possible.

> Regardless, I'm convinced we just need to do it all in assembly.

So do you want "volatile asm" or "plain asm", for atomic_read()?
The asm version has two ways to go about it too...


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 22:04                                     ` Segher Boessenkool
@ 2007-08-20 22:48                                       ` Russell King
  2007-08-20 23:02                                         ` Segher Boessenkool
  0 siblings, 1 reply; 657+ messages in thread
From: Russell King @ 2007-08-20 22:48 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Chris Snook, Christoph Lameter, Paul Mackerras, heiko.carstens,
	horms, linux-kernel, Paul E. McKenney, ak, netdev, cfriesen,
	akpm, rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam,
	zlynx, schwidefsky, Herbert Xu, davem, Linus Torvalds, wensong,
	wjiang

On Tue, Aug 21, 2007 at 12:04:17AM +0200, Segher Boessenkool wrote:
> And no, RMW on MMIO isn't "problematic" at all, either.
> 
> An RMW op is a read op, a modify op, and a write op, all rolled
> into one opcode.  But three actual operations.

Maybe for some CPUs, but not all.  ARM for instance can't use the
load exclusive and store exclusive instructions to MMIO space.

This means placing atomic_t or bitops into MMIO space is a definite
no-go on ARM.  It breaks.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 22:48                                       ` Russell King
@ 2007-08-20 23:02                                         ` Segher Boessenkool
  2007-08-21  0:05                                           ` Paul E. McKenney
  2007-08-21  7:05                                           ` Russell King
  0 siblings, 2 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-20 23:02 UTC (permalink / raw)
  To: Russell King
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	linux-kernel, Paul E. McKenney, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

>> And no, RMW on MMIO isn't "problematic" at all, either.
>>
>> An RMW op is a read op, a modify op, and a write op, all rolled
>> into one opcode.  But three actual operations.
>
> Maybe for some CPUs, but not all.  ARM for instance can't use the
> load exclusive and store exclusive instructions to MMIO space.

Sure, your CPU doesn't have RMW instructions -- how to emulate
those if you don't have them is a totally different thing.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 23:02                                         ` Segher Boessenkool
@ 2007-08-21  0:05                                           ` Paul E. McKenney
  2007-08-21  7:08                                             ` Russell King
  2007-08-21  7:05                                           ` Russell King
  1 sibling, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-21  0:05 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Russell King, Christoph Lameter, Paul Mackerras, heiko.carstens,
	horms, linux-kernel, ak, netdev, cfriesen, akpm, rpjday,
	Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

On Tue, Aug 21, 2007 at 01:02:01AM +0200, Segher Boessenkool wrote:
> >>And no, RMW on MMIO isn't "problematic" at all, either.
> >>
> >>An RMW op is a read op, a modify op, and a write op, all rolled
> >>into one opcode.  But three actual operations.
> >
> >Maybe for some CPUs, but not all.  ARM for instance can't use the
> >load exclusive and store exclusive instructions to MMIO space.
> 
> Sure, your CPU doesn't have RMW instructions -- how to emulate
> those if you don't have them is a totally different thing.

I thought that ARM's load exclusive and store exclusive instructions
were its equivalent of LL and SC, which RISC machines typically use to
build atomic sequences of instructions -- and which normally cannot be
applied to MMIO space.
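
The usual LL/SC retry loop looks roughly like this (a sketch of an
ARMv6-style atomic_add(), reconstructed from memory rather than copied
from the kernel source, so treat the constraints as approximate):

	static inline void atomic_add(int i, atomic_t *v)
	{
		unsigned long tmp;
		int result;

		asm volatile(
	"1:	ldrex	%0, [%2]\n"	/* load-linked from v->counter           */
	"	add	%0, %0, %3\n"	/* modify in a register                  */
	"	strex	%1, %0, [%2]\n"	/* store-conditional, %1 == 0 on success */
	"	teq	%1, #0\n"
	"	bne	1b"		/* lost the reservation -- retry         */
		: "=&r" (result), "=&r" (tmp)
		: "r" (&v->counter), "Ir" (i)
		: "cc", "memory");
	}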

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 13:15                                                               ` Chris Snook
  2007-08-20 13:32                                                                 ` Herbert Xu
@ 2007-08-21  5:46                                                                 ` Linus Torvalds
  2007-08-21  7:04                                                                   ` David Miller
  1 sibling, 1 reply; 657+ messages in thread
From: Linus Torvalds @ 2007-08-21  5:46 UTC (permalink / raw)
  To: Chris Snook
  Cc: Nick Piggin, Satyam Sharma, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Ilpo Jarvinen, Paul E. McKenney,
	Stefan Richter, Linux Kernel Mailing List, linux-arch, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher



On Mon, 20 Aug 2007, Chris Snook wrote:
>
> What about barrier removal?  With consistent semantics we could optimize a
> fair amount of code.  Whether or not that constitutes "premature" optimization
> is open to debate, but there's no question we could reduce our register wiping
> in some places.

Why do people think that barriers are expensive? They really aren't. 
Especially the regular compiler barrier is basically zero cost. Any 
reasonable compiler will just flush the stuff it holds in registers that 
isn't already automatic local variables, and for regular kernel code, that 
tends to basically be nothing at all.
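
For reference, the kernel's barrier() is nothing more than an empty asm
with a memory clobber (this is essentially how the compiler headers
spell it):

	#define barrier() __asm__ __volatile__("": : :"memory")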

Ie a "barrier()" is likely _cheaper_ than the code generation downside 
from using "volatile".

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21  5:46                                                                 ` Linus Torvalds
@ 2007-08-21  7:04                                                                   ` David Miller
  2007-08-21 13:50                                                                     ` Chris Snook
  0 siblings, 1 reply; 657+ messages in thread
From: David Miller @ 2007-08-21  7:04 UTC (permalink / raw)
  To: torvalds
  Cc: csnook, piggin, satyam, herbert, paulus, clameter, ilpo.jarvinen,
	paulmck, stefanr, linux-kernel, linux-arch, netdev, akpm, ak,
	heiko.carstens, schwidefsky, wensong, horms, wjiang, cfriesen,
	zlynx, rpjday, jesper.juhl, segher

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT)

> Ie a "barrier()" is likely _cheaper_ than the code generation downside 
> from using "volatile".

Assuming GCC were ever better about the code generation badness
with volatile that has been discussed here, I much prefer
we tell GCC "this memory piece changed" rather than "every
piece of memory has changed" which is what the barrier() does.

I happened to have been scanning a lot of assembler lately to
track down a gcc-4.2 miscompilation on sparc64, and the barriers
do hurt quite a bit in some places.  Instead of keeping unrelated
variables around cached in local registers, it reloads everything.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-20 23:02                                         ` Segher Boessenkool
  2007-08-21  0:05                                           ` Paul E. McKenney
@ 2007-08-21  7:05                                           ` Russell King
  2007-08-21  9:33                                             ` Paul Mackerras
  2007-08-21 14:39                                             ` Segher Boessenkool
  1 sibling, 2 replies; 657+ messages in thread
From: Russell King @ 2007-08-21  7:05 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	linux-kernel, Paul E. McKenney, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

On Tue, Aug 21, 2007 at 01:02:01AM +0200, Segher Boessenkool wrote:
> >>And no, RMW on MMIO isn't "problematic" at all, either.
> >>
> >>An RMW op is a read op, a modify op, and a write op, all rolled
> >>into one opcode.  But three actual operations.
> >
> >Maybe for some CPUs, but not all.  ARM for instance can't use the
> >load exclusive and store exclusive instructions to MMIO space.
> 
> Sure, your CPU doesn't have RMW instructions -- how to emulate
> those if you don't have them is a totally different thing.

Let me say it more clearly: On ARM, it is impossible to perform atomic
operations on MMIO space.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21  0:05                                           ` Paul E. McKenney
@ 2007-08-21  7:08                                             ` Russell King
  0 siblings, 0 replies; 657+ messages in thread
From: Russell King @ 2007-08-21  7:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Segher Boessenkool, Christoph Lameter, Paul Mackerras,
	heiko.carstens, horms, linux-kernel, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

On Mon, Aug 20, 2007 at 05:05:18PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 21, 2007 at 01:02:01AM +0200, Segher Boessenkool wrote:
> > >>And no, RMW on MMIO isn't "problematic" at all, either.
> > >>
> > >>An RMW op is a read op, a modify op, and a write op, all rolled
> > >>into one opcode.  But three actual operations.
> > >
> > >Maybe for some CPUs, but not all.  ARM for instance can't use the
> > >load exclusive and store exclusive instructions to MMIO space.
> > 
> > Sure, your CPU doesn't have RMW instructions -- how to emulate
> > those if you don't have them is a totally different thing.
> 
> I thought that ARM's load exclusive and store exclusive instructions
> were its equivalent of LL and SC, which RISC machines typically use to
> build atomic sequences of instructions -- and which normally cannot be
> applied to MMIO space.

Absolutely correct.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21  7:05                                           ` Russell King
@ 2007-08-21  9:33                                             ` Paul Mackerras
  2007-08-21 11:37                                               ` Andi Kleen
  2007-08-21 14:48                                               ` Segher Boessenkool
  2007-08-21 14:39                                             ` Segher Boessenkool
  1 sibling, 2 replies; 657+ messages in thread
From: Paul Mackerras @ 2007-08-21  9:33 UTC (permalink / raw)
  To: Russell King
  Cc: Segher Boessenkool, Christoph Lameter, heiko.carstens, horms,
	linux-kernel, Paul E. McKenney, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

Russell King writes:

> Let me say it more clearly: On ARM, it is impossible to perform atomic
> operations on MMIO space.

Actually, no one is suggesting that we try to do that at all.

The discussion about RMW ops on MMIO space started with a comment
attributed to the gcc developers that one reason why gcc on x86
doesn't use instructions that do RMW ops on volatile variables is that
volatile is used to mark MMIO addresses, and there was some
uncertainty about whether (non-atomic) RMW ops on x86 could be used on
MMIO.  This is in regard to the question about why gcc on x86 always
moves a volatile variable into a register before doing anything to it.

So the whole discussion is irrelevant to ARM, PowerPC and any other
architecture except x86[-64].

Paul.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21  9:33                                             ` Paul Mackerras
@ 2007-08-21 11:37                                               ` Andi Kleen
  2007-08-21 14:48                                               ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Andi Kleen @ 2007-08-21 11:37 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Russell King, Segher Boessenkool, Christoph Lameter,
	heiko.carstens, horms, linux-kernel, Paul E. McKenney, ak,
	netdev, cfriesen, akpm, rpjday, Nick Piggin, linux-arch,
	jesper.juhl, satyam, zlynx, schwidefsky, Chris Snook, Herbert Xu,
	davem, Linus Torvalds, wensong, wjiang

On Tue, Aug 21, 2007 at 07:33:49PM +1000, Paul Mackerras wrote:
> So the whole discussion is irrelevant to ARM, PowerPC and any other
> architecture except x86[-64].

It's even irrelevant on x86 because all modifying operations on atomic_t 
are coded in inline assembler and will always be RMW no matter
if atomic_t is volatile or not.

[ignoring atomic_set(x, atomic_read(x) + 1) which nobody should do]
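
For example, x86 atomic_inc() boils down to something like this (a
sketch close in spirit to the i386 version; LOCK_PREFIX expands to the
lock prefix on SMP builds):

	static inline void atomic_inc(atomic_t *v)
	{
		asm volatile(LOCK_PREFIX "incl %0"
			     : "+m" (v->counter));	/* locked RMW directly on memory */
	}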

The only issue is whether atomic_t should have an implicit barrier or not.
My personal opinion is yes -- better safe than sorry. And any code
impact it may have is typically dwarfed by the next cache miss anyway,
so it doesn't matter much.

-Andi


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21  7:04                                                                   ` David Miller
@ 2007-08-21 13:50                                                                     ` Chris Snook
  2007-08-21 14:59                                                                       ` Segher Boessenkool
                                                                                         ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-21 13:50 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, piggin, satyam, herbert, paulus, clameter,
	ilpo.jarvinen, paulmck, stefanr, linux-kernel, linux-arch,
	netdev, akpm, ak, heiko.carstens, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT)
> 
>> Ie a "barrier()" is likely _cheaper_ than the code generation downside 
>> from using "volatile".
> 
> Assuming GCC were ever better about the code generation badness
> with volatile that has been discussed here, I much prefer
> we tell GCC "this memory piece changed" rather than "every
> piece of memory has changed" which is what the barrier() does.
> 
> I happened to have been scanning a lot of assembler lately to
> track down a gcc-4.2 miscompilation on sparc64, and the barriers
> do hurt quite a bit in some places.  Instead of keeping unrelated
> variables around cached in local registers, it reloads everything.

Moore's law is definitely working against us here.  Register counts, 
pipeline depths, core counts, and clock multipliers are all increasing 
in the long run.  At some point in the future, barrier() will be 
universally regarded as a hammer too big for most purposes.  Whether or 
not removing it now constitutes premature optimization is arguable, but 
I think we should allow such optimization to happen (or not happen) in 
architecture-dependent code, and provide a consistent API that doesn't 
require the use of such things in arch-independent code where it might 
turn into a totally superfluous performance killer depending on what 
hardware it gets compiled for.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21  7:05                                           ` Russell King
  2007-08-21  9:33                                             ` Paul Mackerras
@ 2007-08-21 14:39                                             ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-21 14:39 UTC (permalink / raw)
  To: Russell King
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	linux-kernel, Paul E. McKenney, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

>>>> And no, RMW on MMIO isn't "problematic" at all, either.
>>>>
>>>> An RMW op is a read op, a modify op, and a write op, all rolled
>>>> into one opcode.  But three actual operations.
>>>
>>> Maybe for some CPUs, but not all.  ARM for instance can't use the
>>> load exclusive and store exclusive instructions to MMIO space.
>>
>> Sure, your CPU doesn't have RMW instructions -- how to emulate
>> those if you don't have them is a totally different thing.
>
> Let me say it more clearly: On ARM, it is impossible to perform atomic
> operations on MMIO space.

It's all completely beside the point, see the other subthread, but...

Yeah, you can't do LL/SC to MMIO space; ARM isn't alone in that.
You could still implement atomic operations on MMIO space by taking
a lock elsewhere, in normal cacheable memory space.  Why you would
do this is a separate question, you probably don't want it :-)


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21  9:33                                             ` Paul Mackerras
  2007-08-21 11:37                                               ` Andi Kleen
@ 2007-08-21 14:48                                               ` Segher Boessenkool
  2007-08-21 16:16                                                 ` Paul E. McKenney
  1 sibling, 1 reply; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-21 14:48 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Russell King, Christoph Lameter, heiko.carstens, horms,
	linux-kernel, Paul E. McKenney, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang

>> Let me say it more clearly: On ARM, it is impossible to perform atomic
>> operations on MMIO space.
>
> Actually, no one is suggesting that we try to do that at all.
>
> The discussion about RMW ops on MMIO space started with a comment
> attributed to the gcc developers that one reason why gcc on x86
> doesn't use instructions that do RMW ops on volatile variables is that
> volatile is used to mark MMIO addresses, and there was some
> uncertainty about whether (non-atomic) RMW ops on x86 could be used on
> MMIO.  This is in regard to the question about why gcc on x86 always
> moves a volatile variable into a register before doing anything to it.

This question is GCC PR33102, which was incorrectly closed as a
duplicate of PR3506 -- and *that* PR was closed because its reporter
seemed to claim the GCC generated code for an increment on a volatile
(namely, three machine instructions: load, modify, store) was
incorrect, and it has to be one machine instruction.

> So the whole discussion is irrelevant to ARM, PowerPC and any other
> architecture except x86[-64].

And even there, it's not something the kernel can take advantage of
before GCC 4.4 is in widespread use, if then.  Let's move on.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21 13:50                                                                     ` Chris Snook
@ 2007-08-21 14:59                                                                       ` Segher Boessenkool
  2007-08-21 16:31                                                                       ` Satyam Sharma
  2007-08-21 16:43                                                                       ` Linus Torvalds
  2 siblings, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-08-21 14:59 UTC (permalink / raw)
  To: Chris Snook
  Cc: paulmck, heiko.carstens, ilpo.jarvinen, horms, linux-kernel,
	David Miller, rpjday, netdev, ak, piggin, akpm, torvalds,
	cfriesen, jesper.juhl, linux-arch, paulus, herbert, satyam,
	clameter, stefanr, schwidefsky, zlynx, wensong, wjiang

> At some point in the future, barrier() will be universally regarded as 
> a hammer too big for most purposes.  Whether or not removing it now

You can't just remove it, it is needed in some places; you want to
replace it in most places with a more fine-grained "compiler barrier",
I presume?

> constitutes premature optimization is arguable, but I think we should 
> allow such optimization to happen (or not happen) in 
> architecture-dependent code, and provide a consistent API that doesn't 
> require the use of such things in arch-independent code where it might 
> turn into a totally superfluous performance killer depending on what 
> hardware it gets compiled for.

Explicit barrier()s won't be too hard to replace -- but what to do
about the implicit barrier()s in rmb() etc. etc. -- *those* will be
hard to get rid of, if only because it is hard enough to teach driver
authors about how to use those primitives *already*.  It is far from
clear what a good interface like that would look like, anyway.

Probably we should first start experimenting with a forget()-style
micro-barrier (but please, find a better name), and see if a nice
usage pattern shows up that can be turned into an API.
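
As a starting point for such an experiment, the micro-barrier could be
as small as an empty asm that clobbers only the named object (the name
and the exact constraint spelling here are placeholders, not a settled
proposal):

	#define forget(x)	asm volatile("" : "+m" (x))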


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21 14:48                                               ` Segher Boessenkool
@ 2007-08-21 16:16                                                 ` Paul E. McKenney
  2007-08-21 22:51                                                   ` Valdis.Kletnieks
  0 siblings, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-21 16:16 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Paul Mackerras, Russell King, Christoph Lameter, heiko.carstens,
	horms, linux-kernel, ak, netdev, cfriesen, akpm, rpjday,
	Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang

On Tue, Aug 21, 2007 at 04:48:51PM +0200, Segher Boessenkool wrote:
> >>Let me say it more clearly: On ARM, it is impossible to perform atomic
> >>operations on MMIO space.
> >
> >Actually, no one is suggesting that we try to do that at all.
> >
> >The discussion about RMW ops on MMIO space started with a comment
> >attributed to the gcc developers that one reason why gcc on x86
> >doesn't use instructions that do RMW ops on volatile variables is that
> >volatile is used to mark MMIO addresses, and there was some
> >uncertainty about whether (non-atomic) RMW ops on x86 could be used on
> >MMIO.  This is in regard to the question about why gcc on x86 always
> >moves a volatile variable into a register before doing anything to it.
> 
> This question is GCC PR33102, which was incorrectly closed as a
> duplicate of PR3506 -- and *that* PR was closed because its reporter
> seemed to claim the GCC generated code for an increment on a volatile
> (namely, three machine instructions: load, modify, store) was
> incorrect, and it has to be one machine instruction.
> 
> >So the whole discussion is irrelevant to ARM, PowerPC and any other
> >architecture except x86[-64].
> 
> And even there, it's not something the kernel can take advantage of
> before GCC 4.4 is in widespread use, if then.  Let's move on.

I agree that instant gratification is hard to come by when synching
up compiler and kernel versions.  Nonetheless, it should be possible
to create APIs that are conditioned on the compiler version.
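
Purely as an illustration of the shape such conditioning could take
(both branches and the version cutoff are invented for the example,
and asm_atomic_read() is a hypothetical helper):

	/* placeholders -- only the conditional structure is the point here */
	#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4)
	# define atomic_read(v)	(*(volatile int *)&(v)->counter)
	#else
	# define atomic_read(v)	asm_atomic_read(v)	/* hypothetical asm helper */
	#endif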

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21 13:50                                                                     ` Chris Snook
  2007-08-21 14:59                                                                       ` Segher Boessenkool
@ 2007-08-21 16:31                                                                       ` Satyam Sharma
  2007-08-21 16:43                                                                       ` Linus Torvalds
  2 siblings, 0 replies; 657+ messages in thread
From: Satyam Sharma @ 2007-08-21 16:31 UTC (permalink / raw)
  To: Chris Snook
  Cc: David Miller, Linus Torvalds, piggin, herbert, paulus, clameter,
	ilpo.jarvinen, paulmck, stefanr, Linux Kernel Mailing List,
	linux-arch, netdev, Andrew Morton, ak, heiko.carstens,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher



On Tue, 21 Aug 2007, Chris Snook wrote:

> David Miller wrote:
> > From: Linus Torvalds <torvalds@linux-foundation.org>
> > Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT)
> > 
> > > Ie a "barrier()" is likely _cheaper_ than the code generation downside
> > > from using "volatile".
> > 
> > Assuming GCC were ever better about the code generation badness
> > with volatile that has been discussed here, I much prefer
> > we tell GCC "this memory piece changed" rather than "every
> > piece of memory has changed" which is what the barrier() does.
> > 
> > I happened to have been scanning a lot of assembler lately to
> > track down a gcc-4.2 miscompilation on sparc64, and the barriers
> > do hurt quite a bit in some places.  Instead of keeping unrelated
> > variables around cached in local registers, it reloads everything.
> 
> Moore's law is definitely working against us here.  Register counts, pipeline
> depths, core counts, and clock multipliers are all increasing in the long run.
> At some point in the future, barrier() will be universally regarded as a
> hammer too big for most purposes.

I do agree, and the important point to note is that the benefits of a
/lighter/ compiler barrier, such as what David referred to above, _can_
be had without having to do anything with the "volatile" keyword at all.
And such a primitive has already been mentioned/proposed on this thread.


But this is all tangential to the core question at hand -- whether to have
implicit (compiler, possibly "light-weight" of the kind referred above)
barrier semantics in atomic ops that do not have them, or not.

I was lately looking in the kernel for _actual_ code that uses atomic_t
and benefits from the lack of any implicit barrier, with the compiler
being free to cache the atomic_t in a register. Now that often does _not_
happen, because all other ops (implemented in asm with LOCK prefix on x86)
_must_ therefore constrain the atomic_t to memory anyway. So typically all
atomic ops code sequences end up operating on memory.

Then I did locate sched.c:select_nohz_load_balancer() -- it repeatedly
references the same atomic_t object, and the code that I saw generated
(with CC_OPTIMIZE_FOR_SIZE=y) did cache it in a register for a sequence of
instructions. It uses atomic_cmpxchg, thereby not requiring explicit
memory barriers anywhere in the code, and is an example of an atomic_t
user that is safe, and yet benefits from its memory loads/stores being
elided/coalesced by the compiler.


# at this point, %%eax holds num_online_cpus() and
# %%ebx holds cpus_weight(nohz.cpu_mask)
# the variable "cpu" is in %esi

0xc1018e1d:      cmp    %eax,%ebx		# if No.A.
0xc1018e1f:      mov    0xc134d900,%eax		# first atomic_read()
0xc1018e24:      jne    0xc1018e36
0xc1018e26:      cmp    %esi,%eax		# if No.B.
0xc1018e28:      jne    0xc1018e80		# returns with 0
0xc1018e2a:      movl   $0xffffffff,0xc134d900	# atomic_set(-1), and ...
0xc1018e34:      jmp    0xc1018e80		# ... returns with 0
0xc1018e36:      cmp    $0xffffffff,%eax	# if No.C. (NOTE!)
0xc1018e39:      jne    0xc1018e46
0xc1018e3b:      lock cmpxchg %esi,0xc134d900	# atomic_cmpxchg()
0xc1018e43:      inc    %eax
0xc1018e44:      jmp    0xc1018e48
0xc1018e46:      cmp    %esi,%eax		# if No.D. (NOTE!)
0xc1018e48:      jne    0xc1018e80		# if !=, default return 0 (if No.E.)
0xc1018e4a:      jmp    0xc1018e84		# otherwise (==) returns with 1

The above is:

	if (cpus_weight(nohz.cpu_mask) == num_online_cpus()) {	/* if No.A. */
		if (atomic_read(&nohz.load_balancer) == cpu)	/* if No.B. */
			atomic_set(&nohz.load_balancer, -1);	/* XXX */
		return 0;
	}
	if (atomic_read(&nohz.load_balancer) == -1) {		/* if No.C. */
		/* make me the ilb owner */
		if (atomic_cmpxchg(&nohz.load_balancer, -1, cpu) == -1)	/* if No.E. */
			return 1;
	} else if (atomic_read(&nohz.load_balancer) == cpu)	/* if No.D. */
		return 1;
	...
	...
	return 0; /* default return from function */

As you can see, the atomic_read()'s of "if"s Nos. B, C, and D, were _all_
coalesced into a single memory reference "mov    0xc134d900,%eax" at the
top of the function, and then "if"s Nos. C and D simply used the value
from %%eax itself. But that's perfectly safe, such is the logic of this
function. It uses cmpxchg _whenever_ updating the value in the memory
atomic_t and then returns appropriately. The _only_ point that a casual
reader may find racy is that marked /* XXX */ above -- atomic_read()
followed by atomic_set() with no barrier in between. But even that is ok,
because if one thread ever finds that condition to succeed, it is 100%
guaranteed no other thread on any other CPU will find _any_ condition
to be true, thereby avoiding any race in the modification of that value.


BTW it does sound reasonable that a lot of atomic_t users that want a
compiler barrier probably also want a memory barrier. Do we make _that_
implicit too? Quite clearly, making _either_ one of those implicit in
atomic_{read,set} (in any form of implementation -- a forget() macro
based, *(volatile int *)& based, or inline asm based) would end up
harming code such as that cited above.

Lastly, the most obvious reason that should be considered against implicit
barriers in atomic ops is that it isn't "required" -- atomicity does not
imply any barrier after all, and making such a distinction would actually
be a healthy separation that helps people think more clearly when writing
lockless code.

[ But the "authors' expectations" / heisenbugs argument also holds some
  water ... for that, we can have a _variant_ in the API for atomic ops
  that has implicit compiler/memory barriers, to make it easier on those
  who want that behaviour. But let us not penalize code that knows what
  it is doing by changing the default to that, please. ]


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21 13:50                                                                     ` Chris Snook
  2007-08-21 14:59                                                                       ` Segher Boessenkool
  2007-08-21 16:31                                                                       ` Satyam Sharma
@ 2007-08-21 16:43                                                                       ` Linus Torvalds
  2 siblings, 0 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-08-21 16:43 UTC (permalink / raw)
  To: Chris Snook
  Cc: David Miller, piggin, satyam, herbert, paulus, clameter,
	ilpo.jarvinen, paulmck, stefanr, linux-kernel, linux-arch,
	netdev, akpm, ak, heiko.carstens, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Tue, 21 Aug 2007, Chris Snook wrote:
> 
> Moore's law is definitely working against us here.  Register counts, pipeline
> depths, core counts, and clock multipliers are all increasing in the long run.
> At some point in the future, barrier() will be universally regarded as a
> hammer too big for most purposes.

Note that "barrier()" is purely a compiler barrier. It has zero impact on 
the CPU pipeline itself, and also has zero impact on anything that gcc 
knows isn't visible in memory (ie local variables that don't have their 
address taken), so barrier() really is pretty cheap.

Now, it's possible that gcc messes up in some circumstances, and that the 
memory clobber will cause gcc to also do things like flush local registers 
unnecessarily to their stack slots, but quite frankly, if that happens, 
it's a gcc problem, and I also have to say that I've not seen that myself.

So in a very real sense, "barrier()" will just make sure that there is a 
stronger sequence point for the compiler where things are stable. In most 
cases it has absolutely zero performance impact - apart from the 
-intended- impact of making sure that the compiler doesn't re-order or 
cache stuff around it.

And sure, we could make it more finegrained, and also introduce a 
per-variable barrier, but the fact is, people _already_ have problems with 
thinking about these kinds of things, and adding new abstraction issues 
with subtle semantics is the last thing we want.

So I really think you'd want to show a real example of real code that 
actually gets noticeably slower or bigger.

In removing "volatile", we have shown that. It may not have made a big 
difference on powerpc, but it makes a real difference on x86 - and more 
importantly, it removes something that people clearly don't know how it 
works, and incorrectly expect to just fix bugs.

[ There are *other* barriers - the ones that actually add memory barriers 
  to the CPU - that really can be quite expensive. The good news is that 
  the expense is going down rather than up: both Intel and AMD are not 
  only removing the need for some of them (ie "smp_rmb()" will become a 
  compiler-only barrier), but we're _also_ seeing the whole "pipeline 
  flush" approach go away, and be replaced by the CPU itself actually 
  being better - so even the actual CPU pipeline barriers are getting
  cheaper, not more expensive. ]

For example, did anybody even _test_ how expensive "barrier()" is? Just 
as a lark, I did

	#undef barrier
	#define barrier() do { } while (0)

in kernel/sched.c (which only has three of them in it, but hey, that's 
more than most files), and there were _zero_ code generation downsides. 
One instruction was moved (and a few line numbers changed), so it wasn't 
like the assembly language was identical, but the point is, barrier() 
simply doesn't have the same kinds of downsides that "volatile" has.

(That may not be true on other architectures or in other source files, of 
course. This *does* depend on code generation details. But anybody who 
thinks that "barrier()" is fundamentally expensive is simply incorrect. It 
is *fundamentally* a no-op).

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21 16:16                                                 ` Paul E. McKenney
@ 2007-08-21 22:51                                                   ` Valdis.Kletnieks
  2007-08-22  0:50                                                     ` Paul E. McKenney
  2007-08-22 21:38                                                     ` Adrian Bunk
  0 siblings, 2 replies; 657+ messages in thread
From: Valdis.Kletnieks @ 2007-08-21 22:51 UTC (permalink / raw)
  To: paulmck
  Cc: Segher Boessenkool, Paul Mackerras, Russell King,
	Christoph Lameter, heiko.carstens, horms, linux-kernel, ak,
	netdev, cfriesen, akpm, rpjday, Nick Piggin, linux-arch,
	jesper.juhl, satyam, zlynx, schwidefsky, Chris Snook, Herbert Xu,
	davem, Linus Torvalds, wensong, wjiang

On Tue, 21 Aug 2007 09:16:43 PDT, "Paul E. McKenney" said:

> I agree that instant gratification is hard to come by when synching
> up compiler and kernel versions.  Nonetheless, it should be possible
> to create APIs that are conditioned on the compiler version.

We've tried that, sort of.  See the mess surrounding the whole
extern/static/inline/__whatever boondoggle, which seems to have
changed semantics in every single gcc release since 2.95 or so.

And recently mention was made that gcc4.4 will have *new* semantics
in this area. Yee. Hah.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21 22:51                                                   ` Valdis.Kletnieks
@ 2007-08-22  0:50                                                     ` Paul E. McKenney
  2007-08-22 21:38                                                     ` Adrian Bunk
  1 sibling, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-08-22  0:50 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Segher Boessenkool, Paul Mackerras, Russell King,
	Christoph Lameter, heiko.carstens, horms, linux-kernel, ak,
	netdev, cfriesen, akpm, rpjday, Nick Piggin, linux-arch,
	jesper.juhl, satyam, zlynx, schwidefsky, Chris Snook, Herbert Xu,
	davem, Linus Torvalds, wensong, wjiang

On Tue, Aug 21, 2007 at 06:51:16PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Tue, 21 Aug 2007 09:16:43 PDT, "Paul E. McKenney" said:
> 
> > I agree that instant gratification is hard to come by when synching
> > up compiler and kernel versions.  Nonetheless, it should be possible
> > to create APIs that are conditioned on the compiler version.
> 
> We've tried that, sort of.  See the mess surrounding the whole
> extern/static/inline/__whatever boondoggle, which seems to have
> changed semantics in every single gcc release since 2.95 or so.
> 
> And recently mention was made that gcc4.4 will have *new* semantics
> in this area. Yee. Hah.

;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-21 22:51                                                   ` Valdis.Kletnieks
  2007-08-22  0:50                                                     ` Paul E. McKenney
@ 2007-08-22 21:38                                                     ` Adrian Bunk
  1 sibling, 0 replies; 657+ messages in thread
From: Adrian Bunk @ 2007-08-22 21:38 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: paulmck, Segher Boessenkool, Paul Mackerras, Russell King,
	Christoph Lameter, heiko.carstens, horms, linux-kernel, ak,
	netdev, cfriesen, akpm, rpjday, Nick Piggin, linux-arch,
	jesper.juhl, satyam, zlynx, schwidefsky, Chris Snook, Herbert Xu,
	davem, Linus Torvalds, wensong, wjiang

On Tue, Aug 21, 2007 at 06:51:16PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Tue, 21 Aug 2007 09:16:43 PDT, "Paul E. McKenney" said:
> 
> > I agree that instant gratification is hard to come by when synching
> > up compiler and kernel versions.  Nonetheless, it should be possible
> > to create APIs that are conditioned on the compiler version.
> 
> We've tried that, sort of.  See the mess surrounding the whole
> extern/static/inline/__whatever boondoggle, which seems to have
> changed semantics in every single gcc release since 2.95 or so.
>...

There is exactly one semantics change in gcc in this area, and that is 
the change of the "extern inline" semantics in gcc 4.3 to the
C99 semantics.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-16  0:39           ` [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert() Satyam Sharma
@ 2007-08-24 11:59             ` Denys Vlasenko
  2007-08-24 12:07               ` Andi Kleen
                                 ` (3 more replies)
  0 siblings, 4 replies; 657+ messages in thread
From: Denys Vlasenko @ 2007-08-24 11:59 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Heiko Carstens, Herbert Xu, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
>
>  static inline void wait_for_init_deassert(atomic_t *deassert)
>  {
> -	while (!atomic_read(deassert));
> +	while (!atomic_read(deassert))
> +		cpu_relax();
>  	return;
>  }

For less-than-brilliant people like me, it's totally non-obvious that
cpu_relax() is needed for correctness here, not just to make P4 happy.

IOW: "atomic_read" name quite unambiguously means "I will read
this variable from main memory". Which is not true and creates
potential for confusion and bugs.
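
For reference, the two flavours being argued about look roughly like this
(a sketch, not the exact header or patch text):

/* with a plain "int counter": a load the compiler may hoist or CSE away */
#define atomic_read(v)		((v)->counter)

/* with the volatile cast the patchset proposes: a real load every time */
#define atomic_read(v)		(*(volatile int *)&(v)->counter)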
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 11:59             ` Denys Vlasenko
@ 2007-08-24 12:07               ` Andi Kleen
  2007-08-24 12:12               ` Kenn Humborg
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 657+ messages in thread
From: Andi Kleen @ 2007-08-24 12:07 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Satyam Sharma, Heiko Carstens, Herbert Xu, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Friday 24 August 2007 13:59:32 Denys Vlasenko wrote:
> On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> >
> >  static inline void wait_for_init_deassert(atomic_t *deassert)
> >  {
> > -	while (!atomic_read(deassert));
> > +	while (!atomic_read(deassert))
> > +		cpu_relax();
> >  	return;
> >  }
> 
> For less-than-brilliant people like me, it's totally non-obvious that
> cpu_relax() is needed for correctness here, not just to make P4 happy.

I also find it non-obvious. It would really be better to have a barrier
or an equivalent (volatile or a variable clobber) in atomic_read().
 
> IOW: "atomic_read" name quite unambiguously means "I will read
> this variable from main memory". Which is not true and creates
> potential for confusion and bugs.

Agreed.

-Andi

^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 11:59             ` Denys Vlasenko
  2007-08-24 12:07               ` Andi Kleen
@ 2007-08-24 12:12               ` Kenn Humborg
  2007-08-24 14:25                 ` Denys Vlasenko
  2007-08-24 13:30               ` Satyam Sharma
  2007-08-24 16:19               ` Luck, Tony
  3 siblings, 1 reply; 657+ messages in thread
From: Kenn Humborg @ 2007-08-24 12:12 UTC (permalink / raw)
  To: Denys Vlasenko, Satyam Sharma
  Cc: Heiko Carstens, Herbert Xu, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

> On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> >
> >  static inline void wait_for_init_deassert(atomic_t *deassert)
> >  {
> > -	while (!atomic_read(deassert));
> > +	while (!atomic_read(deassert))
> > +		cpu_relax();
> >  	return;
> >  }
> 
> For less-than-brilliant people like me, it's totally non-obvious that
> cpu_relax() is needed for correctness here, not just to make P4 happy.
> 
> IOW: "atomic_read" name quite unambiguously means "I will read
> this variable from main memory". Which is not true and creates
> potential for confusion and bugs.

To me, "atomic_read" means a read which is synchronized with other 
changes to the variable (using the atomic_XXX functions) in such 
a way that I will always only see the "before" or "after"
state of the variable - never an intermediate state while a 
modification is happening.  It doesn't imply that I have to 
see the "after" state immediately after another thread modifies
it.

Perhaps the Linux atomic_XXX functions work like that, or used
to work like that, but it's counter-intuitive to me that "atomic"
should imply a memory read.

Later,
Kenn


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-18  4:13                                     ` Linus Torvalds
  2007-08-18 13:36                                       ` Satyam Sharma
  2007-08-18 21:54                                       ` Paul E. McKenney
@ 2007-08-24 12:19                                       ` Denys Vlasenko
  2007-08-24 17:19                                         ` Linus Torvalds
  2 siblings, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-08-24 12:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Satyam Sharma, Christoph Lameter, Paul E. McKenney, Herbert Xu,
	Nick Piggin, Paul Mackerras, Segher Boessenkool, heiko.carstens,
	horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm,
	jesper.juhl, linux-arch, zlynx, schwidefsky, Chris Snook, davem,
	wensong, wjiang

On Saturday 18 August 2007 05:13, Linus Torvalds wrote:
> On Sat, 18 Aug 2007, Satyam Sharma wrote:
> > No code does (or would do, or should do):
> >
> > 	x.counter++;
> >
> > on an "atomic_t x;" anyway.
>
> That's just an example of a general problem.
>
> No, you don't use "x.counter++". But you *do* use
>
> 	if (atomic_read(&x) <= 1)
>
> and loading into a register is stupid and pointless, when you could just
> do it as a regular memory-operand to the cmp instruction.

It doesn't mean that (volatile int*) cast is bad, it means that current gcc
is bad (or "not good enough"). IOW: instead of avoiding volatile cast,
it's better to fix the compiler.

> And as far as the compiler is concerned, the problem is the 100% same:
> combining operations with the volatile memop.
>
> The fact is, a compiler that thinks that
>
> 	movl mem,reg
> 	cmpl $val,reg
>
> is any better than
>
> 	cmpl $val,mem
>
> is just not a very good compiler.

Linus, in all honesty gcc has many more cases of suboptimal code;
the case of "volatile" is just one of many.

Off the top of my head:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417

unsigned v;
void f(unsigned A) { v = ((unsigned long long)A) * 365384439 >> (27+32); }

gcc-4.1.1 -S -Os -fomit-frame-pointer t.c

f:
        movl    $365384439, %eax
        mull    4(%esp)
        movl    %edx, %eax <===== ?
        shrl    $27, %eax
        movl    %eax, v
        ret

Why is it moving %edx to %eax?

gcc-4.2.1 -S -Os -fomit-frame-pointer t.c

f:
        movl    $365384439, %eax
        mull    4(%esp)
        movl    %edx, %eax <===== ?
        xorl    %edx, %edx <===== ??!
        shrl    $27, %eax
        movl    %eax, v
        ret

Progress... Now we also zero out %edx afterwards for no apparent reason.
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-15 23:22         ` Paul Mackerras
  2007-08-16  0:26           ` Christoph Lameter
@ 2007-08-24 12:50           ` Denys Vlasenko
  2007-08-24 17:15             ` Christoph Lameter
  1 sibling, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-08-24 12:50 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Satyam Sharma, Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> Satyam Sharma writes:
> In the kernel we use atomic variables in precisely those situations
> where a variable is potentially accessed concurrently by multiple
> CPUs, and where each CPU needs to see updates done by other CPUs in a
> timely fashion.  That is what they are for.  Therefore the compiler
> must not cache values of atomic variables in registers; each
> atomic_read must result in a load and each atomic_set must result in a
> store.  Anything else will just lead to subtle bugs.

Amen.
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 11:59             ` Denys Vlasenko
  2007-08-24 12:07               ` Andi Kleen
  2007-08-24 12:12               ` Kenn Humborg
@ 2007-08-24 13:30               ` Satyam Sharma
  2007-08-24 17:06                 ` Christoph Lameter
  2007-08-24 16:19               ` Luck, Tony
  3 siblings, 1 reply; 657+ messages in thread
From: Satyam Sharma @ 2007-08-24 13:30 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Heiko Carstens, Herbert Xu, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher, Nick Piggin

Hi Denys,


On Fri, 24 Aug 2007, Denys Vlasenko wrote:

> On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> >
> >  static inline void wait_for_init_deassert(atomic_t *deassert)
> >  {
> > -	while (!atomic_read(deassert));
> > +	while (!atomic_read(deassert))
> > +		cpu_relax();
> >  	return;
> >  }
> 
> For less-than-brilliant people like me, it's totally non-obvious that
> cpu_relax() is needed for correctness here, not just to make P4 happy.

This thread has been round-and-round with exactly the same discussions
:-) I had proposed a few such variants to make a compiler barrier implicit
in atomic_{read,set} myself, but frankly, at least personally speaking
(now that I know better), I'm not so much in favour of implicit barriers
(compiler, memory or both) in atomic_{read,set}.

This might sound like an about-turn if you read my own postings to Nick
Piggin from a week back, but I do agree with most of his opinions on the
matter now -- separation of barriers from atomic ops is actually good,
beneficial to certain code that knows what it's doing, explicit usage
of barriers stands out more clearly (most people here who deal with it
do know cpu_relax() is an explicit compiler barrier) compared to an
implicit usage in an atomic_read() or such variant ...


> IOW: "atomic_read" name quite unambiguously means "I will read
> this variable from main memory". Which is not true and creates
> potential for confusion and bugs.

I'd have to disagree here -- atomic ops are all about _atomicity_ of
memory accesses, not about _making_ them happen (or become visible to other
CPUs) _then and there_. The latter is the job of barriers.

The behaviour (and expectations) are quite comprehensively covered in
atomic_ops.txt -- let alone atomic_{read,set}, even atomic_{inc,dec}
are permitted by archs' implementations to _not_ have any memory
barriers, for that matter. [It is unrelated that on x86 making them
SMP-safe requires the use of the LOCK prefix that also happens to be
an implicit memory barrier.]

An argument was also made about consistency of atomic_{read,set} w.r.t.
the other atomic ops -- but clearly, they are all already consistent!
All of them are atomic :-) The fact that atomic_{read,set} do _not_
require any inline asm or LOCK prefix whereas the others do, has to do
with the fact that unlike all others, atomic_{read,set} are not RMW ops
and hence guaranteed to be atomic just as they are in plain & simple C.

But if people do seem to have a mixed / confused notion of atomicity
and barriers, and if there's consensus, then as I'd said earlier, I
have no issues in going with the consensus (eg. having API variants).
Linus would be more difficult to convince, however, I suspect :-)


Satyam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 12:12               ` Kenn Humborg
@ 2007-08-24 14:25                 ` Denys Vlasenko
  2007-08-24 17:34                   ` Linus Torvalds
  0 siblings, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-08-24 14:25 UTC (permalink / raw)
  To: Kenn Humborg
  Cc: Satyam Sharma, Heiko Carstens, Herbert Xu, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

On Friday 24 August 2007 13:12, Kenn Humborg wrote:
> > On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> > >  static inline void wait_for_init_deassert(atomic_t *deassert)
> > >  {
> > > -	while (!atomic_read(deassert));
> > > +	while (!atomic_read(deassert))
> > > +		cpu_relax();
> > >  	return;
> > >  }
> >
> > For less-than-brilliant people like me, it's totally non-obvious that
> > cpu_relax() is needed for correctness here, not just to make P4 happy.
> >
> > IOW: "atomic_read" name quite unambiguously means "I will read
> > this variable from main memory". Which is not true and creates
> > potential for confusion and bugs.
>
> To me, "atomic_read" means a read which is synchronized with other
> changes to the variable (using the atomic_XXX functions) in such
> a way that I will always only see the "before" or "after"
> state of the variable - never an intermediate state while a
> modification is happening.  It doesn't imply that I have to
> see the "after" state immediately after another thread modifies
> it.

So you are ok with compiler propagating n1 to n2 here:

n1 += atomic_read(x);
other_variable++;
n2 += atomic_read(x);

without accessing x second time. What's the point? Any sane coder
will say that explicitly anyway:

tmp = atomic_read(x);
n1 += tmp;
other_variable++;
n2 += tmp;

if only for the sake of code readability. Because the first version
is definitely hinting that it reads RAM twice, and it's actively *bad*
for code readability when in fact that's not the case!

Locking, compiler and CPU barriers are complicated enough already,
please don't make them even harder to understand.
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 11:59             ` Denys Vlasenko
                                 ` (2 preceding siblings ...)
  2007-08-24 13:30               ` Satyam Sharma
@ 2007-08-24 16:19               ` Luck, Tony
  3 siblings, 0 replies; 657+ messages in thread
From: Luck, Tony @ 2007-08-24 16:19 UTC (permalink / raw)
  To: Denys Vlasenko, Satyam Sharma
  Cc: Heiko Carstens, Herbert Xu, Chris Snook, clameter,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher

>>  static inline void wait_for_init_deassert(atomic_t *deassert)
>>  {
>> -	while (!atomic_read(deassert));
>> +	while (!atomic_read(deassert))
>> +		cpu_relax();
>>  	return;
>>  }
>
> For less-than-brilliant people like me, it's totally non-obvious that
> cpu_relax() is needed for correctness here, not just to make P4 happy.

Not just P4 ... there are other threaded cpus where it is useful to
let the core know that this is a busy loop so it would be a good thing
to let other threads have priority.

Even on a non-threaded cpu the cpu_relax() could be useful in the
future to hint to the cpu that it could drop into a lower-power
state.

But I agree with your main point that the loop without the cpu_relax()
looks like it ought to work because atomic_read() ought to actually
go out and read memory each time around the loop.

-Tony

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 13:30               ` Satyam Sharma
@ 2007-08-24 17:06                 ` Christoph Lameter
  2007-08-24 20:26                   ` Denys Vlasenko
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Lameter @ 2007-08-24 17:06 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Denys Vlasenko, Heiko Carstens, Herbert Xu, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher, Nick Piggin

On Fri, 24 Aug 2007, Satyam Sharma wrote:

> But if people do seem to have a mixed / confused notion of atomicity
> and barriers, and if there's consensus, then as I'd said earlier, I
> have no issues in going with the consensus (eg. having API variants).
> Linus would be more difficult to convince, however, I suspect :-)

The confusion may be the result of us having barrier semantics in 
atomic_read. If we take that out then we may avoid future confusion.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-24 12:50           ` Denys Vlasenko
@ 2007-08-24 17:15             ` Christoph Lameter
  2007-08-24 20:21               ` Denys Vlasenko
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Lameter @ 2007-08-24 17:15 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

On Fri, 24 Aug 2007, Denys Vlasenko wrote:

> On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> > Satyam Sharma writes:
> > In the kernel we use atomic variables in precisely those situations
> > where a variable is potentially accessed concurrently by multiple
> > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > timely fashion.  That is what they are for.  Therefore the compiler
> > must not cache values of atomic variables in registers; each
> > atomic_read must result in a load and each atomic_set must result in a
> > store.  Anything else will just lead to subtle bugs.
> 
> Amen.

A "timely" fashion? One cannot rely on something like that when coding. 
The visibility of updates is ensured by barriers and not by some fuzzy 
notion of "timeliness".

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-24 12:19                                       ` Denys Vlasenko
@ 2007-08-24 17:19                                         ` Linus Torvalds
  0 siblings, 0 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-08-24 17:19 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Satyam Sharma, Christoph Lameter, Paul E. McKenney, Herbert Xu,
	Nick Piggin, Paul Mackerras, Segher Boessenkool, heiko.carstens,
	horms, linux-kernel, rpjday, ak, netdev, cfriesen, akpm,
	jesper.juhl, linux-arch, zlynx, schwidefsky, Chris Snook, davem,
	wensong, wjiang



On Fri, 24 Aug 2007, Denys Vlasenko wrote:
>
> > No, you don't use "x.counter++". But you *do* use
> >
> > 	if (atomic_read(&x) <= 1)
> >
> > and loading into a register is stupid and pointless, when you could just
> > do it as a regular memory-operand to the cmp instruction.
> 
> It doesn't mean that (volatile int*) cast is bad, it means that current gcc
> is bad (or "not good enough"). IOW: instead of avoiding volatile cast,
> it's better to fix the compiler.

I would agree that fixing the compiler in this case would be a good thing, 
even quite regardless of any "atomic_read()" discussion.

I just have a strong suspicion that "volatile" performance is so low down 
the list of any C compiler person's interest that it's never going to 
happen. And quite frankly, I cannot blame the gcc guys for it.

That's especially as "volatile" really isn't a very good feature of the C 
language, and is likely to get *less* interesting rather than more (as 
user space starts to be more and more threaded, "volatile" gets less and 
less useful).

[ Ie, currently, I think you can validly use "volatile" in a "sig_atomic_t" 
  kind of way, where there is a single thread, but with asynchronous 
  events. In that kind of situation, I think it's probably useful. But 
  once you get multiple threads, it gets pointless.

  Sure: you could use "volatile" together with something like Dekker's or 
  Peterson's algorithm that doesn't depend on cache coherency (that's 
  basically what the C "volatile" keyword approximates: not atomic 
  accesses, but *uncached* accesses!) But let's face it, that's way past 
  insane. ]

So I wouldn't expect "volatile" to ever really generate better code. It 
might happen as a side effect of other improvements (eg, I might hope that 
the SSA work would eventually lead to gcc having a much better defined 
model of valid optimizations, and maybe better code generation for 
volatile accesses falls out cleanly from that), but in the end, it's such 
an ugly special case in C, and so seldom used, that I wouldn't depend on 
it.

> Linus, in all honesty gcc has many more cases of suboptimal code;
> the case of "volatile" is just one of many.

Well, the thing is, quite often, many of those "suboptimal code" 
generations fall into two distinct classes:

 - complex C code. I can't really blame the compiler too much for this. 
   Some things are *hard* to optimize, and for various scalability 
   reasons, you often end up having limits in the compiler where it 
   doesn't even _try_ doing certain optimizations if you have excessive 
   complexity.

 - bad register allocation. Register allocation really is hard, and 
   sometimes gcc just does the "obviously wrong" thing, and you end up 
   having totally unnecessary spills.

> Off the top of my head:

Yes, "unsigned long long" with x86 has always generated atrocious code. In 
fact, I would say that historically it was really *really* bad. These 
days, gcc actually does a pretty good job, but I'm not surprised that it's 
still quite possible to find cases where it did some optimization (in this 
case, apparently noticing that "shift by >= 32 bits" causes the low 
register to be pointless) and then missed *another* optimization (better 
register use) because that optimization had been done *before* the first 
optimization was done.

That's a *classic* example of compiler code generation issues, and quite 
frankly, I think that's very different from the issue of "volatile".

Quite frankly, I'd like there to be more competition in the open source 
compiler game, and that might cause some upheavals, but on the whole, gcc 
actually does a pretty damn good job. 

			Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 14:25                 ` Denys Vlasenko
@ 2007-08-24 17:34                   ` Linus Torvalds
  0 siblings, 0 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-08-24 17:34 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Kenn Humborg, Satyam Sharma, Heiko Carstens, Herbert Xu,
	Chris Snook, clameter, Linux Kernel Mailing List, linux-arch,
	netdev, Andrew Morton, ak, davem, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher



On Fri, 24 Aug 2007, Denys Vlasenko wrote:
> 
> So you are ok with compiler propagating n1 to n2 here:
> 
> n1 += atomic_read(x);
> other_variable++;
> n2 += atomic_read(x);
> 
> without accessing x second time. What's the point? Any sane coder
> will say that explicitly anyway:

No.

This is a common mistake, and it's total crap.

Any "sane coder" will often use inline functions, macros, and other helpers to 
do certain abstract things. Those things may contain "atomic_read()" 
calls.

The biggest reason for compilers doing CSE is exactly the fact that many 
opportunities for CSE simply *are*not*visible* on a source code level. 

That is true of things like atomic_read() equally as to things like shared 
offsets inside structure member accesses. No difference whatsoever.

Yes, we have, traditionally, tried to make it *easy* for the compiler to 
generate good code. So when we can, and when we look at performance for 
some really hot path, we *will* write the source code so that the compiler 
doesn't even have the option to screw it up, and that includes things like 
doing CSE at a source code level so that we don't see the compiler 
re-doing accesses unnecessarily.

And I'm not saying we shouldn't do that. But "performance" is not an 
either-or kind of situation, and we should:

 - spend the time at a source code level: make it reasonably easy for the 
   compiler to generate good code, and use the right algorithms at a 
   higher level (and order structures etc so that they have good cache 
   behaviour).

 - .. *and* expect the compiler to handle the cases we didn't do by hand
   pretty well anyway. In particular, quite often, abstraction levels at a 
   software level means that we give compilers "stupid" code, because some 
   function may have a certain high-level abstraction rule, but then on a 
   particular architecture it's actually a no-op, and the compiler should 
   get to "untangle" our stupid code and generate good end results.

 - .. *and* expect the hardware to be sane and do a good job even when the 
   compiler didn't generate perfect code or there were unlucky cache miss
   patterns etc.

and if we do all of that, we'll get good performance. But you really do 
want all three levels. It's not enough to be good at any one level (or 
even any two).

			Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-24 17:15             ` Christoph Lameter
@ 2007-08-24 20:21               ` Denys Vlasenko
  0 siblings, 0 replies; 657+ messages in thread
From: Denys Vlasenko @ 2007-08-24 20:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Mackerras, Satyam Sharma, Stefan Richter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu, Paul E. McKenney

On Friday 24 August 2007 18:15, Christoph Lameter wrote:
> On Fri, 24 Aug 2007, Denys Vlasenko wrote:
> > On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
> > > Satyam Sharma writes:
> > > In the kernel we use atomic variables in precisely those situations
> > > where a variable is potentially accessed concurrently by multiple
> > > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > > timely fashion.  That is what they are for.  Therefore the compiler
> > > must not cache values of atomic variables in registers; each
> > > atomic_read must result in a load and each atomic_set must result in a
> > > store.  Anything else will just lead to subtle bugs.
> >
> > Amen.
>
> A "timely" fashion? One cannot rely on something like that when coding.
> The visibility of updates is ensured by barriers and not by some fuzzy 
> notion of "timeliness".

But here you do have some notion of time:

	while (atomic_read(&x))
		continue;

"continue when other CPU(s) decrement it down to zero".
If "read" includes an insn which accesses RAM, you will
see the "new" value sometime after the other CPU decrements it.
"Sometime after" is on the order of nanoseconds here.
It is a valid concept of time, right?

The whole confusion is about whether atomic_read implies
"read from RAM" or not. I am in a camp which thinks it does.
You are in an opposite one.

We just need a less ambiguous name.

What about this:

/**
 * atomic_read - read atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically reads the value of @v.
 * No compiler barrier implied.
 */
#define atomic_read(v)          ((v)->counter)

+/**
+ * atomic_read_uncached - read atomic variable from memory
+ * @v: pointer of type atomic_t
+ *
+ * Atomically reads the value of @v. This is guaranteed to emit an insn
+ * which accesses memory, atomically. No ordering guarantees!
+ */
+#define atomic_read_uncached(v)  asm_or_volatile_ptr_magic(v)

I was thinking of s/atomic_read/atomic_get/ too, but it implies "taking" the
atomic a-la get_cpu()...
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 17:06                 ` Christoph Lameter
@ 2007-08-24 20:26                   ` Denys Vlasenko
  2007-08-24 20:34                     ` Chris Snook
  0 siblings, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-08-24 20:26 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Satyam Sharma, Heiko Carstens, Herbert Xu, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher, Nick Piggin

On Friday 24 August 2007 18:06, Christoph Lameter wrote:
> On Fri, 24 Aug 2007, Satyam Sharma wrote:
> > But if people do seem to have a mixed / confused notion of atomicity
> > and barriers, and if there's consensus, then as I'd said earlier, I
> > have no issues in going with the consensus (eg. having API variants).
> > Linus would be more difficult to convince, however, I suspect :-)
>
> The confusion may be the result of us having barrier semantics in
> atomic_read. If we take that out then we may avoid future confusions.

I think a better name may help. Nuke atomic_read() altogether.

n = atomic_value(x);	// doesn't hint as strongly at reading as "atomic_read"
n = atomic_fetch(x);	// yes, we _do_ touch RAM
n = atomic_read_uncached(x); // or this

How does that sound?
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
  2007-08-24 20:26                   ` Denys Vlasenko
@ 2007-08-24 20:34                     ` Chris Snook
  0 siblings, 0 replies; 657+ messages in thread
From: Chris Snook @ 2007-08-24 20:34 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Christoph Lameter, Satyam Sharma, Heiko Carstens, Herbert Xu,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher, Nick Piggin

Denys Vlasenko wrote:
> On Friday 24 August 2007 18:06, Christoph Lameter wrote:
>> On Fri, 24 Aug 2007, Satyam Sharma wrote:
>>> But if people do seem to have a mixed / confused notion of atomicity
>>> and barriers, and if there's consensus, then as I'd said earlier, I
>>> have no issues in going with the consensus (eg. having API variants).
>>> Linus would be more difficult to convince, however, I suspect :-)
>> The confusion may be the result of us having barrier semantics in
>> atomic_read. If we take that out then we may avoid future confusions.
> 
> I think better name may help. Nuke atomic_read() altogether.
> 
> n = atomic_value(x);	// doesn't hint as strongly at reading as "atomic_read"
> n = atomic_fetch(x);	// yes, we _do_ touch RAM
> n = atomic_read_uncached(x); // or this
> 
> How does that sound?

atomic_value() vs. atomic_fetch() should be rather unambiguous. 
atomic_read_uncached() begs the question of precisely which cache we are 
avoiding, and could itself cause confusion.

So, if I were writing atomic.h from scratch, knowing what I know now, I think 
I'd use atomic_value() and atomic_fetch().  The problem is that there are a lot 
of existing users of atomic_read(), and we can't write a script to correctly 
guess their intent.  I'm not sure auditing all uses of atomic_read() is really 
worth the comparatively minuscule benefits.

We could play it safe and convert them all to atomic_fetch(), or we could 
acknowledge that changing the semantics 8 months ago was not at all disastrous, 
and make them all atomic_value(), allowing people to use atomic_fetch() where 
they really care.

	-- Chris

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 16:48                                                             ` Linus Torvalds
  2007-08-17 18:50                                                               ` Chris Friesen
  2007-08-20 13:15                                                               ` Chris Snook
@ 2007-09-09 18:02                                                               ` Denys Vlasenko
  2007-09-09 18:18                                                                 ` Arjan van de Ven
  2 siblings, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-09-09 18:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Satyam Sharma, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Chris Snook, Ilpo Jarvinen, Paul E. McKenney,
	Stefan Richter, Linux Kernel Mailing List, linux-arch, Netdev,
	Andrew Morton, ak, heiko.carstens, David Miller, schwidefsky,
	wensong, horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	segher

On Friday 17 August 2007 17:48, Linus Torvalds wrote:
> 
> On Fri, 17 Aug 2007, Nick Piggin wrote:
> > 
> > That's not obviously just taste to me. Not when the primitive has many
> > (perhaps, the majority) of uses that do not require said barriers. And
> > this is not solely about the code generation (which, as Paul says, is
> > relatively minor even on x86). I prefer people to think explicitly
> > about barriers in their lockless code.
> 
> Indeed.
> 
> I think the important issues are:
> 
>  - "volatile" itself is simply a badly/weakly defined issue. The semantics 
>    of it as far as the compiler is concerned are really not very good, and 
>    in practice tends to boil down to "I will generate so bad code that 
>    nobody can accuse me of optimizing anything away".
> 
>  - "volatile" - regardless of how well or badly defined it is - is purely 
>    a compiler thing. It has absolutely no meaning for the CPU itself, so 
>    it at no point implies any CPU barriers. As a result, even if the 
>    compiler generates crap code and doesn't re-order anything, there's 
>    nothing that says what the CPU will do.
> 
>  - in other words, the *only* possible meaning for "volatile" is a purely 
>    single-CPU meaning. And if you only have a single CPU involved in the 
>    process, the "volatile" is by definition pointless (because even 
>    without a volatile, the compiler is required to make the C code appear 
>    consistent as far as a single CPU is concerned).
> 
> So, let's take the example *buggy* code where we use "volatile" to wait 
> for other CPU's:
> 
> 	atomic_set(&var, 0);
> 	while (!atomic_read(&var))
> 		/* nothing */;
> 
> 
> which generates an endless loop if we don't have atomic_read() imply 
> volatile.
> 
> The point here is that it's buggy whether the volatile is there or not! 
> Exactly because the user expects multi-processing behaviour, but 
> "volatile" doesn't actually give any real guarantees about it. Another CPU 
> may have done:
> 
> 	external_ptr = kmalloc(..);
> 	/* Setup is now complete, inform the waiter */
> 	atomic_inc(&var);
> 
> but the fact is, since the other CPU isn't serialized in any way, the 
> "while-loop" (even in the presence of "volatile") doesn't actually work 
> right! Whatever the "atomic_read()" was waiting for may not have 
> completed, because we have no barriers!

Why is all this fixation on "volatile"? I don't think
people want the "volatile" keyword per se, they want atomic_read(&x) to
_always_ compile into a memory-accessing instruction, not a register access.
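
For completeness, the barrier-based fix Linus is pointing at would pair the
flag update with explicit barriers -- roughly this sketch (the kmalloc()
arguments are made up, the names are from his example):

	/* CPU 1 (writer): publish the data, then raise the flag */
	external_ptr = kmalloc(sizeof(*external_ptr), GFP_KERNEL);
	/* ... finish setting up *external_ptr ... */
	smp_wmb();		/* setup must be visible before the flag */
	atomic_inc(&var);

	/* CPU 2 (reader): wait for the flag, then the data is safe to use */
	while (!atomic_read(&var))
		cpu_relax();	/* compiler barrier: forces var to be re-read */
	smp_rmb();		/* the flag read must happen before reading *external_ptr */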
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-09 18:02                                                               ` Denys Vlasenko
@ 2007-09-09 18:18                                                                 ` Arjan van de Ven
  2007-09-10 10:56                                                                   ` Denys Vlasenko
  0 siblings, 1 reply; 657+ messages in thread
From: Arjan van de Ven @ 2007-09-09 18:18 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Herbert Xu,
	Paul Mackerras, Christoph Lameter, Chris Snook, Ilpo Jarvinen,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, cfriesen,
	zlynx, rpjday, jesper.juhl, segher

On Sun, 9 Sep 2007 19:02:54 +0100
Denys Vlasenko <vda.linux@googlemail.com> wrote:

> Why is all this fixation on "volatile"? I don't think
> people want "volatile" keyword per se, they want atomic_read(&x) to
> _always_ compile into an memory-accessing instruction, not register
> access.

and ... why is that?
is there any valid, non-buggy code sequence that makes that a
reasonable requirement?

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-09 18:18                                                                 ` Arjan van de Ven
@ 2007-09-10 10:56                                                                   ` Denys Vlasenko
  2007-09-10 11:15                                                                     ` Herbert Xu
                                                                                       ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Denys Vlasenko @ 2007-09-10 10:56 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Herbert Xu,
	Paul Mackerras, Christoph Lameter, Chris Snook, Ilpo Jarvinen,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, cfriesen,
	zlynx, rpjday, jesper.juhl, segher

On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
> On Sun, 9 Sep 2007 19:02:54 +0100
> Denys Vlasenko <vda.linux@googlemail.com> wrote:
> 
> > Why is all this fixation on "volatile"? I don't think
> > people want the "volatile" keyword per se, they want atomic_read(&x) to
> > _always_ compile into a memory-accessing instruction, not a register
> > access.
> 
> and ... why is that?
> is there any valid, non-buggy code sequence that makes that a
> reasonable requirement?

Well, if you insist on having it again:

Waiting for atomic value to be zero:

        while (atomic_read(&x))
                continue;

gcc may happily convert it into:

        reg = atomic_read(&x);
        while (reg)
                continue;

Expecting every driver writer to remember that atomic_read is not in fact
a "read from memory" is naive. That won't happen. Face it, majority of
driver authors are a bit less talented than Ingo Molnar or Arjan van de Ven ;)
The name of the macro is saying that it's a read.
We are confusing users here.

It's doubly confusing that cpu_relax(), which says _nothing_ about barriers
in its name, is actually a barrier you need to insert here.
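
With the current semantics, the only safe spelling is something like:

	while (atomic_read(&x))
		cpu_relax();	/* compiler barrier: forces x to be re-read */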
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 10:56                                                                   ` Denys Vlasenko
@ 2007-09-10 11:15                                                                     ` Herbert Xu
  2007-09-10 12:22                                                                     ` Kyle Moffett
  2007-09-10 14:51                                                                     ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Arjan van de Ven
  2 siblings, 0 replies; 657+ messages in thread
From: Herbert Xu @ 2007-09-10 11:15 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Arjan van de Ven, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Paul Mackerras, Christoph Lameter, Chris Snook, Ilpo Jarvinen,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, cfriesen,
	zlynx, rpjday, jesper.juhl, segher

On Mon, Sep 10, 2007 at 11:56:29AM +0100, Denys Vlasenko wrote:
> 
> Expecting every driver writer to remember that atomic_read is not in fact
> a "read from memory" is naive. That won't happen. Face it, majority of
> driver authors are a bit less talented than Ingo Molnar or Arjan van de Ven ;)
> The name of the macro is saying that it's a read.
> We are confusing users here.

For driver authors who're too busy to learn the intricacies
of atomic operations, we have the plain old spin lock which
then lets you use normal data structures such as u32 safely.
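
A sketch of what that looks like (the names are made up):

	static DEFINE_SPINLOCK(foo_lock);	/* <linux/spinlock.h> */
	static u32 foo_count;

	static void foo_inc(void)
	{
		spin_lock(&foo_lock);
		foo_count++;		/* an ordinary u32; the lock orders everything */
		spin_unlock(&foo_lock);
	}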

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 10:56                                                                   ` Denys Vlasenko
  2007-09-10 11:15                                                                     ` Herbert Xu
@ 2007-09-10 12:22                                                                     ` Kyle Moffett
  2007-09-10 13:38                                                                       ` Denys Vlasenko
  2007-09-10 14:51                                                                     ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Arjan van de Ven
  2 siblings, 1 reply; 657+ messages in thread
From: Kyle Moffett @ 2007-09-10 12:22 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Arjan van de Ven, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote:
> On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
>> On Sun, 9 Sep 2007 19:02:54 +0100
>> Denys Vlasenko <vda.linux@googlemail.com> wrote:
>>
>>> Why is all this fixation on "volatile"? I don't think people want  
>>> the "volatile" keyword per se, they want atomic_read(&x) to _always_  
>>> compile into a memory-accessing instruction, not a register access.
>>
>> and ... why is that?  is there any valid, non-buggy code sequence  
>> that makes that a reasonable requirement?
>
> Well, if you insist on having it again:
>
> Waiting for atomic value to be zero:
>
>         while (atomic_read(&x))
>                 continue;
>
> gcc may happily convert it into:
>
>         reg = atomic_read(&x);
>         while (reg)
>                 continue;

Bzzt.  Even if you fixed gcc to actually convert it to a busy loop on  
a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that  
does the conversion, it may be that the CPU does the caching of the  
memory value.  GCC has no mechanism to do cache-flushes or memory- 
barriers except through our custom inline assembly.  Also, you  
probably want a cpu_relax() in there somewhere to avoid overheating  
the CPU.  Thirdly, on a large system it may take some arbitrarily  
large amount of time for cache-propagation to update the value of the  
variable in your local CPU cache.  Finally, if atomics are based on  
spinlock+interrupt-disable then you will sit in a tight busy- 
loop of spin_lock_irqsave()->spin_unlock_irqrestore().  Depending on  
your system's internal model this may practically lock up your core  
because the spin_lock() will take the cacheline for exclusive access  
and doing that in a loop can prevent any other CPU from doing any  
operation on it!  Since your IRQs are disabled you even have a very  
small window that an IRQ will come along and free it up long enough  
for the update to take place.

The earlier code segment of:
> while(atomic_read(&x) > 0)
> 	atomic_dec(&x);
is *completely* buggy because you could very easily have 4 CPUs doing  
this on an atomic variable with a value of 1 and end up with it at  
negative 3 by the time you are done.  Moreover all the alternatives  
are also buggy, with the sole exception of this rather obvious- 
seeming one:
> atomic_set(&x, 0);

You simply CANNOT use an atomic_t as your sole synchronizing  
primitive, it doesn't work!  You virtually ALWAYS want to use an  
atomic_t in the following types of situations:

(A) As an object refcount.  The value is never read except as part of  
an atomic_dec_return().  Why aren't you using "struct kref"?

(B) As an atomic value counter (number of processes, for example).   
Just "reading" the value is racy anyways; if you want to enforce a  
limit or something then use atomic_inc_return(), check the result,  
and use atomic_dec() if it's too big (see the sketch after this list).  If you just want to return the  
statistics then you are going to be instantaneous-point-in-time anyways.

(C) As an optimization value (statistics-like, but exact accuracy  
isn't important).
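
A sketch of the pattern in (B), with invented names:

	/* enforce a limit without a racy separate atomic_read() */
	if (atomic_inc_return(&nr_active) > MAX_ACTIVE) {
		atomic_dec(&nr_active);
		return -EBUSY;
	}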

Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like  
completions, mutexes, semaphores, spinlocks, krefs, etc.  It's not  
useful for synchronization, only for keeping track of simple integer  
RMW values.  Note that atomic_read() and atomic_set() aren't very  
useful RMW primitives (read-nomodify-nowrite and read-set-zero- 
write).  Code which assumes anything else is probably buggy in other  
ways too.
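
For example, the "wait until the other CPU finishes setup" loops earlier in
this thread map onto a completion rather than a hand-rolled atomic_t loop
(a sketch, assuming <linux/completion.h>):

	static DECLARE_COMPLETION(setup_done);

	/* waiter: sleeps instead of busy-looping on atomic_read() */
	wait_for_completion(&setup_done);

	/* the CPU doing the setup, once it really is finished: */
	complete(&setup_done);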

So while I see no real reason for the "volatile" on the atomics, I  
also see no real reason why it's terribly harmful.  Regardless of the  
"volatile" on the operation the CPU is perfectly happy to cache it  
anyways so it doesn't buy you any actual "always-access-memory"  
guarantees.  If you are just interested in it as an optimization you  
could probably just read the properly-aligned integer counter  
directly, an atomic read on most CPUs.

If you really need it to hit main memory *every* *single* *time*  
(Why?  Are you using it instead of the proper kernel subsystem?)   
then you probably need a custom inline assembly helper anyways.

Cheers,
Kyle Moffett


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 12:22                                                                     ` Kyle Moffett
@ 2007-09-10 13:38                                                                       ` Denys Vlasenko
  2007-09-10 14:16                                                                         ` Denys Vlasenko
  0 siblings, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-09-10 13:38 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Arjan van de Ven, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Monday 10 September 2007 13:22, Kyle Moffett wrote:
> On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote:
> > On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
> >> On Sun, 9 Sep 2007 19:02:54 +0100
> >> Denys Vlasenko <vda.linux@googlemail.com> wrote:
> >>
> >>> Why is all this fixation on "volatile"? I don't think people want  
> >>> the "volatile" keyword per se, they want atomic_read(&x) to _always_  
> >>> compile into a memory-accessing instruction, not a register access.
> >>
> >> and ... why is that?  is there any valid, non-buggy code sequence  
> >> that makes that a reasonable requirement?
> >
> > Well, if you insist on having it again:
> >
> > Waiting for atomic value to be zero:
> >
> >         while (atomic_read(&x))
> >                 continue;
> >
> > gcc may happily convert it into:
> >
> >         reg = atomic_read(&x);
> >         while (reg)
> >                 continue;
> 
> Bzzt.  Even if you fixed gcc to actually convert it to a busy loop on  
> a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that  
> does the conversion, it may be that the CPU does the caching of the  
> memory value.  GCC has no mechanism to do cache-flushes or memory- 
> barriers except through our custom inline assembly.

The CPU can cache the value all right, but it cannot use that cached value
*forever*; it has to react to invalidate cycles on the shared bus
and re-fetch new data.

IOW: an atomic_read(&x) which compiles down to a memory accessor
will work properly.

> the CPU.  Thirdly, on a large system it may take some arbitrarily  
> large amount of time for cache-propagation to update the value of the  
> variable in your local CPU cache.

Yes, but "arbitrarily large amount of time" is actually measured
in nanoseconds here. Let's say 1000ns max for hundreds of CPUs?

> Also, you   
> probably want a cpu_relax() in there somewhere to avoid overheating  
> the CPU.

Yes, but 
1. The CPU shouldn't overheat (in the sense that it gets damaged);
   it will only use more power than needed.
2. cpu_relax() just throttles down my CPU, so it's a performance
   optimization only. Wait, it isn't, it's a barrier too.
   Wow, "cpu_relax" is a barrier? How am I supposed to know
   that without reading lkml flamewars and/or header files?

Let's try reading headers. asm-x86_64/processor.h:

#define cpu_relax()   rep_nop()

So, is it a barrier? No clue yet.

/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
static inline void rep_nop(void)
{
        __asm__ __volatile__("rep;nop": : :"memory");
}

The comment explicitly says that it is "a good thing" (it doesn't say
that it is mandatory) and says NOTHING about barriers!

Barrier-ness is not mentioned and is hidden in "memory" clobber.

Do you think it's obvious enough for the average driver writer?
I think not, especially since it's unlikely for him to even start
suspecting that it is a memory barrier based on the "cpu_relax"
name.

> You simply CANNOT use an atomic_t as your sole synchronizing
> primitive, it doesn't work!  You virtually ALWAYS want to use an  
> atomic_t in the following types of situations:
> 
> (A) As an object refcount.  The value is never read except as part of  
> an atomic_dec_return().  Why aren't you using "struct kref"?
> 
> (B) As an atomic value counter (number of processes, for example).   
> Just "reading" the value is racy anyways, if you want to enforce a  
> limit or something then use atomic_inc_return(), check the result,  
> and use atomic_dec() if it's too big.  If you just want to return the  
> statistics then you are going to be instantaneous-point-in-time anyways.
> 
> (C) As an optimization value (statistics-like, but exact accuracy  
> isn't important).
> 
> Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like  
> completions, mutexes, semaphores, spinlocks, krefs, etc.  It's not  
> useful for synchronization, only for keeping track of simple integer  
> RMW values.  Note that atomic_read() and atomic_set() aren't very  
> useful RMW primitives (read-nomodify-nowrite and read-set-zero- 
> write).  Code which assumes anything else is probably buggy in other  
> ways too.

You are basically trying to educate me on how to use atomics properly.
You don't need to do it, as I am (currently) not a driver author.

I am saying that people who are already using atomic_read()
(and who unfortunately did not read your explanation above)
will still sometimes use atomic_read() as a way to read an atomic value
*from memory*, and will create nasty heisenbugs for you to debug.
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 13:38                                                                       ` Denys Vlasenko
@ 2007-09-10 14:16                                                                         ` Denys Vlasenko
  2007-09-10 15:09                                                                           ` Linus Torvalds
  0 siblings, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-09-10 14:16 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Arjan van de Ven, Linus Torvalds, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Monday 10 September 2007 14:38, Denys Vlasenko wrote:
> You are basically trying to educate me on how to use atomics properly.
> You don't need to do it, as I am (currently) not a driver author.
> 
> I am saying that people who are already using atomic_read()
> (and who unfortunately did not read your explanation above)
> will still sometimes use atomic_read() as a way to read an atomic value
> *from memory*, and will create nasty heisenbugs for you to debug.

static inline int
qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
{
        int      return_status = QLA_SUCCESS;
        unsigned long loop_timeout ;
        scsi_qla_host_t *pha = to_qla_parent(ha);

        /* wait for 5 min at the max for loop to be ready */
        loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);

        while ((!atomic_read(&pha->loop_down_timer) &&
            atomic_read(&pha->loop_state) == LOOP_DOWN) ||
            atomic_read(&pha->loop_state) != LOOP_READY) {
                if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
                        return_status = QLA_FUNCTION_FAILED;
                        break;
                }
                msleep(1000);
                if (time_after_eq(jiffies, loop_timeout)) {
                        return_status = QLA_FUNCTION_FAILED;
                        break;
                }
        }
        return (return_status);
}

Is the above correct or buggy? Correct, because msleep() is a barrier.
Is it obvious? No.

static void
qla2x00_rst_aen(scsi_qla_host_t *ha)
{
        if (ha->flags.online && !ha->flags.reset_active &&
            !atomic_read(&ha->loop_down_timer) &&
            !(test_bit(ABORT_ISP_ACTIVE, &ha->dpc_flags))) {
                do {
                        clear_bit(RESET_MARKER_NEEDED, &ha->dpc_flags);

                        /*
                         * Issue marker command only when we are going to start
                         * the I/O.
                         */
                        ha->marker_needed = 1;
                } while (!atomic_read(&ha->loop_down_timer) &&
                    (test_bit(RESET_MARKER_NEEDED, &ha->dpc_flags)));
        }
}

Is above correct? I honestly don't know. Correct, because set_bit is
a barrier on _all_ _memory_? Will it break if set_bit is changed
to be a barrier only on its operand? Probably yes.

drivers/kvm/kvm_main.c

        while (atomic_read(&completed) != needed) {
                cpu_relax();
                barrier();
        }

Obviously the author did not know that cpu_relax() is already a barrier.
See why I think driver authors will be confused?

arch/x86_64/kernel/crash.c

static void nmi_shootdown_cpus(void)
{
...
        msecs = 1000; /* Wait at most a second for the other cpus to stop */
        while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
                mdelay(1);
                msecs--;
        }
...
}

Is mdelay(1) a barrier? Yes, because it is a function on x86_64.
Exactly the same code will be buggy on an arch where
mdelay(1) == udelay(1000), and udelay is implemented
as inline busy-wait.
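
A defensive variant (just a sketch, not the actual crash.c code) makes the
reload explicit, so the loop does not depend on whether mdelay() happens to
be an out-of-line call on the architecture at hand:

        msecs = 1000;
        while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
                mdelay(1);
                barrier();      /* force the atomic_t to be re-read */
                msecs--;
        }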

arch/sparc64/kernel/smp.c

        /* Wait for response */
        while (atomic_read(&data.finished) != cpus)
                cpu_relax();
...later in the same file...
                while (atomic_read(&smp_capture_registry) != ncpus)
                        rmb();

I'm confused. Do we need cpu_relax() or rmb()? Does cpu_relax() imply rmb()?
(No it doesn't). Which of those two while loops needs correcting?
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 14:51                                                                     ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Arjan van de Ven
@ 2007-09-10 14:38                                                                       ` Denys Vlasenko
  2007-09-10 17:02                                                                         ` Arjan van de Ven
  0 siblings, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-09-10 14:38 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Herbert Xu,
	Paul Mackerras, Christoph Lameter, Chris Snook, Ilpo Jarvinen,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, cfriesen,
	zlynx, rpjday, jesper.juhl, segher

On Monday 10 September 2007 15:51, Arjan van de Ven wrote:
> On Mon, 10 Sep 2007 11:56:29 +0100
> Denys Vlasenko <vda.linux@googlemail.com> wrote:
> 
> > 
> > Well, if you insist on having it again:
> > 
> > Waiting for atomic value to be zero:
> > 
> >         while (atomic_read(&x))
> >                 continue;
> > 
> 
> and this I would say is buggy code all the way.
>
> Not from a pure C level semantics, but from a "busy waiting is buggy"
> semantics level and a "I'm inventing my own locking" semantics level.

After inspecting arch/*, I cannot agree with you.
Otherwise almost all major architectures use
"conceptually buggy busy-waiting":

arch/alpha
arch/i386
arch/ia64
arch/m32r
arch/mips
arch/parisc
arch/powerpc
arch/sh
arch/sparc64
arch/um
arch/x86_64

All of the above contain busy-waiting on atomic_read.

Including these loops without barriers:

arch/mips/kernel/smtc.c
			while (atomic_read(&idle_hook_initialized) < 1000)
				;
arch/mips/sgi-ip27/ip27-nmi.c
	while (atomic_read(&nmied_cpus) != num_online_cpus());

[Well maybe num_online_cpus() is a barrier, I didn't check]

arch/sh/kernel/smp.c
	if (wait)
		while (atomic_read(&smp_fn_call.finished) != (nr_cpus - 1));

Bugs?
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 10:56                                                                   ` Denys Vlasenko
  2007-09-10 11:15                                                                     ` Herbert Xu
  2007-09-10 12:22                                                                     ` Kyle Moffett
@ 2007-09-10 14:51                                                                     ` Arjan van de Ven
  2007-09-10 14:38                                                                       ` Denys Vlasenko
  2 siblings, 1 reply; 657+ messages in thread
From: Arjan van de Ven @ 2007-09-10 14:51 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Herbert Xu,
	Paul Mackerras, Christoph Lameter, Chris Snook, Ilpo Jarvinen,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, cfriesen,
	zlynx, rpjday, jesper.juhl, segher

On Mon, 10 Sep 2007 11:56:29 +0100
Denys Vlasenko <vda.linux@googlemail.com> wrote:

> 
> Well, if you insist on having it again:
> 
> Waiting for atomic value to be zero:
> 
>         while (atomic_read(&x))
>                 continue;
> 

and this I would say is buggy code all the way.

Not from a pure C level semantics, but from a "busy waiting is buggy"
semantics level and a "I'm inventing my own locking" semantics level.
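
For reference, the shape such code takes without busy-waiting -- a minimal
sketch with a hypothetical completion named "done"; the real conversion
depends on the caller's context:

        /* sketch only; needs <linux/completion.h> */
        static DECLARE_COMPLETION(done);

        static void waiter(void)
        {
                wait_for_completion(&done);     /* sleeps instead of spinning */
        }

        static void finisher(void)
        {
                complete(&done);                /* wakes the waiter */
        }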


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 14:16                                                                         ` Denys Vlasenko
@ 2007-09-10 15:09                                                                           ` Linus Torvalds
  2007-09-10 16:46                                                                             ` Denys Vlasenko
                                                                                               ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Linus Torvalds @ 2007-09-10 15:09 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Kyle Moffett, Arjan van de Ven, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher


On Mon, 10 Sep 2007, Denys Vlasenko wrote:
> 
> static inline int
> qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
> {
>         int      return_status = QLA_SUCCESS;
>         unsigned long loop_timeout ;
>         scsi_qla_host_t *pha = to_qla_parent(ha);
> 
>         /* wait for 5 min at the max for loop to be ready */
>         loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);
> 
>         while ((!atomic_read(&pha->loop_down_timer) &&
>             atomic_read(&pha->loop_state) == LOOP_DOWN) ||
>             atomic_read(&pha->loop_state) != LOOP_READY) {
>                 if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
...
> Is above correct or buggy? Correct, because msleep is a barrier.
> Is it obvious? No.

It's *buggy*. But it has nothing to do with any msleep() in the loop, or 
anything else.

And more importantly, it would be equally buggy even *with* a "volatile" 
atomic_read().

Why is this so hard for people to understand? You're all acting like 
morons.

The reason it is buggy has absolutely nothing to do with whether the read 
is done or not, it has to do with the fact that the CPU may re-order the 
reads *regardless* of whether the read is done in some specific order by 
the compiler or not! In effect, there is zero ordering between all those 
three reads, and if you don't have memory barriers (or a lock or other 
serialization), that code is buggy.
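
A sketch of what explicit read-side ordering looks like, reusing the fields
quoted above (illustrative only -- the writer needs a matching smp_wmb() or
a lock, and this is not a reviewed qla2xxx fix):

        if (!atomic_read(&pha->loop_down_timer)) {
                smp_rmb();      /* pairs with a writer-side smp_wmb() */
                if (atomic_read(&pha->loop_state) == LOOP_READY)
                        return QLA_SUCCESS;
        }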

So stop this idiotic discussion thread already. The above kind of code 
needs memory barriers to be non-buggy. The whole "volatile or not" 
discussion is totally idiotic, and pointless, and anybody who doesn't 
understand that by now needs to just shut up and think about it more, 
rather than make this discussion drag out even further.

The fact is, "volatile" *only* makes things worse. It generates worse 
code, and never fixes any real bugs. This is a *fact*.

			Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 15:09                                                                           ` Linus Torvalds
@ 2007-09-10 16:46                                                                             ` Denys Vlasenko
  2007-09-10 19:59                                                                               ` Kyle Moffett
  2007-09-10 18:59                                                                             ` Christoph Lameter
  2007-09-10 23:19                                                                             ` [PATCH] Document non-semantics of atomic_read() and atomic_set() Chris Snook
  2 siblings, 1 reply; 657+ messages in thread
From: Denys Vlasenko @ 2007-09-10 16:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kyle Moffett, Arjan van de Ven, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Monday 10 September 2007 16:09, Linus Torvalds wrote:
> On Mon, 10 Sep 2007, Denys Vlasenko wrote:
> > static inline int
> > qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
> > {
> >         int      return_status = QLA_SUCCESS;
> >         unsigned long loop_timeout ;
> >         scsi_qla_host_t *pha = to_qla_parent(ha);
> > 
> >         /* wait for 5 min at the max for loop to be ready */
> >         loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);
> > 
> >         while ((!atomic_read(&pha->loop_down_timer) &&
> >             atomic_read(&pha->loop_state) == LOOP_DOWN) ||
> >             atomic_read(&pha->loop_state) != LOOP_READY) {
> >                 if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
> ...
> > Is above correct or buggy? Correct, because msleep is a barrier.
> > Is it obvious? No.
> 
> It's *buggy*. But it has nothing to do with any msleep() in the loop, or 
> anything else.
> 
> And more importantly, it would be equally buggy even *with* a "volatile" 
> atomic_read().

I am not saying that this code is okay; that isn't the point.
(The code is in fact awful for several more reasons).

My point is that people are confused as to what atomic_read()
exactly means, and this is bad. Same for cpu_relax().
The first one says "read", and the second one doesn't say "barrier".

This is real code from current kernel which demonstrates this:

"I don't know that cpu_relax() is a barrier already":

drivers/kvm/kvm_main.c
        while (atomic_read(&completed) != needed) {
                cpu_relax();
                barrier();
        }

"I think that atomic_read() is a read from memory and therefore
I don't need a barrier":

arch/x86_64/kernel/crash.c
        msecs = 1000; /* Wait at most a second for the other cpus to stop */
        while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
                mdelay(1);
                msecs--;
        }

Since neither camp seems to give up, I am proposing renaming
them to something less confusing, and make everybody happy.

cpu_relax_barrier()
atomic_value(&x)
atomic_fetch(&x)

I'm not a native English speaker; do these sound better?
--
vda

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 14:38                                                                       ` Denys Vlasenko
@ 2007-09-10 17:02                                                                         ` Arjan van de Ven
  0 siblings, 0 replies; 657+ messages in thread
From: Arjan van de Ven @ 2007-09-10 17:02 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Herbert Xu,
	Paul Mackerras, Christoph Lameter, Chris Snook, Ilpo Jarvinen,
	Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
	linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
	David Miller, schwidefsky, wensong, horms, wjiang, cfriesen,
	zlynx, rpjday, jesper.juhl, segher

On Mon, 10 Sep 2007 15:38:23 +0100
Denys Vlasenko <vda.linux@googlemail.com> wrote:

> On Monday 10 September 2007 15:51, Arjan van de Ven wrote:
> > On Mon, 10 Sep 2007 11:56:29 +0100
> > Denys Vlasenko <vda.linux@googlemail.com> wrote:
> > 
> > > 
> > > Well, if you insist on having it again:
> > > 
> > > Waiting for atomic value to be zero:
> > > 
> > >         while (atomic_read(&x))
> > >                 continue;
> > > 
> > 
> > and this I would say is buggy code all the way.
> >
> > Not from a pure C level semantics, but from a "busy waiting is
> > buggy" semantics level and a "I'm inventing my own locking"
> > semantics level.
> 
> After inspecting arch/*, I cannot agree with you.

the arch/ people obviously are allowed to do their own locking stuff...
BECAUSE THEY HAVE TO IMPLEMENT THAT!


the arch maintainers know EXACTLY how their hw behaves (well, we hope)
so they tend to be the exception to many rules in the kernel....

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-08-17 17:41                                         ` Segher Boessenkool
  2007-08-17 18:38                                           ` Satyam Sharma
@ 2007-09-10 18:59                                           ` Christoph Lameter
  2007-09-10 20:54                                             ` Paul E. McKenney
  2007-09-11  2:27                                             ` Segher Boessenkool
  1 sibling, 2 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-09-10 18:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Paul Mackerras, heiko.carstens, horms, Stefan Richter,
	Satyam Sharma, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang

On Fri, 17 Aug 2007, Segher Boessenkool wrote:

> "volatile" has nothing to do with reordering.  atomic_dec() writes
> to memory, so it _does_ have "volatile semantics", implicitly, as
> long as the compiler cannot optimise the atomic variable away
> completely -- any store counts as a side effect.

Stores can be reordered. Only x86 has (mostly) implicit write ordering. So 
no, atomic_dec has no volatile semantics and may be reordered on a variety 
of processors. Writes to memory may not follow code order on several 
processors.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 15:09                                                                           ` Linus Torvalds
  2007-09-10 16:46                                                                             ` Denys Vlasenko
@ 2007-09-10 18:59                                                                             ` Christoph Lameter
  2007-09-10 23:19                                                                             ` [PATCH] Document non-semantics of atomic_read() and atomic_set() Chris Snook
  2 siblings, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-09-10 18:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, Kyle Moffett, Arjan van de Ven, Nick Piggin,
	Satyam Sharma, Herbert Xu, Paul Mackerras, Chris Snook,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Mon, 10 Sep 2007, Linus Torvalds wrote:

> The fact is, "volatile" *only* makes things worse. It generates worse 
> code, and never fixes any real bugs. This is a *fact*.

Yes, let's just drop the volatiles now! We need a patch that gets rid of 
them.... Volunteers?



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 16:46                                                                             ` Denys Vlasenko
@ 2007-09-10 19:59                                                                               ` Kyle Moffett
  0 siblings, 0 replies; 657+ messages in thread
From: Kyle Moffett @ 2007-09-10 19:59 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Arjan van de Ven, Nick Piggin, Satyam Sharma,
	Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Sep 10, 2007, at 12:46:33, Denys Vlasenko wrote:
> My point is that people are confused as to what atomic_read()   
> exactly means, and this is bad. Same for cpu_relax().  First one  
> says "read", and second one doesn't say "barrier".

Q&A:

Q:  When is it OK to use atomic_read()?
A:  You are asking the question, so never.

Q:  But I need to check the value of the atomic at this point in time...
A:  Your code is buggy if it needs to do that on an atomic_t for  
anything other than debugging or optimization.  Use either  
atomic_*_return() or a lock and some normal integers.

Q:  "So why can't the atomic_read DTRT magically?"
A:  Because "the right thing" depends on the situation and is usually  
best done with something other than atomic_t.
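
As a concrete illustration of the atomic_*_return()/atomic_*_test() style
mentioned in the answers above (a sketch only; "foo" and its refcount are
hypothetical):

        if (atomic_dec_and_test(&foo->refcount))  /* decrement and test atomically */
                kfree(foo);

There is no separate atomic_read() here, so there is nothing for the
compiler or the CPU to pull apart from the decision it feeds.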

If somebody can post some non-buggy code which is correctly using  
atomic_read() *and* depends on the compiler generating extra  
nonsensical loads due to "volatile" then the issue *might* be  
reconsidered.  This also includes samples of code which uses  
atomic_read() and needs memory barriers (so that we can fix the buggy  
code, not so we can change atomic_read()).  So far the only code  
samples anybody has posted are buggy regardless of whether or not the  
value and/or accessors are flagged "volatile" or not.  And hey, maybe  
the volatile ops *should* be implemented in inline ASM for future- 
proof-ness, but that's a separate issue.

Cheers,
Kyle Moffett


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 18:59                                           ` Christoph Lameter
@ 2007-09-10 20:54                                             ` Paul E. McKenney
  2007-09-10 21:36                                               ` Christoph Lameter
  2007-09-11  2:27                                             ` Segher Boessenkool
  1 sibling, 1 reply; 657+ messages in thread
From: Paul E. McKenney @ 2007-09-10 20:54 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Segher Boessenkool, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	David Miller, Ilpo Järvinen, ak, cfriesen, rpjday, Netdev,
	jesper.juhl, linux-arch, Andrew Morton, zlynx, schwidefsky,
	Chris Snook, Herbert Xu, Linus Torvalds, wensong, wjiang

On Mon, Sep 10, 2007 at 11:59:29AM -0700, Christoph Lameter wrote:
> On Fri, 17 Aug 2007, Segher Boessenkool wrote:
> 
> > "volatile" has nothing to do with reordering.  atomic_dec() writes
> > to memory, so it _does_ have "volatile semantics", implicitly, as
> > long as the compiler cannot optimise the atomic variable away
> > completely -- any store counts as a side effect.
> 
> Stores can be reordered. Only x86 has (mostly) implicit write ordering. So 
> no, atomic_dec has no volatile semantics and may be reordered on a variety 
> of processors. Writes to memory may not follow code order on several 
> processors.

The one exception to this being the case where process-level code is
communicating to an interrupt handler running on that same CPU -- on
all CPUs that I am aware of, a given CPU always sees its own writes
in order.
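
A sketch of that case (names hypothetical, headers omitted; barrier() here
is only a compiler barrier, which is the point):

        static int pending_arg;
        static int trigger_pending;

        static void kick(int arg)               /* process context */
        {
                pending_arg = arg;
                barrier();      /* compiler ordering suffices: the CPU
                                 * sees its own stores in program order */
                trigger_pending = 1;
        }

        static irqreturn_t my_irq(int irq, void *dev_id)        /* same CPU */
        {
                if (trigger_pending) {
                        trigger_pending = 0;
                        /* ... consume pending_arg ... */
                }
                return IRQ_HANDLED;
        }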

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 20:54                                             ` Paul E. McKenney
@ 2007-09-10 21:36                                               ` Christoph Lameter
  2007-09-10 21:50                                                 ` Paul E. McKenney
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Lameter @ 2007-09-10 21:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Segher Boessenkool, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	David Miller, Ilpo Järvinen, ak, cfriesen, rpjday, Netdev,
	jesper.juhl, linux-arch, Andrew Morton, zlynx, schwidefsky,
	Chris Snook, Herbert Xu, Linus Torvalds, wensong, wjiang

On Mon, 10 Sep 2007, Paul E. McKenney wrote:

> The one exception to this being the case where process-level code is
> communicating to an interrupt handler running on that same CPU -- on
> all CPUs that I am aware of, a given CPU always sees its own writes
> in order.

Yes but that is due to the code path effectively continuing in the 
interrupt handler. The cpu makes sure that op codes being executed always 
see memory in a consistent way. The basic ordering problem with out of 
order writes is therefore coming from other processors concurrently 
executing code and holding variables in registers that are modified 
elsewhere. The only solutions that I know of are one form of barrier or
another.
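
Spelled out as the usual pairing -- a sketch with hypothetical names, not
taken from any driver:

        /* sketch only; headers omitted */
        static atomic_t ready = ATOMIC_INIT(0);
        static int payload;

        static void writer(void)
        {
                payload = 42;           /* stand-in for real data */
                smp_wmb();              /* order the data store before the flag */
                atomic_set(&ready, 1);
        }

        static int reader(void)
        {
                while (!atomic_read(&ready))
                        cpu_relax();
                smp_rmb();              /* order the flag load before the data load */
                return payload;
        }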

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 21:36                                               ` Christoph Lameter
@ 2007-09-10 21:50                                                 ` Paul E. McKenney
  0 siblings, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-09-10 21:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Segher Boessenkool, Paul Mackerras, heiko.carstens, horms,
	Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	David Miller, Ilpo Järvinen, ak, cfriesen, rpjday, Netdev,
	jesper.juhl, linux-arch, Andrew Morton, zlynx, schwidefsky,
	Chris Snook, Herbert Xu, Linus Torvalds, wensong, wjiang

On Mon, Sep 10, 2007 at 02:36:26PM -0700, Christoph Lameter wrote:
> On Mon, 10 Sep 2007, Paul E. McKenney wrote:
> 
> > The one exception to this being the case where process-level code is
> > communicating to an interrupt handler running on that same CPU -- on
> > all CPUs that I am aware of, a given CPU always sees its own writes
> > in order.
> 
> Yes but that is due to the code path effectively continuing in the 
> interrupt handler. The cpu makes sure that op codes being executed always 
> see memory in a consistent way. The basic ordering problem with out of 
> order writes is therefore coming from other processors concurrently 
> executing code and holding variables in registers that are modified 
> elsewhere. The only solution that I know of are one or the other form of 
> barrier.

So we are agreed then -- volatile accesses may be of some assistance when
interacting with interrupt handlers running on the same CPU (presumably
when using per-CPU variables), but are generally useless when sharing
variables among CPUs.  Correct?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 657+ messages in thread

* [PATCH] Document non-semantics of atomic_read() and atomic_set()
  2007-09-10 15:09                                                                           ` Linus Torvalds
  2007-09-10 16:46                                                                             ` Denys Vlasenko
  2007-09-10 18:59                                                                             ` Christoph Lameter
@ 2007-09-10 23:19                                                                             ` Chris Snook
  2007-09-10 23:44                                                                               ` Paul E. McKenney
  2007-09-11 19:35                                                                               ` Christoph Lameter
  2 siblings, 2 replies; 657+ messages in thread
From: Chris Snook @ 2007-09-10 23:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, Kyle Moffett, Arjan van de Ven, Nick Piggin,
	Satyam Sharma, Herbert Xu, Paul Mackerras, Christoph Lameter,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

From: Chris Snook <csnook@redhat.com>

Unambiguously document the fact that atomic_read() and atomic_set()
do not imply any ordering or memory access, and that callers are
obligated to explicitly invoke barriers as needed to ensure that
changes to atomic variables are visible in all contexts that need
to see them.

Signed-off-by: Chris Snook <csnook@redhat.com>

--- a/Documentation/atomic_ops.txt	2007-07-08 19:32:17.000000000 -0400
+++ b/Documentation/atomic_ops.txt	2007-09-10 19:02:50.000000000 -0400
@@ -12,7 +12,11 @@
 C integer type will fail.  Something like the following should
 suffice:
 
-	typedef struct { volatile int counter; } atomic_t;
+	typedef struct { int counter; } atomic_t;
+
+	Historically, counter has been declared volatile.  This is now
+discouraged.  See Documentation/volatile-considered-harmful.txt for the
+complete rationale.
 
 	The first operations to implement for atomic_t's are the
 initializers and plain reads.
@@ -42,6 +46,22 @@
 
 which simply reads the current value of the counter.
 
+*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
+
+Some architectures may choose to use the volatile keyword, barriers, or
+inline assembly to guarantee some degree of immediacy for atomic_read()
+and atomic_set().  This is not uniformly guaranteed, and may change in
+the future, so all users of atomic_t should treat atomic_read() and
+atomic_set() as simple C assignment statements that may be reordered or
+optimized away entirely by the compiler or processor, and explicitly
+invoke the appropriate compiler and/or memory barrier for each use case.
+Failure to do so will result in code that may suddenly break when used with
+different architectures or compiler optimizations, or even changes in
+unrelated code which changes how the compiler optimizes the section
+accessing atomic_t variables.
+
+*** YOU HAVE BEEN WARNED! ***
+
 Now, we move onto the actual atomic operation interfaces.
 
 	void atomic_add(int i, atomic_t *v);

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] Document non-semantics of atomic_read() and atomic_set()
  2007-09-10 23:19                                                                             ` [PATCH] Document non-semantics of atomic_read() and atomic_set() Chris Snook
@ 2007-09-10 23:44                                                                               ` Paul E. McKenney
  2007-09-11 19:35                                                                               ` Christoph Lameter
  1 sibling, 0 replies; 657+ messages in thread
From: Paul E. McKenney @ 2007-09-10 23:44 UTC (permalink / raw)
  To: Chris Snook
  Cc: Linus Torvalds, Denys Vlasenko, Kyle Moffett, Arjan van de Ven,
	Nick Piggin, Satyam Sharma, Herbert Xu, Paul Mackerras,
	Christoph Lameter, Ilpo Jarvinen, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

On Mon, Sep 10, 2007 at 07:19:44PM -0400, Chris Snook wrote:
> From: Chris Snook <csnook@redhat.com>
> 
> Unambiguously document the fact that atomic_read() and atomic_set()
> do not imply any ordering or memory access, and that callers are
> obligated to explicitly invoke barriers as needed to ensure that
> changes to atomic variables are visible in all contexts that need
> to see them.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Chris Snook <csnook@redhat.com>
> 
> --- a/Documentation/atomic_ops.txt	2007-07-08 19:32:17.000000000 -0400
> +++ b/Documentation/atomic_ops.txt	2007-09-10 19:02:50.000000000 -0400
> @@ -12,7 +12,11 @@
>  C integer type will fail.  Something like the following should
>  suffice:
> 
> -	typedef struct { volatile int counter; } atomic_t;
> +	typedef struct { int counter; } atomic_t;
> +
> +	Historically, counter has been declared volatile.  This is now
> +discouraged.  See Documentation/volatile-considered-harmful.txt for the
> +complete rationale.
> 
>  	The first operations to implement for atomic_t's are the
>  initializers and plain reads.
> @@ -42,6 +46,22 @@
> 
>  which simply reads the current value of the counter.
> 
> +*** WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! ***
> +
> +Some architectures may choose to use the volatile keyword, barriers, or
> +inline assembly to guarantee some degree of immediacy for atomic_read()
> +and atomic_set().  This is not uniformly guaranteed, and may change in
> +the future, so all users of atomic_t should treat atomic_read() and
> +atomic_set() as simple C assignment statements that may be reordered or
> +optimized away entirely by the compiler or processor, and explicitly
> +invoke the appropriate compiler and/or memory barrier for each use case.
> +Failure to do so will result in code that may suddenly break when used with
> +different architectures or compiler optimizations, or even changes in
> +unrelated code which changes how the compiler optimizes the section
> +accessing atomic_t variables.
> +
> +*** YOU HAVE BEEN WARNED! ***
> +
>  Now, we move onto the actual atomic operation interfaces.
> 
>  	void atomic_add(int i, atomic_t *v);

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
  2007-09-10 18:59                                           ` Christoph Lameter
  2007-09-10 20:54                                             ` Paul E. McKenney
@ 2007-09-11  2:27                                             ` Segher Boessenkool
  1 sibling, 0 replies; 657+ messages in thread
From: Segher Boessenkool @ 2007-09-11  2:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Mackerras, heiko.carstens, horms, Stefan Richter,
	Satyam Sharma, Linux Kernel Mailing List, David Miller,
	Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
	Netdev, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
	wjiang

>> "volatile" has nothing to do with reordering.  atomic_dec() writes
>> to memory, so it _does_ have "volatile semantics", implicitly, as
>> long as the compiler cannot optimise the atomic variable away
>> completely -- any store counts as a side effect.
>
> Stores can be reordered. Only x86 has (mostly) implicit write ordering.
> So no, atomic_dec has no volatile semantics

Read again: I said the C "volatile" construct has nothing to do
with CPU memory access reordering.

> and may be reordered on a variety
> of processors. Writes to memory may not follow code order on several
> processors.

The _compiler_ isn't allowed to reorder things here.  Yes, of course
you do need stronger barriers for many purposes; volatile isn't all
that useful, you know.


Segher


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: [PATCH] Document non-semantics of atomic_read() and atomic_set()
  2007-09-10 23:19                                                                             ` [PATCH] Document non-semantics of atomic_read() and atomic_set() Chris Snook
  2007-09-10 23:44                                                                               ` Paul E. McKenney
@ 2007-09-11 19:35                                                                               ` Christoph Lameter
  1 sibling, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2007-09-11 19:35 UTC (permalink / raw)
  To: Chris Snook
  Cc: Linus Torvalds, Denys Vlasenko, Kyle Moffett, Arjan van de Ven,
	Nick Piggin, Satyam Sharma, Herbert Xu, Paul Mackerras,
	Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
	Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
	heiko.carstens, David Miller, schwidefsky, wensong, horms,
	wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher

Acked-by: Christoph Lameter <clameter@sgi.com>


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2023-10-16 12:42   ` Gilbert Adikankwu
@ 2023-10-16 13:23     ` Julia Lawall
  0 siblings, 0 replies; 657+ messages in thread
From: Julia Lawall @ 2023-10-16 13:23 UTC (permalink / raw)
  To: Gilbert Adikankwu; +Cc: outreachy, linux-staging, linux-kernel, gregkh



On Mon, 16 Oct 2023, Gilbert Adikankwu wrote:

> On Mon, Oct 16, 2023 at 02:34:48PM +0200, Julia Lawall wrote:
> >
> >
> > On Mon, 16 Oct 2023, Gilbert Adikankwu wrote:
> >
> > > linux-staging@lists.linux.dev, linux-kernel@vger.kernel.org
> > > Bcc:
> > > Subject: Re: [PATCH] staging: emxx_udc: Remove unnecessary parentheses around
> > >  condition tests
> > > Reply-To:
> > > In-Reply-To: <6b60ed7-9d97-2071-44f8-83b173191ed@inria.fr>
> > >
> > > On Mon, Oct 16, 2023 at 02:15:06PM +0200, Julia Lawall wrote:
> > > >
> > > >
> > > > On Mon, 16 Oct 2023, Gilbert Adikankwu wrote:
> > > >
> > > > > Fix 47 warnings detected by checkpatch.pl about unnecessary parentheses
> > > > > around condition tests.
> > > >
> > > > If you need to make any changes to the patch, there is no need to give the
> > > > count of the changes.  It doesn't matter if it's 47, 46, 35, etc.
> > > >
> > > > julia
> > > >
> > > Hi Julia,
> > >
> > I added the number because I saw a similar commit in the logs that did
> > so. (commit b83970f23f36f0e2968872140e69f68118d82fe3)
> >
> > OK, I still think it's pointless...  The person who looks at the commit 5
> > years from now won't care about this information.  They care about what
> > you did and why.
> >
> > julia
> >
> Ok, that makes sense. I will revise it. Do I send the revised patch now or
> later today?

You can wait, in case there are other comments.

julia

> >
> > > > >
> > > > > Signed-off-by: Gilbert Adikankwu <gilbertadikankwu@gmail.com>
> > > > > ---
> > > > >  drivers/staging/emxx_udc/emxx_udc.c | 72 ++++++++++++++---------------
> > > > >  1 file changed, 36 insertions(+), 36 deletions(-)
> > > > >
> > > > > diff --git a/drivers/staging/emxx_udc/emxx_udc.c b/drivers/staging/emxx_udc/emxx_udc.c
> > > > > index eb63daaca702..e8ddd691b788 100644
> > > > > --- a/drivers/staging/emxx_udc/emxx_udc.c
> > > > > +++ b/drivers/staging/emxx_udc/emxx_udc.c
> > > > > @@ -149,8 +149,8 @@ static void _nbu2ss_ep0_complete(struct usb_ep *_ep, struct usb_request *_req)
> > > > >  			/* SET_FEATURE */
> > > > >  			recipient = (u8)(p_ctrl->bRequestType & USB_RECIP_MASK);
> > > > >  			selector  = le16_to_cpu(p_ctrl->wValue);
> > > > > -			if ((recipient == USB_RECIP_DEVICE) &&
> > > > > -			    (selector == USB_DEVICE_TEST_MODE)) {
> > > > > +			if (recipient == USB_RECIP_DEVICE &&
> > > > > +			    selector == USB_DEVICE_TEST_MODE) {
> > > > >  				wIndex = le16_to_cpu(p_ctrl->wIndex);
> > > > >  				test_mode = (u32)(wIndex >> 8);
> > > > >  				_nbu2ss_set_test_mode(udc, test_mode);
> > > > > @@ -287,7 +287,7 @@ static int _nbu2ss_epn_exit(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > > > >  	u32		num;
> > > > >  	u32		data;
> > > > >
> > > > > -	if ((ep->epnum == 0) || (udc->vbus_active == 0))
> > > > > +	if (ep->epnum == 0 || udc->vbus_active == 0)
> > > > >  		return	-EINVAL;
> > > > >
> > > > >  	num = ep->epnum - 1;
> > > > > @@ -336,7 +336,7 @@ static void _nbu2ss_ep_dma_init(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > > > >  	u32		data;
> > > > >
> > > > >  	data = _nbu2ss_readl(&udc->p_regs->USBSSCONF);
> > > > > -	if (((ep->epnum == 0) || (data & (1 << ep->epnum)) == 0))
> > > > > +	if (ep->epnum == 0 || (data & (1 << ep->epnum)) == 0)
> > > > >  		return;		/* Not Support DMA */
> > > > >
> > > > >  	num = ep->epnum - 1;
> > > > > @@ -380,7 +380,7 @@ static void _nbu2ss_ep_dma_exit(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > > > >  		return;		/* VBUS OFF */
> > > > >
> > > > >  	data = _nbu2ss_readl(&preg->USBSSCONF);
> > > > > -	if ((ep->epnum == 0) || ((data & (1 << ep->epnum)) == 0))
> > > > > +	if (ep->epnum == 0 || (data & (1 << ep->epnum)) == 0)
> > > > >  		return;		/* Not Support DMA */
> > > > >
> > > > >  	num = ep->epnum - 1;
> > > > > @@ -560,7 +560,7 @@ static int ep0_out_overbytes(struct nbu2ss_udc *udc, u8 *p_buf, u32 length)
> > > > >  	union usb_reg_access  temp_32;
> > > > >  	union usb_reg_access  *p_buf_32 = (union usb_reg_access *)p_buf;
> > > > >
> > > > > -	if ((length > 0) && (length < sizeof(u32))) {
> > > > > +	if (length > 0 && length < sizeof(u32)) {
> > > > >  		temp_32.dw = _nbu2ss_readl(&udc->p_regs->EP0_READ);
> > > > >  		for (i = 0 ; i < length ; i++)
> > > > >  			p_buf_32->byte.DATA[i] = temp_32.byte.DATA[i];
> > > > > @@ -608,7 +608,7 @@ static int ep0_in_overbytes(struct nbu2ss_udc *udc,
> > > > >  	union usb_reg_access  temp_32;
> > > > >  	union usb_reg_access  *p_buf_32 = (union usb_reg_access *)p_buf;
> > > > >
> > > > > -	if ((i_remain_size > 0) && (i_remain_size < sizeof(u32))) {
> > > > > +	if (i_remain_size > 0 && i_remain_size < sizeof(u32)) {
> > > > >  		for (i = 0 ; i < i_remain_size ; i++)
> > > > >  			temp_32.byte.DATA[i] = p_buf_32->byte.DATA[i];
> > > > >  		_nbu2ss_ep_in_end(udc, 0, temp_32.dw, i_remain_size);
> > > > > @@ -701,7 +701,7 @@ static int _nbu2ss_ep0_in_transfer(struct nbu2ss_udc *udc,
> > > > >  		return result;
> > > > >  	}
> > > > >
> > > > > -	if ((i_remain_size < sizeof(u32)) && (result != EP0_PACKETSIZE)) {
> > > > > +	if (i_remain_size < sizeof(u32) && result != EP0_PACKETSIZE) {
> > > > >  		p_buffer += result;
> > > > >  		result += ep0_in_overbytes(udc, p_buffer, i_remain_size);
> > > > >  		req->div_len = result;
> > > > > @@ -738,7 +738,7 @@ static int _nbu2ss_ep0_out_transfer(struct nbu2ss_udc *udc,
> > > > >  		req->req.actual += result;
> > > > >  		i_recv_length -= result;
> > > > >
> > > > > -		if ((i_recv_length > 0) && (i_recv_length < sizeof(u32))) {
> > > > > +		if (i_recv_length > 0 && i_recv_length < sizeof(u32)) {
> > > > >  			p_buffer += result;
> > > > >  			i_remain_size -= result;
> > > > >
> > > > > @@ -891,8 +891,8 @@ static int _nbu2ss_epn_out_pio(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > > > >
> > > > >  	req->req.actual += result;
> > > > >
> > > > > -	if ((req->req.actual == req->req.length) ||
> > > > > -	    ((req->req.actual % ep->ep.maxpacket) != 0)) {
> > > > > +	if (req->req.actual == req->req.length ||
> > > > > +	    (req->req.actual % ep->ep.maxpacket) != 0) {
> > > > >  		result = 0;
> > > > >  	}
> > > > >
> > > > > @@ -914,8 +914,8 @@ static int _nbu2ss_epn_out_data(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > > > >
> > > > >  	i_buf_size = min((req->req.length - req->req.actual), data_size);
> > > > >
> > > > > -	if ((ep->ep_type != USB_ENDPOINT_XFER_INT) && (req->req.dma != 0) &&
> > > > > -	    (i_buf_size  >= sizeof(u32))) {
> > > > > +	if (ep->ep_type != USB_ENDPOINT_XFER_INT && req->req.dma != 0 &&
> > > > > +	    i_buf_size  >= sizeof(u32)) {
> > > > >  		nret = _nbu2ss_out_dma(udc, req, num, i_buf_size);
> > > > >  	} else {
> > > > >  		i_buf_size = min_t(u32, i_buf_size, ep->ep.maxpacket);
> > > > > @@ -954,8 +954,8 @@ static int _nbu2ss_epn_out_transfer(struct nbu2ss_udc *udc,
> > > > >  			}
> > > > >  		}
> > > > >  	} else {
> > > > > -		if ((req->req.actual == req->req.length) ||
> > > > > -		    ((req->req.actual % ep->ep.maxpacket) != 0)) {
> > > > > +		if (req->req.actual == req->req.length ||
> > > > > +		    (req->req.actual % ep->ep.maxpacket) != 0) {
> > > > >  			result = 0;
> > > > >  		}
> > > > >  	}
> > > > > @@ -1106,8 +1106,8 @@ static int _nbu2ss_epn_in_data(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > > > >
> > > > >  	num = ep->epnum - 1;
> > > > >
> > > > > -	if ((ep->ep_type != USB_ENDPOINT_XFER_INT) && (req->req.dma != 0) &&
> > > > > -	    (data_size >= sizeof(u32))) {
> > > > > +	if (ep->ep_type != USB_ENDPOINT_XFER_INT && req->req.dma != 0 &&
> > > > > +	    data_size >= sizeof(u32)) {
> > > > >  		nret = _nbu2ss_in_dma(udc, ep, req, num, data_size);
> > > > >  	} else {
> > > > >  		data_size = min_t(u32, data_size, ep->ep.maxpacket);
> > > > > @@ -1238,7 +1238,7 @@ static void _nbu2ss_endpoint_toggle_reset(struct nbu2ss_udc *udc, u8 ep_adrs)
> > > > >  	u8		num;
> > > > >  	u32		data;
> > > > >
> > > > > -	if ((ep_adrs == 0) || (ep_adrs == 0x80))
> > > > > +	if (ep_adrs == 0 || ep_adrs == 0x80)
> > > > >  		return;
> > > > >
> > > > >  	num = (ep_adrs & 0x7F) - 1;
> > > > > @@ -1261,7 +1261,7 @@ static void _nbu2ss_set_endpoint_stall(struct nbu2ss_udc *udc,
> > > > >  	struct nbu2ss_ep *ep;
> > > > >  	struct fc_regs __iomem *preg = udc->p_regs;
> > > > >
> > > > > -	if ((ep_adrs == 0) || (ep_adrs == 0x80)) {
> > > > > +	if (ep_adrs == 0 || ep_adrs == 0x80) {
> > > > >  		if (bstall) {
> > > > >  			/* Set STALL */
> > > > >  			_nbu2ss_bitset(&preg->EP0_CONTROL, EP0_STL);
> > > > > @@ -1392,8 +1392,8 @@ static inline int _nbu2ss_req_feature(struct nbu2ss_udc *udc, bool bset)
> > > > >  	u8	ep_adrs;
> > > > >  	int	result = -EOPNOTSUPP;
> > > > >
> > > > > -	if ((udc->ctrl.wLength != 0x0000) ||
> > > > > -	    (direction != USB_DIR_OUT)) {
> > > > > +	if (udc->ctrl.wLength != 0x0000 ||
> > > > > +	    direction != USB_DIR_OUT) {
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > @@ -1480,7 +1480,7 @@ static int std_req_get_status(struct nbu2ss_udc *udc)
> > > > >  	u8	ep_adrs;
> > > > >  	int	result = -EINVAL;
> > > > >
> > > > > -	if ((udc->ctrl.wValue != 0x0000) || (direction != USB_DIR_IN))
> > > > > +	if (udc->ctrl.wValue != 0x0000 || direction != USB_DIR_IN)
> > > > >  		return result;
> > > > >
> > > > >  	length =
> > > > > @@ -1542,9 +1542,9 @@ static int std_req_set_address(struct nbu2ss_udc *udc)
> > > > >  	int		result = 0;
> > > > >  	u32		wValue = le16_to_cpu(udc->ctrl.wValue);
> > > > >
> > > > > -	if ((udc->ctrl.bRequestType != 0x00)	||
> > > > > -	    (udc->ctrl.wIndex != 0x0000)	||
> > > > > -		(udc->ctrl.wLength != 0x0000)) {
> > > > > +	if (udc->ctrl.bRequestType != 0x00	||
> > > > > +	    udc->ctrl.wIndex != 0x0000		||
> > > > > +		udc->ctrl.wLength != 0x0000) {
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > @@ -1564,9 +1564,9 @@ static int std_req_set_configuration(struct nbu2ss_udc *udc)
> > > > >  {
> > > > >  	u32 config_value = (u32)(le16_to_cpu(udc->ctrl.wValue) & 0x00ff);
> > > > >
> > > > > -	if ((udc->ctrl.wIndex != 0x0000)	||
> > > > > -	    (udc->ctrl.wLength != 0x0000)	||
> > > > > -		(udc->ctrl.bRequestType != 0x00)) {
> > > > > +	if (udc->ctrl.wIndex != 0x0000	||
> > > > > +	    udc->ctrl.wLength != 0x0000	||
> > > > > +		udc->ctrl.bRequestType != 0x00) {
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > @@ -1838,8 +1838,8 @@ static void _nbu2ss_ep_done(struct nbu2ss_ep *ep,
> > > > >  	}
> > > > >
> > > > >  #ifdef USE_DMA
> > > > > -	if ((ep->direct == USB_DIR_OUT) && (ep->epnum > 0) &&
> > > > > -	    (req->req.dma != 0))
> > > > > +	if (ep->direct == USB_DIR_OUT && ep->epnum > 0 &&
> > > > > +	    req->req.dma != 0)
> > > > >  		_nbu2ss_dma_unmap_single(udc, ep, req, USB_DIR_OUT);
> > > > >  #endif
> > > > >
> > > > > @@ -1931,7 +1931,7 @@ static inline void _nbu2ss_epn_in_dma_int(struct nbu2ss_udc *udc,
> > > > >  		mpkt = ep->ep.maxpacket;
> > > > >  		size = preq->actual % mpkt;
> > > > >  		if (size > 0) {
> > > > > -			if (((preq->actual & 0x03) == 0) && (size < mpkt))
> > > > > +			if ((preq->actual & 0x03) == 0 && size < mpkt)
> > > > >  				_nbu2ss_ep_in_end(udc, ep->epnum, 0, 0);
> > > > >  		} else {
> > > > >  			_nbu2ss_epn_in_int(udc, ep, req);
> > > > > @@ -2428,8 +2428,8 @@ static int nbu2ss_ep_enable(struct usb_ep *_ep,
> > > > >  	}
> > > > >
> > > > >  	ep_type = usb_endpoint_type(desc);
> > > > > -	if ((ep_type == USB_ENDPOINT_XFER_CONTROL) ||
> > > > > -	    (ep_type == USB_ENDPOINT_XFER_ISOC)) {
> > > > > +	if (ep_type == USB_ENDPOINT_XFER_CONTROL ||
> > > > > +	    ep_type == USB_ENDPOINT_XFER_ISOC) {
> > > > >  		pr_err(" *** %s, bat bmAttributes\n", __func__);
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > > @@ -2438,7 +2438,7 @@ static int nbu2ss_ep_enable(struct usb_ep *_ep,
> > > > >  	if (udc->vbus_active == 0)
> > > > >  		return -ESHUTDOWN;
> > > > >
> > > > > -	if ((!udc->driver) || (udc->gadget.speed == USB_SPEED_UNKNOWN)) {
> > > > > +	if (!udc->driver || udc->gadget.speed == USB_SPEED_UNKNOWN) {
> > > > >  		dev_err(ep->udc->dev, " *** %s, udc !!\n", __func__);
> > > > >  		return -ESHUTDOWN;
> > > > >  	}
> > > > > @@ -2603,8 +2603,8 @@ static int nbu2ss_ep_queue(struct usb_ep *_ep,
> > > > >  		}
> > > > >  	}
> > > > >
> > > > > -	if ((ep->epnum > 0) && (ep->direct == USB_DIR_OUT) &&
> > > > > -	    (req->req.dma != 0))
> > > > > +	if (ep->epnum > 0 && ep->direct == USB_DIR_OUT &&
> > > > > +	    req->req.dma != 0)
> > > > >  		_nbu2ss_dma_map_single(udc, ep, req, USB_DIR_OUT);
> > > > >  #endif
> > > > >
> > > > > --
> > > > > 2.34.1
> > > > >
> > > > >
> > > > >
> > >
>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2023-10-16 12:34 ` your mail Julia Lawall
@ 2023-10-16 12:42   ` Gilbert Adikankwu
  2023-10-16 13:23     ` Julia Lawall
  0 siblings, 1 reply; 657+ messages in thread
From: Gilbert Adikankwu @ 2023-10-16 12:42 UTC (permalink / raw)
  To: Julia Lawall, outreachy; +Cc: linux-staging, linux-kernel, gregkh

On Mon, Oct 16, 2023 at 02:34:48PM +0200, Julia Lawall wrote:
> 
> 
> On Mon, 16 Oct 2023, Gilbert Adikankwu wrote:
> 
> > linux-staging@lists.linux.dev, linux-kernel@vger.kernel.org
> > Bcc:
> > Subject: Re: [PATCH] staging: emxx_udc: Remove unnecessary parentheses around
> >  condition tests
> > Reply-To:
> > In-Reply-To: <6b60ed7-9d97-2071-44f8-83b173191ed@inria.fr>
> >
> > On Mon, Oct 16, 2023 at 02:15:06PM +0200, Julia Lawall wrote:
> > >
> > >
> > > On Mon, 16 Oct 2023, Gilbert Adikankwu wrote:
> > >
> > > > Fix 47 warnings detected by checkpatch.pl about unnecessary parentheses
> > > > around condition tests.
> > >
> > > If you need to make any changes to the patch, there is no need to give the
> > > count of the changes.  It doesn't matter if it's 47, 46, 35, etc.
> > >
> > > julia
> > >
> > Hi Julia,
> >
> I added the number because I saw a similar commit in the logs that did
> so. (commit b83970f23f36f0e2968872140e69f68118d82fe3)
> 
> OK, I still think it's pointless...  The person who looks at the commit 5
> years from now won't care about this information.  They care about what
> you did and why.
> 
> julia
> 
Ok, that makes sense. I will revise it. Do I send the revised patch now or
later today?
> 
> > > >
> > > > Signed-off-by: Gilbert Adikankwu <gilbertadikankwu@gmail.com>
> > > > ---
> > > >  drivers/staging/emxx_udc/emxx_udc.c | 72 ++++++++++++++---------------
> > > >  1 file changed, 36 insertions(+), 36 deletions(-)
> > > >
> > > > diff --git a/drivers/staging/emxx_udc/emxx_udc.c b/drivers/staging/emxx_udc/emxx_udc.c
> > > > index eb63daaca702..e8ddd691b788 100644
> > > > --- a/drivers/staging/emxx_udc/emxx_udc.c
> > > > +++ b/drivers/staging/emxx_udc/emxx_udc.c
> > > > @@ -149,8 +149,8 @@ static void _nbu2ss_ep0_complete(struct usb_ep *_ep, struct usb_request *_req)
> > > >  			/* SET_FEATURE */
> > > >  			recipient = (u8)(p_ctrl->bRequestType & USB_RECIP_MASK);
> > > >  			selector  = le16_to_cpu(p_ctrl->wValue);
> > > > -			if ((recipient == USB_RECIP_DEVICE) &&
> > > > -			    (selector == USB_DEVICE_TEST_MODE)) {
> > > > +			if (recipient == USB_RECIP_DEVICE &&
> > > > +			    selector == USB_DEVICE_TEST_MODE) {
> > > >  				wIndex = le16_to_cpu(p_ctrl->wIndex);
> > > >  				test_mode = (u32)(wIndex >> 8);
> > > >  				_nbu2ss_set_test_mode(udc, test_mode);
> > > > @@ -287,7 +287,7 @@ static int _nbu2ss_epn_exit(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > > >  	u32		num;
> > > >  	u32		data;
> > > >
> > > > -	if ((ep->epnum == 0) || (udc->vbus_active == 0))
> > > > +	if (ep->epnum == 0 || udc->vbus_active == 0)
> > > >  		return	-EINVAL;
> > > >
> > > >  	num = ep->epnum - 1;
> > > > @@ -336,7 +336,7 @@ static void _nbu2ss_ep_dma_init(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > > >  	u32		data;
> > > >
> > > >  	data = _nbu2ss_readl(&udc->p_regs->USBSSCONF);
> > > > -	if (((ep->epnum == 0) || (data & (1 << ep->epnum)) == 0))
> > > > +	if (ep->epnum == 0 || (data & (1 << ep->epnum)) == 0)
> > > >  		return;		/* Not Support DMA */
> > > >
> > > >  	num = ep->epnum - 1;
> > > > @@ -380,7 +380,7 @@ static void _nbu2ss_ep_dma_exit(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > > >  		return;		/* VBUS OFF */
> > > >
> > > >  	data = _nbu2ss_readl(&preg->USBSSCONF);
> > > > -	if ((ep->epnum == 0) || ((data & (1 << ep->epnum)) == 0))
> > > > +	if (ep->epnum == 0 || (data & (1 << ep->epnum)) == 0)
> > > >  		return;		/* Not Support DMA */
> > > >
> > > >  	num = ep->epnum - 1;
> > > > @@ -560,7 +560,7 @@ static int ep0_out_overbytes(struct nbu2ss_udc *udc, u8 *p_buf, u32 length)
> > > >  	union usb_reg_access  temp_32;
> > > >  	union usb_reg_access  *p_buf_32 = (union usb_reg_access *)p_buf;
> > > >
> > > > -	if ((length > 0) && (length < sizeof(u32))) {
> > > > +	if (length > 0 && length < sizeof(u32)) {
> > > >  		temp_32.dw = _nbu2ss_readl(&udc->p_regs->EP0_READ);
> > > >  		for (i = 0 ; i < length ; i++)
> > > >  			p_buf_32->byte.DATA[i] = temp_32.byte.DATA[i];
> > > > @@ -608,7 +608,7 @@ static int ep0_in_overbytes(struct nbu2ss_udc *udc,
> > > >  	union usb_reg_access  temp_32;
> > > >  	union usb_reg_access  *p_buf_32 = (union usb_reg_access *)p_buf;
> > > >
> > > > -	if ((i_remain_size > 0) && (i_remain_size < sizeof(u32))) {
> > > > +	if (i_remain_size > 0 && i_remain_size < sizeof(u32)) {
> > > >  		for (i = 0 ; i < i_remain_size ; i++)
> > > >  			temp_32.byte.DATA[i] = p_buf_32->byte.DATA[i];
> > > >  		_nbu2ss_ep_in_end(udc, 0, temp_32.dw, i_remain_size);
> > > > @@ -701,7 +701,7 @@ static int _nbu2ss_ep0_in_transfer(struct nbu2ss_udc *udc,
> > > >  		return result;
> > > >  	}
> > > >
> > > > -	if ((i_remain_size < sizeof(u32)) && (result != EP0_PACKETSIZE)) {
> > > > +	if (i_remain_size < sizeof(u32) && result != EP0_PACKETSIZE) {
> > > >  		p_buffer += result;
> > > >  		result += ep0_in_overbytes(udc, p_buffer, i_remain_size);
> > > >  		req->div_len = result;
> > > > @@ -738,7 +738,7 @@ static int _nbu2ss_ep0_out_transfer(struct nbu2ss_udc *udc,
> > > >  		req->req.actual += result;
> > > >  		i_recv_length -= result;
> > > >
> > > > -		if ((i_recv_length > 0) && (i_recv_length < sizeof(u32))) {
> > > > +		if (i_recv_length > 0 && i_recv_length < sizeof(u32)) {
> > > >  			p_buffer += result;
> > > >  			i_remain_size -= result;
> > > >
> > > > @@ -891,8 +891,8 @@ static int _nbu2ss_epn_out_pio(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > > >
> > > >  	req->req.actual += result;
> > > >
> > > > -	if ((req->req.actual == req->req.length) ||
> > > > -	    ((req->req.actual % ep->ep.maxpacket) != 0)) {
> > > > +	if (req->req.actual == req->req.length ||
> > > > +	    (req->req.actual % ep->ep.maxpacket) != 0) {
> > > >  		result = 0;
> > > >  	}
> > > >
> > > > @@ -914,8 +914,8 @@ static int _nbu2ss_epn_out_data(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > > >
> > > >  	i_buf_size = min((req->req.length - req->req.actual), data_size);
> > > >
> > > > -	if ((ep->ep_type != USB_ENDPOINT_XFER_INT) && (req->req.dma != 0) &&
> > > > -	    (i_buf_size  >= sizeof(u32))) {
> > > > +	if (ep->ep_type != USB_ENDPOINT_XFER_INT && req->req.dma != 0 &&
> > > > +	    i_buf_size  >= sizeof(u32)) {
> > > >  		nret = _nbu2ss_out_dma(udc, req, num, i_buf_size);
> > > >  	} else {
> > > >  		i_buf_size = min_t(u32, i_buf_size, ep->ep.maxpacket);
> > > > @@ -954,8 +954,8 @@ static int _nbu2ss_epn_out_transfer(struct nbu2ss_udc *udc,
> > > >  			}
> > > >  		}
> > > >  	} else {
> > > > -		if ((req->req.actual == req->req.length) ||
> > > > -		    ((req->req.actual % ep->ep.maxpacket) != 0)) {
> > > > +		if (req->req.actual == req->req.length ||
> > > > +		    (req->req.actual % ep->ep.maxpacket) != 0) {
> > > >  			result = 0;
> > > >  		}
> > > >  	}
> > > > @@ -1106,8 +1106,8 @@ static int _nbu2ss_epn_in_data(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > > >
> > > >  	num = ep->epnum - 1;
> > > >
> > > > -	if ((ep->ep_type != USB_ENDPOINT_XFER_INT) && (req->req.dma != 0) &&
> > > > -	    (data_size >= sizeof(u32))) {
> > > > +	if (ep->ep_type != USB_ENDPOINT_XFER_INT && req->req.dma != 0 &&
> > > > +	    data_size >= sizeof(u32)) {
> > > >  		nret = _nbu2ss_in_dma(udc, ep, req, num, data_size);
> > > >  	} else {
> > > >  		data_size = min_t(u32, data_size, ep->ep.maxpacket);
> > > > @@ -1238,7 +1238,7 @@ static void _nbu2ss_endpoint_toggle_reset(struct nbu2ss_udc *udc, u8 ep_adrs)
> > > >  	u8		num;
> > > >  	u32		data;
> > > >
> > > > -	if ((ep_adrs == 0) || (ep_adrs == 0x80))
> > > > +	if (ep_adrs == 0 || ep_adrs == 0x80)
> > > >  		return;
> > > >
> > > >  	num = (ep_adrs & 0x7F) - 1;
> > > > @@ -1261,7 +1261,7 @@ static void _nbu2ss_set_endpoint_stall(struct nbu2ss_udc *udc,
> > > >  	struct nbu2ss_ep *ep;
> > > >  	struct fc_regs __iomem *preg = udc->p_regs;
> > > >
> > > > -	if ((ep_adrs == 0) || (ep_adrs == 0x80)) {
> > > > +	if (ep_adrs == 0 || ep_adrs == 0x80) {
> > > >  		if (bstall) {
> > > >  			/* Set STALL */
> > > >  			_nbu2ss_bitset(&preg->EP0_CONTROL, EP0_STL);
> > > > @@ -1392,8 +1392,8 @@ static inline int _nbu2ss_req_feature(struct nbu2ss_udc *udc, bool bset)
> > > >  	u8	ep_adrs;
> > > >  	int	result = -EOPNOTSUPP;
> > > >
> > > > -	if ((udc->ctrl.wLength != 0x0000) ||
> > > > -	    (direction != USB_DIR_OUT)) {
> > > > +	if (udc->ctrl.wLength != 0x0000 ||
> > > > +	    direction != USB_DIR_OUT) {
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > @@ -1480,7 +1480,7 @@ static int std_req_get_status(struct nbu2ss_udc *udc)
> > > >  	u8	ep_adrs;
> > > >  	int	result = -EINVAL;
> > > >
> > > > -	if ((udc->ctrl.wValue != 0x0000) || (direction != USB_DIR_IN))
> > > > +	if (udc->ctrl.wValue != 0x0000 || direction != USB_DIR_IN)
> > > >  		return result;
> > > >
> > > >  	length =
> > > > @@ -1542,9 +1542,9 @@ static int std_req_set_address(struct nbu2ss_udc *udc)
> > > >  	int		result = 0;
> > > >  	u32		wValue = le16_to_cpu(udc->ctrl.wValue);
> > > >
> > > > -	if ((udc->ctrl.bRequestType != 0x00)	||
> > > > -	    (udc->ctrl.wIndex != 0x0000)	||
> > > > -		(udc->ctrl.wLength != 0x0000)) {
> > > > +	if (udc->ctrl.bRequestType != 0x00	||
> > > > +	    udc->ctrl.wIndex != 0x0000		||
> > > > +		udc->ctrl.wLength != 0x0000) {
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > @@ -1564,9 +1564,9 @@ static int std_req_set_configuration(struct nbu2ss_udc *udc)
> > > >  {
> > > >  	u32 config_value = (u32)(le16_to_cpu(udc->ctrl.wValue) & 0x00ff);
> > > >
> > > > -	if ((udc->ctrl.wIndex != 0x0000)	||
> > > > -	    (udc->ctrl.wLength != 0x0000)	||
> > > > -		(udc->ctrl.bRequestType != 0x00)) {
> > > > +	if (udc->ctrl.wIndex != 0x0000	||
> > > > +	    udc->ctrl.wLength != 0x0000	||
> > > > +		udc->ctrl.bRequestType != 0x00) {
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > @@ -1838,8 +1838,8 @@ static void _nbu2ss_ep_done(struct nbu2ss_ep *ep,
> > > >  	}
> > > >
> > > >  #ifdef USE_DMA
> > > > -	if ((ep->direct == USB_DIR_OUT) && (ep->epnum > 0) &&
> > > > -	    (req->req.dma != 0))
> > > > +	if (ep->direct == USB_DIR_OUT && ep->epnum > 0 &&
> > > > +	    req->req.dma != 0)
> > > >  		_nbu2ss_dma_unmap_single(udc, ep, req, USB_DIR_OUT);
> > > >  #endif
> > > >
> > > > @@ -1931,7 +1931,7 @@ static inline void _nbu2ss_epn_in_dma_int(struct nbu2ss_udc *udc,
> > > >  		mpkt = ep->ep.maxpacket;
> > > >  		size = preq->actual % mpkt;
> > > >  		if (size > 0) {
> > > > -			if (((preq->actual & 0x03) == 0) && (size < mpkt))
> > > > +			if ((preq->actual & 0x03) == 0 && size < mpkt)
> > > >  				_nbu2ss_ep_in_end(udc, ep->epnum, 0, 0);
> > > >  		} else {
> > > >  			_nbu2ss_epn_in_int(udc, ep, req);
> > > > @@ -2428,8 +2428,8 @@ static int nbu2ss_ep_enable(struct usb_ep *_ep,
> > > >  	}
> > > >
> > > >  	ep_type = usb_endpoint_type(desc);
> > > > -	if ((ep_type == USB_ENDPOINT_XFER_CONTROL) ||
> > > > -	    (ep_type == USB_ENDPOINT_XFER_ISOC)) {
> > > > +	if (ep_type == USB_ENDPOINT_XFER_CONTROL ||
> > > > +	    ep_type == USB_ENDPOINT_XFER_ISOC) {
> > > >  		pr_err(" *** %s, bat bmAttributes\n", __func__);
> > > >  		return -EINVAL;
> > > >  	}
> > > > @@ -2438,7 +2438,7 @@ static int nbu2ss_ep_enable(struct usb_ep *_ep,
> > > >  	if (udc->vbus_active == 0)
> > > >  		return -ESHUTDOWN;
> > > >
> > > > -	if ((!udc->driver) || (udc->gadget.speed == USB_SPEED_UNKNOWN)) {
> > > > +	if (!udc->driver || udc->gadget.speed == USB_SPEED_UNKNOWN) {
> > > >  		dev_err(ep->udc->dev, " *** %s, udc !!\n", __func__);
> > > >  		return -ESHUTDOWN;
> > > >  	}
> > > > @@ -2603,8 +2603,8 @@ static int nbu2ss_ep_queue(struct usb_ep *_ep,
> > > >  		}
> > > >  	}
> > > >
> > > > -	if ((ep->epnum > 0) && (ep->direct == USB_DIR_OUT) &&
> > > > -	    (req->req.dma != 0))
> > > > +	if (ep->epnum > 0 && ep->direct == USB_DIR_OUT &&
> > > > +	    req->req.dma != 0)
> > > >  		_nbu2ss_dma_map_single(udc, ep, req, USB_DIR_OUT);
> > > >  #endif
> > > >
> > > > --
> > > > 2.34.1
> > > >
> > > >
> > > >
> >

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2023-10-16 12:31 Gilbert Adikankwu
@ 2023-10-16 12:34 ` Julia Lawall
  2023-10-16 12:42   ` Gilbert Adikankwu
  0 siblings, 1 reply; 657+ messages in thread
From: Julia Lawall @ 2023-10-16 12:34 UTC (permalink / raw)
  To: Gilbert Adikankwu; +Cc: outreachy, gregkh, linux-staging, linux-kernel



On Mon, 16 Oct 2023, Gilbert Adikankwu wrote:

> linux-staging@lists.linux.dev, linux-kernel@vger.kernel.org
> Bcc:
> Subject: Re: [PATCH] staging: emxx_udc: Remove unnecessary parentheses around
>  condition tests
> Reply-To:
> In-Reply-To: <6b60ed7-9d97-2071-44f8-83b173191ed@inria.fr>
>
> On Mon, Oct 16, 2023 at 02:15:06PM +0200, Julia Lawall wrote:
> >
> >
> > On Mon, 16 Oct 2023, Gilbert Adikankwu wrote:
> >
> > > Fix 47 warnings detected by checkpatch.pl about unnecessary parentheses
> > > around condition tests.
> >
> > If you need to make any changes to the patch, there is no need to give the
> > count of the changes.  It doesn't matter if it's 47, 46, 35, etc.
> >
> > julia
> >
> Hi Julia,
>
> I added the number because I saw a similar commit in the logs that did
> so. (commit b83970f23f36f0e2968872140e69f68118d82fe3)

OK, I still think it's pointless...  The person who looks at the commit 5
years from now won't care about this information.  They care about what
you did and why.

julia


> > >
> > > Signed-off-by: Gilbert Adikankwu <gilbertadikankwu@gmail.com>
> > > ---
> > >  drivers/staging/emxx_udc/emxx_udc.c | 72 ++++++++++++++---------------
> > >  1 file changed, 36 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/staging/emxx_udc/emxx_udc.c b/drivers/staging/emxx_udc/emxx_udc.c
> > > index eb63daaca702..e8ddd691b788 100644
> > > --- a/drivers/staging/emxx_udc/emxx_udc.c
> > > +++ b/drivers/staging/emxx_udc/emxx_udc.c
> > > @@ -149,8 +149,8 @@ static void _nbu2ss_ep0_complete(struct usb_ep *_ep, struct usb_request *_req)
> > >  			/* SET_FEATURE */
> > >  			recipient = (u8)(p_ctrl->bRequestType & USB_RECIP_MASK);
> > >  			selector  = le16_to_cpu(p_ctrl->wValue);
> > > -			if ((recipient == USB_RECIP_DEVICE) &&
> > > -			    (selector == USB_DEVICE_TEST_MODE)) {
> > > +			if (recipient == USB_RECIP_DEVICE &&
> > > +			    selector == USB_DEVICE_TEST_MODE) {
> > >  				wIndex = le16_to_cpu(p_ctrl->wIndex);
> > >  				test_mode = (u32)(wIndex >> 8);
> > >  				_nbu2ss_set_test_mode(udc, test_mode);
> > > @@ -287,7 +287,7 @@ static int _nbu2ss_epn_exit(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > >  	u32		num;
> > >  	u32		data;
> > >
> > > -	if ((ep->epnum == 0) || (udc->vbus_active == 0))
> > > +	if (ep->epnum == 0 || udc->vbus_active == 0)
> > >  		return	-EINVAL;
> > >
> > >  	num = ep->epnum - 1;
> > > @@ -336,7 +336,7 @@ static void _nbu2ss_ep_dma_init(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > >  	u32		data;
> > >
> > >  	data = _nbu2ss_readl(&udc->p_regs->USBSSCONF);
> > > -	if (((ep->epnum == 0) || (data & (1 << ep->epnum)) == 0))
> > > +	if (ep->epnum == 0 || (data & (1 << ep->epnum)) == 0)
> > >  		return;		/* Not Support DMA */
> > >
> > >  	num = ep->epnum - 1;
> > > @@ -380,7 +380,7 @@ static void _nbu2ss_ep_dma_exit(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep)
> > >  		return;		/* VBUS OFF */
> > >
> > >  	data = _nbu2ss_readl(&preg->USBSSCONF);
> > > -	if ((ep->epnum == 0) || ((data & (1 << ep->epnum)) == 0))
> > > +	if (ep->epnum == 0 || (data & (1 << ep->epnum)) == 0)
> > >  		return;		/* Not Support DMA */
> > >
> > >  	num = ep->epnum - 1;
> > > @@ -560,7 +560,7 @@ static int ep0_out_overbytes(struct nbu2ss_udc *udc, u8 *p_buf, u32 length)
> > >  	union usb_reg_access  temp_32;
> > >  	union usb_reg_access  *p_buf_32 = (union usb_reg_access *)p_buf;
> > >
> > > -	if ((length > 0) && (length < sizeof(u32))) {
> > > +	if (length > 0 && length < sizeof(u32)) {
> > >  		temp_32.dw = _nbu2ss_readl(&udc->p_regs->EP0_READ);
> > >  		for (i = 0 ; i < length ; i++)
> > >  			p_buf_32->byte.DATA[i] = temp_32.byte.DATA[i];
> > > @@ -608,7 +608,7 @@ static int ep0_in_overbytes(struct nbu2ss_udc *udc,
> > >  	union usb_reg_access  temp_32;
> > >  	union usb_reg_access  *p_buf_32 = (union usb_reg_access *)p_buf;
> > >
> > > -	if ((i_remain_size > 0) && (i_remain_size < sizeof(u32))) {
> > > +	if (i_remain_size > 0 && i_remain_size < sizeof(u32)) {
> > >  		for (i = 0 ; i < i_remain_size ; i++)
> > >  			temp_32.byte.DATA[i] = p_buf_32->byte.DATA[i];
> > >  		_nbu2ss_ep_in_end(udc, 0, temp_32.dw, i_remain_size);
> > > @@ -701,7 +701,7 @@ static int _nbu2ss_ep0_in_transfer(struct nbu2ss_udc *udc,
> > >  		return result;
> > >  	}
> > >
> > > -	if ((i_remain_size < sizeof(u32)) && (result != EP0_PACKETSIZE)) {
> > > +	if (i_remain_size < sizeof(u32) && result != EP0_PACKETSIZE) {
> > >  		p_buffer += result;
> > >  		result += ep0_in_overbytes(udc, p_buffer, i_remain_size);
> > >  		req->div_len = result;
> > > @@ -738,7 +738,7 @@ static int _nbu2ss_ep0_out_transfer(struct nbu2ss_udc *udc,
> > >  		req->req.actual += result;
> > >  		i_recv_length -= result;
> > >
> > > -		if ((i_recv_length > 0) && (i_recv_length < sizeof(u32))) {
> > > +		if (i_recv_length > 0 && i_recv_length < sizeof(u32)) {
> > >  			p_buffer += result;
> > >  			i_remain_size -= result;
> > >
> > > @@ -891,8 +891,8 @@ static int _nbu2ss_epn_out_pio(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > >
> > >  	req->req.actual += result;
> > >
> > > -	if ((req->req.actual == req->req.length) ||
> > > -	    ((req->req.actual % ep->ep.maxpacket) != 0)) {
> > > +	if (req->req.actual == req->req.length ||
> > > +	    (req->req.actual % ep->ep.maxpacket) != 0) {
> > >  		result = 0;
> > >  	}
> > >
> > > @@ -914,8 +914,8 @@ static int _nbu2ss_epn_out_data(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > >
> > >  	i_buf_size = min((req->req.length - req->req.actual), data_size);
> > >
> > > -	if ((ep->ep_type != USB_ENDPOINT_XFER_INT) && (req->req.dma != 0) &&
> > > -	    (i_buf_size  >= sizeof(u32))) {
> > > +	if (ep->ep_type != USB_ENDPOINT_XFER_INT && req->req.dma != 0 &&
> > > +	    i_buf_size  >= sizeof(u32)) {
> > >  		nret = _nbu2ss_out_dma(udc, req, num, i_buf_size);
> > >  	} else {
> > >  		i_buf_size = min_t(u32, i_buf_size, ep->ep.maxpacket);
> > > @@ -954,8 +954,8 @@ static int _nbu2ss_epn_out_transfer(struct nbu2ss_udc *udc,
> > >  			}
> > >  		}
> > >  	} else {
> > > -		if ((req->req.actual == req->req.length) ||
> > > -		    ((req->req.actual % ep->ep.maxpacket) != 0)) {
> > > +		if (req->req.actual == req->req.length ||
> > > +		    (req->req.actual % ep->ep.maxpacket) != 0) {
> > >  			result = 0;
> > >  		}
> > >  	}
> > > @@ -1106,8 +1106,8 @@ static int _nbu2ss_epn_in_data(struct nbu2ss_udc *udc, struct nbu2ss_ep *ep,
> > >
> > >  	num = ep->epnum - 1;
> > >
> > > -	if ((ep->ep_type != USB_ENDPOINT_XFER_INT) && (req->req.dma != 0) &&
> > > -	    (data_size >= sizeof(u32))) {
> > > +	if (ep->ep_type != USB_ENDPOINT_XFER_INT && req->req.dma != 0 &&
> > > +	    data_size >= sizeof(u32)) {
> > >  		nret = _nbu2ss_in_dma(udc, ep, req, num, data_size);
> > >  	} else {
> > >  		data_size = min_t(u32, data_size, ep->ep.maxpacket);
> > > @@ -1238,7 +1238,7 @@ static void _nbu2ss_endpoint_toggle_reset(struct nbu2ss_udc *udc, u8 ep_adrs)
> > >  	u8		num;
> > >  	u32		data;
> > >
> > > -	if ((ep_adrs == 0) || (ep_adrs == 0x80))
> > > +	if (ep_adrs == 0 || ep_adrs == 0x80)
> > >  		return;
> > >
> > >  	num = (ep_adrs & 0x7F) - 1;
> > > @@ -1261,7 +1261,7 @@ static void _nbu2ss_set_endpoint_stall(struct nbu2ss_udc *udc,
> > >  	struct nbu2ss_ep *ep;
> > >  	struct fc_regs __iomem *preg = udc->p_regs;
> > >
> > > -	if ((ep_adrs == 0) || (ep_adrs == 0x80)) {
> > > +	if (ep_adrs == 0 || ep_adrs == 0x80) {
> > >  		if (bstall) {
> > >  			/* Set STALL */
> > >  			_nbu2ss_bitset(&preg->EP0_CONTROL, EP0_STL);
> > > @@ -1392,8 +1392,8 @@ static inline int _nbu2ss_req_feature(struct nbu2ss_udc *udc, bool bset)
> > >  	u8	ep_adrs;
> > >  	int	result = -EOPNOTSUPP;
> > >
> > > -	if ((udc->ctrl.wLength != 0x0000) ||
> > > -	    (direction != USB_DIR_OUT)) {
> > > +	if (udc->ctrl.wLength != 0x0000 ||
> > > +	    direction != USB_DIR_OUT) {
> > >  		return -EINVAL;
> > >  	}
> > >
> > > @@ -1480,7 +1480,7 @@ static int std_req_get_status(struct nbu2ss_udc *udc)
> > >  	u8	ep_adrs;
> > >  	int	result = -EINVAL;
> > >
> > > -	if ((udc->ctrl.wValue != 0x0000) || (direction != USB_DIR_IN))
> > > +	if (udc->ctrl.wValue != 0x0000 || direction != USB_DIR_IN)
> > >  		return result;
> > >
> > >  	length =
> > > @@ -1542,9 +1542,9 @@ static int std_req_set_address(struct nbu2ss_udc *udc)
> > >  	int		result = 0;
> > >  	u32		wValue = le16_to_cpu(udc->ctrl.wValue);
> > >
> > > -	if ((udc->ctrl.bRequestType != 0x00)	||
> > > -	    (udc->ctrl.wIndex != 0x0000)	||
> > > -		(udc->ctrl.wLength != 0x0000)) {
> > > +	if (udc->ctrl.bRequestType != 0x00	||
> > > +	    udc->ctrl.wIndex != 0x0000		||
> > > +		udc->ctrl.wLength != 0x0000) {
> > >  		return -EINVAL;
> > >  	}
> > >
> > > @@ -1564,9 +1564,9 @@ static int std_req_set_configuration(struct nbu2ss_udc *udc)
> > >  {
> > >  	u32 config_value = (u32)(le16_to_cpu(udc->ctrl.wValue) & 0x00ff);
> > >
> > > -	if ((udc->ctrl.wIndex != 0x0000)	||
> > > -	    (udc->ctrl.wLength != 0x0000)	||
> > > -		(udc->ctrl.bRequestType != 0x00)) {
> > > +	if (udc->ctrl.wIndex != 0x0000	||
> > > +	    udc->ctrl.wLength != 0x0000	||
> > > +		udc->ctrl.bRequestType != 0x00) {
> > >  		return -EINVAL;
> > >  	}
> > >
> > > @@ -1838,8 +1838,8 @@ static void _nbu2ss_ep_done(struct nbu2ss_ep *ep,
> > >  	}
> > >
> > >  #ifdef USE_DMA
> > > -	if ((ep->direct == USB_DIR_OUT) && (ep->epnum > 0) &&
> > > -	    (req->req.dma != 0))
> > > +	if (ep->direct == USB_DIR_OUT && ep->epnum > 0 &&
> > > +	    req->req.dma != 0)
> > >  		_nbu2ss_dma_unmap_single(udc, ep, req, USB_DIR_OUT);
> > >  #endif
> > >
> > > @@ -1931,7 +1931,7 @@ static inline void _nbu2ss_epn_in_dma_int(struct nbu2ss_udc *udc,
> > >  		mpkt = ep->ep.maxpacket;
> > >  		size = preq->actual % mpkt;
> > >  		if (size > 0) {
> > > -			if (((preq->actual & 0x03) == 0) && (size < mpkt))
> > > +			if ((preq->actual & 0x03) == 0 && size < mpkt)
> > >  				_nbu2ss_ep_in_end(udc, ep->epnum, 0, 0);
> > >  		} else {
> > >  			_nbu2ss_epn_in_int(udc, ep, req);
> > > @@ -2428,8 +2428,8 @@ static int nbu2ss_ep_enable(struct usb_ep *_ep,
> > >  	}
> > >
> > >  	ep_type = usb_endpoint_type(desc);
> > > -	if ((ep_type == USB_ENDPOINT_XFER_CONTROL) ||
> > > -	    (ep_type == USB_ENDPOINT_XFER_ISOC)) {
> > > +	if (ep_type == USB_ENDPOINT_XFER_CONTROL ||
> > > +	    ep_type == USB_ENDPOINT_XFER_ISOC) {
> > >  		pr_err(" *** %s, bat bmAttributes\n", __func__);
> > >  		return -EINVAL;
> > >  	}
> > > @@ -2438,7 +2438,7 @@ static int nbu2ss_ep_enable(struct usb_ep *_ep,
> > >  	if (udc->vbus_active == 0)
> > >  		return -ESHUTDOWN;
> > >
> > > -	if ((!udc->driver) || (udc->gadget.speed == USB_SPEED_UNKNOWN)) {
> > > +	if (!udc->driver || udc->gadget.speed == USB_SPEED_UNKNOWN) {
> > >  		dev_err(ep->udc->dev, " *** %s, udc !!\n", __func__);
> > >  		return -ESHUTDOWN;
> > >  	}
> > > @@ -2603,8 +2603,8 @@ static int nbu2ss_ep_queue(struct usb_ep *_ep,
> > >  		}
> > >  	}
> > >
> > > -	if ((ep->epnum > 0) && (ep->direct == USB_DIR_OUT) &&
> > > -	    (req->req.dma != 0))
> > > +	if (ep->epnum > 0 && ep->direct == USB_DIR_OUT &&
> > > +	    req->req.dma != 0)
> > >  		_nbu2ss_dma_map_single(udc, ep, req, USB_DIR_OUT);
> > >  #endif
> > >
> > > --
> > > 2.34.1
> > >
> > >
> > >
>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2023-05-16 22:47   ` Thomas Gleixner
@ 2023-05-23 13:46     ` Liam R. Howlett
  0 siblings, 0 replies; 657+ messages in thread
From: Liam R. Howlett @ 2023-05-23 13:46 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Matthew Wilcox, linux-mm, Shanker Donthineni

* Thomas Gleixner <tglx@linutronix.de> [230516 18:48]:
> On Mon, May 15 2023 at 15:27, Liam R. Howlett wrote:
> > * Thomas Gleixner <tglx@linutronix.de> [230510 15:01]:
> >> The documentation of mt_next() claims that it starts the search at the
> >> provided index. That's incorrect as it starts the search after the provided
> >> index.
> >> 
> >> The documentation of mt_find() is slightly confusing. "Handles locking" is
> >> not really helpful as it does not explain how the "locking" works.
> >
> > More locking notes can be found in Documentation/core-api/maple_tree.rst
> > which lists mt_find() under the "Takes RCU read lock" list.  I'm okay
> > with duplicating the comment of taking the RCU read lock in here.
> 
> Without a reference to the actual locking documentation such comments
> are not super helpful.

Noted.  A reference to the larger document should probably be added.
Thanks.

> 
> >> Fix similar issues for mt_find_after() and mt_prev().
> >> 
> >> Remove the completely confusing and pointless "Note: Will not return the
> >> zero entry." comment from mt_for_each() and document @__index correctly.
> >
> > The zero entry concept is an advanced API concept which allows you to
> > store something that cannot be seen by the mt_* family of users, so it
> > will not be returned and, instead, it will return NULL.  Think of it as
> > a reservation for an entry that isn't fully initialized.  Perhaps it
> > should read "Will not return the XA_ZERO_ENTRY" ?
> >>
> >> - *
> >> - * Note: Will not return the zero entry.
> >
> > This function "will not return the zero entry", meaning it will return
> > NULL if xa_is_zero(entry).
> 
> If I understand correctly, this translates to:
> 
>   This iterator skips entries, which have been reserved for future use
>   but have not yet been fully initialized.
> 
> Right?

Well, that's one use of the XA_ZERO_ENTRY, but it's really up to
the user to decide why they are adding something that returns NULL in a
specific range for the not-advanced API.  It might be worth adding the
XA_ZERO_ENTRY in here, since that's the only special value right now?

> 
> >> @@ -6487,9 +6493,14 @@ EXPORT_SYMBOL(mtree_destroy);
> >>   * mt_find() - Search from the start up until an entry is found.
> >>   * @mt: The maple tree
> >>   * @index: Pointer which contains the start location of the search
> >> - * @max: The maximum value to check
> >> + * @max: The maximum value of the search range
> >> + *
> >> + * Takes RCU read lock internally to protect the search, which does not
> >> + * protect the returned pointer after dropping RCU read lock.
> >>   *
> >> - * Handles locking.  @index will be incremented to one beyond the range.
> >> + * In case that an entry is found @index contains the index of the found
> >> + * entry plus one, so it can be used as iterator index to find the next
> >> + * entry.
> >
> > What about:
> > "In case that an entry is found @index contains the last index of the
> > found entry plus one"
> 
> Still confusing to the casual reader like me :)
> 
>     "In case that an entry is found @index is updated to point to the next
>      possible entry independent of whether the found entry is occupying a
>      single index or a range of indices."
> 
> Hmm?

That makes sense to me.

Thanks,
Liam

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2023-05-15 19:27 ` your mail Liam R. Howlett
  2023-05-15 21:16   ` Thomas Gleixner
@ 2023-05-16 22:47   ` Thomas Gleixner
  2023-05-23 13:46     ` Liam R. Howlett
  1 sibling, 1 reply; 657+ messages in thread
From: Thomas Gleixner @ 2023-05-16 22:47 UTC (permalink / raw)
  To: Liam R. Howlett; +Cc: LKML, Matthew Wilcox, linux-mm, Shanker Donthineni

On Mon, May 15 2023 at 15:27, Liam R. Howlett wrote:
> * Thomas Gleixner <tglx@linutronix.de> [230510 15:01]:
>> The documentation of mt_next() claims that it starts the search at the
>> provided index. That's incorrect as it starts the search after the provided
>> index.
>> 
>> The documentation of mt_find() is slightly confusing. "Handles locking" is
>> not really helpful as it does not explain how the "locking" works.
>
> More locking notes can be found in Documentation/core-api/maple_tree.rst
> which lists mt_find() under the "Takes RCU read lock" list.  I'm okay
> with duplicating the comment of taking the RCU read lock in here.

Without a reference to the actual locking documentation such comments
are not super helpful.

>> Fix similar issues for mt_find_after() and mt_prev().
>> 
>> Remove the completely confusing and pointless "Note: Will not return the
>> zero entry." comment from mt_for_each() and document @__index correctly.
>
> The zero entry concept is an advanced API concept which allows you to
> store something that cannot be seen by the mt_* family of users, so it
> will not be returned and, instead, it will return NULL.  Think of it as
> a reservation for an entry that isn't fully initialized.  Perhaps it
> should read "Will not return the XA_ZERO_ENTRY" ?
>>
>> - *
>> - * Note: Will not return the zero entry.
>
> This function "will not return the zero entry", meaning it will return
> NULL if xa_is_zero(entry).

If I understand correctly, this translates to:

  This iterator skips entries, which have been reserved for future use
  but have not yet been fully initialized.

Right?

>> @@ -6487,9 +6493,14 @@ EXPORT_SYMBOL(mtree_destroy);
>>   * mt_find() - Search from the start up until an entry is found.
>>   * @mt: The maple tree
>>   * @index: Pointer which contains the start location of the search
>> - * @max: The maximum value to check
>> + * @max: The maximum value of the search range
>> + *
>> + * Takes RCU read lock internally to protect the search, which does not
>> + * protect the returned pointer after dropping RCU read lock.
>>   *
>> - * Handles locking.  @index will be incremented to one beyond the range.
>> + * In case that an entry is found @index contains the index of the found
>> + * entry plus one, so it can be used as iterator index to find the next
>> + * entry.
>
> What about:
> "In case that an entry is found @index contains the last index of the
> found entry plus one"

Still confusing to the casual reader like me :)

    "In case that an entry is found @index is updated to point to the next
     possible entry independent of whether the found entry is occupying a
     single index or a range of indices."

Hmm?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2023-05-15 19:27 ` your mail Liam R. Howlett
@ 2023-05-15 21:16   ` Thomas Gleixner
  2023-05-16 22:47   ` Thomas Gleixner
  1 sibling, 0 replies; 657+ messages in thread
From: Thomas Gleixner @ 2023-05-15 21:16 UTC (permalink / raw)
  To: Liam R. Howlett; +Cc: LKML, Matthew Wilcox, linux-mm, Shanker Donthineni

Liam!

On Mon, May 15 2023 at 15:27, Liam R. Howlett wrote:
> * Thomas Gleixner <tglx@linutronix.de> [230510 15:01]:
>>Also the
>> documentation of index talks about a range, while in reality the index
>> is updated on a successful search to the index of the found entry plus one.
>
> This is a range based tree, so the index is incremented beyond the last
> entry which would return the entry.  That is, if you search for 5 and
> there is an entry at 4-100, the index would be 101 after the search -
> or, one beyond the range.  If you have single entries at a specific
> index, then index would be equal to last and it would be one beyond the
> index you found - but only because index == last in this case.

Thanks for the explanation

>> 
>> Fix similar issues for mt_find_after() and mt_prev().
>> 
>> Remove the completely confusing and pointless "Note: Will not return the
>> zero entry." comment from mt_for_each() and document @__index correctly.
>
> The zero entry concept is an advanced API concept which allows you to
> store something that cannot be seen by the mt_* family of users, so it
> will not be returned and, instead, it will return NULL.  Think of it as
> a reservation for an entry that isn't fully initialized.  Perhaps it
> should read "Will not return the XA_ZERO_ENTRY" ?

That actually makes sense.

>> --- a/include/linux/maple_tree.h
>> +++ b/include/linux/maple_tree.h
>> @@ -659,10 +659,8 @@ void *mt_next(struct maple_tree *mt, uns
>>   * mt_for_each - Iterate over each entry starting at index until max.
>>   * @__tree: The Maple Tree
>>   * @__entry: The current entry
>> - * @__index: The index to update to track the location in the tree
>> + * @__index: The index to start the search from. Subsequently used as iterator.
>>   * @__max: The maximum limit for @index
>> - *
>> - * Note: Will not return the zero entry.
>
> This function "will not return the zero entry", meaning it will return
> NULL if xa_is_zero(entry).

Ack.

>> + * Takes RCU read lock internally to protect the search, which does not
>> + * protect the returned pointer after dropping RCU read lock.
>>   *
>> - * Handles locking.  @index will be incremented to one beyond the range.
>> + * In case that an entry is found @index contains the index of the found
>> + * entry plus one, so it can be used as iterator index to find the next
>> + * entry.
>
> What about:
> "In case that an entry is found @index contains the last index of the
> found entry plus one"

Something like that, yes.

Let me try again.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2023-05-10 19:01 [PATCH] maple_tree: Fix a few documentation issues, Thomas Gleixner
@ 2023-05-15 19:27 ` Liam R. Howlett
  2023-05-15 21:16   ` Thomas Gleixner
  2023-05-16 22:47   ` Thomas Gleixner
  0 siblings, 2 replies; 657+ messages in thread
From: Liam R. Howlett @ 2023-05-15 19:27 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Matthew Wilcox, linux-mm, Shanker Donthineni

* Thomas Gleixner <tglx@linutronix.de> [230510 15:01]:
> The documentation of mt_next() claims that it starts the search at the
> provided index. That's incorrect as it starts the search after the provided
> index.
> 
> The documentation of mt_find() is slightly confusing. "Handles locking" is
> not really helpful as it does not explain how the "locking" works.

More locking notes can be found in Documentation/core-api/maple_tree.rst
which lists mt_find() under the "Takes RCU read lock" list.  I'm okay
with duplicating the comment of taking the RCU read lock in here.
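
For callers, the practical implication is something like this rough,
untested sketch (illustration only; pr_info() just stands in for whatever
the caller actually does with the entry):

#include <linux/maple_tree.h>
#include <linux/printk.h>
#include <linux/rcupdate.h>

static DEFINE_MTREE(tree);

static void lookup_under_rcu(void)
{
        unsigned long index = 0;
        void *entry;

        /*
         * mt_find() only holds the RCU read lock for the lookup itself,
         * so take our own read lock around the use of the entry.
         */
        rcu_read_lock();
        entry = mt_find(&tree, &index, ULONG_MAX);
        if (entry)
                pr_info("entry %p, next index %lu\n", entry, index);
        rcu_read_unlock();
        /* @entry must not be dereferenced past this point. */
}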

>Also the
> documentation of index talks about a range, while in reality the index
> is updated on a successful search to the index of the found entry plus one.

This is a range based tree, so the index is incremented beyond the last
entry which would return the entry.  That is, if you search for 5 and
there is an entry at 4-100, the index would be 101 after the search -
or, one beyond the range.  If you have single entries at a specific
index, then index would be equal to last and it would be one beyond the
index you found - but only because index == last in this case.
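
A self-contained, untested sketch of that walk (the payload pointer and the
ULONG_MAX limit are only for illustration; error handling is omitted):

#include <linux/maple_tree.h>

static DEFINE_MTREE(tree);
static int payload;

static void range_walk_demo(void)
{
        unsigned long index = 5;
        void *entry;

        /* One entry spanning indices 4..100. */
        mtree_store_range(&tree, 4, 100, &payload, GFP_KERNEL);

        /* Search for 5: returns the 4..100 entry, @index becomes 101. */
        entry = mt_find(&tree, &index, ULONG_MAX);

        /* Continue the walk: nothing is left, so this returns NULL. */
        entry = mt_find_after(&tree, &index, ULONG_MAX);
}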

> 
> Fix similar issues for mt_find_after() and mt_prev().
> 
> Remove the completely confusing and pointless "Note: Will not return the
> zero entry." comment from mt_for_each() and document @__index correctly.

The zero entry concept is an advanced API concept which allows you to
store something that cannot be seen by the mt_* family of users, so it
will not be returned and, instead, it will return NULL.  Think of it as
a reservation for an entry that isn't fully initialized.  Perhaps it
should read "Will not return the XA_ZERO_ENTRY" ?
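
To illustrate the concept, a rough, untested sketch (it assumes the advanced
API is used to store XA_ZERO_ENTRY directly; this is not a claim about the
exact in-tree behaviour):

#include <linux/maple_tree.h>
#include <linux/xarray.h>       /* XA_ZERO_ENTRY, xa_is_zero() */

static DEFINE_MTREE(tree);

static void zero_entry_demo(void)
{
        MA_STATE(mas, &tree, 10, 20);
        unsigned long index = 0;
        void *entry;

        /* Reserve the range 10..20 without publishing a real pointer. */
        mas_lock(&mas);
        mas_store_gfp(&mas, XA_ZERO_ENTRY, GFP_KERNEL);
        mas_unlock(&mas);

        /*
         * A normal-API user never sees the reservation: the lookup
         * reports NULL rather than the zero entry.
         */
        entry = mt_find(&tree, &index, ULONG_MAX);
}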

> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/maple_tree.h |    4 +---
>  lib/maple_tree.c           |   23 ++++++++++++++++++-----
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -659,10 +659,8 @@ void *mt_next(struct maple_tree *mt, uns
>   * mt_for_each - Iterate over each entry starting at index until max.
>   * @__tree: The Maple Tree
>   * @__entry: The current entry
> - * @__index: The index to update to track the location in the tree
> + * @__index: The index to start the search from. Subsequently used as iterator.
>   * @__max: The maximum limit for @index
> - *
> - * Note: Will not return the zero entry.

This function "will not return the zero entry", meaning it will return
NULL if xa_is_zero(entry).

>   */
>  #define mt_for_each(__tree, __entry, __index, __max) \
>  	for (__entry = mt_find(__tree, &(__index), __max); \
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -5947,7 +5947,10 @@ EXPORT_SYMBOL_GPL(mas_next);
>   * @index: The start index
>   * @max: The maximum index to check
>   *
> - * Return: The entry at @index or higher, or %NULL if nothing is found.
> + * Takes RCU read lock internally to protect the search, which does not
> + * protect the returned pointer after dropping RCU read lock.
> + *
> + * Return: The entry higher than @index or %NULL if nothing is found.
>   */
>  void *mt_next(struct maple_tree *mt, unsigned long index, unsigned long max)
>  {
> @@ -6012,7 +6015,10 @@ EXPORT_SYMBOL_GPL(mas_prev);
>   * @index: The start index
>   * @min: The minimum index to check
>   *
> - * Return: The entry at @index or lower, or %NULL if nothing is found.
> + * Takes RCU read lock internally to protect the search, which does not
> + * protect the returned pointer after dropping RCU read lock.
> + *
> + * Return: The entry before @index or %NULL if nothing is found.
>   */
>  void *mt_prev(struct maple_tree *mt, unsigned long index, unsigned long min)
>  {
> @@ -6487,9 +6493,14 @@ EXPORT_SYMBOL(mtree_destroy);
>   * mt_find() - Search from the start up until an entry is found.
>   * @mt: The maple tree
>   * @index: Pointer which contains the start location of the search
> - * @max: The maximum value to check
> + * @max: The maximum value of the search range
> + *
> + * Takes RCU read lock internally to protect the search, which does not
> + * protect the returned pointer after dropping RCU read lock.
>   *
> - * Handles locking.  @index will be incremented to one beyond the range.
> + * In case that an entry is found @index contains the index of the found
> + * entry plus one, so it can be used as iterator index to find the next
> + * entry.

What about:
"In case that an entry is found @index contains the last index of the
found entry plus one"

>   *
>   * Return: The entry at or after the @index or %NULL
>   */
> @@ -6548,7 +6559,9 @@ EXPORT_SYMBOL(mt_find);
>   * @index: Pointer which contains the start location of the search
>   * @max: The maximum value to check
>   *
> - * Handles locking, detects wrapping on index == 0
> + * Same as mt_find() except that it checks @index for 0 before
> + * searching. If @index == 0, the search is aborted. This covers a wrap
> + * around of @index to 0 in an iterator loop.
>   *
>   * Return: The entry at or after the @index or %NULL
>   */

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <CAP7CzPfLu6mm6f2fon-zez3PW6rDACEH6ihF2aG+1Dc7Zc2WuQ@mail.gmail.com>
@ 2021-09-13  6:06 ` Willy Tarreau
  0 siblings, 0 replies; 657+ messages in thread
From: Willy Tarreau @ 2021-09-13  6:06 UTC (permalink / raw)
  To: zhao xc
  Cc: tglx, peterz, keescook, mingo, joe, john.garry, song.bao.hua,
	linux-kernel

Hi,

On Mon, Sep 13, 2021 at 01:32:51PM +0800, zhao xc wrote:
> Hi maintainer:
>         delete blank line between two enum definitions

Could you please make sure to place a subject (and a meaningful one) in
the "subject" field of your e-mails ? There's nothing more annoying than
receiving messages with no subject and having to read them to figure you
were not interested!

Thanks,
Willy

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2021-08-21  8:59 Kari Argillander
@ 2021-08-22 13:13 ` CGEL
  0 siblings, 0 replies; 657+ messages in thread
From: CGEL @ 2021-08-22 13:13 UTC (permalink / raw)
  To: Kari Argillander
  Cc: viro, christian.brauner, jamorris, gladkov.alexey, yang.yang29,
	tj, paul.gortmaker, linux-fsdevel, linux-kernel, Zeal Robot

On Sat, Aug 21, 2021 at 11:59:39AM +0300, Kari Argillander wrote:
> Bcc:
> Subject: Re: [PATCH] proc: prevent mount proc on same mountpoint in one pid
>  namespace
> Reply-To:
> In-Reply-To: <20210821083105.30336-1-yang.yang29@zte.com.cn>
> 
> On Sat, Aug 21, 2021 at 01:31:05AM -0700, cgel.zte@gmail.com wrote:
> > From: Yang Yang <yang.yang29@zte.com.cn>
> > 
> > Patch "proc: allow to mount many instances of proc in one pid namespace"
> > aims to mount many instances of proc on different mountpoint, see
> > tools/testing/selftests/proc/proc-multiple-procfs.c.
> > 
> > But there is a side-effects, user can mount many instances of proc on
> > the same mountpoint in one pid namespace, which is not allowed before.
> > This duplicate mount makes no sense but wastes memory and CPU, and user
> > may be confused why kernel allows it.
> > 
> > The logic of this patch is: when trying to mount proc on /mnt, check if
> > there is a proc instance mounted on /mnt in the same pid namespace. If
> > the answer is yes, return -EBUSY.
> > 
> > This check can't be done in proc_get_tree(), which calls
> > get_tree_nodev() and will create a new super_block unconditionally.
> > Other nodev filesystems may face the same case, so add a new hook in
> > fs_context_operations.
> > 
> > Reported-by: Zeal Robot <zealci@zte.com.cn>
> > Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
> > ---
> >  fs/namespace.c             |  9 +++++++++
> >  fs/proc/root.c             | 15 +++++++++++++++
> >  include/linux/fs_context.h |  1 +
> >  3 files changed, 25 insertions(+)
> > 
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index f79d9471cb76..84da649a70c5 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2878,6 +2878,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
> >  static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
> >  			int mnt_flags, const char *name, void *data)
> >  {
> > +	int (*check_mntpoint)(struct fs_context *fc, struct path *path);
> >  	struct file_system_type *type;
> >  	struct fs_context *fc;
> >  	const char *subtype = NULL;
> > @@ -2906,6 +2907,13 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
> >  	if (IS_ERR(fc))
> >  		return PTR_ERR(fc);
> >  
> > +	/* check if there is a same super_block mount on path*/
> > +	check_mntpoint = fc->ops->check_mntpoint;
> > +	if (check_mntpoint)
> > +		err = check_mntpoint(fc, path);
> > +	if (err < 0)
> > +		goto err_fc;
> > +
> >  	if (subtype)
> >  		err = vfs_parse_fs_string(fc, "subtype",
> >  					  subtype, strlen(subtype));
> > @@ -2920,6 +2928,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
> >  	if (!err)
> >  		err = do_new_mount_fc(fc, path, mnt_flags);
> >  
> > +err_fc:
> >  	put_fs_context(fc);
> >  	return err;
> >  }
> > diff --git a/fs/proc/root.c b/fs/proc/root.c
> > index c7e3b1350ef8..0971d6b0bec2 100644
> > --- a/fs/proc/root.c
> > +++ b/fs/proc/root.c
> > @@ -237,11 +237,26 @@ static void proc_fs_context_free(struct fs_context *fc)
> >  	kfree(ctx);
> >  }
> >  
> > +static int proc_check_mntpoint(struct fs_context *fc, struct path *path)
> > +{
> > +	struct super_block *mnt_sb = path->mnt->mnt_sb;
> > +	struct proc_fs_info *fs_info;
> > +
> > +	if (strcmp(mnt_sb->s_type->name, "proc") == 0) {
> > +		fs_info = mnt_sb->s_fs_info;
> > +		if (fs_info->pid_ns == task_active_pid_ns(current) &&
> > +		    path->mnt->mnt_root == path->dentry)
> > +			return -EBUSY;
> > +	}
> > +	return 0;
> > +}
> > +
> >  static const struct fs_context_operations proc_fs_context_ops = {
> >  	.free		= proc_fs_context_free,
> >  	.parse_param	= proc_parse_param,
> >  	.get_tree	= proc_get_tree,
> >  	.reconfigure	= proc_reconfigure,
> > +	.check_mntpoint	= proc_check_mntpoint,
> >  };
> >  
> >  static int proc_init_fs_context(struct fs_context *fc)
> > diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
> > index 6b54982fc5f3..090a05fb2d7d 100644
> > --- a/include/linux/fs_context.h
> > +++ b/include/linux/fs_context.h
> > @@ -119,6 +119,7 @@ struct fs_context_operations {
> >  	int (*parse_monolithic)(struct fs_context *fc, void *data);
> >  	int (*get_tree)(struct fs_context *fc);
> >  	int (*reconfigure)(struct fs_context *fc);
> > +	int (*check_mntpoint)(struct fs_context *fc, struct path *path);
> 
> Don't you think this should be its own patch? It is, after all, an internal
> API change. This also needs documentation. It would be confusing if
> someone converts to the new mount API and there is one line which just
> addresses some proc stuff, but even the commit message does not address
> whether every fs needs to add this.
> 
> Documentation is in very good shape right now and we are in a phase where
> everyone is migrating to use the new mount API, so everything should be well
> documented.
Thanks for your reply!

I will write the commit message more carefully next time.
Since I was not quite sure about this patch, I didn't write
documentation for patch v1. Al Viro has made it clear, so this
patch is abandoned.
> >  };
> >  
> >  /*
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2021-08-16  2:46 Kari Argillander
@ 2021-08-16 12:27 ` Christoph Hellwig
  0 siblings, 0 replies; 657+ messages in thread
From: Christoph Hellwig @ 2021-08-16 12:27 UTC (permalink / raw)
  To: Kari Argillander
  Cc: Konstantin Komarov, Christoph Hellwig, ntfs3, linux-kernel,
	linux-fsdevel, Pali Rohár, Matthew Wilcox

On Mon, Aug 16, 2021 at 05:46:59AM +0300, Kari Argillander wrote:
> I would really like to get fsparam_flag_no also for no_acs_rules,
> but then we have to make a new name for it. Another possibility is to
> modify the mount API so that a mount option can be no/no_. I think that
> would maybe be a good change.

I don't think adding another no_ alias is a good idea.  I'd suggest
to just rename the existing flag before the ntfs3 driver ever hits
mainline.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2021-04-07  8:25 ` your mail Huang Rui
@ 2021-04-07  9:25   ` Christian König
  0 siblings, 0 replies; 657+ messages in thread
From: Christian König @ 2021-04-07  9:25 UTC (permalink / raw)
  To: Huang Rui, songqiang; +Cc: airlied, daniel, linux-kernel, dri-devel

Thanks Ray for pointing this out. Looks like the mail ended up in my 
spam folder otherwise.

Apart from that this patch is a really really big NAK. I can't count how 
often I had to reject stuff like this!

Using the page reference for TTM pages is illegal and can lead to struct 
page corruption.

Can you please describe why you need that?

Regards,
Christian.

Am 07.04.21 um 10:25 schrieb Huang Rui:
> On Wed, Apr 07, 2021 at 09:27:46AM +0800, songqiang wrote:
>
> Please add the description in the commit message and subject.
>
> Thanks,
> Ray
>
>> Signed-off-by: songqiang <songqiang@uniontech.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 18 ++++++++++++++----
>>   1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> index 14660f723f71..f3698f0ad4d7 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> @@ -736,8 +736,16 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags,
>>   					if (++p != pages[i + j])
>>   					    break;
>>   
>> -				if (j == HPAGE_PMD_NR)
>> +				if (j == HPAGE_PMD_NR) {
>>   					order = HPAGE_PMD_ORDER;
>> +					for (j = 1; j < HPAGE_PMD_NR; ++j)
>> +						page_ref_dec(pages[i+j]);
>> +				}
>>   			}
>>   #endif
>>   
>> @@ -868,10 +876,12 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags,
>>   				p = alloc_pages(huge_flags, HPAGE_PMD_ORDER);
>>   				if (!p)
>>   					break;
>> -
>> -				for (j = 0; j < HPAGE_PMD_NR; ++j)
>> +				for (j = 0; j < HPAGE_PMD_NR; ++j) {
>>   					pages[i++] = p++;
>> -
>> +					if (j > 0)
>> +						page_ref_inc(pages[i-1]);
>> +				}
>>   				npages -= HPAGE_PMD_NR;
>>   			}
>>   		}
>>
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2021-04-07  1:27 [PATCH] drivers/gpu/drm/ttm/ttm_page_allo.c: adjust ttm pages refcount fix the bug: Feb 6 17:13:13 aaa-PC kernel: [ 466.271034] BUG: Bad page state in process blur_image pfn:7aee2 Feb 6 17:13:13 aaa-PC kernel: [ 466.271037] page:980000025fca4170 count:0 mapcount:0 mapping:980000025a0dca60 index:0x0 Feb 6 17:13:13 aaa-PC kernel: [ 466.271039] flags: 0x1e01fff000000() Feb 6 17:13:13 aaa-PC kernel: [ 466.271042] raw: 0001e01fff000000 0000000000000100 0000000000000200 980000025a0dca60 Feb 6 17:13:13 aaa-PC kernel: [ 466.271044] raw: 0000000000000000 0000000000000000 00000000ffffffff Feb 6 17:13:13 aaa-PC kernel: [ 466.271046] page dumped because: non-NULL mapping Feb 6 17:13:13 aaa-PC kernel: [ 466.271047] Modules linked in: bnep fuse bluetooth ecdh_generic sha256_generic cfg80211 rfkill vfat fat serio_raw uio_pdrv_genirq binfmt_misc ip_tables amdgpu chash radeon r8168 loongson gpu_sched Feb 6 17:13:13 aaa-PC kernel: [ 466.271059] CPU: 3 PID: 9554 Comm: blur_image Tainted: G B 4.19.0-loongson-3-desktop #3036 Feb 6 17:13:13 aaa-PC kernel: [ 466.271061] Hardware name: Haier Kunlun-LS3A4000-LS7A-desktop/Kunlun-LS3A4000-LS7A-desktop, BIOS Kunlun-V4.0.12V4.0 LS3A4000 03/19/2020 Feb 6 17:13:13 aaa-PC kernel: [ 466.271063] Stack : 000000000000007b 000000007400cce0 0000000000000000 0000000000000007 Feb 6 17:13:13 aaa-PC kernel: [ 466.271067] 0000000000000000 0000000000000000 0000000000002a82 ffffffff8202c910 Feb 6 17:13:13 aaa-PC kernel: [ 466.271070] 0000000000000000 0000000000002a82 0000000000000000 ffffffff81e20000 Feb 6 17:13:13 aaa-PC kernel: [ 466.271074] 0000000000000000 ffffffff8021301c ffffffff82040000 6e754b20534f4942 Feb 6 17:13:13 aaa-PC kernel: [ 466.271078] ffff000000000000 0000000000000000 000000007400cce0 0000000000000000 Feb 6 17:13:13 aaa-PC kernel: [ 466.271082] 9800000007155d40 ffffffff81cc5470 0000000000000005 6db6db6db6db0000 Feb 6 17:13:13 aaa-PC kernel: [ 466.271086] 0000000000000003 fffffffffffffffb 0000000000006000 98000002559f4000 Feb 6 17:13:13 aaa-PC kernel: [ 466.271090] 980000024a448000 980000024a44b7f0 9800000007155d50 ffffffff819f5158 Feb 6 17:13:13 aaa-PC kernel: [ 466.271094] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Feb 6 17:13:13 aaa-PC kernel: [ 466.271097] 9800000007155d40 ffffffff802310c4 ffffffff81e70000 ffffffff819f5158 Feb 6 17:13:13 aaa-PC kernel: [ 466.271101] ... 
Feb 6 17:13:13 aaa-PC kernel: [ 466.271103] Call Trace: Feb 6 17:13:13 aaa-PC kernel: [ 466.271107] [<ffffffff802310c4>] show_stack+0x44/0x1c0 Feb 6 17:13:13 aaa-PC kernel: [ 466.271110] [<ffffffff819f5158>] dump_stack+0x1d8/0x240 Feb 6 17:13:13 aaa-PC kernel: [ 466.271113] [<ffffffff80491c10>] bad_page+0x210/0x2c0 Feb 6 17:13:13 aaa-PC kernel: [ 466.271116] [<ffffffff804931c8>] free_pcppages_bulk+0x708/0x900 Feb 6 17:13:13 aaa-PC kernel: [ 46 6.271119] [<ffffffff804980cc>] free_unref_page_list+0x1cc/0x2c0 Feb 6 17:13:13 aaa-PC kernel: [ 466.271122] [<ffffffff804ad2c8>] release_pages+0x648/0x900 Feb 6 17:13:13 aaa-PC kernel: [ 466.271125] [<ffffffff804f3b48>] tlb_flush_mmu_free+0x88/0x100 Feb 6 17:13:13 aaa-PC kernel: [ 466.271128] [<ffffffff804f8a24>] zap_pte_range+0xa24/0x1480 Feb 6 17:13:13 aaa-PC kernel: [ 466.271132] [<ffffffff804f98b0>] unmap_page_range+0x1f0/0x500 Feb 6 17:13:13 aaa-PC kernel: [ 466.271135] [<ffffffff804fa054>] unmap_vmas+0x154/0x200 Feb 6 17:13:13 aaa-PC kernel: [ 466.271138] [<ffffffff8051190c>] exit_mmap+0x20c/0x380 Feb 6 17:13:13 aaa-PC kernel: [ 466.271142] [<ffffffff802bb9c8>] mmput+0x148/0x300 Feb 6 17:13:13 aaa-PC kernel: [ 466.271145] [<ffffffff802c80d8>] do_exit+0x6d8/0x1900 Feb 6 17:13:13 aaa-PC kernel: [ 466.271148] [<ffffffff802cb288>] do_group_exit+0x88/0x1c0 Feb 6 17:13:13 aaa-PC kernel: [ 466.271151] [<ffffffff802cb3d8>] sys_exit_group+0x18/0x40 Feb 6 17 :13:13 aaa-PC kernel: [ 466.271155] [<ffffffff8023f954>] syscall_common+0x34/0xa4 songqiang
@ 2021-04-07  8:25 ` Huang Rui
  2021-04-07  9:25   ` Christian König
  0 siblings, 1 reply; 657+ messages in thread
From: Huang Rui @ 2021-04-07  8:25 UTC (permalink / raw)
  To: songqiang; +Cc: Koenig, Christian, airlied, daniel, linux-kernel, dri-devel

On Wed, Apr 07, 2021 at 09:27:46AM +0800, songqiang wrote:

Please add the description in the commit message and subject.

Thanks,
Ray

> Signed-off-by: songqiang <songqiang@uniontech.com>
> ---
>  drivers/gpu/drm/ttm/ttm_page_alloc.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index 14660f723f71..f3698f0ad4d7 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -736,8 +736,16 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags,
>  					if (++p != pages[i + j])
>  					    break;
>  
> -				if (j == HPAGE_PMD_NR)
> +				if (j == HPAGE_PMD_NR) {
>  					order = HPAGE_PMD_ORDER;
> +					for (j = 1; j < HPAGE_PMD_NR; ++j)
> +						page_ref_dec(pages[i+j]);
> +				}
>  			}
>  #endif
>  
> @@ -868,10 +876,12 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags,
>  				p = alloc_pages(huge_flags, HPAGE_PMD_ORDER);
>  				if (!p)
>  					break;
> -
> -				for (j = 0; j < HPAGE_PMD_NR; ++j)
> +				for (j = 0; j < HPAGE_PMD_NR; ++j) {
>  					pages[i++] = p++;
> -
> +					if (j > 0)
> +						page_ref_inc(pages[i-1]);
> +				}
>  				npages -= HPAGE_PMD_NR;
>  			}
>  		}
> 
> 
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2021-04-01 21:16 Bhaumik Bhatt
@ 2021-04-07  6:56 ` Manivannan Sadhasivam
  0 siblings, 0 replies; 657+ messages in thread
From: Manivannan Sadhasivam @ 2021-04-07  6:56 UTC (permalink / raw)
  To: Bhaumik Bhatt
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, carl.yin,
	naveen.kumar, loic.poulain

On Thu, Apr 01, 2021 at 02:16:09PM -0700, Bhaumik Bhatt wrote:
> Subject: [PATCH v8 0/9] Updates to MHI channel handling
> 

Subject is present in the body ;)

> The MHI specification shows a state machine with support for the STOP channel
> command and the validity of certain state transitions. The MHI host currently
> does not provide any mechanism to stop a channel and restart it without
> resetting it. There are also times when the device moves on to a different
> execution environment while client drivers on the host are unaware of it and
> still attempt to reset the channels, facing unnecessary timeouts.
> 
> This series addresses the above areas to provide support for stopping an MHI
> channel and resuming it, improve the documentation, and improve channel state
> machine handling in general.
> 
> This set of patches was tested on arm64 and x86_64 architecture.
> 

Series applied to mhi-next!

Thanks,
Mani

> v8:
> -Split the state machine improvements patch to three patches as per review
> 
> v7:
> -Tested on x86_64 architecture
> -Drop the patch "Do not clear channel context more than once" as issue is fixed
> differently using "bus: mhi: core: Fix double dma free()"
> -Update the commit text to better reflect changes on state machine improvements
> 
> v6:
> -Dropped the patch which introduced start/stop transfer APIs for lack of users
> -Updated error handling and debug prints on channel handling improvements patch
> -Improved commit text to better explain certain patches based on review comments
> -Removed references to new APIs from the documentation improvement patch
> 
> v5:
> -Added reviewed-by tags from Hemant I missed earlier
> -Added patch to prevent kernel warnings on clearing channel context twice
> 
> v4:
> -Updated commit text/descriptions and addressed checkpatch checks
> -Added context validity check before starting/stopping channels from new API
> -Added patch to clear channel context configuration after reset/unprepare
> 
> v3:
> -Updated documentation for channel transfer APIs to highlight differences
> -Create separate patch for "allowing channel to be disabled from stopped state"
> 
> v2:
> -Renamed the newly introduced APIs to mhi_start_transfer() / mhi_stop_transfer()
> -Added improved documentation to avoid confusion with the new APIs
> -Removed the __ prefix from mhi_unprepare_channel() API for consistency.
> 
> Bhaumik Bhatt (9):
>   bus: mhi: core: Allow sending the STOP channel command
>   bus: mhi: core: Clear context for stopped channels from remove()
>   bus: mhi: core: Improvements to the channel handling state machine
>   bus: mhi: core: Update debug messages to use client device
>   bus: mhi: core: Hold device wake for channel update commands
>   bus: mhi: core: Clear configuration from channel context during reset
>   bus: mhi: core: Check channel execution environment before issuing
>     reset
>   bus: mhi: core: Remove __ prefix for MHI channel unprepare function
>   bus: mhi: Improve documentation on channel transfer setup APIs
> 
>  drivers/bus/mhi/core/init.c     |  22 ++++-
>  drivers/bus/mhi/core/internal.h |  12 +++
>  drivers/bus/mhi/core/main.c     | 190 ++++++++++++++++++++++++----------------
>  include/linux/mhi.h             |  18 +++-
>  4 files changed, 162 insertions(+), 80 deletions(-)
> 
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20210322213644.333112726@goodmis.org>
@ 2021-03-22 21:40 ` Steven Rostedt
  0 siblings, 0 replies; 657+ messages in thread
From: Steven Rostedt @ 2021-03-22 21:40 UTC (permalink / raw)
  To: linux-kernel

On Mon, Mar 22, 2021 at 05:36:44PM -0400, Steven Rostedt wrote:

$@#@#$%%%!!!!

Bah! There was another typo in the email list!

Take 3

-- Steve


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20210322212156.440428241@goodmis.org>
@ 2021-03-22 21:36 ` Steven Rostedt
  0 siblings, 0 replies; 657+ messages in thread
From: Steven Rostedt @ 2021-03-22 21:36 UTC (permalink / raw)
  To: linux-kernel

On Mon, Mar 22, 2021 at 05:21:56PM -0400, Steven Rostedt wrote:

Bah! John 'Warthog' Hawley's email had those single quotes in it that I cut and
pasted into the Cc list, causing the quilt mail parsing to fail, but as LKML
was in the "To" part, it still sent!

Take 2

-- Steve


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2020-12-02 18:51             ` your mail Andy Shevchenko
@ 2020-12-02 18:56               ` Andy Shevchenko
  0 siblings, 0 replies; 657+ messages in thread
From: Andy Shevchenko @ 2020-12-02 18:56 UTC (permalink / raw)
  To: Yun Levi
  Cc: Yury Norov, Rasmus Villemoes, dushistov, Arnd Bergmann,
	Andrew Morton, Gustavo A. R. Silva, William Breathitt Gray,
	richard.weiyang, joseph.qi, skalluru, Josh Poimboeuf,
	Linux Kernel Mailing List, linux-arch

On Wed, Dec 02, 2020 at 08:51:27PM +0200, Andy Shevchenko wrote:
> On Thu, Dec 03, 2020 at 03:27:33AM +0900, Yun Levi wrote:
> > On Thu, Dec 3, 2020 at 2:36 AM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > > On Wed, Dec 02, 2020 at 09:26:05AM -0800, Yury Norov wrote:

...

> > > Side note: speaking of performance, any plans to fix for_each_*_bit*() for
> > > cases when the nbits is known to be <= BITS_PER_LONG?
> > >
> > > Now it makes an awful code generation (something like few hundred bytes of
> > > code).
> 
> > Frankly speaking, I don't have an idea right now...
> > Could you share your idea or wisdom?
> 
> Something like (I may be mistaken by names, etc, I'm not a compiler expert,
> and this is in pseudo language, I don't remember all API names by heart,
> just to express the idea) as a rough first step
> 
> __builtin_constant(nbits, find_next_set_bit_long, find_next_set_bit)
> 
> find_next_set_bit_long()
> {
> 	unsigned long v = BIT_LAST_WORD(i);
> 	return ffs_long(v);
> }
> 
> Same for find_first_set_bit() -> map it to ffs_long().
> 
> And I believe it can be optimized more.

Btw it will also require reconsidering test cases where such constant small
nbits values are passed (forcing the compiler to avoid optimization somehow;
one way is to try random nbits for some test cases).

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2020-12-02 18:27           ` Yun Levi
@ 2020-12-02 18:51             ` Andy Shevchenko
  2020-12-02 18:56               ` Andy Shevchenko
  0 siblings, 1 reply; 657+ messages in thread
From: Andy Shevchenko @ 2020-12-02 18:51 UTC (permalink / raw)
  To: Yun Levi
  Cc: Yury Norov, Rasmus Villemoes, dushistov, Arnd Bergmann,
	Andrew Morton, Gustavo A. R. Silva, William Breathitt Gray,
	richard.weiyang, joseph.qi, skalluru, Josh Poimboeuf,
	Linux Kernel Mailing List, linux-arch

On Thu, Dec 03, 2020 at 03:27:33AM +0900, Yun Levi wrote:
> On Thu, Dec 3, 2020 at 2:36 AM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> > On Wed, Dec 02, 2020 at 09:26:05AM -0800, Yury Norov wrote:

...

> > Side note: speaking of performance, any plans to fix for_each_*_bit*() for
> > cases when the nbits is known to be <= BITS_PER_LONG?
> >
> > Now it makes an awful code generation (something like few hundred bytes of
> > code).

> Frankly speaking, I don't have an idea right now...
> Could you share your idea or wisdom?

Something like (I may be mistaken by names, etc, I'm not a compiler expert,
and this is in pseudo language, I don't remember all API names by heart,
just to express the idea) as a rough first step

__builtin_constant(nbits, find_next_set_bit_long, find_next_set_bit)

find_next_set_bit_long()
{
	unsigned long v = BIT_LAST_WORD(i);
	return ffs_long(v);
}

Same for find_first_set_bit() -> map it to ffs_long().

And I believe it can be optimized more.
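
To make the shape a bit more concrete, an untested sketch in C
(find_first_bit_small() and find_first_bit_const() are made-up names, not
existing API; nbits is assumed to be a non-zero constant on the fast path):

#include <linux/bitops.h>
#include <linux/bits.h>

/* Fast path: nbits fits in one word, so a mask plus __ffs() is enough. */
static __always_inline unsigned long
find_first_bit_small(const unsigned long *addr, unsigned long nbits)
{
        unsigned long val = *addr & GENMASK(nbits - 1, 0);

        return val ? __ffs(val) : nbits;
}

/* Dispatch at compile time; fall back to the generic helper otherwise. */
#define find_first_bit_const(addr, nbits)                               \
        (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG ?      \
                find_first_bit_small(addr, nbits) :                     \
                find_first_bit(addr, nbits))

The for_each_*_bit() macros could then pick the small variant the same way.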

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2020-08-17 17:12     ` Amit Pundir
@ 2020-08-30 18:58       ` Bjorn Andersson
  0 siblings, 0 replies; 657+ messages in thread
From: Bjorn Andersson @ 2020-08-30 18:58 UTC (permalink / raw)
  To: Amit Pundir
  Cc: Konrad Dybcio, Andy Gross, dt, John Stultz, linux-arm-msm, lkml,
	Rob Herring, Sumit Semwal

On Mon 17 Aug 17:12 UTC 2020, Amit Pundir wrote:

> On Thu, 13 Aug 2020 at 12:38, Bjorn Andersson
> <bjorn.andersson@linaro.org> wrote:
> >
> > On Thu 06 Aug 15:31 PDT 2020, Konrad Dybcio wrote:
> >
> > > Subject: Re: [PATCH v4] arm64: dts: qcom: Add support for Xiaomi Poco F1 (Beryllium)
> > >
> > > >// This removed_region is needed to boot the device
> > > >               // TODO: Find out the user of this reserved memory
> > > >               removed_region: memory@88f00000 {
> > >
> > > This region seems to belong to the Trust Zone. When Linux tries to access it, TZ bites and shuts the device down.
> > >
> >
> > This is in line with what the documentation indicates and then it would
> > be better to just bump &tz_mem to a size of 0x4900000.
> 
> Hi, so just to be sure that I got this right, you want me to extend
> &tz_mem to the size of 0x4900000 from the default size of 0x2D00000 by
> including this downstream &removed_region (of size 0x1A00000) +
> previously unreserved downstream memory region (of size 0x200000), to
> align with the starting address of &qseecom_mem?
> 

Yes

Regards,
Bjorn

> I just gave this &tz_mem change a spin and I do not see any obvious
> regression in my limited smoke testing (Boots AOSP to UI with
> v5.9-rc1. Touch/BT/WiFi works) so far, with 20+ out-of-tree patches.
> 
> Regards,
> Amit Pundir
> 
> >
> > Regards,
> > Bjorn

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2020-08-13  7:04   ` your mail Bjorn Andersson
@ 2020-08-17 17:12     ` Amit Pundir
  2020-08-30 18:58       ` Bjorn Andersson
  0 siblings, 1 reply; 657+ messages in thread
From: Amit Pundir @ 2020-08-17 17:12 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Konrad Dybcio, Andy Gross, dt, John Stultz, linux-arm-msm, lkml,
	Rob Herring, Sumit Semwal

On Thu, 13 Aug 2020 at 12:38, Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
>
> On Thu 06 Aug 15:31 PDT 2020, Konrad Dybcio wrote:
>
> > Subject: Re: [PATCH v4] arm64: dts: qcom: Add support for Xiaomi Poco F1 (Beryllium)
> >
> > >// This removed_region is needed to boot the device
> > >               // TODO: Find out the user of this reserved memory
> > >               removed_region: memory@88f00000 {
> >
> > This region seems to belong to the Trust Zone. When Linux tries to access it, TZ bites and shuts the device down.
> >
>
> This is in line with what the documentation indicates and then it would
> be better to just bump &tz_mem to a size of 0x4900000.

Hi, so just to be sure that I got this right, you want me to extend
&tz_mem to the size of 0x4900000 from the default size of 0x2D00000 by
including this downstream &removed_region (of size 0x1A00000) +
previously unreserved downstream memory region (of size 0x200000), to
align with the starting address of &qseecom_mem?
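(For what it's worth, 0x2D00000 + 0x1A00000 + 0x200000 does add up to
0x4900000.)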

I just gave this &tz_mem change a spin and I do not see any obvious
regression in my limited smoke testing (Boots AOSP to UI with
v5.9-rc1. Touch/BT/WiFi works) so far, with 20+ out-of-tree patches.

Regards,
Amit Pundir

>
> Regards,
> Bjorn

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2020-08-06 22:31 ` Konrad Dybcio
@ 2020-08-13  7:04   ` Bjorn Andersson
  2020-08-17 17:12     ` Amit Pundir
  0 siblings, 1 reply; 657+ messages in thread
From: Bjorn Andersson @ 2020-08-13  7:04 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: amit.pundir, agross, devicetree, john.stultz, linux-arm-msm,
	linux-kernel, robh+dt, sumit.semwal

On Thu 06 Aug 15:31 PDT 2020, Konrad Dybcio wrote:

> Subject: Re: [PATCH v4] arm64: dts: qcom: Add support for Xiaomi Poco F1 (Beryllium)
> 
> >// This removed_region is needed to boot the device
> >               // TODO: Find out the user of this reserved memory
> >               removed_region: memory@88f00000 {
> 
> This region seems to belong to the Trust Zone. When Linux tries to access it, TZ bites and shuts the device down.
> 

This is in line with what the documentation indicates and then it would
be better to just bump &tz_mem to a size of 0x4900000.

Regards,
Bjorn

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2020-06-09 11:38 Gaurav Singh
@ 2020-06-09 11:54 ` Greg KH
  0 siblings, 0 replies; 657+ messages in thread
From: Greg KH @ 2020-06-09 11:54 UTC (permalink / raw)
  To: Gaurav Singh
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	open list:BPF (Safe dynamic programs and tools),
	open list:BPF (Safe dynamic programs and tools),
	open list

On Tue, Jun 09, 2020 at 07:38:38AM -0400, Gaurav Singh wrote:
> Please find the patch below.
> 
> Thanks and regards,
> Gaurav.
> 
> >From Gaurav Singh <gaurav1086@gmail.com> # This line is ignored.
> From: Gaurav Singh <gaurav1086@gmail.com>
> Reply-To: 
> Subject: 
> In-Reply-To: 
> 
> 

I think something went wrong in your submission, just use 'git
send-email' to send the patch out.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2020-05-06  5:52 Jiaxun Yang
@ 2020-05-07 11:00 ` Thomas Bogendoerfer
  0 siblings, 0 replies; 657+ messages in thread
From: Thomas Bogendoerfer @ 2020-05-07 11:00 UTC (permalink / raw)
  To: Jiaxun Yang
  Cc: linux-mips, clang-built-linux, Maciej W . Rozycki, Fangrui Song,
	Kees Cook, Nathan Chancellor, Paul Burton, Masahiro Yamada,
	Jouni Hogander, Kevin Darbyshire-Bryant, Borislav Petkov,
	Heiko Carstens, linux-kernel

On Wed, May 06, 2020 at 01:52:45PM +0800, Jiaxun Yang wrote:
> Subject: [PATCH v6] MIPS: Truncate link address into 32bit for 32bit kernel
> In-Reply-To: <20200413062651.3992652-1-jiaxun.yang@flygoat.com>
> 
> LLD failed to link vmlinux with a 64bit load address for a 32bit ELF,
> while bfd silently truncates the 64bit address to 32bit.
> To fix the LLD build, we should truncate the load address provided by the
> platform to 32bit for a 32bit kernel.
> 
> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
> Link: https://github.com/ClangBuiltLinux/linux/issues/786
> Link: https://sourceware.org/bugzilla/show_bug.cgi?id=25784
> Reviewed-by: Fangrui Song <maskray@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Tested-by: Nathan Chancellor <natechancellor@gmail.com>
> Cc: Maciej W. Rozycki <macro@linux-mips.org>
> ---
> V2: Take MaskRay's shell magic.
> 
> V3: After spending an hour dealing with a special character issue in the
> Makefile, I gave up on shell hacks and wrote a util in C instead.
> Thanks to Maciej for pointing out the Makefile variable problem.
> 
> v4: Finally we managed to find a Makefile method to do it properly,
> thanks to Kees. As it's too far from the initial version, I removed the
> Review & Test tags from Nick and Fangrui and Cc'd them instead.
> 
> v5: Care vmlinuz as well.
> 
> v6: Rename to LINKER_LOAD_ADDRESS
> ---
>  arch/mips/Makefile                 | 13 ++++++++++++-
>  arch/mips/boot/compressed/Makefile |  2 +-
>  arch/mips/kernel/vmlinux.lds.S     |  2 +-
>  3 files changed, 14 insertions(+), 3 deletions(-)

applied to mips-next.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20191026192359.27687-1-frank-w@public-files.de>
@ 2019-10-26 19:30 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 657+ messages in thread
From: Greg Kroah-Hartman @ 2019-10-26 19:30 UTC (permalink / raw)
  To: Frank Wunderlich
  Cc: linux-mediatek, Matthias Brugger, linux-serial, linux-arm-kernel,
	linux-kernel

On Sat, Oct 26, 2019 at 09:23:59PM +0200, Frank Wunderlich wrote:
> Date: Sat, 26 Oct 2019 20:53:28 +0200
> Subject: [PATCH] serial: 8250-mtk: Ask for IRQ-count before request one

Odd email with no subject line :(

Please fix up and resend.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20190626145238.19708-1-bigeasy@linutronix.de>
@ 2019-06-27 21:13 ` Tejun Heo
  0 siblings, 0 replies; 657+ messages in thread
From: Tejun Heo @ 2019-06-27 21:13 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Lai Jiangshan, Peter Zijlstra, Thomas Gleixner

On Wed, Jun 26, 2019 at 04:52:36PM +0200, Sebastian Andrzej Siewior wrote:
> A small series of tiny cleanups.

Applied 1-2 to wq/for-5.3.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-04-11 10:53 ` Peter Zijlstra
@ 2019-04-12  3:23   ` Nicholas Piggin
  0 siblings, 0 replies; 657+ messages in thread
From: Nicholas Piggin @ 2019-04-12  3:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, linux-kernel, Ingo Molnar, Thomas Gleixner

Peter Zijlstra's on April 11, 2019 8:53 pm:
> Was this supposed to be patch 6/5 of your previous series?

Dang, I screwed up the headers? Thanks for the ping, I will resend.

It is standalone. It seems more suited to the scheduler tree than the
timers one, but your call.

It is generally of more use when CPU0 is _not_ a housekeeping one,
and that's where I've done most testing, but I don't see any hard
dependency.

Thanks,
Nick

> 
> On Thu, Apr 11, 2019 at 04:05:36PM +1000, Nicholas Piggin wrote:
>> Date: Tue, 9 Apr 2019 20:23:16 +1000
>> Subject: [PATCH] kernel/sched: run nohz idle load balancer on HK_FLAG_MISC
>>  CPUs
>> 
>> The nohz idle balancer runs on the lowest idle CPU. This can
>> interfere with isolated CPUs, so confine it to HK_FLAG_MISC
>> housekeeping CPUs.
>> 
>> HK_FLAG_SCHED is not used for this because it is not set anywhere
>> at the moment. This could be folded into HK_FLAG_SCHED once that
>> option is fixed.
>> 
>> The problem was observed with increased jitter on an application
>> running on CPU0, caused by nohz idle load balancing being run on
>> CPU1 (an SMT sibling).
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  kernel/sched/fair.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index fdab7eb6f351..d29ca323214d 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9522,22 +9522,26 @@ static inline int on_null_domain(struct rq *rq)
>>   * - When one of the busy CPUs notice that there may be an idle rebalancing
>>   *   needed, they will kick the idle load balancer, which then does idle
>>   *   load balancing for all the idle CPUs.
>> + * - HK_FLAG_MISC CPUs are used for this task, because HK_FLAG_SCHED not set
>> + *   anywhere yet.
>>   */
>>  
>>  static inline int find_new_ilb(void)
>>  {
>> -	int ilb = cpumask_first(nohz.idle_cpus_mask);
>> +	int ilb;
>>  
>> -	if (ilb < nr_cpu_ids && idle_cpu(ilb))
>> -		return ilb;
>> +	for_each_cpu_and(ilb, nohz.idle_cpus_mask,
>> +			      housekeeping_cpumask(HK_FLAG_MISC)) {
>> +		if (idle_cpu(ilb))
>> +			return ilb;
>> +	}
>>  
>>  	return nr_cpu_ids;
>>  }
>>  
>>  /*
>> - * Kick a CPU to do the nohz balancing, if it is time for it. We pick the
>> - * nohz_load_balancer CPU (if there is one) otherwise fallback to any idle
>> - * CPU (if there is one).
>> + * Kick a CPU to do the nohz balancing, if it is time for it. We pick any
>> + * idle CPU in the HK_FLAG_MISC housekeeping set (if there is one).
>>   */
>>  static void kick_ilb(unsigned int flags)
>>  {
>> -- 
>> 2.20.1
>> 
> 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20190411060536.22409-1-npiggin@gmail.com>
@ 2019-04-11 10:53 ` Peter Zijlstra
  2019-04-12  3:23   ` Nicholas Piggin
  0 siblings, 1 reply; 657+ messages in thread
From: Peter Zijlstra @ 2019-04-11 10:53 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Thomas Gleixner, Frederic Weisbecker, Ingo Molnar, linux-kernel

Was this supposed to be patch 6/5 of your previous series?

On Thu, Apr 11, 2019 at 04:05:36PM +1000, Nicholas Piggin wrote:
> Date: Tue, 9 Apr 2019 20:23:16 +1000
> Subject: [PATCH] kernel/sched: run nohz idle load balancer on HK_FLAG_MISC
>  CPUs
> 
> The nohz idle balancer runs on the lowest idle CPU. This can
> interfere with isolated CPUs, so confine it to HK_FLAG_MISC
> housekeeping CPUs.
> 
> HK_FLAG_SCHED is not used for this because it is not set anywhere
> at the moment. This could be folded into HK_FLAG_SCHED once that
> option is fixed.
> 
> The problem was observed with increased jitter on an application
> running on CPU0, caused by nohz idle load balancing being run on
> CPU1 (an SMT sibling).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  kernel/sched/fair.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fdab7eb6f351..d29ca323214d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9522,22 +9522,26 @@ static inline int on_null_domain(struct rq *rq)
>   * - When one of the busy CPUs notice that there may be an idle rebalancing
>   *   needed, they will kick the idle load balancer, which then does idle
>   *   load balancing for all the idle CPUs.
> + * - HK_FLAG_MISC CPUs are used for this task, because HK_FLAG_SCHED not set
> + *   anywhere yet.
>   */
>  
>  static inline int find_new_ilb(void)
>  {
> -	int ilb = cpumask_first(nohz.idle_cpus_mask);
> +	int ilb;
>  
> -	if (ilb < nr_cpu_ids && idle_cpu(ilb))
> -		return ilb;
> +	for_each_cpu_and(ilb, nohz.idle_cpus_mask,
> +			      housekeeping_cpumask(HK_FLAG_MISC)) {
> +		if (idle_cpu(ilb))
> +			return ilb;
> +	}
>  
>  	return nr_cpu_ids;
>  }
>  
>  /*
> - * Kick a CPU to do the nohz balancing, if it is time for it. We pick the
> - * nohz_load_balancer CPU (if there is one) otherwise fallback to any idle
> - * CPU (if there is one).
> + * Kick a CPU to do the nohz balancing, if it is time for it. We pick any
> + * idle CPU in the HK_FLAG_MISC housekeeping set (if there is one).
>   */
>  static void kick_ilb(unsigned int flags)
>  {
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-03-19 15:22 ` Keith Busch
  2019-03-19 23:49   ` Chaitanya Kulkarni
  2019-03-20 16:30   ` Maxim Levitsky
@ 2019-04-08 10:04   ` Maxim Levitsky
  2 siblings, 0 replies; 657+ messages in thread
From: Maxim Levitsky @ 2019-04-08 10:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
	Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
	linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
	Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
	Liu Changpeng, Paul E . McKenney, Amnon Ilan, Christoph Hellwig,
	John Ferlan

On Tue, 2019-03-19 at 09:22 -0600, Keith Busch wrote:
> On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> >   -> Share the NVMe device between host and guest. 
> >      Even in fully virtualized configurations,
> >      some partitions of nvme device could be used by guests as block
> > devices 
> >      while others passed through with nvme-mdev to achieve balance between
> >      all features of full IO stack emulation and performance.
> >   
> >   -> NVME-MDEV is a bit faster due to the fact that in-kernel driver 
> >      can send interrupts to the guest directly without a context 
> >      switch that can be expensive due to meltdown mitigation.
> > 
> >   -> Is able to utilize interrupts to get reasonable performance. 
> >      This is only implemented
> >      as a proof of concept and not included in the patches, 
> >      but interrupt driven mode shows reasonable performance
> >      
> >   -> This is a framework that later can be used to support NVMe devices 
> >      with more of the IO virtualization built-in 
> >      (IOMMU with PASID support coupled with device that supports it)
> 
> Would be very interested to see the PASID support. You wouldn't even
> need to mediate the IO doorbells or translations if assigning entire
> namespaces, and should be much faster than the shadow doorbells.
> 
> I think you should send 6/9 "nvme/pci: init shadow doorbell after each
> reset" separately for immediate inclusion.
> 
> I like the idea in principle, but it will take me a little time to get
> through reviewing your implementation. I would have guessed we could
> have leveraged something from the existing nvme/target for the mediating
> controller register access and admin commands. Maybe even start with
> implementing an nvme passthrough namespace target type (we currently
> have block and file).


Hi!

Sorry to bother you, but any update?

I was somewhat sick for the last week; now I'm finally back in shape to
continue working on this and the other tasks I have.

I am now studying the nvme target code and io_uring to evaluate the
difficulty of using something similar to talk to the block device instead of,
or in addition to, the direct connection I implemented.

I would be glad to hear more feedback on this project.

I will also soon post the few fixes separately as you suggested.

Best regards,
    Maxim Levitsky





^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20190323171738.GA26736@titus.pi.local>
@ 2019-03-26  8:42 ` Dan Carpenter
  0 siblings, 0 replies; 657+ messages in thread
From: Dan Carpenter @ 2019-03-26  8:42 UTC (permalink / raw)
  To: William J. Cunningham; +Cc: gregkh, devel, linux-kernel

On Sat, Mar 23, 2019 at 01:17:38PM -0400, William J. Cunningham wrote:
> >From bb04b0ca982b7042902fffe1377e0e38e83b402b Mon Sep 17 00:00:00 2001
> From: Will Cunningham <wjcunningham7@gmail.com>
> Date: Sat, 23 Mar 2019 12:54:34 -0400
> Subject: [PATCH] Staging: emxx_udc: emxx_udc: Fixed a coding style error
> 
> Removed unnecessary parentheses.
> 
> Signed-off-by: Will Cunningham <wjcunningham7@gmail.com>

Please fix up the headers and resend.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-03-21 17:07   ` Maxim Levitsky
@ 2019-03-25 16:46     ` Stefan Hajnoczi
  0 siblings, 0 replies; 657+ messages in thread
From: Stefan Hajnoczi @ 2019-03-25 16:46 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: linux-nvme, linux-kernel, kvm, Jens Axboe, Alex Williamson,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Kirti Wankhede,
	David S . Miller, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	Wolfram Sang, Nicolas Ferre, Paul E . McKenney, Paolo Bonzini,
	Liang Cunming, Liu Changpeng, Fam Zheng, Amnon Ilan, John Ferlan

[-- Attachment #1: Type: text/plain, Size: 2913 bytes --]

On Thu, Mar 21, 2019 at 07:07:38PM +0200, Maxim Levitsky wrote:
> On Thu, 2019-03-21 at 16:13 +0000, Stefan Hajnoczi wrote:
> > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> > > Date: Tue, 19 Mar 2019 14:45:45 +0200
> > > Subject: [PATCH 0/9] RFC: NVME VFIO mediated device
> > > 
> > > Hi everyone!
> > > 
> > > In this patch series, I would like to introduce my take on the problem of
> > > doing storage virtualization as fast as possible, with an emphasis on low
> > > latency.
> > > 
> > > In this patch series I implemented a kernel vfio based, mediated device
> > > that 
> > > allows the user to pass through a partition and/or whole namespace to a
> > > guest.
> > > 
> > > The idea behind this driver is based on paper you can find at
> > > https://www.usenix.org/conference/atc18/presentation/peng,
> > > 
> > > Although note that I started the development prior to reading this paper, 
> > > independently.
> > > 
> > > In addition to that, the implementation is not based on code used in the
> > > paper, as I wasn't able at that time to get the source made available to me.
> > > 
> > > ***Key points about the implementation:***
> > > 
> > > * Polling kernel thread is used. The polling is stopped after a 
> > > predefined timeout (1/2 sec by default).
> > > Support for an all-interrupt-driven mode is planned, and it shows promising
> > > results.
> > > 
> > > * Guest sees a standard NVME device - this allows to run guest with 
> > > unmodified drivers, for example windows guests.
> > > 
> > > * The NVMe device is shared between host and guest.
> > > That means that even a single namespace can be split between host 
> > > and guest based on different partitions.
> > > 
> > > * Simple configuration
> > > 
> > > *** Performance ***
> > > 
> > > Performance was tested on Intel DC P3700, With Xeon E5-2620 v2 
> > > and both latency and throughput is very similar to SPDK.
> > > 
> > > Soon I will test this on a better server and nvme device and provide
> > > more formal performance numbers.
> > > 
> > > Latency numbers:
> > > ~80ms - spdk with fio plugin on the host.
> > > ~84ms - nvme driver on the host
> > > ~87ms - mdev-nvme + nvme driver in the guest
> > 
> > You mentioned the spdk numbers are with vhost-user-nvme.  Have you
> > measured SPDK's vhost-user-blk?
> 
> I had a lot of measurements of vhost-user-blk vs vhost-user-nvme.
> vhost-user-nvme was always a bit faster, but only a bit.
> Thus I don't think it makes sense to benchmark against vhost-user-blk.

It's interesting because mdev-nvme is closest to the hardware while
vhost-user-blk is closest to software.  Doing things at the NVMe level
isn't buying much performance because it's still going through a
software path comparable to vhost-user-blk.

From what you say it sounds like there isn't much to optimize away :(.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-03-21 16:13 ` Stefan Hajnoczi
@ 2019-03-21 17:07   ` Maxim Levitsky
  2019-03-25 16:46     ` Stefan Hajnoczi
  0 siblings, 1 reply; 657+ messages in thread
From: Maxim Levitsky @ 2019-03-21 17:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-nvme, linux-kernel, kvm, Jens Axboe, Alex Williamson,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Kirti Wankhede,
	David S . Miller, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	Wolfram Sang, Nicolas Ferre, Paul E . McKenney, Paolo Bonzini,
	Liang Cunming, Liu Changpeng, Fam Zheng, Amnon Ilan, John Ferlan

On Thu, 2019-03-21 at 16:13 +0000, Stefan Hajnoczi wrote:
> On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> > Date: Tue, 19 Mar 2019 14:45:45 +0200
> > Subject: [PATCH 0/9] RFC: NVME VFIO mediated device
> > 
> > Hi everyone!
> > 
> > In this patch series, I would like to introduce my take on the problem of
> > doing storage virtualization as fast as possible, with an emphasis on low
> > latency.
> > 
> > In this patch series I implemented a kernel vfio based, mediated device
> > that 
> > allows the user to pass through a partition and/or whole namespace to a
> > guest.
> > 
> > The idea behind this driver is based on paper you can find at
> > https://www.usenix.org/conference/atc18/presentation/peng,
> > 
> > Although note that I started the development prior to reading this paper, 
> > independently.
> > 
> > In addition to that, the implementation is not based on code used in the
> > paper, as I wasn't able at that time to get the source made available to me.
> > 
> > ***Key points about the implementation:***
> > 
> > * Polling kernel thread is used. The polling is stopped after a 
> > predefined timeout (1/2 sec by default).
> > Support for an all-interrupt-driven mode is planned, and it shows promising
> > results.
> > 
> > * Guest sees a standard NVME device - this allows to run guest with 
> > unmodified drivers, for example windows guests.
> > 
> > * The NVMe device is shared between host and guest.
> > That means that even a single namespace can be split between host 
> > and guest based on different partitions.
> > 
> > * Simple configuration
> > 
> > *** Performance ***
> > 
> > Performance was tested on Intel DC P3700, With Xeon E5-2620 v2 
> > and both latency and throughput is very similar to SPDK.
> > 
> > Soon I will test this on a better server and nvme device and provide
> > more formal performance numbers.
> > 
> > Latency numbers:
> > ~80ms - spdk with fio plugin on the host.
> > ~84ms - nvme driver on the host
> > ~87ms - mdev-nvme + nvme driver in the guest
> 
> You mentioned the spdk numbers are with vhost-user-nvme.  Have you
> measured SPDK's vhost-user-blk?

I had a lot of measurements of vhost-user-blk vs vhost-user-nvme.
vhost-user-nvme was always a bit faster, but only a bit.
Thus I don't think it makes sense to benchmark against vhost-user-blk.

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20190319144116.400-1-mlevitsk@redhat.com>
  2019-03-19 15:22 ` Keith Busch
@ 2019-03-21 16:13 ` Stefan Hajnoczi
  2019-03-21 17:07   ` Maxim Levitsky
  1 sibling, 1 reply; 657+ messages in thread
From: Stefan Hajnoczi @ 2019-03-21 16:13 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: linux-nvme, linux-kernel, kvm, Jens Axboe, Alex Williamson,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Kirti Wankhede,
	David S . Miller, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	Wolfram Sang, Nicolas Ferre, Paul E . McKenney ,
	Paolo Bonzini, Liang Cunming, Liu Changpeng, Fam Zheng,
	Amnon Ilan, John Ferlan

[-- Attachment #1: Type: text/plain, Size: 2018 bytes --]

On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> Date: Tue, 19 Mar 2019 14:45:45 +0200
> Subject: [PATCH 0/9] RFC: NVME VFIO mediated device
> 
> Hi everyone!
> 
> In this patch series, I would like to introduce my take on the problem of doing
> storage virtualization as fast as possible, with an emphasis on low latency.
> 
> In this patch series I implemented a kernel vfio based, mediated device that 
> allows the user to pass through a partition and/or whole namespace to a guest.
> 
> The idea behind this driver is based on paper you can find at
> https://www.usenix.org/conference/atc18/presentation/peng,
> 
> Although note that I started the development prior to reading this paper, 
> independently.
> 
> In addition to that, the implementation is not based on code used in the paper,
> as I wasn't able at that time to get the source made available to me.
> 
> ***Key points about the implementation:***
> 
> * Polling kernel thread is used. The polling is stopped after a 
> predefined timeout (1/2 sec by default).
> Support for an all-interrupt-driven mode is planned, and it shows promising results.
> 
> * Guest sees a standard NVME device - this allows to run guest with 
> unmodified drivers, for example windows guests.
> 
> * The NVMe device is shared between host and guest.
> That means that even a single namespace can be split between host 
> and guest based on different partitions.
> 
> * Simple configuration
> 
> *** Performance ***
> 
> Performance was tested on Intel DC P3700, With Xeon E5-2620 v2 
> and both latency and throughput is very similar to SPDK.
> 
> Soon I will test this on a better server and nvme device and provide
> more formal performance numbers.
> 
> Latency numbers:
> ~80ms - spdk with fio plugin on the host.
> ~84ms - nvme driver on the host
> ~87ms - mdev-nvme + nvme driver in the guest

You mentioned the spdk numbers are with vhost-user-nvme.  Have you
measured SPDK's vhost-user-blk?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-03-20 17:03     ` Keith Busch
@ 2019-03-20 17:33       ` Maxim Levitsky
  0 siblings, 0 replies; 657+ messages in thread
From: Maxim Levitsky @ 2019-03-20 17:33 UTC (permalink / raw)
  To: Keith Busch
  Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
	Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
	linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
	Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
	Liu Changpeng, Paul E . McKenney, Amnon Ilan, Christoph Hellwig,
	John Ferlan

On Wed, 2019-03-20 at 11:03 -0600, Keith Busch wrote:
> On Wed, Mar 20, 2019 at 06:30:29PM +0200, Maxim Levitsky wrote:
> > Or instead I can use the block backend, 
> > (but note that currently the block back-end doesn't support polling which is
> > critical for the performance).
> 
> Oh, I think you can do polling through there. For reference, fs/io_uring.c
> has a pretty good implementation that aligns with how you could use it.


That is exactly my thought. The polling recently got a lot of improvements in
the block layer, which might make this feasible.

I will give it a try.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-03-20 16:30   ` Maxim Levitsky
@ 2019-03-20 17:03     ` Keith Busch
  2019-03-20 17:33       ` Maxim Levitsky
  0 siblings, 1 reply; 657+ messages in thread
From: Keith Busch @ 2019-03-20 17:03 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
	Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
	linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
	Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
	Liu Changpeng, Paul E . McKenney, Amnon Ilan, Christoph Hellwig,
	John Ferlan

On Wed, Mar 20, 2019 at 06:30:29PM +0200, Maxim Levitsky wrote:
> Or instead I can use the block backend, 
> (but note that currently the block back-end doesn't support polling which is
> critical for the performance).

Oh, I think you can do polling through there. For reference, fs/io_uring.c
has a pretty good implementation that aligns with how you could use it.
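
For readers unfamiliar with the block-layer polling interface being referred
to here, a minimal sketch of a polled synchronous read, assuming roughly the
v5.0-era block layer (the function and its surrounding driver context are
illustrative assumptions, not code from the patches under discussion, and the
interface details differ between kernel versions):

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>

static void polled_read_end_io(struct bio *bio)
{
	complete(bio->bi_private);		/* signal the submitting thread */
}

static int read_one_page_polled(struct block_device *bdev, struct page *page,
				sector_t sector)
{
	DECLARE_COMPLETION_ONSTACK(done);
	struct bio_vec bvec;
	struct bio bio;
	blk_qc_t cookie;

	bio_init(&bio, &bvec, 1);
	bio_set_dev(&bio, bdev);
	bio.bi_iter.bi_sector = sector;
	bio.bi_opf = REQ_OP_READ | REQ_HIPRI;	/* ask for a polled completion */
	bio.bi_end_io = polled_read_end_io;
	bio.bi_private = &done;
	bio_add_page(&bio, page, PAGE_SIZE, 0);

	cookie = submit_bio(&bio);

	/* Busy-poll the hardware queue instead of sleeping on an interrupt. */
	while (!completion_done(&done))
		blk_poll(bdev_get_queue(bdev), cookie, true);

	return blk_status_to_errno(bio.bi_status);
}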

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-03-19 23:49   ` Chaitanya Kulkarni
@ 2019-03-20 16:44     ` Maxim Levitsky
  0 siblings, 0 replies; 657+ messages in thread
From: Maxim Levitsky @ 2019-03-20 16:44 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Keith Busch
  Cc: Fam Zheng, Jens Axboe, Sagi Grimberg, kvm, Wolfram Sang,
	Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-nvme,
	linux-kernel, Keith Busch, Alex Williamson, Christoph Hellwig,
	Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
	Liu Changpeng, Paul E . McKenney, Amnon Ilan, David S . Miller,
	John Ferlan

On Tue, 2019-03-19 at 23:49 +0000, Chaitanya Kulkarni wrote:
> Hi Keith,
> On 03/19/2019 08:21 AM, Keith Busch wrote:
> > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> > >    -> Share the NVMe device between host and guest.
> > >       Even in fully virtualized configurations,
> > >       some partitions of nvme device could be used by guests as block
> > > devices
> > >       while others passed through with nvme-mdev to achieve balance
> > > between
> > >       all features of full IO stack emulation and performance.
> > > 
> > >    -> NVME-MDEV is a bit faster due to the fact that in-kernel driver
> > >       can send interrupts to the guest directly without a context
> > >       switch that can be expensive due to meltdown mitigation.
> > > 
> > >    -> Is able to utilize interrupts to get reasonable performance.
> > >       This is only implemented
> > >       as a proof of concept and not included in the patches,
> > >       but interrupt driven mode shows reasonable performance
> > > 
> > >    -> This is a framework that later can be used to support NVMe devices
> > >       with more of the IO virtualization built-in
> > >       (IOMMU with PASID support coupled with device that supports it)
> > 
> > Would be very interested to see the PASID support. You wouldn't even
> > need to mediate the IO doorbells or translations if assigning entire
> > namespaces, and should be much faster than the shadow doorbells.
> > 
> > I think you should send 6/9 "nvme/pci: init shadow doorbell after each
> > reset" separately for immediate inclusion.
> > 
> > I like the idea in principle, but it will take me a little time to get
> > through reviewing your implementation. I would have guessed we could
> > have leveraged something from the existing nvme/target for the mediating
> > controller register access and admin commands. Maybe even start with
> > implementing an nvme passthrough namespace target type (we currently
> > have block and file).
> 
> I have the code for the NVMeOf target passthru-ctrl, I think we can use 
> that as it is if you are looking for the passthru for NVMeOF.
> 
> I'll post patch-series based on the latest code base soon.

I am very interested in this code.
Could you explain how your NVMeOF target passthrough works? 
Which components of the NVME stack does it involve?

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-03-19 15:22 ` Keith Busch
  2019-03-19 23:49   ` Chaitanya Kulkarni
@ 2019-03-20 16:30   ` Maxim Levitsky
  2019-03-20 17:03     ` Keith Busch
  2019-04-08 10:04   ` Maxim Levitsky
  2 siblings, 1 reply; 657+ messages in thread
From: Maxim Levitsky @ 2019-03-20 16:30 UTC (permalink / raw)
  To: Keith Busch
  Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
	Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
	linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
	Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
	Liu Changpeng, Paul E . McKenney, Amnon Ilan, Christoph Hellwig,
	John Ferlan

On Tue, 2019-03-19 at 09:22 -0600, Keith Busch wrote:
> On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> >   -> Share the NVMe device between host and guest. 
> >      Even in fully virtualized configurations,
> >      some partitions of nvme device could be used by guests as block
> > devices 
> >      while others passed through with nvme-mdev to achieve balance between
> >      all features of full IO stack emulation and performance.
> >   
> >   -> NVME-MDEV is a bit faster due to the fact that in-kernel driver 
> >      can send interrupts to the guest directly without a context 
> >      switch that can be expensive due to meltdown mitigation.
> > 
> >   -> Is able to utilize interrupts to get reasonable performance. 
> >      This is only implemented
> >      as a proof of concept and not included in the patches, 
> >      but interrupt driven mode shows reasonable performance
> >      
> >   -> This is a framework that later can be used to support NVMe devices 
> >      with more of the IO virtualization built-in 
> >      (IOMMU with PASID support coupled with device that supports it)
> 

> Would be very interested to see the PASID support. You wouldn't even
> need to mediate the IO doorbells or translations if assigning entire
> namespaces, and should be much faster than the shadow doorbells.

I fully agree with that.
Note that to enable PASID support, two things have to happen on the vendor side.

1. Mature support for IOMMU with PASID. On the Intel side I know that they only
have a spec released, and the kernel bits to support it are currently being put
in place.
I still don't know when a product actually supporting this spec is going to be
released. For other vendors (ARM/AMD/...) I haven't yet researched the state of
PASID-based IOMMU support on their platforms.

2. The NVMe spec has to be extended to support PASID. At minimum, we need the
ability to assign a PASID to an sq/cq queue pair and the ability to relocate the
doorbells, such that each guest would get its own (hardware backed) MMIO page
with its own doorbells. Plus of course the hardware vendors have to embrace the
spec. I guess these two things will happen in a collaborative manner.


> 
> I think you should send 6/9 "nvme/pci: init shadow doorbell after each
> reset" separately for immediate inclusion.
I'll do this soon. 

Also '5/9 nvme/pci: add known admin effects to augment admin effects log page'
can be considered for immediate inclusion as well, as it works around a flaw in
NVMe controllers with a badly done admin side-effects log page, with no side
effects (pun intended) for spec-compliant controllers (I think). 

This can be fixed with a quirk if you prefer though.

> 
> I like the idea in principle, but it will take me a little time to get
> through reviewing your implementation. I would have guessed we could
> have leveraged something from the existing nvme/target for the mediating
> controller register access and admin commands. Maybe even start with
> implementing an nvme passthrough namespace target type (we currently
> have block and file).

I fully agree that I could have used some of the nvme/target code,
and I am planning to do so eventually.

For that I would need to make my driver one of the target drivers, and I
would need to add another target back end, like you said, to allow my target
driver to talk directly to the nvme hardware, bypassing the block layer.

Or instead I can use the block backend, 
(but note that currently the block back-end doesn't support polling which is
critical for the performance).

Switching to the target code might have some (probably minor) performance
impact, though, as it would probably lengthen the critical code path a bit (I
might, for instance, need to translate the PRP lists I am getting from the
virtual controller to a scatter-gather list and back).

This is why I did it the way I did, but knowing now that I can probably afford
to lose a bit of performance, I can look at doing that.

Best regards,
Thanks in advance for the review,
	Maxim Levitsky

PS:

For reference, currently the IO path looks more or less like this:

My IO thread notices a doorbell write, reads a command from a submission queue,
translates it (without even looking at the data pointer) and sends it to the
nvme pci driver together with a pointer to a data iterator.

The nvme pci driver calls the data iterator N times, which makes the iterator
translate and fetch the DMA addresses where the data is already mapped on its
pci nvme device (the mdev driver maps all the guest memory to the nvme pci
device).
The nvme pci driver uses the addresses it receives to create a prp list,
which it puts into the data pointer.

The nvme pci driver also allocates a free command id from a list, puts it into
the command ID field and sends the command to the real hardware.

Later, the IO thread calls into the nvme pci driver to poll the queue. When
completions arrive, the nvme pci driver returns them to the IO thread.
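
A rough C-style sketch of the path just described; every identifier below
(mdev_vq, mdev_data_iter, host_nvme_*) is invented for illustration and is not
the actual nvme-mdev code:

static void mdev_io_thread_poll_once(struct mdev_vq *vq)
{
	struct nvme_command cmd;
	struct mdev_data_iter iter;

	/* A doorbell write was noticed: fetch and translate guest SQ entries. */
	while (mdev_vq_pop_sqe(vq, &cmd)) {
		/* The data pointer is not touched at this stage. */
		mdev_iter_init(&iter, vq, &cmd);

		/*
		 * The host nvme pci side walks the iterator to get the DMA
		 * addresses (guest memory is already mapped to the physical
		 * NVMe device), builds a PRP list, takes a free command id
		 * and submits the command to the real hardware queue.
		 */
		host_nvme_submit(vq->host_queue, &cmd, &iter);
	}

	/* Poll the host queue and forward completions back to the guest CQ. */
	host_nvme_poll(vq->host_queue, mdev_vq_post_cqe, vq);
}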




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20190319022012.11051-1-thirtythreeforty@gmail.com>
@ 2019-03-20  7:26 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 657+ messages in thread
From: Greg Kroah-Hartman @ 2019-03-20  7:26 UTC (permalink / raw)
  To: George Hilliard; +Cc: linux-mips, linux-kernel

On Mon, Mar 18, 2019 at 08:20:01PM -0600, George Hilliard wrote:
> Because of this change, the driver now expects a pinctrl device
> reference in the mmc controller's device tree node; without it, it will
> bail out.  This could break existing setups that don't specify it
> because it "just worked" up until now.  So currently I just let the old
> behavior fall away because this is a staging driver.  But if this is a
> problem, the old behavior could be added back as a fallback.
> 
> This is version 2 of a patchset that I requested feedback for about a
> month ago.  Please review as if they are a new patchset; all the patches
> were rebased several times and a couple new correctness fixes added.
> 
> The TODO list is largely unchanged, aside from the couple of TODO
> comments in the code that I have addressed.  Ultimately, I think this
> driver could potentially be merged with the "real" mtk-mmc driver as the
> TODO suggests, but someone who is more familiar with the IP core will
> have to do that.  Mediatek documentation (that I can find) is very
> sparse.
> 
> This is ready to merge if there is no other feedback!
> 
> >From George Hilliard <thirtythreeforty@gmail.com> # This line is ignored.
> From: George Hilliard <thirtythreeforty@gmail.com>
> Reply-To: 
> Subject: [PATCH v2 00/11] mt7621-mmc: Various correctness fixes
> In-Reply-To: 
> 
> 

No subject for this email?


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2019-03-19 15:22 ` Keith Busch
@ 2019-03-19 23:49   ` Chaitanya Kulkarni
  2019-03-20 16:44     ` Maxim Levitsky
  2019-03-20 16:30   ` Maxim Levitsky
  2019-04-08 10:04   ` Maxim Levitsky
  2 siblings, 1 reply; 657+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-19 23:49 UTC (permalink / raw)
  To: Keith Busch, Maxim Levitsky
  Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
	Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
	linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
	Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
	Liu Changpeng, Paul E . McKenney ,
	Amnon Ilan, Christoph Hellwig, John Ferlan

Hi Keith,
On 03/19/2019 08:21 AM, Keith Busch wrote:
> On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
>>    -> Share the NVMe device between host and guest.
>>       Even in fully virtualized configurations,
>>       some partitions of nvme device could be used by guests as block devices
>>       while others passed through with nvme-mdev to achieve balance between
>>       all features of full IO stack emulation and performance.
>>
>>    -> NVME-MDEV is a bit faster due to the fact that in-kernel driver
>>       can send interrupts to the guest directly without a context
>>       switch that can be expensive due to meltdown mitigation.
>>
>>    -> Is able to utilize interrupts to get reasonable performance.
>>       This is only implemented
>>       as a proof of concept and not included in the patches,
>>       but interrupt driven mode shows reasonable performance
>>
>>    -> This is a framework that later can be used to support NVMe devices
>>       with more of the IO virtualization built-in
>>       (IOMMU with PASID support coupled with device that supports it)
>
> Would be very interested to see the PASID support. You wouldn't even
> need to mediate the IO doorbells or translations if assigning entire
> namespaces, and should be much faster than the shadow doorbells.
>
> I think you should send 6/9 "nvme/pci: init shadow doorbell after each
> reset" separately for immediate inclusion.
>
> I like the idea in principle, but it will take me a little time to get
> through reviewing your implementation. I would have guessed we could
> have leveraged something from the existing nvme/target for the mediating
> controller register access and admin commands. Maybe even start with
> implementing an nvme passthrough namespace target type (we currently
> have block and file).

I have the code for the NVMeOf target passthru-ctrl, I think we can use 
that as it is if you are looking for the passthru for NVMeOF.

I'll post patch-series based on the latest code base soon.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20190319144116.400-1-mlevitsk@redhat.com>
@ 2019-03-19 15:22 ` Keith Busch
  2019-03-19 23:49   ` Chaitanya Kulkarni
                     ` (2 more replies)
  2019-03-21 16:13 ` Stefan Hajnoczi
  1 sibling, 3 replies; 657+ messages in thread
From: Keith Busch @ 2019-03-19 15:22 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: linux-nvme, linux-kernel, kvm, Jens Axboe, Alex Williamson,
	Keith Busch, Christoph Hellwig, Sagi Grimberg, Kirti Wankhede,
	David S . Miller, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	Wolfram Sang, Nicolas Ferre, Paul E . McKenney ,
	Paolo Bonzini, Liang Cunming, Liu Changpeng, Fam Zheng,
	Amnon Ilan, John Ferlan

On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
>   -> Share the NVMe device between host and guest. 
>      Even in fully virtualized configurations,
>      some partitions of nvme device could be used by guests as block devices 
>      while others passed through with nvme-mdev to achieve balance between
>      all features of full IO stack emulation and performance.
>   
>   -> NVME-MDEV is a bit faster due to the fact that in-kernel driver 
>      can send interrupts to the guest directly without a context 
>      switch that can be expensive due to meltdown mitigation.
> 
>   -> Is able to utilize interrupts to get reasonable performance. 
>      This is only implemented
>      as a proof of concept and not included in the patches, 
>      but interrupt driven mode shows reasonable performance
>      
>   -> This is a framework that later can be used to support NVMe devices 
>      with more of the IO virtualization built-in 
>      (IOMMU with PASID support coupled with device that supports it)

Would be very interested to see the PASID support. You wouldn't even
need to mediate the IO doorbells or translations if assigning entire
namespaces, and should be much faster than the shadow doorbells.

I think you should send 6/9 "nvme/pci: init shadow doorbell after each
reset" separately for immediate inclusion.

I like the idea in principle, but it will take me a little time to get
through reviewing your implementation. I would have guessed we could
have leveraged something from the existing nvme/target for the mediating
controller register access and admin commands. Maybe even start with
implementing an nvme passthrough namespace target type (we currently
have block and file).

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20190225201635.4648-1-hannes@cmpxchg.org>
@ 2019-02-26 23:49 ` Roman Gushchin
  0 siblings, 0 replies; 657+ messages in thread
From: Roman Gushchin @ 2019-02-26 23:49 UTC (permalink / raw)
  To: up, the, LRU, counts, tracking
  Cc: Andrew Morton, Tejun Heo, linux-mm, cgroups, linux-kernel, Kernel Team

On Mon, Feb 25, 2019 at 03:16:29PM -0500, Johannes Weiner wrote:
> [resending, rebased on top of latest mmots]
> 
> The memcg LRU stats usage is currently a bit messy. Memcg has private
> per-zone counters because reclaim needs zone granularity sometimes,
> but we also have plenty of users that need to awkwardly sum them up to
> node or memcg granularity. Meanwhile the canonical per-memcg vmstats
> do not track the LRU counts (NR_INACTIVE_ANON etc.) as you'd expect.
> 
> This series enables LRU count tracking in the per-memcg vmstats array
> such that lruvec_page_state() and memcg_page_state() work on the enum
> node_stat_item items for the LRU counters. Then it converts all the
> callers that don't specifically need per-zone numbers over to that.

The updated version looks very good* to me!
Please, feel free to use:
Reviewed-by: Roman Gushchin <guro@fb.com>

Looking through the patchset, I have a feeling that we're sometimes
gathering too much data. Perhaps we don't need the whole set
of counters to be per-cpu on both memcg- and memcg-per-node levels.
Merging them can save quite a lot of space. Anyway, it's a separate
topic.

* except "to" and "subject" of the cover letter

Thanks!

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20180827145032.9522-1-hch@lst.de>
@ 2018-08-31 20:23 ` Paul Burton
  0 siblings, 0 replies; 657+ messages in thread
From: Paul Burton @ 2018-08-31 20:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: iommu, Marek Szyprowski, Robin Murphy, Greg Kroah-Hartman,
	linux-mips, linux-kernel

Hi Christoph,

On Mon, Aug 27, 2018 at 04:50:27PM +0200, Christoph Hellwig wrote:
> Subject: [RFC] merge dma_direct_ops and dma_noncoherent_ops
> 
> While most architectures are either always or never dma coherent for a
> given build, the arm, arm64, mips and soon arc architectures can have
> different dma coherent settings on a per-device basis.  Additionally
> some mips builds can decide at boot time if dma is coherent or not.
> 
> I've started to look into handling noncoherent dma in swiotlb, and
> moving the dma-iommu ops into common code [1], and for that we need a
> generic way to check if a given device is coherent or not.  Moving
> this flag into struct device also simplifies the conditionally coherent
> architecture implementations.
> 
> These patches are also available in a git tree given that they have
> a few previous posted dependencies:
> 
>     git://git.infradead.org/users/hch/misc.git dma-direct-noncoherent-merge
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-noncoherent-merge

Apart from the nits in patch 2, these look sane to me from a MIPS
perspective, so for patches 1-4:

    Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts

Thanks,
    Paul
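
A minimal sketch of the per-device coherency flag idea described in the quoted
cover letter, assuming a boolean field in struct device and a single accessor
(the field, helper and arch-hook names here are assumptions for illustration,
not necessarily the ones the series defines):

#include <linux/device.h>

static inline bool dev_is_dma_coherent(struct device *dev)
{
	return dev->dma_coherent;	/* per-device flag set by arch/bus code */
}

static void sync_buffer_for_cpu(struct device *dev, void *vaddr, size_t size)
{
	if (dev_is_dma_coherent(dev))
		return;					/* coherent: nothing to do */
	arch_dma_cache_invalidate(vaddr, size);		/* hypothetical arch hook */
}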

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20180724222212.8742-1-tsotsos@gmail.com>
@ 2018-07-25  7:39 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 657+ messages in thread
From: Greg Kroah-Hartman @ 2018-07-25  7:39 UTC (permalink / raw)
  To: Georgios Tsotsos; +Cc: devel, James Hogan, linux-kernel, Aaro Koskinen

On Wed, Jul 25, 2018 at 01:22:07AM +0300, Georgios Tsotsos wrote:
> Date: Wed, 25 Jul 2018 01:18:58 +0300
> Subject: [PATCH 0/4] Staging: octeon-usb: Fixes and Coding style applied. 
> 
> Hello, 

Somehow your subject here got messed up and put in the body of the email.
Not a big deal this time, but be more careful next time please.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <2018071901551081442221@163.com>
@ 2018-07-18 20:04 ` Johan Hovold
  0 siblings, 0 replies; 657+ messages in thread
From: Johan Hovold @ 2018-07-18 20:04 UTC (permalink / raw)
  To: m13297920107; +Cc: johan, gregkh, linux-usb, linux-kernel, moviesong, billli

On Thu, Jul 19, 2018 at 01:55:12AM +0800, m13297920107@163.com wrote:
> From 14bd57ea5c5fc385bd36b5a3ea5c805337bbc8db Mon Sep 17 00:00:00 2001
> From: Movie Song <MovieSong@aten-itlab.cn>
> Date: Thu, 19 Jul 2018 02:20:48 +0800
> Subject: [PATCH] USB:serial:pl2303:add a new device id for ATEN

Add spaces after the colons (':') in the Subject above, and place a
short commit message here before your SoB.

> Signed-off-by:MovieSong<MovieSong@aten-itlab.cn>

Missing spaces in your SoB as well.

> ---
>  drivers/usb/serial/pl2303.c | 2 ++
>  drivers/usb/serial/pl2303.h | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
> index 5d1a193..e41f725 100644
> --- a/drivers/usb/serial/pl2303.c
> +++ b/drivers/usb/serial/pl2303.c
> @@ -52,6 +52,8 @@
>   .driver_info = PL2303_QUIRK_ENDPOINT_HACK },
>   { USB_DEVICE(ATEN_VENDOR_ID, ATEN_PRODUCT_UC485),
>   .driver_info = PL2303_QUIRK_ENDPOINT_HACK },
> + { USB_DEVICE(ATEN_VENDOR_ID, ATEN_PRODUCT_UC232B),
> + .driver_info = PL2303_QUIRK_ENDPOINT_HACK },
>   { USB_DEVICE(ATEN_VENDOR_ID, ATEN_PRODUCT_ID2) },
>   { USB_DEVICE(ATEN_VENDOR_ID2, ATEN_PRODUCT_ID) },
>   { USB_DEVICE(ELCOM_VENDOR_ID, ELCOM_PRODUCT_ID) },
> diff --git a/drivers/usb/serial/pl2303.h b/drivers/usb/serial/pl2303.h
> index fcd7239..26965cc 100644
> --- a/drivers/usb/serial/pl2303.h
> +++ b/drivers/usb/serial/pl2303.h
> @@ -24,6 +24,7 @@
>  #define ATEN_VENDOR_ID2 0x0547
>  #define ATEN_PRODUCT_ID 0x2008
>  #define ATEN_PRODUCT_UC485 0x2021
> +#define ATEN_PRODUCT_UC232B 0x2022
>  #define ATEN_PRODUCT_ID2 0x2118
>  
>  #define IODATA_VENDOR_ID 0x04bb

As I suggested earlier, try sending the patch to yourself first and run
scripts/checkpatch.pl on it. The patch is still whitespace corrupted
(probably by your mail client) as checkpatch would have let you know:

WARNING: Use a single space after Signed-off-by:
#13: 
Signed-off-by:MovieSong<MovieSong@aten-itlab.cn>

WARNING: email address 'MovieSong<MovieSong@aten-itlab.cn>' might be better as 'MovieSong <MovieSong@aten-itlab.cn>'
#13: 
Signed-off-by:MovieSong<MovieSong@aten-itlab.cn>

WARNING: please, no spaces at the start of a line
#27: FILE: drivers/usb/serial/pl2303.c:55:
+ { USB_DEVICE(ATEN_VENDOR_ID, ATEN_PRODUCT_UC232B),$

WARNING: please, no spaces at the start of a line
#28: FILE: drivers/usb/serial/pl2303.c:56:
+ .driver_info = PL2303_QUIRK_ENDPOINT_HACK },$

total: 1 errors, 4 warnings, 15 lines checked


git-send-email is convenient for sending patches (e.g. generated with
git-format-patch). Perhaps you can set that up.

One more try?

Thanks,
Johan

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <201807160555.w6G5t9Dc075492@mse.aten.com.tw>
@ 2018-07-16 10:03 ` Johan Hovold
  0 siblings, 0 replies; 657+ messages in thread
From: Johan Hovold @ 2018-07-16 10:03 UTC (permalink / raw)
  To: moviesong; +Cc: johan, gregkh, linux-usb, linux-kernel, YorkDai, BillLi

On Mon, Jul 16, 2018 at 09:46:05AM +0800, MovieSong wrote:
> From cff42ec450bdd1fb44dd80564cb622660a9a8071 Mon Sep 17 00:00:00 2001
> From: Movie Song <MovieSong@aten-itlab.cn>
> Date: Fri, 13 Jul 2018 17:46:19 +0800
> Subject: [PATCH] This add a new device for ATEN
> 
> Signed-off-by: Movie Song <MovieSong@aten-itlab.cn>

First, your mail still has the legal disclaimer footer which prevents us
from using this patch.

Second, the patch is now inline, but it's unfortunately white-space
damaged (tabs replaced with spaces).

Take a look at

	https://marc.info/?l=linux-usb&m=150576193231309

for an example of what the subject and commit message should look like.

Send it to yourself first and make sure it has no legal disclaimer
footers, and that you can apply it using git-am.

> ---
>  drivers/usb/serial/pl2303.c | 2 ++
>  drivers/usb/serial/pl2303.h | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
> index 5d1a193..99f7e1f 100644
> --- a/drivers/usb/serial/pl2303.c
> +++ b/drivers/usb/serial/pl2303.c
> @@ -52,6 +52,8 @@
>   .driver_info = PL2303_QUIRK_ENDPOINT_HACK },
>   { USB_DEVICE(ATEN_VENDOR_ID, ATEN_PRODUCT_UC485),
>   .driver_info = PL2303_QUIRK_ENDPOINT_HACK },
> + { USB_DEVICE(ATEN_VENDOR_ID, ATEN_PRODUCT_UC485),
> + .driver_info = PL2303_QUIRK_ENDPOINT_HACK },

And here you add a duplicate entry instead of the one based on the new
id you add.

>   { USB_DEVICE(ATEN_VENDOR_ID, ATEN_PRODUCT_ID2) },
>   { USB_DEVICE(ATEN_VENDOR_ID2, ATEN_PRODUCT_ID) },
>   { USB_DEVICE(ELCOM_VENDOR_ID, ELCOM_PRODUCT_ID) },
> diff --git a/drivers/usb/serial/pl2303.h b/drivers/usb/serial/pl2303.h
> index fcd7239..26965cc 100644
> --- a/drivers/usb/serial/pl2303.h
> +++ b/drivers/usb/serial/pl2303.h
> @@ -24,6 +24,7 @@
>  #define ATEN_VENDOR_ID2 0x0547
>  #define ATEN_PRODUCT_ID 0x2008
>  #define ATEN_PRODUCT_UC485 0x2021
> +#define ATEN_PRODUCT_UC232B 0x2022
>  #define ATEN_PRODUCT_ID2 0x2118
> 
>  #define IODATA_VENDOR_ID 0x04bb

Thanks,
Johan

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] ` <20180613173128.32384-1-vasilyev@ispras.ru>
@ 2018-06-19  7:42   ` Dan Carpenter
  0 siblings, 0 replies; 657+ messages in thread
From: Dan Carpenter @ 2018-06-19  7:42 UTC (permalink / raw)
  To: Anton Vasilyev
  Cc: Andy Shevchenko, devel, ldv-project, Johannes Thumshirn,
	linux-kernel, Sinan Kaya, Hannes Reinecke, Gaurav Pathak

Thanks for this.  This is a lot of work.

On Wed, Jun 13, 2018 at 08:31:28PM +0300, Anton Vasilyev wrote:
> diff --git a/drivers/staging/rts5208/rtsx.c b/drivers/staging/rts5208/rtsx.c
> index 70e0b8623110..69e6abe14abf 100644
> --- a/drivers/staging/rts5208/rtsx.c
> +++ b/drivers/staging/rts5208/rtsx.c
> @@ -857,7 +857,7 @@ static int rtsx_probe(struct pci_dev *pci,
>  	dev->chip = kzalloc(sizeof(*dev->chip), GFP_KERNEL);
>  	if (!dev->chip) {
>  		err = -ENOMEM;
> -		goto errout;
> +		goto chip_alloc_fail;

The most recent successful allocation is scsi_host_alloc().  I was
really hoping this would say something like "goto err_free_host;" or
something.  The naming style here is a "come from" label which doesn't
say if it's going to free the scsi host or not...  It turns out we don't
free the host, but we should:

err_put_host:
	scsi_host_put(host);

The kzalloc() has its own error message built in, and all the other
error paths do as well, so the dev_err() is not super important to me...

Killing the threads seems actually really complicated, so maybe we should
just have a separate error path for that.  I'm not sure...
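
A minimal sketch of that labeling convention, with all names invented for
illustration (this is not the actual rtsx_probe() code): the error label is
named after what it undoes, so the reader can see at a glance that the scsi
host gets put on the failure path.

#include <linux/pci.h>
#include <linux/slab.h>
#include <scsi/scsi_host.h>

struct example_chip { int dummy; };			/* stand-in for the chip struct */
struct example_dev { struct example_chip *chip; };	/* stand-in for the dev struct */

static struct scsi_host_template example_host_template;

static int example_probe(struct pci_dev *pci, const struct pci_device_id *id)
{
	struct Scsi_Host *host;
	struct example_dev *dev;
	int err;

	host = scsi_host_alloc(&example_host_template, sizeof(*dev));
	if (!host)
		return -ENOMEM;

	dev = shost_priv(host);
	dev->chip = kzalloc(sizeof(*dev->chip), GFP_KERNEL);
	if (!dev->chip) {
		err = -ENOMEM;
		goto err_put_host;	/* label names what gets undone */
	}

	return 0;

err_put_host:
	scsi_host_put(host);
	return err;
}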

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-12-07  9:26 Alexander Kappner
@ 2017-12-07 10:38 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 657+ messages in thread
From: Greg Kroah-Hartman @ 2017-12-07 10:38 UTC (permalink / raw)
  To: Alexander Kappner; +Cc: mathias.nyman, linux-usb, linux-kernel

On Thu, Dec 07, 2017 at 01:26:14AM -0800, Alexander Kappner wrote:
> Date: Wed, 6 Dec 2017 15:28:37 -0800
> Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

Something went wrong here, resulting in an email with no subject.

Can you fix this up and resend?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-08-18 17:42 Rajneesh Bhardwaj
@ 2017-08-18 17:53 ` Rajneesh Bhardwaj
  0 siblings, 0 replies; 657+ messages in thread
From: Rajneesh Bhardwaj @ 2017-08-18 17:53 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Peter Zijlstra (Intel),
	Platform Driver, dvhart, Andy Shevchenko, linux-kernel,
	Vishwanath Somayaji, dbasehore, rjw, rajatja

On Fri, Aug 18, 2017 at 11:12:14PM +0530, Rajneesh Bhardwaj wrote:
> Bcc: 
> Subject: Re: [PATCH] platform/x86: intel_pmc_core: Add Package C-states
>  residency info
> Reply-To: 
> In-Reply-To: <CAHp75Vd5Wnio-RCEBENtonYWOJF2+88FDvqkUv1HzV3CdcaaPA@mail.gmail.com>
>

Please ignore my previous email without subject. It was sent by mistake.

> On Fri, Aug 18, 2017 at 08:17:32PM +0300, Andy Shevchenko wrote:
> > +PeterZ (since I mentioned his name)
> > 
> > On Fri, Aug 18, 2017 at 5:58 PM, Rajneesh Bhardwaj
> > <rajneesh.bhardwaj@intel.com> wrote:
> > > On Fri, Aug 18, 2017 at 03:57:34PM +0300, Andy Shevchenko wrote:
> > >> On Fri, Aug 18, 2017 at 3:37 PM, Rajneesh Bhardwaj
> > >> <rajneesh.bhardwaj@intel.com> wrote:
> > >> > This patch introduces a new debugfs entry to read current Package C-state
> > >> > residency values and, one new kernel API to read the Package C-10 residency
> > >> > counter.
> > >> >
> > >> > Package C-state residency MSRs provide useful debug information about system
> > >> > idle states. In idle states system must enter deeper Package C-states.
> > 
> > >> Why this patch is needed?
> > >
> > > Andy, I'll try to give some background for this.
> > >
> > > This is needed to enhance the S0ix failure debug capabilities from within
> > > the kernel. On ChromeOS we have S0ix failsafe kernel framework that is used
> > > to validate S0ix and report the blockers in case of a failure.
> > > https://patchwork.kernel.org/patch/9148999/
> > 
> > (It's not part of upstream)
> 
> Sorry, I sent an older link. There are fresh attempts to get this into the
> mainline kernel and it looks like there is traction for it.
> https://patchwork.kernel.org/patch/9831229/
> 
> Package C-state (PC10) validation is discussed there.
> 
> > 
> > > So far only intel_pmc_slp_s0_counter_read is called by this framework to
> > > check whether the previous attempt to enter S0ix was successful or not.
> > 
> > I hardly see even a single user of that API in the current kernel. It
> > should be unexported and removed, I think.
> > 
> > >  Having
> > > another PC10 counter related exported function enhances the S0ix debug since
> > > PC10 state is a prerequisite to enter S0ix.
> > >
> > >> See, we have turbostat and cpupower user space tools which do this
> > >> without any additional code to be written in kernel. What prevents
> > >> your user space application do the same?
> > >>
> > >> Moreover, we have events for cstate, I assume perf or something alike
> > >> can monitor those counters as well.
> > >
> > > You're right, perhaps the debugfs is redundant when we have those user space
> > > tools, but such tools are not readily available for all platforms/distros.
> > > Interfaces like /dev/cpu/*/msr that turbostat uses are not available on all
> > > the platforms.
> > > The PMC driver is a debug driver, so I thought it's better to show Package
> > > C-state related info for low-power debug here.
> > >
> > >>
> > >> Sorry, NAK.
> > >
> > > This patch has two parts, i.e. the exported PC10 API and the debugfs. Based on
> > > the above explanation, if the patch is not good as is, please let me know if
> > > I should drop the debugfs part and respin a v2 with just the exported API, or
> > > drop this totally.
> > >
> > > Thanks for the feedback and thanks for taking time to review!
> > 
> > Reading the above makes me think that the entire design of this is misguided.
> > Since most of the values are counters, they are better accessed the
> > way perf does it.
> > 
> > In case you need an *in-kernel* facility, add some APIs (if it's not done
> > yet) to the event drivers first.
> > The cstate event driver is already upstream.
> > 
> > Sorry, NAK for the entire patch until it is blessed by people like Peter Z.
> > 
> > -- 
> > With Best Regards,
> > Andy Shevchenko
> 
> -- 
> Best Regards,
> Rajneesh

-- 
Best Regards,
Rajneesh
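
For illustration, a minimal sketch of the kind of debugfs attribute being discussed, assuming MSR_PKG_C10_RESIDENCY from <asm/msr-index.h> is the counter of interest; the directory and file names below are made up for this sketch and are not taken from the intel_pmc_core patch:

#include <linux/debugfs.h>
#include <linux/module.h>
#include <asm/msr.h>

static struct dentry *pkg_c10_dir;

static int pkg_c10_residency_get(void *data, u64 *val)
{
	/* Counts at a platform-specific rate; report 0 if the MSR read faults. */
	if (rdmsrl_safe(MSR_PKG_C10_RESIDENCY, val))
		*val = 0;
	return 0;
}
DEFINE_DEBUGFS_ATTRIBUTE(pkg_c10_fops, pkg_c10_residency_get, NULL, "%llu\n");

static int __init pkg_c10_init(void)
{
	pkg_c10_dir = debugfs_create_dir("pkg_cstate_demo", NULL);
	debugfs_create_file_unsafe("package_c10_residency", 0444,
				   pkg_c10_dir, NULL, &pkg_c10_fops);
	return 0;
}

static void __exit pkg_c10_exit(void)
{
	debugfs_remove_recursive(pkg_c10_dir);
}

module_init(pkg_c10_init);
module_exit(pkg_c10_exit);
MODULE_LICENSE("GPL");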

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-06-04 11:59 Yury Norov
@ 2017-06-14 20:16 ` Yury Norov
  0 siblings, 0 replies; 657+ messages in thread
From: Yury Norov @ 2017-06-14 20:16 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel, linux-kernel, linux-doc,
	Arnd Bergmann
  Cc: Andrew Pinski, Andrew Pinski, Adam Borowski, Chris Metcalf,
	Steve Ellcey, Maxim Kuvyrkov, Ramana Radhakrishnan,
	Florian Weimer, Bamvor Zhangjian, Andreas Schwab, Chris Metcalf,
	Heiko Carstens, schwidefsky, broonie, Joseph Myers,
	christoph.muellner, szabolcs.nagy, klimov.linux, Nathan_Lynch,
	agraf, Prasun.Kapoor, geert, philipp.tomsich, manuel.montezelo,
	linyongting, davem, zhouchengming1

Hi Catalin, all.

Thank you for your time reviewing the series. I really appreciate it.

This is the updated version where I tried to address all comments:
https://github.com/norov/linux/commits/ilp32-20170613.4

(the last 3 patches here are Andrew Pinski's rework of the vdso, rebased on
the ilp32 series)

If no further review comments come in, I'll send v8 at the beginning of
next week. Is this plan OK?

And this is the backport on the v4.11 kernel:
https://github.com/norov/linux/commits/ilp32-4.11.4

Yury

On Sun, Jun 04, 2017 at 02:59:49PM +0300, Yury Norov wrote:
> Subject: [PATCH v7 resend 2 00/20] ILP32 for ARM64
> 
> Hi Catalin,
>  
> Here is a rebase of the latest kernel patchset against next-20170602. There are almost
> no changes, but there are some non-trivial conflicts, so I'd like to
> refresh the submission.
> 
> How are your experiments with testing and benchmarking of ILP32 going? In
> my current tests I see 0 failures on LTP. Benchmarking on SPEC CPU2006 and
> LMBench shows no difference for LP64 and the expected performance boost for ILP32
> (compared to LP64 results).
> 
> Steve Ellcey is handling the upstream submission of the Glibc patches. The patches are
> ready and have been reviewed and reworked per the community’s comments. There are
> no outstanding userspace ABI issues from Glibc. The Glibc submission is now waiting
> on the ILP32 kernel submission.
> 
> Catalin, regarding the rootfs, is OpenSuSe’s build sufficient for your experiments?
> I’ve also seen Wookey merging patches for the ILP32 triplet into binutils and pushing
> them to Debian.
> 
> One last thing I wanted to check with you is the ILP32 PCS - in your
> view, does ARM Ltd. need to publish any additional docs for the ABI to become official?
> 
> Below is the regular description.
> 
> Thanks.
> Yury
> 
> --------
> 
> This series enables aarch64 with ilp32 mode.
> 
> As supporting work, it introduces the ARCH_32BIT_OFF_T configuration
> option, which is enabled for existing 32-bit architectures but disabled
> for new arches (so 64-bit off_t is used by new userspace). It also
> deprecates the getrlimit and setrlimit syscalls in favour of prlimit64.
> 
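
A minimal userspace sketch of the prlimit64 path that replaces the legacy getrlimit/setrlimit calls on such an ABI; this is illustrative only (the glibc prlimit() wrapper and the RLIMIT_NOFILE choice are assumptions, not taken from the patch set):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit rl;

	/* pid 0 == the calling process; NULL new limit == query only */
	if (prlimit(0, RLIMIT_NOFILE, NULL, &rl) != 0) {
		perror("prlimit");
		return 1;
	}
	printf("RLIMIT_NOFILE: cur=%llu max=%llu\n",
	       (unsigned long long)rl.rlim_cur,
	       (unsigned long long)rl.rlim_max);
	return 0;
}
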
> This version is based on linux-next from 2017-03-01. It works with
> glibc-2.25, and has been tested with LTP, the glibc testsuite, trinity, lmbench,
> and CPUSpec.
> 
> Patches 1, 2, 3 and 8 are general, and may be applied separately.
> 
> This is a rebase of v7 - still, no major changes have been made.
> 
> Kernel and GLIBC trees:
> https://github.com/norov/linux/tree/ilp32-20170602
> https://github.com/norov/glibc/tree/dev9
> 
> (GLIBC patches are managed by Steve Ellcey, so my tree is only for
> reference.)
> 
> Changes:
> v3: https://lkml.org/lkml/2014/9/3/704
> v4: https://lkml.org/lkml/2015/4/13/691
> v5: https://lkml.org/lkml/2015/9/29/911
> v6: https://lkml.org/lkml/2016/5/23/661
> v7: RFC nowrap:  https://lkml.org/lkml/2016/6/17/990
> v7: RFC2 nowrap: https://lkml.org/lkml/2016/8/17/245
> v7: RFC3 nowrap: https://lkml.org/lkml/2016/10/21/883
> v7: https://lkml.org/lkml/2017/1/9/213
> v7: Resend: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/490801.html
> v7: Resend 2:
>     - vdso-ilp32 Makefile synced with lp64 Makefile (patch 19);
>     - rebased on next-20170602.
> 
> Andrew Pinski (6):
>   arm64: rename COMPAT to AARCH32_EL0 in Kconfig
>   arm64: ensure the kernel is compiled for LP64
>   arm64:uapi: set __BITS_PER_LONG correctly for ILP32 and LP64
>   arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use
>     it
>   arm64: ilp32: introduce ilp32-specific handlers for sigframe and
>     ucontext
>   arm64:ilp32: add ARM64_ILP32 to Kconfig
> 
> Philipp Tomsich (1):
>   arm64:ilp32: add vdso-ilp32 and use for signal return
> 
> Yury Norov (13):
>   compat ABI: use non-compat openat and open_by_handle_at variants
>   32-bit ABI: introduce ARCH_32BIT_OFF_T config option
>   asm-generic: Drop getrlimit and setrlimit syscalls from default list
>   arm64: ilp32: add documentation on the ILP32 ABI for ARM64
>   thread: move thread bits accessors to separated file
>   arm64: introduce is_a32_task and is_a32_thread (for AArch32 compat)
>   arm64: ilp32: add is_ilp32_compat_{task,thread} and TIF_32BIT_AARCH64
>   arm64: introduce binfmt_elf32.c
>   arm64: ilp32: introduce binfmt_ilp32.c
>   arm64: ilp32: share aarch32 syscall handlers
>   arm64: signal: share lp64 signal routines to ilp32
>   arm64: signal32: move ilp32 and aarch32 common code to separated file
>   arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32
> 
>  Documentation/arm64/ilp32.txt                 |  45 +++++++
>  arch/Kconfig                                  |   4 +
>  arch/arc/Kconfig                              |   1 +
>  arch/arc/include/uapi/asm/unistd.h            |   1 +
>  arch/arm/Kconfig                              |   1 +
>  arch/arm64/Kconfig                            |  19 ++-
>  arch/arm64/Makefile                           |   8 ++
>  arch/arm64/include/asm/compat.h               |  19 +--
>  arch/arm64/include/asm/elf.h                  |  37 ++----
>  arch/arm64/include/asm/fpsimd.h               |   2 +-
>  arch/arm64/include/asm/ftrace.h               |   2 +-
>  arch/arm64/include/asm/hwcap.h                |   6 +-
>  arch/arm64/include/asm/is_compat.h            |  90 ++++++++++++++
>  arch/arm64/include/asm/memory.h               |   5 +-
>  arch/arm64/include/asm/processor.h            |  11 +-
>  arch/arm64/include/asm/ptrace.h               |   2 +-
>  arch/arm64/include/asm/seccomp.h              |   2 +-
>  arch/arm64/include/asm/signal32.h             |   9 +-
>  arch/arm64/include/asm/signal32_common.h      |  27 ++++
>  arch/arm64/include/asm/signal_common.h        |  33 +++++
>  arch/arm64/include/asm/signal_ilp32.h         |  38 ++++++
>  arch/arm64/include/asm/syscall.h              |   2 +-
>  arch/arm64/include/asm/thread_info.h          |   4 +-
>  arch/arm64/include/asm/unistd.h               |   6 +-
>  arch/arm64/include/asm/vdso.h                 |   6 +
>  arch/arm64/include/uapi/asm/bitsperlong.h     |   9 +-
>  arch/arm64/include/uapi/asm/unistd.h          |  13 ++
>  arch/arm64/kernel/Makefile                    |   8 +-
>  arch/arm64/kernel/asm-offsets.c               |   9 +-
>  arch/arm64/kernel/binfmt_elf32.c              |  38 ++++++
>  arch/arm64/kernel/binfmt_ilp32.c              |  85 +++++++++++++
>  arch/arm64/kernel/cpufeature.c                |   8 +-
>  arch/arm64/kernel/cpuinfo.c                   |  20 +--
>  arch/arm64/kernel/entry.S                     |  34 +++++-
>  arch/arm64/kernel/entry32.S                   |  80 ------------
>  arch/arm64/kernel/entry32_common.S            | 107 ++++++++++++++++
>  arch/arm64/kernel/entry_ilp32.S               |  22 ++++
>  arch/arm64/kernel/head.S                      |   2 +-
>  arch/arm64/kernel/hw_breakpoint.c             |   8 +-
>  arch/arm64/kernel/perf_regs.c                 |   2 +-
>  arch/arm64/kernel/process.c                   |   7 +-
>  arch/arm64/kernel/ptrace.c                    |  80 ++++++++++--
>  arch/arm64/kernel/signal.c                    | 102 ++++++++++------
>  arch/arm64/kernel/signal32.c                  | 107 ----------------
>  arch/arm64/kernel/signal32_common.c           | 135 ++++++++++++++++++++
>  arch/arm64/kernel/signal_ilp32.c              | 170 ++++++++++++++++++++++++++
>  arch/arm64/kernel/sys_ilp32.c                 | 100 +++++++++++++++
>  arch/arm64/kernel/traps.c                     |   5 +-
>  arch/arm64/kernel/vdso-ilp32/.gitignore       |   2 +
>  arch/arm64/kernel/vdso-ilp32/Makefile         |  80 ++++++++++++
>  arch/arm64/kernel/vdso-ilp32/vdso-ilp32.S     |  33 +++++
>  arch/arm64/kernel/vdso-ilp32/vdso-ilp32.lds.S |  95 ++++++++++++++
>  arch/arm64/kernel/vdso.c                      |  69 +++++++++--
>  arch/arm64/kernel/vdso/gettimeofday.S         |  20 ++-
>  arch/arm64/kernel/vdso/vdso.S                 |   6 +-
>  arch/blackfin/Kconfig                         |   1 +
>  arch/c6x/include/uapi/asm/unistd.h            |   1 +
>  arch/cris/Kconfig                             |   1 +
>  arch/frv/Kconfig                              |   1 +
>  arch/h8300/Kconfig                            |   1 +
>  arch/h8300/include/uapi/asm/unistd.h          |   1 +
>  arch/hexagon/Kconfig                          |   1 +
>  arch/hexagon/include/uapi/asm/unistd.h        |   1 +
>  arch/m32r/Kconfig                             |   1 +
>  arch/m68k/Kconfig                             |   1 +
>  arch/metag/Kconfig                            |   1 +
>  arch/metag/include/uapi/asm/unistd.h          |   1 +
>  arch/microblaze/Kconfig                       |   1 +
>  arch/mips/Kconfig                             |   1 +
>  arch/mn10300/Kconfig                          |   1 +
>  arch/nios2/Kconfig                            |   1 +
>  arch/nios2/include/uapi/asm/unistd.h          |   1 +
>  arch/openrisc/Kconfig                         |   1 +
>  arch/openrisc/include/uapi/asm/unistd.h       |   1 +
>  arch/parisc/Kconfig                           |   1 +
>  arch/powerpc/Kconfig                          |   1 +
>  arch/score/Kconfig                            |   1 +
>  arch/score/include/uapi/asm/unistd.h          |   1 +
>  arch/sh/Kconfig                               |   1 +
>  arch/sparc/Kconfig                            |   1 +
>  arch/tile/Kconfig                             |   1 +
>  arch/tile/include/uapi/asm/unistd.h           |   1 +
>  arch/tile/kernel/compat.c                     |   3 +
>  arch/unicore32/Kconfig                        |   1 +
>  arch/unicore32/include/uapi/asm/unistd.h      |   1 +
>  arch/x86/Kconfig                              |   1 +
>  arch/x86/um/Kconfig                           |   1 +
>  arch/xtensa/Kconfig                           |   1 +
>  drivers/clocksource/arm_arch_timer.c          |   2 +-
>  include/linux/fcntl.h                         |   2 +-
>  include/linux/thread_bits.h                   |  63 ++++++++++
>  include/linux/thread_info.h                   |  66 ++--------
>  include/uapi/asm-generic/unistd.h             |  10 +-
>  93 files changed, 1601 insertions(+), 413 deletions(-)
>  create mode 100644 Documentation/arm64/ilp32.txt
>  create mode 100644 arch/arm64/include/asm/is_compat.h
>  create mode 100644 arch/arm64/include/asm/signal32_common.h
>  create mode 100644 arch/arm64/include/asm/signal_common.h
>  create mode 100644 arch/arm64/include/asm/signal_ilp32.h
>  create mode 100644 arch/arm64/kernel/binfmt_elf32.c
>  create mode 100644 arch/arm64/kernel/binfmt_ilp32.c
>  create mode 100644 arch/arm64/kernel/entry32_common.S
>  create mode 100644 arch/arm64/kernel/entry_ilp32.S
>  create mode 100644 arch/arm64/kernel/signal32_common.c
>  create mode 100644 arch/arm64/kernel/signal_ilp32.c
>  create mode 100644 arch/arm64/kernel/sys_ilp32.c
>  create mode 100644 arch/arm64/kernel/vdso-ilp32/.gitignore
>  create mode 100644 arch/arm64/kernel/vdso-ilp32/Makefile
>  create mode 100644 arch/arm64/kernel/vdso-ilp32/vdso-ilp32.S
>  create mode 100644 arch/arm64/kernel/vdso-ilp32/vdso-ilp32.lds.S
>  create mode 100644 include/linux/thread_bits.h
> 
> -- 
> 2.11.0

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-27  2:08                       ` Joonsoo Kim
@ 2017-04-27 15:10                         ` Michal Hocko
  0 siblings, 0 replies; 657+ messages in thread
From: Michal Hocko @ 2017-04-27 15:10 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 27-04-17 11:08:38, Joonsoo Kim wrote:
> On Wed, Apr 26, 2017 at 11:19:06AM +0200, Michal Hocko wrote:
> > > > [...]
> > > > 
> > > > > > You are trying to change a semantic of something that has a well defined
> > > > > > meaning. I disagree that we should change it. It might sound like a
> > > > > > simpler thing to do because pfn walkers will have to be checked but what
> > > > > > you are proposing is conflating two different things together.
> > > > > 
> > > > > I don't think that *I* try to change the semantic of pfn_valid().
> > > > > It would be original semantic of pfn_valid().
> > > > > 
> > > > > "If pfn_valid() returns true, we can get proper struct page and the
> > > > > zone information,"
> > > > 
> > > > I do not see any guarantee about the zone information anywhere. In fact
> > > > this is not true with the original implementation as I've tried to
> > > > explain already. We do have new pages associated with a zone but that
> > > > association might change during the online phase. So you cannot really
> > > > rely on that information until the page is online. There is no real
> > > > change in that regards after my rework.
> > > 
> > > I know that what you did doesn't change thing much. What I try to say
> > > is that previous implementation related to pfn_valid() in hotplug is
> > > wrong. Please do not assume that hotplug implementation is correct and
> > > other pfn_valid() users are incorrect. There is no design document so
> > > I'm not sure which one is correct but assumption that pfn_valid() user
> > > can access whole the struct page information makes much sense to me.
> > 
> > Not really. E.g. ZONE_DEVICE pages are never online AFAIK. I believe we
> > still need pfn_valid to work for those pfns. Really, pfn_valid has a
> 
> It's really a counterexample to your claim. They require not only the
> struct page but also other information, especially the zone index.
> They check the zone idx to know whether the page is for ZONE_DEVICE or not.

Yes, and they guarantee this association is true - without memory onlining,
though. This memory is never online for anybody who asks.

[...]

> I think that I did my best to explain my reasoning. It seems that we
> cannot agree with each other so it's better for some others to express
> their opinion to this problem. I will stop this discussion from now
> on.

I _do_ appreciate your feedback, and if the general consensus is to
modify pfn_valid I can go that direction, but my gut feeling tells me
that conflating the "existing struct page" test and the "fully online and
initialized" one is the wrong thing to do.
-- 
Michal Hocko
SUSE Labs
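
(For context on the zone-index point above: the ZONE_DEVICE test itself reads the zone number encoded in page->flags, roughly as below - simplified from include/linux/mm.h of this era, with the CONFIG_ZONE_DEVICE ifdefs omitted.)

static inline bool is_zone_device_page(const struct page *page)
{
	/* page_zonenum() decodes the zone id stored in page->flags */
	return page_zonenum(page) == ZONE_DEVICE;
}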

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-26  9:19                     ` Michal Hocko
@ 2017-04-27  2:08                       ` Joonsoo Kim
  2017-04-27 15:10                         ` Michal Hocko
  0 siblings, 1 reply; 657+ messages in thread
From: Joonsoo Kim @ 2017-04-27  2:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Wed, Apr 26, 2017 at 11:19:06AM +0200, Michal Hocko wrote:
> > > [...]
> > > 
> > > > > You are trying to change a semantic of something that has a well defined
> > > > > meaning. I disagree that we should change it. It might sound like a
> > > > > simpler thing to do because pfn walkers will have to be checked but what
> > > > > you are proposing is conflating two different things together.
> > > > 
> > > > I don't think that *I* try to change the semantic of pfn_valid().
> > > > It would be original semantic of pfn_valid().
> > > > 
> > > > "If pfn_valid() returns true, we can get proper struct page and the
> > > > zone information,"
> > > 
> > > I do not see any guarantee about the zone information anywhere. In fact
> > > this is not true with the original implementation as I've tried to
> > > explain already. We do have new pages associated with a zone but that
> > > association might change during the online phase. So you cannot really
> > > rely on that information until the page is online. There is no real
> > > change in that regards after my rework.
> > 
> > I know that what you did doesn't change thing much. What I try to say
> > is that previous implementation related to pfn_valid() in hotplug is
> > wrong. Please do not assume that hotplug implementation is correct and
> > other pfn_valid() users are incorrect. There is no design document so
> > I'm not sure which one is correct but assumption that pfn_valid() user
> > can access whole the struct page information makes much sense to me.
> 
> Not really. E.g. ZONE_DEVICE pages are never online AFAIK. I believe we
> still need pfn_valid to work for those pfns. Really, pfn_valid has a

It's really a counterexample to your claim. They require not only the
struct page but also other information, especially the zone index.
They check the zone idx to know whether the page is for ZONE_DEVICE or not.

So, pfn_valid() for ZONE_DEVICE pages assumes that the struct page has all
the valid information. That matches my suggestion perfectly.
Online isn't the important issue here. The important point is the condition
under which pfn_valid() returns true. pfn_valid() for ZONE_DEVICE returns true after
arch_add_memory(), since all the struct page information is fixed there.

If the zone of hotplugged memory cannot be fixed at that moment, you can
defer it until all the information is fixed (onlining). That
seems to be a better semantic for pfn_valid() to me.

> different meaning than you would like it to have. Who knows how many
> others like that are lurking there. I feel much more comfortable to go
> and hunt already broken code and fix it rather than break something
> unexpectedly.

I think that I did my best to explain my reasoning. It seems that we
cannot agree with each other, so it's better for some others to express
their opinion on this problem. I will stop this discussion from now
on.

Thanks.
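
(For context: the generic SPARSEMEM pfn_valid() of this era only checks that the section has a memmap, which is why it starts returning true as early as arch_add_memory(). Simplified sketch; architectures with CONFIG_HAVE_ARCH_PFN_VALID differ.)

static inline int pfn_valid(unsigned long pfn)
{
	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}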

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-25  2:50                   ` Joonsoo Kim
@ 2017-04-26  9:19                     ` Michal Hocko
  2017-04-27  2:08                       ` Joonsoo Kim
  0 siblings, 1 reply; 657+ messages in thread
From: Michal Hocko @ 2017-04-26  9:19 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Tue 25-04-17 11:50:45, Joonsoo Kim wrote:
> On Mon, Apr 24, 2017 at 09:53:12AM +0200, Michal Hocko wrote:
> > On Mon 24-04-17 10:44:43, Joonsoo Kim wrote:
> > > On Fri, Apr 21, 2017 at 09:16:16AM +0200, Michal Hocko wrote:
> > > > On Fri 21-04-17 13:38:28, Joonsoo Kim wrote:
> > > > > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote:
> > > > > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> > > > > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote:
> > > > > > [...]
> > > > > > > > Which pfn walkers you have in mind?
> > > > > > > 
> > > > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by
> > > > > > > using pfn_valid().
> > > > > > 
> > > > > > Yeah, I've checked that one and in fact this is a good example of the
> > > > > > case where you do not really care about holes. It just checks the page
> > > > > > count which is a valid information under any circumstances.
> > > > > 
> > > > > I don't think so. First, it checks the page *map* count. Is it still valid
> > > > > even if PageReserved() is set?
> > > > 
> > > > I do not know about any user which would manipulate page map count for
> > > > referenced pages. The core MM code doesn't.
> > > 
> > > That's weird that we can get *map* count without PageReserved() check,
> > > but we cannot get zone information.
> > > Zone information is more static information than map count.
> > 
> > As I've already pointed out the rework of the hotplug code is mainly
> > about postponing the zone initialization from the physical hot add to
> > the logical onlining. The zone is really not clear until that moment.
> >  
> > > It should be defined/documented in this time that what information in
> > > the struct page is valid even if PageReserved() is set. And then, we
> > > need to fix all the things based on this design decision.
> > 
> > Where would you suggest documenting this? We do have
> > Documentation/memory-hotplug.txt but it is not really specific about
> > struct page.
> 
> pfn_valid() in include/linux/mmzone.h looks like the proper place.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c412e6a3a1e9..443258fcac93 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1288,10 +1288,14 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
 #ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
 /*
  * pfn_valid() is meant to be able to tell if a given PFN has valid memmap
- * associated with it or not. In FLATMEM, it is expected that holes always
- * have valid memmap as long as there is valid PFNs either side of the hole.
- * In SPARSEMEM, it is assumed that a valid section has a memmap for the
- * entire section.
+ * associated with it or not. This means that a struct page exists for this
+ * pfn. The caller cannot assume the page is fully initialized though.
+ * pfn_to_online_page() should be used to make sure the struct page is fully
+ * initialized.
+ *
+ * In FLATMEM, it is expected that holes always have valid memmap as long as
+ * there is valid PFNs either side of the hole. In SPARSEMEM, it is assumed
+ * that a valid section has a memmap for the entire section.
  *
  * However, an ARM, and maybe other embedded architectures in the future
  * free memmap backing holes to save memory on the assumption the memmap is

> > [...]
> > 
> > > > You are trying to change a semantic of something that has a well defined
> > > > meaning. I disagree that we should change it. It might sound like a
> > > > simpler thing to do because pfn walkers will have to be checked but what
> > > > you are proposing is conflating two different things together.
> > > 
> > > I don't think that *I* try to change the semantic of pfn_valid().
> > > It would be original semantic of pfn_valid().
> > > 
> > > "If pfn_valid() returns true, we can get proper struct page and the
> > > zone information,"
> > 
> > I do not see any guarantee about the zone information anywhere. In fact
> > this is not true with the original implementation as I've tried to
> > explain already. We do have new pages associated with a zone but that
> > association might change during the online phase. So you cannot really
> > rely on that information until the page is online. There is no real
> > change in that regards after my rework.
> 
> I know that what you did doesn't change thing much. What I try to say
> is that previous implementation related to pfn_valid() in hotplug is
> wrong. Please do not assume that hotplug implementation is correct and
> other pfn_valid() users are incorrect. There is no design document so
> I'm not sure which one is correct but assumption that pfn_valid() user
> can access whole the struct page information makes much sense to me.

Not really. E.g. ZONE_DEVICE pages are never online AFAIK. I believe we
still need pfn_valid to work for those pfns. Really, pfn_valid has a
different meaning than you would like it to have. Who knows how many
others like that are lurking there. I feel much more comfortable to go
and hunt already broken code and fix it rather than break something
unexpectedly.
-- 
Michal Hocko
SUSE Labs
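
(Following the comment proposed in the diff above, a pfn walker that needs fully initialized page->flags would look roughly like this; an illustrative sketch, not a patch from this thread.)

static void walk_range(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_online_page(pfn);

		if (!page)	/* hole, offline, or not yet onlined */
			continue;

		/* zone/node information in page->flags can be trusted here */
	}
}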

^ permalink raw reply related	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-24  7:53                 ` Michal Hocko
@ 2017-04-25  2:50                   ` Joonsoo Kim
  2017-04-26  9:19                     ` Michal Hocko
  0 siblings, 1 reply; 657+ messages in thread
From: Joonsoo Kim @ 2017-04-25  2:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Mon, Apr 24, 2017 at 09:53:12AM +0200, Michal Hocko wrote:
> On Mon 24-04-17 10:44:43, Joonsoo Kim wrote:
> > On Fri, Apr 21, 2017 at 09:16:16AM +0200, Michal Hocko wrote:
> > > On Fri 21-04-17 13:38:28, Joonsoo Kim wrote:
> > > > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote:
> > > > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> > > > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote:
> > > > > [...]
> > > > > > > Which pfn walkers you have in mind?
> > > > > > 
> > > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by
> > > > > > using pfn_valid().
> > > > > 
> > > > > Yeah, I've checked that one and in fact this is a good example of the
> > > > > case where you do not really care about holes. It just checks the page
> > > > > count which is a valid information under any circumstances.
> > > > 
> > > > I don't think so. First, it checks the page *map* count. Is it still valid
> > > > even if PageReserved() is set?
> > > 
> > > I do not know about any user which would manipulate page map count for
> > > referenced pages. The core MM code doesn't.
> > 
> > That's weird that we can get *map* count without PageReserved() check,
> > but we cannot get zone information.
> > Zone information is more static information than map count.
> 
> As I've already pointed out the rework of the hotplug code is mainly
> about postponing the zone initialization from the physical hot add to
> the logical onlining. The zone is really not clear until that moment.
>  
> > It should be defined/documented in this time that what information in
> > the struct page is valid even if PageReserved() is set. And then, we
> > need to fix all the things based on this design decision.
> 
> Where would you suggest documenting this? We do have
> Documentation/memory-hotplug.txt but it is not really specific about
> struct page.

pfn_valid() in include/linux/mmzone.h looks like the proper place.

> 
> [...]
> 
> > > You are trying to change a semantic of something that has a well defined
> > > meaning. I disagree that we should change it. It might sound like a
> > > simpler thing to do because pfn walkers will have to be checked but what
> > > you are proposing is conflating two different things together.
> > 
> > I don't think that *I* try to change the semantic of pfn_valid().
> > It would be original semantic of pfn_valid().
> > 
> > "If pfn_valid() returns true, we can get proper struct page and the
> > zone information,"
> 
> I do not see any guarantee about the zone information anywhere. In fact
> this is not true with the original implementation as I've tried to
> explain already. We do have new pages associated with a zone but that
> association might change during the online phase. So you cannot really
> rely on that information until the page is online. There is no real
> change in that regards after my rework.

I know that what you did doesn't change things much. What I'm trying to say
is that the previous implementation related to pfn_valid() in hotplug is
wrong. Please do not assume that the hotplug implementation is correct and
the other pfn_valid() users are incorrect. There is no design document, so
I'm not sure which one is correct, but the assumption that a pfn_valid() user
can access the whole struct page information makes much sense to me.
So I hope you will fix the hotplug implementation rather than
modifying each pfn_valid() user.

> 
> [...]
> > > So please do not conflate those two different concepts together. I
> > > believe that the most prominent pfn walkers should be covered now and
> > > others can be evaluated later.
> > 
> > Even if original pfn_valid()'s semantic is not the one that I mentioned,
> > I think that suggested semantic from me is better.
> > Only hotplug code need to be changed and others doesn't need to be changed.
> > There is no overhead for others. What's the problem about this approach?
> 
> That this would require to check _every_ single pfn_valid user in the
> kernel. That is beyond my time capacity and not really necessary because
> the current code already suffers from the same/similar class of
> problems.

I think that none of the pfn_valid() users consider the hole case.
Contrary to your expectation, if your way is taken, it requires checking
_every_ pfn_valid() user.

Thanks.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-24  1:44               ` Joonsoo Kim
@ 2017-04-24  7:53                 ` Michal Hocko
  2017-04-25  2:50                   ` Joonsoo Kim
  0 siblings, 1 reply; 657+ messages in thread
From: Michal Hocko @ 2017-04-24  7:53 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Mon 24-04-17 10:44:43, Joonsoo Kim wrote:
> On Fri, Apr 21, 2017 at 09:16:16AM +0200, Michal Hocko wrote:
> > On Fri 21-04-17 13:38:28, Joonsoo Kim wrote:
> > > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote:
> > > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> > > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote:
> > > > [...]
> > > > > > Which pfn walkers you have in mind?
> > > > > 
> > > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by
> > > > > using pfn_valid().
> > > > 
> > > > Yeah, I've checked that one and in fact this is a good example of the
> > > > case where you do not really care about holes. It just checks the page
> > > > count which is a valid information under any circumstances.
> > > 
> > > I don't think so. First, it checks the page *map* count. Is it still valid
> > > even if PageReserved() is set?
> > 
> > I do not know about any user which would manipulate page map count for
> > referenced pages. The core MM code doesn't.
> 
> That's weird that we can get *map* count without PageReserved() check,
> but we cannot get zone information.
> Zone information is more static information than map count.

As I've already pointed out the rework of the hotplug code is mainly
about postponing the zone initialization from the physical hot add to
the logical onlining. The zone is really not clear until that moment.
 
> It should be defined/documented in this time that what information in
> the struct page is valid even if PageReserved() is set. And then, we
> need to fix all the things based on this design decision.

Where would you suggest documenting this? We do have
Documentation/memory-hotplug.txt but it is not really specific about
struct page.

[...]

> > You are trying to change a semantic of something that has a well defined
> > meaning. I disagree that we should change it. It might sound like a
> > simpler thing to do because pfn walkers will have to be checked but what
> > you are proposing is conflating two different things together.
> 
> I don't think that *I* try to change the semantic of pfn_valid().
> It would be original semantic of pfn_valid().
> 
> "If pfn_valid() returns true, we can get proper struct page and the
> zone information,"

I do not see any guarantee about the zone information anywhere. In fact
this is not true with the original implementation as I've tried to
explain already. We do have new pages associated with a zone but that
association might change during the online phase. So you cannot really
rely on that information until the page is online. There is no real
change in that regards after my rework.

[...]
> > So please do not conflate those two different concepts together. I
> > believe that the most prominent pfn walkers should be covered now and
> > others can be evaluated later.
> 
> Even if original pfn_valid()'s semantic is not the one that I mentioned,
> I think that suggested semantic from me is better.
> Only hotplug code need to be changed and others doesn't need to be changed.
> There is no overhead for others. What's the problem about this approach?

That this would require checking _every_ single pfn_valid user in the
kernel. That is beyond my time capacity and not really necessary because
the current code already suffers from the same/similar class of
problems.
 
> And, I'm not sure that you covered the most prominent pfn walkers.
> Please see pagetypeinfo_showblockcount_print() in mm/vmstat.c.

I probably haven't (and will send a patch to fix this one - thanks for
pointing to it) but the point is that those are broken already and they
can be fixed in follow-up patches. If you change pfn_valid you might
break existing code in unexpected ways.
-- 
Michal Hocko
SUSE Labs
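
(For reference, the walker being argued about, kpagecount_read() in fs/proc/page.c, boils down to roughly the loop below; a condensed and renamed sketch of the 4.11-era code, with the buffer/offset handling left out.)

static ssize_t kpagecount_sketch(u64 __user *out, unsigned long pfn,
				 unsigned long npages)
{
	while (npages--) {
		struct page *ppage = NULL;
		u64 pcount;

		if (pfn_valid(pfn))
			ppage = pfn_to_page(pfn);

		/* no PageReserved()/online check - only the map count is read */
		if (!ppage || PageSlab(ppage))
			pcount = 0;
		else
			pcount = page_mapcount(ppage);

		if (put_user(pcount, out))
			return -EFAULT;
		pfn++;
		out++;
	}
	return 0;
}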

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-21  7:16             ` Michal Hocko
@ 2017-04-24  1:44               ` Joonsoo Kim
  2017-04-24  7:53                 ` Michal Hocko
  0 siblings, 1 reply; 657+ messages in thread
From: Joonsoo Kim @ 2017-04-24  1:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri, Apr 21, 2017 at 09:16:16AM +0200, Michal Hocko wrote:
> On Fri 21-04-17 13:38:28, Joonsoo Kim wrote:
> > On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote:
> > > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> > > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote:
> > > [...]
> > > > > Which pfn walkers you have in mind?
> > > > 
> > > > For example, kpagecount_read() in fs/proc/page.c. I searched it by
> > > > using pfn_valid().
> > > 
> > > Yeah, I've checked that one and in fact this is a good example of the
> > > case where you do not really care about holes. It just checks the page
> > > count which is a valid information under any circumstances.
> > 
> > I don't think so. First, it checks the page *map* count. Is it still valid
> > even if PageReserved() is set?
> 
> I do not know about any user which would manipulate page map count for
> referenced pages. The core MM code doesn't.

It's weird that we can get the *map* count without a PageReserved() check,
but we cannot get the zone information.
The zone information is more static than the map count.

It should be defined/documented at this point what information in
the struct page is valid even if PageReserved() is set. And then we
need to fix all the things based on this design decision.

> 
> > What I'd like to ask in this example is
> > that what information is valid if PageReserved() is set. Is there any
> > design document on this? I think that we need to define/document it first.
> 
> NO, it is not AFAIK.
> 
> [...]
> > > OK, fair enough. I did't consider memblock allocations. I will rethink
> > > this patch but there are essentially 3 options
> > > 	- use a different criterion for the offline holes dection. I
> > > 	  have just realized we might do it by storing the online
> > > 	  information into the mem sections
> > > 	- drop this patch
> > > 	- move the PageReferenced check down the chain into
> > > 	  isolate_freepages_block resp. isolate_migratepages_block
> > > 
> > > I would prefer 3 over 2 over 1. I definitely want to make this more
> > > robust so 1 is preferable long term but I do not want this to be a
> > > roadblock to the rest of the rework. Does that sound acceptable to you?
> > 
> > I like #1 among of above options and I already see your patch for #1.
> > It's much better than your first attempt but I'm still not happy due
> > to the semantic of pfn_valid().
> 
> You are trying to change a semantic of something that has a well defined
> meaning. I disagree that we should change it. It might sound like a
> simpler thing to do because pfn walkers will have to be checked but what
> you are proposing is conflating two different things together.

I don't think that *I* try to change the semantic of pfn_valid().
It would be original semantic of pfn_valid().

"If pfn_valid() returns true, we can get proper struct page and the
zone information,"

That situation is now being changed by your patch, the *hotplug rework*:

"Even if pfn_valid() returns true, we cannot get the zone information
without a PageReserved() check, since the *zone is determined during
onlining* and pfn_valid() returns true after adding the memory."

> 
> > > [..]
> > > > Let me clarify my desire(?) for this issue.
> > > > 
> > > > 1. If pfn_valid() returns true, struct page has valid information, at
> > > > least, in flags (zone id, node id, flags, etc...). So, we can use them
> > > > without checking PageResereved().
> > > 
> > > This is no longer true after my rework. Pages are associated with the
> > > zone during _onlining_ rather than when they are physically hotplugged.
> > 
> > If your rework make information valid during _onlining_, my
> > suggestion is making pfn_valid() return false until onlining.
> > 
> > Caller of pfn_valid() expects that they can get valid information from
> > the struct page. There is no reason to access the struct page if they
> > can't get valid information from it. So, passing pfn_valid() should
> > guarantee that, at least, some kind of information is valid.
> > 
> > If pfn_valid() doesn't guarantee it, most of the pfn walker should
> > check PageResereved() to make sure that validity of information from
> > the struct page.
> 
> This is true only for those walkers which really depend on the full
> initialization. This is not the case for all of them. I do not see any
> reason to introduce another _pfn_valid to just check whether there is a
> struct page...

It's a really confusing concept that only some information is valid for
a *not* fully initialized struct page. Moreover, there is no document saying
what information is valid for such a half-initialized struct page.

A better design would be to regard all information as
invalid for a half-initialized struct page. In that case, it's natural
to make pfn_valid() return false for this half-initialized struct page.

>  
> So please do not conflate those two different concepts together. I
> believe that the most prominent pfn walkers should be covered now and
> others can be evaluated later.

Even if the original pfn_valid() semantic is not the one that I mentioned,
I think that the semantic I suggested is better.
Only the hotplug code needs to be changed and others don't need to be changed.
There is no overhead for others. What's the problem with this approach?

And, I'm not sure that you covered the most prominent pfn walkers.
Please see pagetypeinfo_showblockcount_print() in mm/vmstat.c.
As you admitted, the additional-check approach is really error-prone and
this example shows that.

Thanks.
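
(For the record, pagetypeinfo_showblockcount_print() walks pageblocks roughly as below; the sketch also shows where the pfn_to_online_page() helper proposed elsewhere in this thread could replace the bare pfn_valid()/pfn_to_page() pair - it is an illustration, not an actual patch.)

static void count_pageblock_migratetypes(struct zone *zone, unsigned long *count)
{
	unsigned long pfn, end_pfn = zone_end_pfn(zone);

	for (pfn = zone->zone_start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
		/* pfn_to_online_page() instead of bare pfn_valid()/pfn_to_page() */
		struct page *page = pfn_to_online_page(pfn);

		if (!page)
			continue;
		if (page_zone(page) != zone)
			continue;

		count[get_pageblock_migratetype(page)]++;
	}
}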

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-21  4:38           ` Joonsoo Kim
@ 2017-04-21  7:16             ` Michal Hocko
  2017-04-24  1:44               ` Joonsoo Kim
  0 siblings, 1 reply; 657+ messages in thread
From: Michal Hocko @ 2017-04-21  7:16 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Fri 21-04-17 13:38:28, Joonsoo Kim wrote:
> On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote:
> > On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> > > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote:
> > [...]
> > > > Which pfn walkers you have in mind?
> > > 
> > > For example, kpagecount_read() in fs/proc/page.c. I searched it by
> > > using pfn_valid().
> > 
> > Yeah, I've checked that one and in fact this is a good example of the
> > case where you do not really care about holes. It just checks the page
> > count which is a valid information under any circumstances.
> 
> I don't think so. First, it checks the page *map* count. Is it still valid
> even if PageReserved() is set?

I do not know about any user which would manipulate page map count for
referenced pages. The core MM code doesn't.

> What I'd like to ask in this example is
> that what information is valid if PageReserved() is set. Is there any
> design document on this? I think that we need to define/document it first.

NO, it is not AFAIK.

[...]
> > OK, fair enough. I didn't consider memblock allocations. I will rethink
> > this patch but there are essentially 3 options
> > 	- use a different criterion for the offline holes detection. I
> > 	  have just realized we might do it by storing the online
> > 	  information into the mem sections
> > 	- drop this patch
> > 	- move the PageReferenced check down the chain into
> > 	  isolate_freepages_block resp. isolate_migratepages_block
> > 
> > I would prefer 3 over 2 over 1. I definitely want to make this more
> > robust so 1 is preferable long term but I do not want this to be a
> > roadblock to the rest of the rework. Does that sound acceptable to you?
> 
> I like #1 among of above options and I already see your patch for #1.
> It's much better than your first attempt but I'm still not happy due
> to the semantic of pfn_valid().

You are trying to change a semantic of something that has a well defined
meaning. I disagree that we should change it. It might sound like a
simpler thing to do because pfn walkers will have to be checked but what
you are proposing is conflating two different things together.

> > [..]
> > > Let me clarify my desire(?) for this issue.
> > > 
> > > 1. If pfn_valid() returns true, struct page has valid information, at
> > > least, in flags (zone id, node id, flags, etc...). So, we can use them
> > > without checking PageResereved().
> > 
> > This is no longer true after my rework. Pages are associated with the
> > zone during _onlining_ rather than when they are physically hotplugged.
> 
> If your rework make information valid during _onlining_, my
> suggestion is making pfn_valid() return false until onlining.
> 
> Caller of pfn_valid() expects that they can get valid information from
> the struct page. There is no reason to access the struct page if they
> can't get valid information from it. So, passing pfn_valid() should
> guarantee that, at least, some kind of information is valid.
> 
> If pfn_valid() doesn't guarantee it, most of the pfn walker should
> check PageResereved() to make sure that validity of information from
> the struct page.

This is true only for those walkers which really depend on the full
initialization. This is not the case for all of them. I do not see any
reason to introduce another _pfn_valid to just check whether there is a
struct page...
 
So please do not conflate those two different concepts together. I
believe that the most prominent pfn walkers should be covered now and
others can be evaluated later.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-20  7:28         ` Michal Hocko
  2017-04-20  8:49           ` Michal Hocko
@ 2017-04-21  4:38           ` Joonsoo Kim
  2017-04-21  7:16             ` Michal Hocko
  1 sibling, 1 reply; 657+ messages in thread
From: Joonsoo Kim @ 2017-04-21  4:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu, Apr 20, 2017 at 09:28:20AM +0200, Michal Hocko wrote:
> On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> > On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote:
> [...]
> > > Which pfn walkers you have in mind?
> > 
> > For example, kpagecount_read() in fs/proc/page.c. I searched it by
> > using pfn_valid().
> 
> Yeah, I've checked that one and in fact this is a good example of the
> case where you do not really care about holes. It just checks the page
> count which is a valid information under any circumstances.

I don't think so. First, it checks the page *map* count. Is it still valid
even if PageReserved() is set? What I'd like to ask with this example is
what information is valid if PageReserved() is set. Is there any
design document on this? I think that we need to define/document it first.

And I hope that all the information in the flags field is valid in all
cases if pfn_valid() returns true. By design.

This makes all the existing pfn walkers happy since we don't need an
additional check for PageReserved().

> 
> > > > The other problem I found is that your change will makes some
> > > > contiguous zones to be considered as non-contiguous. Memory allocated
> > > > by memblock API is also marked as PageResereved. If we consider this as
> > > > a hole, we will set such a zone as non-contiguous.
> > > 
> > > Why would that be a problem? We shouldn't touch those pages anyway?
> > 
> > Skipping those pages in compaction are valid so no problem in this
> > case.
> > 
> > The problem I mentioned above is that adding PageReserved() check in
> > __pageblock_pfn_to_page() invalidates optimization by
> > set_zone_contiguous(). In compaction, we need to get a valid struct
> > page and it requires a lot of work. There is performance problem
> > report due to this so set_zone_contiguous() optimization is added. It
> > checks if the zone is contiguous or not in boot time. If zone is
> > determined as contiguous, we can easily get a valid struct page in
> > runtime without expensive checks.
> 
> OK, I see. I've had some vague understading and the clarification helps.
> 
> > Your patch tries to add PageReserved() to __pageblock_pfn_to_page(). It
> > would make zone->contiguous usually return false since memory
> > used by the memblock API is marked as PageReserved() and your patch regards
> > it as a hole. It invalidates the set_zone_contiguous() optimization and I
> > worry about it.
> 
> OK, fair enough. I didn't consider memblock allocations. I will rethink
> this patch but there are essentially 3 options
> 	- use a different criterion for the offline holes detection. I
> 	  have just realized we might do it by storing the online
> 	  information into the mem sections
> 	- drop this patch
> 	- move the PageReferenced check down the chain into
> 	  isolate_freepages_block resp. isolate_migratepages_block
> 
> I would prefer 3 over 2 over 1. I definitely want to make this more
> robust so 1 is preferable long term but I do not want this to be a
> roadblock to the rest of the rework. Does that sound acceptable to you?

I like #1 among the above options and I already see your patch for #1.
It's much better than your first attempt but I'm still not happy due
to the semantics of pfn_valid().

> [..]
> > Let me clarify my desire(?) for this issue.
> > 
> > 1. If pfn_valid() returns true, struct page has valid information, at
> > least, in flags (zone id, node id, flags, etc...). So, we can use them
> > without checking PageResereved().
> 
> This is no longer true after my rework. Pages are associated with the
> zone during _onlining_ rather than when they are physically hotplugged.

If your rework makes the information valid during _onlining_, my
suggestion is to make pfn_valid() return false until onlining.

Callers of pfn_valid() expect that they can get valid information from
the struct page. There is no reason to access the struct page if they
can't get valid information from it. So, passing pfn_valid() should
guarantee that, at least, some kind of information is valid.

If pfn_valid() doesn't guarantee it, most of the pfn walkers should
check PageReserved() to make sure of the validity of information from
the struct page.

> Basically only the nid is set properly. Strictly speaking this is the
> case also without my rework because the zone might change during online
> phase so you cannot assume it is correct even now. It just happens that
> it more or less works just fine.
>
> > 2. pfn_valid() for offlined holes returns false. This can be easily
> > (?) implemented by manipulating SECTION_MAP_MASK in hotplug code. I
> > guess that there is no reason that pfn_valid() returns true for
> > offlined holes. If there is, please let me know.
> 
> There is some code which really expects that pfn_valid returns true iff
> there is a struct page and it doesn't care about the online status.
> E.g. hotplug code itself so no, we cannot change pfn_valid. What we can
> do though is to add pfn_to_online_page which would do the proper check.
> I have already sent [1]. As noted above we can (ab)use the remaining bit
> in SECTION_MAP_MASK to detect offline pages more robustly.

Some pfn_valid() callers in the hotplug code look wrong. They want to check
a section's validity rather than a pfn's validity. Others want to access
the struct page, so they fit my assumption (?) for pfn_valid().
Therefore, we can change pfn_valid() to return false until online.

> > 3. We don't need to check PageReserved() in most of pfn walkers in
> > order to check offline holes.
> 
> We still have to distinguish those who care about offline pages from
> those who do not care about it.

The hotplug code can distinguish those another way, by using the new section
mask as you did in a new patch. If someone outside the hotplug code does
care about offline pages, it would be just for optimization rather
than correctness. I think that it's okay.

Thanks.
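
(For context, the optimization discussed above looks roughly like this in the 4.11-era tree: set_zone_contiguous() proves at boot that every pageblock of the zone maps to valid pages of that zone, so compaction can skip the per-pageblock checks. Condensed sketch, merged into one function for readability.)

static struct page *pageblock_pfn_to_page_sketch(unsigned long start_pfn,
						 unsigned long end_pfn,
						 struct zone *zone)
{
	struct page *start_page, *end_page;

	/* fast path: the zone was proven hole-free at boot */
	if (zone->contiguous)
		return pfn_to_page(start_pfn);

	/* slow path: verify both ends of the pageblock belong to this zone */
	end_pfn--;
	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
		return NULL;

	start_page = pfn_to_page(start_pfn);
	if (page_zone(start_page) != zone)
		return NULL;

	end_page = pfn_to_page(end_pfn);
	if (page_zone_id(start_page) != page_zone_id(end_page))
		return NULL;

	return start_page;
}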

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-20 11:56             ` Vlastimil Babka
@ 2017-04-20 12:13               ` Michal Hocko
  0 siblings, 0 replies; 657+ messages in thread
From: Michal Hocko @ 2017-04-20 12:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, linux-mm, Andrew Morton, Mel Gorman,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 20-04-17 13:56:34, Vlastimil Babka wrote:
> On 04/20/2017 10:49 AM, Michal Hocko wrote:
> > On Thu 20-04-17 09:28:20, Michal Hocko wrote:
> >> On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> > [...]
> >>> Your patch tries to add PageReserved() to __pageblock_pfn_to_page(). It
> >>> would make zone->contiguous usually return false since memory
> >>> used by the memblock API is marked as PageReserved() and your patch regards
> >>> it as a hole. It invalidates the set_zone_contiguous() optimization and I
> >>> worry about it.
> >>
> >> OK, fair enough. I didn't consider memblock allocations. I will rethink
> >> this patch but there are essentially 3 options
> >> 	- use a different criterion for the offline holes detection. I
> >> 	  have just realized we might do it by storing the online
> >> 	  information into the mem sections
> >> 	- drop this patch
> >> 	- move the PageReferenced check down the chain into
> >> 	  isolate_freepages_block resp. isolate_migratepages_block
> >>
> >> I would prefer 3 over 2 over 1. I definitely want to make this more
> >> robust so 1 is preferable long term but I do not want this to be a
> >> roadblock to the rest of the rework. Does that sound acceptable to you?
> > 
> So I've played with all three options just to see what the outcome would
> look like and it turned out that going with 1 will be easiest in the
> end. What do you think about the following? It should be free of any
> false positives. I have only compile tested it so far.
> 
> That looks fine, can't say immediately if fully correct. I think you'll
> need to bump SECTION_NID_SHIFT as well and make sure things still fit?
> Otherwise looks like nobody needed a new section bit since 2005, so we
> should be fine.

You are absolutely right. Thanks for spotting this! I have folded this
in

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 611ff869fa4d..c412e6a3a1e9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1166,7 +1166,7 @@ extern unsigned long usemap_size(void);
 #define SECTION_IS_ONLINE	(1UL<<2)
 #define SECTION_MAP_LAST_BIT	(1UL<<3)
 #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT	2
+#define SECTION_NID_SHIFT	3
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
-- 
Michal Hocko
SUSE Labs
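
(The reason the shift has to move: the early node id shares the section_mem_map word with the SECTION_* flag bits, roughly as in mm/sparse.c of this era - simplified sketch.)

static inline unsigned long sparse_encode_early_nid(int nid)
{
	/* the nid must be stored above all SECTION_* flag bits */
	return (unsigned long)nid << SECTION_NID_SHIFT;
}

static inline int sparse_early_nid(struct mem_section *section)
{
	return section->section_mem_map >> SECTION_NID_SHIFT;
}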

^ permalink raw reply related	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-20  8:49           ` Michal Hocko
@ 2017-04-20 11:56             ` Vlastimil Babka
  2017-04-20 12:13               ` Michal Hocko
  0 siblings, 1 reply; 657+ messages in thread
From: Vlastimil Babka @ 2017-04-20 11:56 UTC (permalink / raw)
  To: Michal Hocko, Joonsoo Kim
  Cc: linux-mm, Andrew Morton, Mel Gorman, Andrea Arcangeli,
	Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu, qiuxishi,
	Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On 04/20/2017 10:49 AM, Michal Hocko wrote:
> On Thu 20-04-17 09:28:20, Michal Hocko wrote:
>> On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> [...]
>>> Your patch tries to add PageReserved() to __pageblock_pfn_to_page(). It
>>> would make zone->contiguous usually return false since memory
>>> used by the memblock API is marked as PageReserved() and your patch regards
>>> it as a hole. It invalidates the set_zone_contiguous() optimization and I
>>> worry about it.
>>
>> OK, fair enough. I didn't consider memblock allocations. I will rethink
>> this patch but there are essentially 3 options
>> 	- use a different criterion for the offline holes detection. I
>> 	  have just realized we might do it by storing the online
>> 	  information into the mem sections
>> 	- drop this patch
>> 	- move the PageReferenced check down the chain into
>> 	  isolate_freepages_block resp. isolate_migratepages_block
>>
>> I would prefer 3 over 2 over 1. I definitely want to make this more
>> robust so 1 is preferable long term but I do not want this to be a
>> roadblock to the rest of the rework. Does that sound acceptable to you?
> 
> So I've played with all three options just to see what the outcome would
> look like and it turned out that going with 1 will be easiest in the
> end. What do you think about the following? It should be free of any
> false positives. I have only compile tested it so far.

That looks fine, can't say immediately if fully correct. I think you'll
need to bump SECTION_NID_SHIFT as well and make sure things still fit?
Otherwise looks like nobody needed a new section bit since 2005, so we
should be fine.

> ---
> From 747794c13c0e82b55b793a31cdbe1a84ee1c6920 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 13 Apr 2017 10:28:45 +0200
> Subject: [PATCH] mm: consider zone which is not fully populated to have holes
> 
> __pageblock_pfn_to_page has two users currently, set_zone_contiguous
> which checks whether the given zone contains holes and
> pageblock_pfn_to_page which then carefully returns a first valid
> page from the given pfn range for the given zone. This doesn't handle
> zones which are not fully populated though. Memory pageblocks can be
> offlined or might not have been onlined yet. In such a case the zone
> should be considered to have holes otherwise pfn walkers can touch
> and play with offline pages.
> 
> Current callers of pageblock_pfn_to_page in compaction seem to work
> properly right now because they only isolate PageBuddy
> (isolate_freepages_block) or PageLRU resp. __PageMovable
> (isolate_migratepages_block) which will be always false for these pages.
> It would be safer to skip these pages altogether, though.
> 
> In order to do this, the patch adds a new memory section state
> (SECTION_IS_ONLINE) which is set in memory_present (during boot
> time) or in online_pages_range during the memory hotplug. Similarly
> offline_mem_sections clears the bit and it is called when the memory
> range is offlined.
> 
> pfn_to_online_page helper is then added which checks the mem section and
> only returns a page if it is onlined already.
> 
> Use the new helper in __pageblock_pfn_to_page and skip the whole page
> block in such a case.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/memory_hotplug.h | 21 ++++++++++++++++++++
>  include/linux/mmzone.h         | 20 ++++++++++++++++++-
>  mm/memory_hotplug.c            |  3 +++
>  mm/page_alloc.c                |  5 ++++-
>  mm/sparse.c                    | 45 +++++++++++++++++++++++++++++++++++++++++-
>  5 files changed, 91 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 3c8cf86201c3..fc1c873504eb 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -14,6 +14,19 @@ struct memory_block;
>  struct resource;
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +/*
> + * Return page for the valid pfn only if the page is online. All pfn
> + * walkers which rely on the fully initialized page->flags and others
> + * should use this rather than pfn_valid && pfn_to_page
> + */
> +#define pfn_to_online_page(pfn)				\
> +({							\
> +	struct page *___page = NULL;			\
> +							\
> +	if (online_section_nr(pfn_to_section_nr(pfn)))	\
> +		___page = pfn_to_page(pfn);		\
> +	___page;					\
> +})
>  
>  /*
>   * Types for free bootmem stored in page->lru.next. These have to be in
> @@ -203,6 +216,14 @@ extern void set_zone_contiguous(struct zone *zone);
>  extern void clear_zone_contiguous(struct zone *zone);
>  
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
> +#define pfn_to_online_page(pfn)			\
> +({						\
> +	struct page *___page = NULL;		\
> +	if (pfn_valid(pfn))			\
> +		___page = pfn_to_page(pfn);	\
> +	___page;				\
> + })
> +
>  /*
>   * Stub functions for when hotplug is off
>   */
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0fc121bbf4ff..cad16ac080f5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1143,7 +1143,8 @@ extern unsigned long usemap_size(void);
>   */
>  #define	SECTION_MARKED_PRESENT	(1UL<<0)
>  #define SECTION_HAS_MEM_MAP	(1UL<<1)
> -#define SECTION_MAP_LAST_BIT	(1UL<<2)
> +#define SECTION_IS_ONLINE	(1UL<<2)
> +#define SECTION_MAP_LAST_BIT	(1UL<<3)
>  #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
>  #define SECTION_NID_SHIFT	2
>  
> @@ -1174,6 +1175,23 @@ static inline int valid_section_nr(unsigned long nr)
>  	return valid_section(__nr_to_section(nr));
>  }
>  
> +static inline int online_section(struct mem_section *section)
> +{
> +	return (section && (section->section_mem_map & SECTION_IS_ONLINE));
> +}
> +
> +static inline int online_section_nr(unsigned long nr)
> +{
> +	return online_section(__nr_to_section(nr));
> +}
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
> +#endif
> +#endif
> +
>  static inline struct mem_section *__pfn_to_section(unsigned long pfn)
>  {
>  	return __nr_to_section(pfn_to_section_nr(pfn));
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index caa58338d121..98f565c279bf 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -929,6 +929,9 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>  	unsigned long i;
>  	unsigned long onlined_pages = *(unsigned long *)arg;
>  	struct page *page;
> +
> +	online_mem_sections(start_pfn, start_pfn + nr_pages);
> +
>  	if (PageReserved(pfn_to_page(start_pfn)))
>  		for (i = 0; i < nr_pages; i++) {
>  			page = pfn_to_page(start_pfn + i);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5d72d29a6ece..fa752de84eef 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1353,7 +1353,9 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>  	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
>  		return NULL;
>  
> -	start_page = pfn_to_page(start_pfn);
> +	start_page = pfn_to_online_page(start_pfn);
> +	if (!start_page)
> +		return NULL;
>  
>  	if (page_zone(start_page) != zone)
>  		return NULL;
> @@ -7686,6 +7688,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  			break;
>  	if (pfn == end_pfn)
>  		return;
> +	offline_mem_sections(pfn, end_pfn);
>  	zone = page_zone(pfn_to_page(pfn));
>  	spin_lock_irqsave(&zone->lock, flags);
>  	pfn = start_pfn;
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 6903c8fc3085..79017f90d8fc 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -185,7 +185,8 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
>  		ms = __nr_to_section(section);
>  		if (!ms->section_mem_map)
>  			ms->section_mem_map = sparse_encode_early_nid(nid) |
> -							SECTION_MARKED_PRESENT;
> +							SECTION_MARKED_PRESENT |
> +							SECTION_IS_ONLINE;
>  	}
>  }
>  
> @@ -590,6 +591,48 @@ void __init sparse_init(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +
> +/* Mark all memory sections within the pfn range as online */
> +void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +		unsigned long section_nr = pfn_to_section_nr(pfn);
> +		struct mem_section *ms;
> +
> +		/* onlining code should never touch invalid ranges */
> +		if (WARN_ON(!valid_section_nr(section_nr)))
> +			continue;
> +
> +		ms = __nr_to_section(section_nr);
> +		ms->section_mem_map |= SECTION_IS_ONLINE;
> +	}
> +}
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/* Mark all memory sections within the pfn range as offline */
> +void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> +		unsigned long section_nr = pfn_to_section_nr(pfn);
> +		struct mem_section *ms;
> +
> +		/*
> +		 * TODO this needs some double checking. Offlining code makes
> +		 * sure to check pfn_valid but those checks might be just bogus
> +		 */
> +		if (WARN_ON(!valid_section_nr(section_nr)))
> +			continue;
> +
> +		ms = __nr_to_section(section_nr);
> +		ms->section_mem_map &= ~SECTION_IS_ONLINE;
> +	}
> +}
> +#endif
> +
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
>  {
> 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-20  7:28         ` Michal Hocko
@ 2017-04-20  8:49           ` Michal Hocko
  2017-04-20 11:56             ` Vlastimil Babka
  2017-04-21  4:38           ` Joonsoo Kim
  1 sibling, 1 reply; 657+ messages in thread
From: Michal Hocko @ 2017-04-20  8:49 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 20-04-17 09:28:20, Michal Hocko wrote:
> On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
[...]
> > Your patch tries to add PageReserved() to __pageblock_pfn_to_page(). It
> > would make zone->contiguous usually return false since memory
> > used by the memblock API is marked as PageReserved() and your patch regards
> > it as a hole. It invalidates the set_zone_contiguous() optimization and I
> > worry about it.
> 
> OK, fair enough. I didn't consider memblock allocations. I will rethink
> this patch but there are essentially 3 options
> 	- use a different criterion for the offline holes detection. I
> 	  have just realized we might do it by storing the online
> 	  information into the mem sections
> 	- drop this patch
> 	- move the PageReserved check down the chain into
> 	  isolate_freepages_block resp. isolate_migratepages_block
> 
> I would prefer 3 over 2 over 1. I definitely want to make this more
> robust so 1 is preferable long term but I do not want this to be a
> roadblock to the rest of the rework. Does that sound acceptable to you?

So I've played with all three options just to see how the outcome would
look and it turned out that going with 1 will be easiest in the
end. What do you think about the following? It should be free of any 
false positives. I have only compile tested it so far.
---
From 747794c13c0e82b55b793a31cdbe1a84ee1c6920 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 13 Apr 2017 10:28:45 +0200
Subject: [PATCH] mm: consider zone which is not fully populated to have holes

__pageblock_pfn_to_page has two users currently, set_zone_contiguous
which checks whether the given zone contains holes and
pageblock_pfn_to_page which then carefully returns a first valid
page from the given pfn range for the given zone. This doesn't handle
zones which are not fully populated though. Memory pageblocks can be
offlined or might not have been onlined yet. In such a case the zone
should be considered to have holes otherwise pfn walkers can touch
and play with offline pages.

Current callers of pageblock_pfn_to_page in compaction seem to work
properly right now because they only isolate PageBuddy
(isolate_freepages_block) or PageLRU resp. __PageMovable
(isolate_migratepages_block) which will always be false for these pages.
It would be safer to skip these pages altogether, though.

In order to do this, the patch adds a new memory section state
(SECTION_IS_ONLINE) which is set in memory_present (during boot
time) or in online_pages_range during the memory hotplug. Similarly
offline_mem_sections clears the bit and it is called when the memory
range is offlined.

pfn_to_online_page helper is then added which checks the mem section and
only returns a page if it is onlined already.

Use the new helper in __pageblock_pfn_to_page and skip the whole page
block in such a case.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/memory_hotplug.h | 21 ++++++++++++++++++++
 include/linux/mmzone.h         | 20 ++++++++++++++++++-
 mm/memory_hotplug.c            |  3 +++
 mm/page_alloc.c                |  5 ++++-
 mm/sparse.c                    | 45 +++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 3c8cf86201c3..fc1c873504eb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -14,6 +14,19 @@ struct memory_block;
 struct resource;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+/*
+ * Return page for the valid pfn only if the page is online. All pfn
+ * walkers which rely on the fully initialized page->flags and others
+ * should use this rather than pfn_valid && pfn_to_page
+ */
+#define pfn_to_online_page(pfn)				\
+({							\
+	struct page *___page = NULL;			\
+							\
+	if (online_section_nr(pfn_to_section_nr(pfn)))	\
+		___page = pfn_to_page(pfn);		\
+	___page;					\
+})
 
 /*
  * Types for free bootmem stored in page->lru.next. These have to be in
@@ -203,6 +216,14 @@ extern void set_zone_contiguous(struct zone *zone);
 extern void clear_zone_contiguous(struct zone *zone);
 
 #else /* ! CONFIG_MEMORY_HOTPLUG */
+#define pfn_to_online_page(pfn)			\
+({						\
+	struct page *___page = NULL;		\
+	if (pfn_valid(pfn))			\
+		___page = pfn_to_page(pfn);	\
+	___page;				\
+ })
+
 /*
  * Stub functions for when hotplug is off
  */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0fc121bbf4ff..cad16ac080f5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1143,7 +1143,8 @@ extern unsigned long usemap_size(void);
  */
 #define	SECTION_MARKED_PRESENT	(1UL<<0)
 #define SECTION_HAS_MEM_MAP	(1UL<<1)
-#define SECTION_MAP_LAST_BIT	(1UL<<2)
+#define SECTION_IS_ONLINE	(1UL<<2)
+#define SECTION_MAP_LAST_BIT	(1UL<<3)
 #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
 #define SECTION_NID_SHIFT	2
 
@@ -1174,6 +1175,23 @@ static inline int valid_section_nr(unsigned long nr)
 	return valid_section(__nr_to_section(nr));
 }
 
+static inline int online_section(struct mem_section *section)
+{
+	return (section && (section->section_mem_map & SECTION_IS_ONLINE));
+}
+
+static inline int online_section_nr(unsigned long nr)
+{
+	return online_section(__nr_to_section(nr));
+}
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
+#ifdef CONFIG_MEMORY_HOTREMOVE
+void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
+#endif
+#endif
+
 static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 {
 	return __nr_to_section(pfn_to_section_nr(pfn));
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index caa58338d121..98f565c279bf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -929,6 +929,9 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	unsigned long i;
 	unsigned long onlined_pages = *(unsigned long *)arg;
 	struct page *page;
+
+	online_mem_sections(start_pfn, start_pfn + nr_pages);
+
 	if (PageReserved(pfn_to_page(start_pfn)))
 		for (i = 0; i < nr_pages; i++) {
 			page = pfn_to_page(start_pfn + i);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d72d29a6ece..fa752de84eef 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1353,7 +1353,9 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
 		return NULL;
 
-	start_page = pfn_to_page(start_pfn);
+	start_page = pfn_to_online_page(start_pfn);
+	if (!start_page)
+		return NULL;
 
 	if (page_zone(start_page) != zone)
 		return NULL;
@@ -7686,6 +7688,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			break;
 	if (pfn == end_pfn)
 		return;
+	offline_mem_sections(pfn, end_pfn);
 	zone = page_zone(pfn_to_page(pfn));
 	spin_lock_irqsave(&zone->lock, flags);
 	pfn = start_pfn;
diff --git a/mm/sparse.c b/mm/sparse.c
index 6903c8fc3085..79017f90d8fc 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -185,7 +185,8 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
 		ms = __nr_to_section(section);
 		if (!ms->section_mem_map)
 			ms->section_mem_map = sparse_encode_early_nid(nid) |
-							SECTION_MARKED_PRESENT;
+							SECTION_MARKED_PRESENT |
+							SECTION_IS_ONLINE;
 	}
 }
 
@@ -590,6 +591,48 @@ void __init sparse_init(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+
+/* Mark all memory sections within the pfn range as online */
+void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+		struct mem_section *ms;
+
+		/* onlining code should never touch invalid ranges */
+		if (WARN_ON(!valid_section_nr(section_nr)))
+			continue;
+
+		ms = __nr_to_section(section_nr);
+		ms->section_mem_map |= SECTION_IS_ONLINE;
+	}
+}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/* Mark all memory sections within the pfn range as offline */
+void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+		unsigned long section_nr = pfn_to_section_nr(pfn);
+		struct mem_section *ms;
+
+		/*
+		 * TODO this needs some double checking. Offlining code makes
+		 * sure to check pfn_valid but those checks might be just bogus
+		 */
+		if (WARN_ON(!valid_section_nr(section_nr)))
+			continue;
+
+		ms = __nr_to_section(section_nr);
+		ms->section_mem_map &= ~SECTION_IS_ONLINE;
+	}
+}
+#endif
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid)
 {
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-20  1:27       ` Joonsoo Kim
@ 2017-04-20  7:28         ` Michal Hocko
  2017-04-20  8:49           ` Michal Hocko
  2017-04-21  4:38           ` Joonsoo Kim
  0 siblings, 2 replies; 657+ messages in thread
From: Michal Hocko @ 2017-04-20  7:28 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Thu 20-04-17 10:27:55, Joonsoo Kim wrote:
> On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote:
[...]
> > Which pfn walkers do you have in mind?
> 
> For example, kpagecount_read() in fs/proc/page.c. I searched it by
> using pfn_valid().

Yeah, I've checked that one and in fact this is a good example of the
case where you do not really care about holes. It just checks the page
count, which is valid information under any circumstances.

> > > The other problem I found is that your change will make some
> > > contiguous zones be considered as non-contiguous. Memory allocated
> > > by the memblock API is also marked as PageReserved. If we consider this as
> > > a hole, we will set such a zone as non-contiguous.
> > 
> > Why would that be a problem? We shouldn't touch those pages anyway?
> 
> Skipping those pages in compaction is valid, so no problem in this
> case.
> 
> The problem I mentioned above is that adding a PageReserved() check in
> __pageblock_pfn_to_page() invalidates the optimization by
> set_zone_contiguous(). In compaction, we need to get a valid struct
> page and it requires a lot of work. There was a performance problem
> report due to this, so the set_zone_contiguous() optimization was added. It
> checks whether the zone is contiguous or not at boot time. If the zone is
> determined to be contiguous, we can easily get a valid struct page at
> runtime without expensive checks.

OK, I see. I've had some vague understanding and the clarification helps.

> Your patch tries to add PageReserved() to __pageblock_pfn_to_page(). It
> would make zone->contiguous usually return false since memory
> used by the memblock API is marked as PageReserved() and your patch regards
> it as a hole. It invalidates the set_zone_contiguous() optimization and I
> worry about it.

OK, fair enough. I didn't consider memblock allocations. I will rethink
this patch but there are essentially 3 options
	- use a different criterion for the offline holes detection. I
	  have just realized we might do it by storing the online
	  information into the mem sections
	- drop this patch
	- move the PageReserved check down the chain into
	  isolate_freepages_block resp. isolate_migratepages_block

I would prefer 3 over 2 over 1. I definitely want to make this more
robust so 1 is preferable long term but I do not want this to be a
roadblock to the rest of the rework. Does that sound acceptable to you?
 
[..]
> Let me clarify my desire(?) for this issue.
> 
> 1. If pfn_valid() returns true, struct page has valid information, at
> least, in flags (zone id, node id, flags, etc...). So, we can use them
> without checking PageReserved().

This is no longer true after my rework. Pages are associated with the
zone during _onlining_ rather than when they are physically hotplugged.
Basically only the nid is set properly. Strictly speaking this is the
case also without my rework because the zone might change during online
phase so you cannot assume it is correct even now. It just happens that
it more or less works just fine.

> 2. pfn_valid() for offlined holes returns false. This can be easily
> (?) implemented by manipulating SECTION_MAP_MASK in hotplug code. I
> guess that there is no reason that pfn_valid() returns true for
> offlined holes. If there is, please let me know.

There is some code which really expects that pfn_valid returns true iff
there is a struct page and it doesn't care about the online status.
E.g. hotplug code itself so no, we cannot change pfn_valid. What we can
do though is to add pfn_to_online_page which would do the proper check.
I have already sent [1]. As noted above we can (ab)use the remaining bit
in SECTION_MAP_MASK to detect offline pages more robustly.
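
To illustrate the intended usage, a made-up walker (only the
pfn_to_online_page() helper itself comes from the proposed patch):

static void walk_range(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_online_page(pfn);

		if (!page)		/* invalid or offline pfn - skip it */
			continue;
		/* page->flags etc. are safe to inspect here */
	}
}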

> 3. We don't need to check PageReserved() in most of pfn walkers in
> order to check offline holes.

We still have to distinguish those who care about offline pages from
those who do not care about it.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-17  8:15     ` Michal Hocko
@ 2017-04-20  1:27       ` Joonsoo Kim
  2017-04-20  7:28         ` Michal Hocko
  0 siblings, 1 reply; 657+ messages in thread
From: Joonsoo Kim @ 2017-04-20  1:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Mon, Apr 17, 2017 at 10:15:15AM +0200, Michal Hocko wrote:
> On Mon 17-04-17 14:47:20, Joonsoo Kim wrote:
> > On Sat, Apr 15, 2017 at 02:17:31PM +0200, Michal Hocko wrote:
> > > Hi,
> > > here are 3 more preparatory patches which I meant to send on Thursday but
> > > forgot... After more thinking about pfn walkers I have realized that
> > > the current code doesn't check offline holes in zones. From a quick
> > > review that doesn't seem to be a problem currently. Pfn walkers can race
> > > with memory offlining and with the original hotplug implementation those
> > > offline pages can change the zone but I wasn't able to find any serious
> > > problem other than small confusion. The new hotplug code will not have
> > > any valid zone, though, so those code paths should check PageReserved
> > > to rule out offline holes. I hope I have addressed all of them in these 3
> > > patches. I would appreciate if Vlastimil and Jonsoo double check after
> > > me.
> > 
> > Hello, Michal.
> > 
> > s/Jonsoo/Joonsoo. :)
> 
> ups, sorry about that.
> 
> > I'm not sure that it's a good idea to add a PageReserved() check in pfn
> > walkers. First, this makes the struct page validity check two steps,
> > pfn_valid() and then PageReserved().
> 
> Yes, those are two separate checks because semantically they are
> different. Not all pfn walkers do care about the online status.

If an offlined page has no valid information, reading information
about offlined pages is just wrong. So, all pfn walkers that read
information about the page should care about it.

I guess that many callers of pfn_valid() are in this category.

> 
> > If we should not use struct page
> > in this case, it's better for pfn_valid() to return false rather than
> > adding a separate check. Anyway, we need to fix more places (all pfn
> > walkers?) if we want to check validity in two steps.
> 
> Which pfn walkers you have in mind?

For example, kpagecount_read() in fs/proc/page.c. I searched it by
using pfn_valid().

> > The other problem I found is that your change will make some
> > contiguous zones be considered as non-contiguous. Memory allocated
> > by the memblock API is also marked as PageReserved. If we consider this as
> > a hole, we will set such a zone as non-contiguous.
> 
> Why would that be a problem? We shouldn't touch those pages anyway?

Skipping those pages in compaction is valid, so no problem in this
case.

The problem I mentioned above is that adding a PageReserved() check in
__pageblock_pfn_to_page() invalidates the optimization by
set_zone_contiguous(). In compaction, we need to get a valid struct
page and it requires a lot of work. There was a performance problem
report due to this, so the set_zone_contiguous() optimization was added. It
checks whether the zone is contiguous or not at boot time. If the zone is
determined to be contiguous, we can easily get a valid struct page at
runtime without expensive checks.
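
For reference, the optimization in question is essentially this wrapper
(a sketch along the lines of mm/compaction.c; placement and details vary
by kernel version):

static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
				unsigned long end_pfn, struct zone *zone)
{
	if (zone->contiguous)
		return pfn_to_page(start_pfn);	/* cheap path, no hole checks */

	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
}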

Your patch tries to add PageReserved() to __pageblock_pfn_to_page(). It
would make zone->contiguous usually return false since memory
used by the memblock API is marked as PageReserved() and your patch regards
it as a hole. It invalidates the set_zone_contiguous() optimization and I
worry about it.

>  
> > And, I guess that it's not enough to check PageReserved() in
> > pageblock_pfn_to_page() in order to skip these pages in compaction. If
> > holes are in the middle of the pageblock, pageblock_pfn_to_page()
> > cannot catch it and compaction will use struct page for this hole.
> 
> Yes pageblock_pfn_to_page cannot catch it and it wouldn't with the
> current implementation anyway. So the implementation won't be any worse
> than with the current code. On the other hand offline holes will always
> fill the whole pageblock (assuming those are not spanning multiple
> memblocks).
>  
> > Therefore, I think that making pfn_valid() return false for not
> > onlined memory is a better solution for this problem. I don't know the
> > implementation detail for hotplug and I don't see your recent change
> > but we may defer memmap initialization until the zone is determined.
> > It will make pfn_valid() return false for un-initialized range.
> 
> I am not really sure. pfn_valid is used in many contexts and its only
> purpose is to tell whether pfn_to_page will return a valid struct page
> AFAIU.
> 
> I agree that having more checks is more error prone and we can add a
> helper pfn_to_valid_page or something similar but I believe we can do
> that on top of the current hotplug rework. This would require a non
> trivial amount of changes and I believe that a lacking check for the
> offline holes is not critical - we would (ab)use the lowest zone which
> is similar to (ab)using ZONE_NORMAL/MOVABLE with the original code.

I'm not objecting to your hotplug rework. In fact, I don't know the
relationship between this work and the hotplug rework. I'm agreeing
with checking offline holes but I don't like the design and
implementation of it.

Let me clarify my desire(?) for this issue.

1. If pfn_valid() returns true, struct page has valid information, at
least, in flags (zone id, node id, flags, etc...). So, we can use them
without checking PageReserved().

2. pfn_valid() for offlined holes returns false. This can be easily
(?) implemented by manipulating SECTION_MAP_MASK in hotplug code. I
guess that there is no reason that pfn_valid() returns true for
offlined holes. If there is, please let me know.

3. We don't need to check PageReserved() in most of pfn walkers in
order to check offline holes.
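
Regarding point 2, the SPARSEMEM pfn_valid() already reduces to a check of
the section flags, roughly like this (simplified from include/linux/mmzone.h;
the exact form depends on config options):

static inline int pfn_valid(unsigned long pfn)
{
	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	return valid_section(__nr_to_section(pfn));
}

so clearing the relevant section_mem_map bit on offline would indeed make
it return false for such a range.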

Thanks.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-17  5:47   ` your mail Joonsoo Kim
@ 2017-04-17  8:15     ` Michal Hocko
  2017-04-20  1:27       ` Joonsoo Kim
  0 siblings, 1 reply; 657+ messages in thread
From: Michal Hocko @ 2017-04-17  8:15 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Mon 17-04-17 14:47:20, Joonsoo Kim wrote:
> On Sat, Apr 15, 2017 at 02:17:31PM +0200, Michal Hocko wrote:
> > Hi,
> > here are 3 more preparatory patches which I meant to send on Thursday but
> > forgot... After more thinking about pfn walkers I have realized that
> > the current code doesn't check offline holes in zones. From a quick
> > review that doesn't seem to be a problem currently. Pfn walkers can race
> > with memory offlining and with the original hotplug implementation those
> > offline pages can change the zone but I wasn't able to find any serious
> > problem other than small confusion. The new hotplug code will not have
> > any valid zone, though, so those code paths should check PageReserved
> > to rule out offline holes. I hope I have addressed all of them in these 3
> > patches. I would appreciate if Vlastimil and Jonsoo double check after
> > me.
> 
> Hello, Michal.
> 
> s/Jonsoo/Joonsoo. :)

ups, sorry about that.

> I'm not sure that it's a good idea to add a PageReserved() check in pfn
> walkers. First, this makes the struct page validity check two steps,
> pfn_valid() and then PageReserved().

Yes, those are two separate checks because semantically they are
different. Not all pfn walkers do care about the online status.

> If we should not use struct page
> in this case, it's better for pfn_valid() to return false rather than
> adding a separate check. Anyway, we need to fix more places (all pfn
> walkers?) if we want to check validity in two steps.

Which pfn walkers do you have in mind?

> The other problem I found is that your change will make some
> contiguous zones be considered as non-contiguous. Memory allocated
> by the memblock API is also marked as PageReserved. If we consider this as
> a hole, we will set such a zone as non-contiguous.

Why would that be a problem? We shouldn't touch those pages anyway?
 
> And, I guess that it's not enough to check PageReserved() in
> pageblock_pfn_to_page() in order to skip these pages in compaction. If
> holes are in the middle of the pageblock, pageblock_pfn_to_page()
> cannot catch it and compaction will use struct page for this hole.

Yes pageblock_pfn_to_page cannot catch it and it wouldn't with the
current implementation anyway. So the implementation won't be any worse
than with the current code. On the other hand offline holes will always
fill the whole pageblock (assuming those are not spanning multiple
memblocks).
 
> Therefore, I think that making pfn_valid() return false for not
> onlined memory is a better solution for this problem. I don't know the
> implementation detail for hotplug and I don't see your recent change
> but we may defer memmap initialization until the zone is determined.
> It will make pfn_valid() return false for un-initialized range.

I am not really sure. pfn_valid is used in many contexts and its only
purpose is to tell whether pfn_to_page will return a valid struct page
AFAIU.

I agree that having more checks is more error prone and we can add a
helper pfn_to_valid_page or something similar but I believe we can do
that on top of the current hotplug rework. This would require a non
trivial amount of changes and I believe that a lacking check for the
offline holes is not critical - we would (ab)use the lowest zone which
is similar to (ab)using ZONE_NORMAL/MOVABLE with the original code.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2017-04-15 12:17 ` Michal Hocko
@ 2017-04-17  5:47   ` Joonsoo Kim
  2017-04-17  8:15     ` Michal Hocko
  0 siblings, 1 reply; 657+ messages in thread
From: Joonsoo Kim @ 2017-04-17  5:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Andrea Arcangeli, Jerome Glisse, Reza Arbab, Yasuaki Ishimatsu,
	qiuxishi, Kani Toshimitsu, slaoub, Andi Kleen, David Rientjes,
	Daniel Kiper, Igor Mammedov, Vitaly Kuznetsov, LKML

On Sat, Apr 15, 2017 at 02:17:31PM +0200, Michal Hocko wrote:
> Hi,
> here are 3 more preparatory patches which I meant to send on Thursday but
> forgot... After more thinking about pfn walkers I have realized that
> the current code doesn't check offline holes in zones. From a quick
> review that doesn't seem to be a problem currently. Pfn walkers can race
> with memory offlining and with the original hotplug implementation those
> offline pages can change the zone but I wasn't able to find any serious
> problem other than small confusion. The new hotplug code will not have
> any valid zone, though, so those code paths should check PageReserved
> to rule out offline holes. I hope I have addressed all of them in these 3
> patches. I would appreciate if Vlastimil and Jonsoo double check after
> me.

Hello, Michal.

s/Jonsoo/Joonsoo. :)

I'm not sure that it's a good idea to add a PageReserved() check in pfn
walkers. First, this makes the struct page validity check two steps,
pfn_valid() and then PageReserved(). If we should not use struct page
in this case, it's better for pfn_valid() to return false rather than
adding a separate check. Anyway, we need to fix more places (all pfn
walkers?) if we want to check validity in two steps.

The other problem I found is that your change will make some
contiguous zones be considered as non-contiguous. Memory allocated
by the memblock API is also marked as PageReserved. If we consider this as
a hole, we will set such a zone as non-contiguous.

And, I guess that it's not enough to check PageReserved() in
pageblock_pfn_to_page() in order to skip these pages in compaction. If
holes are in the middle of the pageblock, pageblock_pfn_to_page()
cannot catch it and compaction will use struct page for this hole.

Therefore, I think that making pfn_valid() return false for not
onlined memory is a better solution for this problem. I don't know the
implementation detail for hotplug and I don't see your recent change
but we may defer memmap initialization until the zone is determined.
It will make pfn_valid() return false for un-initialized range.

Thanks.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2016-11-16 14:25   ` Steven Rostedt
@ 2016-11-16 14:28     ` Peter Zijlstra
  0 siblings, 0 replies; 657+ messages in thread
From: Peter Zijlstra @ 2016-11-16 14:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christoph Lameter, Daniel Vacek, Daniel Bristot de Oliveira,
	Tommaso Cucinotta, LKML, linux-rt-users, Ingo Molnar

On Wed, Nov 16, 2016 at 09:25:43AM -0500, Steven Rostedt wrote:
> On Wed, 16 Nov 2016 11:40:14 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> 
> > On top of which, the implementation had issues; now I know you're the
> > blinder kind of person that disregards everything not in his immediate
> > interest, but if you'd looked at the patch you'd have seen he'd added
> > code to the idle entry path, which will slow down every single to-idle
> > transition.
> 
> Isn't to-idle a bit bloated anyway? Or has that been fixed? I know
> there were some issues with idle_balance() which can add latency to
> wakeups. idle_balance() is also in the to-idle path.
> 

Yes, it is too heavy as is, but just stacking more crap in just because
it's already expensive seems the wrong way around.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2016-11-16 10:40 ` your mail Peter Zijlstra
@ 2016-11-16 14:25   ` Steven Rostedt
  2016-11-16 14:28     ` Peter Zijlstra
  0 siblings, 1 reply; 657+ messages in thread
From: Steven Rostedt @ 2016-11-16 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Daniel Vacek, Daniel Bristot de Oliveira,
	Tommaso Cucinotta, LKML, linux-rt-users, Ingo Molnar

On Wed, 16 Nov 2016 11:40:14 +0100
Peter Zijlstra <peterz@infradead.org> wrote:


> On top of which, the implementation had issues; now I know you're the
> blinder kind of person that disregards everything not in his immediate
> interest, but if you'd looked at the patch you'd have seen he'd added
> code to the idle entry path, which will slow down every single to-idle
> transition.

Isn't to-idle a bit bloated anyway? Or has that been fixed? I know
there were some issues with idle_balance() which can add latency to
wakeups. idle_balance() is also in the to-idle path.

Note that this is a sched feature which would be a nop (jump_label)
when disabled. And I'm sure it could also be optimized to be a static
inline as well when it is enabled.
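
For illustration only (the feature and helper names below are made up and
are not from the patch under discussion): with jump labels, a disabled
feature costs one patched-out branch at the call site.

static DEFINE_STATIC_KEY_FALSE(sched_feat_rt_push_idle);

static inline void maybe_rt_push(struct rq *rq)
{
	if (static_branch_unlikely(&sched_feat_rt_push_idle))
		rt_push_from_idle(rq);		/* hypothetical helper */
}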

I'm not saying we need to go this approach, but I'm just saying that
the to-idle issue is a bit of a red herring.

-- Steve

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2016-11-15 20:29 Christoph Lameter
@ 2016-11-16 10:40 ` Peter Zijlstra
  2016-11-16 14:25   ` Steven Rostedt
  0 siblings, 1 reply; 657+ messages in thread
From: Peter Zijlstra @ 2016-11-16 10:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Daniel Vacek, Daniel Bristot de Oliveira, Tommaso Cucinotta,
	LKML, linux-rt-users, Steven Rostedt, Ingo Molnar

On Tue, Nov 15, 2016 at 02:29:16PM -0600, Christoph Lameter wrote:
> 
> > > There is a deadlock, Peter!!!
> >
> > Describe please? Also, have you tried disabling RT_RUNTIME_SHARE ?
> >
> 
> 
The description was given earlier in the thread and the drawbacks of
> using RT_RUNTIME_SHARE as well.

I've not seen a deadlock described. It either was an unbounded priority
inversion or a starvation issue, both of which are 'design' features of
the !rt kernel.

Neither things are new, so its not a regression either.

And, as stated, I'm not really happy to muck with this known troublesome
code and add features for which we must then maintain feature parity
when replacing it either.

On top of which, the implementation had issues; now I know you're the
blinder kind of person that disregards everything not in his immediate
interest, but if you'd looked at the patch you'd have seen he'd added
code to the idle entry path, which will slow down every single to-idle
transition.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2016-09-20 22:21 Andrew Banman
@ 2016-09-20 22:23 ` andrew banman
  0 siblings, 0 replies; 657+ messages in thread
From: andrew banman @ 2016-09-20 22:23 UTC (permalink / raw)
  To: Andrew Banman
  Cc: mingo, akpm, tglx, hpa, travis, rja, sivanich, x86, linux-kernel

Subject line got dropped the first time around. Will send again.

Apologies for the chatter,

Andrew

On Tue, Sep 20, 2016 at 05:21:06PM -0500, Andrew Banman wrote:
> From Andrew Banman <abanman@sgi.com> # This line is ignored.
> From: Andrew Banman <abanman@sgi.com>
> Subject: [PATCH 0/9] arch/x86/platform/uv: add UV4 support to BAU
> In-Reply-To: 
> 
> The following patch set adds support for UV4 architecture to the Broadcast
> Assist Unit (BAU). Major hardware changes to the BAU require these fixes to
> ensure correct operation and to avoid illegal MMR writes.
> 
>  arch/x86/include/asm/uv/uv_bau.h |  45 ++----------------------------
>  arch/x86/platform/uv/tlb_uv.c    | 114 ++++++++++++++++++++++++---------------------------------------
> -------------
> 
> The patch set can be thought of in three logical groups:
> 
> 1) General cleanup.
> 
>  [PATCH 1/9] arch/x86/platform/uv: BAU cleanup: update printks
>  [PATCH 2/9] arch/x86/platform/uv: BAU cleanup: pq_init
>  [PATCH 3/9] arch/x86/platform/uv: BAU replace uv_physnodeaddr
> 
>  These housekeeping patches make the subsequent UV4 patches clearer,
>  and they should be done in any case.
> 
> 
> 2) Implement a new scheme to abstract UV version-specific functions.
> 
>  [PATCH 4/9] arch/x86/platform/uv: BAU add generic function pointers
>  [PATCH 5/9] arch/x86/platform/uv: BAU use generic function pointers
> 
>  We add a struct of function pointers to define version-specific BAU
>  operations. The philosophy is to abstract functions that perform the same
>  operation on all UV versions but have different implementations. This will
>  simplify their use in the body of the driver code and greatly simplify the
>  UV4 patches to follow.
> 
> 
> 3) Add UV4 functionality.
> 
>  [PATCH 6/9] arch/x86/platform/uv: BAU UV4 populate uvhub_version
>  [PATCH 7/9] arch/x86/platform/uv: BAU UV4 disable software timeout
>  [PATCH 8/9] arch/x86/platform/uv: BAU UV4 fix payload queue setup
>  [PATCH 9/9] arch/x86/platform/uv: BAU UV4 add version-specific
> 
>  These patches feature a minimal set of changes to make the BAU on UV4
>  operational.
> 
> 
> This patch set has been tested for regressions on pre-UV4 architectures and
> for correct functionality on UV4. The patches apply cleanly to 4.8-rc7.
> Fine-tuned performance tweaking for UV4 will come in a future patch set.
> 
> 
> Thank you,
> 
> Andrew Banman
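
For readers unfamiliar with the pattern described in group 2 above, the
idea is a per-UV-version ops table along these lines (member names here
are illustrative only; the struct added by the series may differ):

struct bau_operations {
	unsigned long	(*read_sw_ack)(void);
	void		(*write_sw_ack)(unsigned long mmr);
	void		(*write_payload_first)(int pnode, unsigned long mmr);
	void		(*write_payload_last)(int pnode, unsigned long mmr);
};

static struct bau_operations uv2_3_ops;	/* filled in per UV version */
static struct bau_operations uv4_ops;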

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2015-08-03  6:18 Shraddha Barke
  2015-08-03  7:12 ` your mail Sudip Mukherjee
@ 2015-08-03  7:24 ` Dan Carpenter
  1 sibling, 0 replies; 657+ messages in thread
From: Dan Carpenter @ 2015-08-03  7:24 UTC (permalink / raw)
  To: Shraddha Barke
  Cc: Oleg Drokin, Al Viro, Julia Lawall, aybuke ozdemir,
	Andreas Dilger, John L. Hammond, Frank Zago, Greg Kroah-Hartman,
	HPDD-discuss, devel, linux-kernel

Returning EINVAL here is the wrong thing.  Just leave the code as is.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2015-08-03  6:18 Shraddha Barke
@ 2015-08-03  7:12 ` Sudip Mukherjee
  2015-08-03  7:24 ` Dan Carpenter
  1 sibling, 0 replies; 657+ messages in thread
From: Sudip Mukherjee @ 2015-08-03  7:12 UTC (permalink / raw)
  To: Shraddha Barke
  Cc: Oleg Drokin, Al Viro, Julia Lawall, aybuke ozdemir,
	Andreas Dilger, John L. Hammond, Frank Zago, Greg Kroah-Hartman,
	HPDD-discuss, devel, linux-kernel

On Mon, Aug 03, 2015 at 11:48:59AM +0530, Shraddha Barke wrote:
> From b67c6c20455b04b77447ab4561e44f1a75dd978d Mon Sep 17 00:00:00 2001
> From: Shraddha Barke <shraddha.6596@gmail.com>
> Date: Mon, 3 Aug 2015 11:34:19 +0530
> Subject: [PATCH] Staging : lustre : Use -EINVAL instead of -ENOSYS

You do not need these in the commit message.

regards
sudip

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2015-01-21 23:57   ` Jason Gunthorpe
  2015-01-22 20:50     ` One Thousand Gnomes
@ 2015-01-28 22:09     ` atull
  1 sibling, 0 replies; 657+ messages in thread
From: atull @ 2015-01-28 22:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: One Thousand Gnomes, michal.simek, linux-kernel,
	delicious.quinoa, dinguyen, yvanderv

On Wed, 21 Jan 2015, Jason Gunthorpe wrote:

> [unfutzd the cc a bit, sorry]
> 
> On Wed, Jan 21, 2015 at 04:19:17PM -0600, atull wrote:
> > > If we consider a Zynq, for instance, there are a number of clock nets
> > > that the CPU drives into the FPGA fabric. These nets are controlled by
> > > the kernel CLK framework. So, before we program the FPGA bitstream the
> > > clocks must be setup properly.
> > 
> > It's pretty normal for drivers to find out what their clocks are from
> > the DT and enable them.  
> 
> Sure, but the clocks are bitfile specific, and not related to
> programming. Some bitfiles may not require CPU clocks at all.

The bitfile specific clocks are the clocks that are turned on by the device
driver for that chunk of the bitfile.  So those clocks can be specified
the same way as clocks are specified in the DT.

> 
> > Yes the DT overlay can specify:
> >   * clock info
> >   * firmware file name if user is doing it that way
> >   * fpga manager - specific info
> >     * compatibility string specifies what type of fpga it is
> >     * which fpga this image should go into
> >   * fpga/processor bridges to enable
> >   * driver(s) info that is dependent on the above
> 
> All sounds reasonable
> 
> > > Today in our Zynq systems we have the bootloader preconfigure
> > > everything for what we are trying to do - but that is specific to the
> > > particular FPGA we are expecting to run, and eg, I expect if we ran a
> > > kernel using the Zynq clk framework there would be problems with it
> > > mangling the configuration.
> > > 
> > > So there would have to be some kind of sequence where the DT is
> > > loaded, the zynq specific FPGA programmer does its pre setup, then the
> > > request_firmarw/fpga_program_fw loads the bitstream and another pass
> > > for a zynq specific post setup and completion handshake?
> > 
> > fpga-mgr.c has the concept that each different FPGA family will
> > likely need its own way of doing these 3 steps:
> >  * write_init (prepare fpga for receiving configuration information)
> >  * write (write configuration info to the fpga)
> >  * write_complete (done writing, put fpga into running mode)
> > 
> > There are callbacks into the manufacturer/fpga family specific lower
> > level driver to do these things (as part of the "fpga_manager_ops"
> 
> I think the missing bit here is that there are bitfile specific things
> as well.
> 
> The functions above are fine for a generic manufacturer bitfile loader,
> ie Xilinx GPIO twiddling, Altera JTAG, Zynq DMA, etc.
> 
> But wrappered around that should be another set of functions that are
> bitfile specific.
> 
> Like Zynq-PL-boot-protocol-v1 - which deasserts a reset line and waits
> for the PL to signal back that it has completed reset.
> 
> Or jgg-boot-protocol-v1 which monitors the configuration GPIOs for a
> specific ready pattern..
> 
> Or ... 
> 
> All of those procedures depend on the bitfile to implement something.
> 
> > > The DT needs to specify not only the bitstream programming HW to use
> > > but this ancillary programming protocol. There are many ways to do
> > > a out of reset and completion handshake on Zynq, for instance.
> > 
> > Currently the lower level driver supports only one preferred method
> > of programming.  I guess we could add an enumerated DT property to
> > select programming protocol.  It would have to be manufacturer specific.
> > Alternatively it could be encoded into the compatibility string if that
> > makes sense.
> 
> From a DT perspective I'd expect it to look something like:
> 
> soc {
> 
>   // This is the 'how to program a bitstream'
>   fpga-bitstream0: zynq_pl_dma 
>   {
>      compatible = "xilinx,zynq,pl,dma";
>      regs = <..>
>   }
> 
>   fpga: ..
>   {
>      // This is 'what is in the bitstream'
>      boot-protocol = "xilinx,zynq,protocol1";
>      compatible = "jgg,fpga-foo-bar";
>      manager = @fpga-bitstream0
>      clocks = ..
>      clock-frequency = ...
> 
>      zynq_axi_gp0
>      {
>       // Settings for a CPU to FPGA AXI bridge
>        axi setting 1 = ...
>        [...]
>      }
>   }
> }

That's good.  It also needs to specify the driver(s) for the hardware that the
bitstream will instantiate.

> 
> I could also see integrating with the regulator framework as well to
> power up FPGA specific controllable power supplies.
> 
> > > And then user space would need to have control points between each of
> > > these steps.
> 
> > We could have two options, configurable from the ioctl:
> > * When the DT is loaded, do everything
> > * Even when the DT is loaded, wait for further instructions from ioctl or
> > 
> > Freewheeling flow:
> > * Tell ioctl that we are in freewheeling mode
> > * Load DT overlay
> > 
> > Tightly controlled flow:
> > * Tell ioctl that we are doing things stepwise
> > * Load DT overlay
> > * Use ioctl to step through getting the fpga loaded and known to be happy
> 
> I think you've certainly got the idea!
> 
> Thinking it through some more, if the kernel DT tells the fpga-mgr
> that it is 
> 
>   boot-protocol = "xilinx,zynq,protocol1","jgg,foo-bar";
> 
> Then the kernel should refuse to start it if it doesn't know how to do
> both 'xilinx,zynq,protocol1' and 'jgg,foo-bar'.
> 
> Thus the user space ioctl interface becomes more of how to implement a
> boot protocol helper in userspace? With the proper locking - while the
> helper is working the FPGA cannot be messed with..
> 
> Jason
> 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2015-01-21 23:57   ` Jason Gunthorpe
@ 2015-01-22 20:50     ` One Thousand Gnomes
  2015-01-28 22:09     ` atull
  1 sibling, 0 replies; 657+ messages in thread
From: One Thousand Gnomes @ 2015-01-22 20:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: atull, michal.simek, linux-kernel, delicious.quinoa, dinguyen, yvanderv

> The functions above are fine for a generic manufacturer bitfile loader,
> ie Xilinx GPIO twiddling, Altera JTAG, Zynq DMA, etc.
> 
> But wrappered around that should be another set of functions that are
> bitfile specific.

And also a transport layer. You can have the same FPGA with the same
loader protocol off multiple different bus types (from USB to on CPU die
and all the way between).

Alan

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] ` <alpine.DEB.2.02.1501211520150.13480@linuxheads99>
@ 2015-01-21 23:57   ` Jason Gunthorpe
  2015-01-22 20:50     ` One Thousand Gnomes
  2015-01-28 22:09     ` atull
  0 siblings, 2 replies; 657+ messages in thread
From: Jason Gunthorpe @ 2015-01-21 23:57 UTC (permalink / raw)
  To: atull
  Cc: One Thousand Gnomes, michal.simek, linux-kernel,
	delicious.quinoa, dinguyen, yvanderv

[unfutzd the cc a bit, sorry]

On Wed, Jan 21, 2015 at 04:19:17PM -0600, atull wrote:
> > If we consider a Zynq, for instance, there are a number of clock nets
> > that the CPU drives into the FPGA fabric. These nets are controlled by
> > the kernel CLK framework. So, before we program the FPGA bitstream the
> > clocks must be setup properly.
> 
> It's pretty normal for drivers to find out what their clocks are from
> the DT and enable them.  

Sure, but the clocks are bitfile specific, and not related to
programming. Some bitfiles may not require CPU clocks at all.

> Yes the DT overlay can specify:
>   * clock info
>   * firmware file name if user is doing it that way
>   * fpga manager - specific info
>     * compatibility string specifies what type of fpga it is
>     * which fpga this image should go into
>   * fpga/processor bridges to enable
>   * driver(s) info that is dependent on the above

All sounds reasonable

> > Today in our Zynq systems we have the bootloader preconfigure
> > everything for what we are trying to do - but that is specific to the
> > particular FPGA we are expecting to run, and eg, I expect if we ran a
> > kernel using the Zynq clk framework there would be problems with it
> > mangling the configuration.
> > 
> > So there would have to be some kind of sequence where the DT is
> > loaded, the zynq specific FPGA programmer does its pre setup, then the
> > request_firmware/fpga_program_fw loads the bitstream and another pass
> > for a zynq specific post setup and completion handshake?
> 
> fpga-mgr.c has the concept that each different FPGA family will
> likely need its own way of doing these 3 steps:
>  * write_init (prepare fpga for receiving configuration information)
>  * write (write configuration info to the fpga)
>  * write_complete (done writing, put fpga into running mode)
> 
> There are callbacks into the manufacturer/fpga family specific lower
> level driver to do these things (as part of the "fpga_manager_ops"

I think the missing bit here is that there are bitfile specific things
as well.

The functions above are fine for a generic manufacturer bitfile loader,
ie Xilinx GPIO twiddling, Altera JTAG, Zynq DMA, etc.

But wrappered around that should be another set of functions that are
bitfile specific.

Like Zynq-PL-boot-protocol-v1 - which deasserts a reset line and waits
for the PL to signal back that it has completed reset.

Or jgg-boot-protocol-v1 which monitors the configuration GPIOs for a
specific ready pattern..

Or ... 

All of those procedures depend on the bitfile to implement something.
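
A rough sketch of the two layers being described here - all names below
are invented for illustration and are not from fpga-mgr.c:

struct fpga_image_ops {			/* per-manufacturer loader */
	int (*write_init)(void *priv);
	int (*write)(void *priv, const char *buf, size_t count);
	int (*write_complete)(void *priv);
};

struct fpga_boot_protocol {		/* per-bitfile handshake */
	const char *compatible;		/* e.g. "xilinx,zynq,protocol1" */
	int (*pre_program)(void *priv);	/* clocks, resets, regulators */
	int (*post_program)(void *priv);/* wait for the design-ready signal */
};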

> > The DT needs to specify not only the bitstream programming HW to use
> > but this ancillary programming protocol. There are many ways to do
> > a out of reset and completion handshake on Zynq, for instance.
> 
> Currently the lower level driver supports only one preferred method
> of programming.  I guess we could add an enumerated DT property to
> select programming protocol.  It would have to be manufacturer specific.
> Alternatively it could be encoded into the compatibility string if that
> makes sense.

From a DT perspective I'd expect it to look something like:

soc {

  // This is the 'how to program a bitstream'
  fpga-bitstream0: zynq_pl_dma 
  {
     compatible = "xilinx,zynq,pl,dma";
     regs = <..>
  }

  fpga: ..
  {
     // This is 'what is in the bitstream'
     boot-protocol = "xilinx,zynq,protocol1";
     compatible = "jgg,fpga-foo-bar";
     manager = @fpga-bitstream0
     clocks = ..
     clock-frequency = ...

     zynq_axi_gp0
     {
      // Settings for a CPU to FPGA AXI bridge
       axi setting 1 = ...
       [...]
     }
  }
}

I could also see integrating with the regulator framework as well to
power up FPGA specific controllable power supplies.

> > And then user space would need to have control points between each of
> > these steps.

> We could have two options, configurable from the ioctl:
> * When the DT is loaded, do everything
> * Even when the DT is loaded, wait for further instructions from ioctl or
> 
> Freewheeling flow:
> * Tell ioctl that we are in freewheeling mode
> * Load DT overlay
> 
> Tightly controlled flow:
> * Tell ioctl that we are doing things stepwise
> * Load DT overlay
> * Use ioctl to step through getting the fpga loaded and known to be happy

I think you've certainly got the idea!

Thinking it through some more, if the kernel DT tells the fpga-mgr
that it is 

  boot-protocol = "xilinx,zynq,protocol1","jgg,foo-bar";

Then the kernel should refuse to start it if it doesn't know how to do
both 'xilinx,zynq,protocol1' and 'jgg,foo-bar'.

Thus the user space ioctl interface becomes more of how to implement a
boot protocol helper in userspace? With the proper locking - while the
helper is working the FPGA cannot be messed with..

Jason

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2014-10-15  8:10 Christoph Lameter
@ 2014-10-27 15:07 ` Tejun Heo
  0 siblings, 0 replies; 657+ messages in thread
From: Tejun Heo @ 2014-10-27 15:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel

Hello, Christoph.

On Wed, Oct 15, 2014 at 03:10:37AM -0500, Christoph Lameter wrote:
> Subject: Convert remaining __get_cpu_var uses
> 
> During the 3.18 merge period additional __get_cpu_var uses were
> added. The patch converts these to this_cpu_ptr().
> 
> [This does not address the powerpc issue where the conversion
> patches were routed directly to the powerpc maintainers but were
> not applied in the merge period. Will have to be handled separately]
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
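
For reference, the conversion being described has this shape, shown on a
made-up per-cpu variable rather than a hunk from the actual patch:

DEFINE_PER_CPU(struct llist_head, wakeup_list);

/* old style, being removed */
struct llist_head *head_old = &__get_cpu_var(wakeup_list);
/* new style */
struct llist_head *head_new = this_cpu_ptr(&wakeup_list);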

Can you please repost with proper subject line and the subsys
maintainers cc'd?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2014-09-01 15:47 sunwxg
@ 2014-09-01 17:01 ` Dan Carpenter
  0 siblings, 0 replies; 657+ messages in thread
From: Dan Carpenter @ 2014-09-01 17:01 UTC (permalink / raw)
  To: sunwxg
  Cc: Greg Kroah-Hartman, Dulshani Gunawardhana, Josh Triplett,
	John L. Hammond, Andreas Dilger, Chi Pham, Oleg Drokin, devel,
	linux-kernel

No subject.

It should be a subject about adding spaces.

regards,
dan carpenter



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <1409556896-21523-2-git-send-email-xiaoguang_wang5188@qq.com>
@ 2014-09-01  8:04 ` Dan Carpenter
  0 siblings, 0 replies; 657+ messages in thread
From: Dan Carpenter @ 2014-09-01  8:04 UTC (permalink / raw)
  To: sunwxg
  Cc: Benjamin Romer, David Kershner, Greg Kroah-Hartman, Ken Cox,
	Iulia Manda, Luis R. Rodriguez, Masaru Nomura, devel,
	sparmaintainer, linux-kernel

On Mon, Sep 01, 2014 at 03:34:56PM +0800, sunwxg wrote:
> From: Sun Wang <xiaoguang_wang5188@qq.com>
> 
> Subject: [PATCH] staging: unisys: visorutil: procobjecttree: fix coding style issue 
> 

Your email headers are mangled.  The subject is vague.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2014-07-09  1:03 James Ban
@ 2014-07-09  7:56 ` Mark Brown
  0 siblings, 0 replies; 657+ messages in thread
From: Mark Brown @ 2014-07-09  7:56 UTC (permalink / raw)
  To: James Ban; +Cc: Liam Girdwood, Support Opensource, LKML, David Dajun Chen

[-- Attachment #1: Type: text/plain, Size: 1145 bytes --]

On Wed, Jul 09, 2014 at 10:03:32AM +0900, James Ban wrote:

> > > +	ret = regmap_read(chip->regmap, DA9211_REG_EVENT_B, &reg_val);
> > > +	if (ret < 0)
> > > +		goto error_i2c;

> > > +	if (reg_val & DA9211_E_OV_CURR_A) {

> > > +	if (reg_val & DA9211_E_OV_CURR_B) {

> > > +	return IRQ_HANDLED;

> > This is buggy - the driver should only return IRQ_HANDLED if it handled the
> > interrupt somehow, otherwise it should return IRQ_NONE and let the interrupt
> > core handle things.  This is especially important since the device appears to
> > require that interrupts are explicitly acknowledged so if something is flagged
> > but not handled the interrupt will just sit constantly asserted.

> Basically all interrupts are masked when the chip wakes up. 
> Only two interrupts are unmasked at the start of the driver, like below.

I know that's the intention but the code should still be written
robustly - something might go wrong somewhere which causes another
interrupt to be enabled, or we might even gain support for shared
threaded interrupts in the interrupt core and someone could then
try to use that in a system.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
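
For illustration, the handler shape being asked for above -- IRQ_HANDLED only for
events the driver actually recognized and acknowledged, IRQ_NONE otherwise. The
register and bit names come from the quoted patch; the chip struct and the ack
details are assumptions:

	static irqreturn_t da9211_irq_handler(int irq, void *data)
	{
		struct da9211 *chip = data;	/* hypothetical driver state */
		unsigned int reg_val;
		irqreturn_t handled = IRQ_NONE;
		int ret;

		ret = regmap_read(chip->regmap, DA9211_REG_EVENT_B, &reg_val);
		if (ret < 0)
			return IRQ_NONE;

		if (reg_val & DA9211_E_OV_CURR_A) {
			/* notify the regulator consumers and ack the event */
			handled = IRQ_HANDLED;
		}

		if (reg_val & DA9211_E_OV_CURR_B) {
			/* likewise for the second regulator */
			handled = IRQ_HANDLED;
		}

		/* anything else flagged is left to the interrupt core */
		return handled;
	}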

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <CAM0G4ztXWM5kw6dV4WRrTVJBMmeJDXuRnbeRBE603hM+7c=PCg@mail.gmail.com>
@ 2014-02-25 15:01 ` Will Deacon
  0 siblings, 0 replies; 657+ messages in thread
From: Will Deacon @ 2014-02-25 15:01 UTC (permalink / raw)
  To: srikanth TS; +Cc: ts.srikanth, linux-kernel, iommu, sungjinn.chung

On Tue, Feb 25, 2014 at 11:20:11AM +0000, srikanth TS wrote:
> 
> On Feb 25, 2014 2:28 AM, "Will Deacon" <will.deacon@arm.com<mailto:will.deacon@arm.com>> wrote:
> >
> > On Mon, Feb 24, 2014 at 03:12:21PM +0000, srikanth TS wrote:
> > > Hi Will Deacon,
> >
> > Hello,
> >
> > > Currently the SMMU driver expects all stream IDs used by the respective master
> > > to be defined in the DT.
> > >
> > > We want to know how to handle the case of virtual functions being dynamically
> > > created and destroyed.
> > >
> > > Is the PCI driver responsible for creating a stream ID for the respective BD and
> > > requesting the SMMU to add it to the mapping table [stream ID to context mapping
> > > table]?
> > >
> > > Or is there a right way of doing it?
> >
> > Correct, the driver currently doesn't support dynamic mappings (mainly
> > because I didn't want to try and invent something that I couldn't test).
> >
> > There are a couple of ways to solve this:
> >
> >   (1) Add a way for a PCI RC to dynamically allocate StreamIDs on an SMMU
> >       within a fixed range. That would probably need some code in the bus
> >       layer, so that a bus notifier can kick and call back to the relevant
> >       SMMU.
> 
> I think the first way of solving this seems better, because we don't know how many
> VFs are used and I feel it's not a good idea to keep the whole list of stream IDs
> [which is equal to the max number of VFs] in the DT. Again, in this method we need
> to generate the stream ID dynamically whenever a VF is added on the pci iov driver
> side, and then pass that stream ID to the SMMU.
> 
> Is it ok this way?  Or do you prefer the 2nd way, which is simpler?

I'm happy either way, but I'd need to see some patches before I can merge
anything ;)

Will

^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
  2014-02-24 17:28 ` Will Deacon
@ 2014-02-25 11:28   ` Varun Sethi
  0 siblings, 0 replies; 657+ messages in thread
From: Varun Sethi @ 2014-02-25 11:28 UTC (permalink / raw)
  To: Will Deacon, srikanth TS; +Cc: iommu, sungjinn.chung, linux-kernel, ts.srikanth



> -----Original Message-----
> From: iommu-bounces@lists.linux-foundation.org [mailto:iommu-
> bounces@lists.linux-foundation.org] On Behalf Of Will Deacon
> Sent: Monday, February 24, 2014 10:59 PM
> To: srikanth TS
> Cc: iommu@lists.linux-foundation.org; sungjinn.chung@samsung.com; linux-
> kernel@vger.kernel.org; ts.srikanth@samsung.com
> Subject: Re: your mail
> 
> On Mon, Feb 24, 2014 at 03:12:21PM +0000, srikanth TS wrote:
> > Hi Will Deacon,
> 
> Hello,
> 
> > Currently the SMMU driver expects all stream IDs used by the respective
> > master to be defined in the DT.
> >
> > We want to know how to handle the case of virtual functions
> > being dynamically created and destroyed.
> >
> > Is the PCI driver responsible for creating a stream ID for the respective BD and
> > requesting the SMMU to add it to the mapping table [stream ID to context
> > mapping table]?
> >
> > Or is there a right way of doing it?
> 
> Correct, the driver currently doesn't support dynamic mappings (mainly
> because I didn't want to try and invent something that I couldn't test).
> 
> There are a couple of ways to solve this:
> 
>   (1) Add a way for a PCI RC to dynamically allocate StreamIDs on an SMMU
>       within a fixed range. That would probably need some code in the bus
>       layer, so that a bus notifier can kick and call back to the
> relevant
>       SMMU.
This could be done in an add-device notifier. I am working on similar (not PCI) hot plug device infrastructure for the arm smmu driver. I will post an RFC patch by next week.

-Varun
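
A rough sketch of the add-device notifier approach mentioned here (Will's option
(1) quoted above); every name below is invented for illustration, and the actual
StreamID allocation is exactly the interface still being discussed:

	#include <linux/device.h>
	#include <linux/notifier.h>
	#include <linux/pci.h>

	static int smmu_bus_notifier(struct notifier_block *nb,
				     unsigned long action, void *data)
	{
		struct device *dev = data;

		if (action == BUS_NOTIFY_ADD_DEVICE) {
			/* allocate a StreamID from the fixed range and program
			 * the stream-to-context mapping for this device */
			dev_info(dev, "would allocate a StreamID here\n");
			return NOTIFY_OK;
		}

		return NOTIFY_DONE;
	}

	static struct notifier_block smmu_nb = {
		.notifier_call = smmu_bus_notifier,
	};

	/* registered once with: bus_register_notifier(&pci_bus_type, &smmu_nb); */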


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <CAM0G4zvu1BHcOrSgBuobvb-+fVsNWXjXdzZdV51T70B9_ZC4XQ@mail.gmail.com>
@ 2014-02-24 17:28 ` Will Deacon
  2014-02-25 11:28   ` Varun Sethi
  0 siblings, 1 reply; 657+ messages in thread
From: Will Deacon @ 2014-02-24 17:28 UTC (permalink / raw)
  To: srikanth TS; +Cc: iommu, linux-kernel, ts.srikanth, sungjinn.chung

On Mon, Feb 24, 2014 at 03:12:21PM +0000, srikanth TS wrote:
> Hi Will Deacon,

Hello,

> Currently the SMMU driver expects all stream IDs used by the respective master
> to be defined in the DT.
> 
> We want to know how to handle the case of virtual functions being dynamically
> created and destroyed.
> 
> Is the PCI driver responsible for creating a stream ID for the respective BD and
> requesting the SMMU to add it to the mapping table [stream ID to context mapping
> table]?
> 
> Or is there a right way of doing it?

Correct, the driver currently doesn't support dynamic mappings (mainly
because I didn't want to try and invent something that I couldn't test).

There are a couple of ways to solve this:

  (1) Add a way for a PCI RC to dynamically allocate StreamIDs on an SMMU
      within a fixed range. That would probably need some code in the bus
      layer, so that a bus notifier can kick and call back to the relevant
      SMMU.

  (2) Describe the RID -> SID mapping in the device-tree. We probably want
      to avoid an enormous table, so this would only work for simple `SID =
      RID + offset' or 'SID = RID & mask' cases.

How do your IDs map to each other?

Will
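
Purely to illustrate option (2), the two simple transforms a device-tree
description could express; this is not existing SMMU driver code:

	#include <linux/types.h>

	/* 'SID = RID + offset' or 'SID = RID & mask', selected per DT node */
	static u32 rid_to_sid(u32 rid, bool use_offset, u32 offset, u32 mask)
	{
		return use_offset ? rid + offset : rid & mask;
	}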

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2014-01-23  9:06 Prabhakar Lad
@ 2014-01-23 19:55 ` Mark Brown
  0 siblings, 0 replies; 657+ messages in thread
From: Mark Brown @ 2014-01-23 19:55 UTC (permalink / raw)
  To: Prabhakar Lad; +Cc: LKML

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

On Thu, Jan 23, 2014 at 02:36:05PM +0530, Prabhakar Lad wrote:
> Hi Mark,

Please use a subject line for your e-mails, otherwise they look a lot
like spam.

> So currently I am booting it the traditional way (non-DT way), and
> regulator_dev_lookup()
> fails (returns NULL), and for this check it fails.

> +    if (ret && ret != -ENODEV) {
>          regulator = ERR_PTR(ret);
>          goto out;
>      }
> In the NON-DT case the 'ret' is never updated in regulator_dev_lookup().

What is the problem you're trying to report here?  You're describing the
behaviour of the code but I don't understand the problem.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2014-01-09 18:49   ` Joe Borġ
@ 2014-01-14 16:40     ` Steven Rostedt
  0 siblings, 0 replies; 657+ messages in thread
From: Steven Rostedt @ 2014-01-14 16:40 UTC (permalink / raw)
  To: Joe Borġ; +Cc: Greg KH, abbotti, hsweeten, devel, linux-kernel

On Thu, Jan 09, 2014 at 06:49:39PM +0000, Joe Borġ wrote:
> 
> I didn't do the changes as root, I sent them from my server as it has SMTP out.
> 

Hmm, this gives me an idea. There's nothing, I believe, that makes the root user
have to have the name "root" except for the passwd file. Maybe I'll just
rename "root" to "walley" and then use "root" as my normal account. If anyone tries
to break into "root" they will just gain access to a normal account and nothing
more ;-)

/me goes back to hacking

-- Steve


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2014-01-09 18:39 ` Greg KH
@ 2014-01-09 18:49   ` Joe Borġ
  2014-01-14 16:40     ` Steven Rostedt
  0 siblings, 1 reply; 657+ messages in thread
From: Joe Borġ @ 2014-01-09 18:49 UTC (permalink / raw)
  To: Greg KH; +Cc: abbotti, hsweeten, devel, linux-kernel

Hi Greg,

I'll re do them tonight.

I didn't do the changes as root, I sent them from my server as it has SMTP out.

Thanks

Regards,
Joseph David Borġ
http://www.jdborg.com


On 9 January 2014 18:39, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Mon, Dec 30, 2013 at 05:40:44PM +0000, Joe Borg wrote:
>> >From 6d9f6446434c4021cc9452e31c374ac50e08f0f9 Mon Sep 17 00:00:00 2001
>> From: Joe Borg <root@josephb.org>
>
> This isn't matching your "from:" line on your email, why should I trust
> it?
>
> And doing kernel work as 'root'?  That's not a good idea for lots of
> reasons...
>
>> Date: Mon, 30 Dec 2013 15:35:08 +0000
>> Subject: [PATCH 62/62] DAS1800: Fixing error from checkpatch.
>>
>> Fixed pointer typeo; foo * bar should be foo *bar.
>>
>> Signed-off by Joe Borg <root@josephb.org>
>
> What happened to your Subject:?
>
> And why is the whole git header in the email, please use git send-email
> so that I don't have to hand-edit the body of the email to apply it.
>
> Can you please fix this up and resend?
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <1388425244-10017-1-git-send-email-jdb@sitrep3.com>
@ 2014-01-09 18:39 ` Greg KH
  2014-01-09 18:49   ` Joe Borġ
  0 siblings, 1 reply; 657+ messages in thread
From: Greg KH @ 2014-01-09 18:39 UTC (permalink / raw)
  To: Joe Borg; +Cc: abbotti, hsweeten, devel, linux-kernel

On Mon, Dec 30, 2013 at 05:40:44PM +0000, Joe Borg wrote:
> >From 6d9f6446434c4021cc9452e31c374ac50e08f0f9 Mon Sep 17 00:00:00 2001
> From: Joe Borg <root@josephb.org>

This isn't matching your "from:" line on your email, why should I trust
it?

And doing kernel work as 'root'?  That's not a good idea for lots of
reasons...

> Date: Mon, 30 Dec 2013 15:35:08 +0000
> Subject: [PATCH 62/62] DAS1800: Fixing error from checkpatch.
> 
> Fixed pointer typeo; foo * bar should be foo *bar.
> 
> Signed-off by Joe Borg <root@josephb.org>

What happened to your Subject:?

And why is the whole git header in the email, please use git send-email
so that I don't have to hand-edit the body of the email to apply it.

Can you please fix this up and resend?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <CACaajQtCTW_PKA25q3-4o4XAV6sgZnyD+Skkw6mhUHpRBEgbjQ@mail.gmail.com>
@ 2012-11-26 18:29 ` Greg KH
  0 siblings, 0 replies; 657+ messages in thread
From: Greg KH @ 2012-11-26 18:29 UTC (permalink / raw)
  To: Vasiliy Tolstov; +Cc: linux-kernel, stable

On Mon, Nov 26, 2012 at 10:14:44PM +0400, Vasiliy Tolstov wrote:
> Hello, Greg. Hello kernel team! I'm a system engineer at clodo.ru (a Russian cloud
> hosting provider); we use xen and sles11-sp2 for our xen compute nodes.
> Each virtual machine (domU) has disks attached via Infiniband SRP. On top
> of the disks attached by srp we use multipath (to do failover).
> Now we have issues where all commands that use multipath hang while one storage
> is rebooted.
> After some discussion with the maintainer of linux-rdma (Bart Van Assche) and using
> his backported ib_srp with HA patches we can't solve the deadlock issues. Bart
> thinks that the SLES team did not backport some core scsi patches that would prevent
> the multipath deadlock (currently about 2.5 minutes on a failed target) to their
> kernel (3.0.42).
> Is it possible to determine this, or to get help solving this issue?

As you are using SLES, please contact the SUSE for support for that
kernel, as you are paying for it, and the community can't do anything to
support their kernel, sorry.

Best of luck,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-10-10 15:06 Kent Yoder
@ 2012-10-10 15:12 ` Kent Yoder
  0 siblings, 0 replies; 657+ messages in thread
From: Kent Yoder @ 2012-10-10 15:12 UTC (permalink / raw)
  To: Kent Yoder; +Cc: linux-kernel, linux-security-module, tpmdd-devel

 Please ignore.

On Wed, Oct 10, 2012 at 10:06:53AM -0500, Kent Yoder wrote:
> The following changes since commit ecefbd94b834fa32559d854646d777c56749ef1c:
> 
>   Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm (2012-10-04 09:30:33 -0700)
> 
> are available in the git repository at:
> 
> 
>   git://github.com/shpedoikal/linux.git tpmdd-fixes-v3.6
> 
> for you to fetch changes up to 1631cfb7cee28388b04aef6c0a73050f6fd76e4d:
> 
>   driver/char/tpm: fix regression causesd by ppi (2012-10-10 09:50:56 -0500)
> 
> ----------------------------------------------------------------
> Gang Wei (1):
>       driver/char/tpm: fix regression causesd by ppi
> 
>  drivers/char/tpm/tpm.c     |  3 ++-
>  drivers/char/tpm/tpm.h     |  9 +++++++--
>  drivers/char/tpm/tpm_ppi.c | 18 ++++++++++--------
>  3 files changed, 19 insertions(+), 11 deletions(-)


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-10-04 16:50 Andrea Arcangeli
@ 2012-10-04 18:17 ` Christoph Lameter
  0 siblings, 0 replies; 657+ messages in thread
From: Christoph Lameter @ 2012-10-04 18:17 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-kernel, linux-mm, Linus Torvalds, Andrew Morton,
	Peter Zijlstra, Ingo Molnar, Mel Gorman, Hugh Dickins,
	Rik van Riel, Johannes Weiner, Hillf Danton, Andrew Jones,
	Dan Smith, Thomas Gleixner, Paul Turner, Suresh Siddha,
	Mike Galbraith, Paul E. McKenney

On Thu, 4 Oct 2012, Andrea Arcangeli wrote:

> So we could drop page_autonuma by creating a CONFIG_SLUB=y dependency
> (AUTONUMA wouldn't be available in the kernel config if SLAB=y, and it
> also wouldn't be available on 32bit archs but the latter isn't a
> problem).

Nope it should depend on page struct alignment. Other kernel subsystems
may be depeding on page struct alignment in the future (and some other
arches may already have that requirement)


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-08-03 17:43 Tejun Heo
@ 2012-08-08 16:39 ` Tejun Heo
  0 siblings, 0 replies; 657+ messages in thread
From: Tejun Heo @ 2012-08-08 16:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: torvalds, akpm, padovan, marcel, peterz, mingo, davem,
	dougthompson, ibm-acpi, cbou, rui.zhang, tomi.valkeinen

On Fri, Aug 03, 2012 at 10:43:45AM -0700, Tejun Heo wrote:
> delayed_work has been annoyingly missing the mechanism to modify timer
> of a pending delayed_work - ie. mod_timer() counterpart.  delayed_work
> users have been working around this using several methods - using an
> explicit timer + work item, messing directly with delayed_work->timer,
> and canceling before re-queueing, all of which are error-prone and/or
> ugly.
> 
> Gustavo Padovan posted a RFC implementation[1] of mod_delayed_work() a
> while back but it wasn't complete.  To properly implement
> mod_delayed_work[_on](), it should be able to steal pending work items
> which may be on timer or worklist or anywhere inbetween.  This is
> similar to what __cancel_work_timer() does but it turns out that there
> are a lot of holes around this area and try_to_grab_pending() needs
> considerable amount of work to be used for other purposes too.
> 
> This patchset improves canceling and try_to_grab_pending(), use it to
> implement mod_delayed_work[_on](), convert easy ones, and drop
> __cancel_delayed_work_sync() which doesn't have relevant users
> afterwards.

Applied to wq/for-3.7.

Thanks.

-- 
tejun
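
For illustration, the kind of caller this interface is aimed at -- adjusting the
expiry of an already-queued delayed work item without the cancel-and-requeue
dance. The work function and delay below are placeholders:

	#include <linux/workqueue.h>
	#include <linux/jiffies.h>

	static void poll_fn(struct work_struct *work)
	{
		/* periodic work goes here */
	}
	static DECLARE_DELAYED_WORK(poll_work, poll_fn);

	static void poll_soon(void)
	{
		/* previously open-coded as cancel_delayed_work() followed by
		 * queue_delayed_work(); now a single call adjusts the timer */
		mod_delayed_work(system_wq, &poll_work, msecs_to_jiffies(100));
	}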

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-02-21 15:39 Yang Honggang
@ 2012-02-21 11:34 ` Hans J. Koch
  0 siblings, 0 replies; 657+ messages in thread
From: Hans J. Koch @ 2012-02-21 11:34 UTC (permalink / raw)
  To: Yang Honggang; +Cc: linux-kernel, hjk

On Tue, Feb 21, 2012 at 10:39:18AM -0500, Yang Honggang wrote:
> hi, everyone

Please give your mail a proper subject line before posting.
If you talk about UIO, it should start with uio:
Otherwise, people won't read it and just send it to /dev/null.

> 
> Is there a mail list dedicated for UIO (userspace I/O)?

No, there's not enough mail traffic to justify that.

> I want to contribute to UIO but did not find the right
> mail list.

Please send your contribution to LKML and Cc: me and Greg
Kroah-Hartman. If you change an existing driver, also Cc:
the author of that driver.

Thanks,
Hans


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-02-12 19:06     ` Al Viro
@ 2012-02-13  9:40       ` Jiri Slaby
  0 siblings, 0 replies; 657+ messages in thread
From: Jiri Slaby @ 2012-02-13  9:40 UTC (permalink / raw)
  To: Al Viro
  Cc: Jiri Slaby, Richard Weinberger, linux-kernel,
	user-mode-linux-devel, akpm, alan, gregkh

On 02/12/2012 08:06 PM, Al Viro wrote:
> Yecchhh...  If I'm reading (and grepping) it right, there are only two
> non-default instance of tty_operations ->shutdown() - pty and vt ones.
> Lovely...  And while we are at it, vt instance is definitely not safe
> from interrupts - calls console_lock().  Not that it was relevant in
> this case...

Thanks for looking into that. I was too lazy to do that on Sunday.

You're right that it may cause problems. Fortunately vt doesn't refcount
ttys. Hence con_shutdown can be called only from release_tty (close
path) in the user context.

Adding to my TODO list, unless somebody beats me to fix it.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-02-12 19:11 ` Al Viro
@ 2012-02-13  9:15   ` Jiri Slaby
  0 siblings, 0 replies; 657+ messages in thread
From: Jiri Slaby @ 2012-02-13  9:15 UTC (permalink / raw)
  To: Al Viro
  Cc: Richard Weinberger, linux-kernel, user-mode-linux-devel, akpm,
	alan, gregkh

On 02/12/2012 08:11 PM, Al Viro wrote:
> On Sun, Feb 12, 2012 at 01:21:10AM +0100, Richard Weinberger wrote:
> 
>> @@ -343,7 +267,7 @@ static irqreturn_t line_write_interrupt(int irq, void *data)
>>  {
>>  	struct chan *chan = data;
>>  	struct line *line = chan->line;
>> -	struct tty_struct *tty = line->tty;
>> +	struct tty_struct *tty = tty_port_tty_get(&line->port);
>>  	int err;
>>  
>>  	/*
>> @@ -354,6 +278,9 @@ static irqreturn_t line_write_interrupt(int irq, void *data)
>>  	spin_lock(&line->lock);
>>  	err = flush_buffer(line);
>>  	if (err == 0) {
>> +		tty_kref_put(tty);
>> +
>> +		spin_unlock(&line->lock);
>>  		return IRQ_NONE;
>>  	} else if (err < 0) {
>>  		line->head = line->buffer;
>> @@ -365,9 +292,12 @@ static irqreturn_t line_write_interrupt(int irq, void *data)
>>  		return IRQ_NONE;
>>  
>>  	tty_wakeup(tty);
>> +	tty_kref_put(tty);
>>  	return IRQ_HANDLED;
>>  }
> 
> That, BTW, smells ugly.  Note that return before the last one has no
> tty_kref_put() for a very good reason - it's under if (!tty).  And
> just as line->tty, port->tty can become NULL, so tty_port_tty_get()
> can, indeed, return NULL here.  Which makes the first tty_kref_put()
> oopsable...

Nope, it is allowed to call tty_kref_put(NULL).

regards,
-- 
js
suse labs
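
A minimal sketch of the pattern under review in this thread: take a reference on
the tty through the tty_port, cope with a NULL result, and drop the reference when
done (tty_kref_put(NULL) being a no-op, as noted above). The structure names
loosely mirror the UML line driver being patched:

	static irqreturn_t example_interrupt(int irq, void *data)
	{
		struct line *line = data;
		struct tty_struct *tty = tty_port_tty_get(&line->port);

		if (!tty)
			return IRQ_NONE;	/* tty already gone */

		tty_wakeup(tty);
		tty_kref_put(tty);
		return IRQ_HANDLED;
	}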

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-02-12  0:21 Richard Weinberger
  2012-02-12  0:25 ` your mail Jesper Juhl
  2012-02-12  1:02 ` Al Viro
@ 2012-02-12 19:11 ` Al Viro
  2012-02-13  9:15   ` Jiri Slaby
  2 siblings, 1 reply; 657+ messages in thread
From: Al Viro @ 2012-02-12 19:11 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-kernel, user-mode-linux-devel, akpm, alan, gregkh

On Sun, Feb 12, 2012 at 01:21:10AM +0100, Richard Weinberger wrote:

> @@ -343,7 +267,7 @@ static irqreturn_t line_write_interrupt(int irq, void *data)
>  {
>  	struct chan *chan = data;
>  	struct line *line = chan->line;
> -	struct tty_struct *tty = line->tty;
> +	struct tty_struct *tty = tty_port_tty_get(&line->port);
>  	int err;
>  
>  	/*
> @@ -354,6 +278,9 @@ static irqreturn_t line_write_interrupt(int irq, void *data)
>  	spin_lock(&line->lock);
>  	err = flush_buffer(line);
>  	if (err == 0) {
> +		tty_kref_put(tty);
> +
> +		spin_unlock(&line->lock);
>  		return IRQ_NONE;
>  	} else if (err < 0) {
>  		line->head = line->buffer;
> @@ -365,9 +292,12 @@ static irqreturn_t line_write_interrupt(int irq, void *data)
>  		return IRQ_NONE;
>  
>  	tty_wakeup(tty);
> +	tty_kref_put(tty);
>  	return IRQ_HANDLED;
>  }

That, BTW, smells ugly.  Note that return before the last one has no
tty_kref_put() for a very good reason - it's under if (!tty).  And
just as line->tty, port->tty can become NULL, so tty_port_tty_get()
can, indeed, return NULL here.  Which makes the first tty_kref_put()
oopsable...

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-02-12 12:40   ` Jiri Slaby
@ 2012-02-12 19:06     ` Al Viro
  2012-02-13  9:40       ` Jiri Slaby
  0 siblings, 1 reply; 657+ messages in thread
From: Al Viro @ 2012-02-12 19:06 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Richard Weinberger, linux-kernel, user-mode-linux-devel, akpm,
	alan, gregkh, Jiri Slaby

On Sun, Feb 12, 2012 at 01:40:47PM +0100, Jiri Slaby wrote:
> > Is tty_kref_put() safe in interrupt?  Here it seems to be OK, but in other
> > callers...  More or less at random: drivers/tty/serial/lantiq.c has it
> > called from lqasc_rx_int().  It seems to be possible to have it end up
> > calling ->ops->shutdown() and in this case that'd be lqasc_shutdown().
> > Which does a bunch of free_irq(), including the ->rx_irq, i.e. the one
> > we have it called from.  Alan?
> 
> I'm not Alan, but will reply anyway. Yes, it is safe (unless the driver
> does something tricky). In the driver you mention, this is uart_ops,
> called from tty_port_operations' ->shutdown. And that's a different from
> tty_operations' ->shutdown.
> 
> Yes, there are:
> * tty->ops
> * tty_port->ops
> * uart_port->ops
> 
> uart_port->ops->shutdown is supposed to tear down interrupts like in
> lantiq.c. It is called from tty_port->ops->shutdown. And that one is
> allowed to be called only from user context (tty->ops->close and
> tty->ops->hangup).

Yecchhh...  If I'm reading (and grepping) it right, there are only two
non-default instance of tty_operations ->shutdown() - pty and vt ones.
Lovely...  And while we are at it, vt instance is definitely not safe
from interrupts - calls console_lock().  Not that it was relevant in
this case...

It's probably too late in this case, but I would've called that method
->sync_cleanup().  Assuming I'm not misreading its intent and history...

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-02-12  1:02 ` Al Viro
@ 2012-02-12 12:40   ` Jiri Slaby
  2012-02-12 19:06     ` Al Viro
  0 siblings, 1 reply; 657+ messages in thread
From: Jiri Slaby @ 2012-02-12 12:40 UTC (permalink / raw)
  To: Al Viro
  Cc: Richard Weinberger, linux-kernel, user-mode-linux-devel, akpm,
	alan, gregkh, Jiri Slaby

On 02/12/2012 02:02 AM, Al Viro wrote:
> On Sun, Feb 12, 2012 at 01:21:10AM +0100, Richard Weinberger wrote:
>> +++ b/arch/um/drivers/line.c
>> @@ -19,19 +19,29 @@ static irqreturn_t line_interrupt(int irq, void *data)
>>  {
>>  	struct chan *chan = data;
>>  	struct line *line = chan->line;
>> +	struct tty_struct *tty;
>> +
>> +	if (line) {
>> +		tty = tty_port_tty_get(&line->port);
>> +		chan_interrupt(&line->chan_list, &line->task, tty, irq);
>> +		tty_kref_put(tty);
>> +	}
>>  
>> -	if (line)
>> -		chan_interrupt(&line->chan_list, &line->task, line->tty, irq);
>>  	return IRQ_HANDLED;
>>  }
> 
> Is tty_kref_put() safe in interrupt?  Here it seems to be OK, but in other
> callers...  More or less at random: drivers/tty/serial/lantiq.c has it
> called from lqasc_rx_int().  It seems to be possible to have it end up
> calling ->ops->shutdown() and in this case that'd be lqasc_shutdown().
> Which does a bunch of free_irq(), including the ->rx_irq, i.e. the one
> we have it called from.  Alan?

I'm not Alan, but will reply anyway. Yes, it is safe (unless the driver
does something tricky). In the driver you mention, this is uart_ops,
called from tty_port_operations' ->shutdown. And that's different from
tty_operations' ->shutdown.

Yes, there are:
* tty->ops
* tty_port->ops
* uart_port->ops

uart_port->ops->shutdown is supposed to tear down interrupts like in
lantiq.c. It is called from tty_port->ops->shutdown. And that one is
allowed to be called only from user context (tty->ops->close and
tty->ops->hangup).

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-02-12  0:21 Richard Weinberger
  2012-02-12  0:25 ` your mail Jesper Juhl
@ 2012-02-12  1:02 ` Al Viro
  2012-02-12 12:40   ` Jiri Slaby
  2012-02-12 19:11 ` Al Viro
  2 siblings, 1 reply; 657+ messages in thread
From: Al Viro @ 2012-02-12  1:02 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-kernel, user-mode-linux-devel, akpm, alan, gregkh

On Sun, Feb 12, 2012 at 01:21:10AM +0100, Richard Weinberger wrote:

Not a full review by any means, but...

> +++ b/arch/um/drivers/line.c
> @@ -19,19 +19,29 @@ static irqreturn_t line_interrupt(int irq, void *data)
>  {
>  	struct chan *chan = data;
>  	struct line *line = chan->line;
> +	struct tty_struct *tty;
> +
> +	if (line) {
> +		tty = tty_port_tty_get(&line->port);
> +		chan_interrupt(&line->chan_list, &line->task, tty, irq);
> +		tty_kref_put(tty);
> +	}
>  
> -	if (line)
> -		chan_interrupt(&line->chan_list, &line->task, line->tty, irq);
>  	return IRQ_HANDLED;
>  }

Is tty_kref_put() safe in interrupt?  Here it seems to be OK, but in other
callers...  More or less at random: drivers/tty/serial/lantiq.c has it
called from lqasc_rx_int().  It seems to be possible to have it end up
calling ->ops->shutdown() and in this case that'd be lqasc_shutdown().
Which does a bunch of free_irq(), including the ->rx_irq, i.e. the one
we have it called from.  Alan?

> @@ -495,13 +413,6 @@ static int setup_one_line(struct line *lines, int n, char *init, int init_prio,
>  	struct line *line = &lines[n];
>  	int err = -EINVAL;
>  
> -	spin_lock(&line->count_lock);
> -
> -	if (line->count) {
> -		*error_out = "Device is already open";
> -		goto out;
> -	}

... and similar in line_open() - just what happens if you try to reconfigure
an opened one?

> @@ -612,13 +523,15 @@ int line_get_config(char *name, struct line *lines, unsigned int num, char *str,
>  
>  	line = &lines[dev];
>  
> -	spin_lock(&line->count_lock);
> +	tty = tty_port_tty_get(&line->port);
> +
>  	if (!line->valid)
>  		CONFIG_CHUNK(str, size, n, "none", 1);
> -	else if (line->tty == NULL)
> +	else if (tty == NULL)
>  		CONFIG_CHUNK(str, size, n, line->init_str, 1);
>  	else n = chan_config_string(&line->chan_list, str, size, error_out);
> -	spin_unlock(&line->count_lock);
> +
> +	tty_kref_put(tty);

again, where's the exclusion with config changes?

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2012-02-12  0:21 Richard Weinberger
@ 2012-02-12  0:25 ` Jesper Juhl
  2012-02-12  1:02 ` Al Viro
  2012-02-12 19:11 ` Al Viro
  2 siblings, 0 replies; 657+ messages in thread
From: Jesper Juhl @ 2012-02-12  0:25 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-kernel, user-mode-linux-devel, viro, akpm, alan, gregkh

On Sun, 12 Feb 2012, Richard Weinberger wrote:

> Can you please review this patch?
> 

A subject on the mail along with a description of the patch would make 
that a great deal easier...

-- 
Jesper Juhl <jj@chaosbits.net>       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20120110061735.9BD676BA98@mailhub.coreip.homeip.net>
@ 2012-01-10  7:45 ` Dmitry Torokhov
  0 siblings, 0 replies; 657+ messages in thread
From: Dmitry Torokhov @ 2012-01-10  7:45 UTC (permalink / raw)
  To: Milton Miller; +Cc: Che-Liang Chiou, linux-kernel

On Mon, Jan 09, 2012 at 10:17:35PM -0800, Milton Miller wrote:
> Subject	Re: [PATCH 1/2] Input: serio_raw - cosmetic fixes
> In-Reply-To: <20120109082412.GC4049@core.coreip.homeip.net>
> References: <20120109082412.GC4049@core.coreip.homeip.net>
> 	<1325847795-30486-1-git-send-email-clchiou@chromium.org>
> Date: Tue, 10 Jan 2012 00:14:35 -0600
> Subject: (No subject header)
> X-Originating-IP: 71.22.127.106
> Message-ID: <1326176075_1502@mail4.comsite.net>
> 
> On Mon, 9 Jan 2012 about 00:24:12 -0800, Dmitry Torokhov wrote:
> > >  	struct serio_raw_client *client = file->private_data;
> > >  	struct serio_raw *serio_raw = client->serio_raw;
> > > -	unsigned int mask;
> > > 
> > >  	poll_wait(file, &serio_raw->wait, wait);
> > > 
> > > -	mask = serio_raw->dead ? POLLHUP | POLLERR : POLLOUT | POLLWRNORM;
> > >  	if (serio_raw->head != serio_raw->tail)
> > >  		return POLLIN | POLLRDNORM;
> > > 
> > 
> > This however is not quite correct. I will be applying the patch below
> > instead.
> > 
> > 
> > diff --git a/drivers/input/serio/serio_raw.c b/drivers/input/serio/serio_raw.c
> > index ca78a89..c2c6ad8 100644
> > --- a/drivers/input/serio/serio_raw.c
> > +++ b/drivers/input/serio/serio_raw.c
> > @@ -237,7 +237,7 @@ static unsigned int serio_raw_poll(struct file *file, poll_table *wait)
> >  
> >  	mask = serio_raw->dead ? POLLHUP | POLLERR : POLLOUT | POLLWRNORM;
> >  	if (serio_raw->head != serio_raw->tail)
> > -		return POLLIN | POLLRDNORM;
> > +		mask |= POLLIN | POLLRDNORM;
> >  
> >  	return 0;
> 
> doesn't this need to be changed to return mask?

Doh! Of course it does.

Thanks.

-- 
Dmitry
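
Putting the quoted fragments together with the fix agreed above, the poll handler
reads roughly as follows (a reconstruction, not necessarily the committed version):

	static unsigned int serio_raw_poll(struct file *file, poll_table *wait)
	{
		struct serio_raw_client *client = file->private_data;
		struct serio_raw *serio_raw = client->serio_raw;
		unsigned int mask;

		poll_wait(file, &serio_raw->wait, wait);

		mask = serio_raw->dead ? POLLHUP | POLLERR : POLLOUT | POLLWRNORM;
		if (serio_raw->head != serio_raw->tail)
			mask |= POLLIN | POLLRDNORM;

		return mask;
	}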

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2011-09-21 21:54 jim.cromie
@ 2011-09-26 23:23 ` Greg KH
  0 siblings, 0 replies; 657+ messages in thread
From: Greg KH @ 2011-09-26 23:23 UTC (permalink / raw)
  To: jim.cromie; +Cc: jbaron, joe, bart.vanassche, linux-kernel

On Wed, Sep 21, 2011 at 03:54:49PM -0600, jim.cromie@gmail.com wrote:
> hi all,
> 
> this reworked* patchset enhances dynamic-debug with:

I need acks from Jason before I can apply any of this...


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2011-05-18 19:22   ` your mail Greg KH
@ 2011-05-18 20:35     ` Alessio Igor Bogani
  0 siblings, 0 replies; 657+ messages in thread
From: Alessio Igor Bogani @ 2011-05-18 20:35 UTC (permalink / raw)
  To: Greg KH
  Cc: Rusty Russell, Tim Bird, Christoph Hellwig, Anders Kaseorg,
	Tim Abbott, LKML, Linux Embedded, Jason Wessel, Dirk Behme

Dear Mr. Kroah-Hartman,

2011/5/18 Greg KH <greg@kroah.com>:
[...]
> Care to resend it without all the stuff above so someone (Rusty I guess)
> can apply it?

Sure! It'll follow in a few minutes.

Thank you very much!

Ciao,
Alessio

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2011-05-18 18:55 ` Alessio Igor Bogani
@ 2011-05-18 19:22   ` Greg KH
  2011-05-18 20:35     ` Alessio Igor Bogani
  0 siblings, 1 reply; 657+ messages in thread
From: Greg KH @ 2011-05-18 19:22 UTC (permalink / raw)
  To: Alessio Igor Bogani
  Cc: Rusty Russell, Tim Bird, Christoph Hellwig, Anders Kaseorg,
	Tim Abbott, LKML, Linux Embedded, Jason Wessel, Dirk Behme

On Wed, May 18, 2011 at 08:55:25PM +0200, Alessio Igor Bogani wrote:
> Dear Mr. Bird, Dear Mr. Kroah-Hartman,
> 
> Sorry for my very bad English.
> 
> 2011/5/18 Tim Bird <tim.bird@am.sony.com>:
> [...]
> > Alessio - do you have any timings you can share for the speedup?
> 
> You can find a little benchmark using ftrace at the end of this email:
> https://lkml.org/lkml/2011/4/5/341
> 
> > On 05/17/2011 04:22 PM, Greg KH wrote:
> >> On Tue, May 17, 2011 at 10:56:03PM +0200, Alessio Igor Bogani wrote:
> >>> This work was supported by a hardware donation from the CE Linux Forum.
> [...]
> >> Please explain why you make a change, not just who sponsored the change,
> >> that's not very interesting to developers.
> 
> You are right. I apologize.
> 
> This patch is a missing piece (not essential, it is only a further little
> optimization) of this little patchset:
> https://lkml.org/lkml/2011/4/16/48
> 
> Unfortunately I forgot to include this patch in the series (my first error),
> then I avoided explaining the changes because I thought they were
> already sufficiently explained in the cover letter of the patchset (my second error).
> 
> Sorry for my mistakes.
> 
> Is this better?
> 
> Subject: [PATCH] module: Use binary search in lookup_symbol()
> 
> The function is_exported() and its helper function lookup_symbol() are used to
> verify whether a provided symbol is effectively exported by the kernel or by the
> modules. Now that both have their symbols sorted we can replace the linear search
> with a binary search, which provides a considerable speed-up.
> 
> This work was supported by a hardware donation from the CE Linux Forum.
> 
> Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>

Much better, I have no objection to this at all.

	Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

Care to resend it without all the stuff above so someone (Rusty I guess)
can apply it?

thanks,

greg k-h
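
For illustration, the shape of the change being described -- a bsearch() over the
now-sorted symbol tables instead of a linear walk. The struct and comparison
helper below are simplified assumptions, not the exact module.c code:

	#include <linux/bsearch.h>
	#include <linux/string.h>

	/* simplified stand-in for the real struct kernel_symbol */
	struct ksym {
		unsigned long value;
		const char *name;
	};

	static int ksym_cmp(const void *key, const void *elt)
	{
		const char *name = key;
		const struct ksym *sym = elt;

		return strcmp(name, sym->name);
	}

	static const struct ksym *lookup_sym(const char *name,
					     const struct ksym *start,
					     const struct ksym *stop)
	{
		return bsearch(name, start, stop - start, sizeof(*start),
			       ksym_cmp);
	}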

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2011-05-16  9:44 ` your mail Felipe Balbi
@ 2011-05-16 10:07   ` Munegowda, Keshava
  0 siblings, 0 replies; 657+ messages in thread
From: Munegowda, Keshava @ 2011-05-16 10:07 UTC (permalink / raw)
  To: balbi; +Cc: linux-usb, linux-omap, linux-kernel, gadiyar, sameo, parthab

On Mon, May 16, 2011 at 3:14 PM, Felipe Balbi <balbi@ti.com> wrote:
> Hi,
>
> On Mon, May 16, 2011 at 03:04:20PM +0530, Keshava Munegowda wrote:
>> The following 2 hwmod structures are added:
>> UHH hwmod of usbhs with the uhh base address and
>> EHCI, OHCI irq and base addresses.
>> TLL hwmod of usbhs with the TLL base address and irq.
>>
>> Signed-off-by: Keshava Munegowda <keshava_mgowda@ti.com>
>
> missing subject line.

Ya, I have already corrected it and resent this patch as [RESEND] [PATCH 1/5]...

Regards
keshava

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2011-05-16  9:34 Keshava Munegowda
@ 2011-05-16  9:44 ` Felipe Balbi
  2011-05-16 10:07   ` Munegowda, Keshava
  0 siblings, 1 reply; 657+ messages in thread
From: Felipe Balbi @ 2011-05-16  9:44 UTC (permalink / raw)
  To: Keshava Munegowda
  Cc: linux-usb, linux-omap, linux-kernel, balbi, gadiyar, sameo, parthab

[-- Attachment #1: Type: text/plain, Size: 364 bytes --]

Hi,

On Mon, May 16, 2011 at 03:04:20PM +0530, Keshava Munegowda wrote:
> The following 2 hwmod structures are added:
> UHH hwmod of usbhs with the uhh base address and
> EHCI, OHCI irq and base addresses.
> TLL hwmod of usbhs with the TLL base address and irq.
> 
> Signed-off-by: Keshava Munegowda <keshava_mgowda@ti.com>

missing subject line.

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2011-01-14  1:14 Omar Ramirez Luna
@ 2011-01-14  4:36 ` Greg KH
  0 siblings, 0 replies; 657+ messages in thread
From: Greg KH @ 2011-01-14  4:36 UTC (permalink / raw)
  To: Omar Ramirez Luna; +Cc: Felipe Contreras, devel, linux-kernel

On Thu, Jan 13, 2011 at 07:14:53PM -0600, Omar Ramirez Luna wrote:
> Please pull these changes for 2.6.38:
> 
> The following changes since commit 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5:
> 
>   Linux 2.6.37 (2011-01-04 16:50:19 -0800)
> 
> are available in the git repository at:
>   git://dev.omapzoom.org/pub/scm/tidspbridge/kernel-dspbridge.git for-gkh-2.6.38
> 
> Guzman Lugo, Fernando (1):
>       staging: tidspbridge: configure full L1 MMU range
> 
> Omar Ramirez Luna (1):
>       staging: tidspbridge: replace mbox callback with notifier_call
> 
>  drivers/staging/tidspbridge/core/tiomap3430.c |   15 +++++++--------
>  1 files changed, 7 insertions(+), 8 deletions(-)

You forgot a Subject: line.

Also, as these are just 2 patches, care to just email them so we can
review them?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2011-01-03 17:03 ` your mail Stanislaw Gruszka
@ 2011-01-04  5:17   ` Tejun Heo
  0 siblings, 0 replies; 657+ messages in thread
From: Tejun Heo @ 2011-01-04  5:17 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: castet.matthieu, linux-kernel, linux-usb, stf_xl

On Mon, Jan 03, 2011 at 06:03:17PM +0100, Stanislaw Gruszka wrote:
> On Mon, Jan 03, 2011 at 05:38:00PM +0100, castet.matthieu@free.fr wrote:
> > could you CC me on ueagle-atm.c patches.

Will try to, but maybe it's a good idea to add a MAINTAINERS entry?

> > From what I remember we sleep in the workqueue, that's why we couldn't use the
> > system one (it would freeze the keyboard...). But maybe the code changed.
> In the case when the firmware is not available we can sleep for a few seconds in
> the work function. That blocks the keyboard driver, which also uses the common
> workqueue. If the recent workqueue rewrite by Tejun allows long sleeps in a work
> func without hurting other workqueue users, the patch is ok.

Yeap, work items can sleep all they want on the system_wq.  It won't
delay execution of other work items.

Thanks.

--
tejun
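
A trivial sketch of what this means in practice; the names and the sleep are
placeholders, the point being only that a blocking work item no longer stalls
other system workqueue users:

	#include <linux/workqueue.h>
	#include <linux/delay.h>

	static void fw_wait_work(struct work_struct *work)
	{
		/* may block for several seconds, e.g. waiting for firmware;
		 * other work items on system_wq keep running meanwhile */
		msleep(5000);
	}
	static DECLARE_WORK(fw_work, fw_wait_work);

	/* queued from the driver with schedule_work(&fw_work); */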

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2011-01-03 16:38 castet.matthieu
@ 2011-01-03 17:03 ` Stanislaw Gruszka
  2011-01-04  5:17   ` Tejun Heo
  0 siblings, 1 reply; 657+ messages in thread
From: Stanislaw Gruszka @ 2011-01-03 17:03 UTC (permalink / raw)
  To: castet.matthieu; +Cc: linux-kernel, linux-usb, stf_xl, tj

On Mon, Jan 03, 2011 at 05:38:00PM +0100, castet.matthieu@free.fr wrote:
> Hi,
> 
> could you CC me on ueagle-atm.c patches.
> 
> From what I remember we sleep in the workqueue, that's why we couldn't use the
> system one (it would freeze the keyboard...). But maybe the code changed.
In the case when the firmware is not available we can sleep for a few seconds in
the work function. That blocks the keyboard driver, which also uses the common
workqueue. If the recent workqueue rewrite by Tejun allows long sleeps in a work
func without hurting other workqueue users, the patch is ok.

Unfortunately I'm not able to test the patch, my ueagle device was physically
damaged a few months ago.

Stanislaw

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2010-06-13  6:16 Mike Gilks
@ 2010-06-18 23:52 ` Greg KH
  0 siblings, 0 replies; 657+ messages in thread
From: Greg KH @ 2010-06-18 23:52 UTC (permalink / raw)
  To: Mike Gilks; +Cc: gregkh, mchehab, julia, joe, devel, linux-kernel

On Sun, Jun 13, 2010 at 02:16:47PM +0800, Mike Gilks wrote:
> Subject:r8192U_core.c Last pass
> In-Reply-To: 
> 
> 
> This is the last patch I can manage for this file.
> Everything else to do with checkpatch.pl issues may require an actual developer to look at it.

I have a whole bunch of series of patches from you (one duplicating
Linus's patch, I don't think you meant to send that...)  So, which should
I apply?

How about I delete them all and you send me the latest ones that you
want me to apply, as I'm totally confused which is your latest version
and which isn't and I see lots of duplicates.

Sound good?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2010-04-15 23:41   ` Rafi Rubin
@ 2010-04-16  4:21     ` Dmitry Torokhov
  0 siblings, 0 replies; 657+ messages in thread
From: Dmitry Torokhov @ 2010-04-16  4:21 UTC (permalink / raw)
  To: Rafi Rubin; +Cc: Alan Cox, linux-i2c, khali, linux-input, linux-kernel

On Thu, Apr 15, 2010 at 07:41:22PM -0400, Rafi Rubin wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> >> +	if (ts->tc.event_sended == false) {
> > 
> > We set "event_sended" to false immediately before calling
> > cy8ctmg110_send_event() so I do not see the point of this flag.
> 
> On that note:
> 
> $ git grep -n sended
> drivers/net/eth16i.c:1295:
> 		how many packets there is to be sended */
> drivers/net/wan/sbni.c:638:
> 		/* if frame was sended but not ACK'ed - resend it */
> drivers/net/wan/sbni.c:659:
> 		* frame sended then in prepare_to_send next frame
> drivers/usb/serial/aircable.c:13:
> 		* next two bytes must say how much data will be sended.
> 

Well, if you want to go down that path...

[dtor@hammer work]$ grep -r -e "\(setted\|setuped\|split\+ed\)" . | wc -l
54
[dtor@hammer work]$ 

-- 
Dmitry

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2010-04-14 23:16 ` your mail Dmitry Torokhov
@ 2010-04-15 23:41   ` Rafi Rubin
  2010-04-16  4:21     ` Dmitry Torokhov
  0 siblings, 1 reply; 657+ messages in thread
From: Rafi Rubin @ 2010-04-15 23:41 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Alan Cox, linux-i2c, khali, linux-input, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>> +	if (ts->tc.event_sended == false) {
> 
> We set "event_sended" to false immediately before calling
> cy8ctmg110_send_event() so I do not see the point of this flag.

On that note:

$ git grep -n sended
drivers/net/eth16i.c:1295:
		how many packets there is to be sended */
drivers/net/wan/sbni.c:638:
		/* if frame was sended but not ACK'ed - resend it */
drivers/net/wan/sbni.c:659:
		* frame sended then in prepare_to_send next frame
drivers/usb/serial/aircable.c:13:
		* next two bytes must say how much data will be sended.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvHpB4ACgkQwuRiAT9o609wAgCfbGjTP2lIN6JJyX28VzjPHxTY
ylIAn15FZRPpBEkWaFR8oAFKCCRmNF4d
=u4nx
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2010-04-14 12:54 Alan Cox
@ 2010-04-14 23:16 ` Dmitry Torokhov
  2010-04-15 23:41   ` Rafi Rubin
  0 siblings, 1 reply; 657+ messages in thread
From: Dmitry Torokhov @ 2010-04-14 23:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-i2c, khali, linux-input, linux-kernel

On Wed, Apr 14, 2010 at 01:54:02PM +0100, Alan Cox wrote:
> Subject: [FOR COMMENT] cy8ctmg110 for review
> 
> From: Samuli Konttila <samuli.konttila@aavamobile.com>
> 
> Add support for the cy8ctmg110 capacitive touchscreen used on some embedded
> devices.
> 
> (Some clean up by Alan Cox)
> 
> (No signed off, not yet ready to go in)
> ---
> 
>  drivers/input/touchscreen/Kconfig         |   12 +
>  drivers/input/touchscreen/Makefile        |    3 
>  drivers/input/touchscreen/cy8ctmg110_ts.c |  521 +++++++++++++++++++++++++++++
>  3 files changed, 535 insertions(+), 1 deletions(-)
>  create mode 100644 drivers/input/touchscreen/cy8ctmg110_ts.c
> 
> 
> diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig
> index b3ba374..89a3eb1 100644
> --- a/drivers/input/touchscreen/Kconfig
> +++ b/drivers/input/touchscreen/Kconfig
> @@ -591,4 +591,16 @@ config TOUCHSCREEN_TPS6507X
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called tps6507x_ts.
>  
> +config TOUCHSCREEN_CY8CTMG110
> +	tristate "cy8ctmg110 touchscreen"
> +	depends on I2C
> +	help
> +	  Say Y here if you have a cy8ctmg110 touchscreen capacitive
> +	  touchscreen
> +
> +	  If unsure, say N.
> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called cy8ctmg110_ts.
> +
>  endif
> diff --git a/drivers/input/touchscreen/Makefile b/drivers/input/touchscreen/Makefile
> index dfb7239..c7acb65 100644
> --- a/drivers/input/touchscreen/Makefile
> +++ b/drivers/input/touchscreen/Makefile
> @@ -1,5 +1,5 @@
>  #
> -# Makefile for the touchscreen drivers.
> +# Makefile for the touchscreen drivers.mororor
>  #
>  
>  # Each configuration option enables a list of files.
> @@ -12,6 +12,7 @@ obj-$(CONFIG_TOUCHSCREEN_AD7879)	+= ad7879.o
>  obj-$(CONFIG_TOUCHSCREEN_ADS7846)	+= ads7846.o
>  obj-$(CONFIG_TOUCHSCREEN_ATMEL_TSADCC)	+= atmel_tsadcc.o
>  obj-$(CONFIG_TOUCHSCREEN_BITSY)		+= h3600_ts_input.o
> +obj-$(CONFIG_TOUCHSCREEN_CY8CTMG110)    += cy8ctmg110_ts.o
>  obj-$(CONFIG_TOUCHSCREEN_DYNAPRO)	+= dynapro.o
>  obj-$(CONFIG_TOUCHSCREEN_GUNZE)		+= gunze.o
>  obj-$(CONFIG_TOUCHSCREEN_EETI)		+= eeti_ts.o
> diff --git a/drivers/input/touchscreen/cy8ctmg110_ts.c b/drivers/input/touchscreen/cy8ctmg110_ts.c
> new file mode 100644
> index 0000000..4adbe87
> --- /dev/null
> +++ b/drivers/input/touchscreen/cy8ctmg110_ts.c
> @@ -0,0 +1,521 @@
> +/*
> + * cy8ctmg110_ts.c Driver for cypress touch screen controller
> + * Copyright (c) 2009 Aava Mobile
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/input.h>
> +#include <linux/slab.h>
> +#include <linux/interrupt.h>
> +#include <asm/io.h>
> +#include <linux/i2c.h>
> +#include <linux/timer.h>
> +#include <linux/gpio.h>
> +#include <linux/hrtimer.h>
> +
> +#include <linux/platform_device.h>
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <asm/ioctl.h>
> +#include <asm/uaccess.h>
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/delay.h>
> +#include <linux/fs.h>
> +#include <asm/ioctl.h>
> +#include <linux/fs.h>
> +#include <linux/init.h>
> +#include <linux/miscdevice.h>
> +#include <linux/module.h>
> +
> +
> +#define CY8CTMG110_DRIVER_NAME      "cy8ctmg110"
> +
> +
> +/*HW definations*/
> +#define CY8CTMG110_RESET_PIN_GPIO   43
> +#define CY8CTMG110_IRQ_PIN_GPIO     59
> +#define CY8CTMG110_I2C_ADDR         0x38
> +#define CY8CTMG110_I2C_ADDR_EXT     0x39
> +#define CY8CTMG110_I2C_ADDR_        0x2	/*i2c address first sample */
> +#define CY8CTMG110_I2C_ADDR__       53	/*i2c address to FW where irq support missing */
> +#define CY8CTMG110_TOUCH_IRQ        21
> +#define CY8CTMG110_TOUCH_LENGHT     9787
> +#define CY8CTMG110_SCREEN_LENGHT    8424
> +
> +
> +/*Touch coordinates*/
> +#define CY8CTMG110_X_MIN        0
> +#define CY8CTMG110_Y_MIN        0
> +#define CY8CTMG110_X_MAX        864
> +#define CY8CTMG110_Y_MAX        480
> +
> +
> +/*cy8ctmg110 registers defination*/
> +#define CY8CTMG110_TOUCH_WAKEUP_TIME   0
> +#define CY8CTMG110_TOUCH_SLEEP_TIME    2
> +#define CY8CTMG110_TOUCH_X1            3
> +#define CY8CTMG110_TOUCH_Y1            5
> +#define CY8CTMG110_TOUCH_X2            7
> +#define CY8CTMG110_TOUCH_Y2            9
> +#define CY8CTMG110_FINGERS             11
> +#define CY8CTMG110_GESTURE             12
> +#define CY8CTMG110_REG_MAX             13
> +
> +#define CY8CTMG110_POLL_TIMER_DELAY  1000*1000*100
> +#define TOUCH_MAX_I2C_FAILS          50
> +
> +/* Scale factors for coordinates */
> +#define X_SCALE_FACTOR 9387/8424
> +#define Y_SCALE_FACTOR 97/100
> +
> +/* For tracing */
> +static int g_y_trace_coord = 0;
> +module_param(g_y_trace_coord, int, 0600);
> +
> +/* Polling mode */
> +static int polling = 0;
> +module_param(polling, int, 0);
> +MODULE_PARM_DESC(polling, "Set to enabling polling of the touchscreen");
> +
> +
> +/*
> + * The touch position structure.
> + */
> +struct ts_event {
> +	int x1;
> +	int y1;
> +	int x2;
> +	int y2;
> +	bool event_sended;
> +};
> +
> +/*
> + * The touch driver structure.
> + */
> +struct cy8ctmg110 {
> +	struct input_dev *input;
> +	char phys[32];
> +	struct ts_event tc;
> +	struct i2c_client *client;
> +	bool pending;
> +	spinlock_t lock;
> +	bool initController;
> +	bool sleepmode;
> +	int i2c_fail_count;
> +	struct hrtimer timer;
> +};
> +
> +/*
> + * cy8ctmg110_poweroff is the routine that is called when touch hardware 
> + * will powered off
> + */
> +static void cy8ctmg110_power(bool poweron)
> +{
> +	if (poweron)
> +		gpio_direction_output(CY8CTMG110_RESET_PIN_GPIO, 0);
> +	else
> +		gpio_direction_output(CY8CTMG110_RESET_PIN_GPIO, 1);
> +}
> +
> +/*
> + * cy8ctmg110_write_req write regs to the i2c devices
> + * 
> + */
> +static int cy8ctmg110_write_req(struct cy8ctmg110 *tsc, unsigned char reg,
> +		unsigned char len, unsigned char *value)
> +{
> +	struct i2c_client *client = tsc->client;
> +	unsigned int ret;
> +	unsigned char i2c_data[] = { 0, 0, 0, 0, 0, 0 };
> +	struct i2c_msg msg[] = {
> +			{client->addr, 0, len + 1, i2c_data},
> +			};
> +
> +	i2c_data[0] = reg;
> +	memcpy(i2c_data + 1, value, len);
> +
> +	ret = i2c_transfer(client->adapter, msg, 1);
> +	if (ret != 1) {
> +		printk("cy8ctmg110 touch : i2c write data cmd failed \n");
> +		return ret;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * cy8ctmg110_read_req read regs from i2c devise
> + * 
> + */
> +
> +static int cy8ctmg110_read_req(struct cy8ctmg110 *tsc,
> +		unsigned char *i2c_data, unsigned char len, unsigned char cmd)
> +{
> +	struct i2c_client *client = tsc->client;
> +	unsigned int ret;
> +	unsigned char regs_cmd[2] = { 0, 0 };
> +	struct i2c_msg msg1[] = {
> +		{client->addr, 0, 1, regs_cmd},
> +	};
> +	struct i2c_msg msg2[] = {
> +		{client->addr, I2C_M_RD, len, i2c_data},
> +	};
> +
> +	regs_cmd[0] = cmd;
> +
> +	/* first write slave position to i2c devices */
> +	ret = i2c_transfer(client->adapter, msg1, 1);
> +	if (ret != 1) {
> +		tsc->i2c_fail_count++;
> +		return ret;
> +	}
> +
> +	/* Second read data from position */
> +	ret = i2c_transfer(client->adapter, msg2, 1);
> +	if (ret != 1) {
> +		tsc->i2c_fail_count++;
> +		return ret;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * cy8ctmg110_send_event delevery touch event to the userpace
> + * function use normal input interface
> + */
> +static void cy8ctmg110_send_event(void *tsc)
> +{
> +	struct cy8ctmg110 *ts = tsc;
> +	struct input_dev *input = ts->input;
> +	u16 x, y;
> +	u16 x2, y2;
> +
> +	x = ts->tc.x1;
> +	y = ts->tc.y1;
> +
> +	if (ts->tc.event_sended == false) {

We set "event_sended" to false immediately before calling
cy8ctmg110_send_event() so I do not see the point of this flag.

> +		input_report_key(input, BTN_TOUCH, 1);
> +		ts->pending = true;
> +		x2 = (u16) (y * X_SCALE_FACTOR);
> +		y2 = (u16) (x * Y_SCALE_FACTOR);
> +		input_report_abs(input, ABS_X, x2);
> +		input_report_abs(input, ABS_Y, y2);
> +		input_sync(input);
> +		if (g_y_trace_coord)
> +			printk("cy8ctmg110 touch position X:%d (was = %d) Y:%d (was = %d)\n", x2, y, y2, x);

Do we really need this? Seems to be early development diagnostic.

> +	}
> +
> +}
> +
> +/*
> + * cy8ctmg110_touch_pos check touch position from i2c devices
> + * 
> + */
> +static int cy8ctmg110_touch_pos(struct cy8ctmg110 *tsc)
> +{
> +	unsigned char reg_p[CY8CTMG110_REG_MAX];
> +	int x, y;
> +
> +	memset(reg_p, 0, CY8CTMG110_REG_MAX);
> +
> +	/*Reading coordinates */
> +	if (cy8ctmg110_read_req(tsc, reg_p, 9, CY8CTMG110_TOUCH_X1) != 0)
> +		return -EIO;
> +		
> +	y = reg_p[2] << 8 | reg_p[3];
> +	x = reg_p[0] << 8 | reg_p[1];
> +		/*number of touch */
> +	if (reg_p[8] == 0) {
> +		if (tsc->pending == true) {
> +			struct input_dev *input = tsc->input;
> +
> +			input_report_key(input, BTN_TOUCH, 0);
> +			tsc->tc.event_sended = true;
> +			tsc->pending = false;
> +		}

Just do input_report_key(input, BTN_TOUCH, 0); and let the input core take
care of filtering duplicates. This will allow you to get rid of a bunch of
flags. Also input_sync() is missing here.

> +	} else if (tsc->tc.x1 != x || tsc->tc.y1 != y) {
> +		tsc->tc.y1 = y;
> +		tsc->tc.x1 = x;
> +		tsc->tc.event_sended = false;
> +		cy8ctmg110_send_event(tsc);
> +	}
> +	return 0;
> +}
> +
> +/*
> + * if interrupt isn't in use the touch positions can reads by polling
> + * 
> + */
> +static enum hrtimer_restart cy8ctmg110_timer(struct hrtimer *handle)
> +{
> +	struct cy8ctmg110 *ts = container_of(handle, struct cy8ctmg110, timer);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ts->lock, flags);
> +
> +	cy8ctmg110_touch_pos(ts);
> +	if (ts->i2c_fail_count < TOUCH_MAX_I2C_FAILS)
> +		hrtimer_start(&ts->timer, ktime_set(0, CY8CTMG110_POLL_TIMER_DELAY), HRTIMER_MODE_REL);
> +

So the device simply dies after so many errors?

> +	spin_unlock_irqrestore(&ts->lock, flags);

The timer handler is the only user for the spinlock, what is the point?

> +	return HRTIMER_NORESTART;
> +}
> +
> +/*
> + * 
> + */
> +static bool cy8ctmg110_set_sleepmode(struct cy8ctmg110 *ts)
> +{
> +	unsigned char reg_p[3];
> +
> +	if (ts->sleepmode == true) {
> +		reg_p[0] = 0x00;
> +		reg_p[1] = 0xff;
> +		reg_p[2] = 5;
> +	} else {
> +		reg_p[0] = 0x10;
> +		reg_p[1] = 0xff;
> +		reg_p[2] = 0;
> +	}
> +
> +	if (cy8ctmg110_write_req(ts, CY8CTMG110_TOUCH_WAKEUP_TIME, 3, reg_p))
> +		return false;
> +
> +	ts->initController = true;
> +	return true;
> +}
> +
> +/*
> + * cy8ctmg110_irq_handler irq handling function
> + * 
> + */
> +
> +static irqreturn_t cy8ctmg110_irq_handler(int irq, void *dev_id)
> +{
> +	struct cy8ctmg110 *tsc = (struct cy8ctmg110 *) dev_id;
> +
> +	if (tsc->initController == false) {
> +		if (cy8ctmg110_set_sleepmode(tsc) == true)
> +			tsc->initController = true;
> +	} else
> +		cy8ctmg110_touch_pos(tsc);

Initializing the device from the interrupt handler is quite a novel concept...

> +
> +	/* if interrupt supported in the touch controller
> +	   timer polling need to stop */
> +	tsc->i2c_fail_count = TOUCH_MAX_I2C_FAILS;
> +	return IRQ_HANDLED;
> +}
> +
> +
> +static int cy8ctmg110_probe(struct i2c_client *client, const struct i2c_device_id *id)
> +{
> +	struct cy8ctmg110 *ts;
> +	struct input_dev *input_dev;
> +	int err;
> +	client->irq = CY8CTMG110_TOUCH_IRQ;
> +
> +	if (!i2c_check_functionality(client->adapter,
> +					I2C_FUNC_SMBUS_READ_WORD_DATA))
> +		return -EIO;
> +
> +	ts = kzalloc(sizeof(struct cy8ctmg110), GFP_KERNEL);
> +	input_dev = input_allocate_device();
> +
> +	if (!ts || !input_dev) {
> +		err = -ENOMEM;
> +		goto err_free_mem;
> +	}
> +
> +	ts->client = client;
> +	i2c_set_clientdata(client, ts);
> +
> +	ts->input = input_dev;
> +	ts->pending = false;
> +	ts->sleepmode = false;
> +
> +	snprintf(ts->phys, sizeof(ts->phys), "%s/input0",
> +						dev_name(&client->dev));
> +
> +	input_dev->name = CY8CTMG110_DRIVER_NAME " Touchscreen";
> +	input_dev->phys = ts->phys;
> +	input_dev->id.bustype = BUS_I2C;
> +
> +	spin_lock_init(&ts->lock);
> +
> +	input_dev->evbit[0] = BIT_MASK(EV_KEY) | BIT_MASK(EV_REP) |

You usually do not set up autorepeat for pointing devices.

> +					BIT_MASK(EV_REL) | BIT_MASK(EV_ABS);

The device does not emit relative events.

> +	input_dev->keybit[BIT_WORD(BTN_TOUCH)] = BIT_MASK(BTN_TOUCH);
> +
> +	input_set_capability(input_dev, EV_KEY, KEY_F);

KEY_F?

> +
> +	input_set_abs_params(input_dev, ABS_X, CY8CTMG110_X_MIN, CY8CTMG110_X_MAX, 0, 0);
> +	input_set_abs_params(input_dev, ABS_Y, CY8CTMG110_Y_MIN, CY8CTMG110_Y_MAX, 0, 0);
> +
> +	err = gpio_request(CY8CTMG110_RESET_PIN_GPIO, NULL);
> +
> +	if (err) {
> +		dev_err(&client->dev, "cy8ctmg110_ts: Unable to request GPIO pin %d.\n",
> +						CY8CTMG110_RESET_PIN_GPIO);
> +		goto err_free_irq;
> +	}
> +	cy8ctmg110_power(true);
> +
> +	ts->initController = false;
> +	ts->i2c_fail_count = 0;
> +
> +	hrtimer_init(&ts->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	ts->timer.function = cy8ctmg110_timer;
> +
> +	if (polling)
> +		hrtimer_start(&ts->timer, ktime_set(10, 0), HRTIMER_MODE_REL);
> +

Polling mode should be controlled by platform data, not a kernel module parameter, I think.

> +	/* Can we fall back to polling if these bits fail - something to look
> +	   at for robustness */
> +
> +	err = gpio_request(CY8CTMG110_IRQ_PIN_GPIO, "touch_irq_key");
> +	if (err < 0) {
> +		dev_err(&client->dev,
> +			"cy8ctmg110_ts: failed to request GPIO %d, error %d\n",
> +						CY8CTMG110_IRQ_PIN_GPIO, err);
> +		goto err_free_timer;
> +	}
> +
> +	err = gpio_direction_input(CY8CTMG110_IRQ_PIN_GPIO);
> +
> +	if (err < 0) {
> +		dev_err(&client->dev,
> +			"cy8ctmg110_ts: failed to configure input direction for GPIO %d, error %d\n",
> +						CY8CTMG110_IRQ_PIN_GPIO, err);
> +		goto err_free_gpio;
> +	}
> +	client->irq = gpio_to_irq(CY8CTMG110_IRQ_PIN_GPIO);
> +
> +	if (client->irq < 0) {
> +		err = client->irq;
> +		dev_err(&client->dev,
> +	"cy8ctmg110_ts: Unable to get irq number" " for GPIO %d, error %d\n",
> +						CY8CTMG110_IRQ_PIN_GPIO, err);
> +		goto err_free_gpio;
> +	}
> +	err = request_irq(client->irq, cy8ctmg110_irq_handler, IRQF_TRIGGER_RISING | IRQF_SHARED, "touch_reset_key", ts);
> +	if (err < 0) {
> +		dev_err(&client->dev,
> +			"cy8ctmg110 irq %d busy? error %d\n",
> +				client->irq, err);
> +		goto err_free_gpio;
> +	}
> +
> +	err = input_register_device(input_dev);
> +	if (!err)
> +		return 0;
> +err_free_gpio:
> +	gpio_free(CY8CTMG110_IRQ_PIN_GPIO);
> +err_free_timer:
> +	if (polling)
> +		hrtimer_cancel(&ts->timer);
> +err_free_irq:
> +	free_irq(client->irq, ts);
> +err_free_mem:
> +	input_free_device(input_dev);
> +	kfree(ts);
> +	return err;
> +}
> +
> +/*
> + * cy8ctmg110_suspend
> + * 
> + */
> +
> +static int cy8ctmg110_suspend(struct i2c_client *client, pm_message_t mesg)
> +{

Stop the timer here? Also power down the device?
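
Something like this, as a sketch only (reusing the names from the patch):

	static int cy8ctmg110_suspend(struct i2c_client *client, pm_message_t mesg)
	{
		struct cy8ctmg110 *ts = i2c_get_clientdata(client);

		/* stop the polling timer while suspended */
		if (polling)
			hrtimer_cancel(&ts->timer);

		if (device_may_wakeup(&client->dev))
			enable_irq_wake(client->irq);
		else
			cy8ctmg110_power(false);	/* power down if not a wakeup source */

		return 0;
	}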

> +	if (device_may_wakeup(&client->dev))
> +		enable_irq_wake(client->irq);
> +
> +	return 0;
> +}
> +
> +/*
> + * cy8ctmg110_resume 
> + * 
> + */
> +
> +static int cy8ctmg110_resume(struct i2c_client *client)
> +{
> +	if (device_may_wakeup(&client->dev))
> +		disable_irq_wake(client->irq);
> +
> +	return 0;
> +}
> +
> +/*
> + * cy8ctmg110_remove
> + * 
> + */
> +
> +static int cy8ctmg110_remove(struct i2c_client *client)
> +{
> +	struct cy8ctmg110 *ts = i2c_get_clientdata(client);
> +
> +	cy8ctmg110_power(false);
> +
> +	if (polling)
> +		hrtimer_cancel(&ts->timer);

Implement a close() method and move the code above there? Also implement open().
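
I.e. something along these lines, as a sketch only; the input core then calls
them on first open and last close:

	static int cy8ctmg110_open(struct input_dev *dev)
	{
		struct cy8ctmg110 *ts = input_get_drvdata(dev);

		if (polling)
			hrtimer_start(&ts->timer, ktime_set(10, 0), HRTIMER_MODE_REL);
		return 0;
	}

	static void cy8ctmg110_close(struct input_dev *dev)
	{
		struct cy8ctmg110 *ts = input_get_drvdata(dev);

		if (polling)
			hrtimer_cancel(&ts->timer);
	}

	/* in probe(), before input_register_device(): */
	input_set_drvdata(input_dev, ts);
	input_dev->open = cy8ctmg110_open;
	input_dev->close = cy8ctmg110_close;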

> +	free_irq(client->irq, ts);
> +	input_unregister_device(ts->input);
> +	/* FIXME: Do we need to free the GPIO ? */
> +	kfree(ts);
> +	return 0;
> +}
> +
> +static struct i2c_device_id cy8ctmg110_idtable[] = {
> +	{CY8CTMG110_DRIVER_NAME, 1},
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(i2c, cy8ctmg110_idtable);
> +
> +static struct i2c_driver cy8ctmg110_driver = {
> +	.driver = {
> +		   .owner = THIS_MODULE,
> +		   .name = CY8CTMG110_DRIVER_NAME,
> +		   .bus = &i2c_bus_type,
> +		   },
> +	.id_table = cy8ctmg110_idtable,
> +	.probe = cy8ctmg110_probe,
> +	.remove = cy8ctmg110_remove,
> +	.suspend = cy8ctmg110_suspend,
> +	.resume = cy8ctmg110_resume,
> +};
> +
> +static int __init cy8ctmg110_init(void)
> +{
> +	return i2c_add_driver(&cy8ctmg110_driver);
> +}
> +
> +static void __exit cy8ctmg110_exit(void)
> +{
> +	i2c_del_driver(&cy8ctmg110_driver);
> +}
> +
> +module_init(cy8ctmg110_init);
> +module_exit(cy8ctmg110_exit);
> +
> +MODULE_AUTHOR("Samuli Konttila <samuli.konttila@aavamobile.com>");
> +MODULE_DESCRIPTION("cy8ctmg110 TouchScreen Driver");
> +MODULE_LICENSE("GPL v2");
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-input" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Dmitry

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20100113004939.289333186@suse.com>
@ 2010-01-13 14:57 ` scameron
  0 siblings, 0 replies; 657+ messages in thread
From: scameron @ 2010-01-13 14:57 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: Linux Kernel Mailing List, Andrew Morton, Linux SCSI

On Tue, Jan 12, 2010 at 07:49:00PM -0500, Jeff Mahoney wrote:
> Subject: [patch 5/6] hpsa: Fix section mismatch
> References: <20100113004855.550486769@suse.com>
> Content-Disposition: inline; filename=patches.rpmify/hpsa-fix-section-mismatch
> 
>  hpsa_pci_init calls hpsa_interrupt_mode, which is a __devinit function.
>  hpsa_pci_init is only called by hpsa_init_one, which is also __devinit, so
>  mark it __devinit as well.
> 
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  drivers/scsi/hpsa.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -3111,7 +3111,7 @@ default_int_mode:
>  	return;
>  }
>  
> -static int hpsa_pci_init(struct ctlr_info *h, struct pci_dev *pdev)
> +static int __devinit hpsa_pci_init(struct ctlr_info *h, struct pci_dev *pdev)
>  {
>  	ushort subsystem_vendor_id, subsystem_device_id, command;
>  	__u32 board_id, scratchpad = 0;
> 

Thanks.

-- steve


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-05-07 10:20                   ` your mail Ingo Molnar
@ 2009-05-08  3:27                     ` Casey Schaufler
  0 siblings, 0 replies; 657+ messages in thread
From: Casey Schaufler @ 2009-05-08  3:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: James Morris, Chris Wright, Oleg Nesterov, Roland McGrath,
	Andrew Morton, linux-kernel, Al Viro, linux-security-module

Ingo Molnar wrote:
> * James Morris <jmorris@namei.org> wrote:
>
>   
>> On Thu, 7 May 2009, Chris Wright wrote:
>>
>>     
>>> * Ingo Molnar (mingo@elte.hu) wrote:
>>>       
>> [Added LSM list to the CC; please do so whenever making changes in this 
>> area...]
>>
>>     
>>>> They have no active connection to the core kernel 
>>>> ptrace_may_access() check in any case:
>>>>         
>>> Not sure what you mean:
>>>
>>> ptrace_may_access
>>>  __ptrace_may_access
>>>   security_ptrace_may_access
>>>
>>> Looks like your patch won't compile.
>>>
>>>       
>> Below is an updated version which fixes the bug, against 
>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next
>>
>> Boot tested with SELinux.
>>     
>
> thanks! Below are the two patches i wrote and tested.
>   

I hate to make an assumption regarding whether or not your tests
included Smack, so I'll ask. Does tested mean with Smack?

Thank you.

> 	Ingo
>
> ----- Forwarded message from Ingo Molnar <mingo@elte.hu> -----
>
> Date: Thu, 7 May 2009 11:49:47 +0200
> From: Ingo Molnar <mingo@elte.hu>
> To: Chris Wright <chrisw@sous-sol.org>
> Subject: [patch 1/2] ptrace, security: rename ptrace_may_access =>
> 	ptrace_access_check
> Cc: Oleg Nesterov <oleg@redhat.com>, Roland McGrath <roland@redhat.com>,
> 	Andrew Morton <akpm@linux-foundation.org>,
> 	linux-kernel@vger.kernel.org, Al Viro <viro@ZenIV.linux.org.uk>
>
> The ptrace_may_access() methods are named confusingly - some 
> variants return a bool, while the security subsystem methods have a 
> retval convention.
>
> Rename it to ptrace_access_check, to reduce the confusion factor. A 
> followup patch eliminates the bool usage.
>
> [ Impact: cleanup, no code changed ]
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Cc: Roland McGrath <roland@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Chris Wright <chrisw@sous-sol.org>
> Cc: Al Viro <viro@ZenIV.linux.org.uk>
> Cc: Oleg Nesterov <oleg@redhat.com>
> LKML-Reference: <20090507084943.GB19133@elte.hu>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  fs/proc/array.c            |    2 +-
>  fs/proc/base.c             |   10 +++++-----
>  fs/proc/task_mmu.c         |    2 +-
>  include/linux/ptrace.h     |    4 ++--
>  include/linux/security.h   |   14 +++++++-------
>  kernel/ptrace.c            |   10 +++++-----
>  security/capability.c      |    2 +-
>  security/commoncap.c       |    4 ++--
>  security/root_plug.c       |    2 +-
>  security/security.c        |    4 ++--
>  security/selinux/hooks.c   |    6 +++---
>  security/smack/smack_lsm.c |    8 ++++----
>  12 files changed, 34 insertions(+), 34 deletions(-)
>
> Index: linux/fs/proc/array.c
> ===================================================================
> --- linux.orig/fs/proc/array.c
> +++ linux/fs/proc/array.c
> @@ -366,7 +366,7 @@ static int do_task_stat(struct seq_file 
>  
>  	state = *get_task_state(task);
>  	vsize = eip = esp = 0;
> -	permitted = ptrace_may_access(task, PTRACE_MODE_READ);
> +	permitted = ptrace_access_check(task, PTRACE_MODE_READ);
>  	mm = get_task_mm(task);
>  	if (mm) {
>  		vsize = task_vsize(mm);
> Index: linux/fs/proc/base.c
> ===================================================================
> --- linux.orig/fs/proc/base.c
> +++ linux/fs/proc/base.c
> @@ -222,7 +222,7 @@ static int check_mem_permission(struct t
>  		rcu_read_lock();
>  		match = (tracehook_tracer_task(task) == current);
>  		rcu_read_unlock();
> -		if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))
> +		if (match && ptrace_access_check(task, PTRACE_MODE_ATTACH))
>  			return 0;
>  	}
>  
> @@ -242,7 +242,7 @@ struct mm_struct *mm_for_maps(struct tas
>  	if (task->mm != mm)
>  		goto out;
>  	if (task->mm != current->mm &&
> -	    __ptrace_may_access(task, PTRACE_MODE_READ) < 0)
> +	    __ptrace_access_check(task, PTRACE_MODE_READ) < 0)
>  		goto out;
>  	task_unlock(task);
>  	return mm;
> @@ -322,7 +322,7 @@ static int proc_pid_wchan(struct task_st
>  	wchan = get_wchan(task);
>  
>  	if (lookup_symbol_name(wchan, symname) < 0)
> -		if (!ptrace_may_access(task, PTRACE_MODE_READ))
> +		if (!ptrace_access_check(task, PTRACE_MODE_READ))
>  			return 0;
>  		else
>  			return sprintf(buffer, "%lu", wchan);
> @@ -559,7 +559,7 @@ static int proc_fd_access_allowed(struct
>  	 */
>  	task = get_proc_task(inode);
>  	if (task) {
> -		allowed = ptrace_may_access(task, PTRACE_MODE_READ);
> +		allowed = ptrace_access_check(task, PTRACE_MODE_READ);
>  		put_task_struct(task);
>  	}
>  	return allowed;
> @@ -938,7 +938,7 @@ static ssize_t environ_read(struct file 
>  	if (!task)
>  		goto out_no_task;
>  
> -	if (!ptrace_may_access(task, PTRACE_MODE_READ))
> +	if (!ptrace_access_check(task, PTRACE_MODE_READ))
>  		goto out;
>  
>  	ret = -ENOMEM;
> Index: linux/fs/proc/task_mmu.c
> ===================================================================
> --- linux.orig/fs/proc/task_mmu.c
> +++ linux/fs/proc/task_mmu.c
> @@ -656,7 +656,7 @@ static ssize_t pagemap_read(struct file 
>  		goto out;
>  
>  	ret = -EACCES;
> -	if (!ptrace_may_access(task, PTRACE_MODE_READ))
> +	if (!ptrace_access_check(task, PTRACE_MODE_READ))
>  		goto out_task;
>  
>  	ret = -EINVAL;
> Index: linux/include/linux/ptrace.h
> ===================================================================
> --- linux.orig/include/linux/ptrace.h
> +++ linux/include/linux/ptrace.h
> @@ -99,9 +99,9 @@ extern void ptrace_fork(struct task_stru
>  #define PTRACE_MODE_READ   1
>  #define PTRACE_MODE_ATTACH 2
>  /* Returns 0 on success, -errno on denial. */
> -extern int __ptrace_may_access(struct task_struct *task, unsigned int mode);
> +extern int __ptrace_access_check(struct task_struct *task, unsigned int mode);
>  /* Returns true on success, false on denial. */
> -extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
> +extern bool ptrace_access_check(struct task_struct *task, unsigned int mode);
>  
>  static inline int ptrace_reparented(struct task_struct *child)
>  {
> Index: linux/include/linux/security.h
> ===================================================================
> --- linux.orig/include/linux/security.h
> +++ linux/include/linux/security.h
> @@ -52,7 +52,7 @@ struct audit_krule;
>  extern int cap_capable(struct task_struct *tsk, const struct cred *cred,
>  		       int cap, int audit);
>  extern int cap_settime(struct timespec *ts, struct timezone *tz);
> -extern int cap_ptrace_may_access(struct task_struct *child, unsigned int mode);
> +extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
>  extern int cap_ptrace_traceme(struct task_struct *parent);
>  extern int cap_capget(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted);
>  extern int cap_capset(struct cred *new, const struct cred *old,
> @@ -1209,7 +1209,7 @@ static inline void security_free_mnt_opt
>   *	@alter contains the flag indicating whether changes are to be made.
>   *	Return 0 if permission is granted.
>   *
> - * @ptrace_may_access:
> + * @ptrace_access_check:
>   *	Check permission before allowing the current process to trace the
>   *	@child process.
>   *	Security modules may also want to perform a process tracing check
> @@ -1224,7 +1224,7 @@ static inline void security_free_mnt_opt
>   *	Check that the @parent process has sufficient permission to trace the
>   *	current process before allowing the current process to present itself
>   *	to the @parent process for tracing.
> - *	The parent process will still have to undergo the ptrace_may_access
> + *	The parent process will still have to undergo the ptrace_access_check
>   *	checks before it is allowed to trace this one.
>   *	@parent contains the task_struct structure for debugger process.
>   *	Return 0 if permission is granted.
> @@ -1336,7 +1336,7 @@ static inline void security_free_mnt_opt
>  struct security_operations {
>  	char name[SECURITY_NAME_MAX + 1];
>  
> -	int (*ptrace_may_access) (struct task_struct *child, unsigned int mode);
> +	int (*ptrace_access_check) (struct task_struct *child, unsigned int mode);
>  	int (*ptrace_traceme) (struct task_struct *parent);
>  	int (*capget) (struct task_struct *target,
>  		       kernel_cap_t *effective,
> @@ -1617,7 +1617,7 @@ extern int security_module_enable(struct
>  extern int register_security(struct security_operations *ops);
>  
>  /* Security operations */
> -int security_ptrace_may_access(struct task_struct *child, unsigned int mode);
> +int security_ptrace_access_check(struct task_struct *child, unsigned int mode);
>  int security_ptrace_traceme(struct task_struct *parent);
>  int security_capget(struct task_struct *target,
>  		    kernel_cap_t *effective,
> @@ -1798,10 +1798,10 @@ static inline int security_init(void)
>  	return 0;
>  }
>  
> -static inline int security_ptrace_may_access(struct task_struct *child,
> +static inline int security_ptrace_access_check(struct task_struct *child,
>  					     unsigned int mode)
>  {
> -	return cap_ptrace_may_access(child, mode);
> +	return cap_ptrace_access_check(child, mode);
>  }
>  
>  static inline int security_ptrace_traceme(struct task_struct *parent)
> Index: linux/kernel/ptrace.c
> ===================================================================
> --- linux.orig/kernel/ptrace.c
> +++ linux/kernel/ptrace.c
> @@ -127,7 +127,7 @@ int ptrace_check_attach(struct task_stru
>  	return ret;
>  }
>  
> -int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> +int __ptrace_access_check(struct task_struct *task, unsigned int mode)
>  {
>  	const struct cred *cred = current_cred(), *tcred;
>  
> @@ -162,14 +162,14 @@ int __ptrace_may_access(struct task_stru
>  	if (!dumpable && !capable(CAP_SYS_PTRACE))
>  		return -EPERM;
>  
> -	return security_ptrace_may_access(task, mode);
> +	return security_ptrace_access_check(task, mode);
>  }
>  
> -bool ptrace_may_access(struct task_struct *task, unsigned int mode)
> +bool ptrace_access_check(struct task_struct *task, unsigned int mode)
>  {
>  	int err;
>  	task_lock(task);
> -	err = __ptrace_may_access(task, mode);
> +	err = __ptrace_access_check(task, mode);
>  	task_unlock(task);
>  	return !err;
>  }
> @@ -217,7 +217,7 @@ repeat:
>  	/* the same process cannot be attached many times */
>  	if (task->ptrace & PT_PTRACED)
>  		goto bad;
> -	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH);
> +	retval = __ptrace_access_check(task, PTRACE_MODE_ATTACH);
>  	if (retval)
>  		goto bad;
>  
> Index: linux/security/capability.c
> ===================================================================
> --- linux.orig/security/capability.c
> +++ linux/security/capability.c
> @@ -863,7 +863,7 @@ struct security_operations default_secur
>  
>  void security_fixup_ops(struct security_operations *ops)
>  {
> -	set_to_cap_if_null(ops, ptrace_may_access);
> +	set_to_cap_if_null(ops, ptrace_access_check);
>  	set_to_cap_if_null(ops, ptrace_traceme);
>  	set_to_cap_if_null(ops, capget);
>  	set_to_cap_if_null(ops, capset);
> Index: linux/security/commoncap.c
> ===================================================================
> --- linux.orig/security/commoncap.c
> +++ linux/security/commoncap.c
> @@ -79,7 +79,7 @@ int cap_settime(struct timespec *ts, str
>  }
>  
>  /**
> - * cap_ptrace_may_access - Determine whether the current process may access
> + * cap_ptrace_access_check - Determine whether the current process may access
>   *			   another
>   * @child: The process to be accessed
>   * @mode: The mode of attachment.
> @@ -87,7 +87,7 @@ int cap_settime(struct timespec *ts, str
>   * Determine whether a process may access another, returning 0 if permission
>   * granted, -ve if denied.
>   */
> -int cap_ptrace_may_access(struct task_struct *child, unsigned int mode)
> +int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
>  {
>  	int ret = 0;
>  
> Index: linux/security/root_plug.c
> ===================================================================
> --- linux.orig/security/root_plug.c
> +++ linux/security/root_plug.c
> @@ -72,7 +72,7 @@ static int rootplug_bprm_check_security 
>  
>  static struct security_operations rootplug_security_ops = {
>  	/* Use the capability functions for some of the hooks */
> -	.ptrace_may_access =		cap_ptrace_may_access,
> +	.ptrace_access_check =		cap_ptrace_access_check,
>  	.ptrace_traceme =		cap_ptrace_traceme,
>  	.capget =			cap_capget,
>  	.capset =			cap_capset,
> Index: linux/security/security.c
> ===================================================================
> --- linux.orig/security/security.c
> +++ linux/security/security.c
> @@ -127,9 +127,9 @@ int register_security(struct security_op
>  
>  /* Security operations */
>  
> -int security_ptrace_may_access(struct task_struct *child, unsigned int mode)
> +int security_ptrace_access_check(struct task_struct *child, unsigned int mode)
>  {
> -	return security_ops->ptrace_may_access(child, mode);
> +	return security_ops->ptrace_access_check(child, mode);
>  }
>  
>  int security_ptrace_traceme(struct task_struct *parent)
> Index: linux/security/selinux/hooks.c
> ===================================================================
> --- linux.orig/security/selinux/hooks.c
> +++ linux/security/selinux/hooks.c
> @@ -1854,12 +1854,12 @@ static inline u32 open_file_to_av(struct
>  
>  /* Hook functions begin here. */
>  
> -static int selinux_ptrace_may_access(struct task_struct *child,
> +static int selinux_ptrace_access_check(struct task_struct *child,
>  				     unsigned int mode)
>  {
>  	int rc;
>  
> -	rc = cap_ptrace_may_access(child, mode);
> +	rc = cap_ptrace_access_check(child, mode);
>  	if (rc)
>  		return rc;
>  
> @@ -5318,7 +5318,7 @@ static int selinux_key_getsecurity(struc
>  static struct security_operations selinux_ops = {
>  	.name =				"selinux",
>  
> -	.ptrace_may_access =		selinux_ptrace_may_access,
> +	.ptrace_access_check =		selinux_ptrace_access_check,
>  	.ptrace_traceme =		selinux_ptrace_traceme,
>  	.capget =			selinux_capget,
>  	.capset =			selinux_capset,
> Index: linux/security/smack/smack_lsm.c
> ===================================================================
> --- linux.orig/security/smack/smack_lsm.c
> +++ linux/security/smack/smack_lsm.c
> @@ -92,7 +92,7 @@ struct inode_smack *new_inode_smack(char
>   */
>  
>  /**
> - * smack_ptrace_may_access - Smack approval on PTRACE_ATTACH
> + * smack_ptrace_access_check - Smack approval on PTRACE_ATTACH
>   * @ctp: child task pointer
>   * @mode: ptrace attachment mode
>   *
> @@ -100,11 +100,11 @@ struct inode_smack *new_inode_smack(char
>   *
>   * Do the capability checks, and require read and write.
>   */
> -static int smack_ptrace_may_access(struct task_struct *ctp, unsigned int mode)
> +static int smack_ptrace_access_check(struct task_struct *ctp, unsigned int mode)
>  {
>  	int rc;
>  
> -	rc = cap_ptrace_may_access(ctp, mode);
> +	rc = cap_ptrace_access_check(ctp, mode);
>  	if (rc != 0)
>  		return rc;
>  
> @@ -2826,7 +2826,7 @@ static void smack_release_secctx(char *s
>  struct security_operations smack_ops = {
>  	.name =				"smack",
>  
> -	.ptrace_may_access =		smack_ptrace_may_access,
> +	.ptrace_access_check =		smack_ptrace_access_check,
>  	.ptrace_traceme =		smack_ptrace_traceme,
>  	.capget = 			cap_capget,
>  	.capset = 			cap_capset,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
> ----- End forwarded message -----
> ----- Forwarded message from Ingo Molnar <mingo@elte.hu> -----
>
> Date: Thu, 7 May 2009 11:50:54 +0200
> From: Ingo Molnar <mingo@elte.hu>
> To: Chris Wright <chrisw@sous-sol.org>
> Subject: [patch 2/2] ptrace: turn ptrace_access_check() into a retval
> 	function
> Cc: Oleg Nesterov <oleg@redhat.com>, Roland McGrath <roland@redhat.com>,
> 	Andrew Morton <akpm@linux-foundation.org>,
> 	linux-kernel@vger.kernel.org, Al Viro <viro@ZenIV.linux.org.uk>
>
> ptrace_access_check() returns a bool, while most of the ptrace 
> access check machinery works with Linux retvals (where 0 indicates 
> success, negative indicates an error).
>
> So eliminate the bool and invert the usage at the call sites.
>
> ( Note: "< 0" checks are used instead of !0 checks, because that's
>   the convention for retval checks and it results in similarly fast
>   assembly code. )
>
> [ Impact: cleanup ]
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  fs/proc/array.c        |    2 +-
>  fs/proc/base.c         |    8 ++++----
>  fs/proc/task_mmu.c     |    2 +-
>  include/linux/ptrace.h |    2 +-
>  kernel/ptrace.c        |    6 ++++--
>  5 files changed, 11 insertions(+), 9 deletions(-)
>
> Index: linux/fs/proc/array.c
> ===================================================================
> --- linux.orig/fs/proc/array.c
> +++ linux/fs/proc/array.c
> @@ -366,7 +366,7 @@ static int do_task_stat(struct seq_file 
>  
>  	state = *get_task_state(task);
>  	vsize = eip = esp = 0;
> -	permitted = ptrace_access_check(task, PTRACE_MODE_READ);
> +	permitted = !ptrace_access_check(task, PTRACE_MODE_READ);
>  	mm = get_task_mm(task);
>  	if (mm) {
>  		vsize = task_vsize(mm);
> Index: linux/fs/proc/base.c
> ===================================================================
> --- linux.orig/fs/proc/base.c
> +++ linux/fs/proc/base.c
> @@ -222,7 +222,7 @@ static int check_mem_permission(struct t
>  		rcu_read_lock();
>  		match = (tracehook_tracer_task(task) == current);
>  		rcu_read_unlock();
> -		if (match && ptrace_access_check(task, PTRACE_MODE_ATTACH))
> +		if (match && !ptrace_access_check(task, PTRACE_MODE_ATTACH))
>  			return 0;
>  	}
>  
> @@ -322,7 +322,7 @@ static int proc_pid_wchan(struct task_st
>  	wchan = get_wchan(task);
>  
>  	if (lookup_symbol_name(wchan, symname) < 0)
> -		if (!ptrace_access_check(task, PTRACE_MODE_READ))
> +		if (ptrace_access_check(task, PTRACE_MODE_READ) < 0)
>  			return 0;
>  		else
>  			return sprintf(buffer, "%lu", wchan);
> @@ -559,7 +559,7 @@ static int proc_fd_access_allowed(struct
>  	 */
>  	task = get_proc_task(inode);
>  	if (task) {
> -		allowed = ptrace_access_check(task, PTRACE_MODE_READ);
> +		allowed = !ptrace_access_check(task, PTRACE_MODE_READ);
>  		put_task_struct(task);
>  	}
>  	return allowed;
> @@ -938,7 +938,7 @@ static ssize_t environ_read(struct file 
>  	if (!task)
>  		goto out_no_task;
>  
> -	if (!ptrace_access_check(task, PTRACE_MODE_READ))
> +	if (ptrace_access_check(task, PTRACE_MODE_READ) < 0)
>  		goto out;
>  
>  	ret = -ENOMEM;
> Index: linux/fs/proc/task_mmu.c
> ===================================================================
> --- linux.orig/fs/proc/task_mmu.c
> +++ linux/fs/proc/task_mmu.c
> @@ -656,7 +656,7 @@ static ssize_t pagemap_read(struct file 
>  		goto out;
>  
>  	ret = -EACCES;
> -	if (!ptrace_access_check(task, PTRACE_MODE_READ))
> +	if (ptrace_access_check(task, PTRACE_MODE_READ) < 0)
>  		goto out_task;
>  
>  	ret = -EINVAL;
> Index: linux/include/linux/ptrace.h
> ===================================================================
> --- linux.orig/include/linux/ptrace.h
> +++ linux/include/linux/ptrace.h
> @@ -101,7 +101,7 @@ extern void ptrace_fork(struct task_stru
>  /* Returns 0 on success, -errno on denial. */
>  extern int __ptrace_access_check(struct task_struct *task, unsigned int mode);
>  /* Returns true on success, false on denial. */
> -extern bool ptrace_access_check(struct task_struct *task, unsigned int mode);
> +extern int ptrace_access_check(struct task_struct *task, unsigned int mode);
>  
>  static inline int ptrace_reparented(struct task_struct *child)
>  {
> Index: linux/kernel/ptrace.c
> ===================================================================
> --- linux.orig/kernel/ptrace.c
> +++ linux/kernel/ptrace.c
> @@ -165,13 +165,15 @@ int __ptrace_access_check(struct task_st
>  	return security_ptrace_access_check(task, mode);
>  }
>  
> -bool ptrace_access_check(struct task_struct *task, unsigned int mode)
> +int ptrace_access_check(struct task_struct *task, unsigned int mode)
>  {
>  	int err;
> +
>  	task_lock(task);
>  	err = __ptrace_access_check(task, mode);
>  	task_unlock(task);
> -	return !err;
> +
> +	return err;
>  }
>  
>  int ptrace_attach(struct task_struct *task)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
> ----- End forwarded message -----
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>   


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-05-07  9:54                 ` James Morris
@ 2009-05-07 10:20                   ` Ingo Molnar
  2009-05-08  3:27                     ` Casey Schaufler
  0 siblings, 1 reply; 657+ messages in thread
From: Ingo Molnar @ 2009-05-07 10:20 UTC (permalink / raw)
  To: James Morris
  Cc: Chris Wright, Oleg Nesterov, Roland McGrath, Andrew Morton,
	linux-kernel, Al Viro, linux-security-module


* James Morris <jmorris@namei.org> wrote:

> On Thu, 7 May 2009, Chris Wright wrote:
> 
> > * Ingo Molnar (mingo@elte.hu) wrote:
> 
> [Added LSM list to the CC; please do so whenever making changes in this 
> area...]
> 
> > > They have no active connection to the core kernel 
> > > ptrace_may_access() check in any case:
> > 
> > Not sure what you mean:
> > 
> > ptrace_may_access
> >  __ptrace_may_access
> >   security_ptrace_may_access
> > 
> > Looks like your patch won't compile.
> > 
> 
> Below is an updated version which fixes the bug, against 
> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next
> 
> Boot tested with SELinux.

thanks! Below are the two patches i wrote and tested.

	Ingo

----- Forwarded message from Ingo Molnar <mingo@elte.hu> -----

Date: Thu, 7 May 2009 11:49:47 +0200
From: Ingo Molnar <mingo@elte.hu>
To: Chris Wright <chrisw@sous-sol.org>
Subject: [patch 1/2] ptrace, security: rename ptrace_may_access =>
	ptrace_access_check
Cc: Oleg Nesterov <oleg@redhat.com>, Roland McGrath <roland@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Al Viro <viro@ZenIV.linux.org.uk>

The ptrace_may_access() methods are named confusingly - some 
variants return a bool, while the security subsystem methods have a 
retval convention.

Rename it to ptrace_access_check, to reduce the confusion factor. A 
followup patch eliminates the bool usage.

[ Impact: cleanup, no code changed ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
LKML-Reference: <20090507084943.GB19133@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/proc/array.c            |    2 +-
 fs/proc/base.c             |   10 +++++-----
 fs/proc/task_mmu.c         |    2 +-
 include/linux/ptrace.h     |    4 ++--
 include/linux/security.h   |   14 +++++++-------
 kernel/ptrace.c            |   10 +++++-----
 security/capability.c      |    2 +-
 security/commoncap.c       |    4 ++--
 security/root_plug.c       |    2 +-
 security/security.c        |    4 ++--
 security/selinux/hooks.c   |    6 +++---
 security/smack/smack_lsm.c |    8 ++++----
 12 files changed, 34 insertions(+), 34 deletions(-)

Index: linux/fs/proc/array.c
===================================================================
--- linux.orig/fs/proc/array.c
+++ linux/fs/proc/array.c
@@ -366,7 +366,7 @@ static int do_task_stat(struct seq_file 
 
 	state = *get_task_state(task);
 	vsize = eip = esp = 0;
-	permitted = ptrace_may_access(task, PTRACE_MODE_READ);
+	permitted = ptrace_access_check(task, PTRACE_MODE_READ);
 	mm = get_task_mm(task);
 	if (mm) {
 		vsize = task_vsize(mm);
Index: linux/fs/proc/base.c
===================================================================
--- linux.orig/fs/proc/base.c
+++ linux/fs/proc/base.c
@@ -222,7 +222,7 @@ static int check_mem_permission(struct t
 		rcu_read_lock();
 		match = (tracehook_tracer_task(task) == current);
 		rcu_read_unlock();
-		if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))
+		if (match && ptrace_access_check(task, PTRACE_MODE_ATTACH))
 			return 0;
 	}
 
@@ -242,7 +242,7 @@ struct mm_struct *mm_for_maps(struct tas
 	if (task->mm != mm)
 		goto out;
 	if (task->mm != current->mm &&
-	    __ptrace_may_access(task, PTRACE_MODE_READ) < 0)
+	    __ptrace_access_check(task, PTRACE_MODE_READ) < 0)
 		goto out;
 	task_unlock(task);
 	return mm;
@@ -322,7 +322,7 @@ static int proc_pid_wchan(struct task_st
 	wchan = get_wchan(task);
 
 	if (lookup_symbol_name(wchan, symname) < 0)
-		if (!ptrace_may_access(task, PTRACE_MODE_READ))
+		if (!ptrace_access_check(task, PTRACE_MODE_READ))
 			return 0;
 		else
 			return sprintf(buffer, "%lu", wchan);
@@ -559,7 +559,7 @@ static int proc_fd_access_allowed(struct
 	 */
 	task = get_proc_task(inode);
 	if (task) {
-		allowed = ptrace_may_access(task, PTRACE_MODE_READ);
+		allowed = ptrace_access_check(task, PTRACE_MODE_READ);
 		put_task_struct(task);
 	}
 	return allowed;
@@ -938,7 +938,7 @@ static ssize_t environ_read(struct file 
 	if (!task)
 		goto out_no_task;
 
-	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+	if (!ptrace_access_check(task, PTRACE_MODE_READ))
 		goto out;
 
 	ret = -ENOMEM;
Index: linux/fs/proc/task_mmu.c
===================================================================
--- linux.orig/fs/proc/task_mmu.c
+++ linux/fs/proc/task_mmu.c
@@ -656,7 +656,7 @@ static ssize_t pagemap_read(struct file 
 		goto out;
 
 	ret = -EACCES;
-	if (!ptrace_may_access(task, PTRACE_MODE_READ))
+	if (!ptrace_access_check(task, PTRACE_MODE_READ))
 		goto out_task;
 
 	ret = -EINVAL;
Index: linux/include/linux/ptrace.h
===================================================================
--- linux.orig/include/linux/ptrace.h
+++ linux/include/linux/ptrace.h
@@ -99,9 +99,9 @@ extern void ptrace_fork(struct task_stru
 #define PTRACE_MODE_READ   1
 #define PTRACE_MODE_ATTACH 2
 /* Returns 0 on success, -errno on denial. */
-extern int __ptrace_may_access(struct task_struct *task, unsigned int mode);
+extern int __ptrace_access_check(struct task_struct *task, unsigned int mode);
 /* Returns true on success, false on denial. */
-extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
+extern bool ptrace_access_check(struct task_struct *task, unsigned int mode);
 
 static inline int ptrace_reparented(struct task_struct *child)
 {
Index: linux/include/linux/security.h
===================================================================
--- linux.orig/include/linux/security.h
+++ linux/include/linux/security.h
@@ -52,7 +52,7 @@ struct audit_krule;
 extern int cap_capable(struct task_struct *tsk, const struct cred *cred,
 		       int cap, int audit);
 extern int cap_settime(struct timespec *ts, struct timezone *tz);
-extern int cap_ptrace_may_access(struct task_struct *child, unsigned int mode);
+extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
 extern int cap_ptrace_traceme(struct task_struct *parent);
 extern int cap_capget(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted);
 extern int cap_capset(struct cred *new, const struct cred *old,
@@ -1209,7 +1209,7 @@ static inline void security_free_mnt_opt
  *	@alter contains the flag indicating whether changes are to be made.
  *	Return 0 if permission is granted.
  *
- * @ptrace_may_access:
+ * @ptrace_access_check:
  *	Check permission before allowing the current process to trace the
  *	@child process.
  *	Security modules may also want to perform a process tracing check
@@ -1224,7 +1224,7 @@ static inline void security_free_mnt_opt
  *	Check that the @parent process has sufficient permission to trace the
  *	current process before allowing the current process to present itself
  *	to the @parent process for tracing.
- *	The parent process will still have to undergo the ptrace_may_access
+ *	The parent process will still have to undergo the ptrace_access_check
  *	checks before it is allowed to trace this one.
  *	@parent contains the task_struct structure for debugger process.
  *	Return 0 if permission is granted.
@@ -1336,7 +1336,7 @@ static inline void security_free_mnt_opt
 struct security_operations {
 	char name[SECURITY_NAME_MAX + 1];
 
-	int (*ptrace_may_access) (struct task_struct *child, unsigned int mode);
+	int (*ptrace_access_check) (struct task_struct *child, unsigned int mode);
 	int (*ptrace_traceme) (struct task_struct *parent);
 	int (*capget) (struct task_struct *target,
 		       kernel_cap_t *effective,
@@ -1617,7 +1617,7 @@ extern int security_module_enable(struct
 extern int register_security(struct security_operations *ops);
 
 /* Security operations */
-int security_ptrace_may_access(struct task_struct *child, unsigned int mode);
+int security_ptrace_access_check(struct task_struct *child, unsigned int mode);
 int security_ptrace_traceme(struct task_struct *parent);
 int security_capget(struct task_struct *target,
 		    kernel_cap_t *effective,
@@ -1798,10 +1798,10 @@ static inline int security_init(void)
 	return 0;
 }
 
-static inline int security_ptrace_may_access(struct task_struct *child,
+static inline int security_ptrace_access_check(struct task_struct *child,
 					     unsigned int mode)
 {
-	return cap_ptrace_may_access(child, mode);
+	return cap_ptrace_access_check(child, mode);
 }
 
 static inline int security_ptrace_traceme(struct task_struct *parent)
Index: linux/kernel/ptrace.c
===================================================================
--- linux.orig/kernel/ptrace.c
+++ linux/kernel/ptrace.c
@@ -127,7 +127,7 @@ int ptrace_check_attach(struct task_stru
 	return ret;
 }
 
-int __ptrace_may_access(struct task_struct *task, unsigned int mode)
+int __ptrace_access_check(struct task_struct *task, unsigned int mode)
 {
 	const struct cred *cred = current_cred(), *tcred;
 
@@ -162,14 +162,14 @@ int __ptrace_may_access(struct task_stru
 	if (!dumpable && !capable(CAP_SYS_PTRACE))
 		return -EPERM;
 
-	return security_ptrace_may_access(task, mode);
+	return security_ptrace_access_check(task, mode);
 }
 
-bool ptrace_may_access(struct task_struct *task, unsigned int mode)
+bool ptrace_access_check(struct task_struct *task, unsigned int mode)
 {
 	int err;
 	task_lock(task);
-	err = __ptrace_may_access(task, mode);
+	err = __ptrace_access_check(task, mode);
 	task_unlock(task);
 	return !err;
 }
@@ -217,7 +217,7 @@ repeat:
 	/* the same process cannot be attached many times */
 	if (task->ptrace & PT_PTRACED)
 		goto bad;
-	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH);
+	retval = __ptrace_access_check(task, PTRACE_MODE_ATTACH);
 	if (retval)
 		goto bad;
 
Index: linux/security/capability.c
===================================================================
--- linux.orig/security/capability.c
+++ linux/security/capability.c
@@ -863,7 +863,7 @@ struct security_operations default_secur
 
 void security_fixup_ops(struct security_operations *ops)
 {
-	set_to_cap_if_null(ops, ptrace_may_access);
+	set_to_cap_if_null(ops, ptrace_access_check);
 	set_to_cap_if_null(ops, ptrace_traceme);
 	set_to_cap_if_null(ops, capget);
 	set_to_cap_if_null(ops, capset);
Index: linux/security/commoncap.c
===================================================================
--- linux.orig/security/commoncap.c
+++ linux/security/commoncap.c
@@ -79,7 +79,7 @@ int cap_settime(struct timespec *ts, str
 }
 
 /**
- * cap_ptrace_may_access - Determine whether the current process may access
+ * cap_ptrace_access_check - Determine whether the current process may access
  *			   another
  * @child: The process to be accessed
  * @mode: The mode of attachment.
@@ -87,7 +87,7 @@ int cap_settime(struct timespec *ts, str
  * Determine whether a process may access another, returning 0 if permission
  * granted, -ve if denied.
  */
-int cap_ptrace_may_access(struct task_struct *child, unsigned int mode)
+int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
 {
 	int ret = 0;
 
Index: linux/security/root_plug.c
===================================================================
--- linux.orig/security/root_plug.c
+++ linux/security/root_plug.c
@@ -72,7 +72,7 @@ static int rootplug_bprm_check_security 
 
 static struct security_operations rootplug_security_ops = {
 	/* Use the capability functions for some of the hooks */
-	.ptrace_may_access =		cap_ptrace_may_access,
+	.ptrace_access_check =		cap_ptrace_access_check,
 	.ptrace_traceme =		cap_ptrace_traceme,
 	.capget =			cap_capget,
 	.capset =			cap_capset,
Index: linux/security/security.c
===================================================================
--- linux.orig/security/security.c
+++ linux/security/security.c
@@ -127,9 +127,9 @@ int register_security(struct security_op
 
 /* Security operations */
 
-int security_ptrace_may_access(struct task_struct *child, unsigned int mode)
+int security_ptrace_access_check(struct task_struct *child, unsigned int mode)
 {
-	return security_ops->ptrace_may_access(child, mode);
+	return security_ops->ptrace_access_check(child, mode);
 }
 
 int security_ptrace_traceme(struct task_struct *parent)
Index: linux/security/selinux/hooks.c
===================================================================
--- linux.orig/security/selinux/hooks.c
+++ linux/security/selinux/hooks.c
@@ -1854,12 +1854,12 @@ static inline u32 open_file_to_av(struct
 
 /* Hook functions begin here. */
 
-static int selinux_ptrace_may_access(struct task_struct *child,
+static int selinux_ptrace_access_check(struct task_struct *child,
 				     unsigned int mode)
 {
 	int rc;
 
-	rc = cap_ptrace_may_access(child, mode);
+	rc = cap_ptrace_access_check(child, mode);
 	if (rc)
 		return rc;
 
@@ -5318,7 +5318,7 @@ static int selinux_key_getsecurity(struc
 static struct security_operations selinux_ops = {
 	.name =				"selinux",
 
-	.ptrace_may_access =		selinux_ptrace_may_access,
+	.ptrace_access_check =		selinux_ptrace_access_check,
 	.ptrace_traceme =		selinux_ptrace_traceme,
 	.capget =			selinux_capget,
 	.capset =			selinux_capset,
Index: linux/security/smack/smack_lsm.c
===================================================================
--- linux.orig/security/smack/smack_lsm.c
+++ linux/security/smack/smack_lsm.c
@@ -92,7 +92,7 @@ struct inode_smack *new_inode_smack(char
  */
 
 /**
- * smack_ptrace_may_access - Smack approval on PTRACE_ATTACH
+ * smack_ptrace_access_check - Smack approval on PTRACE_ATTACH
  * @ctp: child task pointer
  * @mode: ptrace attachment mode
  *
@@ -100,11 +100,11 @@ struct inode_smack *new_inode_smack(char
  *
  * Do the capability checks, and require read and write.
  */
-static int smack_ptrace_may_access(struct task_struct *ctp, unsigned int mode)
+static int smack_ptrace_access_check(struct task_struct *ctp, unsigned int mode)
 {
 	int rc;
 
-	rc = cap_ptrace_may_access(ctp, mode);
+	rc = cap_ptrace_access_check(ctp, mode);
 	if (rc != 0)
 		return rc;
 
@@ -2826,7 +2826,7 @@ static void smack_release_secctx(char *s
 struct security_operations smack_ops = {
 	.name =				"smack",
 
-	.ptrace_may_access =		smack_ptrace_may_access,
+	.ptrace_access_check =		smack_ptrace_access_check,
 	.ptrace_traceme =		smack_ptrace_traceme,
 	.capget = 			cap_capget,
 	.capset = 			cap_capset,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

----- End forwarded message -----
----- Forwarded message from Ingo Molnar <mingo@elte.hu> -----

Date: Thu, 7 May 2009 11:50:54 +0200
From: Ingo Molnar <mingo@elte.hu>
To: Chris Wright <chrisw@sous-sol.org>
Subject: [patch 2/2] ptrace: turn ptrace_access_check() into a retval
	function
Cc: Oleg Nesterov <oleg@redhat.com>, Roland McGrath <roland@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Al Viro <viro@ZenIV.linux.org.uk>

ptrace_access_check() returns a bool, while most of the ptrace 
access check machinery works with Linux retvals (where 0 indicates 
success, negative indicates an error).

So eliminate the bool and invert the usage at the call sites.

( Note: "< 0" checks are used instead of !0 checks, because that's
  the convention for retval checks and it results in similarly fast
  assembly code. )

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/proc/array.c        |    2 +-
 fs/proc/base.c         |    8 ++++----
 fs/proc/task_mmu.c     |    2 +-
 include/linux/ptrace.h |    2 +-
 kernel/ptrace.c        |    6 ++++--
 5 files changed, 11 insertions(+), 9 deletions(-)

Index: linux/fs/proc/array.c
===================================================================
--- linux.orig/fs/proc/array.c
+++ linux/fs/proc/array.c
@@ -366,7 +366,7 @@ static int do_task_stat(struct seq_file 
 
 	state = *get_task_state(task);
 	vsize = eip = esp = 0;
-	permitted = ptrace_access_check(task, PTRACE_MODE_READ);
+	permitted = !ptrace_access_check(task, PTRACE_MODE_READ);
 	mm = get_task_mm(task);
 	if (mm) {
 		vsize = task_vsize(mm);
Index: linux/fs/proc/base.c
===================================================================
--- linux.orig/fs/proc/base.c
+++ linux/fs/proc/base.c
@@ -222,7 +222,7 @@ static int check_mem_permission(struct t
 		rcu_read_lock();
 		match = (tracehook_tracer_task(task) == current);
 		rcu_read_unlock();
-		if (match && ptrace_access_check(task, PTRACE_MODE_ATTACH))
+		if (match && !ptrace_access_check(task, PTRACE_MODE_ATTACH))
 			return 0;
 	}
 
@@ -322,7 +322,7 @@ static int proc_pid_wchan(struct task_st
 	wchan = get_wchan(task);
 
 	if (lookup_symbol_name(wchan, symname) < 0)
-		if (!ptrace_access_check(task, PTRACE_MODE_READ))
+		if (ptrace_access_check(task, PTRACE_MODE_READ) < 0)
 			return 0;
 		else
 			return sprintf(buffer, "%lu", wchan);
@@ -559,7 +559,7 @@ static int proc_fd_access_allowed(struct
 	 */
 	task = get_proc_task(inode);
 	if (task) {
-		allowed = ptrace_access_check(task, PTRACE_MODE_READ);
+		allowed = !ptrace_access_check(task, PTRACE_MODE_READ);
 		put_task_struct(task);
 	}
 	return allowed;
@@ -938,7 +938,7 @@ static ssize_t environ_read(struct file 
 	if (!task)
 		goto out_no_task;
 
-	if (!ptrace_access_check(task, PTRACE_MODE_READ))
+	if (ptrace_access_check(task, PTRACE_MODE_READ) < 0)
 		goto out;
 
 	ret = -ENOMEM;
Index: linux/fs/proc/task_mmu.c
===================================================================
--- linux.orig/fs/proc/task_mmu.c
+++ linux/fs/proc/task_mmu.c
@@ -656,7 +656,7 @@ static ssize_t pagemap_read(struct file 
 		goto out;
 
 	ret = -EACCES;
-	if (!ptrace_access_check(task, PTRACE_MODE_READ))
+	if (ptrace_access_check(task, PTRACE_MODE_READ) < 0)
 		goto out_task;
 
 	ret = -EINVAL;
Index: linux/include/linux/ptrace.h
===================================================================
--- linux.orig/include/linux/ptrace.h
+++ linux/include/linux/ptrace.h
@@ -101,7 +101,7 @@ extern void ptrace_fork(struct task_stru
 /* Returns 0 on success, -errno on denial. */
 extern int __ptrace_access_check(struct task_struct *task, unsigned int mode);
 /* Returns true on success, false on denial. */
-extern bool ptrace_access_check(struct task_struct *task, unsigned int mode);
+extern int ptrace_access_check(struct task_struct *task, unsigned int mode);
 
 static inline int ptrace_reparented(struct task_struct *child)
 {
Index: linux/kernel/ptrace.c
===================================================================
--- linux.orig/kernel/ptrace.c
+++ linux/kernel/ptrace.c
@@ -165,13 +165,15 @@ int __ptrace_access_check(struct task_st
 	return security_ptrace_access_check(task, mode);
 }
 
-bool ptrace_access_check(struct task_struct *task, unsigned int mode)
+int ptrace_access_check(struct task_struct *task, unsigned int mode)
 {
 	int err;
+
 	task_lock(task);
 	err = __ptrace_access_check(task, mode);
 	task_unlock(task);
-	return !err;
+
+	return err;
 }
 
 int ptrace_attach(struct task_struct *task)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

----- End forwarded message -----

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-03-27 23:26 Eric Anholt
@ 2009-03-28  0:02 ` Linus Torvalds
  0 siblings, 0 replies; 657+ messages in thread
From: Linus Torvalds @ 2009-03-28  0:02 UTC (permalink / raw)
  To: Eric Anholt; +Cc: lkml, dri-devel



On Fri, 27 Mar 2009, Eric Anholt wrote:
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel drm-intel-next

Grr.

Guys, what the *hell* is wrong with you, when you can't even react to 
trivial warnings and fix buggy code pointed out by the compiler?

If you had _ever_ compiled this on x86-64, you would have seen:

  drivers/gpu/drm/i915/i915_gem_debugfs.c: In function ‘i915_gem_fence_regs_info’:
  drivers/gpu/drm/i915/i915_gem_debugfs.c:201: warning: format ‘%08x’ expects type ‘unsigned int’, but argument 7 has type ‘size_t’

and this is not the first time this has happened.
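
The usual fix for that class of warning is the %z length modifier for
size_t; a sketch with a made-up variable, not the actual i915 code:

	size_t sz = 4096;			/* hypothetical size_t value */
	seq_printf(m, "size %08zx\n", sz);	/* %zx, not %x, matches size_t */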

See commits f06da264cfb0f9444d41ca247213e419f90aa72a and 
aeb565dfc3ac4c8b47c5049085b4c7bfb2c7d5d7.

What's so hard about keeping the build warning-clean, and fixing these 
things _long_ before they hit my tree?

Some basic quality control. PLEASE.

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <8F90F944E50427428C60E12A34A309D21C401BA619@carmd-exchmb01.sierrawireless.local>
@ 2009-03-13 16:54 ` Ralf Nyren
  0 siblings, 0 replies; 657+ messages in thread
From: Ralf Nyren @ 2009-03-13 16:54 UTC (permalink / raw)
  To: Rory Filer; +Cc: linux-kernel, Kevin Lloyd

Hi Rory,

Sounds great; send the driver and I'll give it a try at once. I'll report back
to you with the results.

Many thanks, Ralf

On Fri, 13 Mar 2009, Rory Filer wrote:

> Hi Ralf
>
>
>
> Kevin passed your email on to my attention and I think we can help you with this problem. We've been doing a lot of work on our drivers lately and I've got a freshly-ready version of sierra.c just for 2.6.28. We've done a lot of testing here and it seems pretty robust; perhaps you'd be willing to give it a try?
>
>
>
> Since I'm not sure about the etiquette for posting to this list, I will attach the driver in a separate email to you.
>
>
>
> Regards
>
>
>
> Rory Filer
>
>
>
>
>
> -----Original Message-----
>
> From: Ralf Nyren [mailto:ralf@nyren.net]
>
> Sent: Friday, March 13, 2009 8:01 AM
>
> To: linux-kernel@vger.kernel.org
>
> Cc: Kevin Lloyd
>
> Subject: Sierra Wireless (MC8780) HSDPA speed issue
>
>
>
> Hi,
>
>
>
> I have a Sierra Wireless MC8780 UMTS card in a Fujitsu S6410 laptop running kernel 2.6.28.7, with the in-kernel sierra driver v1.3.2.
>
>
>
> The card works but speed seems limited to approx 1.0 Mbit/s download using the linux driver.  Testing the card in Windows XP yields download speeds close to 5.0 Mbit/s.
>
>
>
> I recently updated the firmware of the card to support HSDPA/HSUPA. The update gave the desired result in Windows but not in Linux. The speed improved in Linux but didn't increase above 1 Mbit/s.
>
>
>
> Are there any known driver limitations, or is this a configuration issue?
>
>
>
> Please let me know if you need any additional information.
>
>
>
> Best regards, Ralf
>
>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-03-11 14:59 ` your mail Linus Torvalds
@ 2009-03-11 17:23   ` Vitaly Mayatskikh
  0 siblings, 0 replies; 657+ messages in thread
From: Vitaly Mayatskikh @ 2009-03-11 17:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Vitaly Mayatskikh, linux-kernel


> On Wed, 11 Mar 2009, Vitaly Mayatskikh wrote:
> > 
> > (v)scnprintf says it should return 0 when size is 0, but doesn't do
> > so. Also, size_t is unsigned; it can't be less than 0. Fix the code and
> > comments.
> 
> That is bogus.
> 
> The code really does (or "did"? Maybe you removed it) check for _smaller_ 
> than 0:

Well, (v)scnprintf says it returns 0 for size <= 0, but it really returns
-1 for size == 0. I think this code can't return 0 for size == 0:

	i = vsnprintf(buf, size, fmt, args);
	return (i >= size) ? (size - 1) : i;

Systemtap's script:

function test:long()
%{
        char tmp[256];
        long err;
        err = scnprintf(tmp, 0, "%lu", (long)128);
        THIS->__retvalue = err;
%}

probe begin
{
        printf("scnprintf returns %d\n", test());
}

stap -g scnprintf.stp
scnprintf returns -1
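
One way to make the behaviour match the comment is an explicit guard, as a
sketch only (not necessarily the exact patch being proposed):

	int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
	{
		int i;

		i = vsnprintf(buf, size, fmt, args);

		if (i < size)
			return i;
		if (size != 0)
			return size - 1;
		return 0;	/* size == 0: nothing written, report 0 */
	}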

-- 
wbr, Vitaly

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-03-11 10:47 Vitaly Mayatskikh
@ 2009-03-11 14:59 ` Linus Torvalds
  2009-03-11 17:23   ` Vitaly Mayatskikh
  0 siblings, 1 reply; 657+ messages in thread
From: Linus Torvalds @ 2009-03-11 14:59 UTC (permalink / raw)
  To: Vitaly Mayatskikh; +Cc: linux-kernel



On Wed, 11 Mar 2009, Vitaly Mayatskikh wrote:
> 
> (v)scnprintf says it should return 0 when size is 0, but doesn't do
> so. Also, size_t is unsigned; it can't be less than 0. Fix the code and
> comments.

That is bogus.

The code really does (or "did"? Maybe you removed it) check for _smaller_ 
than 0:

	int vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
	{
		...
		/* Reject out-of-range values early.  Large positive sizes are
		   used for unknown buffer sizes. */
		if (unlikely((int) size < 0)) {
			/* There can be only one.. */
			static char warn = 1;
			WARN_ON(warn);
			warn = 0;
			return 0;
		}
		...

because under/overflows have happened.

The kernel is _not_ a regular libc. We have different rules.

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-02-13  0:45 Youngwhan Kim
@ 2009-02-13  3:40 ` Johannes Weiner
  0 siblings, 0 replies; 657+ messages in thread
From: Johannes Weiner @ 2009-02-13  3:40 UTC (permalink / raw)
  To: Youngwhan Kim; +Cc: linux-kernel

On Fri, Feb 13, 2009 at 09:45:13AM +0900, Youngwhan Kim wrote:
> unsubscribe

There is just no way out!

> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org        ^^^^^^^^^^^^

                           ^^^^^^^^^^^^^^^^^^^^^^^^^

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-01-19  2:54 Gao, Yunpeng
@ 2009-01-19  3:07 ` Matthew Wilcox
  0 siblings, 0 replies; 657+ messages in thread
From: Matthew Wilcox @ 2009-01-19  3:07 UTC (permalink / raw)
  To: Gao, Yunpeng; +Cc: linux-ia64, linux-kernel

On Mon, Jan 19, 2009 at 10:54:02AM +0800, Gao, Yunpeng wrote:
> I have to use a 64-bit variable in my 2.6.27 kernel NAND driver as below:
> ---------------------------------------------------------------------------
> u64 NAND_capacity;
> unsigned int block_num, block_size;
> ...
> block_num = NAND_capacity / block_size;
> ---------------------------------------------------------------------------
> but it fails to link and reports 'undefined reference to `__udivdi3'.

Presumably block_size is a power of two, so you can do:

	block_num = NAND_capacity >> block_shift;
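
If block_size is not a power of two, the generic 64-bit division helpers are
the usual route; roughly (a sketch, not tested):

	#include <asm/div64.h>

	u64 tmp = NAND_capacity;

	/* do_div() divides the u64 in place by a 32-bit divisor and returns the remainder */
	do_div(tmp, block_size);
	block_num = tmp;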

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-01-13  6:10 Steven Rostedt
@ 2009-01-13 13:21 ` Steven Rostedt
  0 siblings, 0 replies; 657+ messages in thread
From: Steven Rostedt @ 2009-01-13 13:21 UTC (permalink / raw)
  To: linux-kernel

On Tue, Jan 13, 2009 at 01:10:04AM -0500, Steven Rostedt wrote:

Bah! Sorry for the noise here. My scripts to send out the patch
queue failed to handle the comma in the "Luck, Tony" email address.
But it unfortunately did a partial send :-(

I had to modify Tony's email for the final send.

-- Steve


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2009-01-11  3:41 Jose Luis Marchetti
@ 2009-01-11  6:47 ` Jesper Juhl
  0 siblings, 0 replies; 657+ messages in thread
From: Jesper Juhl @ 2009-01-11  6:47 UTC (permalink / raw)
  To: Jose Luis Marchetti; +Cc: linux-kernel

On Sat, 10 Jan 2009, Jose Luis Marchetti wrote:

> Hi,
> 
> I would like to open/read/write/close a regular file from my device
> driver.

That's probably a bad idea; what you really want to do is use procfs, 
sysfs, debugfs, relayfs, module parameters or similar.

Take a look here: 
http://kernelnewbies.org/FAQ/WhyWritingFilesFromKernelIsBad 
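
For example, a minimal debugfs sketch (hypothetical names) that exposes a
value under /sys/kernel/debug/ instead of writing a file from the driver:

	#include <linux/debugfs.h>
	#include <linux/module.h>

	static u32 my_status;		/* hypothetical driver state */
	static struct dentry *my_dir;

	static int __init my_debugfs_init(void)
	{
		my_dir = debugfs_create_dir("mydriver", NULL);
		if (!my_dir)
			return -ENOMEM;
		/* read/write u32 at /sys/kernel/debug/mydriver/status */
		debugfs_create_u32("status", 0644, my_dir, &my_status);
		return 0;
	}

	static void __exit my_debugfs_exit(void)
	{
		debugfs_remove_recursive(my_dir);
	}

	module_init(my_debugfs_init);
	module_exit(my_debugfs_exit);
	MODULE_LICENSE("GPL");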


-- 
Jesper Juhl <jj@chaosbits.net>        http://personal.chaosbits.net/
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2008-05-24 20:05 Thomas Gleixner
@ 2008-05-24 21:06 ` Daniel Walker
  0 siblings, 0 replies; 657+ messages in thread
From: Daniel Walker @ 2008-05-24 21:06 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel


On Sat, 2008-05-24 at 22:05 +0200, Thomas Gleixner wrote:

> > If that's the requirement then code that cleans up the corner case that
> > I've identified, which is also minimal should be acceptable .. Since
> > it's meeting the same requirement you laid out above for the original
> > plist changes.
> 
> Your code solves the corner case that is least worth worrying about, and hurts
> performance for nothing. You take extra locks in the hot path for no
> benefit.
> 
> Aside of that it introduces lock order problems and we can really do
> without extra useless complexity in the futex code.
> 
> You can argue in circles. This is not going anywhere near mainline.

Above, I'm not speaking about my code; I'm only speaking in terms of a
solution to this case, even if it isn't mine.

Daniel




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
@ 2008-05-24 20:05 Thomas Gleixner
  2008-05-24 21:06 ` Daniel Walker
  0 siblings, 1 reply; 657+ messages in thread
From: Thomas Gleixner @ 2008-05-24 20:05 UTC (permalink / raw)
  To: Daniel Walker; +Cc: linux-kernel

On Sat, 24 May 2008, Daniel Walker wrote:
> > There is no kernel side controlled handover of a normal futex. The
> > woken up waiters race for it and a low prio thread on another CPU can
> > steal it even if there is a high prio waiter woken up.
> 
> After reading futex_wake, Doesn't it depend how many waiters are woken?
> Given that comes from userspace, glibc could wake a single waiter and
> obtain a priority ordering, couldn't it?

It could, and it does. Still, this does not protect against another
lower prio task taking the futex before the woken waiter can do it,
which happens way more often than your theoretical setscheduler
case. Again, setscheduler is called in the startup code of a program, not
at arbitrary points during runtime that rely on lock ordering.

> > The plist add on works correct in most of the cases, nothing else. To
> > achieve full correctness there is much more necessary than this
> > setscheduler issue. The plist changes were accepted because the
> > overhead is really minimal, but achieving full correctness would hurt
> > performance badly.
> 
> If that's the requirement then code that cleans up the corner case that
> I've identified, which is also minimal should be acceptable .. Since
> it's meeting the same requirement you laid out above for the original
> plist changes.

Your code solves the corner case that is least worth worrying about, and hurts
performance for nothing. You take extra locks in the hot path for no
benefit.

Aside of that it introduces lock order problems and we can really do
without extra useless complexity in the futex code.

You can argue in circles. This is not going anywhere near mainline.

Thanks,
	tglx

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2008-05-20 12:34 Lukas Hejtmanek
@ 2008-05-20 15:20 ` Alan Stern
  0 siblings, 0 replies; 657+ messages in thread
From: Alan Stern @ 2008-05-20 15:20 UTC (permalink / raw)
  To: Lukas Hejtmanek
  Cc: Oliver Neukum, Rafael J. Wysocki, Linux Kernel Mailing List,
	greg, linux-usb

On Tue, 20 May 2008, Lukas Hejtmanek wrote:

> <stern@rowland.harvard.edu>, Greg KH <greg@kroah.com>
> Bcc: 
> Subject: Re: [Bug #10630] USB devices plugged into dock are not discoverred
> 	until reload of ehci-hcd
> Reply-To: 
> In-Reply-To: <200805201327.34678.oliver@neukum.org>
> X-echelon: NSA, CIA, CI5, MI5, FBI, KGB, BIS, Plutonium, Bin Laden, bomb
> 
> On Tue, May 20, 2008 at 01:27:34PM +0200, Oliver Neukum wrote:
> > > done.
> > > http://bugzilla.kernel.org/show_bug.cgi?id=10630
> > 
> > Aha. Thanks.
> > Please recompile without CONFIG_USB_SUSPEND
> 
> Hm, without USB_SUSPEND it works. So what next: is it considered fixed, or is
> any further investigation needed?

No further investigation is needed.  I tried doing essentially the same 
thing on my system and the same problem occurred.

It is caused by the way ehci-hcd "auto-clears" the port
change-suspend feature.  This patch should fix the problem.  Please 
try it out and let us know if it works.

Alan Stern



Index: usb-2.6/drivers/usb/host/ehci.h
===================================================================
--- usb-2.6.orig/drivers/usb/host/ehci.h
+++ usb-2.6/drivers/usb/host/ehci.h
@@ -97,6 +97,8 @@ struct ehci_hcd {			/* one per controlle
 			dedicated to the companion controller */
 	unsigned long		owned_ports;		/* which ports are
 			owned by the companion during a bus suspend */
+	unsigned long		port_c_suspend;		/* which ports have
+			the change-suspend feature turned on */
 
 	/* per-HC memory pools (could be per-bus, but ...) */
 	struct dma_pool		*qh_pool;	/* qh per active urb */
Index: usb-2.6/drivers/usb/host/ehci-hub.c
===================================================================
--- usb-2.6.orig/drivers/usb/host/ehci-hub.c
+++ usb-2.6/drivers/usb/host/ehci-hub.c
@@ -609,7 +609,7 @@ static int ehci_hub_control (
 			}
 			break;
 		case USB_PORT_FEAT_C_SUSPEND:
-			/* we auto-clear this feature */
+			clear_bit(wIndex, &ehci->port_c_suspend);
 			break;
 		case USB_PORT_FEAT_POWER:
 			if (HCS_PPC (ehci->hcs_params))
@@ -688,7 +688,7 @@ static int ehci_hub_control (
 			/* resume completed? */
 			else if (time_after_eq(jiffies,
 					ehci->reset_done[wIndex])) {
-				status |= 1 << USB_PORT_FEAT_C_SUSPEND;
+				set_bit(wIndex, &ehci->port_c_suspend);
 				ehci->reset_done[wIndex] = 0;
 
 				/* stop resume signaling */
@@ -765,6 +765,8 @@ static int ehci_hub_control (
 			status |= 1 << USB_PORT_FEAT_RESET;
 		if (temp & PORT_POWER)
 			status |= 1 << USB_PORT_FEAT_POWER;
+		if (test_bit(wIndex, &ehci->port_c_suspend))
+			status |= 1 << USB_PORT_FEAT_C_SUSPEND;
 
 #ifndef	VERBOSE_DEBUG
 	if (status & ~0xffff)	/* only if wPortChange is interesting */


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
@ 2007-10-17 18:28 nicholas.thompson1
  0 siblings, 0 replies; 657+ messages in thread
From: nicholas.thompson1 @ 2007-10-17 18:28 UTC (permalink / raw)
  To: linux-kernel

>Nope, wrong clues.
>The right clues are in the footer of this message after it travels thru the list.
>
>I supplied them to Nicholas already, but apparently others need to be reminded of
>them every now and then  :-]   That footer is in these list messages for a reason!
>
>    /Matti Aarnio -- one of  <postmaster@vger.kernel.org>
>
>PS: You want to contact VGER's email and list managers ?
>    We use the internet email standard address "postmaster"
>

Jan, Matti, + List,
 I am very sorry about the noise, that's what I get for using cut and paste while tired and before my third cup of coffee. ;p Apologies.

Nick 

>>On Wed, Oct 17, 2007 at 06:36:19PM +0200, Jan Engelhardt wrote:
>> Date: Wed, 17 Oct 2007 18:36:19 +0200 (CEST)
>> From: Jan Engelhardt <jengelh@computergmbh.de>
>> To: nicholas.thompson1@mchsi.com
>> cc: linux-kernel@vger.kernel.org
>> Subject: Re: your mail
>> 
>> On Oct 17 2007 16:30, nicholas.thompson1@mchsi.com wrote:
>> >Date: Wed, 17 Oct 2007 16:30:24 +0000
>> >From:  <nicholas.thompson1@mchsi.com>
>> >To:  <linux-kernel@vger.kernel.org>
>>              ^^^^^^
> >>
> >>subscribe linux-alpha
>>                  ^^^^^

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-10-17 16:36 ` your mail Jan Engelhardt
@ 2007-10-17 17:50   ` Matti Aarnio
  0 siblings, 0 replies; 657+ messages in thread
From: Matti Aarnio @ 2007-10-17 17:50 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-kernel

Nope, wrong clues.
The right clues are in the footer of this message after it travels thru the list.

I supplied them to Nicholas already, but apparently others need to be reminded of
them every now and then  :-]   That footer is in these list messages for a reason!

    /Matti Aarnio -- one of  <postmaster@vger.kernel.org>

PS: You want to contact VGER's email and list managers ?
    We use the internet email standard address "postmaster"


On Wed, Oct 17, 2007 at 06:36:19PM +0200, Jan Engelhardt wrote:
> Date: Wed, 17 Oct 2007 18:36:19 +0200 (CEST)
> From: Jan Engelhardt <jengelh@computergmbh.de>
> To: nicholas.thompson1@mchsi.com
> cc: linux-kernel@vger.kernel.org
> Subject: Re: your mail
> 
> On Oct 17 2007 16:30, nicholas.thompson1@mchsi.com wrote:
> >Date: Wed, 17 Oct 2007 16:30:24 +0000
> >From:  <nicholas.thompson1@mchsi.com>
> >To:  <linux-kernel@vger.kernel.org>
>              ^^^^^^
> >
> >subscribe linux-alpha
>                  ^^^^^

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-10-17 16:30 nicholas.thompson1
@ 2007-10-17 16:36 ` Jan Engelhardt
  2007-10-17 17:50   ` Matti Aarnio
  0 siblings, 1 reply; 657+ messages in thread
From: Jan Engelhardt @ 2007-10-17 16:36 UTC (permalink / raw)
  To: nicholas.thompson1; +Cc: linux-kernel


On Oct 17 2007 16:30, nicholas.thompson1@mchsi.com wrote:
>Date: Wed, 17 Oct 2007 16:30:24 +0000
>From:  <nicholas.thompson1@mchsi.com>
>To:  <linux-kernel@vger.kernel.org>
             ^^^^^^
>
>subscribe linux-alpha
                 ^^^^^


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-09-24 20:44 Steven Rostedt
@ 2007-09-24 20:50 ` Steven Rostedt
  0 siblings, 0 replies; 657+ messages in thread
From: Steven Rostedt @ 2007-09-24 20:50 UTC (permalink / raw)
  To: Jaswinder Singh; +Cc: linux-kernel, mingo, linux-rt-users



--
On Mon, 24 Sep 2007, Steven Rostedt wrote:

> linux-rt-users@vger.kernel.org
> Bcc:
> Subject: Re: realtime preemption performance difference
> Reply-To:
> In-Reply-To: <3f9a31f40709240448h4a9e8337t437328b5c675ecd5@mail.gmail.com>

[ I'm actually just learning how to screw-up^Wuse mutt ]

bah!

-- Steve


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-05-16 17:11   ` Olof Johansson
@ 2007-05-16 17:24     ` Bob Picco
  0 siblings, 0 replies; 657+ messages in thread
From: Bob Picco @ 2007-05-16 17:24 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Linas Vepstas, Bob Picco, johnrose, linuxppc-dev, Andrew Morton,
	linux-kernel

Olof Johansson wrote:	[Wed May 16 2007, 01:11:00PM EDT]
> On Wed, May 16, 2007 at 11:43:41AM -0500, Linas Vepstas wrote:
> > On Wed, May 16, 2007 at 09:30:46AM -0400, Bob Picco wrote:
> > > Subject: Re: 2.6.22-rc1-mm1 powerpc build breakage
> > > 
> > > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: error: unknown field `subsys' specified in initializer
> > > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: warning: initialization from incompatible pointer type
> > > make[4]: *** [drivers/pci/hotplug/rpadlpar_sysfs.o] Error 1
> > > make[3]: *** [drivers/pci/hotplug] Error 2
> > > make[2]: *** [drivers/pci] Error 2
> > > make[1]: *** [drivers] Error 2
> > > make: *** [_all] Error 2
> > 
> > John Rose is working to fix this "real soon now".
> 
> Do you mean the fix Al Viro posted yesterday?
> 
> http://patchwork.ozlabs.org/linuxppc/patch?id=11177
> 
> 
> -Olof
Missed that patch.

thanks,

bob

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-05-16 16:43 ` your mail Linas Vepstas
@ 2007-05-16 17:11   ` Olof Johansson
  2007-05-16 17:24     ` Bob Picco
  0 siblings, 1 reply; 657+ messages in thread
From: Olof Johansson @ 2007-05-16 17:11 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Bob Picco, johnrose, linuxppc-dev, Andrew Morton, linux-kernel

On Wed, May 16, 2007 at 11:43:41AM -0500, Linas Vepstas wrote:
> On Wed, May 16, 2007 at 09:30:46AM -0400, Bob Picco wrote:
> > Subject: Re: 2.6.22-rc1-mm1 powerpc build breakage
> > 
> > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: error: unknown field `subsys' specified in initializer
> > /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: warning: initialization from incompatible pointer type
> > make[4]: *** [drivers/pci/hotplug/rpadlpar_sysfs.o] Error 1
> > make[3]: *** [drivers/pci/hotplug] Error 2
> > make[2]: *** [drivers/pci] Error 2
> > make[1]: *** [drivers] Error 2
> > make: *** [_all] Error 2
> 
> John Rose is working to fix this "real soon now".

Do you mean the fix Al Viro posted yesterday?

http://patchwork.ozlabs.org/linuxppc/patch?id=11177


-Olof

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-05-16 13:30 Bob Picco
@ 2007-05-16 16:43 ` Linas Vepstas
  2007-05-16 17:11   ` Olof Johansson
  0 siblings, 1 reply; 657+ messages in thread
From: Linas Vepstas @ 2007-05-16 16:43 UTC (permalink / raw)
  To: Bob Picco, johnrose; +Cc: Andrew Morton, linuxppc-dev, linux-kernel

On Wed, May 16, 2007 at 09:30:46AM -0400, Bob Picco wrote:
> Subject: Re: 2.6.22-rc1-mm1 powerpc build breakage
> 
> /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: error: unknown field `subsys' specified in initializer
> /usr/src/linux-2.6.22-rc1-mm1/drivers/pci/hotplug/rpadlpar_sysfs.c:132: warning: initialization from incompatible pointer type
> make[4]: *** [drivers/pci/hotplug/rpadlpar_sysfs.o] Error 1
> make[3]: *** [drivers/pci/hotplug] Error 2
> make[2]: *** [drivers/pci] Error 2
> make[1]: *** [drivers] Error 2
> make: *** [_all] Error 2

John Rose is working to fix this "real soon now".

--linas

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-03-29 21:42 ` your mail Jan Engelhardt
  2007-03-29 21:46   ` David Miller
@ 2007-03-29 21:48   ` Gerard Braad
  1 sibling, 0 replies; 657+ messages in thread
From: Gerard Braad @ 2007-03-29 21:48 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Sorry, this wasn't supposed to happen. Already done...
Unsubscribed due to lack of a digest mail.

> I wonder why people can't send their unsubscribe message to the same
> address they sent their subscribe message to.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-03-29 21:42 ` your mail Jan Engelhardt
@ 2007-03-29 21:46   ` David Miller
  2007-03-29 21:48   ` Gerard Braad
  1 sibling, 0 replies; 657+ messages in thread
From: David Miller @ 2007-03-29 21:46 UTC (permalink / raw)
  To: jengelh; +Cc: linux-kernel

From: Jan Engelhardt <jengelh@linux01.gwdg.de>
Date: Thu, 29 Mar 2007 23:42:17 +0200 (MEST)

> > unsubscribe linux-kernel ..
> 
> I wonder why people can't send their unsubscribe message to the same 
> address they sent their subscribe message to.

People get frustrated that it doesn't work and then start doing stupid
things like sending it to the actual list, like this person did.

Of course they always fail to consider doing the proper thing which is
to ask postmaster@vger.kernel.org or the list owner
(linux-kernel-owner@vger.kernel.org in this case) for help if it is
the case that their email has changed and they no longer have a way to
send from the subscribed address.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-03-29 21:39 Gerard Braad Jr.
@ 2007-03-29 21:42 ` Jan Engelhardt
  2007-03-29 21:46   ` David Miller
  2007-03-29 21:48   ` Gerard Braad
  0 siblings, 2 replies; 657+ messages in thread
From: Jan Engelhardt @ 2007-03-29 21:42 UTC (permalink / raw)
  To: Linux Kernel Mailing List


>
> unsubscribe linux-kernel ..

I wonder why people can't send their unsubscribe message to the same 
address they sent their subscribe message to.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-02-05 14:01   ` Pekka Enberg
@ 2007-02-06  9:41     ` Joerg Roedel
  0 siblings, 0 replies; 657+ messages in thread
From: Joerg Roedel @ 2007-02-06  9:41 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: logic, linux-kernel

On Mon, Feb 05, 2007 at 04:01:23PM +0200, Pekka Enberg wrote:
> Hi Joerg,
> 
> On 2/5/07, Joerg Roedel <joerg.roedel@amd.com> wrote:
> >Hmm, this seems to be the same issue as in [1] and [2]. A page that is
> >assumed to belong to the slab but is no longer marked as a slab page.
> >Could this be a bug in the memory management?
> 
> The BUG_ON triggers whenever you feed an invalid pointer to kfree() or
> kmem_cache_free() so I am guessing the caller is simply broken. Note
> that kernels prior to 2.6.18 would quietly corrupt the slab unless
> CONFIG_SLAB_DEBUG was enabled which might explain why this hasn't been
> noticed before.

Ok. I was not aware of that. Thanks for clarification.

Joerg

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-02-05 12:36 ` your mail Joerg Roedel
@ 2007-02-05 14:01   ` Pekka Enberg
  2007-02-06  9:41     ` Joerg Roedel
  0 siblings, 1 reply; 657+ messages in thread
From: Pekka Enberg @ 2007-02-05 14:01 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: logic, linux-kernel

Hi Joerg,

On 2/5/07, Joerg Roedel <joerg.roedel@amd.com> wrote:
> Hmm, this seems to be the same issue as in [1] and [2]. A page that is
> assumed to belong to the slab but is no longer marked as a slab page.
> Could this be a bug in the memory management?

The BUG_ON triggers whenever you feed an invalid pointer to kfree() or
kmem_cache_free() so I am guessing the caller is simply broken. Note
that kernels prior to 2.6.18 would quietly corrupt the slab unless
CONFIG_SLAB_DEBUG was enabled which might explain why this hasn't been
noticed before.
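
As a purely hypothetical illustration (not taken from the reported trace):
anything handed to kfree() that never came from kmalloc() or
kmem_cache_alloc() sits on a page without the slab bit set, and that is
exactly what the check catches.

#include <linux/slab.h>

static char not_a_slab_object[16];

static void broken_caller(void)
{
        kfree(not_a_slab_object);       /* invalid pointer: the BUG_ON fires here */
}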

                               Pekka

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2007-02-05 15:41 logic
@ 2007-02-05 12:36 ` Joerg Roedel
  2007-02-05 14:01   ` Pekka Enberg
  0 siblings, 1 reply; 657+ messages in thread
From: Joerg Roedel @ 2007-02-05 12:36 UTC (permalink / raw)
  To: logic; +Cc: linux-kernel

On Mon, Feb 05, 2007 at 05:41:29PM +0200, logic@thinknet.ro wrote:
> Good morning,
> 
> I am experiencing a bug, I think. I am running a 2.6.19.2 kernel on a 3 GHz
> Intel with HT activated, 1 GB RAM, and a no-name motherboard. Here is the
> output of the hang:

Hmm, this seems to be the same issue as in [1] and [2]. A page that is
assumed to belong to the slab but is no longer marked as a slab page.
Could this be a bug in the memory management?

Joerg

[1] http://lkml.org/lkml/2007/2/4/77
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=406477

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2005-11-25 22:06 root
@ 2005-11-26  0:11 ` Hugh Dickins
  0 siblings, 0 replies; 657+ messages in thread
From: Hugh Dickins @ 2005-11-26  0:11 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

On Fri, 25 Nov 2005, root wrote:

> Nov 25 21:59:24 txiringo kernel: [17182458.504000] program ddcprobe
> is using MAP_PRIVATE, PROT_WRITE mmap of VM_RESERVED memory, which
> is deprecated. Please report this to linux-kernel@vger.kernel.org

Thanks for the report: now fixed, please upgrade to 2.6.15-rc2-git3 or later.

Hugh

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2005-06-16 23:32 ` your mail Chris Wedgwood
@ 2005-06-17  1:46   ` Tom McNeal
  0 siblings, 0 replies; 657+ messages in thread
From: Tom McNeal @ 2005-06-17  1:46 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel

I'll look at that.  This occurs on all Linux platforms, including a generic
2.4.31 I downloaded from kernel.org. The user test is trivial, just doing
the nonblocking connect, the poll, the send, and then the close, in that loop.
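
Roughly, one round of that loop looks like the sketch below (hypothetical
names, error handling trimmed, and the same 10 second poll timeout used as
the failure criterion):

#include <netinet/in.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int one_round(const struct sockaddr_in *srv)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };

        fcntl(fd, F_SETFL, O_NONBLOCK);
        if (connect(fd, (const struct sockaddr *)srv, sizeof(*srv)) < 0) {
                /* normally EINPROGRESS: wait up to 10s for writability */
                if (poll(&pfd, 1, 10000) <= 0) {
                        close(fd);
                        return -1;      /* this is the observed failure */
                }
        }
        send(fd, "ping", 4, 0);
        close(fd);
        return 0;
}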

Tom

Chris Wedgwood wrote:
> On Thu, Jun 16, 2005 at 11:08:28PM +0000, trmcneal@comcast.net wrote:
> 
> 
>>>I've been working with some tcp network test programs that have
>>>multiple clients opening nonblocking sockets to a single server
>>>port, sending a short message, and then closing the socket,
>>>100,000 times.  Since the socket is non-blocking, it generally
>>>tries to connect and then does a poll since the socket is busy.
>>>The test fails if the poll times out in 10 seconds.  It fails
>>>consistently on Linux servers but succeeds on Solaris servers; the
>>>client is a non-issue unless it's loopback on the Linux server.
> 
> 
> where is the code for this?  are you sure you're not overflowing the
> listen backlog somewhere?  that would show up in some cases but not
> all depending on latencies and local scheduler behavior
> 

-- 
Tom McNeal
(650)906-0761(cell)
(650)964-8459(fax)
Email: trmcneal@comcast.net

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2005-06-16 23:08 trmcneal
@ 2005-06-16 23:32 ` Chris Wedgwood
  2005-06-17  1:46   ` Tom McNeal
  0 siblings, 1 reply; 657+ messages in thread
From: Chris Wedgwood @ 2005-06-16 23:32 UTC (permalink / raw)
  To: trmcneal; +Cc: linux-kernel

On Thu, Jun 16, 2005 at 11:08:28PM +0000, trmcneal@comcast.net wrote:

> > I've been working with some tcp network test programs that have
> > multiple clients opening nonblocking sockets to a single server
> > port, sending a short message, and then closing the socket,
> > 100,000 times.  Since the socket is non-blocking, it generally
> > tries to connect and then does a poll since the socket is busy.
> > The test fails if the poll times out in 10 seconds.  It fails
> > consistently on Linux servers but succeeds on Solaris servers; the
> > client is a non-issue unless it's loopback on the Linux server.

where is the code for this?  are you sure you're not overflowing the
listen backlog somewhere?  that would show up in some cases but not
all depending on latencies and local scheduler behavior
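
for reference, the knob in question is the second argument to listen() on
the server side, and on recent kernels the net.core.somaxconn sysctl caps
it - a sketch, with a hypothetical helper name:

#include <sys/socket.h>

static int raise_backlog(int listen_fd)
{
        /* queue of not-yet-accepted connections; capped by net.core.somaxconn */
        return listen(listen_fd, 1024);
}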

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2005-02-03  0:17 Aleksey Gorelov
  2005-02-03  1:12 ` your mail Matthew Dharm
@ 2005-02-03 16:03 ` Alan Stern
  1 sibling, 0 replies; 657+ messages in thread
From: Alan Stern @ 2005-02-03 16:03 UTC (permalink / raw)
  To: Aleksey Gorelov; +Cc: mdharm-usb, linux-kernel

On Wed, 2 Feb 2005, Aleksey Gorelov wrote:

> Hi Matt, Alan, 
> 
>   Could you please tell me (link would do) why it makes default
> delay_use=5 
> really necessary (from the patch below)?
> https://lists.one-eyed-alien.net/pipermail/usb-storage/2004-August/00074
> 7.html
> 
> It makes USB boot really painful and slow :(
> 
>   I understand there should be a good reason for it. I've tried to find
> an answer in 
> archives, without much success though.

Lots of devices don't need that delay, but enough of them do that we 
decided to add it.  The value of 5 seconds was more or less arbitrary; it 
was long enough for every device we could test and it didn't seem _too_ 
long.  Maybe 1 second would be long enough -- we just didn't know so we 
were conservative.

Alan Stern


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2005-02-03  0:17 Aleksey Gorelov
@ 2005-02-03  1:12 ` Matthew Dharm
  2005-02-03 16:03 ` Alan Stern
  1 sibling, 0 replies; 657+ messages in thread
From: Matthew Dharm @ 2005-02-03  1:12 UTC (permalink / raw)
  To: Aleksey Gorelov; +Cc: stern, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1039 bytes --]

It's basically just like the code says.

A lot of devices choke if you access them too quickly after enumeration.
The 5 second delay seems to be enough for most devices.  But we made it
adjustable exactly for people like you.
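
On the driver side the knob is an ordinary module parameter, roughly like
the sketch below (not a quote of the usb-storage source), so something like
"modprobe usb-storage delay_use=1" shortens the wait:

#include <linux/module.h>

static unsigned int delay_use = 5;      /* seconds to wait before scanning */
module_param(delay_use, uint, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(delay_use, "seconds to delay before using a new device");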

Matt

On Wed, Feb 02, 2005 at 04:17:13PM -0800, Aleksey Gorelov wrote:
> Hi Matt, Alan, 
> 
>   Could you please tell me (link would do) why it makes default
> delay_use=5 
> really necessary (from the patch below)?
> https://lists.one-eyed-alien.net/pipermail/usb-storage/2004-August/00074
> 7.html
> 
> It makes USB boot really painful and slow :(
> 
>   I understand there should be a good reason for it. I've tried to find
> an answer in 
> archives, without much success though.
> 
> Thanks,
> Aleks.

-- 
Matthew Dharm                              Home: mdharm-usb@one-eyed-alien.net 
Maintainer, Linux USB Mass Storage Driver

Now payink attention, please.  This is mouse.  Click-click. Easy to 
use, da? Now you try...
					-- Pitr to Miranda
User Friendly, 10/11/1998

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-09-19 12:29 plt
@ 2004-09-19 18:22 ` Jesper Juhl
  0 siblings, 0 replies; 657+ messages in thread
From: Jesper Juhl @ 2004-09-19 18:22 UTC (permalink / raw)
  To: plt; +Cc: linux-kernel

On Sun, 19 Sep 2004 plt@taylorassociate.com wrote:

> Question: Are you guys going to work on cleaning up some of the errors in
> the code so we can please get a cleaner compile?
> 
I think it's safe to say that there is an ongoing effort to do that.

Some more strict typechecking has recently been introduced (read more 
here: http://kerneltrap.org/node/view/3848 ) and this currently causes a 
lot of compiler warnings that have yet to be cleaned, but that will happen 
in time - faster if you lend a hand.

> 
> drivers/mtd/nftlmount.c:44: warning: unused variable `oob'
> 
This is due to the fact that the code using that variable is currently 
within an  #if 0  block. I am not familiar with the mtd code, but the 
comment in there has this to say :

#if 0 /* Some people seem to have devices without ECC or erase marks
         on the Media Header blocks. There are enough other sanity
         checks in here that we can probably do without it.
      */

...

#endif

So it would seem that this bit of code could be on its way out. I'd assume 
that once it goes (if it does) the variable will then be removed as 
well.
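
Boiled down to a hypothetical minimal example, the effect is:

void example(void)
{
        int oob;        /* warning: unused variable 'oob' */
#if 0
        oob = 0;        /* the only user of 'oob' is compiled out */
#endif
}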


Ohh and btw, if you want people to pay attention to your emails you should 
try adding a descriptive Subject:  :)


--
Jesper Juhl


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 15:42 Jon Smirl
@ 2004-08-16 23:55 ` Dave Airlie
  0 siblings, 0 replies; 657+ messages in thread
From: Dave Airlie @ 2004-08-16 23:55 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Christoph Hellwig, torvalds, Andrew Morton, linux-kernel


> But DRM still has to live with existing fbdev drivers. The same DRM
> code is used in 2.4 and 2.6 so existing fbdev drivers are not going
> away anytime soon. When DRM detects an fbdev it will revert back into
> stealth mode, where it attaches itself to the hardware without telling
> the kernel that it is doing so. DRM cannot use stealth mode when
> running without fbdev present since it will mess up hotplug by not
> marking the resources in use.
>
> I don't believe the ordering between fbdev and DRM is an issue. If you
> are using fbdev you likely have it compiled in. In that case fbdev
> always loads first and DRM second. In the non-ppc world, most of us
> have x86 boxes which don't use fbdev. In those machines DRM needs to be
> a first class driver. In the real world I don't know anyone other than
> a developer who would load DRM first and then fbdev. If this is a
> problem you will need to fix fbdev to fall back into stealth mode
> like DRM does.

This is a good point; we are being forced into stealth mode by the fb
driver. If they want to load after us they should respect us and do the
same. (Nope, this isn't an us-and-them, DRM vs fb thing - I think we have a
solution and are heading in the correct direction)...

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 12:37                 ` Christoph Hellwig
@ 2004-08-16 23:33                   ` Dave Airlie
  0 siblings, 0 replies; 657+ messages in thread
From: Dave Airlie @ 2004-08-16 23:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alan Cox, torvalds, Andrew Morton, Linux Kernel Mailing List

>
> All the fbdev handling code in X is also an accident?

I've no idea; I've nothing to do with X... but the fact that graphics work
at all with fb/drm/X is by no fault of any design - it is pure hack ...

> Really, why do you even push for this change if the better fix isn't that
> far away.  Send the i915 driver and the other misc cleanups to Linus now
> and get a proper graphics stub driver done, it's not that much work.  I'll
> hack up the fbdev side once I'll get a little time, but the drm code is
> far to disgusting to touch, sorry.

It means writing 6 or 7 stub drivers for cards we don't have, and it means
making PCI probing different for some fbdev drivers and some DRM drivers
(e.g. the i915 doesn't have a framebuffer driver in 2.6, so do I write a
stub on the chance that someone writes an fb driver for it? - why do this
when the DRM will start encompassing the fb soon..). It is a lot of work
that we intend to throw away; the final solution is not to merge DRM/fb
via a stub, it is to create a single driver for each card. What happens
when the DRM starts doing memory management and 2d stuff? We won't want
fb to be able to load anymore, as it will break the DRM... I see Jon Smirl
has found the thread; please discuss with him, as he was the one doing all
the legwork at the kernel summit...

Again, this doesn't break any real setups; it is the path of least
resistance, as it doesn't affect fb drivers. Why should DRM be a second
class citizen when it is clearly going to have to be a first class 2.6
driver to do its job? If you can find someone with a real world setup
that this breaks I'll consider it a really bad idea... but I think Jon has
made his point far better than I...

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
@ 2004-08-16 15:42 Jon Smirl
  2004-08-16 23:55 ` Dave Airlie
  0 siblings, 1 reply; 657+ messages in thread
From: Jon Smirl @ 2004-08-16 15:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: torvalds, Andrew Morton, linux-kernel, Dave Airlie

Graphics drivers in the kernel are broken. The kernel was never
designed to have two device drivers trying to control the same piece of
hardware. 
I have posted a long list of 25 points that we are working towards to
unify things. http://lkml.org/lkml/2004/8/2/111 The PCI ROM patch that
has been posted recently addresses the first one.

In the meanwhile we have to transition somehow between what we have and
where we are going. Since fbdev has taken the path to pretend that DRM
doesn't exist DRM has to go through a lot of trouble to work when fbdev
is in the system. DRM also has to work when fbdev is not in the system.

DRM is being reworked into a first class driver with full support for
2.6 and hotplug. Part of being a first class driver means that DRM has
to register itself with the kernel like a real driver and claim all of
it's resources. I'm also fixing the driver to use 2.6 module parameters
and to support dynamic assignment of minors. Sysfs support is in the
patch being discussed.

But DRM still has to live with existing fbdev drivers. The same DRM
code is used in 2.4 and 2.6 so existing fbdev drivers are not going
away anytime soon. When DRM detects an fbdev it will revert back into
stealth mode, where it attaches itself to the hardware without telling
the kernel that it is doing so. DRM cannot use stealth mode when
running without fbdev present since it will mess up hotplug by not
marking the resources in use.

I don't believe the ordering between fbdev and DRM is an issue. If you
are using fbdev you likely have it compiled in. In that case fbdev
always loads first and DRM second. In the non-ppc world, most of us
have x86 boxes which don't use fbdev. In those machines DRM needs to be
a first class driver. In the real world I don't know anyone other than
a developer who would load DRM first and then fbdev. If this is a
problem you will need to fix fbdev to fall back into stealth mode
like DRM does.

I would like to encourage you to work towards the points on the above
referenced list. It has been widely distributed and commented on. It
has been posted to lkml, dri-dev, fb-dev and xorg lists and discussed
at OLS. 

Sorry, but I can't add an In-Reply-To header in the middle of a thread on
Yahoo. Cc me on a reply to the main thread so that I will pick up the header.

=====
Jon Smirl
jonsmirl@yahoo.com


		
__________________________________
Do you Yahoo!?
Yahoo! Mail - 50x more storage than other providers!
http://promotions.yahoo.com/new_mail

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 12:24               ` Dave Airlie
@ 2004-08-16 12:37                 ` Christoph Hellwig
  2004-08-16 23:33                   ` Dave Airlie
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Hellwig @ 2004-08-16 12:37 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Christoph Hellwig, Alan Cox, torvalds, Andrew Morton,
	Linux Kernel Mailing List

On Mon, Aug 16, 2004 at 01:24:30PM +0100, Dave Airlie wrote:
> >
> > Works fine on all my pmacs here.  In fact X works only on fbdev for
> > full features.
> 
> I think Alan would classify that as luck rather than design... and I would

All the fbdev handling code in X is also an accident?

Really, why do you even push for this change if the better fix isn't that
far away.  Send the i915 driver and the other misc cleanups to Linus now
and get a proper graphics stub driver done, it's not that much work.  I'll
hack up the fbdev side once I'll get a little time, but the drm code is
far to disgusting to touch, sorry.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 12:20             ` Christoph Hellwig
@ 2004-08-16 12:24               ` Dave Airlie
  2004-08-16 12:37                 ` Christoph Hellwig
  0 siblings, 1 reply; 657+ messages in thread
From: Dave Airlie @ 2004-08-16 12:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alan Cox, torvalds, Andrew Morton, Linux Kernel Mailing List

>
> Works fine on all my pmacs here.  In fact X works only on fbdev for
> full features.

I think Alan would classify that as luck rather than design... and I would
tend to agree. Does it work if you load the driver modules in any order,
or do you always do fb then drm? Or the other way around?

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 11:12           ` Alan Cox
@ 2004-08-16 12:20             ` Christoph Hellwig
  2004-08-16 12:24               ` Dave Airlie
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Hellwig @ 2004-08-16 12:20 UTC (permalink / raw)
  To: Alan Cox
  Cc: Christoph Hellwig, Dave Airlie, torvalds, Andrew Morton,
	Linux Kernel Mailing List

On Mon, Aug 16, 2004 at 12:12:00PM +0100, Alan Cox wrote:
> On Llu, 2004-08-16 at 10:50, Christoph Hellwig wrote:
> > no, now you're acting like an even more broken driver, preventing a fbdev
> > driver to be loaded afterwards and doing all kinds of funny things.  Please
> > revert to the old method until you have a common pci_driver for fbdev and dri.
> 
> fbdev and DRI are not functional together in the general case. They
> sometimes happen to work by luck. fbdev and X for that matter are
> generally incompatible except unaccelerated.

Works fine on all my pmacs here.  In fact X works only on fbdev for
full features.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 11:08                 ` Christoph Hellwig
  2004-08-16 11:12                   ` Alan Cox
@ 2004-08-16 11:47                   ` Dave Airlie
  1 sibling, 0 replies; 657+ messages in thread
From: Dave Airlie @ 2004-08-16 11:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: torvalds, Andrew Morton, linux-kernel


> > Yes and that is the final goal but you are dodging the point we cannot
> > jump to a fully finished state in one simple transition, it is great to
> > hear "fbdrv/drm into a common driver" it's a simple sentence surely coding
> > it must be simple, well it's not and we are taking the route that should
>
> It _is_ simple.  Look at drivers/message/fusion/ for a driver doing multiple
> protocols on a single pci_driver.  I don't demand full-blown memory management
> integration or any other fancy stuff.  Just get your crap sorted out.
>
> You could probably have done a prototype in the time you wasted arguing here.

We could write one quickly enough for one card, but making it work on
combinations of mach64/i810/radeon/r128/i830/i915/mga cards, tested so
that it doesn't break current setups - it's just not going to happen. This
change doesn't break nearly as many setups (I'd be surprised if it broke any
real world setups at all...) and I don't have the hardware to test this on all
those cards. The hope is to get the DRM into a state where we can start
proving the shared idea on one card.. it will also make changes to fb
drivers which I'm not comfortable with doing and which will cause more hassles..

> I want you a) to back out this particular broken change in your current
> mega-patch.  and b) submit small reviewable changes in the future, as every
> other driver maintainer does.

I'm considering your argument and have taken it on board. I await Linus's
decision for now; I'll start looking into the info you've given me and
I'll talk to the DRM people actually doing the work (not one line of this
is originally from me!!..)

All DRM changes are available in small chunks in the DRM CVS and DRM bk trees;
the -mm tree picks up the DRM changes, I fix the bugs that come up in
the -mm tree, and then I submit the bk tree to Linus. I thought this was
how kernel development worked these days,

The patch you are against is
http://drm.bkbits.net:8080/drm-2.6/patch@1.1784.4.4?nav=index.html|tags|ChangeSet@1.1722.154.18..|cset@1.1784.4.4

with a couple of bugfixes on top of it from testing in -mm.. if I'm
missing the kernel development process somehow please inform me.. I'm new
to this maintainer job and the drm hasn't been maintained in years so I'm
not starting from a good place...

Thanks,
Dave.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 11:08                 ` Christoph Hellwig
@ 2004-08-16 11:12                   ` Alan Cox
  2004-08-16 11:47                   ` Dave Airlie
  1 sibling, 0 replies; 657+ messages in thread
From: Alan Cox @ 2004-08-16 11:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Airlie, torvalds, Andrew Morton, Linux Kernel Mailing List

On Llu, 2004-08-16 at 12:08, Christoph Hellwig wrote:
> I want you a) to back out this particular broken change in your current
> mega-patch.  and b) submit small reviewable changes in the future, as every
> other driver maintainer does.

DRI is done as small reviewable changes. If you want to be involved then
follow the DRI list too or ask for the entire list to be gated to
linux-kernel for your pleasure...


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16  9:50         ` Christoph Hellwig
  2004-08-16 10:29           ` Dave Airlie
@ 2004-08-16 11:12           ` Alan Cox
  2004-08-16 12:20             ` Christoph Hellwig
  1 sibling, 1 reply; 657+ messages in thread
From: Alan Cox @ 2004-08-16 11:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Airlie, torvalds, Andrew Morton, Linux Kernel Mailing List

On Llu, 2004-08-16 at 10:50, Christoph Hellwig wrote:
> no, now you're acting like an even more broken driver, preventing a fbdev
> driver to be loaded afterwards and doing all kinds of funny things.  Please
> revert to the old method until you have a common pci_driver for fbdev and dri.

fbdev and DRI are not functional together in the general case. They
sometimes happen to work by luck. fbdev and X for that matter are
generally incompatible except unaccelerated.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 11:02               ` Dave Airlie
@ 2004-08-16 11:08                 ` Christoph Hellwig
  2004-08-16 11:12                   ` Alan Cox
  2004-08-16 11:47                   ` Dave Airlie
  0 siblings, 2 replies; 657+ messages in thread
From: Christoph Hellwig @ 2004-08-16 11:08 UTC (permalink / raw)
  To: Dave Airlie; +Cc: torvalds, Andrew Morton, linux-kernel

On Mon, Aug 16, 2004 at 12:02:15PM +0100, Dave Airlie wrote:
> > 	You do stop fb from being loaded after drm
> > and thus break perfectly working setups during stable series.  And you
> 
> I doubt anyone has a system that does it and they should have a broken one
> if they do it.. drm has also said you should load fb before it.. and
> having both fb and drm loaded on the same hardware is a hack anyways..

So fix it properly instead of making it even more broken.

> Yes and that is the final goal but you are dodging the point we cannot
> jump to a fully finished state in one simple transition, it is great to
> hear "fbdrv/drm into a common driver" it's a simple sentence surely coding
> it must be simple, well it's not and we are taking the route that should

It _is_ simple.  Look at drivers/message/fusion/ for a driver doing multiple
protocols on a single pci_driver.  I don't demand full-blown memory management
integration or any other fancy stuff.  Just get your crap sorted out.

You could probably have done a prototype in the time you wasted arguing here.

> You seem to want us to go down the finished unmergeable mega-patch road
> to avoid breaking something that is broken and might work, the benefits
> don't outweigh the costs.. so it makes no sense..

I want you a) to back out this particular broken change in your current
mega-patch.  and b) submit small reviewable changes in the future, as every
other driver maintainer does.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 10:38             ` Christoph Hellwig
@ 2004-08-16 11:02               ` Dave Airlie
  2004-08-16 11:08                 ` Christoph Hellwig
  0 siblings, 1 reply; 657+ messages in thread
From: Dave Airlie @ 2004-08-16 11:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: torvalds, Andrew Morton, linux-kernel


>
> 3) stop making broken changes.

The current system is broken in way more subtle ways...

> 	You do stop fb from being loaded after drm
> and thus break perfectly working setups during stable series.  And you

I doubt anyone has a system that does it and they should have a broken one
if they do it.. drm has also said you should load fb before it.. and
having both fb and drm loaded on the same hardware is a hack anyways..

> introduce indeterministic behaviour, and although I haven't looked at the
> code because unlike every guideline tells you you didn't post it to the
> list, probably horribly broken code.

I just did post it; it's been in the DRM CVS tree for 3-6 months now and it's
been in -mm for 1.5 months, and I've followed what Andrew and Linus told me to
do to get the DRM maintained... the link I posted in the last mail is to the
broken-out patch in the -mm tree, and the only file to really change is
drm_drv.h and some bits in drm_stub.h... the current code is, we have
discovered, horribly broken in a lot of cases.. I've gotten nothing back to
say this code is any worse....

> If you want pci_driver semantics - and apparently you do - move fbdev
> and drm into a common driver or introduce a stub.  This was discussed to
> death and all kinds of list and Kernel Summit and now please follow what
> was agreed on instead of introducing subtile hacks.

Yes, and that is the final goal, but you are dodging the point: we cannot
jump to a fully finished state in one simple transition. It is great to
hear "fbdrv/drm into a common driver" - it's a simple sentence, so surely coding
it must be simple; well, it's not, and we are taking the route that should
affect the least people. I'm majorly involved in the discussion and I was
the one to agree to carry out the maintenance paths between DRM and LK.
This code is needed for us to move forward with the merged drivers - if
Linus/Andrew decide not to merge it I'll go back to the DRM team and it'll
be reworked until they do accept it, but we have to stop the fb from
loading after the DRM at some stage and it may as well be earlier.. (if
2.7 was going to happen I'd wait, but kernel development seems to be
changing...)

You seem to want us to go down the finished unmergeable mega-patch road
to avoid breaking something that is broken and might work, the benefits
don't outweigh the costs.. so it makes no sense..

Again if Linus/Andrew bounce this we will have to rework it but something
like this has to go in at some stage...

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16 10:29           ` Dave Airlie
@ 2004-08-16 10:38             ` Christoph Hellwig
  2004-08-16 11:02               ` Dave Airlie
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Hellwig @ 2004-08-16 10:38 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Christoph Hellwig, torvalds, Andrew Morton, linux-kernel

On Mon, Aug 16, 2004 at 11:29:48AM +0100, Dave Airlie wrote:
> 1) move the DRM to be a real PCI driver now - stop fb from working on same
> card
> 
> 2) move the DRM to act like a real PCI driver when fb isn't loaded, when
> we merge we rip the code out..

3) stop making broken changes.

	You do stop fb from being loaded after drm
and thus break perfectly working setups during the stable series.  And you
introduce indeterministic behaviour and, although I haven't looked at the
code because, unlike what every guideline tells you, you didn't post it to the
list, probably horribly broken code.

If you want pci_driver semantics - and apparently you do - move fbdev
and drm into a common driver or introduce a stub.  This was discussed to
death and all kinds of list and Kernel Summit and now please follow what
was agreed on instead of introducing subtile hacks.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16  9:50         ` Christoph Hellwig
@ 2004-08-16 10:29           ` Dave Airlie
  2004-08-16 10:38             ` Christoph Hellwig
  2004-08-16 11:12           ` Alan Cox
  1 sibling, 1 reply; 657+ messages in thread
From: Dave Airlie @ 2004-08-16 10:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: torvalds, Andrew Morton, linux-kernel


>
> no, now you're acting like an even more broken driver, preventing a fbdev
> driver to be loaded afterwards and doing all kinds of funny things.  Please
> revert to the old method until you have a common pci_driver for fbdev and dri.
>

the options we have are
1) move the DRM to be a real PCI driver now - stop fb from working on same
card

2) move the DRM to act like a real PCI driver when fb isn't loaded, when
we merge we rip the code out..

The other option is not going to happen unless Linus/Andrew/Alan tell us
to go away and do it that way and will then unconditionally merge a
mega-patch when I'm finished - you can't have it both ways: we fix things
step-by-step or we leave it as is and nobody fixes it. So Christoph, I
respect your opinion, but unless you care about this enough to do the work
on it, the way we are going seems to be the best way to avoid breaking
things, and I'm leaving the decision on whether to merge this stuff or not
to Linus/Andrew - btw, in case anyone wants to look, the patch is what's at:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.8-rc4/2.6.8-rc4-mm1/broken-out/bk-drm.patch

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16  9:30       ` Dave Airlie
@ 2004-08-16  9:50         ` Christoph Hellwig
  2004-08-16 10:29           ` Dave Airlie
  2004-08-16 11:12           ` Alan Cox
  0 siblings, 2 replies; 657+ messages in thread
From: Christoph Hellwig @ 2004-08-16  9:50 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Christoph Hellwig, torvalds, Andrew Morton, linux-kernel

On Mon, Aug 16, 2004 at 10:30:55AM +0100, Dave Airlie wrote:
> >
> > Eeek, doing different styles of probing is even worse than what you did
> > before.  Please revert to pci_find_device() until you have a proper common
> > driver ready.
> 
> There was nothing wrong with what we did before; it just happened to work
> like 2.4. We are now acting like real 2.6 drivers,

no, now you're acting like an even more broken driver, preventing a fbdev
driver to be loaded afterwards and doing all kinds of funny things.  Please
revert to the old method until you have a common pci_driver for fbdev and dri.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-16  9:17     ` Christoph Hellwig
@ 2004-08-16  9:30       ` Dave Airlie
  2004-08-16  9:50         ` Christoph Hellwig
  0 siblings, 1 reply; 657+ messages in thread
From: Dave Airlie @ 2004-08-16  9:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: torvalds, Andrew Morton, linux-kernel

>
> Eeek, doing different styles of probing is even worse than what you did
> before.  Please revert to pci_find_device() until you have a proper common
> driver ready.

There was nothing wrong with what we did before; it just happened to work
like 2.4. We are now acting like real 2.6 drivers, which we need to do for
sysfs and hotplug to work, and Jon Smirl is working on proper minor device
support (like USB does, I think)... we need to get this work done before we
can have proper common drivers, and I don't want to do all this work in
hiding and then have it refused because we told no-one.

The DRM will be in flux a lot over the next while (while we get this common
drm/fb stuff together) and as long as we can keep the changes from
actually breaking it I think people should be able to live with it ...

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-15 23:40   ` Dave Airlie
@ 2004-08-16  9:17     ` Christoph Hellwig
  2004-08-16  9:30       ` Dave Airlie
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Hellwig @ 2004-08-16  9:17 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Christoph Hellwig, torvalds, Andrew Morton, linux-kernel

On Mon, Aug 16, 2004 at 12:40:43AM +0100, Dave Airlie wrote:
> It probably should say "uses the PCI APIs properly": it now does
> enable/disable of devices, registers the DRM as owning the memory regions,
> and does proper PCI probing .. in cases where the fb is already loaded on
> the card it falls back to the old ways (evil direct register writing.. ).
> This change will stop you loading the fb driver after the drm driver, but
> this shouldn't be a common case at all..

Eeek, doing different styles of probing is even worse than what you did
before.  Please revert to pci_find_device() until you have a proper common
driver ready.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-15 12:34 ` your mail Christoph Hellwig
@ 2004-08-15 23:40   ` Dave Airlie
  2004-08-16  9:17     ` Christoph Hellwig
  0 siblings, 1 reply; 657+ messages in thread
From: Dave Airlie @ 2004-08-15 23:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: torvalds, Andrew Morton, linux-kernel

> On Sun, Aug 15, 2004 at 01:19:31PM +0100, Dave Airlie wrote:
> > Graphics, and the DRM now uses PCI properly if no framebuffer is loaded
> > (it falls back if framebuffer is enabled...),
>
> Can you explain what this means?
>

It probably should say "uses the PCI APIs properly": it now does
enable/disable of devices, registers the DRM as owning the memory regions,
and does proper PCI probing .. in cases where the fb is already loaded on
the card it falls back to the old ways (evil direct register writing.. ).
This change will stop you loading the fb driver after the drm driver, but
this shouldn't be a common case at all..
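
For anyone not following the DRM trees, "acting like a real 2.6 PCI driver"
boils down to something like this minimal sketch (hypothetical names and
IDs, not the actual DRM code):

#include <linux/module.h>
#include <linux/pci.h>

static struct pci_device_id example_ids[] = {
        { .vendor = 0x1002, .device = 0x5144,           /* example IDs only */
          .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID },
        { }
};
MODULE_DEVICE_TABLE(pci, example_ids);

static int example_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
        int ret = pci_enable_device(pdev);

        if (ret)
                return ret;
        /* claim the BARs so nothing else silently pokes the same hardware */
        return pci_request_regions(pdev, "example-drm");
}

static void example_remove(struct pci_dev *pdev)
{
        pci_release_regions(pdev);
        pci_disable_device(pdev);
}

static struct pci_driver example_driver = {
        .name     = "example-drm",
        .id_table = example_ids,
        .probe    = example_probe,
        .remove   = example_remove,
};

static int __init example_init(void)
{
        return pci_register_driver(&example_driver);
}

static void __exit example_exit(void)
{
        pci_unregister_driver(&example_driver);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");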

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-08-15 12:19 Dave Airlie
@ 2004-08-15 12:34 ` Christoph Hellwig
  2004-08-15 23:40   ` Dave Airlie
  0 siblings, 1 reply; 657+ messages in thread
From: Christoph Hellwig @ 2004-08-15 12:34 UTC (permalink / raw)
  To: Dave Airlie; +Cc: torvalds, Andrew Morton, linux-kernel

On Sun, Aug 15, 2004 at 01:19:31PM +0100, Dave Airlie wrote:
> Graphics, and the DRM now uses PCI properly if no framebuffer is loaded
> (it falls back if framebuffer is enabled...),

Can you explain what this means?


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-05-24 23:04 Laughlin, Joseph V
  2004-05-24 23:13 ` Bernd Petrovitsch
@ 2004-05-24 23:21 ` Chris Wright
  1 sibling, 0 replies; 657+ messages in thread
From: Chris Wright @ 2004-05-24 23:21 UTC (permalink / raw)
  To: Laughlin, Joseph V; +Cc: Herbert Poetzl, linux-kernel

* Laughlin, Joseph V (Joseph.V.Laughlin@boeing.com) wrote:
> Currently, we're using sched_setaffinity() to control it, which existed
> in our 2.4.19 kernel.  (but, you have to be root to use it, and we'd
> like non-root users to be able to change the affinity.)

Sounds like it's patched in.  And it likely doesn't require root per se,
but CAP_SYS_NICE (as the 2.6 code does).

So, you've got choices of how to disable those capability checks to do
what you want.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
@ 2004-05-24 23:15 Laughlin, Joseph V
  0 siblings, 0 replies; 657+ messages in thread
From: Laughlin, Joseph V @ 2004-05-24 23:15 UTC (permalink / raw)
  To: Bernd Petrovitsch, linux-kernel

> -----Original Message-----
> From: Bernd Petrovitsch [mailto:bernd@firmix.at] 
> Sent: Monday, May 24, 2004 4:13 PM
> To: Laughlin, Joseph V; linux-kernel@vger.kernel.org
> Subject: RE: your mail
> 
> 
> On Tue, 2004-05-25 at 01:04, Laughlin, Joseph V wrote:
> > > -----Original Message-----
> [...]
> > > On Mon, May 24, 2004 at 03:20:33PM -0700, Laughlin, 
> Joseph V wrote:
> > > > I've been tasked with modifying a 2.4 kernel so that a
> > > non-root user
> > > > can do the following:
> > > > 
> > > > Dynamically change the priorities of processes (up and 
> down) Lock
> > > > processes in memory Can change process cpu affinity
> [...]
> > Currently, we're using sched_setaffinity() to control it, which 
> > existed in our 2.4.19 kernel.  (but, you have to be root to use it, 
> > and we'd like non-root users to be able to change the affinity.)
> 
> And using sudo or setuid Binaries?
> 
> 	Bernd
> -- 

Not an option, unfortunately. 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
  2004-05-24 23:04 Laughlin, Joseph V
@ 2004-05-24 23:13 ` Bernd Petrovitsch
  2004-05-24 23:21 ` Chris Wright
  1 sibling, 0 replies; 657+ messages in thread
From: Bernd Petrovitsch @ 2004-05-24 23:13 UTC (permalink / raw)
  To: Laughlin, Joseph V, linux-kernel

On Tue, 2004-05-25 at 01:04, Laughlin, Joseph V wrote:
> > -----Original Message-----
[...]
> > On Mon, May 24, 2004 at 03:20:33PM -0700, Laughlin, Joseph V wrote:
> > > I've been tasked with modifying a 2.4 kernel so that a 
> > non-root user 
> > > can do the following:
> > > 
> > > Dynamically change the priorities of processes (up and down) Lock 
> > > processes in memory Can change process cpu affinity
[...]
> Currently, we're using sched_setaffinity() to control it, which existed
> in our 2.4.19 kernel.  (but, you have to be root to use it, and we'd
> like non-root users to be able to change the affinity.)

And using sudo or setuid Binaries?

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services



^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
@ 2004-05-24 23:04 Laughlin, Joseph V
  2004-05-24 23:13 ` Bernd Petrovitsch
  2004-05-24 23:21 ` Chris Wright
  0 siblings, 2 replies; 657+ messages in thread
From: Laughlin, Joseph V @ 2004-05-24 23:04 UTC (permalink / raw)
  To: Herbert Poetzl; +Cc: linux-kernel

> -----Original Message-----
> From: Herbert Poetzl [mailto:herbert@13thfloor.at] 
> Sent: Monday, May 24, 2004 3:30 PM
> To: Laughlin, Joseph V
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: your mail
> 
> 
> On Mon, May 24, 2004 at 03:20:33PM -0700, Laughlin, Joseph V wrote:
> > I've been tasked with modifying a 2.4 kernel so that a 
> non-root user 
> > can do the following:
> > 
> > Dynamically change the priorities of processes (up and down) Lock 
> > processes in memory Can change process cpu affinity
> > 
> > Anyone got any ideas about how I could start doing this?  
> (I'm new to 
> > kernel development, btw.)
> 
> check the kernel capability system ...
> (it's quite simple)
> 
> #define CAP_SYS_NICE         23
> #define CAP_IPC_LOCK         14
> 
> cpu scheduler affinity isn't part of 2.4 AFAIK
> so there is no easy way to 'control' it ...
> 

Currently, we're using sched_setaffinity() to control it, which existed
in our 2.4.19 kernel.  (but, you have to be root to use it, and we'd
like non-root users to be able to change the affinity.)
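
For reference, the call itself is roughly the sketch below (hypothetical
helper name; the glibc wrapper signature has varied between versions). The
root requirement comes from the permission check on the kernel side of this
call, not from the call itself:

#define _GNU_SOURCE
#include <sched.h>

static int pin_to_cpu0(pid_t pid)
{
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        return sched_setaffinity(pid, sizeof(mask), &mask);
}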

Joe


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-05-24 22:30 ` your mail Herbert Poetzl
@ 2004-05-24 22:34   ` Marc-Christian Petersen
  0 siblings, 0 replies; 657+ messages in thread
From: Marc-Christian Petersen @ 2004-05-24 22:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: Herbert Poetzl, Laughlin, Joseph V

On Tuesday 25 May 2004 00:30, Herbert Poetzl wrote:

Hi Joseph,

> > Dynamically change the priorities of processes (up and down)
> > Lock processes in memory
> > Can change process cpu affinity
> > Anyone got any ideas about how I could start doing this?  (I'm new to
> > kernel development, btw.)
> check the kernel capability system ...
> (it's quite simple)
> #define CAP_SYS_NICE         23
> #define CAP_IPC_LOCK         14
> cpu scheduler affinity isn't part of 2.4 AFAIK
> so there is no easy way to 'control' it ...

at least I have a patch in my 2.4-tree where a user in a predefined GID 
(changeable via /proc) can change the nice value of his/her own processes up 
and down.

ciao, Marc

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-05-24 22:20 Laughlin, Joseph V
  2004-05-24 22:30 ` your mail Herbert Poetzl
@ 2004-05-24 22:33 ` Chris Wright
  1 sibling, 0 replies; 657+ messages in thread
From: Chris Wright @ 2004-05-24 22:33 UTC (permalink / raw)
  To: Laughlin, Joseph V; +Cc: linux-kernel

* Laughlin, Joseph V (Joseph.V.Laughlin@boeing.com) wrote:
> I've been tasked with modifying a 2.4 kernel so that a non-root user can
> do the following:
> 
> Dynamically change the priorities of processes (up and down)

Requires CAP_SYS_NICE.

> Lock processes in memory

Currently requires CAP_IPC_LOCK.  However, this one has already been
done using rlimits (at least via mlock() and friends; SHM_LOCK has a
different issue).

> Can change process cpu affinity

Requires CAP_SYS_NICE (but I believe this was a 2.6 feature).

> Anyone got any ideas about how I could start doing this?  (I'm new to
> kernel development, btw.)

There's a few approaches floating about.  Probably the simplest is to
disable the checks globally, but this will also be less secure.  I have
an example of this in 2.6 if you'd like.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-05-24 22:20 Laughlin, Joseph V
@ 2004-05-24 22:30 ` Herbert Poetzl
  2004-05-24 22:34   ` Marc-Christian Petersen
  2004-05-24 22:33 ` Chris Wright
  1 sibling, 1 reply; 657+ messages in thread
From: Herbert Poetzl @ 2004-05-24 22:30 UTC (permalink / raw)
  To: Laughlin, Joseph V; +Cc: linux-kernel

On Mon, May 24, 2004 at 03:20:33PM -0700, Laughlin, Joseph V wrote:
> I've been tasked with modifying a 2.4 kernel so that a non-root user can
> do the following:
> 
> Dynamically change the priorities of processes (up and down)
> Lock processes in memory
> Can change process cpu affinity
> 
> Anyone got any ideas about how I could start doing this?  (I'm new to
> kernel development, btw.)

check the kernel capability system ...
(it's quite simple)

#define CAP_SYS_NICE         23
#define CAP_IPC_LOCK         14

cpu scheduler affinity isn't part of 2.4 AFAIK
so there is no easy way to 'control' it ...
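
As a rough illustration, the kind of test these capability bits gate
looks like this in the kernel (sketch only, not the exact call sites):

	/* raising priority: sys_nice()-style check */
	if (increment < 0 && !capable(CAP_SYS_NICE))
		return -EPERM;

	/* locking pages in memory: mlock()-style check */
	if (!capable(CAP_IPC_LOCK))
		return -EPERM;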

HTH,
Herbert

> Thanks,
> 
> Joe Laughlin
> Phantom Works - Integrated Technology Development Labs 
> The Boeing Company
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-04-29  3:03 whitehorse
@ 2004-04-29  3:21 ` Jon
  0 siblings, 0 replies; 657+ messages in thread
From: Jon @ 2004-04-29  3:21 UTC (permalink / raw)
  To: whitehorse; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 741 bytes --]

On Wed, Apr 28, 2004 at 11:03:08PM -0400, whitehorse@mustika.net wrote:
> Dear Sir,
>  I have a problem compiling kernel 2.6.4, coming from kernel 2.4.19. I use
>  Debian woody. When I reboot into the new kernel, messages such as
>  "modprobe: QM_MODULES: function not implemented"
>  appear and I can't load my modules at boot. I would be grateful to anyone
>  who can answer this. Please reply to this mail. Thanks
> 
>  Best regards,
> 
>  Hafid
>  Indonesia
> 
You need to install module-init-tools, which is not in Debian Woody.
A backport of it for x86 machines is here:
http://www.backports.org/debian/dists/woody/module-init-tools/
-- 
Jon
http://tesla.resnet.mtu.edu
The only meaning in life is the meaning you create for it.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <200404121623.42558.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-04-13 13:46 ` James Morris
  0 siblings, 0 replies; 657+ messages in thread
From: James Morris @ 2004-04-13 13:46 UTC (permalink / raw)
  To: Denis Vlasenko
  Cc: David S. Miller, netdev,
	YOSHIFUJI Hideaki / 吉藤英明,
	linux-kernel

On Mon, 12 Apr 2004, Denis Vlasenko wrote:

> According to my measurements,
> 
> ip_vs_control_add() (from include/net/ip_vs.h) is called twice
> and
> sock_queue_rcv_skb() (from include/net/sock.h) is called 19 times
> from various kernel .c files.
> 
> Both these includes generate more than 500 bytes of code on x86.
> 
> These patches uninline them. Please apply.

What kind of performance impact (if any) does this patch have?


- James
-- 
James Morris
<jmorris@redhat.com>



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-04-09 17:54 Martin Knoblauch
@ 2004-04-09 18:12 ` Joel Jaeggli
  0 siblings, 0 replies; 657+ messages in thread
From: Joel Jaeggli @ 2004-04-09 18:12 UTC (permalink / raw)
  To: Martin Knoblauch; +Cc: linux-kernel

On Fri, 9 Apr 2004, Martin Knoblauch wrote:

> >I was wondering if something like dynamic swapping of files is
> >possible for Linux, or better, for a Linux filesystem.
> >For explanation: I have access to an Infinstor via NFS and
> >Linux is running there. This server has a nice function I'd
> >like to have: if there are files that are not used for a
> >specified time (i.e. 30 days) they are moved to other storage
> >(disk and after that to a streamer tape) and are replaced
> >by some kind of 'link'. So if you look at your directory you
> >can see everything that was there, but if you try to open it,
> >you have to wait a moment (some seconds if the file was
> >swapped to another disk) or just another moment (some
> >minutes if the file is on a tape) and then it is restored at
> >its old place.
> >
> 
>  Good description of a HSM (Hierarchical Storage Management)
> System.
> 
> >So is there anything which provides such a feature? By now
> >I have a little script that moves such files out of the way and
> >replaces them by links. But restoring is somewhat harder and
> >it's not automatic.
> >
> >Any ideas?
> >

Part of the thing for us (my group at UO) right now is that tape robots aren't
cheaper than disk, so a lot of our offline/near-line backup is slowly
moving in that direction... 1TB LTO jukeboxes cost on the order of $8-9K each,
and the driver for your commercial tape-backup software can cost nearly that
much on top of it, but I can put 3.5TB of disk in a 5U enclosure and
locate it in some other building for a similar price if not less. Even if I buy
it in something like a netapp filer it's still only around $10,000 a TB, so
HSM systems involving tape don't really have the same appeal as when we
were paying $1200 each for 4GB SCSI disks. If I had sunk costs in something
like a storagetek powerhorn with a 6000-tape capacity I might think a little
differently, but I suspect your situation is closer to mine than it is to
the sorts of people who buy those.

>  Really depends. As far as I know there are no "free" HSM Systems
> out there for Linux. The only one that I am faintly familiar with
> that runs on Linux is StorNext from ADIC. Definitely not free.
> 
>  DMF/Irix may now be ported to Linux (Altix/IA64), but I doubt
> it will be free.
> 
>  Sun is most likely not (yet) interested in doing a Linux port
> of SAM-FS (there are still Sparc/Solaris Machines to sell).
> And it won't be free (my guess).
> 
>  Tivoli/IBM and UniTree are also sold for Linux. Again "sold" is
> the important word
> 
> Martin
> 
> 
> =====
> ------------------------------------------------------
> Martin Knoblauch
> email: k n o b i AT knobisoft DOT de
> www:   http://www.knobisoft.de
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
-------------------------------------------------------------------------- 
Joel Jaeggli  	       Unix Consulting 	       joelja@darkwing.uoregon.edu    
GPG Key Fingerprint:     5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-03-15 22:49 Kevin Leung
@ 2004-03-15 23:26 ` Richard B. Johnson
  0 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2004-03-15 23:26 UTC (permalink / raw)
  To: Kevin Leung; +Cc: linux-kernel

On Mon, 15 Mar 2004, Kevin Leung wrote:

> Hello All,
>
> I am very new to Linux and am working on a project. The nature of the
> project is to essentially record all process/thread scheduling activity for
> use in a later application. I wanted to know if any experts out there knew
> of any libraries that could essentially "monitor" or "listen" for any
> scheduling changes made. For instance if the kernel decides to set process A
> from "sleeping" to "running" and process B from "running" to "sleeping", I
> wanted to know if there was a function that could generate an immediate
> notification of this event.

No. FYI, there are hundreds-of-thousands of such "events" per second
of operation! Basically, any time some task is waiting for I/O its
CPU is taken away and given to somebody else. This is what "sleeping"
usually means. Once the I/O completes, the task gets the CPU
again and that's what "running" means. If you were to instrument
these two state-changes for all tasks, it would certainly leave
only a new percent of CPU available for the tasks. This would
royally screw up the meaning of anything you were trying to
instrument.

> Priority change information is also desirable.

If you mean the dynamic priority that keeps changing until
the task is executed, no. If you mean priority like
'nice', you can instrument the sys-call.

> The more aspects which trigger notification, the better. As a first attempt,

There is a kernel logging daemon that writes 'printk' messages. This
works by having a user-mode daemon open and read /proc/kmsg. You can
make a similar communications interface, using the existing daemon
as a template, that will instrument anything you want.
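
A bare-bones reader of that interface looks something like this (sketch
only; klogd does considerably more):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd = open("/proc/kmsg", O_RDONLY);	/* needs root; read() blocks */

	if (fd < 0) {
		perror("open /proc/kmsg");
		return 1;
	}
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		write(STDOUT_FILENO, buf, n);	/* pass kernel messages along */
	return 0;
}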

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-02-24 13:58 Jim Deas
@ 2004-02-24 14:44 ` Richard B. Johnson
  0 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2004-02-24 14:44 UTC (permalink / raw)
  To: Jim Deas; +Cc: linux-kernel

On Tue, 24 Feb 2004, Jim Deas wrote:

> Can someone point me in the right direction.
> I am getting an oops on a driver I am porting from 2.4 to the 2.6.2 kernel.
> I have expanded the file_operations structures and have a driver that
> loads and inits the hardware but when I call the open function I
> get an oops. The best I can track it is
>

Fix your line-warp!

> EIP 0060:[c0188954]
> chrdev_open +0x104
>
> What is the best debug tool to put this oops information in clear
> sight? It appears to never get to my modules open routine so I am
> at a debugging crossroad. What is the option on a kernel compile
> to get the compile listing so I can see what is at 0x104 in this
> block of code?
>

Nothing is going to help with that EIP with a segment value of
0x60. It looks like some dumb coding error, using a pointer
that disappeared after the module init function. In other
words, it's probably something like:

int __init init_module()
{
    struct file_operations fops;
    memset(&fops, 0x00, sizeof(fops));
    fops.open = open;
    fops.release = close;
    fops.owner = THIS_MODULE;
    register_chrdev(DEV_MAJOR, dev, &fops);
}

So, everything in init_module is GONE. Your program calls open()
and the pointer in the kernel gets dereferenced to junk.

There are kernel debugging tools, however I have found that
the most useful tools are printk() and some discipline.

In the case of the code above, don't just change the declaration
of the fops object to static. Instead, move it outside the
function, so it's obviously where it won't go away.
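
In other words, something along these lines (sketch; the names come from
the snippet above):

	static struct file_operations fops = {
		.owner   = THIS_MODULE,	/* file scope: does not vanish */
		.open    = open,	/* your driver's open()  */
		.release = close,	/* your driver's close() */
	};

	int __init init_module(void)
	{
		return register_chrdev(DEV_MAJOR, dev, &fops);
	}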

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-02-19 13:52 Joilnen Leite
@ 2004-02-19 14:12 ` Richard B. Johnson
  0 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2004-02-19 14:12 UTC (permalink / raw)
  To: Joilnen Leite; +Cc: linux-kernel, linux-ide

On Thu, 19 Feb 2004, Joilnen Leite wrote:

> excuse me friends, isn't schedule_timeout(1) a problem
> under spin_lock_irqsave?
>
> drivers/scsi/ide-scsi.c:897
>
> spin_lock_irqsave(&ide_lock, flags);
> while (HWGROUP(drive)->handler) {
>        HWGROUP(drive)->handler = NULL;
>        schedule_timeout(1);
> }
>
> pub 1024D/5139533E Joilnen Batista Leite
> F565 BD0B 1A39 390D 827E 03E5 0CD4 0F20 5139 533E

What kernel version?  It is very bad. You can't sleep with
a spin-lock being held!
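
The usual way out is to drop the lock around the sleep, roughly like this
(sketch only, not necessarily how ide-scsi ended up being fixed):

	spin_lock_irqsave(&ide_lock, flags);
	while (HWGROUP(drive)->handler) {
		HWGROUP(drive)->handler = NULL;
		spin_unlock_irqrestore(&ide_lock, flags);
		set_current_state(TASK_UNINTERRUPTIBLE);
		schedule_timeout(1);		/* now it is safe to sleep */
		spin_lock_irqsave(&ide_lock, flags);
	}
	spin_unlock_irqrestore(&ide_lock, flags);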

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an Intel Pentium III machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
@ 2004-02-13 19:23 Bloch, Jack
  0 siblings, 0 replies; 657+ messages in thread
From: Bloch, Jack @ 2004-02-13 19:23 UTC (permalink / raw)
  To: 'Maciej Zenczykowski'; +Cc: 'linux-kernel@vger.kernel.org'

By the way, shouldn't a munmap call really free the memory? I have an strace
showing that the process calls munmap a lot, but I do not see any gaps in the
maps file.

Jack Bloch 
Siemens ICN
phone                (561) 923-6550
e-mail                jack.bloch@icn.siemens.com


-----Original Message-----
From: Bloch, Jack 
Sent: Friday, February 13, 2004 2:14 PM
To: 'Maciej Zenczykowski'
Cc: linux-kernel@vger.kernel.org
Subject: RE: your mail


Yes, your assumption about the 1GB is correct.

Jack Bloch 
Siemens ICN
phone                (561) 923-6550
e-mail                jack.bloch@icn.siemens.com


-----Original Message-----
From: Maciej Zenczykowski [mailto:maze@cela.pl]
Sent: Friday, February 13, 2004 1:11 PM
To: Bloch, Jack
Cc: linux-kernel@vger.kernel.org
Subject: Re: your mail


The "deleted" marks mean that the file in question has been unlinked
(rm'ed); however, it is still being used and its inode still exists.  This
memory is in use and thus validly takes up mapping space.  You'd need to
unmap it in order to free that memory.  Deleting a file does not actually
remove it until _all_ processes close and unmap any references to it.
What's more worrying is the large area of unmapped memory below 1GB
(0x40000000); I wonder why it doesn't get allocated?  But I think the
answer is that the standard allocator only searches 1GB..3GB for free
areas...

Cheers,
MaZe.

On Fri, 13 Feb 2004, Bloch, Jack wrote:

> I am running a 2.4.19 kernel and have a problem where a process is using up
> to 0xC0000000 of space. It is no longer possible for this process to get any
> more memory via mmap or via shmget. However, when I dump the
> /proc/<pid>/maps file, I see large chunks of memory marked deleted, i.e. this
> should be freely available to be used by the next call. I do not see these
> addresses get re-used. The maps file is attached.
> 
>  <<9369>> 
> 
> Jack Bloch 
> Siemens ICN
> phone                (561) 923-6550
> e-mail                jack.bloch@icn.siemens.com
> 
> 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
@ 2004-02-13 19:14 Bloch, Jack
  0 siblings, 0 replies; 657+ messages in thread
From: Bloch, Jack @ 2004-02-13 19:14 UTC (permalink / raw)
  To: 'Maciej Zenczykowski'; +Cc: linux-kernel

Yes, your assumption about the 1GB is correct.

Jack Bloch 
Siemens ICN
phone                (561) 923-6550
e-mail                jack.bloch@icn.siemens.com


-----Original Message-----
From: Maciej Zenczykowski [mailto:maze@cela.pl]
Sent: Friday, February 13, 2004 1:11 PM
To: Bloch, Jack
Cc: linux-kernel@vger.kernel.org
Subject: Re: your mail


The "deleted" marks mean that the file in question has been unlinked
(rm'ed); however, it is still being used and its inode still exists.  This
memory is in use and thus validly takes up mapping space.  You'd need to
unmap it in order to free that memory.  Deleting a file does not actually
remove it until _all_ processes close and unmap any references to it.
What's more worrying is the large area of unmapped memory below 1GB
(0x40000000); I wonder why it doesn't get allocated?  But I think the
answer is that the standard allocator only searches 1GB..3GB for free
areas...

Cheers,
MaZe.

On Fri, 13 Feb 2004, Bloch, Jack wrote:

> I am running a 2.4.19 kernel and have a problem where a process is using up
> to 0xC0000000 of space. It is no longer possible for this process to get any
> more memory via mmap or via shmget. However, when I dump the
> /proc/<pid>/maps file, I see large chunks of memory marked deleted, i.e. this
> should be freely available to be used by the next call. I do not see these
> addresses get re-used. The maps file is attached.
> 
>  <<9369>> 
> 
> Jack Bloch 
> Siemens ICN
> phone                (561) 923-6550
> e-mail                jack.bloch@icn.siemens.com
> 
> 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-02-13 16:45 Bloch, Jack
@ 2004-02-13 18:11 ` Maciej Zenczykowski
  0 siblings, 0 replies; 657+ messages in thread
From: Maciej Zenczykowski @ 2004-02-13 18:11 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel

The "deleted" marks mean that the file in question has been unlinked
(rm'ed); however, it is still being used and its inode still exists.  This
memory is in use and thus validly takes up mapping space.  You'd need to
unmap it in order to free that memory.  Deleting a file does not actually
remove it until _all_ processes close and unmap any references to it.
What's more worrying is the large area of unmapped memory below 1GB
(0x40000000); I wonder why it doesn't get allocated?  But I think the
answer is that the standard allocator only searches 1GB..3GB for free
areas...
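
A small userspace demonstration of the effect (sketch; the path
/tmp/scratch is just an example):

#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1 << 20;
	int fd = open("/tmp/scratch", O_RDWR | O_CREAT, 0600);
	void *p;

	if (fd < 0 || ftruncate(fd, len) < 0)
		return 1;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	unlink("/tmp/scratch");			/* gone from the namespace...  */
	close(fd);
	system("grep deleted /proc/self/maps");	/* ...but still mapped         */
	munmap(p, len);				/* only now is the range freed */
	return 0;
}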

Cheers,
MaZe.

On Fri, 13 Feb 2004, Bloch, Jack wrote:

> I am running a 2.4.19 kernel and have a problem where a process is using up
> to 0xC0000000 of space. It is no longer possible for this process to get any
> more memory via mmap or via shmget. However, when I dump the
> /proc/<pid>/maps file, I see large chunks of memory marked deleted, i.e. this
> should be freely available to be used by the next call. I do not see these
> addresses get re-used. The maps file is attached.
> 
>  <<9369>> 
> 
> Jack Bloch 
> Siemens ICN
> phone                (561) 923-6550
> e-mail                jack.bloch@icn.siemens.com
> 
> 


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2004-02-10 23:36 Bloch, Jack
@ 2004-02-11  1:09 ` Maciej Zenczykowski
  0 siblings, 0 replies; 657+ messages in thread
From: Maciej Zenczykowski @ 2004-02-11  1:09 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel

On Tue, 10 Feb 2004, Bloch, Jack wrote:

> I have a system with 2GB of memory. One of my processes calls mmap to try to
> map a 100MB file into memory. This call fails with -ENOMEM. I rebuilt the
> kernel with a few debug printk statements in mmap.c to see where the failure
> was occurring; it occurred in the function arch_get_unmapped_area. The code
> is as follows:
> 
> for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
> 		/* At this point:  (!vma || addr < vma->vm_end). */
> 		unsigned long __heap_stack_gap;
> 		if (TASK_SIZE - len < addr)
>                 { 

It's valid: there's no point in searching further for an area of at least
len bytes.  The user area is 0 .. TASK_SIZE-1.  addr is the address
currently being checked, len is the requested length.  If addr+len is
greater than TASK_SIZE, then the current addr (which keeps increasing
within this loop) already makes such a mapping overflow into kernel
space (it exceeds the TASK_SIZE virtual address limit).  This is precisely
as expected.

I'd assume your program has fragmented memory to such a level that a
single contiguous 100 MB area is no longer free (not that hard to do,
since TASK_SIZE is 3 GB).
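
In other words (sketch, with concrete numbers):

	/* TASK_SIZE - len < addr  is the overflow-safe form of
	 * addr + len > TASK_SIZE.  With TASK_SIZE = 0xC0000000 (3 GB)
	 * and len = 100 MB, the scan gives up once addr passes about
	 * 0xB9C00000 without having found a 100 MB hole. */
	if (TASK_SIZE - len < addr)
		return -ENOMEM;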

Cheers,
MaZe.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-12-26 22:27 ` your mail Linus Torvalds
@ 2004-01-05 10:59   ` Gerd Knorr
  0 siblings, 0 replies; 657+ messages in thread
From: Gerd Knorr @ 2004-01-05 10:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: caszonyi, linux-kernel

> 		....
>                         while (voffset >= sg_dma_len(vsg)) {
>                                 voffset -= sg_dma_len(vsg);
>                                 vsg++;
>                         }
> 		....

> I suspect the problem is that 
> 
> 	"voffset >= sg_dma_len(vsg)"
> 
> test: if "voffset" is _exactly_ the same as sg_dma_len(), then we will 
> test one more iteration (when "voffset" is 0), and that iteration may be 
> past the end of the "vsg" array.

That certainly makes sense: the 'v' plane is the last one in the memory
block for the video frame to be captured, so voffset / vsg will walk to
the last sg entry and may overrun it as described.  Good catch, I'm impressed.

> I suspect the fix might be to change the test to
> 
> 	"voffset && voffset >= sg_dma_len(vsg)"

Merged into my tree, thanks.

still busy with the xmas mail backlog,

  Gerd


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-12-26 20:20 caszonyi
@ 2003-12-26 22:27 ` Linus Torvalds
  2004-01-05 10:59   ` Gerd Knorr
  0 siblings, 1 reply; 657+ messages in thread
From: Linus Torvalds @ 2003-12-26 22:27 UTC (permalink / raw)
  To: caszonyi; +Cc: linux-kernel, kraxel



On Fri, 26 Dec 2003 caszonyi@rdslink.ro wrote:
> 
> I was trying to capture a TV program with mencoder when the oops occurred;
> a couple of hours later the system froze without leaving a single trace
> in logs. I was able to reboot with SysRq.
> 
> Programs versions, config and dmesg are attached.

Looks like this loop:

		....
                        while (voffset >= sg_dma_len(vsg)) {
                                voffset -= sg_dma_len(vsg);
                                vsg++;
                        }
		....

and in particular, it's the "sg_dma_len()" access that oopses, apparently 
because vsg was stale to begin with, or because it incremented past the 
last pointer.

The pointer that fails (0xc4bea00c) looks reasonable, so it's almost
certainly due to CONFIG_PAGE_DEBUG showing some kind of use-after-free
problem (ie the pointer is stale, and the memory has already been freed).

I suspect the problem is that 

	"voffset >= sg_dma_len(vsg)"

test: if "voffset" is _exactly_ the same as sg_dma_len(), then we will 
test one more iteration (when "voffset" is 0), and that iteration may be 
past the end of the "vsg" array.

I suspect the fix might be to change the test to

	"voffset && voffset >= sg_dma_len(vsg)"

to make sure that we never access vsg past the end of the array.
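
With the extra test in place the loop reads (sketch of the suggested fix
applied to the snippet quoted above):

	while (voffset && voffset >= sg_dma_len(vsg)) {
		voffset -= sg_dma_len(vsg);
		vsg++;
	}
	/* when voffset lands exactly on a boundary we now stop before
	 * dereferencing one entry past the end of the array */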
		

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-12-23 14:54 ` your mail Matti Aarnio
@ 2003-12-23 17:36   ` Norberto Bensa
  0 siblings, 0 replies; 657+ messages in thread
From: Norberto Bensa @ 2003-12-23 17:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Matti Aarnio

[-- Attachment #1: signed data --]
[-- Type: text/plain, Size: 354 bytes --]

Matti Aarnio wrote:
> Folks, I don't understand you...
> In EVERY list posting there are explicit instructions
> on how to unsubscribe, and STILL people do it wrong...

People don't read.

Regards,
Norberto

-- 
Linux 2.6.0-mm1 Pentium III (Coppermine) GenuineIntel GNU/Linux
 14:35:46 up 39 min,  1 user,  load average: 0.34, 0.18, 0.13

[-- Attachment #2: signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-12-23 14:16 dublinux
@ 2003-12-23 14:54 ` Matti Aarnio
  2003-12-23 17:36   ` Norberto Bensa
  0 siblings, 1 reply; 657+ messages in thread
From: Matti Aarnio @ 2003-12-23 14:54 UTC (permalink / raw)
  To: dublinux; +Cc: linux-kernel

Folks, I don't understand you...
In EVERY list posting there are explicit instructions
on how to unsubscribe, and STILL people do it wrong...

Do tell us (postmaster@vger.kernel.org) if you find that
there is something confusing that should be improved.

  /Matti Aarnio -- one of  <postmaster@vger.kernel.org>

On Tue, Dec 23, 2003 at 03:16:22PM +0100, dublinux wrote:
> Date:	Tue, 23 Dec 2003 15:16:22 +0100
> From:	dublinux <dublinux@box.it>
> To:	linux-kernel@vger.kernel.org
> 
> unsubscribe linux-kernel
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20031210120336.GU8039@holomorphy.com>
@ 2003-12-10 13:17 ` Stephan von Krawczynski
  0 siblings, 0 replies; 657+ messages in thread
From: Stephan von Krawczynski @ 2003-12-10 13:17 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: paul, marcelo.tosatti, thornber, linux-kernel

On Wed, 10 Dec 2003 04:03:36 -0800
William Lee Irwin III <wli@holomorphy.com> wrote:

> On Tue, 9 Dec 2003, William Lee Irwin III wrote:
> >>> Just apply the patch if you're for some reason terrified of 2.6.
> 
> On Wed, 10 Dec 2003 00:15:17 +0000 (GMT) Paul Jakma <paul@clubi.ie> wrote:
> >> Or get RedHat or Fedora to apply the patch.
> 
> On Wed, Dec 10, 2003 at 11:49:28AM +0000, skraw@ithnet.com wrote:
> > There it is again, this /dev/null argument.
> > "Multi-billion dollar companies" have gone bancrupt on the simple
> > fact that diversification of one product can rattle customers/users
> > to a degree that they in fact decide against the whole product range.
> > IOW go on with the idea to spread around an unknown number of kernel
> > versions and you can be sure that linux as a whole will greatly suffer.
> > This is a "user" issue, not a "developer" issue of course. Developers
> > can apply any kind of patches they like, but don't go and tell the
> > vast user base to "just apply patch xyz". They won't honor this at
> > all, your level of acceptance will dramatically drop.
> 
> One of the main reasons to have an open source OS is customization.
> Arguing that it's not truly feasible to customize will not hold water.

Are you calling a user-configured (not user-patched) kernel "customized" or
not?
_The_ top reason (at least when reading Al's posts :-) is probably that the
source is cross-checked by many eyes. If you create an infinite number of
patched kernel versions, it is obvious you will lose this primary advantage.
The more versions, the less cross-checking.
IOW, a "customized" but unstable OS is worth exactly zero.

> Pretty much every "productized" version of Linux is heavily customized
> to get some kind of value-add. There's no reason to bother mainline
> with this; if it's a serious user issue of that magnitude vendors will
> pick it up.

"Serious" is a subjective argument, therefore different people see different
issues as serious. In my opinion a kernel.org kernel should cover most if not
all possible stable customizations, see it as a pool.
So my primary question for inclusion would not be "what is it worth?" but "does
it do any harm?". I am not god, therefore I do not and can not judge 
"worthness". Can you?

Regards,
Stephan


^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
@ 2003-12-03 16:19 Bloch, Jack
  0 siblings, 0 replies; 657+ messages in thread
From: Bloch, Jack @ 2003-12-03 16:19 UTC (permalink / raw)
  To: 'Linus Torvalds'; +Cc: linux-kernel

Thanks,

I found the problem. I do have errno.h included. I was doing a read of errno
after calling perror. If I read it directly after getting the negative one
back, it contains the right value.

Jack Bloch 
Siemens ICN
phone                (561) 923-6550
e-mail                jack.bloch@icn.siemens.com


-----Original Message-----
From: Linus Torvalds [mailto:torvalds@osdl.org]
Sent: Wednesday, December 03, 2003 11:04 AM
To: Bloch, Jack
Cc: linux-kernel@vger.kernel.org
Subject: Re: your mail




On Wed, 3 Dec 2003, Bloch, Jack wrote:
>
> I try to open a non-existent device driver node file. The kernel returns a
> value of -1 (expected). However, when I read the value of errno it contains
> a value of 29. A call to the perror function does print out the correct
> error message (a value of 2). Why does this happen?

Because you forgot a "#include <errno.h>"? Or you have something else
wrong in your program that makes "errno" mean the wrong thing?

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-12-03 15:08 Bloch, Jack
  2003-12-03 15:43 ` your mail Richard B. Johnson
@ 2003-12-03 16:03 ` Linus Torvalds
  1 sibling, 0 replies; 657+ messages in thread
From: Linus Torvalds @ 2003-12-03 16:03 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel



On Wed, 3 Dec 2003, Bloch, Jack wrote:
>
> I try to open a non-existent device driver node file. The kernel returns a
> value of -1 (expected). However, when I read the value of errno it contains
> a value of 29. A call to the perror function does print out the correct
> error message (a value of 2). Why does this happen?

Because you forgot a "#include <errno.h>"? Or you have something else
wrong in your program that makes "errno" mean the wrong thing?

		Linus

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-12-03 15:08 Bloch, Jack
@ 2003-12-03 15:43 ` Richard B. Johnson
  2003-12-03 16:03 ` Linus Torvalds
  1 sibling, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2003-12-03 15:43 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel

On Wed, 3 Dec 2003, Bloch, Jack wrote:

> I try to open a non-existent device driver node file. The kernel returns a
> value of -1 (expected). However, when I read the value of errno it contains
> a value of 29. A call to the perror function does print out the correct
> error message (a value of 2). Why does this happen?
>
> Jack Bloch
> Siemens ICN
> phone                (561) 923-6550
> e-mail                jack.bloch@icn.siemens.com


Because it doesn't happen! You are likely polluting the errno
variable either with another system call before you test it
or by not including the correct header file (errno may be a
MACRO).


Try this program:


#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>

int main(int args, char *argv[])
{
    int fd, save_errno;
    if(args < 2) {
        fprintf(stderr, "Usage:\n%s <filename>\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    if((fd = open(argv[1], O_RDONLY)) < 0) {
        save_errno = errno;
        perror("open");
        fprintf(stderr, "Was %d (%s)\n", save_errno, strerror(save_errno));
        exit(EXIT_FAILURE);
    }
    (void)close(fd);
    return 0;
}

Script started on Wed Dec  3 10:41:24 2003
# ./xxx /dev/XXX
open: No such file or directory
Was 2 (No such file or directory)
# ./xxx /dev/VXI
open: Operation not supported by device
Was 19 (Operation not supported by device)
# exit
exit
Script done on Wed Dec  3 10:42:12 2003

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-09-30 14:50     ` Dave Jones
  2003-09-30 15:30       ` Jamie Lokier
@ 2003-09-30 16:34       ` Adrian Bunk
  1 sibling, 0 replies; 657+ messages in thread
From: Adrian Bunk @ 2003-09-30 16:34 UTC (permalink / raw)
  To: Dave Jones, Jamie Lokier, John Bradford, akpm, torvalds, linux-kernel

On Tue, Sep 30, 2003 at 03:50:08PM +0100, Dave Jones wrote:
>...
>  > Basically, if you're building a
>  > distro boot kernel, you must turn on all known workarounds.  That's
>  > certainly lowest-common-denominator, but it's a far cry from the
>  > configuration that a 386-as-firewall user wants.
> 
> Ok, I see what you're getting at, but Adrian's patch turned arch/i386/Kconfig
> and arch/i386/Makefile into guacamole.  After spending so much time
> getting that crap into something maintainable, it seemed a huge step
> backwards to litter it with dozens of ifdefs and duplication.
> There has to be a cleaner way of pleasing everyone.
>...

Referring to the latest patch I sent:

arch/i386/Kconfig:
The only problems seem to be some CPU_ONLY_* derived symbols I haven't 
yet found a better solution for.

arch/i386/Makefile:
There are two ifdefs to deal with Pentium 4 and K7/K8 selected at the 
same time:
ifdef CONFIG_CPU_PENTIUM4
  cpuflags-$(CONFIG_CPU_K{7,8})    := ...
else
  cpuflags-$(CONFIG_CPU_K{7,8})    := ...
endif

That's perhaps not optimal but IMHO not that bad.

The dozens of ifdefs were in other areas where I tried to add some 
additional space optimizations. It was a mistake to put them into the
same patch; in the latest patches I sent they are already separated,
and they are _not_ required for the CPU selection scheme.

> 		Dave

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-09-30 14:50     ` Dave Jones
@ 2003-09-30 15:30       ` Jamie Lokier
  2003-09-30 16:34       ` Adrian Bunk
  1 sibling, 0 replies; 657+ messages in thread
From: Jamie Lokier @ 2003-09-30 15:30 UTC (permalink / raw)
  To: Dave Jones, John Bradford, akpm, torvalds, linux-kernel

Dave Jones wrote:
>  > I'm not sure what the fuss is; a strict 386 kernel runs just fine
>  > without any problems on an Athlon.  But anyway...
> 
> Unless it got configured away as proposed in your earlier patch.

No, I don't understand.  What about my patch, or indeed anything else,
stops a "strict 386" kernel from running on an Athlon?

>  > The latter is for distro boots.  The former is for that
>  > 386-as-a-firewall with 1MB of RAM, where it _really_ has to trim
>  > everything it can, and no errata thank you.
> 
> Again, 'trimming' away a few hundred bytes of errata workarounds
> is ridiculous when we have bigger fish to fry where we can save
> KBs of .text size, and MB's of runtime memory.

Well I think both are worthwhile.  Low hanging fruit and all that -
this is an example of a small saving that's very clear and easy.

>  > I've not heard of anyone actually wanting a strict 386 kernel lately,
>  > but strict 486 is not so unusual.
> 
 > ISTR that current gccs emit 486 instructions anyway, so it's possible
> that with a modern toolchain, you can't *build* a 386 kernel.
> I'm not sure if that got fixed or not, I don't track gcc lists any more.

Afaict GCC has fine targeting for the 386, better than it did years
ago.  It didn't use to use the "leave" instruction, have an option to
optimise for size, or options for selecting exactly which
architectural instruction set it would use.

Anyway, there is very little difference between 386 and 486 from
an application point of view.  You may be thinking of the
recent C++ ABI debacle, I think it was, which accidentally turned out
to require some instruction emulation in the Debian kernel.  I think
they've fixed it in GCC now.

>  > Just as some people want a P4 optimised kernel, and some people want a
>  > K7 optimised kernel, so some people want a 386 or 486 or Pentium
>  > optimised kernel.  Lowest-common-denominator means it runs on
>  > everything, and isn't really anything to do with 386 any more - that's
>  > not really the lowest-common-denominator, by virtue of the obvious
>  > fact that pure 386 code isn't reliable on all other CPUs.
> 
> Elaborate? "pure 386 code" (whatever that means in your definition)
 > should run perfectly reliably on every CPU we care about.

If that were true, why are we talking about needing workarounds for
non-386 chips to work correctly?

The canonical example is the F00F sequence: reliable on a 386, crashes
a Pentium.  That's a fine example of pure 386 code not being reliable
on a higher CPU.  And that's why it isn't safe to run Linux 1.0 on
your Pentium web server.

> So first you argue for compiling out a few hundred bytes of errata
> workaround, now you want to instead compile in checks & printk's
> (which probably add up to not far off the same amount of space).

Oh, I have nothing against __init space :)

>  > By selecting a PII kernel, it is possible to configure out the code
>  > for X86_PPRO_FENCE and X86_F00F_BUG, yet as far as I can tell, those
>  > _can_ possibly boot on kernels where the errata are needed, and nary a
>  > printk is emitted for it.  Nasty bugs they are, too.
> 
 > Indeed. That's arguably a bug that occurred when someone split the
> original CONFIG_M686 into _M686 & MPENTIUMII.

It's a bit more complicated.  It dates from before we had the
"alternative" macro, and it was still cool to optimise spin_unlock()
into the most minimal instruction sequence at compile time.

It's only since then that we've been generalising to "M586 should run
on all later models correctly".  Arguably, tidying up in the process.

Now we could use "alternative" to put the locked store or non-locked
store there and it would not look out of place.

If we're honest, Linux seems to have evolved through the 2.5 series
from "optimise the primitives as tight as reasonable for a target
architecture" to "a few nops here and there won't hurt".  Perhaps
Transmeta's malign influence, as nops cost virtually nothing on those :)

Or perhaps it's because CPU models have branched and don't make a
straight line any more.  So we have to do more run-time checking to
keep it sane.

>  > More generally than the CPU, you can also configure out BLK_DEV_RZ1000
>  > which is another crucial workaround that needs to go in any
>  > lowest-common-denominator kernel.
> 
> I wouldn't look at the history of drivers/ide/ as a shining example of
> good design 8-)

No, but as an example of needing to enable all the workarounds for a
distro boot kernel, it's a glorious gem.  Even now people aren't quite
sure if multi-sector mode or DMA should be enabled by default :)

>  > Basically, if you're building a
>  > distro boot kernel, you must turn on all known workarounds.  That's
>  > certainly lowest-common-denominator, but it's a far cry from the
>  > configuration that a 386-as-firewall user wants.
> 
> Ok, I see what you're getting at, but Adrian's patch turned arch/i386/Kconfig
> and arch/i386/Makefile into guacamole.  After spending so much time
> getting that crap into something maintainable, it seemed a huge step
> backwards to litter it with dozens of ifdefs and duplication.
> There has to be a cleaner way of pleasing everyone.

Perhaps it's in a name.  It doesn't help that there's an assumed
linear progression of CPUs to support, up to the point where they
branch off all over the place in feature space.  In the linear part,
CONFIG_M586, CONFIG_M686 etc. seem to mean "support this CPU or
later", whatever later means (and it's not stated exactly).  After the
explosion of different feature directions, they stop meaning that and
just become optimisation knobs, as all the different essential features
are supported at run time.

Personally I think Adrian's patch's heart is in the right place,
simply because the menu options make more sense than the present
rather confusing decision, if you intend to (or might ever, take your
pick) run a kernel compiled for one CPU on another.  I am never sure,
for example, if it's safe to take the hard disk from my K6 and drop it
into a P5MMX box and boot from it.  The kernel config just doesn't
make that clear.

With Adrian's it does, even if the code behind it is a little like
guacamole.  Perhaps the code could be cleaner; I don't see that
individual CPU model support is much different than what we already
have, except for the option to fix features at compile time rather than
run time.

And that gives me an idea.... ;)

-- Jamie

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-09-30 14:58     ` Jamie Lokier
@ 2003-09-30 15:11       ` Dave Jones
  0 siblings, 0 replies; 657+ messages in thread
From: Dave Jones @ 2003-09-30 15:11 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: John Bradford, akpm, torvalds, linux-kernel

On Tue, Sep 30, 2003 at 03:58:54PM +0100, Jamie Lokier wrote:

 > (Aside: It is quite an anomaly that those cumbersome floating point
 > instructions are emulated on the older CPUs, yet all the other
 > instructions aren't emulated.  Emulation is very slow, and forcing
 > userspace to just use different code instead is good, but that's just
 > as valid for floating point as it is for MMX, cmpxchg etc.)

There was a patch around a while back that did 486 emulation on 386
kernels. I think it even made it into the Mandrake kernel.

 > To be fair, the kernel really ought to just say that and halt.  That
 > is a fine compromise.  It won't make embedded systems folks completely
 > happy, because if you've only got 2MB of NVRAM for your whole kernel
 > _and_ filesystem including user data (think PDA or cellphone), then a
 > hundred bytes here or there is actually worth trimming.

With such tight constraints, why not just use 2.4 (or even 2.2) which
has much lower memory usage and diskspace requirements ?

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-09-30 14:10   ` John Bradford
@ 2003-09-30 14:58     ` Jamie Lokier
  2003-09-30 15:11       ` Dave Jones
  0 siblings, 1 reply; 657+ messages in thread
From: Jamie Lokier @ 2003-09-30 14:58 UTC (permalink / raw)
  To: John Bradford; +Cc: Dave Jones, akpm, torvalds, linux-kernel

John Bradford wrote:
> Unless, of course, you object to the possibility that somebody might
> go out of their way to compile a 386 specific kernel from source
> themselves, then run it on an Athlon.  By chance it will probably
> appear to work OK, but won't have the workaround enabled.  So what?

Actually the 386 kernel will work just fine on the AMD...  The
workaround is only needed, in the kernel, to protect against the
kernel's own use of non-386 features...

Userspace is a different matter, but userspace has a lot of
model-specific things to worry about beyond this one instruction on
AMD.  In practice: bswap, cmov, cmpxchg, mmx, sse, sse2, so knowing
whether to use prefetch or not is just one more variable for userspace
- and one which any portable app or library will have to know about in
any case.

(Aside: It is quite an anomaly that those cumbersome floating point
instructions are emulated on the older CPUs, yet all the other
instructions aren't emulated.  Emulation is very slow, and forcing
userspace to just use different code instead is good, but that's just
as valid for floating point as it is for MMX, cmpxchg etc.)

> Only somebody who knows exactly what they were doing is likely to do
> that - how could it happen by accident?  If you really must, put a
> warning in to say, 'This kernel doesn't support your processor', but
> doing that just adds more bloat.  OK, so the bloat will be freed after
> boot, but it's still bloat on the boot device, which matters in some
> embedded systems.

To be fair, the kernel really ought to just say that and halt.  That
is a fine compromise.  It won't make embedded systems folks completely
happy, because if you've only got 2MB of NVRAM for your whole kernel
_and_ filesystem including user data (think PDA or cellphone), then a
hundred bytes here or there is actually worth trimming.

But then, those sort of embedded folks should just figure out
compressed software-suspend, and then they can ditch __init data from
the NVRAM image completely.  It's much better to lose all of __init
than just a few bytes here or there, isn't it?

-- Jamie

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-09-30 14:06   ` Jamie Lokier
@ 2003-09-30 14:50     ` Dave Jones
  2003-09-30 15:30       ` Jamie Lokier
  2003-09-30 16:34       ` Adrian Bunk
  0 siblings, 2 replies; 657+ messages in thread
From: Dave Jones @ 2003-09-30 14:50 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: John Bradford, akpm, torvalds, linux-kernel

On Tue, Sep 30, 2003 at 03:06:27PM +0100, Jamie Lokier wrote:
 > Dave Jones wrote:
 > > On Tue, Sep 30, 2003 at 09:17:16AM +0100, John Bradford wrote:
 > >  > Of course a kernel compiled strictly for 386s may seem to boot on an
 > >  > Athlon but not work properly.  So what?  Just don't run the 'wrong'
 > >  > kernel.
 > > Wrong answer. How do you intend to install Linux when a distro boot
 > > kernel is compiled for lowest-common-denominator (386), and is the
 > > 'wrong' kernel for an Athlon ?
 > I'm not sure what the fuss is; a strict 386 kernel runs just fine
 > without any problems on an Athlon.  But anyway...

Unless it got configured away as proposed in your earlier patch.

 > Dave, you are conflating "kernel compiled strictly for 386s" with
 > "compiled for lowest-common-denominator".
 > 
 > They are totally different configurations.  Isn't that why we have
 > "generic" now?

CONFIG_GENERIC could be extended to offer other options yes,
but right now what it does doesn't really match the name IMO.
Right now it's closer to a CONFIG_MAX_CACHELINE_SIZE

 > The latter is for distro boots.  The former is for that
 > 386-as-a-firewall with 1MB of RAM, where it _really_ has to trim
 > everything it can, and no errata thank you.

Again, 'trimming' away a few hundred bytes of errata workarounds
is ridiculous when we have bigger fish to fry where we can save
KBs of .text size, and MB's of runtime memory.

 > I've not heard of anyone actually wanting a strict 386 kernel lately,
 > but strict 486 is not so unusual.

ISTR that current gccs emit 486 instructions anyway, so it's possible
that with a modern toolchain, you can't *build* a 386 kernel.
I'm not sure if that got fixed or not, I don't track gcc lists any more.

 > Just as some people want a P4 optimised kernel, and some people want a
 > K7 optimised kernel, so some people want a 386 or 486 or Pentium
 > optimised kernel.  Lowest-common-denominator means it runs on
 > everything, and isn't really anything to do with 386 any more - that's
 > not really the lowest-common-denominator, by virtue of the obvious
 > fact that pure 386 code isn't reliable on all other CPUs.

Elaborate? "pure 386 code" (whatever that means in your definition)
should run perfectly reliably on every CPU we care about.

 > > We hashed this argument out a week or so ago, it seems the message
 > > didn't get across. YOU CAN NOT DISABLE ERRATA WORKAROUNDS IN A KERNEL
 > > THAT MAY POSSIBLY BOOT ON HARDWARE THAT WORKAROUND IS FOR.
 > I agree.  It shouldn't be possible to boot on the wrong hardware: it
 > should refuse.

So first you argue for compiling out a few hundred bytes of errata
workaround, now you want to instead compile in checks & printk's
(which probably add up to not far off the same amount of space).

 > By selecting a PII kernel, it is possible to configure out the code
 > for X86_PPRO_FENCE and X86_F00F_BUG, yet as far as I can tell, those
 > _can_ possibly boot on kernels where the errata are needed, and nary a
 > printk is emitted for it.  Nasty bugs they are, too.

Indeed. That's arguably a bug that occurred when someone split the
original CONFIG_M686 into _M686 & MPENTIUMII.

 > More generally than the CPU, you can also configure out BLK_DEV_RZ1000
 > which is another crucial workaround that needs to go in any
 > lowest-common-denominator kernel.

I wouldn't look at the history of drivers/ide/ as a shining example of
good design 8-)

 > Basically, if you're building a
 > distro boot kernel, you must turn on all known workarounds.  That's
 > certainly lowest-common-denominator, but it's a far cry from the
 > configuration that a 386-as-firewall user wants.

Ok, I see what you're getting at, but Adrian's patch turned arch/i386/Kconfig
and arch/i386/Makefile into guacamole.  After spending so much time
getting that crap into something maintainable, it seemed a huge step
backwards to litter it with dozens of ifdefs and duplication.
There has to be a cleaner way of pleasing everyone.

 > > clearer?
 > If the kernel had a consistent policy so far, it would be more clear,
 > but it doesn't.

Agreed, there are some questionable parts.

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-09-30 13:31 ` your mail Dave Jones
  2003-09-30 14:06   ` Jamie Lokier
@ 2003-09-30 14:10   ` John Bradford
  2003-09-30 14:58     ` Jamie Lokier
  1 sibling, 1 reply; 657+ messages in thread
From: John Bradford @ 2003-09-30 14:10 UTC (permalink / raw)
  To: Dave Jones; +Cc: Jamie Lokier, akpm, torvalds, linux-kernel

Quote from Dave Jones <davej@redhat.com>:
> On Tue, Sep 30, 2003 at 09:17:16AM +0100, John Bradford wrote:
>  
>  > Of course a kernel compiled strictly for 386s may seem to boot on an
>  > Athlon but not work properly.  So what?  Just don't run the 'wrong'
>  > kernel.
> 
> Wrong answer. How do you intend to install Linux when a distro boot
> kernel is compiled for lowest-common-denominator (386), and is the
> 'wrong' kernel for an Athlon ?

I don't.  I *never* suggested doing that.  I clearly said a kernel
compiled *strictly* for 386s.  I.E. Without support for other
processors.

> We hashed this argument out a week or so ago, it seems the message
> didn't get across. YOU CAN NOT DISABLE ERRATA WORKAROUNDS IN A KERNEL
> THAT MAY POSSIBLY BOOT ON HARDWARE THAT WORKAROUND IS FOR.

It seems the message didn't get across to you.

Have you actually looked at Adrian's patch?

*Forget* that 386=lowest-common-denominator.  This
'386=lowest-common-denominator' theme is out of date, and we should be
moving away from it - oh, hang on, that's exactly what Adrian's patch
allows us to do.

A distribution installation kernel needs to boot all supported
hardware - of course it does.  So what?  Just select support for all
the processors in the configurator.  No, don't just select 386,
because 386 doesn't mean 386 and above anymore with Adrian's patch, it
means support 386 and don't bloat the kernel with workarounds for
other processors.  Select *all* processors.  Now you have a nice
(bloated) kernel that boots on the same hardware that your old '386'
one did.  Fine for installation on diverse hardware.  Rubbish for
performance.

Unless, of course, you object to the possibility that somebody might
go out of their way to compile a 386 specific kernel from source
themselves, then run it on an Athlon.  By chance it will probably
appear to work OK, but won't have the workaround enabled.  So what?
Only somebody who knows exactly what they were doing is likely to do
that - how could it happen by accident?  If you really must, put a
warning in to say, 'This kernel doesn't support your processor', but
doing that just adds more bloat.  OK, so the bloat will be freed after
boot, but it's still bloat on the boot device, which matters in some
embedded systems.

> clearer ?

It's clear that you didn't read my original post, yes.

John.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-09-30 13:31 ` your mail Dave Jones
@ 2003-09-30 14:06   ` Jamie Lokier
  2003-09-30 14:50     ` Dave Jones
  2003-09-30 14:10   ` John Bradford
  1 sibling, 1 reply; 657+ messages in thread
From: Jamie Lokier @ 2003-09-30 14:06 UTC (permalink / raw)
  To: Dave Jones, John Bradford, akpm, torvalds, linux-kernel

Dave Jones wrote:
> On Tue, Sep 30, 2003 at 09:17:16AM +0100, John Bradford wrote:
>  > Of course a kernel compiled strictly for 386s may seem to boot on an
>  > Athlon but not work properly.  So what?  Just don't run the 'wrong'
>  > kernel.
> 
> Wrong answer. How do you intend to install Linux when a distro boot
> kernel is compiled for lowest-common-denominator (386), and is the
> 'wrong' kernel for an Athlon ?

I'm not sure what the fuss is; a strict 386 kernel runs just fine
without any problems on an Athlon.  But anyway...

Dave, you are conflating "kernel compiled strictly for 386s" with
"compiled for lowest-common-denominator".

They are totally different configurations.  Isn't that why we have
"generic" now?

The latter is for distro boots.  The former is for that
386-as-a-firewall with 1MB of RAM, where it _really_ has to trim
everything it can, and no errata thank you.

I've not heard of anyone actually wanting a strict 386 kernel lately,
but strict 486 is not so unusual.

Just as some people want a P4 optimised kernel, and some people want a
K7 optimised kernel, so some people want a 386 or 486 or Pentium
optimised kernel.  Lowest-common-denominator means it runs on
everything, and isn't really anything to do with 386 any more - that's
not really the lowest-common-denominator, by virtue of the obvious
fact that pure 386 code isn't reliable on all other CPUs.

> We hashed this argument out a week or so ago, it seems the message
> didn't get across. YOU CAN NOT DISABLE ERRATA WORKAROUNDS IN A KERNEL
> THAT MAY POSSIBLY BOOT ON HARDWARE THAT WORKAROUND IS FOR.

I agree.  It shouldn't be possible to boot on the wrong hardware: it
should refuse.

There is precedent: X86_GOOD_APIC && X86_LOCAL_APIC: when booted on a
non-MMX P5, it refuses to boot, because it does not contain the errata
workaround.

Unfortunately the kernel has opposite precedents too.

By selecting a PII kernel, it is possible to configure out the code
for X86_PPRO_FENCE and X86_F00F_BUG, yet as far as I can tell, those
_can_ possibly boot on kernels where the errata are needed, and nary a
printk is emitted for it.  Nasty bugs they are, too.

More generally than the CPU, you can also configure out BLK_DEV_RZ1000
which is another crucial workaround that needs to go in any
lowest-common-denominator kernel.  Basically, if you're building a
distro boot kernel, you must turn on all known workarounds.  That's
certainly lowest-common-denominator, but it's a far cry from the
configuration that a 386-as-firewall user wants.

> clearer?

If the kernel had a consistent policy so far, it would be more clear,
but it doesn't.

-- Jamie

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-09-30  8:17 John Bradford
@ 2003-09-30 13:31 ` Dave Jones
  2003-09-30 14:06   ` Jamie Lokier
  2003-09-30 14:10   ` John Bradford
  0 siblings, 2 replies; 657+ messages in thread
From: Dave Jones @ 2003-09-30 13:31 UTC (permalink / raw)
  To: John Bradford; +Cc: Jamie Lokier, akpm, torvalds, linux-kernel

On Tue, Sep 30, 2003 at 09:17:16AM +0100, John Bradford wrote:
 
 > Of course a kernel compiled strictly for 386s may seem to boot on an
 > Athlon but not work properly.  So what?  Just don't run the 'wrong'
 > kernel.

Wrong answer. How do you intend to install Linux when a distro boot
kernel is compiled for lowest-common-denominator (386), and is the
'wrong' kernel for an Athlon ?

We hashed this argument out a week or so ago, it seems the message
didn't get across. YOU CAN NOT DISABLE ERRATA WORKAROUNDS IN A KERNEL
THAT MAY POSSIBLY BOOT ON HARDWARE THAT WORKAROUND IS FOR.

clearer ?

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-08-28  2:25 warudkar
@ 2003-08-27 16:02 ` William Lee Irwin III
  0 siblings, 0 replies; 657+ messages in thread
From: William Lee Irwin III @ 2003-08-27 16:02 UTC (permalink / raw)
  To: warudkar; +Cc: kernel, linux-kernel, Andrew Morton

On Wed, Aug 27, 2003 at 09:25:23PM -0500, warudkar@vsnl.net wrote:
> Con - With swappiness set to 100, the apps do start up in 3 minutes and kswapd doesn't hog the CPU. But X is still unusable till all of them have started up.
> Wli - Sorry, vmstat segfaults on 2.6!

This is a bug in older versions of vmstat. Upgrade vmstat.


-- wli

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-08-25 16:45 Marcelo Tosatti
@ 2003-08-25 16:59 ` Herbert Pötzl
  0 siblings, 0 replies; 657+ messages in thread
From: Herbert Pötzl @ 2003-08-25 16:59 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml

On Mon, Aug 25, 2003 at 01:45:25PM -0300, Marcelo Tosatti wrote:
> On Mon, 25 Aug 2003, Herbert Pötzl wrote:
> 
> > On Mon, Aug 25, 2003 at 10:53:21AM -0300, Marcelo Tosatti wrote:
> > >
> > > >
> > > >
> > > > Matthias Andree wrote:
> > > >
> > > > >On Mon, 25 Aug 2003, Marcelo Tosatti wrote:
> > > > >
> > > > >
> > > > >>- 2.4.22-rc4 was released as 2.4.22 with no changes.
> > > > >>
> > > > >
> > > > >What are the plans for 2.4.23? XFS merge perhaps <hint>?
> > > > >
> > > >
> > > > Maybe some of Andrea's VM stuff?
> > >
> > > Definitely. That's the first thing I'm going to do after looking
> > > through the "2.4.23-pre-patches" folder.
> >
> > any chance for the Bind Mount Extensions? 8-)
> 
> I haven't found time to look at the patch yet but will do so soon.

fine, no problem, let me know if you need something
(like rediff, resend, explanation, etc ...)

best,
Herbert


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
@ 2003-08-25 16:45 Marcelo Tosatti
  2003-08-25 16:59 ` Herbert Pötzl
  0 siblings, 1 reply; 657+ messages in thread
From: Marcelo Tosatti @ 2003-08-25 16:45 UTC (permalink / raw)
  To: Herbert Pötzl; +Cc: lkml

On Mon, 25 Aug 2003, Herbert Pötzl wrote:

> On Mon, Aug 25, 2003 at 10:53:21AM -0300, Marcelo Tosatti wrote:
> >
> > >
> > >
> > > Matthias Andree wrote:
> > >
> > > >On Mon, 25 Aug 2003, Marcelo Tosatti wrote:
> > > >
> > > >
> > > >>- 2.4.22-rc4 was released as 2.4.22 with no changes.
> > > >>
> > > >
> > > >What are the plans for 2.4.23? XFS merge perhaps <hint>?
> > > >
> > >
> > > Maybe some of Andrea's VM stuff?
> >
> > Definately. Thats the first thing I'm going to do after looking
through
> > "2.4.23-pre-patches" folder.
>
> any chance for the Bind Mount Extensions? 8-)

I haven't found time to look at the patch yet but will do so soon.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-08-25 13:53 Marcelo Tosatti
@ 2003-08-25 14:30 ` Herbert Pötzl
  0 siblings, 0 replies; 657+ messages in thread
From: Herbert Pötzl @ 2003-08-25 14:30 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml

On Mon, Aug 25, 2003 at 10:53:21AM -0300, Marcelo Tosatti wrote:
> 
> >
> >
> > Matthias Andree wrote:
> >
> > >On Mon, 25 Aug 2003, Marcelo Tosatti wrote:
> > >
> > >
> > >>- 2.4.22-rc4 was released as 2.4.22 with no changes.
> > >>
> > >
> > >What are the plans for 2.4.23? XFS merge perhaps <hint>?
> > >
> >
> > Maybe some of Andrea's VM stuff?
> 
> Definately. Thats the first thing I'm going to do after looking through
> "2.4.23-pre-patches" folder.

any chance for the Bind Mount Extensions? 8-)

best,
Herbert

> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-08-18  6:21 "Andrey Borzenkov" 
@ 2003-08-18 20:42 ` Greg KH
  0 siblings, 0 replies; 657+ messages in thread
From: Greg KH @ 2003-08-18 20:42 UTC (permalink / raw)
  To: Andrey Borzenkov; +Cc: jw schultz, linux-kernel

On Mon, Aug 18, 2003 at 10:21:22AM +0400, "Andrey Borzenkov"  wrote:
> 
> just to show what I expected from sysfs - here is entry from Solaris
> /devices:
> 
> brw-r-----   1 root     sys       32,240 Jan 24  2002 /devices/pci@16,4000/scsi@5,1/sd@0,0:a
> 
> this entry identifies disk partition 0 on drive with SCSI ID 0, LUN 0
> connected to bus 1 of controller in slot 5 of PCI bus identified
> by 16. Now you can use whatever policy you like to give human
> meaningful name to this entry. And if you have USB it will continue
> further giving you exact topology starting from the root of your
> device tree.
> 
> and this path does not contain single logical id so it is not subject
> to change if I add the same controller somewhere else.
> 
> hopefully it clarifies what I mean ...

Hm, a bit.  First, have you looked at what sysfs provides?  Here's one
of my machines; tell me if it has all the info you are looking for:

$ tree /sys/bus/scsi/
/sys/bus/scsi/
|-- devices
|   `-- 0:0:0:0 -> ../../../devices/pci0000:00/0000:00:1e.0/0000:02:05.0/host0/0:0:0:0
`-- drivers
    `-- sd
        `-- 0:0:0:0 -> ../../../../devices/pci0000:00/0000:00:1e.0/0000:02:05.0/host0/0:0:0:0

$ tree /sys/block/sda/
/sys/block/sda/
|-- dev
|-- device -> ../../devices/pci0000:00/0000:00:1e.0/0000:02:05.0/host0/0:0:0:0
|-- queue
|   |-- iosched
|   |   |-- antic_expire
|   |   |-- read_batch_expire
|   |   |-- read_expire
|   |   |-- write_batch_expire
|   |   `-- write_expire
|   `-- nr_requests
|-- range
|-- sda1
|   |-- dev
|   |-- size
|   |-- start
|   `-- stat
|-- sda2
|   |-- dev
|   |-- size
|   |-- start
|   `-- stat
|-- sda3
|   |-- dev
|   |-- size
|   |-- start
|   `-- stat
|-- sda4
|   |-- dev
|   |-- size
|   |-- start
|   `-- stat
|-- size
`-- stat


Now, from that you can see exactly where my scsi device is in the pci
tree, and you can see in the block directory what block device is
assigned to what physical device in the device tree.  Then there are 4
partitions on this disk, all with those specific parameters.

So, when sda shows up, udev can determine that it lives on a specific
scsi device, located in a specific place in the pci space, and that it
has some number of partitions, all of specific sizes, with specific
major/minor numbers.  It can then create all of the /dev links based on
this.
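
(A minimal sketch of that, with the sda paths hardcoded and error handling
trimmed; this is not udev code, it just shows that everything needed is a
plain file read and a readlink away.)

#include <stdio.h>
#include <unistd.h>
#include <limits.h>

int main(void)
{
	char buf[64], link[PATH_MAX];
	FILE *f;
	ssize_t n;

	/* major:minor of the block device, e.g. "8:0" */
	f = fopen("/sys/block/sda/dev", "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("sda dev = %s", buf);
	if (f)
		fclose(f);

	/* where in the pci/scsi tree the disk actually lives */
	n = readlink("/sys/block/sda/device", link, sizeof(link) - 1);
	if (n > 0) {
		link[n] = '\0';
		printf("sda device -> %s\n", link);
	}
	return 0;
}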

Please take a few minutes to look at the existing sysfs tree on Linux.
If you then have any specific questions, I would be glad to answer
them.

Hope this helps,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-08-14 21:57 kartikey bhatt
@ 2003-08-15  3:31 ` James Morris
  0 siblings, 0 replies; 657+ messages in thread
From: James Morris @ 2003-08-15  3:31 UTC (permalink / raw)
  To: kartikey bhatt; +Cc: davem, linux-kernel, alan

On Fri, 15 Aug 2003, kartikey bhatt wrote:

> Hi James.
> A little bit work for you.
> Somebody on mailing list commented that you should *really* go for better
> algorithm like CAST6 (rfc2612) to be included in kernel.
> This time I'm sending you cast6.c (cast6 cipher algorithm) implementation.
> But this time it's a patch.

Cool.  Unfortunately the patch is corrupted; please try sending it as an 
attachment or via a different mail system.


- James
-- 
James Morris
<jmorris@intercode.com.au>


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <200308031136.17768.lx@lxhp.in-berlin.de>
@ 2003-08-03 18:30 ` Linus Torvalds
  0 siblings, 0 replies; 657+ messages in thread
From: Linus Torvalds @ 2003-08-03 18:30 UTC (permalink / raw)
  To: hp; +Cc: linux-assembly, Kernel Mailing List, David S. Miller


On Sun, 3 Aug 2003, hp wrote:
>
> so He/You did lock me out, too?
> whithout any notice. by what reason?

Maybe because this has nothing to do with the kernel?

It's ok to discuss kernel issues on the kernel mailing list, but we've had 
tons of totally off-topic flames, rants and general noise.

To the point that a lot of people don't even have time to follow 
linux-kernel any more, since a lot of the discussion has nothing to do 
with the technical kernel work.

Since some of these rants are started (and kept going) by people who don't
ever seem to actually get involved in _real_ kernel-related technical
discussions, David felt that one way to curb it was to just blacklist 
people who repeatedly post things that aren't related to the kernel.

It's ok to be off-topic every once in a while, but it's not ok to 
consistently be so.

That said, David is also not the most politic person I know, and I suspect 
this could have been handled slightly more gracefully. One potential less 
annoying approach is to not block posting from people, but rewrite the 
subject line for such posters with a prepended "[OFF-TOPIC]", and just let 
people filter those out on the receiving end. Or just automatically shunt 
them off to another list.

I dunno. I don't personally much care - but I've never been the maintainer 
of the mailing list, and I sure as hell don't ever want to be. Whoever is 
the maintainer gets to set the rules.

			Linus


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-05-14 18:41 dirf
@ 2003-05-16 10:00 ` Maciej Soltysiak
  0 siblings, 0 replies; 657+ messages in thread
From: Maciej Soltysiak @ 2003-05-16 10:00 UTC (permalink / raw)
  To: dirf; +Cc: linux-kernel

> - Where I can find a list of RFCs?
http://www.ietf.org/rfc

There is an RFC Index link

> - Where I can find a cdfs format ( cd file system format)?
You mean kernel drivers or the specification?
The kernel drivers are in the stock kernel.

Regards,
Maciej


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <053C05D4.4D025D2E.0005F166@netscape.net>
@ 2003-05-08  9:06 ` Gerd Knorr
  0 siblings, 0 replies; 657+ messages in thread
From: Gerd Knorr @ 2003-05-08  9:06 UTC (permalink / raw)
  To: ark925; +Cc: Kernel List

> Actually it does in some cases. I know of two devices that have analog
> tuners on an smbus-like interface (OV511 USB TV and W9967CF USB TV). The
> tuner can be controlled using a pair of i2c_smbus_write_byte_data()
> calls.

Hmm, maybe we should rename the SMBUS class to SENSORS or MAINBOARD or
something like that?  I assumed smbus interfaces are used for
mainboard sensors only ...

> Would a patch that adds smbus algorithm support to tuner.c be
> acceptable?

Yes.  Certainly makes more sense than duplicating the whole rest of
tuner.c just for a smbus-aware tuner driver ;)
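
(For the archive, a minimal sketch of what such a pair of
i2c_smbus_write_byte_data() calls could look like; the register numbers
and the divider split are hypothetical, not taken from tuner.c or any
real tuner datasheet.)

#include <linux/i2c.h>

#define TUNER_REG_DIV_HI	0x00	/* hypothetical command bytes */
#define TUNER_REG_DIV_LO	0x01

static int tuner_smbus_set_divider(struct i2c_client *client, u16 div)
{
	int ret;

	/* program the tuner with two byte-data writes over SMBus */
	ret = i2c_smbus_write_byte_data(client, TUNER_REG_DIV_HI,
					(u8)(div >> 8));
	if (ret < 0)
		return ret;

	return i2c_smbus_write_byte_data(client, TUNER_REG_DIV_LO,
					 (u8)(div & 0xff));
}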

  Gerd

-- 
sigfault

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-30 21:39 Mauricio Oliveira Carneiro
@ 2003-05-01  0:05 ` Greg KH
  0 siblings, 0 replies; 657+ messages in thread
From: Greg KH @ 2003-05-01  0:05 UTC (permalink / raw)
  To: Mauricio Oliveira Carneiro; +Cc: linux-kernel

On Wed, Apr 30, 2003 at 06:39:41PM -0300, Mauricio Oliveira Carneiro wrote:
> But I can't see it mounted anywhere in my system, nor can I mount it by 
> hand since I don't know the device filename (/dev/?) .

Have you read the Linux USB Guide at http://www.linux-usb.org/ ?

If you still have questions/problems after reading that, try asking this
on the linux-usb-users mailing list.  The people there can help you out.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-25 17:35 Bloch, Jack
@ 2003-04-25 19:43 ` Francois Romieu
  0 siblings, 0 replies; 657+ messages in thread
From: Francois Romieu @ 2003-04-25 19:43 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel

Bloch, Jack <Jack.Bloch@icn.siemens.com> :
> Is there example driver source code available for a MUSYCC CN8478 device?
> Please CC me directly on any answers. 

http://www.google.com/search?q=musycc.c+bsd&ie=ISO-8859-1&hl=fr&lr=

Example source code for a complete reset of a PEB20534 device operating in
last descriptor address control mode will be welcome too.

Regards

--
Ueimor

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-05  0:38 Ed Vance
@ 2003-04-05  4:51 ` Keith Owens
  0 siblings, 0 replies; 657+ messages in thread
From: Keith Owens @ 2003-04-05  4:51 UTC (permalink / raw)
  To: Ed Vance; +Cc: linux-kernel

On Fri, 4 Apr 2003 16:38:50 -0800 , 
Ed Vance <EdV@macrolink.com> wrote:
>On Fri, Apr 04, 2003 at 3:21 PM, Keith Owens wrote:
>> 
>> On Fri, 4 Apr 2003 14:10:16 -0800 , 
>> Ed Vance <EdV@macrolink.com> wrote:
>> >Perhaps there is a middle ground. Leave the list open, but require a
>> >confirmation reply prior to passing along posts from addresses that:
>> >
>> >1. are not members of the list, AND
>> >2. have not previously done a proper confirmation reply.
>> 
>> 30 seconds after doing that, the spammers will forge email that claims
>> to be from LT, AC, DM, MT etc.  Not to mention all the viruses that
>> forge the headers.  Verification by 'From:' line on an open list is
>> pointless.
>> 
>The goal was to greatly reduce, in one swell foop, the volume of spam that
>the filters (and postmaster) must interactively deal with. I thought that
>perhaps this method could replace one of more of the troublesome filtering
>techniques to achieve the same net spam reduction without evoking as much
>whining.

Paraphrase: Replace filtering code that catches spam with filtering
code based on checking header content that can be trivially forged by
spammers.

>Matti, 
>Roughly what percentage of the spam actually hitting vger today (and
>bouncing off) is based on Keith's flavor of spoofing? Is it even 1 percent? 

Current figures are irrelevant; spammers react to spam filters and they
react very quickly[*].  If you replace "reject HTML bodies" with "allow
HTML based on known From: lines" then the spammers will send HTML
bodies with forged headers, because they know it will get through.
That will require the original HTML filters to be reintroduced; the end
result is that you have added an extra step for new posters without
reducing either the spam or the "my mail does not get through" whining.

[*] About 24 hours after slashdot carried a story on Bayesian spam
    filters, I started receiving HTML spam that contained comments that
    were designed to fool the Bayesian filters, like this.

    FREE 1 MONTH SUPP<!--kernel-->Y WITH THIS

    The comment has no effect on the spam display but the use of
    non-spam words skews the Bayesian rules on whether the content is
    spam or not.
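
(The counter-measure is to strip the comments before tokenizing, so the
classifier never sees the planted words.  A minimal sketch, not taken from
any real filter:)

#include <string.h>

/* Remove "<!-- ... -->" spans in place so that the text around them is
 * rejoined before it reaches the tokenizer. */
static void strip_html_comments(char *s)
{
	char *src = s, *dst = s;

	while (*src) {
		if (strncmp(src, "<!--", 4) == 0) {
			char *end = strstr(src + 4, "-->");
			if (!end)
				break;	/* unterminated comment: stop copying */
			src = end + 3;	/* skip the whole comment */
			continue;
		}
		*dst++ = *src++;
	}
	*dst = '\0';
}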


^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
@ 2003-04-05  0:38 Ed Vance
  2003-04-05  4:51 ` Keith Owens
  0 siblings, 1 reply; 657+ messages in thread
From: Ed Vance @ 2003-04-05  0:38 UTC (permalink / raw)
  To: 'Keith Owens', 'Matti Aarnio'; +Cc: linux-kernel

On Fri, Apr 04, 2003 at 3:21 PM, Keith Owens wrote:
> 
> On Fri, 4 Apr 2003 14:10:16 -0800 , 
> Ed Vance <EdV@macrolink.com> wrote:
> >Perhaps there is a middle ground. Leave the list open, but require a
> >confirmation reply prior to passing along posts from addresses that:
> >
> >1. are not members of the list, AND
> >2. have not previously done a proper confirmation reply.
> 
> 30 seconds after doing that, the spammers will forge email that claims
> to be from LT, AC, DM, MT etc.  Not to mention all the viruses that
> forge the headers.  Verification by 'From:' line on an open list is
> pointless.
> 

Keith,

No single method is perfect. Your point is well taken. 

The goal was to greatly reduce, in one swell foop, the volume of spam that
the filters (and postmaster) must interactively deal with. I thought that
perhaps this method could replace one or more of the troublesome filtering
techniques to achieve the same net spam reduction without evoking as much
whining.

imperfect != pointless

Matti, 
Roughly what percentage of the spam actually hitting vger today (and
bouncing off) is based on Keith's flavor of spoofing? Is it even 1 percent? 

Cheers,
Ed

---------------------------------------------------------------- 
Ed Vance              edv (at) macrolink (dot) com
Macrolink, Inc.       1500 N. Kellogg Dr  Anaheim, CA  92807
----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04 22:10 Ed Vance
  2003-04-04 23:19 ` William Scott Lockwood III
@ 2003-04-04 23:21 ` Keith Owens
  1 sibling, 0 replies; 657+ messages in thread
From: Keith Owens @ 2003-04-04 23:21 UTC (permalink / raw)
  To: Ed Vance; +Cc: linux-kernel

On Fri, 4 Apr 2003 14:10:16 -0800 , 
Ed Vance <EdV@macrolink.com> wrote:
>Perhaps there is a middle ground. Leave the list open, but require a
>confirmation reply prior to passing along posts from addresses that:
>
>1. are not members of the list, AND
>2. have not previously done a proper confirmation reply.

30 seconds after doing that, the spammers will forge email that claims
to be from LT, AC, DM, MT etc.  Not to mention all the viruses that
forge the headers.  Verification by 'From:' line on an open list is
pointless.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
  2003-04-04 22:10 Ed Vance
@ 2003-04-04 23:19 ` William Scott Lockwood III
  2003-04-04 23:21 ` Keith Owens
  1 sibling, 0 replies; 657+ messages in thread
From: William Scott Lockwood III @ 2003-04-04 23:19 UTC (permalink / raw)
  To: Ed Vance; +Cc: 'Matti Aarnio', linux-kernel

That is the best suggestion I've yet seen.  It's an excellent idea!

On Fri, 4 Apr 2003, Ed Vance wrote:

> On Fri, Apr 04, 2003 at 12:38 PM, Matti Aarnio wrote:
> > [snip]
> > A somewhat better anti-spam filter method, than what we use presently
> > is to use strictly CLOSED list -- e.g. must be a member to post.
> > I have seen what kind of pains closed lists are, I even moderate
> > couple small ones.
> >
> > However we are deliberately running "open for posting, subject to
> > filters" policy, which lets questions and reports to come from
> > non-subscribers.
> >
>
> Perhaps there is a middle ground. Leave the list open, but require a
> confirmation reply prior to passing along posts from addresses that:
>
> 1. are not members of the list, AND
> 2. have not previously done a proper confirmation reply.
>
> The unconfirmed posts would time out and disappear after a decent interval,
> to prevent constipation.
>
> So, anybody could still post, the members would not be inconvenienced, and
> non-members would be inconvenienced only on their first post from each
> address they post from. This would preserve the "real time" nature of the
> list, while gaining the assurance that all who post are life-forms, even if
> they live in front of a keyboard and have no real life.  ;-)
>
> Of course, this would require storage for the list of confirmed addresses
> and pending unconfirmed posts, and the bandwidth and other overhead of the
> infrequent confirmation messages.
>
> Just a thought.
>
> Cheers,
> Ed
>
> ----------------------------------------------------------------
> Ed Vance              edv (at) macrolink (dot) com
> Macrolink, Inc.       1500 N. Kellogg Dr  Anaheim, CA  92807
> ----------------------------------------------------------------
>


^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
@ 2003-04-04 22:10 Ed Vance
  2003-04-04 23:19 ` William Scott Lockwood III
  2003-04-04 23:21 ` Keith Owens
  0 siblings, 2 replies; 657+ messages in thread
From: Ed Vance @ 2003-04-04 22:10 UTC (permalink / raw)
  To: 'Matti Aarnio'; +Cc: William Scott Lockwood III, linux-kernel

On Fri, Apr 04, 2003 at 12:38 PM, Matti Aarnio wrote:
> [snip]
> A somewhat better anti-spam filter method, than what we use presently
> is to use strictly CLOSED list -- e.g. must be a member to post.
> I have seen what kind of pains closed lists are, I even moderate
> couple small ones.
> 
> However we are deliberately running "open for posting, subject to
> filters" policy, which lets questions and reports to come from
> non-subscribers.
> 

Perhaps there is a middle ground. Leave the list open, but require a
confirmation reply prior to passing along posts from addresses that:

1. are not members of the list, AND
2. have not previously done a proper confirmation reply.

The unconfirmed posts would time out and disappear after a decent interval,
to prevent constipation.
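
(As a sketch -- with stub lookups standing in for whatever store the list
server would actually use -- the per-post decision would be roughly:)

#include <stdbool.h>
#include <time.h>

enum action { PASS, HOLD_FOR_CONFIRMATION, EXPIRE };

/* Stubs: a real list server would back these with its subscriber
 * database and a table of previously confirmed senders. */
static bool is_subscriber(const char *from) { (void)from; return false; }
static bool has_confirmed_before(const char *from) { (void)from; return false; }

static enum action classify_post(const char *from, time_t held_since, time_t now)
{
	if (is_subscriber(from) || has_confirmed_before(from))
		return PASS;

	/* "decent interval" before an unconfirmed post disappears */
	if (held_since && now - held_since > 3 * 24 * 60 * 60)
		return EXPIRE;

	return HOLD_FOR_CONFIRMATION;
}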

So, anybody could still post, the members would not be inconvenienced, and
non-members would be inconvenienced only on their first post from each
address they post from. This would preserve the "real time" nature of the
list, while gaining the assurance that all who post are life-forms, even if
they live in front of a keyboard and have no real life.  ;-)

Of course, this would require storage for the list of confirmed addresses
and pending unconfirmed posts, and the bandwidth and other overhead of the
infrequent confirmation messages.

Just a thought. 

Cheers,
Ed

---------------------------------------------------------------- 
Ed Vance              edv (at) macrolink (dot) com
Macrolink, Inc.       1500 N. Kellogg Dr  Anaheim, CA  92807
----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04 15:28           ` William Scott Lockwood III
                               ` (2 preceding siblings ...)
  2003-04-04 16:10             ` Jens Axboe
@ 2003-04-04 20:37             ` Matti Aarnio
  3 siblings, 0 replies; 657+ messages in thread
From: Matti Aarnio @ 2003-04-04 20:37 UTC (permalink / raw)
  To: William Scott Lockwood III; +Cc: linux-kernel

On Fri, Apr 04, 2003 at 07:28:12AM -0800, William Scott Lockwood III wrote:
...
> The best list is one that is inclusive.  One that tollerates other opinions
> and choices.  LKML has turned into the largest, nastiest click I've ever
> seen, and that's really sad, as I'm sure it scares some good people away.

Are you speaking about PEOPLE who react to emails by flaming, or
about the list filtering "technology"?

>  Look at all the crap I and others got for using hotmail - I finally
> got sick and tired of the whining and now have to take 3x as long to 
> read my mail - but it's not a hotmail address anymore, so the whining
> stoped.

About people, then..    There I can't help, unfortunately.

We have lots of people subscribing from Hotmail addresses, and the only
complaint I can voice is that those people will at times let their
mailbox quotas overflow, which leads to bounces, and then subscription
revocation...  (Hard controlled quotas are not unique to Hotmail, nor
are the people who let them overflow...)


> Why not spend less timing restricting what people can read and post
> from, and just let people participate?

There is this small thing called spam...

We have various filters (see my other posting), but obviously they
are not infallible; a few spams do leak through and earn new filter
rules (if I can think up something suitably specific, yet generic..)

A somewhat better anti-spam filter method than what we use presently
is to use a strictly CLOSED list -- i.e. you must be a member to post.
I have seen what kind of pain closed lists are; I even moderate a
couple of small ones.

However we are deliberately running an "open for posting, subject to
filters" policy, which lets questions and reports come from
non-subscribers.


/Matti Aarnio

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04 15:28           ` William Scott Lockwood III
  2003-04-04 16:04             ` Richard B. Johnson
  2003-04-04 16:04             ` Christoph Hellwig
@ 2003-04-04 16:10             ` Jens Axboe
  2003-04-04 20:37             ` Matti Aarnio
  3 siblings, 0 replies; 657+ messages in thread
From: Jens Axboe @ 2003-04-04 16:10 UTC (permalink / raw)
  To: William Scott Lockwood III
  Cc: Richard B. Johnson, David S. Miller, linux-kernel

On Fri, Apr 04 2003, William Scott Lockwood III wrote:
> On Fri, 4 Apr 2003, Richard B. Johnson wrote:
> > On Thu, 3 Apr 2003, William Scott Lockwood III wrote:
> > > On Thu, 3 Apr 2003, David S. Miller wrote:
> > > >    From: "Richard B. Johnson" <root@chaos.analogic.com>
> > > >    Date: Thu, 3 Apr 2003 15:02:41 -0500 (EST)
> > > >    Well it's not a yahoo users problem because yahoo users can't fix
> > > >    it. Some yahoo users have yahoo "free" mail as their only connection
> > > >    to the internet because of facist network administrators.
> > > > If you want all the SPAM that will result on Linux-kernel, we
> > > > can disable the filter if you want.
> > > > I refuse to sit here and listen to all the "this is the only
> > > > connection person FOO has to the internet" stories, quite frankly I'm
> > > > absolutely sick of hearing them.
> > > > If you don't have properly functioning mail, you can't use these
> > > > lists.
> > > > Period.
> > > When did that become your call?  I didn't realize you owned LKML.
> > Well it's his "baseball" and; "You'll play by my rules or you won't
> > play at all..."
> > FYI, there is no Major Domo. It's Latin, major domus, "master of
> > the house". He doith whatever he careth...
> 
> Yes, I can see that.  No matter who it alienates.  Weither or not he's
> checked with anyone else either.  How about leting those of us who (like
> Linus) choose to use a commercial email product do so?  Garbage about
> headers, etc. is just that - garbage.  The best list is one that is
> inclusive.  One that tollerates other opinions and choices.  LKML has
> turned into the largest, nastiest click I've ever seen, and that's really
> sad, as I'm sure it scares some good people away.  Look at all the crap I
> and others got for using hotmail - I finally got sick and tired of the
> whining and now have to take 3x as long to read my mail - but it's not a
> hotmail address anymore, so the whining stoped.  Why not spend less timing
> restricting what people can read and post from, and just let people
> participate?

Oh please go away. Would you rather see lkml be as ridden with spam as
other lists? You have the right to use a commercial product, and you may
also exercise your right to choose a _bad_ one.

Besides, crap like the above doesn't carry much weight. Especially not
from someone who rarely contributes anything but noise on the list. No
time for whiners, to the kill file you go.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04 15:28           ` William Scott Lockwood III
  2003-04-04 16:04             ` Richard B. Johnson
@ 2003-04-04 16:04             ` Christoph Hellwig
  2003-04-04 16:10             ` Jens Axboe
  2003-04-04 20:37             ` Matti Aarnio
  3 siblings, 0 replies; 657+ messages in thread
From: Christoph Hellwig @ 2003-04-04 16:04 UTC (permalink / raw)
  To: William Scott Lockwood III
  Cc: Richard B. Johnson, David S. Miller, linux-kernel

On Fri, Apr 04, 2003 at 07:28:12AM -0800, William Scott Lockwood III wrote:
> Yes, I can see that.  No matter who it alienates.  Weither or not he's
> checked with anyone else either.

LKML is DaveM's list.  If the choices he and his co-postmaster make don't
suit your or others' needs, set up your own linux kernel list.

> How about leting those of us who (like
> Linus) choose to use a commercial email product do so?  Garbage about
> headers, etc. is just that - garbage.

Who said anything about commercial products?  lkml refuses _broken_
mails, it doesn't check what MUA you used.

> and others got for using hotmail - I finally got sick and tired of the
> whining and now have to take 3x as long to read my mail - but it's not a
> hotmail address anymore, so the whining stoped.  Why not spend less timing
> restricting what people can read and post from, and just let people
> participate?

Please read the mail RFC and the netiquette and come back once you've
done that.  Your current whining wastes our time.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04 15:28           ` William Scott Lockwood III
@ 2003-04-04 16:04             ` Richard B. Johnson
  2003-04-04 16:04             ` Christoph Hellwig
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2003-04-04 16:04 UTC (permalink / raw)
  To: William Scott Lockwood III; +Cc: David S. Miller, Linux kernel

On Fri, 4 Apr 2003, William Scott Lockwood III wrote:

> On Fri, 4 Apr 2003, Richard B. Johnson wrote:
> > On Thu, 3 Apr 2003, William Scott Lockwood III wrote:
> > > On Thu, 3 Apr 2003, David S. Miller wrote:
> > > >    From: "Richard B. Johnson" <root@chaos.analogic.com>
> > > >    Date: Thu, 3 Apr 2003 15:02:41 -0500 (EST)
> > > >    Well it's not a yahoo users problem because yahoo users can't fix
> > > >    it. Some yahoo users have yahoo "free" mail as their only connection
> > > >    to the internet because of facist network administrators.
> > > > If you want all the SPAM that will result on Linux-kernel, we
> > > > can disable the filter if you want.
> > > > I refuse to sit here and listen to all the "this is the only
> > > > connection person FOO has to the internet" stories, quite frankly I'm
> > > > absolutely sick of hearing them.
> > > > If you don't have properly functioning mail, you can't use these
> > > > lists.
> > > > Period.
> > > When did that become your call?  I didn't realize you owned LKML.
> > Well it's his "baseball" and; "You'll play by my rules or you won't
> > play at all..."
> > FYI, there is no Major Domo. It's Latin, major domus, "master of
> > the house". He doith whatever he careth...
>
> Yes, I can see that.  No matter who it alienates.  Weither or not he's
> checked with anyone else either.  How about leting those of us who (like
> Linus) choose to use a commercial email product do so?  Garbage about
> headers, etc. is just that - garbage.  The best list is one that is
> inclusive.  One that tollerates other opinions and choices.  LKML has
> turned into the largest, nastiest click I've ever seen, and that's really
> sad, as I'm sure it scares some good people away.  Look at all the crap I
> and others got for using hotmail - I finally got sick and tired of the
> whining and now have to take 3x as long to read my mail - but it's not a
> hotmail address anymore, so the whining stoped.  Why not spend less timing
> restricting what people can read and post from, and just let people
> participate?
>

Well SPAM is a very big problem and I can see that David is trying.
Sometimes he has a bad day and pisses a few off with his answers.
However, in every case in which somebody that I know of has complained,
the problems did get "mysteriously" fixed, so, like they say: "Don't go
away mad. Just go away!".

Once you get flamed for a few years, you get used to it. That's why
some people send email to me rather than "the list". Sometimes I am
able to help without having to forward their problems to the list.
Sometimes I have to take a work-break and can't help, and other times
I can't help because I don't know what they are talking about. Anyway,
if David wants to invoke the rage of the Gods, yawn... It doesn't bother
me anymore....


Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04 12:57         ` Richard B. Johnson
@ 2003-04-04 15:28           ` William Scott Lockwood III
  2003-04-04 16:04             ` Richard B. Johnson
                               ` (3 more replies)
  0 siblings, 4 replies; 657+ messages in thread
From: William Scott Lockwood III @ 2003-04-04 15:28 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: David S. Miller, linux-kernel

On Fri, 4 Apr 2003, Richard B. Johnson wrote:
> On Thu, 3 Apr 2003, William Scott Lockwood III wrote:
> > On Thu, 3 Apr 2003, David S. Miller wrote:
> > >    From: "Richard B. Johnson" <root@chaos.analogic.com>
> > >    Date: Thu, 3 Apr 2003 15:02:41 -0500 (EST)
> > >    Well it's not a yahoo users problem because yahoo users can't fix
> > >    it. Some yahoo users have yahoo "free" mail as their only connection
> > >    to the internet because of facist network administrators.
> > > If you want all the SPAM that will result on Linux-kernel, we
> > > can disable the filter if you want.
> > > I refuse to sit here and listen to all the "this is the only
> > > connection person FOO has to the internet" stories, quite frankly I'm
> > > absolutely sick of hearing them.
> > > If you don't have properly functioning mail, you can't use these
> > > lists.
> > > Period.
> > When did that become your call?  I didn't realize you owned LKML.
> Well it's his "baseball" and; "You'll play by my rules or you won't
> play at all..."
> FYI, there is no Major Domo. It's Latin, major domus, "master of
> the house". He doith whatever he careth...

Yes, I can see that.  No matter who it alienates.  Whether or not he's
checked with anyone else either.  How about letting those of us who (like
Linus) choose to use a commercial email product do so?  Garbage about
headers, etc. is just that - garbage.  The best list is one that is
inclusive.  One that tolerates other opinions and choices.  LKML has
turned into the largest, nastiest clique I've ever seen, and that's really
sad, as I'm sure it scares some good people away.  Look at all the crap I
and others got for using hotmail - I finally got sick and tired of the
whining and now have to take 3x as long to read my mail - but it's not a
hotmail address anymore, so the whining stopped.  Why not spend less time
restricting what people can read and post from, and just let people
participate?


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04  0:31       ` William Scott Lockwood III
  2003-04-04  0:40         ` David S. Miller
@ 2003-04-04 12:57         ` Richard B. Johnson
  2003-04-04 15:28           ` William Scott Lockwood III
  1 sibling, 1 reply; 657+ messages in thread
From: Richard B. Johnson @ 2003-04-04 12:57 UTC (permalink / raw)
  To: William Scott Lockwood III; +Cc: David S. Miller, linux-kernel

On Thu, 3 Apr 2003, William Scott Lockwood III wrote:

> On Thu, 3 Apr 2003, David S. Miller wrote:
> >    From: "Richard B. Johnson" <root@chaos.analogic.com>
> >    Date: Thu, 3 Apr 2003 15:02:41 -0500 (EST)
> >    Well it's not a yahoo users problem because yahoo users can't fix
> >    it. Some yahoo users have yahoo "free" mail as their only connection
> >    to the internet because of facist network administrators.
> > If you want all the SPAM that will result on Linux-kernel, we
> > can disable the filter if you want.
> > I refuse to sit here and listen to all the "this is the only
> > connection person FOO has to the internet" stories, quite frankly I'm
> > absolutely sick of hearing them.
> > If you don't have properly functioning mail, you can't use these
> > lists.
> > Period.
>
> When did that become your call?  I didn't realize you owned LKML.
>

Well it's his "baseball" and; "You'll play by my rules or you won't
play at all..."

FYI, there is no Major Domo. It's Latin, major domus, "master of
the house". He doith whatever he careth...

Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04  0:40         ` David S. Miller
@ 2003-04-04  0:47           ` William Scott Lockwood III
  0 siblings, 0 replies; 657+ messages in thread
From: William Scott Lockwood III @ 2003-04-04  0:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: root, linux-kernel

Yeah, sorry - I thought HPA was running it for some reason.  I still don't
think you should make that kind of a call unilaterally, but hey - after
all the incessant whining I put up with about using OE, I finally caved
and moved to a real email address myself.  I guess it just goes to show
that if you whine and act petulant long enough...

On Thu, 3 Apr 2003, David S. Miller wrote:

>    From: William Scott Lockwood III <vlad@geekizoid.com>
>    Date: Thu, 3 Apr 2003 16:31:13 -0800 (PST)
>
>    When did that become your call?  I didn't realize you owned LKML.
>
> Maybe this is news to you, but I've been running LKML for
> 6 or so years now.
>


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-04  0:31       ` William Scott Lockwood III
@ 2003-04-04  0:40         ` David S. Miller
  2003-04-04  0:47           ` William Scott Lockwood III
  2003-04-04 12:57         ` Richard B. Johnson
  1 sibling, 1 reply; 657+ messages in thread
From: David S. Miller @ 2003-04-04  0:40 UTC (permalink / raw)
  To: vlad; +Cc: root, linux-kernel

   From: William Scott Lockwood III <vlad@geekizoid.com>
   Date: Thu, 3 Apr 2003 16:31:13 -0800 (PST)

   When did that become your call?  I didn't realize you owned LKML.
   
Maybe this is news to you, but I've been running LKML for
6 or so years now.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-03 20:00     ` David S. Miller
  2003-04-03 20:21       ` Richard B. Johnson
@ 2003-04-04  0:31       ` William Scott Lockwood III
  2003-04-04  0:40         ` David S. Miller
  2003-04-04 12:57         ` Richard B. Johnson
  1 sibling, 2 replies; 657+ messages in thread
From: William Scott Lockwood III @ 2003-04-04  0:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: root, linux-kernel

On Thu, 3 Apr 2003, David S. Miller wrote:
>    From: "Richard B. Johnson" <root@chaos.analogic.com>
>    Date: Thu, 3 Apr 2003 15:02:41 -0500 (EST)
>    Well it's not a yahoo users problem because yahoo users can't fix
>    it. Some yahoo users have yahoo "free" mail as their only connection
>    to the internet because of facist network administrators.
> If you want all the SPAM that will result on Linux-kernel, we
> can disable the filter if you want.
> I refuse to sit here and listen to all the "this is the only
> connection person FOO has to the internet" stories, quite frankly I'm
> absolutely sick of hearing them.
> If you don't have properly functioning mail, you can't use these
> lists.
> Period.

When did that become your call?  I didn't realize you owned LKML.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-03 20:02   ` your mail Richard B. Johnson
  2003-04-03 19:24     ` Alan Cox
  2003-04-03 20:00     ` David S. Miller
@ 2003-04-03 20:40     ` Trever L. Adams
  2 siblings, 0 replies; 657+ messages in thread
From: Trever L. Adams @ 2003-04-03 20:40 UTC (permalink / raw)
  To: root; +Cc: David S. Miller, Linux Kernel Mailing List

On Thu, 2003-04-03 at 15:02, Richard B. Johnson wrote:
> Well it's not a yahoo users problem because yahoo users can't fix
> it. Some yahoo users have yahoo "free" mail as their only connection
> to the internet because of facist network administrators. It gets
> worse how that you can't tell a company to go screw themselves and
> get another job. The three engineers that I know who use yahoo do
> so because they don't have any choice and there is no way that they
> can configure the mailer to get rid of the empty HTML section.

I would suggest that those who think Yahoo is their only option check
out digitalme.com or myrealbox.com.  Web, pop, imap, etc.  All free.

Trever
--
"Never raise your hand to your children - it leaves your midsection
unprotected." -- Matthew Harrell


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-03 20:00     ` David S. Miller
@ 2003-04-03 20:21       ` Richard B. Johnson
  2003-04-03 20:15         ` David S. Miller
  2003-04-04  0:31       ` William Scott Lockwood III
  1 sibling, 1 reply; 657+ messages in thread
From: Richard B. Johnson @ 2003-04-03 20:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

On Thu, 3 Apr 2003, David S. Miller wrote:

>    From: "Richard B. Johnson" <root@chaos.analogic.com>
>    Date: Thu, 3 Apr 2003 15:02:41 -0500 (EST)
>
>    Well it's not a yahoo users problem because yahoo users can't fix
>    it. Some yahoo users have yahoo "free" mail as their only connection
>    to the internet because of facist network administrators.
>
> If you want all the SPAM that will result on Linux-kernel, we
> can disable the filter if you want.

No. I think you can let empty HTML sections go through.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-03 20:21       ` Richard B. Johnson
@ 2003-04-03 20:15         ` David S. Miller
  0 siblings, 0 replies; 657+ messages in thread
From: David S. Miller @ 2003-04-03 20:15 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

   From: "Richard B. Johnson" <root@chaos.analogic.com>
   Date: Thu, 3 Apr 2003 15:21:25 -0500 (EST)

   On Thu, 3 Apr 2003, David S. Miller wrote:
   
   > If you want all the SPAM that will result on Linux-kernel, we
   > can disable the filter if you want.
   
   No. I think you can let empty HTML sections go through.
   
I think these people who it matters to can petition yahoo.com to drop
this dumb empty HTML section.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-03 19:22 ` David S. Miller
@ 2003-04-03 20:02   ` Richard B. Johnson
  2003-04-03 19:24     ` Alan Cox
                       ` (2 more replies)
  0 siblings, 3 replies; 657+ messages in thread
From: Richard B. Johnson @ 2003-04-03 20:02 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux kernel

On Thu, 3 Apr 2003, David S. Miller wrote:

> On Thu, 2003-04-03 at 08:22, Richard B. Johnson wrote:
> > FYI vger rejects mail sent from yahoo.com, claims that it
> > has a HTML subpart and considers it spam or Outlook Virus.
> >
> > FYI any mail sent from yahoo will end up using the yahoo tools
> > (qmail). This will put an empty HTML section in all mail. It
> > is not a good thing to reject this because that means you reject
> > all mail from yahoo.
>
> That's yahoo users problem not ours.  If you can't be bothered
> to get a plain text email out, you shouldn't be using these
> lists.
>

Well it's not a yahoo user's problem, because yahoo users can't fix
it. Some yahoo users have yahoo "free" mail as their only connection
to the internet because of fascist network administrators. It gets
worse now that you can't tell a company to go screw themselves and
get another job. The three engineers that I know who use yahoo do
so because they don't have any choice, and there is no way that they
can configure the mailer to get rid of the empty HTML section.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-03 20:02   ` your mail Richard B. Johnson
  2003-04-03 19:24     ` Alan Cox
@ 2003-04-03 20:00     ` David S. Miller
  2003-04-03 20:21       ` Richard B. Johnson
  2003-04-04  0:31       ` William Scott Lockwood III
  2003-04-03 20:40     ` Trever L. Adams
  2 siblings, 2 replies; 657+ messages in thread
From: David S. Miller @ 2003-04-03 20:00 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

   From: "Richard B. Johnson" <root@chaos.analogic.com>
   Date: Thu, 3 Apr 2003 15:02:41 -0500 (EST)
   
   Well it's not a yahoo users problem because yahoo users can't fix
   it. Some yahoo users have yahoo "free" mail as their only connection
   to the internet because of facist network administrators.

If you want all the SPAM that will result on Linux-kernel, we
can disable the filter if you want.

I refuse to sit here and listen to all the "this is the only
connection person FOO has to the internet" stories, quite frankly I'm
absolutely sick of hearing them.

If you don't have properly functioning mail, you can't use these
lists.

Period.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-04-03 20:02   ` your mail Richard B. Johnson
@ 2003-04-03 19:24     ` Alan Cox
  2003-04-03 20:00     ` David S. Miller
  2003-04-03 20:40     ` Trever L. Adams
  2 siblings, 0 replies; 657+ messages in thread
From: Alan Cox @ 2003-04-03 19:24 UTC (permalink / raw)
  To: root; +Cc: David S. Miller, Linux Kernel Mailing List

On Iau, 2003-04-03 at 21:02, Richard B. Johnson wrote:
> Well it's not a yahoo users problem because yahoo users can't fix
> it. Some yahoo users have yahoo "free" mail as their only connection
> to the internet because of facist network administrators. It gets
> worse how that you can't tell a company to go screw themselves and
> get another job. The three engineers that I know who use yahoo do
> so because they don't have any choice and there is no way that they
> can configure the mailer to get rid of the empty HTML section.

There are lots of other free email providers.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-31 18:46 saurabh  khanna
@ 2003-02-03 12:53 ` Alexander Kellett
  0 siblings, 0 replies; 657+ messages in thread
From: Alexander Kellett @ 2003-02-03 12:53 UTC (permalink / raw)
  To: saurabh khanna; +Cc: linux-kernel

hiya,

unfortunately this list isn't for such problems and it
would be better to contact your distribution or the various
forums it has. try google.

/me wonders again why this list isn't called linux-kernel-dev@...

Alex

On Fri, Jan 31, 2003 at 06:46:05PM -0000, saurabh  khanna wrote:
> Problem: My xwindows did not open and my sound card doesn't work.
> 
> Xwindows:
> I am a novice. I am using redhat linux 8. It detects my graphics card
> correctly, but when I tried to open xwindows, my system hangs.
> 
> Sound:
> Linux has detected my sound card once but not configured it, and after
> that it is neither working nor detected by my linux.
> 
> GRUB:
> Also, I can boot my linux through LILO only. GRUB won't work; it gives
> the error "Not enough memory".
> 
> I have re-installed linux on my computer but the problem remains.
> All other details follow.
> 
> Kernel version:
> Linux version 2.4.18-14 (bhcompile@astest.test.redhat.com)
> (gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7))
> #1 Wed Sep 4 12:13:11 EDT 2002
> 
> 
> Command which triggers the problem:
> startx
> 
> Processor information:
> processor	: 0
> 
> vendor_id	: AuthenticAMD
> 
> cpu family	: 6
> 
> model		: 6
> 
> model name	: AMD Athlon(TM) XP 1700+
> 
> stepping	: 2
> 
> cpu MHz		: 1469.861
> 
> cache size	: 256 KB
> 
> fdiv_bug	: no
> 
> hlt_bug		: no
> 
> f00f_bug	: no
> 
> coma_bug	: no
> 
> fpu		: yes
> 
> fpu_exception	: yes
> 
> cpuid level	: 1
> 
> wp		: yes
> 
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
> 
> bogomips	: 2920.57
> 
> 
> 
> Module information:
> nls_iso8859-1           3516   1 (autoclean)
> 
> nls_cp437               5116   1 (autoclean)
> 
> vfat                   13084   1 (autoclean)
> 
> fat                    38744   0 (autoclean) [vfat]
> 
> autofs                 13348   0 (autoclean) (unused)
> 
> ipt_REJECT              3736   2 (autoclean)
> 
> iptable_filter          2412   1 (autoclean)
> 
> ip_tables              14840   2 [ipt_REJECT iptable_filter]
> 
> mousedev                5524   0 (unused)
> 
> keybdev                 2976   0 (unused)
> 
> hid                    22244   0 (unused)
> 
> input                   5888   0 [mousedev keybdev hid]
> 
> usb-ohci               21288   0 (unused)
> 
> usbcore                77056   1 [hid usb-ohci]
> 
> ext3                   70400   1
> 
> jbd                    52212   1 [ext3]
> 
> 
> 
> Loaded driver and hardware information:
> 0000-001f : dma1
> 
> 0020-003f : pic1
> 
> 0040-005f : timer
> 
> 0060-006f : keyboard
> 
> 0070-007f : rtc
> 
> 0080-008f : dma page reg
> 
> 00a0-00bf : pic2
> 
> 00c0-00df : dma2
> 
> 00f0-00ff : fpu
> 
> 01f0-01f7 : ide0
> 
> 02f8-02ff : serial(auto)
> 
> 03c0-03df : vga+
> 
> 03f6-03f6 : ide0
> 
> 03f8-03ff : serial(auto)
> 
> 0cf8-0cff : PCI conf1
> 
> 5000-500f : PCI device 10de:01b4 (nVidia Corporation)
> 
> 5100-511f : PCI device 10de:01b4 (nVidia Corporation)
> 
> 5500-550f : PCI device 10de:01b4 (nVidia Corporation)
> 
> a800-a80f : nVidia Corporation nForce IDE
> 
> a800-a807 : ide0
> 
> a808-a80f : ide1
> 
> b000-bfff : PCI Bus #01
> 
> b800-b807 : Rockwell International HCF 56k Data/Fax/Voice/Spkp 
> (w/Handset) Modem
> 
> d800-d807 : PCI device 10de:01c3 (nVidia Corporation)
> 
> e000-e07f : PCI device 10de:01b1 (nVidia Corporation)
> 
> e100-e1ff : PCI device 10de:01b1 (nVidia Corporation)
> 
> 00000000-0007ffff : System RAM
> 
> 0009fc00-0009ffff : reserved
> 
> 000a0000-000bffff : Video RAM area
> 
> 000c0000-000c7fff : Video ROM
> 
> 000f0000-000fffff : System ROM
> 
> 00100000-06febfff : System RAM
> 
> 00100000-00247f2e : Kernel code
> 
> 00247f2f-0033ed03 : Kernel data
> 
> 06fec000-06feefff : ACPI Tables
> 
> 06fef000-06ffefff : reserved
> 
> 06fff000-06ffffff : ACPI Non-volatile Storage
> 
> eb000000-ec7fffff : PCI Bus #02
> 
> eb000000-ebffffff : nVidia Corporation GeForce2 Integrated GPU
> 
> ec800000-ecffffff : PCI Bus #01
> 
> ec800000-ec80ffff : Rockwell International HCF 56k 
> Data/Fax/Voice/Spkp (w/Handset) Modem
> 
> ed000000-ed000fff : PCI device 10de:01b1 (nVidia Corporation)
> 
> ed800000-ed87ffff : PCI device 10de:01b0 (nVidia Corporation)
> 
> ee000000-ee0003ff : PCI device 10de:01c3 (nVidia Corporation)
> 
> ee800000-ee800fff : PCI device 10de:01c2 (nVidia Corporation)
> 
> ee800000-ee800fff : usb-ohci
> 
> ef000000-ef000fff : PCI device 10de:01c2 (nVidia Corporation)
> 
> ef000000-ef000fff : usb-ohci
> 
> eff00000-f7ffffff : PCI Bus #02
> 
> f0000000-f7ffffff : nVidia Corporation GeForce2 Integrated GPU
> 
> f8000000-fbffffff : PCI device 10de:01a4 (nVidia Corporation)
> 
> fec00000-fec00fff : reserved
> 
> fee00000-fee00fff : reserved
> 
> ffff0000-ffffffff : reserved
> 
> 
> PCI information:
> 00:00.0 Host bridge: nVidia Corporation nForce CPU bridge (rev 
> b2)
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0
> 	Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M]
> 	Capabilities: [40] AGP version 2.0
> 		Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2,x4
> 		Command: RQ=0 SBA- AGP- 64bit- FW- Rate=x1
> 	Capabilities: [60] #08 [2001]
> 
> 00:00.1 RAM memory: nVidia Corporation nForce 220/420 Memory 
> Controller (rev b2)
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 
> 00:00.2 RAM memory: nVidia Corporation nForce 220/420 Memory 
> Controller (rev b2)
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 
> 00:00.3 RAM memory: nVidia Corporation: Unknown device 01aa (rev 
> b2)
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 
> 00:01.0 ISA bridge: nVidia Corporation nForce ISA Bridge (rev 
> c3)
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0
> 	Capabilities: [50] #08 [01e1]
> 
> 00:01.1 SMBus: nVidia Corporation nForce PCI System Management 
> (rev c1)
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Interrupt: pin A routed to IRQ 5
> 	Region 0: I/O ports at 5000 [size=16]
> 	Region 1: I/O ports at 5500 [size=16]
> 	Region 2: I/O ports at 5100 [size=32]
> 	Capabilities: [44] Power Management version 2
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
> PME(D0-,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 
> 00:02.0 USB Controller: nVidia Corporation: Unknown device 01c2 
> (rev c3) (prog-if 10 [OHCI])
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0 (750ns min, 250ns max)
> 	Interrupt: pin A routed to IRQ 5
> 	Region 0: Memory at ef000000 (32-bit, non-prefetchable) 
> [size=4K]
> 	Capabilities: [44] Power Management version 2
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 
> 00:03.0 USB Controller: nVidia Corporation: Unknown device 01c2 
> (rev c3) (prog-if 10 [OHCI])
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0 (750ns min, 250ns max)
> 	Interrupt: pin A routed to IRQ 5
> 	Region 0: Memory at ee800000 (32-bit, non-prefetchable) 
> [size=4K]
> 	Capabilities: [44] Power Management version 2
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 
> 00:04.0 Ethernet controller: nVidia Corporation: Unknown device 
> 01c3 (rev c2)
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0 (250ns min, 5000ns max)
> 	Interrupt: pin A routed to IRQ 5
> 	Region 0: Memory at ee000000 (32-bit, non-prefetchable) 
> [size=1K]
> 	Region 1: I/O ports at d800 [size=8]
> 	Capabilities: [44] Power Management version 2
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 
> 00:05.0 Multimedia audio controller: nVidia Corporation: Unknown 
> device 01b0 (rev c2)
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0 (250ns min, 3000ns max)
> 	Interrupt: pin A routed to IRQ 5
> 	Region 0: Memory at ed800000 (32-bit, non-prefetchable) 
> [size=512K]
> 	Capabilities: [44] Power Management version 2
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 
> 00:06.0 Multimedia audio controller: nVidia Corporation nForce 
> Audio (rev c2)
> 	Subsystem: nVidia Corporation: Unknown device 8384
> 	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0 (500ns min, 1250ns max)
> 	Interrupt: pin A routed to IRQ 11
> 	Region 0: I/O ports at e100 [size=256]
> 	Region 1: I/O ports at e000 [size=128]
> 	Region 2: Memory at ed000000 (32-bit, non-prefetchable) 
> [disabled] [size=4K]
> 	Capabilities: [44] Power Management version 2
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 
> 00:08.0 PCI bridge: nVidia Corporation nForce PCI-to-PCI bridge 
> (rev c2) (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0
> 	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> 	I/O behind bridge: 0000b000-0000bfff
> 	Memory behind bridge: ec800000-ecffffff
> 	Prefetchable memory behind bridge: f8000000-f7ffffff
> 	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
> 
> 00:09.0 IDE interface: nVidia Corporation nForce IDE (rev c3) 
> (prog-if 8a [Master SecP PriP])
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0 (750ns min, 250ns max)
> 	Region 4: I/O ports at a800 [size=16]
> 	Capabilities: [44] Power Management version 2
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 
> 00:1e.0 PCI bridge: nVidia Corporation nForce AGP to PCI Bridge 
> (rev b2) (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 0
> 	Bus: primary=00, secondary=02, subordinate=02, sec-latency=64
> 	I/O behind bridge: 0000a000-00009fff
> 	Memory behind bridge: eb000000-ec7fffff
> 	Prefetchable memory behind bridge: eff00000-f7ffffff
> 	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
> 
> 01:08.0 Communication controller: Rockwell International HCF 56k 
> Data/Fax/Voice/Spkp (w/Handset) Modem (rev 01)
> 	Subsystem: Rockwell International HCF 56k Data/Fax/Voice/Spkp 
> (w/Handset) Modem
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 64
> 	Interrupt: pin A routed to IRQ 5
> 	Region 0: Memory at ec800000 (32-bit, non-prefetchable) 
> [size=64K]
> 	Region 1: I/O ports at b800 [size=8]
> 	Capabilities: [40] Power Management version 2
> 		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA 
> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> 		Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
> 
> 02:00.0 VGA compatible controller: nVidia Corporation NV15 
> [GeForce2 - nForce GPU] (rev b1) (prog-if 00 [VGA])
> 	Subsystem: nVidia Corporation: Unknown device 0c11
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR-
> 	Latency: 32 (1250ns min, 250ns max)
> 	Interrupt: pin A routed to IRQ 11
> 	Region 0: Memory at eb000000 (32-bit, non-prefetchable) 
> [size=16M]
> 	Region 1: Memory at f0000000 (32-bit, prefetchable) 
> [size=128M]
> 	Expansion ROM at efff0000 [disabled] [size=64K]
> 	Capabilities: [60] Power Management version 2
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [44] AGP version 2.0
> 		Status: RQ=31 SBA- 64bit- FW+ Rate=x1,x4
> 		Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
> 
> 
> 
> XF86Config:
> 
> # File generated by anaconda.
> 
> Section "ServerLayout"
>         Identifier     "Anaconda Configured"
>         Screen      0  "Screen0" 0 0
>         InputDevice    "Mouse0" "CorePointer"
> 	InputDevice	"Mouse1" "SendCoreEvents"
>         InputDevice    "Keyboard0" "CoreKeyboard"
> EndSection
> 
> Section "Files"
> 
> # The location of the RGB database.  Note, this is the name of 
> the
> # file minus the extension (like ".txt" or ".db").  There is 
> normally
> # no need to change the default.
> 
>     RgbPath	"/usr/X11R6/lib/X11/rgb"
> 
> # Multiple FontPath entries are allowed (they are concatenated 
> together)
> # By default, Red Hat 6.0 and later now use a font server 
> independent of
> # the X server to render fonts.
> 
>     FontPath   "unix/:7100"
> 
> EndSection
> 
> Section "Module"
>         Load  "dbe"
>         Load  "extmod"
> 	Load  "fbdevhw"
> 	Load  "dri"
>         Load  "glx"
>         Load  "record"
>         Load  "freetype"
>         Load  "type1"
> EndSection
> 
> Section "InputDevice"
>         Identifier  "Keyboard0"
>         Driver      "keyboard"
> 
> #	Option	"AutoRepeat"	"500 5"
> 
> # when using XQUEUE, comment out the above line, and uncomment 
> the
> # following line
> #	Option	"Protocol"	"Xqueue"
> 
> # Specify which keyboard LEDs can be user-controlled (eg, with 
> xset(1))
> #	Option	"Xleds"		"1 2 3"
> 
> # To disable the XKEYBOARD extension, uncomment XkbDisable.
> #	Option	"XkbDisable"
> 
> # To customise the XKB settings to suit your keyboard, modify 
> the
> # lines below (which are the defaults).  For example, for a 
> non-U.S.
> # keyboard, you will probably want to use:
> #	Option	"XkbModel"	"pc102"
> # If you have a US Microsoft Natural keyboard, you can use:
> #	Option	"XkbModel"	"microsoft"
> #
> # Then to change the language, change the Layout setting.
> # For example, a german layout can be obtained with:
> #	Option	"XkbLayout"	"de"
> # or:
> #	Option	"XkbLayout"	"de"
> #	Option	"XkbVariant"	"nodeadkeys"
> #
> # If you'd like to switch the positions of your capslock and
> # control keys, use:
> #	Option	"XkbOptions"	"ctrl:swapcaps"
> 	Option	"XkbRules"	"xfree86"
> 	Option	"XkbModel"	"pc105"
> 	Option	"XkbLayout"	"us"
> 	#Option	"XkbVariant"	""
> 	#Option	"XkbOptions"	""
> EndSection
> 
> Section "InputDevice"
>         Identifier  "Mouse0"
>         Driver      "mouse"
>         Option      "Protocol" "PS/2"
>         Option      "Device" "/dev/psaux"
>         Option      "ZAxisMapping" "4 5"
>         Option      "Emulate3Buttons" "yes"
> EndSection
> 
> 
> Section "InputDevice"
> 	Identifier	"Mouse1"
> 	Driver		"mouse"
> 	Option		"Device"		"/dev/input/mice"
> 	Option		"Protocol"		"IMPS/2"
> 	Option		"Emulate3Buttons"	"no"
> 	Option		"ZAxisMapping"		"4 5"
> EndSection
> 
> 
> Section "Monitor"
>         Identifier   "Monitor0"
>         VendorName   "Monitor Vendor"
>         ModelName    "Monitor Model"
>         HorizSync   30-55
>         VertRefresh 50-120
>         Option "dpms"
> 
> 
> EndSection
> 
> Section "Device"
> 	# no known options
> 	Identifier   "NVIDIA GeForce 2 MX (generic)"
>         Driver       "nv"
>         VendorName   "NVIDIA GeForce 2 MX (generic)"
>         BoardName     "NVIDIA GeForce 2 MX (generic)"
> 
>         #BusID
> EndSection
> 
> Section "Screen"
> 	Identifier   "Screen0"
>         Device       "NVIDIA GeForce 2 MX (generic)"
>         Monitor      "Monitor0"
> 	DefaultDepth	16
> 
> 	Subsection "Display"
>         	Depth       16
>                 Modes       "1024x768" "800x600" "640x480"
> 	EndSubsection
> 
> EndSection
> 
> Section "DRI"
> 	Mode 0666
> EndSection
> 
> cmdline:
> auto BOOT_IMAGE=linux ro BOOT_FILE=/boot/vmlinuz-2.4.18-14 
> root=LABEL=/
> 
> 
> dma:
> 4: cascade
> 
> 
> interrupts:
> CPU0
>   0:     337647          XT-PIC  timer
>   1:       2694          XT-PIC  keyboard
>   2:          0          XT-PIC  cascade
>   5:          0          XT-PIC  usb-ohci, usb-ohci
>   8:          1          XT-PIC  rtc
>  12:         20          XT-PIC  PS/2 Mouse
>  14:      27338          XT-PIC  ide0
> NMI:          0
> ERR:          2
> 
> 
> partitions:
> major minor  #blocks  name     rio rmerge rsect ruse wio wmerge 
> wsect wuse running use aveq
> 
>    3     0   39121488 hda 2567 4181 52107 22201 1417 1941 26952 
> 45880 -2 329576 7788488
>    3     1    5245191 hda1 9 43 104 109 0 0 0 0 0 109 109
>    3     2          1 hda2 0 0 0 0 0 0 0 0 0 0 0
>    3     5   10490413 hda5 9 43 104 95 0 0 0 0 0 95 95
>    3     6   11695288 hda6 50 43 145 214 7 1 8 95 0 230 310
>    3     7    8104761 hda7 9 43 104 132 0 0 0 0 0 132 132
>    3     8    3277228 hda8 2475 3966 51498 21527 1410 1940 26944 
> 45785 0 22394 67314
>    3     9     305203 hda9 9 25 104 56 0 0 0 0 0 56 56
> 
> 
> 
> My e-mail addresses are: linux_guyus@yahoo.com and 
> linux_guyus@rediff.com
> My postal address: 80, Ahilya Nagar Ext. Annapurna Road, Indore, 
> M.P.,
> India. PIN 452009
> please answer me soon.
> 		Thanking you.
> 			Saurabh Khanna.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

mvg,
Alex

-- 
"[...] Konqueror open source project. Weighing in at less than
            one tenth the size of another open source renderer"
Apple,  Jan 2003 (http://www.apple.com/safari/)

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-25 23:10           ` Larry McVoy
@ 2003-01-26  8:12             ` David S. Miller
  0 siblings, 0 replies; 657+ messages in thread
From: David S. Miller @ 2003-01-26  8:12 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Eric W. Biederman, Jason Papadopoulos, linux-kernel, linux-mm

On Sat, 2003-01-25 at 15:10, Larry McVoy wrote:
> All good page coloring implementations do exactly that.  The starting
> index into the page buckets is based on process id.

I think everyone interested in learning more about this
topic should go read the following papers, they were very
helpful when I was fiddling around in this area.

These papers, in turn, reference several others which are
good reads as well.

1) W. L. Lynch, B. K. Bray, and M. J. Flynn. "The effect of page
   allocation on caches". In Micro-25 Conference Proceedings, pages
   222-225, December 1992. 

2) W. Lynch and M. Flynn. "Cache improvements through colored page
   allocation". ACM Transactions on Computer Systems, 1993. Submitted
   for review, 1992. 

3) William L. Lynch. "The Interaction of Virtual Memory and Cache
   Memory". PhD thesis, Stanford University, October
   1993. CSL-TR-93-587.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-25 17:47         ` Eric W. Biederman
@ 2003-01-25 23:10           ` Larry McVoy
  2003-01-26  8:12             ` David S. Miller
  0 siblings, 1 reply; 657+ messages in thread
From: Larry McVoy @ 2003-01-25 23:10 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Larry McVoy, Jason Papadopoulos, linux-kernel, linux-mm

> I am wondering if there is any point in biasing page addresses in between
> processes so that processes are less likely to have a cache conflict.
> i.e.  process 1 address 0 %16K == 0, process 2 address 0 %16K == 4K 

All good page coloring implementations do exactly that.  The starting
index into the page buckets is based on process id.
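
As a rough sketch of that scheme, in C (illustrative only: NUM_COLORS,
color_bucket and take_page() are assumed names, not code from any posted
patch):

#include <linux/list.h>
#include <linux/mm.h>
#include <linux/sched.h>

/* One free list ("bucket") per cache color; a page's color is its
 * physical page frame number modulo the number of colors. */
#define NUM_COLORS	16	/* assumed: one cache way's worth of pages */

static struct list_head color_bucket[NUM_COLORS];

/* Bias the first bucket tried by the pid, so two processes that both
 * touch virtual address 0 tend to end up in different cache sets. */
static struct page *color_alloc_page(struct task_struct *tsk)
{
	unsigned int start = tsk->pid % NUM_COLORS;
	unsigned int i;

	for (i = 0; i < NUM_COLORS; i++) {
		unsigned int color = (start + i) % NUM_COLORS;

		if (!list_empty(&color_bucket[color]))
			return take_page(&color_bucket[color]);	/* assumed helper */
	}
	return NULL;	/* fall back to the normal allocator */
}
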
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-25  2:26       ` Larry McVoy
@ 2003-01-25 17:47         ` Eric W. Biederman
  2003-01-25 23:10           ` Larry McVoy
  0 siblings, 1 reply; 657+ messages in thread
From: Eric W. Biederman @ 2003-01-25 17:47 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Jason Papadopoulos, linux-kernel, linux-mm

Larry McVoy <lm@bitmover.com> writes:

> > For the record, I finally got to try my own page coloring patch on a 1GHz
> > Athlon Thunderbird system with 256kB L2 cache. With the present patch, my
> > own number crunching benchmarks and a kernel compile don't show any benefit 
> > at all, and lmbench is completely unchanged except for the mmap latency, 
> > which is slightly worse. Hardly a compelling case for PCs!
> 
> If it works correctly then the variability in lat_ctx should go away.
> Try this
> 
> 	for p in 2 4 8 12 16 24 32 64
> 	do	for size in 0 2 4 8 16
> 		do	for i in 1 2 3 4 5 6 7 8 9 0
> 			do	lat_ctx -s$size $p
> 			done
> 		done
> 	done
> 
> on both the with and without kernel.  The page coloring should make the 
> numbers rock steady, without it, they will bounce a lot.

In the same vein, I have seen some tremendous variability in the
stream benchmark.  Under Linux I have gotten it to vary by as much
as 100MB/sec by running updatedb between runs.  In one case
it ran faster with updatedb running in the background.

But at the same time streams tends to be very steady if you have a quiet
machine and run it several times in a row repeatedly because it gets
allocated essentially the same memory every run.

So I do know the variables of cache contention have an effect on some
real programs.  I have not yet tracked it down to see if cache coloring
could be a benefit.  I suspect the buddy allocator actually comes
quite close most of the time, and tricks like allocating multiple pages
at once could improve that even more with very little effort, while reducing
page fault miss times.

I am wondering if there is any point in biasing page addresses in between
processes so that processes are less likely to have a cache conflict.
i.e.  process 1 address 0 %16K == 0, process 2 address 0 %16K == 4K 

Eric

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24  6:06   ` John Alvord
@ 2003-01-25  2:29     ` Jason Papadopoulos
  2003-01-25  2:26       ` Larry McVoy
  0 siblings, 1 reply; 657+ messages in thread
From: Jason Papadopoulos @ 2003-01-25  2:29 UTC (permalink / raw)
  To: linux-kernel, linux-mm

At 10:06 PM 1/23/03 -0800, John Alvord wrote:

>The big challenge in Linux is that several serious attempts to add
>page coloring have foundered on the shoals of "no benefit found". It
>may be that the typical hardware Linux runs on just doesn't experience
>the problem very much.

Another strike against page coloring is that it gives tremendous benefits
when caches are large and not very associative, but if both of these are
not present the benefits are much smaller. In the case of latter-day PCs,
neither of these is the case: the caches are very small and at least 8-way
set associative.

For the record, I finally got to try my own page coloring patch on a 1GHz
Athlon Thunderbird system with 256kB L2 cache. With the present patch, my
own number crunching benchmarks and a kernel compile don't show any benefit 
at all, and lmbench is completely unchanged except for the mmap latency, 
which is slightly worse. Hardly a compelling case for PCs!

Oh well. At least now I'll be able to port to 2.5 :)

jasonp

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-25  2:29     ` Jason Papadopoulos
@ 2003-01-25  2:26       ` Larry McVoy
  2003-01-25 17:47         ` Eric W. Biederman
  0 siblings, 1 reply; 657+ messages in thread
From: Larry McVoy @ 2003-01-25  2:26 UTC (permalink / raw)
  To: Jason Papadopoulos; +Cc: linux-kernel, linux-mm

> For the record, I finally got to try my own page coloring patch on a 1GHz
> Athlon Thunderbird system with 256kB L2 cache. With the present patch, my
> own number crunching benchmarks and a kernel compile don't show any benefit 
> at all, and lmbench is completely unchanged except for the mmap latency, 
> which is slightly worse. Hardly a compelling case for PCs!

If it works correctly then the variability in lat_ctx should go away.
Try this

	for p in 2 4 8 12 16 24 32 64
	do	for size in 0 2 4 8 16
		do	for i in 1 2 3 4 5 6 7 8 9 0
			do	lat_ctx -s$size $p
			done
		done
	done

on both the with and without kernel.  The page coloring should make the 
numbers rock steady, without it, they will bounce a lot.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24 19:14         ` David Lang
@ 2003-01-24 19:40           ` Maciej W. Rozycki
  0 siblings, 0 replies; 657+ messages in thread
From: Maciej W. Rozycki @ 2003-01-24 19:40 UTC (permalink / raw)
  To: David Lang; +Cc: Anoop J., linux-mm, linux-kernel

On Fri, 24 Jan 2003, David Lang wrote:

> the cache never sees the virtual addresses; it operates exclusively on the
> physical addresses, so the problem of aliasing never comes up.

 It depends on the implementation.

> virtual to physical address mapping is all resolved before anything hits
> the cache.

 It depends on the processor.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24  9:49       ` Anoop J.
@ 2003-01-24 19:14         ` David Lang
  2003-01-24 19:40           ` Maciej W. Rozycki
  0 siblings, 1 reply; 657+ messages in thread
From: David Lang @ 2003-01-24 19:14 UTC (permalink / raw)
  To: Anoop J.; +Cc: linux-mm, linux-kernel

the cache never sees the virtual addresses; it operates exclusively on the
physical addresses, so the problem of aliasing never comes up.

virtual to physical address mapping is all resolved before anything hits
the cache.

David Lang

On Fri, 24 Jan 2003, Anoop J. wrote:

> Date: Fri, 24 Jan 2003 15:19:16 +0530 (IST)
> From: Anoop J. <cs99001@nitc.ac.in>
> To: david.lang@digitalinsight.com
> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
> Subject: Re: your mail
>
> OK, I shall put it another way:
> since virtual indexing is a representation of the virtual memory,
> it is possible for multiple virtual addresses to represent the same
> physical address, so the problem of aliasing occurs in the cache.  Does page
> coloring guarantee a unique mapping of physical addresses?  If so, how is the
> mapping from virtual to physical address done?
>
>
>
> Thanks
>
>
>
> > I think this is a case of the same term being used for two different
> > purposes. I don't know the use you are referring to.
> >
> > David Lang
> >
> >
> >
>
>
>
>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24  8:48     ` David Lang
@ 2003-01-24  9:49       ` Anoop J.
  2003-01-24 19:14         ` David Lang
  0 siblings, 1 reply; 657+ messages in thread
From: Anoop J. @ 2003-01-24  9:49 UTC (permalink / raw)
  To: david.lang; +Cc: linux-mm, linux-kernel

OK, I shall put it another way:
since virtual indexing is a representation of the virtual memory,
it is possible for multiple virtual addresses to represent the same
physical address, so the problem of aliasing occurs in the cache.  Does page
coloring guarantee a unique mapping of physical addresses?  If so, how is the
mapping from virtual to physical address done?



Thanks



> I think this is a case of the same term being used for two different
> purposes. I don't know the use you are referring to.
>
> David Lang
>
>
>





^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24  6:28 ` your mail David Lang
@ 2003-01-24  8:51   ` Anoop J.
  2003-01-24  8:48     ` David Lang
  0 siblings, 1 reply; 657+ messages in thread
From: Anoop J. @ 2003-01-24  8:51 UTC (permalink / raw)
  To: david.lang; +Cc: linux-mm, linux-kernel

I read that the data coherency problem due to virtual indexing is avoided
through page coloring, and that it also gets the speed of physical indexing.
Can you just elaborate on how this is possible?


Thanks




> implementing a fully associative cache eliminates the need for page
> coloring, but it has to be implemented in hardware. if you don't have
> fully associative caches in your hardware page coloring helps avoid the
> worst case memory allocations.
>
> from what I have seen on the attempts to implement it the problem is
> that the calculations needed to do page colored allocations end up
> costing enough that they end up with a net loss compared to the old
> method.
>
> David Lang
>
>
>  On Fri, 24 Jan 2003, Anoop J.
> wrote:
>
>> Date: Fri, 24 Jan 2003 11:24:24 +0530 (IST)
>> From: Anoop J. <cs99001@nitc.ac.in>
>> To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
>>
>>
> >> How is this different from a fully associative cache?  It would be better
> >> if you could explain it based on the address bits used.
>>




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24  8:51   ` Anoop J.
@ 2003-01-24  8:48     ` David Lang
  2003-01-24  9:49       ` Anoop J.
  0 siblings, 1 reply; 657+ messages in thread
From: David Lang @ 2003-01-24  8:48 UTC (permalink / raw)
  To: Anoop J.; +Cc: linux-mm, linux-kernel

I think this is a case of the same term being used for two different
purposes. I don't know the use you are referring to.

David Lang


On Fri, 24 Jan 2003, Anoop J. wrote:

> I read that the data coherency problem due to virtual indexing is avoided
> through page coloring, and that it also gets the speed of physical indexing.
> Can you just elaborate on how this is possible?
>
>
> Thanks
>
>
>
>
> > implementing a fully associative cache eliminates the need for page
> > coloring, but it has to be implemented in hardware. if you don't have
> > fully associative caches in your hardware page coloring helps avoid the
> > worst case memory allocations.
> >
> > from what I have seen on the attempts to implement it the problem is
> > that the calculations needed to do page colored allocations end up
> > costing enough that they end up with a net loss compared to the old
> > method.
> >
> > David Lang
> >
> >
> >  On Fri, 24 Jan 2003, Anoop J.
> > wrote:
> >
> >> Date: Fri, 24 Jan 2003 11:24:24 +0530 (IST)
> >> From: Anoop J. <cs99001@nitc.ac.in>
> >> To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
> >>
> >>
> >> How is this different from a fully associative cache?  It would be better
> >> if you could explain it based on the address bits used.
> >>
>
>
>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24  5:54 Anoop J.
@ 2003-01-24  6:28 ` David Lang
  2003-01-24  8:51   ` Anoop J.
  0 siblings, 1 reply; 657+ messages in thread
From: David Lang @ 2003-01-24  6:28 UTC (permalink / raw)
  To: Anoop J.; +Cc: linux-mm, linux-kernel

implementing a fully associative cache eliminates the need for page
coloring, but it has to be implemented in hardware. If you don't have
fully associative caches in your hardware, page coloring helps avoid the
worst-case memory allocations.

From what I have seen of the attempts to implement it, the problem is that
the calculations needed to do page colored allocations end up costing
enough that they end up with a net loss compared to the old method.
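
To put rough numbers on the associativity point above (a sketch with assumed
cache parameters, not code from any of the patches being discussed):

/* The number of distinct page "colors" is the number of pages it takes to
 * cover one way of the cache.  With a fully associative cache there is only
 * one set, so every page can go anywhere and only one color exists. */
static unsigned long cache_colors(unsigned long cache_size,
				  unsigned long ways,
				  unsigned long page_size)
{
	unsigned long way_size = cache_size / ways;

	return way_size > page_size ? way_size / page_size : 1;
}

/* Example: a 256KB, 8-way cache with 4KB pages has 32KB per way, i.e. 8
 * colors; the same 256KB cache, fully associative, has just 1. */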

David Lang


 On Fri, 24 Jan 2003, Anoop J.
wrote:

> Date: Fri, 24 Jan 2003 11:24:24 +0530 (IST)
> From: Anoop J. <cs99001@nitc.ac.in>
> To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
>
>
> How is this different from a fully associative cache?  It would be better if you
> could explain it based on the address bits used.
>
> Thanks
>
> David Lang wrote:
>
> >The idea of page coloring is based on the fact that common implementations
> >of caching can't put any page in memory in any line in the cache (such an
> >implementation is possible, but is more expensive to do, so it is not commonly
> >done).
> >
> >With this implementation it means that if your program happens to use
> >memory that cannot be mapped to half of the cache lines then effectively
> >the CPU cache is half its rated size for your program. The next time your
> >program runs it may get a more favorable memory allocation and be able to
> >use all of the cache and therefore run faster.
> >
> >Page coloring is an attempt to take this into account when allocating
> >memory to programs so that every program gets to use all of the cache.
> >
> >David Lang
> >
> >
> > On Fri, 24 Jan 2003, Anoop J. wrote:
> >
> >>Date: Fri, 24 Jan 2003 10:38:03 +0530 (IST)
> >>From: Anoop J. <cs99001@nitc.ac.in>
> >>To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
> >>
> >>
> >>How does page coloring work?  I want its mechanism, not the implementation.
> >>I went through some pages of W. L. Lynch's paper on cache and VM. Still not
> >>able to grasp it.
> >>
> >>
> >>Thanks in advance
> >>
> >>
> >>
> >>-
> >>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >>the body of a message to majordomo@vger.kernel.org
> >>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>Please read the FAQ at  http://www.tux.org/lkml/
> >>
> >
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24  5:11 ` your mail David Lang
@ 2003-01-24  6:06   ` John Alvord
  2003-01-25  2:29     ` Jason Papadopoulos
  0 siblings, 1 reply; 657+ messages in thread
From: John Alvord @ 2003-01-24  6:06 UTC (permalink / raw)
  To: David Lang; +Cc: Anoop J., linux-kernel, linux-mm

The big challenge in Linux is that several serious attempts to add
page coloring have foundered on the shoals of "no benefit found". It
may be that the typical hardware Linux runs on just doesn't experience
the problem very much.

john


On Thu, 23 Jan 2003 21:11:10 -0800 (PST), David Lang
<david.lang@digitalinsight.com> wrote:

>The idea of page coloring is based on the fact that common implementations
>of caching can't put any page in memory in any line in the cache (such an
>implementation is possible, but is more expensive to do, so it is not commonly
>done).
>
>With this implementation it means that if your program happens to use
>memory that cannot be mapped to half of the cache lines then effectively
>the CPU cache is half its rated size for your program. The next time your
>program runs it may get a more favorable memory allocation and be able to
>use all of the cache and therefore run faster.
>
>Page coloring is an attempt to take this into account when allocating
>memory to programs so that every program gets to use all of the cache.
>
>David Lang
>
>
> On Fri, 24 Jan 2003, Anoop J. wrote:
>
>> Date: Fri, 24 Jan 2003 10:38:03 +0530 (IST)
>> From: Anoop J. <cs99001@nitc.ac.in>
>> To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
>>
>>
>> How does page coloring work?  I want its mechanism, not the implementation.
>> I went through some pages of W. L. Lynch's paper on cache and VM. Still not
>> able to grasp it.
>>
>>
>> Thanks in advance
>>
>>
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-24  5:08 Anoop J.
@ 2003-01-24  5:11 ` David Lang
  2003-01-24  6:06   ` John Alvord
  0 siblings, 1 reply; 657+ messages in thread
From: David Lang @ 2003-01-24  5:11 UTC (permalink / raw)
  To: Anoop J.; +Cc: linux-kernel, linux-mm

The idea of page coloring is based on the fact that common implementations
of caching can't put any page in memory in any line in the cache (such an
implementation is possible, but is more expensive to do, so it is not commonly
done).

With this implementation it means that if your program happens to use
memory that cannot be mapped to half of the cache lines then effectively
the CPU cache is half its rated size for your program. The next time your
program runs it may get a more favorable memory allocation and be able to
use all of the cache and therefore run faster.

Page coloring is an attempt to take this into account when allocating
memory to programs so that every program gets to use all of the cache.

David Lang


 On Fri, 24 Jan 2003, Anoop J. wrote:

> Date: Fri, 24 Jan 2003 10:38:03 +0530 (IST)
> From: Anoop J. <cs99001@nitc.ac.in>
> To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
>
>
> How does page coloring work?  I want its mechanism, not the implementation.
> I went through some pages of W. L. Lynch's paper on cache and VM. Still not
> able to grasp it.
>
>
> Thanks in advance
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2003-01-12 13:28 Philip K.F. Hölzenspies
@ 2003-01-13 16:37 ` Pete Zaitcev
  0 siblings, 0 replies; 657+ messages in thread
From: Pete Zaitcev @ 2003-01-13 16:37 UTC (permalink / raw)
  To: Philip K.F. Hölzenspies; +Cc: linux-kernel

> Linux version 2.4.20 (root@tomwaits) (gcc version 3.2) #1 SMP Sat Jan 11 18:46:51 CET 2003
> Intel MultiProcessor Specification v1.4
>     Virtual Wire compatibility mode.
>[...]
> PCI: Using IRQ router AMD768 [1022/7443] at 00:07.3
> PCI->APIC IRQ transform: (B1,I5,P0) -> 16
> PCI->APIC IRQ transform: (B2,I5,P0) -> 18

> PCI: Enabling device 02:08.2 (0014 -> 0016)
> PCI: No IRQ known for interrupt pin C of device 02:08.2. Probably buggy
> MP table.

I am sorry to say, I cannot help you. This is the department
of Manfred, most likely. The 95% bet is that your BIOS is crap,
and you have to poke ASUS. However, you might want to explore
a possibility of a bug. The best way to do that is to run the "mptable"
program to dump the table and then get someone who can make
sense of the data. Try to figure out who wrote the code
to support the AMD IRQ router. He may be the culprit (5%, but...)

 http://people.redhat.com/zaitcev/linux/mptable-2.0.15a-1.i386.rpm
 http://people.redhat.com/zaitcev/linux/mptable-2.0.15a-1.src.rpm

-- Pete

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-11-11 19:22 David Mosberger
@ 2002-11-12  1:39 ` Rik van Riel
  0 siblings, 0 replies; 657+ messages in thread
From: Rik van Riel @ 2002-11-12  1:39 UTC (permalink / raw)
  To: davidm; +Cc: Mario Smarduch, linux-ia64, linux-kernel

On Mon, 11 Nov 2002, David Mosberger wrote:
> >>>>> On Mon, 11 Nov 2002 10:29:29 -0600, Mario Smarduch <cms063@email.mot.com> said:
>
>   Mario> I know that on some commercial Unix systems there are ways to
>   Mario> cap the CPU utilization by user/group ids are there such
>   Mario> features/patches available on Linux?

> The kernel patches available from this URL are pretty old (up to
> 2.4.6, as far as I could see), and I'm not sure what the future plans
> for PRM on Linux are.  Perhaps someone else can provide more details.

I'm (slowly) working on a per-user fair scheduler on top of Ingo's
O(1) scheduler.  Slowly because it's a fairly complex thing.

Once that is done it should be possible to change the accounting
to other resource containers and generally have fun assigning
priorities, though that is beyond the scope of what I'm trying to
achieve.

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  <a href=mailto:"october@surriel.com">october@surriel.com</a>


^ permalink raw reply	[flat|nested] 657+ messages in thread

* RE: your mail
@ 2002-10-31 18:13 Bloch, Jack
  0 siblings, 0 replies; 657+ messages in thread
From: Bloch, Jack @ 2002-10-31 18:13 UTC (permalink / raw)
  To: 'Tom Bradley'; +Cc: linux-kernel

Thanks very much.

Jack Bloch 
Siemens ICN
phone                (561) 923-6550
e-mail                jack.bloch@icn.siemens.com


-----Original Message-----
From: Tom Bradley [mailto:tojabr@tojabr.com]
Sent: Thursday, October 31, 2002 1:00 PM
To: Bloch, Jack
Cc: linux-kernel@vger.kernel.org
Subject: Re: your mail


They are just regular values. The UL suffix tells the compiler to treat the
number as an unsigned long.


On Thu, 31 Oct 2002, Bloch, Jack wrote:

> I am looking at some sample driver code which shows the usage of some
> unsigned integers 1UL, 2UL, 4UL, 16UL, 64UL, 128UL and 256UL.  I need to
> know what these are defined as. Please excuse my ignorance.
>
> Please CC me directly on any responses.
>
> Jack Bloch
> Siemens ICN
> phone                (561) 923-6550
> e-mail                jack.bloch@icn.siemens.com
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-10-31 15:39 Bloch, Jack
@ 2002-10-31 18:00 ` Tom Bradley
  0 siblings, 0 replies; 657+ messages in thread
From: Tom Bradley @ 2002-10-31 18:00 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel

They are just regular values. The UL suffix tells the compiler to treat the
number as an unsigned long.
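
For example (a standalone illustration, not taken from the driver code in
question):

#include <stdio.h>

#define BIT6	(1UL << 6)	/* 64UL: constant and shift are unsigned long */

int main(void)
{
	unsigned long flags = 1UL | 4UL | 256UL;	/* ordinary values, just typed unsigned long */

	flags |= BIT6;
	printf("flags = %#lx\n", flags);	/* prints 0x145 */
	return 0;
}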


On Thu, 31 Oct 2002, Bloch, Jack wrote:

> I am looking at some sample driver code which shows the usage of some
> unsigned integers 1UL, 2UL, 4UL, 16UL, 64UL, 128UL and 256UL.  I need to
> know what these are defined as. Please excuse my ignorance.
>
> Please CC me directly on any responses.
>
> Jack Bloch
> Siemens ICN
> phone                (561) 923-6550
> e-mail                jack.bloch@icn.siemens.com
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-10-30 12:45 Roberto Fichera
@ 2002-10-30 14:04 ` Richard B. Johnson
  0 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2002-10-30 14:04 UTC (permalink / raw)
  To: Roberto Fichera; +Cc: linux-kernel

On Wed, 30 Oct 2002, Roberto Fichera wrote:

> I've a problem with a DAT on a Compaq Proliant ML350 with PIII 1GHz,
> 1Gb RAM, RAID controller Smart Array 451 with 3 x HDD 9Gb RAID 5
> and an internal SCSI controller Adaptec 7899 Ultra160 to which only a
> DAT 12/24 Gb is connected. The currently installed distribution is RH7.3 with its kernel
> 2.4.18-10, but I've tried the standard 2.4.19 with the same problem.
> The problem is that the DAT doesn't work any more with Linux. This DAT works
> well on Win2K :-(! Below there are some logs and a 'ps fax' showing a tar in
> D state.
> 
> Does anyone know a solution ?

> 
> Adaptec AIC7xxx driver version: 6.2.6
> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> Corrupted Serial EEPROM
^^^^^^^^^^^^^^^^^^^^^^^^^

I think your controller has fallen back into survival mode
because it lost its mind. You may want to upgrade the
controller BIOS to fix this problem. Then, see if it handles
tapes okay.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
   Bush : The Fourth Reich of America



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-10-18  2:47   ` Rusty Russell
@ 2002-10-18 21:50     ` Kai Germaschewski
  0 siblings, 0 replies; 657+ messages in thread
From: Kai Germaschewski @ 2002-10-18 21:50 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Daniel Phillips, S, Roman Zippel, linux-kernel

On Fri, 18 Oct 2002, Rusty Russell wrote:

> > I wonder if this new method is going to be mandatory (the only one
> > available) or optional. I think there are two different kinds of users, for
> > one modules which use an API which provides its own infrastructure for
> > dealing with modules via ->owner, on the other hand things like netfilter
> > (that's probably where you are coming from) where calls into a module,
> > which need protection are really frequent.
> 
> Mandatory for interfaces where the function can sleep (or be preempted).

and is not protected by other means (try_inc_mod_count()), I presume.

> > I see that your approach makes frequent calls into the module cheaper, but
> > I'm not totally convinced that the current safe interfaces need to change
> > just to accommodate rare cases like netfilter (there's most likely some
> > more cases like it, but the majority of modules is not).
> 
> They're not changing.  The current users doing try_inc_mod_count() are
> fine.  It's the ones not doing it which are problematic.

Alright, so I'm fine with it ;) (not that makes a difference, but...)

--Kai



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-10-17 14:56 ` your mail Kai Germaschewski
@ 2002-10-18  2:47   ` Rusty Russell
  2002-10-18 21:50     ` Kai Germaschewski
  0 siblings, 1 reply; 657+ messages in thread
From: Rusty Russell @ 2002-10-18  2:47 UTC (permalink / raw)
  To: Kai Germaschewski; +Cc: Daniel Phillips, S, Roman Zippel, linux-kernel

In message <Pine.LNX.4.44.0210170930410.6301-100000@chaos.physics.uiowa.edu> you write:
> Since I made the mistake of getting involved into this discussion lately,

My condolences. 8)

> I wonder if this new method is going to be mandatory (the only one
> available) or optional. I think there are two different kinds of users, for
> one modules which use an API which provides its own infrastructure for
> dealing with modules via ->owner, on the other hand things like netfilter
> (that's probably where you are coming from) where calls into a module,
> which need protection are really frequent.

Mandatory for interfaces where the function can sleep (or be preempted).

> Note that for the vast majority of modules, dealing with unload races is 
> as simple as setting ->owner, for example filesystems, network drivers.

Yes.  We do not have complete coverage though, this policy would
extend it.

> I see that your approach makes frequent calls into the module cheaper, but
> I'm not totally convinced that the current safe interfaces need to change
> just to accommodate rare cases like netfilter (there's most likely some
> more cases like it, but the majority of modules is not).

They're not changing.  The current users doing try_inc_mod_count() are
fine.  It's the ones not doing it which are problematic.

> Anyway, I may see further problems, but let me check first: Is your count
> supposed to only count users which are currently executing in the module's
> .text, or is it also to count references to data allocated in the module?
> (I.e. when I register_netdev(), does that keep a reference to the module
> even after the code has left the module's .text?)

It's to protect entry to the function, but of course, some interfaces
(eg. filesystems) lend themselves very neatly to batching this at
mount/unmount time.  Data is already protected by the usual means.

At risk of boring you, here's the document from the documentation
patch.  Suggestions welcome.

+Writing Modules and the Interfaces To Be Used By Them: A Gentle Guide.
+Copyright 2002, Rusty Russell IBM Corporation
+
+Modules are running parts of the kernel which can be added, and
+sometimes removed, while the kernel is operational.
+
+There are several delicate issues involved in this procedure which
+indicate special care should be taken.
+
+There are two cases where you need to be careful:
+
+1) Any code which creates an interface for callbacks (ie. almost any
+   function called register_*)
+	=> See Rule #1
+
+2) Any modules which use (old) interfaces which do not obey Rule #1
+	=> See Rule #2
+
+Rule #1: Module-safe Interfaces.  Any interface which allows
+	registration of callbacks, must also allow registration of a
+	"struct module *owner", either in the structure or as a
+	function parameter, and it must use them to protect the
+	callbacks.  See "MAKING INTERFACES SAFE".
+
+Exception #1: As an optimization, you may skip this protection if you
+	   *know* that the callbacks are non-preemptible and never
+	   sleep (eg. registration of interrupt handlers).
+
+
+Rule #2: Modules using unsafe interfaces.  If your module is using any
+	interface which does not obey rule number 1, that means your
+	module functions may be called from the rest of the kernel
+	without the caller first doing a successful try_module_get().
+
+	You must not register a "module_cleanup" handler, and your module
+	cannot be unloaded except by force.  You must be especially
+	careful in this case with initialization: see "INITIALIZING
+	MODULES WHICH USE UNSAFE INTERFACES".
+
+MAKING INTERFACES SAFE
+
+A caller must always call "try_module_get()" on a function pointer's
+owner before calling through that function pointer.  If
+"try_module_get()" returns 0 (false), the function pointer must *not*
+be called, and the caller should pretend that registration does not
+exist: this means the (module) owner is closing down and doesn't want
+any more calls, or in the process of starting up and isn't ready yet.
+
+For many interfaces, this can be optimized by assuming that a
+structure containing function pointers has the same owner, and knowing
+that one function is always called before the others, such as the
+filesystem code which knows a mount must succeed before any other
+methods can be accessed.
+
+You must call "module_put()" on the owner sometime after you have
+called the function(s).
+
+If you cannot make your interface module-safe in this way, you can at
+least split registration into a "reserve" stage and an "activate"
+stage, so that modules can use the interface, even if they cannot
+(easily) unload.
+
+
+INITIALIZING MODULES WHICH USE UNSAFE INTERFACES
+
+Safe interfaces will never enter your module before module_init() has
+successfully finished, but unsafe interfaces may.  The rule is simple:
+your init_module() function *must* succeed (by returning 0) if it has
+successfully used any unsafe interfaces.
+
+So, if you are only using ONE unsafe interface, simply use that
+interface last.  Otherwise you will have to use printk() to report
+failure and leave the module initialized (but possibly useless).
+
+
+
+If you have questions about how to apply this document to your own
+modules, please ask rusty@rustcorp.com.au or linux-kernel@vger.kernel.org.
+
+Thankyou,
+Rusty.
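
Concretely, the calling convention described under MAKING INTERFACES SAFE
would look roughly like this (a sketch only; struct example_ops and its hook
are made-up names, not part of the patch):

#include <linux/errno.h>
#include <linux/module.h>

struct example_ops {
	struct module *owner;
	int (*hook)(void);
};

/* Caller side: pin the owning module before calling through the pointer. */
static int call_hook(struct example_ops *ops)
{
	int ret = -ENODEV;

	if (try_module_get(ops->owner)) {	/* fails if the owner is going away */
		ret = ops->hook();
		module_put(ops->owner);		/* drop the reference afterwards */
	}
	return ret;
}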

--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-10-17  7:41 Rusty Russell
@ 2002-10-17 14:56 ` Kai Germaschewski
  2002-10-18  2:47   ` Rusty Russell
  0 siblings, 1 reply; 657+ messages in thread
From: Kai Germaschewski @ 2002-10-17 14:56 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Daniel Phillips, S, Roman Zippel, linux-kernel

On Thu, 17 Oct 2002, Rusty Russell wrote:

> > But that one is easy: the zero check just takes the same spinlock as 
> > TRY_INC_MOD_COUNT, then sets can't-increment only in the case the count
> > is zero, considerably simpler than:
> 
> The current spinlock is horrible.  You could use a brlock, of course,
> but I didn't mainly because of code bloat and speed.  My current code
> looks like:
> 
> static inline int try_module_get(struct module *module)
> {
> 	int ret = 1;
> 
> 	if (module) {
> 		unsigned int cpu = get_cpu();
> 		if (likely(module->ref[cpu].live))
> 			local_inc(&module->ref[cpu].counter);
> 		else
> 			ret = 0;
> 		put_cpu();
> 	}
> 	return ret;
> }

Since I made the mistake of getting involved in this discussion lately,
I wonder if this new method is going to be mandatory (the only one
available) or optional. I think there are two different kinds of users: on
the one hand, modules which use an API which provides its own infrastructure
for dealing with modules via ->owner; on the other hand, things like netfilter
(that's probably where you are coming from) where calls into a module
which need protection are really frequent.

Note that for the vast majority of modules, dealing with unload races is 
as simple as setting ->owner, for example filesystems, network drivers.
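
For anything exporting a file_operations table, for instance, that really is
just (an illustrative fragment):

#include <linux/fs.h>
#include <linux/module.h>

static struct file_operations mydrv_fops = {
	.owner	= THIS_MODULE,	/* lets the VFS pin this module while it is in use */
	/* .open, .read, .release, ... as usual */
};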

Sure, we need a global lock (unload_lock) when calling into these modules
initially, but these "binding/unbinding" calls are really rare. For
filesystems, they happen once per mount, for network drivers only for
ifconfig up/down. Afterwards, calling into the module (e.g. accessing the
mounted filesystem, xmitting/receiving data) doesn't have any overhead at
all compared to a linked-in filesystem/driver (well, ignore TLB misses)

I don't see a good reason to change this, in particular, since it provides 
useful information to the user, that is the mod_use_count. It means "Is it 
possible to successfully unload the module now?", and since looking at
the count and the actual unload is protected by unload_lock, the unload 
will either succeed basically immediately, or fail with -EBUSY right away.

I see that your approach makes frequent calls into the module cheaper, but
I'm not totally convinced that the current safe interfaces need to change
just to accommodate rare cases like netfilter (there's most likely some
more cases like it, but the majority of modules is not).

Anyway, I may see further problems, but let me check first: Is your count
supposed to only count users which are currently executing in the module's
.text, or is it also to count references to data allocated in the module?
(I.e. when I register_netdev(), does that keep a reference to the module
even after the code has left the module's .text?)

--Kai


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-10-14  6:28 Maros RAJNOCH /HiaeR Silvanna/
@ 2002-10-14 12:28 ` Dave Jones
  0 siblings, 0 replies; 657+ messages in thread
From: Dave Jones @ 2002-10-14 12:28 UTC (permalink / raw)
  To: Maros RAJNOCH /HiaeR Silvanna/; +Cc: linux-kernel

On Mon, Oct 14, 2002 at 08:28:28AM +0200, Maros RAJNOCH /HiaeR Silvanna/ wrote:
 > Linux version 2.4.2-2 (root@porky.devel.redhat.com) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-79)) #1 Sun Apr 8 20:41:30 EDT 2001

1, 2.4.2 is /very/ old, there are updated errata kernel packages at
    ftp.redhat.com
2, Bugs in Red Hat's kernel should be filed in http://bugzilla.redhat.com
   and not in linux-kernel.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-10-02 19:58 Mark Peloquin
@ 2002-10-02 20:19 ` jbradford
  0 siblings, 0 replies; 657+ messages in thread
From: jbradford @ 2002-10-02 20:19 UTC (permalink / raw)
  To: Mark Peloquin; +Cc: alan, linux-kernel

> On Wed, 2002-10-02 at 17:09, Alan Cox wrote:
> > Look at history - if such a mess got in, it would never get sorted.
> 
> Instead of throwing around vague statements with little
> context like "compost heap" and "such a mess", why don't
> you spell out the specific design points of EVMS that you
> disagree with. The advantages and disadvantages of
> each point can then be discussed.

Yeah, but he is right in any case - look how the IDE mess of 2.5.x, which, frankly, I don't believe was ever as bad as people seem to be saying it was, has put people off testing 2.5.x.  Instead they are waiting for Linus to type

mv linux-2.5.x linux-2.6.0

at which point they think that all remaining bugs will auto-magically correct themselves and the tree is once again safe to use.  WRONG answer!

Simply from the point of view of not wanting to 'scare off' people from a whole tree (which is so ridiculous, I think I'll go and patent it), and as a result get less testing, we're better off trying to stop weirdness from getting in.

John.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-10-02 12:41 s.stoklossa
@ 2002-10-02 12:51 ` Sam Ravnborg
  0 siblings, 0 replies; 657+ messages in thread
From: Sam Ravnborg @ 2002-10-02 12:51 UTC (permalink / raw)
  To: s.stoklossa; +Cc: mec, linux-kernel

On Wed, Oct 02, 2002 at 02:41:42PM +0200, s.stoklossa@mentopolis.de wrote:
> When trying to bring up the ALSA configuration, the following error message appeared:
> 
>  Q> ./scripts/Menuconfig: MCmenu74: command not found
> 
> regards
> 
> Sven
Known problem, try this patch:
It is copy-n-pasted, so it may not apply cleanly; try it by hand.
PS: Please write in English next time.

        Sam

--- linux/sound/Config.in       2002-10-01 12:09:44.000000000 +0200
+++ linux/sound/Config.in       2002-10-01 12:21:05.000000000 +0200
@@ -31,10 +31,7 @@
 if [ "$CONFIG_SND" != "n" -a "$CONFIG_ARM" = "y" ]; then
   source sound/arm/Config.in
 fi
-if [ "$CONFIG_SND" != "n" -a "$CONFIG_SPARC32" = "y" ]; then
-  source sound/sparc/Config.in
-fi
-if [ "$CONFIG_SND" != "n" -a "$CONFIG_SPARC64" = "y" ]; then
+if [ "$CONFIG_SND" != "n" -a "$CONFIG_SPARC32" = "y" ] || [ "$CONFIG_SND" !=
"n" -a "$CONFIG_SPARC64" = "y" ] ; then
   source sound/sparc/Config.in
 fi

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-09-21  5:32 Greg KH
@ 2002-09-23 18:35 ` Patrick Mochel
  0 siblings, 0 replies; 657+ messages in thread
From: Patrick Mochel @ 2002-09-23 18:35 UTC (permalink / raw)
  To: Rhoads, Rob
  Cc: Greg KH, linux-kernel, hardeneddrivers-discuss, cgl_discussion


In general, I agree completely with what Greg says (as usual), but I do 
have a few additional comments.

> (I'll skip the intro, and feel good sections and get into the details
> that you lay out, starting in section 2)
> 
> Section 2:
> 2.1:
> 	- do NOT use /proc for driver info.  Use driverfs.
> 	- If you are using a kernel version that does not have driverfs,
> 	  put all /proc driver info under /proc/drivers, which is where
> 	  it belongs.

Actually, they mention using driverfs in Section 3: Instrumentation. I 
can't tell if this was around before, or this was just added. The date is 
the same (16 Aug), but there is no changelog information about the spec. 

I would suggest not using procfs at all, even if driverfs is not available.
If you're using 2.4, backport driverfs, or clone it for your own
filesystem. It's not dependent on the driver model at all, and has been
done at least once before (Greg's pcihpfs).

> Section 3:

> The Common Statistic Manager:

Please drop the term 'Manager' from your nomenclature. It is ambiguous, 
because of the context in which it's generally used. Windows uses the
term for any collection of kernel or device data and/or kernel policy. 
It's not a bad term, but it fails to make a clear distinction between 
kernel space and user space, which we insist on. 

Only the mechanism for setting the policy should exist in the kernel, and
itself may be very intelligent. But, the policy itself should exist outside
of the kernel.


> 3.2.5.2:
> (I'm not condoning ANY of these functions or code, just trying to point out how
> they should, if they were to be in the kernel, be done properly.)
> 	- do not use typedef
> 	- struct stat_info does not need *unit, as that is already
> 	  specified in the scale field, right?
> 	- the stat_value_t union is just a horrible abomination, don't
> 	  do that.

Please do not pass void *. You should only pass type-safe structures. If 
you cannot get that information, you should redesign the API. 
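
In other words, prefer the second form below to the first (illustrative
prototypes only, not the spec's actual API):

/* Opaque: callers can pass anything; mistakes only show up at runtime. */
int register_stat(const char *name, void *data);

/* Type-safe: the structure documents, and the compiler checks, what is passed. */
struct sample_stat {
	const char	*name;
	unsigned int	scale;
	unsigned long	value;
};
int register_sample_stat(struct sample_stat *stat);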

> 3.4 Event logging:
> 	- I'm not even going to touch this, sorry.

There are a lot of topics in this spec, most of which are irrelevant to 
actually hardening drivers. They may be features dependent on your APIs, 
but they are completely optional and may hinder acceptance of your primary 
objectives. 

Event logging is definitely one of them, esp. with a function like

evl_log_event_string(  
	ME_EVENT_BUCKET_EMPTY, 
	LOG_WARNING, 
	"Leaky bucket exception (bucket empty):\ 
	Bucket_Level <= Observed_Value - Last_Value\ 
	|%s=%s|%s=%s|%s=%s|%s=%s|%s=%s|%s=%s|%s=%s|%s=%s\ 
	|%s=%u|%s=%u|%s=%u|%s=%u|%s=%u|%s=%u|", 
	RMGT_FacilityIDAttrStr,         RMUUID, 
	RMGT_SubsystemIDAttrStr,    SUBSYSTEM_UUID, 
	RMGT_SubsystemNameAttrStr,  subsystem_name, 
	RMGT_ResourceIDAttrStr,         resource_id, 
	RMGT_ResourceNameAttrStr,   resource_name, 
	ME_MonitorIDAttrStr,        monitor_uuid,  
	ME_StatisticIDAttrStr,         statistic_id, 
	ME_StatisticNameAttrStr,    statistic_name,  
	ME_BucketSizeAttrStr,       bucketsz,  
	ME_FillValueAttrStr,            fillval, 
	ME_FillIntervalAttrStr,         fillint, 
	ME_BucketLevelAttrStr,      bucketlvl, 
	ME_ObservedValueAttrStr,    obsval, 
	ME_LastValueAttrStr,        lastval); 


> In summary, I think that a lot of people have spent a lot of time in
> creating this document, and the surrounding code that matches this
> document.  I really wish that a tiny bit of that effort had gone into
> contacting the Linux kernel development community, and asking to work
> with them on a project like this.  Due to that not happening, and by
> looking at the resultant spec and code, I'm really afraid the majority
> of that time and effort will have been wasted.

I completely agree. There is definitely good intention in some aspects of 
the spec, and definitely in the effort put forth to support this type of 
work. But, in order to gain the support of kernel developers, or even the 
blessing of a few, you should be working with them on the design from the 
beginning.

Designing APIs is hard. Doing it well is very hard. I'm not claiming I've 
done a stellar job, but I have at least learned that. I've made a lot of 
poor design decisions, many of which are also evident in your code 
descriptions and examples. I can't tell you how many times I've rewritten 
things over and over and over because someone hated them (usually Linus or 
Greg).

There are people that are willing to help, as we are trying to do. But,
it's much easier if you do things gradually and get that help from the
beginning.

> What do I think can be salvaged?  Diagnostics are a good idea, and I
> think they fit into the driver model in 2.5 pretty well.  A lot of
> kernel janitoring work could be done by the CG team to clean up, and
> harden (by applying the things in section 2) the existing kernel
> drivers.  That effort alone would go a long way in helping the stability
> of Linux, and also introduce the CG developers into the kernel community
> as active, helping developers.  It would allow the CG developers to
> learn from the existing developers, as we must be doing something right
> for Linux to be working as well as it does :)

Which kernel are you targeting? I didn't see it in the spec, though I 
could have easily missed it. CGL is based on 2.4, so I would assume that. 
But, I would think the ideal choice would be to start in 2.5 and backport 
it to 2.4. 

If that's the case, how do you intend to work with the driver model? 
There will be quite a bit of code and interface duplication between
your code and the driver model. I can see ways to support many of the
things you want in a relatively easy manner, and not punish the common
user or developer; but the margin is too small to write the answer... ;)

Also, there are many projects in areas similar to what you're doing:
diagnostics, HA, etc. It would be nice to see some collaboration
between the developers of those projects instead of having many disparate 
projects with similar goals. 


	-pat







^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-09-14 12:39 Paolo Ciarrocchi
@ 2002-09-14 17:05 ` Rik van Riel
  0 siblings, 0 replies; 657+ messages in thread
From: Rik van Riel @ 2002-09-14 17:05 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: linux-kernel, conman

On Sat, 14 Sep 2002, Paolo Ciarrocchi wrote:

> I think that only the _memload_ test is not
> working with 2.5.*, am I wrong?

You're right, the memload test doesn't work with 2.5 but
needs the following patch...

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/

Spamtraps of the month:  september@surriel.com trac@trac.org


--- contest-0.1/mem_load.c.orig	2002-09-13 23:36:47.000000000 -0400
+++ contest-0.1/mem_load.c	2002-09-14 11:10:07.000000000 -0400
@@ -47,24 +47,25 @@
   switch (type) {

   case 0: /* RAM */
-    if ((position = strstr(buffer, "Mem:")) == (char *) NULL) {
-      fprintf (stderr, "Can't parse \"Mem:\" in /proc/meminfo\n");
+    if ((position = strstr(buffer, "MemTotal:")) == (char *) NULL) {
+      fprintf (stderr, "Can't parse \"MemTotal:\" in /proc/meminfo\n");
       exit (-1);
     }
-    sscanf (position, "Mem:  %ul", &size);
+    sscanf (position, "MemTotal:  %ul", &size);
     break;

   case 1:
-    if ((position = strstr(buffer, "Swap:")) == (char *) NULL) {
-      fprintf (stderr, "Can't parse \"Swap:\" in /proc/meminfo\n");
+    if ((position = strstr(buffer, "SwapTotal:")) == (char *) NULL) {
+      fprintf (stderr, "Can't parse \"SwapTotal:\" in /proc/meminfo\n");
       exit (-1);
     }
-    sscanf (position, "Swap: %ul", &size);
+    sscanf (position, "SwapTotal: %ul", &size);
     break;

   }

-  return (size / MB);
+  /* convert from kB to MB */
+  return (size / KB);

 }

--- contest-0.1/mem_load.h.orig	2002-09-14 11:09:28.000000000 -0400
+++ contest-0.1/mem_load.h	2002-09-14 11:09:42.000000000 -0400
@@ -24,6 +24,7 @@

 #define MAX_BUF_SIZE 1024          /* size of /proc/meminfo in bytes */
 #define MB (1024 * 1024)           /* 2^20 bytes */
+#define KB 1024
 #define MAX_MEM_IN_MB (1024 * 64)  /* 64 GB */

 /* Tuning parameter.  Increase if you are getting an 'unreasonable' load


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <200208312335.g7VNZmk37659@sullivan.realtime.net>
@ 2002-09-01  9:53 ` Krzysiek Taraszka
  0 siblings, 0 replies; 657+ messages in thread
From: Krzysiek Taraszka @ 2002-09-01  9:53 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-kernel

On Sat, 31 Aug 2002, Milton Miller wrote:

> At Fri Aug 30 2002 - 12:54:37 EST Krzysiek Taraszka (dzimi@pld.org.pl) wrote:
> > Great work, but in 2.2.22rc2 powerpc's still broken.
> > First of all, the sources have got a lot of unused stuff.
> > For example, it looks like this:
> > 
> > [dzimi@cyborg linux]$ rgrep -n -R '*.*' 'CONFIG_PPC64' . 
> ...
> 
> Doesn't sound like -rc (release candidate) changes.

Well yes, in 2.2.10 someone tried to add CONFIG_PPC64 support into the 2.2
kernel.
In 2.2.11 someone added CONFIG_PPC64 to Config.in! But in 2.2.12 or
2.2.13 someone removed it ...
(without removing it from directories != arch/ppc/kernel/ )
 
> > Second, kernel 2.2.21 still has time init problems in the symbios driver
> > on the powerpc platform.
> > I sent you my ugly hack which works, but IMHO it's ugly ;) I need to do
> > it correctly.
> 
> > Third, the kernel for powerpc boots and works on a g3-266, but on a g3-333 it oopses ...
> > (kernel traps, the kernel wrote: Caused by SRR1 or something like that; in 2.3
> > I saw a #define FIX_SRR1 macro ...)
> 
> Well, SRR1 doesn't cause traps, but it does help tell you why they occurred.
> And the FIX_SRR1 stuff isn't the solution either if you look at it closer.
> How about a decoded oops?  Also, you didn't say what platform you were using.

I used a g3 (pmac). My base system was PLD with the 2.4.18 tree.
I used gcc-2.95.4 to build 2.2.21 vmlinux.

> As far as the open-pic changes you posted, how about explaining what you're
> trying to fix (partly hidden by the rename and move to chrp_setup.c from
> open_pic.c)?

I tried to fix a problem which occurs on my IBM RS/6000 (model b50).
Openpic can't properly initialize my SCSI system (sym82c8xx SCSI
driver). Sometimes there are init problems.

Oh, I forgot: 2.2.22rc1/2, or any kernel >= 2.2.16 (2.2 tree), didn't work on
my IBM RS/6000 (B50).
A build with egcs works, but it runs slowly (BogoMIPS: 16 MHz!) and won't
reboot or "shutdown -h now".
The same code built with gcc oopses (Kernel Exception, looks like an openpic
allocation address).
I'll post the oops later.

> I see you are wrapping the 8259 checks, but it also refers to a few new
> functions/macros I didn't see defined.

Hmm, yes, that's why my patch is ugly. I want to do this correctly.
 
> How about discussing these problems and patches over at
> linuxppc-dev@lists.linuxppc.org ? (I set the reply-to there).

OK, but first of all I should subscribe there.

Krzysiek Taraszka			(dzimi@pld.org.pl)



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-30 18:43 Bloch, Jack
  2002-08-30 18:55 ` your mail Matthew Dharm
  2002-08-30 19:22 ` Andreas Dilger
@ 2002-08-31  0:12 ` David Woodhouse
  2 siblings, 0 replies; 657+ messages in thread
From: David Woodhouse @ 2002-08-31  0:12 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Bloch, Jack, linux-kernel



adilger@clusterfs.com said:
>  I would instead suggest using a filesystem like JFFS2 for flash
> devices. This is journaled like ext3, but it also has the benefit of
> doing wear levelling on the device, which otherwise will probably wear
> out the superblock part of the flash rather quickly. 

He said he's using CompactFlash. CompactFlash is not flash, as far as we're
concerned: it is an IDE drive. You may think it has flash inside it; we
couldn't possibly comment.

In fact, it generally has a kind of pseudo-filesystem internally which it 
uses to emulate a block device with 512-byte sectors. It may do its own 
wear-levelling; the manufacturers are often quite cagey about whether it 
actually does or not. Draw your own conclusions about that if you will.

It's quite common to find that this internal pseudo-filesystem _itself_ gets
screwed on power failures. This tends to manifest itself as unrecoverable 
I/O errors.

There is no fundamental reason why every CF card should have these 
problems, in the same way as there is no fundamental reason why all PC 
BIOSes should be crap. But the same expectations apply.

If you want to pass power-fail testing, I would recommend you switch to
using real flash. JFFS2 on real flash has survived days of stress testing
whilst being power cycled randomly every ~5 minutes. The same tests were 
observed to destroy CF cards¹.

CF is bog-roll technology. It's disposable storage designed for temporary
use in stuff like cameras -- not for real computing. Think of it like a
floppy disc and you won't go far wrong.

--
dwmw2
¹ http://www.embeddedlinuxworks.com/articles/jffs_guide.html²
² Constant reboots no longer screw the wear levelling, as reported there.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-30 18:43 Bloch, Jack
  2002-08-30 18:55 ` your mail Matthew Dharm
@ 2002-08-30 19:22 ` Andreas Dilger
  2002-08-31  0:12 ` David Woodhouse
  2 siblings, 0 replies; 657+ messages in thread
From: Andreas Dilger @ 2002-08-30 19:22 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel

On Aug 30, 2002  14:43 -0400, Bloch, Jack wrote:
> I have an embedded system runing a 2.4.18-3 Kernel. It runs from a 256MB
> compact flash disk (emulates an IDE interface). I am using an EXT2
> filesystem. During some power-off/power-on testing, the disk check failed.
> It dropped me to a shell and I had to run e2fsck -cfv to correct this
> problem. This is all good and well in a lab environment, but in reality,
> there is nobody there to perform the repair (running system is not equipped
> with keyboard and monitor). Is there any way to invoke e2fsck automatically
> or inhibit the failure detection mechanism? Please CC me directly on any
> responses.

I would instead suggest using a filesystem like JFFS2 for flash devices.
This is journaled like ext3, but it also has the benefit of doing wear
levelling on the device, which otherwise will probably wear out the
superblock part of the flash rather quickly.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-30 18:43 Bloch, Jack
@ 2002-08-30 18:55 ` Matthew Dharm
  2002-08-30 19:22 ` Andreas Dilger
  2002-08-31  0:12 ` David Woodhouse
  2 siblings, 0 replies; 657+ messages in thread
From: Matthew Dharm @ 2002-08-30 18:55 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1541 bytes --]

I would simply recommend switching to ext3, where these types of errors
generally don't occur.

Oh, and if you just edit your initscripts, you can do anything you want.

Matt

On Fri, Aug 30, 2002 at 02:43:52PM -0400, Bloch, Jack wrote:
> I have an embedded system runing a 2.4.18-3 Kernel. It runs from a 256MB
> compact flash disk (emulates an IDE interface). I am using an EXT2
> filesystem. During some power-off/power-on testing, the disk check failed.
> It dropped me to a shell and I had to run e2fsck -cfv to correct this
> problem. This is all good and well in a lab environment, but in reality,
> there is nobody there to perform the repair (running system is not equipped
> with keyboard and monitor). Is there any way to invoke e2fsck automatically
> or inhibit the failure detection mechanism? Please CC me directly on any
> responses.
> 
> 
> Thanks in advance....
> 
> Jack Bloch 
> Siemens ICN
> phone                (561) 923-6550
> e-mail                jack.bloch@icn.siemens.com
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Matthew Dharm                              Home: mdharm-usb@one-eyed-alien.net 
Maintainer, Linux USB Mass Storage Driver

My mother not mind to die for stoppink Windows NT!  She is rememberink 
Stalin!
					-- Pitr
User Friendly, 9/6/1998

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-27 18:22 Steffen Persvold
@ 2002-08-27 19:27 ` Willy Tarreau
  0 siblings, 0 replies; 657+ messages in thread
From: Willy Tarreau @ 2002-08-27 19:27 UTC (permalink / raw)
  To: Steffen Persvold; +Cc: linux-kernel

On Tue, Aug 27, 2002 at 08:22:03PM +0200, Steffen Persvold wrote:
 
> I have an idea that this happens because the packets are comming out of 
> order into the receiving node (i.e the bonding device is alternating 
> between each interface when sending, and when the receiving node gets the 
> packets it is possible that the first interface get packets number 0, 2, 
> 4 and 6 in one interrupt and queues it to the network stack before packet 
> 1, 3, 5 is handled on the other interface).

You have put your finger on exactly this common problem.
You can use the XOR bonding mode (modprobe bonding mode=2), which uses a
hash of MAC addresses to select the outgoing interface. This is interesting
if you have lots of L2 hosts on the same network switch.

Or, if you have only a few hosts on the same switch, you'd better use the
"nexthop" parameter of "ip route". IIRC, it should be something like:
  ip route add <destination> nexthop dev eth0 nexthop dev eth1
but read the help, I'm not certain.

Cheers,
Willy


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-23 16:12     ` Bill Unruh
  2002-08-23 20:33       ` Mike Dresser
@ 2002-08-25  2:05       ` Mike Dresser
  1 sibling, 0 replies; 657+ messages in thread
From: Mike Dresser @ 2002-08-25  2:05 UTC (permalink / raw)
  To: Bill Unruh; +Cc: linux-ppp, linux-kernel

On Fri, 23 Aug 2002, Bill Unruh wrote:

> You could try running the little program I got basically from Carlson in
> http://axion.physics.ubc.ca/modem-chk.html
> to try resetting the serial line befor the next attempt (eg, put it into
> /etc/ppp/ip-down).
> Not sure if this is the problem however.

It died again.

I'm going to go out there and swap out the modem for a different model if
I have one.  If that doesn't fix it, I'll get that VIA garbage out of the
system and replace it with a proper Intel 815 based motherboard.

Mike


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-23 16:12     ` Bill Unruh
@ 2002-08-23 20:33       ` Mike Dresser
  2002-08-25  2:05       ` Mike Dresser
  1 sibling, 0 replies; 657+ messages in thread
From: Mike Dresser @ 2002-08-23 20:33 UTC (permalink / raw)
  To: Bill Unruh; +Cc: linux-ppp, linux-kernel

On Fri, 23 Aug 2002, Bill Unruh wrote:

>
> OK, that problem is usually a "hardware" problem-- ie the hardware is
> not responding properly to the icotl request. This could be because
> there is not hardware there (eg trying to open a serial port which does
> not exist on the machine), or is busy, or has been left in some weird
> state. The last sounds most likely here-- eg the serial port on your
> modem thinks it is still busy.
>
> You could try running the little program I got basically from Carlson in
> http://axion.physics.ubc.ca/modem-chk.html
> to try resetting the serial line befor the next attempt (eg, put it into
> /etc/ppp/ip-down).
> Not sure if this is the problem however.

Another 7 minutes, and I'll know if this worked or not.

Another data point I just thought of: if I poff chatham and then pon
chatham, that actually works.

It just hung up.

And redialed.

And connected properly.

Thank you so very much, it looks like your reset-serial did the job.

I'll implement it on future machines, just in case the same problem
happens, rather than pray it works.

I saw a lot of postings on the 5160 USR modem on the serial-pci-info list,
perhaps it's something to do with this modem.

I'll know for sure at 10:30 this evening whether it is definitely working or
not.  I was logged in on the other line to monitor the syslog and bring
up the internet line, just in case.

Thanks again,

Mike


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-23 15:26   ` Mike Dresser
@ 2002-08-23 16:12     ` Bill Unruh
  2002-08-23 20:33       ` Mike Dresser
  2002-08-25  2:05       ` Mike Dresser
  0 siblings, 2 replies; 657+ messages in thread
From: Bill Unruh @ 2002-08-23 16:12 UTC (permalink / raw)
  To: Mike Dresser; +Cc: linux-ppp, linux-kernel


OK, that problem is usually a "hardware" problem -- i.e. the hardware is
not responding properly to the ioctl request. This could be because
there is no hardware there (e.g. trying to open a serial port which does
not exist on the machine), or it is busy, or it has been left in some weird
state. The last sounds most likely here -- e.g. the serial port on your
modem thinks it is still busy.

You could try running the little program I got basically from Carlson at
http://axion.physics.ubc.ca/modem-chk.html
to try resetting the serial line before the next attempt (e.g., put it into
/etc/ppp/ip-down).
Not sure if this is the problem, however.

On Fri, 23 Aug 2002, Mike Dresser wrote:

> On Fri, 23 Aug 2002, Bill Unruh wrote:
>
> > Well, it would be good if you actually told us what problem you were
> > describing. Is this a new connection attempt after the first hang up?
> > What?
> >
> > What repeats over and over-- I see no repeat.
>
> I >
> > You also do not tell us info like what kind of modem is this-- external,
> > internal, serial, usb, pci, winmodem,....
> >
> > I assume what you are refering to is the "inappropriate ioctl" line.
> > This indicates a hardware problem.
> >
> > Actually, it looks to me like another pppd is up on the line. Those
> > EchoReq are another pppd receiving stuff on an open pppd on another
> > line. More information on what it is you are trying to do, on what your
> > system is, and what the problem is might get you help.
> >
>
> Sorry.
>
> It's a new connection from the persist option.  The exact same message
> repeats for every dial out it attempts.
>
> It's a PCI 3com 56k Sportster.  It's a hardware modem.
>
> There is sometimes another pppd up on ttys1
>
> Here's the setup:
>
> There is an external modem on ttyS01, irq 3, that dials in occasionally as
> needed.
>
> there is an internal PCI modem on ttyS04, irq 5, that dials in permamently
> to the ISP.
>
> Every 6 hours, the ISP enforces the 6 hour hangup rule they have.
>
> The modem is set to persist, max-fails 0.  It is not able to redial, and
> keeps giving the error message that i pasted.
>
> Under 2.2.x, this functioned properly.
>
> System is a VIA VT82C693A/694x [Apollo PRO133x] based motherboard, from
> Giga-byte, if I remember correctly.  Celeron 533.
>
> Sorry about the too brief error message, I fell into my "it makes sense to
> me the way it is" trap.
>
> Mike
>
>

-- 
William G. Unruh        Canadian Institute for          Tel: +1(604)822-3273
Physics&Astronomy          Advanced Research            Fax: +1(604)822-5324
UBC, Vancouver,BC        Program in Cosmology           unruh@physics.ubc.ca
Canada V6T 1Z1               and Gravity           www.theory.physics.ubc.ca/
For step by step instructions about setting up ppp under Linux, see
            http://www.theory.physics.ubc.ca/ppp-linux.html


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-23 15:12 ` your mail Bill Unruh
@ 2002-08-23 15:26   ` Mike Dresser
  2002-08-23 16:12     ` Bill Unruh
  0 siblings, 1 reply; 657+ messages in thread
From: Mike Dresser @ 2002-08-23 15:26 UTC (permalink / raw)
  To: Bill Unruh; +Cc: linux-ppp, linux-kernel

On Fri, 23 Aug 2002, Bill Unruh wrote:

> Well, it would be good if you actually told us what problem you were
> describing. Is this a new connection attempt after the first hang up?
> What?
>
> What repeats over and over-- I see no repeat.

I >
> You also do not tell us info like what kind of modem is this-- external,
> internal, serial, usb, pci, winmodem,....
>
> I assume what you are refering to is the "inappropriate ioctl" line.
> This indicates a hardware problem.
>
> Actually, it looks to me like another pppd is up on the line. Those
> EchoReq are another pppd receiving stuff on an open pppd on another
> line. More information on what it is you are trying to do, on what your
> system is, and what the problem is might get you help.
>

Sorry.

It's a new connection from the persist option.  The exact same message
repeats for every dial out it attempts.

It's a PCI 3com 56k Sportster.  It's a hardware modem.

There is sometimes another pppd up on ttyS1.

Here's the setup:

There is an external modem on ttyS01, irq 3, that dials in occasionally as
needed.

There is an internal PCI modem on ttyS04, irq 5, that dials in permanently
to the ISP.

Every 6 hours, the ISP enforces the 6 hour hangup rule they have.

The modem is set to persist, maxfail 0.  It is not able to redial, and
keeps giving the error message that I pasted.

Under 2.2.x, this functioned properly.

System is a VIA VT82C693A/694x [Apollo PRO133x] based motherboard, from
Giga-byte, if I remember correctly.  Celeron 533.

Sorry about the too-brief error message; I fell into my "it makes sense to
me the way it is" trap.

Mike


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-23 14:45 Mike Dresser
@ 2002-08-23 15:12 ` Bill Unruh
  2002-08-23 15:26   ` Mike Dresser
  0 siblings, 1 reply; 657+ messages in thread
From: Bill Unruh @ 2002-08-23 15:12 UTC (permalink / raw)
  To: Mike Dresser; +Cc: linux-ppp, linux-kernel

Well, it would be good if you actually told us what problem you were
describing. Is this a new connection attempt after the first hang up?
What?

What repeats over and over-- I see no repeat.

You also do not tell us info like what kind of modem is this-- external,
internal, serial, usb, pci, winmodem,....

I assume what you are referring to is the "inappropriate ioctl" line.
This indicates a hardware problem.

Actually, it looks to me like another pppd is up on the line. Those
EchoReq are another pppd receiving stuff on an open pppd on another
line. More information on what it is you are trying to do, on what your
system is, and what the problem is might get you help.


On Fri, 23 Aug 2002, Mike Dresser wrote:

> I'm having problems with pppd under 2.4.19, with pppd 2.4.1
>
> I can establish a new connection, and no problems.  But once the ISP on
> the other end hangs up, this is what i get in my syslog.
> Repeats over and over.  I saw a few google postings about this, but those
> were back in _1999_, so I would think they were fixed by now.
>
> Doesn't matter if PPP is compiled in with the kernel, or modules.
>
> I'm running Debian 3.0(woody)
>
> This worked under Debian 2.2 and kernel 2.2.21
>
> Aug 23 10:25:55 tilburybackup chat[9825]: abort on (BUSY)
> Aug 23 10:25:55 tilburybackup chat[9825]: abort on (NO CARRIER)
> Aug 23 10:25:55 tilburybackup chat[9825]: abort on (VOICE)
> Aug 23 10:25:55 tilburybackup chat[9825]: abort on (NO DIALTONE)
> Aug 23 10:25:55 tilburybackup chat[9825]: abort on (NO DIAL TONE)
> Aug 23 10:25:55 tilburybackup chat[9825]: abort on (NO ANSWER)
> Aug 23 10:25:55 tilburybackup chat[9825]: send (ATZ^M)
> Aug 23 10:25:55 tilburybackup chat[9825]: expect (OK)
> Aug 23 10:25:55 tilburybackup chat[9825]: ATZ^M^M
> Aug 23 10:25:55 tilburybackup chat[9825]: OK
> Aug 23 10:25:55 tilburybackup chat[9825]:  -- got it
> Aug 23 10:25:55 tilburybackup chat[9825]: send (ATDT3806600^M)
> Aug 23 10:25:55 tilburybackup chat[9825]: expect (CONNECT)
> Aug 23 10:25:55 tilburybackup chat[9825]: ^M
> Aug 23 10:26:11 tilburybackup pppd[9804]: rcvd [LCP EchoReq id=0x4 magic=0x96835d5b]
> Aug 23 10:26:11 tilburybackup pppd[9804]: sent [LCP EchoRep id=0x4 magic=0x72c56787]
> Aug 23 10:26:11 tilburybackup pppd[9804]: sent [LCP EchoReq id=0x4 magic=0x72c56787]
> Aug 23 10:26:11 tilburybackup pppd[9804]: rcvd [LCP EchoRep id=0x4 magic=0x96835d5b]
> Aug 23 10:26:16 tilburybackup chat[9825]: ATDT3806600^M^M
> Aug 23 10:26:16 tilburybackup chat[9825]: CONNECT
> Aug 23 10:26:16 tilburybackup chat[9825]:  -- got it
> Aug 23 10:26:16 tilburybackup chat[9825]: send (\d)
> Aug 23 10:26:17 tilburybackup pppd[329]: Serial connection established.
> Aug 23 10:26:17 tilburybackup pppd[329]: using channel 1179
> Aug 23 10:26:17 tilburybackup pppd[329]: Couldn't create new ppp unit: Inappropriate ioctl for device
> Aug 23 10:26:18 tilburybackup pppd[329]: Hangup (SIGHUP)
>
> tilburybackup:/etc/ppp# egrep -v '#|^ *$' /etc/ppp/options
> asyncmap 0
> auth
> crtscts
> lock
> hide-password
> modem
> proxyarp
> lcp-echo-interval 30
> lcp-echo-failure 4
> noipx
> persist
> maxfail 0
>
> ttyS04 at port 0xcc00 (irq = 5) is a 16550A
>
> 00:0b.0 Serial controller: US Robotics/3Com 56K FaxModem Model 5610 (rev 01) (prog-if 02 [16550])
>         Subsystem: US Robotics/3Com USR 56k Internal FAX Modem (Model 2977)
>         Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
>         Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>         Interrupt: pin A routed to IRQ 5
>         Region 0: I/O ports at cc00 [size=8]
>         Capabilities: [dc] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2+ AuxCurrent=0mA PME(D0+,D1-,D2+,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=2 PME-
>
> Any ideas?
>
> Mike
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ppp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

-- 
William G. Unruh        Canadian Institute for          Tel: +1(604)822-3273
Physics&Astronomy          Advanced Research            Fax: +1(604)822-5324
UBC, Vancouver,BC        Program in Cosmology           unruh@physics.ubc.ca
Canada V6T 1Z1               and Gravity           www.theory.physics.ubc.ca/
For step by step instructions about setting up ppp under Linux, see
            http://www.theory.physics.ubc.ca/ppp-linux.html


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-19 21:29 Bloch, Jack
@ 2002-08-20  6:47 ` Philipp Matthias Hahn
  0 siblings, 0 replies; 657+ messages in thread
From: Philipp Matthias Hahn @ 2002-08-20  6:47 UTC (permalink / raw)
  To: Bloch, Jack; +Cc: linux-kernel

Hello Jack!

On Mon, Aug 19, 2002 at 05:29:26PM -0400, Bloch, Jack wrote:
> Are there any plans to do an SCTP (RFC 2960) implementation for Linux?
> Please CC me directly on any responses.

The Linux Kernel 2.5 Status page at
	http://www.kernelnewbies.org/status/latest.html
lists the following URL:
	http://www.sf.net/projects/lksctp

BYtE
Philipp
-- 
  / /  (_)__  __ ____  __ Philipp Hahn
 / /__/ / _ \/ // /\ \/ /
/____/_/_//_/\_,_/ /_/\_\ pmhahn@titan.lahn.de

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-08-16  7:51 Misha Alex
@ 2002-08-16  9:52 ` Willy Tarreau
  0 siblings, 0 replies; 657+ messages in thread
From: Willy Tarreau @ 2002-08-16  9:52 UTC (permalink / raw)
  To: Misha Alex; +Cc: linux-kernel

Hello !

On Fri, Aug 16, 2002 at 07:51:37AM +0000, Misha Alex wrote:
 
>      Also i tried the linear addressing linear = c*H*S + h*S +s -1 .But 
> linear or linear*512 never gave me the exact byte offset to seek.
> 
> I am working in linux and using a hexeditor to seek .How many exact bytes 
> should i seek to find out the extended partition.I read the MBR and found 
> the exteneded partiton.
> 00 01 01 00 02 fe 3f 01 3f 00 00 00 43 7d 00 00
> 80 00 01 02 0b fe bf 7e 82 7d 00 00 3d 26 9c 00
> 00 00 81 7f 0f fe ff ff bf a3 9c 00 f1 49 c3 01
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

I haven't played with this for a long time, but I still have some memory
of it. First, when an offset is higher than 8 GB, there's no way to
encode it with the BIOS's CHS scheme as you find it in the partition table.
I see that you know how to decode this, so set all the CHS bits to ones
and look at the offset. For this reason, we often use only the size to
locate these partitions. If I recall correctly, the last 4 bytes of each of
your entries are the sizes in sectors. For example, hda2 is 9c263d sectors
long, which equals 5.2 GB. You'll notice that bytes 8 to 11 of each partition
are nearly equivalent to the size of the previous one. They should be
the start offset in sectors. So in this case, hda3 begins at sector 9ca3bf
(byte 5255953920), and is 1c349f1 sectors long (15.1 GB).
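
(A standalone sketch of that decoding -- illustrative code, not from the
thread; it assumes the standard 16-byte entry layout, with the little-endian
start LBA at bytes 8-11 and the sector count at bytes 12-15:)

#include <stdint.h>
#include <stdio.h>

/* decode one 16-byte partition table entry */
static void decode_entry(const uint8_t *e)
{
        uint32_t start = e[8]  | e[9]  << 8 | e[10] << 16 | (uint32_t)e[11] << 24;
        uint32_t count = e[12] | e[13] << 8 | e[14] << 16 | (uint32_t)e[15] << 24;

        printf("type 0x%02x: start sector 0x%lx (byte offset %llu), 0x%lx sectors\n",
               e[4], (unsigned long)start,
               (unsigned long long)start * 512, (unsigned long)count);
}

For the hda3 entry above this prints start sector 0x9ca3bf (byte offset
5255953920) and 0x1c349f1 sectors, matching the figures given here.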

I think that 'fe ff ff' after the partition type indicates that only the
linear mode should be used, but I'm not sure about this, nor do I have
any proof. You should read the partition code to get more clues, IMHO.

Hoping this helps,
Willy


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-07-06 15:59 Hacksaw
@ 2002-07-07 19:32 ` Min Li
  0 siblings, 0 replies; 657+ messages in thread
From: Min Li @ 2002-07-07 19:32 UTC (permalink / raw)
  To: Hacksaw; +Cc: linux-kernel

Hello. Yes, I tried to subscribe to those two lists, but I don't think they
are working. And I do need help right now...


On Sat, 6 Jul 2002, Hacksaw wrote:

> Hello Min:
> 
> I suggest your questions would be better asked on the kernle newbies list:
> http://mail.nl.linux.org/kernelnewbies/
> 
> and/or on the RedHat install List:
> 
> https://listman.redhat.com/mailman/listinfo/redhat-install-list.
> 
> The kernel list is strictly for talk about developing the kernel. Also, please 
> read the linux kernel mailing list FAQ: http://www.tux.org/lkml/
> -- 
> Powered by beta particles
> http://www.hacksaw.org -- http://www.privatecircus.com -- KB1FVD
> 
> 
> 


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-07-05  8:47 Christian Berger
@ 2002-07-05 13:34 ` Gerhard Mack
  0 siblings, 0 replies; 657+ messages in thread
From: Gerhard Mack @ 2002-07-05 13:34 UTC (permalink / raw)
  To: Christian Berger; +Cc: linux-kernel

Right command, wrong email address.  You need to send that to
majordomo@vger.kernel.org



On 5 Jul 2002, Christian Berger wrote:

> Date: 05 Jul 2002 10:47:32 +0200
> From: Christian Berger <christian@berger-online.de>
> To: linux-kernel@vger.kernel.org
>
> unsubscribe linux-kernel
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

--
Gerhard Mack

gmack@innerfire.net

<>< As a computer I find your faith in technology amusing.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <000d01c22361$62c9d6f0$0100a8c0@digital>
@ 2002-07-04 20:45 ` Stephen Tweedie
  0 siblings, 0 replies; 657+ messages in thread
From: Stephen Tweedie @ 2002-07-04 20:45 UTC (permalink / raw)
  To: Naseer Bhatti
  Cc: security, security, linux-kernel, sct, akpm, adilger, ext3-users

Hi,

On Thu, Jul 04, 2002 at 06:47:11PM +0500, Naseer Bhatti <naseer@digitallinx.com> wrote:

> I got these errors in the log on a Production server. I am running ProFTPD 1.2.4 with RedHat 7.2 Kernel 2.4.7-10 not yet compiled myself and Apache 1.3.26. I got my server stop responding and after reboot I checked the logs and got a lots of kernel bugs. ProFTPD was also involved in that. httpd (Apache 1.3.26) also gave some stack output. Correct me if I am wrong. I have attached the file for detailed analysis. Please check it and let me know about the possible bug/solution.

The log shows no sign of any ext3 problem.  I can't see anything in it
which would justify trying to send a compressed log of nearly 400kB to
an ext3 general users mailing list.

For what it's worth, your dcache oopses are most often associated with
bad memory --- try memtest86 on that machine before you go any
further.

--Stephen

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-06-24  5:49 pah
@ 2002-06-24  7:34 ` Zwane Mwaikambo
  0 siblings, 0 replies; 657+ messages in thread
From: Zwane Mwaikambo @ 2002-06-24  7:34 UTC (permalink / raw)
  To: pah; +Cc: linux-kernel

On 24 Jun 2002 pah@promiscua.org wrote:

> 	I've just found a bug (an unsignificant bug) in the panic() function!
> 	There's a possible buffer overflow if the formated string exceeds
> 1024 characters (I think that the problem is in all kernel releases).
> 	The problem is in the use of vsprintf() insted of vsnprintf()!
> 
> 	I know that this doesn't compromise any exploitation by an uid
> different than zero, but should be fixed in the case of panic()'s arguments
> exceeds the buffer limit (probably by an lkm or something like that) and
> cause (probably) a system crash.
> 

In that case there are quite a number of other places in the kernel which 
can be 'exploited' in various ways.
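
(For reference, the fix being suggested is just bounding the format into the
static buffer -- a userspace-flavoured sketch, with a made-up function name;
the kernel's vsnprintf() takes the same arguments:)

#include <stdarg.h>
#include <stdio.h>

/* format a message into a fixed-size buffer; an oversized result is
 * truncated by vsnprintf() instead of overflowing the buffer */
static void format_bounded(char *buf, size_t buflen, const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);
        vsnprintf(buf, buflen, fmt, args);
        va_end(args);
}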

Cheers,
	Zwane

--
http://function.linuxpower.ca
		


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-05-16 15:54   ` Sanket Rathi
  2002-05-16 18:05     ` Alan Cox
@ 2002-05-20 18:07     ` David Mosberger
  1 sibling, 0 replies; 657+ messages in thread
From: David Mosberger @ 2002-05-20 18:07 UTC (permalink / raw)
  To: Sanket Rathi; +Cc: Alan Cox, linux-kernel

>>>>> On Thu, 16 May 2002 21:24:10 +0530 (IST), Sanket Rathi <sanket.rathi@cdac.ernet.in> said:

  Sanket> No actually i don't want that for DMA it is for diffrent
  Sanket> requirment.  actually in our device there is a page table in
  Sanket> device which have virtual to physical address translation we
  Sanket> save virtual address in device and corresponding physical
  Sanket> address. but we can store only upto 44 bit information of
  Sanket> virtual address thats why i want that.

  Sanket> Can you help me in this

There is no way to limit virtual memory to 44 bits.  I don't know how
your device works, but just fyi: ia64 divides the address space into 8
equal-sized regions and user space applications tend to use at least
two regions (2 for text and 3 for data/stack).  This means that even
with the smallest page size, you'll have to take virtual address bits
61-63 into account.
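
(A tiny illustration of that last point -- a sketch, not kernel code: the
region is simply the top three bits of the pointer, so bits 61-63 are always
significant.)

/* on ia64 the region number is the top 3 bits of a virtual address,
 * so a full user pointer can never be squeezed into 44 bits */
static inline unsigned long region_of(unsigned long vaddr)
{
        return vaddr >> 61;     /* regions 0..7 */
}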

Hope this helps,

	--david
--
Interested in learning more about IA-64 Linux?  Try http://www.lia64.org/book/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-05-16 15:54   ` Sanket Rathi
@ 2002-05-16 18:05     ` Alan Cox
  2002-05-20 18:07     ` David Mosberger
  1 sibling, 0 replies; 657+ messages in thread
From: Alan Cox @ 2002-05-16 18:05 UTC (permalink / raw)
  To: Sanket Rathi; +Cc: Alan Cox, Sanket Rathi, linux-kernel

> No actually i don't want that for DMA it is for diffrent requirment.
> actually in our device there is a page table in device which have
> virtual to physical address translation we save virtual address in device
> and corresponding physical address. but we can store only upto 44 bit 
> information of virtual address thats why i want that.

Still read Documentation/DMA-mapping.txt

Whether it's DMA or not, you are going to need to keep the allocations below
44 bits, and that's what the DMA allocators do.
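
(The pattern that document describes looks roughly like the sketch below.
The function and variable names are placeholders, and note that this bounds
the bus addresses the device is given, not CPU virtual addresses:)

#include <linux/pci.h>

/* declare a 44-bit DMA mask, then use the DMA allocators so every
 * buffer handed to the device is addressable within 44 bits */
static void *alloc_device_buffer(struct pci_dev *pdev, size_t size,
                                 dma_addr_t *bus_addr)
{
        if (pci_set_dma_mask(pdev, 0x00000fffffffffffULL))  /* 2^44 - 1 */
                return NULL;    /* platform cannot satisfy a 44-bit mask */

        return pci_alloc_consistent(pdev, size, bus_addr);
}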

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-05-16 13:38 ` your mail Alan Cox
@ 2002-05-16 15:54   ` Sanket Rathi
  2002-05-16 18:05     ` Alan Cox
  2002-05-20 18:07     ` David Mosberger
  0 siblings, 2 replies; 657+ messages in thread
From: Sanket Rathi @ 2002-05-16 15:54 UTC (permalink / raw)
  To: Alan Cox; +Cc: Sanket Rathi, linux-kernel

No, actually I don't want that for DMA; it is for a different requirement.
Actually, in our device there is a page table which holds
virtual-to-physical address translations; we save the virtual address and
the corresponding physical address in the device. But we can store only 44
bits of the virtual address; that's why I want that.

Can you help me with this?

Thanks in advance

-----
--------Sanket


> > I just want to know how can we restrict the maximum virtual memory and
> > maximum physical memory on ia64 platform.
> > kernel. Actually we have a device which can only access 44 bits so we cant
> 
> That won't help you. You might not be dealing with RAM at the bottom of the
> address space. You might also be in platforms with an iommu, or doing DMA
> to another PCI target
> 
> > Tell me something related to this or any link which i can refer
> 
> Assuming the device is doing bus mastering. Read
> Documentation/DMA-mapping.txt
> 


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-05-16 12:40 Sanket Rathi
@ 2002-05-16 13:38 ` Alan Cox
  2002-05-16 15:54   ` Sanket Rathi
  0 siblings, 1 reply; 657+ messages in thread
From: Alan Cox @ 2002-05-16 13:38 UTC (permalink / raw)
  To: Sanket Rathi; +Cc: linux-kernel

> I just want to know how can we restrict the maximum virtual memory and
> maximum physical memory on ia64 platform.
> kernel. Actually we have a device which can only access 44 bits so we cant

That won't help you. You might not be dealing with RAM at the bottom of the
address space. You might also be on platforms with an IOMMU, or doing DMA
to another PCI target.

> Tell me something related to this or any link which i can refer

Assuming the device is doing bus mastering, read
Documentation/DMA-mapping.txt.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-05-03 15:29   ` Keith Owens
@ 2002-05-03 15:45     ` tomas szepe
  0 siblings, 0 replies; 657+ messages in thread
From: tomas szepe @ 2002-05-03 15:45 UTC (permalink / raw)
  To: Keith Owens; +Cc: kbuild-devel, linux-kernel

> On Fri, 3 May 2002 16:37:38 +0200, 
> tomas szepe <kala@pinerecords.com> wrote:
>
> >kala@nibbler:~$ tar xzf /usr/src/linux-2.5.13.tgz 
> >kala@nibbler:~$ cd linux-2.5.13 
> >kala@nibbler:~/linux-2.5.13$ zcat /usr/src/kbuild-2.5-core-10.gz /usr/src/kbuild-2.5-common-2.5.13-1.gz /usr/src/kbuild-2.5-i386-2.5.13-1.gz |patch -sp1
> >kala@nibbler:~/linux-2.5.13$ cp /lib/modules/2.5.13/.config .
> >kala@nibbler:~/linux-2.5.13$ make -f Makefile-2.5 oldconfig
> >Makefile-2.5:251: /no_such_file-arch/i386/Makefile.defs.noconfig: No such file or directory
> 
> The trailing '/' is omitted in one case.  Workaround for common source and object
> 
> export KBUILD_SRCTREE_000=`pwd`/
> make -f Makefile-2.5 oldconfig

Another problem/question:

$ cd build
$ export KBUILD_OBJTREE=$PWD
$ export KBUILD_SRCTREE_000=/usr/src/linux-2.5.13
$ alias M="make -f $KBUILD_SRCTREE_000/Makefile-2.5"
$ cp /lib/modules/2.5.13/.config .
$ M oldconfig
...

$ M installable
...

[so far so good]

$ make -f Makefile-2.5 menuconfig
[enable RAMDISK support, tweak ramdisk size, enable initrd]
...

Now, issuing "M installable" will result in nearly all files getting rebuilt.
The same happens when switching ramdisk off again. How's that?

I tried enabling/disabling many other config options and doing rebuilds but
couldn't find anything as damaging buildtime-wise as the ramdisk stuff.

Hopefully I'm causing no headaches,
T.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-05-03 14:37 ` your mail tomas szepe
  2002-05-03 15:07   ` tomas szepe
@ 2002-05-03 15:29   ` Keith Owens
  2002-05-03 15:45     ` tomas szepe
  1 sibling, 1 reply; 657+ messages in thread
From: Keith Owens @ 2002-05-03 15:29 UTC (permalink / raw)
  To: tomas szepe; +Cc: kbuild-devel, linux-kernel

On Fri, 3 May 2002 16:37:38 +0200, 
tomas szepe <kala@pinerecords.com> wrote:
>kala@nibbler:~$ tar xzf /usr/src/linux-2.5.13.tgz 
>kala@nibbler:~$ cd linux-2.5.13 
>kala@nibbler:~/linux-2.5.13$ zcat /usr/src/kbuild-2.5-core-10.gz /usr/src/kbuild-2.5-common-2.5.13-1.gz /usr/src/kbuild-2.5-i386-2.5.13-1.gz |patch -sp1
>kala@nibbler:~/linux-2.5.13$ cp /lib/modules/2.5.13/.config .
>kala@nibbler:~/linux-2.5.13$ make -f Makefile-2.5 oldconfig
>Makefile-2.5:251: /no_such_file-arch/i386/Makefile.defs.noconfig: No such file or directory

The trailing '/' is omitted in one case.  Workaround when the source and object trees are the same:

export KBUILD_SRCTREE_000=`pwd`/
make -f Makefile-2.5 oldconfig


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-05-03 14:37 ` your mail tomas szepe
@ 2002-05-03 15:07   ` tomas szepe
  2002-05-03 15:29   ` Keith Owens
  1 sibling, 0 replies; 657+ messages in thread
From: tomas szepe @ 2002-05-03 15:07 UTC (permalink / raw)
  To: Keith Owens; +Cc: kbuild-devel, linux-kernel

Building as follows works, though.

$ cd /usr/src && tar xzf linux-2.5.13.tar.gz
$ cd ~ && mkdir build && cd build
$ export KBUILD_OBJTREE=$PWD
$ export KBUILD_SRCTREE_000=/usr/src/linux-2.5.13
$ alias M="make -f $KBUILD_SRCTREE_000/Makefile-2.5"
$ cp /lib/modules/2.5.13/.config .
$ M oldconfig
$ M installable

T.


> > Release 2.4 of kernel build for kernel 2.5 (kbuild 2.5) is available.
> > http://sourceforge.net/projects/kbuild/, package kbuild-2.5, download
> > release 2.4.
> >
> > kbuild-2.5-core-13-1.
> 
> I believe you meant 's/13/10/'.
> 
> > kbuild-2.5-common-2.5.13-1.
> > kbuild-2.5-i386-2.5.13-1.
> 
> hmmm.. doesn't look so good.
> 
> kala@nibbler:~$ tar xzf /usr/src/linux-2.5.13.tgz 
> kala@nibbler:~$ cd linux-2.5.13 
> kala@nibbler:~/linux-2.5.13$ zcat /usr/src/kbuild-2.5-core-10.gz /usr/src/kbuild-2.5-common-2.5.13-1.gz /usr/src/kbuild-2.5-i386-2.5.13-1.gz |patch -sp1
> kala@nibbler:~/linux-2.5.13$ cp /lib/modules/2.5.13/.config .
> kala@nibbler:~/linux-2.5.13$ make -f Makefile-2.5 oldconfig
> Makefile-2.5:251: /no_such_file-arch/i386/Makefile.defs.noconfig: No such file or directory
> /home/kala/linux-2.5.13/scripts/Makefile-2.5:473: /no_such_file-arch/i386/Makefile.defs.config: No such file or directory
> Makefile-2.5:251: /no_such_file-arch/i386/Makefile.defs.noconfig: No such file or directory
> /home/kala/linux-2.5.13/scripts/Makefile-2.5:473: /no_such_file-arch/i386/Makefile.defs.config: No such file or directory
> Using ARCH='i386' AS='as' LD='ld' CC='/usr/bin/gcc' CPP='/usr/bin/gcc -E' AR='ar' HOSTAS='as' HOSTLD='gcc' HOSTCC='gcc' HOSTAR='ar'
> Generating global Makefile
>   phase 1 (find all inputs)
> ...
> 
> kala@nibbler:~/linux-2.5.13$ make -f Makefile-2.5 installable
> Makefile-2.5:251: /no_such_file-arch/i386/Makefile.defs.noconfig: No such file or directory
> spec value %p not found
> /home/kala/linux-2.5.13/scripts/Makefile-2.5:473: /no_such_file-arch/i386/Makefile.defs.config: No such file or directory
> Using ARCH='i386' AS='as' LD='ld' CC='/usr/bin/gcc' CPP='/usr/bin/gcc -E' AR='ar' HOSTAS='as' HOSTLD='gcc' HOSTCC='gcc' HOSTAR='ar'
> Generating global Makefile
>   phase 1 (find all inputs)
>   phase 2 (convert all Makefile.in files)
>   phase 3 (evaluate selections)
>   phase 4 (integrity checks, write global makefile)
> pp_makefile4: arch/i386/lib/lib.a is selected but is not part of vmlinux, missing link_subdirs?
> make: *** [phase4] Error 1
> 
> 
> T.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-05-03 14:19 Keith Owens
@ 2002-05-03 14:37 ` tomas szepe
  2002-05-03 15:07   ` tomas szepe
  2002-05-03 15:29   ` Keith Owens
  0 siblings, 2 replies; 657+ messages in thread
From: tomas szepe @ 2002-05-03 14:37 UTC (permalink / raw)
  To: Keith Owens; +Cc: kbuild-devel, linux-kernel

> Release 2.4 of kernel build for kernel 2.5 (kbuild 2.5) is available.
> http://sourceforge.net/projects/kbuild/, package kbuild-2.5, download
> release 2.4.
>
> kbuild-2.5-core-13-1.

I believe you meant 's/13/10/'.

> kbuild-2.5-common-2.5.13-1.
> kbuild-2.5-i386-2.5.13-1.

hmmm.. doesn't look so good.

kala@nibbler:~$ tar xzf /usr/src/linux-2.5.13.tgz 
kala@nibbler:~$ cd linux-2.5.13 
kala@nibbler:~/linux-2.5.13$ zcat /usr/src/kbuild-2.5-core-10.gz /usr/src/kbuild-2.5-common-2.5.13-1.gz /usr/src/kbuild-2.5-i386-2.5.13-1.gz |patch -sp1
kala@nibbler:~/linux-2.5.13$ cp /lib/modules/2.5.13/.config .
kala@nibbler:~/linux-2.5.13$ make -f Makefile-2.5 oldconfig
Makefile-2.5:251: /no_such_file-arch/i386/Makefile.defs.noconfig: No such file or directory
/home/kala/linux-2.5.13/scripts/Makefile-2.5:473: /no_such_file-arch/i386/Makefile.defs.config: No such file or directory
Makefile-2.5:251: /no_such_file-arch/i386/Makefile.defs.noconfig: No such file or directory
/home/kala/linux-2.5.13/scripts/Makefile-2.5:473: /no_such_file-arch/i386/Makefile.defs.config: No such file or directory
Using ARCH='i386' AS='as' LD='ld' CC='/usr/bin/gcc' CPP='/usr/bin/gcc -E' AR='ar' HOSTAS='as' HOSTLD='gcc' HOSTCC='gcc' HOSTAR='ar'
Generating global Makefile
  phase 1 (find all inputs)
...

kala@nibbler:~/linux-2.5.13$ make -f Makefile-2.5 installable
Makefile-2.5:251: /no_such_file-arch/i386/Makefile.defs.noconfig: No such file or directory
spec value %p not found
/home/kala/linux-2.5.13/scripts/Makefile-2.5:473: /no_such_file-arch/i386/Makefile.defs.config: No such file or directory
Using ARCH='i386' AS='as' LD='ld' CC='/usr/bin/gcc' CPP='/usr/bin/gcc -E' AR='ar' HOSTAS='as' HOSTLD='gcc' HOSTCC='gcc' HOSTAR='ar'
Generating global Makefile
  phase 1 (find all inputs)
  phase 2 (convert all Makefile.in files)
  phase 3 (evaluate selections)
  phase 4 (integrity checks, write global makefile)
pp_makefile4: arch/i386/lib/lib.a is selected but is not part of vmlinux, missing link_subdirs?
make: *** [phase4] Error 1


T.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-04-24  7:55 Huo Zhigang
  2002-04-24  7:51 ` your mail Zwane Mwaikambo
@ 2002-04-24  8:27 ` Alan Cox
  1 sibling, 0 replies; 657+ messages in thread
From: Alan Cox @ 2002-04-24  8:27 UTC (permalink / raw)
  To: Huo Zhigang; +Cc: linux-kernel

> 
> >INIT: Switching to runlevel: 6
> >INIT: Send processes the TERM signal
> >Unable to handle kernel NULL pointer dereference
>   
>   What's wrong with my machines?  They are all running linux-2.2.18(SMP-supported) with a kernel module which is a driver of Myricom NIC M3S-PCI64C-2 written by my group.
>   Thank you in advance 8-)

If you boot the machine without your driver and then reboot, does the
same happen?  If not, then it may well be that your driver has an error, but
only when it closes down.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-04-24  7:55 Huo Zhigang
@ 2002-04-24  7:51 ` Zwane Mwaikambo
  2002-04-24  8:27 ` Alan Cox
  1 sibling, 0 replies; 657+ messages in thread
From: Zwane Mwaikambo @ 2002-04-24  7:51 UTC (permalink / raw)
  To: Huo Zhigang; +Cc: Linux Kernel

On Wed, 24 Apr 2002, Huo Zhigang wrote:

>   Hi, all.
>   My cluster go wrong these days. So many times when I "/sbin/reboot" a node, the following message will be displayed on the console.
> 
> >INIT: Switching to runlevel: 6
> >INIT: Send processes the TERM signal
> >Unable to handle kernel NULL pointer dereference
>   
>   What's wrong with my machines?  They are all running linux-2.2.18(SMP-supported) with a kernel module which is a driver of Myricom NIC M3S-PCI64C-2 written by my group.
>   Thank you in advance 8-)
>   
>             Zhigang Huo
>             zghuo@ncic.ac.cn

Have you tried decoding the oops? Have a look at  
linux/Documentation/oops-tracing.txt

	Zwane

-- 
http://function.linuxpower.ca



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-04-21 21:16 Ivan G.
@ 2002-04-21 23:02 ` Jeff Garzik
  0 siblings, 0 replies; 657+ messages in thread
From: Jeff Garzik @ 2002-04-21 23:02 UTC (permalink / raw)
  To: Ivan G.; +Cc: Urban Widmark, LKML

On Sun, Apr 21, 2002 at 03:16:40PM -0600, Ivan G. wrote:
> Urban,
> 
> About the suggestion to make via_rhine_error handle more interrupts,
> 
> enum intr_status_bits {
>         IntrRxDone=0x0001, IntrRxErr=0x0004, IntrRxEmpty=0x0020,
>         IntrTxDone=0x0002, IntrTxAbort=0x0008, IntrTxUnderrun=0x0010,
>         IntrPCIErr=0x0040,
>         IntrStatsMax=0x0080, IntrRxEarly=0x0100, IntrMIIChange=0x0200,
>         IntrRxOverflow=0x0400, IntrRxDropped=0x0800, IntrRxNoBuf=0x1000,
>         IntrTxAborted=0x2000, IntrLinkChange=0x4000,
>         IntrRxWakeUp=0x8000,
>         IntrNormalSummary=0x0003, IntrAbnormalSummary=0xC260,
> };
> 
> RxEarly, RxOverflow, RxNoBuf are not handled
> (which brings up another question - how should they be handled 
> and where?? It doesn't seem to me that those should end up in error,
> sending CmdTxDemand. )

*blink*  I had not noticed that.

All drivers actually need to handle RxNoBufs and RxOverflow, assuming
they have similar meaning to what I'm familiar with on other chips.
The chip may recover transparently, but one should be at least aware of
them.

RxEarly you very likely do -not- want to handle...

	Jeff





^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-03-13 19:21 Romain Liévin
  2002-03-13 19:43 ` your mail Alan Cox
@ 2002-03-14  7:08 ` Zwane Mwaikambo
  1 sibling, 0 replies; 657+ messages in thread
From: Zwane Mwaikambo @ 2002-03-14  7:08 UTC (permalink / raw)
  To: Romain Liévin; +Cc: Kernel List, Alan Cox, Tim Waugh

Firstly, thanks for doing this =)  Secondly, I'll give your driver a try
when you release the serial version (I have a serial cable + TI-83).

Cheers,
	Zwane



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-03-13 20:28   ` Romain Liévin
  2002-03-13 20:49     ` Richard B. Johnson
@ 2002-03-13 22:35     ` Alan Cox
  1 sibling, 0 replies; 657+ messages in thread
From: Alan Cox @ 2002-03-13 22:35 UTC (permalink / raw)
  To: Romain Liévin; +Cc: Alan Cox, Kernel List

> +/*
> + * Deal with CONFIG_MODVERSIONS
> + */
> +#if 0 /* Pb with MODVERSIONS */
> +#if CONFIG_MODVERSIONS==1
> +#define MODVERSIONS
> +#include <linux/modversions.h>
> +#endif
> +#endif

[modversions.h is magically included by the kernel for you when it's in
 kernel, if you haven't worked that one out yet]

> +#define PP_NO 3
> +struct tipar_struct  table[PP_NO];
static ?

> +               for(i=0; i < delay; i++) {
> +                       inbyte(minor);
> +               }
> +               schedule();

Oh, a random tip:

		  if(current->need_resched)
			schedule();

will just give up the CPU when you are out of time

> +       if(table[minor].opened)
> +               return -EBUSY;
> +       table[minor].opened++;

Think about open/close at the same moment or SMP - the watchdog drivers all
had this problem and now do

	unsigned long opened = 0;

	if(test_and_set_bit(0, &opened))
		return -EBUSY;

	clear_bit(0, &opened)

[this generates atomic operations so is always safe]
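
(Fleshed out, that pattern in an open/release pair looks something like the
sketch below -- illustrative names, not the tipar code:)

#include <linux/fs.h>
#include <asm/bitops.h>

static unsigned long dev_busy;          /* bit 0 set while the device is open */

static int example_open(struct inode *inode, struct file *file)
{
        /* atomically claim the device; a second opener gets -EBUSY */
        if (test_and_set_bit(0, &dev_busy))
                return -EBUSY;
        return 0;
}

static int example_release(struct inode *inode, struct file *file)
{
        clear_bit(0, &dev_busy);        /* atomically mark it free again */
        return 0;
}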

> +       if(!table[minor].opened)
> +               return -EFAULT;

	BUG() may be better - it can't happen so BUG() will get a backtrace
and actually get it reported 8)

> +static long long tipar_lseek(struct file * file, long long offset, int origin)
> +{
> +       return -ESPIPE;
> +}

Can go (you now use no_llseek)


Basically, except for the open/close one, I'm now just picking holes.
For the device major/minors, see http://www.lanana.org.




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-03-13 20:49     ` Richard B. Johnson
@ 2002-03-13 22:27       ` Alan Cox
  0 siblings, 0 replies; 657+ messages in thread
From: Alan Cox @ 2002-03-13 22:27 UTC (permalink / raw)
  To: root; +Cc: Romain Liévin, Alan Cox, Kernel List

> > +                       START(max);=20
> > +                       do {
> > +                               WAIT(max);
> > +                       } while (inbyte(minor) & 0x10);
> 
>              This may never happen. You end up waiting forever!

No - it's hidden in his macros. Look harder.
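
(Expanded, each START/WAIT polling loop in the posted driver is roughly the
sketch below -- the helper name is made up; inbyte(), timeout, jiffies and
time_before() are the driver's and the kernel's own:)

/* poll the status lines until (value & mask) == want, yielding the CPU
 * while waiting and bailing out with -1 once the deadline passes */
static int wait_for_lines(int minor, unsigned char mask, unsigned char want)
{
        unsigned long max = jiffies + HZ / (timeout / 10);

        while ((inbyte(minor) & mask) != want) {
                if (!time_before(jiffies, max))
                        return -1;      /* timed out */
                schedule();
        }
        return 0;
}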



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-03-13 20:28   ` Romain Liévin
@ 2002-03-13 20:49     ` Richard B. Johnson
  2002-03-13 22:27       ` Alan Cox
  2002-03-13 22:35     ` Alan Cox
  1 sibling, 1 reply; 657+ messages in thread
From: Richard B. Johnson @ 2002-03-13 20:49 UTC (permalink / raw)
  To: Romain Liévin; +Cc: Alan Cox, Kernel List

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=US-ASCII, Size: 4611 bytes --]

On Wed, 13 Mar 2002, [ISO-8859-1] Romain Liévin wrote:
I'm going to comment on a few points:

[SNIPPED most...]

> +
> +/* D-bus protocol:
> +                    1                 0                      0
> +       _______        ______|______    __________|________    __________
> +Red  :        ________      |      ____          |        ____
> +       _        ____________|________      ______|__________       _____
> +White:  ________            |        ______      |          _______
> +*/
> +
> +/* Try to transmit a byte on the specified port (-1 if error). */
> +static int put_ti_parallel(int minor, unsigned char data)
> +{
> +       int bit, i;
> +       unsigned long max;
> +  
> +       for (bit=0; bit<8; bit++) {
> +               if (data & 1) {
> +                       outbyte(2, minor);
> +                       START(max); 
> +                       do {
> +                               WAIT(max);
> +                       } while (inbyte(minor) & 0x10);

             This may never happen. You end up waiting forever!
             If the port doesn't exist or is broken, it may return 0xff
             forever! You need to time-out and get out.


> +                       
> +                       outbyte(3, minor);
> +                       START(max);
> +                       do {
> +                               WAIT(max);
> +                       } while (!(inbyte(minor) & 0x10));

                     This may never happen. You end up waiting forever!
                     You need to time-out and get out.


> +               } else {
> +                       outbyte(1, minor);
> +                       START(max);
> +                       do {
> +                               WAIT(max);
> +                       } while (inbyte(minor) & 0x20);
> +                       
                       This also may never happen!
                       Same applies, time-out and get out.
                     
> +                       outbyte(3, minor);
> +                       START(max);
> +                       do {
> +                               WAIT(max);
> +                       } while (!(inbyte(minor) & 0x20));

                      This also may never happen!
                      Same applies, time-out and get out.

> +               }
> +               data >>= 1;
> +               for(i=0; i < delay; i++) {
> +                       inbyte(minor);
> +               }

> +               schedule();

                  This will just spin without setting
                  current->policy |= SCHED_YIELD;
                  (you really should use sys_sched_yield())


> +       }
> +       
> +       return 0;
> +}
> +
> +/* Receive a byte on the specified port or -1 if error. */
> +static int get_ti_parallel(int minor)
> +{
> +       int bit,i;
> +       unsigned char v, data=0;
> +       unsigned long max;
> +
> +       for (bit=0; bit<8; bit++) {
> +               START(max); 
> +               do {
> +                       WAIT(max);
> +               } while ((v=inbyte(minor) & 0x30) == 0x30);
> +      
                  More wait-forever above...

> +               if (v == 0x10) { 
> +                       data=(data>>1) | 0x80;
> +                       outbyte(1, minor);
> +                       START(max);
> +                       do {
> +                               WAIT(max);
> +                       } while (!(inbyte(minor) & 0x20));

                      More wait-forever above.


> +                       outbyte(3, minor);
> +               } else {
> +                       data=data>>1;
> +                       outbyte(2, minor);
> +                       START(max);
> +                       do {
> +                               WAIT(max);
> +                       } while (!(inbyte(minor) & 0x10));
> +                       outbyte(3, minor);
                      More wait-forever!

> +               }
> +               for(i=0; i<delay; i++) {
> +                       inbyte(minor);
> +               }
> +               schedule();
                  No current->policy

> +       }
> +       return (int)data;
> +}
> +

[SNIPPED rest]


Basically, this code performs a needed function. I have been waiting
for someone to write this! However, it's not yet ready for prime-time.
Never assume that hardware is going to produce what you expect. Don't
wait forever for something that was supposed to happen. You'll hang
the machine.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-03-13 19:43 ` your mail Alan Cox
@ 2002-03-13 20:28   ` Romain Liévin
  2002-03-13 20:49     ` Richard B. Johnson
  2002-03-13 22:35     ` Alan Cox
  0 siblings, 2 replies; 657+ messages in thread
From: Romain Liévin @ 2002-03-13 20:28 UTC (permalink / raw)
  To: Alan Cox; +Cc: Kernel List

Quoting Alan Cox <alan@lxorguk.ukuu.org.uk>:

> > It has been tested on x86 for almost 2 years and on Alpha & Sparc too
> with 
> > various calculators.
> 
> One oddity - some other comments
> 
> > +static int tipar_open(struct inode *inode, struct file *file)
> > +{
> > +       unsigned int minor = minor(inode->i_rdev) - TIPAR_MINOR_0;
> > +
> > +       if (minor >= PP_NO)
> > +               return -ENXIO;  
> > +       
> > +       init_ti_parallel(minor);
> > +
> > +       MOD_INC_USE_COUNT;
> 
> You should remove these and use in 2.4 + . Also what stops multiple
> simultaneous runs of init_ti_parallel if two people open it at once ?
> 
> 
> > +static unsigned int tipar_poll(struct file *file, poll_table *
> wait)
> > +{
> > +       unsigned int mask=0;
> > +       return mask;
> > +}
> 
> That seems unfinished ??
> 
> > +static int tipar_ioctl(struct inode *inode, struct file *file,
> > +                      unsigned int cmd, unsigned long arg)
> > +       case O_NONBLOCK:
> > +               file->f_flags |= O_NONBLOCK;
> > +               return 0;
> 
> O_NDELAY is set by fcntl - your driver never needs this.
> 
> > +       default:
> > +               retval = -EINVAL;
> 
> SuS says -ENOTTY here (lots of drivers get this wrong still)
> 
> > +static long long tipar_lseek(struct file * file, long long offset,
> int origin)
> > +{
> > +       return -ESPIPE;
> > +}
> 
> There is a generic no_llseek function
> 
> > +/* Major & minor number for character devices */
> > +#define TIPAR_MAJOR   61
> 
> These don't appear to be officially assigned via lanana ?
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Fixed some stuff according to your remarks.

Comments are welcome...

=================== [ cuts here ] =====================
--- linux.orig/drivers/char/tipar.c     Wed Mar 13 19:19:10 2002
+++ linux/drivers/char/tipar.c  Wed Mar 13 21:24:51 2002
@@ -0,0 +1,543 @@
+/* Hey EMACS -*- linux-c -*-
+ *
+ * tipar - low level driver for handling a parallel link cable
+ * designed for Texas Instruments graphing calculators.
+ *
+ * Copyright (C) 2000-2002, Romain Lievin <roms@lpg.ticalc.org>
+ * under the terms of the GNU General Public License.
+ */
+
+#define VERSION "1.12"
+
+/* This driver should, in theory, work with any parallel port that has an
+ * appropriate low-level driver; all I/O is done through the parport
+ * abstraction layer.
+ *
+ * If this driver is built into the kernel, you can configure it using the
+ * kernel command-line.  For example:
+ *
+ *      tipar=timeout,delay       (set timeout and delay)
+ *
+ * If the driver is loaded as a module, similar functionality is available
+ * using module parameters.  The equivalent of the above commands would be:
+ *
+ *      # insmod tipar.o tipar=15,10
+ */
+
+/* COMPATIBILITY WITH OLD KERNELS
+ *
+ * Usually, parallel cables were bound to ports at
+ * particular I/O addresses, as follows:
+ *
+ *      tipar0             0x378
+ *      tipar1             0x278
+ *      tipar2             0x3bc
+ *
+ *
+ * This driver, by default, binds tipar devices according to parport and
+ * the minor number.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/config.h>
+#include <linux/version.h>
+#include <linux/init.h>
+#include <asm/uaccess.h>
+#include <linux/ioport.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <asm/io.h>
+#include <linux/devfs_fs_kernel.h>
+#include <linux/parport.h> /* Our code depend on parport */
+
+/*
+ * TI definitions
+ */
+#include <linux/ticable.h>
+
+/*
+ * Deal with CONFIG_MODVERSIONS
+ */
+#if 0 /* Pb with MODVERSIONS */
+#if CONFIG_MODVERSIONS==1
+#define MODVERSIONS
+#include <linux/modversions.h>
+#endif
+#endif
+
+/* ----- global variables --------------------------------------------- */
+
+struct tipar_struct {
+       struct pardevice *dev;                  /* Parport device entry */
+       int opened;
+};
+
+#define PP_NO 3
+struct tipar_struct  table[PP_NO];
+
+static int delay   = IO_DELAY;      /* inter-bit delay in microseconds */
+static int timeout = TIMAXTIME;     /* timeout in tenth of seconds     */
+
+static devfs_handle_t devfs_handle = NULL;
+static unsigned int tp_count = 0;   /* tipar_count */
+
+/* --- macros for parport access -------------------------------------- */
+
+#define r_dtr(x)        (parport_read_data(table[(x)].dev->port))
+#define r_str(x)        (parport_read_status(table[(x)].dev->port))
+#define w_ctr(x,y)      (parport_write_control(table[(x)].dev->port, (y)))
+#define w_dtr(x,y)      (parport_write_data(table[(x)].dev->port, (y)))
+
+/* --- setting states on the D-bus with the right timing: ------------- */
+
+static inline void outbyte(int value, int minor)
+{
+       w_dtr(minor, value);
+}
+
+static inline int inbyte(int minor)
+{
+       return (r_str(minor) & 0x30);
+}
+
+static inline void init_ti_parallel(int minor)
+{
+       outbyte(3, minor);
+}
+
+/* ----- global defines ----------------------------------------------- */
+
+#define START(x) { max=jiffies+HZ/(timeout/10); }
+#define WAIT(x) { if(!time_before(jiffies, (x))) return -1; schedule(); }
+
+/* ----- D-bus bit-banging functions ---------------------------------- */
+
+/* D-bus protocol:
+                    1                 0                      0
+       _______        ______|______    __________|________    __________
+Red  :        ________      |      ____          |        ____
+       _        ____________|________      ______|__________       _____
+White:  ________            |        ______      |          _______
+*/
+
+/* Try to transmit a byte on the specified port (-1 if error). */
+static int put_ti_parallel(int minor, unsigned char data)
+{
+       int bit, i;
+       unsigned long max;
+  
+       for (bit=0; bit<8; bit++) {
+               if (data & 1) {
+                       outbyte(2, minor);
+                       START(max); 
+                       do {
+                               WAIT(max);
+                       } while (inbyte(minor) & 0x10);
+                       
+                       outbyte(3, minor);
+                       START(max);
+                       do {
+                               WAIT(max);
+                       } while (!(inbyte(minor) & 0x10));
+               } else {
+                       outbyte(1, minor);
+                       START(max);
+                       do {
+                               WAIT(max);
+                       } while (inbyte(minor) & 0x20);
+                       
+                       outbyte(3, minor);
+                       START(max);
+                       do {
+                               WAIT(max);
+                       } while (!(inbyte(minor) & 0x20));
+               }
+               data >>= 1;
+               for(i=0; i < delay; i++) {
+                       inbyte(minor);
+               }
+               schedule();
+       }
+       
+       return 0;
+}
+
+/* Receive a byte on the specified port or -1 if error. */
+static int get_ti_parallel(int minor)
+{
+       int bit,i;
+       unsigned char v, data=0;
+       unsigned long max;
+
+       for (bit=0; bit<8; bit++) {
+               START(max); 
+               do {
+                       WAIT(max);
+               } while ((v=inbyte(minor) & 0x30) == 0x30);
+      
+               if (v == 0x10) { 
+                       data=(data>>1) | 0x80;
+                       outbyte(1, minor);
+                       START(max);
+                       do {
+                               WAIT(max);
+                       } while (!(inbyte(minor) & 0x20));
+                       outbyte(3, minor);
+               } else {
+                       data=data>>1;
+                       outbyte(2, minor);
+                       START(max);
+                       do {
+                               WAIT(max);
+                       } while (!(inbyte(minor) & 0x10));
+                       outbyte(3, minor);
+               }
+               for(i=0; i<delay; i++) {
+                       inbyte(minor);
+               }
+               schedule();
+       }
+       return (int)data;
+}
+
+/* Return non zero if both lines are at logical one */
+static int check_ti_parallel(int minor)
+{
+       return ((inbyte(minor) & 0x30) == 0x30);
+}
+
+/* Try to detect a parallel link cable on the specified port */
+static int probe_ti_parallel(int minor)
+{
+       int i, j;
+       int seq[]={ 0x00, 0x20, 0x10, 0x30 };
+       unsigned char data = 0;
+       
+       for(i=3; i>=0; i--) {
+               outbyte(3, minor);
+               outbyte(i, minor);
+               for(j=0; j<delay; j++) data = inbyte(minor);
+               /*printk("Probing -> %i: 0x%02x 0x%02x\n", i, data & 0x30, seq[i]);*/
+               if( (data & 0x30) != seq[i]) {
+                       outbyte(3, minor);
+                       return -1;
+               }
+       } 
+       outbyte(3, minor);
+       return 0;
+}
+
+/* ----- kernel module functions--------------------------------------- */
+
+static int tipar_open(struct inode *inode, struct file *file)
+{
+       unsigned int minor = minor(inode->i_rdev) - TIPAR_MINOR_0;
+
+       if (minor >= PP_NO)
+               return -ENXIO;
+
+       if(table[minor].opened)
+               return -EBUSY;
+
+       table[minor].opened++;
+
+       parport_claim_or_block(table[minor].dev);
+       init_ti_parallel(minor);
+       parport_release(table[minor].dev);
+
+       return 0;
+}
+
+static int tipar_close(struct inode *inode, struct file *file)
+{
+       unsigned int minor = minor(inode->i_rdev) - TIPAR_MINOR_0;
+
+       if (minor >= PP_NO)
+               return -ENXIO;
+
+       if(!table[minor].opened)
+               return -EFAULT;
+
+       table[minor].opened--;
+
+       return 0;
+}
+
+static ssize_t tipar_write(struct file *file,
+                          const char *buf, size_t count, loff_t *ppos)
+{
+       unsigned int minor = minor(file->f_dentry->d_inode->i_rdev) - 
+               TIPAR_MINOR_0;
+       ssize_t n;
+  
+       if (minor >= PP_NO)
+               return -ENXIO;
+
+       if (table[minor].dev == NULL) 
+               return -ENXIO;
+
+       parport_claim_or_block (table[minor].dev);
+       
+       for(n=0; n<count; n++) {
+               unsigned char b;
+               
+               if(get_user(b, buf + n)) {
+                       n = -EFAULT;
+                       goto out;
+               }
+
+               if(put_ti_parallel(minor, b) == -1) {
+                       init_ti_parallel(minor);
+                       n = -ETIMEDOUT;
+                       goto out;
+               }
+       }
+
+ out:
+       parport_release (table[minor].dev);
+       return n;
+}
+
+static ssize_t tipar_read(struct file *file, char *buf, 
+                         size_t count, loff_t *ppos)
+{
+       int b=0;
+       unsigned int minor=minor(file->f_dentry->d_inode->i_rdev) - 
+               TIPAR_MINOR_0;
+       ssize_t retval = 0;
+
+       if(count == 0)
+               return 0;
+
+       if(ppos != &file->f_pos)
+               return -ESPIPE;
+
+       parport_claim_or_block(table[minor].dev);
+  
+       do {
+               b = get_ti_parallel(minor);
+               if (b != -1)
+                       break;
+
+               /* Timed out: reset the port before deciding what to do */
+               init_ti_parallel(minor);
+
+               /* Non-blocking mode: try again later ! */
+               if (file->f_flags & O_NONBLOCK) {
+                       retval = -EAGAIN;
+                       goto out;
+               }
+
+               /* Signal pending: let the caller restart the read */
+               if (signal_pending(current)) {
+                       retval = -ERESTARTSYS;
+                       goto out;
+               }
+
+               schedule();
+       } while (1);
+
+       retval = put_user(b, (unsigned char *)buf);
+       if(!retval)
+               retval = 1;
+       else
+               retval = -EFAULT;
+
+ out:
+       parport_release(table[minor].dev);
+       return retval;
+}
+
+static int tipar_ioctl(struct inode *inode, struct file *file,
+                      unsigned int cmd, unsigned long arg)
+{
+       unsigned int minor = minor(inode->i_rdev) - TIPAR_MINOR_0;
+       int retval = 0;
+
+       if (minor >= PP_NO) 
+               return -ENODEV;
+
+       switch (cmd) {
+       case 0:
+               break;
+       case TIPAR_DELAY:
+               delay = arg;
+               return 0;
+       case TIPAR_TIMEOUT:
+               timeout = arg;
+               return 0;
+       default:
+               retval = -ENOTTY;
+               break;
+       }
+
+       return retval;
+}
+
+static long long tipar_lseek(struct file * file, long long offset, int origin)
+{
+       return -ESPIPE;
+}
+
+
+/* ----- kernel module registering ------------------------------------ */
+
+static struct file_operations tipar_fops = {
+       owner:   THIS_MODULE,
+       llseek:  no_llseek,
+       read:    tipar_read,
+       write:   tipar_write,
+       ioctl:   tipar_ioctl,
+       open:    tipar_open,
+       release: tipar_close,
+};
+
+/* --- initialisation code ------------------------------------- */
+
+#ifndef MODULE
+/*      You must set these - there is no sane way to probe for this cable.
+ *      You can use tipar=timeout,delay to set these now. */
+static int __init tipar_setup (char *str)
+{
+       int ints[3];
+
+        str = get_options (str, ARRAY_SIZE(ints), ints);
+
+        if (ints[0] > 0) {
+                timeout = ints[1];
+                if(ints[0] > 1) {
+                        delay = ints[2];
+               }
+        }
+        return 1;
+}
+#endif
+
+/*
+ * Register our module into parport.
+ * Also pass 2 callback functions to parport: a preemption callback and an
+ * interrupt handler (both unused here, hence NULL).
+ * Display a message such as "tipar0: using parport0 (polling)".
+ */
+static int tipar_register(int nr, struct parport *port)
+{
+       char name[8];
+       
+       /* Register our module into parport */
+       table[nr].dev = parport_register_device(port, "tipar",
+                                               NULL, NULL, NULL, 0,
+                                               (void *) &table[nr]);
+       
+       if (table[nr].dev == NULL)
+               return 1;
+ 
+       /* Use devfs, tree: /dev/ticables/par/[0..2] */
+       sprintf(name, "%d", nr);
+       devfs_register(devfs_handle, name,
+                       DEVFS_FL_AUTO_DEVNUM, TIPAR_MAJOR, nr,
+                       S_IFCHR | S_IRUGO | S_IWUGO,
+                       &tipar_fops, NULL);
+
+       /* Display information */
+       printk(KERN_INFO "tipar%d: using %s (%s).\n", nr, port->name,
+              (port->irq == PARPORT_IRQ_NONE) ? "polling" : "interrupt-driven");
+
+       if(probe_ti_parallel(nr) != -1)
+               printk("tipar%d: link cable found !\n", nr);
+       else
+               printk("tipar%d: link cable not found (do not plug cable to calc).\n", nr);
+
+       return 0;
+}
+
+static void tipar_attach (struct parport *port)
+{
+       if (tp_count == PP_NO) {
+               printk("tipar: ignoring parallel port (max. %d)\n", 
+                      PP_NO);
+               return;
+       }
+       if (!tipar_register(tp_count, port))
+               tp_count++;
+}
+
+static void tipar_detach (struct parport *port)
+{
+       /* Will be written at some point in the future */
+}
+
+static struct parport_driver tipar_driver = {
+       "tipar",
+       tipar_attach,
+       tipar_detach,
+       NULL
+};
+
+int tipar_init(void)
+{
+       unsigned int i;
+       
+       /* Initialize structure */
+       for (i = 0; i < PP_NO; i++) {
+               table[i].dev = NULL;
+               table[i].opened = 0;
+       }
+
+       /* Register parport device */  
+       if (devfs_register_chrdev (TIPAR_MAJOR, "tipar", &tipar_fops)) {
+               printk("tipar: unable to get major %d\n", TIPAR_MAJOR);
+               return -EIO;
+       }
+
+       /* Use devfs with tree: /dev/ticables/par/[0..2] */
+       devfs_handle = devfs_mk_dir (NULL, "ticables/par", NULL);
+
+       if (parport_register_driver (&tipar_driver)) {
+               printk ("tipar: unable to register with parport\n");
+               return -EIO;
+       }
+
+       return 0;
+}  
+
+int __init tipar_init_module(void)
+{
+       printk("tipar: parallel link cable driver, version %s\n", VERSION);
+       return tipar_init();
+}
+
+void __exit tipar_cleanup_module(void)
+{
+       unsigned int offset;
+
+       /* Unregistering module */
+       parport_unregister_driver (&tipar_driver);
+
+       devfs_unregister (devfs_handle);
+       devfs_unregister_chrdev(TIPAR_MAJOR, "tipar");  
+
+       for (offset = 0; offset < PP_NO; offset++) {
+               if (table[offset].dev == NULL)
+                       continue;
+               parport_unregister_device(table[offset].dev);
+       }
+}
+
+__setup("tipar=", tipar_setup);
+module_init(tipar_init_module);
+module_exit(tipar_cleanup_module);
+
+MODULE_AUTHOR("Author/Maintainer: Romain Lievin <roms@lpg.ticalc.org>");
+MODULE_DESCRIPTION("Device driver for TI/PC parallel link cables");
+MODULE_LICENSE("GPL");
+
+EXPORT_NO_SYMBOLS;
+
+MODULE_PARM(timeout, "i");
+MODULE_PARM_DESC(timeout, "Timeout in tenths of a second, default=10 (1 second)");
+MODULE_PARM(delay, "i");
+MODULE_PARM_DESC(delay, "Inter-bit delay, default=10 microseconds");
--- linux.orig/include/linux/ticable.h  Wed Mar 13 19:42:30 2002
+++ linux/include/linux/ticable.h       Wed Mar 13 21:25:03 2002
@@ -0,0 +1,41 @@
+/* Hey EMACS -*- linux-c -*-
+ *
+ * tipar/tiser/tiglusb - low level driver for handling link cables
+ * designed for Texas Instruments graphing calculators.
+ *
+ * Copyright (C) 2000-2002, Romain Lievin <roms@lpg.ticalc.org>
+ * under the terms of the GNU General Public License.
+ */
+
+#ifndef TICABLE_H 
+#define TICABLE_H 1
+
+/* Internal default constants for the kernel module */
+#define TIMAXTIME 10      /* 1 second (timeout is in tenths of a second) */
+#define IO_DELAY  10      /* 10 microseconds */
+
+/* Major & minor number for character devices */
+#define TIPAR_MAJOR   61
+#define TIPAR_MINOR_0  1
+#define TIPAR_MINOR_1  2
+#define TIPAR_MINOR_2  3
+
+#define TISER_MAJOR   62
+#define TISER_MINOR_0  1
+#define TISER_MINOR_1  2
+#define TISER_MINOR_2  3
+#define TISER_MINOR_3  4
+
+/*
+ * Request values for the 'ioctl' function.
+ * Simply pass the appropriate value as arg of the ioctl call.
+ * These values do not conflict with other ones but they have to be
+ * allocated... (/usr/src/linux/Documentation/ioctl-number.txt).
+ */
+#define TIPAR_DELAY     _IOW('p', 0xa8, int) /* set delay   */
+#define TIPAR_TIMEOUT   _IOW('p', 0xa9, int) /* set timeout */
+
+#define TISER_DELAY     _IOW('p', 0xa0, int) /* set delay   */
+#define TISER_TIMEOUT   _IOW('p', 0xa1, int) /* set timeout */
+
+#endif /* TICABLE_H */
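
[Illustration, not part of the original patch: roughly how user space would use
the two TIPAR ioctls above.  The device path is only an example, and note that
the driver above reads the value straight from the ioctl argument rather than
through a pointer:]

	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/ticable.h>

	int main(void)
	{
		int fd = open("/dev/ticables/par/0", O_RDWR);

		if (fd < 0)
			return 1;
		ioctl(fd, TIPAR_DELAY, 10);	/* inter-bit delay: 10 microseconds */
		ioctl(fd, TIPAR_TIMEOUT, 15);	/* timeout: 15 tenths of a second */
		close(fd);
		return 0;
	}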


Romain.

---
Romain Liévin (aka roms)
http://lpg.ticalc.org/prj_tilp, prj_usb, prj_tidev, prj_gtktiemu
mail: roms@lpg.ticalc.org

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-03-13 19:21 Romain Liévin
@ 2002-03-13 19:43 ` Alan Cox
  2002-03-13 20:28   ` Romain Liévin
  2002-03-14  7:08 ` Zwane Mwaikambo
  1 sibling, 1 reply; 657+ messages in thread
From: Alan Cox @ 2002-03-13 19:43 UTC (permalink / raw)
  To: Romain Liévin; +Cc: Kernel List, Linus Torvalds, Alan Cox, Tim Waugh

> It has been tested on x86 for almost 2 years and on Alpha & Sparc too with 
> various calculators.

One oddity - some other comments

> +static int tipar_open(struct inode *inode, struct file *file)
> +{
> +       unsigned int minor = minor(inode->i_rdev) - TIPAR_MINOR_0;
> +
> +       if (minor >= PP_NO)
> +               return -ENXIO;  
> +       
> +       init_ti_parallel(minor);
> +
> +       MOD_INC_USE_COUNT;

You should remove these and use the file_operations owner field in 2.4+. Also
what stops multiple simultaneous runs of init_ti_parallel if two people open
it at once?
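
[Illustration, not part of the original mail: a minimal sketch of the 2.4-style
approach being suggested.  The file_operations owner field lets the VFS manage
the module use count instead of MOD_INC_USE_COUNT/MOD_DEC_USE_COUNT, and the
spinlock here is made up for the example -- it is not in the posted patch:]

	static spinlock_t tipar_open_lock = SPIN_LOCK_UNLOCKED;

	static int tipar_open(struct inode *inode, struct file *file)
	{
		unsigned int minor = minor(inode->i_rdev) - TIPAR_MINOR_0;

		if (minor >= PP_NO)
			return -ENXIO;

		/* serialise concurrent opens so init_ti_parallel() runs only once */
		spin_lock(&tipar_open_lock);
		if (table[minor].opened) {
			spin_unlock(&tipar_open_lock);
			return -EBUSY;
		}
		table[minor].opened++;
		spin_unlock(&tipar_open_lock);

		parport_claim_or_block(table[minor].dev);
		init_ti_parallel(minor);
		parport_release(table[minor].dev);
		return 0;
	}

	static struct file_operations tipar_fops = {
		owner:	THIS_MODULE,	/* no MOD_*_USE_COUNT needed in 2.4+ */
		open:	tipar_open,
	};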


> +static unsigned int tipar_poll(struct file *file, poll_table * wait)
> +{
> +       unsigned int mask=0;
> +       return mask;
> +}

That seems unfinished ??

> +static int tipar_ioctl(struct inode *inode, struct file *file,
> +                      unsigned int cmd, unsigned long arg)
> +       case O_NONBLOCK:
> +               file->f_flags |= O_NONBLOCK;
> +               return 0;

O_NDELAY is set by fcntl - your driver never needs this.
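
[Illustration, not part of the original mail: user space toggles non-blocking
mode itself, roughly

	int flags = fcntl(fd, F_GETFL, 0);
	fcntl(fd, F_SETFL, flags | O_NONBLOCK);

so the driver only ever needs to *test* file->f_flags & O_NONBLOCK, never to
set it.]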

> +       default:
> +               retval = -EINVAL;

SuS says -ENOTTY here (lots of drivers get this wrong still)

> +static long long tipar_lseek(struct file * file, long long offset, int origin)
> +{
> +       return -ESPIPE;
> +}

There is a generic no_llseek function

> +/* Major & minor number for character devices */
> +#define TIPAR_MAJOR   61

These don't appear to be officially assigned via lanana ?

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-02-28 13:58 shura
@ 2002-03-01 15:30 ` Jan-Marek Glogowski
  0 siblings, 0 replies; 657+ messages in thread
From: Jan-Marek Glogowski @ 2002-03-01 15:30 UTC (permalink / raw)
  To: shura; +Cc: linux-kernel

Hi

> I'm setting up a new machine with a pair of IDE drives connected to
> HPT 370 controller. I defined a RAID-1 array using the HPT370 bios
> setting utility.
> Description - hard:
> motherboard Abit ST6-RAID, HPT370, 2 identical hard disks as
> primary/secondary master on ide3/ide4
> - bios:
> Primary Master:   Mirror (Raid 1) for Array #0 UDMA 5 78150 BOOT
> Primary Slave:    No drive
> Secondary Master: Mirror (Raid 1) for Array #0 UDMA 5 78150 HIDDEEN
> Secondary Slave:  No drive
> - os:
> Linux RedHat 7.1 & kernel 2.4.17
> with compilation option
> CONFIG_BLK_DEV_ATARAID_HPT=y
> Lilo:
> ...
> root=/dev/hde10

The root should be /dev/ataraid/xxx for any ata raid but that is not the
real problem...

> During system booting i see following
> ...
> ataraid/d0: ataraid/d0p1 ataraid/d0p2 ataraid/d0p3 ataraid/d0p4 <>
> Highpoint HPT370 Softwareraid driver for linux version 0.01
> Drive 0 is 76319 Mb
> Drive 6 is 76319 Mb
> Raid array consists of 2 drivers
> ...
> Kernel panic: VFS: Unable to mount root fs on 21:0a
> ...

Ataraid seems to find four partitions d0p[1234]. But as far as I know
mirroring isn't supported by the in-kernel open source drivers at all -
you may look at the closed source drivers at www.highpoint-tech.com, if
you really need the "hpt native" raid.
(http://people.redhat.com/arjanv/pdcraid/ataraidhowto.html)

> Booting with option root=/dev/atarad/d0p1 ro
> (or root=/dev/ataraid/d0p10 ro)
> and etc - no effect

If you just need to access the hard disks from Linux, the suggestion is to use
Linux software RAID (there was a discussion about this on lkml, if I remember
right). On modern PCs it uses < 5% CPU and is faster, as it operates at a high
level in the kernel rather than right above the hardware, as those
"big software part, small hardware part" RAID controllers from Promise and
Highpoint do.
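
[Editorial illustration, not part of the original mail: with the old raidtools
a two-disk RAID-1 /etc/raidtab would look roughly like the sketch below -- the
partition names are only examples and must match the actual setup:]

	raiddev /dev/md0
		raid-level		1
		nr-raid-disks		2
		persistent-superblock	1
		chunk-size		4
		device			/dev/hde1
		raid-disk		0
		device			/dev/hdg1
		raid-disk		1

[then run mkraid /dev/md0 once, and use /dev/md0 as the root device.]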

HTH

Jan-Marek


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-02-25  4:02     ` Alexander Viro
@ 2002-02-26  5:50       ` Rusty Russell
  0 siblings, 0 replies; 657+ messages in thread
From: Rusty Russell @ 2002-02-26  5:50 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

In message <Pine.GSO.4.21.0202242230370.1549-100000@weyl.math.psu.edu> you write:
> Honour or not, in this case your complaint is hardly deserved.  To
> compress the above a bit:
> 
> you: <false statement>
> me: RTFS.  <short description of the reasons why statement is wrong; further
> details could be obtained by reading TFS>

Al, *please* read.

Rusty said:
> First, fd passing sucks: you can't leave an fd somewhere and wait for
> someone to pick it up, and they vanish when you exit.  Secondly, you
> have some arbitrary limit on the number of semaphores.  Thirdly,
> someone has to own them.

These are all true: I was criticising the "fd == semaphore" approach,
in the context of my "tied to mapped location" approach, and Linus's
"magic cookie" approach.

I went on to explain further:

> Consider tdb, the Trivial Database.  There is no "master locking
> daemon".  There is no way for the first opener (who then has to create
> the semaphores in your model) to pass them to other openers: this is a
> library.

You also managed to ignore my previous comment on the "fd ==
semaphore" approach:

> Implemented exactly that (and posted to l-k IIRC), and it's
> *horrible* to use.

And you came out assuming I had no idea how fd passing works:

> Yes, you can.  Please, RTFS

...and then in the next mail you suggested I implement a "master
locking daemon".

I have taken the liberty of rewriting your reply as I might expect to
see from a peer:

================
From: Al Viro's Polite Twin
To: Rusty Russell
Subject: Re: [PATCH] Lightweight userspace semaphores... 
Date: Two days after hell freezes over

On Mon, 25 Feb 2002, Rusty Russell wrote:
> First, fd passing sucks: you can't leave an fd somewhere and wait for
> someone to pick it up, and they vanish when you exit.  Secondly, you

Have you considered using a daemon to hold the fds?  It shouldn't be
that bad.

================

See how it doesn't assume that I am an idiot?  It's not condescending,
and invites further consideration.  It's also shorter than your other
two replies.

I might have replied as follows:

		Yes, and for a "serious" database it's not a problem, as
	it usually has some kind of daemon anyway.  But for TDB, I
	found that it's fragile and extremely unwieldy.  Creating a
	unix domain socket for each .tdb file may not be possible.
	The tdb_open call would have to fork off a daemon if it's the
	first process to access it.  It starts to get fairly icky:
	certainly when compared with the fairly trivial patch to
	support the "semaphore tied to mapped region" approach.

		You can try if you want (TDB enclosed).

Maybe I'm the only one who finds it *really* painful to continually
deal with your "Dan Bernstein of Linux" approach: enough that it
hinders my kernel work.

Genuinely hope this helps,
Rusty.
--
  Taste: it's not just for source code anymore...

#ifndef __TDB_H__
#define __TDB_H__
/* 
   Unix SMB/Netbios implementation.
   Version 3.0
   Samba database functions
   Copyright (C) Andrew Tridgell 1999
   
   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.
   
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.
   
   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
#ifdef  __cplusplus
extern "C" {
#endif

/* flags to tdb_store() */
#define TDB_REPLACE 1
#define TDB_INSERT 2
#define TDB_MODIFY 3

/* flags for tdb_open() */
#define TDB_DEFAULT 0 /* just a readability place holder */
#define TDB_CLEAR_IF_FIRST 1
#define TDB_INTERNAL 2 /* don't store on disk */
#define TDB_NOLOCK   4 /* don't do any locking */
#define TDB_NOMMAP   8 /* don't use mmap */
#define TDB_CONVERT 16 /* convert endian (internal use) */

#define TDB_ERRCODE(code, ret) ((tdb->ecode = (code)), ret)

/* error codes */
enum TDB_ERROR {TDB_SUCCESS=0, TDB_ERR_CORRUPT, TDB_ERR_IO, TDB_ERR_LOCK, 
		TDB_ERR_OOM, TDB_ERR_EXISTS, TDB_ERR_NOEXIST, TDB_ERR_NOLOCK };

#ifndef u32
#define u32 unsigned
#endif

typedef struct {
	char *dptr;
	size_t dsize;
} TDB_DATA;

typedef u32 tdb_len;
typedef u32 tdb_off;

/* this is stored at the front of every database */
struct tdb_header {
	char magic_food[32]; /* for /etc/magic */
	u32 version; /* version of the code */
	u32 hash_size; /* number of hash entries */
	tdb_off rwlocks;
	tdb_off reserved[31];
};

struct tdb_lock_type {
	u32 count;
	u32 ltype;
};

struct tdb_traverse_lock {
	struct tdb_traverse_lock *next;
	u32 off;
	u32 hash;
};

/* this is the context structure that is returned from a db open */
typedef struct tdb_context {
	char *name; /* the name of the database */
	void *map_ptr; /* where it is currently mapped */
	int fd; /* open file descriptor for the database */
	tdb_len map_size; /* how much space has been mapped */
	int read_only; /* opened read-only */
	struct tdb_lock_type *locked; /* array of chain locks */
	enum TDB_ERROR ecode; /* error code for last tdb error */
	struct tdb_header header; /* a cached copy of the header */
	u32 flags; /* the flags passed to tdb_open */
	u32 *lockedkeys; /* array of locked keys: first is #keys */
	struct tdb_traverse_lock travlocks; /* current traversal locks */
	struct tdb_context *next; /* all tdbs to avoid multiple opens */
	dev_t device;	/* uniquely identifies this tdb */
	ino_t inode;	/* uniquely identifies this tdb */
} TDB_CONTEXT;

typedef int (*tdb_traverse_func)(TDB_CONTEXT *, TDB_DATA, TDB_DATA, void *);
typedef void (*tdb_log_func)(TDB_CONTEXT *, int , const char *, ...);

TDB_CONTEXT *tdb_open(char *name, int hash_size, int tdb_flags,
		      int open_flags, mode_t mode);

enum TDB_ERROR tdb_error(TDB_CONTEXT *tdb);
const char *tdb_errorstr(TDB_CONTEXT *tdb);
TDB_DATA tdb_fetch(TDB_CONTEXT *tdb, TDB_DATA key);
int tdb_delete(TDB_CONTEXT *tdb, TDB_DATA key);
int tdb_store(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA dbuf, int flag);
int tdb_close(TDB_CONTEXT *tdb);
TDB_DATA tdb_firstkey(TDB_CONTEXT *tdb);
TDB_DATA tdb_nextkey(TDB_CONTEXT *tdb, TDB_DATA key);
int tdb_traverse(TDB_CONTEXT *tdb, tdb_traverse_func fn, void *state);
int tdb_exists(TDB_CONTEXT *tdb, TDB_DATA key);
int tdb_lockkeys(TDB_CONTEXT *tdb, u32 number, TDB_DATA keys[]);
void tdb_unlockkeys(TDB_CONTEXT *tdb);
int tdb_lockall(TDB_CONTEXT *tdb);
void tdb_unlockall(TDB_CONTEXT *tdb);

/* Low level locking functions: use with care */
int tdb_chainlock(TDB_CONTEXT *tdb, TDB_DATA key);
void tdb_chainunlock(TDB_CONTEXT *tdb, TDB_DATA key);

/* Debug functions. Not used in production. */
void tdb_dump_all(TDB_CONTEXT *tdb);
void tdb_printfreelist(TDB_CONTEXT *tdb);

extern TDB_DATA tdb_null;
#ifdef  __cplusplus
}
#endif

#endif /* tdb.h */
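
[Editorial illustration, not part of the original attachment: a minimal sketch
of how the API declared above is used.  "test.tdb" and the key/value strings
are made-up examples.]

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <fcntl.h>
	#include "tdb.h"

	int main(void)
	{
		TDB_CONTEXT *tdb;
		TDB_DATA key, data, out;

		tdb = tdb_open("test.tdb", 0, TDB_DEFAULT, O_RDWR | O_CREAT, 0600);
		if (!tdb)
			return 1;

		key.dptr = "hello";  key.dsize = strlen("hello");
		data.dptr = "world"; data.dsize = strlen("world");
		tdb_store(tdb, key, data, TDB_REPLACE);

		out = tdb_fetch(tdb, key);	/* caller must free out.dptr */
		if (out.dptr) {
			printf("%.*s\n", (int)out.dsize, out.dptr);
			free(out.dptr);
		}
		return tdb_close(tdb);
	}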

 /* 
   Unix SMB/Netbios implementation.
   Version 3.0
   Samba database functions
   Copyright (C) Andrew Tridgell              1999-2000
   Copyright (C) Luke Kenneth Casson Leighton      2000
   Copyright (C) Paul `Rusty' Russell		   2000
   Copyright (C) Jeremy Allison			   2000
   
   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.
   
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.
   
   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include "tdb.h"

#define TDB_MAGIC_FOOD "TDB file\n"
#define TDB_VERSION (0x26011967 + 6)
#define TDB_MAGIC (0x26011999U)
#define TDB_FREE_MAGIC (~TDB_MAGIC)
#define TDB_DEAD_MAGIC (0xFEE1DEAD)
#define TDB_ALIGNMENT 4
#define MIN_REC_SIZE (2*sizeof(struct list_struct) + TDB_ALIGNMENT)
#define DEFAULT_HASH_SIZE 131
#define TDB_PAGE_SIZE 0x2000
#define FREELIST_TOP (sizeof(struct tdb_header))
#define TDB_ALIGN(x,a) (((x) + (a)-1) & ~((a)-1))
#define TDB_BYTEREV(x) (((((x)&0xff)<<24)|((x)&0xFF00)<<8)|(((x)>>8)&0xFF00)|((x)>>24))
#define TDB_DEAD(r) ((r)->magic == TDB_DEAD_MAGIC)
#define TDB_BAD_MAGIC(r) ((r)->magic != TDB_MAGIC && !TDB_DEAD(r))
#define TDB_HASH_TOP(hash) (FREELIST_TOP + (BUCKET(hash)+1)*sizeof(tdb_off))

/* lock offsets */
#define GLOBAL_LOCK 0
#define ACTIVE_LOCK 4

#ifndef MAP_FILE
#define MAP_FILE 0
#endif

#ifndef MAP_FAILED
#define MAP_FAILED ((void *)-1)
#endif

#define BUCKET(hash) ((hash) % tdb->header.hash_size)
TDB_DATA tdb_null;

/* all contexts, to ensure no double-opens (fcntl locks don't nest!) */
static TDB_CONTEXT *tdbs = NULL;

static void tdb_munmap(TDB_CONTEXT *tdb)
{
	if (tdb->flags & TDB_INTERNAL)
		return;

	if (tdb->map_ptr)
		munmap(tdb->map_ptr, tdb->map_size);
	tdb->map_ptr = NULL;
}

static void tdb_mmap(TDB_CONTEXT *tdb)
{
	if (tdb->flags & TDB_INTERNAL)
		return;

	if (!(tdb->flags & TDB_NOMMAP)) {
		tdb->map_ptr = mmap(NULL, tdb->map_size, 
				    PROT_READ|(tdb->read_only? 0:PROT_WRITE), 
				    MAP_SHARED|MAP_FILE, tdb->fd, 0);

		/*
		 * NB. When mmap fails it returns MAP_FAILED *NOT* NULL !!!!
		 */

		if (tdb->map_ptr == MAP_FAILED)
			tdb->map_ptr = NULL;
	} else {
		tdb->map_ptr = NULL;
	}
}

/* Endian conversion: we only ever deal with 4 byte quantities */
static void *convert(void *buf, u32 size)
{
	u32 i, *p = buf;
	for (i = 0; i < size / 4; i++)
		p[i] = TDB_BYTEREV(p[i]);
	return buf;
}
#define DOCONV() (tdb->flags & TDB_CONVERT)
#define CONVERT(x) (DOCONV() ? convert(&x, sizeof(x)) : &x)

/* the body of the database is made of one list_struct for the free space
   plus a separate data list for each hash value */
struct list_struct {
	tdb_off next; /* offset of the next record in the list */
	tdb_len rec_len; /* total byte length of record */
	tdb_len key_len; /* byte length of key */
	tdb_len data_len; /* byte length of data */
	u32 full_hash; /* the full 32 bit hash of the key */
	u32 magic;   /* try to catch errors */
	/* the following union is implied:
		union {
			char record[rec_len];
			struct {
				char key[key_len];
				char data[data_len];
			}
			u32 totalsize; (tailer)
		}
	*/
};

/* a byte range locking function - return 0 on success
   this functions locks/unlocks 1 byte at the specified offset.

   On error, errno is also set so that errors are passed back properly
   through tdb_open(). */
static int tdb_brlock(TDB_CONTEXT *tdb, tdb_off offset, 
		      int rw_type, int lck_type)
{
	struct flock fl;

	if (tdb->flags & TDB_NOLOCK)
		return 0;
	if (tdb->read_only) {
		errno = EACCES;
		return -1;
	}

	fl.l_type = rw_type;
	fl.l_whence = SEEK_SET;
	fl.l_start = offset;
	fl.l_len = 1;
	fl.l_pid = 0;

	if (fcntl(tdb->fd,lck_type,&fl)) {
		/* errno set by fcntl */
		return TDB_ERRCODE(TDB_ERR_LOCK, -1);
	}
	return 0;
}

/* lock a list in the database. list -1 is the alloc list */
static int tdb_lock(TDB_CONTEXT *tdb, int list, int ltype)
{
	if (list < -1 || list >= (int)tdb->header.hash_size) {
		return -1;
	}
	if (tdb->flags & TDB_NOLOCK)
		return 0;

	/* Since fcntl locks don't nest, we do a lock for the first one,
	   and simply bump the count for future ones */
	if (tdb->locked[list+1].count == 0) {
		if (tdb_brlock(tdb,FREELIST_TOP+4*list,ltype,F_SETLKW)) {
			return -1;
		}
		tdb->locked[list+1].ltype = ltype;
	}
	tdb->locked[list+1].count++;
	return 0;
}

/* unlock the database: returns void because it's too late for errors. */
static void tdb_unlock(TDB_CONTEXT *tdb, int list, int ltype)
{
	if (tdb->flags & TDB_NOLOCK)
		return;

	/* Sanity checks */
	if (list < -1 || list >= (int)tdb->header.hash_size)
		return;
	if (tdb->locked[list+1].count==0)
		return;

	if (tdb->locked[list+1].count == 1) {
		/* Down to last nested lock: unlock underneath */
		tdb_brlock(tdb, FREELIST_TOP+4*list, F_UNLCK, F_SETLKW);
	}
	tdb->locked[list+1].count--;
}

/* This is based on the hash algorithm from gdbm */
static u32 tdb_hash(TDB_DATA *key)
{
	u32 value;	/* Used to compute the hash value.  */
	u32   i;	/* Used to cycle through random values. */

	/* Set the initial value from the key size. */
	for (value = 0x238F13AF * key->dsize, i=0; i < key->dsize; i++)
		value = (value + (key->dptr[i] << (i*5 % 24)));

	return (1103515243 * value + 12345);  
}

/* check for an out of bounds access - if it is out of bounds then
   see if the database has been expanded by someone else and expand
   if necessary 
   note that "len" is the minimum length needed for the db
*/
static int tdb_oob(TDB_CONTEXT *tdb, tdb_off len)
{
	struct stat st;
	if (len <= tdb->map_size)
		return 0;
	if (tdb->flags & TDB_INTERNAL) {
		return TDB_ERRCODE(TDB_ERR_IO, -1);
	}

	if (fstat(tdb->fd, &st) == -1)
		return TDB_ERRCODE(TDB_ERR_IO, -1);

	if (st.st_size < (size_t)len) {
		return TDB_ERRCODE(TDB_ERR_IO, -1);
	}

	/* Unmap, update size, remap */
	tdb_munmap(tdb);
	tdb->map_size = st.st_size;
	tdb_mmap(tdb);
	return 0;
}

/* write a lump of data at a specified offset */
static int tdb_write(TDB_CONTEXT *tdb, tdb_off off, void *buf, tdb_len len)
{
	if (tdb_oob(tdb, off + len) != 0)
		return -1;

	if (tdb->map_ptr)
		memcpy(off + (char *)tdb->map_ptr, buf, len);
	else if (lseek(tdb->fd, off, SEEK_SET) != off
		 || write(tdb->fd, buf, len) != (ssize_t)len) {
		return TDB_ERRCODE(TDB_ERR_IO, -1);
	}
	return 0;
}

/* read a lump of data at a specified offset, maybe convert */
static int tdb_read(TDB_CONTEXT *tdb,tdb_off off,void *buf,tdb_len len,int cv)
{
	if (tdb_oob(tdb, off + len) != 0)
		return -1;

	if (tdb->map_ptr)
		memcpy(buf, off + (char *)tdb->map_ptr, len);
	else if (lseek(tdb->fd, off, SEEK_SET) != off
		 || read(tdb->fd, buf, len) != (ssize_t)len) {
		return TDB_ERRCODE(TDB_ERR_IO, -1);
	}
	if (cv)
		convert(buf, len);
	return 0;
}

/* read a lump of data, allocating the space for it */
static char *tdb_alloc_read(TDB_CONTEXT *tdb, tdb_off offset, tdb_len len)
{
	char *buf;

	if (!(buf = malloc(len))) {
		return TDB_ERRCODE(TDB_ERR_OOM, buf);
	}
	if (tdb_read(tdb, offset, buf, len, 0) == -1) {
		free(buf);
		return NULL;
	}
	return buf;
}

/* read/write a tdb_off */
static int ofs_read(TDB_CONTEXT *tdb, tdb_off offset, tdb_off *d)
{
	return tdb_read(tdb, offset, (char*)d, sizeof(*d), DOCONV());
}
static int ofs_write(TDB_CONTEXT *tdb, tdb_off offset, tdb_off *d)
{
	tdb_off off = *d;
	return tdb_write(tdb, offset, CONVERT(off), sizeof(*d));
}

/* read/write a record */
static int rec_read(TDB_CONTEXT *tdb, tdb_off offset, struct list_struct *rec)
{
	if (tdb_read(tdb, offset, rec, sizeof(*rec),DOCONV()) == -1)
		return -1;
	if (TDB_BAD_MAGIC(rec)) {
		return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
	}
	return tdb_oob(tdb, rec->next+sizeof(*rec));
}
static int rec_write(TDB_CONTEXT *tdb, tdb_off offset, struct list_struct *rec)
{
	struct list_struct r = *rec;
	return tdb_write(tdb, offset, CONVERT(r), sizeof(r));
}

/* read a freelist record and check for simple errors */
static int rec_free_read(TDB_CONTEXT *tdb, tdb_off off, struct list_struct *rec)
{
	if (tdb_read(tdb, off, rec, sizeof(*rec),DOCONV()) == -1)
		return -1;
	if (rec->magic != TDB_FREE_MAGIC) {
		return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
	}
	if (tdb_oob(tdb, rec->next+sizeof(*rec)) != 0)
		return -1;
	return 0;
}

/* update a record tailer (must hold allocation lock) */
static int update_tailer(TDB_CONTEXT *tdb, tdb_off offset,
			 const struct list_struct *rec)
{
	tdb_off totalsize;

	/* Offset of tailer from record header */
	totalsize = sizeof(*rec) + rec->rec_len;
	return ofs_write(tdb, offset + totalsize - sizeof(tdb_off),
			 &totalsize);
}

static tdb_off tdb_dump_record(TDB_CONTEXT *tdb, tdb_off offset)
{
	struct list_struct rec;
	tdb_off tailer_ofs, tailer;

	if (tdb_read(tdb, offset, (char *)&rec, sizeof(rec), DOCONV()) == -1) {
		printf("ERROR: failed to read record at %u\n", offset);
		return 0;
	}

	printf(" rec: offset=%u next=%d rec_len=%d key_len=%d data_len=%d full_hash=0x%x magic=0x%x\n",
	       offset, rec.next, rec.rec_len, rec.key_len, rec.data_len, rec.full_hash, rec.magic);

	tailer_ofs = offset + sizeof(rec) + rec.rec_len - sizeof(tdb_off);
	if (ofs_read(tdb, tailer_ofs, &tailer) == -1) {
		printf("ERROR: failed to read tailer at %u\n", tailer_ofs);
		return rec.next;
	}

	if (tailer != rec.rec_len + sizeof(rec)) {
		printf("ERROR: tailer does not match record! tailer=%u totalsize=%u\n", tailer, rec.rec_len + sizeof(rec));
	}
	return rec.next;
}

static void tdb_dump_chain(TDB_CONTEXT *tdb, int i)
{
	tdb_off rec_ptr, top;

	top = TDB_HASH_TOP(i);

	tdb_lock(tdb, i, F_WRLCK);

	if (ofs_read(tdb, top, &rec_ptr) == -1) {
		tdb_unlock(tdb, i, F_WRLCK);
		return;
	}

	if (rec_ptr)
		printf("hash=%d\n", i);

	while (rec_ptr) {
		rec_ptr = tdb_dump_record(tdb, rec_ptr);
	}
	tdb_unlock(tdb, i, F_WRLCK);
}

void tdb_dump_all(TDB_CONTEXT *tdb)
{
	int i;
	for (i=0;i<tdb->header.hash_size;i++) {
		tdb_dump_chain(tdb, i);
	}
	printf("freelist:\n");
	tdb_dump_chain(tdb, -1);
}

void tdb_printfreelist(TDB_CONTEXT *tdb)
{
	long total_free = 0;
	tdb_off offset, rec_ptr, last_ptr;
	struct list_struct rec;

	tdb_lock(tdb, -1, F_WRLCK);

	last_ptr = 0;
	offset = FREELIST_TOP;

	/* read in the freelist top */
	if (ofs_read(tdb, offset, &rec_ptr) == -1) {
		return;
	}

	printf("freelist top=[0x%08x]\n", rec_ptr );
	while (rec_ptr) {
		if (tdb_read(tdb, rec_ptr, (char *)&rec, sizeof(rec), DOCONV()) == -1) {
			return;
		}

		if (rec.magic != TDB_FREE_MAGIC) {
			printf("bad magic 0x%08x in free list\n", rec.magic);
			return;
		}

		printf("entry offset=[0x%08x], rec.rec_len = [0x%08x (%d)]\n", rec.next, rec.rec_len, rec.rec_len );
		total_free += rec.rec_len;

		/* move to the next record */
		rec_ptr = rec.next;
	}
	printf("total rec_len = [0x%08x (%d)]\n", (int)total_free, 
               (int)total_free);

	tdb_unlock(tdb, -1, F_WRLCK);
}

/* Remove an element from the freelist.  Must have alloc lock. */
static int remove_from_freelist(TDB_CONTEXT *tdb, tdb_off off, tdb_off next)
{
	tdb_off last_ptr, i;

	/* read in the freelist top */
	last_ptr = FREELIST_TOP;
	while (ofs_read(tdb, last_ptr, &i) != -1 && i != 0) {
		if (i == off) {
			/* We've found it! */
			return ofs_write(tdb, last_ptr, &next);
		}
		/* Follow chain (next offset is at start of record) */
		last_ptr = i;
	}
	return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
}

/* Add an element into the freelist. Merge adjacent records if
   necessary. */
static int tdb_free(TDB_CONTEXT *tdb, tdb_off offset, struct list_struct *rec)
{
	tdb_off right, left;

	/* Allocation and tailer lock */
	if (tdb_lock(tdb, -1, F_WRLCK) != 0)
		return -1;

	/* set an initial tailer, so if we fail we don't leave a bogus record */
	update_tailer(tdb, offset, rec);

	/* Look right first (I'm an Australian, dammit) */
	right = offset + sizeof(*rec) + rec->rec_len;
	if (right + sizeof(*rec) <= tdb->map_size) {
		struct list_struct r;

		if (tdb_read(tdb, right, &r, sizeof(r), DOCONV()) == -1) {
			goto left;
		}

		/* If it's free, expand to include it. */
		if (r.magic == TDB_FREE_MAGIC) {
			if (remove_from_freelist(tdb, right, r.next) == -1) {
				goto left;
			}
			rec->rec_len += sizeof(r) + r.rec_len;
		}
	}

left:
	/* Look left */
	left = offset - sizeof(tdb_off);
	if (left > TDB_HASH_TOP(tdb->header.hash_size-1)) {
		struct list_struct l;
		tdb_off leftsize;

		/* Read in tailer and jump back to header */
		if (ofs_read(tdb, left, &leftsize) == -1) {
			goto update;
		}
		left = offset - leftsize;

		/* Now read in record */
		if (tdb_read(tdb, left, &l, sizeof(l), DOCONV()) == -1) {
			goto update;
		}

		/* If it's free, expand to include it. */
		if (l.magic == TDB_FREE_MAGIC) {
			if (remove_from_freelist(tdb, left, l.next) == -1) {
				goto update;
			} else {
				offset = left;
				rec->rec_len += leftsize;
			}
		}
	}

update:
	if (update_tailer(tdb, offset, rec) == -1) {
		goto fail;
	}

	/* Now, prepend to free list */
	rec->magic = TDB_FREE_MAGIC;

	if (ofs_read(tdb, FREELIST_TOP, &rec->next) == -1 ||
	    rec_write(tdb, offset, rec) == -1 ||
	    ofs_write(tdb, FREELIST_TOP, &offset) == -1) {
		goto fail;
	}

	/* And we're done. */
	tdb_unlock(tdb, -1, F_WRLCK);
	return 0;

 fail:
	tdb_unlock(tdb, -1, F_WRLCK);
	return -1;
}


/* expand a file.  we prefer to use ftruncate, as that is what posix
  says to use for mmap expansion */
static int expand_file(TDB_CONTEXT *tdb, tdb_off size, tdb_off addition)
{
	char buf[1024];

	if (ftruncate(tdb->fd, size+addition) != 0) {
		return -1;
	}

	/* now fill the file with something. This ensures that the file isn't sparse, which would be
	   very bad if we ran out of disk. This must be done with write, not via mmap */
	memset(buf, 0x42, sizeof(buf));
	while (addition) {
		int n = addition>sizeof(buf)?sizeof(buf):addition;
		int ret;
		if (lseek(tdb->fd, size, SEEK_SET) != size)
			return -1;
		ret = write(tdb->fd, buf, n);
		if (ret != n) {
			return -1;
		}
		addition -= n;
		size += n;
	}
	return 0;
}


/* expand the database at least size bytes by expanding the underlying
   file and doing the mmap again if necessary */
static int tdb_expand(TDB_CONTEXT *tdb, tdb_off size)
{
	struct list_struct rec;
	tdb_off offset;

	if (tdb_lock(tdb, -1, F_WRLCK) == -1) {
		return -1;
	}

	/* must know about any previous expansions by another process */
	tdb_oob(tdb, tdb->map_size + 1);

	/* always make room for at least 10 more records, and round
           the database up to a multiple of TDB_PAGE_SIZE */
	size = TDB_ALIGN(tdb->map_size + size*10, TDB_PAGE_SIZE) - tdb->map_size;

	if (!(tdb->flags & TDB_INTERNAL))
		tdb_munmap(tdb);

	/*
	 * We must ensure the file is unmapped before doing this
	 * to ensure consistency with systems like OpenBSD where
	 * writes and mmaps are not consistent.
	 */

	/* expand the file itself */
	if (!(tdb->flags & TDB_INTERNAL)) {
		if (expand_file(tdb, tdb->map_size, size) != 0)
			goto fail;
	}

	tdb->map_size += size;

	if (tdb->flags & TDB_INTERNAL)
		tdb->map_ptr = realloc(tdb->map_ptr, tdb->map_size);
	else {
		/*
		 * We must ensure the file is remapped before adding the space
		 * to ensure consistency with systems like OpenBSD where
		 * writes and mmaps are not consistent.
		 */

		/* We're ok if the mmap fails as we'll fallback to read/write */
		tdb_mmap(tdb);
	}

	/* form a new freelist record */
	memset(&rec,'\0',sizeof(rec));
	rec.rec_len = size - sizeof(rec);

	/* link it into the free list */
	offset = tdb->map_size - size;
	if (tdb_free(tdb, offset, &rec) == -1)
		goto fail;

	tdb_unlock(tdb, -1, F_WRLCK);
	return 0;
 fail:
	tdb_unlock(tdb, -1, F_WRLCK);
	return -1;
}

/* allocate some space from the free list. The offset returned points
   to an unconnected list_struct within the database with room for at
   least length bytes of total data

   0 is returned if the space could not be allocated
 */
static tdb_off tdb_allocate(TDB_CONTEXT *tdb, tdb_len length,
			    struct list_struct *rec)
{
	tdb_off rec_ptr, last_ptr, newrec_ptr;
	struct list_struct newrec;

	if (tdb_lock(tdb, -1, F_WRLCK) == -1)
		return 0;

	/* Extra bytes required for tailer */
	length += sizeof(tdb_off);

 again:
	last_ptr = FREELIST_TOP;

	/* read in the freelist top */
	if (ofs_read(tdb, FREELIST_TOP, &rec_ptr) == -1)
		goto fail;

	/* keep looking until we find a freelist record big enough */
	while (rec_ptr) {
		if (rec_free_read(tdb, rec_ptr, rec) == -1)
			goto fail;

		if (rec->rec_len >= length) {
			/* found it - now possibly split it up  */
			if (rec->rec_len > length + MIN_REC_SIZE) {
				/* Length of left piece */
				length = TDB_ALIGN(length, TDB_ALIGNMENT);

				/* Right piece to go on free list */
				newrec.rec_len = rec->rec_len
					- (sizeof(*rec) + length);
				newrec_ptr = rec_ptr + sizeof(*rec) + length;

				/* And left record is shortened */
				rec->rec_len = length;
			} else
				newrec_ptr = 0;

			/* Remove allocated record from the free list */
			if (ofs_write(tdb, last_ptr, &rec->next) == -1)
				goto fail;

			/* Update header: do this before we drop alloc
                           lock, otherwise tdb_free() might try to
                           merge with us, thinking we're free.
                           (Thanks Jeremy Allison). */
			rec->magic = TDB_MAGIC;
			if (rec_write(tdb, rec_ptr, rec) == -1)
				goto fail;

			/* Did we create new block? */
			if (newrec_ptr) {
				/* Update allocated record tailer (we
                                   shortened it). */
				if (update_tailer(tdb, rec_ptr, rec) == -1)
					goto fail;

				/* Free new record */
				if (tdb_free(tdb, newrec_ptr, &newrec) == -1)
					goto fail;
			}

			/* all done - return the new record offset */
			tdb_unlock(tdb, -1, F_WRLCK);
			return rec_ptr;
		}
		/* move to the next record */
		last_ptr = rec_ptr;
		rec_ptr = rec->next;
	}
	/* we didn't find enough space. See if we can expand the
	   database and if we can then try again */
	if (tdb_expand(tdb, length + sizeof(*rec)) == 0)
		goto again;
 fail:
	tdb_unlock(tdb, -1, F_WRLCK);
	return 0;
}

/* initialise a new database with a specified hash size */
static int tdb_new_database(TDB_CONTEXT *tdb, int hash_size)
{
	struct tdb_header *newdb;
	int size, ret = -1;

	/* We make it up in memory, then write it out if not internal */
	size = sizeof(struct tdb_header) + (hash_size+1)*sizeof(tdb_off);
	if (!(newdb = calloc(size, 1)))
		return TDB_ERRCODE(TDB_ERR_OOM, -1);

	/* Fill in the header */
	newdb->version = TDB_VERSION;
	newdb->hash_size = hash_size;
	if (tdb->flags & TDB_INTERNAL) {
		tdb->map_size = size;
		tdb->map_ptr = (char *)newdb;
		memcpy(&tdb->header, newdb, sizeof(tdb->header));
		/* Convert the `ondisk' version if asked. */
		CONVERT(*newdb);
		return 0;
	}
	if (lseek(tdb->fd, 0, SEEK_SET) == -1)
		goto fail;

	if (ftruncate(tdb->fd, 0) == -1)
		goto fail;

	/* This creates an endian-converted header, as if read from disk */
	CONVERT(*newdb);
	memcpy(&tdb->header, newdb, sizeof(tdb->header));
	/* Don't endian-convert the magic food! */
	memcpy(newdb->magic_food, TDB_MAGIC_FOOD, strlen(TDB_MAGIC_FOOD)+1);
	if (write(tdb->fd, newdb, size) != size)
		ret = -1;
	else
		ret = 0;

  fail:
	free(newdb);
	return ret;
}

/* Returns 0 on fail.  On success, return offset of record, and fills
   in rec */
static tdb_off tdb_find(TDB_CONTEXT *tdb, TDB_DATA key, u32 hash,
			struct list_struct *r)
{
	tdb_off rec_ptr;
	
	/* read in the hash top */
	if (ofs_read(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1)
		return 0;

	/* keep looking until we find the right record */
	while (rec_ptr) {
		if (rec_read(tdb, rec_ptr, r) == -1)
			return 0;

		if (!TDB_DEAD(r) && hash==r->full_hash && key.dsize==r->key_len) {
			char *k;
			/* a very likely hit - read the key */
			k = tdb_alloc_read(tdb, rec_ptr + sizeof(*r), 
					   r->key_len);
			if (!k)
				return 0;

			if (memcmp(key.dptr, k, key.dsize) == 0) {
				free(k);
				return rec_ptr;
			}
			free(k);
		}
		rec_ptr = r->next;
	}
	return TDB_ERRCODE(TDB_ERR_NOEXIST, 0);
}

/* If they do lockkeys, check that this hash is one they locked */
static int tdb_keylocked(TDB_CONTEXT *tdb, u32 hash)
{
	u32 i;
	if (!tdb->lockedkeys)
		return 1;
	for (i = 0; i < tdb->lockedkeys[0]; i++)
		if (tdb->lockedkeys[i+1] == hash)
			return 1;
	return TDB_ERRCODE(TDB_ERR_NOLOCK, 0);
}

/* As tdb_find, but if you succeed, keep the lock */
static tdb_off tdb_find_lock(TDB_CONTEXT *tdb, TDB_DATA key, int locktype,
			     struct list_struct *rec)
{
	u32 hash, rec_ptr;

	hash = tdb_hash(&key);
	if (!tdb_keylocked(tdb, hash))
		return 0;
	if (tdb_lock(tdb, BUCKET(hash), locktype) == -1)
		return 0;
	if (!(rec_ptr = tdb_find(tdb, key, hash, rec)))
		tdb_unlock(tdb, BUCKET(hash), locktype);
	return rec_ptr;
}

enum TDB_ERROR tdb_error(TDB_CONTEXT *tdb)
{
	return tdb->ecode;
}

static struct tdb_errname {
	enum TDB_ERROR ecode; const char *estring;
} emap[] = { {TDB_SUCCESS, "Success"},
	     {TDB_ERR_CORRUPT, "Corrupt database"},
	     {TDB_ERR_IO, "IO Error"},
	     {TDB_ERR_LOCK, "Locking error"},
	     {TDB_ERR_OOM, "Out of memory"},
	     {TDB_ERR_EXISTS, "Record exists"},
	     {TDB_ERR_NOLOCK, "Lock exists on other keys"},
	     {TDB_ERR_NOEXIST, "Record does not exist"} };

/* Error string for the last tdb error */
const char *tdb_errorstr(TDB_CONTEXT *tdb)
{
	u32 i;
	for (i = 0; i < sizeof(emap) / sizeof(struct tdb_errname); i++)
		if (tdb->ecode == emap[i].ecode)
			return emap[i].estring;
	return "Invalid error code";
}

/* update an entry in place - this only works if the new data size
   is <= the old data size and the key exists.
   on failure return -1
*/
static int tdb_update(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA dbuf)
{
	struct list_struct rec;
	tdb_off rec_ptr;
	int ret = -1;

	/* find entry */
	if (!(rec_ptr = tdb_find_lock(tdb, key, F_WRLCK, &rec)))
		return -1;

	/* must be long enough key, data and tailer */
	if (rec.rec_len < key.dsize + dbuf.dsize + sizeof(tdb_off)) {
		tdb->ecode = TDB_SUCCESS; /* Not really an error */
		goto out;
	}

	if (tdb_write(tdb, rec_ptr + sizeof(rec) + rec.key_len,
		      dbuf.dptr, dbuf.dsize) == -1)
		goto out;

	if (dbuf.dsize != rec.data_len) {
		/* update size */
		rec.data_len = dbuf.dsize;
		ret = rec_write(tdb, rec_ptr, &rec);
	} else
		ret = 0;
 out:
	tdb_unlock(tdb, BUCKET(rec.full_hash), F_WRLCK);
	return ret;
}

/* find an entry in the database given a key */
TDB_DATA tdb_fetch(TDB_CONTEXT *tdb, TDB_DATA key)
{
	tdb_off rec_ptr;
	struct list_struct rec;
	TDB_DATA ret;

	/* find which hash bucket it is in */
	if (!(rec_ptr = tdb_find_lock(tdb,key,F_RDLCK,&rec)))
		return tdb_null;

	ret.dptr = tdb_alloc_read(tdb, rec_ptr + sizeof(rec) + rec.key_len,
				  rec.data_len);
	ret.dsize = rec.data_len;
	tdb_unlock(tdb, BUCKET(rec.full_hash), F_RDLCK);
	return ret;
}

/* check if an entry in the database exists 

   note that 1 is returned if the key is found and 0 is returned if not found
   this doesn't match the conventions in the rest of this module, but is
   compatible with gdbm
*/
int tdb_exists(TDB_CONTEXT *tdb, TDB_DATA key)
{
	struct list_struct rec;
	
	if (tdb_find_lock(tdb, key, F_RDLCK, &rec) == 0)
		return 0;
	tdb_unlock(tdb, BUCKET(rec.full_hash), F_RDLCK);
	return 1;
}

/* record lock stops delete underneath */
static int lock_record(TDB_CONTEXT *tdb, tdb_off off)
{
	return off ? tdb_brlock(tdb, off, F_RDLCK, F_SETLKW) : 0;
}
/*
  Write locks override our own fcntl readlocks, so check it here.
  Note this is meant to be F_SETLK, *not* F_SETLKW, as it's not
  an error to fail to get the lock here.
*/
 
static int write_lock_record(TDB_CONTEXT *tdb, tdb_off off)
{
	struct tdb_traverse_lock *i;
	for (i = &tdb->travlocks; i; i = i->next)
		if (i->off == off)
			return -1;
	return tdb_brlock(tdb, off, F_WRLCK, F_SETLK);
}

/*
  Note this is meant to be F_SETLK, *not* F_SETLKW, as it's not
  an error to fail to get the lock here.
*/

static int write_unlock_record(TDB_CONTEXT *tdb, tdb_off off)
{
	return tdb_brlock(tdb, off, F_UNLCK, F_SETLK);
}
/* fcntl locks don't stack: avoid unlocking someone else's */
static int unlock_record(TDB_CONTEXT *tdb, tdb_off off)
{
	struct tdb_traverse_lock *i;
	u32 count = 0;

	if (off == 0)
		return 0;
	for (i = &tdb->travlocks; i; i = i->next)
		if (i->off == off)
			count++;
	return (count == 1 ? tdb_brlock(tdb, off, F_UNLCK, F_SETLKW) : 0);
}

/* actually delete an entry in the database given the offset */
static int do_delete(TDB_CONTEXT *tdb, tdb_off rec_ptr, struct list_struct*rec)
{
	tdb_off last_ptr, i;
	struct list_struct lastrec;

	if (tdb->read_only) return -1;

	if (write_lock_record(tdb, rec_ptr) == -1) {
		/* Someone traversing here: mark it as dead */
		rec->magic = TDB_DEAD_MAGIC;
		return rec_write(tdb, rec_ptr, rec);
	}
	write_unlock_record(tdb, rec_ptr);

	/* find previous record in hash chain */
	if (ofs_read(tdb, TDB_HASH_TOP(rec->full_hash), &i) == -1)
		return -1;
	for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next)
		if (rec_read(tdb, i, &lastrec) == -1)
			return -1;

	/* unlink it: next ptr is at start of record. */
	if (last_ptr == 0)
		last_ptr = TDB_HASH_TOP(rec->full_hash);
	if (ofs_write(tdb, last_ptr, &rec->next) == -1)
		return -1;

	/* recover the space */
	if (tdb_free(tdb, rec_ptr, rec) == -1)
		return -1;
	return 0;
}

/* Uses traverse lock: 0 = finish, -1 = error, other = record offset */
static int tdb_next_lock(TDB_CONTEXT *tdb, struct tdb_traverse_lock *tlock,
			 struct list_struct *rec)
{
	int want_next = (tlock->off != 0);

	/* No traversal allowed if you've called tdb_lockkeys() */
	if (tdb->lockedkeys)
		return TDB_ERRCODE(TDB_ERR_NOLOCK, -1);

	/* Lock each chain from the start one. */
	for (; tlock->hash < tdb->header.hash_size; tlock->hash++) {
		if (tdb_lock(tdb, tlock->hash, F_WRLCK) == -1)
			return -1;

		/* No previous record?  Start at top of chain. */
		if (!tlock->off) {
			if (ofs_read(tdb, TDB_HASH_TOP(tlock->hash),
				     &tlock->off) == -1)
				goto fail;
		} else {
			/* Otherwise unlock the previous record. */
			unlock_record(tdb, tlock->off);
		}

		if (want_next) {
			/* We have offset of old record: grab next */
			if (rec_read(tdb, tlock->off, rec) == -1)
				goto fail;
			tlock->off = rec->next;
		}

		/* Iterate through chain */
		while( tlock->off) {
			tdb_off current;
			if (rec_read(tdb, tlock->off, rec) == -1)
				goto fail;
			if (!TDB_DEAD(rec)) {
				/* Woohoo: we found one! */
				lock_record(tdb, tlock->off);
				return tlock->off;
			}
			/* Try to clean dead ones from old traverses */
			current = tlock->off;
			tlock->off = rec->next;
			do_delete(tdb, current, rec);
		}
		tdb_unlock(tdb, tlock->hash, F_WRLCK);
		want_next = 0;
	}
	/* We finished iteration without finding anything */
	return TDB_ERRCODE(TDB_SUCCESS, 0);

 fail:
	tlock->off = 0;
	tdb_unlock(tdb, tlock->hash, F_WRLCK);
	return -1;
}

/* traverse the entire database - calling fn(tdb, key, data) on each element.
   return -1 on error or the record count traversed
   if fn is NULL then it is not called
   a non-zero return value from fn() indicates that the traversal should stop
  */
int tdb_traverse(TDB_CONTEXT *tdb, tdb_traverse_func fn, void *state)
{
	TDB_DATA key, dbuf;
	struct list_struct rec;
	struct tdb_traverse_lock tl = { NULL, 0, 0 };
	int ret, count = 0;

	/* This was in the initialization, above, but the IRIX compiler
	 * did not like it.  crh
	 */
	tl.next = tdb->travlocks.next;

	/* fcntl locks don't stack: beware traverse inside traverse */
	tdb->travlocks.next = &tl;

	/* tdb_next_lock places locks on the record returned, and its chain */
	while ((ret = tdb_next_lock(tdb, &tl, &rec)) > 0) {
		count++;
		/* now read the full record */
		key.dptr = tdb_alloc_read(tdb, tl.off + sizeof(rec), 
					  rec.key_len + rec.data_len);
		if (!key.dptr) {
			tdb_unlock(tdb, tl.hash, F_WRLCK);
			unlock_record(tdb, tl.off);
			tdb->travlocks.next = tl.next;
			return -1;
		}
		key.dsize = rec.key_len;
		dbuf.dptr = key.dptr + rec.key_len;
		dbuf.dsize = rec.data_len;

		/* Drop chain lock, call out */
		tdb_unlock(tdb, tl.hash, F_WRLCK);
		if (fn && fn(tdb, key, dbuf, state)) {
			/* They want us to terminate traversal */
			unlock_record(tdb, tl.off);
			tdb->travlocks.next = tl.next;
			free(key.dptr);
			return count;
		}
		free(key.dptr);
	}
	tdb->travlocks.next = tl.next;
	if (ret < 0)
		return -1;
	else
		return count;
}
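
/* [Editorial illustration, not part of the original attachment: a callback
 * matching the tdb_traverse_func typedef.  It only counts records; the names
 * here are made up.  A non-zero return value would stop the traversal early.
 *
 *	usage:	int n = 0;
 *		tdb_traverse(tdb, tdb_count_fn, &n);
 * ]
 */
static int tdb_count_fn(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA dbuf, void *state)
{
	(*(int *)state)++;
	return 0;
}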

/* find the first entry in the database and return its key */
TDB_DATA tdb_firstkey(TDB_CONTEXT *tdb)
{
	TDB_DATA key;
	struct list_struct rec;

	/* release any old lock */
	unlock_record(tdb, tdb->travlocks.off);
	tdb->travlocks.off = tdb->travlocks.hash = 0;

	if (tdb_next_lock(tdb, &tdb->travlocks, &rec) <= 0)
		return tdb_null;
	/* now read the key */
	key.dsize = rec.key_len;
	key.dptr =tdb_alloc_read(tdb,tdb->travlocks.off+sizeof(rec),key.dsize);
	tdb_unlock(tdb, BUCKET(tdb->travlocks.hash), F_WRLCK);
	return key;
}

/* find the next entry in the database, returning its key */
TDB_DATA tdb_nextkey(TDB_CONTEXT *tdb, TDB_DATA oldkey)
{
	u32 oldhash;
	TDB_DATA key = tdb_null;
	struct list_struct rec;
	char *k = NULL;

	/* Is locked key the old key?  If so, traverse will be reliable. */
	if (tdb->travlocks.off) {
		if (tdb_lock(tdb,tdb->travlocks.hash,F_WRLCK))
			return tdb_null;
		if (rec_read(tdb, tdb->travlocks.off, &rec) == -1
		    || !(k = tdb_alloc_read(tdb,tdb->travlocks.off+sizeof(rec),
					    rec.key_len))
		    || memcmp(k, oldkey.dptr, oldkey.dsize) != 0) {
			/* No, it wasn't: unlock it and start from scratch */
			unlock_record(tdb, tdb->travlocks.off);
			tdb_unlock(tdb, tdb->travlocks.hash, F_WRLCK);
			tdb->travlocks.off = 0;
		}

		if (k)
			free(k);
	}

	if (!tdb->travlocks.off) {
		/* No previous element: do normal find, and lock record */
		tdb->travlocks.off = tdb_find_lock(tdb, oldkey, F_WRLCK, &rec);
		if (!tdb->travlocks.off)
			return tdb_null;
		tdb->travlocks.hash = BUCKET(rec.full_hash);
		lock_record(tdb, tdb->travlocks.off);
	}
	oldhash = tdb->travlocks.hash;

	/* Grab next record: locks chain and returned record,
	   unlocks old record */
	if (tdb_next_lock(tdb, &tdb->travlocks, &rec) > 0) {
		key.dsize = rec.key_len;
		key.dptr = tdb_alloc_read(tdb, tdb->travlocks.off+sizeof(rec),
					  key.dsize);
		/* Unlock the chain of this new record */
		tdb_unlock(tdb, tdb->travlocks.hash, F_WRLCK);
	}
	/* Unlock the chain of old record */
	tdb_unlock(tdb, BUCKET(oldhash), F_WRLCK);
	return key;
}
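
/* [Editorial illustration, not part of the original attachment: walking every
 * key with tdb_firstkey()/tdb_nextkey().  Each key.dptr handed back is
 * malloc()ed for the caller and must be freed.] */
static void tdb_walk_keys_example(TDB_CONTEXT *tdb)
{
	TDB_DATA k, next;

	for (k = tdb_firstkey(tdb); k.dptr; k = next) {
		next = tdb_nextkey(tdb, k);
		/* ... use k.dptr / k.dsize here ... */
		free(k.dptr);
	}
}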

/* delete an entry in the database given a key */
int tdb_delete(TDB_CONTEXT *tdb, TDB_DATA key)
{
	tdb_off rec_ptr;
	struct list_struct rec;
	int ret;

	if (!(rec_ptr = tdb_find_lock(tdb, key, F_WRLCK, &rec)))
		return -1;
	ret = do_delete(tdb, rec_ptr, &rec);
	tdb_unlock(tdb, BUCKET(rec.full_hash), F_WRLCK);
	return ret;
}

/* store an element in the database, replacing any existing element
   with the same key 

   return 0 on success, -1 on failure
*/
int tdb_store(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA dbuf, int flag)
{
	struct list_struct rec;
	u32 hash;
	tdb_off rec_ptr;
	char *p = NULL;
	int ret = 0;

	/* find which hash bucket it is in */
	hash = tdb_hash(&key);
	if (!tdb_keylocked(tdb, hash))
		return -1;
	if (tdb_lock(tdb, BUCKET(hash), F_WRLCK) == -1)
		return -1;

	/* check for it existing, on insert. */
	if (flag == TDB_INSERT) {
		if (tdb_exists(tdb, key)) {
			tdb->ecode = TDB_ERR_EXISTS;
			goto fail;
		}
	} else {
		/* first try in-place update, on modify or replace. */
		if (tdb_update(tdb, key, dbuf) == 0)
			goto out;
		if (flag == TDB_MODIFY && tdb->ecode == TDB_ERR_NOEXIST)
			goto fail;
	}
	/* reset the error code potentially set by the tdb_update() */
	tdb->ecode = TDB_SUCCESS;

	/* delete any existing record - if it doesn't exist we don't
           care.  Doing this first reduces fragmentation, and avoids
           coalescing with `allocated' block before it's updated. */
	if (flag != TDB_INSERT)
		tdb_delete(tdb, key);

	/* Copy key+value *before* allocating free space in case malloc
	   fails and we are left with a dead spot in the tdb. */

	if (!(p = (char *)malloc(key.dsize + dbuf.dsize))) {
		tdb->ecode = TDB_ERR_OOM;
		goto fail;
	}

	memcpy(p, key.dptr, key.dsize);
	memcpy(p+key.dsize, dbuf.dptr, dbuf.dsize);

	/* now we're into insert / modify / replace of a record which
	 * we know could not be optimised by an in-place store (for
	 * various reasons).  */
	if (!(rec_ptr = tdb_allocate(tdb, key.dsize + dbuf.dsize, &rec)))
		goto fail;

	/* Read hash top into next ptr */
	if (ofs_read(tdb, TDB_HASH_TOP(hash), &rec.next) == -1)
		goto fail;

	rec.key_len = key.dsize;
	rec.data_len = dbuf.dsize;
	rec.full_hash = hash;
	rec.magic = TDB_MAGIC;

	/* write out and point the top of the hash chain at it */
	if (rec_write(tdb, rec_ptr, &rec) == -1
	    || tdb_write(tdb, rec_ptr+sizeof(rec), p, key.dsize+dbuf.dsize)==-1
	    || ofs_write(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1) {
	fail:
		/* Need to tdb_unallocate() here */
		ret = -1;
	}
 out:
	if (p)
		free(p); 
	tdb_unlock(tdb, BUCKET(hash), F_WRLCK);
	return ret;
}

static int tdb_already_open(dev_t device,
			    ino_t ino)
{
	TDB_CONTEXT *i;
	
	for (i = tdbs; i; i = i->next) {
		if (i->device == device && i->inode == ino) {
			return 1;
		}
	}

	return 0;
}

/* open the database, creating it if necessary 

   The open_flags and mode are passed straight to the open call on the
   database file. A flags value of O_WRONLY is invalid. The hash size
   is advisory, use zero for a default value.

   Return is NULL on error, in which case errno is also set.  Don't 
   try to call tdb_error or tdb_errname, just do strerror(errno).

   @param name may be NULL for internal databases. */
TDB_CONTEXT *tdb_open(char *name, int hash_size, int tdb_flags,
		      int open_flags, mode_t mode)
{
	TDB_CONTEXT *tdb;
	struct stat st;
	int rev = 0, locked;

	if (!(tdb = calloc(1, sizeof *tdb))) {
		/* Can't log this */
		errno = ENOMEM;
		goto fail;
	}
	tdb->fd = -1;
	tdb->name = NULL;
	tdb->map_ptr = NULL;
	tdb->lockedkeys = NULL;
	tdb->flags = tdb_flags;
	
	if ((open_flags & O_ACCMODE) == O_WRONLY) {
		errno = EINVAL;
		goto fail;
	}
	
	if (hash_size == 0)
		hash_size = DEFAULT_HASH_SIZE;
	if ((open_flags & O_ACCMODE) == O_RDONLY) {
		tdb->read_only = 1;
		/* read only databases don't do locking or clear if first */
		tdb->flags |= TDB_NOLOCK;
		tdb->flags &= ~TDB_CLEAR_IF_FIRST;
	}

	/* internal databases don't mmap or lock, and start off cleared */
	if (tdb->flags & TDB_INTERNAL) {
		tdb->flags |= (TDB_NOLOCK | TDB_NOMMAP);
		tdb->flags &= ~TDB_CLEAR_IF_FIRST;
		tdb_new_database(tdb, hash_size);
		goto internal;
	}

	if ((tdb->fd = open(name, open_flags, mode)) == -1) {
		goto fail;	/* errno set by open(2) */
	}

	/* ensure there is only one process initialising at once */
	if (tdb_brlock(tdb, GLOBAL_LOCK, F_WRLCK, F_SETLKW) == -1) {
		goto fail;	/* errno set by tdb_brlock */
	}

	/* we need to zero database if we are the only one with it open */
	if ((locked = (tdb_brlock(tdb, ACTIVE_LOCK, F_WRLCK, F_SETLK) == 0))
	    && (tdb_flags & TDB_CLEAR_IF_FIRST)) {
		open_flags |= O_CREAT;
		if (ftruncate(tdb->fd, 0) == -1) {
			goto fail; /* errno set by ftruncate */
		}
	}

	if (read(tdb->fd, &tdb->header, sizeof(tdb->header)) != sizeof(tdb->header)
	    || strcmp(tdb->header.magic_food, TDB_MAGIC_FOOD) != 0
	    || (tdb->header.version != TDB_VERSION
		&& !(rev = (tdb->header.version==TDB_BYTEREV(TDB_VERSION))))) {
		/* its not a valid database - possibly initialise it */
		if (!(open_flags & O_CREAT) || tdb_new_database(tdb, hash_size) == -1) {
			errno = EIO; /* ie bad format or something */
			goto fail;
		}
		rev = (tdb->flags & TDB_CONVERT);
	}
	if (!rev)
		tdb->flags &= ~TDB_CONVERT;
	else {
		tdb->flags |= TDB_CONVERT;
		convert(&tdb->header, sizeof(tdb->header));
	}
	if (fstat(tdb->fd, &st) == -1)
		goto fail;

	/* Is it already in the open list?  If so, fail. */
	if (tdb_already_open(st.st_dev, st.st_ino)) {
		errno = EBUSY;
		goto fail;
	}

	if (!(tdb->name = (char *)strdup(name))) {
		errno = ENOMEM;
		goto fail;
	}

	tdb->map_size = st.st_size;
	tdb->device = st.st_dev;
	tdb->inode = st.st_ino;
	tdb->locked = calloc(tdb->header.hash_size+1, sizeof(tdb->locked[0]));
	if (!tdb->locked) {
		errno = ENOMEM;
		goto fail;
	}
	tdb_mmap(tdb);
	if (locked) {
		if (tdb_brlock(tdb, ACTIVE_LOCK, F_UNLCK, F_SETLK) == -1) {
			goto fail;
		}
	}
	/* leave this lock in place to indicate it's in use */
	if (tdb_brlock(tdb, ACTIVE_LOCK, F_RDLCK, F_SETLKW) == -1)
		goto fail;

 internal:
	/* Internal (memory-only) databases skip all the code above to
	 * do with disk files, and resume here by releasing their
	 * global lock and hooking into the active list. */
	if (tdb_brlock(tdb, GLOBAL_LOCK, F_UNLCK, F_SETLKW) == -1)
		goto fail;
	tdb->next = tdbs;
	tdbs = tdb;
	return tdb;

 fail:
	{ int save_errno = errno;

	if (!tdb)
		return NULL;
	
	if (tdb->map_ptr) {
		if (tdb->flags & TDB_INTERNAL)
			free(tdb->map_ptr);
		else
			tdb_munmap(tdb);
	}
	if (tdb->name)
		free(tdb->name);
	if (tdb->fd != -1)
		close(tdb->fd);
	if (tdb->locked)
		free(tdb->locked);
	errno = save_errno;
	return NULL;
	}
}

/* close a database */
int tdb_close(TDB_CONTEXT *tdb)
{
	TDB_CONTEXT **i;
	int ret = 0;

	if (tdb->map_ptr) {
		if (tdb->flags & TDB_INTERNAL)
			free(tdb->map_ptr);
		else
			tdb_munmap(tdb);
	}
	if (tdb->name)
		free(tdb->name);
	if (tdb->fd != -1)
		ret = close(tdb->fd);
	if (tdb->locked)
		free(tdb->locked);
	if (tdb->lockedkeys)
		free(tdb->lockedkeys);

	/* Remove from contexts list */
	for (i = &tdbs; *i; i = &(*i)->next) {
		if (*i == tdb) {
			*i = tdb->next;
			break;
		}
	}

	memset(tdb, 0, sizeof(*tdb));
	free(tdb);

	return ret;
}

/* lock/unlock entire database */
int tdb_lockall(TDB_CONTEXT *tdb)
{
	u32 i;

	/* There are no locks on read-only dbs */
	if (tdb->read_only)
		return TDB_ERRCODE(TDB_ERR_LOCK, -1);
	if (tdb->lockedkeys)
		return TDB_ERRCODE(TDB_ERR_NOLOCK, -1);
	for (i = 0; i < tdb->header.hash_size; i++) 
		if (tdb_lock(tdb, i, F_WRLCK))
			break;

	/* If error, release locks we have... */
	if (i < tdb->header.hash_size) {
		u32 j;

		for ( j = 0; j < i; j++)
			tdb_unlock(tdb, j, F_WRLCK);
		return TDB_ERRCODE(TDB_ERR_NOLOCK, -1);
	}

	return 0;
}
void tdb_unlockall(TDB_CONTEXT *tdb)
{
	u32 i;
	for (i=0; i < tdb->header.hash_size; i++)
		tdb_unlock(tdb, i, F_WRLCK);
}

int tdb_lockkeys(TDB_CONTEXT *tdb, u32 number, TDB_DATA keys[])
{
	u32 i, j, hash;

	/* Can't lock more keys if already locked */
	if (tdb->lockedkeys)
		return TDB_ERRCODE(TDB_ERR_NOLOCK, -1);
	if (!(tdb->lockedkeys = malloc(sizeof(u32) * (number+1))))
		return TDB_ERRCODE(TDB_ERR_OOM, -1);
	/* First number in array is # keys */
	tdb->lockedkeys[0] = number;

	/* Insertion sort by bucket */
	for (i = 0; i < number; i++) {
		hash = tdb_hash(&keys[i]);
		for (j = 0; j < i && BUCKET(tdb->lockedkeys[j+1]) < BUCKET(hash); j++)
			;
		memmove(&tdb->lockedkeys[j+2], &tdb->lockedkeys[j+1], sizeof(u32) * (i-j));
		tdb->lockedkeys[j+1] = hash;
	}
	/* Finally, lock in order */
	for (i = 0; i < number; i++)
		if (tdb_lock(tdb, i, F_WRLCK))
			break;

	/* If error, release locks we have... */
	if (i < number) {
		for ( j = 0; j < i; j++)
			tdb_unlock(tdb, j, F_WRLCK);
		free(tdb->lockedkeys);
		tdb->lockedkeys = NULL;
		return TDB_ERRCODE(TDB_ERR_NOLOCK, -1);
	}
	return 0;
}

/* Unlock the keys previously locked by tdb_lockkeys() */
void tdb_unlockkeys(TDB_CONTEXT *tdb)
{
	u32 i;
	for (i = 0; i < tdb->lockedkeys[0]; i++)
		tdb_unlock(tdb, tdb->lockedkeys[i+1], F_WRLCK);
	free(tdb->lockedkeys);
	tdb->lockedkeys = NULL;
}

/* lock/unlock one hash chain. This is meant to be used to reduce
   contention - it cannot guarantee how many records will be locked */
int tdb_chainlock(TDB_CONTEXT *tdb, TDB_DATA key)
{
	return tdb_lock(tdb, BUCKET(tdb_hash(&key)), F_WRLCK);
}
void tdb_chainunlock(TDB_CONTEXT *tdb, TDB_DATA key)
{
	tdb_unlock(tdb, BUCKET(tdb_hash(&key)), F_WRLCK);
}
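
/* A minimal, hypothetical usage sketch of the API above.  It only calls
   functions defined in this file (tdb_open, tdb_store, tdb_firstkey,
   tdb_nextkey, tdb_delete, tdb_close), assumes the usual declarations
   from tdb.h (TDB_CONTEXT, TDB_DATA, TDB_INSERT), and the file name is
   made up for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include "tdb.h"

int tdb_usage_example(void)
{
	TDB_CONTEXT *db;
	TDB_DATA key, data, k;

	/* create/open with the default (advisory) hash size */
	db = tdb_open("example.tdb", 0, 0, O_RDWR | O_CREAT, 0600);
	if (!db)
		return -1;

	key.dptr = "hello";
	key.dsize = strlen("hello");
	data.dptr = "world";
	data.dsize = strlen("world");

	/* insert, failing if the key already exists */
	if (tdb_store(db, key, data, TDB_INSERT) != 0)
		fprintf(stderr, "store failed\n");

	/* walk all keys; tdb_firstkey/tdb_nextkey return malloc'ed dptrs */
	for (k = tdb_firstkey(db); k.dptr; ) {
		TDB_DATA next = tdb_nextkey(db, k);
		printf("key of %d bytes\n", (int)k.dsize);
		free(k.dptr);
		k = next;
	}

	tdb_delete(db, key);
	return tdb_close(db);
}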

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-02-25  1:41 Rusty Russell
  2002-02-25  1:58 ` your mail Alexander Viro
@ 2002-02-25 13:16 ` Alan Cox
  1 sibling, 0 replies; 657+ messages in thread
From: Alan Cox @ 2002-02-25 13:16 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, mingo, Matthew Kirkwood, Benjamin LaHaise,
	David Axmark, William Lee Irwin III, linux-kernel

> > 	fd = sem_initialize();
> > 	mmap(fd, ...)
> > 	..
> > 	munmap(..)
> > 
> > which gives you a handle for the semaphore.
> 
> No no no!  Implemented exactly that (and posted to l-k IIRC), and it's
> *horrible* to use.

All Linus forgot was to sem_initialize("filename"); With that the rest
comes out for free.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-02-25  2:14   ` Rusty Russell
  2002-02-25  3:18     ` Davide Libenzi
@ 2002-02-25  4:02     ` Alexander Viro
  2002-02-26  5:50       ` Rusty Russell
  1 sibling, 1 reply; 657+ messages in thread
From: Alexander Viro @ 2002-02-25  4:02 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, mingo, Matthew Kirkwood, Benjamin LaHaise,
	David Axmark, William Lee Irwin III, linux-kernel



On Mon, 25 Feb 2002, Rusty Russell wrote:

> In message <Pine.GSO.4.21.0202242054410.1329-100000@weyl.math.psu.edu> you writ
> e:
> > 
> > 
> > On Mon, 25 Feb 2002, Rusty Russell wrote:
> > > First, fd passing sucks: you can't leave an fd somewhere and wait for
> > > someone to pick it up, and they vanish when you exit.  Secondly, you
> > 
> > Yes, you can.  Please, RTFS - what is passed is not a descriptor, it's
> > struct file *.  As soon as datagram is sent, descriptors are resolved and
> > after that point descriptor table of sender (or, for that matter, survival
> > of sender) doesn't matter.
> 
> Please explain how I leave a fd somewhere for other processes to grab
> it.  
> 
> And then please explain how they get the fd after I've exited.
> 
> Al, you are one of the most unpleasant people to deal with on this
> list.  This is *not* an honor, and I beg you to consider a different
> approach in future correspondence.

Honour or not, in this case your complaint is hardly deserved.  To
compress the above a bit:

you: <false statement>
me: RTFS.  <short description of the reasons why statement is wrong; further
details could be obtained by reading TFS>

As for your question, an SCM_RIGHTS datagram can easily outlive the sending
process.  You will need a helper process (either per-meeting point or
system-wide) to avoid GC killing the thing, but that's it.

Writing such a helper is left as an exercise to the reader - it _is_ trivial.
To put fd(s):
	connect to (name of AF_UNIX socket)
	sendmsg to it; no OOB data, one byte of data (non-0)
	form an SCM_RIGHTS datagram with fds in question
	sendmsg it to the same socket.
	close the socket
In helper:
	listen on (name)
repeat:
	accept connection
	read one byte
	if it's non-zero
		put fd of connection into a list
		goto repeat
	else
		take first fd from list
		form an SCM_RIGHTS datagram with that fd
		send it into the new connection
		close fd
		close connection
		goto repeat
To get fd(s):
	connect ....
	sendmsg .................................... (0)
	recvmsg and pick fd from the message
	close connection
	recvmsg from fd and pick the set of fds from the message
	close fd

End of story.  In a real-life situation you will want to throttle in the
helper, etc., but in any case the main loop is ~20 lines of code.
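
For reference, the two sendmsg/recvmsg steps above look roughly like this
in C - a minimal sketch of passing a single fd over an already-connected
AF_UNIX socket; the helper's listen/accept loop is omitted and the function
names are made up:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one fd over an already-connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	char byte = 1;
	union {				/* properly aligned ancillary buffer */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from the socket; returns the new fd or -1. */
static int recv_fd(int sock)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	char byte;
	int fd = -1;
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}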


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-02-25  2:14   ` Rusty Russell
@ 2002-02-25  3:18     ` Davide Libenzi
  2002-02-25  4:02     ` Alexander Viro
  1 sibling, 0 replies; 657+ messages in thread
From: Davide Libenzi @ 2002-02-25  3:18 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Alexander Viro, Linus Torvalds, mingo, Matthew Kirkwood,
	Benjamin LaHaise, David Axmark, William Lee Irwin III,
	linux-kernel

On Mon, 25 Feb 2002, Rusty Russell wrote:

> In message <Pine.GSO.4.21.0202242054410.1329-100000@weyl.math.psu.edu> you writ
> e:
> >
> >
> > On Mon, 25 Feb 2002, Rusty Russell wrote:
> > > First, fd passing sucks: you can't leave an fd somewhere and wait for
> > > someone to pick it up, and they vanish when you exit.  Secondly, you
> >
> > Yes, you can.  Please, RTFS - what is passed is not a descriptor, it's
> > struct file *.  As soon as datagram is sent, descriptors are resolved and
> > after that point descriptor table of sender (or, for that matter, survival
> > of sender) doesn't matter.
>
> Please explain how I leave a fd somewhere for other processes to grab
> it.
>
> And then please explain how they get the fd after I've exited.
>
> Al, you are one of the most unpleasant people to deal with on this
> list.  This is *not* an honor, and I beg you to consider a different
> approach in future correspondence.

Actually, this is one of Al's nicest posts :-)
You obviously can't share fd#s, but you can share file*s.
I don't know how you're going to make these semaphores 'externally visible',
whether with numbers like IPC sems or with pathnames like unix sockets (or
something else). But you can keep an internal number/path/whatever -> file*
mapping, and when a task attaches the sem you map the file* onto an fd# in
the task's file table. If you keep this mapping persistent (until
explicit deletion) the file*s remain alive even with zero attached
processes. I think this is what Al was trying to say.




- Davide




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-02-25  1:58 ` your mail Alexander Viro
@ 2002-02-25  2:14   ` Rusty Russell
  2002-02-25  3:18     ` Davide Libenzi
  2002-02-25  4:02     ` Alexander Viro
  0 siblings, 2 replies; 657+ messages in thread
From: Rusty Russell @ 2002-02-25  2:14 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, mingo, Matthew Kirkwood, Benjamin LaHaise,
	David Axmark, William Lee Irwin III, linux-kernel

In message <Pine.GSO.4.21.0202242054410.1329-100000@weyl.math.psu.edu> you writ
e:
> 
> 
> On Mon, 25 Feb 2002, Rusty Russell wrote:
> > First, fd passing sucks: you can't leave an fd somewhere and wait for
> > someone to pick it up, and they vanish when you exit.  Secondly, you
> 
> Yes, you can.  Please, RTFS - what is passed is not a descriptor, it's
> struct file *.  As soon as datagram is sent, descriptors are resolved and
> after that point descriptor table of sender (or, for that matter, survival
> of sender) doesn't matter.

Please explain how I leave a fd somewhere for other processes to grab
it.  

And then please explain how they get the fd after I've exited.

Al, you are one of the most unpleasant people to deal with on this
list.  This is *not* an honor, and I beg you to consider a different
approach in future correspondence.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-02-25  1:41 Rusty Russell
@ 2002-02-25  1:58 ` Alexander Viro
  2002-02-25  2:14   ` Rusty Russell
  2002-02-25 13:16 ` Alan Cox
  1 sibling, 1 reply; 657+ messages in thread
From: Alexander Viro @ 2002-02-25  1:58 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, mingo, Matthew Kirkwood, Benjamin LaHaise,
	David Axmark, William Lee Irwin III, linux-kernel



On Mon, 25 Feb 2002, Rusty Russell wrote:

> > Note that getting a file descriptor is really quite useful - it means that
> > you can pass the file descriptor around through unix domain sockets, for
> > example, and allow sharing of the semaphore across unrelated processes
> > that way.
> 
> First, fd passing sucks: you can't leave an fd somewhere and wait for
> someone to pick it up, and they vanish when you exit.  Secondly, you

Yes, you can.  Please, RTFS - what is passed is not a descriptor, it's
struct file *.  As soon as datagram is sent, descriptors are resolved and
after that point descriptor table of sender (or, for that matter, survival
of sender) doesn't matter.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-01-30 18:21 Nickolaos Fotopoulos
  2002-01-30 18:57 ` your mail Matti Aarnio
@ 2002-01-31  1:50 ` Drew P. Vogel
  1 sibling, 0 replies; 657+ messages in thread
From: Drew P. Vogel @ 2002-01-31  1:50 UTC (permalink / raw)
  To: Nickolaos Fotopoulos; +Cc: Linux kernel list (E-mail)

Personally, when I'm getting a few hundred emails per day, I don't even
notice the 5% spam.

--Drew Vogel

On Wed, 30 Jan 2002, Nickolaos Fotopoulos wrote:

>I'm new to this list.  Does it get spammed often, like this guy
>(grumph@pakistanmail.com) is doing?  It is allready becoming quite anouying!
>This is by far the busiest list I have ever subscribed to, and there does
>not seem to be any sort of spam blocker working here.  I thought Majodomo
>had stuff like this built in?  If not maybe a list moderator could address
>this.
>				Nick Fotopoulos
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/
>




^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-01-30 18:21 Nickolaos Fotopoulos
@ 2002-01-30 18:57 ` Matti Aarnio
  2002-01-31  1:50 ` Drew P. Vogel
  1 sibling, 0 replies; 657+ messages in thread
From: Matti Aarnio @ 2002-01-30 18:57 UTC (permalink / raw)
  To: Nickolaos Fotopoulos; +Cc: Linux kernel list (E-mail)

On Wed, Jan 30, 2002 at 01:21:17PM -0500, Nickolaos Fotopoulos wrote:
> I'm new to this list.  Does it get spammed often, like this guy
> (grumph@pakistanmail.com) is doing?  It is already becoming quite annoying!

  I already asked about the phenomenon, and the guy(?) replied that
  he won't use that system anymore as it is doing those repeated
  sends all by itself.

> This is by far the busiest list I have ever subscribed to, and there does
> not seem to be any sort of spam blocker working here.  I thought Majordomo
> had stuff like this built in?  If not maybe a list moderator could address
> this.

  http://vger.kernel.org/majordomo-info.html

  Trust me, there is HEAVY filtering.
  Still some spam does get through.

> 				Nick Fotopoulos

/Matti Aarnio

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2002-01-09 17:49 Michael Zhu
@ 2002-01-09 18:17 ` Jens Axboe
  0 siblings, 0 replies; 657+ messages in thread
From: Jens Axboe @ 2002-01-09 18:17 UTC (permalink / raw)
  To: Michael Zhu; +Cc: root, linux-kernel

On Wed, Jan 09 2002, Michael Zhu wrote:
> > 
> > This may be a troll. How would you boot? Who
> decrypts during the
> > boot?
> > 
> 
> You mean that the loop device couldn't en/decrypt the
> whole data on the disk? That mean the loop device
> could implement the block level en/decryption.

Please, read up on the loop crypto stuff off-list. Most of these
questions are very FAQ. You can loop crypto a whole disk or partition if
you want.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-27 18:55     ` Linus Torvalds
  2001-12-27 19:41       ` Andrew Morton
@ 2001-12-28 22:14       ` Martin Dalecki
  1 sibling, 0 replies; 657+ messages in thread
From: Martin Dalecki @ 2001-12-28 22:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andre Hedrick, Keith Owens, kbuild-devel, linux-kernel

Linus Torvalds wrote:

>(Right now you can see this in block_ioctl.c - while only a few of the
>ioctl's have been converted, you get the idea. I'm actually surprised that
>nobody seems to have commented on that part).
>

That was just too obvious, at least for me... However I don't see why 
you just don't start killing off constructs like:

swtch  (ioctrl)

    BLASH:
BLAHHH:
 BLASHH:
 BLAASS:
     BLAH:
    default:
            return -ENOVAL;
}

There are tons of them out there in the block drivers..


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-27 18:55     ` Linus Torvalds
@ 2001-12-27 19:41       ` Andrew Morton
  2001-12-28 22:14       ` Martin Dalecki
  1 sibling, 0 replies; 657+ messages in thread
From: Andrew Morton @ 2001-12-27 19:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andre Hedrick, Keith Owens, kbuild-devel, linux-kernel

Linus Torvalds wrote:
> 
> The other part of the bio rewrite has been to get rid of another coupling:
> the coupling between "struct buffer_head" (which is used for a limited
> kind of memory management by a number of filesystems) and the act of
> actually just doing IO.
> 
> I used to think that we could just relegate "struct buffer_head" to _be_
> the IO entity, but it turns out to be much easier to just split off the IO
> part, which is why you now have a separate "bio" structure for the block
> IO part, and the buffer_head stuff uses that to get the work done.
> 

So... would it be correct to say that there won't be any large
changes to the buffer_head concept in 2.5?

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-27 18:09   ` Andre Hedrick
@ 2001-12-27 18:55     ` Linus Torvalds
  2001-12-27 19:41       ` Andrew Morton
  2001-12-28 22:14       ` Martin Dalecki
  0 siblings, 2 replies; 657+ messages in thread
From: Linus Torvalds @ 2001-12-27 18:55 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Keith Owens, kbuild-devel, linux-kernel


On Thu, 27 Dec 2001, Andre Hedrick wrote:
>
> Lots of luck ... please pass your crack pipe arounds so the rest of us
> idiots can see your vision or lack of ...

Heh. I think I must have passed it on to you long ago, and you never gave
it back, you sneaky bastard ;)

The vision, btw, is to get the request layer in good enough shape that we
can dispense with the mid-layer approaches of SCSI/IDE, and block devices
turn into _just_ device drivers.

For example, ide-scsi is heading for that big scrap-yard in the sky: it's
not the SCSI layer that handles special ioctl requests any more, because
the upper layers are going to be flexible enough that you can just pass
the requests down the regular pipe.

(Right now you can see this in block_ioctl.c - while only a few of the
ioctl's have been converted, you get the idea. I'm actually surprised that
nobody seems to have commented on that part).

The final end result of this (I sincerely hope) is that we can get rid of
some of the couplings that we've had in the block layer. ide-scsi is just
the most obvious strange coupling - things like "sg.c" in general are
rather horrible. There's very little _SCSI_ in sg.c - it's really about
sending commands down to the block devices.

The reason I want to get rid of the couplings is that they end up being
big anchors holding down development: you can create a clean driver that
isn't dependent on the SCSI layer overheads (and people do, for things
like DAC etc), but when you do that you lose _all_ of the support
infrastructure, not just the bloat. Which is sad.

(And which is why things like ide-scsi exist - IDE didn't really want to
be a SCSI driver, but people _did_ want to be able to use some of the
generic support routines that the SCSI layer offers. You couldn't just
cherry-pick the parts you wanted).

The other part of the bio rewrite has been to get rid of another coupling:
the coupling between "struct buffer_head" (which is used for a limited
kind of memory management by a number of filesystems) and the act of
actually just doing IO.

I used to think that we could just relegate "struct buffer_head" to _be_
the IO entity, but it turns out to be much easier to just split off the IO
part, which is why you now have a separate "bio" structure for the block
IO part, and the buffer_head stuff uses that to get the work done.

Andre, I know that you're worried about the low-level drivers, but:

 - I've long since noticed that we cannot communicate, which is why Jens
   is the block level driver person. You'll have to live with it.

 - I personally don't think you _can_ make a good driver without having
   reasonable interfaces, and we didn't have them.

   For example, the network drivers have improved a lot and do not have
   _nearly_ the amount of problems block drivers have. That's obviously
   partly just because it is a simpler problem, but because it was simpler
   it was also possible to change them. The infrastructure changes in the
   networking during 2.3.x really did help drivers.

And note that the "Jens" and "communication" part is important. If you
have patches, please talk to Jens, tell him what the issues, are, and I
know I can communicate with him.

			Linus



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-07  5:10 ` your mail Linus Torvalds
@ 2001-12-27 18:09   ` Andre Hedrick
  2001-12-27 18:55     ` Linus Torvalds
  0 siblings, 1 reply; 657+ messages in thread
From: Andre Hedrick @ 2001-12-27 18:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Keith Owens, kbuild-devel, linux-kernel

On Thu, 6 Dec 2001, Linus Torvalds wrote:

> 
> On Fri, 7 Dec 2001, Keith Owens wrote:
> >
> > Linus, the time has come to convert the 2.5 kernel to kbuild 2.5.
> 
> We're getting the block IO layer in shape first, the time has not come for
> _anything_ else before that.
> 
> 		Linus

Lots of luck ... please pass your crack pipe arounds so the rest of us
idiots can see your vision or lack of ...

Regards,

Andre Hedrick
CEO/President, LAD Storage Consulting Group
Linux ATA Development
Linux Disk Certification Project


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-14 16:46 ` Gérard Roudier
  2001-12-14 20:09   ` Jens Axboe
@ 2001-12-18  0:34   ` Kirk Alexander
  1 sibling, 0 replies; 657+ messages in thread
From: Kirk Alexander @ 2001-12-18  0:34 UTC (permalink / raw)
  To: Gérard Roudier; +Cc: Jens Axboe, linux-kernel

 --- Gérard Roudier <groudier@free.fr> wrote: > 
> 
[snip]
> 
> You may let me know if sym53c8xx_2 still works with 810 rev 2.
> 



I tried the sym53c8xx_2 driver, put a fair load on the system (lots of sync'ing
and swapping) and didn't seem to have any trouble.

Cheers,
 Kirk Alexander



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-17 18:42     ` Sebastian Dröge
@ 2001-12-17 18:43       ` Dave Jones
  0 siblings, 0 replies; 657+ messages in thread
From: Dave Jones @ 2001-12-17 18:43 UTC (permalink / raw)
  To: Sebastian Dröge; +Cc: linux-kernel

On Mon, 17 Dec 2001, Sebastian Dröge wrote:

> So I removed the apic.c hunk
> I think you meant that ;)

*nod*

> Anyway this doesn't solve the problem :(

Ok, this isn't urgent anyway, I'll get around to cleaning that
up later. Thanks for your help tracing this.

Dave.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-17 17:23   ` Sebastian Dröge
  2001-12-17 17:25     ` Dave Jones
@ 2001-12-17 18:42     ` Sebastian Dröge
  2001-12-17 18:43       ` Dave Jones
  1 sibling, 1 reply; 657+ messages in thread
From: Sebastian Dröge @ 2001-12-17 18:42 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 685 bytes --]

Hmm...
I don't see anything about ioapic.c in the patch...
So I removed the apic.c hunk
I think you meant that ;)
Anyway this doesn't solve the problem :(

Bye

On Mon, 17 Dec 2001 18:25:37 +0100 (CET)
Dave Jones <davej@suse.de> wrote:

> On Mon, 17 Dec 2001, Sebastian Dröge wrote:
> 
> > Thanks
> > This does work
> 
> Great, now can you edit the patch to remove the ioapic.c hunk,
> reapply, and see if that works..
> 
> > What do you think was exactly the problem?
> 
> looks like I dorked the apic init...
> I'll back that bit out for -dj2, until I've given
> it a bit more work.
> 
> regards,
> Dave.
> 
> -- 
> | Dave Jones.        http://www.codemonkey.org.uk
> | SuSE Labs
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-17 17:23   ` Sebastian Dröge
@ 2001-12-17 17:25     ` Dave Jones
  2001-12-17 18:42     ` Sebastian Dröge
  1 sibling, 0 replies; 657+ messages in thread
From: Dave Jones @ 2001-12-17 17:25 UTC (permalink / raw)
  To: Sebastian Dröge; +Cc: linux-kernel

On Mon, 17 Dec 2001, Sebastian Dröge wrote:

> Thanks
> This does work

Great, now can you edit the patch to remove the ioapic.c hunk,
reapply, and see if that works..

> What do you think was exactly the problem?

looks like I dorked the apic init...
I'll back that bit out for -dj2, until I've given
it a bit more work.

regards,
Dave.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-17 16:52 ` Sebastian Dröge
  2001-12-17 16:55   ` Arnaldo Carvalho de Melo
@ 2001-12-17 17:23   ` Sebastian Dröge
  2001-12-17 17:25     ` Dave Jones
  2001-12-17 18:42     ` Sebastian Dröge
  1 sibling, 2 replies; 657+ messages in thread
From: Sebastian Dröge @ 2001-12-17 17:23 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]

Thanks
This does work
What do you think was exactly the problem?

Bye

On Mon, 17 Dec 2001 17:57:01 +0100 (CET)
Dave Jones <davej@suse.de> wrote:

> On Mon, 17 Dec 2001, Sebastian Dröge wrote:
> 
> > 2.4.16-2.4.17-rc1 works perfectly
> > 2.5.0-2.5.1 works perfectly
> > Only 2.5.1-dj1 has these 2 errors (ISA-PnP non-detection and USB only root hub detection)
> > All have the same .config
> > If you need some more information feel free to ask me ;)
> 
> Ok, can you try backing out this patch.. (just patch as normal but with -R)
> http://www.codemonkey.org.uk/patches/2.5/small-bits/early-cpuinit-1.diff
> 
> regards,
> Dave.
> 
> -- 
> | Dave Jones.        http://www.codemonkey.org.uk
> | SuSE Labs
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-17 16:52 ` Sebastian Dröge
@ 2001-12-17 16:55   ` Arnaldo Carvalho de Melo
  2001-12-17 17:23   ` Sebastian Dröge
  1 sibling, 0 replies; 657+ messages in thread
From: Arnaldo Carvalho de Melo @ 2001-12-17 16:55 UTC (permalink / raw)
  To: Sebastian Dröge <sebastian.droege@gmx.de>
  Cc: Dave Jones, linux-kernel, torvalds

On Mon, Dec 17, 2001 at 05:52:06PM +0100, Sebastian Dröge wrote:
> PS: 2.5.1 (dj1 or not ;) has one problem more on my pc:
> INIT can't send the TERM signal to all processes...

see the kill(-1,sig) thread...

> Nothing happens... no error message no nothing
> SysRQ works
> I don't know when it went into 2.5 but I think it wasn't there in -pre10 (didn't try -pre11)
> PPS: What the hell is APIC (no I don't mean ACPI)? ;) I've enabled it on my UP machine but don't know what it does...
> Does anyone have information about it?

Advanced Programmable Interrupt Controller, found in SMP machines and in
some UP ones; for UP it shouldn't be enabled in most cases.

- Arnaldo

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-17 16:07 Sebastian Dröge
  2001-12-17 16:22 ` your mail Dave Jones
@ 2001-12-17 16:52 ` Sebastian Dröge
  2001-12-17 16:55   ` Arnaldo Carvalho de Melo
  2001-12-17 17:23   ` Sebastian Dröge
  1 sibling, 2 replies; 657+ messages in thread
From: Sebastian Dröge @ 2001-12-17 16:52 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel, torvalds

[-- Attachment #1: Type: text/plain, Size: 1681 bytes --]

Ok...
2.4.16-2.4.17-rc1 works perfectly
2.5.0-2.5.1 works perfectly
Only 2.5.1-dj1 has these 2 errors (ISA-PnP non-detection and USB only root hub detection)
All have the same .config
If you need some more information feel free to ask me ;)

Bye

PS: 2.5.1 (dj1 or not ;) has one problem more on my pc:
INIT can't send the TERM signal to all processes...
Nothing happens... no error message no nothing
SysRQ works
I don't know when it went into 2.5 but I think it wasn't there in -pre10 (didn't try -pre11)
PPS: What the hell is APIC (no I don't mean ACPI)? ;) I've enabled it on my UP machine but don't know what it does...
Does anyone have information about it?

On Mon, 17 Dec 2001 17:22:14 +0100 (CET)
Dave Jones <davej@suse.de> wrote:

> On Mon, 17 Dec 2001, Sebastian Dröge wrote:
> 
> > Attached you find my .config, lspci -vvv and dmesg output
> > I'll test 2.4.17-rc1 in a few minutes and will report what happens ;)
> 
> Thanks. Right now getting 2.4 into a better shape is more
> important than fixing 2.5, so if you find any problems repeatable
> in 2.4.17rc1, Marcelo really needs to know about it.
> 
> The only USB changes in my tree are __devinit_p changes, which
> really shouldn't be causing a problem, but there could be some
> other unrelated-to-usb patch which is causing this..
> 
> 2.4 info would be appreciated.
> 
> Dave.
> 
> -- 
> | Dave Jones.        http://www.codemonkey.org.uk
> | SuSE Labs
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-17 16:07 Sebastian Dröge
@ 2001-12-17 16:22 ` Dave Jones
  2001-12-17 16:52 ` Sebastian Dröge
  1 sibling, 0 replies; 657+ messages in thread
From: Dave Jones @ 2001-12-17 16:22 UTC (permalink / raw)
  To: Sebastian Dröge; +Cc: Linux Kernel Mailing List

On Mon, 17 Dec 2001, Sebastian Dröge wrote:

> Attached you find my .config, lspci -vvv and dmesg output
> I'll test 2.4.17-rc1 in a few minutes and will report what happens ;)

Thanks. Right now getting 2.4 into a better shape is more
important than fixing 2.5, so if you find any problems repeatable
in 2.4.17rc1, Marcelo really needs to know about it.

The only USB changes in my tree are __devinit_p changes, which
really shouldn't be causing a problem, but there could be some
other unrelated-to-usb patch which is causing this..

2.4 info would be appreciated.

Dave.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-15  0:56 ` Stephan von Krawczynski
@ 2001-12-15  6:59   ` Gérard Roudier
  0 siblings, 0 replies; 657+ messages in thread
From: Gérard Roudier @ 2001-12-15  6:59 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: kirkalx, axboe, linux-kernel



On Sat, 15 Dec 2001, Stephan von Krawczynski wrote:

> On Fri, 14 Dec 2001 17:46:37 +0100 (CET)
> Gérard Roudier <groudier@free.fr> wrote:
>
> > > My system is a clunky old Digital Pentium Pro with a
> > > NCR53c810 rev 2 scsi controller, so it can't use the
> > > sym driver.
> >
> > Use sym53c8xx_2 instead. This one uses 2 different firmwares,
> > [...]
> > You may let me know if sym53c8xx_2 still works with 810 rev 2.
>
> On my system it does. I have it as a second controller and am using sym-2
> without troubles.

Thanks for your report.

  Gérard.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-15  0:54           ` Peter Bornemann
@ 2001-12-15  6:57             ` Gérard Roudier
  0 siblings, 0 replies; 657+ messages in thread
From: Gérard Roudier @ 2001-12-15  6:57 UTC (permalink / raw)
  To: Peter Bornemann; +Cc: Jens Axboe, Kirk Alexander, linux-kernel



On Sat, 15 Dec 2001, Peter Bornemann wrote:

> On Fri, 14 Dec 2001, [ISO-8859-1] Gérard Roudier wrote:
>
> >
> >
> > On Fri, 14 Dec 2001, Peter Bornemann wrote:
> > > Ahemm -- well,
> > > maybe I'm the first one. I have a symbios card, which is recognized by
> > > lspci:  SCSI storage controller: LSI Logic Corp. / Symbios Logic Inc.
> > > (formerly NCR) 53c810 (rev 23).
> > Could you, please,  report me more accurate information.
> > TIA,
> >
>
> Well, it seems I made my intention not very clear: I do not want You to
> fix something in the driver, I just wanted from You to leave the old
> ncr-driver in the kernel, just for the situation of a first install. I
> think no newbie with little knowledge will be able to install Linux (or,
> maybe, FreeBSD), when he happens to own such a controller. First, he
> won't be able to read very much on the screen, for the loop runs much too
> fast and second, he will not understand when he reads something about a
> sym53c8xx. Exactly for this case I think the old driver should be left in.
> If You want, You can tell him "Attention! Use of this driver deprecated.
> Contact Your support." or whatever seems appropriate. It is just about the
> first step to linuxland :-)
>
> Hope I managed to make myself clear this time

I have limited time and am very bad at politics. I much prefer to deal
with accurate technical issues. My English is also limited to this
field.

You would have been clear if you had reported:

1) Which of the mpar= spar= and/or corresponding compiled-in options made
   your broken hardware work (with a high risk of silent corruption).

2) If using the corresponding compiled-in option worked with sym-2.

3) Optionally, relevant messages printed by sym-2, even if taken down by hand,
   when the problem occurs.

+ any other pertinent information you think might help that I cannot
  guess at.

About FreeBSD, the only information I have is that the (sad) work-around I
implemented, which is incorporated in sym-2, _did_ work around the
PCI parity error problem for the people who reported results.

Could you please be clear, as expected in a technical discussion group.

  Gérard.

PS: The ncr53c8xx may just work since it trusts the POST software to
    enable the PCI parity checking bit in PCI config space. But it seems that
    most POST shits do not do so, leaving systems with the risk of silent
    data corruption, in contradiction with both the PCI specification and user
    expectations.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20011214041151.91557.qmail@web14904.mail.yahoo.com>
  2001-12-14 16:46 ` Gérard Roudier
  2001-12-14 20:34 ` Jens Axboe
@ 2001-12-15  0:56 ` Stephan von Krawczynski
  2001-12-15  6:59   ` Gérard Roudier
  2 siblings, 1 reply; 657+ messages in thread
From: Stephan von Krawczynski @ 2001-12-15  0:56 UTC (permalink / raw)
  To: Gérard Roudier; +Cc: kirkalx, axboe, linux-kernel

On Fri, 14 Dec 2001 17:46:37 +0100 (CET)
Gérard Roudier <groudier@free.fr> wrote:

> > My system is a clunky old Digital Pentium Pro with a
> > NCR53c810 rev 2 scsi controller, so it can't use the
> > sym driver.
> 
> Use sym53c8xx_2 instead. This one uses 2 different firmwares,
> [...]
> You may let me know if sym53c8xx_2 still works with 810 rev 2.

On my system it does. I have it as a second controller and am using sym-2
without troubles.

Regards,
Stephan



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-14 20:16         ` Gérard Roudier
@ 2001-12-15  0:54           ` Peter Bornemann
  2001-12-15  6:57             ` Gérard Roudier
  0 siblings, 1 reply; 657+ messages in thread
From: Peter Bornemann @ 2001-12-15  0:54 UTC (permalink / raw)
  To: Gérard Roudier
  Cc: Peter Bornemann, Jens Axboe, Kirk Alexander, linux-kernel

On Fri, 14 Dec 2001, [ISO-8859-1] Gérard Roudier wrote:

>
>
> On Fri, 14 Dec 2001, Peter Bornemann wrote:
> > Ahemm -- well,
> > maybe I'm the first one. I have a symbios card, which is recognized by
> > lspci:  SCSI storage controller: LSI Logic Corp. / Symbios Logic Inc.
> > (formerly NCR) 53c810 (rev 23).
> Could you, please,  report me more accurate information.
> TIA,
>

Well, it seems I made my intention not very clear: I do not want You to
fix something in the driver, I just wanted from You to leave the old
ncr-driver in the kernel, just for the situation of a first install. I
think no newbie with little knowledge will be able to install Linux (or,
maybe, FreeBSD), when he happens to own such a controller. First, he
won't be able to read very much on the screen, for the loop runs much too
fast and second, he will not understand when he reads something about a
sym53c8xx. Exactly for this case I think the old driver should be left in.
If You want, You can tell him "Attention! Use of this driver deprecated.
Contact Your support." or whatever seems appropriate. It is just about the
first step to linuxland :-)

Hope I managed to make myself clear this time

Peter B



          .         .
          |\_-^^^-_/|
          / (|)_(|) \
         ( === X === )
          \  ._|_.  /
           ^-_   _-^
              °°°


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-14 18:05     ` Gérard Roudier
@ 2001-12-14 22:26       ` Peter Bornemann
  2001-12-14 20:16         ` Gérard Roudier
  0 siblings, 1 reply; 657+ messages in thread
From: Peter Bornemann @ 2001-12-14 22:26 UTC (permalink / raw)
  To: Gérard Roudier; +Cc: Jens Axboe, Kirk Alexander, linux-kernel

On Fri, 14 Dec 2001, [ISO-8859-1] Gérard Roudier wrote:
> By the way, for now, I haven't received any report about sym-2 failing
> when sym-1 or ncr succeeds, and my feeling is that this could well be very
> unlikely.
>

Ahemm -- well,
maybe I'm the first one. I have a symbios card, which is recognized by
lspci:  SCSI storage controller: LSI Logic Corp. / Symbios Logic Inc.
(formerly NCR) 53c810 (rev 23).

This card goes into an endless loop during parity-checking. So I tried to
disable it for the new sym53c8xx in modules.conf:
options sym53c8xx mpar:n spar:n
This did not help in this case, however.

There have been so far three ways to solve  this problem:
1. Use the very old ncr53c7,8 or so driver. This is working rather
unreliably for me.
2. Use the ncr53c8xx, which works usually well
3. Use sym53c8xx(old) compiled with parity disabled

Probably there is a way around that, but somebody trying to install Linux
from a SCSI-CDROM with this card for the first time will very likely not
succeed. I have seen this with (for instance) Corel-Linux and FreeBSD
(same driver).
NB Parity checking for me is not really all that important as there is no
hard drive connected to that card. Only a CD and a scanner.

Peter B

          .         .
          |\_-^^^-_/|
          / (|)_(|) \
         ( === X === )
          \  ._|_.  /
           ^-_   _-^
              °°°


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20011214041151.91557.qmail@web14904.mail.yahoo.com>
  2001-12-14 16:46 ` Gérard Roudier
@ 2001-12-14 20:34 ` Jens Axboe
  2001-12-15  0:56 ` Stephan von Krawczynski
  2 siblings, 0 replies; 657+ messages in thread
From: Jens Axboe @ 2001-12-14 20:34 UTC (permalink / raw)
  To: Kirk Alexander; +Cc: groudier, linux-kernel

On Fri, Dec 14 2001, Kirk Alexander wrote:
> [cc'ed to lkml and Gerard Roudier]
> 
> Hi Jens,
> 
> You asked people to send in reports of which drivers
> were broken by the removal of io_request_lock.
> 
> My system is a clunky old Digital Pentium Pro with a
> NCR53c810 rev 2 scsi controller, so it can't use the
> sym driver. I fixed the problem by seeing what the sym
> driver did i.e. the patch below 
> This may not be right at all, and I haven't had a
> chance to boot the kernel - but it did build OK.

Missed your original post, it had no subject line. At first view, your
patch looks correct. However, check the ->detect() routine and verify
it's not assuming the lock is held there. That should be the only
pitfall.

Minor nit pick -- since this driver is _in_ the 2.5 tree, there's no way
the #ifdef would not hit. So the way I've been fixing these is to just
always assume latest kernel.

I think this was already fixed though, but at least now you know you did
it right :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-14 22:26       ` Peter Bornemann
@ 2001-12-14 20:16         ` Gérard Roudier
  2001-12-15  0:54           ` Peter Bornemann
  0 siblings, 1 reply; 657+ messages in thread
From: Gérard Roudier @ 2001-12-14 20:16 UTC (permalink / raw)
  To: Peter Bornemann; +Cc: Jens Axboe, Kirk Alexander, linux-kernel



On Fri, 14 Dec 2001, Peter Bornemann wrote:

> On Fri, 14 Dec 2001, [ISO-8859-1] Gérard Roudier wrote:
> > By the way, for now, I haven't received any report about sym-2 failing
> > when sym-1 or ncr succeeds, and my feeling is that this could well be very
> > unlikely.
> >
>
> Ahemm -- well,
> maybe I'm the first one. I have a symbios card, which is recognized by
> lspci:  SCSI storage controller: LSI Logic Corp. / Symbios Logic Inc.
> (formerly NCR) 53c810 (rev 23).
>
> This card goes into an endless loop during parity-checking. So I tried to
> disable it for the new sym53c8xx in modules.conf:
> options sym53c8xx mpar:n spar:n
> This did not help in this case, however.
>
> There have been so far three ways to solve  this problem:
> 1. Use the very old ncr53c7,8 or so driver. This is working rather
> unreliably for me.
> 2. Use the ncr53c8xx, which works usually well
> 3. Use sym53c8xx(old) compiled with parity disabled
>
> Probably there is a way around that, but somebody trying to install Linux
> from a SCSI-CDROM with this card for the first time will very likely not
> succeed. I have seen this with (for instance) Corel-Linux and FreeBSD
> (same driver).
> NB Parity checking for me is not really all that important as there is no
> hard drive connected to that card. Only a CD and a scanner.

What sort of parity are you talking about?
PCI parity? SCSI parity?

PCI parity checking is not an option. If it is this one, then your
hardware is simply broken. For broken hardware that returns such
spurious PCI parity errors early during HBA probing, sym-2 can detect this
and disable PCI parity checking. This has been reported to work well under
FreeBSD. And sym-2 is also supposed to accept manual disabling, either
by a compiled-in option or using the mpar=n boot-up option.

For SCSI parity, which is a different matter, both drivers try to cope,
and sym-2 should still accept the spar=n boot-up option.

Could you, please,  report me more accurate information.
TIA,
  Gérard.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-14 16:46 ` Gérard Roudier
@ 2001-12-14 20:09   ` Jens Axboe
  2001-12-14 18:05     ` Gérard Roudier
  2001-12-18  0:34   ` Kirk Alexander
  1 sibling, 1 reply; 657+ messages in thread
From: Jens Axboe @ 2001-12-14 20:09 UTC (permalink / raw)
  To: Gérard Roudier; +Cc: Kirk Alexander, linux-kernel

On Fri, Dec 14 2001, Gérard Roudier wrote:
> > I fixed the problem by seeing what the sym
> > driver did i.e. the patch below
> > This may not be right at all, and I haven't had a
> > chance to boot the kernel - but it did build OK.
> 
> The ncr53c8xx and sym53c8xx version 1 use the obsolete scsi eh handling.
> Moving the eh code from sym53c8xx_2 (version 2) to ncr53c8xx/sym53c8xx is
> quite feasible, but may-be it is just useless given sym53c8xx_2. For now,
> it seems that sym53c8xx_2 replaces both ncr/sym53c8xx without any loss of
> reliability and performance.

Gerard,

For 2.5, why don't we just yank old sym and ncr out of the kernel? Is
there _any_ reason to keep the two older ones given your new driver
handles it all?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-14 20:09   ` Jens Axboe
@ 2001-12-14 18:05     ` Gérard Roudier
  2001-12-14 22:26       ` Peter Bornemann
  0 siblings, 1 reply; 657+ messages in thread
From: Gérard Roudier @ 2001-12-14 18:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Kirk Alexander, linux-kernel



On Fri, 14 Dec 2001, Jens Axboe wrote:

> On Fri, Dec 14 2001, Gérard Roudier wrote:
> > > I fixed the problem by seeing what the sym
> > > driver did i.e. the patch below
> > > This may not be right at all, and I haven't had a
> > > chance to boot the kernel - but it did build OK.
> >
> > The ncr53c8xx and sym53c8xx version 1 use the obsolete scsi eh handling.
> > Moving the eh code from sym53c8xx_2 (version 2) to ncr53c8xx/sym53c8xx is
> > quite feasible, but may-be it is just useless given sym53c8xx_2. For now,
> > it seems that sym53c8xx_2 replaces both ncr/sym53c8xx without any loss of
> > reliability and performance.
>
> Gerard,
>
> For 2.5, why don't we just yank old sym and ncr out of the kernel? Is
> there _any_ reason to keep the two older ones given your new driver
> handles it all?

On my side, there is obviously no reason to keep them in 2.5, as sym-2 is
intended to replace them both. Personally, I switched to sym-2 on my
systems several months ago.

However, I do not consider myself the only owner of these drivers. The
owners are all the people who may need symbios chip support under Linux. My
personal vote, as a user/owner, is to remove them and rely on sym-2 for
symbios chip support.

--

Linux stable is a different issue. For this one, I would prefer the old
drivers to remain in place for a longer time. However, I personally will
not track bugs on old drivers if either,

- The problem also shows up in sym-2. Then I will try to fix sym-2,
- Or the problem simply doesn't occur in sym-2.

This will apply to problems reported directly by users or by packagers.

By the way, for now, I haven't received any report about sym-2 failing
when sym-1 or ncr succeeds, and my feeling is that this could well be very
unlikely.

But I can make mistakes, me too. :-)

  Gérard.


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
       [not found] <20011214041151.91557.qmail@web14904.mail.yahoo.com>
@ 2001-12-14 16:46 ` Gérard Roudier
  2001-12-14 20:09   ` Jens Axboe
  2001-12-18  0:34   ` Kirk Alexander
  2001-12-14 20:34 ` Jens Axboe
  2001-12-15  0:56 ` Stephan von Krawczynski
  2 siblings, 2 replies; 657+ messages in thread
From: Gérard Roudier @ 2001-12-14 16:46 UTC (permalink / raw)
  To: Kirk Alexander; +Cc: Jens Axboe, linux-kernel



On Fri, 14 Dec 2001, Kirk Alexander wrote:

> [cc'ed to lkml and Gerard Roudier]
>
> Hi Jens,
>
> You asked people to send in reports of which drivers
> were broken by the removal of io_request_lock.
>
> My system is a clunky old Digital Pentium Pro with a
> NCR53c810 rev 2 scsi controller, so it can't use the
> sym driver.

Use sym53c8xx_2 instead. This one uses 2 different firmwares,

- one based on sym53c8xx driver scripts called
  'LOAD/STORE based' firmware,
- and another one that only uses generic scripts instructions
  and called 'GENERIC' firmware.

The GENERIC firmware has worked for me with an 810 rev. 2.

I don't have this controller installed at the moment, but I can test the
driver by forcing it to use the GENERIC scripts instead for any
symbios chip.

You may let me know if sym53c8xx_2 still works with 810 rev 2.

> I fixed the problem by seeing what the sym
> driver did i.e. the patch below
> This may not be right at all, and I haven't had a
> chance to boot the kernel - but it did build OK.

The ncr53c8xx and sym53c8xx version 1 use the obsolete scsi eh handling.
Moving the eh code from sym53c8xx_2 (version 2) to ncr53c8xx/sym53c8xx is
quite feasible, but may-be it is just useless given sym53c8xx_2. For now,
it seems that sym53c8xx_2 replaces both ncr/sym53c8xx without any loss of
reliability and performance.

  Gérard.

> Cheers,
> Kirk Alexander
>
> P.S.
> Please excuse me if this has already been fixed or
> posted, or if I've broken some lkml etiquette - first
> post I think after lurking off and on for ages. Also
> first time I've compiled a kernel that has only been
> out a few days!

You are welcome and you didn't break any etiquette. The lkml is a very
open mailing list but it gets more than 200 postings a day. Thus the
linux-scsi@vger.kernel.org list should be preferred for topics that
address scsi specifically.


> --- linux/drivers/scsi/sym53c8xx_comm.h	Fri Dec 14
> 16:46:45 2001
> +++ linux/drivers/scsi/sym53c8xx_comm.h	Fri Dec 14
> 16:49:19 2001
> @@ -438,11 +438,20 @@
>  #define	NCR_LOCK_NCB(np, flags)
> spin_lock_irqsave(&np->smp_lock, flags)
>  #define	NCR_UNLOCK_NCB(np, flags)
> spin_unlock_irqrestore(&np->smp_lock, flags)
>
> +#if LINUX_VERSION_CODE >= LinuxVersionCode(2,5,0)
> +
> +#define	NCR_LOCK_SCSI_DONE(np, flags) \
> +		spin_lock_irqsave((np)->done_list->host, flags)
> +#define	NCR_UNLOCK_SCSI_DONE(np, flags) \
> +		spin_unlock_irqrestore((np)->done_list->host,
> flags)
> +#else
> +
>  #define	NCR_LOCK_SCSI_DONE(np, flags) \
>  		spin_lock_irqsave(&io_request_lock, flags)
>  #define	NCR_UNLOCK_SCSI_DONE(np, flags) \
>  		spin_unlock_irqrestore(&io_request_lock, flags)
>
> +#endif
>  #else
>
>  #define	NCR_LOCK_DRIVER(flags)     do {
> save_flags(flags); cli(); } while (0)


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-12-07  4:17 Keith Owens
@ 2001-12-07  5:10 ` Linus Torvalds
  2001-12-27 18:09   ` Andre Hedrick
  0 siblings, 1 reply; 657+ messages in thread
From: Linus Torvalds @ 2001-12-07  5:10 UTC (permalink / raw)
  To: Keith Owens; +Cc: kbuild-devel, linux-kernel


On Fri, 7 Dec 2001, Keith Owens wrote:
>
> Linus, the time has come to convert the 2.5 kernel to kbuild 2.5.

We're getting the block IO layer in shape first, the time has not come for
_anything_ else before that.

		Linus


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-10-15  6:25 Dinesh  Gandhewar
@ 2001-10-15  6:31 ` Tim Hockin
  0 siblings, 0 replies; 657+ messages in thread
From: Tim Hockin @ 2001-10-15  6:31 UTC (permalink / raw)
  To: dinesh_gandhewar; +Cc: mlist-linux-kernel

> What is the effect of following statement at the end of function definition?
> *(int *)0 = 0;	

to cause a crash - you can't dereference a pointer whose value is 0 (NULL).
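
In other words it is a deliberate "crash here" assertion; in a made-up
module it would look something like:

	if (something_impossible_happened)	/* hypothetical condition */
		*(int *)0 = 0;	/* NULL write: faults immediately, leaving an oops/backtrace */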



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-10-02 15:29 Dinesh  Gandhewar
  2001-10-02 15:30 ` your mail Alan Cox
  2001-10-02 15:49 ` Richard B. Johnson
@ 2001-10-02 15:52 ` Michael H. Warfield
  2 siblings, 0 replies; 657+ messages in thread
From: Michael H. Warfield @ 2001-10-02 15:52 UTC (permalink / raw)
  To: Dinesh Gandhewar; +Cc: mlist-linux-kernel

On Tue, Oct 02, 2001 at 03:29:45PM -0000, Dinesh  Gandhewar wrote:

> Hello,
> I have written a linux kernel module. The linux version is 2.2.14. 
> In this module I have declared an array of size 2048. If I use this
> array, the execution of this module function causes the kernel to
> reboot. If I kmalloc() this array then execution of this module
> function does not cause any problem.
> Can you explain this behaviour?

	You didn't say how you declared the array or what the element
size was.  If the array elements were larger than a char, by saying an
array of size 2048, do you mean in bytes or in array elements?

	You also didn't say where you called your module from.  Was it
in an interrupt handler or at insmod time or from a system call.

	If it was a automatic array on the stack (declared inside the
function and not declared static), you probably overflowed the stack.

> Thanks,
> Dinesh 

	Mike
-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw@WittsEnd.com
  (The Mad Wizard)      |  (678) 463-0932   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-10-02 15:29 Dinesh  Gandhewar
  2001-10-02 15:30 ` your mail Alan Cox
@ 2001-10-02 15:49 ` Richard B. Johnson
  2001-10-02 15:52 ` Michael H. Warfield
  2 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2001-10-02 15:49 UTC (permalink / raw)
  To: Dinesh Gandhewar; +Cc: mlist-linux-kernel

On 2 Oct 2001, Dinesh  Gandhewar wrote:

> 
> Hello,
> I have written a linux kernel module. The linux version is 2.2.14. 
> In this module I have declared an array of size 2048. If I use this
> array, the execution of this module function causes the kernel to reboot.
> If I kmalloc() this array then execution of this module function
> does not cause any problem.
> Can you explain this behaviour?
> Thanks,
> Dinesh 

I would check that you are not accidentally exceeding the bounds of
your array. Actual allocation occurs in page-size chunks, so you may
be exceeding your 2048-byte limit without exceeding the 4096-byte
page size of the ix86.

However, a global array, or an array on the stack, has very strict
limits. You can blow things up on the stack by exceeding an array
boundary by one byte. And you can overwrite important memory objects
by exceeding the bounds of a global memory object.
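
(A rough sketch of the difference being discussed - module-style code,
not taken from any real driver; header names are as in later 2.x trees:)

	#include <linux/kernel.h>
	#include <linux/string.h>
	#include <linux/errno.h>
	#include <linux/slab.h>		/* <linux/malloc.h> on older trees */

	static void bad_version(void)
	{
		char buf[2048];		/* automatic: eats a big chunk of the
					 * small (few-KB) kernel stack */
		memset(buf, 0, sizeof(buf));
	}

	static int good_version(void)
	{
		char *buf = kmalloc(2048, GFP_KERNEL);

		if (!buf)
			return -ENOMEM;
		memset(buf, 0, 2048);
		/* ... use buf ... */
		kfree(buf);
		return 0;
	}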


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

    I was going to compile a list of innovations that could be
    attributed to Microsoft. Once I realized that Ctrl-Alt-Del
    was handled in the BIOS, I found that there aren't any.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-10-02 15:29 Dinesh  Gandhewar
@ 2001-10-02 15:30 ` Alan Cox
  2001-10-02 15:49 ` Richard B. Johnson
  2001-10-02 15:52 ` Michael H. Warfield
  2 siblings, 0 replies; 657+ messages in thread
From: Alan Cox @ 2001-10-02 15:30 UTC (permalink / raw)
  To: dinesh_gandhewar; +Cc: mlist-linux-kernel

> I have written a linux kernel module. The linux version is 2.2.14. 
> In this module I have declared an array of size 2048. If I use this array, the execution of this module function causes the kernel to reboot. If I kmalloc() this array then execution of this module function does not cause any problem.
> Can you explain this behaviour?

Yes
--
Alan

[Oh wait you want to know why...]

Either

1.	You are using it for DMA
2.	You are putting it on the stack and causing a stack overflow


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-07-24  0:38 新 月
@ 2001-07-24 12:47 ` Richard B. Johnson
  0 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2001-07-24 12:47 UTC (permalink / raw)
  To: 新 月; +Cc: linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=US-ASCII, Size: 505 bytes --]

On Tue, 24 Jul 2001, 新 月 wrote:

> Hi:
> 	how does the kernel know which filesystem should be
> mounted as the root filesystem?
> 

The Easter Bunny whispers in its ear.... Erm actually, check
/etc/lilo.conf for hints.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

    I was going to compile a list of innovations that could be
    attributed to Microsoft. Once I realized that Ctrl-Alt-Del
    was handled in the BIOS, I found that there aren't any.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-06-13  1:55 Colonel
  2001-06-13  9:32 ` your mail Luigi Genoni
@ 2001-06-18 13:55 ` Jan Hudec
  1 sibling, 0 replies; 657+ messages in thread
From: Jan Hudec @ 2001-06-18 13:55 UTC (permalink / raw)
  To: Colonel; +Cc: linux-kernel

> So it seems that PnP finds the card, but the connections (or even the
> forced values) to the sb module fail.  Back when this was a single
> processor machine, but still running 2.4 kernel, a windoze
> installation found the SB at the listed interface parameters.
> 
> 
> Anyone have a solution?
> 
> Same problem without modules.conf settings, valid version of mod
> utilities, a web search did not help,...

I had a similar problem with a different card (Gravis Ultrasound PnP).
The solution turned out to be to avoid DMA channel 1. Maybe some BIOSes
or ISA chipsets get the 8-bit DMA channel handling wrong, but I really
don't know. Btw: for me 2.2.x autodetected it right, 2.4.x needs an explicit setting.

--------------------------------------------------------------------------------
                  				- Jan Hudec `Bulb' <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-06-13  1:55 Colonel
@ 2001-06-13  9:32 ` Luigi Genoni
  2001-06-18 13:55 ` Jan Hudec
  1 sibling, 0 replies; 657+ messages in thread
From: Luigi Genoni @ 2001-06-13  9:32 UTC (permalink / raw)
  To: Colonel; +Cc: linux-kernel

I have the Sound Blaster 16 card on one of my Athlons (on the PIII I have
an es1731); it has one ISA slot on its MB.
It works well, but I do not use isapnp, nor is any PnP support enabled
in the kernel.
Running 2.4.5/2.4.6-pre2.

Luigi


On Tue, 12 Jun 2001, Colonel wrote:

> From: Colonel <klink@clouddancer.com>
> To: linux-kernel@vger.kernel.org
> Subject: ISA Soundblaster problem
> Reply-to: klink@clouddancer.com
>
>
> The Maintainers list does not contain anyone for 2.4 Soundblaster
> modules, so perhaps some one on the mailing list is aware of a
> solution.  My ISA Soundblaster 16waveffects worked fine in kernel 2.2
> with XMMS.  But I have never been successful in a variety of 2.4
> kernels, the latest being 2.4.5-ac12.  This is what I know:
>
> [DMESG]
> isapnp: Scanning for PnP cards...
> isapnp: Calling quirk for 01:00
> isapnp: SB audio device quirk - increasing port range
> isapnp: Card 'Creative SB16 PnP'
> isapnp: 1 Plug & Play card detected total
>
> }modprobe sb
> /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o: init_module: No such device
> /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters
> /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o: insmod /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o failed
> /lib/modules/2.4.5-ac12/kernel/drivers/sound/sb.o: insmod sb failed
>
>
> [/etc/modules.conf]
> options sb io=0x220 irq=5 dma=1 dma16=5 mpu_io=0x330
>
>
> [DMESG}
> Soundblaster audio driver Copyright (C) by Hannu Savolainen 1993-1996
> sb: No ISAPnP cards found, trying standard ones...
> sb: dsp reset failed.
>
>
> So it seems that PnP finds the card, but the connections (or even the
> forced values) to the sb module fail.  Back when this was a single
> processor machine, but still running 2.4 kernel, a windoze
> installation found the SB at the listed interface parameters.
>
>
> Anyone have a solution?
>
> Same problem without modules.conf settings, valid version of mod
> utilities, a web search did not help,...
>
>
>
> TIA
>
>
> please CC:  klink@clouddancer.com, not currently on the mailing list.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-31 20:37 ` your mail Andrzej Krzysztofowicz
@ 2001-05-31 23:04   ` H. Peter Anvin
  0 siblings, 0 replies; 657+ messages in thread
From: H. Peter Anvin @ 2001-05-31 23:04 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <200105312037.WAA01610@kufel.dom>
By author:    Andrzej Krzysztofowicz <kufel!ankry@green.mif.pg.gda.pl>
In newsgroup: linux.dev.kernel
> 
> BTW, linux-kernel readers: is anybody volunteering to make the kernel size
> counter 32-bit here? This would enable using the simple bootloader to
> load larger kernels...  (the current limit is slightly below 1MB)
> Possibly some 16/32-bit real-mode code mixing would be necessary.
> 

PLEASE don't go there.  bootsect.S is fundamentally broken these days
(doesn't work on USB floppies, for example.)  It should be killed
DEAD, DEAD, DEAD and not dragged along like a dead albatross.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-31 16:53 Ramil.Santamaria
@ 2001-05-31 20:37 ` Andrzej Krzysztofowicz
  2001-05-31 23:04   ` H. Peter Anvin
  0 siblings, 1 reply; 657+ messages in thread
From: Andrzej Krzysztofowicz @ 2001-05-31 20:37 UTC (permalink / raw)
  To: kufel!tais.toshiba.com!Ramil.Santamaria
  Cc: kufel!vger.kernel.org!linux-kernel

> 
> Minor issue with bootsect.s.

1. bootsect.S is the source file

> The single instance of the lds assembly instruction includes the comment of
> !  ds:si is source
> ...
> seg fs
> lds  si,(bx)        !     ds:si is source
> ...
> Is this comment not in reverse order (i.e should be lds
> dest,src)................

2. This is not a comment on the i386 mnemonic's operand order. It documents
   the role of the specific registers in the instructions that follow.

BTW, linux-kernel readers: is anybody volunteering to make the kernel size
counter 32-bit here? This would enable using the simple bootloader to
load larger kernels...  (the current limit is slightly below 1MB)
Possibly some 16/32-bit real-mode code mixing would be necessary.

Andrzej

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-21 20:12 ` your mail Lorenzo Marcantonio
@ 2001-05-22 10:06   ` Thomas Palm
  0 siblings, 0 replies; 657+ messages in thread
From: Thomas Palm @ 2001-05-22 10:06 UTC (permalink / raw)
  To: Lorenzo Marcantonio; +Cc: linux-kernel

1. The corrupted files have the same length but differ (I cannot say at which
bit position)
2. I reproduced the problem while burning a CD from a SCSI disk to a
SCSI CD burner!!!
-> It's definitely not a (single?) IDE problem

Burning a CD (at slow 4x speed) seems to initiate many small transfers
(instead of a smooth stream)(same as copying many small files) on PCI/DMA,
which generates the same problems!!!



> On Mon, 21 May 2001, Thomas Palm wrote:
> 
> > there is still file corruption. I use an ASUS A7V133 (Revision 1.05,
> > including Sound + Raid). My tests:
> > 1st run of "diff -r srcdir destdir" -> no differs
> > 2nd run of "diff -r srcdir destdir" -> 2 files differ
> > 3rd run of "diff -r srcdir destdir" -> 1 file differs
> > 4th run of "diff -r srcdir destdir" -> 1 file differs
> > 5th run of "diff -r srcdir destdir" -> no differs
> 
> Could you check WHERE the files differ and WHERE the data come from?
> 
> I've got the same mobo AND some nasty DAT tape corruption problems...
> (also, VERY rarely, on the CD burner). I've got all on SCSI, but if it's
> the DMA troubling us...
> 
> 				-- Lorenzo Marcantonio
> 
> 

-- 
Machen Sie Ihr Hobby zu Geld bei unserem Partner 1&1!
http://profiseller.de/info/index.php3?ac=OM.PS.PS003K00596T0409a

--
GMX - Die Kommunikationsplattform im Internet.
http://www.gmx.net


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-21 19:43 Thomas Palm
@ 2001-05-21 20:12 ` Lorenzo Marcantonio
  2001-05-22 10:06   ` Thomas Palm
  0 siblings, 1 reply; 657+ messages in thread
From: Lorenzo Marcantonio @ 2001-05-21 20:12 UTC (permalink / raw)
  To: Thomas Palm; +Cc: linux-kernel

On Mon, 21 May 2001, Thomas Palm wrote:

> there is still file corruption. I use an ASUS A7V133 (Revision 1.05,
> including Sound + Raid). My tests:
> 1st run of "diff -r srcdir destdir" -> no differs
> 2nd run of "diff -r srcdir destdir" -> 2 files differ
> 3rd run of "diff -r srcdir destdir" -> 1 file differs
> 4th run of "diff -r srcdir destdir" -> 1 file differs
> 5th run of "diff -r srcdir destdir" -> no differs

Could you check WHERE the files differ and WHERE the data come from?

I've got the same mobo AND some nasty DAT tape corruption problems...
(also, VERY rarely, on the CD burner). I've got it all on SCSI, but if it's
the DMA troubling us...

				-- Lorenzo Marcantonio



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-16 15:05 siva prasad
@ 2001-05-17  0:11 ` Erik Mouw
  0 siblings, 0 replies; 657+ messages in thread
From: Erik Mouw @ 2001-05-17  0:11 UTC (permalink / raw)
  To: siva prasad; +Cc: linux-kernel

On Wed, May 16, 2001 at 08:05:38AM -0700, siva prasad wrote:
> Is it true that the ipc calls like
> msgget(),shmget()...
> are  not really system calls?

No, they all use a system call, but the system call is the same for all
functions.

> Cos in the file "asm/unistd.h" where the
> system calls are listed as __NR_xxx we dont find
> the appropriate listing for the ipc calls.
> What I guessed was that all the ipc calls are
> clubbed under the 'int ipc()' system call and this
> is well listed in the "asm/unistd.h" 

Right, they all use __NR_ipc. See sys_ipc() in
arch/i386/kernel/sys_i386.c, especially the comment right above the
function...
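
(A rough userspace illustration of what that multiplexing means in
practice - i386 only, and the MSGGET multiplexer value used below is
taken from memory of the kernel's ipc header, so double-check it:)

	#include <sys/types.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#define IPCCALL_MSGGET	13	/* MSGGET in the kernel's ipc multiplexer */

	/* roughly what libc's msgget() boils down to on i386 */
	int my_msgget(key_t key, int msgflg)
	{
		return syscall(SYS_ipc, IPCCALL_MSGGET, key, msgflg);
	}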

> Could some one explain why the ipc is implemented 
> this way rather that implementing them as individual 
> system calls as in UNIX.

Probably because the original designer liked it this way and nobody
cared enough to do it otherwise.

> Or is it the same way in UNIX

I don't know, I don't have Unix source available.


Erik

-- 
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031,  2600 GA Delft, The Netherlands
Phone: +31-15-2783635  Fax: +31-15-2781843  Email: J.A.K.Mouw@its.tudelft.nl
WWW: http://www-ict.its.tudelft.nl/~erik/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-08 20:16     ` Jens Axboe
@ 2001-05-09 13:59       ` Richard B. Johnson
  0 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2001-05-09 13:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux kernel

On Tue, 8 May 2001, Jens Axboe wrote:

> On Tue, May 08 2001, Richard B. Johnson wrote:
> > > Use a kernel thread? If you don't need to access user space, context
> > > switches are very cheap.
> > > 
> > > > So, what am I supposed to do to add a piece of driver code to the
> > > > run queue so it gets scheduled occasionally?
> > > 
> > > Several, grep for kernel_thread.
> > > 
> > > -- 
> > > Jens Axboe
> > > 
> > 
> > Okay. Thanks. I thought I would have to do that too. No problem.
> 
> A small worker thread and a wait queue to sleep on and you are all set,
> 10 minutes tops :-)
> 
> > It's a "tomorrow" thing. Ten hours it too long to stare at a
> > screen.
> 
> Sissy!
> 

Okay. I am now awake. I will now try the kernel thread. Looks
simple. Got to remember to kill it before/during module removal.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-08 20:46 ` Alan Cox
@ 2001-05-08 21:05   ` Richard B. Johnson
  0 siblings, 0 replies; 657+ messages in thread
From: Richard B. Johnson @ 2001-05-08 21:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux kernel

On Tue, 8 May 2001, Alan Cox wrote:

> > I have a driver which needs to wait for some hardware.
> > Basically, it needs to have some code added to the run-queue
> > so it can get some CPU time even though it's not being called.
> 
> Why does it have to wait? Why can't it just poll and come back next time?
> 

Good question. I wanted to be able to call the exact same routine(s)
that the read() and write() paths execute.
These routines are complex and sleep while waiting for events. I
didn't want to duplicate that code with different time-out mechanisms.

GPIB is nasty because you can't do anything unless the 'controller'
tells you to do it. When "addressed to talk", you have to parse
all the stuff sent via interrupt (ATN bit set, control byte, which
address from the control byte, etc.), then let somebody sleeping
in poll() know that they can now "write()". That can all be handled
via interrupt. But, now for the receive <grin>. The user-mode code
needs to be sleeping until some data are available. That data
may never be available. Something in the driver needs to wait
until the hardware is addressed to receive. Since it is not now
receiving, there is no interrupt! It takes time for the controller
to tell you to listen and then tell somebody else to talk to you.
This means that I need some timeout to recover from the fact
that the other guy may never talk.

Once the other guy starts sending data, the interrupts can be
used to handle the data and, once there are valid data, the
device owner can be awakened, presumably sleeping in poll() or
select(). It's the intermediate time, when there are no
interrupts, where the CPU needs to determine that we've waited
too long, so the device had better get off the bus and
start the error recovery procedure.
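
(A hypothetical sketch of how such a bounded wait might look with
2.4-era primitives; the wait queue name and timeout are invented, and
the sleep_on-style call is shown only because it matches the code of
that era - it has a well-known lost-wakeup race:)

	#include <linux/sched.h>
	#include <linux/wait.h>
	#include <linux/errno.h>

	static DECLARE_WAIT_QUEUE_HEAD(gpib_read_wq);	/* woken from the ISR */

	static int gpib_wait_for_talker(void)
	{
		long timeout = 5 * HZ;	/* give up after ~5 seconds */

		timeout = interruptible_sleep_on_timeout(&gpib_read_wq, timeout);
		if (timeout == 0)
			return -ETIMEDOUT;	/* nobody talked: get off the
						 * bus, start error recovery */
		return 0;
	}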

Bright and early tomorrow, I will check out both ways. A kernel
thread might be "neat". However, I may just look to see if
I can simply poll while using the existing code.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-08 19:48 Richard B. Johnson
  2001-05-08 20:06 ` your mail Jens Axboe
@ 2001-05-08 20:46 ` Alan Cox
  2001-05-08 21:05   ` Richard B. Johnson
  1 sibling, 1 reply; 657+ messages in thread
From: Alan Cox @ 2001-05-08 20:46 UTC (permalink / raw)
  To: root; +Cc: Linux kernel

> I have a driver which needs to wait for some hardware.
> Basically, it needs to have some code added to the run-queue
> so it can get some CPU time even though it's not being called.

Why does it have to wait? Why can't it just poll and come back next time?

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-08 20:15   ` Richard B. Johnson
@ 2001-05-08 20:16     ` Jens Axboe
  2001-05-09 13:59       ` Richard B. Johnson
  0 siblings, 1 reply; 657+ messages in thread
From: Jens Axboe @ 2001-05-08 20:16 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Linux kernel

On Tue, May 08 2001, Richard B. Johnson wrote:
> > Use a kernel thread? If you don't need to access user space, context
> > switches are very cheap.
> > 
> > > So, what am I supposed to do to add a piece of driver code to the
> > > run queue so it gets scheduled occasionally?
> > 
> > Several, grep for kernel_thread.
> > 
> > -- 
> > Jens Axboe
> > 
> 
> Okay. Thanks. I thought I would have to do that too. No problem.

A small worker thread and a wait queue to sleep on and you are all set,
10 minutes tops :-)
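
(For illustration, a very rough 2.4-era sketch of that worker-thread-
plus-wait-queue pattern; every name below is invented and the details
are from memory, so treat it as pseudo-code rather than a drop-in:)

	#include <linux/sched.h>
	#include <linux/wait.h>
	#include <linux/string.h>

	static DECLARE_WAIT_QUEUE_HEAD(gpib_wq);
	static volatile int gpib_event;
	static volatile int gpib_stop;

	static int gpib_worker(void *unused)
	{
		daemonize();			/* detach from insmod */
		strcpy(current->comm, "gpibd");

		while (!gpib_stop) {
			wait_event_interruptible(gpib_wq,
						 gpib_event || gpib_stop);
			gpib_event = 0;
			/* ... talk to the hardware; sleeping is fine here ... */
		}
		return 0;
	}

	/* from init_module():
	 *	kernel_thread(gpib_worker, NULL, CLONE_FS | CLONE_FILES);
	 * from the IRQ handler or ioctl():
	 *	gpib_event = 1; wake_up_interruptible(&gpib_wq);
	 * on module removal: set gpib_stop, wake the queue, and wait for
	 * the thread to exit before unloading anything. */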

> It's a "tomorrow" thing. Ten hours it too long to stare at a
> screen.

Sissy!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-08 20:06 ` your mail Jens Axboe
@ 2001-05-08 20:15   ` Richard B. Johnson
  2001-05-08 20:16     ` Jens Axboe
  0 siblings, 1 reply; 657+ messages in thread
From: Richard B. Johnson @ 2001-05-08 20:15 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux kernel

On Tue, 8 May 2001, Jens Axboe wrote:

> On Tue, May 08 2001, Richard B. Johnson wrote:
> > 
> > To driver wizards:
> > 
> > I have a driver which needs to wait for some hardware.
> > Basically, it needs to have some code added to the run-queue
> > so it can get some CPU time even though it's not being called.
> > 
> > It needs to get some CPU time which can be "turned on" or
> > "turned off" as a result of an interrupt or some external
> > input from  an ioctl().
> > 
> > So I thought that the "tasklet" would be ideal. However, the
> > scheduler "thinks" that a tasklet is an interrupt, so any
> > attempt to sleep in the tasklet results in a kernel panic,
> > "ieee scheduling in an interrupt..., BUG sched.c line 688".
> 
> Use a kernel thread? If you don't need to access user space, context
> switches are very cheap.
> 
> > So, what am I supposed to do to add a piece of driver code to the
> > run queue so it gets scheduled occasionally?
> 
> Several, grep for kernel_thread.
> 
> -- 
> Jens Axboe
> 

Okay. Thanks. I thought I would have to do that too. No problem.
It's a "tomorrow" thing. Ten hours is too long to stare at a
screen.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-08 19:48 Richard B. Johnson
@ 2001-05-08 20:06 ` Jens Axboe
  2001-05-08 20:15   ` Richard B. Johnson
  2001-05-08 20:46 ` Alan Cox
  1 sibling, 1 reply; 657+ messages in thread
From: Jens Axboe @ 2001-05-08 20:06 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Linux kernel

On Tue, May 08 2001, Richard B. Johnson wrote:
> 
> To driver wizards:
> 
> I have a driver which needs to wait for some hardware.
> Basically, it needs to have some code added to the run-queue
> so it can get some CPU time even though it's not being called.
> 
> It needs to get some CPU time which can be "turned on" or
> "turned off" as a result of an interrupt or some external
> input from  an ioctl().
> 
> So I thought that the "tasklet" would be ideal. However, the
> scheduler "thinks" that a tasklet is an interrupt, so any
> attempt to sleep in the tasklet results in a kernel panic,
> "ieee scheduling in an interrupt..., BUG sched.c line 688".

Use a kernel thread? If you don't need to access user space, context
switches are very cheap.

> So, what am I supposed to do to add a piece of driver code to the
> run queue so it gets scheduled occasionally?

Several, grep for kernel_thread.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-07 11:38 Chandrashekar Nagaraj
@ 2001-05-07 12:09 ` Erik Mouw
  0 siblings, 0 replies; 657+ messages in thread
From: Erik Mouw @ 2001-05-07 12:09 UTC (permalink / raw)
  To: Chandrashekar Nagaraj; +Cc: linux-kernel

On Mon, May 07, 2001 at 05:08:43PM +0530, Chandrashekar Nagaraj wrote:
> 	i want to know how to read tab without a terminating character,
> ie., if i use getchar() i have to enter '\n' after tab to read tab,
> same is the case with read system call and scanf. 

This is off topic for this list, but anyway.

Read man cfmakeraw, and/or get a copy of "Advanced programming in the
UNIX environment" by W. Richard Stevens.


Erik

-- 
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031,  2600 GA Delft, The Netherlands
Phone: +31-15-2783635  Fax: +31-15-2781843  Email: J.A.K.Mouw@its.tudelft.nl
WWW: http://www-ict.its.tudelft.nl/~erik/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-05-02 22:34 Duc Vianney
@ 2001-05-03  0:10 ` Linus Torvalds
  0 siblings, 0 replies; 657+ messages in thread
From: Linus Torvalds @ 2001-05-03  0:10 UTC (permalink / raw)
  To: Duc Vianney
  Cc: castortz, Bill Hartner, staelin, Larry McVoy, lse-tech,
	linux-kernel, lmbench-users



On Wed, 2 May 2001, Duc Vianney wrote:
>
> Has anyone seen performance degradations between 2.2.19 and 2.4.x

Yes.

The signal handling one is because 2.4.x will save off the full SSE2
state, which means that the signal stack is almost 700 bytes, as compared
to <200 before. This is sadly necessary to be able to take advantage of
the SSE2 instructions - and on special applications the win can be quite
noticeable. This one you won't be able to avoid, although you shouldn't
see it on older hardware that does not have SSE2 (you see it because you
have a PIII).

You don't say how much memory you have, but the file handling ones might
be due to a really unfortunate hash thinko that causes the dentry hash to
be pretty much useless on machines that have 512MB of RAM (it can show up
in other cases, but 512M is the case that makes the hash really become a
non-hash). If so, it should be fixed in 2.4.2.

2.4.4 will give noticeably better numbers for fork and fork+exec. However,
the scheduling optimization that does that actually breaks at least
"bash", and it appears that we will just undo it during the stable series.
Even if the bug is obviously in user land (and a fix is available), stable
kernels shouldn't try to hide the problems.

			Linus


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-04-26 19:37 Alexandru Barloiu Nicolae
  2001-04-26 19:51 ` your mail Erik Mouw
  2001-04-26 19:54 ` Mohammad A. Haque
@ 2001-04-26 19:59 ` Joel Jaeggli
  2 siblings, 0 replies; 657+ messages in thread
From: Joel Jaeggli @ 2001-04-26 19:59 UTC (permalink / raw)
  To: Alexandru Barloiu Nicolae; +Cc: linux-kernel

Yeah, two-hour upgrade window today...

joelja

On Thu, 26 Apr 2001, Alexandru Barloiu Nicolae wrote:

> is ftp.kernel.org down or is just my connections fault ?
>
> axl
>
>
> ______________________________________________________
> support slackware anyway posible paypal@slackware.com anyone ?
>    http://www.slackware.com/forum/read.php?f=5&i=7887&t=7887
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

-- 
--------------------------------------------------------------------------
Joel Jaeggli				       joelja@darkwing.uoregon.edu
Academic User Services			     consult@gladstone.uoregon.edu
     PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E
--------------------------------------------------------------------------
It is clear that the arm of criticism cannot replace the criticism of
arms.  Karl Marx -- Introduction to the critique of Hegel's Philosophy of
the right, 1843.



^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-04-26 19:37 Alexandru Barloiu Nicolae
  2001-04-26 19:51 ` your mail Erik Mouw
@ 2001-04-26 19:54 ` Mohammad A. Haque
  2001-04-26 19:59 ` Joel Jaeggli
  2 siblings, 0 replies; 657+ messages in thread
From: Mohammad A. Haque @ 2001-04-26 19:54 UTC (permalink / raw)
  To: Alexandru Barloiu Nicolae; +Cc: linux-kernel

Down for maint.


On Thu, 26 Apr 2001, Alexandru Barloiu Nicolae wrote:

> is ftp.kernel.org down or is just my connections fault ?
>

-- 

=====================================================================
Mohammad A. Haque                              http://www.haque.net/
                                               mhaque@haque.net

  "Alcohol and calculus don't mix.             Project Lead
   Don't drink and derive." --Unknown          http://wm.themes.org/
                                               batmanppc@themes.org
=====================================================================


^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-04-26 19:37 Alexandru Barloiu Nicolae
@ 2001-04-26 19:51 ` Erik Mouw
  2001-04-26 19:54 ` Mohammad A. Haque
  2001-04-26 19:59 ` Joel Jaeggli
  2 siblings, 0 replies; 657+ messages in thread
From: Erik Mouw @ 2001-04-26 19:51 UTC (permalink / raw)
  To: Alexandru Barloiu Nicolae; +Cc: linux-kernel

On Thu, Apr 26, 2001 at 10:37:32PM +0300, Alexandru Barloiu Nicolae wrote:
> is ftp.kernel.org down or is just my connections fault ?

Yes, scheduled downtime. Use your local mirror (ftp.ro.kernel.org).


Erik

-- 
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031,  2600 GA Delft, The Netherlands
Phone: +31-15-2783635  Fax: +31-15-2781843  Email: J.A.K.Mouw@its.tudelft.nl
WWW: http://www-ict.its.tudelft.nl/~erik/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-04-02 19:20 Jakob Kemi
@ 2001-04-09 13:23 ` Tim Waugh
  0 siblings, 0 replies; 657+ messages in thread
From: Tim Waugh @ 2001-04-09 13:23 UTC (permalink / raw)
  To: Jakob Kemi; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 515 bytes --]

On Mon, Apr 02, 2001 at 09:20:43PM +0200, Jakob Kemi wrote:

> Ok, maybe this isn't the right list for this question. In 2.2.x the
> parport_probe module extracted the ieee1284 device id correctly and
> added to the proc fs. However this doesn't seem to work for me in
> 2.4.x

It changed place: perhaps that's the problem?  /proc/parport/$n is now
/proc/sys/dev/parport/$name.

> Is there some option I need to enable. As far as I understand the
>  CONFIG_PARPORT_1284 should be enough??

Yes, it should.

Tim.

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-03-24  0:04 dhar
@ 2001-03-24  1:09 ` Tim Wright
  0 siblings, 0 replies; 657+ messages in thread
From: Tim Wright @ 2001-03-24  1:09 UTC (permalink / raw)
  To: dhar; +Cc: linux-smp, linux-kernel

Hmmm...
you don't really give enough information to make much of a guess.
I'd do the following:
Grab at least 2.2.18, or even better, get Alan's 2.2.19pre (which is almost
2.2.19 now, I believe), and build and install that kernel.

Now, if you run into the same problems, record the crash details, especially
if the kernel oopses, and then send the information (kernel version, output
of ksymoops if there is an oops, kernel .config used, etc.) to the mailing list.

Tim

On Sat, Mar 24, 2001 at 05:34:39AM +0530, dhar wrote:
> Hi,
> 
> I am not a member of either of these lists and would appreciate if you could send your replies to me personally.
> 
> Now the problem:
> 
> I have an IBM Netfinity X330 server. Dual Processor (PIII 800). I compiled kernel 2.2.14 with SMP support. NFS was however compiled as a module. 
> 
> Now the problem is as follows:
> 
> Most of the times the machine just works fine. 
> But whenever there is heavy disk write activity it just hangs/crashes. Also this is when the SMP kernel is used. If I use the normal kernel then there is no problem. 
> 
> Could any one tell me what has to be done to prevent this from happening? 
> 
> Any help in this regard will be very much appreciated.
> 
> Once again, kindly reply to me personally as I am not a member of either of these lists.
>  
> Regards
> Dhar 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Tim Wright - timw@splhi.com or timw@aracnet.com or twright@us.ibm.com
IBM Linux Technology Center, Beaverton, Oregon
Interested in Linux scalability ? Look at http://lse.sourceforge.net/
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-03-12  5:03 ` your mail Greg KH
@ 2001-03-14 17:46   ` Robert Read
  0 siblings, 0 replies; 657+ messages in thread
From: Robert Read @ 2001-03-14 17:46 UTC (permalink / raw)
  To: Greg KH, Martin Bruchanov, linux-kernel

On Sun, Mar 11, 2001 at 09:03:09PM -0800, Greg KH wrote:
> On Sun, Mar 11, 2001 at 06:06:24PM +0100, Martin Bruchanov wrote:
> > 
> > Bug report from Martin Bruchanov (bruxy@kgb.cz, bruchm@racom.cz)
> > 
> > ############################################################################
> > [1.] One line summary of the problem:    
> > USB doesn't work properly with SMP kernel on dual-mainboard or with APIC.
> 
> What kind of motherboard is this?
> 

From the lspci output, looks like I have the same mainboard or at
least one with an identical chipset. I've got an MSI 694D Pro
Mainboard with 694X VIA chipset, with 2 cpus installed, and I had the
same USB problem with 2.4.0, but haven't had time to test it on a
recent kernel.

robert

> And does USB work in SMP mode with "noapic" given on the kernel command
> line?
> 
> thanks,
> 
> greg k-h
> 
> -- 
> greg@(kroah|wirex).com
> http://immunix.org/~greg
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-03-11 17:06 Martin Bruchanov
@ 2001-03-12  5:03 ` Greg KH
  2001-03-14 17:46   ` Robert Read
  0 siblings, 1 reply; 657+ messages in thread
From: Greg KH @ 2001-03-12  5:03 UTC (permalink / raw)
  To: Martin Bruchanov; +Cc: linux-kernel

On Sun, Mar 11, 2001 at 06:06:24PM +0100, Martin Bruchanov wrote:
> 
> Bug report from Martin Bruchanov (bruxy@kgb.cz, bruchm@racom.cz)
> 
> ############################################################################
> [1.] One line summary of the problem:    
> USB doesn't work properly with SMP kernel on dual-mainboard or with APIC.

What kind of motherboard is this?

And does USB work in SMP mode with "noapic" given on the kernel command
line?

thanks,

greg k-h

-- 
greg@(kroah|wirex).com
http://immunix.org/~greg

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-03-06 23:55 Ying Chen
@ 2001-03-07  0:40 ` Don Dugger
  0 siblings, 0 replies; 657+ messages in thread
From: Don Dugger @ 2001-03-07  0:40 UTC (permalink / raw)
  To: Ying Chen; +Cc: linux-kernel

Ying-

I'm a little confused here.  It's very hard to compare a UP application
vs. the same app. converted to use threads.  Unless the app. is
structured such that multiple threads can run at the same time, then
no, you won't see any improvement by going to SMP; in fact, a truly
single-threaded app. will frequently slow down when run on an SMP
kernel.

Have you watched a CPU meter while your benchmark runs?  Even something
basic like `top' should give you a feel for whether or not you're
using all of the CPUs.


On Tue, Mar 06, 2001 at 03:55:55PM -0800, Ying Chen wrote:
> Hi,
> 
> I have two questions on Linux pthread related issues. Would anyone be able 
> to help?
> 
> ...
>
> 2. We ran multi-threaded application using Linux pthread library on 2-way 
> SMP and UP intel platforms (with both 2.2 and 2.4 kernels). We see 
> significant increase in context switching when moving from UP to SMP, and 
> high CPU usage with no performance gain in terms of actual work being done 
> when moving to SMP, despite the fact the benchmark we are running is 
> CPU-bound. The kernel profiler indicates that the a lot of kernel CPU ticks 
> went to scheduling and signaling overheads. Has anyone seen something like 
> this before with pthread applications running on SMP platforms? Any 
> suggestions or pointers on this subject?
> 
> Thanks a lot!
> 
> Ying
> 
> 
> 
> _________________________________________________________________
> Get your FREE download of MSN Explorer at http://explorer.msn.com
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
n0ano@valinux.com
Ph: 303/938-9838

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-01-19 13:37 Robert Kaiser
@ 2001-01-19 14:37 ` Steve Hill
  0 siblings, 0 replies; 657+ messages in thread
From: Steve Hill @ 2001-01-19 14:37 UTC (permalink / raw)
  To: RobertKaiser; +Cc: linux-kernel

On Fri, 19 Jan 2001, RobertKaiser wrote:

> On Thu Jan 18 16:30:30 2001 steve@navaho.co.uk wrote
> > Has anyone had any luck getting a 2.4 kernel to run on Cobalt x86
> > hardware?  It doesn't even seem to start (I get nothing on the screen from
> > the kernel, it just sits there and does nothing). :(
> 
> What processor does it use ? (386 or 486 perchance?)

AMD K6 (so 586) - I was trying the i386 version of the kernel on it,
though; if that's going to be a problem, I can try the 586 version...

-- 

- Steve Hill
System Administrator         Email: steve@navaho.co.uk
Navaho Technologies Ltd.       Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2001-01-10 18:24 Thiago Rondon
@ 2001-01-11  4:08 ` David Hinds
  0 siblings, 0 replies; 657+ messages in thread
From: David Hinds @ 2001-01-11  4:08 UTC (permalink / raw)
  To: Thiago Rondon, dahinds; +Cc: Linux Kernel, Alan Cox

On Wed, Jan 10, 2001 at 04:24:21PM -0200, Thiago Rondon wrote:
> 
> Check kmalloc().
> 
> -Thiago Rondon
> 
> --- linux-2.4.0-ac5/drivers/pcmcia/ds.c	Sat Sep  2 04:13:49 2000
> +++ linux-2.4.0-ac5.maluco/drivers/pcmcia/ds.c	Wed Jan 10 16:20:53 2001
> @@ -414,6 +414,8 @@
>      /* Add binding to list for this socket */
>      driver->use_count++;
>      b = kmalloc(sizeof(socket_bind_t), GFP_KERNEL);
> +    if (!b) 
> +      return -ENOMEM;    
>      b->driver = driver;
>      b->function = bind_info->function;
>      b->instance = NULL;
> 

As with the other kmalloc patch, this is also broken; things have been
done that need to be un-done, and you can't just exit the function
here.  I'll come up with a better fix.
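
(Purely illustrative - the point is that side effects already performed,
such as the use_count increment a couple of lines earlier in the patch
context, have to be rolled back before bailing out; Dave's actual fix
may look quite different:)

    driver->use_count++;
    b = kmalloc(sizeof(socket_bind_t), GFP_KERNEL);
    if (!b) {
        driver->use_count--;	/* undo what was already done */
        return -ENOMEM;
    }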

-- Dave
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 657+ messages in thread

* Re: your mail
  2000-12-11 14:01 Heiko.Carstens
@ 2000-12-11 15:14 ` Alan Cox
  0 siblings, 0 replies; 657+ messages in thread
From: Alan Cox @ 2000-12-11 15:14 UTC (permalink / raw)
  To: Heiko.Carstens; +Cc: linux-kernel

> sigp. To synchronize n CPUs one can create n kernel threads and give
> them a high priority to make sure they will be executed soon (e.g. by
> setting p->policy to SCHED_RR and p->rt_priority to a very high
> value). As soon as all CPUs are in synchronized state (with
> interrupts disabled) the new CPU can be started. But before this can
> be done there are some other things left to do:

You don't IMHO need to use such a large hammer. We already do similar sequences
for TLB invalidation on x86, for example. You can broadcast an interprocessor
interrupt and have each of the other processors set a flag. You spin until they
are all captured; then, when you clear the flag, they all continue. You just
need to watch out for two processors doing it at the same time 8)
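
(Illustrative only - a capture/release rendezvous of the kind described
above.  Names and calling conventions are approximate 2.4-era ones, not
code from any particular kernel, and the "two CPUs at once" caveat would
need an extra lock in real life:)

	#include <linux/kernel.h>
	#include <linux/smp.h>
	#include <asm/atomic.h>

	static atomic_t captured = ATOMIC_INIT(0);
	static atomic_t release  = ATOMIC_INIT(0);

	static void capture_handler(void *unused)
	{
		atomic_inc(&captured);		/* "I am captured" */
		while (!atomic_read(&release))	/* interrupts are off here */
			barrier();
	}

	static void capture_other_cpus(void)
	{
		atomic_set(&captured, 0);
		atomic_set(&release, 0);
		smp_call_function(capture_handler, NULL, 1, 0);	/* IPI broadcast */
		while (atomic_read(&captured) < smp_num_cpus - 1)
			barrier();
		/* ... every other CPU now spins with interrupts disabled ... */
		atomic_set(&release, 1);	/* let them continue */
	}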

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 657+ messages in thread

end of thread, other threads:[~2023-10-16 13:24 UTC | newest]

Thread overview: 657+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-09 13:14 [PATCH 0/24] make atomic_read() behave consistently across all architectures Chris Snook
2007-08-09 12:41 ` Arnd Bergmann
2007-08-09 14:29   ` Chris Snook
2007-08-09 15:30     ` Arnd Bergmann
2007-08-14 22:31 ` Christoph Lameter
2007-08-14 22:45   ` Chris Snook
2007-08-14 22:51     ` Christoph Lameter
2007-08-14 23:08   ` Satyam Sharma
2007-08-14 23:04     ` Chris Snook
2007-08-14 23:14       ` Christoph Lameter
2007-08-15  6:49       ` Herbert Xu
2007-08-15  8:18         ` Heiko Carstens
2007-08-15 13:53           ` Stefan Richter
2007-08-15 14:35             ` Satyam Sharma
2007-08-15 14:52               ` Herbert Xu
2007-08-15 16:09                 ` Stefan Richter
2007-08-15 16:27                   ` Paul E. McKenney
2007-08-15 17:13                     ` Satyam Sharma
2007-08-15 18:31                     ` Segher Boessenkool
2007-08-15 18:57                       ` Paul E. McKenney
2007-08-15 19:54                         ` Satyam Sharma
2007-08-15 20:17                           ` Paul E. McKenney
2007-08-15 20:52                             ` Segher Boessenkool
2007-08-15 22:42                               ` Paul E. McKenney
2007-08-15 20:47                           ` Segher Boessenkool
2007-08-16  0:36                             ` Satyam Sharma
2007-08-16  0:32                               ` your mail Herbert Xu
2007-08-16  0:58                                 ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Satyam Sharma
2007-08-16  0:51                                   ` Herbert Xu
2007-08-16  1:18                                     ` Satyam Sharma
2007-08-16  1:38                               ` Segher Boessenkool
2007-08-15 21:05                         ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Segher Boessenkool
2007-08-15 22:44                           ` Paul E. McKenney
2007-08-16  1:23                             ` Segher Boessenkool
2007-08-16  2:22                               ` Paul E. McKenney
2007-08-15 19:58               ` Stefan Richter
2007-08-16  0:39           ` [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert() Satyam Sharma
2007-08-24 11:59             ` Denys Vlasenko
2007-08-24 12:07               ` Andi Kleen
2007-08-24 12:12               ` Kenn Humborg
2007-08-24 14:25                 ` Denys Vlasenko
2007-08-24 17:34                   ` Linus Torvalds
2007-08-24 13:30               ` Satyam Sharma
2007-08-24 17:06                 ` Christoph Lameter
2007-08-24 20:26                   ` Denys Vlasenko
2007-08-24 20:34                     ` Chris Snook
2007-08-24 16:19               ` Luck, Tony
2007-08-15 16:13         ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Chris Snook
2007-08-15 23:40           ` Herbert Xu
2007-08-15 23:51             ` Paul E. McKenney
2007-08-16  1:30               ` Segher Boessenkool
2007-08-16  2:30                 ` Paul E. McKenney
2007-08-16 19:33                   ` Segher Boessenkool
2007-08-16  1:26             ` Segher Boessenkool
2007-08-16  2:23               ` Nick Piggin
2007-08-16 19:32                 ` Segher Boessenkool
2007-08-17  2:19                   ` Nick Piggin
2007-08-17  3:16                     ` Paul Mackerras
2007-08-17  3:32                       ` Nick Piggin
2007-08-17  3:50                         ` Linus Torvalds
2007-08-17 23:59                           ` Paul E. McKenney
2007-08-18  0:09                             ` Herbert Xu
2007-08-18  1:08                               ` Paul E. McKenney
2007-08-18  1:24                                 ` Christoph Lameter
2007-08-18  1:41                                   ` Satyam Sharma
2007-08-18  4:13                                     ` Linus Torvalds
2007-08-18 13:36                                       ` Satyam Sharma
2007-08-18 21:54                                       ` Paul E. McKenney
2007-08-18 22:41                                         ` Linus Torvalds
2007-08-18 23:19                                           ` Paul E. McKenney
2007-08-24 12:19                                       ` Denys Vlasenko
2007-08-24 17:19                                         ` Linus Torvalds
2007-08-18 21:56                                   ` Paul E. McKenney
2007-08-20 13:31                                   ` Chris Snook
2007-08-20 22:04                                     ` Segher Boessenkool
2007-08-20 22:48                                       ` Russell King
2007-08-20 23:02                                         ` Segher Boessenkool
2007-08-21  0:05                                           ` Paul E. McKenney
2007-08-21  7:08                                             ` Russell King
2007-08-21  7:05                                           ` Russell King
2007-08-21  9:33                                             ` Paul Mackerras
2007-08-21 11:37                                               ` Andi Kleen
2007-08-21 14:48                                               ` Segher Boessenkool
2007-08-21 16:16                                                 ` Paul E. McKenney
2007-08-21 22:51                                                   ` Valdis.Kletnieks
2007-08-22  0:50                                                     ` Paul E. McKenney
2007-08-22 21:38                                                     ` Adrian Bunk
2007-08-21 14:39                                             ` Segher Boessenkool
2007-08-17  3:42                       ` Linus Torvalds
2007-08-17  5:18                         ` Paul E. McKenney
2007-08-17  5:56                         ` Satyam Sharma
2007-08-17  7:26                           ` Nick Piggin
2007-08-17  8:47                             ` Satyam Sharma
2007-08-17  9:15                               ` Nick Piggin
2007-08-17 10:12                                 ` Satyam Sharma
2007-08-17 12:14                                   ` Nick Piggin
2007-08-17 13:05                                     ` Satyam Sharma
2007-08-17  9:48                               ` Paul Mackerras
2007-08-17 10:23                                 ` Satyam Sharma
2007-08-17 22:49                           ` Segher Boessenkool
2007-08-17 23:51                             ` Satyam Sharma
2007-08-17 23:55                               ` Segher Boessenkool
2007-08-17  6:42                         ` Geert Uytterhoeven
2007-08-17  8:52                         ` Andi Kleen
2007-08-17 10:08                           ` Satyam Sharma
2007-08-17 22:29                         ` Segher Boessenkool
2007-08-17 17:37                     ` Segher Boessenkool
2007-08-14 23:26     ` Paul E. McKenney
2007-08-15 10:35     ` Stefan Richter
2007-08-15 12:04       ` Herbert Xu
2007-08-15 12:31       ` Satyam Sharma
2007-08-15 13:08         ` Stefan Richter
2007-08-15 13:11           ` Stefan Richter
2007-08-15 13:47           ` Satyam Sharma
2007-08-15 14:25             ` Paul E. McKenney
2007-08-15 15:33               ` Herbert Xu
2007-08-15 16:08                 ` Paul E. McKenney
2007-08-15 17:18                   ` Satyam Sharma
2007-08-15 17:33                     ` Paul E. McKenney
2007-08-15 18:05                       ` Satyam Sharma
2007-08-15 17:55               ` Satyam Sharma
2007-08-15 19:07                 ` Paul E. McKenney
2007-08-15 21:07                   ` Segher Boessenkool
2007-08-15 20:58                 ` Segher Boessenkool
2007-08-15 18:19               ` David Howells
2007-08-15 18:45                 ` Paul E. McKenney
2007-08-15 23:41                   ` Herbert Xu
2007-08-15 23:53                     ` Paul E. McKenney
2007-08-16  0:12                       ` Herbert Xu
2007-08-16  0:23                         ` Paul E. McKenney
2007-08-16  0:30                           ` Herbert Xu
2007-08-16  0:49                             ` Paul E. McKenney
2007-08-16  0:53                               ` Herbert Xu
2007-08-16  1:14                                 ` Paul E. McKenney
2007-08-15 18:31         ` Segher Boessenkool
2007-08-15 19:40           ` Satyam Sharma
2007-08-15 20:42             ` Segher Boessenkool
2007-08-16  1:23               ` Satyam Sharma
2007-08-15 23:22         ` Paul Mackerras
2007-08-16  0:26           ` Christoph Lameter
2007-08-16  0:34             ` Paul Mackerras
2007-08-16  0:40               ` Christoph Lameter
2007-08-16  0:39             ` Paul E. McKenney
2007-08-16  0:42               ` Christoph Lameter
2007-08-16  0:53                 ` Paul E. McKenney
2007-08-16  0:59                   ` Christoph Lameter
2007-08-16  1:14                     ` Paul E. McKenney
2007-08-16  1:41                       ` Christoph Lameter
2007-08-16  2:15                         ` Satyam Sharma
2007-08-16  2:08                           ` Herbert Xu
2007-08-16  2:18                             ` Christoph Lameter
2007-08-16  3:23                               ` Paul Mackerras
2007-08-16  3:33                                 ` Herbert Xu
2007-08-16  3:48                                   ` Paul Mackerras
2007-08-16  4:03                                     ` Herbert Xu
2007-08-16  4:34                                       ` Paul Mackerras
2007-08-16  5:37                                         ` Herbert Xu
2007-08-16  6:00                                           ` Paul Mackerras
2007-08-16 18:50                                             ` Christoph Lameter
2007-08-16 18:48                                 ` Christoph Lameter
2007-08-16 19:44                                 ` Segher Boessenkool
2007-08-16  2:18                             ` Chris Friesen
2007-08-16  2:32                         ` Paul E. McKenney
2007-08-16  1:51                 ` Paul Mackerras
2007-08-16  2:00                   ` Herbert Xu
2007-08-16  2:05                     ` Paul Mackerras
2007-08-16  2:11                       ` Herbert Xu
2007-08-16  2:35                         ` Paul E. McKenney
2007-08-16  3:15                         ` Paul Mackerras
2007-08-16  3:43                           ` Herbert Xu
2007-08-16  2:15                       ` Christoph Lameter
2007-08-16  2:17                         ` Christoph Lameter
2007-08-16  2:33                       ` Satyam Sharma
2007-08-16  3:01                         ` Satyam Sharma
2007-08-16  4:11                           ` Paul Mackerras
2007-08-16  5:39                             ` Herbert Xu
2007-08-16  6:56                               ` Paul Mackerras
2007-08-16  7:09                                 ` Herbert Xu
2007-08-16  8:06                                   ` Stefan Richter
2007-08-16  8:10                                     ` Herbert Xu
2007-08-16  9:54                                       ` Stefan Richter
2007-08-16 10:31                                         ` Stefan Richter
2007-08-16 10:42                                           ` Herbert Xu
2007-08-16 16:34                                             ` Paul E. McKenney
2007-08-16 23:59                                               ` Herbert Xu
2007-08-17  1:01                                                 ` Paul E. McKenney
2007-08-17  7:39                                                   ` Satyam Sharma
2007-08-17 14:31                                                     ` Paul E. McKenney
2007-08-17 18:31                                                       ` Satyam Sharma
2007-08-17 18:56                                                         ` Paul E. McKenney
2007-08-17  3:15                                               ` Nick Piggin
2007-08-17  4:02                                                 ` Paul Mackerras
2007-08-17  4:39                                                   ` Nick Piggin
2007-08-17  7:25                                                 ` Stefan Richter
2007-08-17  8:06                                                   ` Nick Piggin
2007-08-17  8:58                                                     ` Satyam Sharma
2007-08-17  9:15                                                       ` Nick Piggin
2007-08-17 10:03                                                         ` Satyam Sharma
2007-08-17 11:50                                                           ` Nick Piggin
2007-08-17 12:50                                                             ` Satyam Sharma
2007-08-17 12:56                                                               ` Nick Piggin
2007-08-18  2:15                                                                 ` Satyam Sharma
2007-08-17 10:48                                                     ` Stefan Richter
2007-08-17 10:58                                                       ` Stefan Richter
2007-08-18 14:35                                                     ` LDD3 pitfalls (was Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures) Stefan Richter
2007-08-20 13:28                                                       ` Chris Snook
2007-08-17 22:14                                                 ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Segher Boessenkool
2007-08-17  5:04                                             ` Paul Mackerras
2007-08-16 10:35                                         ` Herbert Xu
2007-08-16 19:48                                       ` Chris Snook
2007-08-17  0:02                                         ` Herbert Xu
2007-08-17  2:04                                           ` Chris Snook
2007-08-17  2:13                                             ` Herbert Xu
2007-08-17  2:31                                             ` Nick Piggin
2007-08-17  5:09                                       ` Paul Mackerras
2007-08-17  5:32                                         ` Herbert Xu
2007-08-17  5:41                                           ` Paul Mackerras
2007-08-17  8:28                                             ` Satyam Sharma
2007-08-16 14:48                                   ` Ilpo Järvinen
2007-08-16 16:19                                     ` Stefan Richter
2007-08-16 19:55                                     ` Chris Snook
2007-08-16 20:20                                       ` Christoph Lameter
2007-08-17  1:02                                         ` Paul E. McKenney
2007-08-17  1:28                                           ` Herbert Xu
2007-08-17  5:07                                             ` Paul E. McKenney
2007-08-17  2:16                                         ` Paul Mackerras
2007-08-17  3:03                                           ` Linus Torvalds
2007-08-17  3:43                                             ` Paul Mackerras
2007-08-17  3:53                                               ` Herbert Xu
2007-08-17  6:26                                                 ` Satyam Sharma
2007-08-17  8:38                                                   ` Nick Piggin
2007-08-17  9:14                                                     ` Satyam Sharma
2007-08-17  9:31                                                       ` Nick Piggin
2007-08-17 10:55                                                         ` Satyam Sharma
2007-08-17 12:39                                                           ` Nick Piggin
2007-08-17 13:36                                                             ` Satyam Sharma
2007-08-17 16:48                                                             ` Linus Torvalds
2007-08-17 18:50                                                               ` Chris Friesen
2007-08-17 18:54                                                                 ` Arjan van de Ven
2007-08-17 19:49                                                                   ` Paul E. McKenney
2007-08-17 19:49                                                                     ` Arjan van de Ven
2007-08-17 20:12                                                                       ` Paul E. McKenney
2007-08-17 19:08                                                                 ` Linus Torvalds
2007-08-20 13:15                                                               ` Chris Snook
2007-08-20 13:32                                                                 ` Herbert Xu
2007-08-20 13:38                                                                   ` Chris Snook
2007-08-20 22:07                                                                     ` Segher Boessenkool
2007-08-21  5:46                                                                 ` Linus Torvalds
2007-08-21  7:04                                                                   ` David Miller
2007-08-21 13:50                                                                     ` Chris Snook
2007-08-21 14:59                                                                       ` Segher Boessenkool
2007-08-21 16:31                                                                       ` Satyam Sharma
2007-08-21 16:43                                                                       ` Linus Torvalds
2007-09-09 18:02                                                               ` Denys Vlasenko
2007-09-09 18:18                                                                 ` Arjan van de Ven
2007-09-10 10:56                                                                   ` Denys Vlasenko
2007-09-10 11:15                                                                     ` Herbert Xu
2007-09-10 12:22                                                                     ` Kyle Moffett
2007-09-10 13:38                                                                       ` Denys Vlasenko
2007-09-10 14:16                                                                         ` Denys Vlasenko
2007-09-10 15:09                                                                           ` Linus Torvalds
2007-09-10 16:46                                                                             ` Denys Vlasenko
2007-09-10 19:59                                                                               ` Kyle Moffett
2007-09-10 18:59                                                                             ` Christoph Lameter
2007-09-10 23:19                                                                             ` [PATCH] Document non-semantics of atomic_read() and atomic_set() Chris Snook
2007-09-10 23:44                                                                               ` Paul E. McKenney
2007-09-11 19:35                                                                               ` Christoph Lameter
2007-09-10 14:51                                                                     ` [PATCH 0/24] make atomic_read() behave consistently across all architectures Arjan van de Ven
2007-09-10 14:38                                                                       ` Denys Vlasenko
2007-09-10 17:02                                                                         ` Arjan van de Ven
2007-08-17 11:08                                                     ` Stefan Richter
2007-08-17 22:09                                             ` Segher Boessenkool
2007-08-17 17:41                                         ` Segher Boessenkool
2007-08-17 18:38                                           ` Satyam Sharma
2007-08-17 23:17                                             ` Segher Boessenkool
2007-08-17 23:55                                               ` Satyam Sharma
2007-08-18  0:04                                                 ` Segher Boessenkool
2007-08-18  1:56                                                   ` Satyam Sharma
2007-08-18  2:15                                                     ` Segher Boessenkool
2007-08-18  3:33                                                       ` Satyam Sharma
2007-08-18  5:18                                                         ` Segher Boessenkool
2007-08-18 13:20                                                           ` Satyam Sharma
2007-09-10 18:59                                           ` Christoph Lameter
2007-09-10 20:54                                             ` Paul E. McKenney
2007-09-10 21:36                                               ` Christoph Lameter
2007-09-10 21:50                                                 ` Paul E. McKenney
2007-09-11  2:27                                             ` Segher Boessenkool
2007-08-16 21:08                                       ` Luck, Tony
2007-08-16 19:55                                     ` Chris Snook
2007-08-16 18:54                             ` Christoph Lameter
2007-08-16 20:07                               ` Paul E. McKenney
2007-08-16  3:05                         ` Paul Mackerras
2007-08-16 19:39                           ` Segher Boessenkool
2007-08-16  2:07                   ` Segher Boessenkool
2007-08-24 12:50           ` Denys Vlasenko
2007-08-24 17:15             ` Christoph Lameter
2007-08-24 20:21               ` Denys Vlasenko
2007-08-16  3:37         ` Bill Fink
2007-08-16  5:20           ` Satyam Sharma
2007-08-16  5:57             ` Satyam Sharma
2007-08-16  9:25               ` Satyam Sharma
2007-08-16 21:00               ` Segher Boessenkool
2007-08-17  4:32                 ` Satyam Sharma
2007-08-17 22:38                   ` Segher Boessenkool
2007-08-18 14:42                     ` Satyam Sharma
2007-08-16 20:50             ` Segher Boessenkool
2007-08-16 22:40               ` David Schwartz
2007-08-17  4:36                 ` Satyam Sharma
2007-08-17  4:24               ` Satyam Sharma
2007-08-17 22:34                 ` Segher Boessenkool
2007-08-15 19:59       ` Christoph Lameter
