Re: [andrea@suse.de: Re: generic rwsem [Re: Alpha "process table hang"]]

linux-kernel.vger.kernel.org archive mirror

From: David Howells @ 2001-04-20  8:23 UTC
  To: Linus Torvalds; +Cc: dhowells, David S. Miller, linux-kernel

Linus Torvalds <torvalds@transmeta.com> wrote:
> I think Andrea is right. Although this file seems to be entirely
> old-fashioned and should never be used, right?

I presume you're talking about "include/asm-i386/rwsem-spin.h"... If so,
Andrea is right, there is a bug in it (repeated a number of times), though why
the tests succeeded, I'm not sure.

The file should only be used for the 80386 and maybe early 80486's where
CMPXCHG doesn't work properly, everything above that can use the XADD
implementation.
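For readers unfamiliar with the distinction: the XADD-based versions can make the uncontended case a single atomic read-modify-write on the semaphore's count, and XADD does not exist on the 80386 (it arrived with the 80486). A minimal user-space sketch of that idea, assuming a simplified counter layout; the `sketch_rwsem` type, the `WRITER_BIAS` value and all function names are hypothetical, not the kernel's. C11 `atomic_fetch_add` is typically emitted as `lock xadd` on x86 when the old value is used. Only the try-paths are shown; a real rwsem must also queue and wake waiters when the fast path fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical simplified layout, NOT the kernel's rwsem:
 *   count >= 0 : number of active readers
 *   count <  0 : a writer holds the lock (WRITER_BIAS has been added) */
#define WRITER_BIAS (-0x10000L)

struct sketch_rwsem {
    atomic_long count;
};

static void sketch_rwsem_init(struct sketch_rwsem *sem)
{
    atomic_init(&sem->count, 0);
}

/* Reader fast path: one atomic fetch-and-add (LOCK XADD on x86).
 * The old value alone tells us whether a writer was present. */
static inline bool try_down_read_fast(struct sketch_rwsem *sem)
{
    if (atomic_fetch_add(&sem->count, 1) >= 0)
        return true;                      /* no writer: read lock taken */
    atomic_fetch_sub(&sem->count, 1);     /* back out; a real rwsem would queue */
    return false;
}

static inline void up_read_fast(struct sketch_rwsem *sem)
{
    atomic_fetch_sub(&sem->count, 1);
}

/* Writer fast path, also XADD-only (no CMPXCHG needed):
 * succeeds iff the count was exactly 0 before the bias was added. */
static inline bool try_down_write_fast(struct sketch_rwsem *sem)
{
    if (atomic_fetch_add(&sem->count, WRITER_BIAS) == 0)
        return true;
    atomic_fetch_sub(&sem->count, WRITER_BIAS);   /* undo the bias */
    return false;
}

static inline void up_write_fast(struct sketch_rwsem *sem)
{
    long old = atomic_fetch_sub(&sem->count, WRITER_BIAS);
    assert(old < 0);                      /* caller must hold the write lock */
    (void)old;
}
```

A side effect of doing the writer attempt with XADD is that a failed writer briefly makes the count negative, so a reader arriving in that window fails spuriously; that is tolerable for a try-path but is one of the details a full implementation has to handle.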

> Also, I _really_ don't see why the code is inlined at all (in the real
> <linux/rwsem-spinlock.h>. It shouldn't be. It should be a real function
> call, and all be done inside lib/rwsem.c inside a
> 
> 	#ifdef CONFIG_RWSEM_GENERIC_SPINLOCK
> 
> or whatever.

Andrea seems to have changed his mind on the non-inlining in the generic case.

But if you want it totally non-inline, then that can be done. However, whilst
developing it, I did notice that that slowed things down, hence why I wanted
it kept in line.

I have some ideas on how to improve efficiency in that one anyway, based on
a comment from Alan Cox.

> Please either set me straight, or send me a patch to remove
> asm-i386/rwsem-spin.h and fix up linux/rwsem-spinlock.h. Ok?

I think there are two separate issues here:

  (1) asm-i386/rwsem-spin.h is wrong, and can probably be replaced with the
      generic spinlock implementation without inconveniencing people much.
      (though someone has commented that they'd want this to be inline as
       cycles are precious on the slow 80386).

  (2) "fix up linux/rwsem-spinlock.h": do you want the whole generic spinlock
      implementation made non-inline then?

David

From: Linus Torvalds @ 2001-04-20 17:46 UTC
  To: David Howells; +Cc: dhowells, David S. Miller, linux-kernel



On Fri, 20 Apr 2001, David Howells wrote:
>
> The file should only be used for the 80386 and maybe early 80486's where
> CMPXCHG doesn't work properly, everything above that can use the XADD
> implementation.

Why are those not using the generic files? The generic code is obviously
more maintainable.

> But if you want it totally non-inline, then that can be done. However, whilst
> developing it, I did notice that that slowed things down, hence why I wanted
> it kept in line.

I want to keep the _fast_ case in-line.

I do not care at ALL about the stupid spinlock version. That should be the
_fallback_, and it should be out-of-line. It is always going to be the
slowest implementation, modulo bugs in architecture-specific code.

For i386 and i486, there is no reason to try to maintain a complex fast
case. The machines are unquestionably going away - we should strive to not
burden them unnecessarily, but we should _not_ try to save two cycles.

In short:
 - the only case that _really_ matters for performance is the uncontended
   read-lock for "reasonable" machines. An i386 no longer counts as
   reasonable, and designing for it would be silly. And the write-lock
   case is much less compelling.
 - We should avoid any inlines where the inline code is more than twice
   the size of the out-of-line code. Icache issues can overcome any cycle
   gains, and do
   not show up well in benchmarks (benchmarks tend to have very hot
   icaches). Note that this is less important for the out-of-line code in
   another segment that doesn't get brought into the icache at all for the
   non-contention case, but that should still be taken _somewhat_ into
   account if only because of kernel size issues.

Both of the above rules imply that the generic spin-lock implementation
should be out-of-line.
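The two rules can be illustrated with a minimal sketch of the usual split: a tiny inline fast path for the uncontended read-lock, and a slow path forced out of line. It assumes GCC or Clang (`__attribute__((noinline, cold))` and `__builtin_expect` are compiler extensions), uses a simplified counter, and its names (`fp_rwsem`, `fp_down_read` and so on) are hypothetical. The slow path just yields and retries where a real rwsem would sleep on a wait queue, and the writer side is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical simplified layout, not the kernel's rwsem:
 * count >= 0: active readers; count < 0: a writer holds the lock
 * (writer side not shown). */
struct fp_rwsem {
    atomic_long count;
};

static void fp_rwsem_init(struct fp_rwsem *sem)
{
    atomic_init(&sem->count, 0);
}

/* Slow path, out of line. noinline keeps it from being copied into every
 * caller; cold tells the compiler it is rarely run, so with optimisation it
 * may be placed away from hot code (e.g. in a .text.unlikely section). */
static __attribute__((noinline, cold)) void fp_down_read_slow(struct fp_rwsem *sem)
{
    atomic_fetch_sub(&sem->count, 1);      /* undo the fast path's increment */
    for (;;) {
        sched_yield();                     /* sketch only: real code would sleep */
        if (atomic_fetch_add(&sem->count, 1) >= 0)
            return;                        /* writer gone: read lock taken */
        atomic_fetch_sub(&sem->count, 1);
    }
}

/* Fast path, inline: one atomic add plus a rarely-taken branch, which is
 * not much larger than the call sequence it replaces. */
static inline void fp_down_read(struct fp_rwsem *sem)
{
    if (__builtin_expect(atomic_fetch_add(&sem->count, 1) < 0, 0))
        fp_down_read_slow(sem);
}

static inline void fp_up_read(struct fp_rwsem *sem)
{
    atomic_fetch_sub(&sem->count, 1);
}
```

The inline part stays within the "no more than twice the out-of-line code" guideline because the only thing it adds over a plain call is the atomic add and a branch; everything bulky lives in the cold function.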

>   (1) asm-i386/rwsem-spin.h is wrong, and can probably be replaced with the
>       generic spinlock implementation without inconveniencing people much.
>       (though someone has commented that they'd want this to be inline as
>        cycles are precious on the slow 80386).

Icache is also precious on the 386, which has no L2 in 99% of all cases.
Make it out-of-line.

>   (2) "fix up linux/rwsem-spinlock.h": do you want the whole generic spinlock
>       implementation made non-inline then?

Yes. People who care about performance _will_ have architecture-specific
inlines on architectures where they make sense (ie 99% of them).
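As a sketch of what an out-of-line generic fallback compiled under `#ifdef CONFIG_RWSEM_GENERIC_SPINLOCK` might look like: every operation is an ordinary function call that takes a spinlock and adjusts a counter. The names (`gen_rwsem`, `gen_down_read`, ...), the spin-then-yield waiting, and the C11 atomic flag standing in for a kernel spinlock are all assumptions for illustration. The real generic implementation queues sleeping waiters and wakes them in order, which this sketch does not do.

```c
/* Normally selected by the build configuration; defined here so the
 * stand-alone sketch compiles. */
#define CONFIG_RWSEM_GENERIC_SPINLOCK 1
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#ifdef CONFIG_RWSEM_GENERIC_SPINLOCK

/* Hypothetical generic fallback: a spinlock protecting a plain counter.
 * activity > 0: that many readers; activity == -1: one writer; 0: free. */
struct gen_rwsem {
    atomic_bool locked;
    long activity;
};

static void gen_spin_lock(atomic_bool *l)
{
    while (atomic_exchange_explicit(l, true, memory_order_acquire))
        ;   /* busy-wait until the previous holder releases it */
}

static void gen_spin_unlock(atomic_bool *l)
{
    atomic_store_explicit(l, false, memory_order_release);
}

/* Everything below is a real (out-of-line) function, as it would be if it
 * lived in a .c file such as lib/rwsem.c rather than in a header. */
void gen_rwsem_init(struct gen_rwsem *sem)
{
    atomic_init(&sem->locked, false);
    sem->activity = 0;
}

void gen_down_read(struct gen_rwsem *sem)
{
    for (;;) {
        gen_spin_lock(&sem->locked);
        if (sem->activity >= 0) {         /* no writer active */
            sem->activity++;
            gen_spin_unlock(&sem->locked);
            return;
        }
        gen_spin_unlock(&sem->locked);
        sched_yield();                    /* sketch: real code sleeps on a queue */
    }
}

void gen_up_read(struct gen_rwsem *sem)
{
    gen_spin_lock(&sem->locked);
    assert(sem->activity > 0);            /* caller must hold a read lock */
    sem->activity--;
    gen_spin_unlock(&sem->locked);
}

void gen_down_write(struct gen_rwsem *sem)
{
    for (;;) {
        gen_spin_lock(&sem->locked);
        if (sem->activity == 0) {         /* no readers, no writer */
            sem->activity = -1;
            gen_spin_unlock(&sem->locked);
            return;
        }
        gen_spin_unlock(&sem->locked);
        sched_yield();
    }
}

void gen_up_write(struct gen_rwsem *sem)
{
    gen_spin_lock(&sem->locked);
    assert(sem->activity == -1);          /* caller must hold the write lock */
    sem->activity = 0;
    gen_spin_unlock(&sem->locked);
}

#endif /* CONFIG_RWSEM_GENERIC_SPINLOCK */
```

Because each operation is a plain function, a call site costs little more than a call instruction in code size, which is the trade-off argued for above. The price in this simplified version is that a steady stream of readers can starve a waiting writer, since nothing records that a writer is queued.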

		Linus


From: Jamie Lokier @ 2001-04-25 13:20 UTC
  To: Linus Torvalds; +Cc: David Howells, dhowells, David S. Miller, linux-kernel

Linus Torvalds wrote:
> For i386 and i486, there is no reason to try to maintain a complex fast
> case. The machines are unquestionably going away - we should strive to not
> burden them unnecessarily, but we should _not_ try to save two cycles.
...
> Icache is also precious on the 386, which has no L2 in 99% of all cases.
> Make it out-of-line.

AFAIK, only some 386 clones have a cache -- the Intel ones do not.
Therefore saving icache is not an issue, and the cycle cost of an
out-of-line call is somewhat more than two cycles.

-- Jamie

From: Andrea Arcangeli @ 2001-04-20 18:58 UTC
  To: David Howells; +Cc: Linus Torvalds, dhowells, David S. Miller, linux-kernel

On Fri, Apr 20, 2001 at 09:23:47AM +0100, David Howells wrote:
> Andrea seems to have changed his mind on the non-inlining in the generic case.

I changed my mind because if you benchmark the fast path you will do it without
running out of icache (basically only down_* and up_* will be in the icache
during the tight loop). And either way, it shouldn't make a measurable
difference in a real-life benchmark.

Andrea
