* power and percpu: Could we move the paca into the percpu area?
From: Christoph Lameter @ 2014-06-11 19:37 UTC
  To: linuxppc-dev; +Cc: Tejun Heo


Looking at arch/powerpc/include/asm/percpu.h I see that the per cpu offset
comes from a local_paca field and local_paca is in r13. That means that
for all percpu operations we first have to determine the address through a
memory access.

Would it be possible to put the paca at the beginning of the percpu data
area and then have r31 point to the percpu area?

Power has these nice instructions that load from an offset relative to a
base register; they could be used throughout the kernel for percpu
operations (similar to the segment registers on x86).
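
A user-space model may make the cost difference concrete. It is not kernel code: the globals `r13` and `r31` merely stand in for the registers, `struct fake_paca` stands in for the PACA (whose per-cpu offset field is `data_offset` in the kernel), and every other name is hypothetical. In the current scheme each per-cpu access first loads the base out of the PACA; in the proposed one the base already sits in a register and only a displacement is added.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical per-cpu variables: one copy of this struct per CPU. */
struct percpu_vars {
	long other;
	int counter;
};

/* Hypothetical PACA stand-in: records where this CPU's per-cpu data is. */
struct fake_paca {
	char *percpu_base;	/* plays the role of paca->data_offset */
};

struct percpu_vars percpu_area[NR_CPUS];
struct fake_paca pacas[NR_CPUS];

struct fake_paca *r13;	/* today: "r13" points at this CPU's PACA */
char *r31;		/* proposal: "r31" points at this CPU's percpu area */

void setup_percpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		pacas[cpu].percpu_base = (char *)&percpu_area[cpu];
}

void switch_to_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	r13 = &pacas[cpu];
	r31 = (char *)&percpu_area[cpu];
}

/* Current scheme: load the base out of the PACA, then access the variable. */
int *percpu_int_via_paca(size_t var_off)
{
	char *base = r13->percpu_base;	/* the extra memory access */

	return (int *)(base + var_off);
}

/* Proposed scheme: base register plus displacement, no extra load. */
int *percpu_int_via_reg(size_t var_off)
{
	return (int *)(r31 + var_off);
}
```

After `setup_percpu()` and `switch_to_cpu(1)`, both helpers return `&percpu_area[1].counter`; they differ only in the extra dependent load through `r13`.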

With that we may also be able to use the atomic ops for fast percpu access
and avoid the irq disable/enable sequence that is currently required
for percpu atomics. That would give fast and reliable percpu
counters on powerpc.

E.g. the powerpc atomic_inc():
static __inline__ void atomic_inc(atomic_t *v)
{
        int t;

        __asm__ __volatile__(
"1:     lwarx   %0,0,%2         # atomic_inc\n\
        addic   %0,%0,1\n"
        PPC405_ERR77(0,%2)
"       stwcx.  %0,0,%2 \n\
        bne-    1b"
        : "=&r" (t), "+m" (v->counter)
        : "r" (&v->counter)
        : "cc", "xer");
}

Could be used as a template to get:

static __inline__ void raw_cpu_inc_4(int __percpu *v)
{
        int t;

        /* Assumes r31 is reserved (-ffixed-r31) and holds this CPU's
         * per-cpu offset, so that r31 + v addresses this CPU's copy. */
        __asm__ __volatile__(
"1:     lwarx   %0,31,%1        # percpu_inc\n\
        addic   %0,%0,1\n"
        PPC405_ERR77(31,%1)
"       stwcx.  %0,31,%1 \n\
        bne-    1b"
        : "=&r" (t)
        : "r" (v)
        : "cc", "xer", "memory");
}
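
`lwarx`/`stwcx.` form a load-reserve/store-conditional pair: if the reservation is lost between the two, `stwcx.` fails and `bne- 1b` retries. As a portable illustration only (not how powerpc implements it), the same retry structure can be written with C11 `atomic_compare_exchange_weak_explicit()`, which may also fail spuriously, while `percpu_base + off` mimics the proposed base-register addressing. `percpu_storage`, `percpu_base` and `percpu_inc_model()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS     4
#define PERCPU_INTS 8

/* Hypothetical per-cpu areas: one row of atomic ints per CPU. */
_Atomic int percpu_storage[NR_CPUS][PERCPU_INTS];

/* Stand-in for the proposed r31: base of the current CPU's area. */
_Atomic int *percpu_base = percpu_storage[0];

/* Increment the int at 'off' from the base, LL/SC style. */
void percpu_inc_model(size_t off)
{
	_Atomic int *p;
	int old;

	assert(off < PERCPU_INTS);
	p = percpu_base + off;					/* EA = base + offset */
	old = atomic_load_explicit(p, memory_order_relaxed);	/* ~ lwarx */

	/* ~ stwcx. + bne-: on failure 'old' is refreshed and we retry. */
	while (!atomic_compare_exchange_weak_explicit(p, &old, old + 1,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
}
```

Pointing `percpu_base` at another CPU's row and calling `percpu_inc_model()` only touches that row, which is the property a dedicated per-cpu base register would provide.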

* Re: power and percpu: Could we move the paca into the percpu area?
From: Benjamin Herrenschmidt @ 2014-06-11 20:22 UTC
  To: Christoph Lameter; +Cc: Tejun Heo, linuxppc-dev

On Wed, 2014-06-11 at 14:37 -0500, Christoph Lameter wrote:
> Looking at arch/powerpc/include/asm/percpu.h I see that the per cpu offset
> comes from a local_paca field and local_paca is in r13. That means that
> for all percpu operations we first have to determine the address through a
> memory access.
> 
> Would it be possible to put the paca at the beginning of the percpu data
> area and then have r31 point to the percpu area?
> 
> power has these nice instructions that fetch from an offset relative to a
> base register which could be used throughout for percpu operations in the
> kernel (similar to x86 segment registers).
> 
> With that we may also be able to use the atomic ops for fast percpu access
> so that we can avoid the irq enable/disable sequence that is now required
> for percpu atomics. Would result in fast and reliable percpu
> counters for powerpc.

So.... this is complicated :) And it's something I have wanted to tackle
for a while but haven't had a chance.

The issues off the top of my head are:

 - The PACA must be accessible in real mode, which means that when
running under a hypervisor, it must be allocated in the "RMA" which is
the low part of memory up to a limit that depends on the hypervisor, but
can be as low as 128M on some older machines.

 - That said, normal kernel C code uses percpu more than the PACA; the
PACA is mostly used during exception entry/exit, by KVM, and for interrupt
soft-enable/disable. So it might make sense to change things so that r13
contains the per-cpu offset instead. However, making that change and
updating the asm to cope isn't a trivial undertaking.

 - Direct offsets from r13 in asm work as long as the offset is
within the signed 16-bit (+/-32k) range. Otherwise we need at least one
more addis instruction. However, Anton mentioned that the linker may be
smart enough to remove that addis if the high part of the offset happens
to be 0.

 - For atomics, the jury is still out as to whether it would be faster
or not. The atomic ops (lwarx/stwcx.) are expensive: among other things,
they flush the value out of the L1 (to the L2). On the other hand, we
soft-disable interrupts, so masking them isn't very expensive.
Unmasking, while cheap, is currently done out of line, however. I have been
wondering if we could instead move some of the soft-irq state into a CR
field and reserve it with gcc's -ffixed option, so we can make irq
soft-disable/enable even faster and more inline.
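
The 32k limit in the third point comes from the signed 16-bit displacement of D-form loads and stores such as `ld rX,off(r13)`. A larger offset needs an `addis` for the high half, and because the low half is sign-extended the high half has to be rounded, which is what the assembler's `@ha`/`@l` operators compute. The helpers below, whose names are hypothetical, only reproduce that arithmetic in C:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'off' fits the signed 16-bit displacement of a D-form access. */
bool fits_d_form(int32_t off)
{
	return off >= INT16_MIN && off <= INT16_MAX;
}

/*
 * Split 'off' the way @ha/@l do: 'lo' is the sign-extended low half and
 * 'ha' the high half, rounded up when 'lo' is negative, so that
 * off == ha * 65536 + lo.
 */
void split_ha_lo(int32_t off, int32_t *ha, int32_t *lo)
{
	*lo = ((off & 0xffff) ^ 0x8000) - 0x8000;
	*ha = (int32_t)(((int64_t)off - *lo) / 65536);
	assert((int64_t)*ha * 65536 + *lo == off);
}
```

`ha` is 0 exactly when `fits_d_form()` is true, i.e. the case where, per Anton's remark, the linker might drop the `addis`.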

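The soft-disable point may be easier to follow with a single-threaded model of lazy interrupt masking: masking only clears a flag, an interrupt arriving while masked is merely recorded, and unmasking has to check for it and replay it. That check is the part described above as out of line. Every name below is hypothetical; the real kernel keeps equivalent state in the PACA and handles more cases.

```c
#include <assert.h>
#include <stdbool.h>

bool soft_enabled = true;	/* hypothetical soft-enable flag */
bool irq_pending;		/* hypothetical "interrupt arrived while masked" */
int irqs_handled;

void handle_irq(void)
{
	irqs_handled++;
}

/* Interrupt entry: while soft-disabled, only record the interrupt. */
void fake_interrupt(void)
{
	if (!soft_enabled) {
		irq_pending = true;	/* the real code may also hard-disable */
		return;
	}
	handle_irq();
}

/* Masking is a plain store: no expensive hardware operation. */
void soft_irq_disable(void)
{
	assert(soft_enabled);	/* this model does not support nesting */
	soft_enabled = false;
}

/* Unmasking must check for, and replay, anything recorded meanwhile. */
void soft_irq_enable(void)
{
	soft_enabled = true;
	if (irq_pending) {	/* the "out of line" replay path */
		irq_pending = false;
		handle_irq();
	}
}

/* Per-cpu counter update under soft masking instead of lwarx/stwcx. */
void counter_inc(int *counter)
{
	soft_irq_disable();
	(*counter)++;
	soft_irq_enable();
}
```

`counter_inc()` is only safe because a per-cpu counter is never written by another CPU; the only possible race is with an interrupt on the same CPU, which soft masking defers.
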
Cheers,
Ben.

* Re: power and percpu: Could we move the paca into the percpu area?
From: Gabriel Paubert @ 2014-06-11 21:03 UTC
  To: Benjamin Herrenschmidt; +Cc: Tejun Heo, linuxppc-dev, Christoph Lameter

On Thu, Jun 12, 2014 at 06:22:11AM +1000, Benjamin Herrenschmidt wrote:
> [snip]
> 
>  - For atomics, the jury is still out as to whether it would be faster
> or not. The atomic ops (lwarx/stwcx.) are expensive. They flush the
> value out of the L1 (to L2) among others. On the other hand we have
> interrupts soft-disable so masking interrupts isn't very expensive.
> Unmasking, while cheap, is currently out of line however. I have been
> wondering if we could move some of the soft-irq state instead to a CR
> field and mark that -ffixed with gcc so we can make irq
> soft-disable/enable even faster and more in-line.

Actually, from gcc/config/rs6000.h:

/* 1 for registers that have pervasive standard uses
   and are not available for the register allocator.

   On RS/6000, r1 is used for the stack.  On Darwin, r2 is available
   as a local register; for all other OS's r2 is the TOC pointer.

   cr5 is not supposed to be used.

   On System V implementations, r13 is fixed and not available for use.  */

#define FIXED_REGISTERS  \
  {0, 1, FIXED_R2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, FIXED_R13, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,          \
   /* AltiVec registers.  */                       \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
   1, 1                                            \
   , 1, 1, 1, 1, 1, 1                              \
}

So cr5, which is number 73, is never used by gcc. 
Disassembling a few kernels seems to confirm this.
This gives you 4 booleans to play with...
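
A quick C check of the numbers above. Counting the listing, its fifth row starts at GCC register number 64 (after the 32 GPRs and 32 FPRs), and the CR fields cr0 to cr7 start at 68 after MQ, LR, CTR and the argument pointer, so cr5 is 73. On the hardware side, the 32-bit CR holds eight 4-bit fields with cr0 in the most significant nibble, so cr5 is the mask 0x00000f00, i.e. the 4 booleans. Modelling CR as a plain `uint32_t` and the helper names are assumptions for illustration; real code would presumably manipulate cr5 with instructions such as `crset`/`crclr` or `mtcrf 0x04,rS`.

```c
#include <assert.h>
#include <stdint.h>

/* Fifth row of the FIXED_REGISTERS listing above: GCC regnos 64..76. */
const int fixed_tail[] = { 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1 };
#define TAIL_FIRST_REGNO 64
#define CR0_REGNO        68	/* after MQ, LR, CTR and the arg pointer */

/* GCC register number of CR field n. */
int cr_regno(int n)
{
	return CR0_REGNO + n;
}

/* Bit position of the low end of CR field n: cr0 is the top nibble. */
int cr_field_shift(int n)
{
	assert(n >= 0 && n < 8);
	return 28 - 4 * n;
}

uint32_t cr_field_mask(int n)
{
	return UINT32_C(0xf) << cr_field_shift(n);
}

/* Treat the 4 bits of a field as booleans: 0 = lt, 1 = gt, 2 = eq, 3 = so. */
uint32_t cr_set_bit(uint32_t cr, int field, int bit)
{
	assert(bit >= 0 && bit < 4);
	return cr | ((UINT32_C(0x8) >> bit) << cr_field_shift(field));
}

int cr_test_bit(uint32_t cr, int field, int bit)
{
	assert(bit >= 0 && bit < 4);
	return ((cr >> cr_field_shift(field)) & (0x8u >> bit)) != 0;
}
```

Apart from the argument pointer and the carry, cr5 is the only register marked fixed in that row, which is what makes it a candidate for the soft-irq state.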

	Gabriel

* Re: power and percpu: Could we move the paca into the percpu area?
From: Segher Boessenkool @ 2014-06-12 12:26 UTC
  To: Gabriel Paubert; +Cc: Tejun Heo, linuxppc-dev, Christoph Lameter

> Actually, from gcc/config/rs6000.h:
> 
> /* 1 for registers that have pervasive standard uses
>    and are not available for the register allocator.

[snip]

> So cr5, which is number 73, is never used by gcc. 

Not available for RA is not the same thing at all as not used by GCC.
For example, GPR1 and XER[CA] are also fixed regs.

But, indeed, it does look like GCC doesn't use it.  It seems to me that
some ABI forbade userland (or non-libraries or whatever) from using it.
I'll see what I can find out.


Segher

* Re: power and percpu: Could we move the paca into the percpu area?
From: Benjamin Herrenschmidt @ 2014-06-12 21:57 UTC
  To: Segher Boessenkool; +Cc: Tejun Heo, linuxppc-dev, Christoph Lameter

On Thu, 2014-06-12 at 07:26 -0500, Segher Boessenkool wrote:
> But, indeed, it does look like GCC doesn't use it.  It seems to me that
> some ABI forbade userland (or non-libraries or whatever) from using it.
> I'll see what I can find out.

I can still use -ffixed-cr5 for safety, no?

Cheers,
Ben.

* Re: power and percpu: Could we move the paca into the percpu area?
From: Segher Boessenkool @ 2014-06-12 22:15 UTC
  To: Benjamin Herrenschmidt; +Cc: Tejun Heo, linuxppc-dev, Christoph Lameter

On Fri, Jun 13, 2014 at 07:57:08AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2014-06-12 at 07:26 -0500, Segher Boessenkool wrote:
> > But, indeed, it does look like GCC doesn't use it.  It seems to me that
> > some ABI forbade userland (or non-libraries or whatever) from using it.
> > I'll see what I can find out.
> 
> I can still use -ffixed-cr5 for safety no ?

Yes, definitely, and in fact you have to if GCC changes to not have cr5
fixed by default (which may or may not happen).  It will be good
documentation in either case :-)


Segher

* Re: power and percpu: Could we move the paca into the percpu area?
From: Gabriel Paubert @ 2014-06-13 11:55 UTC
  To: Segher Boessenkool; +Cc: Tejun Heo, linuxppc-dev, Christoph Lameter

On Thu, Jun 12, 2014 at 07:26:39AM -0500, Segher Boessenkool wrote:
> > Actually, from gcc/config/rs6000.h:
> > 
> > /* 1 for registers that have pervasive standard uses
> >    and are not available for the register allocator.
> 
> [snip]
> 
> > So cr5, which is number 73, is never used by gcc. 
> 
> Not available for RA is not the same thing at all as not used by GCC.
> For example, GPR1 and XER[CA] are also fixed regs.

Indeed, I should have been clearer: it is neither explicitly reserved
by any ABI (the way GPR1 is for the stack pointer) nor used implicitly by
any pattern (like the carry, which is also never allocated but is used or
clobbered by many patterns). And no machine description pattern uses cr5.

The line "cr5 is not supposed to be used" has always been a mystery
to me, but gcc has always obeyed this rule. If memory serves, it has
been in rs6000.h since I got my first PPC board in 1997.

> 
> But, indeed, it does look like GCC doesn't use it.  It seems to me that
> some ABI forbade userland (or non-libraries or whatever) from using it.
> I'll see what I can find out.
> 

Take what I say with a grain of salt, but I always had the impression that
it was a remnant from the first port of GCC to Power, which may well have
targeted AIX. This first port might have been done by Richard Kenner.

Actually, a long time ago (1999-2000?) I toyed with the idea of using cr5 for
soft masking of interrupts on PPC32. But I got distracted by other things
and never got around to it.

	Gabriel

* Re: power and percpu: Could we move the paca into the percpu area?
From: Segher Boessenkool @ 2014-06-13 14:16 UTC
  To: Gabriel Paubert; +Cc: Tejun Heo, linuxppc-dev, Christoph Lameter

> > > So cr5, which is number 73, is never used by gcc. 
> > 
> > Not available for RA is not the same thing at all as not used by GCC.
> > For example, GPR1 and XER[CA] are also fixed regs.
> 
> Indeed, I should have been more clear, it is never explicitly reserved
> by any ABI like GPR1 for the stack pointer nor used implicitly by any pattern
> like the carry (which is also never allocated, but used or clobbered
> by many patterns). However no machine description pattern uses cr5.
> 
> The line "cr5 is not supposed to be used" has always been a mystery
> to me, but gcc has always obeyed this rule. If memory serves it has 
> been in rs6000.h since I got my first PPC board in 1997.

It's been there (in the public repo) since 1992.  The copyrights for
some of the rest of the port go back to 1990, so the code has most
likely existed at least that far back.

> > But, indeed, it does look like GCC doesn't use it.  It seems to me that
> > some ABI forbade userland (or non-libraries or whatever) from using it.
> > I'll see what I can find out.
> 
> Take what I say with a grain of salt, but I always had the impression that
> it was a remnant from the first port of GCC to Power, which could well have 
> been AIX. This first port might have been done by Richard Kenner.

That is all correct as far as I can see.

Presumably the AIX ABI at that time (AIX2? AIX3?) had cr5 reserved.

Still digging...


Segher

Thread overview: 8+ messages
2014-06-11 19:37 power and percpu: Could we move the paca into the percpu area? Christoph Lameter
2014-06-11 20:22 ` Benjamin Herrenschmidt
2014-06-11 21:03   ` Gabriel Paubert
2014-06-12 12:26     ` Segher Boessenkool
2014-06-12 21:57       ` Benjamin Herrenschmidt
2014-06-12 22:15         ` Segher Boessenkool
2014-06-13 11:55       ` Gabriel Paubert
2014-06-13 14:16         ` Segher Boessenkool
