linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Question about barriers for ARM on tools/perf/
@ 2015-05-08 14:04 Arnaldo Carvalho de Melo
  2015-05-08 14:16 ` Peter Zijlstra
  2015-05-08 14:21 ` Will Deacon
  0 siblings, 2 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-05-08 14:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

Hi Will,

	I am working on moving the stuff we have for mb/rmb/wmb from
tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting
to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the
kernel and who knows, at some point even share the source code.

	For now I am getting just what is needed for work on having
atomic.h done in the same fashion, to implement refcounts for various
perf data structures, starting with struct thread, for which I have
a patch that makes perf survive in high core count machines where it
currently crashes, most nobably 'perf top'.

	While doing that I noticed that arm64 implementation, lastly
fixed in:

  f428ebd184c82a7914b2aa7e9f868918aaf7ea78
  perf tools: Fix AAAAARGH64 memory barriers

By peterz, it implements those barriers as:

#define mb()            asm volatile("dmb ish" ::: "memory")
#define wmb()           asm volatile("dmb ishst" ::: "memory")
#define rmb()           asm volatile("dmb ishld" ::: "memory")

Which are not the same as in the kernel, i.e. in
arch/arm64/include/asm/barrier.h, where the above are really smp_mb,
smp_wmb and smp_rmb.

Would it be enough for us to use the same implementation as the kernel?
I.e. make it be:

#define mb()            asm volatile("dsb sy" ::: "memory")
#define wmb()           asm volatile("dsb st" ::: "memory")
#define rmb()           asm volatile("dsb ld" ::: "memory")

? If so I would then use those dsb/dmb macros, etc, to get tools/ to use
the proper instructions, etc.

I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that
in turn is used by atomic_dec_and_test(), that I need for refcounts.

Can you clarify?

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:04 Question about barriers for ARM on tools/perf/ Arnaldo Carvalho de Melo
@ 2015-05-08 14:16 ` Peter Zijlstra
  2015-05-08 14:21   ` Will Deacon
  2015-05-08 14:21 ` Will Deacon
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2015-05-08 14:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim,
	Linux Kernel Mailing List

On Fri, May 08, 2015 at 11:04:59AM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Will,
> 
> 	I am working on moving the stuff we have for mb/rmb/wmb from
> tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting
> to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the
> kernel and who knows, at some point even share the source code.
> 
> 	For now I am getting just what is needed for work on having
> atomic.h done in the same fashion, to implement refcounts for various
> perf data structures, starting with struct thread, for which I have
> a patch that makes perf survive in high core count machines where it
> currently crashes, most nobably 'perf top'.
> 
> 	While doing that I noticed that arm64 implementation, lastly
> fixed in:
> 
>   f428ebd184c82a7914b2aa7e9f868918aaf7ea78
>   perf tools: Fix AAAAARGH64 memory barriers
> 
> By peterz, it implements those barriers as:
> 
> #define mb()            asm volatile("dmb ish" ::: "memory")
> #define wmb()           asm volatile("dmb ishst" ::: "memory")
> #define rmb()           asm volatile("dmb ishld" ::: "memory")
> 
> Which are not the same as in the kernel, i.e. in
> arch/arm64/include/asm/barrier.h, where the above are really smp_mb,
> smp_wmb and smp_rmb.
> 
> Would it be enough for us to use the same implementation as the kernel?
> I.e. make it be:
> 
> #define mb()            asm volatile("dsb sy" ::: "memory")
> #define wmb()           asm volatile("dsb st" ::: "memory")
> #define rmb()           asm volatile("dsb ld" ::: "memory")
> 
> ? If so I would then use those dsb/dmb macros, etc, to get tools/ to use
> the proper instructions, etc.
> 
> I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that
> in turn is used by atomic_dec_and_test(), that I need for refcounts.
> 
> Can you clarify?

The dmb things include a fence for IO, the dsb are only for between
CPUs.

So for your work the dsb are fine.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:04 Question about barriers for ARM on tools/perf/ Arnaldo Carvalho de Melo
  2015-05-08 14:16 ` Peter Zijlstra
@ 2015-05-08 14:21 ` Will Deacon
  2015-05-08 14:25   ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-05-08 14:21 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

On Fri, May 08, 2015 at 03:04:59PM +0100, Arnaldo Carvalho de Melo wrote:
> Hi Will,

Hi Arnaldo,

> 	I am working on moving the stuff we have for mb/rmb/wmb from
> tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting
> to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the
> kernel and who knows, at some point even share the source code.
> 
> 	For now I am getting just what is needed for work on having
> atomic.h done in the same fashion, to implement refcounts for various
> perf data structures, starting with struct thread, for which I have
> a patch that makes perf survive in high core count machines where it
> currently crashes, most nobably 'perf top'.

Sharing atomic.h with userspace sounds a bit scary to me. I'm currently
working on patches that involve patching those routines at runtime to
enable use of some new instructions that we have, so that would cause
problems for userspace.

> 	While doing that I noticed that arm64 implementation, lastly
> fixed in:
> 
>   f428ebd184c82a7914b2aa7e9f868918aaf7ea78
>   perf tools: Fix AAAAARGH64 memory barriers
> 
> By peterz, it implements those barriers as:
> 
> #define mb()            asm volatile("dmb ish" ::: "memory")
> #define wmb()           asm volatile("dmb ishst" ::: "memory")
> #define rmb()           asm volatile("dmb ishld" ::: "memory")
> 
> Which are not the same as in the kernel, i.e. in
> arch/arm64/include/asm/barrier.h, where the above are really smp_mb,
> smp_wmb and smp_rmb.
> 
> Would it be enough for us to use the same implementation as the kernel?
> I.e. make it be:
> 
> #define mb()            asm volatile("dsb sy" ::: "memory")
> #define wmb()           asm volatile("dsb st" ::: "memory")
> #define rmb()           asm volatile("dsb ld" ::: "memory")
> 
> ? If so I would then use those dsb/dmb macros, etc, to get tools/ to use
> the proper instructions, etc.

The mandatory barriers (i.e. the non-smp_* versions) are used for ordering
between CPUs and I/O, so they have a significantly higher performance
penalty on ARM. Given that the perf tool assumedly only cares about ordering
between CPUs, the smp_* variants are the correct versions to use. However,
on a !SMP kernel, they become nops (compiler barriers), which is why they
are defined like they are at the moment.

> I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that
> in turn is used by atomic_dec_and_test(), that I need for refcounts.

Hmm, that would mean if I build a perf tool in a kernel source tree that is
configured as !SMP, then the tool would be subtly broken.

Wouldn't it be better to go the other way, and use compiler builtins for
the memory barriers instead of relying on the kernel? It looks like the
perf_mmap__{read,write}_head functions are basically just acquire/release
operations and could therefore be implemented using something like
__atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
__atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:16 ` Peter Zijlstra
@ 2015-05-08 14:21   ` Will Deacon
  2015-05-08 14:23     ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-05-08 14:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

On Fri, May 08, 2015 at 03:16:20PM +0100, Peter Zijlstra wrote:
> On Fri, May 08, 2015 at 11:04:59AM -0300, Arnaldo Carvalho de Melo wrote:
> > Hi Will,
> > 
> > 	I am working on moving the stuff we have for mb/rmb/wmb from
> > tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting
> > to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the
> > kernel and who knows, at some point even share the source code.
> > 
> > 	For now I am getting just what is needed for work on having
> > atomic.h done in the same fashion, to implement refcounts for various
> > perf data structures, starting with struct thread, for which I have
> > a patch that makes perf survive in high core count machines where it
> > currently crashes, most nobably 'perf top'.
> > 
> > 	While doing that I noticed that arm64 implementation, lastly
> > fixed in:
> > 
> >   f428ebd184c82a7914b2aa7e9f868918aaf7ea78
> >   perf tools: Fix AAAAARGH64 memory barriers
> > 
> > By peterz, it implements those barriers as:
> > 
> > #define mb()            asm volatile("dmb ish" ::: "memory")
> > #define wmb()           asm volatile("dmb ishst" ::: "memory")
> > #define rmb()           asm volatile("dmb ishld" ::: "memory")
> > 
> > Which are not the same as in the kernel, i.e. in
> > arch/arm64/include/asm/barrier.h, where the above are really smp_mb,
> > smp_wmb and smp_rmb.
> > 
> > Would it be enough for us to use the same implementation as the kernel?
> > I.e. make it be:
> > 
> > #define mb()            asm volatile("dsb sy" ::: "memory")
> > #define wmb()           asm volatile("dsb st" ::: "memory")
> > #define rmb()           asm volatile("dsb ld" ::: "memory")
> > 
> > ? If so I would then use those dsb/dmb macros, etc, to get tools/ to use
> > the proper instructions, etc.
> > 
> > I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that
> > in turn is used by atomic_dec_and_test(), that I need for refcounts.
> > 
> > Can you clarify?
> 
> The dmb things include a fence for IO, the dsb are only for between
> CPUs.
> 
> So for your work the dsb are fine.

Other way around ;)

(I relied separately anyway)

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:21   ` Will Deacon
@ 2015-05-08 14:23     ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2015-05-08 14:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

On Fri, May 08, 2015 at 03:21:56PM +0100, Will Deacon wrote:
> > The dmb things include a fence for IO, the dsb are only for between
> > CPUs.
> > 
> > So for your work the dsb are fine.
> 
> Other way around ;)
> 

Duh..

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:21 ` Will Deacon
@ 2015-05-08 14:25   ` Peter Zijlstra
  2015-05-08 14:27     ` Will Deacon
  2015-05-08 14:37     ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2015-05-08 14:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
> Wouldn't it be better to go the other way, and use compiler builtins for
> the memory barriers instead of relying on the kernel? It looks like the
> perf_mmap__{read,write}_head functions are basically just acquire/release
> operations and could therefore be implemented using something like
> __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
> __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).

He wants to do smp refcounting, which needs atomic_inc() /
atomic_inc_non_zero() / atomic_dec_return() etc..

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:25   ` Peter Zijlstra
@ 2015-05-08 14:27     ` Will Deacon
  2015-05-08 14:36       ` David Ahern
  2015-05-08 14:37     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-05-08 14:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

On Fri, May 08, 2015 at 03:25:13PM +0100, Peter Zijlstra wrote:
> On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
> > Wouldn't it be better to go the other way, and use compiler builtins for
> > the memory barriers instead of relying on the kernel? It looks like the
> > perf_mmap__{read,write}_head functions are basically just acquire/release
> > operations and could therefore be implemented using something like
> > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
> > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).
> 
> He wants to do smp refcounting, which needs atomic_inc() /
> atomic_inc_non_zero() / atomic_dec_return() etc..

Right, of course, but GCC has those too:

  https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:27     ` Will Deacon
@ 2015-05-08 14:36       ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2015-05-08 14:36 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Linux Kernel Mailing List

On 5/8/15 8:27 AM, Will Deacon wrote:
> On Fri, May 08, 2015 at 03:25:13PM +0100, Peter Zijlstra wrote:
>> On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
>>> Wouldn't it be better to go the other way, and use compiler builtins for
>>> the memory barriers instead of relying on the kernel? It looks like the
>>> perf_mmap__{read,write}_head functions are basically just acquire/release
>>> operations and could therefore be implemented using something like
>>> __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
>>> __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).
>>
>> He wants to do smp refcounting, which needs atomic_inc() /
>> atomic_inc_non_zero() / atomic_dec_return() etc..
>
> Right, of course, but GCC has those too:
>
>    https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

And we need a solution that works from RHEL5 forward. Not sure what gcc 
version that is; RHEL6 uses 4.4.7. We have done a prototype with the 
__sync functions and it worked nicely.

David


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:25   ` Peter Zijlstra
  2015-05-08 14:27     ` Will Deacon
@ 2015-05-08 14:37     ` Arnaldo Carvalho de Melo
  2015-05-08 14:48       ` Will Deacon
  2015-05-08 14:52       ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-05-08 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim,
	Linux Kernel Mailing List

Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu:
> On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
> > Wouldn't it be better to go the other way, and use compiler builtins for
> > the memory barriers instead of relying on the kernel? It looks like the
> > perf_mmap__{read,write}_head functions are basically just acquire/release
> > operations and could therefore be implemented using something like
> > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
> > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).
 
> He wants to do smp refcounting, which needs atomic_inc() /
> atomic_inc_non_zero() / atomic_dec_return() etc..

Right, Will concentrated on what we use those barriers for right now in
tools/perf.

What I am doing right now is to expose what we use in perf to a wider
audience, i.e. code being developed in tools/, with the current intent
of implementing referece counting for multithreaded tools/perf/ tools,
right now only 'perf top', but there are patches floating to load a
perf.data file using as many CPUs as one would like, IIRC initially one
per available CPU.

I am using as a fallback the gcc intrinsics (), but I've heard I rather
should not use those, albeit they seemed to work well for x86_64 and
sparc64:

-------------------------------------------

/**
 * atomic_inc - increment atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically increments @v by 1.
 */
static inline void atomic_inc(atomic_t *v)
{
       __sync_add_and_fetch(&v->counter, 1);
}

/**
 * atomic_dec_and_test - decrement and test
 * @v: pointer of type atomic_t
 *
 * Atomically decrements @v by 1 and
 * returns true if the result is 0, or false for all other
 * cases.
 */
static inline int atomic_dec_and_test(atomic_t *v)
{
       return __sync_sub_and_fetch(&v->counter, 1) == 0;
}

-------------------------------------------

One of my hopes for a byproduct was to take advantage of improvements
made to that code in the kernel, etc.

At least using the same API, i.e.  barrier(), mb(), rmb(), wmb(),
atomic_{inc,dec_and_test,read_init} I will, the whole shebang would be
even cooler.

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:37     ` Arnaldo Carvalho de Melo
@ 2015-05-08 14:48       ` Will Deacon
  2015-05-08 14:57         ` Arnaldo Carvalho de Melo
  2015-05-08 14:52       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-05-08 14:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

On Fri, May 08, 2015 at 03:37:29PM +0100, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu:
> > On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
> > > Wouldn't it be better to go the other way, and use compiler builtins for
> > > the memory barriers instead of relying on the kernel? It looks like the
> > > perf_mmap__{read,write}_head functions are basically just acquire/release
> > > operations and could therefore be implemented using something like
> > > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
> > > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).
>  
> > He wants to do smp refcounting, which needs atomic_inc() /
> > atomic_inc_non_zero() / atomic_dec_return() etc..
> 
> Right, Will concentrated on what we use those barriers for right now in
> tools/perf.
> 
> What I am doing right now is to expose what we use in perf to a wider
> audience, i.e. code being developed in tools/, with the current intent
> of implementing referece counting for multithreaded tools/perf/ tools,
> right now only 'perf top', but there are patches floating to load a
> perf.data file using as many CPUs as one would like, IIRC initially one
> per available CPU.
> 
> I am using as a fallback the gcc intrinsics (), but I've heard I rather
> should not use those, albeit they seemed to work well for x86_64 and
> sparc64:

Do you know what the objection to the intrinsics was? I believe that
the __sync versions are deprecated in favour of the C11-like __atomic
flavours, so if that was all the objection was about then we could use
one or the other depending on what the compiler supports.

> -------------------------------------------
> 
> /**
>  * atomic_inc - increment atomic variable
>  * @v: pointer of type atomic_t
>  *
>  * Atomically increments @v by 1.
>  */
> static inline void atomic_inc(atomic_t *v)
> {
>        __sync_add_and_fetch(&v->counter, 1);
> }
> 
> /**
>  * atomic_dec_and_test - decrement and test
>  * @v: pointer of type atomic_t
>  *
>  * Atomically decrements @v by 1 and
>  * returns true if the result is 0, or false for all other
>  * cases.
>  */
> static inline int atomic_dec_and_test(atomic_t *v)
> {
>        return __sync_sub_and_fetch(&v->counter, 1) == 0;
> }
> 
> -------------------------------------------
> 
> One of my hopes for a byproduct was to take advantage of improvements
> made to that code in the kernel, etc.
> 
> At least using the same API, i.e.  barrier(), mb(), rmb(), wmb(),
> atomic_{inc,dec_and_test,read_init} I will, the whole shebang would be
> even cooler.

Perhaps, but including atomic.h sounds pretty fragile to me. Sure, if we
define the right set of macros we may get it to work today, but we could
easily get subtle breakages as the kernel sources move forward and we might
not easily notice/diagnose the failures in the perf tool.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:37     ` Arnaldo Carvalho de Melo
  2015-05-08 14:48       ` Will Deacon
@ 2015-05-08 14:52       ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-05-08 14:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim,
	Linux Kernel Mailing List

Em Fri, May 08, 2015 at 11:37:29AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu:
> > He wants to do smp refcounting, which needs atomic_inc() /
> > atomic_inc_non_zero() / atomic_dec_return() etc..
 
> Right, Will concentrated on what we use those barriers for right now in
> tools/perf.

So, for reference, and this just moves what was already in
tools/perf/perf-sys.h to a place that is named as the kernel is and when
what was used in perf-sys.h is the same as in the kernel, uses the same
kernel source excerpts:

https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/barrier

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:48       ` Will Deacon
@ 2015-05-08 14:57         ` Arnaldo Carvalho de Melo
  2015-05-08 15:27           ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-05-08 14:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

Em Fri, May 08, 2015 at 03:48:20PM +0100, Will Deacon escreveu:
> On Fri, May 08, 2015 at 03:37:29PM +0100, Arnaldo Carvalho de Melo wrote:
> > Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu:
> > > He wants to do smp refcounting, which needs atomic_inc() /
> > > atomic_inc_non_zero() / atomic_dec_return() etc..
> > 
> > Right, Will concentrated on what we use those barriers for right now in
> > tools/perf.
> > 
> > What I am doing right now is to expose what we use in perf to a wider
> > audience, i.e. code being developed in tools/, with the current intent
> > of implementing referece counting for multithreaded tools/perf/ tools,
> > right now only 'perf top', but there are patches floating to load a
> > perf.data file using as many CPUs as one would like, IIRC initially one
> > per available CPU.
> > 
> > I am using as a fallback the gcc intrinsics (), but I've heard I rather
> > should not use those, albeit they seemed to work well for x86_64 and
> > sparc64:
> 
> Do you know what the objection to the intrinsics was? I believe that
> the __sync versions are deprecated in favour of the C11-like __atomic
> flavours, so if that was all the objection was about then we could use
> one or the other depending on what the compiler supports.

Peter? Ingo?
 
> > One of my hopes for a byproduct was to take advantage of improvements
> > made to that code in the kernel, etc.
> > 
> > At least using the same API, i.e.  barrier(), mb(), rmb(), wmb(),
> > atomic_{inc,dec_and_test,read_init} I will, the whole shebang would be
> > even cooler.
> 
> Perhaps, but including atomic.h sounds pretty fragile to me. Sure, if we
> define the right set of macros we may get it to work today, but we could
> easily get subtle breakages as the kernel sources move forward and we might
> not easily notice/diagnose the failures in the perf tool.

Ok, that is a good argument not to share the same source code and
instead do what I am doing now, use it as the starting point, keep the
source code as much as possible the same, so that doing a:

  diff -u arch/$ARCH/include/asm/barrier.h tools/arch/$ARCH/include/asm/barrier.h

Would help in figuring out differences that may or may be desired, while
tracking what the kernel does would help keep the tools/ version in the
best possible shape.

This could even make it more likely that the kernel developers would
help having the best possible implementation in tools/ for that subset
of their work... :-)

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 14:57         ` Arnaldo Carvalho de Melo
@ 2015-05-08 15:27           ` Peter Zijlstra
  2015-05-08 16:45             ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2015-05-08 15:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim,
	Linux Kernel Mailing List

On Fri, May 08, 2015 at 11:57:01AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 08, 2015 at 03:48:20PM +0100, Will Deacon escreveu:

> > Do you know what the objection to the intrinsics was? I believe that
> > the __sync versions are deprecated in favour of the C11-like __atomic
> > flavours, so if that was all the objection was about then we could use
> > one or the other depending on what the compiler supports.
> 
> Peter? Ingo?

I cannot remember, the __sync things should mostly work I suppose, and
if you wrap then in the normal atomic interface we don't have to learn
yet another API.

That said, I've successfully lifted this kernel code into userspace in
the past.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 15:27           ` Peter Zijlstra
@ 2015-05-08 16:45             ` Will Deacon
  2015-05-08 18:18               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2015-05-08 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

On Fri, May 08, 2015 at 04:27:59PM +0100, Peter Zijlstra wrote:
> On Fri, May 08, 2015 at 11:57:01AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Fri, May 08, 2015 at 03:48:20PM +0100, Will Deacon escreveu:
> 
> > > Do you know what the objection to the intrinsics was? I believe that
> > > the __sync versions are deprecated in favour of the C11-like __atomic
> > > flavours, so if that was all the objection was about then we could use
> > > one or the other depending on what the compiler supports.
> > 
> > Peter? Ingo?
> 
> I cannot remember, the __sync things should mostly work I suppose, and
> if you wrap then in the normal atomic interface we don't have to learn
> yet another API.

Yeah, I think that's a good idea.

> That said, I've successfully lifted this kernel code into userspace in
> the past.

Lifting a copy isn't too bad, it's using the same file that worries me.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Question about barriers for ARM on tools/perf/
  2015-05-08 16:45             ` Will Deacon
@ 2015-05-08 18:18               ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-05-08 18:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa,
	Namhyung Kim, Linux Kernel Mailing List

Em Fri, May 08, 2015 at 05:45:35PM +0100, Will Deacon escreveu:
> On Fri, May 08, 2015 at 04:27:59PM +0100, Peter Zijlstra wrote:
> > On Fri, May 08, 2015 at 11:57:01AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Fri, May 08, 2015 at 03:48:20PM +0100, Will Deacon escreveu:

> > > > Do you know what the objection to the intrinsics was? I believe that
> > > > the __sync versions are deprecated in favour of the C11-like __atomic
> > > > flavours, so if that was all the objection was about then we could use
> > > > one or the other depending on what the compiler supports.
 
> > > Peter? Ingo?

> > I cannot remember, the __sync things should mostly work I suppose, and
> > if you wrap then in the normal atomic interface we don't have to learn
> > yet another API.
 
> Yeah, I think that's a good idea.

I'll use what I have, that is what I posted, i.e. something that was
tested. As we test it further, then we can either lift more stuff from
the kernel or add more compiler-foo.h stuff like we have in the kernel.
 
> > That said, I've successfully lifted this kernel code into userspace in
> > the past.
> 
> Lifting a copy isn't too bad, it's using the same file that worries me.

So that is how I'll proceed, lifting a copy of just what I need, lifting
more as the need arises, and fallbacking to gcc intrinsics for
architectures where the implementations drag too much stuff.

Over time, as the need arises, maybe to support some older compiler or
some problematic toolchain, lifting more from the kernel may be desired
and should be considered.

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-05-08 18:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-08 14:04 Question about barriers for ARM on tools/perf/ Arnaldo Carvalho de Melo
2015-05-08 14:16 ` Peter Zijlstra
2015-05-08 14:21   ` Will Deacon
2015-05-08 14:23     ` Peter Zijlstra
2015-05-08 14:21 ` Will Deacon
2015-05-08 14:25   ` Peter Zijlstra
2015-05-08 14:27     ` Will Deacon
2015-05-08 14:36       ` David Ahern
2015-05-08 14:37     ` Arnaldo Carvalho de Melo
2015-05-08 14:48       ` Will Deacon
2015-05-08 14:57         ` Arnaldo Carvalho de Melo
2015-05-08 15:27           ` Peter Zijlstra
2015-05-08 16:45             ` Will Deacon
2015-05-08 18:18               ` Arnaldo Carvalho de Melo
2015-05-08 14:52       ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).