* Question about barriers for ARM on tools/perf/ @ 2015-05-08 14:04 Arnaldo Carvalho de Melo 2015-05-08 14:16 ` Peter Zijlstra 2015-05-08 14:21 ` Will Deacon 0 siblings, 2 replies; 15+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-05-08 14:04 UTC (permalink / raw) To: Will Deacon Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List Hi Will, I am working on moving the stuff we have for mb/rmb/wmb from tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the kernel and who knows, at some point even share the source code. For now I am getting just what is needed for work on having atomic.h done in the same fashion, to implement refcounts for various perf data structures, starting with struct thread, for which I have a patch that makes perf survive in high core count machines where it currently crashes, most nobably 'perf top'. While doing that I noticed that arm64 implementation, lastly fixed in: f428ebd184c82a7914b2aa7e9f868918aaf7ea78 perf tools: Fix AAAAARGH64 memory barriers By peterz, it implements those barriers as: #define mb() asm volatile("dmb ish" ::: "memory") #define wmb() asm volatile("dmb ishst" ::: "memory") #define rmb() asm volatile("dmb ishld" ::: "memory") Which are not the same as in the kernel, i.e. in arch/arm64/include/asm/barrier.h, where the above are really smp_mb, smp_wmb and smp_rmb. Would it be enough for us to use the same implementation as the kernel? I.e. make it be: #define mb() asm volatile("dsb sy" ::: "memory") #define wmb() asm volatile("dsb st" ::: "memory") #define rmb() asm volatile("dsb ld" ::: "memory") ? If so I would then use those dsb/dmb macros, etc, to get tools/ to use the proper instructions, etc. I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that in turn is used by atomic_dec_and_test(), that I need for refcounts. Can you clarify? - Arnaldo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:04 Question about barriers for ARM on tools/perf/ Arnaldo Carvalho de Melo @ 2015-05-08 14:16 ` Peter Zijlstra 2015-05-08 14:21 ` Will Deacon 2015-05-08 14:21 ` Will Deacon 1 sibling, 1 reply; 15+ messages in thread From: Peter Zijlstra @ 2015-05-08 14:16 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 11:04:59AM -0300, Arnaldo Carvalho de Melo wrote: > Hi Will, > > I am working on moving the stuff we have for mb/rmb/wmb from > tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting > to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the > kernel and who knows, at some point even share the source code. > > For now I am getting just what is needed for work on having > atomic.h done in the same fashion, to implement refcounts for various > perf data structures, starting with struct thread, for which I have > a patch that makes perf survive in high core count machines where it > currently crashes, most nobably 'perf top'. > > While doing that I noticed that arm64 implementation, lastly > fixed in: > > f428ebd184c82a7914b2aa7e9f868918aaf7ea78 > perf tools: Fix AAAAARGH64 memory barriers > > By peterz, it implements those barriers as: > > #define mb() asm volatile("dmb ish" ::: "memory") > #define wmb() asm volatile("dmb ishst" ::: "memory") > #define rmb() asm volatile("dmb ishld" ::: "memory") > > Which are not the same as in the kernel, i.e. in > arch/arm64/include/asm/barrier.h, where the above are really smp_mb, > smp_wmb and smp_rmb. > > Would it be enough for us to use the same implementation as the kernel? > I.e. make it be: > > #define mb() asm volatile("dsb sy" ::: "memory") > #define wmb() asm volatile("dsb st" ::: "memory") > #define rmb() asm volatile("dsb ld" ::: "memory") > > ? If so I would then use those dsb/dmb macros, etc, to get tools/ to use > the proper instructions, etc. > > I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that > in turn is used by atomic_dec_and_test(), that I need for refcounts. > > Can you clarify? The dmb things include a fence for IO, the dsb are only for between CPUs. So for your work the dsb are fine. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:16 ` Peter Zijlstra @ 2015-05-08 14:21 ` Will Deacon 2015-05-08 14:23 ` Peter Zijlstra 0 siblings, 1 reply; 15+ messages in thread From: Will Deacon @ 2015-05-08 14:21 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 03:16:20PM +0100, Peter Zijlstra wrote: > On Fri, May 08, 2015 at 11:04:59AM -0300, Arnaldo Carvalho de Melo wrote: > > Hi Will, > > > > I am working on moving the stuff we have for mb/rmb/wmb from > > tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting > > to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the > > kernel and who knows, at some point even share the source code. > > > > For now I am getting just what is needed for work on having > > atomic.h done in the same fashion, to implement refcounts for various > > perf data structures, starting with struct thread, for which I have > > a patch that makes perf survive in high core count machines where it > > currently crashes, most nobably 'perf top'. > > > > While doing that I noticed that arm64 implementation, lastly > > fixed in: > > > > f428ebd184c82a7914b2aa7e9f868918aaf7ea78 > > perf tools: Fix AAAAARGH64 memory barriers > > > > By peterz, it implements those barriers as: > > > > #define mb() asm volatile("dmb ish" ::: "memory") > > #define wmb() asm volatile("dmb ishst" ::: "memory") > > #define rmb() asm volatile("dmb ishld" ::: "memory") > > > > Which are not the same as in the kernel, i.e. in > > arch/arm64/include/asm/barrier.h, where the above are really smp_mb, > > smp_wmb and smp_rmb. > > > > Would it be enough for us to use the same implementation as the kernel? > > I.e. make it be: > > > > #define mb() asm volatile("dsb sy" ::: "memory") > > #define wmb() asm volatile("dsb st" ::: "memory") > > #define rmb() asm volatile("dsb ld" ::: "memory") > > > > ? If so I would then use those dsb/dmb macros, etc, to get tools/ to use > > the proper instructions, etc. > > > > I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that > > in turn is used by atomic_dec_and_test(), that I need for refcounts. > > > > Can you clarify? > > The dmb things include a fence for IO, the dsb are only for between > CPUs. > > So for your work the dsb are fine. Other way around ;) (I relied separately anyway) Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:21 ` Will Deacon @ 2015-05-08 14:23 ` Peter Zijlstra 0 siblings, 0 replies; 15+ messages in thread From: Peter Zijlstra @ 2015-05-08 14:23 UTC (permalink / raw) To: Will Deacon Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 03:21:56PM +0100, Will Deacon wrote: > > The dmb things include a fence for IO, the dsb are only for between > > CPUs. > > > > So for your work the dsb are fine. > > Other way around ;) > Duh.. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:04 Question about barriers for ARM on tools/perf/ Arnaldo Carvalho de Melo 2015-05-08 14:16 ` Peter Zijlstra @ 2015-05-08 14:21 ` Will Deacon 2015-05-08 14:25 ` Peter Zijlstra 1 sibling, 1 reply; 15+ messages in thread From: Will Deacon @ 2015-05-08 14:21 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 03:04:59PM +0100, Arnaldo Carvalho de Melo wrote: > Hi Will, Hi Arnaldo, > I am working on moving the stuff we have for mb/rmb/wmb from > tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting > to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the > kernel and who knows, at some point even share the source code. > > For now I am getting just what is needed for work on having > atomic.h done in the same fashion, to implement refcounts for various > perf data structures, starting with struct thread, for which I have > a patch that makes perf survive in high core count machines where it > currently crashes, most nobably 'perf top'. Sharing atomic.h with userspace sounds a bit scary to me. I'm currently working on patches that involve patching those routines at runtime to enable use of some new instructions that we have, so that would cause problems for userspace. > While doing that I noticed that arm64 implementation, lastly > fixed in: > > f428ebd184c82a7914b2aa7e9f868918aaf7ea78 > perf tools: Fix AAAAARGH64 memory barriers > > By peterz, it implements those barriers as: > > #define mb() asm volatile("dmb ish" ::: "memory") > #define wmb() asm volatile("dmb ishst" ::: "memory") > #define rmb() asm volatile("dmb ishld" ::: "memory") > > Which are not the same as in the kernel, i.e. in > arch/arm64/include/asm/barrier.h, where the above are really smp_mb, > smp_wmb and smp_rmb. > > Would it be enough for us to use the same implementation as the kernel? > I.e. make it be: > > #define mb() asm volatile("dsb sy" ::: "memory") > #define wmb() asm volatile("dsb st" ::: "memory") > #define rmb() asm volatile("dsb ld" ::: "memory") > > ? If so I would then use those dsb/dmb macros, etc, to get tools/ to use > the proper instructions, etc. The mandatory barriers (i.e. the non-smp_* versions) are used for ordering between CPUs and I/O, so they have a significantly higher performance penalty on ARM. Given that the perf tool assumedly only cares about ordering between CPUs, the smp_* variants are the correct versions to use. However, on a !SMP kernel, they become nops (compiler barriers), which is why they are defined like they are at the moment. > I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that > in turn is used by atomic_dec_and_test(), that I need for refcounts. Hmm, that would mean if I build a perf tool in a kernel source tree that is configured as !SMP, then the tool would be subtly broken. Wouldn't it be better to go the other way, and use compiler builtins for the memory barriers instead of relying on the kernel? It looks like the perf_mmap__{read,write}_head functions are basically just acquire/release operations and could therefore be implemented using something like __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE). Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:21 ` Will Deacon @ 2015-05-08 14:25 ` Peter Zijlstra 2015-05-08 14:27 ` Will Deacon 2015-05-08 14:37 ` Arnaldo Carvalho de Melo 0 siblings, 2 replies; 15+ messages in thread From: Peter Zijlstra @ 2015-05-08 14:25 UTC (permalink / raw) To: Will Deacon Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote: > Wouldn't it be better to go the other way, and use compiler builtins for > the memory barriers instead of relying on the kernel? It looks like the > perf_mmap__{read,write}_head functions are basically just acquire/release > operations and could therefore be implemented using something like > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE). He wants to do smp refcounting, which needs atomic_inc() / atomic_inc_non_zero() / atomic_dec_return() etc.. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:25 ` Peter Zijlstra @ 2015-05-08 14:27 ` Will Deacon 2015-05-08 14:36 ` David Ahern 2015-05-08 14:37 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 15+ messages in thread From: Will Deacon @ 2015-05-08 14:27 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 03:25:13PM +0100, Peter Zijlstra wrote: > On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote: > > Wouldn't it be better to go the other way, and use compiler builtins for > > the memory barriers instead of relying on the kernel? It looks like the > > perf_mmap__{read,write}_head functions are basically just acquire/release > > operations and could therefore be implemented using something like > > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and > > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE). > > He wants to do smp refcounting, which needs atomic_inc() / > atomic_inc_non_zero() / atomic_dec_return() etc.. Right, of course, but GCC has those too: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:27 ` Will Deacon @ 2015-05-08 14:36 ` David Ahern 0 siblings, 0 replies; 15+ messages in thread From: David Ahern @ 2015-05-08 14:36 UTC (permalink / raw) To: Will Deacon, Peter Zijlstra Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On 5/8/15 8:27 AM, Will Deacon wrote: > On Fri, May 08, 2015 at 03:25:13PM +0100, Peter Zijlstra wrote: >> On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote: >>> Wouldn't it be better to go the other way, and use compiler builtins for >>> the memory barriers instead of relying on the kernel? It looks like the >>> perf_mmap__{read,write}_head functions are basically just acquire/release >>> operations and could therefore be implemented using something like >>> __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and >>> __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE). >> >> He wants to do smp refcounting, which needs atomic_inc() / >> atomic_inc_non_zero() / atomic_dec_return() etc.. > > Right, of course, but GCC has those too: > > https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html And we need a solution that works from RHEL5 forward. Not sure what gcc version that is; RHEL6 uses 4.4.7. We have done a prototype with the __sync functions and it worked nicely. David ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:25 ` Peter Zijlstra 2015-05-08 14:27 ` Will Deacon @ 2015-05-08 14:37 ` Arnaldo Carvalho de Melo 2015-05-08 14:48 ` Will Deacon 2015-05-08 14:52 ` Arnaldo Carvalho de Melo 1 sibling, 2 replies; 15+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-05-08 14:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu: > On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote: > > Wouldn't it be better to go the other way, and use compiler builtins for > > the memory barriers instead of relying on the kernel? It looks like the > > perf_mmap__{read,write}_head functions are basically just acquire/release > > operations and could therefore be implemented using something like > > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and > > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE). > He wants to do smp refcounting, which needs atomic_inc() / > atomic_inc_non_zero() / atomic_dec_return() etc.. Right, Will concentrated on what we use those barriers for right now in tools/perf. What I am doing right now is to expose what we use in perf to a wider audience, i.e. code being developed in tools/, with the current intent of implementing referece counting for multithreaded tools/perf/ tools, right now only 'perf top', but there are patches floating to load a perf.data file using as many CPUs as one would like, IIRC initially one per available CPU. I am using as a fallback the gcc intrinsics (), but I've heard I rather should not use those, albeit they seemed to work well for x86_64 and sparc64: ------------------------------------------- /** * atomic_inc - increment atomic variable * @v: pointer of type atomic_t * * Atomically increments @v by 1. */ static inline void atomic_inc(atomic_t *v) { __sync_add_and_fetch(&v->counter, 1); } /** * atomic_dec_and_test - decrement and test * @v: pointer of type atomic_t * * Atomically decrements @v by 1 and * returns true if the result is 0, or false for all other * cases. */ static inline int atomic_dec_and_test(atomic_t *v) { return __sync_sub_and_fetch(&v->counter, 1) == 0; } ------------------------------------------- One of my hopes for a byproduct was to take advantage of improvements made to that code in the kernel, etc. At least using the same API, i.e. barrier(), mb(), rmb(), wmb(), atomic_{inc,dec_and_test,read_init} I will, the whole shebang would be even cooler. - Arnaldo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:37 ` Arnaldo Carvalho de Melo @ 2015-05-08 14:48 ` Will Deacon 2015-05-08 14:57 ` Arnaldo Carvalho de Melo 2015-05-08 14:52 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 15+ messages in thread From: Will Deacon @ 2015-05-08 14:48 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 03:37:29PM +0100, Arnaldo Carvalho de Melo wrote: > Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu: > > On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote: > > > Wouldn't it be better to go the other way, and use compiler builtins for > > > the memory barriers instead of relying on the kernel? It looks like the > > > perf_mmap__{read,write}_head functions are basically just acquire/release > > > operations and could therefore be implemented using something like > > > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and > > > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE). > > > He wants to do smp refcounting, which needs atomic_inc() / > > atomic_inc_non_zero() / atomic_dec_return() etc.. > > Right, Will concentrated on what we use those barriers for right now in > tools/perf. > > What I am doing right now is to expose what we use in perf to a wider > audience, i.e. code being developed in tools/, with the current intent > of implementing referece counting for multithreaded tools/perf/ tools, > right now only 'perf top', but there are patches floating to load a > perf.data file using as many CPUs as one would like, IIRC initially one > per available CPU. > > I am using as a fallback the gcc intrinsics (), but I've heard I rather > should not use those, albeit they seemed to work well for x86_64 and > sparc64: Do you know what the objection to the intrinsics was? I believe that the __sync versions are deprecated in favour of the C11-like __atomic flavours, so if that was all the objection was about then we could use one or the other depending on what the compiler supports. > ------------------------------------------- > > /** > * atomic_inc - increment atomic variable > * @v: pointer of type atomic_t > * > * Atomically increments @v by 1. > */ > static inline void atomic_inc(atomic_t *v) > { > __sync_add_and_fetch(&v->counter, 1); > } > > /** > * atomic_dec_and_test - decrement and test > * @v: pointer of type atomic_t > * > * Atomically decrements @v by 1 and > * returns true if the result is 0, or false for all other > * cases. > */ > static inline int atomic_dec_and_test(atomic_t *v) > { > return __sync_sub_and_fetch(&v->counter, 1) == 0; > } > > ------------------------------------------- > > One of my hopes for a byproduct was to take advantage of improvements > made to that code in the kernel, etc. > > At least using the same API, i.e. barrier(), mb(), rmb(), wmb(), > atomic_{inc,dec_and_test,read_init} I will, the whole shebang would be > even cooler. Perhaps, but including atomic.h sounds pretty fragile to me. Sure, if we define the right set of macros we may get it to work today, but we could easily get subtle breakages as the kernel sources move forward and we might not easily notice/diagnose the failures in the perf tool. Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:48 ` Will Deacon @ 2015-05-08 14:57 ` Arnaldo Carvalho de Melo 2015-05-08 15:27 ` Peter Zijlstra 0 siblings, 1 reply; 15+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-05-08 14:57 UTC (permalink / raw) To: Will Deacon Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List Em Fri, May 08, 2015 at 03:48:20PM +0100, Will Deacon escreveu: > On Fri, May 08, 2015 at 03:37:29PM +0100, Arnaldo Carvalho de Melo wrote: > > Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu: > > > He wants to do smp refcounting, which needs atomic_inc() / > > > atomic_inc_non_zero() / atomic_dec_return() etc.. > > > > Right, Will concentrated on what we use those barriers for right now in > > tools/perf. > > > > What I am doing right now is to expose what we use in perf to a wider > > audience, i.e. code being developed in tools/, with the current intent > > of implementing referece counting for multithreaded tools/perf/ tools, > > right now only 'perf top', but there are patches floating to load a > > perf.data file using as many CPUs as one would like, IIRC initially one > > per available CPU. > > > > I am using as a fallback the gcc intrinsics (), but I've heard I rather > > should not use those, albeit they seemed to work well for x86_64 and > > sparc64: > > Do you know what the objection to the intrinsics was? I believe that > the __sync versions are deprecated in favour of the C11-like __atomic > flavours, so if that was all the objection was about then we could use > one or the other depending on what the compiler supports. Peter? Ingo? > > One of my hopes for a byproduct was to take advantage of improvements > > made to that code in the kernel, etc. > > > > At least using the same API, i.e. barrier(), mb(), rmb(), wmb(), > > atomic_{inc,dec_and_test,read_init} I will, the whole shebang would be > > even cooler. > > Perhaps, but including atomic.h sounds pretty fragile to me. Sure, if we > define the right set of macros we may get it to work today, but we could > easily get subtle breakages as the kernel sources move forward and we might > not easily notice/diagnose the failures in the perf tool. Ok, that is a good argument not to share the same source code and instead do what I am doing now, use it as the starting point, keep the source code as much as possible the same, so that doing a: diff -u arch/$ARCH/include/asm/barrier.h tools/arch/$ARCH/include/asm/barrier.h Would help in figuring out differences that may or may be desired, while tracking what the kernel does would help keep the tools/ version in the best possible shape. This could even make it more likely that the kernel developers would help having the best possible implementation in tools/ for that subset of their work... :-) - Arnaldo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:57 ` Arnaldo Carvalho de Melo @ 2015-05-08 15:27 ` Peter Zijlstra 2015-05-08 16:45 ` Will Deacon 0 siblings, 1 reply; 15+ messages in thread From: Peter Zijlstra @ 2015-05-08 15:27 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 11:57:01AM -0300, Arnaldo Carvalho de Melo wrote: > Em Fri, May 08, 2015 at 03:48:20PM +0100, Will Deacon escreveu: > > Do you know what the objection to the intrinsics was? I believe that > > the __sync versions are deprecated in favour of the C11-like __atomic > > flavours, so if that was all the objection was about then we could use > > one or the other depending on what the compiler supports. > > Peter? Ingo? I cannot remember, the __sync things should mostly work I suppose, and if you wrap then in the normal atomic interface we don't have to learn yet another API. That said, I've successfully lifted this kernel code into userspace in the past. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 15:27 ` Peter Zijlstra @ 2015-05-08 16:45 ` Will Deacon 2015-05-08 18:18 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 15+ messages in thread From: Will Deacon @ 2015-05-08 16:45 UTC (permalink / raw) To: Peter Zijlstra Cc: Arnaldo Carvalho de Melo, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List On Fri, May 08, 2015 at 04:27:59PM +0100, Peter Zijlstra wrote: > On Fri, May 08, 2015 at 11:57:01AM -0300, Arnaldo Carvalho de Melo wrote: > > Em Fri, May 08, 2015 at 03:48:20PM +0100, Will Deacon escreveu: > > > > Do you know what the objection to the intrinsics was? I believe that > > > the __sync versions are deprecated in favour of the C11-like __atomic > > > flavours, so if that was all the objection was about then we could use > > > one or the other depending on what the compiler supports. > > > > Peter? Ingo? > > I cannot remember, the __sync things should mostly work I suppose, and > if you wrap then in the normal atomic interface we don't have to learn > yet another API. Yeah, I think that's a good idea. > That said, I've successfully lifted this kernel code into userspace in > the past. Lifting a copy isn't too bad, it's using the same file that worries me. Will ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 16:45 ` Will Deacon @ 2015-05-08 18:18 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 15+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-05-08 18:18 UTC (permalink / raw) To: Will Deacon Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List Em Fri, May 08, 2015 at 05:45:35PM +0100, Will Deacon escreveu: > On Fri, May 08, 2015 at 04:27:59PM +0100, Peter Zijlstra wrote: > > On Fri, May 08, 2015 at 11:57:01AM -0300, Arnaldo Carvalho de Melo wrote: > > > Em Fri, May 08, 2015 at 03:48:20PM +0100, Will Deacon escreveu: > > > > Do you know what the objection to the intrinsics was? I believe that > > > > the __sync versions are deprecated in favour of the C11-like __atomic > > > > flavours, so if that was all the objection was about then we could use > > > > one or the other depending on what the compiler supports. > > > Peter? Ingo? > > I cannot remember, the __sync things should mostly work I suppose, and > > if you wrap then in the normal atomic interface we don't have to learn > > yet another API. > Yeah, I think that's a good idea. I'll use what I have, that is what I posted, i.e. something that was tested. As we test it further, then we can either lift more stuff from the kernel or add more compiler-foo.h stuff like we have in the kernel. > > That said, I've successfully lifted this kernel code into userspace in > > the past. > > Lifting a copy isn't too bad, it's using the same file that worries me. So that is how I'll proceed, lifting a copy of just what I need, lifting more as the need arises, and fallbacking to gcc intrinsics for architectures where the implementations drag too much stuff. Over time, as the need arises, maybe to support some older compiler or some problematic toolchain, lifting more from the kernel may be desired and should be considered. - Arnaldo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about barriers for ARM on tools/perf/ 2015-05-08 14:37 ` Arnaldo Carvalho de Melo 2015-05-08 14:48 ` Will Deacon @ 2015-05-08 14:52 ` Arnaldo Carvalho de Melo 1 sibling, 0 replies; 15+ messages in thread From: Arnaldo Carvalho de Melo @ 2015-05-08 14:52 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List Em Fri, May 08, 2015 at 11:37:29AM -0300, Arnaldo Carvalho de Melo escreveu: > Em Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra escreveu: > > He wants to do smp refcounting, which needs atomic_inc() / > > atomic_inc_non_zero() / atomic_dec_return() etc.. > Right, Will concentrated on what we use those barriers for right now in > tools/perf. So, for reference, and this just moves what was already in tools/perf/perf-sys.h to a place that is named as the kernel is and when what was used in perf-sys.h is the same as in the kernel, uses the same kernel source excerpts: https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/barrier - Arnaldo ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-05-08 18:18 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-05-08 14:04 Question about barriers for ARM on tools/perf/ Arnaldo Carvalho de Melo 2015-05-08 14:16 ` Peter Zijlstra 2015-05-08 14:21 ` Will Deacon 2015-05-08 14:23 ` Peter Zijlstra 2015-05-08 14:21 ` Will Deacon 2015-05-08 14:25 ` Peter Zijlstra 2015-05-08 14:27 ` Will Deacon 2015-05-08 14:36 ` David Ahern 2015-05-08 14:37 ` Arnaldo Carvalho de Melo 2015-05-08 14:48 ` Will Deacon 2015-05-08 14:57 ` Arnaldo Carvalho de Melo 2015-05-08 15:27 ` Peter Zijlstra 2015-05-08 16:45 ` Will Deacon 2015-05-08 18:18 ` Arnaldo Carvalho de Melo 2015-05-08 14:52 ` Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).