From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753295AbbEHOhs (ORCPT ); Fri, 8 May 2015 10:37:48 -0400
Received: from mail.kernel.org ([198.145.29.136]:36635 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752078AbbEHOhr (ORCPT ); Fri, 8 May 2015 10:37:47 -0400
Date: Fri, 8 May 2015 11:37:29 -0300
From: Arnaldo Carvalho de Melo
To: Peter Zijlstra
Cc: Will Deacon, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim,
	Linux Kernel Mailing List
Subject: Re: Question about barriers for ARM on tools/perf/
Message-ID: <20150508143729.GJ7862@kernel.org>
References: <20150508140459.GI7862@kernel.org>
	<20150508142107.GA25587@arm.com>
	<20150508142513.GM27504@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150508142513.GM27504@twins.programming.kicks-ass.net>
X-Url: http://acmel.wordpress.com
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra wrote:
> On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
> > Wouldn't it be better to go the other way, and use compiler builtins for
> > the memory barriers instead of relying on the kernel? It looks like the
> > perf_mmap__{read,write}_head functions are basically just acquire/release
> > operations and could therefore be implemented using something like
> > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
> > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).

> He wants to do smp refcounting, which needs atomic_inc() /
> atomic_inc_non_zero() / atomic_dec_return() etc..

Right, Will concentrated on what we use those barriers for right now in
tools/perf.

What I am doing right now is to expose what we use in perf to a wider
audience, i.e.
code being developed in tools/, with the current intent of implementing
reference counting for multithreaded tools/perf/ tools. Right now that is
only 'perf top', but there are patches floating around to load a perf.data
file using as many CPUs as one would like, IIRC initially one per
available CPU.

I am using as a fallback the gcc intrinsics (), but I've heard I should
rather not use those, albeit they seemed to work well for x86_64 and
sparc64:

-------------------------------------------
/**
 * atomic_inc - increment atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically increments @v by 1.
 */
static inline void atomic_inc(atomic_t *v)
{
	__sync_add_and_fetch(&v->counter, 1);
}

/**
 * atomic_dec_and_test - decrement and test
 * @v: pointer of type atomic_t
 *
 * Atomically decrements @v by 1 and
 * returns true if the result is 0, or false for all other
 * cases.
 */
static inline int atomic_dec_and_test(atomic_t *v)
{
	return __sync_sub_and_fetch(&v->counter, 1) == 0;
}
-------------------------------------------

One of my hopes for a byproduct was to take advantage of improvements made
to that code in the kernel, etc.

At the very least I will use the same API, i.e. barrier(), mb(), rmb(),
wmb(), atomic_{inc,dec_and_test,read_init}; reusing the whole shebang would
be even cooler.

- Arnaldo
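
For illustration, here is a minimal userspace sketch of how the two
fallback helpers above could back a get/put reference count, next to the
acquire/release accessors Will suggested. It is not the tools/perf code:
the atomic_t layout, struct demo_obj, struct demo_ring and every demo_*
function are hypothetical names made up for this example. Only the
__sync_* and __atomic_* builtins are real GCC interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout, modelled on the kernel's atomic_t. */
typedef struct { int counter; } atomic_t;

/* The two fallback helpers from the mail, unchanged in behaviour. */
static inline void atomic_inc(atomic_t *v)
{
	__sync_add_and_fetch(&v->counter, 1);
}

static inline int atomic_dec_and_test(atomic_t *v)
{
	return __sync_sub_and_fetch(&v->counter, 1) == 0;
}

/* Hypothetical refcounted object, for illustration only. */
struct demo_obj {
	atomic_t refcnt;
	int payload;
};

static inline struct demo_obj *demo_obj__new(int payload)
{
	struct demo_obj *obj = malloc(sizeof(*obj));

	if (obj) {
		obj->refcnt.counter = 1; /* the creator holds the first reference */
		obj->payload = payload;
	}
	return obj;
}

static inline struct demo_obj *demo_obj__get(struct demo_obj *obj)
{
	if (obj)
		atomic_inc(&obj->refcnt);
	return obj;
}

/* Drops one reference; returns 1 if this call freed the object. */
static inline int demo_obj__put(struct demo_obj *obj)
{
	if (obj && atomic_dec_and_test(&obj->refcnt)) {
		free(obj);
		return 1;
	}
	return 0;
}

/* Hypothetical stand-in for the head/tail fields of the perf mmap page. */
struct demo_ring {
	uint64_t data_head;
	uint64_t data_tail;
};

/* Acquire load of the producer's head, as in Will's suggestion. */
static inline uint64_t demo_ring__read_head(struct demo_ring *pc)
{
	return __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE);
}

/* Release store of the consumer's tail, as in Will's suggestion. */
static inline void demo_ring__write_tail(struct demo_ring *pc, uint64_t tail)
{
	__atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE);
}

/* Single-threaded walk through get/put; returns 1 when the counts add up. */
static inline int demo_refcount(void)
{
	struct demo_obj *obj = demo_obj__new(42);
	int first, second;

	if (!obj)
		return -1;
	demo_obj__get(obj);          /* two references held */
	first = demo_obj__put(obj);  /* one left, object still alive */
	second = demo_obj__put(obj); /* last reference, object freed */
	return first == 0 && second == 1;
}
```

The GCC manual describes the __sync builtins as full barriers in most
cases, which is stronger than a refcount strictly needs, so this is a
conservative fallback rather than a tuned one. A lookup path that can race
with the final put would additionally need something like the kernel's
atomic_inc_not_zero(), which this sketch leaves out.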