From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753295AbbEHOhs (ORCPT <rfc822;w@1wt.eu>);
	Fri, 8 May 2015 10:37:48 -0400
Received: from mail.kernel.org ([198.145.29.136]:36635 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752078AbbEHOhr (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 8 May 2015 10:37:47 -0400
Date: Fri, 8 May 2015 11:37:29 -0300
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>, Ingo Molnar <mingo@kernel.org>,
        David Ahern <dsahern@gmail.com>, Jiri Olsa <jolsa@redhat.com>,
        Namhyung Kim <namhyung@gmail.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Question about barriers for ARM on tools/perf/
Message-ID: <20150508143729.GJ7862@kernel.org>
References: <20150508140459.GI7862@kernel.org>
 <20150508142107.GA25587@arm.com>
 <20150508142513.GM27504@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150508142513.GM27504@twins.programming.kicks-ass.net>
X-Url: http://acmel.wordpress.com
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 08, 2015 at 04:25:13PM +0200, Peter Zijlstra wrote:
> On Fri, May 08, 2015 at 03:21:08PM +0100, Will Deacon wrote:
> > Wouldn't it be better to go the other way, and use compiler builtins for
> > the memory barriers instead of relying on the kernel? It looks like the
> > perf_mmap__{read,write}_head functions are basically just acquire/release
> > operations and could therefore be implemented using something like
> > __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
> > __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).
 
> He wants to do smp refcounting, which needs atomic_inc() /
> atomic_inc_non_zero() / atomic_dec_return() etc..

Right, Will concentrated on what we use those barriers for right now in
tools/perf.
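
For reference, a minimal sketch of the acquire/release pair Will describes.
The struct and function names here (struct ring_page, ring_read_head(),
ring_write_tail()) are hypothetical stand-ins, not the real perf mmap page
or the perf_mmap__{read,write}_head helpers; only the two __atomic builtins
come from his mail:

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields of the mmap'ed control page. */
struct ring_page {
	uint64_t data_head;	/* advanced by the producer */
	uint64_t data_tail;	/* advanced by the consumer */
};

/*
 * Acquire load: reads of ring data issued after this call cannot be
 * reordered before the load of data_head.
 */
static inline uint64_t ring_read_head(struct ring_page *pc)
{
	return __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE);
}

/*
 * Release store: reads of ring data issued before this call complete
 * before the new tail becomes visible to the producer.
 */
static inline void ring_write_tail(struct ring_page *pc, uint64_t tail)
{
	__atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE);
}
```

This covers the ring buffer use case only; it does not provide the
read-modify-write operations needed for refcounting.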

What I am doing right now is exposing what we use in perf to a wider
audience, i.e. code being developed in tools/, with the current intent
of implementing reference counting for multithreaded tools/perf/ tools.
Right now that is only 'perf top', but there are patches floating around
to load a perf.data file using as many CPUs as one would like, IIRC
initially one per available CPU.

As a fallback I am using the gcc intrinsics (the __sync_*() builtins), but
I've heard I should rather not use those, although they seemed to work well
for x86_64 and sparc64:

-------------------------------------------

/**
 * atomic_inc - increment atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically increments @v by 1.
 */
static inline void atomic_inc(atomic_t *v)
{
       __sync_add_and_fetch(&v->counter, 1);
}

/**
 * atomic_dec_and_test - decrement and test
 * @v: pointer of type atomic_t
 *
 * Atomically decrements @v by 1 and
 * returns true if the result is 0, or false for all other
 * cases.
 */
static inline int atomic_dec_and_test(atomic_t *v)
{
       return __sync_sub_and_fetch(&v->counter, 1) == 0;
}

-------------------------------------------
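
To show how these two helpers would be used for refcounting, here is a
rough sketch. The atomic_t layout is assumed to match the kernel's, struct
obj and the obj__{new,get,put} names are made up for illustration, and the
atomic_inc_not_zero() shown is just one possible fallback built on gcc's
__atomic_compare_exchange_n(), not the kernel implementation:

```c
#include <stdlib.h>

typedef struct { int counter; } atomic_t;	/* assumed layout */

static inline void atomic_inc(atomic_t *v)
{
	__sync_add_and_fetch(&v->counter, 1);
}

static inline int atomic_dec_and_test(atomic_t *v)
{
	return __sync_sub_and_fetch(&v->counter, 1) == 0;
}

/* Increment unless the counter is already zero; returns 1 on success. */
static inline int atomic_inc_not_zero(atomic_t *v)
{
	int old = __atomic_load_n(&v->counter, __ATOMIC_RELAXED);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (__atomic_compare_exchange_n(&v->counter, &old, old + 1, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_RELAXED))
			return 1;
	}
	return 0;
}

/* Hypothetical refcounted object, for illustration only. */
struct obj {
	atomic_t refcnt;
	int	 payload;
};

static struct obj *obj__new(int payload)
{
	struct obj *o = malloc(sizeof(*o));

	if (o) {
		o->refcnt.counter = 1;	/* the creator holds the first reference */
		o->payload = payload;
	}
	return o;
}

static struct obj *obj__get(struct obj *o)
{
	if (o)
		atomic_inc(&o->refcnt);
	return o;
}

/* Drops a reference; returns 1 if it was the last one and 'o' was freed. */
static int obj__put(struct obj *o)
{
	if (o && atomic_dec_and_test(&o->refcnt)) {
		free(o);
		return 1;
	}
	return 0;
}
```

The __sync builtins act as full barriers, so the free() in obj__put() is
ordered after all earlier accesses to the object by the thread dropping
the last reference.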

One byproduct I was hoping for was to take advantage of improvements
made to that code in the kernel, etc.

At the very least I will use the same API, i.e. barrier(), mb(), rmb(),
wmb(), atomic_{inc,dec_and_test,read_init}; getting the whole shebang
would be even cooler.
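
As an illustration of keeping the API names while falling back to compiler
builtins, something along these lines could work. This is only a sketch:
mapping all three of mb(), rmb() and wmb() to __sync_synchronize() is my
assumption of a safe generic fallback (a full fence, stronger than rmb()
and wmb() strictly need), and publish()/consume() are made-up helpers that
just show the pairing:

```c
/* Compiler-only barrier: prevents compiler reordering, emits no instruction. */
#define barrier()	__asm__ __volatile__("" ::: "memory")

/* Assumed generic fallbacks: a full memory fence for every flavour. */
#define mb()		__sync_synchronize()
#define rmb()		__sync_synchronize()
#define wmb()		__sync_synchronize()

static int payload;
static int ready;

/* Writer: make the payload visible before setting the flag. */
static void publish(int value)
{
	payload = value;
	wmb();
	*(volatile int *)&ready = 1;
}

/* Reader: returns 1 and fills *out only if the flag was observed set. */
static int consume(int *out)
{
	if (!*(volatile int *)&ready)
		return 0;
	rmb();		/* pairs with the wmb() in publish() */
	*out = payload;
	return 1;
}
```

An arch-specific definition could then override any of these macros where
a cheaper instruction is known to be sufficient.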

- Arnaldo