Date: Fri, 8 May 2015 15:21:08 +0100
From: Will Deacon
To: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra, Ingo Molnar, David Ahern, Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List
Subject: Re: Question about barriers for ARM on tools/perf/
Message-ID: <20150508142107.GA25587@arm.com>
In-Reply-To: <20150508140459.GI7862@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 08, 2015 at 03:04:59PM +0100, Arnaldo Carvalho de Melo wrote:
> Hi Will,

Hi Arnaldo,

> I am working on moving the stuff we have for mb/rmb/wmb from
> tools/perf/perf-sys.h to tools/include/asm/barrier.h, redirecting
> to tools/arch/$ARCH/include/asm/barrier.h, to make it look like the
> kernel and, who knows, at some point even share the source code.
>
> For now I am getting just what is needed for work on having
> atomic.h done in the same fashion, to implement refcounts for various
> perf data structures, starting with struct thread, for which I have
> a patch that makes perf survive on high core count machines where it
> currently crashes, most notably 'perf top'.

Sharing atomic.h with userspace sounds a bit scary to me. I'm currently
working on patches that patch those routines at runtime to make use of some
new instructions that we have, so sharing that code would cause problems for
userspace.
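One way to get atomic_dec_and_test()-style refcounting in tools/ without sharing the kernel's atomic.h is to build it directly on GCC's __atomic builtins. This is a minimal sketch, not perf's code: struct obj_sketch, the obj_sketch__*() helpers and refcnt_dec_and_test() are hypothetical names, and the sketch does not claim to reproduce the exact ordering of the kernel's atomic_sub_return(), only the ordering a refcount drop needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for something like struct thread. */
struct obj_sketch {
	int refcnt;
	/* ... payload ... */
};

static struct obj_sketch *obj_sketch__new(void)
{
	struct obj_sketch *o = calloc(1, sizeof(*o));

	if (o)
		__atomic_store_n(&o->refcnt, 1, __ATOMIC_RELAXED);
	return o;
}

static struct obj_sketch *obj_sketch__get(struct obj_sketch *o)
{
	/* The caller already holds a reference, so no ordering is needed here. */
	int old = __atomic_fetch_add(&o->refcnt, 1, __ATOMIC_RELAXED);

	assert(old > 0); /* best-effort check that the object was still live */
	return o;
}

/*
 * Returns true when this call dropped the last reference. SEQ_CST is a
 * conservative choice; acquire/release is the minimum a refcount needs
 * (release so earlier accesses complete before the count drops, acquire
 * so the thread that frees the object sees them).
 */
static bool refcnt_dec_and_test(int *v)
{
	return __atomic_sub_fetch(v, 1, __ATOMIC_SEQ_CST) == 0;
}

static void obj_sketch__put(struct obj_sketch *o)
{
	if (o && refcnt_dec_and_test(&o->refcnt))
		free(o);
}
```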
> While doing that I noticed that the arm64 implementation, last
> fixed in:
>
> f428ebd184c82a7914b2aa7e9f868918aaf7ea78
> perf tools: Fix AAAAARGH64 memory barriers
>
> by peterz, implements those barriers as:
>
> #define mb()  asm volatile("dmb ish" ::: "memory")
> #define wmb() asm volatile("dmb ishst" ::: "memory")
> #define rmb() asm volatile("dmb ishld" ::: "memory")
>
> These are not the same as in the kernel, i.e. in
> arch/arm64/include/asm/barrier.h, where the above are really smp_mb,
> smp_wmb and smp_rmb.
>
> Would it be enough for us to use the same implementation as the kernel?
> I.e. make it be:
>
> #define mb()  asm volatile("dsb sy" ::: "memory")
> #define wmb() asm volatile("dsb st" ::: "memory")
> #define rmb() asm volatile("dsb ld" ::: "memory")
>
> ? If so, I would then use those dsb/dmb macros, etc., to get tools/ to use
> the proper instructions.

The mandatory barriers (i.e. the non-smp_* versions) are used for ordering
between CPUs and I/O, so they carry a significantly higher performance
penalty on ARM. Given that the perf tool presumably only cares about
ordering between CPUs, the smp_* variants are the correct ones to use.
However, on a !SMP kernel the smp_* variants become nops (compiler barriers
only), which is why the tools/ versions are currently spelled out as explicit
dmb instructions rather than taken from the kernel's smp_* macros.

> I need now, for arm64, smp_mb, that is used by atomic_sub_return(), that
> in turn is used by atomic_dec_and_test(), that I need for refcounts.

Hmm, that would mean that if I build the perf tool in a kernel source tree
that is configured as !SMP, the tool would be subtly broken. Wouldn't it be
better to go the other way and use compiler builtins for the memory barriers
instead of relying on the kernel? It looks like perf_mmap__read_head() and
perf_mmap__write_tail() are basically just acquire and release operations,
respectively, and could therefore be implemented using something like
__atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE) and
__atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE).

Will
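The compiler-builtin approach for the ring-buffer head and tail can be sketched as below. This assumes a hypothetical struct mmap_page_sketch that mirrors only the data_head and data_tail fields of the real struct perf_event_mmap_page, and hypothetical helper names; it illustrates the acquire/release pattern and is not perf's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two fields of struct perf_event_mmap_page
 * used here; the real layout is defined in <linux/perf_event.h>.
 */
struct mmap_page_sketch {
	uint64_t data_head; /* advanced by the kernel (producer) */
	uint64_t data_tail; /* advanced by the tool (consumer) */
};

/*
 * Acquire load of the head: reads of ring-buffer records issued after
 * this call cannot be performed before the head value is observed.
 */
static inline uint64_t sketch_read_head(const struct mmap_page_sketch *pc)
{
	return __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE);
}

/*
 * Release store of the tail: every record read issued before this call
 * completes before the producer can observe the new tail and reuse the space.
 */
static inline void sketch_write_tail(struct mmap_page_sketch *pc, uint64_t tail)
{
	__atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE);
}

/* Consume everything in [tail, head) and return the number of bytes consumed. */
static uint64_t sketch_consume(struct mmap_page_sketch *pc)
{
	uint64_t head = sketch_read_head(pc);
	/* Only the consumer writes data_tail, so a plain read is enough here. */
	uint64_t tail = pc->data_tail;

	/* Free-running counters: the head never falls behind the tail. */
	assert(head >= tail);
	/* ... parse the records between tail and head here ... */
	sketch_write_tail(pc, head);
	return head - tail;
}
```

Because the ordering comes from the compiler's memory model rather than from kernel macros, it does not depend on whether the kernel tree the tool is built in was configured as SMP or !SMP.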