From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751950AbbFXLKj (ORCPT ); Wed, 24 Jun 2015 07:10:39 -0400
Received: from e28smtp06.in.ibm.com ([122.248.162.6]:57313 "EHLO
	e28smtp06.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750882AbbFXLKb (ORCPT );
	Wed, 24 Jun 2015 07:10:31 -0400
X-Helo: d28dlp02.in.ibm.com
X-MailFrom: srikar@linux.vnet.ibm.com
X-RcptTo: linux-kernel@vger.kernel.org
Date: Wed, 24 Jun 2015 16:40:04 +0530
From: Srikar Dronamraju
To: Arnaldo Carvalho de Melo, Jiri Olsa, Vinson Lee, Ingo Molnar
Cc: LKML, Namhyung Kim, Masami Hiramatsu
Subject: Regression in perf bench numa convergence stats
Message-ID: <20150624111004.GA5220@linux.vnet.ibm.com>
Reply-To: Srikar Dronamraju
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
User-Agent: Mutt/1.5.23 (2014-03-12)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15062411-0021-0000-0000-000005ECA430
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

perf bench numa mem with the -c / -m options on v4.1 and latest tip isn't
showing correct convergence statistics. I ran git bisect between v4.0 and
v4.1. I have included the patch that fixed the problem for me.

After the bisect, git bisect visualize shows:

From e1e455f4f4d35850c30235747620d0d078fe9f64 Mon Sep 17 00:00:00 2001
From: Vinson Lee
Date: Mon, 23 Mar 2015 12:09:16 -0700
Subject: [PATCH] perf tools: Work around lack of sched_getcpu in glibc < 2.6.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch fixes this build error with glibc < 2.6.
    CC       util/cloexec.o
  cc1: warnings being treated as errors
  util/cloexec.c: In function 'perf_flag_probe':
  util/cloexec.c:24: error: implicit declaration of function 'sched_getcpu'
  util/cloexec.c:24: error: nested extern declaration of 'sched_getcpu'
  make: *** [util/cloexec.o] Error 1

Signed-off-by: Vinson Lee
Acked-by: Jiri Olsa
Acked-by: Namhyung Kim
Cc: Adrian Hunter
Cc: Masami Hiramatsu
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Yann Droneaud
Cc: stable@vger.kernel.org # 3.18+
Link: http://lkml.kernel.org/r/1427137761-16119-1-git-send-email-vlee@twopensource.com
Signed-off-by: Arnaldo Carvalho de Melo

# git log --oneline e1e455f
e1e455f perf tools: Work around lack of sched_getcpu in glibc < 2.6.
77cfe38 perf kmem: Print big numbers using thousands' group
929a6bb tools lib traceevent: Factor out allocating and processing args
e6d7c91 perf probe: Fix to get ummapped symbol address on kernel
228f14f perf tools: Remove (null) value of "Sort order" for perf mem report
2c7da8c perf annotate: Allow annotation for decompressed kernel modules
bc84f46 perf tools: Try to lookup kernel module map before creating one
907fb50 perf tools: Remove is_kmodule_extension function
e746b3e perf tools: Remove compressed argument from is_kernel_module
8dee9ff perf tools: Use kmod_path__parse in is_kernel_module

To further verify that commit e1e455f causes the problem, I rolled back
to e1e455f and then to its parent, 77cfe38. I see this problem on more
than one system.

# rpm -qa | grep glibc-2
glibc-2.17-55.el7.x86_64

# git reset --hard e1e455f

 # Running 'numa/mem' benchmark:
 # Running main, "perf bench numa numa-mem --no-data_rand_walk -p 1 -t 64 -G 0 -P 0 -T 32 -l 800 -zZ0c"
 #
 #
 ###
 # 64 tasks will execute (on 4 nodes, 64 CPUs):
 #       800x     0MB global  shared mem operations
 #       800x     0MB process shared mem operations
 #       800x    32MB thread  local  mem operations
 ###

 ###
 #
 # Startup synchronization: ... threads initialized in 0.512908 seconds.
 #
 #    0.1% [0.0 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}
 #    0.6% [0.0 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}
 #    5.1% [0.0 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}
 #    9.6% [0.1 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}
 #   14.0% [0.1 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}

 ###

       4.903 secs slowest (max) thread-runtime
       4.873 secs fastest (min) thread-runtime
       4.941 secs average thread-runtime
       0.301 % difference between max/avg runtime
       4.228 GB data processed, per thread
     270.583 GB data processed, total
       1.160 nsecs/byte/thread runtime
       0.862 GB/sec/thread speed
      55.193 GB/sec total speed

and its parent 77cfe38

# git reset --hard 77cfe38

 # Running 'numa/mem' benchmark:
 # Running main, "perf bench numa numa-mem --no-data_rand_walk -p 1 -t 64 -G 0 -P 0 -T 32 -l 800 -zZ0c"
 #
 #
 ###
 # 64 tasks will execute (on 4 nodes, 64 CPUs):
 #       800x     0MB global  shared mem operations
 #       800x     0MB process shared mem operations
 #       800x    32MB thread  local  mem operations
 ###

 ###
 #
 # Startup synchronization: ... threads initialized in 0.421336 seconds.
 #
 #    0.4% [0.0 mins] 16/1  16/1  16/1  16/1  [ 0/4 ] l:  1-20  ( 19) [95.0%] {4-4}
 #    2.6% [0.0 mins] 17/1  15/1  16/1  16/1  [ 2/4 ] l:  3-37  ( 34) [91.9%] {4-4}
 #    7.1% [0.0 mins] 17/1  15/1  16/1  16/1  [ 2/4 ] l: 32-67  ( 35) [52.2%] {4-4}
 #   11.8% [0.1 mins] 17/1  15/1  16/1  16/1  [ 2/4 ] l: 65-103 ( 38) [36.9%] {4-4}
 #   15.9% [0.1 mins] 17/1  15/1  16/1  16/1  [ 2/4 ] l: 98-136 ( 38) [27.9%] {4-4}

 ###

       4.970 secs slowest (max) thread-runtime
       4.940 secs fastest (min) thread-runtime
       4.980 secs average thread-runtime
       0.300 % difference between max/avg runtime
       4.237 GB data processed, per thread
     271.187 GB data processed, total
       1.173 nsecs/byte/thread runtime
       0.853 GB/sec/thread speed
      54.562 GB/sec total speed

Even reverting e1e455f on top of tip/master seems to avoid the problem.
The below patch fixes the problem.
--
Thanks and Regards
Srikar Dronamraju

---->8--------------------------------------------

From 88199ad8a3d6495080eaa016b87a612bc742b1c4 Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju
Date: Wed, 24 Jun 2015 16:23:22 +0530
Subject: [PATCH] perf tools: Fix perf_bench to show proper convergence

With commit e1e455f (perf tools: Work around lack of sched_getcpu in
glibc < 2.6), perf_bench numa mem with the -c or -m option is not able
to correctly calculate convergence. With the above commit, sched_getcpu
always seems to return -1. The intention of commit e1e455f was to add a
sched_getcpu for glibc < 2.6. Hence keep the sched_getcpu definition
under an ifdef.

This regression occurred between v4.0 and v4.1.

Signed-off-by: Srikar Dronamraju
---
 tools/perf/util/cloexec.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
index 85b5238..2babdda 100644
--- a/tools/perf/util/cloexec.c
+++ b/tools/perf/util/cloexec.c
@@ -7,11 +7,15 @@
 
 static unsigned long flag = PERF_FLAG_FD_CLOEXEC;
 
+#ifdef __GLIBC_PREREQ
+#if !__GLIBC_PREREQ(2, 6)
 int __weak sched_getcpu(void)
 {
 	errno = ENOSYS;
 	return -1;
 }
+#endif
+#endif
 
 static int perf_flag_probe(void)
 {
--
1.8.3.1