* Regression in perf bench numa convergence stats
@ 2015-06-24 11:10 Srikar Dronamraju
  2015-06-24 12:49 ` Ingo Molnar
  2015-06-26  8:43 ` [tip:perf/urgent] perf bench numa: Fix to show proper " tip-bot for Srikar Dronamraju
  0 siblings, 2 replies; 4+ messages in thread
From: Srikar Dronamraju @ 2015-06-24 11:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Vinson Lee, Ingo Molnar
  Cc: LKML, Namhyung Kim, Masami Hiramatsu


perf bench numa mem with the -c / -m options on v4.1 and the latest tip
isn't showing correct convergence statistics. I ran git bisect between
v4.0 and v4.1, and I have included the patch that fixed the problem for me.

After the bisect, git bisect visualize shows:

From e1e455f4f4d35850c30235747620d0d078fe9f64 Mon Sep 17 00:00:00 2001
From: Vinson Lee <vlee@twitter.com>
Date: Mon, 23 Mar 2015 12:09:16 -0700
Subject: [PATCH] perf tools: Work around lack of sched_getcpu in glibc < 2.6.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch fixes this build error with glibc < 2.6.

  CC       util/cloexec.o
cc1: warnings being treated as errors
util/cloexec.c: In function 'perf_flag_probe':
util/cloexec.c:24: error: implicit declaration of function 'sched_getcpu'
util/cloexec.c:24: error: nested extern declaration of 'sched_getcpu'
make: *** [util/cloexec.o] Error 1

Signed-off-by: Vinson Lee <vlee@twitter.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: stable@vger.kernel.org # 3.18+
Link: http://lkml.kernel.org/r/1427137761-16119-1-git-send-email-vlee@twopensource.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# git log --oneline e1e455f
e1e455f perf tools: Work around lack of sched_getcpu in glibc < 2.6.
77cfe38 perf kmem: Print big numbers using thousands' group
929a6bb tools lib traceevent: Factor out allocating and processing args
e6d7c91 perf probe: Fix to get ummapped symbol address on kernel
228f14f perf tools: Remove (null) value of "Sort order" for perf mem report
2c7da8c perf annotate: Allow annotation for decompressed kernel modules
bc84f46 perf tools: Try to lookup kernel module map before creating one
907fb50 perf tools: Remove is_kmodule_extension function
e746b3e perf tools: Remove compressed argument from is_kernel_module
8dee9ff perf tools: Use kmod_path__parse in is_kernel_module

To further verify that commit e1e455f causes the problem, I rolled back to
e1e455f and then to its parent 77cfe38. I see this problem on more than one
system.

# rpm -qa | grep glibc-2
glibc-2.17-55.el7.x86_64
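
Besides checking the package version, it can help to look at what
sched_getcpu() actually returns on the affected system. The following is a
minimal sketch, not part of perf: probe_sched_getcpu() is a hypothetical
helper name, and the code assumes a Linux system whose C library declares
sched_getcpu() under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from perf): print the glibc version this file
 * was built against, if any, and the value sched_getcpu() returns.
 * Returns the CPU number, or -1 if the lookup failed.
 */
static int probe_sched_getcpu(void)
{
	int cpu;

#ifdef __GLIBC__
	printf("built against glibc %d.%d\n", __GLIBC__, __GLIBC_MINOR__);
#endif
	errno = 0;
	cpu = sched_getcpu();
	if (cpu < 0)
		printf("sched_getcpu() failed: %s\n", strerror(errno));
	else
		printf("sched_getcpu() = %d\n", cpu);
	return cpu;
}
```

Calling probe_sched_getcpu() from a small main() on a glibc >= 2.6 system
should report a non-negative CPU number; a result of -1 with ENOSYS would
match the behaviour described in this report.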


# git reset --hard e1e455f

# Running 'numa/mem' benchmark:

# Running main, "perf bench numa numa-mem --no-data_rand_walk -p 1 -t 64 -G 0 -P 0 -T 32 -l 800 -zZ0c"
#
#

 ###
 # 64 tasks will execute (on 4 nodes, 64 CPUs):
 #        800x     0MB global  shared mem operations
 #        800x     0MB process shared mem operations
 #        800x    32MB thread  local  mem operations
 ###

 ###
 #
 # Startup synchronization: ... threads initialized in 0.512908 seconds.
 #
 #    0.1%  [0.0 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}
 #    0.6%  [0.0 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}
 #    5.1%  [0.0 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}
 #    9.6%  [0.1 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}
 #   14.0%  [0.1 mins]  0/0   0/0   0/0   0/0  [ 0/0 ] l: -1-0   (  1) {0-0}

 ###

          4.903 secs slowest (max) thread-runtime
          4.873 secs fastest (min) thread-runtime
          4.941 secs average thread-runtime
          0.301 % difference between max/avg runtime
          4.228 GB data processed, per thread
        270.583 GB data processed, total
          1.160 nsecs/byte/thread runtime
          0.862 GB/sec/thread speed
         55.193 GB/sec total speed

And with its parent, 77cfe38:
# git reset --hard 77cfe38

# Running 'numa/mem' benchmark:


# Running main, "perf bench numa numa-mem --no-data_rand_walk -p 1 -t 64 -G 0 -P 0 -T 32 -l 800 -zZ0c"
#
#

 ###
 # 64 tasks will execute (on 4 nodes, 64 CPUs):
 #        800x     0MB global  shared mem operations
 #        800x     0MB process shared mem operations
 #        800x    32MB thread  local  mem operations
 ###

 ###
 #
 # Startup synchronization: ... threads initialized in 0.421336 seconds.
 #
 #    0.4%  [0.0 mins] 16/1  16/1  16/1  16/1  [ 0/4 ] l:  1-20  ( 19) [95.0%] {4-4}
 #    2.6%  [0.0 mins] 17/1  15/1  16/1  16/1  [ 2/4 ] l:  3-37  ( 34) [91.9%] {4-4}
 #    7.1%  [0.0 mins] 17/1  15/1  16/1  16/1  [ 2/4 ] l: 32-67  ( 35) [52.2%] {4-4}
 #   11.8%  [0.1 mins] 17/1  15/1  16/1  16/1  [ 2/4 ] l: 65-103 ( 38) [36.9%] {4-4}
 #   15.9%  [0.1 mins] 17/1  15/1  16/1  16/1  [ 2/4 ] l: 98-136 ( 38) [27.9%] {4-4}

 ###

          4.970 secs slowest (max) thread-runtime
          4.940 secs fastest (min) thread-runtime
          4.980 secs average thread-runtime
          0.300 % difference between max/avg runtime
          4.237 GB data processed, per thread
        271.187 GB data processed, total
          1.173 nsecs/byte/thread runtime
          0.853 GB/sec/thread speed
         54.562 GB/sec total speed


Reverting e1e455f on top of tip/master also avoids the problem.
The patch below fixes it.
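
The symptom can likely be reproduced outside perf. The sketch below is an
illustration, not perf code: it assumes GCC or Clang on Linux and defines an
unconditional weak sched_getcpu() in the program itself, the way e1e455f
does. The working assumption is that a definition in the program, even a
weak one, is used in preference to the C library's, so every call gets the
stub.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Unconditional weak fallback, mirroring what commit e1e455f added.
 * Because this definition lives in the program, calls to sched_getcpu()
 * bind to it instead of the C library's implementation.
 */
int __attribute__((weak)) sched_getcpu(void)
{
	errno = ENOSYS;
	return -1;
}

/* Hypothetical helper: returns 1 if the stub above answered the call. */
static int stub_is_in_use(void)
{
	errno = 0;
	return sched_getcpu() == -1 && errno == ENOSYS;
}
```

If stub_is_in_use() returns 1 on a system with glibc >= 2.6, the fallback is
shadowing the real function, which would explain the all-zero convergence
lines in the e1e455f run.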

-- 
Thanks and Regards
Srikar Dronamraju

---->8--------------------------------------------

From 88199ad8a3d6495080eaa016b87a612bc742b1c4 Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Date: Wed, 24 Jun 2015 16:23:22 +0530
Subject: [PATCH] perf tools: Fix perf bench to show proper convergence

With commit e1e455f (perf tools: Work around lack of sched_getcpu in
glibc < 2.6), perf bench numa mem with the -c or -m option is not able to
correctly calculate convergence. With the above commit, sched_getcpu
always seems to return -1. The intention of commit e1e455f was to add a
sched_getcpu fallback for glibc < 2.6. Hence keep the sched_getcpu
definition under an ifdef.

This regression occurred between v4.0 and v4.1.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 tools/perf/util/cloexec.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
index 85b5238..2babdda 100644
--- a/tools/perf/util/cloexec.c
+++ b/tools/perf/util/cloexec.c
@@ -7,11 +7,15 @@
 
 static unsigned long flag = PERF_FLAG_FD_CLOEXEC;
 
+#ifdef __GLIBC_PREREQ
+#if !__GLIBC_PREREQ(2, 6)
 int __weak sched_getcpu(void)
 {
 	errno = ENOSYS;
 	return -1;
 }
+#endif
+#endif
 
 static int perf_flag_probe(void)
 {
-- 
1.8.3.1



Thread overview: 4+ messages
2015-06-24 11:10 Regression in perf bench numa convergence stats Srikar Dronamraju
2015-06-24 12:49 ` Ingo Molnar
2015-06-25 15:30   ` Arnaldo Carvalho de Melo
2015-06-26  8:43 ` [tip:perf/urgent] perf bench numa: Fix to show proper " tip-bot for Srikar Dronamraju
