All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] perf: perf sched warning possibly due to clock granularity on AMD
@ 2012-02-06 13:25 Stephane Eranian
  2012-02-06 14:26 ` Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Stephane Eranian @ 2012-02-06 13:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, mingo, robert.richter, eric.dumazet

Hi,

I am running 3.3.30-rc2 on an AMD Bulldozer system in 64-bit mode.
I was testing perf sched and I ran into an issue. That issue seems
to exist only on AMD and not on Intel systems. It is not PMU related
because I am doing tracing.

I am running a simple ping pong test to stress the context switch
code. Two processes exchanging a byte through a pipe (program provided
below).

Then, I capture a trace using perf sched, and I run sched lat, but
on certain runs, I get:

   $ perf sched rec pong 2
   $ perf sched lat
     Warning: Timestamp below last timeslice flush
     Warning: Timestamp below last timeslice flush
     Warning: Timestamp below last timeslice flush
     Warning: Timestamp below last timeslice flush
     Warning: Timestamp below last timeslice flush
     Warning: Timestamp below last timeslice flush
     Warning: Timestamp below last timeslice flush
     Warning: Timestamp below last timeslice flush
     Warning: TimesFound 4934 unknown events!
     Is this an older tool processing a perf.data file generated by a more recent tool?

I did some investigation in the kernel to figure out what
could cause this problem. I found out that with this measurement, it is
possible to get multiple consecutive samples with the SAME timestamp.
That is possible on AMD because, apparently, local_clock() ends up calling,
a sched_clock() function that is only based on jiffies. That means, the
granularity is a tick (1ms in my case). On Intel systems, at least on mines,
local_clock() ends up calling a routine which uses TSC and thus has a much
greater precision. This is all decided by the sched_clock_stable variable. It
is set at init time for Intel and not used on AMD. Yet, recent AMD systems do
have support for NONSTOP_TSC. It is not clear to me why that would not be enough
(unless it is a cross-socket synchronization issue).

One thing is clear, is that if I enable sched_clock_stable on my AMD system, then
the perf sched warning disappears.

I believe there is really a bug in perf. I am sure there are systems out there
which cannot provide a very fine grain timestamp, thus it is possible to get samples
with identical timestamps. I don't quite understand the sample re-ordering code in
perf but it seems sensitive to timer granularity. I think it should not.

In summary, two issues:
  - Why is sched_clock_stable not set or even tested on recent AMD systems?
  - perf should not rely on fine granularity timestamps in the re-ordering code

Any comments?

#include <sys/types.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <syscall.h>
#include <err.h>

#define MAX_CPUS	64
#define NR_CPU_BITS	(MAX_CPUS>>3)
int
pin_cpu(pid_t pid, unsigned int cpu)
{
	uint64_t my_mask[NR_CPU_BITS];

	if (cpu >= MAX_CPUS)
		errx(1, "this program supports only up to %d CPUs", MAX_CPUS);

	my_mask[cpu>>6] = 1ULL << (cpu&63);

	return syscall(__NR_sched_setaffinity, pid, sizeof(my_mask), &my_mask);
}

static volatile int quit;
void sig_handler(int n)
{
	quit = 1;
}

static void
do_child(int fr, int fw)
{
	char c;
	ssize_t ret;

	for(;;) {
		ret = read(fr, &c, 1);	
		if (ret < 0)
			break;
		ret = write(fw, "c", 1);
		if (ret < 0)
			break;
		
	}
	printf("child exited\n");
	exit(0);
}


int
main(int argc, char **argv)
{
	int ret;
	int pr[2], pw[2];
	int which_cpu;
	pid_t pid;
	uint64_t nctx = 0;
	ssize_t nbytes;
	char c = '0';
	int delay;

	delay = argc > 1 ? atoi(argv[1]) : 10;
	
	srandom(getpid());
	which_cpu = random() % sysconf(_SC_NPROCESSORS_ONLN);

	ret = pipe(pr);
	if (ret)
		err(1, "cannot create read pipe");

	ret = pipe(pw);
	if (ret)
		err(1, "cannot create write pipe");

	ret = pin_cpu(getpid(), which_cpu);
	if (ret)
		err(1, "cannot pin to CPU%d", which_cpu);

	printf("Both processes pinned to CPU%d, running for %ds\n", which_cpu, delay);

	switch(pid=fork()) {
		case -1:
			err(1, "cannot create child");
		case 0:
			close(pr[1]); close(pw[0]);
			do_child(pr[0], pw[1]);
			exit(1);
	}

	close(pr[0]);
	close(pw[1]);

	signal(SIGALRM, sig_handler);
	alarm(delay);

	while(!quit) {
		nbytes = write(pr[1], "c", 1);
		nbytes = read(pw[0], &c, 1);	
		nctx++;
	}
	close(pr[1]);
	close(pw[0]);
	return 0;
}

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2012-03-22 15:28 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-06 13:25 [BUG] perf: perf sched warning possibly due to clock granularity on AMD Stephane Eranian
2012-02-06 14:26 ` Peter Zijlstra
2012-02-06 14:31   ` Stephane Eranian
2012-02-06 17:17     ` Arnaldo Carvalho de Melo
2012-02-06 15:34   ` Borislav Petkov
2012-02-06 16:37     ` Peter Zijlstra
2012-02-06 16:46       ` Borislav Petkov
2012-02-06 16:54         ` Peter Zijlstra
2012-02-06 20:27           ` Borislav Petkov
2012-02-06 20:31             ` Peter Zijlstra
2012-02-06 20:37               ` Borislav Petkov
2012-02-06 21:19                 ` Venki Pallipadi
2012-02-07  7:51                   ` Peter Zijlstra
2012-02-07  8:32                   ` Ingo Molnar
2012-02-07  9:06                     ` Borislav Petkov
2012-02-07  9:50                       ` Ingo Molnar
2012-02-07 12:08                         ` [PATCH] x86, AMD: Set sched_clock_stable Borislav Petkov
2012-02-15 15:30                           ` Peter Zijlstra
2012-02-07 19:43 ` [tip:perf/core] x86/sched/perf/AMD: " tip-bot for Borislav Petkov
2012-02-08 15:07 ` [BUG] perf: perf sched warning possibly due to clock granularity on AMD Frederic Weisbecker
2012-02-08 15:10   ` Stephane Eranian
2012-02-08 15:22     ` Frederic Weisbecker
2012-02-08 15:23       ` Stephane Eranian
2012-02-18 16:50 ` [PATCH] perf tools: Fix ordering with unstable tsc Frederic Weisbecker
2012-02-22 15:35   ` Stephane Eranian
2012-02-22 15:39     ` David Ahern
2012-03-05 18:43   ` Frederic Weisbecker
2012-03-14 19:55   ` Arnaldo Carvalho de Melo
2012-03-14 20:07     ` David Ahern
2012-03-22  0:10     ` Frederic Weisbecker
2012-03-22 15:28       ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.