All of lore.kernel.org
 help / color / mirror / Atom feed
* Bizarre results from perf event API
@ 2015-02-20 23:26 Benjamin Moody
  2015-02-21  0:39 ` David Ahern
  2015-02-23 20:44 ` Andi Kleen
  0 siblings, 2 replies; 11+ messages in thread
From: Benjamin Moody @ 2015-02-20 23:26 UTC (permalink / raw)
  To: linux-perf-users

I'm trying to write a program that uses the Linux perf event API to
track the CPU usage of an arbitrary task.  I'm following the
documentation at <http://linux.die.net/man/2/perf_event_open>.

My program (highly simplified version shown below) seems to work most
of the time, and as far as I can tell, it should be measuring exactly
the same thing as what 'perf stat -e instructions:uD' measures.
However, when I try measuring the performance of a certain large
proprietary application, I get *completely* different results from my
program and from perf stat.  (The results from perf are somewhat
believable; mine are not.)

For example:

  perf stat -e instructions:uD $APP         => perf says   6157957531

  mystat $APP                               => mystat says  854145434

  mystat perf stat -e instructions:uD $APP  => perf says   6158468700
                                               mystat says  856605022

  perf stat -e instructions:uD mystat $APP  => mystat says  854139621
                                               perf says   6159312914

Especially the latter two results make no sense to me, as both perf
and my program are supposed to be counting the total number of
instructions from all descendant processes.

I have tried looking at the system calls with strace, I've tried
looking at them with gdb.  I cannot see any relevant difference
between what my program is doing, and what the perf tool does.  I am
at my wits' end here.  Can anyone see what might cause my program to
be counting fewer events than perf does?

(If there's a better place to ask this question, please let me know!)

Benjamin Moody

---- mystat.c ----

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>

static int perf_event_open(struct perf_event_attr *hw_event,
                           pid_t pid, int cpu, int group_fd,
                           unsigned long flags)
{
  int fd = syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
  return fd;
}

static int read_perf_counter(int fd, long long *value)
{
  long long v[3];

  errno = EIO;

  if (read(fd, v, sizeof(v)) != sizeof(v))
    return 0;

  if (v[0] <= 0 || v[1] <= 0 || v[2] <= 0)
    return 0;

  if (v[1] == v[2])
    *value = v[0];
  else
    *value = (double) v[0] * v[1] / v[2];
  return 1;
}

int main(int argc, char **argv)
{
  int event_fd;
  struct perf_event_attr pe;
  pid_t child;
  long long event_count;
  int status;

  memset(&pe, 0, sizeof(pe));
  pe.size = sizeof(pe);

  pe.type = PERF_TYPE_HARDWARE;
  pe.config = PERF_COUNT_HW_INSTRUCTIONS;

  pe.read_format = (PERF_FORMAT_TOTAL_TIME_ENABLED
                    | PERF_FORMAT_TOTAL_TIME_RUNNING);

  pe.pinned = 1;                /* always track */

  pe.exclude_kernel = 1;        /* only count userspace instructions */
  pe.exclude_hv = 1;            /* (not kernel or hypervisor) */

  pe.disabled = 1;              /* counter initially disabled */
  pe.inherit = 1;               /* track all descendant processes */
  pe.enable_on_exec = 1;        /* start counting when we exec the
                                   child process */

  event_fd = perf_event_open(&pe, getpid(), -1, -1, 0);
  if (event_fd < 0) {
    fprintf(stderr, "cannot track performance: %s\n", strerror(errno));
    return 126;
  }

  child = fork();
  if (child < 0) {
    fprintf(stderr, "cannot create process: %s\n", strerror(errno));
    return 126;
  }
  else if (child == 0) {
    execvp(argv[1], &argv[1]);
    fprintf(stderr, "unable to execute command: %s\n",
            strerror(errno));
    exit(127);
  }

  while (waitpid(child, &status, 0) != child) ;

  ioctl(event_fd, PERF_EVENT_IOC_DISABLE, 0);
  if (!read_perf_counter(event_fd, &event_count)) {
    fprintf(stderr, "unable to read performance data: %s\n",
            strerror(errno));
    close(event_fd);
    return 126;
  }

  close(event_fd);

  printf("Total instructions: %lld\n", event_count);
  return 0;
}

---- end of mystat.c ----

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-20 23:26 Bizarre results from perf event API Benjamin Moody
@ 2015-02-21  0:39 ` David Ahern
  2015-02-21 21:44   ` Benjamin Moody
  2015-02-23 20:44 ` Andi Kleen
  1 sibling, 1 reply; 11+ messages in thread
From: David Ahern @ 2015-02-21  0:39 UTC (permalink / raw)
  To: Benjamin Moody, linux-perf-users

On 2/20/15 4:26 PM, Benjamin Moody wrote:
> I'm trying to write a program that uses the Linux perf event API to
> track the CPU usage of an arbitrary task.  I'm following the
> documentation at <http://linux.die.net/man/2/perf_event_open>.
>
> My program (highly simplified version shown below) seems to work most
> of the time, and as far as I can tell, it should be measuring exactly
> the same thing as what 'perf stat -e instructions:uD' measures.
> However, when I try measuring the performance of a certain large
> proprietary application, I get *completely* different results from my
> program and from perf stat.  (The results from perf are somewhat
> believable; mine are not.)
>

Trying using a program with established results. e.g., This is from 
PeterZ years ago. On intel 'perf stat -e instructions a.out' should show 
~1 billion (a wee bit more, but clearly in the 1b range).

int main (void)
{
         int i;

         for (i = 0; i < 100000000; i++) {
                 asm("nop");
                 asm("nop");
                 asm("nop");
                 asm("nop");
                 asm("nop");
                 asm("nop");
                 asm("nop");
         }

         return 0;
}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-21  0:39 ` David Ahern
@ 2015-02-21 21:44   ` Benjamin Moody
  2015-02-21 22:59     ` Vince Weaver
  2015-02-24  1:21     ` David Ahern
  0 siblings, 2 replies; 11+ messages in thread
From: Benjamin Moody @ 2015-02-21 21:44 UTC (permalink / raw)
  To: David Ahern; +Cc: linux-perf-users

Hi,

> Trying using a program with established results. e.g., This is from
> PeterZ years ago. On intel 'perf stat -e instructions a.out' should show
> ~1 billion (a wee bit more, but clearly in the 1b range).

That program itself gave the expected results, but it's a good
suggestion.  When I tried changing that program to also fork a bunch
of child processes, I started getting similarly weird results to what
I had seen before.

Looking more closely at the behavior of perf, I think the essential
difference is that perf uses the *child* PID as argument to
perf_event_open.  When I tried changing my program to do the same, it
seems to have fixed the problem.  That is to say, where I used
something like

   fd = perf_event_open(..., getpid(), ...);
   child = fork();
   if (child == 0) {
       execvp(...);
   }

I needed to instead use

   pipe(pipefd);
   child = fork();
   if (child == 0) {
       read(pipefd[0], &x, 1);
       execvp(...);
   }
   fd = perf_event_open(..., child, ...);
   write(pipefd[1], &x, 1);

This seems like a bug, especially since it only seems to come up when
there are grandchild processes involved.  (And I can't help thinking
the first version is a lot more elegant!)  Does the kernel somehow get
confused because enable_on_exec is set and the original process hasn't
actually exec'ed anything?

The problem doesn't occur *every* time there are grandchildren - I
didn't see any problems when running some basic experiments with shell
scripts.  But here's a simple C program that does exhibit the problem:

int main()
{
  int i, j;
  for (i = 0; i < 50; i++) {
    if (!fork()) {
      for (i = 0; i < 100000000; i++) {
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
      }
      exit(0);
    }
  }
  for (i = 0; i < 50; i++)
    wait(&j);
  return 0;
}

At any rate, thanks for your advice!

Benjamin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-21 21:44   ` Benjamin Moody
@ 2015-02-21 22:59     ` Vince Weaver
  2015-02-22 20:12       ` Benjamin Moody
  2015-02-24  1:21     ` David Ahern
  1 sibling, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2015-02-21 22:59 UTC (permalink / raw)
  To: Benjamin Moody; +Cc: David Ahern, linux-perf-users

On Sat, 21 Feb 2015, Benjamin Moody wrote:


> there are grandchild processes involved.  (And I can't help thinking
> the first version is a lot more elegant!)  Does the kernel somehow get
> confused because enable_on_exec is set and the original process hasn't
> actually exec'ed anything?

Have you tried enabling the "inherit_stat" flag to see if that helps?

Vince

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-21 22:59     ` Vince Weaver
@ 2015-02-22 20:12       ` Benjamin Moody
  2015-02-22 20:37         ` Vince Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Moody @ 2015-02-22 20:12 UTC (permalink / raw)
  To: Vince Weaver; +Cc: David Ahern, linux-perf-users

On 2/21/15, Vince Weaver <vincent.weaver@maine.edu> wrote:
> On Sat, 21 Feb 2015, Benjamin Moody wrote:
>
>
>> there are grandchild processes involved.  (And I can't help thinking
>> the first version is a lot more elegant!)  Does the kernel somehow get
>> confused because enable_on_exec is set and the original process hasn't
>> actually exec'ed anything?
>
> Have you tried enabling the "inherit_stat" flag to see if that helps?

That does seem to help for the simple case I posted.  It doesn't work
in all cases, though.  I'll have to experiment a bit to find a simple
example.

My impression was that the inherit_stat bit shouldn't matter if we are
only interested in aggregate event counts, as opposed to counting
events per thread (found a ML thread about this a while ago but I
can't find it right now.)  Please correct me if I'm wrong.  In any
case, though, the perf tool doesn't use that bit.

Benjamin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-22 20:12       ` Benjamin Moody
@ 2015-02-22 20:37         ` Vince Weaver
  2015-02-24  0:46           ` Benjamin Moody
  0 siblings, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2015-02-22 20:37 UTC (permalink / raw)
  To: Benjamin Moody; +Cc: Vince Weaver, David Ahern, linux-perf-users

On Sun, 22 Feb 2015, Benjamin Moody wrote:

> On 2/21/15, Vince Weaver <vincent.weaver@maine.edu> wrote:
> > On Sat, 21 Feb 2015, Benjamin Moody wrote:
> >
> >
> >> there are grandchild processes involved.  (And I can't help thinking
> >> the first version is a lot more elegant!)  Does the kernel somehow get
> >> confused because enable_on_exec is set and the original process hasn't
> >> actually exec'ed anything?
> >
> > Have you tried enabling the "inherit_stat" flag to see if that helps?
> 
> That does seem to help for the simple case I posted.  It doesn't work
> in all cases, though.  I'll have to experiment a bit to find a simple
> example.
> 
> My impression was that the inherit_stat bit shouldn't matter if we are
> only interested in aggregate event counts, as opposed to counting
> events per thread (found a ML thread about this a while ago but I
> can't find it right now.)  Please correct me if I'm wrong.  In any
> case, though, the perf tool doesn't use that bit.

If you can come up with a coherent explanation of what the inherit_stat 
bit does I'll be glad to add it to the manpage.

From what I vaguely understand it has to do with when a read() happens 
whether you bother to iterate all children and get updated event counts 
from them rather than just using the value from the last time they context 
switched.  I probably have that all mixed up though.

It's true perf doesn't use that bit, but I thought we already established 
that your code doesn't start a process the same way that perf does.

I am interesting in seeing if we can get to the bottom of this issue, as 
I've had users report issues when trying to measure OpenMP programs, 
though I'm not sure if they're seeing the exact same problem.

Vince

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-20 23:26 Bizarre results from perf event API Benjamin Moody
  2015-02-21  0:39 ` David Ahern
@ 2015-02-23 20:44 ` Andi Kleen
  2015-02-23 23:13   ` Benjamin Moody
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2015-02-23 20:44 UTC (permalink / raw)
  To: Benjamin Moody; +Cc: linux-perf-users

Benjamin Moody <benjamin.moody@gmail.com> writes:
>
>   pe.pinned = 1;                /* always track */

You definitely don't want pinned

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-23 20:44 ` Andi Kleen
@ 2015-02-23 23:13   ` Benjamin Moody
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Moody @ 2015-02-23 23:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-perf-users

On 2/23/15, Andi Kleen <andi@firstfloor.org> wrote:
> Benjamin Moody <benjamin.moody@gmail.com> writes:
>>
>>   pe.pinned = 1;                /* always track */
>
> You definitely don't want pinned

I'm fairly confident that I do, actually, but why do you say that?

In any case, these problems occur regardless of pinned-ness.

Benjamin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-22 20:37         ` Vince Weaver
@ 2015-02-24  0:46           ` Benjamin Moody
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Moody @ 2015-02-24  0:46 UTC (permalink / raw)
  To: Vince Weaver; +Cc: David Ahern, linux-perf-users

> If you can come up with a coherent explanation of what the inherit_stat
> bit does I'll be glad to add it to the manpage.
>
> From what I vaguely understand it has to do with when a read() happens
> whether you bother to iterate all children and get updated event counts
> from them rather than just using the value from the last time they context
> switched.  I probably have that all mixed up though.

Sorry, I can't make sense of it either.

This must be the thread I was thinking of (I see you're the original author):
http://www.mail-archive.com/linux-perf-users@vger.kernel.org/msg01014.html

> It's true perf doesn't use that bit, but I thought we already established
> that your code doesn't start a process the same way that perf does.

True.  But if inherit_stat is necessary in order to get accurate
results in some circumstances, then perhaps perf should be using it as
well.


I'm starting to see where the problem might be with my original
understanding of the event counter system.  I was thinking of an event
counter as a thing that could be turned on and off on a per-process
basis, so it made sense to think that 'enable_on_exec' would enable
the counter for only the processes that were descended from the one
that did the exec.  But that's not really the right model, is it?
PERF_EVENT_IOC_ENABLE and PERF_EVENT_IOC_DISABLE seem to apply to the
entire set of processes that have inherited the counter.  So if
"enabled" and "disabled" are global states associated with the
counter, rather than with individual processes, then it's silly to
think of using enable_on_exec to limit by process.  The fact that it
seems to work in many cases, then, may be an accident.

Or I could be talking complete nonsense here.

Benjamin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-21 21:44   ` Benjamin Moody
  2015-02-21 22:59     ` Vince Weaver
@ 2015-02-24  1:21     ` David Ahern
  2015-02-27 21:41       ` Benjamin Moody
  1 sibling, 1 reply; 11+ messages in thread
From: David Ahern @ 2015-02-24  1:21 UTC (permalink / raw)
  To: Benjamin Moody; +Cc: linux-perf-users

On 2/21/15 2:44 PM, Benjamin Moody wrote:
> Hi,
>
>> Trying using a program with established results. e.g., This is from
>> PeterZ years ago. On intel 'perf stat -e instructions a.out' should show
>> ~1 billion (a wee bit more, but clearly in the 1b range).
>
> That program itself gave the expected results, but it's a good
> suggestion.  When I tried changing that program to also fork a bunch
> of child processes, I started getting similarly weird results to what
> I had seen before.

This is with the 1billion instructions sent earlier:

$ ./myperfstat ./1bi
Total instructions: 1000087491

$ perf stat -e instructions:u ./1bi

  Performance counter stats for './1bi':

      1,000,087,608      instructions:u

        0.285943713 seconds time elapsed

Using the second version of this program:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

main ()
{
         int i;

         fork();
         fork();

         for (i = 0; i < 100000000; i++) {
                 asm("nop");
                 asm("nop");
                 asm("nop");
                 asm("nop");
                 asm("nop");
                 asm("nop");
                 asm("nop");
         }
         wait(NULL);
         wait(NULL);
         wait(NULL);
         wait(NULL);
}

This one spawn child processes so you have
     perf -> child -> gchild -> ggchild
                |---> gchild

So it has the grandparent characteristics you are getting at.

$ ./myperfstat ./1bi4
Total instructions: 4000094342

$ perf stat -e instructions:u ./1bi4

  Performance counter stats for './1bi4':

      4,000,094,649      instructions:u

        0.259537842 seconds time elapsed

Still seems to match up.

This is on:
$ uname -r
3.18.7-200.fc21.x86_64

>
> Looking more closely at the behavior of perf, I think the essential
> difference is that perf uses the *child* PID as argument to
> perf_event_open.  When I tried changing my program to do the same, it
> seems to have fixed the problem.  That is to say, where I used
> something like

And I did not modify the mystat.c program to do this:

>
>     fd = perf_event_open(..., getpid(), ...);
>     child = fork();
>     if (child == 0) {
>         execvp(...);
>     }
>
> I needed to instead use
>
>     pipe(pipefd);
>     child = fork();
>     if (child == 0) {
>         read(pipefd[0], &x, 1);
>         execvp(...);
>     }
>     fd = perf_event_open(..., child, ...);
>     write(pipefd[1], &x, 1);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bizarre results from perf event API
  2015-02-24  1:21     ` David Ahern
@ 2015-02-27 21:41       ` Benjamin Moody
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Moody @ 2015-02-27 21:41 UTC (permalink / raw)
  To: David Ahern; +Cc: linux-perf-users

Using this program, with the original 'mystat' program:

> #include <stdlib.h>
> #include <stdio.h>
> #include <time.h>
>
> main ()
> {
>          int i;
>
>          fork();
>          fork();
>
>          for (i = 0; i < 100000000; i++) {
>                  asm("nop");
>                  asm("nop");
>                  asm("nop");
>                  asm("nop");
>                  asm("nop");
>                  asm("nop");
>                  asm("nop");
>          }
>          wait(NULL);
>          wait(NULL);
>          wait(NULL);
>          wait(NULL);
> }

I consistently get a result of slightly over 1 billion instructions
(whereas perf stat gives the correct result of around 4 billion.)  I
get the same results on several systems, including:

  2.6.32-504.8.1.el6.x86_64 on a single-core Athlon
  2.6.32-431.17.1.el6.x86_64 on an 8-core Xeon
  3.10.0-123.13.2.el7.x86_64 on an 8-core Opteron
  3.16.0-0.bpo.4-amd64 on a 2-core Athlon

Will try to test with a newer kernel when I have a chance.

Benjamin

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-02-27 21:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-20 23:26 Bizarre results from perf event API Benjamin Moody
2015-02-21  0:39 ` David Ahern
2015-02-21 21:44   ` Benjamin Moody
2015-02-21 22:59     ` Vince Weaver
2015-02-22 20:12       ` Benjamin Moody
2015-02-22 20:37         ` Vince Weaver
2015-02-24  0:46           ` Benjamin Moody
2015-02-24  1:21     ` David Ahern
2015-02-27 21:41       ` Benjamin Moody
2015-02-23 20:44 ` Andi Kleen
2015-02-23 23:13   ` Benjamin Moody

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.