From: Kyle Huey <me@kylehuey.com>
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xen.org, Jan Beulich <JBeulich@suse.com>,
	Robert O'Callahan <robert@ocallahan.org>
Subject: Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
Date: Mon, 24 Oct 2016 12:22:05 -0700
Message-ID: <CAP045Aox9-wJzoT=sRVWJe9KzraHDa8OBEbLWLV-sMsy-K6qAA@mail.gmail.com>
In-Reply-To: <01df3215-3f0f-aaf9-98c9-301d2aebd0b8@oracle.com>

On Mon, Oct 24, 2016 at 8:05 AM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
> On 10/24/2016 12:18 AM, Kyle Huey wrote:
>>
>> The anomalies we see appear to be related to, or at least triggerable
>> by, the performance monitoring interrupt.  The following program runs
>> a loop of roughly 2^25 conditional branches.  It takes one argument,
>> the number of conditional branches to program the PMI to trigger on.
>> The default is 50,000, and if you run the program with that it'll
>> produce the same value every time.  If you drop it to 5000 or so
>> you'll probably see occasional off-by-one discrepancies.  If you drop
>> it to 500 the performance counter values fluctuate wildly.
>
> Yes, it does change, but I also see the difference on bare metal
> (although not as big as it is in an HVM guest):
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 5950003 conditional branches
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 5850003 conditional branches
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 7530107 conditional branches
> ostr@workbase>

Yeah, you're right.  I simplified the testcase too far.  I have
included a better one.  This testcase is stable on bare metal (down to
an interrupt every 10 branches; I didn't try below that) and more
accurately represents what our software actually does.  rr acts as a
ptrace supervisor to the process being recorded, and it seems that
context switching between the supervisor and tracee processes somehow
stabilizes the performance counter values.

>> I'm not yet sure if this is specifically related to the PMI, or if it
>> can be caused by any interrupt and it's only how frequently the
>> interrupts occur that matters.
>
> I have never used the file interface to performance counters, but what
> are we reporting here (in read_counter()) --- the total number of
> events or the number of events since the last sample? It is also
> curious to me that the counter is non-zero after PERF_EVENT_IOC_RESET
> (but again, I don't have any experience with these interfaces).

It should be the number of events since the last time the counter was
reset (or overflowed, I guess).  On my machine the counter value is
zero both before and after the PERF_EVENT_IOC_RESET ioctl.
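
For what it's worth, here is the check I ran (just a sketch, reusing
fd and read_counter() from the program below):

counter_off();
printf("before reset: %" PRId64 "\n", read_counter());
ioctl(fd, PERF_EVENT_IOC_RESET, 0);   /* zero the count */
printf("after reset:  %" PRId64 "\n", read_counter());  /* 0 here too */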

> Also, exclude_guest doesn't appear to make any difference; I don't know
> if there are any bits in Intel counters that allow you to distinguish
> guest from host (unlike AMD, where there is a bit for that).

exclude_guest is a Linux-specific thing for excluding KVM guests.
There is no hardware support involved; it's handled entirely in the
perf events infrastructure in the kernel.
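
There's nothing magic about it in the program below either; it's just a
bit set in the perf_event_attr:

rcb_attr.exclude_guest = 1;  /* software-only filter in the kernel's
                                perf core; no PMU hardware bit involved */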

- Kyle

#define _GNU_SOURCE 1

#include <assert.h>
#include <fcntl.h>
#include <inttypes.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static struct perf_event_attr rcb_attr;
static uint64_t period;
static int fd;

void counter_on(uint64_t ticks)
{
  /* Zero the count, set the overflow period, and enable the counter */
  int ret = ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  assert(!ret);
  ret = ioctl(fd, PERF_EVENT_IOC_PERIOD, &ticks);
  assert(!ret);
  ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  assert(!ret);
}

void counter_off(void)
{
  int ret = ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
  assert(!ret);
}

int64_t read_counter(void)
{
  /* A read() on a perf event fd returns the current event count */
  int64_t val;
  ssize_t nread = read(fd, &val, sizeof(val));
  assert(nread == sizeof(val));
  return val;
}

void do_test(void)
{
  /* ~2^25 iterations of a conditional-branch-heavy loop; volatile
     keeps the compiler from optimizing the loop away */
  int i;
  volatile int dummy = 0;

  for (i = 0; i < (1 << 25); i++) {
    dummy += i % (1 << 10);
    dummy += i % (79 * (1 << 10));
  }
}

int main(int argc, const char* argv[])
{
  int pid;
  memset(&rcb_attr, 0, sizeof(rcb_attr));
  rcb_attr.size = sizeof(rcb_attr);
  rcb_attr.type = PERF_TYPE_RAW;
  /* Raw event 0x5101c4: event select 0xC4, umask 0x01, i.e. Intel
     BR_INST_RETIRED.CONDITIONAL -- retired conditional branches,
     counted in ring 3 only */
  rcb_attr.config = 0x5101c4;
  rcb_attr.exclude_kernel = 1;
  rcb_attr.exclude_guest = 1;
  /* We'll change this later */
  rcb_attr.sample_period = 0xffffffff;

  signal(SIGALRM, SIG_IGN);
  pid = fork();
  if (pid == 0) {
    /* Wait for the parent */
    kill(getpid(), SIGSTOP);
    do_test();
    return 0;
  }

  /* Open the counter, attached to the child process */
  fd = syscall(__NR_perf_event_open, &rcb_attr, pid, -1, -1, 0);
  if (fd < 0) {
    printf("Failed to initialize counter\n");
    return -1;
  }

  counter_off();

  /* Deliver counter-overflow signals (as SIGALRM) to the child */
  struct f_owner_ex own;
  own.type = F_OWNER_PID;
  own.pid = pid;
  if (fcntl(fd, F_SETOWN_EX, &own) ||
      fcntl(fd, F_SETFL, O_ASYNC) ||
      fcntl(fd, F_SETSIG, SIGALRM)) {
    printf("Failed to make counter async\n");
    return -1;
  }

  period = 50000;
  if (argc > 1) {
    sscanf(argv[1], "%" SCNu64, &period);
  }

  printf("Period is %" PRIu64 "\n", period);

  counter_on(period);
  ptrace(PTRACE_SEIZE, pid, NULL, 0);
  ptrace(PTRACE_CONT, pid, NULL, SIGCONT);

  int status = 0;
  while (1) {
    waitpid(pid, &status, 0);
    if (WIFEXITED(status)) {
      break;
    }
    if (WIFSIGNALED(status)) {
      assert(0);
    }
    if (WIFSTOPPED(status)) {
      /* Forward the PMI signal (SIGALRM) and the initial SIGSTOP */
      if (WSTOPSIG(status) == SIGALRM ||
          WSTOPSIG(status) == SIGSTOP) {
        ptrace(PTRACE_CONT, pid, NULL, WSTOPSIG(status));
        continue;
      }
    }
    assert(0 && "unhandled ptrace event!");
  }

  counter_off();
  int64_t counts = read_counter();
  printf("Counted %" PRId64 " conditional branches\n", counts);

  return 0;
}
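
(To reproduce: this should build with a plain gcc pmu.c -o pmu, and it
takes the PMI period as its only argument, e.g. ./pmu 500.)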
