linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf with precise attribute kills all KVM based VMs
@ 2012-07-09 14:12 David Ahern
  2012-07-09 14:19 ` Gleb Natapov
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-09 14:12 UTC (permalink / raw)
  To: Peter Zijlstra, Avi Kivity, Gleb Natapov, LKML

This is 100% reproducible with Fedora 17, 3.4.2-1.fc16.x86_64 kernel.

Using the precise attribute (:p or :pp) with perf-record, eg,

perf record -e cycles:p -ag -- sleep 10

All running VMs are killed. The VMs appear to be restarted but crash on 
restart.

 From one of the VMs that has the console redirected to ttyS0 and ttyS0 
of the VM mapped to stdio I was able to capture some boot logs on the 
restart of the VM:

[    0.019136] BUG: unable to handle kernel paging request at 13832f88
[    0.020252] IP: [<c0104ed0>] mcount+0x0/0xc
[    0.021000] *pdpt = 0000000000572001 *pde = 0000000000000000
[    0.021942] Oops: 0000 [#1] SMP
[    0.021996] __die: Die type Oops
[    0.021996] Modules linked in:
[    0.021996]
[    0.021996] Pid: 0, comm: swapper Not tainted (2.6.27.47 #0)
[    0.021996] EIP: 0060:[<c0104ed0>] EFLAGS: 00010296 CPU: 0
[    0.021996] EIP is at mcount+0x0/0xc
[    0.021996] EAX: f78026c0 EBX: 00000000 ECX: f78026c0 EDX: 000080d0
[    0.021996] ESI: f782c228 EDI: f780b240 EBP: c051fef0 ESP: c051fed4
[    0.021996]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    0.021996] Process swapper (pid: 0, ti=c051e000 task=c04d0380 
task.ti=c051e000)
[    0.021996] Stack: c0193339 c0267510 c051feec c026745f 00000000 
f782c228 f780b240 c051fefc
[    0.021996]        c026c6b3 f780b240 c051ff14 c0267690 00000000 
00000000 f782c200 f780b2a0
[    0.021996]        c051ff4c c026720e 00000001 f88055af f782c228 
00000000 f782c200 00000000



David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:12 perf with precise attribute kills all KVM based VMs David Ahern
@ 2012-07-09 14:19 ` Gleb Natapov
  2012-07-09 14:24   ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Gleb Natapov @ 2012-07-09 14:19 UTC (permalink / raw)
  To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML

On Mon, Jul 09, 2012 at 08:12:40AM -0600, David Ahern wrote:
> This is 100% reproducible with Fedora 17, 3.4.2-1.fc16.x86_64 kernel.
> 
> Using the precise attribute (:p or :pp) with perf-record, eg,
> 
> perf record -e cycles:p -ag -- sleep 10
> 
> All running VMs are killed. The VMs appear to be restarted but crash
> on restart.
> 
> From one of the VMs that has the console redirected to ttyS0 and
> ttyS0 of the VM mapped to stdio I was able to capture some boot logs
> on the restart of the VM:
> 
Yes, this is knows problem that I can't find time to fix. The crash is
cause by CPU using host PEBS virtual address while guest is running
which causes guest memory corruption. We should disable evens that use
PEBS at the guest entry.


> [    0.019136] BUG: unable to handle kernel paging request at 13832f88
> [    0.020252] IP: [<c0104ed0>] mcount+0x0/0xc
> [    0.021000] *pdpt = 0000000000572001 *pde = 0000000000000000
> [    0.021942] Oops: 0000 [#1] SMP
> [    0.021996] __die: Die type Oops
> [    0.021996] Modules linked in:
> [    0.021996]
> [    0.021996] Pid: 0, comm: swapper Not tainted (2.6.27.47 #0)
> [    0.021996] EIP: 0060:[<c0104ed0>] EFLAGS: 00010296 CPU: 0
> [    0.021996] EIP is at mcount+0x0/0xc
> [    0.021996] EAX: f78026c0 EBX: 00000000 ECX: f78026c0 EDX: 000080d0
> [    0.021996] ESI: f782c228 EDI: f780b240 EBP: c051fef0 ESP: c051fed4
> [    0.021996]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [    0.021996] Process swapper (pid: 0, ti=c051e000 task=c04d0380
> task.ti=c051e000)
> [    0.021996] Stack: c0193339 c0267510 c051feec c026745f 00000000
> f782c228 f780b240 c051fefc
> [    0.021996]        c026c6b3 f780b240 c051ff14 c0267690 00000000
> 00000000 f782c200 f780b2a0
> [    0.021996]        c051ff4c c026720e 00000001 f88055af f782c228
> 00000000 f782c200 00000000
> 
> 
> 
> David

--
			Gleb.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:19 ` Gleb Natapov
@ 2012-07-09 14:24   ` Peter Zijlstra
  2012-07-09 14:36     ` Gleb Natapov
  2012-07-09 14:39     ` Avi Kivity
  0 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2012-07-09 14:24 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: David Ahern, Avi Kivity, LKML

On Mon, 2012-07-09 at 17:19 +0300, Gleb Natapov wrote:
> Yes, this is knows problem that I can't find time to fix. The crash is
> cause by CPU using host PEBS virtual address while guest is running
> which causes guest memory corruption. We should disable evens that use
> PEBS at the guest entry. 

Whoops.. so the hardware is reading the DS address as programmed by the
host (host linear address) and using it as a guest linear address?

Quality stuff..

Disabling PEBS events for guests isn't pretty though.. but I guess the
only alternative is mapping the DS into the guest and reprogramming
MSR_IA32_DS_AREA which is all a little involved I suppose?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:24   ` Peter Zijlstra
@ 2012-07-09 14:36     ` Gleb Natapov
  2012-07-09 14:39     ` Avi Kivity
  1 sibling, 0 replies; 35+ messages in thread
From: Gleb Natapov @ 2012-07-09 14:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: David Ahern, Avi Kivity, LKML

On Mon, Jul 09, 2012 at 04:24:04PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-07-09 at 17:19 +0300, Gleb Natapov wrote:
> > Yes, this is knows problem that I can't find time to fix. The crash is
> > cause by CPU using host PEBS virtual address while guest is running
> > which causes guest memory corruption. We should disable evens that use
> > PEBS at the guest entry. 
> 
> Whoops.. so the hardware is reading the DS address as programmed by the
> host (host linear address) and using it as a guest linear address?
> 
Yes.

> Quality stuff..
DS area and vitalization are definitely not friends.

> 
> Disabling PEBS events for guests isn't pretty though.. but I guess the
> only alternative is mapping the DS into the guest and reprogramming
> MSR_IA32_DS_AREA which is all a little involved I suppose?
This needs guest cooperation which is possible with PV guest of course,
but there is a security issue here. Intel SDM requires that DS have to
be mapped by page table at all time and marked dirty/accessed otherwise
unspecified bad things can happen. Nothing prevents guest from unmapping
DS and cause host troubles.

--
			Gleb.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:24   ` Peter Zijlstra
  2012-07-09 14:36     ` Gleb Natapov
@ 2012-07-09 14:39     ` Avi Kivity
  2012-07-09 14:47       ` David Ahern
  2012-07-09 14:47       ` Peter Zijlstra
  1 sibling, 2 replies; 35+ messages in thread
From: Avi Kivity @ 2012-07-09 14:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Gleb Natapov, David Ahern, LKML

On 07/09/2012 05:24 PM, Peter Zijlstra wrote:
> On Mon, 2012-07-09 at 17:19 +0300, Gleb Natapov wrote:
>> Yes, this is knows problem that I can't find time to fix. The crash is
>> cause by CPU using host PEBS virtual address while guest is running
>> which causes guest memory corruption. We should disable evens that use
>> PEBS at the guest entry. 
> 
> Whoops.. so the hardware is reading the DS address as programmed by the
> host (host linear address) and using it as a guest linear address?
> 
> Quality stuff..
> 
> Disabling PEBS events for guests isn't pretty though.. 

We already have atomic MSR switching at guest entry/exit time.  So it's
not pretty in terms of not getting full profiling, but the code won't be
too hard.  Basically we just have to exclude_guest any pebs event.

> but I guess the
> only alternative is mapping the DS into the guest and reprogramming
> MSR_IA32_DS_AREA which is all a little involved I suppose?

Way too involved, especially as it's virtual addresses and we don't
control the guest cr3.  Note that the hardware won't fail gracefully.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:39     ` Avi Kivity
@ 2012-07-09 14:47       ` David Ahern
  2012-07-09 14:49         ` Peter Zijlstra
  2012-07-09 14:47       ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-09 14:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Peter Zijlstra, Gleb Natapov, LKML

On 7/9/12 8:39 AM, Avi Kivity wrote:
> On 07/09/2012 05:24 PM, Peter Zijlstra wrote:
>> On Mon, 2012-07-09 at 17:19 +0300, Gleb Natapov wrote:
>>> Yes, this is knows problem that I can't find time to fix. The crash is
>>> cause by CPU using host PEBS virtual address while guest is running
>>> which causes guest memory corruption. We should disable evens that use
>>> PEBS at the guest entry.
>>
>> Whoops.. so the hardware is reading the DS address as programmed by the
>> host (host linear address) and using it as a guest linear address?
>>
>> Quality stuff..
>>
>> Disabling PEBS events for guests isn't pretty though..
>
> We already have atomic MSR switching at guest entry/exit time.  So it's
> not pretty in terms of not getting full profiling, but the code won't be
> too hard.  Basically we just have to exclude_guest any pebs event.

I found this testing changes to perf-kvm, but found the problem extends 
to just perf-record. With perf-record exclude_guest defaults to 1. See 
tools/perf/util/util.c, event_attr_init().

>
>> but I guess the
>> only alternative is mapping the DS into the guest and reprogramming
>> MSR_IA32_DS_AREA which is all a little involved I suppose?
>
> Way too involved, especially as it's virtual addresses and we don't
> control the guest cr3.  Note that the hardware won't fail gracefully.
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:39     ` Avi Kivity
  2012-07-09 14:47       ` David Ahern
@ 2012-07-09 14:47       ` Peter Zijlstra
  2012-07-20 23:35         ` David Ahern
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2012-07-09 14:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, David Ahern, LKML

On Mon, 2012-07-09 at 17:39 +0300, Avi Kivity wrote:
> > Disabling PEBS events for guests isn't pretty though.. 
> 
> We already have atomic MSR switching at guest entry/exit time.  So it's
> not pretty in terms of not getting full profiling, but the code won't be
> too hard.  Basically we just have to exclude_guest any pebs event. 

OK, so ideally we'd do something like the below, except of course that
that isn't backwards compatible and will break the world :/

bugger that

---
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -336,6 +338,9 @@ int x86_setup_perfctr(struct perf_event *event)
                /* BTS is currently only allowed for user-mode. */
                if (!attr->exclude_kernel)
                        return -EOPNOTSUPP;
+
+               if (!attr->exclude_guest)
+                       return -EOPNOTSUPP;
        }
 
        hwc->config |= config;
@@ -378,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event)
        if (event->attr.precise_ip) {
                int precise = 0;
 
+               if (!event->attr.exclude_guest)
+                       return -EOPNOTSUPP;
+
                /* Support for constant skid */
                if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
                        precise++;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:47       ` David Ahern
@ 2012-07-09 14:49         ` Peter Zijlstra
  2012-07-09 14:51           ` Avi Kivity
  2012-07-09 14:52           ` David Ahern
  0 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2012-07-09 14:49 UTC (permalink / raw)
  To: David Ahern; +Cc: Avi Kivity, Gleb Natapov, LKML

On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
> 
> I found this testing changes to perf-kvm, but found the problem extends 
> to just perf-record. With perf-record exclude_guest defaults to 1. See 
> tools/perf/util/util.c, event_attr_init(). 

You lost me there.. so perf-record defaults to exclude_guest=1 (which
would make the proposed patch I just send actually possible), but its
still going *bang* ?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:49         ` Peter Zijlstra
@ 2012-07-09 14:51           ` Avi Kivity
  2012-07-09 14:54             ` Gleb Natapov
  2012-07-09 14:59             ` Peter Zijlstra
  2012-07-09 14:52           ` David Ahern
  1 sibling, 2 replies; 35+ messages in thread
From: Avi Kivity @ 2012-07-09 14:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: David Ahern, Gleb Natapov, LKML

On 07/09/2012 05:49 PM, Peter Zijlstra wrote:
> On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
>> 
>> I found this testing changes to perf-kvm, but found the problem extends 
>> to just perf-record. With perf-record exclude_guest defaults to 1. See 
>> tools/perf/util/util.c, event_attr_init(). 
> 
> You lost me there.. so perf-record defaults to exclude_guest=1 (which
> would make the proposed patch I just send actually possible), but its
> still going *bang* ?

It's possible that the DS writes overshoot the MSR... or that there's a
bug somewhere.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:49         ` Peter Zijlstra
  2012-07-09 14:51           ` Avi Kivity
@ 2012-07-09 14:52           ` David Ahern
  2012-07-09 14:58             ` David Ahern
  1 sibling, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-09 14:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML

On 7/9/12 8:49 AM, Peter Zijlstra wrote:
> On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
>>
>> I found this testing changes to perf-kvm, but found the problem extends
>> to just perf-record. With perf-record exclude_guest defaults to 1. See
>> tools/perf/util/util.c, event_attr_init().
>
> You lost me there.. so perf-record defaults to exclude_guest=1 (which
> would make the proposed patch I just send actually possible), but its
> still going *bang* ?
>


I can do the formality of testing your proposed patch, but yes I think 
so. exclude_guest defaults to 1 because perf_guest defaults to false.

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:51           ` Avi Kivity
@ 2012-07-09 14:54             ` Gleb Natapov
  2012-07-09 14:57               ` Gleb Natapov
  2012-07-09 14:59             ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Gleb Natapov @ 2012-07-09 14:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Peter Zijlstra, David Ahern, LKML

On Mon, Jul 09, 2012 at 05:51:29PM +0300, Avi Kivity wrote:
> On 07/09/2012 05:49 PM, Peter Zijlstra wrote:
> > On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
> >> 
> >> I found this testing changes to perf-kvm, but found the problem extends 
> >> to just perf-record. With perf-record exclude_guest defaults to 1. See 
> >> tools/perf/util/util.c, event_attr_init(). 
> > 
> > You lost me there.. so perf-record defaults to exclude_guest=1 (which
> > would make the proposed patch I just send actually possible), but its
> > still going *bang* ?
> 
> It's possible that the DS writes overshoot the MSR... or that there's a
> bug somewhere.
> 
I hope it is the later, because we can do nothing with the former.

--
			Gleb.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:54             ` Gleb Natapov
@ 2012-07-09 14:57               ` Gleb Natapov
  0 siblings, 0 replies; 35+ messages in thread
From: Gleb Natapov @ 2012-07-09 14:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Peter Zijlstra, David Ahern, LKML

On Mon, Jul 09, 2012 at 05:54:42PM +0300, Gleb Natapov wrote:
> On Mon, Jul 09, 2012 at 05:51:29PM +0300, Avi Kivity wrote:
> > On 07/09/2012 05:49 PM, Peter Zijlstra wrote:
> > > On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
> > >> 
> > >> I found this testing changes to perf-kvm, but found the problem extends 
> > >> to just perf-record. With perf-record exclude_guest defaults to 1. See 
> > >> tools/perf/util/util.c, event_attr_init(). 
> > > 
> > > You lost me there.. so perf-record defaults to exclude_guest=1 (which
> > > would make the proposed patch I just send actually possible), but its
> > > still going *bang* ?
> > 
> > It's possible that the DS writes overshoot the MSR... or that there's a
> > bug somewhere.
> > 
> I hope it is the later, because we can do nothing with the former.
> 
Actually we can disable DS before guest entry.

--
			Gleb.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:52           ` David Ahern
@ 2012-07-09 14:58             ` David Ahern
  2012-07-09 15:18               ` David Ahern
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-09 14:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML

On 7/9/12 8:52 AM, David Ahern wrote:
> On 7/9/12 8:49 AM, Peter Zijlstra wrote:
>> On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
>>>
>>> I found this testing changes to perf-kvm, but found the problem extends
>>> to just perf-record. With perf-record exclude_guest defaults to 1. See
>>> tools/perf/util/util.c, event_attr_init().
>>
>> You lost me there.. so perf-record defaults to exclude_guest=1 (which
>> would make the proposed patch I just send actually possible), but its
>> still going *bang* ?
>>
>
>
> I can do the formality of testing your proposed patch, but yes I think
> so. exclude_guest defaults to 1 because perf_guest defaults to false.


Let me dig for a few minutes. exclude_user is getting reset somewhere. 
Setting a breakpoint on sys_perf_event_open and dumping the attribute:

$1 = {type = 0, size = 80, config = 0, {sample_period = 4000, 
sample_freq = 4000}, sample_type = 263, read_format = 7,
   disabled = 1, inherit = 1, pinned = 0, exclusive = 0, exclude_user = 
0, exclude_kernel = 0, exclude_hv = 0, exclude_idle =
     0, mmap = 1, comm = 1, freq = 1, inherit_stat = 0, enable_on_exec = 
1, task = 0, watermark = 0, precise_ip = 1,
   mmap_data = 0, sample_id_all = 1, exclude_host = 0, exclude_guest = 
0, __reserved_1 = 0, {wakeup_events = 0,
     wakeup_watermark = 0}, bp_type = 0, {bp_addr = 0, config1 = 0}, 
{bp_len = 0, config2 = 0}, branch_sample_type = 0}



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:51           ` Avi Kivity
  2012-07-09 14:54             ` Gleb Natapov
@ 2012-07-09 14:59             ` Peter Zijlstra
  2012-07-10 23:38               ` David Ahern
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2012-07-09 14:59 UTC (permalink / raw)
  To: Avi Kivity; +Cc: David Ahern, Gleb Natapov, LKML

On Mon, 2012-07-09 at 17:51 +0300, Avi Kivity wrote:
> On 07/09/2012 05:49 PM, Peter Zijlstra wrote:
> > On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
> >> 
> >> I found this testing changes to perf-kvm, but found the problem extends 
> >> to just perf-record. With perf-record exclude_guest defaults to 1. See 
> >> tools/perf/util/util.c, event_attr_init(). 
> > 
> > You lost me there.. so perf-record defaults to exclude_guest=1 (which
> > would make the proposed patch I just send actually possible), but its
> > still going *bang* ?
> 
> It's possible that the DS writes overshoot the MSR... or that there's a
> bug somewhere.

OK, so the first thing to stare at is in what order the MSRs are
touched, I think you first need to stop the counter, then clear the PEBS
bits, then clear the DS_AREA one.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:58             ` David Ahern
@ 2012-07-09 15:18               ` David Ahern
  0 siblings, 0 replies; 35+ messages in thread
From: David Ahern @ 2012-07-09 15:18 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML

On 7/9/12 8:58 AM, David Ahern wrote:
> On 7/9/12 8:52 AM, David Ahern wrote:
>> On 7/9/12 8:49 AM, Peter Zijlstra wrote:
>>> On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
>>>>
>>>> I found this testing changes to perf-kvm, but found the problem extends
>>>> to just perf-record. With perf-record exclude_guest defaults to 1. See
>>>> tools/perf/util/util.c, event_attr_init().
>>>
>>> You lost me there.. so perf-record defaults to exclude_guest=1 (which
>>> would make the proposed patch I just send actually possible), but its
>>> still going *bang* ?
>>>
>>
>>
>> I can do the formality of testing your proposed patch, but yes I think
>> so. exclude_guest defaults to 1 because perf_guest defaults to false.
>
>
> Let me dig for a few minutes. exclude_user is getting reset somewhere.
> Setting a breakpoint on sys_perf_event_open and dumping the attribute:
>
> $1 = {type = 0, size = 80, config = 0, {sample_period = 4000,
> sample_freq = 4000}, sample_type = 263, read_format = 7,
>    disabled = 1, inherit = 1, pinned = 0, exclusive = 0, exclude_user =
> 0, exclude_kernel = 0, exclude_hv = 0, exclude_idle =
>      0, mmap = 1, comm = 1, freq = 1, inherit_stat = 0, enable_on_exec =
> 1, task = 0, watermark = 0, precise_ip = 1,
>    mmap_data = 0, sample_id_all = 1, exclude_host = 0, exclude_guest =
> 0, __reserved_1 = 0, {wakeup_events = 0,
>      wakeup_watermark = 0}, bp_type = 0, {bp_addr = 0, config1 = 0},
> {bp_len = 0, config2 = 0}, branch_sample_type = 0}
>
>

Offending function is parse_events_modifier(). It resets exclude_guest 
to 0. That's a separate thread/bug.

Undoing that (hardcoding exclude_guest to 1) and verifying in gdb that 
it is set for all cpus:

(gdb) p *attr
$17 = {type = 0, size = 80, config = 0, {sample_period = 4000, 
sample_freq = 4000}, sample_type = 423, read_format = 7,
   disabled = 1, inherit = 1, pinned = 0, exclusive = 0, exclude_user = 
0, exclude_kernel = 0, exclude_hv = 0, exclude_idle =
     0, mmap = 1, comm = 1, freq = 1, inherit_stat = 0, enable_on_exec = 
0, task = 0, watermark = 0, precise_ip = 1,
   mmap_data = 0, sample_id_all = 1, exclude_host = 0, exclude_guest = 
1, __reserved_1 = 0, {wakeup_events = 0,
     wakeup_watermark = 0}, bp_type = 0, {bp_addr = 0, config1 = 0}, 
{bp_len = 0, config2 = 0}, branch_sample_type = 0}

It still bombs and the VM dies.

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:59             ` Peter Zijlstra
@ 2012-07-10 23:38               ` David Ahern
  2012-07-11  7:10                 ` Gleb Natapov
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-10 23:38 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML

On 7/9/12 8:59 AM, Peter Zijlstra wrote:
> On Mon, 2012-07-09 at 17:51 +0300, Avi Kivity wrote:
>> On 07/09/2012 05:49 PM, Peter Zijlstra wrote:
>>> On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
>>>>
>>>> I found this testing changes to perf-kvm, but found the problem extends
>>>> to just perf-record. With perf-record exclude_guest defaults to 1. See
>>>> tools/perf/util/util.c, event_attr_init().
>>>
>>> You lost me there.. so perf-record defaults to exclude_guest=1 (which
>>> would make the proposed patch I just send actually possible), but its
>>> still going *bang* ?
>>
>> It's possible that the DS writes overshoot the MSR... or that there's a
>> bug somewhere.
>
> OK, so the first thing to stare at is in what order the MSRs are
> touched, I think you first need to stop the counter, then clear the PEBS
> bits, then clear the DS_AREA one.
>

Any updates on the options for this or is there still squinting going on 
with MSR orders?

I guess it's a good thing Ingo's request for :pp to be the default was 
not implemented:
   https://lkml.org/lkml/2012/5/14/26

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-10 23:38               ` David Ahern
@ 2012-07-11  7:10                 ` Gleb Natapov
  2012-07-11  9:49                   ` Peter Zijlstra
  2012-07-20 23:34                   ` David Ahern
  0 siblings, 2 replies; 35+ messages in thread
From: Gleb Natapov @ 2012-07-11  7:10 UTC (permalink / raw)
  To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML

On Tue, Jul 10, 2012 at 05:38:40PM -0600, David Ahern wrote:
> On 7/9/12 8:59 AM, Peter Zijlstra wrote:
> >On Mon, 2012-07-09 at 17:51 +0300, Avi Kivity wrote:
> >>On 07/09/2012 05:49 PM, Peter Zijlstra wrote:
> >>>On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote:
> >>>>
> >>>>I found this testing changes to perf-kvm, but found the problem extends
> >>>>to just perf-record. With perf-record exclude_guest defaults to 1. See
> >>>>tools/perf/util/util.c, event_attr_init().
> >>>
> >>>You lost me there.. so perf-record defaults to exclude_guest=1 (which
> >>>would make the proposed patch I just send actually possible), but its
> >>>still going *bang* ?
> >>
> >>It's possible that the DS writes overshoot the MSR... or that there's a
> >>bug somewhere.
> >
> >OK, so the first thing to stare at is in what order the MSRs are
> >touched, I think you first need to stop the counter, then clear the PEBS
> >bits, then clear the DS_AREA one.
> >
> 
> Any updates on the options for this or is there still squinting
> going on with MSR orders?
> 
Looks like Avi is right about the overshoot. Can you test something like this?

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 166546e..5fb371a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
 	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
 	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
 	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
+	arr[1].msr = MSR_IA32_PEBS_ENABLE;
+	arr[1].host = cpuc->pebs_enabled;
+	arr[1].guest = 0;
+	*nr = 2;
 
-	*nr = 1;
 	return arr;
 }
 

> I guess it's a good thing Ingo's request for :pp to be the default
> was not implemented:
>   https://lkml.org/lkml/2012/5/14/26
> 
> David

--
			Gleb.

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-11  7:10                 ` Gleb Natapov
@ 2012-07-11  9:49                   ` Peter Zijlstra
  2012-07-11  9:53                     ` Gleb Natapov
  2012-07-20 23:34                   ` David Ahern
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2012-07-11  9:49 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: David Ahern, Avi Kivity, LKML

On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote:

> Looks like Avi is right about the overshoot. Can you test something like this?
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 166546e..5fb371a 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
>  	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>  	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>  	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> +	arr[1].msr = MSR_IA32_PEBS_ENABLE;
> +	arr[1].host = cpuc->pebs_enabled;
> +	arr[1].guest = 0;
> +	*nr = 2;
>  
> -	*nr = 1;
>  	return arr;
>  }


You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and
ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a
proper NULL deref or such.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-11  9:49                   ` Peter Zijlstra
@ 2012-07-11  9:53                     ` Gleb Natapov
  2012-07-11 13:34                       ` David Ahern
  2012-07-12  4:11                       ` David Ahern
  0 siblings, 2 replies; 35+ messages in thread
From: Gleb Natapov @ 2012-07-11  9:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: David Ahern, Avi Kivity, LKML

On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote:
> On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote:
> 
> > Looks like Avi is right about the overshoot. Can you test something like this?
> > 
> > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> > index 166546e..5fb371a 100644
> > --- a/arch/x86/kernel/cpu/perf_event_intel.c
> > +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> > @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
> >  	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
> >  	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
> >  	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> > +	arr[1].msr = MSR_IA32_PEBS_ENABLE;
> > +	arr[1].host = cpuc->pebs_enabled;
> > +	arr[1].guest = 0;
> > +	*nr = 2;
> >  
> > -	*nr = 1;
> >  	return arr;
> >  }
> 
> 
> You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and
> ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a
> proper NULL deref or such.
Yes. With the patch above :pp modifier does not crash guest for me, but
in theory it should since BTS are still written to DS. May be BTS writes do
not overshoot guest entry. Will have to ask Intel for clarification.

--
			Gleb.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-11  9:53                     ` Gleb Natapov
@ 2012-07-11 13:34                       ` David Ahern
  2012-07-12  4:11                       ` David Ahern
  1 sibling, 0 replies; 35+ messages in thread
From: David Ahern @ 2012-07-11 13:34 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML

On 7/11/12 3:53 AM, Gleb Natapov wrote:
> On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote:
>> On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote:
>>
>>> Looks like Avi is right about the overshoot. Can you test something like this?

I head to the airport in a few minutes; I'll try it out tonight or 
tomorrow morning.

David

>>> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
>>> index 166546e..5fb371a 100644
>>> --- a/arch/x86/kernel/cpu/perf_event_intel.c
>>> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
>>> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
>>>   	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>>>   	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>>>   	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
>>> +	arr[1].msr = MSR_IA32_PEBS_ENABLE;
>>> +	arr[1].host = cpuc->pebs_enabled;
>>> +	arr[1].guest = 0;
>>> +	*nr = 2;
>>>
>>> -	*nr = 1;
>>>   	return arr;
>>>   }
>>
>>
>> You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and
>> ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a
>> proper NULL deref or such.
> Yes. With the patch above :pp modifier does not crash guest for me, but
> in theory it should since BTS are still written to DS. May be BTS writes do
> not overshoot guest entry. Will have to ask Intel for clarification.
>
> --
> 			Gleb.
>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-11  9:53                     ` Gleb Natapov
  2012-07-11 13:34                       ` David Ahern
@ 2012-07-12  4:11                       ` David Ahern
  2012-07-12  4:29                         ` Gleb Natapov
  1 sibling, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-12  4:11 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML

On 7/11/12 3:53 AM, Gleb Natapov wrote:
> On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote:
>> On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote:
>>
>>> Looks like Avi is right about the overshoot. Can you test something like this?
>>>
>>> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
>>> index 166546e..5fb371a 100644
>>> --- a/arch/x86/kernel/cpu/perf_event_intel.c
>>> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
>>> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
>>>   	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>>>   	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>>>   	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
>>> +	arr[1].msr = MSR_IA32_PEBS_ENABLE;
>>> +	arr[1].host = cpuc->pebs_enabled;
>>> +	arr[1].guest = 0;
>>> +	*nr = 2;
>>>
>>> -	*nr = 1;
>>>   	return arr;
>>>   }
>>

So far the 64-bit Fedora 10 VM with both a Fedora 10 stock kernel and a 
2.6.38 kernel have not faired well - and that's the only VM I have tried 
at the moment. Using -e cycles:pp I have been able to lock up the VM 3 
times out of 3 series of tests with perf-kvm that includes network 
traffic (e.g., netperf), disk I/O (dd based to create a file with dsync 
flag) and pure userspace cpu bound (openssl speed). May or may not be 
related.

Also, I noted that 'perf kvm --guest record -e cycles:pp' does not 
generate a whole lot of samples -- like < 100 in a 20-second sample -- 
despite the fact that the guest is rather busy.

I won't have much time over the next few days to run much in the way of 
tests; I'll come back to it Sunday night.

David

>>
>> You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and
>> ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a
>> proper NULL deref or such.
> Yes. With the patch above :pp modifier does not crash guest for me, but
> in theory it should since BTS are still written to DS. May be BTS writes do
> not overshoot guest entry. Will have to ask Intel for clarification.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-12  4:11                       ` David Ahern
@ 2012-07-12  4:29                         ` Gleb Natapov
  2012-07-12 15:20                           ` David Ahern
  0 siblings, 1 reply; 35+ messages in thread
From: Gleb Natapov @ 2012-07-12  4:29 UTC (permalink / raw)
  To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML

On Wed, Jul 11, 2012 at 10:11:57PM -0600, David Ahern wrote:
> On 7/11/12 3:53 AM, Gleb Natapov wrote:
> >On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote:
> >>On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote:
> >>
> >>>Looks like Avi is right about the overshoot. Can you test something like this?
> >>>
> >>>diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> >>>index 166546e..5fb371a 100644
> >>>--- a/arch/x86/kernel/cpu/perf_event_intel.c
> >>>+++ b/arch/x86/kernel/cpu/perf_event_intel.c
> >>>@@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
> >>>  	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
> >>>  	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
> >>>  	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> >>>+	arr[1].msr = MSR_IA32_PEBS_ENABLE;
> >>>+	arr[1].host = cpuc->pebs_enabled;
> >>>+	arr[1].guest = 0;
> >>>+	*nr = 2;
> >>>
> >>>-	*nr = 1;
> >>>  	return arr;
> >>>  }
> >>
> 
> So far the 64-bit Fedora 10 VM with both a Fedora 10 stock kernel
> and a 2.6.38 kernel have not faired well - and that's the only VM I
> have tried at the moment. Using -e cycles:pp I have been able to
> lock up the VM 3 times out of 3 series of tests with perf-kvm that
> includes network traffic (e.g., netperf), disk I/O (dd based to
> create a file with dsync flag) and pure userspace cpu bound (openssl
> speed). May or may not be related.
> 
OK that's may be BTSes. What about -e cycles:p? BTW are you using your
patch to set exclude_guest parameter? If not use -e cycles:Hp.

> Also, I noted that 'perf kvm --guest record -e cycles:pp' does not
> generate a whole lot of samples -- like < 100 in a 20-second sample
> -- despite the fact that the guest is rather busy.
> 
Host events do not suppose to generate events while guest is running.

> I won't have much time over the next few days to run much in the way
> of tests; I'll come back to it Sunday night.
> 
> David
> 
> >>
> >>You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and
> >>ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a
> >>proper NULL deref or such.
> >Yes. With the patch above :pp modifier does not crash guest for me, but
> >in theory it should since BTS are still written to DS. May be BTS writes do
> >not overshoot guest entry. Will have to ask Intel for clarification.

--
			Gleb.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-12  4:29                         ` Gleb Natapov
@ 2012-07-12 15:20                           ` David Ahern
  2012-07-12 16:06                             ` Gleb Natapov
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-12 15:20 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML

On 7/11/12 10:29 PM, Gleb Natapov wrote:
> On Wed, Jul 11, 2012 at 10:11:57PM -0600, David Ahern wrote:
>> On 7/11/12 3:53 AM, Gleb Natapov wrote:
>>> On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote:
>>>> On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote:
>>>>
>>>>> Looks like Avi is right about the overshoot. Can you test something like this?
>>>>>
>>>>> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
>>>>> index 166546e..5fb371a 100644
>>>>> --- a/arch/x86/kernel/cpu/perf_event_intel.c
>>>>> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
>>>>> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
>>>>>   	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>>>>>   	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>>>>>   	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
>>>>> +	arr[1].msr = MSR_IA32_PEBS_ENABLE;
>>>>> +	arr[1].host = cpuc->pebs_enabled;
>>>>> +	arr[1].guest = 0;
>>>>> +	*nr = 2;
>>>>>
>>>>> -	*nr = 1;
>>>>>   	return arr;
>>>>>   }
>>>>
>>
>> So far the 64-bit Fedora 10 VM with both a Fedora 10 stock kernel
>> and a 2.6.38 kernel have not faired well - and that's the only VM I
>> have tried at the moment. Using -e cycles:pp I have been able to
>> lock up the VM 3 times out of 3 series of tests with perf-kvm that
>> includes network traffic (e.g., netperf), disk I/O (dd based to
>> create a file with dsync flag) and pure userspace cpu bound (openssl
>> speed). May or may not be related.
>>
> OK that's may be BTSes. What about -e cycles:p? BTW are you using your
> patch to set exclude_guest parameter? If not use -e cycles:Hp.

I started with cycles:pp; should not really matter - they all need to 
work without blowing up VMs (cycles:p, cycles:pH, cycles:pG, cycles:pp, 
cycles:ppH, cycles:ppG).

For grins I ran a quick test while reading emails this morning. This 
time a fedora 17 VM with 3.4.0-1.fc17.x86_64 kernel. It too locks up 
pretty quickly - just a couple of iterations of perf:

perf kvm --guestmount=/tmp/guest-mount record -fo /tmp/perf.data -a -v 
-e cycles:pH -- sleep 60

Note the :pH this time.

I did not have netserver installed in the VM so used a ping flood for 
network traffic.

>
>> Also, I noted that 'perf kvm --guest record -e cycles:pp' does not
>> generate a whole lot of samples -- like < 100 in a 20-second sample
>> -- despite the fact that the guest is rather busy.
>>
> Host events do not suppose to generate events while guest is running.

My server has 16 cpus and the VM has only 2 vcpus; with the -a I would 
expect some host sampling. Note: in the above case :pp resets the 
exclude-host modifier set by the perf-kvm part, so hosts samples are not 
excluded. See parse_events_modifier().

So, is the idea of your patch to not enable the PEBS in guest mode?

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-12 15:20                           ` David Ahern
@ 2012-07-12 16:06                             ` Gleb Natapov
  2012-07-12 16:13                               ` Gleb Natapov
                                                 ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Gleb Natapov @ 2012-07-12 16:06 UTC (permalink / raw)
  To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML

On Thu, Jul 12, 2012 at 09:20:53AM -0600, David Ahern wrote:
> On 7/11/12 10:29 PM, Gleb Natapov wrote:
> >On Wed, Jul 11, 2012 at 10:11:57PM -0600, David Ahern wrote:
> >>On 7/11/12 3:53 AM, Gleb Natapov wrote:
> >>>On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote:
> >>>>On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote:
> >>>>
> >>>>>Looks like Avi is right about the overshoot. Can you test something like this?
> >>>>>
> >>>>>diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> >>>>>index 166546e..5fb371a 100644
> >>>>>--- a/arch/x86/kernel/cpu/perf_event_intel.c
> >>>>>+++ b/arch/x86/kernel/cpu/perf_event_intel.c
> >>>>>@@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
> >>>>>  	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
> >>>>>  	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
> >>>>>  	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> >>>>>+	arr[1].msr = MSR_IA32_PEBS_ENABLE;
> >>>>>+	arr[1].host = cpuc->pebs_enabled;
> >>>>>+	arr[1].guest = 0;
> >>>>>+	*nr = 2;
> >>>>>
> >>>>>-	*nr = 1;
> >>>>>  	return arr;
> >>>>>  }
> >>>>
> >>
> >>So far the 64-bit Fedora 10 VM with both a Fedora 10 stock kernel
> >>and a 2.6.38 kernel have not faired well - and that's the only VM I
> >>have tried at the moment. Using -e cycles:pp I have been able to
> >>lock up the VM 3 times out of 3 series of tests with perf-kvm that
> >>includes network traffic (e.g., netperf), disk I/O (dd based to
> >>create a file with dsync flag) and pure userspace cpu bound (openssl
> >>speed). May or may not be related.
> >>
> >OK that's may be BTSes. What about -e cycles:p? BTW are you using your
> >patch to set exclude_guest parameter? If not use -e cycles:Hp.
> 
> I started with cycles:pp; should not really matter - they all need
> to work without blowing up VMs (cycles:p, cycles:pH, cycles:pG,
> cycles:pp, cycles:ppH, cycles:ppG).
cycles:ppG and cycles:pG should be illegal. Peter's patch takes care of
this. Others should set exclude guest and have to work without blowing
up VMs.

> 
> For grins I ran a quick test while reading emails this morning. This
> time a fedora 17 VM with 3.4.0-1.fc17.x86_64 kernel. It too locks up
> pretty quickly - just a couple of iterations of perf:
> 
> perf kvm --guestmount=/tmp/guest-mount record -fo /tmp/perf.data -a
> -v -e cycles:pH -- sleep 60
Do not run perf kvm. It does not set exclude_guest and :p and :pp is not
compatible with guest profiling and should be disallowed. Again Peter's
patch takes care of this.

Run perf top -e cycles:pH or similar.

> 
> Note the :pH this time.
I am not sure what perf kvm does with :pH modifier, but H modifier does
not make sense with perf kvm and should be reported as an error by perf tool.

> 
> I did not have netserver installed in the VM so used a ping flood
> for network traffic.
> 
> >
> >>Also, I noted that 'perf kvm --guest record -e cycles:pp' does not
> >>generate a whole lot of samples -- like < 100 in a 20-second sample
> >>-- despite the fact that the guest is rather busy.
> >>
> >Host events do not suppose to generate events while guest is running.
> 
> My server has 16 cpus and the VM has only 2 vcpus; with the -a I
> would expect some host sampling. Note: in the above case :pp resets
> the exclude-host modifier set by the perf-kvm part, so hosts samples
> are not excluded. See parse_events_modifier().
Isn't this a bug? Why anything at all resets exclude-host set by
perf-kvm?


> 
> So, is the idea of your patch to not enable the PEBS in guest mode?
> 
Yes, the idea of my patch is to disable PEBS and the guest entry.

--
			Gleb.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-12 16:06                             ` Gleb Natapov
@ 2012-07-12 16:13                               ` Gleb Natapov
  2012-07-12 16:58                                 ` Peter Zijlstra
  2012-07-16  1:51                                 ` David Ahern
  2012-07-15  8:07                               ` Avi Kivity
  2012-07-16  2:19                               ` David Ahern
  2 siblings, 2 replies; 35+ messages in thread
From: Gleb Natapov @ 2012-07-12 16:13 UTC (permalink / raw)
  To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML

On Thu, Jul 12, 2012 at 07:06:35PM +0300, Gleb Natapov wrote:
> > 
> > So, is the idea of your patch to not enable the PEBS in guest mode?
> > 
> Yes, the idea of my patch is to disable PEBS and the guest entry.
> 
BTW what is you host cpu? My patch works only for those that have global
control register to disable PMU events, but that should cover most of
moder cpus.

--
			Gleb.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-12 16:13                               ` Gleb Natapov
@ 2012-07-12 16:58                                 ` Peter Zijlstra
  2012-07-16  1:51                                 ` David Ahern
  1 sibling, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2012-07-12 16:58 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: David Ahern, Avi Kivity, LKML

On Thu, 2012-07-12 at 19:13 +0300, Gleb Natapov wrote:
> BTW what is you host cpu? My patch works only for those that have global
> control register to disable PMU events, but that should cover most of
> moder cpus. 

That covers everything but core and p4, and I don't think we support
PEBS on either of those.

that said, you might want to also clear EN on the individual counters,
just in case the PEBS clear and DS_AREA clear aren't working.

The SDM is pretty clear on what you _should_ do to enable/disable PEBS,
doing less _might_ work but...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-12 16:06                             ` Gleb Natapov
  2012-07-12 16:13                               ` Gleb Natapov
@ 2012-07-15  8:07                               ` Avi Kivity
  2012-07-15 13:00                                 ` David Ahern
  2012-07-16  2:19                               ` David Ahern
  2 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-07-15  8:07 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: David Ahern, Peter Zijlstra, LKML

On 07/12/2012 07:06 PM, Gleb Natapov wrote:
>> 
>> Note the :pH this time.
> I am not sure what perf kvm does with :pH modifier, but H modifier does
> not make sense with perf kvm and should be reported as an error by perf tool.

Maybe it should refer to the guest vs. the nested guest...

Well that's a horrible UI but we do need some way to distinguish among
the two.  Samples from the nested guest's kernel will make no sense when
looking at the guest kernel.  I guess we should default to exclude
nested guest mode.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-15  8:07                               ` Avi Kivity
@ 2012-07-15 13:00                                 ` David Ahern
  2012-07-15 13:03                                   ` Avi Kivity
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-15 13:00 UTC (permalink / raw)
  To: Avi Kivity, Gleb Natapov; +Cc: Peter Zijlstra, LKML

On 7/15/12 2:07 AM, Avi Kivity wrote:
> On 07/12/2012 07:06 PM, Gleb Natapov wrote:
>>>
>>> Note the :pH this time.
>> I am not sure what perf kvm does with :pH modifier, but H modifier does
>> not make sense with perf kvm and should be reported as an error by perf tool.
>
> Maybe it should refer to the guest vs. the nested guest...

:H = host mode; you are thinking of :h for hypervisor mode. From 
perf-list documentation:

"Modifiers allow the user to restrict when events are
counted with 'u' for user-space, 'k' for kernel, 'h' for hypervisor.
Additional modifiers are 'G' for guest counting (in KVM guests) and 'H'
for host counting (not in KVM guests)."

>
> Well that's a horrible UI but we do need some way to distinguish among
> the two.  Samples from the nested guest's kernel will make no sense when
> looking at the guest kernel.  I guess we should default to exclude
> nested guest mode.
>

I catch a return flight in a couple of hours. I will send a summary 
email tonight to make sure we are on the same page with the problem, 
uses cases and patches.

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-15 13:00                                 ` David Ahern
@ 2012-07-15 13:03                                   ` Avi Kivity
  2012-07-16  1:52                                     ` David Ahern
  0 siblings, 1 reply; 35+ messages in thread
From: Avi Kivity @ 2012-07-15 13:03 UTC (permalink / raw)
  To: David Ahern; +Cc: Gleb Natapov, Peter Zijlstra, LKML

On 07/15/2012 04:00 PM, David Ahern wrote:
> On 7/15/12 2:07 AM, Avi Kivity wrote:
>> On 07/12/2012 07:06 PM, Gleb Natapov wrote:
>>>>
>>>> Note the :pH this time.
>>> I am not sure what perf kvm does with :pH modifier, but H modifier does
>>> not make sense with perf kvm and should be reported as an error by
>>> perf tool.
>>
>> Maybe it should refer to the guest vs. the nested guest...
> 
> :H = host mode; you are thinking of :h for hypervisor mode. From
> perf-list documentation:
> 
> "Modifiers allow the user to restrict when events are
> counted with 'u' for user-space, 'k' for kernel, 'h' for hypervisor.
> Additional modifiers are 'G' for guest counting (in KVM guests) and 'H'
> for host counting (not in KVM guests)."

No, it's an additional distinction.  A kvm guest can be in kernel mode,
user mode, or in a nested guest mode (which has its own user mode and
kernel mode).  Currently we have no way of distinguishing between guest
kernel mode and nested guest kernel mode.

I assume 'h' means profiling the hypervisor from a guest (i.e. xen dom0)?

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-12 16:13                               ` Gleb Natapov
  2012-07-12 16:58                                 ` Peter Zijlstra
@ 2012-07-16  1:51                                 ` David Ahern
  1 sibling, 0 replies; 35+ messages in thread
From: David Ahern @ 2012-07-16  1:51 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML

On 7/12/12 10:13 AM, Gleb Natapov wrote:
> On Thu, Jul 12, 2012 at 07:06:35PM +0300, Gleb Natapov wrote:
>>>
>>> So, is the idea of your patch to not enable the PEBS in guest mode?
>>>
>> Yes, the idea of my patch is to disable PEBS and the guest entry.
>>
> BTW what is you host cpu? My patch works only for those that have global
> control register to disable PMU events, but that should cover most of
> moder cpus.

So far all tests have been done on an HP with 2 Nehalem processors 
(E5540); I have another server with a couple of Westmere's (E5620) that 
I will test at some point.

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-15 13:03                                   ` Avi Kivity
@ 2012-07-16  1:52                                     ` David Ahern
  0 siblings, 0 replies; 35+ messages in thread
From: David Ahern @ 2012-07-16  1:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, Peter Zijlstra, LKML

On 7/15/12 7:03 AM, Avi Kivity wrote:
> I assume 'h' means profiling the hypervisor from a guest (i.e. xen dom0)?

I do not think so. It does not appear to do anything on x86.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-12 16:06                             ` Gleb Natapov
  2012-07-12 16:13                               ` Gleb Natapov
  2012-07-15  8:07                               ` Avi Kivity
@ 2012-07-16  2:19                               ` David Ahern
  2 siblings, 0 replies; 35+ messages in thread
From: David Ahern @ 2012-07-16  2:19 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML

On 7/12/12 10:06 AM, Gleb Natapov wrote:
>> I started with cycles:pp; should not really matter - they all need
>> to work without blowing up VMs (cycles:p, cycles:pH, cycles:pG,
>> cycles:pp, cycles:ppH, cycles:ppG).
> cycles:ppG and cycles:pG should be illegal. Peter's patch takes care of
> this. Others should set exclude guest and have to work without blowing
> up VMs.

I was only testing your patch; I'll add Peter's going forward.

---8<---

>> perf kvm --guestmount=/tmp/guest-mount record -fo /tmp/perf.data -a
>> -v -e cycles:pH -- sleep 60
> Do not run perf kvm. It does not set exclude_guest and :p and :pp is not
> compatible with guest profiling and should be disallowed. Again Peter's
> patch takes care of this.

Well, perf-kvm needs to work too - and yes you can set exclude_guest 
with it (doesn't make sense to use perf-kvm with exclude_guest, but then 
VMs should not die because of it). If someone adds -e <hardware 
event>:p{p} it needs to be handled cleanly. The G and H event modifiers 
can be used explicitly to exclude host or guest and that case too needs 
to be handled properly from whatever the context is.

Peter's patch handles kernel side checks requiring exclude_guest if the 
pebs is to be used. A user running perf-kvm, perf-record and perf-top 
deserves something better than a confusing 'not supported error' which 
includes a message suggesting perf events is not enabled. I have some 
local patches for record and top to cover this.

>
> Run perf top -e cycles:pH or similar.
>
>>
>> Note the :pH this time.
> I am not sure what perf kvm does with :pH modifier, but H modifier does
> not make sense with perf kvm and should be reported as an error by perf tool.

perf-kvm passes everything from record on to the builtin record command. 
So, 'perf kvm record -e cycles:pH' effectively becomes 'perf record -e 
cycles:pH' with 2 global flags modified (perf_guest set and perf_host 
unset). There is nothing invalid about the perf-kvm command.

---8<---

>> My server has 16 cpus and the VM has only 2 vcpus; with the -a I
>> would expect some host sampling. Note: in the above case :pp resets
>> the exclude-host modifier set by the perf-kvm part, so hosts samples
>> are not excluded. See parse_events_modifier().
> Isn't this a bug? Why anything at all resets exclude-host set by
> perf-kvm?

Arguably yes and arguably no. As with any linux command later arguments 
can unset/reset earlier ones. In this case -e <event>:<modifier> comes 
after the perf-kvm --guest --host settings. I can whip up a patch for 
perf {record,top} to spit out a debug message if perf_{guest,host} are 
not consistent with excl_{guest,host} if that's wanted.

Don't forget -- perf-kvm has the --host flag to enable host side 
monitoring and --no-guest can be used to disable guest side monitoring.

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-11  7:10                 ` Gleb Natapov
  2012-07-11  9:49                   ` Peter Zijlstra
@ 2012-07-20 23:34                   ` David Ahern
  2012-07-22  9:52                     ` Avi Kivity
  1 sibling, 1 reply; 35+ messages in thread
From: David Ahern @ 2012-07-20 23:34 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML

On 7/11/12 1:10 AM, Gleb Natapov wrote:
> Looks like Avi is right about the overshoot. Can you test something like this?
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 166546e..5fb371a 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
>   	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>   	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>   	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> +	arr[1].msr = MSR_IA32_PEBS_ENABLE;
> +	arr[1].host = cpuc->pebs_enabled;
> +	arr[1].guest = 0;
> +	*nr = 2;
>
> -	*nr = 1;
>   	return arr;
>   }


Tested-by: David Ahern <dsahern@gmail.com>

Peter's patch is required as well. It's the combination that fixes the 
problem.

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-09 14:47       ` Peter Zijlstra
@ 2012-07-20 23:35         ` David Ahern
  0 siblings, 0 replies; 35+ messages in thread
From: David Ahern @ 2012-07-20 23:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML

On 7/9/12 8:47 AM, Peter Zijlstra wrote:
> On Mon, 2012-07-09 at 17:39 +0300, Avi Kivity wrote:
>>> Disabling PEBS events for guests isn't pretty though..
>>
>> We already have atomic MSR switching at guest entry/exit time.  So it's
>> not pretty in terms of not getting full profiling, but the code won't be
>> too hard.  Basically we just have to exclude_guest any pebs event.
>
> OK, so ideally we'd do something like the below, except of course that
> that isn't backwards compatible and will break the world :/
>
> bugger that
>
> ---
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -336,6 +338,9 @@ int x86_setup_perfctr(struct perf_event *event)
>                  /* BTS is currently only allowed for user-mode. */
>                  if (!attr->exclude_kernel)
>                          return -EOPNOTSUPP;
> +
> +               if (!attr->exclude_guest)
> +                       return -EOPNOTSUPP;
>          }
>
>          hwc->config |= config;
> @@ -378,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event)
>          if (event->attr.precise_ip) {
>                  int precise = 0;
>
> +               if (!event->attr.exclude_guest)
> +                       return -EOPNOTSUPP;
> +
>                  /* Support for constant skid */
>                  if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
>                          precise++;
>

Tested-by: David Ahern <dsahern@gmail.com>

Gleb's patch is required as well. It takes both to fix the problem. And 
my userspace patch keeps compatibility for users who currently use -e 
cycles:pp (they won't be forced to add 'H'). (lkml is slow to show it so 
I can't pull a link for reference, but it's in your email ahead of this 
one).

Be nice to get this set into stable releases as well.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: perf with precise attribute kills all KVM based VMs
  2012-07-20 23:34                   ` David Ahern
@ 2012-07-22  9:52                     ` Avi Kivity
  0 siblings, 0 replies; 35+ messages in thread
From: Avi Kivity @ 2012-07-22  9:52 UTC (permalink / raw)
  To: David Ahern; +Cc: Gleb Natapov, Peter Zijlstra, LKML

On 07/21/2012 02:34 AM, David Ahern wrote:
> On 7/11/12 1:10 AM, Gleb Natapov wrote:
>> Looks like Avi is right about the overshoot. Can you test something
>> like this?
>>
>> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c
>> b/arch/x86/kernel/cpu/perf_event_intel.c
>> index 166546e..5fb371a 100644
>> --- a/arch/x86/kernel/cpu/perf_event_intel.c
>> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
>> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr
>> *intel_guest_get_msrs(int *nr)
>>       arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>>       arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>>       arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
>> +    arr[1].msr = MSR_IA32_PEBS_ENABLE;
>> +    arr[1].host = cpuc->pebs_enabled;
>> +    arr[1].guest = 0;
>> +    *nr = 2;
>>
>> -    *nr = 1;
>>       return arr;
>>   }
> 
> 
> Tested-by: David Ahern <dsahern@gmail.com>

Thanks.

Gleb, please add a comment explaining about the skid, because otherwise
it looks redundant.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2012-07-22  9:53 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-09 14:12 perf with precise attribute kills all KVM based VMs David Ahern
2012-07-09 14:19 ` Gleb Natapov
2012-07-09 14:24   ` Peter Zijlstra
2012-07-09 14:36     ` Gleb Natapov
2012-07-09 14:39     ` Avi Kivity
2012-07-09 14:47       ` David Ahern
2012-07-09 14:49         ` Peter Zijlstra
2012-07-09 14:51           ` Avi Kivity
2012-07-09 14:54             ` Gleb Natapov
2012-07-09 14:57               ` Gleb Natapov
2012-07-09 14:59             ` Peter Zijlstra
2012-07-10 23:38               ` David Ahern
2012-07-11  7:10                 ` Gleb Natapov
2012-07-11  9:49                   ` Peter Zijlstra
2012-07-11  9:53                     ` Gleb Natapov
2012-07-11 13:34                       ` David Ahern
2012-07-12  4:11                       ` David Ahern
2012-07-12  4:29                         ` Gleb Natapov
2012-07-12 15:20                           ` David Ahern
2012-07-12 16:06                             ` Gleb Natapov
2012-07-12 16:13                               ` Gleb Natapov
2012-07-12 16:58                                 ` Peter Zijlstra
2012-07-16  1:51                                 ` David Ahern
2012-07-15  8:07                               ` Avi Kivity
2012-07-15 13:00                                 ` David Ahern
2012-07-15 13:03                                   ` Avi Kivity
2012-07-16  1:52                                     ` David Ahern
2012-07-16  2:19                               ` David Ahern
2012-07-20 23:34                   ` David Ahern
2012-07-22  9:52                     ` Avi Kivity
2012-07-09 14:52           ` David Ahern
2012-07-09 14:58             ` David Ahern
2012-07-09 15:18               ` David Ahern
2012-07-09 14:47       ` Peter Zijlstra
2012-07-20 23:35         ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).