* perf with precise attribute kills all KVM based VMs @ 2012-07-09 14:12 David Ahern 2012-07-09 14:19 ` Gleb Natapov 0 siblings, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-09 14:12 UTC (permalink / raw) To: Peter Zijlstra, Avi Kivity, Gleb Natapov, LKML This is 100% reproducible with Fedora 17, 3.4.2-1.fc16.x86_64 kernel. Using the precise attribute (:p or :pp) with perf-record, eg, perf record -e cycles:p -ag -- sleep 10 All running VMs are killed. The VMs appear to be restarted but crash on restart. From one of the VMs that has the console redirected to ttyS0 and ttyS0 of the VM mapped to stdio I was able to capture some boot logs on the restart of the VM: [ 0.019136] BUG: unable to handle kernel paging request at 13832f88 [ 0.020252] IP: [<c0104ed0>] mcount+0x0/0xc [ 0.021000] *pdpt = 0000000000572001 *pde = 0000000000000000 [ 0.021942] Oops: 0000 [#1] SMP [ 0.021996] __die: Die type Oops [ 0.021996] Modules linked in: [ 0.021996] [ 0.021996] Pid: 0, comm: swapper Not tainted (2.6.27.47 #0) [ 0.021996] EIP: 0060:[<c0104ed0>] EFLAGS: 00010296 CPU: 0 [ 0.021996] EIP is at mcount+0x0/0xc [ 0.021996] EAX: f78026c0 EBX: 00000000 ECX: f78026c0 EDX: 000080d0 [ 0.021996] ESI: f782c228 EDI: f780b240 EBP: c051fef0 ESP: c051fed4 [ 0.021996] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 0.021996] Process swapper (pid: 0, ti=c051e000 task=c04d0380 task.ti=c051e000) [ 0.021996] Stack: c0193339 c0267510 c051feec c026745f 00000000 f782c228 f780b240 c051fefc [ 0.021996] c026c6b3 f780b240 c051ff14 c0267690 00000000 00000000 f782c200 f780b2a0 [ 0.021996] c051ff4c c026720e 00000001 f88055af f782c228 00000000 f782c200 00000000 David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:12 perf with precise attribute kills all KVM based VMs David Ahern @ 2012-07-09 14:19 ` Gleb Natapov 2012-07-09 14:24 ` Peter Zijlstra 0 siblings, 1 reply; 35+ messages in thread From: Gleb Natapov @ 2012-07-09 14:19 UTC (permalink / raw) To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML On Mon, Jul 09, 2012 at 08:12:40AM -0600, David Ahern wrote: > This is 100% reproducible with Fedora 17, 3.4.2-1.fc16.x86_64 kernel. > > Using the precise attribute (:p or :pp) with perf-record, eg, > > perf record -e cycles:p -ag -- sleep 10 > > All running VMs are killed. The VMs appear to be restarted but crash > on restart. > > From one of the VMs that has the console redirected to ttyS0 and > ttyS0 of the VM mapped to stdio I was able to capture some boot logs > on the restart of the VM: > Yes, this is knows problem that I can't find time to fix. The crash is cause by CPU using host PEBS virtual address while guest is running which causes guest memory corruption. We should disable evens that use PEBS at the guest entry. > [ 0.019136] BUG: unable to handle kernel paging request at 13832f88 > [ 0.020252] IP: [<c0104ed0>] mcount+0x0/0xc > [ 0.021000] *pdpt = 0000000000572001 *pde = 0000000000000000 > [ 0.021942] Oops: 0000 [#1] SMP > [ 0.021996] __die: Die type Oops > [ 0.021996] Modules linked in: > [ 0.021996] > [ 0.021996] Pid: 0, comm: swapper Not tainted (2.6.27.47 #0) > [ 0.021996] EIP: 0060:[<c0104ed0>] EFLAGS: 00010296 CPU: 0 > [ 0.021996] EIP is at mcount+0x0/0xc > [ 0.021996] EAX: f78026c0 EBX: 00000000 ECX: f78026c0 EDX: 000080d0 > [ 0.021996] ESI: f782c228 EDI: f780b240 EBP: c051fef0 ESP: c051fed4 > [ 0.021996] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > [ 0.021996] Process swapper (pid: 0, ti=c051e000 task=c04d0380 > task.ti=c051e000) > [ 0.021996] Stack: c0193339 c0267510 c051feec c026745f 00000000 > f782c228 f780b240 c051fefc > [ 0.021996] c026c6b3 f780b240 c051ff14 c0267690 00000000 > 00000000 f782c200 f780b2a0 > [ 0.021996] c051ff4c c026720e 00000001 f88055af f782c228 > 00000000 f782c200 00000000 > > > > David -- Gleb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:19 ` Gleb Natapov @ 2012-07-09 14:24 ` Peter Zijlstra 2012-07-09 14:36 ` Gleb Natapov 2012-07-09 14:39 ` Avi Kivity 0 siblings, 2 replies; 35+ messages in thread From: Peter Zijlstra @ 2012-07-09 14:24 UTC (permalink / raw) To: Gleb Natapov; +Cc: David Ahern, Avi Kivity, LKML On Mon, 2012-07-09 at 17:19 +0300, Gleb Natapov wrote: > Yes, this is knows problem that I can't find time to fix. The crash is > cause by CPU using host PEBS virtual address while guest is running > which causes guest memory corruption. We should disable evens that use > PEBS at the guest entry. Whoops.. so the hardware is reading the DS address as programmed by the host (host linear address) and using it as a guest linear address? Quality stuff.. Disabling PEBS events for guests isn't pretty though.. but I guess the only alternative is mapping the DS into the guest and reprogramming MSR_IA32_DS_AREA which is all a little involved I suppose? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:24 ` Peter Zijlstra @ 2012-07-09 14:36 ` Gleb Natapov 2012-07-09 14:39 ` Avi Kivity 1 sibling, 0 replies; 35+ messages in thread From: Gleb Natapov @ 2012-07-09 14:36 UTC (permalink / raw) To: Peter Zijlstra; +Cc: David Ahern, Avi Kivity, LKML On Mon, Jul 09, 2012 at 04:24:04PM +0200, Peter Zijlstra wrote: > On Mon, 2012-07-09 at 17:19 +0300, Gleb Natapov wrote: > > Yes, this is knows problem that I can't find time to fix. The crash is > > cause by CPU using host PEBS virtual address while guest is running > > which causes guest memory corruption. We should disable evens that use > > PEBS at the guest entry. > > Whoops.. so the hardware is reading the DS address as programmed by the > host (host linear address) and using it as a guest linear address? > Yes. > Quality stuff.. DS area and vitalization are definitely not friends. > > Disabling PEBS events for guests isn't pretty though.. but I guess the > only alternative is mapping the DS into the guest and reprogramming > MSR_IA32_DS_AREA which is all a little involved I suppose? This needs guest cooperation which is possible with PV guest of course, but there is a security issue here. Intel SDM requires that DS have to be mapped by page table at all time and marked dirty/accessed otherwise unspecified bad things can happen. Nothing prevents guest from unmapping DS and cause host troubles. -- Gleb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:24 ` Peter Zijlstra 2012-07-09 14:36 ` Gleb Natapov @ 2012-07-09 14:39 ` Avi Kivity 2012-07-09 14:47 ` David Ahern 2012-07-09 14:47 ` Peter Zijlstra 1 sibling, 2 replies; 35+ messages in thread From: Avi Kivity @ 2012-07-09 14:39 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Gleb Natapov, David Ahern, LKML On 07/09/2012 05:24 PM, Peter Zijlstra wrote: > On Mon, 2012-07-09 at 17:19 +0300, Gleb Natapov wrote: >> Yes, this is knows problem that I can't find time to fix. The crash is >> cause by CPU using host PEBS virtual address while guest is running >> which causes guest memory corruption. We should disable evens that use >> PEBS at the guest entry. > > Whoops.. so the hardware is reading the DS address as programmed by the > host (host linear address) and using it as a guest linear address? > > Quality stuff.. > > Disabling PEBS events for guests isn't pretty though.. We already have atomic MSR switching at guest entry/exit time. So it's not pretty in terms of not getting full profiling, but the code won't be too hard. Basically we just have to exclude_guest any pebs event. > but I guess the > only alternative is mapping the DS into the guest and reprogramming > MSR_IA32_DS_AREA which is all a little involved I suppose? Way too involved, especially as it's virtual addresses and we don't control the guest cr3. Note that the hardware won't fail gracefully. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:39 ` Avi Kivity @ 2012-07-09 14:47 ` David Ahern 2012-07-09 14:49 ` Peter Zijlstra 2012-07-09 14:47 ` Peter Zijlstra 1 sibling, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-09 14:47 UTC (permalink / raw) To: Avi Kivity; +Cc: Peter Zijlstra, Gleb Natapov, LKML On 7/9/12 8:39 AM, Avi Kivity wrote: > On 07/09/2012 05:24 PM, Peter Zijlstra wrote: >> On Mon, 2012-07-09 at 17:19 +0300, Gleb Natapov wrote: >>> Yes, this is knows problem that I can't find time to fix. The crash is >>> cause by CPU using host PEBS virtual address while guest is running >>> which causes guest memory corruption. We should disable evens that use >>> PEBS at the guest entry. >> >> Whoops.. so the hardware is reading the DS address as programmed by the >> host (host linear address) and using it as a guest linear address? >> >> Quality stuff.. >> >> Disabling PEBS events for guests isn't pretty though.. > > We already have atomic MSR switching at guest entry/exit time. So it's > not pretty in terms of not getting full profiling, but the code won't be > too hard. Basically we just have to exclude_guest any pebs event. I found this testing changes to perf-kvm, but found the problem extends to just perf-record. With perf-record exclude_guest defaults to 1. See tools/perf/util/util.c, event_attr_init(). > >> but I guess the >> only alternative is mapping the DS into the guest and reprogramming >> MSR_IA32_DS_AREA which is all a little involved I suppose? > > Way too involved, especially as it's virtual addresses and we don't > control the guest cr3. Note that the hardware won't fail gracefully. > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:47 ` David Ahern @ 2012-07-09 14:49 ` Peter Zijlstra 2012-07-09 14:51 ` Avi Kivity 2012-07-09 14:52 ` David Ahern 0 siblings, 2 replies; 35+ messages in thread From: Peter Zijlstra @ 2012-07-09 14:49 UTC (permalink / raw) To: David Ahern; +Cc: Avi Kivity, Gleb Natapov, LKML On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: > > I found this testing changes to perf-kvm, but found the problem extends > to just perf-record. With perf-record exclude_guest defaults to 1. See > tools/perf/util/util.c, event_attr_init(). You lost me there.. so perf-record defaults to exclude_guest=1 (which would make the proposed patch I just send actually possible), but its still going *bang* ? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:49 ` Peter Zijlstra @ 2012-07-09 14:51 ` Avi Kivity 2012-07-09 14:54 ` Gleb Natapov 2012-07-09 14:59 ` Peter Zijlstra 2012-07-09 14:52 ` David Ahern 1 sibling, 2 replies; 35+ messages in thread From: Avi Kivity @ 2012-07-09 14:51 UTC (permalink / raw) To: Peter Zijlstra; +Cc: David Ahern, Gleb Natapov, LKML On 07/09/2012 05:49 PM, Peter Zijlstra wrote: > On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: >> >> I found this testing changes to perf-kvm, but found the problem extends >> to just perf-record. With perf-record exclude_guest defaults to 1. See >> tools/perf/util/util.c, event_attr_init(). > > You lost me there.. so perf-record defaults to exclude_guest=1 (which > would make the proposed patch I just send actually possible), but its > still going *bang* ? It's possible that the DS writes overshoot the MSR... or that there's a bug somewhere. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:51 ` Avi Kivity @ 2012-07-09 14:54 ` Gleb Natapov 2012-07-09 14:57 ` Gleb Natapov 2012-07-09 14:59 ` Peter Zijlstra 1 sibling, 1 reply; 35+ messages in thread From: Gleb Natapov @ 2012-07-09 14:54 UTC (permalink / raw) To: Avi Kivity; +Cc: Peter Zijlstra, David Ahern, LKML On Mon, Jul 09, 2012 at 05:51:29PM +0300, Avi Kivity wrote: > On 07/09/2012 05:49 PM, Peter Zijlstra wrote: > > On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: > >> > >> I found this testing changes to perf-kvm, but found the problem extends > >> to just perf-record. With perf-record exclude_guest defaults to 1. See > >> tools/perf/util/util.c, event_attr_init(). > > > > You lost me there.. so perf-record defaults to exclude_guest=1 (which > > would make the proposed patch I just send actually possible), but its > > still going *bang* ? > > It's possible that the DS writes overshoot the MSR... or that there's a > bug somewhere. > I hope it is the later, because we can do nothing with the former. -- Gleb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:54 ` Gleb Natapov @ 2012-07-09 14:57 ` Gleb Natapov 0 siblings, 0 replies; 35+ messages in thread From: Gleb Natapov @ 2012-07-09 14:57 UTC (permalink / raw) To: Avi Kivity; +Cc: Peter Zijlstra, David Ahern, LKML On Mon, Jul 09, 2012 at 05:54:42PM +0300, Gleb Natapov wrote: > On Mon, Jul 09, 2012 at 05:51:29PM +0300, Avi Kivity wrote: > > On 07/09/2012 05:49 PM, Peter Zijlstra wrote: > > > On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: > > >> > > >> I found this testing changes to perf-kvm, but found the problem extends > > >> to just perf-record. With perf-record exclude_guest defaults to 1. See > > >> tools/perf/util/util.c, event_attr_init(). > > > > > > You lost me there.. so perf-record defaults to exclude_guest=1 (which > > > would make the proposed patch I just send actually possible), but its > > > still going *bang* ? > > > > It's possible that the DS writes overshoot the MSR... or that there's a > > bug somewhere. > > > I hope it is the later, because we can do nothing with the former. > Actually we can disable DS before guest entry. -- Gleb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:51 ` Avi Kivity 2012-07-09 14:54 ` Gleb Natapov @ 2012-07-09 14:59 ` Peter Zijlstra 2012-07-10 23:38 ` David Ahern 1 sibling, 1 reply; 35+ messages in thread From: Peter Zijlstra @ 2012-07-09 14:59 UTC (permalink / raw) To: Avi Kivity; +Cc: David Ahern, Gleb Natapov, LKML On Mon, 2012-07-09 at 17:51 +0300, Avi Kivity wrote: > On 07/09/2012 05:49 PM, Peter Zijlstra wrote: > > On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: > >> > >> I found this testing changes to perf-kvm, but found the problem extends > >> to just perf-record. With perf-record exclude_guest defaults to 1. See > >> tools/perf/util/util.c, event_attr_init(). > > > > You lost me there.. so perf-record defaults to exclude_guest=1 (which > > would make the proposed patch I just send actually possible), but its > > still going *bang* ? > > It's possible that the DS writes overshoot the MSR... or that there's a > bug somewhere. OK, so the first thing to stare at is in what order the MSRs are touched, I think you first need to stop the counter, then clear the PEBS bits, then clear the DS_AREA one. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:59 ` Peter Zijlstra @ 2012-07-10 23:38 ` David Ahern 2012-07-11 7:10 ` Gleb Natapov 0 siblings, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-10 23:38 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML On 7/9/12 8:59 AM, Peter Zijlstra wrote: > On Mon, 2012-07-09 at 17:51 +0300, Avi Kivity wrote: >> On 07/09/2012 05:49 PM, Peter Zijlstra wrote: >>> On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: >>>> >>>> I found this testing changes to perf-kvm, but found the problem extends >>>> to just perf-record. With perf-record exclude_guest defaults to 1. See >>>> tools/perf/util/util.c, event_attr_init(). >>> >>> You lost me there.. so perf-record defaults to exclude_guest=1 (which >>> would make the proposed patch I just send actually possible), but its >>> still going *bang* ? >> >> It's possible that the DS writes overshoot the MSR... or that there's a >> bug somewhere. > > OK, so the first thing to stare at is in what order the MSRs are > touched, I think you first need to stop the counter, then clear the PEBS > bits, then clear the DS_AREA one. > Any updates on the options for this or is there still squinting going on with MSR orders? I guess it's a good thing Ingo's request for :pp to be the default was not implemented: https://lkml.org/lkml/2012/5/14/26 David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-10 23:38 ` David Ahern @ 2012-07-11 7:10 ` Gleb Natapov 2012-07-11 9:49 ` Peter Zijlstra 2012-07-20 23:34 ` David Ahern 0 siblings, 2 replies; 35+ messages in thread From: Gleb Natapov @ 2012-07-11 7:10 UTC (permalink / raw) To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML On Tue, Jul 10, 2012 at 05:38:40PM -0600, David Ahern wrote: > On 7/9/12 8:59 AM, Peter Zijlstra wrote: > >On Mon, 2012-07-09 at 17:51 +0300, Avi Kivity wrote: > >>On 07/09/2012 05:49 PM, Peter Zijlstra wrote: > >>>On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: > >>>> > >>>>I found this testing changes to perf-kvm, but found the problem extends > >>>>to just perf-record. With perf-record exclude_guest defaults to 1. See > >>>>tools/perf/util/util.c, event_attr_init(). > >>> > >>>You lost me there.. so perf-record defaults to exclude_guest=1 (which > >>>would make the proposed patch I just send actually possible), but its > >>>still going *bang* ? > >> > >>It's possible that the DS writes overshoot the MSR... or that there's a > >>bug somewhere. > > > >OK, so the first thing to stare at is in what order the MSRs are > >touched, I think you first need to stop the counter, then clear the PEBS > >bits, then clear the DS_AREA one. > > > > Any updates on the options for this or is there still squinting > going on with MSR orders? > Looks like Avi is right about the overshoot. Can you test something like this? diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 166546e..5fb371a 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; + arr[1].msr = MSR_IA32_PEBS_ENABLE; + arr[1].host = cpuc->pebs_enabled; + arr[1].guest = 0; + *nr = 2; - *nr = 1; return arr; } > I guess it's a good thing Ingo's request for :pp to be the default > was not implemented: > https://lkml.org/lkml/2012/5/14/26 > > David -- Gleb. ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-11 7:10 ` Gleb Natapov @ 2012-07-11 9:49 ` Peter Zijlstra 2012-07-11 9:53 ` Gleb Natapov 2012-07-20 23:34 ` David Ahern 1 sibling, 1 reply; 35+ messages in thread From: Peter Zijlstra @ 2012-07-11 9:49 UTC (permalink / raw) To: Gleb Natapov; +Cc: David Ahern, Avi Kivity, LKML On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote: > Looks like Avi is right about the overshoot. Can you test something like this? > > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c > index 166546e..5fb371a 100644 > --- a/arch/x86/kernel/cpu/perf_event_intel.c > +++ b/arch/x86/kernel/cpu/perf_event_intel.c > @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) > arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; > + arr[1].msr = MSR_IA32_PEBS_ENABLE; > + arr[1].host = cpuc->pebs_enabled; > + arr[1].guest = 0; > + *nr = 2; > > - *nr = 1; > return arr; > } You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a proper NULL deref or such. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-11 9:49 ` Peter Zijlstra @ 2012-07-11 9:53 ` Gleb Natapov 2012-07-11 13:34 ` David Ahern 2012-07-12 4:11 ` David Ahern 0 siblings, 2 replies; 35+ messages in thread From: Gleb Natapov @ 2012-07-11 9:53 UTC (permalink / raw) To: Peter Zijlstra; +Cc: David Ahern, Avi Kivity, LKML On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote: > On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote: > > > Looks like Avi is right about the overshoot. Can you test something like this? > > > > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c > > index 166546e..5fb371a 100644 > > --- a/arch/x86/kernel/cpu/perf_event_intel.c > > +++ b/arch/x86/kernel/cpu/perf_event_intel.c > > @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) > > arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > > arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > > arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; > > + arr[1].msr = MSR_IA32_PEBS_ENABLE; > > + arr[1].host = cpuc->pebs_enabled; > > + arr[1].guest = 0; > > + *nr = 2; > > > > - *nr = 1; > > return arr; > > } > > > You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and > ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a > proper NULL deref or such. Yes. With the patch above :pp modifier does not crash guest for me, but in theory it should since BTS are still written to DS. May be BTS writes do not overshoot guest entry. Will have to ask Intel for clarification. -- Gleb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-11 9:53 ` Gleb Natapov @ 2012-07-11 13:34 ` David Ahern 2012-07-12 4:11 ` David Ahern 1 sibling, 0 replies; 35+ messages in thread From: David Ahern @ 2012-07-11 13:34 UTC (permalink / raw) To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML On 7/11/12 3:53 AM, Gleb Natapov wrote: > On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote: >> On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote: >> >>> Looks like Avi is right about the overshoot. Can you test something like this? I head to the airport in a few minutes; I'll try it out tonight or tomorrow morning. David >>> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c >>> index 166546e..5fb371a 100644 >>> --- a/arch/x86/kernel/cpu/perf_event_intel.c >>> +++ b/arch/x86/kernel/cpu/perf_event_intel.c >>> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) >>> arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; >>> arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; >>> arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; >>> + arr[1].msr = MSR_IA32_PEBS_ENABLE; >>> + arr[1].host = cpuc->pebs_enabled; >>> + arr[1].guest = 0; >>> + *nr = 2; >>> >>> - *nr = 1; >>> return arr; >>> } >> >> >> You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and >> ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a >> proper NULL deref or such. > Yes. With the patch above :pp modifier does not crash guest for me, but > in theory it should since BTS are still written to DS. May be BTS writes do > not overshoot guest entry. Will have to ask Intel for clarification. > > -- > Gleb. > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-11 9:53 ` Gleb Natapov 2012-07-11 13:34 ` David Ahern @ 2012-07-12 4:11 ` David Ahern 2012-07-12 4:29 ` Gleb Natapov 1 sibling, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-12 4:11 UTC (permalink / raw) To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML On 7/11/12 3:53 AM, Gleb Natapov wrote: > On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote: >> On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote: >> >>> Looks like Avi is right about the overshoot. Can you test something like this? >>> >>> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c >>> index 166546e..5fb371a 100644 >>> --- a/arch/x86/kernel/cpu/perf_event_intel.c >>> +++ b/arch/x86/kernel/cpu/perf_event_intel.c >>> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) >>> arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; >>> arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; >>> arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; >>> + arr[1].msr = MSR_IA32_PEBS_ENABLE; >>> + arr[1].host = cpuc->pebs_enabled; >>> + arr[1].guest = 0; >>> + *nr = 2; >>> >>> - *nr = 1; >>> return arr; >>> } >> So far the 64-bit Fedora 10 VM with both a Fedora 10 stock kernel and a 2.6.38 kernel have not faired well - and that's the only VM I have tried at the moment. Using -e cycles:pp I have been able to lock up the VM 3 times out of 3 series of tests with perf-kvm that includes network traffic (e.g., netperf), disk I/O (dd based to create a file with dsync flag) and pure userspace cpu bound (openssl speed). May or may not be related. Also, I noted that 'perf kvm --guest record -e cycles:pp' does not generate a whole lot of samples -- like < 100 in a 20-second sample -- despite the fact that the guest is rather busy. I won't have much time over the next few days to run much in the way of tests; I'll come back to it Sunday night. David >> >> You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and >> ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a >> proper NULL deref or such. > Yes. With the patch above :pp modifier does not crash guest for me, but > in theory it should since BTS are still written to DS. May be BTS writes do > not overshoot guest entry. Will have to ask Intel for clarification. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-12 4:11 ` David Ahern @ 2012-07-12 4:29 ` Gleb Natapov 2012-07-12 15:20 ` David Ahern 0 siblings, 1 reply; 35+ messages in thread From: Gleb Natapov @ 2012-07-12 4:29 UTC (permalink / raw) To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML On Wed, Jul 11, 2012 at 10:11:57PM -0600, David Ahern wrote: > On 7/11/12 3:53 AM, Gleb Natapov wrote: > >On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote: > >>On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote: > >> > >>>Looks like Avi is right about the overshoot. Can you test something like this? > >>> > >>>diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c > >>>index 166546e..5fb371a 100644 > >>>--- a/arch/x86/kernel/cpu/perf_event_intel.c > >>>+++ b/arch/x86/kernel/cpu/perf_event_intel.c > >>>@@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) > >>> arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > >>> arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > >>> arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; > >>>+ arr[1].msr = MSR_IA32_PEBS_ENABLE; > >>>+ arr[1].host = cpuc->pebs_enabled; > >>>+ arr[1].guest = 0; > >>>+ *nr = 2; > >>> > >>>- *nr = 1; > >>> return arr; > >>> } > >> > > So far the 64-bit Fedora 10 VM with both a Fedora 10 stock kernel > and a 2.6.38 kernel have not faired well - and that's the only VM I > have tried at the moment. Using -e cycles:pp I have been able to > lock up the VM 3 times out of 3 series of tests with perf-kvm that > includes network traffic (e.g., netperf), disk I/O (dd based to > create a file with dsync flag) and pure userspace cpu bound (openssl > speed). May or may not be related. > OK that's may be BTSes. What about -e cycles:p? BTW are you using your patch to set exclude_guest parameter? If not use -e cycles:Hp. > Also, I noted that 'perf kvm --guest record -e cycles:pp' does not > generate a whole lot of samples -- like < 100 in a 20-second sample > -- despite the fact that the guest is rather busy. > Host events do not suppose to generate events while guest is running. > I won't have much time over the next few days to run much in the way > of tests; I'll come back to it Sunday night. > > David > > >> > >>You also need to clear TR, BTS, BTINT from MSR_IA32_DEBUGCTLMSR and > >>ideally you'd also clear MSR_IA32_DS_AREA so that any write will be a > >>proper NULL deref or such. > >Yes. With the patch above :pp modifier does not crash guest for me, but > >in theory it should since BTS are still written to DS. May be BTS writes do > >not overshoot guest entry. Will have to ask Intel for clarification. -- Gleb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-12 4:29 ` Gleb Natapov @ 2012-07-12 15:20 ` David Ahern 2012-07-12 16:06 ` Gleb Natapov 0 siblings, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-12 15:20 UTC (permalink / raw) To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML On 7/11/12 10:29 PM, Gleb Natapov wrote: > On Wed, Jul 11, 2012 at 10:11:57PM -0600, David Ahern wrote: >> On 7/11/12 3:53 AM, Gleb Natapov wrote: >>> On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote: >>>> On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote: >>>> >>>>> Looks like Avi is right about the overshoot. Can you test something like this? >>>>> >>>>> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c >>>>> index 166546e..5fb371a 100644 >>>>> --- a/arch/x86/kernel/cpu/perf_event_intel.c >>>>> +++ b/arch/x86/kernel/cpu/perf_event_intel.c >>>>> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) >>>>> arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; >>>>> arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; >>>>> arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; >>>>> + arr[1].msr = MSR_IA32_PEBS_ENABLE; >>>>> + arr[1].host = cpuc->pebs_enabled; >>>>> + arr[1].guest = 0; >>>>> + *nr = 2; >>>>> >>>>> - *nr = 1; >>>>> return arr; >>>>> } >>>> >> >> So far the 64-bit Fedora 10 VM with both a Fedora 10 stock kernel >> and a 2.6.38 kernel have not faired well - and that's the only VM I >> have tried at the moment. Using -e cycles:pp I have been able to >> lock up the VM 3 times out of 3 series of tests with perf-kvm that >> includes network traffic (e.g., netperf), disk I/O (dd based to >> create a file with dsync flag) and pure userspace cpu bound (openssl >> speed). May or may not be related. >> > OK that's may be BTSes. What about -e cycles:p? BTW are you using your > patch to set exclude_guest parameter? If not use -e cycles:Hp. I started with cycles:pp; should not really matter - they all need to work without blowing up VMs (cycles:p, cycles:pH, cycles:pG, cycles:pp, cycles:ppH, cycles:ppG). For grins I ran a quick test while reading emails this morning. This time a fedora 17 VM with 3.4.0-1.fc17.x86_64 kernel. It too locks up pretty quickly - just a couple of iterations of perf: perf kvm --guestmount=/tmp/guest-mount record -fo /tmp/perf.data -a -v -e cycles:pH -- sleep 60 Note the :pH this time. I did not have netserver installed in the VM so used a ping flood for network traffic. > >> Also, I noted that 'perf kvm --guest record -e cycles:pp' does not >> generate a whole lot of samples -- like < 100 in a 20-second sample >> -- despite the fact that the guest is rather busy. >> > Host events do not suppose to generate events while guest is running. My server has 16 cpus and the VM has only 2 vcpus; with the -a I would expect some host sampling. Note: in the above case :pp resets the exclude-host modifier set by the perf-kvm part, so hosts samples are not excluded. See parse_events_modifier(). So, is the idea of your patch to not enable the PEBS in guest mode? David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-12 15:20 ` David Ahern @ 2012-07-12 16:06 ` Gleb Natapov 2012-07-12 16:13 ` Gleb Natapov ` (2 more replies) 0 siblings, 3 replies; 35+ messages in thread From: Gleb Natapov @ 2012-07-12 16:06 UTC (permalink / raw) To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML On Thu, Jul 12, 2012 at 09:20:53AM -0600, David Ahern wrote: > On 7/11/12 10:29 PM, Gleb Natapov wrote: > >On Wed, Jul 11, 2012 at 10:11:57PM -0600, David Ahern wrote: > >>On 7/11/12 3:53 AM, Gleb Natapov wrote: > >>>On Wed, Jul 11, 2012 at 11:49:47AM +0200, Peter Zijlstra wrote: > >>>>On Wed, 2012-07-11 at 10:10 +0300, Gleb Natapov wrote: > >>>> > >>>>>Looks like Avi is right about the overshoot. Can you test something like this? > >>>>> > >>>>>diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c > >>>>>index 166546e..5fb371a 100644 > >>>>>--- a/arch/x86/kernel/cpu/perf_event_intel.c > >>>>>+++ b/arch/x86/kernel/cpu/perf_event_intel.c > >>>>>@@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) > >>>>> arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > >>>>> arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > >>>>> arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; > >>>>>+ arr[1].msr = MSR_IA32_PEBS_ENABLE; > >>>>>+ arr[1].host = cpuc->pebs_enabled; > >>>>>+ arr[1].guest = 0; > >>>>>+ *nr = 2; > >>>>> > >>>>>- *nr = 1; > >>>>> return arr; > >>>>> } > >>>> > >> > >>So far the 64-bit Fedora 10 VM with both a Fedora 10 stock kernel > >>and a 2.6.38 kernel have not faired well - and that's the only VM I > >>have tried at the moment. Using -e cycles:pp I have been able to > >>lock up the VM 3 times out of 3 series of tests with perf-kvm that > >>includes network traffic (e.g., netperf), disk I/O (dd based to > >>create a file with dsync flag) and pure userspace cpu bound (openssl > >>speed). May or may not be related. > >> > >OK that's may be BTSes. What about -e cycles:p? BTW are you using your > >patch to set exclude_guest parameter? If not use -e cycles:Hp. > > I started with cycles:pp; should not really matter - they all need > to work without blowing up VMs (cycles:p, cycles:pH, cycles:pG, > cycles:pp, cycles:ppH, cycles:ppG). cycles:ppG and cycles:pG should be illegal. Peter's patch takes care of this. Others should set exclude guest and have to work without blowing up VMs. > > For grins I ran a quick test while reading emails this morning. This > time a fedora 17 VM with 3.4.0-1.fc17.x86_64 kernel. It too locks up > pretty quickly - just a couple of iterations of perf: > > perf kvm --guestmount=/tmp/guest-mount record -fo /tmp/perf.data -a > -v -e cycles:pH -- sleep 60 Do not run perf kvm. It does not set exclude_guest and :p and :pp is not compatible with guest profiling and should be disallowed. Again Peter's patch takes care of this. Run perf top -e cycles:pH or similar. > > Note the :pH this time. I am not sure what perf kvm does with :pH modifier, but H modifier does not make sense with perf kvm and should be reported as an error by perf tool. > > I did not have netserver installed in the VM so used a ping flood > for network traffic. > > > > >>Also, I noted that 'perf kvm --guest record -e cycles:pp' does not > >>generate a whole lot of samples -- like < 100 in a 20-second sample > >>-- despite the fact that the guest is rather busy. > >> > >Host events do not suppose to generate events while guest is running. > > My server has 16 cpus and the VM has only 2 vcpus; with the -a I > would expect some host sampling. Note: in the above case :pp resets > the exclude-host modifier set by the perf-kvm part, so hosts samples > are not excluded. See parse_events_modifier(). Isn't this a bug? Why anything at all resets exclude-host set by perf-kvm? > > So, is the idea of your patch to not enable the PEBS in guest mode? > Yes, the idea of my patch is to disable PEBS and the guest entry. -- Gleb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-12 16:06 ` Gleb Natapov @ 2012-07-12 16:13 ` Gleb Natapov 2012-07-12 16:58 ` Peter Zijlstra 2012-07-16 1:51 ` David Ahern 2012-07-15 8:07 ` Avi Kivity 2012-07-16 2:19 ` David Ahern 2 siblings, 2 replies; 35+ messages in thread From: Gleb Natapov @ 2012-07-12 16:13 UTC (permalink / raw) To: David Ahern; +Cc: Peter Zijlstra, Avi Kivity, LKML On Thu, Jul 12, 2012 at 07:06:35PM +0300, Gleb Natapov wrote: > > > > So, is the idea of your patch to not enable the PEBS in guest mode? > > > Yes, the idea of my patch is to disable PEBS and the guest entry. > BTW what is you host cpu? My patch works only for those that have global control register to disable PMU events, but that should cover most of moder cpus. -- Gleb. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-12 16:13 ` Gleb Natapov @ 2012-07-12 16:58 ` Peter Zijlstra 2012-07-16 1:51 ` David Ahern 1 sibling, 0 replies; 35+ messages in thread From: Peter Zijlstra @ 2012-07-12 16:58 UTC (permalink / raw) To: Gleb Natapov; +Cc: David Ahern, Avi Kivity, LKML On Thu, 2012-07-12 at 19:13 +0300, Gleb Natapov wrote: > BTW what is you host cpu? My patch works only for those that have global > control register to disable PMU events, but that should cover most of > moder cpus. That covers everything but core and p4, and I don't think we support PEBS on either of those. that said, you might want to also clear EN on the individual counters, just in case the PEBS clear and DS_AREA clear aren't working. The SDM is pretty clear on what you _should_ do to enable/disable PEBS, doing less _might_ work but... ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-12 16:13 ` Gleb Natapov 2012-07-12 16:58 ` Peter Zijlstra @ 2012-07-16 1:51 ` David Ahern 1 sibling, 0 replies; 35+ messages in thread From: David Ahern @ 2012-07-16 1:51 UTC (permalink / raw) To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML On 7/12/12 10:13 AM, Gleb Natapov wrote: > On Thu, Jul 12, 2012 at 07:06:35PM +0300, Gleb Natapov wrote: >>> >>> So, is the idea of your patch to not enable the PEBS in guest mode? >>> >> Yes, the idea of my patch is to disable PEBS and the guest entry. >> > BTW what is you host cpu? My patch works only for those that have global > control register to disable PMU events, but that should cover most of > moder cpus. So far all tests have been done on an HP with 2 Nehalem processors (E5540); I have another server with a couple of Westmere's (E5620) that I will test at some point. David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-12 16:06 ` Gleb Natapov 2012-07-12 16:13 ` Gleb Natapov @ 2012-07-15 8:07 ` Avi Kivity 2012-07-15 13:00 ` David Ahern 2012-07-16 2:19 ` David Ahern 2 siblings, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-07-15 8:07 UTC (permalink / raw) To: Gleb Natapov; +Cc: David Ahern, Peter Zijlstra, LKML On 07/12/2012 07:06 PM, Gleb Natapov wrote: >> >> Note the :pH this time. > I am not sure what perf kvm does with :pH modifier, but H modifier does > not make sense with perf kvm and should be reported as an error by perf tool. Maybe it should refer to the guest vs. the nested guest... Well that's a horrible UI but we do need some way to distinguish among the two. Samples from the nested guest's kernel will make no sense when looking at the guest kernel. I guess we should default to exclude nested guest mode. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-15 8:07 ` Avi Kivity @ 2012-07-15 13:00 ` David Ahern 2012-07-15 13:03 ` Avi Kivity 0 siblings, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-15 13:00 UTC (permalink / raw) To: Avi Kivity, Gleb Natapov; +Cc: Peter Zijlstra, LKML On 7/15/12 2:07 AM, Avi Kivity wrote: > On 07/12/2012 07:06 PM, Gleb Natapov wrote: >>> >>> Note the :pH this time. >> I am not sure what perf kvm does with :pH modifier, but H modifier does >> not make sense with perf kvm and should be reported as an error by perf tool. > > Maybe it should refer to the guest vs. the nested guest... :H = host mode; you are thinking of :h for hypervisor mode. From perf-list documentation: "Modifiers allow the user to restrict when events are counted with 'u' for user-space, 'k' for kernel, 'h' for hypervisor. Additional modifiers are 'G' for guest counting (in KVM guests) and 'H' for host counting (not in KVM guests)." > > Well that's a horrible UI but we do need some way to distinguish among > the two. Samples from the nested guest's kernel will make no sense when > looking at the guest kernel. I guess we should default to exclude > nested guest mode. > I catch a return flight in a couple of hours. I will send a summary email tonight to make sure we are on the same page with the problem, uses cases and patches. David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-15 13:00 ` David Ahern @ 2012-07-15 13:03 ` Avi Kivity 2012-07-16 1:52 ` David Ahern 0 siblings, 1 reply; 35+ messages in thread From: Avi Kivity @ 2012-07-15 13:03 UTC (permalink / raw) To: David Ahern; +Cc: Gleb Natapov, Peter Zijlstra, LKML On 07/15/2012 04:00 PM, David Ahern wrote: > On 7/15/12 2:07 AM, Avi Kivity wrote: >> On 07/12/2012 07:06 PM, Gleb Natapov wrote: >>>> >>>> Note the :pH this time. >>> I am not sure what perf kvm does with :pH modifier, but H modifier does >>> not make sense with perf kvm and should be reported as an error by >>> perf tool. >> >> Maybe it should refer to the guest vs. the nested guest... > > :H = host mode; you are thinking of :h for hypervisor mode. From > perf-list documentation: > > "Modifiers allow the user to restrict when events are > counted with 'u' for user-space, 'k' for kernel, 'h' for hypervisor. > Additional modifiers are 'G' for guest counting (in KVM guests) and 'H' > for host counting (not in KVM guests)." No, it's an additional distinction. A kvm guest can be in kernel mode, user mode, or in a nested guest mode (which has its own user mode and kernel mode). Currently we have no way of distinguishing between guest kernel mode and nested guest kernel mode. I assume 'h' means profiling the hypervisor from a guest (i.e. xen dom0)? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-15 13:03 ` Avi Kivity @ 2012-07-16 1:52 ` David Ahern 0 siblings, 0 replies; 35+ messages in thread From: David Ahern @ 2012-07-16 1:52 UTC (permalink / raw) To: Avi Kivity; +Cc: Gleb Natapov, Peter Zijlstra, LKML On 7/15/12 7:03 AM, Avi Kivity wrote: > I assume 'h' means profiling the hypervisor from a guest (i.e. xen dom0)? I do not think so. It does not appear to do anything on x86. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-12 16:06 ` Gleb Natapov 2012-07-12 16:13 ` Gleb Natapov 2012-07-15 8:07 ` Avi Kivity @ 2012-07-16 2:19 ` David Ahern 2 siblings, 0 replies; 35+ messages in thread From: David Ahern @ 2012-07-16 2:19 UTC (permalink / raw) To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML On 7/12/12 10:06 AM, Gleb Natapov wrote: >> I started with cycles:pp; should not really matter - they all need >> to work without blowing up VMs (cycles:p, cycles:pH, cycles:pG, >> cycles:pp, cycles:ppH, cycles:ppG). > cycles:ppG and cycles:pG should be illegal. Peter's patch takes care of > this. Others should set exclude guest and have to work without blowing > up VMs. I was only testing your patch; I'll add Peter's going forward. ---8<--- >> perf kvm --guestmount=/tmp/guest-mount record -fo /tmp/perf.data -a >> -v -e cycles:pH -- sleep 60 > Do not run perf kvm. It does not set exclude_guest and :p and :pp is not > compatible with guest profiling and should be disallowed. Again Peter's > patch takes care of this. Well, perf-kvm needs to work too - and yes you can set exclude_guest with it (doesn't make sense to use perf-kvm with exclude_guest, but then VMs should not die because of it). If someone adds -e <hardware event>:p{p} it needs to be handled cleanly. The G and H event modifiers can be used explicitly to exclude host or guest and that case too needs to be handled properly from whatever the context is. Peter's patch handles kernel side checks requiring exclude_guest if the pebs is to be used. A user running perf-kvm, perf-record and perf-top deserves something better than a confusing 'not supported error' which includes a message suggesting perf events is not enabled. I have some local patches for record and top to cover this. > > Run perf top -e cycles:pH or similar. > >> >> Note the :pH this time. > I am not sure what perf kvm does with :pH modifier, but H modifier does > not make sense with perf kvm and should be reported as an error by perf tool. perf-kvm passes everything from record on to the builtin record command. So, 'perf kvm record -e cycles:pH' effectively becomes 'perf record -e cycles:pH' with 2 global flags modified (perf_guest set and perf_host unset). There is nothing invalid about the perf-kvm command. ---8<--- >> My server has 16 cpus and the VM has only 2 vcpus; with the -a I >> would expect some host sampling. Note: in the above case :pp resets >> the exclude-host modifier set by the perf-kvm part, so hosts samples >> are not excluded. See parse_events_modifier(). > Isn't this a bug? Why anything at all resets exclude-host set by > perf-kvm? Arguably yes and arguably no. As with any linux command later arguments can unset/reset earlier ones. In this case -e <event>:<modifier> comes after the perf-kvm --guest --host settings. I can whip up a patch for perf {record,top} to spit out a debug message if perf_{guest,host} are not consistent with excl_{guest,host} if that's wanted. Don't forget -- perf-kvm has the --host flag to enable host side monitoring and --no-guest can be used to disable guest side monitoring. David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-11 7:10 ` Gleb Natapov 2012-07-11 9:49 ` Peter Zijlstra @ 2012-07-20 23:34 ` David Ahern 2012-07-22 9:52 ` Avi Kivity 1 sibling, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-20 23:34 UTC (permalink / raw) To: Gleb Natapov; +Cc: Peter Zijlstra, Avi Kivity, LKML On 7/11/12 1:10 AM, Gleb Natapov wrote: > Looks like Avi is right about the overshoot. Can you test something like this? > > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c > index 166546e..5fb371a 100644 > --- a/arch/x86/kernel/cpu/perf_event_intel.c > +++ b/arch/x86/kernel/cpu/perf_event_intel.c > @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) > arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; > arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; > arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; > + arr[1].msr = MSR_IA32_PEBS_ENABLE; > + arr[1].host = cpuc->pebs_enabled; > + arr[1].guest = 0; > + *nr = 2; > > - *nr = 1; > return arr; > } Tested-by: David Ahern <dsahern@gmail.com> Peter's patch is required as well. It's the combination that fixes the problem. David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-20 23:34 ` David Ahern @ 2012-07-22 9:52 ` Avi Kivity 0 siblings, 0 replies; 35+ messages in thread From: Avi Kivity @ 2012-07-22 9:52 UTC (permalink / raw) To: David Ahern; +Cc: Gleb Natapov, Peter Zijlstra, LKML On 07/21/2012 02:34 AM, David Ahern wrote: > On 7/11/12 1:10 AM, Gleb Natapov wrote: >> Looks like Avi is right about the overshoot. Can you test something >> like this? >> >> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c >> b/arch/x86/kernel/cpu/perf_event_intel.c >> index 166546e..5fb371a 100644 >> --- a/arch/x86/kernel/cpu/perf_event_intel.c >> +++ b/arch/x86/kernel/cpu/perf_event_intel.c >> @@ -1374,8 +1374,11 @@ static struct perf_guest_switch_msr >> *intel_guest_get_msrs(int *nr) >> arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; >> arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; >> arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; >> + arr[1].msr = MSR_IA32_PEBS_ENABLE; >> + arr[1].host = cpuc->pebs_enabled; >> + arr[1].guest = 0; >> + *nr = 2; >> >> - *nr = 1; >> return arr; >> } > > > Tested-by: David Ahern <dsahern@gmail.com> Thanks. Gleb, please add a comment explaining about the skid, because otherwise it looks redundant. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:49 ` Peter Zijlstra 2012-07-09 14:51 ` Avi Kivity @ 2012-07-09 14:52 ` David Ahern 2012-07-09 14:58 ` David Ahern 1 sibling, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-09 14:52 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML On 7/9/12 8:49 AM, Peter Zijlstra wrote: > On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: >> >> I found this testing changes to perf-kvm, but found the problem extends >> to just perf-record. With perf-record exclude_guest defaults to 1. See >> tools/perf/util/util.c, event_attr_init(). > > You lost me there.. so perf-record defaults to exclude_guest=1 (which > would make the proposed patch I just send actually possible), but its > still going *bang* ? > I can do the formality of testing your proposed patch, but yes I think so. exclude_guest defaults to 1 because perf_guest defaults to false. David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:52 ` David Ahern @ 2012-07-09 14:58 ` David Ahern 2012-07-09 15:18 ` David Ahern 0 siblings, 1 reply; 35+ messages in thread From: David Ahern @ 2012-07-09 14:58 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML On 7/9/12 8:52 AM, David Ahern wrote: > On 7/9/12 8:49 AM, Peter Zijlstra wrote: >> On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: >>> >>> I found this testing changes to perf-kvm, but found the problem extends >>> to just perf-record. With perf-record exclude_guest defaults to 1. See >>> tools/perf/util/util.c, event_attr_init(). >> >> You lost me there.. so perf-record defaults to exclude_guest=1 (which >> would make the proposed patch I just send actually possible), but its >> still going *bang* ? >> > > > I can do the formality of testing your proposed patch, but yes I think > so. exclude_guest defaults to 1 because perf_guest defaults to false. Let me dig for a few minutes. exclude_user is getting reset somewhere. Setting a breakpoint on sys_perf_event_open and dumping the attribute: $1 = {type = 0, size = 80, config = 0, {sample_period = 4000, sample_freq = 4000}, sample_type = 263, read_format = 7, disabled = 1, inherit = 1, pinned = 0, exclusive = 0, exclude_user = 0, exclude_kernel = 0, exclude_hv = 0, exclude_idle = 0, mmap = 1, comm = 1, freq = 1, inherit_stat = 0, enable_on_exec = 1, task = 0, watermark = 0, precise_ip = 1, mmap_data = 0, sample_id_all = 1, exclude_host = 0, exclude_guest = 0, __reserved_1 = 0, {wakeup_events = 0, wakeup_watermark = 0}, bp_type = 0, {bp_addr = 0, config1 = 0}, {bp_len = 0, config2 = 0}, branch_sample_type = 0} ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:58 ` David Ahern @ 2012-07-09 15:18 ` David Ahern 0 siblings, 0 replies; 35+ messages in thread From: David Ahern @ 2012-07-09 15:18 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML On 7/9/12 8:58 AM, David Ahern wrote: > On 7/9/12 8:52 AM, David Ahern wrote: >> On 7/9/12 8:49 AM, Peter Zijlstra wrote: >>> On Mon, 2012-07-09 at 08:47 -0600, David Ahern wrote: >>>> >>>> I found this testing changes to perf-kvm, but found the problem extends >>>> to just perf-record. With perf-record exclude_guest defaults to 1. See >>>> tools/perf/util/util.c, event_attr_init(). >>> >>> You lost me there.. so perf-record defaults to exclude_guest=1 (which >>> would make the proposed patch I just send actually possible), but its >>> still going *bang* ? >>> >> >> >> I can do the formality of testing your proposed patch, but yes I think >> so. exclude_guest defaults to 1 because perf_guest defaults to false. > > > Let me dig for a few minutes. exclude_user is getting reset somewhere. > Setting a breakpoint on sys_perf_event_open and dumping the attribute: > > $1 = {type = 0, size = 80, config = 0, {sample_period = 4000, > sample_freq = 4000}, sample_type = 263, read_format = 7, > disabled = 1, inherit = 1, pinned = 0, exclusive = 0, exclude_user = > 0, exclude_kernel = 0, exclude_hv = 0, exclude_idle = > 0, mmap = 1, comm = 1, freq = 1, inherit_stat = 0, enable_on_exec = > 1, task = 0, watermark = 0, precise_ip = 1, > mmap_data = 0, sample_id_all = 1, exclude_host = 0, exclude_guest = > 0, __reserved_1 = 0, {wakeup_events = 0, > wakeup_watermark = 0}, bp_type = 0, {bp_addr = 0, config1 = 0}, > {bp_len = 0, config2 = 0}, branch_sample_type = 0} > > Offending function is parse_events_modifier(). It resets exclude_guest to 0. That's a separate thread/bug. Undoing that (hardcoding exclude_guest to 1) and verifying in gdb that it is set for all cpus: (gdb) p *attr $17 = {type = 0, size = 80, config = 0, {sample_period = 4000, sample_freq = 4000}, sample_type = 423, read_format = 7, disabled = 1, inherit = 1, pinned = 0, exclusive = 0, exclude_user = 0, exclude_kernel = 0, exclude_hv = 0, exclude_idle = 0, mmap = 1, comm = 1, freq = 1, inherit_stat = 0, enable_on_exec = 0, task = 0, watermark = 0, precise_ip = 1, mmap_data = 0, sample_id_all = 1, exclude_host = 0, exclude_guest = 1, __reserved_1 = 0, {wakeup_events = 0, wakeup_watermark = 0}, bp_type = 0, {bp_addr = 0, config1 = 0}, {bp_len = 0, config2 = 0}, branch_sample_type = 0} It still bombs and the VM dies. David ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:39 ` Avi Kivity 2012-07-09 14:47 ` David Ahern @ 2012-07-09 14:47 ` Peter Zijlstra 2012-07-20 23:35 ` David Ahern 1 sibling, 1 reply; 35+ messages in thread From: Peter Zijlstra @ 2012-07-09 14:47 UTC (permalink / raw) To: Avi Kivity; +Cc: Gleb Natapov, David Ahern, LKML On Mon, 2012-07-09 at 17:39 +0300, Avi Kivity wrote: > > Disabling PEBS events for guests isn't pretty though.. > > We already have atomic MSR switching at guest entry/exit time. So it's > not pretty in terms of not getting full profiling, but the code won't be > too hard. Basically we just have to exclude_guest any pebs event. OK, so ideally we'd do something like the below, except of course that that isn't backwards compatible and will break the world :/ bugger that --- --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -336,6 +338,9 @@ int x86_setup_perfctr(struct perf_event *event) /* BTS is currently only allowed for user-mode. */ if (!attr->exclude_kernel) return -EOPNOTSUPP; + + if (!attr->exclude_guest) + return -EOPNOTSUPP; } hwc->config |= config; @@ -378,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event) if (event->attr.precise_ip) { int precise = 0; + if (!event->attr.exclude_guest) + return -EOPNOTSUPP; + /* Support for constant skid */ if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) { precise++; ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: perf with precise attribute kills all KVM based VMs 2012-07-09 14:47 ` Peter Zijlstra @ 2012-07-20 23:35 ` David Ahern 0 siblings, 0 replies; 35+ messages in thread From: David Ahern @ 2012-07-20 23:35 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Avi Kivity, Gleb Natapov, LKML On 7/9/12 8:47 AM, Peter Zijlstra wrote: > On Mon, 2012-07-09 at 17:39 +0300, Avi Kivity wrote: >>> Disabling PEBS events for guests isn't pretty though.. >> >> We already have atomic MSR switching at guest entry/exit time. So it's >> not pretty in terms of not getting full profiling, but the code won't be >> too hard. Basically we just have to exclude_guest any pebs event. > > OK, so ideally we'd do something like the below, except of course that > that isn't backwards compatible and will break the world :/ > > bugger that > > --- > --- a/arch/x86/kernel/cpu/perf_event.c > +++ b/arch/x86/kernel/cpu/perf_event.c > @@ -336,6 +338,9 @@ int x86_setup_perfctr(struct perf_event *event) > /* BTS is currently only allowed for user-mode. */ > if (!attr->exclude_kernel) > return -EOPNOTSUPP; > + > + if (!attr->exclude_guest) > + return -EOPNOTSUPP; > } > > hwc->config |= config; > @@ -378,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event) > if (event->attr.precise_ip) { > int precise = 0; > > + if (!event->attr.exclude_guest) > + return -EOPNOTSUPP; > + > /* Support for constant skid */ > if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) { > precise++; > Tested-by: David Ahern <dsahern@gmail.com> Gleb's patch is required as well. It takes both to fix the problem. And my userspace patch keeps compatibility for users who currently use -e cycles:pp (they won't be forced to add 'H'). (lkml is slow to show it so I can't pull a link for reference, but it's in your email ahead of this one). Be nice to get this set into stable releases as well. ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2012-07-22 9:53 UTC | newest] Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-07-09 14:12 perf with precise attribute kills all KVM based VMs David Ahern 2012-07-09 14:19 ` Gleb Natapov 2012-07-09 14:24 ` Peter Zijlstra 2012-07-09 14:36 ` Gleb Natapov 2012-07-09 14:39 ` Avi Kivity 2012-07-09 14:47 ` David Ahern 2012-07-09 14:49 ` Peter Zijlstra 2012-07-09 14:51 ` Avi Kivity 2012-07-09 14:54 ` Gleb Natapov 2012-07-09 14:57 ` Gleb Natapov 2012-07-09 14:59 ` Peter Zijlstra 2012-07-10 23:38 ` David Ahern 2012-07-11 7:10 ` Gleb Natapov 2012-07-11 9:49 ` Peter Zijlstra 2012-07-11 9:53 ` Gleb Natapov 2012-07-11 13:34 ` David Ahern 2012-07-12 4:11 ` David Ahern 2012-07-12 4:29 ` Gleb Natapov 2012-07-12 15:20 ` David Ahern 2012-07-12 16:06 ` Gleb Natapov 2012-07-12 16:13 ` Gleb Natapov 2012-07-12 16:58 ` Peter Zijlstra 2012-07-16 1:51 ` David Ahern 2012-07-15 8:07 ` Avi Kivity 2012-07-15 13:00 ` David Ahern 2012-07-15 13:03 ` Avi Kivity 2012-07-16 1:52 ` David Ahern 2012-07-16 2:19 ` David Ahern 2012-07-20 23:34 ` David Ahern 2012-07-22 9:52 ` Avi Kivity 2012-07-09 14:52 ` David Ahern 2012-07-09 14:58 ` David Ahern 2012-07-09 15:18 ` David Ahern 2012-07-09 14:47 ` Peter Zijlstra 2012-07-20 23:35 ` David Ahern
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).