kvm.vger.kernel.org archive mirror
* [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
@ 2009-09-23 14:04 Zhai, Edwin
  2009-09-23 14:09 ` Avi Kivity
  0 siblings, 1 reply; 22+ messages in thread
From: Zhai, Edwin @ 2009-09-23 14:04 UTC (permalink / raw)
  To: Avi Kivity, Ingo Molnar; +Cc: kvm, Zhai, Edwin

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

Avi,

This is the patch to enable PLE, which depends on a small change to the
Linux scheduler
(see http://lkml.org/lkml/2009/5/20/447).

According to our discussion last time, one missing part is that on a PLE
exit, we should pick an unscheduled vcpu at random and schedule it. But
further investigation found that:
1. It is hard for KVM to know the scheduling state of each vcpu.
2. The Linux scheduler has no existing API that can be used to pull a specific
task to this cpu, so we would need more changes to the common scheduler.
So I prefer the current simple way: just give up the current cpu time.

If there is no objection, I'll try to push the common scheduler change to
Linux first.

Thanks,
edwin


[-- Attachment #2: kvm_ple_v2.patch --]
[-- Type: application/octet-stream, Size: 5846 bytes --]

KVM:VMX: Add support for Pause-Loop Exiting

New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop

If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the 1st PAUSE in this loop), and triggers a VM exit if the total time
exceeds PLE_Window.
* Refer to SDM volume 3b sections 21.6.13 & 22.1.3.
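The gap/window logic above can be sketched in plain C. This is an illustrative model only, with invented struct and function names; the real check is done in hardware, using a counter that runs at the TSC rate:

```c
#include <stdint.h>

/* Hypothetical per-cpu state tracking the current PAUSE loop. */
struct ple_state {
	uint64_t first_pause;	/* counter value at the 1st PAUSE of the loop */
	uint64_t last_pause;	/* counter value at the most recent PAUSE */
};

/*
 * Called on every PAUSE executed by the guest; returns 1 if the processor
 * would trigger a PLE VM exit, 0 otherwise.  A gap larger than ple_gap
 * starts a new loop; a loop running longer than ple_window causes an exit.
 */
static int ple_should_exit(struct ple_state *s, uint64_t now,
			   uint64_t ple_gap, uint64_t ple_window)
{
	if (now - s->last_pause > ple_gap) {
		/* Too long since the previous PAUSE: this starts a new loop. */
		s->first_pause = now;
		s->last_pause = now;
		return 0;
	}
	s->last_pause = now;
	return now - s->first_pause > ple_window;
}
```

With the defaults below (gap 41, window 4096), a guest pausing every ~40 cycles stays in one loop and exits once the loop has run past 4096 cycles.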

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is scheduled out after holding a spinlock, and other VPs waiting for the same
lock are then scheduled in, wasting CPU time.

Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2x LP over-commitment we can get a +2% perf
improvement for a kernel build (and even more perf gain with more LPs).

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>

--- linux-2.6.orig/arch/x86/include/asm/vmx.h
+++ linux-2.6/arch/x86/include/asm/vmx.h
@@ -56,6 +56,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
@@ -144,6 +145,8 @@ enum vmcs_field {
 	VM_ENTRY_INSTRUCTION_LEN        = 0x0000401a,
 	TPR_THRESHOLD                   = 0x0000401c,
 	SECONDARY_VM_EXEC_CONTROL       = 0x0000401e,
+	PLE_GAP                         = 0x00004020,
+	PLE_WINDOW                      = 0x00004022,
 	VM_INSTRUCTION_ERROR            = 0x00004400,
 	VM_EXIT_REASON                  = 0x00004402,
 	VM_EXIT_INTR_INFO               = 0x00004404,
@@ -248,6 +251,7 @@ enum vmcs_field {
 #define EXIT_REASON_MSR_READ            31
 #define EXIT_REASON_MSR_WRITE           32
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_PAUSE_INSTRUCTION   40
 #define EXIT_REASON_MCE_DURING_VMENTRY	 41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
diff -u linux-2.6/arch/x86/kvm/vmx.c linux-2.6/arch/x86/kvm/vmx.c
--- linux-2.6/arch/x86/kvm/vmx.c
+++ linux-2.6/arch/x86/kvm/vmx.c
@@ -61,6 +61,25 @@
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+/*
+ * These 2 parameters are used to config the controls for Pause-Loop Exiting:
+ * ple_gap:    upper bound on the amount of time between two successive
+ *             executions of PAUSE in a loop; also indicates whether PLE is
+ *             enabled. Per our tests, this time is usually under 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ *             in a PAUSE loop. Tests indicate that most spinlocks are held for
+ *             less than 2^12 cycles.
+ * Time is measured based on a counter that runs at the same rate as the TSC;
+ * refer to SDM volume 3b sections 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP    41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);
+
 struct vmcs {
 	u32 revision_id;
 	u32 abort;
@@ -320,6 +339,12 @@
 		SECONDARY_EXEC_UNRESTRICTED_GUEST;
 }
 
+static inline int cpu_has_vmx_ple(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
 	return flexpriority_enabled &&
@@ -1240,7 +1265,8 @@
 			SECONDARY_EXEC_WBINVD_EXITING |
 			SECONDARY_EXEC_ENABLE_VPID |
 			SECONDARY_EXEC_ENABLE_EPT |
-			SECONDARY_EXEC_UNRESTRICTED_GUEST;
+			SECONDARY_EXEC_UNRESTRICTED_GUEST |
+			SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -1387,6 +1413,9 @@
 	if (enable_ept && !cpu_has_vmx_ept_2m_page())
 		kvm_disable_largepages();
 
+	if (!cpu_has_vmx_ple())
+		ple_gap = 0;
+
 	return alloc_kvm_area();
 }
 
@@ -2301,9 +2330,16 @@
 			exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
 		if (!enable_unrestricted_guest)
 			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+		if (!ple_gap)
+			exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
 
+	if (ple_gap) {
+		vmcs_write32(PLE_GAP, ple_gap);
+		vmcs_write32(PLE_WINDOW, ple_window);
+	}
+
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, !!bypass_guest_pf);
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
 	vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */
@@ -3351,6 +3387,18 @@
 }
 
 /*
+ * Indicates a vcpu busy-waiting on a spinlock. We do not enable plain PAUSE
+ * exiting, so we only get here on cpus with Pause-Loop Exiting.
+ */
+static int handle_pause(struct kvm_vcpu *vcpu,
+				struct kvm_run *kvm_run)
+{
+	skip_emulated_instruction(vcpu);
+	sched_delay_yield(1000000);
+	return 1;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.
@@ -3387,6 +3435,7 @@
 	[EXIT_REASON_MCE_DURING_VMENTRY]      = handle_machine_check,
 	[EXIT_REASON_EPT_VIOLATION]	      = handle_ept_violation,
 	[EXIT_REASON_EPT_MISCONFIG]           = handle_ept_misconfig,
+	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
 };
 
 static const int kvm_vmx_max_exit_handlers =

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-23 14:04 [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting Zhai, Edwin
@ 2009-09-23 14:09 ` Avi Kivity
  2009-09-25  1:11   ` Zhai, Edwin
  2009-09-25 20:43   ` Joerg Roedel
  0 siblings, 2 replies; 22+ messages in thread
From: Avi Kivity @ 2009-09-23 14:09 UTC (permalink / raw)
  To: Zhai, Edwin; +Cc: Ingo Molnar, kvm

On 09/23/2009 05:04 PM, Zhai, Edwin wrote:
> Avi,
>
> This is the patch to enable PLE, which depends on the a small change 
> of Linux scheduler
> (see http://lkml.org/lkml/2009/5/20/447).
>
> According to our discussion last time, one missing part is that if PLE
> exit, pick up an unscheduled vcpu at random and schedule it. But
> further investigation found that:
> 1. KVM is hard to know the schedule state for each vcpu.
> 2. Linux scheduler has no existed API can be used to pull a specific
> task to this cpu, so we need more changes to the common scheduler.
> So I prefer current simple way: just give up current cpu time.
>
> If no objection, I'll try to push common scheduler change first to
> linux.

We haven't sorted out what is the correct thing to do here.  I think we 
should go for a directed yield, but until we have it, you can use 
hrtimers to sleep for 100 microseconds and hope the holding vcpu will 
get scheduled.  Even if it doesn't, we're only wasting a few percent cpu 
time instead of spinning.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-23 14:09 ` Avi Kivity
@ 2009-09-25  1:11   ` Zhai, Edwin
  2009-09-27  8:28     ` Avi Kivity
  2009-09-25 20:43   ` Joerg Roedel
  1 sibling, 1 reply; 22+ messages in thread
From: Zhai, Edwin @ 2009-09-25  1:11 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Ingo Molnar, kvm, Zhai, Edwin

[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]

Avi,

hrtimer is used for sleeping in the attached patch, which has a similar perf
gain to the previous one. Maybe we can check in this patch first, and turn
to directed yield in the future, as you suggested.

Thanks,
edwin

Avi Kivity wrote:
> On 09/23/2009 05:04 PM, Zhai, Edwin wrote:
>   
>> Avi,
>>
>> This is the patch to enable PLE, which depends on the a small change 
>> of Linux scheduler
>> (see http://lkml.org/lkml/2009/5/20/447).
>>
>> According to our discussion last time, one missing part is that if PLE
>> exit, pick up an unscheduled vcpu at random and schedule it. But
>> further investigation found that:
>> 1. KVM is hard to know the schedule state for each vcpu.
>> 2. Linux scheduler has no existed API can be used to pull a specific
>> task to this cpu, so we need more changes to the common scheduler.
>> So I prefer current simple way: just give up current cpu time.
>>
>> If no objection, I'll try to push common scheduler change first to
>> linux.
>>     
>
> We haven't sorted out what is the correct thing to do here.  I think we 
> should go for a directed yield, but until we have it, you can use 
> hrtimers to sleep for 100 microseconds and hope the holding vcpu will 
> get scheduled.  Even if it doesn't, we're only wasting a few percent cpu 
> time instead of spinning.
>
>   

-- 
best rgds,
edwin


[-- Attachment #2: kvm_ple_hrtime.patch --]
[-- Type: text/plain, Size: 6438 bytes --]

KVM:VMX: Add support for Pause-Loop Exiting

New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop

If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the 1st PAUSE in this loop), and triggers a VM exit if the total time
exceeds PLE_Window.
* Refer to SDM volume 3b sections 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is scheduled out after holding a spinlock, and other VPs waiting for the same
lock are then scheduled in, wasting CPU time.

Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2x LP over-commitment we can get a +2% perf
improvement for a kernel build (and even more perf gain with more LPs).

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 272514c..2b49454 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -56,6 +56,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
@@ -144,6 +145,8 @@ enum vmcs_field {
 	VM_ENTRY_INSTRUCTION_LEN        = 0x0000401a,
 	TPR_THRESHOLD                   = 0x0000401c,
 	SECONDARY_VM_EXEC_CONTROL       = 0x0000401e,
+	PLE_GAP                         = 0x00004020,
+	PLE_WINDOW                      = 0x00004022,
 	VM_INSTRUCTION_ERROR            = 0x00004400,
 	VM_EXIT_REASON                  = 0x00004402,
 	VM_EXIT_INTR_INFO               = 0x00004404,
@@ -248,6 +251,7 @@ enum vmcs_field {
 #define EXIT_REASON_MSR_READ            31
 #define EXIT_REASON_MSR_WRITE           32
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_PAUSE_INSTRUCTION   40
 #define EXIT_REASON_MCE_DURING_VMENTRY	 41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3fe0d42..21dbfe9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,6 +61,25 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+/*
+ * These 2 parameters are used to config the controls for Pause-Loop Exiting:
+ * ple_gap:    upper bound on the amount of time between two successive
+ *             executions of PAUSE in a loop; also indicates whether PLE is
+ *             enabled. Per our tests, this time is usually under 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ *             in a PAUSE loop. Tests indicate that most spinlocks are held for
+ *             less than 2^12 cycles.
+ * Time is measured based on a counter that runs at the same rate as the TSC;
+ * refer to SDM volume 3b sections 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP    41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);
+
 struct vmcs {
 	u32 revision_id;
 	u32 abort;
@@ -319,6 +338,12 @@ static inline int cpu_has_vmx_unrestricted_guest(void)
 		SECONDARY_EXEC_UNRESTRICTED_GUEST;
 }
 
+static inline int cpu_has_vmx_ple(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
 	return flexpriority_enabled &&
@@ -1256,7 +1281,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_WBINVD_EXITING |
 			SECONDARY_EXEC_ENABLE_VPID |
 			SECONDARY_EXEC_ENABLE_EPT |
-			SECONDARY_EXEC_UNRESTRICTED_GUEST;
+			SECONDARY_EXEC_UNRESTRICTED_GUEST |
+			SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -1400,6 +1426,9 @@ static __init int hardware_setup(void)
 	if (enable_ept && !cpu_has_vmx_ept_2m_page())
 		kvm_disable_largepages();
 
+	if (!cpu_has_vmx_ple())
+		ple_gap = 0;
+
 	return alloc_kvm_area();
 }
 
@@ -2312,9 +2341,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 			exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
 		if (!enable_unrestricted_guest)
 			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+		if (!ple_gap)
+			exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
 
+	if (ple_gap) {
+		vmcs_write32(PLE_GAP, ple_gap);
+		vmcs_write32(PLE_WINDOW, ple_window);
+	}
+
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, !!bypass_guest_pf);
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
 	vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */
@@ -3362,6 +3398,24 @@ out:
 }
 
 /*
+ * Indicates a vcpu busy-waiting on a spinlock. We do not enable plain PAUSE
+ * exiting, so we only get here on cpus with Pause-Loop Exiting.
+ */
+static int handle_pause(struct kvm_vcpu *vcpu,
+				struct kvm_run *kvm_run)
+{
+	ktime_t expires;
+	skip_emulated_instruction(vcpu);
+
+	/* Sleep for 1 msec, and hope lock-holder got scheduled */
+	expires = ktime_add_ns(ktime_get(), 1000000UL);
+	set_current_state(TASK_INTERRUPTIBLE);
+	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+
+	return 1;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.
@@ -3397,6 +3451,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_MCE_DURING_VMENTRY]      = handle_machine_check,
 	[EXIT_REASON_EPT_VIOLATION]	      = handle_ept_violation,
 	[EXIT_REASON_EPT_MISCONFIG]           = handle_ept_misconfig,
+	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
 };
 
 static const int kvm_vmx_max_exit_handlers =


* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-23 14:09 ` Avi Kivity
  2009-09-25  1:11   ` Zhai, Edwin
@ 2009-09-25 20:43   ` Joerg Roedel
  2009-09-27  8:31     ` Avi Kivity
  1 sibling, 1 reply; 22+ messages in thread
From: Joerg Roedel @ 2009-09-25 20:43 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Zhai, Edwin, Ingo Molnar, kvm

On Wed, Sep 23, 2009 at 05:09:38PM +0300, Avi Kivity wrote:
>
> We haven't sorted out what is the correct thing to do here.  I think we  
> should go for a directed yield, but until we have it, you can use  
> hrtimers to sleep for 100 microseconds and hope the holding vcpu will  
> get scheduled.  Even if it doesn't, we're only wasting a few percent cpu  
> time instead of spinning.

How do you plan to find out to which vcpu thread the current thread
should yield?

	Joerg



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-25  1:11   ` Zhai, Edwin
@ 2009-09-27  8:28     ` Avi Kivity
  2009-09-28  9:33       ` Zhai, Edwin
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-09-27  8:28 UTC (permalink / raw)
  To: Zhai, Edwin; +Cc: Ingo Molnar, kvm

On 09/25/2009 04:11 AM, Zhai, Edwin wrote:
> Avi,
>
> hrtimer is used for sleep in attached patch, which have similar perf 
> gain with previous one. Maybe we can check in this patch first, and 
> turn to direct yield in future, as you suggested.
>
> +/*
> + * These 2 parameters are used to config the controls for Pause-Loop Exiting:
> + * ple_gap:    upper bound on the amount of time between two successive
> + *             executions of PAUSE in a loop. Also indicate if ple enabled.
> + *             According to test, this time is usually small than 41 cycles.
> + * ple_window: upper bound on the amount of time a guest is allowed to execute
> + *             in a PAUSE loop. Tests indicate that most spinlocks are held for
> + *             less than 2^12 cycles
> + * Time is measured based on a counter that runs at the same rate as the TSC,
> + * refer SDM volume 3b section 21.6.13&  22.1.3.
> + */
> +#define KVM_VMX_DEFAULT_PLE_GAP    41
> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
> +module_param(ple_gap, int, S_IRUGO);
> +
> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
> +module_param(ple_window, int, S_IRUGO);
>    

Shouldn't be __read_mostly since they're read very rarely (__read_mostly 
should be for variables that are very often read, and rarely written).

I'm not even sure they should be parameters.

>   /*
> + * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
> + * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
> + */
> +static int handle_pause(struct kvm_vcpu *vcpu,
> +				struct kvm_run *kvm_run)
> +{
> +	ktime_t expires;
> +	skip_emulated_instruction(vcpu);
> +
> +	/* Sleep for 1 msec, and hope lock-holder got scheduled */
> +	expires = ktime_add_ns(ktime_get(), 1000000UL);
>    

I think this should be much lower, 50-100us.  Maybe this should be a 
parameter.  With 1ms we're losing significant cpu time if the congestion 
clears.

> +	set_current_state(TASK_INTERRUPTIBLE);
> +	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
> +
>    

Please add a tracepoint for this (since it can cause significant change 
in behaviour), and move the logic to kvm_main.c.  It will be reused by 
the AMD implementation, possibly my software spinlock detector, 
paravirtualized spinlocks, and hopefully other architectures.

> +	return 1;
> +}
> +
> +/*
>    

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-25 20:43   ` Joerg Roedel
@ 2009-09-27  8:31     ` Avi Kivity
  2009-09-27 13:46       ` Joerg Roedel
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-09-27  8:31 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Zhai, Edwin, Ingo Molnar, kvm

On 09/25/2009 11:43 PM, Joerg Roedel wrote:
> On Wed, Sep 23, 2009 at 05:09:38PM +0300, Avi Kivity wrote:
>    
>> We haven't sorted out what is the correct thing to do here.  I think we
>> should go for a directed yield, but until we have it, you can use
>> hrtimers to sleep for 100 microseconds and hope the holding vcpu will
>> get scheduled.  Even if it doesn't, we're only wasting a few percent cpu
>> time instead of spinning.
>>      
> How do you plan to find out to which vcpu thread the current thread
> should yield?
>    

We can't find exactly which vcpu, but we can:

- rule out threads that are not vcpus for this guest
- rule out threads that are already running

A major problem with sleep() is that it effectively reduces the vm 
priority relative to guests that don't have spinlock contention.  By 
selecting a random nonrunnable vcpu belonging to this guest, we at least 
preserve the guest's timeslice.
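A toy illustration of that two-step filtering (purely illustrative; the struct and field names are invented, not KVM's actual data structures):

```c
#include <stddef.h>

/* Invented stand-ins for the threads backing a guest's vcpus. */
struct toy_vcpu {
	int guest_id;	/* which guest this vcpu belongs to */
	int running;	/* nonzero if currently scheduled on a cpu */
};

/*
 * Apply the two filters described above: (a) keep only vcpus of the same
 * guest, (b) rule out vcpus that are already running.  For simplicity this
 * returns the first candidate; a real implementation would pick randomly
 * among them and would also need scheduler support to pull the chosen
 * task onto a cpu.
 */
static struct toy_vcpu *pick_yield_target(struct toy_vcpu *vcpus, size_t n,
					  int guest_id, struct toy_vcpu *self)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_vcpu *v = &vcpus[i];

		if (v == self || v->guest_id != guest_id || v->running)
			continue;
		return v;
	}
	return NULL;	/* no candidate; fall back to sleeping */
}
```

Even without identifying the exact lock holder, any vcpu picked this way keeps the timeslice within the same guest.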

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-27  8:31     ` Avi Kivity
@ 2009-09-27 13:46       ` Joerg Roedel
  2009-09-27 13:47         ` Avi Kivity
  0 siblings, 1 reply; 22+ messages in thread
From: Joerg Roedel @ 2009-09-27 13:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Zhai, Edwin, Ingo Molnar, kvm

On Sun, Sep 27, 2009 at 10:31:21AM +0200, Avi Kivity wrote:
> On 09/25/2009 11:43 PM, Joerg Roedel wrote:
>> On Wed, Sep 23, 2009 at 05:09:38PM +0300, Avi Kivity wrote:
>>    
>>> We haven't sorted out what is the correct thing to do here.  I think we
>>> should go for a directed yield, but until we have it, you can use
>>> hrtimers to sleep for 100 microseconds and hope the holding vcpu will
>>> get scheduled.  Even if it doesn't, we're only wasting a few percent cpu
>>> time instead of spinning.
>>>      
>> How do you plan to find out to which vcpu thread the current thread
>> should yield?
>>    
>
> We can't find exactly which vcpu, but we can:
>
> - rule out threads that are not vcpus for this guest
> - rule out threads that are already running
>
> A major problem with sleep() is that it effectively reduces the vm  
> priority relative to guests that don't have spinlock contention.  By  
> selecting a random nonrunnable vcpu belonging to this guest, we at least  
> preserve the guest's timeslice.

Ok, that makes sense. But before trying that, shouldn't we first try
calling just yield() instead of schedule()? I remember someone from our
team here at AMD did this for Xen a while ago and already had pretty
good results with that. Xen has a completely different scheduler, but maybe
it's worth trying?

	Joerg



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-27 13:46       ` Joerg Roedel
@ 2009-09-27 13:47         ` Avi Kivity
  2009-09-27 14:07           ` Joerg Roedel
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-09-27 13:47 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Zhai, Edwin, Ingo Molnar, kvm

On 09/27/2009 03:46 PM, Joerg Roedel wrote:
>
>> We can't find exactly which vcpu, but we can:
>>
>> - rule out threads that are not vcpus for this guest
>> - rule out threads that are already running
>>
>> A major problem with sleep() is that it effectively reduces the vm
>> priority relative to guests that don't have spinlock contention.  By
>> selecting a random nonrunnable vcpu belonging to this guest, we at least
>> preserve the guest's timeslice.
>>      
> Ok, that makes sense. But before trying that we should probably try to
> call just yield() instead of schedule()? I remember someone from our
> team here at AMD did this for Xen a while ago and already had pretty
> good results with that. Xen has a completly other scheduler but maybe
> its worth trying?
>    

yield() is a no-op in CFS.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-27 13:47         ` Avi Kivity
@ 2009-09-27 14:07           ` Joerg Roedel
  2009-09-27 14:18             ` Avi Kivity
  0 siblings, 1 reply; 22+ messages in thread
From: Joerg Roedel @ 2009-09-27 14:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Zhai, Edwin, Ingo Molnar, kvm

On Sun, Sep 27, 2009 at 03:47:55PM +0200, Avi Kivity wrote:
> On 09/27/2009 03:46 PM, Joerg Roedel wrote:
>>
>>> We can't find exactly which vcpu, but we can:
>>>
>>> - rule out threads that are not vcpus for this guest
>>> - rule out threads that are already running
>>>
>>> A major problem with sleep() is that it effectively reduces the vm
>>> priority relative to guests that don't have spinlock contention.  By
>>> selecting a random nonrunnable vcpu belonging to this guest, we at least
>>> preserve the guest's timeslice.
>>>      
>> Ok, that makes sense. But before trying that we should probably try to
>> call just yield() instead of schedule()? I remember someone from our
>> team here at AMD did this for Xen a while ago and already had pretty
>> good results with that. Xen has a completly other scheduler but maybe
>> its worth trying?
>>    
>
> yield() is a no-op in CFS.

Hmm, true. At least when kernel.sched_compat_yield == 0, which it is on my
distro.
If the scheduler gave us something like a real_yield() function that
assumes kernel.sched_compat_yield = 1, that might help. At least it's
better than sleeping for some random amount of time.

	Joerg



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-27 14:07           ` Joerg Roedel
@ 2009-09-27 14:18             ` Avi Kivity
  2009-09-27 14:53               ` Joerg Roedel
  0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-09-27 14:18 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Zhai, Edwin, Ingo Molnar, kvm

On 09/27/2009 04:07 PM, Joerg Roedel wrote:
> On Sun, Sep 27, 2009 at 03:47:55PM +0200, Avi Kivity wrote:
>    
>> On 09/27/2009 03:46 PM, Joerg Roedel wrote:
>>      
>>>        
>>>> We can't find exactly which vcpu, but we can:
>>>>
>>>> - rule out threads that are not vcpus for this guest
>>>> - rule out threads that are already running
>>>>
>>>> A major problem with sleep() is that it effectively reduces the vm
>>>> priority relative to guests that don't have spinlock contention.  By
>>>> selecting a random nonrunnable vcpu belonging to this guest, we at least
>>>> preserve the guest's timeslice.
>>>>
>>>>          
>>> Ok, that makes sense. But before trying that we should probably try to
>>> call just yield() instead of schedule()? I remember someone from our
>>> team here at AMD did this for Xen a while ago and already had pretty
>>> good results with that. Xen has a completly other scheduler but maybe
>>> its worth trying?
>>>
>>>        
>> yield() is a no-op in CFS.
>>      
> Hmm, true. At least when kernel.sched_compat_yield == 0, which it is on my
> distro.
> If the scheduler would give us something like a real_yield() function
> which asumes kernel.sched_compat_yield = 1 might help. At least its
> better than sleeping for some random amount of time.
>
>    

Depends.  If it's a global yield(), yes.  If it's a local yield() that 
doesn't rebalance the runqueues we might be left with the spinning task 
re-running.

Also, if yield means "give up the remainder of our timeslice", then we 
potentially end up sleeping a much longer random amount of time.  If we 
yield to another vcpu in the same guest we might not care, but if we 
yield to some other guest we're seriously penalizing ourselves.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-27 14:18             ` Avi Kivity
@ 2009-09-27 14:53               ` Joerg Roedel
  2009-09-29 16:46                 ` Avi Kivity
  0 siblings, 1 reply; 22+ messages in thread
From: Joerg Roedel @ 2009-09-27 14:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Zhai, Edwin, Ingo Molnar, kvm

On Sun, Sep 27, 2009 at 04:18:00PM +0200, Avi Kivity wrote:
> On 09/27/2009 04:07 PM, Joerg Roedel wrote:
>> On Sun, Sep 27, 2009 at 03:47:55PM +0200, Avi Kivity wrote:
>>    
>>> On 09/27/2009 03:46 PM, Joerg Roedel wrote:
>>>      
>>>>        
>>>>> We can't find exactly which vcpu, but we can:
>>>>>
>>>>> - rule out threads that are not vcpus for this guest
>>>>> - rule out threads that are already running
>>>>>
>>>>> A major problem with sleep() is that it effectively reduces the vm
>>>>> priority relative to guests that don't have spinlock contention.  By
>>>>> selecting a random nonrunnable vcpu belonging to this guest, we at least
>>>>> preserve the guest's timeslice.
>>>>>
>>>>>          
>>>> Ok, that makes sense. But before trying that we should probably try to
>>>> call just yield() instead of schedule()? I remember someone from our
>>>> team here at AMD did this for Xen a while ago and already had pretty
>>>> good results with that. Xen has a completly other scheduler but maybe
>>>> its worth trying?
>>>>
>>>>        
>>> yield() is a no-op in CFS.
>>>      
>> Hmm, true. At least when kernel.sched_compat_yield == 0, which it is on my
>> distro.
>> If the scheduler would give us something like a real_yield() function
>> which asumes kernel.sched_compat_yield = 1 might help. At least its
>> better than sleeping for some random amount of time.
>>
>>    
>
> Depends.  If it's a global yield(), yes.  If it's a local yield() that  
> doesn't rebalance the runqueues we might be left with the spinning task  
> re-running.

Having only one runnable task on each cpu is unlikely in a situation of high
vcpu overcommit (where pause filtering matters).

> Also, if yield means "give up the reminder of our timeslice", then we  
> potentially end up sleeping a much longer random amount of time.  If we  
> yield to another vcpu in the same guest we might not care, but if we  
> yield to some other guest we're seriously penalizing ourselves.

I agree that a directed yield with a possible rebalance would be good to
have, but it is very intrusive to the scheduler code, and I think we
should at least try whether this simpler approach already gives us good
results.

	Joerg



* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-27  8:28     ` Avi Kivity
@ 2009-09-28  9:33       ` Zhai, Edwin
  2009-09-29 12:05         ` Zhai, Edwin
  2009-09-29 13:34         ` Avi Kivity
  0 siblings, 2 replies; 22+ messages in thread
From: Zhai, Edwin @ 2009-09-28  9:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Zhai, Edwin

[-- Attachment #1: Type: text/plain, Size: 2039 bytes --]


Avi Kivity wrote:
> +#define KVM_VMX_DEFAULT_PLE_GAP    41
> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
> +module_param(ple_gap, int, S_IRUGO);
> +
> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
> +module_param(ple_window, int, S_IRUGO);
>    
>   
>
> Shouldn't be __read_mostly since they're read very rarely (__read_mostly 
> should be for variables that are very often read, and rarely written).
>   

In general, they are read-only, except that an experienced user may try 
different parameters for perf tuning.

> I'm not even sure they should be parameters.
>   

We need different parameters for tuning across different spinlock 
implementations, guest OSes, and workloads. It's similar to enable_ept.

>   
>>   /*
>> + * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
>> + * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
>> + */
>> +static int handle_pause(struct kvm_vcpu *vcpu,
>> +				struct kvm_run *kvm_run)
>> +{
>> +	ktime_t expires;
>> +	skip_emulated_instruction(vcpu);
>> +
>> +	/* Sleep for 1 msec, and hope lock-holder got scheduled */
>> +	expires = ktime_add_ns(ktime_get(), 1000000UL);
>>    
>>     
>
> I think this should be much lower, 50-100us.  Maybe this should be a 
> parameter.  With 1ms we're losing significant cpu time if the congestion 
> clears.
>   

I have made it a parameter with a default value of 100 us.

>   
>> +	set_current_state(TASK_INTERRUPTIBLE);
>> +	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>> +
>>    
>>     
>
> Please add a tracepoint for this (since it can cause significant change 
> in behaviour), 

Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE 
vmexit from other vmexits.

> and move the logic to kvm_main.c.  It will be reused by 
> the AMD implementation, possibly my software spinlock detector, 
> paravirtualized spinlocks, and hopefully other architectures.
>   

Done.
>   
>> +	return 1;
>> +}
>> +
>> +/*
>>    
>>     
>
>   

[-- Attachment #2: kvm_ple_hrtimer_v2.patch --]
[-- Type: application/octet-stream, Size: 7678 bytes --]

KVM:VMX: Add support for Pause-Loop Exiting

New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop.

If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the 1st PAUSE in this loop) and triggers a VM exit if the total time
exceeds PLE_Window.
* Refer to SDM volume 3b, sections 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out while holding a spinlock and other VPs spinning on the same lock
are sched-in, wasting CPU time.

Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2X LP over-commitment we can get a +2% perf
improvement for kernel build (even more perf gain with more LPs).

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 272514c..2b49454 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -56,6 +56,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
@@ -144,6 +145,8 @@ enum vmcs_field {
 	VM_ENTRY_INSTRUCTION_LEN        = 0x0000401a,
 	TPR_THRESHOLD                   = 0x0000401c,
 	SECONDARY_VM_EXEC_CONTROL       = 0x0000401e,
+	PLE_GAP                         = 0x00004020,
+	PLE_WINDOW                      = 0x00004022,
 	VM_INSTRUCTION_ERROR            = 0x00004400,
 	VM_EXIT_REASON                  = 0x00004402,
 	VM_EXIT_INTR_INFO               = 0x00004404,
@@ -248,6 +251,7 @@ enum vmcs_field {
 #define EXIT_REASON_MSR_READ            31
 #define EXIT_REASON_MSR_WRITE           32
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_PAUSE_INSTRUCTION   40
 #define EXIT_REASON_MCE_DURING_VMENTRY	 41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3fe0d42..ed40386 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,6 +61,31 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+/*
+ * These 2 parameters are used to config the controls for Pause-Loop Exiting:
+ * ple_gap:    upper bound on the amount of time between two successive
+ *             executions of PAUSE in a loop. Also indicates whether PLE is enabled.
+ *             According to tests, this time is usually smaller than 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ *             in a PAUSE loop. Tests indicate that most spinlocks are held for
+ *             less than 2^12 cycles.
+ * Time is measured on a counter that runs at the same rate as the TSC;
+ * refer to SDM volume 3b, sections 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP    41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);
+
+/*
+ * ple_sleep controls how long (in us) the VCPU sleeps upon a PLE vmexit
+ */
+static int __read_mostly ple_sleep = 100;
+module_param(ple_sleep, int, S_IRUGO);
+
 struct vmcs {
 	u32 revision_id;
 	u32 abort;
@@ -319,6 +344,12 @@ static inline int cpu_has_vmx_unrestricted_guest(void)
 		SECONDARY_EXEC_UNRESTRICTED_GUEST;
 }
 
+static inline int cpu_has_vmx_ple(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
 	return flexpriority_enabled &&
@@ -1256,7 +1287,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_WBINVD_EXITING |
 			SECONDARY_EXEC_ENABLE_VPID |
 			SECONDARY_EXEC_ENABLE_EPT |
-			SECONDARY_EXEC_UNRESTRICTED_GUEST;
+			SECONDARY_EXEC_UNRESTRICTED_GUEST |
+			SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -1400,6 +1432,9 @@ static __init int hardware_setup(void)
 	if (enable_ept && !cpu_has_vmx_ept_2m_page())
 		kvm_disable_largepages();
 
+	if (!cpu_has_vmx_ple())
+		ple_gap = 0;
+
 	return alloc_kvm_area();
 }
 
@@ -2312,9 +2347,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 			exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
 		if (!enable_unrestricted_guest)
 			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+		if (!ple_gap)
+			exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
 
+	if (ple_gap) {
+		vmcs_write32(PLE_GAP, ple_gap);
+		vmcs_write32(PLE_WINDOW, ple_window);
+	}
+
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, !!bypass_guest_pf);
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
 	vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */
@@ -3362,6 +3404,19 @@ out:
 }
 
 /*
+ * Indicates a busy-waiting vcpu in a spinlock. We do not enable plain PAUSE
+ * exiting, so we only get here on CPUs with Pause-Loop Exiting.
+ */
+static int handle_pause(struct kvm_vcpu *vcpu,
+				struct kvm_run *kvm_run)
+{
+	skip_emulated_instruction(vcpu);
+	kvm_vcpu_sleep(vcpu, ple_sleep);
+
+	return 1;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.
@@ -3397,6 +3452,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_MCE_DURING_VMENTRY]      = handle_machine_check,
 	[EXIT_REASON_EPT_VIOLATION]	      = handle_ept_violation,
 	[EXIT_REASON_EPT_MISCONFIG]           = handle_ept_misconfig,
+	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
 };
 
 static const int kvm_vmx_max_exit_handlers =
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0bf9ee9..3723d62 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -287,6 +287,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_vcpu_sleep(struct kvm_vcpu *vcpu, unsigned int sleep_time);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e27b7a9..ff006ce 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1095,6 +1095,17 @@ void kvm_resched(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_resched);
 
+void kvm_vcpu_sleep(struct kvm_vcpu *vcpu, unsigned int sleep_time)
+{
+	/* Sleep for the requested time (us), hoping the lock holder gets scheduled */
+	ktime_t expires;
+
+	expires = ktime_add_ns(ktime_get(), 1000UL * sleep_time);
+	set_current_state(TASK_INTERRUPTIBLE);
+	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_sleep);
+
 static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct kvm_vcpu *vcpu = vma->vm_file->private_data;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-28  9:33       ` Zhai, Edwin
@ 2009-09-29 12:05         ` Zhai, Edwin
  2009-09-29 13:34         ` Avi Kivity
  1 sibling, 0 replies; 22+ messages in thread
From: Zhai, Edwin @ 2009-09-29 12:05 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Zhai, Edwin

Avi,
Any comments on this new patch?
Thanks,


Zhai, Edwin wrote:
> Avi Kivity wrote:
>   
>> +#define KVM_VMX_DEFAULT_PLE_GAP    41
>> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
>> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
>> +module_param(ple_gap, int, S_IRUGO);
>> +
>> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
>> +module_param(ple_window, int, S_IRUGO);
>>    
>>   
>>
>> Shouldn't be __read_mostly since they're read very rarely (__read_mostly 
>> should be for variables that are very often read, and rarely written).
>>   
>>     
>
> In general, they are read only except that experienced user may try 
> different parameter for perf tuning.
>
>   
>> I'm not even sure they should be parameters.
>>   
>>     
>
> For different spinlock in different OS, and for different workloads, we 
> need different parameter for tuning. It's similar as the enable_ept.
>
>   
>>   
>>     
>>>   /*
>>> + * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
>>> + * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
>>> + */
>>> +static int handle_pause(struct kvm_vcpu *vcpu,
>>> +				struct kvm_run *kvm_run)
>>> +{
>>> +	ktime_t expires;
>>> +	skip_emulated_instruction(vcpu);
>>> +
>>> +	/* Sleep for 1 msec, and hope lock-holder got scheduled */
>>> +	expires = ktime_add_ns(ktime_get(), 1000000UL);
>>>    
>>>     
>>>       
>> I think this should be much lower, 50-100us.  Maybe this should be a 
>> parameter.  With 1ms we losing significant cpu time if the congestion 
>> clears.
>>   
>>     
>
> I have made it a parameter with default value of 100 us.
>
>   
>>   
>>     
>>> +	set_current_state(TASK_INTERRUPTIBLE);
>>> +	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>>> +
>>>    
>>>     
>>>       
>> Please add a tracepoint for this (since it can cause significant change 
>> in behaviour), 
>>     
>
> Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE 
> vmexit from other vmexits.
>
>   
>> and move the logic to kvm_main.c.  It will be reused by 
>> the AMD implementation, possibly my software spinlock detector, 
>> paravirtualized spinlocks, and hopefully other architectures.
>>   
>>     
>
> Done.
>   
>>   
>>     
>>> +	return 1;
>>> +}
>>> +
>>> +/*
>>>    
>>>     
>>>       
>>   
>>     

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-28  9:33       ` Zhai, Edwin
  2009-09-29 12:05         ` Zhai, Edwin
@ 2009-09-29 13:34         ` Avi Kivity
  2009-09-30  1:01           ` Zhai, Edwin
  1 sibling, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2009-09-29 13:34 UTC (permalink / raw)
  To: Zhai, Edwin; +Cc: kvm

On 09/28/2009 11:33 AM, Zhai, Edwin wrote:
>
> Avi Kivity wrote:
>> +#define KVM_VMX_DEFAULT_PLE_GAP    41
>> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
>> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
>> +module_param(ple_gap, int, S_IRUGO);
>> +
>> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
>> +module_param(ple_window, int, S_IRUGO);
>>
>> Shouldn't be __read_mostly since they're read very rarely 
>> (__read_mostly should be for variables that are very often read, and 
>> rarely written).
>
> In general, they are read only except that experienced user may try 
> different parameter for perf tuning.


__read_mostly doesn't just mean it's read mostly.  It also means it's 
read often.  Otherwise it's just wasting space in hot cachelines.

>
>> I'm not even sure they should be parameters.
>
> For different spinlock in different OS, and for different workloads, 
> we need different parameter for tuning. It's similar as the enable_ept.

No, global parameters don't work for tuning workloads and guests since 
they cannot be modified on a per-guest basis.  enable_ept is only useful 
for debugging and testing.

>
>>> +    set_current_state(TASK_INTERRUPTIBLE);
>>> +    schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>>> +
>>
>> Please add a tracepoint for this (since it can cause significant 
>> change in behaviour), 
>
> Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE 
> vmexit from other vmexits.

Right.  I thought of the software spinlock detector, but that's another 
problem.

I think you can drop the sleep_time parameter, it can be part of the 
function.  Also kvm_vcpu_sleep() is confusing, we also sleep on halt.  
Please call it kvm_vcpu_on_spin() or something (since that's what the 
guest is doing).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-27 14:53               ` Joerg Roedel
@ 2009-09-29 16:46                 ` Avi Kivity
  0 siblings, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2009-09-29 16:46 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Zhai, Edwin, Ingo Molnar, kvm

On 09/27/2009 04:53 PM, Joerg Roedel wrote:
>
>    
>> Depends.  If it's a global yield(), yes.  If it's a local yield() that
>> doesn't rebalance the runqueues we might be left with the spinning task
>> re-running.
>>      
> Only one runable task on each cpu is unlikely in a situation of high
> vcpu overcommit (where pause filtering matters).
>
>    

I think even 2:1 overcommit can degrade performance terribly.

>> Also, if yield means "give up the reminder of our timeslice", then we
>> potentially end up sleeping a much longer random amount of time.  If we
>> yield to another vcpu in the same guest we might not care, but if we
>> yield to some other guest we're seriously penalizing ourselves.
>>      
> I agree that a directed yield with possible rebalance would be good to
> have, but this is very intrusive to the scheduler code and I think we
> should at least try if this simpler approach already gives us good
> results.
>    

No objection to trying.  I'd like to see hrtimer sleep as a baseline 
since it doesn't require any core changes, and we can play with it as we 
add more core infrastructure:

- not sleeping if all vcpus are running
- true yield() instead of sleep
- directed yield
- cross cpu directed yield

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-29 13:34         ` Avi Kivity
@ 2009-09-30  1:01           ` Zhai, Edwin
  2009-09-30  6:28             ` Avi Kivity
  2009-09-30 16:22             ` Marcelo Tosatti
  0 siblings, 2 replies; 22+ messages in thread
From: Zhai, Edwin @ 2009-09-30  1:01 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Zhai, Edwin

[-- Attachment #1: Type: text/plain, Size: 2175 bytes --]

Avi,
I modified it according to your comments. The only thing I want to keep is 
the module params ple_gap/window.  Although they are not per-guest, they 
can be used to find the right values, and to disable PLE for debugging.

Thanks,


Avi Kivity wrote:
> On 09/28/2009 11:33 AM, Zhai, Edwin wrote:
>   
>> Avi Kivity wrote:
>>     
>>> +#define KVM_VMX_DEFAULT_PLE_GAP    41
>>> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
>>> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
>>> +module_param(ple_gap, int, S_IRUGO);
>>> +
>>> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
>>> +module_param(ple_window, int, S_IRUGO);
>>>
>>> Shouldn't be __read_mostly since they're read very rarely 
>>> (__read_mostly should be for variables that are very often read, and 
>>> rarely written).
>>>       
>> In general, they are read only except that experienced user may try 
>> different parameter for perf tuning.
>>     
>
>
> __read_mostly doesn't just mean it's read mostly.  It also means it's 
> read often.  Otherwise it's just wasting space in hot cachelines.
>
>   
>>> I'm not even sure they should be parameters.
>>>       
>> For different spinlock in different OS, and for different workloads, 
>> we need different parameter for tuning. It's similar as the enable_ept.
>>     
>
> No, global parameters don't work for tuning workloads and guests since 
> they cannot be modified on a per-guest basis.  enable_ept is only useful 
> for debugging and testing.
>
>   
>>>> +    set_current_state(TASK_INTERRUPTIBLE);
>>>> +    schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>>>> +
>>>>         
>>> Please add a tracepoint for this (since it can cause significant 
>>> change in behaviour), 
>>>       
>> Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE 
>> vmexit from other vmexits.
>>     
>
> Right.  I thought of the software spinlock detector, but that's another 
> problem.
>
> I think you can drop the sleep_time parameter, it can be part of the 
> function.  Also kvm_vcpu_sleep() is confusing, we also sleep on halt.  
> Please call it kvm_vcpu_on_spin() or something (since that's what the 
> guest is doing).
>
>   

[-- Attachment #2: kvm_ple_hrtimer_v3.patch --]
[-- Type: application/octet-stream, Size: 7412 bytes --]

KVM:VMX: Add support for Pause-Loop Exiting

New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop.

If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the 1st PAUSE in this loop) and triggers a VM exit if the total time
exceeds PLE_Window.
* Refer to SDM volume 3b, sections 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out while holding a spinlock and other VPs spinning on the same lock
are sched-in, wasting CPU time.

Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2X LP over-commitment we can get a +2% perf
improvement for kernel build (even more perf gain with more LPs).

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 272514c..2b49454 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -56,6 +56,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
@@ -144,6 +145,8 @@ enum vmcs_field {
 	VM_ENTRY_INSTRUCTION_LEN        = 0x0000401a,
 	TPR_THRESHOLD                   = 0x0000401c,
 	SECONDARY_VM_EXEC_CONTROL       = 0x0000401e,
+	PLE_GAP                         = 0x00004020,
+	PLE_WINDOW                      = 0x00004022,
 	VM_INSTRUCTION_ERROR            = 0x00004400,
 	VM_EXIT_REASON                  = 0x00004402,
 	VM_EXIT_INTR_INFO               = 0x00004404,
@@ -248,6 +251,7 @@ enum vmcs_field {
 #define EXIT_REASON_MSR_READ            31
 #define EXIT_REASON_MSR_WRITE           32
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_PAUSE_INSTRUCTION   40
 #define EXIT_REASON_MCE_DURING_VMENTRY	 41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3fe0d42..7b191bb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,6 +61,25 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+/*
+ * These 2 parameters are used to config the controls for Pause-Loop Exiting:
+ * ple_gap:    upper bound on the amount of time between two successive
+ *             executions of PAUSE in a loop. Also indicates whether PLE is enabled.
+ *             According to tests, this time is usually smaller than 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ *             in a PAUSE loop. Tests indicate that most spinlocks are held for
+ *             less than 2^12 cycles.
+ * Time is measured on a counter that runs at the same rate as the TSC;
+ * refer to SDM volume 3b, sections 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP    41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);
+
 struct vmcs {
 	u32 revision_id;
 	u32 abort;
@@ -319,6 +338,12 @@ static inline int cpu_has_vmx_unrestricted_guest(void)
 		SECONDARY_EXEC_UNRESTRICTED_GUEST;
 }
 
+static inline int cpu_has_vmx_ple(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
 	return flexpriority_enabled &&
@@ -1256,7 +1281,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_WBINVD_EXITING |
 			SECONDARY_EXEC_ENABLE_VPID |
 			SECONDARY_EXEC_ENABLE_EPT |
-			SECONDARY_EXEC_UNRESTRICTED_GUEST;
+			SECONDARY_EXEC_UNRESTRICTED_GUEST |
+			SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -1400,6 +1426,9 @@ static __init int hardware_setup(void)
 	if (enable_ept && !cpu_has_vmx_ept_2m_page())
 		kvm_disable_largepages();
 
+	if (!cpu_has_vmx_ple())
+		ple_gap = 0;
+
 	return alloc_kvm_area();
 }
 
@@ -2312,9 +2341,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 			exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
 		if (!enable_unrestricted_guest)
 			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+		if (!ple_gap)
+			exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
 
+	if (ple_gap) {
+		vmcs_write32(PLE_GAP, ple_gap);
+		vmcs_write32(PLE_WINDOW, ple_window);
+	}
+
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, !!bypass_guest_pf);
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
 	vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */
@@ -3362,6 +3398,19 @@ out:
 }
 
 /*
+ * Indicates a busy-waiting vcpu in a spinlock. We do not enable plain PAUSE
+ * exiting, so we only get here on CPUs with Pause-Loop Exiting.
+ */
+static int handle_pause(struct kvm_vcpu *vcpu,
+				struct kvm_run *kvm_run)
+{
+	skip_emulated_instruction(vcpu);
+	kvm_vcpu_on_spin(vcpu);
+
+	return 1;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.
@@ -3397,6 +3446,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_MCE_DURING_VMENTRY]      = handle_machine_check,
 	[EXIT_REASON_EPT_VIOLATION]	      = handle_ept_violation,
 	[EXIT_REASON_EPT_MISCONFIG]           = handle_ept_misconfig,
+	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
 };
 
 static const int kvm_vmx_max_exit_handlers =
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0bf9ee9..0c86b1d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -287,6 +287,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e27b7a9..f36519b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1095,6 +1095,17 @@ void kvm_resched(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_resched);
 
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
+{
+	/* Sleep for 100 us, hoping the lock holder gets scheduled */
+	ktime_t expires;
+
+	expires = ktime_add_ns(ktime_get(), 100000UL);
+	set_current_state(TASK_INTERRUPTIBLE);
+	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
+
 static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct kvm_vcpu *vcpu = vma->vm_file->private_data;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-30  1:01           ` Zhai, Edwin
@ 2009-09-30  6:28             ` Avi Kivity
  2009-09-30 16:22             ` Marcelo Tosatti
  1 sibling, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2009-09-30  6:28 UTC (permalink / raw)
  To: Zhai, Edwin; +Cc: kvm, Marcelo Tosatti

On 09/30/2009 03:01 AM, Zhai, Edwin wrote:
> Avi,
> I modify it according your comments. The only thing I want to keep is 
> the module param ple_gap/window.  Although they are not per-guest, 
> they can be used to find the right value, and disable PLE for debug 
> purpose.

Fair enough, ACK.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-30  1:01           ` Zhai, Edwin
  2009-09-30  6:28             ` Avi Kivity
@ 2009-09-30 16:22             ` Marcelo Tosatti
  2009-10-02 18:28               ` Marcelo Tosatti
  1 sibling, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2009-09-30 16:22 UTC (permalink / raw)
  To: Zhai, Edwin, Mark Langsdorf; +Cc: Avi Kivity, kvm

On Wed, Sep 30, 2009 at 09:01:51AM +0800, Zhai, Edwin wrote:
> Avi,
> I modify it according your comments. The only thing I want to keep is  
> the module param ple_gap/window.  Although they are not per-guest, they  
> can be used to find the right value, and disable PLE for debug purpose.
>
> Thanks,
>
>
> Avi Kivity wrote:
>> On 09/28/2009 11:33 AM, Zhai, Edwin wrote:
>>   
>>> Avi Kivity wrote:
>>>     
>>>> +#define KVM_VMX_DEFAULT_PLE_GAP    41
>>>> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
>>>> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
>>>> +module_param(ple_gap, int, S_IRUGO);
>>>> +
>>>> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
>>>> +module_param(ple_window, int, S_IRUGO);
>>>>
>>>> Shouldn't be __read_mostly since they're read very rarely  
>>>> (__read_mostly should be for variables that are very often read, 
>>>> and rarely written).
>>>>       
>>> In general, they are read only except that experienced user may try  
>>> different parameter for perf tuning.
>>>     
>>
>>
>> __read_mostly doesn't just mean it's read mostly.  It also means it's  
>> read often.  Otherwise it's just wasting space in hot cachelines.
>>
>>   
>>>> I'm not even sure they should be parameters.
>>>>       
>>> For different spinlock in different OS, and for different workloads,  
>>> we need different parameter for tuning. It's similar as the 
>>> enable_ept.
>>>     
>>
>> No, global parameters don't work for tuning workloads and guests since  
>> they cannot be modified on a per-guest basis.  enable_ept is only 
>> useful for debugging and testing.
>>
>>   
>>>>> +    set_current_state(TASK_INTERRUPTIBLE);
>>>>> +    schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>>>>> +
>>>>>         
>>>> Please add a tracepoint for this (since it can cause significant  
>>>> change in behaviour),       
>>> Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE  
>>> vmexit from other vmexits.
>>>     
>>
>> Right.  I thought of the software spinlock detector, but that's another 
>> problem.
>>
>> I think you can drop the sleep_time parameter, it can be part of the  
>> function.  Also kvm_vcpu_sleep() is confusing, we also sleep on halt.   
>> Please call it kvm_vcpu_on_spin() or something (since that's what the  
>> guest is doing).

kvm_vcpu_on_spin() should add the vcpu to vcpu->wq (so a new pending
interrupt wakes it up immediately).

Do you (and/or Mark) have any numbers for non-vcpu-overcommitted guests?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-09-30 16:22             ` Marcelo Tosatti
@ 2009-10-02 18:28               ` Marcelo Tosatti
  2009-10-09 10:03                 ` Zhai, Edwin
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2009-10-02 18:28 UTC (permalink / raw)
  To: Zhai, Edwin, Mark Langsdorf; +Cc: Avi Kivity, kvm

On Wed, Sep 30, 2009 at 01:22:49PM -0300, Marcelo Tosatti wrote:
> On Wed, Sep 30, 2009 at 09:01:51AM +0800, Zhai, Edwin wrote:
> > Avi,
> > I modify it according your comments. The only thing I want to keep is  
> > the module param ple_gap/window.  Although they are not per-guest, they  
> > can be used to find the right value, and disable PLE for debug purpose.
> >
> > Thanks,
> >
> >
> > Avi Kivity wrote:
> >> On 09/28/2009 11:33 AM, Zhai, Edwin wrote:
> >>   
> >>> Avi Kivity wrote:
> >>>     
> >>>> +#define KVM_VMX_DEFAULT_PLE_GAP    41
> >>>> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
> >>>> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
> >>>> +module_param(ple_gap, int, S_IRUGO);
> >>>> +
> >>>> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
> >>>> +module_param(ple_window, int, S_IRUGO);
> >>>>
> >>>> Shouldn't be __read_mostly since they're read very rarely  
> >>>> (__read_mostly should be for variables that are very often read, 
> >>>> and rarely written).
> >>>>       
> >>> In general, they are read-only, except that an experienced user may try
> >>> different parameters for perf tuning.
> >>>     
> >>
> >>
> >> __read_mostly doesn't just mean it's read mostly.  It also means it's  
> >> read often.  Otherwise it's just wasting space in hot cachelines.
> >>
> >>   
> >>>> I'm not even sure they should be parameters.
> >>>>       
> >>> For different spinlocks in different OSes, and for different workloads,
> >>> we need different parameters for tuning. It's similar to
> >>> enable_ept.
> >>>     
> >>
> >> No, global parameters don't work for tuning workloads and guests since  
> >> they cannot be modified on a per-guest basis.  enable_ept is only 
> >> useful for debugging and testing.
> >>
> >>   
> >>>>> +    set_current_state(TASK_INTERRUPTIBLE);
> >>>>> +    schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
> >>>>> +
> >>>>>         
> >>>> Please add a tracepoint for this (since it can cause significant  
> >>>> change in behaviour),       
> >>> Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE  
> >>> vmexit from other vmexits.
> >>>     
> >>
> >> Right.  I thought of the software spinlock detector, but that's another 
> >> problem.
> >>
> >> I think you can drop the sleep_time parameter, it can be part of the  
> >> function.  Also kvm_vcpu_sleep() is confusing, we also sleep on halt.   
> >> Please call it kvm_vcpu_on_spin() or something (since that's what the  
> >> guest is doing).
> 
> kvm_vcpu_on_spin() should add the vcpu to vcpu->wq (so a new pending
> interrupt wakes it up immediately).

Updated version (also please send it separately from the vmx.c patch):

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 894a56e..43125dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -231,6 +231,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4d0dd39..e788d70 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1479,6 +1479,21 @@ void kvm_resched(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_resched);
 
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
+{
+	ktime_t expires;
+	DEFINE_WAIT(wait);
+
+	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+
+	/* Sleep for 100 us, and hope lock-holder got scheduled */
+	expires = ktime_add_ns(ktime_get(), 100000UL);
+	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+
+	finish_wait(&vcpu->wq, &wait);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
+
 static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct kvm_vcpu *vcpu = vma->vm_file->private_data;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-10-02 18:28               ` Marcelo Tosatti
@ 2009-10-09 10:03                 ` Zhai, Edwin
  2009-10-11 15:34                   ` Avi Kivity
  2009-10-12 19:13                   ` Marcelo Tosatti
  0 siblings, 2 replies; 22+ messages in thread
From: Zhai, Edwin @ 2009-10-09 10:03 UTC (permalink / raw)
  To: Marcelo Tosatti, Avi Kivity; +Cc: Mark Langsdorf, kvm, Zhai, Edwin

[-- Attachment #1: Type: text/plain, Size: 4416 bytes --]

Tosatti,
See attached patch.

Avi,
Could you please check it in if there are no other comments.
Thanks,


Marcelo Tosatti wrote:
> On Wed, Sep 30, 2009 at 01:22:49PM -0300, Marcelo Tosatti wrote:
>   
>> On Wed, Sep 30, 2009 at 09:01:51AM +0800, Zhai, Edwin wrote:
>>     
>>> Avi,
>>> I modified it according to your comments. The only thing I want to keep is
>>> the module param ple_gap/window.  Although they are not per-guest, they
>>> can be used to find the right value, and to disable PLE for debugging purposes.
>>>
>>> Thanks,
>>>
>>>
>>> Avi Kivity wrote:
>>>       
>>>> On 09/28/2009 11:33 AM, Zhai, Edwin wrote:
>>>>   
>>>>         
>>>>> Avi Kivity wrote:
>>>>>     
>>>>>           
>>>>>> +#define KVM_VMX_DEFAULT_PLE_GAP    41
>>>>>> +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
>>>>>> +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
>>>>>> +module_param(ple_gap, int, S_IRUGO);
>>>>>> +
>>>>>> +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
>>>>>> +module_param(ple_window, int, S_IRUGO);
>>>>>>
>>>>>> Shouldn't be __read_mostly since they're read very rarely  
>>>>>> (__read_mostly should be for variables that are very often read, 
>>>>>> and rarely written).
>>>>>>       
>>>>>>             
>>>>> In general, they are read-only, except that an experienced user may try
>>>>> different parameters for perf tuning.
>>>>>     
>>>>>           
>>>> __read_mostly doesn't just mean it's read mostly.  It also means it's  
>>>> read often.  Otherwise it's just wasting space in hot cachelines.
>>>>
>>>>   
>>>>         
>>>>>> I'm not even sure they should be parameters.
>>>>>>       
>>>>>>             
>>>>> For different spinlocks in different OSes, and for different workloads,
>>>>> we need different parameters for tuning. It's similar to
>>>>> enable_ept.
>>>>>     
>>>>>           
>>>> No, global parameters don't work for tuning workloads and guests since  
>>>> they cannot be modified on a per-guest basis.  enable_ept is only 
>>>> useful for debugging and testing.
>>>>
>>>>   
>>>>         
>>>>>>> +    set_current_state(TASK_INTERRUPTIBLE);
>>>>>>> +    schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
>>>>>>> +
>>>>>>>         
>>>>>>>               
>>>>>> Please add a tracepoint for this (since it can cause significant  
>>>>>> change in behaviour),       
>>>>>>             
>>>>> Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE  
>>>>> vmexit from other vmexits.
>>>>>     
>>>>>           
>>>> Right.  I thought of the software spinlock detector, but that's another 
>>>> problem.
>>>>
>>>> I think you can drop the sleep_time parameter, it can be part of the  
>>>> function.  Also kvm_vcpu_sleep() is confusing, we also sleep on halt.   
>>>> Please call it kvm_vcpu_on_spin() or something (since that's what the  
>>>> guest is doing).
>>>>         
>> kvm_vcpu_on_spin() should add the vcpu to vcpu->wq (so a new pending
>> interrupt wakes it up immediately).
>>     
>
> Updated version (also please send it separately from the vmx.c patch):
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 894a56e..43125dc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -231,6 +231,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
>  void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
>  
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
> +void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
>  void kvm_resched(struct kvm_vcpu *vcpu);
>  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4d0dd39..e788d70 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1479,6 +1479,21 @@ void kvm_resched(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_resched);
>  
> +void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
> +{
> +	ktime_t expires;
> +	DEFINE_WAIT(wait);
> +
> +	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> +
> +	/* Sleep for 100 us, and hope lock-holder got scheduled */
> +	expires = ktime_add_ns(ktime_get(), 100000UL);
> +	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
> +
> +	finish_wait(&vcpu->wq, &wait);
> +}
> +EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
> +
>  static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  {
>  	struct kvm_vcpu *vcpu = vma->vm_file->private_data;
>
>   

-- 
best rgds,
edwin


[-- Attachment #2: kvm_ple_hrtimer_1.patch --]
[-- Type: text/plain, Size: 6213 bytes --]

KVM:VMX: Add support for Pause-Loop Exiting

New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop

If the time between this execution of PAUSE and the previous one exceeds the
PLE_Gap, the processor considers this PAUSE to begin a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the 1st PAUSE in this loop) and triggers a VM exit if the total time
exceeds the PLE_Window.
* Refer to SDM volume 3b, sections 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out after holding a spinlock and other VPs spinning on the same lock
are sched-in, wasting CPU time.

Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2X LP over-commitment we can get a +2% perf
improvement for a kernel build (even more perf gain with more LPs).

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 272514c..2b49454 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -56,6 +56,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID              0x00000020
 #define SECONDARY_EXEC_WBINVD_EXITING		0x00000040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST	0x00000080
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING	0x00000400
 
 
 #define PIN_BASED_EXT_INTR_MASK                 0x00000001
@@ -144,6 +145,8 @@ enum vmcs_field {
 	VM_ENTRY_INSTRUCTION_LEN        = 0x0000401a,
 	TPR_THRESHOLD                   = 0x0000401c,
 	SECONDARY_VM_EXEC_CONTROL       = 0x0000401e,
+	PLE_GAP                         = 0x00004020,
+	PLE_WINDOW                      = 0x00004022,
 	VM_INSTRUCTION_ERROR            = 0x00004400,
 	VM_EXIT_REASON                  = 0x00004402,
 	VM_EXIT_INTR_INFO               = 0x00004404,
@@ -248,6 +251,7 @@ enum vmcs_field {
 #define EXIT_REASON_MSR_READ            31
 #define EXIT_REASON_MSR_WRITE           32
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_PAUSE_INSTRUCTION   40
 #define EXIT_REASON_MCE_DURING_VMENTRY	 41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS         44
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 70020e5..93274c6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,6 +61,25 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+/*
+ * These 2 parameters are used to configure the controls for Pause-Loop Exiting:
+ * ple_gap:    upper bound on the amount of time between two successive
+ *             executions of PAUSE in a loop; also indicates whether PLE is
+ *             enabled. Tests show this time is usually smaller than 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ *             in a PAUSE loop. Tests indicate that most spinlocks are held for
+ *             less than 2^12 cycles.
+ * Time is measured based on a counter that runs at the same rate as the TSC,
+ * refer to SDM volume 3b, sections 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP    41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);
+
 struct vmcs {
 	u32 revision_id;
 	u32 abort;
@@ -319,6 +338,12 @@ static inline int cpu_has_vmx_unrestricted_guest(void)
 		SECONDARY_EXEC_UNRESTRICTED_GUEST;
 }
 
+static inline int cpu_has_vmx_ple(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
 	return flexpriority_enabled &&
@@ -1240,7 +1265,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 			SECONDARY_EXEC_WBINVD_EXITING |
 			SECONDARY_EXEC_ENABLE_VPID |
 			SECONDARY_EXEC_ENABLE_EPT |
-			SECONDARY_EXEC_UNRESTRICTED_GUEST;
+			SECONDARY_EXEC_UNRESTRICTED_GUEST |
+			SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -1386,6 +1412,9 @@ static __init int hardware_setup(void)
 	if (enable_ept && !cpu_has_vmx_ept_2m_page())
 		kvm_disable_largepages();
 
+	if (!cpu_has_vmx_ple())
+		ple_gap = 0;
+
 	return alloc_kvm_area();
 }
 
@@ -2298,9 +2327,16 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 			exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
 		if (!enable_unrestricted_guest)
 			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+		if (!ple_gap)
+			exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
 
+	if (ple_gap) {
+		vmcs_write32(PLE_GAP, ple_gap);
+		vmcs_write32(PLE_WINDOW, ple_window);
+	}
+
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, !!bypass_guest_pf);
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
 	vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */
@@ -3348,6 +3384,19 @@ out:
 }
 
 /*
+ * Indicates a vcpu busy-waiting on a spinlock. We do not enable plain PAUSE
+ * exiting, so we only get here on CPUs with Pause-Loop Exiting.
+ */
+static int handle_pause(struct kvm_vcpu *vcpu,
+				struct kvm_run *kvm_run)
+{
+	skip_emulated_instruction(vcpu);
+	kvm_vcpu_on_spin(vcpu);
+
+	return 1;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.
@@ -3383,6 +3432,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_MCE_DURING_VMENTRY]      = handle_machine_check,
 	[EXIT_REASON_EPT_VIOLATION]	      = handle_ept_violation,
 	[EXIT_REASON_EPT_MISCONFIG]           = handle_ept_misconfig,
+	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
 };
 
 static const int kvm_vmx_max_exit_handlers =

[-- Attachment #3: kvm_ple_hrtimer_2.patch --]
[-- Type: text/plain, Size: 1273 bytes --]

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b985a29..bd5a616 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -286,6 +286,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c0a929f..c4289c0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1108,6 +1108,21 @@ void kvm_resched(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_resched);
 
+void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
+{
+	ktime_t expires;
+	DEFINE_WAIT(wait);
+
+	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+
+	/* Sleep for 100 us, and hope lock-holder got scheduled */
+	expires = ktime_add_ns(ktime_get(), 100000UL);
+	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+
+	finish_wait(&vcpu->wq, &wait);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
+
 static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct kvm_vcpu *vcpu = vma->vm_file->private_data;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-10-09 10:03                 ` Zhai, Edwin
@ 2009-10-11 15:34                   ` Avi Kivity
  2009-10-12 19:13                   ` Marcelo Tosatti
  1 sibling, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2009-10-11 15:34 UTC (permalink / raw)
  To: Zhai, Edwin; +Cc: Marcelo Tosatti, Mark Langsdorf, kvm

On 10/09/2009 12:03 PM, Zhai, Edwin wrote:
> Tosatti,
> See attached patch.
>
> Avi,
> Could you please check it in if there are no other comments.

Looks reasonable to me (Marcelo commits this week).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
  2009-10-09 10:03                 ` Zhai, Edwin
  2009-10-11 15:34                   ` Avi Kivity
@ 2009-10-12 19:13                   ` Marcelo Tosatti
  1 sibling, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2009-10-12 19:13 UTC (permalink / raw)
  To: Zhai, Edwin; +Cc: Avi Kivity, Mark Langsdorf, kvm

On Fri, Oct 09, 2009 at 06:03:20PM +0800, Zhai, Edwin wrote:
> Tosatti,
> See attached patch.
>
> Avi,
> Could you please check it in if there are no other comments.
> Thanks,

Applied, thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-10-12 19:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-23 14:04 [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting Zhai, Edwin
2009-09-23 14:09 ` Avi Kivity
2009-09-25  1:11   ` Zhai, Edwin
2009-09-27  8:28     ` Avi Kivity
2009-09-28  9:33       ` Zhai, Edwin
2009-09-29 12:05         ` Zhai, Edwin
2009-09-29 13:34         ` Avi Kivity
2009-09-30  1:01           ` Zhai, Edwin
2009-09-30  6:28             ` Avi Kivity
2009-09-30 16:22             ` Marcelo Tosatti
2009-10-02 18:28               ` Marcelo Tosatti
2009-10-09 10:03                 ` Zhai, Edwin
2009-10-11 15:34                   ` Avi Kivity
2009-10-12 19:13                   ` Marcelo Tosatti
2009-09-25 20:43   ` Joerg Roedel
2009-09-27  8:31     ` Avi Kivity
2009-09-27 13:46       ` Joerg Roedel
2009-09-27 13:47         ` Avi Kivity
2009-09-27 14:07           ` Joerg Roedel
2009-09-27 14:18             ` Avi Kivity
2009-09-27 14:53               ` Joerg Roedel
2009-09-29 16:46                 ` Avi Kivity
