* [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
@ 2012-01-14 18:25 ` Raghavendra K T
  0 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-14 18:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Randy Dunlap, linux-doc, KVM,
	Konrad Rzeszutek Wilk, Glauber Costa, Jan Kiszka, Rik van Riel,
	Dave Jiang, H. Peter Anvin, Thomas Gleixner, X86,
	Marcelo Tosatti, Gleb Natapov, Avi Kivity, Alexander Graf,
	Stefano Stabellini, Paul Mackerras, Sedat Dilek, Ingo Molnar,
	LKML, Greg Kroah-Hartman, Virtualization, Rob Landley, Xen
  Cc: Srivatsa Vaddagiri, Peter Zijlstra, Raghavendra K T, Sasha Levin,
	Suzuki Poulose, Dave Hansen

The 5-patch series to follow this email extends the KVM hypervisor and Linux guests
running on KVM to support pv-ticket spinlocks, based on Xen's implementation.

One hypercall is introduced in the KVM hypervisor that allows a vcpu to kick
another vcpu out of halt state.
Blocking of a vcpu is done using halt() in the (lock_spinning) slowpath.
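
To make the protocol concrete, a minimal sketch of the guest side is below
(illustrative only -- the function names follow the series, but the per-cpu
bookkeeping and interrupt handling are elided):

	/* waiter side: called from the ticket-lock slowpath */
	static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
	{
		/* record (lock, want) so the unlocker knows whom to kick */
		while (ACCESS_ONCE(lock->tickets.head) != want)
			halt();	/* blocked here until KVM_HC_KICK_CPU wakes us */
	}

	/* unlocker side: kick the vcpu waiting on the next ticket */
	static void kvm_kick_cpu(int apicid)
	{
		kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
	}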

Changes in V4:
- rebased to 3.2.0 pre.
- use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching. (Avi)
- fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and related
  changes for the UNHALT path to make pv ticket spinlock migration friendly. (Avi, Marcelo)
- added documentation for CPUID, the hypercall (KVM_HC_KICK_CPU)
  and the capability (KVM_CAP_PVLOCK_KICK). (Avi)
- removed the unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcelo)
- changed the cumulative variable type (int ==> u32) in add_stat. (Konrad)
- removed the unneeded kvm_guest_init for the !CONFIG_KVM_GUEST case.

Changes in V3:
- rebased to 3.2-rc1
- use halt() instead of the wait-for-kick hypercall.
- modify the kick hypercall to wake up a halted vcpu.
- hook kvm_spinlock_init into the smp_prepare_cpus call (moved the call out of head##.c).
- fix the potential race when zero_stat is read.
- export debugfs_create_u32_array and add documentation to the API.
- use static inline and enum instead of the ADDSTAT macro.
- add barrier() after setting kick_vcpu.
- empty static inline function for kvm_spinlock_init.
- combine patches one and two to reduce overhead.
- make KVM_DEBUGFS depend on DEBUGFS.
- include the debugfs header unconditionally.

Changes in V2:
- rebased patches to -rc9
- synchronization-related changes based on Jeremy's changes
  (Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>) pointed out by
  Stephan Diestelhorst <stephan.diestelhorst@amd.com>
- enabled 32-bit guests
- split patches into two more chunks

 Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (5): 
  Add debugfs support to print u32-arrays in debugfs
  Add a hypercall to KVM hypervisor to support pv-ticketlocks
  Added configuration support to enable debug information for KVM Guests
  pv-ticketlocks support for linux guests running on KVM hypervisor
  Add documentation on Hypercalls and features used for PV spinlock
 
Test setup:
The BASE kernel is pre-3.2.0 plus Jeremy's following patches:
xadd (https://lkml.org/lkml/2011/10/4/328)
x86/ticketlock (https://lkml.org/lkml/2011/10/12/496)
Kernel for host/guest : 3.2.0 + Jeremy's xadd and pv spinlock patches as BASE
(Note: the locked-add change is not taken yet)

Results:
 The performance gain is mainly because of reduced busy-wait time.
 From the results we can see that the patched kernel's performance is similar to
 BASE when there is no lock contention. But once we start seeing more
 contention, the patched kernel outperforms BASE (non-PLE).
 On the PLE machine we do not see as great an improvement, because PLE
 already complements halt().

3 guests with 8 VCPUs and 4GB RAM each; 1 used for kernbench
(kernbench -f -H -M -o 20), the others for cpuhog (a shell script
spinning in a while-true loop on an instruction).

scenario A: unpinned

1x: no hogs
2x: 8 hogs in one guest
3x: 8 hogs each in two guests

scenario B: unpinned, run kernbench on all the guests, no hogs.

Dbench on PLE machine:
dbench run on all the guests simultaneously:
dbench --warmup=30 -t 120, with NRCLIENTS = 8/16/32.

Result for non-PLE machine:
============================
Machine : IBM xSeries with Intel(R) Xeon(R) X5570 2.93GHz CPU, 8 cores, 64GB RAM

Kernbench:
		 BASE                    BASE+patch            %improvement
		 mean (sd)               mean (sd)
Scenario A:
case 1x:	 164.233 (16.5506) 	 163.584 (15.4598) 	0.39517
case 2x:	 897.654 (543.993) 	 328.63 (103.771) 	63.3901
case 3x:	 2855.73 (2201.41) 	 315.029 (111.854) 	88.9685

Dbench:
Throughput is in MB/sec
NRCLIENTS	 BASE                    BASE+patch            %improvement
               	 mean (sd)               mean (sd)
8       	1.774307  (0.061361) 	1.725667  (0.034644) 	-2.74135
16      	1.445967  (0.044805) 	1.463173  (0.094399) 	1.18993
32        	2.136667  (0.105717) 	2.193792  (0.129357) 	2.67356

Result for PLE machine:
=======================
Machine : IBM xSeries with Intel(R) Xeon(R) X7560 2.27GHz CPU, 32/64 cores, with 8
         online cores and 4*64GB RAM

Kernbench:
		 BASE                    BASE+patch            %improvement
		 mean (sd)               mean (sd)
Scenario A:	 			
case 1x:	 161.263 (56.518) 	 159.635 (40.5621) 	1.00953
case 2x:	 190.748 (61.2745) 	 190.606 (54.4766) 	0.0744438
case 3x:	 227.378 (100.215) 	 225.442 (92.0809) 	0.851446

Scenario B:
		 446.104 (58.54) 	 433.12733 (54.476)	2.91

Dbench:
Throughput is in MB/sec
NRCLIENTS	 BASE                    BASE+patch            %improvement
               	 mean (sd)               mean (sd)
8       	1.101190  (0.875082) 	1.700395  (0.846809) 	54.4143
16      	1.524312  (0.120354) 	1.477553  (0.058166) 	-3.06755
32        	2.143028  (0.157103) 	2.090307  (0.136778) 	-2.46012

---
 V3 kernel changes:
 https://lkml.org/lkml/2011/11/30/62
 V2 kernel changes:
 https://lkml.org/lkml/2011/10/23/207

 Previous discussions (posted by Srivatsa V):
 https://lkml.org/lkml/2010/7/26/24
 https://lkml.org/lkml/2011/1/19/212
 
 Qemu patch for V3:
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00397.html

 Documentation/virtual/kvm/api.txt        |    7 +
 Documentation/virtual/kvm/cpuid.txt      |    4 +
 Documentation/virtual/kvm/hypercalls.txt |   54 +++++++
 arch/x86/Kconfig                         |    9 +
 arch/x86/include/asm/kvm_para.h          |   16 ++-
 arch/x86/kernel/kvm.c                    |  249 ++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                       |   37 ++++-
 arch/x86/xen/debugfs.c                   |  104 -------------
 arch/x86/xen/debugfs.h                   |    4 -
 arch/x86/xen/spinlock.c                  |    2 +-
 fs/debugfs/file.c                        |  128 +++++++++++++++
 include/linux/debugfs.h                  |   11 ++
 include/linux/kvm.h                      |    1 +
 include/linux/kvm_host.h                 |    1 +
 include/linux/kvm_para.h                 |    1 +
 15 files changed, 514 insertions(+), 114 deletions(-)


* [PATCH RFC V4 1/5] debugfs: Add support to print u32 array in debugfs
@ 2012-01-14 18:25   ` Raghavendra K T
  0 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-14 18:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Randy Dunlap, linux-doc, KVM,
	Konrad Rzeszutek Wilk, Glauber Costa, Jan Kiszka, Rik van Riel,
	Dave Jiang, H. Peter Anvin, Thomas Gleixner, Rob Landley, X86,
	Gleb Natapov, Avi Kivity, Alexander Graf, Stefano Stabellini,
	Paul Mackerras, Sedat Dilek, Ingo Molnar, LKML,
	Greg Kroah-Hartman, Virtualization, Marcelo Tosatti, Xen
  Cc: Srivatsa Vaddagiri, Peter Zijlstra, Raghavendra K T, Sasha Levin,
	Suzuki Poulose, Dave Hansen

Add support to print u32 arrays in debugfs. Move the code from Xen into
debugfs so that it is common for other users as well.
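
For reference, after this change a caller exports an array exactly the way
the Xen spinlock stats code does in the hunk below; e.g. (d_spin_debug and
histo_blocked here stand in for the caller's own dentry and array):

	static u32 histo_blocked[HISTO_BUCKETS + 1];

	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
				 histo_blocked, HISTO_BUCKETS + 1);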

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
index 7c0fedd..c8377fb 100644
--- a/arch/x86/xen/debugfs.c
+++ b/arch/x86/xen/debugfs.c
@@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
 	return d_xen_debug;
 }
 
-struct array_data
-{
-	void *array;
-	unsigned elements;
-};
-
-static int u32_array_open(struct inode *inode, struct file *file)
-{
-	file->private_data = NULL;
-	return nonseekable_open(inode, file);
-}
-
-static size_t format_array(char *buf, size_t bufsize, const char *fmt,
-			   u32 *array, unsigned array_size)
-{
-	size_t ret = 0;
-	unsigned i;
-
-	for(i = 0; i < array_size; i++) {
-		size_t len;
-
-		len = snprintf(buf, bufsize, fmt, array[i]);
-		len++;	/* ' ' or '\n' */
-		ret += len;
-
-		if (buf) {
-			buf += len;
-			bufsize -= len;
-			buf[-1] = (i == array_size-1) ? '\n' : ' ';
-		}
-	}
-
-	ret++;		/* \0 */
-	if (buf)
-		*buf = '\0';
-
-	return ret;
-}
-
-static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size)
-{
-	size_t len = format_array(NULL, 0, fmt, array, array_size);
-	char *ret;
-
-	ret = kmalloc(len, GFP_KERNEL);
-	if (ret == NULL)
-		return NULL;
-
-	format_array(ret, len, fmt, array, array_size);
-	return ret;
-}
-
-static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
-			      loff_t *ppos)
-{
-	struct inode *inode = file->f_path.dentry->d_inode;
-	struct array_data *data = inode->i_private;
-	size_t size;
-
-	if (*ppos == 0) {
-		if (file->private_data) {
-			kfree(file->private_data);
-			file->private_data = NULL;
-		}
-
-		file->private_data = format_array_alloc("%u", data->array, data->elements);
-	}
-
-	size = 0;
-	if (file->private_data)
-		size = strlen(file->private_data);
-
-	return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
-}
-
-static int xen_array_release(struct inode *inode, struct file *file)
-{
-	kfree(file->private_data);
-
-	return 0;
-}
-
-static const struct file_operations u32_array_fops = {
-	.owner	= THIS_MODULE,
-	.open	= u32_array_open,
-	.release= xen_array_release,
-	.read	= u32_array_read,
-	.llseek = no_llseek,
-};
-
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements)
-{
-	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
-
-	if (data == NULL)
-		return NULL;
-
-	data->array = array;
-	data->elements = elements;
-
-	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
-}
diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
index e281320..12ebf33 100644
--- a/arch/x86/xen/debugfs.h
+++ b/arch/x86/xen/debugfs.h
@@ -3,8 +3,4 @@
 
 struct dentry * __init xen_init_debugfs(void);
 
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements);
-
 #endif /* _XEN_DEBUGFS_H */
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index fc506e6..14a8961 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -286,7 +286,7 @@ static int __init xen_spinlock_debugfs(void)
 	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
 			   &spinlock_stats.time_blocked);
 
-	xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
 				     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
 
 	return 0;
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 90f7657..df44ccf 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -18,6 +18,7 @@
 #include <linux/pagemap.h>
 #include <linux/namei.h>
 #include <linux/debugfs.h>
+#include <linux/slab.h>
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
 				 size_t count, loff_t *ppos)
@@ -525,3 +526,134 @@ struct dentry *debugfs_create_blob(const char *name, mode_t mode,
 	return debugfs_create_file(name, mode, parent, blob, &fops_blob);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_blob);
+
+struct array_data {
+	void *array;
+	u32 elements;
+};
+
+static int u32_array_open(struct inode *inode, struct file *file)
+{
+	file->private_data = NULL;
+	return nonseekable_open(inode, file);
+}
+
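+/*
+ * Two-pass helper: call with buf == NULL first to learn the required
+ * length, then again with an allocated buffer to fill it.
+ */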
+static size_t format_array(char *buf, size_t bufsize, const char *fmt,
+			   u32 *array, u32 array_size)
+{
+	size_t ret = 0;
+	u32 i;
+
+	for (i = 0; i < array_size; i++) {
+		size_t len;
+
+		len = snprintf(buf, bufsize, fmt, array[i]);
+		len++;	/* ' ' or '\n' */
+		ret += len;
+
+		if (buf) {
+			buf += len;
+			bufsize -= len;
+			buf[-1] = (i == array_size-1) ? '\n' : ' ';
+		}
+	}
+
+	ret++;		/* \0 */
+	if (buf)
+		*buf = '\0';
+
+	return ret;
+}
+
+static char *format_array_alloc(const char *fmt, u32 *array,
+						u32 array_size)
+{
+	size_t len = format_array(NULL, 0, fmt, array, array_size);
+	char *ret;
+
+	ret = kmalloc(len, GFP_KERNEL);
+	if (ret == NULL)
+		return NULL;
+
+	format_array(ret, len, fmt, array, array_size);
+	return ret;
+}
+
+static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
+			      loff_t *ppos)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct array_data *data = inode->i_private;
+	size_t size;
+
+	if (*ppos == 0) {
+		if (file->private_data) {
+			kfree(file->private_data);
+			file->private_data = NULL;
+		}
+
+		file->private_data = format_array_alloc("%u", data->array,
+							      data->elements);
+	}
+
+	size = 0;
+	if (file->private_data)
+		size = strlen(file->private_data);
+
+	return simple_read_from_buffer(buf, len, ppos,
+					file->private_data, size);
+}
+
+static int u32_array_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+
+	return 0;
+}
+
+static const struct file_operations u32_array_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = u32_array_open,
+	.release = u32_array_release,
+	.read	 = u32_array_read,
+	.llseek  = no_llseek,
+};
+
+/**
+ * debugfs_create_u32_array - create a debugfs file that is used to read a u32
+ * array.
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is %NULL, then the
+ *          file will be created in the root of the debugfs filesystem.
+ * @array: u32 array that provides the data.
+ * @elements: total number of elements in the array.
+ *
+ * This function creates a file in debugfs with the given name that exports
+ * @array as data.  If the @mode variable is so set, it can be read from.
+ * Writing is not supported.  Seeking within the file is also not supported.
+ * Once the array is created, its size can not be changed.
+ *
+ * The function returns a pointer to a dentry on success.  If debugfs is not
+ * enabled in the kernel, the value -%ENODEV will be returned.
+ */
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+					    struct dentry *parent,
+					    u32 *array, u32 elements)
+{
+	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
+
+	if (data == NULL)
+		return NULL;
+
+	data->array = array;
+	data->elements = elements;
+
+	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_u32_array);
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index e7d9b20..253e2fb 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -74,6 +74,10 @@ struct dentry *debugfs_create_blob(const char *name, mode_t mode,
 				  struct dentry *parent,
 				  struct debugfs_blob_wrapper *blob);
 
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+					struct dentry *parent,
+					u32 *array, u32 elements);
+
 bool debugfs_initialized(void);
 
 #else
@@ -193,6 +197,13 @@ static inline bool debugfs_initialized(void)
 	return false;
 }
 
+static inline struct dentry *debugfs_create_u32_array(const char *name,
+					mode_t mode, struct dentry *parent,
+					u32 *array, u32 elements)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 #endif
 
 #endif


* [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
@ 2012-01-14 18:25   ` Raghavendra K T
  0 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-14 18:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Randy Dunlap, linux-doc, KVM,
	Konrad Rzeszutek Wilk, Glauber Costa, Jan Kiszka, Rik van Riel,
	Dave Jiang, H. Peter Anvin, Thomas Gleixner, X86,
	Marcelo Tosatti, Gleb Natapov, Avi Kivity, Alexander Graf,
	Stefano Stabellini, Paul Mackerras, Sedat Dilek, Ingo Molnar,
	LKML, Greg Kroah-Hartman, Virtualization, Rob Landley, Xen
  Cc: Srivatsa Vaddagiri, Peter Zijlstra, Raghavendra K T, Sasha Levin,
	Suzuki Poulose, Dave Hansen

Add a hypercall to the KVM hypervisor to support pv-ticketlocks.

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.

The presence of these hypercalls is indicated to the guest via
KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.

Qemu needs a corresponding patch to pass the presence of this feature up to
the guest via cpuid. The patch to qemu will be sent separately.
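
On the guest side, availability of the feature would be probed via cpuid
before switching the pv lock ops over; roughly (a sketch --
kvm_para_has_feature() is the existing guest helper, the rest is
illustrative):

	if (!kvm_para_has_feature(KVM_FEATURE_PVLOCK_KICK))
		return;	/* keep native ticket spinlocks */

	/* install the pv lock_spinning/kick callbacks; the kick side
	 * ultimately issues: */
	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);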

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..7a94987 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,12 +16,14 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PVLOCK_KICK		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c938da..c7b05fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2099,6 +2099,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_GET_TSC_KHZ:
+	case KVM_CAP_PVLOCK_KICK:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -2576,7 +2577,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PVLOCK_KICK);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
@@ -5304,6 +5306,33 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of the vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+	struct kvm_vcpu *vcpu, *target = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		/* kvm_for_each_vcpu() leaves @vcpu set after a full walk,
+		 * so remember the match explicitly. */
+		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0)) {
+			target = vcpu;
+			break;
+		}
+	}
+	if (target) {
+		kvm_make_request(KVM_REQ_PVLOCK_KICK, target);
+		kvm_vcpu_kick(target);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -5340,6 +5365,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_MMU_OP:
 		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
 		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 68e67e5..63fb6b0 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
 #define KVM_CAP_TSC_DEADLINE_TIMER 72
+#define KVM_CAP_PVLOCK_KICK 73
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d526231..3b1ae7b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -50,6 +50,7 @@
 #define KVM_REQ_APF_HALT          12
 #define KVM_REQ_STEAL_UPDATE      13
 #define KVM_REQ_NMI               14
+#define KVM_REQ_PVLOCK_KICK       15
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID	0
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 47a070b..19f10bd 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_KICK_CPU			5
 
 /*
  * hypercalls use architecture specific


^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
@ 2012-01-14 18:25   ` Raghavendra K T
  0 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-14 18:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Randy Dunlap, linux-doc, KVM,
	Konrad Rzeszutek Wilk, Glauber Costa, Jan Kiszka, Rik van Riel,
	Dave Jiang, H. Peter Anvin, Thomas Gleixner, X86,
	Marcelo Tosatti, Gleb Natapov, Avi Kivity, Alexander Graf,
	Stefano Stabellini, Paul Mackerras, Sedat Dilek, Ingo Molnar,
	LKML, Greg Kroah-Hartman, Virtualization, Rob Landley, Xen
  Cc: Peter Zijlstra, Raghavendra K T, Srivatsa Vaddagiri, Dave Hansen,
	Suzuki Poulose, Sasha Levin

Add a hypercall to KVM hypervisor to support pv-ticketlocks 

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
    
The presence of these hypercalls is indicated to guest via
KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.

Qemu needs a corresponding patch to pass up the presence of this feature to 
guest via cpuid. Patch to qemu will be sent separately.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..7a94987 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,12 +16,14 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PVLOCK_KICK		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c938da..c7b05fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2099,6 +2099,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_GET_TSC_KHZ:
+	case KVM_CAP_PVLOCK_KICK:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -2576,7 +2577,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PVLOCK_KICK);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
@@ -5304,6 +5306,29 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+			break;
+	}
+	if (vcpu) {
+		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -5340,6 +5365,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_MMU_OP:
 		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
 		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 68e67e5..63fb6b0 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
 #define KVM_CAP_TSC_DEADLINE_TIMER 72
+#define KVM_CAP_PVLOCK_KICK 73
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d526231..3b1ae7b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -50,6 +50,7 @@
 #define KVM_REQ_APF_HALT          12
 #define KVM_REQ_STEAL_UPDATE      13
 #define KVM_REQ_NMI               14
+#define KVM_REQ_PVLOCK_KICK       15
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID	0
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 47a070b..19f10bd 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_KICK_CPU			5
 
 /*
  * hypercalls use architecture specific

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH RFC V4 3/5] kvm guest : Added configuration support to enable debug information for KVM Guests
  2012-01-14 18:25 ` Raghavendra K T
  (?)
@ 2012-01-14 18:26   ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-14 18:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Randy Dunlap, linux-doc, KVM,
	Konrad Rzeszutek Wilk, Glauber Costa, Jan Kiszka, Rik van Riel,
	Dave Jiang, H. Peter Anvin, Thomas Gleixner, Rob Landley, X86,
	Gleb Natapov, Avi Kivity, Alexander Graf, Stefano Stabellini,
	Paul Mackerras, Sedat Dilek, Ingo Molnar, LKML,
	Greg Kroah-Hartman, Virtualization, Marcelo Tosatti, Xen
  Cc: Srivatsa Vaddagiri, Peter Zijlstra, Raghavendra K T, Sasha Levin,
	Suzuki Poulose, Dave Hansen

Add a configuration option to enable debug information
for KVM guests in debugfs.
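
Once the option is enabled, the counters added by the pv-ticketlock patch
later in this series appear under debugfs. A minimal userspace sketch for
reading one of them (the kvm/spinlocks/taken_slow path is taken from
patch 4/5; debugfs is assumed mounted at /sys/kernel/debug):

	#include <stdio.h>

	int main(void)
	{
		char buf[32];
		FILE *f = fopen("/sys/kernel/debug/kvm/spinlocks/taken_slow", "r");

		if (!f)
			return 1;
		/* The file holds a single u32 counter in decimal. */
		if (fgets(buf, sizeof(buf), f))
			printf("taken_slow: %s", buf);
		fclose(f);
		return 0;
	}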
    
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 72e8b64..344a7db 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -565,6 +565,15 @@ config KVM_GUEST
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST && DEBUG_FS
+	default n
+	---help---
+	  This option enables collection of various statistics for KVM guests.
+	  Statistics are exposed via the debugfs filesystem. Enabling this
+	  option may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT


^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-14 18:25 ` Raghavendra K T
  (?)
@ 2012-01-14 18:26   ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-14 18:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Randy Dunlap, linux-doc, KVM,
	Konrad Rzeszutek Wilk, Glauber Costa, Jan Kiszka, Rik van Riel,
	Dave Jiang, H. Peter Anvin, Thomas Gleixner, X86,
	Marcelo Tosatti, Gleb Natapov, Avi Kivity, Alexander Graf,
	Stefano Stabellini, Paul Mackerras, Sedat Dilek, Ingo Molnar,
	LKML, Greg Kroah-Hartman, Virtualization, Rob Landley, Xen
  Cc: Srivatsa Vaddagiri, Peter Zijlstra, Raghavendra K T, Sasha Levin,
	Suzuki Poulose, Dave Hansen

Extend a Linux guest running on the KVM hypervisor to support
pv-ticketlocks.

During smp_boot_cpus, a paravirtualized KVM guest detects whether the
hypervisor has the feature (KVM_FEATURE_PVLOCK_KICK) required to support
pv-ticketlocks. If so, support for pv-ticketlocks is registered via
pv_lock_ops.

The KVM_HC_KICK_CPU hypercall is used to wake up a waiting/halted vcpu.
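
The ordering protocol used by kvm_lock_spinning() in the patch below
(clear ->lock, publish ->want, then set ->lock, with a write barrier
between each store) can be modeled in userspace with C11 atomics. This is
an illustrative sketch only; the struct and function names are invented,
and the release fences stand in for smp_wmb():

	#include <stdatomic.h>
	#include <stddef.h>

	struct waiting {
		_Atomic(void *) lock;	/* non-NULL only when want is valid */
		atomic_uint want;
	};

	/* Publish (lock, want) such that a reader observing a non-NULL
	 * ->lock with acquire ordering also observes the matching ->want. */
	void publish(struct waiting *w, void *lock, unsigned int want)
	{
		atomic_store_explicit(&w->lock, NULL, memory_order_relaxed);
		atomic_thread_fence(memory_order_release);	/* smp_wmb() */
		atomic_store_explicit(&w->want, want, memory_order_relaxed);
		atomic_thread_fence(memory_order_release);	/* smp_wmb() */
		atomic_store_explicit(&w->lock, lock, memory_order_relaxed);
	}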

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 7a94987..cf5327c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -195,10 +195,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a9c2116..ec55a0b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #endif
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -627,3 +629,250 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+
+#define HISTO_BUCKETS	30
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index = ilog2(delta);
+
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta = sched_clock() - start;
+
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm = kvm_init_debugfs();
+
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else  /* !CONFIG_KVM_DEBUG_FS */
+#define TIMEOUT			(1 << 10)
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif  /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+	struct kvm_lock_waiting *w = &__get_cpu_var(lock_waiting);
+	int cpu = smp_processor_id();
+	u64 start;
+	unsigned long flags;
+
+	start = spin_time_start();
+
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
+
+	/*
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear "lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
+
+	add_stats(TAKEN_SLOW, 1);
+
+	/*
+	 * This uses set_bit, which is atomic but we should not rely on its
+	 * reordering guarantees. So a barrier is needed after this call.
+	 */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	barrier();
+
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
+	__ticket_enter_slowpath(lock);
+
+	/*
+	 * Check again to make sure the lock didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/* Halt until it is our turn and we are kicked. */
+	halt();
+
+	local_irq_save(flags);
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick a cpu by its apicid */
+static inline void kvm_kick_cpu(int apicid)
+{
+	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
+}
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+	int cpu;
+	int apicid;
+
+	add_stats(RELEASED_SLOW, 1);
+
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == ticket) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
+			apicid = per_cpu(x86_cpu_to_apicid, cpu);
+			kvm_kick_cpu(apicid);
+			break;
+		}
+	}
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_PVLOCK_KICK if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+	/* Does host kernel support KVM_FEATURE_PVLOCK_KICK? */
+	if (!kvm_para_has_feature(KVM_FEATURE_PVLOCK_KICK))
+		return;
+
+	jump_label_inc(&paravirt_ticketlocks_enabled);
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c7b05fc..4d7a950 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5754,8 +5754,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	local_irq_disable();
 
-	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
-	    || need_resched() || signal_pending(current)) {
+	if (vcpu->mode == EXITING_GUEST_MODE
+		 || (vcpu->requests & ~(1UL<<KVM_REQ_PVLOCK_KICK))
+		 || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
 		local_irq_enable();
@@ -6711,6 +6712,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 		!vcpu->arch.apf.halted)
 		|| !list_empty_careful(&vcpu->async_pf.done)
 		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+		|| kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu)
 		|| atomic_read(&vcpu->arch.nmi_queued) ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
 		 kvm_cpu_has_interrupt(vcpu));


^ permalink raw reply related	[flat|nested] 139+ messages in thread

* [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-14 18:25 ` Raghavendra K T
  (?)
@ 2012-01-14 18:27   ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-14 18:27 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Randy Dunlap, linux-doc, KVM,
	Konrad Rzeszutek Wilk, Glauber Costa, Jan Kiszka, Rik van Riel,
	Dave Jiang, H. Peter Anvin, Thomas Gleixner, Rob Landley, X86,
	Gleb Natapov, Avi Kivity, Alexander Graf, Stefano Stabellini,
	Paul Mackerras, Sedat Dilek, Ingo Molnar, LKML,
	Greg Kroah-Hartman, Virtualization, Marcelo Tosatti, Xen
  Cc: Srivatsa Vaddagiri, Peter Zijlstra, Raghavendra K T, Sasha Levin,
	Suzuki Poulose, Dave Hansen

Add documentation on CPUID, KVM_CAP_PVLOCK_KICK, and the supported
hypercalls.

The KVM_HC_KICK_CPU hypercall is added to wake up a halted vcpu in a
paravirtual-spinlock-enabled guest.

KVM_FEATURE_PVLOCK_KICK lets the guest check whether pv spinlocks can be
enabled. Support in the host is queried via
ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK).

Minimal documentation and a template for hypercalls are added.
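
Userspace can check for the capability with the standard
KVM_CHECK_EXTENSION ioctl on /dev/kvm. A minimal sketch, assuming kernel
headers that already define KVM_CAP_PVLOCK_KICK from this series:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	int main(void)
	{
		int ret;
		int kvm = open("/dev/kvm", O_RDWR);

		if (kvm < 0)
			return 1;
		/* KVM_CHECK_EXTENSION returns > 0 if the capability exists. */
		ret = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK);
		printf("KVM_CAP_PVLOCK_KICK: %ssupported\n", ret > 0 ? "" : "not ");
		return 0;
	}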

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
---
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e2a4b52..1583bc7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1109,6 +1109,13 @@ support.  Instead it is reported via
 if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
 feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
 
+Paravirtualized ticket spinlocks can be enabled in the guest by checking
+whether support exists in the host via
+
+  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK)
+
+If this call returns true, the guest can use the feature.
+
 4.47 KVM_PPC_GET_PVINFO
 
 Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 8820685..c7fc0da 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PVLOCK_KICK            ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 0000000..7872da5
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,54 @@
+KVM Hypercalls Documentation
+============================
+The documentation for each hypercall should follow this template and
+include:
+1. Hypercall name, value.
+2. Architecture(s)
+3. Purpose
+
+
+1. KVM_HC_VAPIC_POLL_IRQ
+------------------------
+value: 1
+Architecture: x86
+Purpose:
+
+2. KVM_HC_MMU_OP
+------------------------
+value: 2
+Architecture: x86
+Purpose: Support MMU operations such as writing to a PTE,
+flushing the TLB, and releasing a PT.
+
+3. KVM_HC_FEATURES
+------------------------
+value: 3
+Architecture: PPC
+Purpose:
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+------------------------
+value: 4
+Architecture: PPC
+Purpose: To enable communication between the hypervisor and guest, there is a
+new shared page that contains parts of supervisor-visible register state.
+The guest can map this shared page using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+value: 5
+Architecture: x86
+Purpose: Hypercall used to wake up a vcpu from HLT state.
+
+Usage example: A vcpu of a paravirtualized guest that is busy-waiting in guest
+kernel mode for an event to occur (e.g. a spinlock to become available) can
+execute the HLT instruction once it has busy-waited for more than a threshold
+time interval. Execution of HLT causes the hypervisor to put the vcpu to
+sleep (unless yield_on_hlt=0) until the occurrence of an appropriate event.
+Another vcpu of the same guest can then wake up the sleeping vcpu by issuing
+the KVM_HC_KICK_CPU hypercall, specifying the APIC ID of the vcpu to be woken
+up.
+
+TODO:
+1. Is more information on inputs and outputs needed?
+2. Add more detail to the purpose of each hypercall.


^ permalink raw reply related	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-14 18:26   ` Raghavendra K T
@ 2012-01-16  3:12     ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16  3:12 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen


On 14.01.2012, at 19:26, Raghavendra K T wrote:

> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks. 
> 
> During smp_boot_cpus a paravirtualized KVM guest detects whether the hypervisor has the
> required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
> support for pv-ticketlocks is registered via pv_lock_ops.
> 
> Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.
> 
> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 7a94987..cf5327c 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -195,10 +195,20 @@ void kvm_async_pf_task_wait(u32 token);
> void kvm_async_pf_task_wake(u32 token);
> u32 kvm_read_and_reset_pf_reason(void);
> extern void kvm_disable_steal_time(void);
> -#else
> -#define kvm_guest_init() do { } while (0)
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +void __init kvm_spinlock_init(void);
> +#else /* CONFIG_PARAVIRT_SPINLOCKS */
> +static void kvm_spinlock_init(void)
> +{
> +}
> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
> +
> +#else /* CONFIG_KVM_GUEST */
> +#define kvm_guest_init() do {} while (0)
> #define kvm_async_pf_task_wait(T) do {} while(0)
> #define kvm_async_pf_task_wake(T) do {} while(0)
> +
> static inline u32 kvm_read_and_reset_pf_reason(void)
> {
> 	return 0;
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index a9c2116..ec55a0b 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -33,6 +33,7 @@
> #include <linux/sched.h>
> #include <linux/slab.h>
> #include <linux/kprobes.h>
> +#include <linux/debugfs.h>
> #include <asm/timer.h>
> #include <asm/cpu.h>
> #include <asm/traps.h>
> @@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
> #endif
> 	kvm_guest_cpu_init();
> 	native_smp_prepare_boot_cpu();
> +	kvm_spinlock_init();
> }
> 
> static void __cpuinit kvm_guest_cpu_online(void *dummy)
> @@ -627,3 +629,250 @@ static __init int activate_jump_labels(void)
> 	return 0;
> }
> arch_initcall(activate_jump_labels);
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +
> +enum kvm_contention_stat {
> +	TAKEN_SLOW,
> +	TAKEN_SLOW_PICKUP,
> +	RELEASED_SLOW,
> +	RELEASED_SLOW_KICKED,
> +	NR_CONTENTION_STATS
> +};
> +
> +#ifdef CONFIG_KVM_DEBUG_FS
> +
> +static struct kvm_spinlock_stats
> +{
> +	u32 contention_stats[NR_CONTENTION_STATS];
> +
> +#define HISTO_BUCKETS	30
> +	u32 histo_spin_blocked[HISTO_BUCKETS+1];
> +
> +	u64 time_blocked;
> +} spinlock_stats;
> +
> +static u8 zero_stats;
> +
> +static inline void check_zero(void)
> +{
> +	u8 ret;
> +	u8 old = ACCESS_ONCE(zero_stats);
> +	if (unlikely(old)) {
> +		ret = cmpxchg(&zero_stats, old, 0);
> +		/* This ensures only one fellow resets the stat */
> +		if (ret == old)
> +			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
> +	}
> +}
> +
> +static inline void add_stats(enum kvm_contention_stat var, u32 val)
> +{
> +	check_zero();
> +	spinlock_stats.contention_stats[var] += val;
> +}
> +
> +
> +static inline u64 spin_time_start(void)
> +{
> +	return sched_clock();
> +}
> +
> +static void __spin_time_accum(u64 delta, u32 *array)
> +{
> +	unsigned index = ilog2(delta);
> +
> +	check_zero();
> +
> +	if (index < HISTO_BUCKETS)
> +		array[index]++;
> +	else
> +		array[HISTO_BUCKETS]++;
> +}
> +
> +static inline void spin_time_accum_blocked(u64 start)
> +{
> +	u32 delta = sched_clock() - start;
> +
> +	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
> +	spinlock_stats.time_blocked += delta;
> +}
> +
> +static struct dentry *d_spin_debug;
> +static struct dentry *d_kvm_debug;
> +
> +struct dentry *kvm_init_debugfs(void)
> +{
> +	d_kvm_debug = debugfs_create_dir("kvm", NULL);
> +	if (!d_kvm_debug)
> +		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
> +
> +	return d_kvm_debug;
> +}
> +
> +static int __init kvm_spinlock_debugfs(void)
> +{
> +	struct dentry *d_kvm = kvm_init_debugfs();
> +
> +	if (d_kvm == NULL)
> +		return -ENOMEM;
> +
> +	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
> +
> +	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
> +
> +	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
> +	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
> +
> +	debugfs_create_u32("released_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
> +	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
> +
> +	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
> +			   &spinlock_stats.time_blocked);
> +
> +	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
> +		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
> +
> +	return 0;
> +}
> +fs_initcall(kvm_spinlock_debugfs);
> +#else  /* !CONFIG_KVM_DEBUG_FS */
> +#define TIMEOUT			(1 << 10)
> +static inline void add_stats(enum kvm_contention_stat var, u32 val)
> +{
> +}
> +
> +static inline u64 spin_time_start(void)
> +{
> +	return 0;
> +}
> +
> +static inline void spin_time_accum_blocked(u64 start)
> +{
> +}
> +#endif  /* CONFIG_KVM_DEBUG_FS */
> +
> +struct kvm_lock_waiting {
> +	struct arch_spinlock *lock;
> +	__ticket_t want;
> +};
> +
> +/* cpus 'waiting' on a spinlock to become available */
> +static cpumask_t waiting_cpus;
> +
> +/* Track spinlock on which a cpu is waiting */
> +static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
> +
> +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> +{
> +	struct kvm_lock_waiting *w = &__get_cpu_var(lock_waiting);
> +	int cpu = smp_processor_id();
> +	u64 start;
> +	unsigned long flags;
> +
> +	start = spin_time_start();
> +
> +	/*
> +	 * Make sure an interrupt handler can't upset things in a
> +	 * partially setup state.
> +	 */
> +	local_irq_save(flags);
> +
> +	/*
> +	 * The ordering protocol on this is that the "lock" pointer
> +	 * may only be set non-NULL if the "want" ticket is correct.
> +	 * If we're updating "want", we must first clear "lock".
> +	 */
> +	w->lock = NULL;
> +	smp_wmb();
> +	w->want = want;
> +	smp_wmb();
> +	w->lock = lock;
> +
> +	add_stats(TAKEN_SLOW, 1);
> +
> +	/*
> +	 * This uses set_bit, which is atomic, but we should not rely on its
> +	 * reordering guarantees, so a barrier is needed after this call.
> +	 */
> +	cpumask_set_cpu(cpu, &waiting_cpus);
> +
> +	barrier();
> +
> +	/*
> +	 * Mark entry to slowpath before doing the pickup test to make
> +	 * sure we don't deadlock with an unlocker.
> +	 */
> +	__ticket_enter_slowpath(lock);
> +
> +	/*
> +	 * Check again to make sure it didn't become free while
> +	 * we weren't looking.
> +	 */
> +	if (ACCESS_ONCE(lock->tickets.head) == want) {
> +		add_stats(TAKEN_SLOW_PICKUP, 1);
> +		goto out;
> +	}
> +
> +	/* Allow interrupts while blocked */
> +	local_irq_restore(flags);
> +
> +	/* halt until it's our turn and kicked. */
> +	halt();
> +
> +	local_irq_save(flags);
> +out:
> +	cpumask_clear_cpu(cpu, &waiting_cpus);
> +	w->lock = NULL;
> +	local_irq_restore(flags);
> +	spin_time_accum_blocked(start);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> +
> +/* Kick a cpu by its apicid */
> +static inline void kvm_kick_cpu(int apicid)
> +{
> +	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
> +}
> +
> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
> +{
> +	int cpu;
> +	int apicid;
> +
> +	add_stats(RELEASED_SLOW, 1);
> +
> +	for_each_cpu(cpu, &waiting_cpus) {
> +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> +		if (ACCESS_ONCE(w->lock) == lock &&
> +		    ACCESS_ONCE(w->want) == ticket) {
> +			add_stats(RELEASED_SLOW_KICKED, 1);
> +			apicid = per_cpu(x86_cpu_to_apicid, cpu);
> +			kvm_kick_cpu(apicid);
> +			break;
> +		}
> +	}
> +}
> +
> +/*
> + * Setup pv_lock_ops to exploit KVM_FEATURE_PVLOCK_KICK if present.
> + */
> +void __init kvm_spinlock_init(void)
> +{
> +	if (!kvm_para_available())
> +		return;
> +	/* Does host kernel support KVM_FEATURE_PVLOCK_KICK? */
> +	if (!kvm_para_has_feature(KVM_FEATURE_PVLOCK_KICK))
> +		return;
> +
> +	jump_label_inc(&paravirt_ticketlocks_enabled);
> +
> +	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
> +	pv_lock_ops.unlock_kick = kvm_unlock_kick;
> +}
> +#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c7b05fc..4d7a950 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c

This patch is mixing host and guest code. Please split those up.


Alex

> @@ -5754,8 +5754,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> 
> 	local_irq_disable();
> 
> -	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> -	    || need_resched() || signal_pending(current)) {
> +	if (vcpu->mode == EXITING_GUEST_MODE
> +		 || (vcpu->requests & ~(1UL<<KVM_REQ_PVLOCK_KICK))
> +		 || need_resched() || signal_pending(current)) {
> 		vcpu->mode = OUTSIDE_GUEST_MODE;
> 		smp_wmb();
> 		local_irq_enable();
> @@ -6711,6 +6712,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
> 		!vcpu->arch.apf.halted)
> 		|| !list_empty_careful(&vcpu->async_pf.done)
> 		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> +		|| kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu)
> 		|| atomic_read(&vcpu->arch.nmi_queued) ||
> 		(kvm_arch_interrupt_allowed(vcpu) &&
> 		 kvm_cpu_has_interrupt(vcpu));
> 
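
For readers following the two x86.c hunks above: the first hunk appears
to mask KVM_REQ_PVLOCK_KICK out of the guest-entry bailout test, since a
pending kick only matters to a halted vcpu, while the second makes such a
pending kick render a halted vcpu runnable, kvm_check_request() clearing
the request bit as it is consumed. A commented restatement (a sketch, not
compilable on its own):

        /* vcpu_enter_guest(): ignore a pending PVLOCK_KICK when
         * deciding whether to abort entry into the guest; the kick is
         * only meaningful to a vcpu that is halted. */
        if (vcpu->mode == EXITING_GUEST_MODE
                 || (vcpu->requests & ~(1UL << KVM_REQ_PVLOCK_KICK))
                 || need_resched() || signal_pending(current))
                /* cancel entry and drop back to the main run loop */;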

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-14 18:27   ` Raghavendra K T
@ 2012-01-16  3:23     ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16  3:23 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen


On 14.01.2012, at 19:27, Raghavendra K T wrote:

> Add Documentation on CPUID, KVM_CAP_PVLOCK_KICK, and Hypercalls supported.
> 
> The KVM_HC_KICK_CPU hypercall is added to wake up a halted vcpu in a
> paravirtual spinlock enabled guest.
> 
> KVM_FEATURE_PVLOCK_KICK enables the guest to check whether pv spinlocks
> can be enabled in the guest. Support in the host is queried via
> ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK).
> 
> A minimal Documentation and template is added for hypercalls.
> 
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> ---
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index e2a4b52..1583bc7 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1109,6 +1109,13 @@ support.  Instead it is reported via
> if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
> feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
> 
> +Paravirtualized ticket spinlocks can be enabled in the guest by checking
> +whether support exists in the host via
> +
> +  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK)
> +
> +If this call returns true, the guest can use the feature.
> +
> 4.47 KVM_PPC_GET_PVINFO
> 
> Capability: KVM_CAP_PPC_GET_PVINFO
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 8820685..c7fc0da 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
> KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
>                                    ||       || writing to msr 0x4b564d02
> ------------------------------------------------------------------------------
> +KVM_FEATURE_PVLOCK_KICK            ||     6 || guest checks this feature bit
> +                                   ||       || before enabling paravirtualized
> +                                   ||       || spinlock support.
> +------------------------------------------------------------------------------
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>                                    ||       || per-cpu warps are expected in
>                                    ||       || kvmclock.
> diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
> new file mode 100644
> index 0000000..7872da5
> --- /dev/null
> +++ b/Documentation/virtual/kvm/hypercalls.txt
> @@ -0,0 +1,54 @@
> +KVM Hypercalls Documentation
> +============================
> +Template for documentation:
> +The documentation for each hypercall should include
> +1. Hypercall name and value.
> +2. Architecture(s)
> +3. Purpose
> +
> +
> +1. KVM_HC_VAPIC_POLL_IRQ
> +------------------------
> +value: 1
> +Architecture: x86
> +Purpose:
> +
> +2. KVM_HC_MMU_OP
> +------------------------
> +value: 2
> +Architecture: x86
> +Purpose: Support MMU operations such as writing to a PTE,
> +flushing the TLB, and releasing page tables.

This one is deprecated, no? Should probably be mentioned here.

> +
> +3. KVM_HC_FEATURES
> +------------------------
> +value: 3
> +Architecture: PPC
> +Purpose:

Expose hypercall availability to the guest. On x86 you use cpuid to enumerate which hypercalls are available. The natural fit on ppc would be device tree based lookup (which is also what EPAPR dictates), but we also have a second enumeration mechanism that's KVM specific - which is this hypercall.
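
For contrast, the x86-side enumeration mentioned above boils down to a
feature bit in CPUID leaf 0x40000001 (KVM_CPUID_FEATURES); a minimal
guest-side sketch using the existing kvm_para helpers, with
KVM_FEATURE_PVLOCK_KICK as proposed in patch 2/5:

        #include <linux/kvm_para.h>

        /* Nonzero if the host advertises the pv-lock kick hypercall. */
        static int pvlock_kick_available(void)
        {
                return kvm_para_available() &&
                       kvm_para_has_feature(KVM_FEATURE_PVLOCK_KICK);
        }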

> +
> +4. KVM_HC_PPC_MAP_MAGIC_PAGE
> +------------------------
> +value: 4
> +Architecture: PPC
> +Purpose: To enable communication between the hypervisor and guest there is a
> +new

It's not new anymore :)

> shared page that contains parts of supervisor visible register state.
> +The guest can map this shared page using this hypercall.

... to access its supervisor register through memory.

> +
> +5. KVM_HC_KICK_CPU
> +------------------------
> +value: 5
> +Architecture: x86
> +Purpose: Hypercall used to wake up a vcpu from the HLT state
> +
> +Usage example: A vcpu of a paravirtualized guest that is busy-waiting
> +in guest kernel mode for an event to occur (e.g. a spinlock to become
> +available) can execute the HLT instruction once it has busy-waited for
> +more than a threshold time interval. Execution of the HLT instruction
> +causes the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0)
> +until an appropriate event occurs. Another vcpu of the same guest can
> +wake up the sleeping vcpu by issuing the KVM_HC_KICK_CPU hypercall,
> +specifying the APIC ID of the vcpu to be woken up.

The description is way too specific. The hypercall basically gives the guest the ability to yield() its current vcpu to another chosen vcpu. The APIC piece is an implementation detail for x86. On PPC we could just use the PIR register contents (processor identifier).

Maybe I didn't fully understand what this really is about though :)


Alex

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
  2012-01-14 18:25   ` Raghavendra K T
@ 2012-01-16  3:24     ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16  3:24 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen


On 14.01.2012, at 19:25, Raghavendra K T wrote:

> Add a hypercall to KVM hypervisor to support pv-ticketlocks 
> 
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
> 
> The presence of these hypercalls is indicated to the guest via
> KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.
> 
> Qemu needs a corresponding patch to pass the presence of this feature up
> to the guest via cpuid. A patch to qemu will be sent separately.
> 
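
As a usage illustration of the capability check mentioned above, a
minimal userspace probe might look like the following (a hypothetical
sketch; error handling elided):

        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/kvm.h>

        int has_pvlock_kick(void)
        {
                int kvm = open("/dev/kvm", O_RDWR);
                int r = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK);

                close(kvm);
                return r > 0;   /* the patch returns 1 when supported */
        }
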
> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 734c376..7a94987 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -16,12 +16,14 @@
> #define KVM_FEATURE_CLOCKSOURCE		0
> #define KVM_FEATURE_NOP_IO_DELAY	1
> #define KVM_FEATURE_MMU_OP		2
> +
> /* This indicates that the new set of kvmclock msrs
>  * are available. The use of 0x11 and 0x12 is deprecated
>  */
> #define KVM_FEATURE_CLOCKSOURCE2        3
> #define KVM_FEATURE_ASYNC_PF		4
> #define KVM_FEATURE_STEAL_TIME		5
> +#define KVM_FEATURE_PVLOCK_KICK		6
> 
> /* The last 8 bits are used to indicate how to interpret the flags field
>  * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4c938da..c7b05fc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2099,6 +2099,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> 	case KVM_CAP_XSAVE:
> 	case KVM_CAP_ASYNC_PF:
> 	case KVM_CAP_GET_TSC_KHZ:
> +	case KVM_CAP_PVLOCK_KICK:
> 		r = 1;
> 		break;
> 	case KVM_CAP_COALESCED_MMIO:
> @@ -2576,7 +2577,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
> 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
> 			     (1 << KVM_FEATURE_ASYNC_PF) |
> -			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> +			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> +			     (1 << KVM_FEATURE_PVLOCK_KICK);
> 
> 		if (sched_info_on())
> 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> @@ -5304,6 +5306,29 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> 	return 1;
> }
> 
> +/*
> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
> + *
> + * @apicid - apicid of vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> +{
> +	struct kvm_vcpu *vcpu = NULL;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_apic_present(vcpu))
> +			continue;
> +
> +		/* Kick only on a real match: kvm_for_each_vcpu() leaves
> +		 * vcpu pointing at the last vcpu when nothing matched,
> +		 * so testing vcpu after the loop is not sufficient. */
> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0)) {
> +			kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
> +			kvm_vcpu_kick(vcpu);
> +			break;
> +		}
> +	}
> +}
> +
> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> {
> 	unsigned long nr, a0, a1, a2, a3, ret;
> @@ -5340,6 +5365,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> 	case KVM_HC_MMU_OP:
> 		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
> 		break;
> +	case KVM_HC_KICK_CPU:
> +		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
> +		ret = 0;
> +		break;
> 	default:
> 		ret = -KVM_ENOSYS;
> 		break;
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 68e67e5..63fb6b0 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
> #define KVM_CAP_PPC_PAPR 68
> #define KVM_CAP_S390_GMAP 71
> #define KVM_CAP_TSC_DEADLINE_TIMER 72
> +#define KVM_CAP_PVLOCK_KICK 73
> 
> #ifdef KVM_CAP_IRQ_ROUTING
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d526231..3b1ae7b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -50,6 +50,7 @@
> #define KVM_REQ_APF_HALT          12
> #define KVM_REQ_STEAL_UPDATE      13
> #define KVM_REQ_NMI               14
> +#define KVM_REQ_PVLOCK_KICK       15

Everything I see in this patch is pvlock agnostic. It's only a vcpu kick hypercall. So it's probably a good idea to also name it accordingly :).


Alex

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16  3:23     ` Alexander Graf
@ 2012-01-16  3:51       ` Srivatsa Vaddagiri
  -1 siblings, 0 replies; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-16  3:51 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman,
	LKML, Dave Hansen

* Alexander Graf <agraf@suse.de> [2012-01-16 04:23:24]:

> > +5. KVM_HC_KICK_CPU
> > +------------------------
> > +value: 5
> > +Architecture: x86
> > +Purpose: Hypercall used to wake up a vcpu from the HLT state
> > +
> > +Usage example: A vcpu of a paravirtualized guest that is busy-waiting
> > +in guest kernel mode for an event to occur (e.g. a spinlock to become
> > +available) can execute the HLT instruction once it has busy-waited for
> > +more than a threshold time interval. Execution of the HLT instruction
> > +causes the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0)
> > +until an appropriate event occurs. Another vcpu of the same guest can
> > +wake up the sleeping vcpu by issuing the KVM_HC_KICK_CPU hypercall,
> > +specifying the APIC ID of the vcpu to be woken up.
> 
> The description is way too specific. The hypercall basically gives the guest the ability to yield() its current vcpu to another chosen vcpu.

Hmm .. the hypercall does not allow a vcpu to yield. It just allows some
target vcpu to be prodded/woken up, after which that vcpu continues execution.

Note that the semantics of this hypercall are different from those of the
hypercall on which PPC pv-spinlock (__spin_yield()) currently depends. This is
mainly because of ticketlocks on x86, which do not allow us to easily store
owning-cpu details in the lock word itself.
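
For reference, the lock word in Jeremy's x86 ticketlock series is just a
head/tail pair; there is no owner field a waiter could use to direct a
yield. A sketch of the layout (quoted from memory, from the series'
spinlock_types.h):

        typedef struct arch_spinlock {
                union {
                        __ticketpair_t head_tail;
                        struct __raw_tickets {
                                __ticket_t head, tail;
                        } tickets;
                };
        } arch_spinlock_t;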

> The APIC piece is an implementation detail for x86. On PPC we could just use the PIR register contents (processor identifier).

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-14 18:25 ` Raghavendra K T
@ 2012-01-16  3:57   ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16  3:57 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen


On 14.01.2012, at 19:25, Raghavendra K T wrote:

> The 5-patch series to follow this email extends KVM-hypervisor and Linux guest 
> running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's implementation.
> 
> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
> another vcpu out of halt state.
> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.

Is the code for this even upstream? Prerequisite series seem to have been posted by Jeremy, but they didn't appear to have made it in yet.

Either way, thinking about this I stumbled over the following passage of his patch:

> +               unsigned count = SPIN_THRESHOLD;
> +
> +               do {
> +                       if (inc.head == inc.tail)
> +                               goto out;
> +                       cpu_relax();
> +                       inc.head = ACCESS_ONCE(lock->tickets.head);
> +               } while (--count);
> +               __ticket_lock_spinning(lock, inc.tail);


That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.

Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.

Imagine we have a contended host. Every vcpu gets at most 10% of a real CPU's runtime. So chances are 1:10 that you're currently running while you need to be. In such a setup, it's probably a good idea to be very pessimistic. Try to fetch the lock for 100 cycles and then immediately make room for all the other VMs that have real work going on!

So what I'm trying to get to is that if we had a hypervisor settable spin threshold, we could adjust it according to the host's load, getting VMs to behave differently on different (guest invisible) circumstances.

Speaking of which - don't we have spin lock counters in the CPUs now? I thought we could set intercepts that notify us when the guest issues too many repz nops or whatever the typical spinlock identifier was. Can't we reuse that and just interrupt the guest if we see this with a special KVM interrupt that kicks off the internal spin lock waiting code? That way we don't slow down all those bare metal boxes.

Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went though the work to split the individual pv ops features up to only enable a few for KVM guests.

> 
> Changes in V4:
> - reabsed to 3.2.0 pre.
> - use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching. (Avi)
> - fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and related 
>  changes for UNHALT path to make pv ticket spinlock migration friendly. (Avi, Marcello)
> - Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
>  and capabilty (KVM_CAP_PVLOCK_KICK) (Avi)
> - Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcello)
> - cumulative variable type changed (int ==> u32) in add_stat (Konrad)
> - remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case
> 
> Changes in V3:
> - rebased to 3.2-rc1
> - use halt() instead of wait for kick hypercall.
> - modify kick hyper call to do wakeup halted vcpu.
> - hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of head##.c).
> - fix the potential race when zero_stat is read.
> - export debugfs_create_32 and add documentation to API.
> - use static inline and enum instead of ADDSTAT macro. 
> - add  barrier() in after setting kick_vcpu.
> - empty static inline function for kvm_spinlock_init.
> - combine the patches one and two readuce overhead.
> - make KVM_DEBUGFS depends on DEBUGFS.
> - include debugfs header unconditionally.
> 
> Changes in V2:
> - rebased patchesto -rc9
> - synchronization related changes based on Jeremy's changes 
> (Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>) pointed by 
> Stephan Diestelhorst <stephan.diestelhorst@amd.com>
> - enabling 32 bit guests
> - splitted patches into two more chunks
> 
> Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (5): 
>  Add debugfs support to print u32-arrays in debugfs
>  Add a hypercall to KVM hypervisor to support pv-ticketlocks
>  Added configuration support to enable debug information for KVM Guests
>  pv-ticketlocks support for linux guests running on KVM hypervisor
>  Add documentation on Hypercalls and features used for PV spinlock
> 
> Test Set up :
> The BASE patch is pre 3.2.0 + Jeremy's following patches.
> xadd (https://lkml.org/lkml/2011/10/4/328)
> x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
> Kernel for host/guest : 3.2.0 + Jeremy's xadd, pv spinlock patches as BASE
> (Note:locked add change is not taken yet)
> 
> Results:
> The performance gain is mainly because of reduced busy-wait time.
> From the results we can see that patched kernel performance is similar to
> BASE when there is no lock contention. But once we start seeing more
> contention, patched kernel outperforms BASE (non PLE).
> On PLE machine we do not see greater performance improvement because of PLE
> complimenting halt()
> 
> 3 guests with 8VCPU, 4GB RAM, 1 used for kernbench
> (kernbench -f -H -M -o 20) other for cpuhog (shell script while
> true with an instruction)
> 
> scenario A: unpinned
> 
> 1x: no hogs
> 2x: 8hogs in one guest
> 3x: 8hogs each in two guest
> 
> scenario B: unpinned, run kernbench on all the guests no hogs.
> 
> Dbench on PLE machine:
> dbench run on all the guest simultaneously with
> dbench --warmup=30 -t 120 with NRCLIENTS=(8/16/32).
> 
> Result for Non PLE machine :
> ============================
> Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 core , 64GB RAM
> 		 BASE                    BASE+patch            %improvement
> 		 mean (sd)               mean (sd)
> Scenario A:
> case 1x:	 164.233 (16.5506) 	 163.584 (15.4598 	0.39517
> case 2x:	 897.654 (543.993) 	 328.63 (103.771) 	63.3901
> case 3x:	 2855.73 (2201.41) 	 315.029 (111.854) 	88.9685
> 
> Dbench:
> Throughput is in MB/sec
> NRCLIENTS	 BASE                    BASE+patch            %improvement
>               	 mean (sd)               mean (sd)
> 8       	1.774307  (0.061361) 	1.725667  (0.034644) 	-2.74135
> 16      	1.445967  (0.044805) 	1.463173  (0.094399) 	1.18993
> 32        	2.136667  (0.105717) 	2.193792  (0.129357) 	2.67356
> 
> Result for PLE machine:
> ======================
> Machine : IBM xSeries with Intel(R) Xeon(R)  X7560 2.27GHz CPU with 32/64 core, with 8
>         online cores and 4*64GB RAM
> 
> Kernbench:
> 		 BASE                    BASE+patch            %improvement
> 		 mean (sd)               mean (sd)
> Scenario A:	 			
> case 1x:	 161.263 (56.518) 	 159.635 (40.5621) 	1.00953
> case 2x:	 190.748 (61.2745) 	 190.606 (54.4766) 	0.0744438
> case 3x:	 227.378 (100.215) 	 225.442 (92.0809) 	0.851446
> 
> Scenario B:
> 		 446.104 (58.54)	 433.12733 (54.476)	2.91
> 
> Dbench:
> Throughput is in MB/sec
> NRCLIENTS	 BASE                    BASE+patch            %improvement
>               	 mean (sd)               mean (sd)
> 8       	1.101190  (0.875082) 	1.700395  (0.846809) 	54.4143
> 16      	1.524312  (0.120354) 	1.477553  (0.058166) 	-3.06755
> 32        	2.143028  (0.157103) 	2.090307  (0.136778) 	-2.46012

So on a very contended system we're actually slower? Is this expected?


Alex

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16  3:51       ` Srivatsa Vaddagiri
@ 2012-01-16  4:00         ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16  4:00 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman,
	LKML, Dave Hansen


On 16.01.2012, at 04:51, Srivatsa Vaddagiri wrote:

> * Alexander Graf <agraf@suse.de> [2012-01-16 04:23:24]:
> 
>>> +5. KVM_HC_KICK_CPU
>>> +------------------------
>>> +value: 5
>>> +Architecture: x86
>>> +Purpose: Hypercall used to wake up a vcpu from HLT state
>>> +
>>> +Usage example : A vcpu of a paravirtualized guest that is busy-waiting in guest
>>> +kernel mode for an event to occur (ex: a spinlock to become available)
>>> +can execute the HLT instruction once it has busy-waited for more than a
>>> +threshold time-interval. Execution of the HLT instruction causes
>>> +the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0) until the occurrence
>>> +of an appropriate event. Another vcpu of the same guest can wake up the sleeping
>>> +vcpu by issuing the KVM_HC_KICK_CPU hypercall, specifying the APIC ID of the vcpu to be
>>> +woken up.
>> 
>> The description is way too specific. The hypercall basically gives the guest the ability to yield() its current vcpu to another chosen vcpu.
> 
> Hmm.. the hypercall does not allow a vcpu to yield. It just allows some
> target vcpu to be prodded/woken up, after which that vcpu continues execution.
> 
> Note that the semantics of this hypercall differ from the hypercall on which
> PPC pv-spinlock (__spin_yield()) currently depends. This is mainly because
> ticketlocks on x86 do not allow us to easily store owning-cpu
> details in the lock word itself.

Yes, sorry for not being more exact in my wording. It is a directed yield(). Not like the normal old-style thing that just says "I'm done, hand some work to someone else" but more something like "I'm done, hand some work to this specific guy over there" :).
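
To make the structural difference concrete, a rough illustration (schematic field names, not the exact kernel definitions):

	/* x86 ticket lock: the lock word only holds queue positions, so a
	 * waiter has no idea which vcpu currently owns the lock. */
	struct x86_ticket_lock { unsigned short head, tail; };

	/* PPC: the held lock word encodes the holder, which is what lets
	 * __spin_yield() direct its yield at the owning vcpu. */
	struct ppc_spinlock { unsigned int holder_token; };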


Alex

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16  3:57   ` Alexander Graf
@ 2012-01-16  6:40     ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 139+ messages in thread
From: Jeremy Fitzhardinge @ 2012-01-16  6:40 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose

On Jan 16, 2012, at 2:57 PM, Alexander Graf wrote:

> 
> On 14.01.2012, at 19:25, Raghavendra K T wrote:
> 
>> The 5-patch series to follow this email extends KVM-hypervisor and Linux guest 
>> running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's implementation.
>> 
>> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
>> another vcpu out of halt state.
>> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
> 
> Is the code for this even upstream? Prerequisite series seem to have been posted by Jeremy, but they didn't appear to have made it in yet.

No, not yet.  The patches are unchanged since I last posted them, and as far as I know there are no objections to them, but I'd like to get some performance numbers just to make sure they don't cause any surprising regressions, especially in the non-virtual case.

> 
> Either way, thinking about this I stumbled over the following passage of his patch:
> 
>> +               unsigned count = SPIN_THRESHOLD;
>> +
>> +               do {
>> +                       if (inc.head == inc.tail)
>> +                               goto out;
>> +                       cpu_relax();
>> +                       inc.head = ACCESS_ONCE(lock->tickets.head);
>> +               } while (--count);
>> +               __ticket_lock_spinning(lock, inc.tail);
> 
> 
> That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.
> 
> Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.

I'm not quite sure what your concern is.  The lock is under contention, so there's nothing to do except spin; all this patch adds is a variable decrement/test to the spin loop, but that's not going to waste any more CPU than the non-counting case.  And once it falls into the blocking path, it's a win because the VCPU isn't burning CPU any more.

> 
> Imagine we have a contended host. Every vcpu gets at most 10% of a real CPU's runtime. So chances are 1:10 that you're currently running while you need to be. In such a setup, it's probably a good idea to be very pessimistic. Try to fetch the lock for 100 cycles and then immediately make room for all the other VMs that have real work going on!

Are you saying the threshold should be dynamic depending on how loaded the system is?  How can a guest know what the overall system contention is?  How should a guest use that to work out a good spin time?

One possibility is to use the ticket lock queue depth to work out how contended the lock is, and therefore how long it might be worth waiting for.  I was thinking of something along the lines of "threshold = (THRESHOLD >> queue_depth)", as sketched below.  But that's pure hand-waving, and someone would actually need to experiment before coming up with something reasonable.
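
Purely for illustration, that hand-wave would look something like this (the ticket layout mirrors the snippet quoted above; the types, SPIN_THRESHOLD value, and shift policy are assumptions, not series code):

	typedef unsigned short __ticket_t;

	struct tickets { __ticket_t head, tail; };

	#define SPIN_THRESHOLD (1 << 11)	/* assumed spin budget */

	static __ticket_t spin_threshold(const struct tickets *t)
	{
		/* Waiters queued ahead of us: distance from head to tail. */
		__ticket_t depth = t->tail - t->head;

		/* Halve the spin budget per extra waiter: the deeper the
		 * queue, the less likely spinning will pay off.  Clamp so
		 * we never shift the budget below a single iteration. */
		return depth < 11 ? (__ticket_t)(SPIN_THRESHOLD >> depth) : 1;
	}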

But all of this is good to consider for future work, rather than being essential for the first version.

> So what I'm trying to get to is that if we had a hypervisor settable spin threshold, we could adjust it according to the host's load, getting VMs to behave differently on different (guest invisible) circumstances.
> 
> Speaking of which - don't we have spin lock counters in the CPUs now? I thought we could set intercepts that notify us when the guest issues too many repz nops or whatever the typical spinlock identifier was. Can't we reuse that and just interrupt the guest if we see this with a special KVM interrupt that kicks off the internal spin lock waiting code? That way we don't slow down all those bare metal boxes.

Yes, that mechanism exists, but it doesn't solve a very interesting problem.

The most important thing to solve is making sure that when *releasing* a ticketlock, the correct next VCPU gets scheduled promptly.  If you don't, you're just relying on the VCPU scheduler eventually getting around to the correct VCPU, and until it does you just end up burning timeslices of PCPU time while the wrong VCPUs spin.

Limiting the spin time with a timeout or the rep/nop interrupt somewhat mitigates this, but it still means you end up spending a lot of time slices spinning the wrong VCPU until it finally schedules the correct one.  And the more contended the machine is, the worse the problem gets.

> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went through the work to split the individual pv ops features up to only enable a few for KVM guests.

The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.
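
As a shape sketch of that split (lock_in_slow_state() is a hypothetical stand-in for the series' actual slow-state check; atomicity and barriers are elided for brevity):

	static inline void ticket_unlock(struct tickets *lock)
	{
		/* Fast path: a plain increment, same cost as native unlock. */
		__ticket_t next = ++lock->head;

		/* Slow path: only if some waiter gave up spinning and
		 * blocked do we pay for a pvop call to kick it awake. */
		if (unlikely(lock_in_slow_state(lock)))
			__ticket_unlock_kick(lock, next);
	}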

But as I mentioned above, I'd like to see some benchmarks to prove that's the case.

	J

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-16  3:12     ` Alexander Graf
@ 2012-01-16  7:25       ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-16  7:25 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen

On 01/16/2012 08:42 AM, Alexander Graf wrote:
>
> On 14.01.2012, at 19:26, Raghavendra K T wrote:
>
>> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks.
>>
>> During smp_boot_cpus, a paravirtualized KVM guest detects whether the hypervisor has
>> the required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
>> support for pv-ticketlocks is registered via pv_lock_ops.
>>
>> Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.
>>
>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>> Signed-off-by: Suzuki Poulose<suzuki@in.ibm.com>
>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>> ---
>> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
>> index 7a94987..cf5327c 100644
>> --- a/arch/x86/include/asm/kvm_para.h
>> +++ b/arch/x86/include/asm/kvm_para.h
>> @@ -195,10 +195,20 @@ void kvm_async_pf_task_wait(u32 token);
>> void kvm_async_pf_task_wake(u32 token);
[...]
>> +}
>> +#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c7b05fc..4d7a950 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>
> This patch is mixing host and guest code. Please split those up.
>
>

Agreed. The host code should have gone into patch 2.

> Alex
>
>> @@ -5754,8 +5754,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
  2012-01-16  3:24     ` Alexander Graf
@ 2012-01-16  8:43       ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-16  8:43 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen

On 01/16/2012 08:54 AM, Alexander Graf wrote:
>
> On 14.01.2012, at 19:25, Raghavendra K T wrote:
>
>> Add a hypercall to KVM hypervisor to support pv-ticketlocks
>>
>> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
>>
>> The presence of these hypercalls is indicated to guest via
>> KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.
>>
>> Qemu needs a corresponding patch to pass up the presence of this feature to
>> the guest via cpuid. The patch to qemu will be sent separately.
>>
>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>> Signed-off-by: Suzuki Poulose<suzuki@in.ibm.com>
>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>> ---
>> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
>> index 734c376..7a94987 100644
>> --- a/arch/x86/include/asm/kvm_para.h
>> +++ b/arch/x86/include/asm/kvm_para.h
>> @@ -16,12 +16,14 @@
>> #define KVM_FEATURE_CLOCKSOURCE		0
>> #define KVM_FEATURE_NOP_IO_DELAY	1
>> #define KVM_FEATURE_MMU_OP		2
>> +
>> /* This indicates that the new set of kvmclock msrs
>>   * are available. The use of 0x11 and 0x12 is deprecated
>>   */
>> #define KVM_FEATURE_CLOCKSOURCE2        3
>> #define KVM_FEATURE_ASYNC_PF		4
>> #define KVM_FEATURE_STEAL_TIME		5
>> +#define KVM_FEATURE_PVLOCK_KICK		6
>>
>> /* The last 8 bits are used to indicate how to interpret the flags field
>>   * in pvclock structure. If no bits are set, all flags are ignored.
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 4c938da..c7b05fc 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2099,6 +2099,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>> 	case KVM_CAP_XSAVE:
>> 	case KVM_CAP_ASYNC_PF:
>> 	case KVM_CAP_GET_TSC_KHZ:
>> +	case KVM_CAP_PVLOCK_KICK:
>> 		r = 1;
>> 		break;
>> 	case KVM_CAP_COALESCED_MMIO:
>> @@ -2576,7 +2577,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>> 			     (1<<  KVM_FEATURE_NOP_IO_DELAY) |
>> 			     (1<<  KVM_FEATURE_CLOCKSOURCE2) |
>> 			     (1<<  KVM_FEATURE_ASYNC_PF) |
>> -			     (1<<  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>> +			     (1<<  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
>> +			     (1<<  KVM_FEATURE_PVLOCK_KICK);
>>
>> 		if (sched_info_on())
>> 			entry->eax |= (1<<  KVM_FEATURE_STEAL_TIME);
>> @@ -5304,6 +5306,29 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>> 	return 1;
>> }
>>
>> +/*
>> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
>> + *
>> + * @apicid - apicid of vcpu to be kicked.
>> + */
>> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
>> +{
>> +	struct kvm_vcpu *vcpu = NULL;
>> +	int i;
>> +
>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>> +		if (!kvm_apic_present(vcpu))
>> +			continue;
>> +
>> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
>> +			break;
>> +	}
>> +	if (vcpu) {
>> +		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
>> +		kvm_vcpu_kick(vcpu);
>> +	}
>> +}
>> +
>> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>> {
>> 	unsigned long nr, a0, a1, a2, a3, ret;
>> @@ -5340,6 +5365,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>> 	case KVM_HC_MMU_OP:
>> 		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2),&ret);
>> 		break;
>> +	case KVM_HC_KICK_CPU:
>> +		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
>> +		ret = 0;
>> +		break;
>> 	default:
>> 		ret = -KVM_ENOSYS;
>> 		break;
>> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
>> index 68e67e5..63fb6b0 100644
>> --- a/include/linux/kvm.h
>> +++ b/include/linux/kvm.h
>> @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
>> #define KVM_CAP_PPC_PAPR 68
>> #define KVM_CAP_S390_GMAP 71
>> #define KVM_CAP_TSC_DEADLINE_TIMER 72
>> +#define KVM_CAP_PVLOCK_KICK 73
>>
>> #ifdef KVM_CAP_IRQ_ROUTING
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index d526231..3b1ae7b 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -50,6 +50,7 @@
>> #define KVM_REQ_APF_HALT          12
>> #define KVM_REQ_STEAL_UPDATE      13
>> #define KVM_REQ_NMI               14
>> +#define KVM_REQ_PVLOCK_KICK       15
>
> Everything I see in this patch is pvlock agnostic. It's only a vcpu kick hypercall. So it's probably a good idea to also name it accordingly :).
>
>
> Alex
>
>

It was indeed KICK_VCPU in V4. But since we are currently dealing with
only pv locks, it is renamed so. If we start using the code for a
flush_tlb_others_ipi() optimization etc., it would be a good idea to rename it
accordingly, or even go back to KICK_VCPU as used earlier.
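
For reference, whatever the final name, the guest side of the kick stays a one-liner. A minimal sketch along the lines of patch 4/5 (kvm_hypercall1() is the existing helper from asm/kvm_para.h):

	static void kvm_kick_cpu(int cpu)
	{
		int apicid = per_cpu(x86_cpu_to_apicid, cpu);

		/* Ask the host to wake the halted vcpu with this APIC ID. */
		kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
	}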

  - Raghu

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16  3:23     ` Alexander Graf
@ 2012-01-16  8:44       ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-16  8:44 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

On 01/16/2012 08:53 AM, Alexander Graf wrote:
>
> On 14.01.2012, at 19:27, Raghavendra K T wrote:
>
>> Add Documentation on CPUID, KVM_CAP_PVLOCK_KICK, and Hypercalls supported.
>>
>> KVM_HC_KICK_CPU hypercall added to wake up a halted vcpu in a
>> paravirtual-spinlock-enabled guest.
>>
>> KVM_FEATURE_PVLOCK_KICK enables the guest to check whether pv spinlocks can
>> be enabled in the guest. Support in the host is queried via
>> ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK).
>>
>> A minimal Documentation and template is added for hypercalls.
>>
>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>> ---
[...]
>> diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
>> new file mode 100644
>> index 0000000..7872da5
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/hypercalls.txt
>> @@ -0,0 +1,54 @@
>> +KVM Hypercalls Documentation
>> +===========================

>> +2. KVM_HC_MMU_OP
>> +------------------------
>> +value: 2
>> +Architecture: x86
>> +Purpose: Support MMU operations such as writing to PTE,
>> +flushing TLB, release PT.
>
> This one is deprecated, no? Should probably be mentioned here.

OK, then maybe adding a state field (deprecated / obsolete / in use (active))
would be a good idea.
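Something along these lines, for example (purely illustrative):

   2. KVM_HC_MMU_OP
   ------------------------
   Value: 2
   Architecture: x86
   Status: deprecated
   Purpose: Support MMU operations such as writing to PTE,
   flushing TLB, release PT.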

>
>> +
>> +3. KVM_HC_FEATURES
>> +------------------------
>> +value: 3
>> +Architecture: PPC
>> +Purpose:
>
> Expose hypercall availability to the guest. On x86 you use cpuid to enumerate which hypercalls are available. The natural fit on ppc would be device tree based lookup (which is also what EPAPR dictates), but we also have a second enumeration mechanism that's KVM specific - which is this hypercall.
>

Thanks, I will add this. I hope you are OK if I add your Signed-off-by.

>> +
>> +4. KVM_HC_PPC_MAP_MAGIC_PAGE
>> +------------------------
>> +value: 4
>> +Architecture: PPC
>> +Purpose: To enable communication between the hypervisor and guest there is a
>> +new
>
> It's not new anymore :)
>
>> shared page that contains parts of supervisor visible register state.
>> +The guest can map this shared page using this hypercall.
>
> ... to access its supervisor register through memory.
>

Will update accordingly.

- Raghu

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16  4:00         ` Alexander Graf
  (?)
@ 2012-01-16  8:47         ` Avi Kivity
  -1 siblings, 0 replies; 139+ messages in thread
From: Avi Kivity @ 2012-01-16  8:47 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, Virtualization, Greg Kroah-Hartman,
	LKML

On 01/16/2012 06:00 AM, Alexander Graf wrote:
> On 16.01.2012, at 04:51, Srivatsa Vaddagiri wrote:
>
> > * Alexander Graf <agraf@suse.de> [2012-01-16 04:23:24]:
> > 
> >>> +5. KVM_HC_KICK_CPU
> >>> +------------------------
> >>> +value: 5
> >>> +Architecture: x86
> >>> +Purpose: Hypercall used to wakeup a vcpu from HLT state
> >>> +
> >>> +Usage example: A vcpu of a paravirtualized guest that is busy-waiting in guest
> >>> +kernel mode for an event to occur (ex: a spinlock to become available)
> >>> +can execute HLT instruction once it has busy-waited for more than a
> >>> +threshold time-interval. Execution of HLT instruction would cause
> >>> +the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0) until occurrence
> >>> +of an appropriate event. Another vcpu of the same guest can wake up the sleeping
> >>> +vcpu by issuing KVM_HC_KICK_CPU hypercall, specifying APIC ID of the vcpu to be
> >>> +woken up.
> >> 
> >> The description is way too specific. The hypercall basically gives the guest the ability to yield() its current vcpu to another chosen vcpu.
> > 
> > Hmm .. the hypercall does not allow a vcpu to yield. It just allows some
> > target vcpu to be prodded/woken up, after which the vcpu continues execution.
> > 
> > Note that the semantics of this hypercall are different from the hypercall on which
> > PPC pv-spinlock (__spin_yield()) is currently dependent. This is mainly because 
> > of ticketlocks on x86 (which do not allow us to easily store the owning cpu
> > details in the lock word itself).
>
> Yes, sorry for not being more exact in my wording. It is a directed yield(). Not like the normal old style thing that just says "I'm done, get some work to someone else" but more something like "I'm done, get some work to this specific guy over there" :).
>

It's not a yield.  It unhalts a vcpu.  Kind of like an IPI, but without
actually issuing an interrupt on the target, and disregarding the
interrupt flag.  It says nothing about the source.
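
(As a rough sketch, the guest side of the kick then boils down to
something like the below; kvm_kick_cpu is an illustrative name, and
the apicid comes from the usual per-cpu mapping:)

	static void kvm_kick_cpu(int cpu)
	{
		int apicid = per_cpu(x86_cpu_to_apicid, cpu);

		kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
	}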

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16  6:40     ` Jeremy Fitzhardinge
@ 2012-01-16  8:55       ` Avi Kivity
  -1 siblings, 0 replies; 139+ messages in thread
From: Avi Kivity @ 2012-01-16  8:55 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose

On 01/16/2012 08:40 AM, Jeremy Fitzhardinge wrote:
> > 
> > That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.
> > 
> > Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.
>
> I'm not quite sure what your concern is.  The lock is under contention, so there's nothing to do except spin; all this patch adds is a variable decrement/test to the spin loop, but that's not going to waste any more CPU than the non-counting case.  And once it falls into the blocking path, it's a win because the VCPU isn't burning CPU any more.

The wakeup path is slower though.  The previous lock holder has to
hypercall, and the new lock holder has to be scheduled, and transition
from halted state to running (a vmentry).  So it's only a clear win if
we can do something with the cpu other than go into the idle loop.

> > 
> > Imagine we have a contended host. Every vcpu gets at most 10% of a real CPU's runtime. So chances are 1:10 that you're currently running while you need to be. In such a setup, it's probably a good idea to be very pessimistic. Try to fetch the lock for 100 cycles and then immediately make room for all the other VMs that have real work going on!
>
> Are you saying the threshold should be dynamic depending on how loaded the system is?  How can a guest know what the overall system contention is?  How should a guest use that to work out a good spin time?
>
> One possibility is to use the ticket lock queue depth to work out how contended the lock is, and therefore how long it might be worth waiting for.  I was thinking of something along the lines of "threshold = (THRESHOLD >> queue_depth)".  But that's pure hand wave, and someone would actually need to experiment before coming up with something reasonable.
>
> But all of this is good to consider for future work, rather than being essential for the first version.

Agree.
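
Spelled out, that hand-wave would be something along these lines
(purely illustrative; the queue depth falls out of the ticket pair):

	static unsigned spin_threshold(arch_spinlock_t *lock)
	{
		/* waiters ahead of us == depth of the ticket queue */
		unsigned depth = lock->tickets.tail - lock->tickets.head;

		return SPIN_THRESHOLD >> depth;
	}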

> > So what I'm trying to get to is that if we had a hypervisor settable spin threshold, we could adjust it according to the host's load, getting VMs to behave differently on different (guest invisible) circumstances.
> > 
> > Speaking of which - don't we have spin lock counters in the CPUs now? I thought we could set intercepts that notify us when the guest issues too many repz nops or whatever the typical spinlock identifier was. Can't we reuse that and just interrupt the guest if we see this with a special KVM interrupt that kicks off the internal spin lock waiting code? That way we don't slow down all those bare metal boxes.
>
> Yes, that mechanism exists, but it doesn't solve a very interesting problem.
>
> The most important thing to solve is making sure that when *releasing* a ticketlock, the correct next VCPU gets scheduled promptly.  If you don't, you're just relying on the VCPU scheduler getting around to scheduling the correct VCPU, but if it doesn't it just ends up burning a timeslice of PCPU time while the wrong VCPU spins.

kvm does a directed yield to an unscheduled vcpu, selected in a round
robin fashion.  So if your overload factor is N (N runnable vcpus for
every physical cpu), and your spin counter waits for S cycles before
exiting, you will burn N*S cycles (actually more since there is overhead
involved, but let's fold it into S).
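
(For example, with an overload factor N = 3 and a spin threshold
S = 4096 cycles, that is roughly 3 * 4096 = ~12k cycles burnt per
contended sleep before the right vcpu finally runs.)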

> Limiting the spin time with a timeout or the rep/nop interrupt somewhat mitigates this, but it still means you end up spending a lot of time slices spinning the wrong VCPU until it finally schedules the correct one.  And the more contended the machine is, the worse the problem gets.

Right.

>
> > Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went through the work to split the individual pv ops features up to only enable a few for KVM guests.
>
> The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.
>
> But as I mentioned above, I'd like to see some benchmarks to prove that's the case.
>

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-14 18:27   ` Raghavendra K T
@ 2012-01-16  9:00     ` Avi Kivity
  -1 siblings, 0 replies; 139+ messages in thread
From: Avi Kivity @ 2012-01-16  9:00 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen, Suzuki

On 01/14/2012 08:27 PM, Raghavendra K T wrote:
> +
> +5. KVM_HC_KICK_CPU
> +------------------------
> +value: 5
> +Architecture: x86
> +Purpose: Hypercall used to wakeup a vcpu from HLT state
> +
> +Usage example: A vcpu of a paravirtualized guest that is busy-waiting in guest
> +kernel mode for an event to occur (ex: a spinlock to become available)
> +can execute HLT instruction once it has busy-waited for more than a
> +threshold time-interval. Execution of HLT instruction would cause
> +the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0) until occurrence
> +of an appropriate event. Another vcpu of the same guest can wake up the sleeping
> +vcpu by issuing KVM_HC_KICK_CPU hypercall, specifying APIC ID of the vcpu to be
> +woken up.

Wait, what happens with yield_on_hlt=0?  Will the hypercall work as
advertised?

> +
> +TODO:
> +1. more information on input and output needed?
> +2. Add more detail to purpose of hypercalls.
>


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
  2012-01-14 18:25   ` Raghavendra K T
@ 2012-01-16  9:03     ` Avi Kivity
  -1 siblings, 0 replies; 139+ messages in thread
From: Avi Kivity @ 2012-01-16  9:03 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki

On 01/14/2012 08:25 PM, Raghavendra K T wrote:
> Add a hypercall to KVM hypervisor to support pv-ticketlocks 
>
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
>     
> The presence of these hypercalls is indicated to guest via
> KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.
>
> Qemu needs a corresponding patch to pass up the presence of this feature to 
> guest via cpuid. Patch to qemu will be sent separately.

No need to discuss qemu in a kernel patch.

>  
> +/*
> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
> + *
> + * @apicid - apicid of vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> +{
> +	struct kvm_vcpu *vcpu = NULL;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_apic_present(vcpu))
> +			continue;
> +
> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
> +			break;
> +	}
> +	if (vcpu) {
> +		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
> +		kvm_vcpu_kick(vcpu);
> +	}
> +}
> +

The code that handles KVM_REQ_PVLOCK_KICK needs to be in this patch.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-14 18:26   ` Raghavendra K T
@ 2012-01-16  9:05     ` Avi Kivity
  -1 siblings, 0 replies; 139+ messages in thread
From: Avi Kivity @ 2012-01-16  9:05 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki

On 01/14/2012 08:26 PM, Raghavendra K T wrote:
> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks. 
>
> During smp_boot_cpus the paravirtualized KVM guest detects if the hypervisor has
> the required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
>  support for pv-ticketlocks is registered via pv_lock_ops.
>
> Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.
> +
> +	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
> +
> +	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
> +	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
> +
> +	debugfs_create_u32("released_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
> +	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
> +
> +	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
> +			   &spinlock_stats.time_blocked);
> +
> +	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
> +		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
> +
>

Please drop all of these and replace with tracepoints in the appropriate
spots.  Everything else (including the histogram) can be reconstructed from
the tracepoints in userspace.
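
E.g. one tracepoint per slowpath event, along these lines (event name
and fields illustrative only):

	TRACE_EVENT(kvm_lock_spinning,
		TP_PROTO(unsigned long lock, u16 ticket),
		TP_ARGS(lock, ticket),

		TP_STRUCT__entry(
			__field(unsigned long, lock)
			__field(u16, ticket)
		),

		TP_fast_assign(
			__entry->lock = lock;
			__entry->ticket = ticket;
		),

		TP_printk("lock %lx ticket %u", __entry->lock, __entry->ticket)
	);

The histogram is then just post-processing of the event timestamps in
userspace.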

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16  9:00     ` Avi Kivity
  (?)
@ 2012-01-16  9:40     ` Srivatsa Vaddagiri
  2012-01-16 10:14       ` Avi Kivity
  -1 siblings, 1 reply; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-16  9:40 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman, LKML,
	Dave Hansen, Suzuki

* Avi Kivity <avi@redhat.com> [2012-01-16 11:00:41]:

> Wait, what happens with yield_on_hlt=0?  Will the hypercall work as
> advertised?

Hmm ..I don't think it will work when yield_on_hlt=0.

One option is to make the kick hypercall available only when
yield_on_hlt=1?
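
i.e. something like the below in kvm_dev_ioctl_check_extension()?
(a sketch; assumes the vmx yield_on_hlt knob is visible there)

	case KVM_CAP_PVLOCK_KICK:
		r = yield_on_hlt;
		break;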

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
  2012-01-16  9:03     ` Avi Kivity
@ 2012-01-16  9:55       ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-16  9:55 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki

On 01/16/2012 02:33 PM, Avi Kivity wrote:
>> +/*
>> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
>> + *
>> + * @apicid - apicid of vcpu to be kicked.
>> + */
>> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
>> +{
>> +	struct kvm_vcpu *vcpu = NULL;
>> +	int i;
>> +
>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>> +		if (!kvm_apic_present(vcpu))
>> +			continue;
>> +
>> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
>> +			break;
>> +	}
>> +	if (vcpu) {
>> +		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
>> +		kvm_vcpu_kick(vcpu);
>> +	}
>> +}
>> +
>
> The code that handles KVM_REQ_PVLOCK_KICK needs to be in this patch.
>
>

Yes, agree. As Alex also pointed out, the related hunk from patch 4 should
be added here.
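
i.e. roughly the below in the vcpu run loop (a sketch, using the V4
request name):

	if (kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu))
		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;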

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16  9:40     ` Srivatsa Vaddagiri
@ 2012-01-16 10:14       ` Avi Kivity
  2012-01-16 14:11         ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 139+ messages in thread
From: Avi Kivity @ 2012-01-16 10:14 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman, LKML,
	Dave Hansen, Suzuki

On 01/16/2012 11:40 AM, Srivatsa Vaddagiri wrote:
> * Avi Kivity <avi@redhat.com> [2012-01-16 11:00:41]:
>
> > Wait, what happens with yield_on_hlt=0?  Will the hypercall work as
> > advertised?
>
> Hmm ..I don't think it will work when yield_on_hlt=0.
>
> One option is to make the kick hypercall available only when
> yield_on_hlt=1?

It's not a good idea to tie various options together.  Features should
be orthogonal.

Can't we make it work?  Just have different handling for
KVM_REQ_PVLOCK_KICK (let's rename it, and the hypercall, PV_UNHALT,
since we can use it for non-locks too).

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16  6:40     ` Jeremy Fitzhardinge
@ 2012-01-16 10:24       ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16 10:24 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose


On 16.01.2012, at 07:40, Jeremy Fitzhardinge wrote:

> On Jan 16, 2012, at 2:57 PM, Alexander Graf wrote:
> 
>> 
>> On 14.01.2012, at 19:25, Raghavendra K T wrote:
>> 
>>> The 5-patch series to follow this email extends KVM-hypervisor and Linux guest 
>>> running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's implementation.
>>> 
>>> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
>>> another vcpu out of halt state.
>>> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
>> 
>> Is the code for this even upstream? Prerequisite series seem to have been posted by Jeremy, but they don't appear to have made it in yet.
> 
> No, not yet.  The patches are unchanged since I last posted them, and as far as I know there are no objections to them, but I'd like to get some performance numbers just to make sure they don't cause any surprising regressions, especially in the non-virtual case.

Yup, that's a very good idea :)

> 
>> 
>> Either way, thinking about this I stumbled over the following passage of his patch:
>> 
>>> +               unsigned count = SPIN_THRESHOLD;
>>> +
>>> +               do {
>>> +                       if (inc.head == inc.tail)
>>> +                               goto out;
>>> +                       cpu_relax();
>>> +                       inc.head = ACCESS_ONCE(lock->tickets.head);
>>> +               } while (--count);
>>> +               __ticket_lock_spinning(lock, inc.tail);
>> 
>> 
>> That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.
>> 
>> Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.
> 
> I'm not quite sure what your concern is.  The lock is under contention, so there's nothing to do except spin; all this patch adds is a variable decrement/test to the spin loop, but that's not going to waste any more CPU than the non-counting case.  And once it falls into the blocking path, it's a win because the VCPU isn't burning CPU any more.
> 
>> 
>> Imagine we have a contended host. Every vcpu gets at most 10% of a real CPU's runtime. So chances are 1:10 that you're currently running while you need to be. In such a setup, it's probably a good idea to be very pessimistic. Try to fetch the lock for 100 cycles and then immediately make room for all the other VMs that have real work going on!
> 
> Are you saying the threshold should be dynamic depending on how loaded the system is?  How can a guest know what the overall system contention is?  How should a guest use that to work out a good spin time?

I'm saying what I'm saying in the next paragraph :). The guest doesn't know, but the host does. So if we had shared memory between guest and host, the host could put its threshold limit in there, which on an idle system could be -1 and on a contended system could be 1.
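
Something like a host-writable hint, say (purely hypothetical layout):

	struct kvm_spin_hint {
		__u32 spin_threshold;	/* host-tuned: huge when idle, tiny when contended */
	};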

> One possibility is to use the ticket lock queue depth to work out how contended the lock is, and therefore how long it might be worth waiting for.  I was thinking of something along the lines of "threshold = (THRESHOLD >> queue_depth)".  But that's pure hand wave, and someone would actually need to experiment before coming up with something reasonable.
> 
> But all of this is good to consider for future work, rather than being essential for the first version.

Well, yes, of course! It's by no means an objection to what's there today. I'm just trying to think of ways to make it even better :)

> 
>> So what I'm trying to get to is that if we had a hypervisor settable spin threshold, we could adjust it according to the host's load, getting VMs to behave differently on different (guest invisible) circumstances.
>> 
>> Speaking of which - don't we have spin lock counters in the CPUs now? I thought we could set intercepts that notify us when the guest issues too many repz nops or whatever the typical spinlock identifier was. Can't we reuse that and just interrupt the guest if we see this with a special KVM interrupt that kicks off the internal spin lock waiting code? That way we don't slow down all those bare metal boxes.
> 
> Yes, that mechanism exists, but it doesn't solve a very interesting problem.
> 
> The most important thing to solve is making sure that when *releasing* a ticketlock, the correct next VCPU gets scheduled promptly.  If you don't, you're just relying on the VCPU scheduler getting around to scheduling the correct VCPU, but if it doesn't it just ends up burning a timeslice of PCPU time while the wrong VCPU spins.
> 
> Limiting the spin time with a timeout or the rep/nop interrupt somewhat mitigates this, but it still means you end up spending a lot of time slices spinning the wrong VCPU until it finally schedules the correct one.  And the more contended the machine is, the worse the problem gets.

This is true in case you're spinning. If, on overcommit, spinlocks would just yield() instead of spinning, we wouldn't have any vcpu running that's just waiting for a late ticket.

We still have an issue finding the point in time when a vcpu could run again, which is what this whole series is about. My point above was that instead of doing a count loop, we could just do the normal spin dance and set the threshold at which we enable the magic that lets another spin lock notify us in the CPU. That way we

  * don't change the uncontended case
  * can set the threshold on the host, which knows how contended the system is

And since we control what spin locks look like, we can for example always keep the pointer to it in a specific register so that we can handle pv_lock_ops.lock_spinning() inside there and fetch all the information we need from our pt_regs.

> 
>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went through the work to split the individual pv ops features up to only enable a few for KVM guests.
> 
> The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.

You're still changing a tight loop that does nothing (CPU detects it and saves power) into something that performs calculations.

> But as I mentioned above, I'd like to see some benchmarks to prove that's the case.

Yes, that would be very good to have :)


Alex

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16  8:44       ` Raghavendra K T
@ 2012-01-16 10:26         ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16 10:26 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen


On 16.01.2012, at 09:44, Raghavendra K T wrote:

> On 01/16/2012 08:53 AM, Alexander Graf wrote:
>> 
>> On 14.01.2012, at 19:27, Raghavendra K T wrote:
>> 
>>> Add Documentation on CPUID, KVM_CAP_PVLOCK_KICK, and Hypercalls supported.
>>> 
>>> KVM_HC_KICK_CPU hypercall added to wake up a halted vcpu in a
>>> paravirtual spinlock enabled guest.
>>> 
>>> KVM_FEATURE_PVLOCK_KICK enables the guest to check whether pv spinlocks can
>>> be enabled in the guest. Support in the host is queried via
>>> ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK).
>>> 
>>> A minimal Documentation and template is added for hypercalls.
>>> 
>>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>>> ---
> [...]
>>> diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
>>> new file mode 100644
>>> index 0000000..7872da5
>>> --- /dev/null
>>> +++ b/Documentation/virtual/kvm/hypercalls.txt
>>> @@ -0,0 +1,54 @@
>>> +KVM Hypercalls Documentation
>>> +===========================
> 
>>> +2. KVM_HC_MMU_OP
>>> +------------------------
>>> +value: 2
>>> +Architecture: x86
>>> +Purpose: Support MMU operations such as writing to PTE,
>>> +flushing TLB, release PT.
>> 
>> This one is deprecated, no? Should probably be mentioned here.
> 
> Ok, then maybe adding a state = deprecated/obsolete/in use (active)
> line may be a good idea.
> 
>> 
>>> +
>>> +3. KVM_HC_FEATURES
>>> +------------------------
>>> +value: 3
>>> +Architecture: PPC
>>> +Purpose:
>> 
>> Expose hypercall availability to the guest. On x86 you use cpuid to enumerate which hypercalls are available. The natural fit on ppc would be device tree based lookup (which is also what EPAPR dictates), but we also have a second enumeration mechanism that's KVM specific - which is this hypercall.
>> 
> 
> Thanks, will add this. I hope you are OK if I add your Signed-off-by.

I don't think you need a signed-off-by from me for this very simple documentation addition :). You should probably also reword it. I didn't quite write it as a paragraph that should go into the file.


Alex

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16  3:57   ` Alexander Graf
@ 2012-01-16 13:43     ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-16 13:43 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen

On 01/16/2012 09:27 AM, Alexander Graf wrote:
>
[...]
>> Result for PLE machine:
>> ======================
>> Machine : IBM xSeries with Intel(R) Xeon(R)  X7560 2.27GHz CPU with 32/64 core, with 8
>>          online cores and 4*64GB RAM
>>
>> Kernbench:
>> 		 BASE                    BASE+patch            %improvement
>> 		 mean (sd)               mean (sd)
>> Scenario A:	 			
>> case 1x:	 161.263 (56.518) 	 159.635 (40.5621) 	1.00953
>> case 2x:	 190.748 (61.2745) 	 190.606 (54.4766) 	0.0744438
>> case 3x:	 227.378 (100.215) 	 225.442 (92.0809) 	0.851446
>>
>> Scenario B:
>> 		 446.104 (58.54 )	 433.12733 (54.476)	2.91
>>
>> Dbench:
>> Throughput is in MB/sec
>> NRCLIENTS	 BASE                    BASE+patch            %improvement
>>                	 mean (sd)               mean (sd)
>> 8       	1.101190  (0.875082) 	1.700395  (0.846809) 	54.4143
>> 16      	1.524312  (0.120354) 	1.477553  (0.058166) 	-3.06755
>> 32        	2.143028  (0.157103) 	2.090307  (0.136778) 	-2.46012
>
> So on a very contended system we're actually slower? Is this expected?
>
>

I think the result is interesting because it's a PLE machine. I have to 
experiment more with the parameters: SPIN_THRESHOLD, and maybe also ple_gap 
and ple_window.

> Alex
>
>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16 13:43     ` Raghavendra K T
@ 2012-01-16 13:49       ` Avi Kivity
  -1 siblings, 0 replies; 139+ messages in thread
From: Avi Kivity @ 2012-01-16 13:49 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki

On 01/16/2012 03:43 PM, Raghavendra K T wrote:
>>> Dbench:
>>> Throughput is in MB/sec
>>> NRCLIENTS     BASE                    BASE+patch              %improvement
>>>               mean (sd)               mean (sd)
>>> 8             1.101190  (0.875082)    1.700395  (0.846809)    54.4143
>>> 16            1.524312  (0.120354)    1.477553  (0.058166)    -3.06755
>>> 32            2.143028  (0.157103)    2.090307  (0.136778)    -2.46012
>>
>> So on a very contended system we're actually slower? Is this expected?
>>
>>
>
>
> I think the result is interesting because it's a PLE machine. I have to
> experiment more with the parameters: SPIN_THRESHOLD, and maybe also
> ple_gap and ple_window.

Perhaps the PLE stuff fights with the PV stuff?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16 10:14       ` Avi Kivity
@ 2012-01-16 14:11         ` Srivatsa Vaddagiri
  2012-01-17  9:14             ` Gleb Natapov
  0 siblings, 1 reply; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-16 14:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman, LKML,
	Dave Hansen, Suzuki

* Avi Kivity <avi@redhat.com> [2012-01-16 12:14:27]:

> > One option is to make the kick hypercall available only when
> > yield_on_hlt=1?
> 
> It's not a good idea to tie various options together.  Features should
> be orthogonal.
> 
> Can't we make it work?  Just have different handling for
> KVM_REQ_PVLOCK_KICK (let's rename it, and the hypercall, PV_UNHALT,
> since we can use it for non-locks too).

The problem case I was thinking of was when a guest VCPU would have
issued HLT with interrupts disabled. I guess one option is to inject an
NMI, and have the guest kernel NMI handler recognize this and make
adjustments such that the vcpu avoids going back to the HLT instruction.

Having another hypercall to do yield/sleep (rather than effecting that
via HLT) seems like an alternative clean solution here ...

- vatsa
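
[[
A minimal sketch of the blocking step under discussion, following the
shape of the series' lock_spinning slowpath (the branch below is
illustrative, not a quote of the patch): if the slowpath was entered
with interrupts disabled, the vcpu halts with interrupts still off, so
only the kick hypercall (or something like the NMI above) can wake it.

	if (arch_irqs_disabled_flags(flags)) {
		/* irqs stay off: only KVM_HC_KICK_CPU can wake us */
		halt();
	} else {
		local_irq_restore(flags);
		/* sti;hlt -- an interrupt or the kick wakes us */
		safe_halt();
		local_irq_save(flags);
	}
]]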

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-16  9:05     ` Avi Kivity
@ 2012-01-16 14:13       ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-16 14:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki

On 01/16/2012 02:35 PM, Avi Kivity wrote:
> On 01/14/2012 08:26 PM, Raghavendra K T wrote:
>> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks.
>>
>> During smp_boot_cpus a paravirtualized KVM guest detects if the hypervisor has
>> required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
>>   support for pv-ticketlocks is registered via pv_lock_ops.
>>
>> Use KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.
>> +
>> +	debugfs_create_u8("zero_stats", 0644, d_spin_debug,&zero_stats);
>> +
>> +	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[TAKEN_SLOW]);
>> +	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
>> +
>> +	debugfs_create_u32("released_slow", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[RELEASED_SLOW]);
>> +	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
>> +
>> +	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
>> +			&spinlock_stats.time_blocked);
>> +
>> +	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
>> +		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
>> +
>>
>
> Please drop all of these and replace them with tracepoints in the
> appropriate spots.  Everything else (including the histogram) can be
> reconstructed from the tracepoints in userspace.
>

I think Jeremy pointed out that tracepoints use spinlocks and hence
debugfs is the option... no?
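
[[
Reconstructing the histogram from debugfs is straightforward either
way; here is a small userspace sketch. It assumes the u32-array file
renders as space-separated decimal values, as in the Xen debugfs
helper this series generalizes, and uses the path created by the patch:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/debug/kvm/spinlocks/histo_blocked", "r");
	unsigned int val, bucket = 0;

	if (!f) {
		perror("histo_blocked");
		return 1;
	}
	/* bucket n counts blocked intervals of roughly 2^n ns */
	while (fscanf(f, "%u", &val) == 1)
		printf("2^%u ns: %u\n", bucket++, val);
	fclose(f);
	return 0;
}
]]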

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16  3:57   ` Alexander Graf
@ 2012-01-16 14:20     ` Srivatsa Vaddagiri
  -1 siblings, 0 replies; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-16 14:20 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen

* Alexander Graf <agraf@suse.de> [2012-01-16 04:57:45]:

> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?

You mean, run the kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS
disabled for some workload(s)?

In some sense, the 1x overcommit case results posted do measure the
overhead of (pv-)spinlocks, no? We don't see any overhead in that case,
at least for kernbench ...

> Result for Non PLE machine :
> ============================

[snip]

> Kernbench:
>                BASE                    BASE+patch
>                %improvement
>                mean (sd)               mean (sd)
> Scenario A:
> case 1x:	 164.233 (16.5506)	 163.584 (15.4598)	0.39517

[snip]

> Result for PLE machine:
> ======================

[snip]
> Kernbench:
>                BASE                    BASE+patch
>                %improvement
>                mean (sd)               mean (sd)
> Scenario A:
> case 1x:	 161.263 (56.518)        159.635 (40.5621)	1.00953

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16 14:20     ` Srivatsa Vaddagiri
@ 2012-01-16 14:23       ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16 14:23 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen


On 16.01.2012, at 15:20, Srivatsa Vaddagiri wrote:

> * Alexander Graf <agraf@suse.de> [2012-01-16 04:57:45]:
> 
>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?
> 
> You mean, run the kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
> enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS
> disabled for some workload(s)?

Yup

> 
> In some sense, the 1x overcommit case results posted do measure the
> overhead of (pv-)spinlocks, no? We don't see any overhead in that case,
> at least for kernbench ...
> 
>> Result for Non PLE machine :
>> ============================
> 
> [snip]
> 
>> Kernbench:
>>               BASE                    BASE+patch

What is BASE really? Is BASE already with the PV spinlocks enabled? I'm having a hard time understanding which tree you're working against, since the prerequisites aren't upstream yet.


Alex

>>               %improvement
>>               mean (sd)               mean (sd)
>> Scenario A:
>> case 1x:	 164.233 (16.5506)	 163.584 (15.4598)	0.39517
> 
> [snip]
> 
>> Result for PLE machine:
>> ======================
> 
> [snip]
>> Kernbench:
>>               BASE                    BASE+patch
>>               %improvement
>>               mean (sd)               mean (sd)
>> Scenario A:
>> case 1x:	 161.263 (56.518)        159.635 (40.5621)	1.00953
> 
> - vatsa
> 

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-16 14:13       ` Raghavendra K T
@ 2012-01-16 14:47         ` Avi Kivity
  -1 siblings, 0 replies; 139+ messages in thread
From: Avi Kivity @ 2012-01-16 14:47 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki

On 01/16/2012 04:13 PM, Raghavendra K T wrote:
>> Please drop all of these and replace them with tracepoints in the
>> appropriate spots.  Everything else (including the histogram) can be
>> reconstructed from the tracepoints in userspace.
>>
>
>
> I think Jeremy pointed out that tracepoints use spinlocks and hence
> debugfs is the option... no?
>

Yeah, I think you're right.  What a pity.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16 14:23       ` Alexander Graf
@ 2012-01-16 18:38         ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-16 18:38 UTC (permalink / raw)
  To: Alexander Graf, Jeremy Fitzhardinge
  Cc: Greg Kroah-Hartman, linux-doc, Peter Zijlstra, Jan Kiszka,
	Srivatsa Vaddagiri, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Virtualization, LKML,
	Dave Hansen, Suzuki Poulose

On 01/16/2012 07:53 PM, Alexander Graf wrote:
>
> On 16.01.2012, at 15:20, Srivatsa Vaddagiri wrote:
>
>> * Alexander Graf<agraf@suse.de>  [2012-01-16 04:57:45]:
>>
>>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?
>>
>> You mean, run the kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
>> enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS
>> disabled for some workload(s)?
>
> Yup
>
>>
>> In some sense, the 1x overcommit case results posted do measure the
>> overhead of (pv-)spinlocks, no? We don't see any overhead in that case,
>> at least for kernbench ...
>>
>>> Result for Non PLE machine :
>>> ============================
>>
>> [snip]
>>
>>> Kernbench:
>>>                BASE                    BASE+patch
>
> What is BASE really? Is BASE already with the PV spinlocks enabled? I'm having a hard time understanding which tree you're working against, since the prerequisites aren't upstream yet.
>
>
> Alex

Sorry for the confusion; I think I was a little imprecise about the BASE.

The BASE is pre 3.2.0 + Jeremy's following patches:
xadd (https://lkml.org/lkml/2011/10/4/328)
x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
So this would have ticketlock cleanups from Jeremy and
CONFIG_PARAVIRT_SPINLOCKS=y

BASE+patch = pre 3.2.0 + Jeremy's above patches + above V5 PV spinlock
series and CONFIG_PARAVIRT_SPINLOCKS=y

In both cases CONFIG_PARAVIRT_SPINLOCKS=y.

So let,
A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
B. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = n
C. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = y
D. pre-3.2.0 + Jeremy's above patches + V5 patches with 
CONFIG_PARAVIRT_SPINLOCKS = n
E. pre-3.2.0 + Jeremy's above patches + V5 patches with 
CONFIG_PARAVIRT_SPINLOCKS = y

Is it the performance of A vs E? (currently it's C vs E)

Please let me know the configuration expected for testing.

Jeremy,
I would be happy to test A vs B vs C vs E for some workload of interest,
if you wish, for your upcoming patches.

Thanks and Regards
Raghu

>
>>>                %improvement
>>>                mean (sd)               mean (sd)
>>> Scenario A:
>>> case 1x:	 164.233 (16.5506)	 163.584 (15.4598)	0.39517
>>
>> [snip]
>>
>>> Result for PLE machine:
>>> ======================
>>
>> [snip]
>>> Kernbench:
>>>                BASE                    BASE+patch
>>>                %improvement
>>>                mean (sd)               mean (sd)
>>> Scenario A:
>>> case 1x:	 161.263 (56.518)        159.635 (40.5621)	1.00953
>>
>> - vatsa
>>
>
>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16 18:38         ` Raghavendra K T
@ 2012-01-16 18:42           ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-16 18:42 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Dave Hansen


On 16.01.2012, at 19:38, Raghavendra K T wrote:

> On 01/16/2012 07:53 PM, Alexander Graf wrote:
>> 
>> On 16.01.2012, at 15:20, Srivatsa Vaddagiri wrote:
>> 
>>> * Alexander Graf<agraf@suse.de>  [2012-01-16 04:57:45]:
>>> 
>>>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?
>>> 
>>> You mean, run the kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
>>> enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS
>>> disabled for some workload(s)?
>> 
>> Yup
>> 
>>> 
>>> In some sense, the 1x overcommit case results posted do measure the
>>> overhead of (pv-)spinlocks, no? We don't see any overhead in that case,
>>> at least for kernbench ...
>>> 
>>>> Result for Non PLE machine :
>>>> ============================
>>> 
>>> [snip]
>>> 
>>>> Kernbench:
>>>>               BASE                    BASE+patch
>> 
>> What is BASE really? Is BASE already with the PV spinlocks enabled? I'm having a hard time understanding which tree you're working against, since the prerequisites aren't upstream yet.
>> 
>> 
>> Alex
> 
> Sorry for the confusion; I think I was a little imprecise about the BASE.
> 
> The BASE is pre 3.2.0 + Jeremy's following patches:
> xadd (https://lkml.org/lkml/2011/10/4/328)
> x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
> So this would have ticketlock cleanups from Jeremy and
> CONFIG_PARAVIRT_SPINLOCKS=y
> 
> BASE+patch = pre 3.2.0 + Jeremy's above patches + above V5 PV spinlock
> series and CONFIG_PARAVIRT_SPINLOCKS=y
> 
> In both cases CONFIG_PARAVIRT_SPINLOCKS=y.
> 
> So let,
> A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
> B. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = n
> C. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = y
> D. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = n
> E. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = y
> 
> Is it the performance of A vs E? (currently it's C vs E)

Since D and E only matter with KVM in use, yes, I'm mostly interested in A, B and C :).


Alex

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16 13:49       ` Avi Kivity
@ 2012-01-16 18:48         ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-16 18:48 UTC (permalink / raw)
  To: Avi Kivity, Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Rob Landley, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki

On 01/16/2012 07:19 PM, Avi Kivity wrote:
> On 01/16/2012 03:43 PM, Raghavendra K T wrote:
>>>> Dbench:
>>>> Throughput is in MB/sec
>>>> NRCLIENTS     BASE                    BASE+patch
>>>> %improvement
>>>>                      mean (sd)               mean (sd)
>>>> 8           1.101190  (0.875082)     1.700395  (0.846809)     54.4143
>>>> 16          1.524312  (0.120354)     1.477553  (0.058166)     -3.06755
>>>> 32            2.143028  (0.157103)     2.090307  (0.136778)
>>>> -2.46012
>>>
>>> So on a very contended system we're actually slower? Is this expected?
>>>
>>>
>>
>>
>> I think the result is interesting because it's a PLE machine. I have to
>> experiment more with the parameters: SPIN_THRESHOLD, and maybe also
>> ple_gap and ple_window.
>
> Perhaps the PLE stuff fights with the PV stuff?
>

I also think so. The slight advantage on a PLE machine with the current
patch would be that we are able to say "this is the next guy who should
probably get his turn". But if the disadvantage of the total number of
unnecessary "halt exits" dominates that advantage, then we see
degradation.

One clarification on the above benchmarking: Dbench was run
simultaneously on all three (8-vcpu) guests, so we already had 1:3
overcommit when running 8 clients of dbench; after that it was just a
matter of increasing the number of clients.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-16 14:47         ` Avi Kivity
@ 2012-01-16 23:49           ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 139+ messages in thread
From: Jeremy Fitzhardinge @ 2012-01-16 23:49 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose, S

On 01/17/2012 01:47 AM, Avi Kivity wrote:
> On 01/16/2012 04:13 PM, Raghavendra K T wrote:
>>> Please drop all of these and replace them with tracepoints in the
>>> appropriate spots.  Everything else (including the histogram) can be
>>> reconstructed from the tracepoints in userspace.
>>>
>>
>> I think Jeremy pointed out that tracepoints use spinlocks and hence
>> debugfs is the option... no?
>>
> Yeah, I think you're right.  What a pity.

Yes, I went through the same thought process.

    J

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16  8:55       ` Avi Kivity
@ 2012-01-16 23:59         ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 139+ messages in thread
From: Jeremy Fitzhardinge @ 2012-01-16 23:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose, S

On 01/16/2012 07:55 PM, Avi Kivity wrote:
> On 01/16/2012 08:40 AM, Jeremy Fitzhardinge wrote:
>>> That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.
>>>
>>> Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.
>> I'm not quite sure what your concern is.  The lock is under contention, so there's nothing to do except spin; all this patch adds is a variable decrement/test to the spin loop, but that's not going to waste any more CPU than the non-counting case.  And once it falls into the blocking path, it's a win because the VCPU isn't burning CPU any more.
> The wakeup path is slower though.  The previous lock holder has to
> hypercall, and the new lock holder has to be scheduled, and transition
> from halted state to running (a vmentry).  So it's only a clear win if
> we can do something with the cpu other than go into the idle loop.

Not burning power is a win too.

Actually what you want is something like "if you preempt a VCPU while
it's spinning in a lock, then push it into the slowpath and don't
reschedule it without a kick".  But I think that interface would have a
lot of fiddly corners.

    J

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16 10:24       ` Alexander Graf
@ 2012-01-17  0:30         ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 139+ messages in thread
From: Jeremy Fitzhardinge @ 2012-01-17  0:30 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose

On 01/16/2012 09:24 PM, Alexander Graf wrote:
> This is true in case you're spinning. If, on overcommit, spinlocks
> would just yield() instead of spinning, we wouldn't have any vcpu
> running that's just waiting for a late ticket.

Yes, but the reality is that most spinlocks are held for a short period
of time and there's a low likelihood of being preempted while within a
spinlock critical section.  Therefore if someone else tries to get the
spinlock and there's contention, it's always worth spinning for a little
while because the lock will likely become free soon.

At least that's the case if the lock has low contention (shallow queue
depth and not in the slow state).  Again, maybe it makes sense to never
spin for deep queues or locks already in the slow state.

> We still have an issue finding the point in time when a vcpu could run again, which is what this whole series is about. My point above was that instead of doing a count loop, we could just do the normal spin dance and set the threshold to when we enable the magic to have another spin lock notify us in the CPU. That way we
>
>   * don't change the uncontended case

I don't follow you.  What do you mean by "the normal spin dance"?  What
do you mean by "have another spinlock notify us in the CPU"?  Don't
change which uncontended case?  Do you mean in the locking path?  Or the
unlock path?  Or both?

>   * can set the threshold on the host, which knows how contended the system is

Hm, I'm not convinced that knowing how contended the system is is all
that useful overall.  What's important is how contended a particular
lock is, and what state the current holder is in.  If it's not currently
running, then knowing the overall system contention would give you some
idea about how long you need to wait for it to be rescheduled, but
that's getting pretty indirect.

I think the "slowpath if preempted while spinning" idea I mentioned in
the other mail is probably worth following up, since that give specific
actionable information to the guest from the hypervisor.  But lots of
caveats.

[[
A possible mechanism:

  * register ranges of [er]ips with the hypervisor
  * each range is paired with a "resched handler block"
  * if vcpu is preempted within such a range, make sure it is
    rescheduled in the resched handler block

This is obviously akin to the exception mechanism, but it is partially
implemented by the hypervisor.  It allows the spinlock code to be
unchanged from native, but make use of a resched rather than an explicit
counter to determine when to slowpath the lock.  And it's a nice general
mechanism that could be potentially useful elsewhere.

Unfortunately, it doesn't change the unlock path at all; it still needs
to explicitly test if a VCPU needs to be kicked on unlock.
]]
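
[[
To give the mechanism above some shape, a purely hypothetical sketch --
none of these names exist anywhere; they only illustrate pairing an
[er]ip range with a "resched handler block":

struct resched_range {
	unsigned long start_ip;		/* first byte of the spin loop */
	unsigned long end_ip;		/* last byte of the spin loop */
	unsigned long handler_ip;	/* "resched handler block" entry */
};

extern char spin_loop_start[], spin_loop_end[], spin_resched_handler[];

/*
 * Registered once at boot via some new hypercall.  If the hypervisor
 * preempts a vcpu whose rip lies within [start_ip, end_ip], it resumes
 * the vcpu at handler_ip instead, and that handler pushes the lock
 * being spun on into the slowpath.
 */
static const struct resched_range spin_ranges[] = {
	{ (unsigned long)spin_loop_start,
	  (unsigned long)spin_loop_end,
	  (unsigned long)spin_resched_handler },
};
]]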


> And since we control what spin locks look like, we can for example always keep the pointer to it in a specific register so that we can handle pv_lock_ops.lock_spinning() inside there and fetch all the information we need from our pt_regs.

You've left a pile of parts of an idea lying around, but I'm not sure
what shape you intend it to take.

>>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went through the work to split the individual pv ops features up to only enable a few for KVM guests.
>> The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.
> You're still changing a tight loop that does nothing (CPU detects it and saves power) into something that performs calculations.

It still has a "pause" instruction in that loop, so that CPU mechanism
will still come into play.  "pause" doesn't directly "save power"; it's
more about making sure that memory dependence cycles are broken and that
two competing threads will make similar progress.  Besides, I'm not sure
adding a dec+test to a loop that's already got a memory read and compare
in it is adding much in the way of "calculations".

    J
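
[[
For concreteness, a minimal sketch of the loop in question; the
structure follows the series, but SPIN_THRESHOLD's value and the exact
helper names here are illustrative:

#define SPIN_THRESHOLD	(1 << 11)	/* illustrative value */

static __always_inline void ticket_spin_wait(arch_spinlock_t *lock,
					     __ticket_t want)
{
	unsigned count;

	for (;;) {
		/* the native loop: read, compare, pause */
		for (count = SPIN_THRESHOLD; count > 0; count--) {
			if (ACCESS_ONCE(lock->tickets.head) == want)
				return;		/* our turn */
			cpu_relax();		/* the "pause" is still here */
		}
		/* spun too long: block in the PV slowpath until kicked */
		__ticket_lock_spinning(lock, want);
	}
}
]]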

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-16 14:11         ` Srivatsa Vaddagiri
@ 2012-01-17  9:14             ` Gleb Natapov
  0 siblings, 0 replies; 139+ messages in thread
From: Gleb Natapov @ 2012-01-17  9:14 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

On Mon, Jan 16, 2012 at 07:41:17PM +0530, Srivatsa Vaddagiri wrote:
> * Avi Kivity <avi@redhat.com> [2012-01-16 12:14:27]:
> 
> > > One option is to make the kick hypercall available only when
> > > yield_on_hlt=1?
> > 
> > It's not a good idea to tie various options together.  Features should
> > be orthogonal.
> > 
> > Can't we make it work?  Just have different handling for
> > KVM_REQ_PVLOCK_KICK (let's rename it, and the hypercall, PV_UNHALT,
> > since we can use it for non-locks too).
> 
> The problem case I was thinking of was when a guest VCPU would have
> issued HLT with interrupts disabled. I guess one option is to inject an
> NMI, and have the guest kernel NMI handler recognize this and make
> adjustments such that the vcpu avoids going back to the HLT instruction.
> 
Just kick the vcpu out of guest mode and adjust rip to point past the
HLT on the next re-entry. Don't forget to call vmx_clear_hlt().

> Having another hypercall to do yield/sleep (rather than effecting that
> via HLT) seems like an alternative clean solution here ...
> 
> - vatsa

--
			Gleb.
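
[[
A rough, hypothetical sketch of the host-side handling described here;
kvm_rip_read()/kvm_rip_write() and vmx_clear_hlt() are real, but the
guarding test and the placement are schematic only:

	/* vcpu was halted with interrupts off and received a PV kick */
	if (vcpu_halted_irqs_off(vcpu)) {	/* hypothetical test */
		/* HLT is a one-byte opcode: resume just past it */
		kvm_rip_write(vcpu, kvm_rip_read(vcpu) + 1);
		vmx_clear_hlt(vcpu);	/* drop the VMCS HLT activity state */
		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
	}
]]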

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-14 18:26   ` Raghavendra K T
@ 2012-01-17 11:02     ` Marcelo Tosatti
  -1 siblings, 0 replies; 139+ messages in thread
From: Marcelo Tosatti @ 2012-01-17 11:02 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

On Sat, Jan 14, 2012 at 11:56:46PM +0530, Raghavendra K T wrote:
> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks. 
> 
> During smp_boot_cpus a paravirtualized KVM guest detects if the hypervisor has
> required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
>  support for pv-ticketlocks is registered via pv_lock_ops.
> 
> Use KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.
> 
> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 7a94987..cf5327c 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -195,10 +195,20 @@ void kvm_async_pf_task_wait(u32 token);
>  void kvm_async_pf_task_wake(u32 token);
>  u32 kvm_read_and_reset_pf_reason(void);
>  extern void kvm_disable_steal_time(void);
> -#else
> -#define kvm_guest_init() do { } while (0)
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +void __init kvm_spinlock_init(void);
> +#else /* CONFIG_PARAVIRT_SPINLOCKS */
> +static void kvm_spinlock_init(void)
> +{
> +}
> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
> +
> +#else /* CONFIG_KVM_GUEST */
> +#define kvm_guest_init() do {} while (0)
>  #define kvm_async_pf_task_wait(T) do {} while(0)
>  #define kvm_async_pf_task_wake(T) do {} while(0)
> +
>  static inline u32 kvm_read_and_reset_pf_reason(void)
>  {
>  	return 0;
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index a9c2116..ec55a0b 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -33,6 +33,7 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/kprobes.h>
> +#include <linux/debugfs.h>
>  #include <asm/timer.h>
>  #include <asm/cpu.h>
>  #include <asm/traps.h>
> @@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
>  #endif
>  	kvm_guest_cpu_init();
>  	native_smp_prepare_boot_cpu();
> +	kvm_spinlock_init();
>  }
>  
>  static void __cpuinit kvm_guest_cpu_online(void *dummy)
> @@ -627,3 +629,250 @@ static __init int activate_jump_labels(void)
>  	return 0;
>  }
>  arch_initcall(activate_jump_labels);
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +
> +enum kvm_contention_stat {
> +	TAKEN_SLOW,
> +	TAKEN_SLOW_PICKUP,
> +	RELEASED_SLOW,
> +	RELEASED_SLOW_KICKED,
> +	NR_CONTENTION_STATS
> +};
> +
> +#ifdef CONFIG_KVM_DEBUG_FS
> +
> +static struct kvm_spinlock_stats
> +{
> +	u32 contention_stats[NR_CONTENTION_STATS];
> +
> +#define HISTO_BUCKETS	30
> +	u32 histo_spin_blocked[HISTO_BUCKETS+1];
> +
> +	u64 time_blocked;
> +} spinlock_stats;
> +
> +static u8 zero_stats;
> +
> +static inline void check_zero(void)
> +{
> +	u8 ret;
> +	u8 old = ACCESS_ONCE(zero_stats);
> +	if (unlikely(old)) {
> +		ret = cmpxchg(&zero_stats, old, 0);
> +		/* This ensures only one fellow resets the stat */
> +		if (ret == old)
> +			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
> +	}
> +}
> +
> +static inline void add_stats(enum kvm_contention_stat var, u32 val)
> +{
> +	check_zero();
> +	spinlock_stats.contention_stats[var] += val;
> +}
> +
> +
> +static inline u64 spin_time_start(void)
> +{
> +	return sched_clock();
> +}
> +
> +static void __spin_time_accum(u64 delta, u32 *array)
> +{
> +	unsigned index = ilog2(delta);
> +
> +	check_zero();
> +
> +	if (index < HISTO_BUCKETS)
> +		array[index]++;
> +	else
> +		array[HISTO_BUCKETS]++;
> +}
> +
> +static inline void spin_time_accum_blocked(u64 start)
> +{
> +	u32 delta = sched_clock() - start;
> +
> +	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
> +	spinlock_stats.time_blocked += delta;
> +}
> +
> +static struct dentry *d_spin_debug;
> +static struct dentry *d_kvm_debug;
> +
> +struct dentry *kvm_init_debugfs(void)
> +{
> +	d_kvm_debug = debugfs_create_dir("kvm", NULL);
> +	if (!d_kvm_debug)
> +		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
> +
> +	return d_kvm_debug;
> +}
> +
> +static int __init kvm_spinlock_debugfs(void)
> +{
> +	struct dentry *d_kvm = kvm_init_debugfs();
> +
> +	if (d_kvm == NULL)
> +		return -ENOMEM;
> +
> +	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
> +
> +	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
> +
> +	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
> +	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
> +
> +	debugfs_create_u32("released_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
> +	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
> +
> +	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
> +			   &spinlock_stats.time_blocked);
> +
> +	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
> +		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
> +
> +	return 0;
> +}
> +fs_initcall(kvm_spinlock_debugfs);
> +#else  /* !CONFIG_KVM_DEBUG_FS */
> +#define TIMEOUT			(1 << 10)
> +static inline void add_stats(enum kvm_contention_stat var, u32 val)
> +{
> +}
> +
> +static inline u64 spin_time_start(void)
> +{
> +	return 0;
> +}
> +
> +static inline void spin_time_accum_blocked(u64 start)
> +{
> +}
> +#endif  /* CONFIG_KVM_DEBUG_FS */
> +
> +struct kvm_lock_waiting {
> +	struct arch_spinlock *lock;
> +	__ticket_t want;
> +};
> +
> +/* cpus 'waiting' on a spinlock to become available */
> +static cpumask_t waiting_cpus;
> +
> +/* Track spinlock on which a cpu is waiting */
> +static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
> +
> +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> +{
> +	struct kvm_lock_waiting *w = &__get_cpu_var(lock_waiting);
> +	int cpu = smp_processor_id();
> +	u64 start;
> +	unsigned long flags;
> +
> +	start = spin_time_start();
> +
> +	/*
> +	 * Make sure an interrupt handler can't upset things in a
> +	 * partially setup state.
> +	 */
> +	local_irq_save(flags);
> +
> +	/*
> +	 * The ordering protocol on this is that the "lock" pointer
> +	 * may only be set non-NULL if the "want" ticket is correct.
> +	 * If we're updating "want", we must first clear "lock".
> +	 */
> +	w->lock = NULL;
> +	smp_wmb();
> +	w->want = want;
> +	smp_wmb();
> +	w->lock = lock;
> +
> +	add_stats(TAKEN_SLOW, 1);
> +
> +	/*
> +	 * This uses set_bit, which is atomic but we should not rely on its
> +	 * reordering guarantees, so a barrier is needed after this call.
> +	 */
> +	cpumask_set_cpu(cpu, &waiting_cpus);
> +
> +	barrier();
> +
> +	/*
> +	 * Mark entry to slowpath before doing the pickup test to make
> +	 * sure we don't deadlock with an unlocker.
> +	 */
> +	__ticket_enter_slowpath(lock);
> +
> +	/*
> +	 * check again to make sure it didn't become free while
> +	 * we weren't looking.
> +	 */
> +	if (ACCESS_ONCE(lock->tickets.head) == want) {
> +		add_stats(TAKEN_SLOW_PICKUP, 1);
> +		goto out;
> +	}
> +
> +	/* Allow interrupts while blocked */
> +	local_irq_restore(flags);
> +
> +	/* halt until it's our turn and kicked. */
> +	halt();
> +
> +	local_irq_save(flags);
> +out:
> +	cpumask_clear_cpu(cpu, &waiting_cpus);
> +	w->lock = NULL;
> +	local_irq_restore(flags);
> +	spin_time_accum_blocked(start);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> +
> +/* Kick a cpu by its apicid */
> +static inline void kvm_kick_cpu(int apicid)
> +{
> +	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
> +}
> +
> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
> +{
> +	int cpu;
> +	int apicid;
> +
> +	add_stats(RELEASED_SLOW, 1);
> +
> +	for_each_cpu(cpu, &waiting_cpus) {
> +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> +		if (ACCESS_ONCE(w->lock) == lock &&
> +		    ACCESS_ONCE(w->want) == ticket) {
> +			add_stats(RELEASED_SLOW_KICKED, 1);
> +			apicid = per_cpu(x86_cpu_to_apicid, cpu);
> +			kvm_kick_cpu(apicid);
> +			break;
> +		}
> +	}

What prevents a kick from being lost here, if say, the waiter is at
local_irq_save in kvm_lock_spinning, before the lock/want assignments?

> +
> +/*
> + * Setup pv_lock_ops to exploit KVM_FEATURE_PVLOCK_KICK if present.
> + */
> +void __init kvm_spinlock_init(void)
> +{
> +	if (!kvm_para_available())
> +		return;
> +	/* Does host kernel support KVM_FEATURE_PVLOCK_KICK? */
> +	if (!kvm_para_has_feature(KVM_FEATURE_PVLOCK_KICK))
> +		return;
> +
> +	jump_label_inc(&paravirt_ticketlocks_enabled);
> +
> +	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
> +	pv_lock_ops.unlock_kick = kvm_unlock_kick;
> +}
> +#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c7b05fc..4d7a950 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5754,8 +5754,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	local_irq_disable();
>  
> -	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> -	    || need_resched() || signal_pending(current)) {
> +	if (vcpu->mode == EXITING_GUEST_MODE
> +		 || (vcpu->requests & ~(1UL<<KVM_REQ_PVLOCK_KICK))
> +		 || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		smp_wmb();
>  		local_irq_enable();
> @@ -6711,6 +6712,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  		!vcpu->arch.apf.halted)
>  		|| !list_empty_careful(&vcpu->async_pf.done)
>  		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> +		|| kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu)

The bit should only be read here (kvm_arch_vcpu_runnable), but cleared
on vcpu entry (along with the other kvm_check_request processing).

Then the first hunk becomes unnecessary.
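
Concretely, the suggestion amounts to something like the sketch below (an
editorial illustration, not code from the thread; test_bit() and
kvm_check_request() are the existing vcpu->requests helpers):

	/* in kvm_arch_vcpu_runnable(): observe the pending kick, do not clear it */
	|| test_bit(KVM_REQ_PVLOCK_KICK, &vcpu->requests)

	/* in vcpu_enter_guest(), alongside the other kvm_check_request() calls */
	kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu);	/* consume the kick */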

Please do not mix host/guest patches.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-17 11:02     ` Marcelo Tosatti
@ 2012-01-17 11:33       ` Srivatsa Vaddagiri
  -1 siblings, 0 replies; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-17 11:33 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman,
	LKML, Dave Hansen

* Marcelo Tosatti <mtosatti@redhat.com> [2012-01-17 09:02:11]:

> > +/* Kick vcpu waiting on @lock->head to reach value @ticket */
> > +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
> > +{
> > +	int cpu;
> > +	int apicid;
> > +
> > +	add_stats(RELEASED_SLOW, 1);
> > +
> > +	for_each_cpu(cpu, &waiting_cpus) {
> > +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> > +		if (ACCESS_ONCE(w->lock) == lock &&
> > +		    ACCESS_ONCE(w->want) == ticket) {
> > +			add_stats(RELEASED_SLOW_KICKED, 1);
> > +			apicid = per_cpu(x86_cpu_to_apicid, cpu);
> > +			kvm_kick_cpu(apicid);
> > +			break;
> > +		}
> > +	}
> 
> What prevents a kick from being lost here, if say, the waiter is at
> local_irq_save in kvm_lock_spinning, before the lock/want assignments?

The waiter does check for the lock becoming available before actually
sleeping:

+	/*
+	 * check again to make sure it didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
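
To spell out why that re-check closes the window (an editorial sketch of the
ordering; "head" is lock->tickets.head, which the unlocker increments before
it scans waiting_cpus):

	/* waiter (kvm_lock_spinning)        unlocker (unlock, then kick)
	 *
	 * w->want = want; w->lock = lock;
	 * __ticket_enter_slowpath(lock);     head++;
	 * if (head == want) goto out;        scan waiting_cpus, kick on match;
	 * halt();
	 *
	 * If the scan runs before the waiter publishes lock/want, head++ has
	 * already happened, so the waiter's re-check sees the free lock and
	 * never reaches halt().
	 */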

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17  9:14             ` Gleb Natapov
@ 2012-01-17 12:26               ` Srivatsa Vaddagiri
  -1 siblings, 0 replies; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-17 12:26 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

* Gleb Natapov <gleb@redhat.com> [2012-01-17 11:14:13]:

> > The problem case I was thinking of was when guest VCPU would have issued
> > HLT with interrupts disabled. I guess one option is to inject an NMI,
> > and have the guest kernel NMI handler recognize this and make
> > adjustments such that the vcpu avoids going back to HLT instruction.
> > 
> Just kick vcpu out of a guest mode and adjust rip to point after HLT on
> next re-entry. Don't forget to call vmx_clear_hlt().

Looks a bit hackish to me compared to having another hypercall to yield!

> > Having another hypercall to do yield/sleep (rather than effecting that
> > via HLT) seems like an alternate clean solution here ..

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17 12:26               ` Srivatsa Vaddagiri
@ 2012-01-17 12:51                 ` Gleb Natapov
  -1 siblings, 0 replies; 139+ messages in thread
From: Gleb Natapov @ 2012-01-17 12:51 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

On Tue, Jan 17, 2012 at 05:56:50PM +0530, Srivatsa Vaddagiri wrote:
> * Gleb Natapov <gleb@redhat.com> [2012-01-17 11:14:13]:
> 
> > > The problem case I was thinking of was when guest VCPU would have issued
> > > HLT with interrupts disabled. I guess one option is to inject an NMI,
> > > and have the guest kernel NMI handler recognize this and make
> > > adjustments such that the vcpu avoids going back to HLT instruction.
> > > 
> > Just kick vcpu out of a guest mode and adjust rip to point after HLT on
> > next re-entry. Don't forget to call vmx_clear_hlt().
> 
> > Looks a bit hackish to me compared to having another hypercall to yield!
> 
Do not see anything hackish about it. But what you described above (the
part I replied to) is not another hypercall, but yet another NMI source
and special handling in a guest. So what hypercall do you mean?
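
For reference, the kick-and-skip approach boils down to something like this
on the host side (an editorial sketch; kvm_rip_read()/kvm_rip_write() are
the existing RIP accessors, vmx_clear_hlt() clears the VMX HLT activity
state, and HLT is a one-byte opcode):

	/* when a kick targets a vcpu halted with interrupts disabled: */
	kvm_vcpu_kick(vcpu);		/* force an exit from guest mode */

	/* then, on the next re-entry: */
	kvm_rip_write(vcpu, kvm_rip_read(vcpu) + 1);	/* step past HLT */
	vmx_clear_hlt(vcpu);		/* leave the HLT activity state */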

> > > Having another hypercall to do yield/sleep (rather than effecting that
> > > via HLT) seems like an alternate clean solution here ..
> 
> - vatsa

--
			Gleb.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17 12:51                 ` Gleb Natapov
@ 2012-01-17 13:11                   ` Srivatsa Vaddagiri
  -1 siblings, 0 replies; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-17 13:11 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

* Gleb Natapov <gleb@redhat.com> [2012-01-17 14:51:26]:

> On Tue, Jan 17, 2012 at 05:56:50PM +0530, Srivatsa Vaddagiri wrote:
> > * Gleb Natapov <gleb@redhat.com> [2012-01-17 11:14:13]:
> > 
> > > > The problem case I was thinking of was when guest VCPU would have issued
> > > > HLT with interrupts disabled. I guess one option is to inject an NMI,
> > > > and have the guest kernel NMI handler recognize this and make
> > > > adjustments such that the vcpu avoids going back to HLT instruction.
> > > > 
> > > Just kick vcpu out of a guest mode and adjust rip to point after HLT on
> > > next re-entry. Don't forget to call vmx_clear_hlt().
> > 
> > > Looks a bit hackish to me compared to having another hypercall to yield!
> > 
> Do not see anything hackish about it. But what you described above (the
> part I replied to) is not another hypercall, but yet another NMI source
> and special handling in a guest.

True, which I didn't exactly like and hence was suggesting we use
another hypercall to let the spinning vcpu sleep.

> So what hypercall do you mean?

The hypercall is described below:

> > > > Having another hypercall to do yield/sleep (rather than effecting that
> > > > via HLT) seems like an alternate clean solution here ..

and was implemented in an earlier version of this patch (v2) as the
KVM_HC_WAIT_FOR_KICK hypercall:

https://lkml.org/lkml/2011/10/23/211

Having the hypercall makes the vcpu's intent (to sleep until kicked) explicit
to the hypervisor, versus inferring it from a trapped HLT instruction (which
anyway won't work when yield_on_hlt=0).
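
On the guest side this would replace halt() in the lock slowpath with a
direct hypercall, roughly (an editorial sketch using the v2 naming):

	kvm_hypercall0(KVM_HC_WAIT_FOR_KICK);	/* instead of halt() */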

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17 12:51                 ` Gleb Natapov
@ 2012-01-17 13:13                   ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-17 13:13 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Jeremy Fitzhardinge, KVM, linux-doc, Peter Zijlstra, Jan Kiszka,
	Srivatsa Vaddagiri, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Virtualization,
	Greg Kroah-Hartman, LKML, Dave Hansen

On 01/17/2012 06:21 PM, Gleb Natapov wrote:
> On Tue, Jan 17, 2012 at 05:56:50PM +0530, Srivatsa Vaddagiri wrote:
>> * Gleb Natapov<gleb@redhat.com>  [2012-01-17 11:14:13]:
>>
>>>> The problem case I was thinking of was when guest VCPU would have issued
>>>> HLT with interrupts disabled. I guess one option is to inject an NMI,
>>>> and have the guest kernel NMI handler recognize this and make
>>>> adjustments such that the vcpu avoids going back to HLT instruction.
>>>>
>>> Just kick vcpu out of a guest mode and adjust rip to point after HLT on
>>> next re-entry. Don't forget to call vmx_clear_hlt().
>>
>> Looks a bit hackish to me compared to having another hypercall to yield!
>>
> Do not see anything hackish about it. But what you described above (the
> part I replied to) is not another hypercall, but yet another NMI source
> and special handling in a guest. So what hypercall do you mean?
>

An earlier version had a hypercall to sleep instead of the current halt()
approach. This was taken out to avoid an extra hypercall.

So here is the hypercall hunk referred to:

+/*
+ * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU
+ * hypercall or an event like an interrupt.
+ *
+ * @vcpu : vcpu which is blocking.
+ */
+static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu)
+{
+       DEFINE_WAIT(wait);
+
+       /*
+        * Blocking on vcpu->wq allows us to wake up sooner if required to
+        * service pending events (like interrupts).
+        *
+        * Also set state to TASK_INTERRUPTIBLE before checking vcpu->kicked
+        * to avoid racing with kvm_pv_kick_cpu_op().
+        * avoid racing with kvm_pv_kick_cpu_op().
+        */
+       prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
+
+       /*
+        * Somebody has already tried kicking us. Acknowledge that
+        * and terminate the wait.
+        */
+       if (vcpu->kicked) {
+               vcpu->kicked = 0;
+               goto end_wait;
+       }
+
+       /* Let's wait for either KVM_HC_KICK_CPU or some other event
+        * to wake us up.
+        */
+
+       srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
+       schedule();
+       vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+end_wait:
+       finish_wait(&vcpu->wq, &wait);
+}
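
For context, the v2 series dispatched this from kvm_emulate_hypercall()
roughly as follows (an editorial sketch; the exact hunk is in the lkml link
above):

	case KVM_HC_WAIT_FOR_KICK:
		kvm_pv_wait_for_kick_op(vcpu);
		ret = 0;
		break;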

>>>> Having another hypercall to do yield/sleep (rather than effecting that
>>>> via HLT) seems like an alternate clean solution here ..

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17 13:11                   ` Srivatsa Vaddagiri
@ 2012-01-17 13:20                     ` Gleb Natapov
  -1 siblings, 0 replies; 139+ messages in thread
From: Gleb Natapov @ 2012-01-17 13:20 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

On Tue, Jan 17, 2012 at 06:41:03PM +0530, Srivatsa Vaddagiri wrote:
> * Gleb Natapov <gleb@redhat.com> [2012-01-17 14:51:26]:
> 
> > On Tue, Jan 17, 2012 at 05:56:50PM +0530, Srivatsa Vaddagiri wrote:
> > > * Gleb Natapov <gleb@redhat.com> [2012-01-17 11:14:13]:
> > > 
> > > > > The problem case I was thinking of was when guest VCPU would have issued
> > > > > HLT with interrupts disabled. I guess one option is to inject an NMI,
> > > > > and have the guest kernel NMI handler recognize this and make
> > > > > adjustments such that the vcpu avoids going back to HLT instruction.
> > > > > 
> > > > Just kick vcpu out of a guest mode and adjust rip to point after HLT on
> > > > next re-entry. Don't forget to call vmx_clear_hlt().
> > > 
> > > Looks a bit hackish to me compared to having another hypercall to yield!
> > > 
> > Do not see anything hackish about it. But what you described above (the
> > part I replied to) is not another hypercall, but yet another NMI source
> > and special handling in a guest.
> 
> True, which I didn't exactly like and hence was suggesting we use
> another hypercall to let spinning vcpu sleep.
> 
Ah, sorry. Missed that.

> > So what hypercall do you mean?
> 
> The hypercall is described below:
> 
> > > > > Having another hypercall to do yield/sleep (rather than effecting that
> > > > > via HLT) seems like an alternate clean solution here ..
> 
> and was implemented in an earlier version of this patch (v2) as
> KVM_HC_WAIT_FOR_KICK hypercall:
> 
> https://lkml.org/lkml/2011/10/23/211
> 
> Having the hypercall makes the vcpu's intent (to sleep until kicked) explicit
> to the hypervisor, versus inferring it from a trapped HLT instruction (which
> anyway won't work when yield_on_hlt=0).
> 
The purpose of yield_on_hlt=0 is to allow a VCPU to occupy its CPU for the
entire time slice no matter what. I do not think disabling yield on HLT
even makes sense in a CPU-oversubscribed scenario. Now if you call
KVM_HC_WAIT_FOR_KICK instead of HLT you will effectively ignore the
yield_on_hlt=0 setting. This is like having a PV HLT that does not obey
the VMX exit control setting.
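
For reference, yield_on_hlt gates the HLT-exiting execution control in
vmx.c, roughly (an editorial sketch of the then-current code):

	if (!yield_on_hlt)
		exec_control &= ~CPU_BASED_HLT_EXITING;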

--
			Gleb.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17 13:20                     ` Gleb Natapov
@ 2012-01-17 14:28                       ` Srivatsa Vaddagiri
  -1 siblings, 0 replies; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-17 14:28 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

* Gleb Natapov <gleb@redhat.com> [2012-01-17 15:20:51]:

> > > Having the hypercall makes the vcpu's intent (to sleep until kicked) explicit
> > > to the hypervisor, versus inferring it from a trapped HLT instruction (which
> > > anyway won't work when yield_on_hlt=0).
> > 
> > The purpose of yield_on_hlt=0 is to allow a VCPU to occupy its CPU for the
> > entire time slice no matter what. I do not think disabling yield on HLT
> > even makes sense in a CPU-oversubscribed scenario.

Yes, so is there any real use for yield_on_hlt=0? I believe Anthony
initially added it as a way to implement CPU bandwidth capping for VMs,
which would ensure that busy VMs don't eat into cycles meant for an idle
VM. Now that we have proper support in the scheduler for CPU bandwidth
capping, is there any real-world use for yield_on_hlt=0? If not, deprecate it?

> > Now if you call
> > KVM_HC_WAIT_FOR_KICK instead of HLT you will effectively ignore the
> > yield_on_hlt=0 setting.

I guess that depends on what we do in KVM_HC_WAIT_FOR_KICK. If we do
yield_to() rather than sleep, it should minimize how many cycles the vcpu
gives away to a competing VM (which seems to be the biggest reason one may
want to set yield_on_hlt=0).
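
A rough sketch of that yield_to() variant, reusing KVM's existing
directed-yield helper (an editorial illustration; kvm_vcpu_on_spin() is what
the PLE handler already uses, and vcpu->kicked follows the v2 naming above):

	case KVM_HC_WAIT_FOR_KICK:
		if (!vcpu->kicked)
			kvm_vcpu_on_spin(vcpu);	/* yield_to() a runnable vcpu */
		vcpu->kicked = 0;
		ret = 0;
		break;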

> > This is like having a PV HLT that does not obey
> > the VMX exit control setting.

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17 14:28                       ` Srivatsa Vaddagiri
@ 2012-01-17 15:32                         ` Gleb Natapov
  -1 siblings, 0 replies; 139+ messages in thread
From: Gleb Natapov @ 2012-01-17 15:32 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen

On Tue, Jan 17, 2012 at 07:58:18PM +0530, Srivatsa Vaddagiri wrote:
> * Gleb Natapov <gleb@redhat.com> [2012-01-17 15:20:51]:
> 
> > > > Having the hypercall makes the vcpu's intent (to sleep until kicked) explicit
> > > > to the hypervisor, versus inferring it from a trapped HLT instruction (which
> > > > anyway won't work when yield_on_hlt=0).
> > > 
> > > The purpose of yield_on_hlt=0 is to allow a VCPU to occupy its CPU for the
> > > entire time slice no matter what. I do not think disabling yield on HLT
> > > even makes sense in a CPU-oversubscribed scenario.
> 
> > Yes, so is there any real use for yield_on_hlt=0? I believe Anthony
> > initially added it as a way to implement CPU bandwidth capping for VMs,
> > which would ensure that busy VMs don't eat into cycles meant for an idle
> > VM. Now that we have proper support in the scheduler for CPU bandwidth
> > capping, is there any real-world use for yield_on_hlt=0? If not, deprecate it?
> 
I was against adding it in the first place, so if IBM no longer needs it
I am for removing it ASAP.

> > > Now if you call
> > > KVM_HC_WAIT_FOR_KICK instead of HLT you will effectively ignore the
> > > yield_on_hlt=0 setting.
> 
> I guess that depends on what we do in KVM_HC_WAIT_FOR_KICK. If we do
> yield_to() rather than sleep, it should minimize how many cycles the vcpu
> gives away to a competing VM (which seems to be the biggest reason one may
> want to set yield_on_hlt=0).
> 
> > > This is like having a PV HLT that does not obey
> > > the VMX exit control setting.
> 
> - vatsa

--
			Gleb.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17 15:32                         ` Gleb Natapov
@ 2012-01-17 15:53                           ` Marcelo Tosatti
  -1 siblings, 0 replies; 139+ messages in thread
From: Marcelo Tosatti @ 2012-01-17 15:53 UTC (permalink / raw)
  To: Gleb Natapov, Anthony Liguori
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, Greg Kroah-Hartman, LKML

On Tue, Jan 17, 2012 at 05:32:33PM +0200, Gleb Natapov wrote:
> On Tue, Jan 17, 2012 at 07:58:18PM +0530, Srivatsa Vaddagiri wrote:
> > * Gleb Natapov <gleb@redhat.com> [2012-01-17 15:20:51]:
> > 
> > > > Having the hypercall makes the vcpu's intent (to sleep until kicked) explicit
> > > > to the hypervisor, versus inferring it from a trapped HLT instruction (which
> > > > anyway won't work when yield_on_hlt=0).
> > > > 
> > > The purpose of yield_on_hlt=0 is to allow a VCPU to occupy its CPU for the
> > > entire time slice no matter what. I do not think disabling yield on HLT
> > > even makes sense in a CPU-oversubscribed scenario.
> > 
> > Yes, so is there any real use for yield_on_hlt=0? I believe Anthony
> > initially added it as a way to implement CPU bandwidth capping for VMs,
> > which would ensure that busy VMs don't eat into cycles meant for an idle
> > VM. Now that we have proper support in the scheduler for CPU bandwidth
> > capping, is there any real-world use for yield_on_hlt=0? If not, deprecate it?
> > 
> I was against adding it in the first place, so if IBM no longer needs it
> I am for removing it ASAP.

+1.

Anthony?

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16 18:42           ` Alexander Graf
@ 2012-01-17 17:27             ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-17 17:27 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Dave Hansen

On 01/17/2012 12:12 AM, Alexander Graf wrote:
>
> On 16.01.2012, at 19:38, Raghavendra K T wrote:
>
>> On 01/16/2012 07:53 PM, Alexander Graf wrote:
>>>
>>> On 16.01.2012, at 15:20, Srivatsa Vaddagiri wrote:
>>>
>>>> * Alexander Graf<agraf@suse.de>   [2012-01-16 04:57:45]:
>>>>
>>>>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?
>>>>
>>>> You mean, run kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
>>>> enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS disabled for
>>>> some workload(s)?
>>>
>>> Yup
>>>
>>>>
>>>> In some sense, the 1x overcommit case results posted do measure the overhead
>>>> of (pv-)spinlocks, no? We don't see any overhead in that case for at least
>>>> kernbench ..
>>>>
>>>>> Result for Non PLE machine :
>>>>> ============================
>>>>
>>>> [snip]
>>>>
>>>>> Kernbench:
>>>>>                BASE                    BASE+patch
>>>
>>> What is BASE really? Is BASE already with the PV spinlocks enabled? I'm having a hard time understanding which tree you're working against, since the prerequisites aren't upstream yet.
>>>
>>>
>>> Alex
>>
>> Sorry for the confusion, I think I was a little imprecise about the BASE.
>>
>> The BASE is pre 3.2.0 + Jeremy's following patches:
>> xadd (https://lkml.org/lkml/2011/10/4/328)
>> x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
>> So this would have ticketlock cleanups from Jeremy and
>> CONFIG_PARAVIRT_SPINLOCKS=y
>>
>> BASE+patch = pre 3.2.0 + Jeremy's above patches + above V5 PV spinlock
>> series and CONFIG_PARAVIRT_SPINLOCKS=y
>>
>> In both cases CONFIG_PARAVIRT_SPINLOCKS=y.
>>
>> So let,
>> A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
>> B. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = n
>> C. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = y
>> D. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = n
>> E. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = y
>>
>> Is it the performance of A vs E? (currently it is C vs E)
>
> Since D and E only matter with KVM in use, yes, I'm mostly interested in A, B and C :).
>
>
> Alex
>
>
Setup:
Native: IBM xSeries with Intel(R) Xeon(R) X5570 2.93GHz CPU, 8 cores,
64GB RAM (16 cpus online)

Guest: single guest with 8 VCPUs, 4GB RAM.
Benchmark: kernbench -f -H -M -o 20

Here is the result (kernbench elapsed time in seconds, std dev in
parentheses; %improvement is relative to case A):

Native Run
============
case A               case B              %improvement   case C             %improvement
56.1917 (2.57125)    56.035 (2.02439)    0.278867       56.27 (2.40401)    -0.139344

Guest Run
============
case A               case B              %improvement   case C             %improvement
166.999 (15.7613)    161.876 (14.4874)   3.06768        161.24 (12.6497)   3.44852

We do not see much overhead in the native run with CONFIG_PARAVIRT_SPINLOCKS=y.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-17 17:27             ` Raghavendra K T
@ 2012-01-17 17:39               ` Alexander Graf
  -1 siblings, 0 replies; 139+ messages in thread
From: Alexander Graf @ 2012-01-17 17:39 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Dave Hansen


On 17.01.2012, at 18:27, Raghavendra K T wrote:

> On 01/17/2012 12:12 AM, Alexander Graf wrote:
>> 
>> On 16.01.2012, at 19:38, Raghavendra K T wrote:
>> 
>>> On 01/16/2012 07:53 PM, Alexander Graf wrote:
>>>> 
>>>> On 16.01.2012, at 15:20, Srivatsa Vaddagiri wrote:
>>>> 
>>>>> * Alexander Graf<agraf@suse.de>   [2012-01-16 04:57:45]:
>>>>> 
>>>>>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?
>>>>> 
>>>>> You mean, run kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
>>>>> enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS disabled for
>>>>> some workload(s)?
>>>> 
>>>> Yup
>>>> 
>>>>> 
>>>>> In some sense, the 1x overcommit case results posted do measure the
>>>>> overhead of (pv-)spinlocks, no? We don't see any overhead in that case
>>>>> for at least kernbench ..
>>>>> 
>>>>>> Result for Non PLE machine :
>>>>>> ============================
>>>>> 
>>>>> [snip]
>>>>> 
>>>>>> Kernbench:
>>>>>>               BASE                    BASE+patch
>>>> 
>>>> What is BASE really? Is BASE already with the PV spinlocks enabled? I'm having a hard time understanding which tree you're working against, since the prerequisites aren't upstream yet.
>>>> 
>>>> 
>>>> Alex
>>> 
>>> Sorry for the confusion, I think I was a little imprecise about the BASE.
>>> 
>>> The BASE is pre 3.2.0 + Jeremy's following patches:
>>> xadd (https://lkml.org/lkml/2011/10/4/328)
>>> x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
>>> So this would have ticketlock cleanups from Jeremy and
>>> CONFIG_PARAVIRT_SPINLOCKS=y
>>> 
>>> BASE+patch = pre 3.2.0 + Jeremy's above patches + above V5 PV spinlock
>>> series and CONFIG_PARAVIRT_SPINLOCKS=y
>>> 
>>> In both cases CONFIG_PARAVIRT_SPINLOCKS=y.
>>> 
>>> So let,
>>> A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
>>> B. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = n
>>> C. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = y
>>> D. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = n
>>> E. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = y
>>> 
>>> Is it the performance of A vs E? (currently it is C vs E)
>> 
>> Since D and E only matter with KVM in use, yes, I'm mostly interested in A, B and C :).
>> 
>> 
>> Alex
>> 
>> 
> Setup:
> Native: IBM xSeries with Intel(R) Xeon(R) X5570 2.93GHz CPU, 8 cores, 64GB RAM (16 cpus online)
> 
> Guest: single guest with 8 VCPUs, 4GB RAM.
> Benchmark: kernbench -f -H -M -o 20
> 
> Here is the result:
> Native Run
> ============
> case A               case B             %improvement   case C  %improvement
> 56.1917 (2.57125)    56.035 (2.02439)   0.278867       56.27 (2.40401)   -0.139344

This looks a lot like statistical deviation. How often did you execute the test case? Did you make sure to have a clean base state every time?

Maybe it'd be a good idea to create a small in-kernel microbenchmark with a couple of threads that take spinlocks, then do work for a specified number of cycles, then release them again and start anew. At the end of it, we can check how long the whole thing took for n runs. That would enable us to measure the worst-case scenario.
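
A very rough sketch of such a module (illustrative only; the thread
count, iteration count and hold time are made-up values, and error
handling is omitted):

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/delay.h>
#include <linux/ktime.h>
#include <linux/completion.h>
#include <linux/atomic.h>

#define NTHREADS	4
#define NITERS		100000
#define HOLD_NS		200	/* simulated work inside the critical section */

static DEFINE_SPINLOCK(bench_lock);
static atomic_t bench_done = ATOMIC_INIT(0);
static DECLARE_COMPLETION(bench_all_done);

static int bench_thread(void *unused)
{
	int i;

	for (i = 0; i < NITERS; i++) {
		spin_lock(&bench_lock);
		ndelay(HOLD_NS);	/* "do work for a number of cycles" */
		spin_unlock(&bench_lock);
	}
	if (atomic_inc_return(&bench_done) == NTHREADS)
		complete(&bench_all_done);
	return 0;
}

static int __init bench_init(void)
{
	ktime_t start = ktime_get();
	int i;

	for (i = 0; i < NTHREADS; i++)
		kthread_run(bench_thread, NULL, "spinbench/%d", i);
	/* blocks module load until all threads finish; fine for a benchmark */
	wait_for_completion(&bench_all_done);
	pr_info("spinbench: %d threads x %d iters: %lld ns\n",
		NTHREADS, NITERS, ktime_to_ns(ktime_sub(ktime_get(), start)));
	return 0;
}
module_init(bench_init);
MODULE_LICENSE("GPL");

Comparing that number on bare metal with CONFIG_PARAVIRT_SPINLOCKS=y vs
=n would expose the pure lock-path overhead much more directly than
kernbench does.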

> 
> Guest Run
> ============
> case A               case B             %improvement   case C  %improvement
> 166.999 (15.7613)    161.876 (14.4874)   3.06768        161.24 (12.6497)  3.44852

Is this the same machine? Why is the guest 3x slower?


Alex

> 
> We do not see much overhead in the native run with CONFIG_PARAVIRT_SPINLOCKS=y.
> 

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-17 17:39               ` Alexander Graf
@ 2012-01-17 18:36                 ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-17 18:36 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Dave Hansen

On 01/17/2012 11:09 PM, Alexander Graf wrote:
>
> On 17.01.2012, at 18:27, Raghavendra K T wrote:
>
>> On 01/17/2012 12:12 AM, Alexander Graf wrote:
>>>
>>> On 16.01.2012, at 19:38, Raghavendra K T wrote:
>>>
>>>> On 01/16/2012 07:53 PM, Alexander Graf wrote:
>>>>>
>>>>> On 16.01.2012, at 15:20, Srivatsa Vaddagiri wrote:
>>>>>
>>>>>> * Alexander Graf<agraf@suse.de>    [2012-01-16 04:57:45]:
>>>>>>
>>>>>>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?
>>>>>>
>>>>>> You mean, run kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
>>>>>> enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS disabled for
>>>>>> some workload(s)?
>>>>>
>>>>> Yup
>>>>>
>>>>>>
>>>>>> In some sense, the 1x overcommit case results posted do measure the
>>>>>> overhead of (pv-)spinlocks, no? We don't see any overhead in that case
>>>>>> for at least kernbench ..
>>>>>>
>>>>>>> Result for Non PLE machine :
>>>>>>> ============================
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> Kernbench:
>>>>>>>                BASE                    BASE+patch
>>>>>
>>>>> What is BASE really? Is BASE already with the PV spinlocks enabled? I'm having a hard time understanding which tree you're working against, since the prerequisites aren't upstream yet.
>>>>>
>>>>>
>>>>> Alex
>>>>
>>>> Sorry for the confusion, I think I was a little imprecise about the BASE.
>>>>
>>>> The BASE is pre 3.2.0 + Jeremy's following patches:
>>>> xadd (https://lkml.org/lkml/2011/10/4/328)
>>>> x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
>>>> So this would have ticketlock cleanups from Jeremy and
>>>> CONFIG_PARAVIRT_SPINLOCKS=y
>>>>
>>>> BASE+patch = pre 3.2.0 + Jeremy's above patches + above V5 PV spinlock
>>>> series and CONFIG_PARAVIRT_SPINLOCKS=y
>>>>
>>>> In both cases CONFIG_PARAVIRT_SPINLOCKS=y.
>>>>
>>>> So let,
>>>> A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
>>>> B. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = n
>>>> C. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = y
>>>> D. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = n
>>>> E. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = y
>>>>
>>>> Is it the performance of A vs E? (currently it is C vs E)
>>>
>>> Since D and E only matter with KVM in use, yes, I'm mostly interested in A, B and C :).
>>>
>>>
>>> Alex
>>>
>>>
>> Setup:
>> Native: IBM xSeries with Intel(R) Xeon(R) X5570 2.93GHz CPU, 8 cores, 64GB RAM (16 cpus online)
>>
>> Guest: single guest with 8 VCPUs, 4GB RAM.
>> Benchmark: kernbench -f -H -M -o 20
>>
>> Here is the result:
>> Native Run
>> ============
>> case A               case B             %improvement   case C  %improvement
>> 56.1917 (2.57125)    56.035 (2.02439)   0.278867       56.27 (2.40401)   -0.139344
>
> This looks a lot like statistical deviation. How often did you execute the test case? Did you make sure to have a clean base state every time?
>
> Maybe it'd be a good idea to create a small in-kernel microbenchmark with a couple of threads that take spinlocks, then do work for a specified number of cycles, then release them again and start anew. At the end of it, we can check how long the whole thing took for n runs. That would enable us to measure the worst-case scenario.
>

It was a quick test: two iterations of kernbench (= 6 runs), and I had
ensured the caches were cleared beforehand:

echo "1" > /proc/sys/vm/drop_caches
ccache -C

Yes, maybe I can run the test as you mentioned.

>>
>> Guest Run
>> ============
>> case A               case B             %improvement   case C  %improvement
>> 166.999 (15.7613)    161.876 (14.4874)   3.06768        161.24 (12.6497)  3.44852
>
> Is this the same machine? Why is the guest 3x slower?
Yes, the same non-PLE machine, but with all 16 cpus online. By 3x slower,
do you mean case A is slower (pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n)?

>
>
> Alex
>
>>
>> We do not see much overhead in the native run with CONFIG_PARAVIRT_SPINLOCKS=y.
>>
>
>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-17 11:02     ` Marcelo Tosatti
@ 2012-01-17 18:57       ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-17 18:57 UTC (permalink / raw)
  To: Marcelo Tosatti, Avi Kivity
  Cc: Jeremy Fitzhardinge, Raghavendra, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Rob Landley, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML

On 01/17/2012 04:32 PM, Marcelo Tosatti wrote:
> On Sat, Jan 14, 2012 at 11:56:46PM +0530, Raghavendra K T wrote:
>> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks.
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c7b05fc..4d7a950 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5754,8 +5754,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>
>>   	local_irq_disable();
>>
>> -	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>> -	    || need_resched() || signal_pending(current)) {
>> +	if (vcpu->mode == EXITING_GUEST_MODE
>> +		 || (vcpu->requests&  ~(1UL<<KVM_REQ_PVLOCK_KICK))
>> +		 || need_resched() || signal_pending(current)) {
>>   		vcpu->mode = OUTSIDE_GUEST_MODE;
>>   		smp_wmb();
>>   		local_irq_enable();
>> @@ -6711,6 +6712,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>>   		!vcpu->arch.apf.halted)
>>   		|| !list_empty_careful(&vcpu->async_pf.done)
>>   		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
>> +		|| kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu)
>
> The bit should only be read here (kvm_arch_vcpu_runnable), but cleared
> on vcpu entry (along with the other kvm_check_request processing).
>
> Then the first hunk becomes unnecessary.

True. [patch below]

>
> Please do not mix host/guest patches.

Yes, that will be taken care of in the next version.

>
>

I had tried an alternative approach earlier; I think that is closer
to your expectation.

There:
- the flag is read in kvm_arch_vcpu_runnable
- the flag is cleared on vcpu entry along with the others.

But it needs a per-vcpu flag to remember pv_unhalted while clearing
the request bit on vcpu entry [patch below]. I could not find a third
alternative though.

Simply clearing the request bit on vcpu entry had made the guest hang in a
*rare* scenario [as the kick would be lost].

[I had observed a guest hang after 4 iterations of kernbench at 1:3
overcommit, with 2 of the 3 guests running while 1 hogs the cpu.]

Avi,
do you think having the pv_unhalt flag in the patch below still causes a
problem for live migration? (the vcpu->requests bit is retained as-is) Or
do we also need the KVM_GET_MP_STATE changes you mentioned earlier along
with the patch below?

---8<---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..1bf8fa8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5684,6 +5717,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  			r = 1;
  			goto out;
  		}
+		if (kvm_check_request(KVM_REQ_PVKICK, vcpu)) {
+			vcpu->pv_unhalted = 1;
+			r = 1;
+			goto out;
+		}
  		if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
  			record_steal_time(vcpu);
  		if (kvm_check_request(KVM_REQ_NMI, vcpu))
@@ -6683,6 +6720,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
  		!vcpu->arch.apf.halted)
  		|| !list_empty_careful(&vcpu->async_pf.done)
  		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+		|| (kvm_test_request(KVM_REQ_PVKICK, vcpu) || vcpu->pv_unhalted)
  		|| atomic_read(&vcpu->arch.nmi_queued) ||
  		(kvm_arch_interrupt_allowed(vcpu) &&
  		 kvm_cpu_has_interrupt(vcpu));
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d526231..a48e0f2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -154,6 +155,8 @@ struct kvm_vcpu {
  #endif

  	struct kvm_vcpu_arch arch;
+
+	int pv_unhalted;
  };

  static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
@@ -770,5 +773,12 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
  	}
  }

+static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
+{
+	if (test_bit(req, &vcpu->requests))
+		return true;
+	else
+		return false;
+}
  #endif

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9cfb78..55c44a2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
  	vcpu->kvm = kvm;
  	vcpu->vcpu_id = id;
  	vcpu->pid = NULL;
+	vcpu->pv_unhalted = 0;
  	init_waitqueue_head(&vcpu->wq);
  	kvm_async_pf_vcpu_init(vcpu);

@@ -1509,11 +1510,12 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
  {
  	DEFINE_WAIT(wait);
  	for (;;) {
  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);

  		if (kvm_arch_vcpu_runnable(vcpu)) {
+			vcpu->pv_unhalted = 0;
  			kvm_make_request(KVM_REQ_UNHALT, vcpu);
  			break;
  		}

^ permalink raw reply related	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-17 18:36                 ` Raghavendra K T
@ 2012-01-17 21:57                   ` Dave Hansen
  -1 siblings, 0 replies; 139+ messages in thread
From: Dave Hansen @ 2012-01-17 21:57 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Suzuki Poulose

On 01/17/2012 10:36 AM, Raghavendra K T wrote:
> It was a quick test: two iterations of kernbench (= 6 runs), and I had
> ensured the caches were cleared beforehand:
> 
> echo "1" > /proc/sys/vm/drop_caches
> ccache -C
> 
> Yes, maybe I can run the test as you mentioned.

echo 3 > /proc/sys/vm/drop_caches

is better.  1 will only drop the page cache, but 3 also drops dentries and inodes, fwiw.
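
So for a fully cold-cache baseline, something like this (sketch) before
each run:

sync					# flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches	# drop page cache + dentries/inodes
ccache -C				# and clear the compiler cache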

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-17 11:33       ` Srivatsa Vaddagiri
@ 2012-01-18  1:34         ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 139+ messages in thread
From: Jeremy Fitzhardinge @ 2012-01-18  1:34 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: X86, linux-doc, Peter Zijlstra, Jan Kiszka, Virtualization,
	Paul Mackerras, H. Peter Anvin, Stefano Stabellini, Xen,
	Dave Jiang, KVM, Glauber Costa, Raghavendra K T, Ingo Molnar,
	Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose

On 01/17/2012 10:33 PM, Srivatsa Vaddagiri wrote:
> * Marcelo Tosatti <mtosatti@redhat.com> [2012-01-17 09:02:11]:
>
>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
>>> +{
>>> +	int cpu;
>>> +	int apicid;
>>> +
>>> +	add_stats(RELEASED_SLOW, 1);
>>> +
>>> +	for_each_cpu(cpu, &waiting_cpus) {
>>> +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
>>> +		if (ACCESS_ONCE(w->lock) == lock &&
>>> +		    ACCESS_ONCE(w->want) == ticket) {
>>> +			add_stats(RELEASED_SLOW_KICKED, 1);
>>> +			apicid = per_cpu(x86_cpu_to_apicid, cpu);
>>> +			kvm_kick_cpu(apicid);
>>> +			break;
>>> +		}
>>> +	}
>> What prevents a kick from being lost here, if say, the waiter is at
>> local_irq_save in kvm_lock_spinning, before the lock/want assignments?
> The waiter does check for lock becoming available before actually
> sleeping:
>
> +	/*
> +        * check again make sure it didn't become free while
> +        * we weren't looking.
> +        */
> +	if (ACCESS_ONCE(lock->tickets.head) == want) {
> +               add_stats(TAKEN_SLOW_PICKUP, 1);
> +               goto out;
> +	}

That logic relies on the "kick" being level triggered, so that "kick"
before "block" will cause the block to fall out immediately.  If you're
using "hlt" as the block and it has the usual edge-triggered behaviour,
what stops a "kick-before-hlt" from losing the kick?
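
I.e. the interleaving I'm worried about would be:

  vcpu0 (waiter)                    vcpu1 (unlocker)
  --------------                    ----------------
  checks head; lock still held
                                    releases the lock
                                    sends the kick      <- edge fires here
  executes hlt
  ...sleeps; the kick was already consumed

With a level-triggered kick, the pending state survives until vcpu0
blocks, so the hlt falls through immediately.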

    J

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-17 21:57                   ` Dave Hansen
@ 2012-01-18  2:27                     ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-18  2:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Suzuki Poulose

On 01/18/2012 03:27 AM, Dave Hansen wrote:
> On 01/17/2012 10:36 AM, Raghavendra K T wrote:
>> It was a quick test: two iterations of kernbench (= 6 runs), and I had
>> ensured the caches were cleared beforehand:
>>
>> echo "1" > /proc/sys/vm/drop_caches
>> ccache -C
>>
>> Yes, maybe I can run the test as you mentioned.
>
> echo 3 > /proc/sys/vm/drop_caches
>
> is better.  1 will only drop the page cache, but 3 also drops dentries and inodes, fwiw.
>

Yes, that needs to be used.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-17  0:30         ` Jeremy Fitzhardinge
@ 2012-01-18 10:23           ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-18 10:23 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Alexander Graf
  Cc: Greg Kroah-Hartman, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, Raghavendra, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Rob Landley, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave

On 01/17/2012 06:00 AM, Jeremy Fitzhardinge wrote:
> On 01/16/2012 09:24 PM, Alexander Graf wrote:
>> This is true in case you're spinning. If on overcommit spinlocks would
>> instead of spin just yield(), we wouldn't have any vcpu running that's
>> just waiting for a late ticket.
>
> Yes, but the reality is that most spinlocks are held for a short period
> of time and there's a low likelihood of being preempted while within a
> spinlock critical section.  Therefore if someone else tries to get the
> spinlock and there's contention, it's always worth spinning for a little
> while because the lock will likely become free soon.

I too believe that lock-holder preemption forms a small part of the
problem. The non-PLE machine results for the patch seem to support that.

>
> At least that's the case if the lock has low contention (shallow queue
> depth and not in slow state).  Again, maybe it makes sense to never spin
> for deep queues or already slowstate locks.
>
>> We still have an issue finding the point in time when a vcpu could run again, which is what this whole series is about. My point above was that instead of doing a count loop, we could just do the normal spin dance and set the threshold to when we enable the magic to have another spin lock notify us in the CPU. That way we
>>
>>    * don't change the uncontended case
>
> I don't follow you.  What do you mean by "the normal spin dance"?  What
> do you mean by "have another spinlock notify us in the CPU"?  Don't
> change which uncontended case?  Do you mean in the locking path?  Or the
> unlock path?  Or both?
>
>>    * can set the threshold on the host, which knows how contended the system is
>
> Hm, I'm not convinced that knowing how contended the system is is all
> that useful overall.  What's important is how contended a particular
> lock is, and what state the current holder is in.  If it's not currently
> running, then knowing the overall system contention would give you some
> idea about how long you need to wait for it to be rescheduled, but
> that's getting pretty indirect.
>
> I think the "slowpath if preempted while spinning" idea I mentioned in
> the other mail is probably worth following up, since that give specific
> actionable information to the guest from the hypervisor.  But lots of
> caveats.
>
> [[
> A possible mechanism:
>
>    * register ranges of [er]ips with the hypervisor
>    * each range is paired with a "resched handler block"
>    * if vcpu is preempted within such a range, make sure it is
>      rescheduled in the resched handler block
>
> This is obviously akin to the exception mechanism, but it is partially
> implemented by the hypervisor.  It allows the spinlock code to be
> unchanged from native, but make use of a resched rather than an explicit
> counter to determine when to slowpath the lock.  And it's a nice general
> mechanism that could be potentially useful elsewhere.
>
> Unfortunately, it doesn't change the unlock path at all; it still needs
> to explicitly test if a VCPU needs to be kicked on unlock.
> ]]
>
>
>> And since we control what spin locks look like, we can for example always keep the pointer to it in a specific register so that we can handle pv_lock_ops.lock_spinning() inside there and fetch all the information we need from our pt_regs.
>
> You've left a pile of parts of an idea lying around, but I'm not sure
> what shape you intend it to be.

Interesting option. But is it feasible to have specific registers,
considering we can have nested locks [which means we would need a table],
and also considering that the "normal" spinlock acquisition path should
have low overhead?
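
[For reference, the state the series keeps instead of a register is
per-cpu; a sketch reconstructed from the kvm_unlock_kick hunk quoted
earlier in the thread, with the field names used there:

struct kvm_lock_waiting {
	struct arch_spinlock *lock;	/* lock this cpu is spinning on */
	__ticket_t want;		/* ticket value it is waiting for */
};
static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
static cpumask_t waiting_cpus;		/* cpus currently in the slowpath */

One slot per cpu is enough because the slowpath runs with interrupts
disabled, so a cpu waits on at most one lock at a time even with nested
locks held.]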

>
>      J
>
>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-16 23:59         ` Jeremy Fitzhardinge
@ 2012-01-18 10:48           ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-18 10:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Avi Kivity
  Cc: Greg Kroah-Hartman, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, Raghavendra, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Rob Landley, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen

On 01/17/2012 05:29 AM, Jeremy Fitzhardinge wrote:
> On 01/16/2012 07:55 PM, Avi Kivity wrote:
>> On 01/16/2012 08:40 AM, Jeremy Fitzhardinge wrote:
>>>> That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.
>>>>
>>>> Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.
>>> I'm not quite sure what your concern is.  The lock is under contention, so there's nothing to do except spin; all this patch adds is a variable decrement/test to the spin loop, but that's not going to waste any more CPU than the non-counting case.  And once it falls into the blocking path, its a win because the VCPU isn't burning CPU any more.
>> The wakeup path is slower though.  The previous lock holder has to
>> hypercall, and the new lock holder has to be scheduled, and transition
>> from halted state to running (a vmentry).  So it's only a clear win if
>> we can do something with the cpu other than go into the idle loop.
>
> Not burning power is a win too.
>
> Actually what you want is something like "if you preempt a VCPU while
> its spinning in a lock, then push it into the slowpath and don't
> reschedule it without a kick".  But I think that interface would have a
> lot of fiddly corners.
>

Yes, the wakeup path is a little slower, but better than burning cpu, no?

Suppose we have 16 vcpus:
vcpu 1    - lock holder (preempted),
vcpus 2-8 - in the slowpath.

If the scheduler schedules vcpu 1, that is most favourable for lock
progress. But if vcpus 9-16 or something else gets scheduled, it's still
a win (we are doing some useful work), though lock progress is again a
little slower.

The optimization areas of interest are perhaps:
(1) Suppose vcpu 1 is running and is about to release the lock, and the
next vcpu in the queue just goes to halt(). This pushes us to tune
SPIN_THRESHOLD correctly (see the sketch at the end of this mail) and to
have a mechanism to determine whether the lock holder is running and, if
so, keep spinning. Identifying whether the lock holder is running would
be the easier task and could be the next step of optimization.

(2) The much-discussed one: identifying lock-holder preemption (which
vcpu) and doing yield_to().

But I am not sure how complicated the yield_to() implementation is once
we have identified the exact preempted vcpu (the lock holder).
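
[For reference, the shape of the spin-then-block slowpath that
SPIN_THRESHOLD tunes; an illustrative sketch only, not the exact patch
code: the threshold value is made up, and lock_waiting is the per-cpu
waiting state from the series:

#define SPIN_THRESHOLD	(1 << 11)	/* made-up tuning value */

static void lock_spinning_sketch(struct arch_spinlock *lock,
				 __ticket_t want)
{
	unsigned count = SPIN_THRESHOLD;

	/* spin for a while: most locks are released quickly */
	do {
		if (ACCESS_ONCE(lock->tickets.head) == want)
			return;		/* the lock became ours */
		cpu_relax();
	} while (--count);

	/* publish what we are waiting for, so the unlocker can kick us */
	this_cpu_write(lock_waiting.lock, lock);
	this_cpu_write(lock_waiting.want, want);

	/* re-check: it may have become free while we weren't looking */
	if (ACCESS_ONCE(lock->tickets.head) == want)
		return;

	halt();		/* block until the unlocker's kick wakes us */
}

Raising SPIN_THRESHOLD delays the halt() (good when the lock holder is
running); lowering it blocks earlier (good when the holder is preempted).
Knowing which case we are in is exactly the point of (1).]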

>      J
>
>

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-18  1:34         ` Jeremy Fitzhardinge
@ 2012-01-18 13:54           ` Srivatsa Vaddagiri
  -1 siblings, 0 replies; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-18 13:54 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: X86, linux-doc, Peter Zijlstra, Jan Kiszka, Virtualization,
	Paul Mackerras, H. Peter Anvin, Stefano Stabellini, Xen,
	Dave Jiang, KVM, Glauber Costa, Raghavendra K T, Ingo Molnar,
	Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose

* Jeremy Fitzhardinge <jeremy@goop.org> [2012-01-18 12:34:42]:

> >> What prevents a kick from being lost here, if say, the waiter is at
> >> local_irq_save in kvm_lock_spinning, before the lock/want assignments?
> > The waiter does check for lock becoming available before actually
> > sleeping:
> >
> > +	/*
> > +        * check again make sure it didn't become free while
> > +        * we weren't looking.
> > +        */
> > +	if (ACCESS_ONCE(lock->tickets.head) == want) {
> > +               add_stats(TAKEN_SLOW_PICKUP, 1);
> > +               goto out;
> > +	}
> 
> That logic relies on the "kick" being level triggered, so that "kick"
> before "block" will cause the block to fall out immediately.  If you're
> using "hlt" as the block and it has the usual edge-triggered behaviour,
> what stops a "kick-before-hlt" from losing the kick?

Hmm ..'hlt' should result in a check for kick request (in hypervisor
context) before vcpu is put to sleep. IOW vcpu1 that is attempting to kick vcpu0
will set a 'somebody_tried_kicking_vcpu0' flag, which hypervisor should check 
before it puts vcpu0 to sleep because of trapped 'hlt' instruction.

Won't that trap the 'kick-before-hlt' case? What am I missing here?
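
To make that ordering concrete, here is a rough sketch of the two
sides; the flag and helper names (pv_kicked, wake_up_if_halted,
block_vcpu) are illustrative, not the actual KVM code:

/* Kicker (vcpu1): leave persistent state, then wake the target. */
void kick_vcpu(struct kvm_vcpu *vcpu0)
{
	vcpu0->pv_kicked = 1;		/* 'somebody tried kicking vcpu0' */
	smp_wmb();			/* flag visible before the wakeup */
	wake_up_if_halted(vcpu0);	/* no-op if vcpu0 is still running */
}

/* Hypervisor's handling of vcpu0's trapped hlt. */
void handle_hlt(struct kvm_vcpu *vcpu0)
{
	if (xchg(&vcpu0->pv_kicked, 0))	/* consume a pending kick */
		return;			/* kick-before-hlt: do not sleep */
	block_vcpu(vcpu0);		/* sleep until the next kick */
}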

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-18 13:54           ` Srivatsa Vaddagiri
@ 2012-01-18 21:52             ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 139+ messages in thread
From: Jeremy Fitzhardinge @ 2012-01-18 21:52 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: X86, linux-doc, Peter Zijlstra, Jan Kiszka, Virtualization,
	Paul Mackerras, H. Peter Anvin, Stefano Stabellini, Xen,
	Dave Jiang, KVM, Glauber Costa, Raghavendra K T, Ingo Molnar,
	Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose

On 01/19/2012 12:54 AM, Srivatsa Vaddagiri wrote:
>
>> That logic relies on the "kick" being level triggered, so that "kick"
>> before "block" will cause the block to fall out immediately.  If you're
>> using "hlt" as the block and it has the usual edge-triggered behaviour,
>> what stops a "kick-before-hlt" from losing the kick?
> Hmm ..'hlt' should result in a check for kick request (in hypervisor
> context) before vcpu is put to sleep. IOW vcpu1 that is attempting to kick vcpu0
> will set a 'somebody_tried_kicking_vcpu0' flag, which hypervisor should check 
> before it puts vcpu0 to sleep because of trapped 'hlt' instruction.
>
> Won't that trap the 'kick-before-hlt' case? What am I missing here?

Nothing, that sounds fine.  It wasn't clear to me that your kick
operation left persistent state, and so has a level-triggered effect on hlt.

    J

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2012-01-17 15:53                           ` Marcelo Tosatti
  (?)
@ 2012-01-20 15:09                           ` Srivatsa Vaddagiri
  -1 siblings, 0 replies; 139+ messages in thread
From: Srivatsa Vaddagiri @ 2012-01-20 15:09 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Jeremy Fitzhardinge, Raghavendra K T, KVM, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Anthony Liguori,
	Paul Mackerras, H. Peter Anvin, Stefano Stabellini, Xen,
	Dave Jiang, Glauber Costa, X86, Ingo Molnar, Avi Kivity,
	Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek,
	Thomas Gleixner, Anthony Liguori, Greg

* Marcelo Tosatti <mtosatti@redhat.com> [2012-01-17 13:53:03]:

> On Tue, Jan 17, 2012 at 05:32:33PM +0200, Gleb Natapov wrote:
> > On Tue, Jan 17, 2012 at 07:58:18PM +0530, Srivatsa Vaddagiri wrote:
> > > * Gleb Natapov <gleb@redhat.com> [2012-01-17 15:20:51]:
> > > 
> > > > > Having the hypercall makes the intent of the vcpu (to sleep on a kick) clear to 
> > > > > the hypervisor, vs. assuming it because of a trapped hlt instruction (which
> > > > > anyway won't work when yield_on_hlt=0).
> > > > > 
> > > > The purpose of yield_on_hlt=0 is to allow a vcpu to occupy the cpu for the
> > > > entire time slice no matter what. I do not think disabling yield-on-hlt
> > > > even makes sense in a cpu-oversubscribed scenario.
> > > 
> > > Yes, so is there any real use for yield_on_hlt=0? I believe Anthony
> > > initially added it as a way to implement CPU bandwidth capping for VMs,
> > > which would ensure that busy VMs don't eat into cycles meant for a idle
> > > VM. Now that we have proper support in scheduler for CPU bandwidth capping, is 
> > > there any real world use for yield_on_hlt=0? If not, deprecate it?
> > > 
> > I was against adding it in the first place, so if IBM no longer needs it
> > I am for removing it ASAP.
> 
> +1. 
> 
> Anthony?

CCing Anthony.

Anthony, could you ACK the removal of yield_on_hlt (keeping it around
would require unnecessary complications in the pv-spinlock patches)?

- vatsa

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-18 21:52             ` Jeremy Fitzhardinge
@ 2012-01-24 14:08               ` Avi Kivity
  -1 siblings, 0 replies; 139+ messages in thread
From: Avi Kivity @ 2012-01-24 14:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Srivatsa Vaddagiri, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Dave Hansen

On 01/18/2012 11:52 PM, Jeremy Fitzhardinge wrote:
> On 01/19/2012 12:54 AM, Srivatsa Vaddagiri wrote:
> >
> >> That logic relies on the "kick" being level triggered, so that "kick"
> >> before "block" will cause the block to fall out immediately.  If you're
> >> using "hlt" as the block and it has the usual edge-triggered behaviour,
> >> what stops a "kick-before-hlt" from losing the kick?
> > Hmm ..'hlt' should result in a check for kick request (in hypervisor
> > context) before vcpu is put to sleep. IOW vcpu1 that is attempting to kick vcpu0
> > will set a 'somebody_tried_kicking_vcpu0' flag, which hypervisor should check 
> > before it puts vcpu0 to sleep because of trapped 'hlt' instruction.
> >
> > Won't that trap the 'kick-before-hlt' case? What am I missing here?
>
> Nothing, that sounds fine.  It wasn't clear to me that your kick
> operation left persistent state, and so has a level-triggered effect on hlt.
>

btw, this persistent state needs to be saved/restored for live
migration.  Best to put it into some MSR.
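
(A sketch of what that could look like; the MSR index and field name
below are hypothetical, just to show where the save/restore would hook
in:)

/* Hypothetical: expose the pending-kick flag as a synthetic MSR so the
 * existing KVM_GET_MSRS/KVM_SET_MSRS path carries it across migration. */
#define MSR_KVM_PV_UNHALT	0x4b564d05	/* made-up index */

static void pv_unhalt_msr_get(struct kvm_vcpu *vcpu, u64 *data)
{
	*data = vcpu->arch.pv_unhalted;
}

static void pv_unhalt_msr_set(struct kvm_vcpu *vcpu, u64 data)
{
	vcpu->arch.pv_unhalted = !!data;
}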

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-24 14:08               ` Avi Kivity
@ 2012-01-24 18:51                 ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-24 18:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Dave Hansen, Suzuki

On 01/24/2012 07:38 PM, Avi Kivity wrote:
> On 01/18/2012 11:52 PM, Jeremy Fitzhardinge wrote:
>> On 01/19/2012 12:54 AM, Srivatsa Vaddagiri wrote:
>>>
>>>> That logic relies on the "kick" being level triggered, so that "kick"
>>>> before "block" will cause the block to fall out immediately.  If you're
>>>> using "hlt" as the block and it has the usual edge-triggered behaviour,
>>>> what stops a "kick-before-hlt" from losing the kick?
>>> Hmm ..'hlt' should result in a check for kick request (in hypervisor
>>> context) before vcpu is put to sleep. IOW vcpu1 that is attempting to kick vcpu0
>>> will set a 'somebody_tried_kicking_vcpu0' flag, which hypervisor should check
>>> before it puts vcpu0 to sleep because of trapped 'hlt' instruction.
>>>
>>> Won't that trap the 'kick-before-hlt' case? What am I missing here?
>>
>> Nothing, that sounds fine.  It wasn't clear to me that your kick
>> operation left persistent state, and so has a level-triggered effect on hlt.
>>
>
> btw, this persistent state needs to be saved/restored for live
> migration.  Best to put it into some MSR.
>

I did not quite get that. Did you mean: add a new MSR to
msrs_to_save[], maybe retain only the kicked/pv_unhalt flag (the
persistent state), and get rid of the PVLOCK_KICK bit in
vcpu->requests?

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
  2012-01-17 18:57       ` Raghavendra K T
  (?)
@ 2012-01-24 19:01       ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-24 19:01 UTC (permalink / raw)
  To: Marcelo Tosatti, Avi Kivity
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML

On 01/18/2012 12:27 AM, Raghavendra K T wrote:
> On 01/17/2012 04:32 PM, Marcelo Tosatti wrote:
>> On Sat, Jan 14, 2012 at 11:56:46PM +0530, Raghavendra K T wrote:
[...]
>>> + || (vcpu->requests& ~(1UL<<KVM_REQ_PVLOCK_KICK))
>>> + || need_resched() || signal_pending(current)) {
>>> vcpu->mode = OUTSIDE_GUEST_MODE;
>>> smp_wmb();
>>> local_irq_enable();
>>> @@ -6711,6 +6712,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>>> !vcpu->arch.apf.halted)
>>> || !list_empty_careful(&vcpu->async_pf.done)
>>> || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
>>> + || kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu)
>>
>> The bit should only be read here (kvm_arch_vcpu_runnable), but cleared
>> on vcpu entry (along with the other kvm_check_request processing).
>>
[...]
>
> I had tried alternative approach earlier, I think that is closer
> to your expectation.
>
> where
> - flag is read in kvm_arch_vcpu_runnable
> - flag cleared in vcpu entry along with others.
>
> But it needs per vcpu flag to remember that pv_unhalted while clearing
> the flag in vcpu enter [ patch below ]. Could not find third alternative
> though.
[...]
> do you think having pv_unhalt flag in below patch cause problem to
> live migration still? (vcpu->request bit is retained as is) OR do we
> have to have KVM_GET_MP_STATE changes also with below patch you
> mentioned earlier.
>

Avi, Marcelo, please let me know any comments you have on how it
should look in the next version.
Should I get rid of the KVM_REQ_PVLOCK_KICK bit in vcpu->requests and
have only the pv_unhalt flag as below, and also add the MSR as
suggested?

> ---8<---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c38efd7..1bf8fa8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5684,6 +5717,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			r = 1;
>  			goto out;
>  		}
> +		if (kvm_check_request(KVM_REQ_PVKICK, vcpu)) {
> +			vcpu->pv_unhalted = 1;
> +			r = 1;
> +			goto out;
> +		}
>  		if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
>  			record_steal_time(vcpu);
>  		if (kvm_check_request(KVM_REQ_NMI, vcpu))
> @@ -6683,6 +6720,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  		 !vcpu->arch.apf.halted)
>  		|| !list_empty_careful(&vcpu->async_pf.done)
>  		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> +		|| (kvm_test_request(KVM_REQ_PVKICK, vcpu) || vcpu->pv_unhalted)
>  		|| atomic_read(&vcpu->arch.nmi_queued) ||
>  		(kvm_arch_interrupt_allowed(vcpu) &&
>  		 kvm_cpu_has_interrupt(vcpu));
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d526231..a48e0f2 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -154,6 +155,8 @@ struct kvm_vcpu {
>  #endif
>  
>  	struct kvm_vcpu_arch arch;
> +
> +	int pv_unhalted;
>  };
>  
>  static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
> @@ -770,5 +773,12 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
> +{
> +	if (test_bit(req, &vcpu->requests))
> +		return true;
> +	else
> +		return false;
> +}
>  #endif
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d9cfb78..55c44a2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>  	vcpu->kvm = kvm;
>  	vcpu->vcpu_id = id;
>  	vcpu->pid = NULL;
> +	vcpu->pv_unhalted = 0;
>  	init_waitqueue_head(&vcpu->wq);
>  	kvm_async_pf_vcpu_init(vcpu);
>  
> @@ -1509,11 +1510,12 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  {
>  	DEFINE_WAIT(wait);
>  	for (;;) {
>  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>  
>  		if (kvm_arch_vcpu_runnable(vcpu)) {
> +			vcpu->pv_unhalted = 0;
>  			kvm_make_request(KVM_REQ_UNHALT, vcpu);
>  			break;
>  		}
> 

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-17 18:36                 ` Raghavendra K T
@ 2012-01-25  8:55                   ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-25  8:55 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Virtualization, LKML, Dave Hansen

[-- Attachment #1: Type: text/plain, Size: 3179 bytes --]

On 01/18/2012 12:06 AM, Raghavendra K T wrote:
> On 01/17/2012 11:09 PM, Alexander Graf wrote:
[...]
>>>>> A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
>>>>> B. pre-3.2.0 + Jeremy's above patches with
>>>>> CONFIG_PARAVIRT_SPINLOCKS = n
>>>>> C. pre-3.2.0 + Jeremy's above patches with
>>>>> CONFIG_PARAVIRT_SPINLOCKS = y
>>>>> D. pre-3.2.0 + Jeremy's above patches + V5 patches with
>>>>> CONFIG_PARAVIRT_SPINLOCKS = n
>>>>> E. pre-3.2.0 + Jeremy's above patches + V5 patches with
>>>>> CONFIG_PARAVIRT_SPINLOCKS = y
[...]
>> Maybe it'd be a good idea to create a small in-kernel microbenchmark
>> with a couple threads that take spinlocks, then do work for a
>> specified number of cycles, then release them again and start anew. At
>> the end of it, we can check how long the whole thing took for n runs.
>> That would enable us to measure the worst case scenario.
>>
>
> It was a quick test. two iteration of kernbench (=6runs) and had ensured
> cache is cleared.
>
> echo "1" > /proc/sys/vm/drop_caches
> ccache -C. Yes may be I can run test as you mentioned..
>

Sorry for the late reply; I was trying to do more performance analysis.
I measured the worst-case scenario with a spinlock stress driver
[attached below]. I think S1 (below) is what you were
looking for:

Two types of scenarios:
S1:
lock()
increment counter
unlock()

S2:
do_somework()
lock()
do_conditional_work() /* this is to give a variable spinlock hold time */
unlock()

Setup:
Machine: IBM xSeries with Intel(R) Xeon(R) X5570 2.93GHz CPU, 8
cores, 64GB RAM, 16 online cpus.
The results below are taken across a total of 18 runs of
insmod spinlock_thread.ko nr_spinlock_threads=4 loop_count=4000000
Results:
scenario S1: plain counter
==========================
     total megacycles taken for completion (std dev)
A.  12343.833333      (1254.664021)
B.  12817.111111      (917.791606)
C.  13426.555556      (844.882978)

%improvement of C w.r.t. BASE (A):   -8.77

scenario S2: counter with variable work inside lock + do_work_outside_lock
=========================================================================
A.   25077.888889      (1349.471703)
B.   24906.777778      (1447.853874)
C.   21287.000000      (2731.643644)

%improvement of C w.r.t. BASE (A):    15.12

So it seems we have a worst-case overhead of around 8%. But we see an
improvement of at least 15% once a little more time is spent in the
critical section.

>>>
>>> Guest Run
>>> ============
>>> case A case B %improvement case C %improvement
>>> 166.999 (15.7613) 161.876 (14.4874) 3.06768 161.24 (12.6497) 3.44852
>>
>> Is this the same machine? Why is the guest 3x slower?
> Yes non - ple machine but with all 16 cpus online. 3x slower you meant
> case A is slower (pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n) ?

Got your point. There were multiple reasons: the guest was 32 bit, had
only 8 vcpus, and its RAM was only 1GB (max 4GB); when I increased it
to 4GB, the run came down to around just 127 seconds.

There is happy news:
I created a new 64-bit guest and ran with 16GB RAM and 16 vcpus.
Kernbench with pv spinlocks (case E) took just around 42 sec (against
57 sec on the host), an improvement of around 26% over the host.
So it is much faster, rather than 3x slower.

[-- Attachment #2: spinlock_thread.c --]
[-- Type: text/x-csrc, Size: 3529 bytes --]

/*
 * spinlock_thread.c 
 *
 * Author: Raghavendra K T
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <asm/uaccess.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/sched.h>
#include <linux/kthread.h>
#include <asm/msr.h>

unsigned int start, end, diff;

static struct task_struct **spintask_pid;
static DECLARE_COMPLETION(spintask_exited);

static int total_thread_exit = 0;
static DEFINE_SPINLOCK(counter_spinlock);

#define DEFAULT_NR_THREADS 4
#define DEFAULT_LOOP_COUNT 4000000L

static int nr_spinlock_threads = DEFAULT_NR_THREADS;
static long loop_count = DEFAULT_LOOP_COUNT;

module_param(nr_spinlock_threads, int, S_IRUGO);
module_param(loop_count, long, S_IRUGO);

static long count = 0;
static int a[2][2] = {{2, 5}, {3, 7}};
static int b[2][2] = {{1, 19}, {11, 13}};
static int m[2][2];
static int n[2][2];
static int res[2][2];

static inline void matrix_initialize(int id)
{
	int i, j;
	for (i=0; i<2; i++)
		for(j=0; j<2; j++) {
			m[i][j] = (id + 1) * a[i][j];
			n[i][j] = (id + 1) * b[i][j];
		}
}

static inline void matrix_mult(void)
{
	int i, j, k;
	for (i=0; i<2; i++)
		for (j=0; j<2; j++) {
			res[i][j] = 0;
			for(k=0; k<2; k++) 
				res[i][j] += m[i][k] * n[k][j];
		} 
}

static int input_check_thread(void *arg)
{
	/* the thread index is passed by value through the pointer arg */
	long id = (long)arg;
	long i = loop_count;
	allow_signal(SIGKILL);
#if 0
	matrix_initialize(id);
	matrix_mult();	
#endif
	do {

		spin_lock(&counter_spinlock);
		count++;
#if 0
		if (id%3) 
			matrix_initialize(id);
		else if (id%3 + 1)
			matrix_mult();	
#endif
		spin_unlock(&counter_spinlock);
	} while(i--); 

	spin_lock(&counter_spinlock);
	total_thread_exit++;
	spin_unlock(&counter_spinlock);
	if(total_thread_exit == nr_spinlock_threads) {
		rdtscl(end);
		diff = end - start;
		complete_and_exit(&spintask_exited, 0);
	}

	return 0;
}

static int spinlock_init_module(void)
{
	int i;
	char name[20];
	printk(KERN_INFO "insmod nr_spinlock_threads = %d\n", nr_spinlock_threads);
	spintask_pid = kzalloc(sizeof(struct task_struct *)* nr_spinlock_threads, GFP_KERNEL);
	rdtscl(start);
	for (i = 0; i < nr_spinlock_threads; i++) {
		sprintf(name, "spintask%d", i);
		spintask_pid[i] = kthread_run(input_check_thread,
					      (void *)(long)i, name);
	}

	return 0;
}

static void spinlock_cleanup_module(void)
{
	wait_for_completion(&spintask_exited);
	kfree(spintask_pid);
	printk(KERN_INFO "rmmod count = %ld time elaspsed=%u\n", count, diff);
}

module_init(spinlock_init_module);
module_exit(spinlock_cleanup_module);

MODULE_PARM_DESC(loop_count, "How many iterations the counter should be incremented");
MODULE_PARM_DESC(nr_spinlock_threads, "How many kernel threads to be spawned");
MODULE_AUTHOR("Raghavendra K T");
MODULE_DESCRIPTION("spinlock stress driver");
MODULE_LICENSE("GPL");

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-25  8:55                   ` Raghavendra K T
@ 2012-01-25 16:35                     ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 139+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-25 16:35 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Stefano Stabellini,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Virtualization, LKML,
	Dave Hansen, Suzuki Poulose

On Wed, Jan 25, 2012 at 02:25:12PM +0530, Raghavendra K T wrote:
> On 01/18/2012 12:06 AM, Raghavendra K T wrote:
> >On 01/17/2012 11:09 PM, Alexander Graf wrote:
> [...]
> >>>>>A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
> >>>>>B. pre-3.2.0 + Jeremy's above patches with
> >>>>>CONFIG_PARAVIRT_SPINLOCKS = n
> >>>>>C. pre-3.2.0 + Jeremy's above patches with
> >>>>>CONFIG_PARAVIRT_SPINLOCKS = y
> >>>>>D. pre-3.2.0 + Jeremy's above patches + V5 patches with
> >>>>>CONFIG_PARAVIRT_SPINLOCKS = n
> >>>>>E. pre-3.2.0 + Jeremy's above patches + V5 patches with
> >>>>>CONFIG_PARAVIRT_SPINLOCKS = y
> [...]
> >>Maybe it'd be a good idea to create a small in-kernel microbenchmark
> >>with a couple threads that take spinlocks, then do work for a
> >>specified number of cycles, then release them again and start anew. At
> >>the end of it, we can check how long the whole thing took for n runs.
> >>That would enable us to measure the worst case scenario.
> >>
> >
> >It was a quick test. two iteration of kernbench (=6runs) and had ensured
> >cache is cleared.
> >
> >echo "1" > /proc/sys/vm/drop_caches
> >ccache -C. Yes may be I can run test as you mentioned..
> >
> 
> Sorry for late reply. Was trying to do more performance analysis.
> Measured the worst case scenario with a spinlock stress driver
> [ attached below ]. I think S1 (below) is what you were
> looking for:
> 
> 2 types of scenarios:
> S1.
> lock()
> increment counter.
> unlock()
> 
> S2:
> do_somework()
> lock()
> do_conditional_work() /* this is to give variable spinlock hold time */
> unlock()
> 
> Setup:
> Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8
> core , 64GB RAM, 16 online cpus.
> The below results are taken across total 18 Runs of
> insmod spinlock_thread.ko nr_spinlock_threads=4 loop_count=4000000
> 
> Results:
> scenario S1: plain counter
> ==========================
>     total Mega cycles taken for completion (std)
> A.  12343.833333      (1254.664021)
> B.  12817.111111      (917.791606)
> C.  13426.555556      (844.882978)
> 
> %improvement w.r.t BASE     -8.77
> 
> scenario S2: counter with variable work inside lock + do_work_outside_lock
> =========================================================================
> A.   25077.888889      (1349.471703)
> B.   24906.777778      (1447.853874)
> C.   21287.000000      (2731.643644)
> 
> %improvement w.r.t BASE      15.12
> 
> So it seems we have a worst-case overhead of around 8%. But we see an
> improvement of at least 15% once a little more time is spent in the
> critical section.

Is this with collecting the histogram information about spinlocks? We found
that if you enable that for production runs it makes them quite a bit slower.
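
(For context, the 'histogram information' is the debugfs statistics
from the series: per-event counters and spin-time histograms updated in
the lock slowpath. The sketch below reuses the add_stats() naming from
the patches, with spinlock_stats assumed as the stats struct; it is
illustrative only:)

static inline void add_stats(enum kvm_contention_stat var, u32 val)
{
	/* A cheap counter bump, but it is extra work on the hottest
	 * path, which is where the production slowdown comes from. */
	if (IS_ENABLED(CONFIG_KVM_DEBUG_FS))
		spinlock_stats.contention_stats[var] += val;
}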

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-25 16:35                     ` Konrad Rzeszutek Wilk
@ 2012-01-25 17:45                       ` Raghavendra K T
  -1 siblings, 0 replies; 139+ messages in thread
From: Raghavendra K T @ 2012-01-25 17:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Stefano Stabellini,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Virtualization, LKML,
	Dave Hansen, Suzuki Poulose

On 01/25/2012 10:05 PM, Konrad Rzeszutek Wilk wrote:
> >>So it seems we have a worst-case overhead of around 8%. But we see an
> >>improvement of at least 15% once a little more time is spent in the
> >>critical section.
>
> Is this with collecting the histogram information about spinlocks? We found
> that if you enable that for production runs it makes them quite a bit slower.
>

Ok. Are you referring to CONFIG_KVM_DEBUG_FS/CONFIG_XEN_DEBUG_FS? No,
it was not enabled. But then maybe I was not precise: this test was
only on native, so it should not have been affected, IMO.

It is good to know that the histogram has an effect, since I was always
under the impression that it does not affect the guest much either. My
experiments had almost always enabled it.

Let me know if you were referring to something else.

^ permalink raw reply	[flat|nested] 139+ messages in thread

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
  2012-01-25 17:45                       ` Raghavendra K T
@ 2012-01-25 19:05                         ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 139+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-25 19:05 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras,
	H. Peter Anvin, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Stefano Stabellini,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Virtualization, LKML,
	Dave Hansen, Suzuki Poulose

On Wed, Jan 25, 2012 at 11:15:36PM +0530, Raghavendra K T wrote:
> On 01/25/2012 10:05 PM, Konrad Rzeszutek Wilk wrote:
> >>So it seems we have a worst-case overhead of around 8%. But we see an
> >>improvement of at least 15% once a little more time is spent in the
> >>critical section.
> >
> >Is this with collecting the histogram information about spinlocks? We found
> >that if you enable that for production runs it makes them quite a bit slower.
> >
> 
> Ok. Are you referring to CONFIG_KVM_DEBUG_FS/CONFIG_XEN_DEBUG_FS? No,

Those were the ones.

> it was not enabled. But then maybe I was not precise: this test was
> only on native, so it should not have been affected, IMO.

Yup.
> 
> It is good to know that the histogram has an effect, since I was always
> under the impression that it does not affect the guest much either. My
> experiments had almost always enabled it.
> 
> Let me know if you were referring to something else.

^ permalink raw reply	[flat|nested] 139+ messages in thread

end of thread, other threads:[~2012-01-25 19:05 UTC | newest]

Thread overview: 139+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-14 18:25 [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests Raghavendra K T
2012-01-14 18:25 ` Raghavendra K T
2012-01-14 18:25 ` Raghavendra K T
2012-01-14 18:25 ` [PATCH RFC V4 1/5] debugfs: Add support to print u32 array in debugfs Raghavendra K T
2012-01-14 18:25   ` Raghavendra K T
2012-01-14 18:25   ` Raghavendra K T
2012-01-14 18:25 ` [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks Raghavendra K T
2012-01-14 18:25   ` Raghavendra K T
2012-01-14 18:25   ` Raghavendra K T
2012-01-16  3:24   ` Alexander Graf
2012-01-16  3:24     ` Alexander Graf
2012-01-16  8:43     ` Raghavendra K T
2012-01-16  8:43       ` Raghavendra K T
2012-01-16  9:03   ` Avi Kivity
2012-01-16  9:03     ` Avi Kivity
2012-01-16  9:55     ` Raghavendra K T
2012-01-16  9:55       ` Raghavendra K T
2012-01-14 18:26 ` [PATCH RFC V4 3/5] kvm guest : Added configuration support to enable debug information for KVM Guests Raghavendra K T
2012-01-14 18:26   ` Raghavendra K T
2012-01-14 18:26   ` Raghavendra K T
2012-01-14 18:26 ` [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor Raghavendra K T
2012-01-14 18:26   ` Raghavendra K T
2012-01-14 18:26   ` Raghavendra K T
2012-01-16  3:12   ` Alexander Graf
2012-01-16  3:12     ` Alexander Graf
2012-01-16  7:25     ` Raghavendra K T
2012-01-16  7:25       ` Raghavendra K T
2012-01-16  9:05   ` Avi Kivity
2012-01-16  9:05     ` Avi Kivity
2012-01-16 14:13     ` Raghavendra K T
2012-01-16 14:13       ` Raghavendra K T
2012-01-16 14:47       ` Avi Kivity
2012-01-16 14:47         ` Avi Kivity
2012-01-16 23:49         ` Jeremy Fitzhardinge
2012-01-16 23:49           ` Jeremy Fitzhardinge
2012-01-17 11:02   ` Marcelo Tosatti
2012-01-17 11:02     ` Marcelo Tosatti
2012-01-17 11:33     ` Srivatsa Vaddagiri
2012-01-17 11:33       ` Srivatsa Vaddagiri
2012-01-18  1:34       ` Jeremy Fitzhardinge
2012-01-18  1:34         ` Jeremy Fitzhardinge
2012-01-18 13:54         ` Srivatsa Vaddagiri
2012-01-18 13:54           ` Srivatsa Vaddagiri
2012-01-18 21:52           ` Jeremy Fitzhardinge
2012-01-18 21:52             ` Jeremy Fitzhardinge
2012-01-24 14:08             ` Avi Kivity
2012-01-24 14:08               ` Avi Kivity
2012-01-24 18:51               ` Raghavendra K T
2012-01-24 18:51                 ` Raghavendra K T
2012-01-17 18:57     ` Raghavendra K T
2012-01-17 18:57       ` Raghavendra K T
2012-01-24 19:01       ` Raghavendra K T
2012-01-14 18:27 ` [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock Raghavendra K T
2012-01-14 18:27 ` Raghavendra K T
2012-01-14 18:27   ` Raghavendra K T
2012-01-14 18:27   ` Raghavendra K T
2012-01-16  3:23   ` Alexander Graf
2012-01-16  3:23     ` Alexander Graf
2012-01-16  3:51     ` Srivatsa Vaddagiri
2012-01-16  3:51       ` Srivatsa Vaddagiri
2012-01-16  4:00       ` Alexander Graf
2012-01-16  4:00         ` Alexander Graf
2012-01-16  8:47         ` Avi Kivity
2012-01-16  8:44     ` Raghavendra K T
2012-01-16  8:44       ` Raghavendra K T
2012-01-16 10:26       ` Alexander Graf
2012-01-16 10:26         ` Alexander Graf
2012-01-16  9:00   ` Avi Kivity
2012-01-16  9:00     ` Avi Kivity
2012-01-16  9:40     ` Srivatsa Vaddagiri
2012-01-16 10:14       ` Avi Kivity
2012-01-16 14:11         ` Srivatsa Vaddagiri
2012-01-17  9:14           ` Gleb Natapov
2012-01-17  9:14             ` Gleb Natapov
2012-01-17 12:26             ` Srivatsa Vaddagiri
2012-01-17 12:26               ` Srivatsa Vaddagiri
2012-01-17 12:51               ` Gleb Natapov
2012-01-17 12:51                 ` Gleb Natapov
2012-01-17 13:11                 ` Srivatsa Vaddagiri
2012-01-17 13:11                   ` Srivatsa Vaddagiri
2012-01-17 13:20                   ` Gleb Natapov
2012-01-17 13:20                     ` Gleb Natapov
2012-01-17 14:28                     ` Srivatsa Vaddagiri
2012-01-17 14:28                       ` Srivatsa Vaddagiri
2012-01-17 15:32                       ` Gleb Natapov
2012-01-17 15:32                         ` Gleb Natapov
2012-01-17 15:53                         ` Marcelo Tosatti
2012-01-17 15:53                           ` Marcelo Tosatti
2012-01-20 15:09                           ` Srivatsa Vaddagiri
2012-01-17 13:13                 ` Raghavendra K T
2012-01-17 13:13                   ` Raghavendra K T
2012-01-16  3:57 ` [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests Alexander Graf
2012-01-16  3:57   ` Alexander Graf
2012-01-16  6:40   ` Jeremy Fitzhardinge
2012-01-16  6:40     ` Jeremy Fitzhardinge
2012-01-16  8:55     ` Avi Kivity
2012-01-16  8:55       ` Avi Kivity
2012-01-16 23:59       ` Jeremy Fitzhardinge
2012-01-16 23:59         ` Jeremy Fitzhardinge
2012-01-18 10:48         ` Raghavendra K T
2012-01-18 10:48           ` Raghavendra K T
2012-01-16 10:24     ` Alexander Graf
2012-01-16 10:24       ` Alexander Graf
2012-01-17  0:30       ` Jeremy Fitzhardinge
2012-01-17  0:30         ` Jeremy Fitzhardinge
2012-01-18 10:23         ` Raghavendra K T
2012-01-18 10:23           ` Raghavendra K T
2012-01-16 13:43   ` Raghavendra K T
2012-01-16 13:43     ` Raghavendra K T
2012-01-16 13:49     ` Avi Kivity
2012-01-16 13:49       ` Avi Kivity
2012-01-16 18:48       ` Raghavendra K T
2012-01-16 18:48         ` Raghavendra K T
2012-01-16 14:20   ` Srivatsa Vaddagiri
2012-01-16 14:20     ` Srivatsa Vaddagiri
2012-01-16 14:23     ` Alexander Graf
2012-01-16 14:23       ` Alexander Graf
2012-01-16 18:38       ` Raghavendra K T
2012-01-16 18:38         ` Raghavendra K T
2012-01-16 18:42         ` Alexander Graf
2012-01-16 18:42           ` Alexander Graf
2012-01-17 17:27           ` Raghavendra K T
2012-01-17 17:27             ` Raghavendra K T
2012-01-17 17:39             ` Alexander Graf
2012-01-17 17:39               ` Alexander Graf
2012-01-17 18:36               ` Raghavendra K T
2012-01-17 18:36                 ` Raghavendra K T
2012-01-17 21:57                 ` Dave Hansen
2012-01-17 21:57                   ` Dave Hansen
2012-01-18  2:27                   ` Raghavendra K T
2012-01-18  2:27                     ` Raghavendra K T
2012-01-25  8:55                 ` Raghavendra K T
2012-01-25  8:55                   ` Raghavendra K T
2012-01-25 16:35                   ` Konrad Rzeszutek Wilk
2012-01-25 16:35                     ` Konrad Rzeszutek Wilk
2012-01-25 17:45                     ` Raghavendra K T
2012-01-25 17:45                       ` Raghavendra K T
2012-01-25 19:05                       ` Konrad Rzeszutek Wilk
2012-01-25 19:05                         ` Konrad Rzeszutek Wilk
