* [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks
@ 2013-07-22  6:16 ` Raghavendra K T
  0 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:16 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri


This series replaces the existing paravirtualized spinlock mechanism
with a paravirtualized ticketlock mechanism. The series provides
implementation for both Xen and KVM.

Changes in V11:
 - Use safe_halt in the lock_spinning path to avoid a potential problem
   when irq handlers take the lock in the slowpath (Gleb)
 - Add an a0 flags argument to the kick hypercall for future extension (Gleb)
 - Add stubs for architectures missing kvm_vcpu_schedule() (Gleb)
 - Change hypercall documentation.
 - Rebased to 3.11-rc1
 - Rebased to 3.11-rc1

Changes in V10:
Addressed Konrad's review comments:
- Added a break in patch 5 since we now know the exact cpu to wake up
- Dropped patch 12; Konrad needs to revert two patches to enable xen on hvm:
  70dd4998, f10cd522c
- Removed TIMEOUT and corrected spacing in patch 15
- Fixed spelling and corrected spacing in patches 17, 18

Changes in V9:
- Changed SPIN_THRESHOLD to 32k to avoid excess halt exits that were
  causing undercommit degradation (after the PLE handler improvements).
- Added kvm_irq_delivery_to_apic (suggested by Gleb)
- Optimized the halt exit path to use the PLE handler

V8 of PV spinlock was posted last year. Following Avi's suggestion to
look at improvements to the PLE handler, various optimizations in PLE
handling have been tried.

With this series we see that we can get a little more improvement on
top of that.

Ticket locks have an inherent problem in a virtualized case, because
the vCPUs are scheduled rather than running concurrently (ignoring
gang scheduled vCPUs).  This can result in catastrophic performance
collapses when the vCPU scheduler doesn't schedule the correct "next"
vCPU, and ends up scheduling a vCPU which burns its entire timeslice
spinning.  (Note that this is not the same problem as lock-holder
preemption, which this series also addresses; that's also a problem,
but not catastrophic).

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

Currently we deal with this by having PV spinlocks, which adds a layer
of indirection in front of all the spinlock functions, and defining a
completely new implementation for Xen (and for other pvops users, but
there are none at present).

PV ticketlocks keep the existing ticketlock implementation (fastpath)
as-is, but add a couple of pvops for the slow paths (the resulting
interface is sketched after the list):

- If a CPU has been waiting for a spinlock for SPIN_THRESHOLD
  iterations, then call out to the __ticket_lock_spinning() pvop,
  which allows a backend to block the vCPU rather than spinning.  This
  pvop can set the lock into "slowpath state".

- When releasing a lock, if it is in "slowpath state", then call
  __ticket_unlock_kick() to kick the next vCPU in line awake.  If the
  lock is no longer in contention, it also clears the slowpath flag.
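
For reference, the whole pvop surface is just those two hooks.  A
minimal sketch of the ops structure and its no-op defaults, as
introduced by patch 1 of the series:

	struct pv_lock_ops {
		/* called after SPIN_THRESHOLD iterations of spinning */
		void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
		/* called on unlock when the lock is in "slowpath state" */
		void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
	};

	/* default (native) case: both hooks are no-ops */
	struct pv_lock_ops pv_lock_ops = {
		.lock_spinning = paravirt_nop,
		.unlock_kick = paravirt_nop,
	};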

The "slowpath state" is stored in the LSB of the within the lock tail
ticket.  This has the effect of reducing the max number of CPUs by
half (so, a "small ticket" can deal with 128 CPUs, and "large ticket"
32768).
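
A sketch of how the series arranges the constants (names follow the
spinlock_types.h changes later in the series; details here are
illustrative):

	#ifdef CONFIG_PARAVIRT_SPINLOCKS
	#define __TICKET_LOCK_INC	2		/* tickets now advance by 2...      */
	#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)	/* ...so bit 0 is free for the flag */
	#else
	#define __TICKET_LOCK_INC	1
	#define TICKET_SLOWPATH_FLAG	((__ticket_t)0)
	#endif

	/* Halving the usable ticket range: an 8-bit tail now covers 128
	 * CPUs and a 16-bit tail 32768, matching the numbers above. */
	#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
	typedef u8  __ticket_t;
	#else
	typedef u16 __ticket_t;
	#endif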

For KVM, one hypercall is introduced in the hypervisor that allows a
vcpu to kick another vcpu out of halt state.
The blocking of the vcpu is done using halt() in the (lock_spinning)
slowpath.
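
A rough sketch of what the KVM guest backend wires into those two
pvops (simplified from the kvm.c patch; the per-cpu cpumask, irq
bookkeeping and stats are stripped, so treat it as illustrative only):

	/* Each cpu records which (lock, ticket) it is blocked on. */
	struct kvm_lock_waiting {
		struct arch_spinlock *lock;
		__ticket_t want;
	};
	static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);

	/* lock_spinning: publish what we wait for, then halt until kicked */
	static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
	{
		struct kvm_lock_waiting *w = &__get_cpu_var(lock_waiting);

		w->want = want;
		w->lock = lock;

		/* recheck in case the lock was handed to us meanwhile */
		if (ACCESS_ONCE(lock->tickets.head) != want)
			safe_halt();	/* V11: safe_halt, so irqs can still be taken */

		w->lock = NULL;
	}

	/* the unlock_kick pvop looks up which cpu is waiting on this
	 * (lock, ticket) pair and then kicks it out of halt: */
	static void kvm_kick_cpu(int cpu)
	{
		int apicid = per_cpu(x86_cpu_to_apicid, cpu);

		kvm_hypercall2(KVM_HC_KICK_CPU, 0, apicid);	/* a0 = flags, a1 = apicid */
	}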

Overall, it results in a large reduction in code, it makes the native
and virtualized cases closer, and it removes a layer of indirection
around all the spinlock functions.

The fast path (taking an uncontended lock which isn't in "slowpath"
state) is optimal, identical to the non-paravirtualized case.

The inner part of ticket lock code becomes:
	inc = xadd(&lock->tickets, inc);
	inc.tail &= ~TICKET_SLOWPATH_FLAG;

	if (likely(inc.head == inc.tail))
		goto out;
	for (;;) {
		unsigned count = SPIN_THRESHOLD;
		do {
			if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
				goto out;
			cpu_relax();
		} while (--count);
		__ticket_lock_spinning(lock, inc.tail);
	}
out:	barrier();
which results in:
	push   %rbp
	mov    %rsp,%rbp

	mov    $0x200,%eax
	lock xadd %ax,(%rdi)
	movzbl %ah,%edx
	cmp    %al,%dl
	jne    1f	# Slowpath if lock in contention

	pop    %rbp
	retq   

	### SLOWPATH START
1:	and    $-2,%edx
	movzbl %dl,%esi

2:	mov    $0x800,%eax
	jmp    4f

3:	pause  
	sub    $0x1,%eax
	je     5f

4:	movzbl (%rdi),%ecx
	cmp    %cl,%dl
	jne    3b

	pop    %rbp
	retq   

5:	callq  *__ticket_lock_spinning
	jmp    2b
	### SLOWPATH END

With CONFIG_PARAVIRT_SPINLOCKS=n, the code changes slightly: the
fastpath case is straight through (taking the lock without
contention), and the spin loop is out of line:

	push   %rbp
	mov    %rsp,%rbp

	mov    $0x100,%eax
	lock xadd %ax,(%rdi)
	movzbl %ah,%edx
	cmp    %al,%dl
	jne    1f

	pop    %rbp
	retq   

	### SLOWPATH START
1:	pause  
	movzbl (%rdi),%eax
	cmp    %dl,%al
	jne    1b

	pop    %rbp
	retq   
	### SLOWPATH END

The unlock code is complicated by the need to both add to the lock's
"head" and fetch the slowpath flag from "tail".  This version of the
patch uses a locked add to do this, followed by a test to see if the
slowflag is set.  The lock prefix acts as a full memory barrier, so we
can be sure that other CPUs will have seen the unlock before we read
the flag (without the barrier the read could be fetched from the
store queue before it hits memory, which could result in a deadlock).

This is all unnecessary complication if you're not using PV ticket
locks, so the patch also uses the jump-label machinery to fall back to
the standard "add"-based unlock in the non-PV case.

	if (TICKET_SLOWPATH_FLAG &&
	     static_key_false(&paravirt_ticketlocks_enabled)) {
		arch_spinlock_t prev;
		prev = *lock;
		add_smp(&lock->tickets.head, TICKET_LOCK_INC);

		/* add_smp() is a full mb() */
		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
			__ticket_unlock_slowpath(lock, prev);
	} else
		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
which generates:
	push   %rbp
	mov    %rsp,%rbp

	nop5	# replaced by 5-byte jmp 2f when PV enabled

	# non-PV unlock
	addb   $0x2,(%rdi)

1:	pop    %rbp
	retq   

### PV unlock ###
2:	movzwl (%rdi),%esi	# Fetch prev

	lock addb $0x2,(%rdi)	# Do unlock

	testb  $0x1,0x1(%rdi)	# Test flag
	je     1b		# Finished if not set

### Slow path ###
	add    $2,%sil		# Add "head" in old lock state
	mov    %esi,%edx
	and    $0xfe,%dh	# clear slowflag for comparison
	movzbl %dh,%eax
	cmp    %dl,%al		# If head == tail (uncontended)
	je     4f		# clear slowpath flag

	# Kick next CPU waiting for lock
3:	movzbl %sil,%esi
	callq  *pv_lock_ops.kick

	pop    %rbp
	retq   

	# Lock no longer contended - clear slowflag
4:	mov    %esi,%eax
	lock cmpxchg %dx,(%rdi)	# cmpxchg to clear flag
	cmp    %si,%ax
	jne    3b		# If clear failed, then kick

	pop    %rbp
	retq   

So when not using PV ticketlocks, the unlock sequence just has a
5-byte nop added to it, and the PV case is reasonably straightforward
aside from requiring a "lock add".
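
For completeness: the nop5 is a jump label keyed on
paravirt_ticketlocks_enabled (see the C fragment above).  A backend
flips the key once it knows PV spinlocks are usable; on the KVM side
that decision hangs off the PV_UNHALT feature bit this series adds.
Roughly along these lines (the initcall name below is made up purely
for illustration):

	struct static_key paravirt_ticketlocks_enabled = STATIC_KEY_INIT_FALSE;

	static __init int pv_ticketlock_enable_jump(void)
	{
		if (!kvm_para_available() ||
		    !kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
			return 0;	/* keep the nop5: native unlock stays */

		/* patches the nop5 into "jmp 2f", enabling the PV unlock path */
		static_key_slow_inc(&paravirt_ticketlocks_enabled);
		return 0;
	}
	early_initcall(pv_ticketlock_enable_jump);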

Results:
=======
pvspinlock shows benefits for overcommit ratios > 1 in PLE-enabled cases,
and undercommit results are flat.

For non-PLE machines, results are much better for smaller VMs:
http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/01095.html

This series, with 3.11-rc1 as base, gives a 28 to 50% improvement for
ebizzy and 8 to 22% for dbench on a 32-core machine (HT disabled) with
32-vcpu guests.

On a 32-cpu, 16-core machine (HT on) with 16-vcpu guests, results showed
1%, 3%, 61% and 77% improvement at 0.5x, 1x, 1.5x and 2x overcommit
respectively for ebizzy. dbench results were almost flat, within -1% to +2%.

Your suggestions and comments are welcome.

github link: https://github.com/ktraghavendra/linux/tree/pvspinlock_v11

Please note that we set SPIN_THRESHOLD = 32k in this series, which
eats up a little of the overcommit performance on PLE machines and of
the overall performance on non-PLE machines.

The older series [3] was tested by Attilio for the Xen implementation.

Note that Konrad needs to revert the two patches below to enable Xen on HVM:
  70dd4998, f10cd522c

Jeremy Fitzhardinge (9):
 x86/spinlock: Replace pv spinlocks with pv ticketlocks
 x86/ticketlock: Collapse a layer of functions
 xen: Defer spinlock setup until boot CPU setup
 xen/pvticketlock: Xen implementation for PV ticket locks
 xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
 x86/pvticketlock: Use callee-save for lock_spinning
 x86/pvticketlock: When paravirtualizing ticket locks, increment by 2
 x86/ticketlock: Add slowpath logic
 xen/pvticketlock: Allow interrupts to be enabled while blocking

Andrew Jones (1):
 jump_label: Split jumplabel ratelimit

Srivatsa Vaddagiri (3):
 kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
 kvm guest : Add configuration support to enable debug information for KVM Guests
 kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

Raghavendra K T (5):
 x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
 kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
 kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic
 Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
 kvm hypervisor: Add directed yield in vcpu block path

---
The link in V8 has links to the previous patch series and also the whole history.

[1]. V10 PV Ticketspinlock for Xen/KVM link: https://lkml.org/lkml/2013/6/24/252
[2]. V9 PV Ticketspinlock for Xen/KVM link:  https://lkml.org/lkml/2013/6/1/168
[3]. V8 PV Ticketspinlock for Xen/KVM link:  https://lkml.org/lkml/2012/5/2/119

 Documentation/virtual/kvm/cpuid.txt      |   4 +
 Documentation/virtual/kvm/hypercalls.txt |  14 ++
 arch/arm/include/asm/kvm_host.h          |   5 +
 arch/arm64/include/asm/kvm_host.h        |   5 +
 arch/ia64/include/asm/kvm_host.h         |   5 +
 arch/mips/include/asm/kvm_host.h         |   5 +
 arch/powerpc/include/asm/kvm_host.h      |   5 +
 arch/s390/include/asm/kvm_host.h         |   5 +
 arch/x86/Kconfig                         |  10 +
 arch/x86/include/asm/kvm_host.h          |   7 +-
 arch/x86/include/asm/kvm_para.h          |  14 +-
 arch/x86/include/asm/paravirt.h          |  32 +--
 arch/x86/include/asm/paravirt_types.h    |  10 +-
 arch/x86/include/asm/spinlock.h          | 128 ++++++----
 arch/x86/include/asm/spinlock_types.h    |  16 +-
 arch/x86/include/uapi/asm/kvm_para.h     |   1 +
 arch/x86/kernel/kvm.c                    | 259 +++++++++++++++++++++
 arch/x86/kernel/paravirt-spinlocks.c     |  18 +-
 arch/x86/kvm/cpuid.c                     |   3 +-
 arch/x86/kvm/lapic.c                     |   5 +-
 arch/x86/kvm/x86.c                       |  39 +++-
 arch/x86/xen/smp.c                       |   2 +-
 arch/x86/xen/spinlock.c                  | 387 ++++++++++---------------------
 include/linux/jump_label.h               |  26 +--
 include/linux/jump_label_ratelimit.h     |  34 +++
 include/linux/kvm_host.h                 |   2 +-
 include/linux/perf_event.h               |   1 +
 include/uapi/linux/kvm_para.h            |   1 +
 kernel/jump_label.c                      |   1 +
 virt/kvm/kvm_main.c                      |   6 +-
 30 files changed, 665 insertions(+), 385 deletions(-)


* [PATCH RFC V11 1/18]  x86/spinlock: Replace pv spinlocks with pv ticketlocks
  2013-07-22  6:16 ` Raghavendra K T
@ 2013-07-22  6:16   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:16 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

x86/spinlock: Replace pv spinlocks with pv ticketlocks

From: Jeremy Fitzhardinge <jeremy@goop.org>

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which, on releasing a contended lock
   (when there are more cpus with tail tickets), looks to see if the
   next cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
[ Raghavendra: Changed SPIN_THRESHOLD ]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/paravirt.h       |   32 ++++----------------
 arch/x86/include/asm/paravirt_types.h |   10 ++----
 arch/x86/include/asm/spinlock.h       |   53 +++++++++++++++++++++++++++------
 arch/x86/include/asm/spinlock_types.h |    4 --
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +--------
 arch/x86/xen/spinlock.c               |    8 ++++-
 6 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index cfdc9ee..040e72d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+							__ticket_t ticket)
 {
-	return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+	PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static __always_inline void ____ticket_unlock_kick(struct arch_spinlock *lock,
+							__ticket_t ticket)
 {
-	return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended	arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-	PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
-						  unsigned long flags)
-{
-	PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-	return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-	PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0db1fca..d5deb6d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -327,13 +327,11 @@ struct pv_mmu_ops {
 };
 
 struct arch_spinlock;
+#include <asm/spinlock_types.h>
+
 struct pv_lock_ops {
-	int (*spin_is_locked)(struct arch_spinlock *lock);
-	int (*spin_is_contended)(struct arch_spinlock *lock);
-	void (*spin_lock)(struct arch_spinlock *lock);
-	void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long flags);
-	int (*spin_trylock)(struct arch_spinlock *lock);
-	void (*spin_unlock)(struct arch_spinlock *lock);
+	void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+	void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 33692ea..4d54244 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -34,6 +34,35 @@
 # define UNLOCK_LOCK_PREFIX
 #endif
 
+/* How long a lock should spin before we consider blocking */
+#define SPIN_THRESHOLD	(1 << 15)
+
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+							__ticket_t ticket)
+{
+}
+
+static __always_inline void ____ticket_unlock_kick(struct arch_spinlock *lock,
+							 __ticket_t ticket)
+{
+}
+
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
+
+
+/*
+ * If a spinlock has someone waiting on it, then kick the appropriate
+ * waiting cpu.
+ */
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
+							__ticket_t next)
+{
+	if (unlikely(lock->tickets.tail != next))
+		____ticket_unlock_kick(lock, next);
+}
+
 /*
  * Ticket locks are conceptually two parts, one indicating the current head of
  * the queue, and the other indicating the current tail. The lock is acquired
@@ -47,19 +76,24 @@
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 {
 	register struct __raw_tickets inc = { .tail = 1 };
 
 	inc = xadd(&lock->tickets, inc);
 
 	for (;;) {
-		if (inc.head == inc.tail)
-			break;
-		cpu_relax();
-		inc.head = ACCESS_ONCE(lock->tickets.head);
+		unsigned count = SPIN_THRESHOLD;
+
+		do {
+			if (inc.head == inc.tail)
+				goto out;
+			cpu_relax();
+			inc.head = ACCESS_ONCE(lock->tickets.head);
+		} while (--count);
+		__ticket_lock_spinning(lock, inc.tail);
 	}
-	barrier();		/* make sure nothing creeps before the lock is taken */
+out:	barrier();	/* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -78,7 +112,10 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
+	__ticket_t next = lock->tickets.head + 1;
+
 	__add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+	__ticket_unlock_kick(lock, next);
 }
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
@@ -95,8 +132,6 @@ static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 	return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
-
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
 	return __ticket_spin_is_locked(lock);
@@ -129,8 +164,6 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 	arch_spin_lock(lock);
 }
 
-#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
-
 static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 {
 	while (arch_spin_is_locked(lock))
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index ad0ad07..83fd3c7 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -1,10 +1,6 @@
 #ifndef _ASM_X86_SPINLOCK_TYPES_H
 #define _ASM_X86_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
-# error "please don't include this file directly"
-#endif
-
 #include <linux/types.h>
 
 #if (CONFIG_NR_CPUS < 256)
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 676b8c7..c2e010e 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -7,21 +7,10 @@
 
 #include <asm/paravirt.h>
 
-static inline void
-default_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
-{
-	arch_spin_lock(lock);
-}
-
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-	.spin_is_locked = __ticket_spin_is_locked,
-	.spin_is_contended = __ticket_spin_is_contended,
-
-	.spin_lock = __ticket_spin_lock,
-	.spin_lock_flags = default_spin_lock_flags,
-	.spin_trylock = __ticket_spin_trylock,
-	.spin_unlock = __ticket_spin_unlock,
+	.lock_spinning = paravirt_nop,
+	.unlock_kick = paravirt_nop,
 #endif
 };
 EXPORT_SYMBOL(pv_lock_ops);
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index a40f850..0454bcb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -139,6 +139,9 @@ struct xen_spinlock {
 	xen_spinners_t spinners;	/* count of waiting cpus */
 };
 
+static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+
+#if 0
 static int xen_spin_is_locked(struct arch_spinlock *lock)
 {
 	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
@@ -167,7 +170,6 @@ static int xen_spin_trylock(struct arch_spinlock *lock)
 }
 
 static DEFINE_PER_CPU(char *, irq_name);
-static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
 static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
 
 /*
@@ -354,6 +356,7 @@ static void xen_spin_unlock(struct arch_spinlock *lock)
 	if (unlikely(xl->spinners))
 		xen_spin_unlock_slow(xl);
 }
+#endif
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
@@ -418,13 +421,14 @@ void __init xen_init_spinlocks(void)
 		return;
 
 	BUILD_BUG_ON(sizeof(struct xen_spinlock) > sizeof(arch_spinlock_t));
-
+#if 0
 	pv_lock_ops.spin_is_locked = xen_spin_is_locked;
 	pv_lock_ops.spin_is_contended = xen_spin_is_contended;
 	pv_lock_ops.spin_lock = xen_spin_lock;
 	pv_lock_ops.spin_lock_flags = xen_spin_lock_flags;
 	pv_lock_ops.spin_trylock = xen_spin_trylock;
 	pv_lock_ops.spin_unlock = xen_spin_unlock;
+#endif
 }
 
 #ifdef CONFIG_XEN_DEBUG_FS



* [PATCH RFC V11 2/18]  x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
  2013-07-22  6:16 ` Raghavendra K T
@ 2013-07-22  6:17   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:17 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

The code size expands somewhat, and it's better to just call
a function rather than inline it.

Thanks to Jeremy for the original version of the ARCH_NOINLINE_SPIN_UNLOCK
config patch, which has been simplified here.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..112e712 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -632,6 +632,7 @@ config PARAVIRT_DEBUG
 config PARAVIRT_SPINLOCKS
 	bool "Paravirtualization layer for spinlocks"
 	depends on PARAVIRT && SMP
+	select UNINLINE_SPIN_UNLOCK
 	---help---
 	  Paravirtualized spinlocks allow a pvops backend to replace the
 	  spinlock implementation with something virtualization-friendly


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 3/18]  x86/ticketlock: Collapse a layer of functions
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:17   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:17 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

x86/ticketlock: Collapse a layer of functions

From: Jeremy Fitzhardinge <jeremy@goop.org>

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/spinlock.h |   35 +++++------------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4d54244..7442410 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
 	register struct __raw_tickets inc = { .tail = 1 };
 
@@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:	barrier();	/* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
 	arch_spinlock_t old, new;
 
@@ -110,7 +110,7 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	__ticket_t next = lock->tickets.head + 1;
 
@@ -118,46 +118,21 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
 	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
 	return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
 	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
 	return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended	arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	__ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-	return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	__ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 						  unsigned long flags)
 {


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 4/18]  xen: Defer spinlock setup until boot CPU setup
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:17   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:17 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

xen: Defer spinlock setup until boot CPU setup

From: Jeremy Fitzhardinge <jeremy@goop.org>

There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/xen/smp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index c1367b2..c3c9bcf 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -279,6 +279,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
 	xen_filter_cpu_maps();
 	xen_setup_vcpu_info_placement();
+	xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -680,7 +681,6 @@ void __init xen_smp_init(void)
 {
 	smp_ops = xen_smp_ops;
 	xen_fill_possible_map();
-	xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 5/18]  xen/pvticketlock: Xen implementation for PV ticket locks
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:17   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:17 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

xen/pvticketlock: Xen implementation for PV ticket locks

From: Jeremy Fitzhardinge <jeremy@goop.org>

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply records the lock it is waiting for (and the
ticket it wants) in this cpu's lock_waiting entry, adds the cpu to the
waiting_cpus set, and blocks on an event channel until the channel
becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus for the one waiting on
this lock with the next ticket, if any.  If found, it kicks that cpu by
making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values; otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.
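
For readers skimming the diff below, here is a reduced standalone C
sketch of that protocol (illustrative only: the bitmask and the helpers
block_until_kicked(), kick() and lock_head() are invented stand-ins for
the cpumask, event-channel and IPI machinery):

#include <stdatomic.h>
#include <stddef.h>

#define NCPUS 4

struct waiting { void *lock; unsigned short want; };

static struct waiting lock_waiting[NCPUS];   /* one slot per cpu */
static _Atomic unsigned int waiting_cpus;    /* bitmask of blocked cpus */

static void block_until_kicked(int cpu) { (void)cpu; } /* ~ xen_poll_irq() */
static void kick(int cpu)               { (void)cpu; } /* ~ wakeup IPI */
static unsigned short lock_head(void *lock) { (void)lock; return 0; } /* stub */

static void lock_spinning(int cpu, void *lock, unsigned short want)
{
	/* interrupts stay disabled so a nested lock can't reuse the slot */
	lock_waiting[cpu].want = want;
	lock_waiting[cpu].lock = lock; /* real code orders these with a barrier */
	atomic_fetch_or(&waiting_cpus, 1u << cpu);

	if (lock_head(lock) == want)   /* re-check: did it become free meanwhile? */
		goto out;
	block_until_kicked(cpu);       /* sleep until unlock_kick() targets us */
out:
	atomic_fetch_and(&waiting_cpus, ~(1u << cpu));
	lock_waiting[cpu].lock = NULL;
}

static void unlock_kick(void *lock, unsigned short next)
{
	unsigned int mask = atomic_load(&waiting_cpus);
	int cpu;

	for (cpu = 0; cpu < NCPUS; cpu++) {
		if ((mask & (1u << cpu)) &&
		    lock_waiting[cpu].lock == lock &&
		    lock_waiting[cpu].want == next) {
			kick(cpu); /* at most one waiter holds this ticket */
			break;
		}
	}
}

Since at most one cpu can be waiting for a given ticket value, kicking
the first match and breaking out of the scan loses nothing.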

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 [ Raghavendra: use function + enum instead of macro, cmpxchg for zero status reset;
   reintroduce break since we know the exact vCPU to send the IPI to, as suggested by Konrad. ]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/xen/spinlock.c |  348 +++++++++++------------------------------------
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 0454bcb..31949da 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,45 +17,44 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-	u64 taken;
-	u32 taken_slow;
-	u32 taken_slow_nested;
-	u32 taken_slow_pickup;
-	u32 taken_slow_spurious;
-	u32 taken_slow_irqenable;
+enum xen_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	TAKEN_SLOW_SPURIOUS,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
 
-	u64 released;
-	u32 released_slow;
-	u32 released_slow_kicked;
 
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS	30
-	u32 histo_spin_total[HISTO_BUCKETS+1];
-	u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
 	u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-	u64 time_total;
-	u64 time_spinning;
 	u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-	if (unlikely(zero_stats)) {
-		memset(&spinlock_stats, 0, sizeof(spinlock_stats));
-		zero_stats = 0;
+	u8 ret;
+	u8 old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
 	}
 }
 
-#define ADD_STATS(elem, val)			\
-	do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -74,22 +73,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
 		array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-	u32 delta = xen_clocksource_read() - start;
-
-	__spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-	spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-	u32 delta = xen_clocksource_read() - start;
-
-	__spin_time_accum(delta, spinlock_stats.histo_spin_total);
-	spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
 	u32 delta = xen_clocksource_read() - start;
@@ -99,19 +82,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT			(1 << 10)
-#define ADD_STATS(elem, val)	do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
 	return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -134,230 +113,84 @@ typedef u16 xen_spinners_t;
 	asm(LOCK_PREFIX " decw %0" : "+m" ((xl)->spinners) : : "memory");
 #endif
 
-struct xen_spinlock {
-	unsigned char lock;		/* 0 -> free; 1 -> locked */
-	xen_spinners_t spinners;	/* count of waiting cpus */
+struct xen_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
-
-#if 0
-static int xen_spin_is_locked(struct arch_spinlock *lock)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-	return xl->lock != 0;
-}
-
-static int xen_spin_is_contended(struct arch_spinlock *lock)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-	/* Not strictly true; this is only the count of contended
-	   lock-takers entering the slow path. */
-	return xl->spinners != 0;
-}
-
-static int xen_spin_trylock(struct arch_spinlock *lock)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-	u8 old = 1;
-
-	asm("xchgb %b0,%1"
-	    : "+q" (old), "+m" (xl->lock) : : "memory");
-
-	return old == 0;
-}
-
 static DEFINE_PER_CPU(char *, irq_name);
-static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
+static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
+static cpumask_t waiting_cpus;
 
-/*
- * Mark a cpu as interested in a lock.  Returns the CPU's previous
- * lock of interest, in case we got preempted by an interrupt.
- */
-static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
+static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 {
-	struct xen_spinlock *prev;
-
-	prev = __this_cpu_read(lock_spinners);
-	__this_cpu_write(lock_spinners, xl);
-
-	wmb();			/* set lock of interest before count */
-
-	inc_spinners(xl);
-
-	return prev;
-}
-
-/*
- * Mark a cpu as no longer interested in a lock.  Restores previous
- * lock of interest (NULL for none).
- */
-static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock *prev)
-{
-	dec_spinners(xl);
-	wmb();			/* decrement count before restoring lock */
-	__this_cpu_write(lock_spinners, prev);
-}
-
-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-	struct xen_spinlock *prev;
 	int irq = __this_cpu_read(lock_kicker_irq);
-	int ret;
+	struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting);
+	int cpu = smp_processor_id();
 	u64 start;
+	unsigned long flags;
 
 	/* If kicker interrupts not initialized yet, just spin */
 	if (irq == -1)
-		return 0;
+		return;
 
 	start = spin_time_start();
 
-	/* announce we're spinning */
-	prev = spinning_lock(xl);
-
-	ADD_STATS(taken_slow, 1);
-	ADD_STATS(taken_slow_nested, prev != NULL);
-
-	do {
-		unsigned long flags;
-
-		/* clear pending */
-		xen_clear_irq_pending(irq);
-
-		/* check again make sure it didn't become free while
-		   we weren't looking  */
-		ret = xen_spin_trylock(lock);
-		if (ret) {
-			ADD_STATS(taken_slow_pickup, 1);
-
-			/*
-			 * If we interrupted another spinlock while it
-			 * was blocking, make sure it doesn't block
-			 * without rechecking the lock.
-			 */
-			if (prev != NULL)
-				xen_set_irq_pending(irq);
-			goto out;
-		}
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
 
-		flags = arch_local_save_flags();
-		if (irq_enable) {
-			ADD_STATS(taken_slow_irqenable, 1);
-			raw_local_irq_enable();
-		}
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
 
-		/*
-		 * Block until irq becomes pending.  If we're
-		 * interrupted at this point (after the trylock but
-		 * before entering the block), then the nested lock
-		 * handler guarantees that the irq will be left
-		 * pending if there's any chance the lock became free;
-		 * xen_poll_irq() returns immediately if the irq is
-		 * pending.
-		 */
-		xen_poll_irq(irq);
+	/* This uses set_bit, which is atomic and therefore a barrier */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+	add_stats(TAKEN_SLOW, 1);
 
-		raw_local_irq_restore(flags);
+	/* clear pending */
+	xen_clear_irq_pending(irq);
 
-		ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
-	} while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */
+	/* Only check lock once pending cleared */
+	barrier();
 
+	/* check again to make sure it didn't become free while
+	   we weren't looking  */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+	/* Block until irq becomes pending (or perhaps a spurious wakeup) */
+	xen_poll_irq(irq);
+	add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
 	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
-
 out:
-	unspinning_lock(xl, prev);
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
 	spin_time_accum_blocked(start);
-
-	return ret;
 }
 
-static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-	unsigned timeout;
-	u8 oldval;
-	u64 start_spin;
-
-	ADD_STATS(taken, 1);
-
-	start_spin = spin_time_start();
-
-	do {
-		u64 start_spin_fast = spin_time_start();
-
-		timeout = TIMEOUT;
-
-		asm("1: xchgb %1,%0\n"
-		    "   testb %1,%1\n"
-		    "   jz 3f\n"
-		    "2: rep;nop\n"
-		    "   cmpb $0,%0\n"
-		    "   je 1b\n"
-		    "   dec %2\n"
-		    "   jnz 2b\n"
-		    "3:\n"
-		    : "+m" (xl->lock), "=q" (oldval), "+r" (timeout)
-		    : "1" (1)
-		    : "memory");
-
-		spin_time_accum_spinning(start_spin_fast);
-
-	} while (unlikely(oldval != 0 &&
-			  (TIMEOUT == ~0 || !xen_spin_lock_slow(lock, irq_enable))));
-
-	spin_time_accum_total(start_spin);
-}
-
-static void xen_spin_lock(struct arch_spinlock *lock)
-{
-	__xen_spin_lock(lock, false);
-}
-
-static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
-{
-	__xen_spin_lock(lock, !raw_irqs_disabled_flags(flags));
-}
-
-static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
+static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
 	int cpu;
 
-	ADD_STATS(released_slow, 1);
+	add_stats(RELEASED_SLOW, 1);
+
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-	for_each_online_cpu(cpu) {
-		/* XXX should mix up next cpu selection */
-		if (per_cpu(lock_spinners, cpu) == xl) {
-			ADD_STATS(released_slow_kicked, 1);
+		if (w->lock == lock && w->want == next) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
 			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+			break;
 		}
 	}
 }
 
-static void xen_spin_unlock(struct arch_spinlock *lock)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-	ADD_STATS(released, 1);
-
-	smp_wmb();		/* make sure no writes get moved after unlock */
-	xl->lock = 0;		/* release lock */
-
-	/*
-	 * Make sure unlock happens before checking for waiting
-	 * spinners.  We need a strong barrier to enforce the
-	 * write-read ordering to different memory locations, as the
-	 * CPU makes no implied guarantees about their ordering.
-	 */
-	mb();
-
-	if (unlikely(xl->spinners))
-		xen_spin_unlock_slow(xl);
-}
-#endif
-
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
 	BUG();
@@ -420,15 +253,8 @@ void __init xen_init_spinlocks(void)
 	if (xen_hvm_domain())
 		return;
 
-	BUILD_BUG_ON(sizeof(struct xen_spinlock) > sizeof(arch_spinlock_t));
-#if 0
-	pv_lock_ops.spin_is_locked = xen_spin_is_locked;
-	pv_lock_ops.spin_is_contended = xen_spin_is_contended;
-	pv_lock_ops.spin_lock = xen_spin_lock;
-	pv_lock_ops.spin_lock_flags = xen_spin_lock_flags;
-	pv_lock_ops.spin_trylock = xen_spin_trylock;
-	pv_lock_ops.spin_unlock = xen_spin_unlock;
-#endif
+	pv_lock_ops.lock_spinning = xen_lock_spinning;
+	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
 #ifdef CONFIG_XEN_DEBUG_FS
@@ -446,37 +272,21 @@ static int __init xen_spinlock_debugfs(void)
 
 	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
 
-	debugfs_create_u32("timeout", 0644, d_spin_debug, &lock_timeout);
-
-	debugfs_create_u64("taken", 0444, d_spin_debug, &spinlock_stats.taken);
 	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow);
-	debugfs_create_u32("taken_slow_nested", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_nested);
+			   &spinlock_stats.contention_stats[TAKEN_SLOW]);
 	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_pickup);
+			   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
 	debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_spurious);
-	debugfs_create_u32("taken_slow_irqenable", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_irqenable);
+			   &spinlock_stats.contention_stats[TAKEN_SLOW_SPURIOUS]);
 
-	debugfs_create_u64("released", 0444, d_spin_debug, &spinlock_stats.released);
 	debugfs_create_u32("released_slow", 0444, d_spin_debug,
-			   &spinlock_stats.released_slow);
+			   &spinlock_stats.contention_stats[RELEASED_SLOW]);
 	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
-			   &spinlock_stats.released_slow_kicked);
+			   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
 
-	debugfs_create_u64("time_spinning", 0444, d_spin_debug,
-			   &spinlock_stats.time_spinning);
 	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
 			   &spinlock_stats.time_blocked);
-	debugfs_create_u64("time_total", 0444, d_spin_debug,
-			   &spinlock_stats.time_total);
 
-	debugfs_create_u32_array("histo_total", 0444, d_spin_debug,
-				spinlock_stats.histo_spin_total, HISTO_BUCKETS + 1);
-	debugfs_create_u32_array("histo_spinning", 0444, d_spin_debug,
-				spinlock_stats.histo_spin_spinning, HISTO_BUCKETS + 1);
 	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
 				spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
 


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 5/18] xen/pvticketlock: Xen implementation for PV ticket locks
@ 2013-07-22  6:17   ` Raghavendra K T
  0 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:17 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: gregkh, kvm, linux-doc, peterz, drjones, virtualization, andi,
	xen-devel, Raghavendra K T, habanero, riel, stefano.stabellini,
	ouyang, avi.kivity, tglx, chegu_vinod, linux-kernel,
	srivatsa.vaddagiri, attilio.rao, torvalds

xen/pvticketlock: Xen implementation for PV ticket locks

From: Jeremy Fitzhardinge <jeremy@goop.org>

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply records the lock it is waiting for (and the
ticket it wants) in this cpu's lock_waiting entry, adds the cpu to the
waiting_cpus set, and blocks on an event channel until the channel
becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus for the one waiting on
this lock with the next ticket, if any.  If found, it kicks that cpu by
making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values; otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 [ Raghavendra: use function + enum instead of macro, cmpxchg for zero status reset;
   reintroduce break since we know the exact vCPU to send the IPI to, as suggested by Konrad. ]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/xen/spinlock.c |  348 +++++++++++------------------------------------
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 0454bcb..31949da 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,45 +17,44 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-	u64 taken;
-	u32 taken_slow;
-	u32 taken_slow_nested;
-	u32 taken_slow_pickup;
-	u32 taken_slow_spurious;
-	u32 taken_slow_irqenable;
+enum xen_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	TAKEN_SLOW_SPURIOUS,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
 
-	u64 released;
-	u32 released_slow;
-	u32 released_slow_kicked;
 
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS	30
-	u32 histo_spin_total[HISTO_BUCKETS+1];
-	u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
 	u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-	u64 time_total;
-	u64 time_spinning;
 	u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-	if (unlikely(zero_stats)) {
-		memset(&spinlock_stats, 0, sizeof(spinlock_stats));
-		zero_stats = 0;
+	u8 ret;
+	u8 old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
 	}
 }
 
-#define ADD_STATS(elem, val)			\
-	do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -74,22 +73,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
 		array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-	u32 delta = xen_clocksource_read() - start;
-
-	__spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-	spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-	u32 delta = xen_clocksource_read() - start;
-
-	__spin_time_accum(delta, spinlock_stats.histo_spin_total);
-	spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
 	u32 delta = xen_clocksource_read() - start;
@@ -99,19 +82,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT			(1 << 10)
-#define ADD_STATS(elem, val)	do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
 	return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -134,230 +113,84 @@ typedef u16 xen_spinners_t;
 	asm(LOCK_PREFIX " decw %0" : "+m" ((xl)->spinners) : : "memory");
 #endif
 
-struct xen_spinlock {
-	unsigned char lock;		/* 0 -> free; 1 -> locked */
-	xen_spinners_t spinners;	/* count of waiting cpus */
+struct xen_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
-
-#if 0
-static int xen_spin_is_locked(struct arch_spinlock *lock)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-	return xl->lock != 0;
-}
-
-static int xen_spin_is_contended(struct arch_spinlock *lock)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-	/* Not strictly true; this is only the count of contended
-	   lock-takers entering the slow path. */
-	return xl->spinners != 0;
-}
-
-static int xen_spin_trylock(struct arch_spinlock *lock)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-	u8 old = 1;
-
-	asm("xchgb %b0,%1"
-	    : "+q" (old), "+m" (xl->lock) : : "memory");
-
-	return old == 0;
-}
-
 static DEFINE_PER_CPU(char *, irq_name);
-static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
+static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
+static cpumask_t waiting_cpus;
 
-/*
- * Mark a cpu as interested in a lock.  Returns the CPU's previous
- * lock of interest, in case we got preempted by an interrupt.
- */
-static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
+static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 {
-	struct xen_spinlock *prev;
-
-	prev = __this_cpu_read(lock_spinners);
-	__this_cpu_write(lock_spinners, xl);
-
-	wmb();			/* set lock of interest before count */
-
-	inc_spinners(xl);
-
-	return prev;
-}
-
-/*
- * Mark a cpu as no longer interested in a lock.  Restores previous
- * lock of interest (NULL for none).
- */
-static inline void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock *prev)
-{
-	dec_spinners(xl);
-	wmb();			/* decrement count before restoring lock */
-	__this_cpu_write(lock_spinners, prev);
-}
-
-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-	struct xen_spinlock *prev;
 	int irq = __this_cpu_read(lock_kicker_irq);
-	int ret;
+	struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting);
+	int cpu = smp_processor_id();
 	u64 start;
+	unsigned long flags;
 
 	/* If kicker interrupts not initialized yet, just spin */
 	if (irq == -1)
-		return 0;
+		return;
 
 	start = spin_time_start();
 
-	/* announce we're spinning */
-	prev = spinning_lock(xl);
-
-	ADD_STATS(taken_slow, 1);
-	ADD_STATS(taken_slow_nested, prev != NULL);
-
-	do {
-		unsigned long flags;
-
-		/* clear pending */
-		xen_clear_irq_pending(irq);
-
-		/* check again make sure it didn't become free while
-		   we weren't looking  */
-		ret = xen_spin_trylock(lock);
-		if (ret) {
-			ADD_STATS(taken_slow_pickup, 1);
-
-			/*
-			 * If we interrupted another spinlock while it
-			 * was blocking, make sure it doesn't block
-			 * without rechecking the lock.
-			 */
-			if (prev != NULL)
-				xen_set_irq_pending(irq);
-			goto out;
-		}
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
 
-		flags = arch_local_save_flags();
-		if (irq_enable) {
-			ADD_STATS(taken_slow_irqenable, 1);
-			raw_local_irq_enable();
-		}
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
 
-		/*
-		 * Block until irq becomes pending.  If we're
-		 * interrupted at this point (after the trylock but
-		 * before entering the block), then the nested lock
-		 * handler guarantees that the irq will be left
-		 * pending if there's any chance the lock became free;
-		 * xen_poll_irq() returns immediately if the irq is
-		 * pending.
-		 */
-		xen_poll_irq(irq);
+	/* This uses set_bit, which is atomic and therefore a barrier */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+	add_stats(TAKEN_SLOW, 1);
 
-		raw_local_irq_restore(flags);
+	/* clear pending */
+	xen_clear_irq_pending(irq);
 
-		ADD_STATS(taken_slow_spurious, !xen_test_irq_pending(irq));
-	} while (!xen_test_irq_pending(irq)); /* check for spurious wakeups */
+	/* Only check lock once pending cleared */
+	barrier();
 
+	/* check again to make sure it didn't become free while
+	   we weren't looking  */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+	/* Block until irq becomes pending (or perhaps a spurious wakeup) */
+	xen_poll_irq(irq);
+	add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
 	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
-
 out:
-	unspinning_lock(xl, prev);
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
 	spin_time_accum_blocked(start);
-
-	return ret;
 }
 
-static inline void __xen_spin_lock(struct arch_spinlock *lock, bool irq_enable)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-	unsigned timeout;
-	u8 oldval;
-	u64 start_spin;
-
-	ADD_STATS(taken, 1);
-
-	start_spin = spin_time_start();
-
-	do {
-		u64 start_spin_fast = spin_time_start();
-
-		timeout = TIMEOUT;
-
-		asm("1: xchgb %1,%0\n"
-		    "   testb %1,%1\n"
-		    "   jz 3f\n"
-		    "2: rep;nop\n"
-		    "   cmpb $0,%0\n"
-		    "   je 1b\n"
-		    "   dec %2\n"
-		    "   jnz 2b\n"
-		    "3:\n"
-		    : "+m" (xl->lock), "=q" (oldval), "+r" (timeout)
-		    : "1" (1)
-		    : "memory");
-
-		spin_time_accum_spinning(start_spin_fast);
-
-	} while (unlikely(oldval != 0 &&
-			  (TIMEOUT == ~0 || !xen_spin_lock_slow(lock, irq_enable))));
-
-	spin_time_accum_total(start_spin);
-}
-
-static void xen_spin_lock(struct arch_spinlock *lock)
-{
-	__xen_spin_lock(lock, false);
-}
-
-static void xen_spin_lock_flags(struct arch_spinlock *lock, unsigned long flags)
-{
-	__xen_spin_lock(lock, !raw_irqs_disabled_flags(flags));
-}
-
-static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
+static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
 	int cpu;
 
-	ADD_STATS(released_slow, 1);
+	add_stats(RELEASED_SLOW, 1);
+
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-	for_each_online_cpu(cpu) {
-		/* XXX should mix up next cpu selection */
-		if (per_cpu(lock_spinners, cpu) == xl) {
-			ADD_STATS(released_slow_kicked, 1);
+		if (w->lock == lock && w->want == next) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
 			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
+			break;
 		}
 	}
 }
 
-static void xen_spin_unlock(struct arch_spinlock *lock)
-{
-	struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-	ADD_STATS(released, 1);
-
-	smp_wmb();		/* make sure no writes get moved after unlock */
-	xl->lock = 0;		/* release lock */
-
-	/*
-	 * Make sure unlock happens before checking for waiting
-	 * spinners.  We need a strong barrier to enforce the
-	 * write-read ordering to different memory locations, as the
-	 * CPU makes no implied guarantees about their ordering.
-	 */
-	mb();
-
-	if (unlikely(xl->spinners))
-		xen_spin_unlock_slow(xl);
-}
-#endif
-
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
 	BUG();
@@ -420,15 +253,8 @@ void __init xen_init_spinlocks(void)
 	if (xen_hvm_domain())
 		return;
 
-	BUILD_BUG_ON(sizeof(struct xen_spinlock) > sizeof(arch_spinlock_t));
-#if 0
-	pv_lock_ops.spin_is_locked = xen_spin_is_locked;
-	pv_lock_ops.spin_is_contended = xen_spin_is_contended;
-	pv_lock_ops.spin_lock = xen_spin_lock;
-	pv_lock_ops.spin_lock_flags = xen_spin_lock_flags;
-	pv_lock_ops.spin_trylock = xen_spin_trylock;
-	pv_lock_ops.spin_unlock = xen_spin_unlock;
-#endif
+	pv_lock_ops.lock_spinning = xen_lock_spinning;
+	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
 #ifdef CONFIG_XEN_DEBUG_FS
@@ -446,37 +272,21 @@ static int __init xen_spinlock_debugfs(void)
 
 	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
 
-	debugfs_create_u32("timeout", 0644, d_spin_debug, &lock_timeout);
-
-	debugfs_create_u64("taken", 0444, d_spin_debug, &spinlock_stats.taken);
 	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow);
-	debugfs_create_u32("taken_slow_nested", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_nested);
+			   &spinlock_stats.contention_stats[TAKEN_SLOW]);
 	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_pickup);
+			   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
 	debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_spurious);
-	debugfs_create_u32("taken_slow_irqenable", 0444, d_spin_debug,
-			   &spinlock_stats.taken_slow_irqenable);
+			   &spinlock_stats.contention_stats[TAKEN_SLOW_SPURIOUS]);
 
-	debugfs_create_u64("released", 0444, d_spin_debug, &spinlock_stats.released);
 	debugfs_create_u32("released_slow", 0444, d_spin_debug,
-			   &spinlock_stats.released_slow);
+			   &spinlock_stats.contention_stats[RELEASED_SLOW]);
 	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
-			   &spinlock_stats.released_slow_kicked);
+			   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
 
-	debugfs_create_u64("time_spinning", 0444, d_spin_debug,
-			   &spinlock_stats.time_spinning);
 	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
 			   &spinlock_stats.time_blocked);
-	debugfs_create_u64("time_total", 0444, d_spin_debug,
-			   &spinlock_stats.time_total);
 
-	debugfs_create_u32_array("histo_total", 0444, d_spin_debug,
-				spinlock_stats.histo_spin_total, HISTO_BUCKETS + 1);
-	debugfs_create_u32_array("histo_spinning", 0444, d_spin_debug,
-				spinlock_stats.histo_spin_spinning, HISTO_BUCKETS + 1);
 	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
 				spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
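
To summarize what the hunks above put in place: each waiter records the lock and the ticket it wants in a per-CPU slot, joins a cpumask of waiting CPUs, re-checks the ticket head, and then blocks; the unlocker scans that cpumask and kicks only the CPU whose recorded ticket matches the new head. The following standalone C11 model is an illustration of that handshake only -- the poll and kick steps stand in for xen_poll_irq() and xen_send_IPI_one(), and every name local to the snippet is invented for the sketch:

#include <stdatomic.h>
#include <stdio.h>

struct waiting {
	_Atomic(void *) lock;
	_Atomic unsigned int want;
};

static struct waiting per_cpu_waiting[4];	/* models the lock_waiting per-CPU slots */
static _Atomic unsigned int waiting_cpus;	/* models the waiting_cpus cpumask       */

static void lock_spinning(int cpu, void *lock, unsigned int want)
{
	struct waiting *w = &per_cpu_waiting[cpu];

	atomic_store_explicit(&w->want, want, memory_order_relaxed);
	/* publish want before lock; models the smp_wmb() in xen_lock_spinning() */
	atomic_store_explicit(&w->lock, lock, memory_order_release);
	/* models cpumask_set_cpu(), an atomic RMW and therefore a barrier */
	atomic_fetch_or(&waiting_cpus, 1u << cpu);
	/* the real code now re-checks lock->tickets.head == want,
	 * then blocks in xen_poll_irq(irq) until kicked */
}

static void unlock_kick(void *lock, unsigned int next)
{
	unsigned int mask = atomic_load(&waiting_cpus);
	int cpu;

	for (cpu = 0; cpu < 4; cpu++) {
		struct waiting *w = &per_cpu_waiting[cpu];

		if ((mask & (1u << cpu)) &&
		    atomic_load(&w->lock) == lock &&
		    atomic_load(&w->want) == next) {
			/* stands in for xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR) */
			printf("kick cpu %d\n", cpu);
			break;
		}
	}
}

int main(void)
{
	int dummy_lock;

	lock_spinning(1, &dummy_lock, 2);
	unlock_kick(&dummy_lock, 2);
	return 0;
}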

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 6/18]  xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:17   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:17 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

From: Jeremy Fitzhardinge <jeremy@goop.org>

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/xen/spinlock.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 31949da..ec9183b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -244,6 +244,8 @@ void xen_uninit_lock_cpu(int cpu)
 	per_cpu(irq_name, cpu) = NULL;
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
 	/*
@@ -253,10 +255,22 @@ void __init xen_init_spinlocks(void)
 	if (xen_hvm_domain())
 		return;
 
+	if (!xen_pvspin) {
+		printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+		return;
+	}
+
 	pv_lock_ops.lock_spinning = xen_lock_spinning;
 	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+	xen_pvspin = false;
+	return 0;
+}
+early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 7/18]  x86/pvticketlock: Use callee-save for lock_spinning
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:18   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:18 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

x86/pvticketlock: Use callee-save for lock_spinning

From: Jeremy Fitzhardinge <jeremy@goop.org>

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/paravirt.h       |    2 +-
 arch/x86/include/asm/paravirt_types.h |    2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |    2 +-
 arch/x86/xen/spinlock.c               |    3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 040e72d..7131e12c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -715,7 +715,7 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
 							__ticket_t ticket)
 {
-	PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+	PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static __always_inline void ____ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index d5deb6d..350d017 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -330,7 +330,7 @@ struct arch_spinlock;
 #include <asm/spinlock_types.h>
 
 struct pv_lock_ops {
-	void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+	struct paravirt_callee_save lock_spinning;
 	void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-	.lock_spinning = paravirt_nop,
+	.lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
 	.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index ec9183b..b41872b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -173,6 +173,7 @@ out:
 	local_irq_restore(flags);
 	spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -260,7 +261,7 @@ void __init xen_init_spinlocks(void)
 		return;
 	}
 
-	pv_lock_ops.lock_spinning = xen_lock_spinning;
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
 	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 8/18]  x86/pvticketlock: When paravirtualizing ticket locks, increment by 2
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:18   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:18 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

x86/pvticketlock: When paravirtualizing ticket locks, increment by 2

From: Jeremy Fitzhardinge <jeremy@goop.org>

Increment ticket head/tail by 2 rather than 1 to leave the LSB free
to store an "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
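
As a quick illustration of the arithmetic (a standalone sketch, not kernel code; SLOWPATH_FLAG is a name invented here for the spare bit), tickets advance in steps of two so bit 0 of the tail never carries ticket information:

#include <stdio.h>

#define TICKET_LOCK_INC  2
#define SLOWPATH_FLAG    1	/* invented name for the spare LSB */

int main(void)
{
	unsigned int tail = 0;
	int cpu;

	for (cpu = 0; cpu < 3; cpu++) {
		unsigned int ticket = tail;	/* what xadd would hand back */
		tail += TICKET_LOCK_INC;	/* advance to the next ticket */
		printf("cpu %d: ticket %u, LSB %u\n",
		       cpu, ticket, ticket & SLOWPATH_FLAG);
	}

	tail |= SLOWPATH_FLAG;			/* a waiter entered the slowpath */
	printf("tail %u = ticket %u + slowpath flag\n",
	       tail, tail & ~SLOWPATH_FLAG);
	return 0;
}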

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Attilio Rao <attilio.rao@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/spinlock.h       |   10 +++++-----
 arch/x86/include/asm/spinlock_types.h |   10 +++++++++-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7442410..04a5cd5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-	register struct __raw_tickets inc = { .tail = 1 };
+	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
 	inc = xadd(&lock->tickets, inc);
 
@@ -104,7 +104,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 	if (old.tickets.head != old.tickets.tail)
 		return 0;
 
-	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+	new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
 	/* cmpxchg is a full barrier, so nothing can move before it */
 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
@@ -112,9 +112,9 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-	__ticket_t next = lock->tickets.head + 1;
+	__ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
 
-	__add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+	__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
 	__ticket_unlock_kick(lock, next);
 }
 
@@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
 	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (__ticket_t)(tmp.tail - tmp.head) > 1;
+	return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include <linux/types.h>
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC	2
+#else
+#define __TICKET_LOCK_INC	1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC	((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 9/18]  jump_label: Split out rate limiting from jump_label.h
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:18   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:18 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

jump_label: Split jumplabel ratelimit

From: Andrew Jones <drjones@redhat.com>

Commit b202952075f62603bea9bfb6ebc6b0420db11949 ("perf, core: Rate limit
perf_sched_events jump_label patching") introduced rate limiting
for jump label disabling. The changes were made in the jump label code
in order to be more widely available and to keep things tidier. This is
all fine, except now jump_label.h includes linux/workqueue.h, which
makes it impossible to include jump_label.h from anything that
workqueue.h needs. For example, it's now impossible to include
jump_label.h from asm/spinlock.h, which is done in proposed
pv-ticketlock patches. This patch splits out the rate limiting related
changes from jump_label.h into a new file, jump_label_ratelimit.h, to
resolve the issue.
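
The split matters to any code that uses the deferred-key API; a hypothetical user (names invented for the sketch, not code from this series) would now pull in the new header directly instead of relying on jump_label.h:

#include <linux/jump_label_ratelimit.h>
#include <linux/jiffies.h>

static struct static_key_deferred my_feature_key = {
	.key = STATIC_KEY_INIT_FALSE,		/* invented example key */
};

static void my_feature_init(void)
{
	/* delay key disabling by at least a second's worth of jiffies */
	jump_label_rate_limit(&my_feature_key, HZ);
}

static void my_feature_get(void)
{
	static_key_slow_inc(&my_feature_key.key);
}

static void my_feature_put(void)
{
	/* the actual code patching may be deferred by the limit above */
	static_key_slow_dec_deferred(&my_feature_key);
}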

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 include/linux/jump_label.h           |   26 +-------------------------
 include/linux/jump_label_ratelimit.h |   34 ++++++++++++++++++++++++++++++++++
 include/linux/perf_event.h           |    1 +
 kernel/jump_label.c                  |    1 +
 4 files changed, 37 insertions(+), 25 deletions(-)
 create mode 100644 include/linux/jump_label_ratelimit.h

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 0976fc4..53cdf89 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -48,7 +48,6 @@
 
 #include <linux/types.h>
 #include <linux/compiler.h>
-#include <linux/workqueue.h>
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
 
@@ -61,12 +60,6 @@ struct static_key {
 #endif
 };
 
-struct static_key_deferred {
-	struct static_key key;
-	unsigned long timeout;
-	struct delayed_work work;
-};
-
 # include <asm/jump_label.h>
 # define HAVE_JUMP_LABEL
 #endif	/* CC_HAVE_ASM_GOTO && CONFIG_JUMP_LABEL */
@@ -119,10 +112,7 @@ extern void arch_jump_label_transform_static(struct jump_entry *entry,
 extern int jump_label_text_reserved(void *start, void *end);
 extern void static_key_slow_inc(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
-extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
 extern void jump_label_apply_nops(struct module *mod);
-extern void
-jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
 
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
 	{ .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
@@ -141,10 +131,6 @@ static __always_inline void jump_label_init(void)
 {
 }
 
-struct static_key_deferred {
-	struct static_key  key;
-};
-
 static __always_inline bool static_key_false(struct static_key *key)
 {
 	if (unlikely(atomic_read(&key->enabled)) > 0)
@@ -169,11 +155,6 @@ static inline void static_key_slow_dec(struct static_key *key)
 	atomic_dec(&key->enabled);
 }
 
-static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
-{
-	static_key_slow_dec(&key->key);
-}
-
 static inline int jump_label_text_reserved(void *start, void *end)
 {
 	return 0;
@@ -187,12 +168,6 @@ static inline int jump_label_apply_nops(struct module *mod)
 	return 0;
 }
 
-static inline void
-jump_label_rate_limit(struct static_key_deferred *key,
-		unsigned long rl)
-{
-}
-
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
 		{ .enabled = ATOMIC_INIT(1) })
 #define STATIC_KEY_INIT_FALSE ((struct static_key) \
@@ -203,6 +178,7 @@ jump_label_rate_limit(struct static_key_deferred *key,
 #define STATIC_KEY_INIT STATIC_KEY_INIT_FALSE
 #define jump_label_enabled static_key_enabled
 
+static inline int atomic_read(const atomic_t *v);
 static inline bool static_key_enabled(struct static_key *key)
 {
 	return (atomic_read(&key->enabled) > 0);
diff --git a/include/linux/jump_label_ratelimit.h b/include/linux/jump_label_ratelimit.h
new file mode 100644
index 0000000..1137883
--- /dev/null
+++ b/include/linux/jump_label_ratelimit.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_JUMP_LABEL_RATELIMIT_H
+#define _LINUX_JUMP_LABEL_RATELIMIT_H
+
+#include <linux/jump_label.h>
+#include <linux/workqueue.h>
+
+#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+struct static_key_deferred {
+	struct static_key key;
+	unsigned long timeout;
+	struct delayed_work work;
+};
+#endif
+
+#ifdef HAVE_JUMP_LABEL
+extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
+extern void
+jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
+
+#else	/* !HAVE_JUMP_LABEL */
+struct static_key_deferred {
+	struct static_key  key;
+};
+static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
+{
+	static_key_slow_dec(&key->key);
+}
+static inline void
+jump_label_rate_limit(struct static_key_deferred *key,
+		unsigned long rl)
+{
+}
+#endif	/* HAVE_JUMP_LABEL */
+#endif	/* _LINUX_JUMP_LABEL_RATELIMIT_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8873f82..b8cc383 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -48,6 +48,7 @@ struct perf_guest_info_callbacks {
 #include <linux/cpu.h>
 #include <linux/irq_work.h>
 #include <linux/static_key.h>
+#include <linux/jump_label_ratelimit.h>
 #include <linux/atomic.h>
 #include <linux/sysfs.h>
 #include <linux/perf_regs.h>
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 60f48fa..297a924 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -13,6 +13,7 @@
 #include <linux/sort.h>
 #include <linux/err.h>
 #include <linux/static_key.h>
+#include <linux/jump_label_ratelimit.h>
 
 #ifdef HAVE_JUMP_LABEL
 


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 10/18]  x86/ticketlock: Add slowpath logic
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:18   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:18 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

x86/ticketlock: Add slowpath logic

From: Jeremy Fitzhardinge <jeremy@goop.org>

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flag is set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker			Locker
				test for lock pickup
					-> fail
unlock
test slowpath
	-> false
				set slowpath flags
				block

Whereas this works in any ordering:

Unlocker			Locker
				set slowpath flags
				test for lock pickup
					-> fail
				block
unlock
test slowpath
	-> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that it's safe to subsequently
read back the slowpath flag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/paravirt.h       |    2 -
 arch/x86/include/asm/spinlock.h       |   86 ++++++++++++++++++++++++---------
 arch/x86/include/asm/spinlock_types.h |    2 +
 arch/x86/kernel/paravirt-spinlocks.c  |    3 +
 arch/x86/xen/spinlock.c               |    6 ++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
 	PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ____ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
 							__ticket_t ticket)
 {
 	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include <linux/jump_label.h>
 #include <linux/atomic.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
 #include <asm/paravirt.h>
+#include <asm/bitops.h>
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD	(1 << 15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
-							__ticket_t ticket)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
+	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
-static __always_inline void ____ticket_unlock_kick(struct arch_spinlock *lock,
-							 __ticket_t ticket)
+#else  /* !CONFIG_PARAVIRT_SPINLOCKS */
+static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
+							__ticket_t ticket)
 {
 }
-
-#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
-
-
-/*
- * If a spinlock has someone waiting on it, then kick the appropriate
- * waiting cpu.
- */
-static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
-							__ticket_t next)
+static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
+							__ticket_t ticket)
 {
-	if (unlikely(lock->tickets.tail != next))
-		____ticket_unlock_kick(lock, next);
 }
 
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
 /*
  * Ticket locks are conceptually two parts, one indicating the current head of
  * the queue, and the other indicating the current tail. The lock is acquired
@@ -76,20 +75,22 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
 	inc = xadd(&lock->tickets, inc);
+	if (likely(inc.head == inc.tail))
+		goto out;
 
+	inc.tail &= ~TICKET_SLOWPATH_FLAG;
 	for (;;) {
 		unsigned count = SPIN_THRESHOLD;
 
 		do {
-			if (inc.head == inc.tail)
+			if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
 				goto out;
 			cpu_relax();
-			inc.head = ACCESS_ONCE(lock->tickets.head);
 		} while (--count);
 		__ticket_lock_spinning(lock, inc.tail);
 	}
@@ -101,7 +102,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 	arch_spinlock_t old, new;
 
 	old.tickets = ACCESS_ONCE(lock->tickets);
-	if (old.tickets.head != old.tickets.tail)
+	if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
 		return 0;
 
 	new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
@@ -110,12 +111,49 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
 
+static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
+					    arch_spinlock_t old)
+{
+	arch_spinlock_t new;
+
+	BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
+
+	/* Perform the unlock on the "before" copy */
+	old.tickets.head += TICKET_LOCK_INC;
+
+	/* Clear the slowpath flag */
+	new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
+
+	/*
+	 * If the lock is uncontended, clear the flag - use cmpxchg in
+	 * case it changes behind our back though.
+	 */
+	if (new.tickets.head != new.tickets.tail ||
+	    cmpxchg(&lock->head_tail, old.head_tail,
+					new.head_tail) != old.head_tail) {
+		/*
+		 * Lock still has someone queued for it, so wake up an
+		 * appropriate waiter.
+		 */
+		__ticket_unlock_kick(lock, old.tickets.head);
+	}
+}
+
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-	__ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
+	if (TICKET_SLOWPATH_FLAG &&
+	    static_key_false(&paravirt_ticketlocks_enabled)) {
+		arch_spinlock_t prev;
+
+		prev = *lock;
+		add_smp(&lock->tickets.head, TICKET_LOCK_INC);
+
+		/* add_smp() is a full mb() */
 
-	__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
-	__ticket_unlock_kick(lock, next);
+		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
+			__ticket_unlock_slowpath(lock, prev);
+	} else
+		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
 }
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index e96fcbd..4f1bea1 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,8 +5,10 @@
 
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 #define __TICKET_LOCK_INC	2
+#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)
 #else
 #define __TICKET_LOCK_INC	1
+#define TICKET_SLOWPATH_FLAG	((__ticket_t)0)
 #endif
 
 #if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 4251c1d..bbb6c73 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -4,6 +4,7 @@
  */
 #include <linux/spinlock.h>
 #include <linux/module.h>
+#include <linux/jump_label.h>
 
 #include <asm/paravirt.h>
 
@@ -15,3 +16,5 @@ struct pv_lock_ops pv_lock_ops = {
 };
 EXPORT_SYMBOL(pv_lock_ops);
 
+struct static_key paravirt_ticketlocks_enabled = STATIC_KEY_INIT_FALSE;
+EXPORT_SYMBOL(paravirt_ticketlocks_enabled);
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index b41872b..dff5841 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -157,6 +157,10 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 	/* Only check lock once pending cleared */
 	barrier();
 
+	/* Mark entry to slowpath before doing the pickup test to make
+	   sure we don't deadlock with an unlocker. */
+	__ticket_enter_slowpath(lock);
+
 	/* check again make sure it didn't become free while
 	   we weren't looking  */
 	if (ACCESS_ONCE(lock->tickets.head) == want) {
@@ -261,6 +265,8 @@ void __init xen_init_spinlocks(void)
 		return;
 	}
 
+	static_key_slow_inc(&paravirt_ticketlocks_enabled);
+
 	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
 	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 11/18]  xen/pvticketlock: Allow interrupts to be enabled while blocking
  2013-07-22  6:16 ` Raghavendra K T
@ 2013-07-22  6:19   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:19 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

xen/pvticketlock: Allow interrupts to be enabled while blocking

From: Jeremy Fitzhardinge <jeremy@goop.org>

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu "lock" and "want" values),
then when the interrupt handler returns, the event channel will
remain pending, so the poll will return immediately, causing it to
drop back out to the main spinlock loop.
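
(Illustrative sketch only, not the patch below: the "lock"/"want"
publication order the diff adds so the per-cpu slot tolerates being
clobbered by a nested lock taken from an interrupt handler, modelled
with C11 atomics.  The waiter publishes "lock" last; the kicker reads
"lock" first, so a matching lock pointer implies a valid want ticket.)

#include <stdatomic.h>
#include <stddef.h>

struct waiter {
	void * _Atomic lock;
	_Atomic unsigned int want;
};

static void waiter_publish(struct waiter *w, void *lock, unsigned int want)
{
	atomic_store_explicit(&w->lock, NULL, memory_order_release);
	atomic_store_explicit(&w->want, want, memory_order_release);
	atomic_store_explicit(&w->lock, lock, memory_order_release);
}

static int kicker_matches(struct waiter *w, void *lock, unsigned int next)
{
	/* Read lock before want, mirroring the publish order above */
	return atomic_load_explicit(&w->lock, memory_order_acquire) == lock &&
	       atomic_load_explicit(&w->want, memory_order_acquire) == next;
}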

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/xen/spinlock.c |   46 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index dff5841..44d06f3 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 	 * partially setup state.
 	 */
 	local_irq_save(flags);
-
+	/*
+	 * We don't really care if we're overwriting some other
+	 * (lock,want) pair, as that would mean that we're currently
+	 * in an interrupt context, and the outer context had
+	 * interrupts enabled.  That has already kicked the VCPU out
+	 * of xen_poll_irq(), so it will just return spuriously and
+	 * retry with newly setup (lock,want).
+	 *
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear "lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
 	w->want = want;
 	smp_wmb();
 	w->lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 	/* Only check lock once pending cleared */
 	barrier();
 
-	/* Mark entry to slowpath before doing the pickup test to make
-	   sure we don't deadlock with an unlocker. */
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
 	__ticket_enter_slowpath(lock);
 
-	/* check again make sure it didn't become free while
-	   we weren't looking  */
+	/*
+	 * check again make sure it didn't become free while
+	 * we weren't looking
+	 */
 	if (ACCESS_ONCE(lock->tickets.head) == want) {
 		add_stats(TAKEN_SLOW_PICKUP, 1);
 		goto out;
 	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/*
+	 * If an interrupt happens here, it will leave the wakeup irq
+	 * pending, which will cause xen_poll_irq() to return
+	 * immediately.
+	 */
+
 	/* Block until irq becomes pending (or perhaps a spurious wakeup) */
 	xen_poll_irq(irq);
 	add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+	local_irq_save(flags);
+
 	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
 	cpumask_clear_cpu(cpu, &waiting_cpus);
 	w->lock = NULL;
+
 	local_irq_restore(flags);
+
 	spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 	for_each_cpu(cpu, &waiting_cpus) {
 		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-		if (w->lock == lock && w->want == next) {
+		/* Make sure we read lock before want */
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == next) {
 			add_stats(RELEASED_SLOW_KICKED, 1);
 			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
 			break;


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 12/18] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
  2013-07-22  6:16 ` Raghavendra K T
@ 2013-07-22  6:19   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:19 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
The presence of this hypercall is indicated to the guest via
KVM_FEATURE_PV_UNHALT.
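
(Sketch of the expected guest-side usage, for context only; the actual
guest wiring is added later in this series.  It assumes the guest's
<asm/kvm_para.h> definitions and the per-cpu x86_cpu_to_apicid map,
and the includes are approximate.)

#include <asm/kvm_para.h>
#include <asm/smp.h>

static void kvm_kick_waiting_cpu(int cpu)
{
	int apicid = per_cpu(x86_cpu_to_apicid, cpu);

	/* a0 is a flags word reserved for future use, a1 the APIC id */
	kvm_hypercall2(KVM_HC_KICK_CPU, 0, apicid);
}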

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[Raghu: Apic related changes, folding pvunhalted into vcpu_runnable
 Added flags for future use (suggested by Gleb)]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h      |    5 +++++
 arch/x86/include/uapi/asm/kvm_para.h |    1 +
 arch/x86/kvm/cpuid.c                 |    3 ++-
 arch/x86/kvm/x86.c                   |   37 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm_para.h        |    1 +
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f87f7fc..1d1f711 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -511,6 +511,11 @@ struct kvm_vcpu_arch {
 	 * instruction.
 	 */
 	bool write_fault_to_shadow_pgtable;
+
+	/* pv related host specific info */
+	struct {
+		bool pv_unhalted;
+	} pv;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
+#define KVM_FEATURE_PV_UNHALT		7
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a20ecb5..b110fe6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -413,7 +413,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PV_UNHALT);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d21bce5..dae4575 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5495,6 +5495,36 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+			break;
+	}
+	if (vcpu) {
+		/*
+		 * Setting unhalt flag here can result in spurious runnable
+		 * state when unhalt reset does not happen in vcpu_block.
+		 * But that is harmless since that should soon result in halt.
+		 */
+		vcpu->arch.pv.pv_unhalted = true;
+		/* We need everybody see unhalt before vcpu unblocks */
+		smp_wmb();
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -5528,6 +5558,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_VAPIC_POLL_IRQ:
 		ret = 0;
 		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0, a1);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -5950,6 +5984,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 				kvm_apic_accept_events(vcpu);
 				switch(vcpu->arch.mp_state) {
 				case KVM_MP_STATE_HALTED:
+					vcpu->arch.pv.pv_unhalted = false;
 					vcpu->arch.mp_state =
 						KVM_MP_STATE_RUNNABLE;
 				case KVM_MP_STATE_RUNNABLE:
@@ -6770,6 +6805,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	BUG_ON(vcpu->kvm == NULL);
 	kvm = vcpu->kvm;
 
+	vcpu->arch.pv.pv_unhalted = false;
 	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
@@ -7103,6 +7139,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 		!vcpu->arch.apf.halted)
 		|| !list_empty_careful(&vcpu->async_pf.done)
 		|| kvm_apic_has_events(vcpu)
+		|| vcpu->arch.pv.pv_unhalted
 		|| atomic_read(&vcpu->arch.nmi_queued) ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
 		 kvm_cpu_has_interrupt(vcpu));
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..2841f86 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_KICK_CPU			5
 
 /*
  * hypercalls use architecture specific


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 13/18] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:19   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:19 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

During migration, any vcpu that got kicked but did not become runnable
(still in halted state) should be runnable after migration.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dae4575..1e73dab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6284,7 +6284,12 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
 	kvm_apic_accept_events(vcpu);
-	mp_state->mp_state = vcpu->arch.mp_state;
+	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
+					vcpu->arch.pv.pv_unhalted)
+		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+	else
+		mp_state->mp_state = vcpu->arch.mp_state;
+
 	return 0;
 }
 

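For illustration only (not part of this series, nor taken from any particular
VMM), a minimal userspace sketch of reading the MP state with the existing
KVM_GET_MP_STATE ioctl during save/migration; with the change above, a vcpu
that was kicked while halted reports KVM_MP_STATE_RUNNABLE, so the destination
resumes it instead of leaving it halted. vcpu_fd is assumed to be an open KVM
vcpu file descriptor and save_mp_state() is a hypothetical helper:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int save_mp_state(int vcpu_fd, struct kvm_mp_state *out)
{
	if (ioctl(vcpu_fd, KVM_GET_MP_STATE, out) < 0) {
		perror("KVM_GET_MP_STATE");
		return -1;
	}

	/* A kicked-but-still-halted vcpu now reports RUNNABLE here. */
	if (out->mp_state == KVM_MP_STATE_RUNNABLE)
		printf("vcpu will be resumed as runnable\n");

	return 0;
}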

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 14/18] kvm guest : Add configuration support to enable debug information for KVM Guests
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:20   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:20 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

kvm guest : Add configuration support to enable debug information for KVM Guests

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/Kconfig |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 112e712..b1fb846 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -657,6 +657,15 @@ config KVM_GUEST
 	  underlying device model, the host provides the guest with
 	  timing infrastructure such as time of day, and system time
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST && DEBUG_FS
+	default n
+	---help---
+	  This option enables collection of various statistics for a KVM guest.
+	  Statistics are displayed in the debugfs filesystem. Enabling this
+	  option may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT_TIME_ACCOUNTING


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:20   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:20 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

During smp_boot_cpus the paravirtualized KVM guest detects whether the hypervisor
has the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
support for pv-ticketlocks is registered via pv_lock_ops.

Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes,
addition of safe_halt for irq enabled case(Gleb)]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_para.h |   14 ++
 arch/x86/kernel/kvm.c           |  259 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 271 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index cd6d9a5..b5aa5f4 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 	WARN_ON(kvm_register_clock("primary cpu clock"));
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,260 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+	int apicid;
+	unsigned long flags = 0;
+
+	apicid = per_cpu(x86_cpu_to_apicid, cpu);
+	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
+}
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS	30
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old;
+
+	old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index;
+
+	index = ilog2(delta);
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta;
+
+	delta = sched_clock() - start;
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm;
+
+	d_kvm = kvm_init_debugfs();
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else  /* !CONFIG_KVM_DEBUG_FS */
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif  /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+	struct kvm_lock_waiting *w;
+	int cpu;
+	u64 start;
+	unsigned long flags;
+
+	w = &__get_cpu_var(lock_waiting);
+	cpu = smp_processor_id();
+	start = spin_time_start();
+
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
+
+	/*
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear "lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
+
+	add_stats(TAKEN_SLOW, 1);
+
+	/*
+	 * This uses set_bit, which is atomic, but we should not rely on its
+	 * reordering guarantees, so a barrier is needed after this call.
+	 */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	barrier();
+
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
+	__ticket_enter_slowpath(lock);
+
+	/*
+	 * Check again to make sure it didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+
+	/*
+	 * Halt until it's our turn and we get kicked. Note that we do a safe
+	 * halt for the irq-enabled case to avoid a hang when the lock info is
+	 * overwritten in an irq spinlock slowpath and no spurious irq saves us.
+	 */
+	if (arch_irqs_disabled_flags(flags))
+		halt();
+	else
+		safe_halt();
+
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+	int cpu;
+
+	add_stats(RELEASED_SLOW, 1);
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == ticket) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
+			kvm_kick_cpu(cpu);
+			break;
+		}
+	}
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+		return;
+
+	printk(KERN_INFO "KVM setup paravirtual spinlock\n");
+
+	static_key_slow_inc(&paravirt_ticketlocks_enabled);
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
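
As a rough sketch of where the two hooks registered above end up being called
from the generic ticketlock (this is not the actual arch_spin_lock/unlock code
of this series; SPIN_THRESHOLD and the __ticket_* pvops come from the earlier
ticketlock patches, and lock_in_slowpath() below is a hypothetical helper):

/* Illustrative only: where kvm_lock_spinning()/kvm_unlock_kick() are reached */
static void example_ticket_lock(struct arch_spinlock *lock, __ticket_t my_ticket)
{
	unsigned int count = SPIN_THRESHOLD;

	while (ACCESS_ONCE(lock->tickets.head) != my_ticket) {
		cpu_relax();
		if (--count == 0) {
			/* pvop -> kvm_lock_spinning(): halt until kicked */
			__ticket_lock_spinning(lock, my_ticket);
			count = SPIN_THRESHOLD;
		}
	}
}

static void example_ticket_unlock(struct arch_spinlock *lock, __ticket_t next)
{
	/* only if some waiter marked the lock as being in the slowpath */
	if (static_key_false(&paravirt_ticketlocks_enabled) &&
	    unlikely(lock_in_slowpath(lock)))	/* hypothetical helper */
		__ticket_unlock_kick(lock, next); /* pvop -> kvm_unlock_kick() */
}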


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 16/18] kvm hypervisor : Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:20   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:20 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

Note that we are using APIC_DM_REMRD, which has a reserved usage.
In the future, if APIC_DM_REMRD usage is standardized, we should
find some other way or go back to the old method.

Suggested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/kvm/lapic.c |    5 ++++-
 arch/x86/kvm/x86.c   |   25 ++++++-------------------
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afc1124..48c13c9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -706,7 +706,10 @@ out:
 		break;
 
 	case APIC_DM_REMRD:
-		apic_debug("Ignoring delivery mode 3\n");
+		result = 1;
+		vcpu->arch.pv.pv_unhalted = 1;
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
 		break;
 
 	case APIC_DM_SMI:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e73dab..640d112 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5502,27 +5502,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
  */
 static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
 {
-	struct kvm_vcpu *vcpu = NULL;
-	int i;
+	struct kvm_lapic_irq lapic_irq;
 
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if (!kvm_apic_present(vcpu))
-			continue;
+	lapic_irq.shorthand = 0;
+	lapic_irq.dest_mode = 0;
+	lapic_irq.dest_id = apicid;
 
-		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
-			break;
-	}
-	if (vcpu) {
-		/*
-		 * Setting unhalt flag here can result in spurious runnable
-		 * state when unhalt reset does not happen in vcpu_block.
-		 * But that is harmless since that should soon result in halt.
-		 */
-		vcpu->arch.pv.pv_unhalted = true;
-		/* We need everybody see unhalt before vcpu unblocks */
-		smp_wmb();
-		kvm_vcpu_kick(vcpu);
-	}
+	lapic_irq.delivery_mode = APIC_DM_REMRD;
+	kvm_irq_delivery_to_apic(kvm, 0, &lapic_irq, NULL);
 }
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 17/18] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
  2013-07-22  6:16 ` Raghavendra K T
  (?)
@ 2013-07-22  6:20   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:20 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

The KVM_HC_KICK_CPU hypercall is added to wake up a halted vcpu in a
paravirtual-spinlock-enabled guest.

KVM_FEATURE_PV_UNHALT lets the guest check whether pv spinlock support can be
enabled in the guest.

Thanks to Vatsa for rewriting KVM_HC_KICK_CPU.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 Documentation/virtual/kvm/cpuid.txt      |    4 ++++
 Documentation/virtual/kvm/hypercalls.txt |   14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 83afe65..654f43c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_UNHALT              ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..022198e 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,17 @@ Purpose: To enable communication between the hypervisor and guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wake up a vcpu from HLT state
+Usage example: A vcpu of a paravirtualized guest that is busy-waiting in guest
+kernel mode for an event to occur (e.g. a spinlock to become available) can
+execute a HLT instruction once it has busy-waited for more than a threshold
+time interval. Execution of the HLT instruction would cause the hypervisor to
+put the vcpu to sleep until the occurrence of an appropriate event. Another
+vcpu of the same guest can wake up the sleeping vcpu by issuing the
+KVM_HC_KICK_CPU hypercall, specifying the APIC ID (a1) of the vcpu to be woken
+up. An additional argument (a0) is reserved in the hypercall for future use.

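As a guest-side usage sketch, mirroring the kvm_kick_cpu() helper added in
patch 15 (the flags argument is hard-coded to 0 here because a0 is reserved
for future use):

#include <asm/kvm_para.h>

/* Wake the halted vcpu whose APIC ID is @apicid; a0 carries flags (0 for now). */
static void kick_vcpu_by_apicid(int apicid)
{
	unsigned long flags = 0;

	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}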

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 17/18] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
@ 2013-07-22  6:20   ` Raghavendra K T
  0 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:20 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: gregkh, kvm, linux-doc, peterz, drjones, virtualization, andi,
	xen-devel, Raghavendra K T, habanero, riel, stefano.stabellini,
	ouyang, avi.kivity, tglx, chegu_vinod, linux-kernel,
	srivatsa.vaddagiri, attilio.rao, torvalds

Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in paravirtual spinlock
enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest.

Thanks Vatsa for rewriting KVM_HC_KICK_CPU

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 Documentation/virtual/kvm/cpuid.txt      |    4 ++++
 Documentation/virtual/kvm/hypercalls.txt |   14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 83afe65..654f43c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_UNHALT              ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..022198e 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,17 @@ Purpose: To enable communication between the hypervisor and guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wakeup a vcpu from HLT state
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available) can
+execute HLT instruction once it has busy-waited for more than a threshold
+time-interval. Execution of HLT instruction would cause the hypervisor to put
+the vcpu to sleep until occurence of an appropriate event. Another vcpu of the
+same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
+specifying APIC ID (a1) of the vcpu to be woken up. An additional argument (a0)
+is used in the hypercall for future use.

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 17/18] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
@ 2013-07-22  6:20   ` Raghavendra K T
  0 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:20 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: gregkh, kvm, linux-doc, peterz, drjones, virtualization, andi,
	xen-devel, Raghavendra K T, habanero, riel, stefano.stabellini,
	ouyang, avi.kivity, tglx, chegu_vinod, linux-kernel,
	srivatsa.vaddagiri, attilio.rao, torvalds

Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in paravirtual spinlock
enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest.

Thanks Vatsa for rewriting KVM_HC_KICK_CPU

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 Documentation/virtual/kvm/cpuid.txt      |    4 ++++
 Documentation/virtual/kvm/hypercalls.txt |   14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 83afe65..654f43c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_UNHALT              ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..022198e 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,17 @@ Purpose: To enable communication between the hypervisor and guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wake up a vcpu from HLT state
+Usage example: A vcpu of a paravirtualized guest that is busy-waiting in
+guest kernel mode for an event to occur (e.g. a spinlock to become available)
+can execute the HLT instruction once it has busy-waited for more than a
+threshold time interval. Execution of the HLT instruction causes the
+hypervisor to put the vcpu to sleep until an appropriate event occurs.
+Another vcpu of the same guest can wake up the sleeping vcpu by issuing the
+KVM_HC_KICK_CPU hypercall, specifying the APIC ID (a1) of the vcpu to be
+woken up. An additional argument (a0) is passed in the hypercall for future use.
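
For illustration, the guest side of this interface boils down to a CPUID
feature check plus a two-argument hypercall. A minimal sketch is below; the
helper names are made up for the example, but the calls mirror what the KVM
guest patch later in this series does in kvm_kick_cpu() and
kvm_spinlock_init():

#include <asm/kvm_para.h>
#include <asm/smp.h>

/* Only use the kick/halt protocol if the host advertises it. */
static bool example_pv_unhalt_usable(void)
{
	return kvm_para_available() &&
	       kvm_para_has_feature(KVM_FEATURE_PV_UNHALT);
}

/* Wake up a halted vcpu, identified by its APIC id (hypercall arg a1). */
static void example_kick_cpu(int cpu)
{
	unsigned long flags = 0;	/* a0: currently unused, reserved */
	int apicid = per_cpu(x86_cpu_to_apicid, cpu);

	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}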

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 18/18] kvm hypervisor: Add directed yield in vcpu block path
  2013-07-22  6:16 ` Raghavendra K T
@ 2013-07-22  6:20   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:20 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: linux-doc, habanero, Raghavendra K T, xen-devel, peterz,
	mtosatti, stefano.stabellini, andi, attilio.rao, ouyang, gregkh,
	agraf, chegu_vinod, torvalds, avi.kivity, tglx, kvm,
	linux-kernel, riel, drjones, virtualization, srivatsa.vaddagiri

kvm hypervisor: Add directed yield in vcpu block path

From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>

We use the improved PLE handler logic in the vcpu block path for
scheduling rather than a plain schedule(), so that we can make
intelligent decisions about which vcpu to yield to.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 Changes: 
 Added stubs for missing architectures (Gleb)

 arch/arm/include/asm/kvm_host.h     |    5 +++++
 arch/arm64/include/asm/kvm_host.h   |    5 +++++
 arch/ia64/include/asm/kvm_host.h    |    5 +++++
 arch/mips/include/asm/kvm_host.h    |    5 +++++
 arch/powerpc/include/asm/kvm_host.h |    5 +++++
 arch/s390/include/asm/kvm_host.h    |    5 +++++
 arch/x86/include/asm/kvm_host.h     |    2 +-
 arch/x86/kvm/x86.c                  |    8 ++++++++
 include/linux/kvm_host.h            |    2 +-
 virt/kvm/kvm_main.c                 |    6 ++++--
 10 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 7d22517..94836c0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -226,6 +226,11 @@ static inline int kvm_arch_dev_ioctl_check_extension(long ext)
 	return 0;
 }
 
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 644d739..cd18913 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -186,6 +186,11 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 				       phys_addr_t pgd_ptr,
 				       unsigned long hyp_stack_ptr,
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 989dd3f..999ab15 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -595,6 +595,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
 void kvm_sal_emul(struct kvm_vcpu *vcpu);
 
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 #define __KVM_HAVE_ARCH_VM_ALLOC 1
 struct kvm *kvm_arch_alloc_vm(void);
 void kvm_arch_free_vm(struct kvm *kvm);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 4d6fa0b..2c4aae9 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -655,6 +655,11 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
 			       struct kvm_vcpu *vcpu);
 
 /* Misc */
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 extern void mips32_SyncICache(unsigned long addr, unsigned long size);
 extern int kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
 extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index af326cd..1aeecc0 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -628,4 +628,9 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
 
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3238d40..d3409fa 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -274,6 +274,11 @@ struct kvm_arch{
 	int css_support;
 };
 
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 extern char sie_exit;
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1d1f711..ed06ecd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1053,5 +1053,5 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
-
+void kvm_do_schedule(struct kvm_vcpu *vcpu);
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 640d112..ea64481 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7319,6 +7319,14 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
 			kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
+void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	/* We try to yield to a kicked vcpu else do a schedule */
+	if (kvm_vcpu_on_spin(vcpu) <= 0)
+		schedule();
+}
+EXPORT_SYMBOL_GPL(kvm_do_schedule);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a63d83e..fab21ec 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -568,7 +568,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_yield_to(struct kvm_vcpu *target);
-void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
+bool kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1580dd4..72c49f3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1685,7 +1685,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		if (signal_pending(current))
 			break;
 
-		schedule();
+		kvm_do_schedule(vcpu);
 	}
 
 	finish_wait(&vcpu->wq, &wait);
@@ -1786,7 +1786,7 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 }
 #endif
 
-void kvm_vcpu_on_spin(struct kvm_vcpu *me)
+bool kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
@@ -1835,6 +1835,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 
 	/* Ensure vcpu is not eligible during next spinloop */
 	kvm_vcpu_set_dy_eligible(me, false);
+
+	return yielded;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
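
Read together, the two hunks above change the halt path roughly as follows
(a condensed sketch combining the virt/kvm/kvm_main.c and arch/x86/kvm/x86.c
pieces; the unchanged wait-queue setup and the other loop-exit checks are
elided):

	/* blocking loop in kvm_vcpu_block(), per the kvm_main.c hunk */
	for (;;) {
		/* ... other loop-exit checks as before ... */
		if (signal_pending(current))
			break;
		kvm_do_schedule(vcpu);		/* was: schedule() */
	}
	finish_wait(&vcpu->wq, &wait);

	/* x86 implementation of the new hook, per the x86.c hunk */
	void kvm_do_schedule(struct kvm_vcpu *vcpu)
	{
		/* try a directed yield to a kicked vcpu; otherwise sleep */
		if (kvm_vcpu_on_spin(vcpu) <= 0)
			schedule();
	}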
 


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks
  2013-07-22  6:16 ` Raghavendra K T
@ 2013-07-22 19:36   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 121+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-22 19:36 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: gleb, mingo, jeremy, x86, hpa, pbonzini, linux-doc, habanero,
	xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

> 
> github link: https://github.com/ktraghavendra/linux/tree/pvspinlock_v11

Any chance you have a backup git tree? I get:

This repository is temporarily unavailable.

> 
> Please note that we set SPIN_THRESHOLD = 32k with this series,
> that would eatup little bit of overcommit performance of PLE machines
> and overall performance of non-PLE machines.
> 
> The older series[3] was tested by Attilio for Xen implementation.
> 
> Note that Konrad needs to revert below two patches to enable xen on hvm 
>   70dd4998, f10cd522c

We could add that to the series. But let me first test it out - and that
gets back to the repo :-)

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks
  2013-07-22 19:36   ` Konrad Rzeszutek Wilk
@ 2013-07-23  2:50     ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-23  2:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: gleb, mingo, jeremy, x86, hpa, pbonzini, linux-doc, habanero,
	xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On 07/23/2013 01:06 AM, Konrad Rzeszutek Wilk wrote:
>>
>> github link: https://github.com/ktraghavendra/linux/tree/pvspinlock_v11
>
> And chance you have a backup git tree? I get:
>
> This repository is temporarily unavailable.

I only have it locally apart from there :(.
Hope it was a temporary github problem; it is working for me now.

>
>>
>> Please note that we set SPIN_THRESHOLD = 32k with this series,
>> that would eatup little bit of overcommit performance of PLE machines
>> and overall performance of non-PLE machines.
>>
>> The older series[3] was tested by Attilio for Xen implementation.
>>
>> Note that Konrad needs to revert below two patches to enable xen on hvm
>>    70dd4998, f10cd522c
>
> We could add that to the series. But let me first test it out - and that
> gets back to the repo :-)
>

okay.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-22  6:20   ` Raghavendra K T
@ 2013-07-23 15:07     ` Gleb Natapov
  -1 siblings, 0 replies; 121+ messages in thread
From: Gleb Natapov @ 2013-07-23 15:07 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> +{
> +	struct kvm_lock_waiting *w;
> +	int cpu;
> +	u64 start;
> +	unsigned long flags;
> +
Why don't you bailout if in nmi here like we discussed?

> +	w = &__get_cpu_var(lock_waiting);
> +	cpu = smp_processor_id();
> +	start = spin_time_start();
> +
> +	/*
> +	 * Make sure an interrupt handler can't upset things in a
> +	 * partially setup state.
> +	 */
> +	local_irq_save(flags);
> +
> +	/*
> +	 * The ordering protocol on this is that the "lock" pointer
> +	 * may only be set non-NULL if the "want" ticket is correct.
> +	 * If we're updating "want", we must first clear "lock".
> +	 */
> +	w->lock = NULL;
> +	smp_wmb();
> +	w->want = want;
> +	smp_wmb();
> +	w->lock = lock;
> +
> +	add_stats(TAKEN_SLOW, 1);
> +
> +	/*
> +	 * This uses set_bit, which is atomic but we should not rely on its
> +	 * reordering gurantees. So barrier is needed after this call.
> +	 */
> +	cpumask_set_cpu(cpu, &waiting_cpus);
> +
> +	barrier();
> +
> +	/*
> +	 * Mark entry to slowpath before doing the pickup test to make
> +	 * sure we don't deadlock with an unlocker.
> +	 */
> +	__ticket_enter_slowpath(lock);
> +
> +	/*
> +	 * check again make sure it didn't become free while
> +	 * we weren't looking.
> +	 */
> +	if (ACCESS_ONCE(lock->tickets.head) == want) {
> +		add_stats(TAKEN_SLOW_PICKUP, 1);
> +		goto out;
> +	}
> +
> +	/*
> +	 * halt until it's our turn and kicked. Note that we do safe halt
> +	 * for irq enabled case to avoid hang when lock info is overwritten
> +	 * in irq spinlock slowpath and no spurious interrupt occur to save us.
> +	 */
> +	if (arch_irqs_disabled_flags(flags))
> +		halt();
> +	else
> +		safe_halt();
> +
> +out:
So here now interrupts can be either disabled or enabled. Previous
version disabled interrupts here, so are we sure it is safe to have them
enabled at this point? I do not see any problem yet, will keep thinking.

> +	cpumask_clear_cpu(cpu, &waiting_cpus);
> +	w->lock = NULL;
> +	local_irq_restore(flags);
> +	spin_time_accum_blocked(start);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> +
> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
> +{
> +	int cpu;
> +
> +	add_stats(RELEASED_SLOW, 1);
> +	for_each_cpu(cpu, &waiting_cpus) {
> +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> +		if (ACCESS_ONCE(w->lock) == lock &&
> +		    ACCESS_ONCE(w->want) == ticket) {
> +			add_stats(RELEASED_SLOW_KICKED, 1);
> +			kvm_kick_cpu(cpu);
What about using NMI to wake sleepers? I think it was discussed, but
forgot why it was dismissed.

> +			break;
> +		}
> +	}
> +}
> +
> +/*
> + * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
> + */
> +void __init kvm_spinlock_init(void)
> +{
> +	if (!kvm_para_available())
> +		return;
> +	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> +	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> +		return;
> +
> +	printk(KERN_INFO "KVM setup paravirtual spinlock\n");
> +
> +	static_key_slow_inc(&paravirt_ticketlocks_enabled);
> +
> +	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
> +	pv_lock_ops.unlock_kick = kvm_unlock_kick;
> +}
> +#endif	/* CONFIG_PARAVIRT_SPINLOCKS */

--
			Gleb.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RESEND RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-23 15:07     ` Gleb Natapov
@ 2013-07-24  9:24       ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-24  9:24 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Raghavendra K T, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini,
	linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, tglx, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri

* Gleb Natapov <gleb@redhat.com> [2013-07-23 18:07:48]:

> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> > +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> > +{
> > +	struct kvm_lock_waiting *w;
> > +	int cpu;
> > +	u64 start;
> > +	unsigned long flags;
> > +
> Why don't you bailout if in nmi here like we discussed?

Sorry, I had misunderstood and thought we were to ignore that part. Here
is the updated one:

---8<---

kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

During smp_boot_cpus, a paravirtualized KVM guest detects whether the hypervisor
has the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
support for pv-ticketlocks is registered via pv_lock_ops.

The KVM_HC_KICK_CPU hypercall is used to wake up a waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes,
addition of safe_halt for irq enabled case(Gleb)]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_para.h |   14 ++
 arch/x86/kernel/kvm.c           |  262 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 274 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index cd6d9a5..fe42970 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 	WARN_ON(kvm_register_clock("primary cpu clock"));
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,263 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+	int apicid;
+	unsigned long flags = 0;
+
+	apicid = per_cpu(x86_cpu_to_apicid, cpu);
+	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
+}
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS	30
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old;
+
+	old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index;
+
+	index = ilog2(delta);
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta;
+
+	delta = sched_clock() - start;
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm;
+
+	d_kvm = kvm_init_debugfs();
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else  /* !CONFIG_KVM_DEBUG_FS */
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif  /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+	struct kvm_lock_waiting *w;
+	int cpu;
+	u64 start;
+	unsigned long flags;
+
+	if (in_nmi())
+		return;
+
+	w = &__get_cpu_var(lock_waiting);
+	cpu = smp_processor_id();
+	start = spin_time_start();
+
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
+
+	/*
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear "lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
+
+	add_stats(TAKEN_SLOW, 1);
+
+	/*
+	 * This uses set_bit, which is atomic but we should not rely on its
+	 * reordering gurantees. So barrier is needed after this call.
+	 */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	barrier();
+
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
+	__ticket_enter_slowpath(lock);
+
+	/*
+	 * check again make sure it didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+
+	/*
+	 * halt until it's our turn and kicked. Note that we do safe halt
+	 * for irq enabled case to avoid hang when lock info is overwritten
+	 * in irq spinlock slowpath and no spurious interrupt occur to save us.
+	 */
+	if (arch_irqs_disabled_flags(flags))
+		halt();
+	else
+		safe_halt();
+
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+	int cpu;
+
+	add_stats(RELEASED_SLOW, 1);
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == ticket) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
+			kvm_kick_cpu(cpu);
+			break;
+		}
+	}
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+		return;
+
+	printk(KERN_INFO "KVM setup paravirtual spinlock\n");
+
+	static_key_slow_inc(&paravirt_ticketlocks_enabled);
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
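
To tie the two hooks together, the slow-path round trip for one contended
ticket looks roughly like this (a sketch of the flow implemented above; the
unlock side reaches kvm_unlock_kick() through the pv_lock_ops.unlock_kick
hook wired up in kvm_spinlock_init()):

/*
 * vcpu A (waiter)                          vcpu B (unlocker)
 *
 * kvm_lock_spinning(lock, want)
 *   w->want = want; w->lock = lock;
 *   cpumask_set_cpu(A, &waiting_cpus);
 *   __ticket_enter_slowpath(lock);
 *   recheck lock->tickets.head == want;
 *   halt() / safe_halt();                  releases the lock, then:
 *                                          kvm_unlock_kick(lock, ticket)
 *                                            scans waiting_cpus for a cpu
 *                                            whose w->lock == lock and
 *                                            w->want == ticket, then
 *                                            kvm_kick_cpu(A), i.e. the
 *                                            KVM_HC_KICK_CPU hypercall
 * host wakes A out of halt
 *   cpumask_clear_cpu(A, &waiting_cpus);
 *   w->lock = NULL;
 */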


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-23 15:07     ` Gleb Natapov
@ 2013-07-24  9:45       ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-24  9:45 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On 07/23/2013 08:37 PM, Gleb Natapov wrote:
> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
>> +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
[...]
>> +
>> +	/*
>> +	 * halt until it's our turn and kicked. Note that we do safe halt
>> +	 * for irq enabled case to avoid hang when lock info is overwritten
>> +	 * in irq spinlock slowpath and no spurious interrupt occur to save us.
>> +	 */
>> +	if (arch_irqs_disabled_flags(flags))
>> +		halt();
>> +	else
>> +		safe_halt();
>> +
>> +out:
> So here now interrupts can be either disabled or enabled. Previous
> version disabled interrupts here, so are we sure it is safe to have them
> enabled at this point? I do not see any problem yet, will keep thinking.

If we enable interrupt here, then


>> +	cpumask_clear_cpu(cpu, &waiting_cpus);

and if we start serving lock for an interrupt that came here,
cpumask clear and w->lock=null may not happen atomically.
if irq spinlock does not take slow path we would have non null value for 
lock, but with no information in waitingcpu.

I am still thinking what would be problem with that.

>> +	w->lock = NULL;
>> +	local_irq_restore(flags);
>> +	spin_time_accum_blocked(start);
>> +}
>> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
>> +
>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
>> +{
>> +	int cpu;
>> +
>> +	add_stats(RELEASED_SLOW, 1);
>> +	for_each_cpu(cpu, &waiting_cpus) {
>> +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
>> +		if (ACCESS_ONCE(w->lock) == lock &&
>> +		    ACCESS_ONCE(w->want) == ticket) {
>> +			add_stats(RELEASED_SLOW_KICKED, 1);
>> +			kvm_kick_cpu(cpu);
> What about using NMI to wake sleepers? I think it was discussed, but
> forgot why it was dismissed.

I think I have missed that discussion. 'll go back and check. so what is 
the idea here? we can easily wake up the halted vcpus that have 
interrupt disabled?


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-24  9:45       ` Raghavendra K T
@ 2013-07-24 10:39         ` Gleb Natapov
  -1 siblings, 0 replies; 121+ messages in thread
From: Gleb Natapov @ 2013-07-24 10:39 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
> On 07/23/2013 08:37 PM, Gleb Natapov wrote:
> >On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> >>+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> [...]
> >>+
> >>+	/*
> >>+	 * halt until it's our turn and kicked. Note that we do safe halt
> >>+	 * for irq enabled case to avoid hang when lock info is overwritten
> >>+	 * in irq spinlock slowpath and no spurious interrupt occur to save us.
> >>+	 */
> >>+	if (arch_irqs_disabled_flags(flags))
> >>+		halt();
> >>+	else
> >>+		safe_halt();
> >>+
> >>+out:
> >So here now interrupts can be either disabled or enabled. Previous
> >version disabled interrupts here, so are we sure it is safe to have them
> >enabled at this point? I do not see any problem yet, will keep thinking.
> 
> If we enable interrupt here, then
> 
> 
> >>+	cpumask_clear_cpu(cpu, &waiting_cpus);
> 
> and if we start serving lock for an interrupt that came here,
> cpumask clear and w->lock=null may not happen atomically.
> if irq spinlock does not take slow path we would have non null value
> for lock, but with no information in waitingcpu.
> 
> I am still thinking what would be problem with that.
> 
Exactly, for kicker waiting_cpus and w->lock updates are
non atomic anyway.

> >>+	w->lock = NULL;
> >>+	local_irq_restore(flags);
> >>+	spin_time_accum_blocked(start);
> >>+}
> >>+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> >>+
> >>+/* Kick vcpu waiting on @lock->head to reach value @ticket */
> >>+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
> >>+{
> >>+	int cpu;
> >>+
> >>+	add_stats(RELEASED_SLOW, 1);
> >>+	for_each_cpu(cpu, &waiting_cpus) {
> >>+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> >>+		if (ACCESS_ONCE(w->lock) == lock &&
> >>+		    ACCESS_ONCE(w->want) == ticket) {
> >>+			add_stats(RELEASED_SLOW_KICKED, 1);
> >>+			kvm_kick_cpu(cpu);
> >What about using NMI to wake sleepers? I think it was discussed, but
> >forgot why it was dismissed.
> 
> I think I have missed that discussion. 'll go back and check. so
> what is the idea here? we can easily wake up the halted vcpus that
> have interrupt disabled?
We can of course. IIRC the objection was that NMI handling path is very
fragile and handling NMI on each wakeup will be more expensive then
waking up a guest without injecting an event, but it is still interesting
to see the numbers.

--
			Gleb.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-24 10:39         ` Gleb Natapov
@ 2013-07-24 12:00         ` Raghavendra K T
  2013-07-24 12:06           ` Gleb Natapov
  2013-07-24 12:06           ` Gleb Natapov
  -1 siblings, 2 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-24 12:00 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On 07/24/2013 04:09 PM, Gleb Natapov wrote:
> On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
>> On 07/23/2013 08:37 PM, Gleb Natapov wrote:
>>> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>> [...]
>>>> +
>>>> +	/*
>>>> +	 * halt until it's our turn and kicked. Note that we do safe halt
>>>> +	 * for irq enabled case to avoid hang when lock info is overwritten
>>>> +	 * in irq spinlock slowpath and no spurious interrupt occur to save us.
>>>> +	 */
>>>> +	if (arch_irqs_disabled_flags(flags))
>>>> +		halt();
>>>> +	else
>>>> +		safe_halt();
>>>> +
>>>> +out:
>>> So here now interrupts can be either disabled or enabled. Previous
>>> version disabled interrupts here, so are we sure it is safe to have them
>>> enabled at this point? I do not see any problem yet, will keep thinking.
>>
>> If we enable interrupt here, then
>>
>>
>>>> +	cpumask_clear_cpu(cpu, &waiting_cpus);
>>
>> and if we start serving lock for an interrupt that came here,
>> cpumask clear and w->lock=null may not happen atomically.
>> if irq spinlock does not take slow path we would have non null value
>> for lock, but with no information in waitingcpu.
>>
>> I am still thinking what would be problem with that.
>>
> Exactly, for kicker waiting_cpus and w->lock updates are
> non atomic anyway.
>
>>>> +	w->lock = NULL;
>>>> +	local_irq_restore(flags);
>>>> +	spin_time_accum_blocked(start);
>>>> +}
>>>> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
>>>> +
>>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>>> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
>>>> +{
>>>> +	int cpu;
>>>> +
>>>> +	add_stats(RELEASED_SLOW, 1);
>>>> +	for_each_cpu(cpu, &waiting_cpus) {
>>>> +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
>>>> +		if (ACCESS_ONCE(w->lock) == lock &&
>>>> +		    ACCESS_ONCE(w->want) == ticket) {
>>>> +			add_stats(RELEASED_SLOW_KICKED, 1);
>>>> +			kvm_kick_cpu(cpu);
>>> What about using NMI to wake sleepers? I think it was discussed, but
>>> forgot why it was dismissed.
>>
>> I think I have missed that discussion. 'll go back and check. so
>> what is the idea here? we can easily wake up the halted vcpus that
>> have interrupt disabled?
> We can of course. IIRC the objection was that NMI handling path is very
> fragile and handling NMI on each wakeup will be more expensive then
> waking up a guest without injecting an event, but it is still interesting
> to see the numbers.
>

Haam, now I remember, We had tried request based mechanism. (new
request like REQ_UNHALT) and process that. It had worked, but had some
complex hacks in vcpu_enter_guest to avoid guest hang in case of
request cleared.  So had left it there..

https://lkml.org/lkml/2012/4/30/67

But I do not remember performance impact though.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-24 12:00         ` Raghavendra K T
  2013-07-24 12:06           ` Gleb Natapov
@ 2013-07-24 12:06           ` Gleb Natapov
  2013-07-24 12:36             ` Raghavendra K T
  2013-07-24 12:36             ` Raghavendra K T
  1 sibling, 2 replies; 121+ messages in thread
From: Gleb Natapov @ 2013-07-24 12:06 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
> On 07/24/2013 04:09 PM, Gleb Natapov wrote:
> >On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
> >>On 07/23/2013 08:37 PM, Gleb Natapov wrote:
> >>>On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> >>>>+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> >>[...]
> >>>>+
> >>>>+	/*
> >>>>+	 * halt until it's our turn and kicked. Note that we do safe halt
> >>>>+	 * for irq enabled case to avoid hang when lock info is overwritten
> >>>>+	 * in irq spinlock slowpath and no spurious interrupt occur to save us.
> >>>>+	 */
> >>>>+	if (arch_irqs_disabled_flags(flags))
> >>>>+		halt();
> >>>>+	else
> >>>>+		safe_halt();
> >>>>+
> >>>>+out:
> >>>So here now interrupts can be either disabled or enabled. Previous
> >>>version disabled interrupts here, so are we sure it is safe to have them
> >>>enabled at this point? I do not see any problem yet, will keep thinking.
> >>
> >>If we enable interrupt here, then
> >>
> >>
> >>>>+	cpumask_clear_cpu(cpu, &waiting_cpus);
> >>
> >>and if we start serving lock for an interrupt that came here,
> >>cpumask clear and w->lock=null may not happen atomically.
> >>if irq spinlock does not take slow path we would have non null value
> >>for lock, but with no information in waitingcpu.
> >>
> >>I am still thinking what would be problem with that.
> >>
> >Exactly, for kicker waiting_cpus and w->lock updates are
> >non atomic anyway.
> >
> >>>>+	w->lock = NULL;
> >>>>+	local_irq_restore(flags);
> >>>>+	spin_time_accum_blocked(start);
> >>>>+}
> >>>>+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> >>>>+
> >>>>+/* Kick vcpu waiting on @lock->head to reach value @ticket */
> >>>>+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
> >>>>+{
> >>>>+	int cpu;
> >>>>+
> >>>>+	add_stats(RELEASED_SLOW, 1);
> >>>>+	for_each_cpu(cpu, &waiting_cpus) {
> >>>>+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> >>>>+		if (ACCESS_ONCE(w->lock) == lock &&
> >>>>+		    ACCESS_ONCE(w->want) == ticket) {
> >>>>+			add_stats(RELEASED_SLOW_KICKED, 1);
> >>>>+			kvm_kick_cpu(cpu);
> >>>What about using NMI to wake sleepers? I think it was discussed, but
> >>>forgot why it was dismissed.
> >>
> >>I think I have missed that discussion. 'll go back and check. so
> >>what is the idea here? we can easily wake up the halted vcpus that
> >>have interrupt disabled?
> >We can of course. IIRC the objection was that NMI handling path is very
> >fragile and handling NMI on each wakeup will be more expensive then
> >waking up a guest without injecting an event, but it is still interesting
> >to see the numbers.
> >
> 
> Haam, now I remember, We had tried request based mechanism. (new
> request like REQ_UNHALT) and process that. It had worked, but had some
> complex hacks in vcpu_enter_guest to avoid guest hang in case of
> request cleared.  So had left it there..
> 
> https://lkml.org/lkml/2012/4/30/67
> 
> But I do not remember performance impact though.
No, this is something different. Wakeup with NMI does not need KVM changes at
all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.

--
			Gleb.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-24 12:06           ` Gleb Natapov
  2013-07-24 12:36             ` Raghavendra K T
@ 2013-07-24 12:36             ` Raghavendra K T
  2013-07-25  9:17                 ` Raghavendra K T
  1 sibling, 1 reply; 121+ messages in thread
From: Raghavendra K T @ 2013-07-24 12:36 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On 07/24/2013 05:36 PM, Gleb Natapov wrote:
> On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
>> On 07/24/2013 04:09 PM, Gleb Natapov wrote:
>>> On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
>>>> On 07/23/2013 08:37 PM, Gleb Natapov wrote:
>>>>> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
>>>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>>>> [...]
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * halt until it's our turn and kicked. Note that we do safe halt
>>>>>> +	 * for irq enabled case to avoid hang when lock info is overwritten
>>>>>> +	 * in irq spinlock slowpath and no spurious interrupt occur to save us.
>>>>>> +	 */
>>>>>> +	if (arch_irqs_disabled_flags(flags))
>>>>>> +		halt();
>>>>>> +	else
>>>>>> +		safe_halt();
>>>>>> +
>>>>>> +out:
>>>>> So here now interrupts can be either disabled or enabled. Previous
>>>>> version disabled interrupts here, so are we sure it is safe to have them
>>>>> enabled at this point? I do not see any problem yet, will keep thinking.
>>>>
>>>> If we enable interrupt here, then
>>>>
>>>>
>>>>>> +	cpumask_clear_cpu(cpu, &waiting_cpus);
>>>>
>>>> and if we start serving lock for an interrupt that came here,
>>>> cpumask clear and w->lock=null may not happen atomically.
>>>> if irq spinlock does not take slow path we would have non null value
>>>> for lock, but with no information in waitingcpu.
>>>>
>>>> I am still thinking what would be problem with that.
>>>>
>>> Exactly, for kicker waiting_cpus and w->lock updates are
>>> non atomic anyway.
>>>
>>>>>> +	w->lock = NULL;
>>>>>> +	local_irq_restore(flags);
>>>>>> +	spin_time_accum_blocked(start);
>>>>>> +}
>>>>>> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
>>>>>> +
>>>>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>>>>> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
>>>>>> +{
>>>>>> +	int cpu;
>>>>>> +
>>>>>> +	add_stats(RELEASED_SLOW, 1);
>>>>>> +	for_each_cpu(cpu, &waiting_cpus) {
>>>>>> +		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
>>>>>> +		if (ACCESS_ONCE(w->lock) == lock &&
>>>>>> +		    ACCESS_ONCE(w->want) == ticket) {
>>>>>> +			add_stats(RELEASED_SLOW_KICKED, 1);
>>>>>> +			kvm_kick_cpu(cpu);
>>>>> What about using NMI to wake sleepers? I think it was discussed, but
>>>>> forgot why it was dismissed.
>>>>
>>>> I think I have missed that discussion. 'll go back and check. so
>>>> what is the idea here? we can easily wake up the halted vcpus that
>>>> have interrupt disabled?
>>> We can of course. IIRC the objection was that NMI handling path is very
>>> fragile and handling NMI on each wakeup will be more expensive then
>>> waking up a guest without injecting an event, but it is still interesting
>>> to see the numbers.
>>>
>>
>> Haam, now I remember, We had tried request based mechanism. (new
>> request like REQ_UNHALT) and process that. It had worked, but had some
>> complex hacks in vcpu_enter_guest to avoid guest hang in case of
>> request cleared.  So had left it there..
>>
>> https://lkml.org/lkml/2012/4/30/67
>>
>> But I do not remember performance impact though.
> No, this is something different. Wakeup with NMI does not need KVM changes at
> all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
>

True. It was not NMI.
just to confirm, are you talking about something like this to be tried ?

apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
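
For concreteness, a rough sketch of how that experiment could look on the
kick side (purely an assumption, not part of this series), keeping the same
waiter scan as kvm_unlock_kick() but replacing the hypercall with the NMI
IPI:

/*
 * Hypothetical variant for the NMI experiment: kick the waiting vcpu
 * with an NMI IPI instead of the KVM_HC_KICK_CPU hypercall.  Assumes
 * <asm/apic.h>; the guest would also need an NMI handler that
 * recognises these kicks (see below in the thread).
 */
static void kvm_unlock_kick_nmi(struct arch_spinlock *lock, __ticket_t ticket)
{
	int cpu;

	for_each_cpu(cpu, &waiting_cpus) {
		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);

		if (ACCESS_ONCE(w->lock) == lock &&
		    ACCESS_ONCE(w->want) == ticket) {
			apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
			break;
		}
	}
}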


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-25  9:17                 ` Raghavendra K T
@ 2013-07-25  9:15                   ` Gleb Natapov
  -1 siblings, 0 replies; 121+ messages in thread
From: Gleb Natapov @ 2013-07-25  9:15 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
> On 07/24/2013 06:06 PM, Raghavendra K T wrote:
> >On 07/24/2013 05:36 PM, Gleb Natapov wrote:
> >>On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
> >>>On 07/24/2013 04:09 PM, Gleb Natapov wrote:
> >>>>On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
> >>>>>On 07/23/2013 08:37 PM, Gleb Natapov wrote:
> >>>>>>On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> >>>>>>>+static void kvm_lock_spinning(struct arch_spinlock *lock,
> >>>>>>>__ticket_t want)
> >>>>>[...]
> >>>>>>>+
> >>>>>>>+    /*
> >>>>>>>+     * halt until it's our turn and kicked. Note that we do safe
> >>>>>>>halt
> >>>>>>>+     * for irq enabled case to avoid hang when lock info is
> >>>>>>>overwritten
> >>>>>>>+     * in irq spinlock slowpath and no spurious interrupt occur
> >>>>>>>to save us.
> >>>>>>>+     */
> >>>>>>>+    if (arch_irqs_disabled_flags(flags))
> >>>>>>>+        halt();
> >>>>>>>+    else
> >>>>>>>+        safe_halt();
> >>>>>>>+
> >>>>>>>+out:
> >>>>>>So here now interrupts can be either disabled or enabled. Previous
> >>>>>>version disabled interrupts here, so are we sure it is safe to
> >>>>>>have them
> >>>>>>enabled at this point? I do not see any problem yet, will keep
> >>>>>>thinking.
> >>>>>
> >>>>>If we enable interrupt here, then
> >>>>>
> >>>>>
> >>>>>>>+    cpumask_clear_cpu(cpu, &waiting_cpus);
> >>>>>
> >>>>>and if we start serving lock for an interrupt that came here,
> >>>>>cpumask clear and w->lock=null may not happen atomically.
> >>>>>if irq spinlock does not take slow path we would have non null value
> >>>>>for lock, but with no information in waitingcpu.
> >>>>>
> >>>>>I am still thinking what would be problem with that.
> >>>>>
> >>>>Exactly, for kicker waiting_cpus and w->lock updates are
> >>>>non atomic anyway.
> >>>>
> >>>>>>>+    w->lock = NULL;
> >>>>>>>+    local_irq_restore(flags);
> >>>>>>>+    spin_time_accum_blocked(start);
> >>>>>>>+}
> >>>>>>>+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> >>>>>>>+
> >>>>>>>+/* Kick vcpu waiting on @lock->head to reach value @ticket */
> >>>>>>>+static void kvm_unlock_kick(struct arch_spinlock *lock,
> >>>>>>>__ticket_t ticket)
> >>>>>>>+{
> >>>>>>>+    int cpu;
> >>>>>>>+
> >>>>>>>+    add_stats(RELEASED_SLOW, 1);
> >>>>>>>+    for_each_cpu(cpu, &waiting_cpus) {
> >>>>>>>+        const struct kvm_lock_waiting *w =
> >>>>>>>&per_cpu(lock_waiting, cpu);
> >>>>>>>+        if (ACCESS_ONCE(w->lock) == lock &&
> >>>>>>>+            ACCESS_ONCE(w->want) == ticket) {
> >>>>>>>+            add_stats(RELEASED_SLOW_KICKED, 1);
> >>>>>>>+            kvm_kick_cpu(cpu);
> >>>>>>What about using NMI to wake sleepers? I think it was discussed, but
> >>>>>>forgot why it was dismissed.
> >>>>>
> >>>>>I think I have missed that discussion. 'll go back and check. so
> >>>>>what is the idea here? we can easily wake up the halted vcpus that
> >>>>>have interrupt disabled?
> >>>>We can of course. IIRC the objection was that NMI handling path is very
> >>>>fragile and handling NMI on each wakeup will be more expensive then
> >>>>waking up a guest without injecting an event, but it is still
> >>>>interesting
> >>>>to see the numbers.
> >>>>
> >>>
> >>>Haam, now I remember, We had tried request based mechanism. (new
> >>>request like REQ_UNHALT) and process that. It had worked, but had some
> >>>complex hacks in vcpu_enter_guest to avoid guest hang in case of
> >>>request cleared.  So had left it there..
> >>>
> >>>https://lkml.org/lkml/2012/4/30/67
> >>>
> >>>But I do not remember performance impact though.
> >>No, this is something different. Wakeup with NMI does not need KVM
> >>changes at
> >>all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
> >>
> >
> >True. It was not NMI.
> >just to confirm, are you talking about something like this to be tried ?
> >
> >apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
> 
> When I started benchmark, I started seeing
> "Dazed and confused, but trying to continue" from unknown nmi error
> handling.
> Did I miss anything (because we did not register any NMI handler)? or
> is it that spurious NMIs are trouble because we could get spurious NMIs
> if next waiter already acquired the lock.
There is a default NMI handler that tries to detect the reason why NMI
happened (which is not so easy on x86) and prints this message if it
fails. You need to add logic to detect spinlock slow path there. Check
bit in waiting_cpus for instance.
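
A minimal sketch of that suggestion, as an assumption rather than anything
in this series: a guest-side handler on the unknown-NMI path that claims
the NMI whenever this cpu is marked in waiting_cpus. As quoted above, a
kick that arrives after the waiter has already left the slow path would
still show up as an unknown NMI.

/*
 * Sketch only (assumes <asm/nmi.h>): treat an otherwise unexplained NMI
 * as a spinlock kick if this cpu is currently flagged as a waiter.
 */
static int kvm_lock_nmi_handler(unsigned int type, struct pt_regs *regs)
{
	if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
		return NMI_HANDLED;	/* consumed as a kick */

	return NMI_DONE;		/* not ours, keep looking */
}

/* and, from kvm_spinlock_init(), something along the lines of:
 *	register_nmi_handler(NMI_UNKNOWN, kvm_lock_nmi_handler, 0,
 *			     "pv_ticketlock_kick");
 */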

> 
> (note: I tried sending APIC_DM_REMRD IPI directly, which worked fine
> but hypercall way of handling still performed well from the results I
> saw).
You mean better? This is strange. Have you run the guest with x2apic?

--
			Gleb.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-24 12:36             ` Raghavendra K T
@ 2013-07-25  9:17                 ` Raghavendra K T
  0 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-25  9:17 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On 07/24/2013 06:06 PM, Raghavendra K T wrote:
> On 07/24/2013 05:36 PM, Gleb Natapov wrote:
>> On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
>>> On 07/24/2013 04:09 PM, Gleb Natapov wrote:
>>>> On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
>>>>> On 07/23/2013 08:37 PM, Gleb Natapov wrote:
>>>>>> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
>>>>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock,
>>>>>>> __ticket_t want)
>>>>> [...]
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * halt until it's our turn and kicked. Note that we do safe
>>>>>>> halt
>>>>>>> +     * for irq enabled case to avoid hang when lock info is
>>>>>>> overwritten
>>>>>>> +     * in irq spinlock slowpath and no spurious interrupt occur
>>>>>>> to save us.
>>>>>>> +     */
>>>>>>> +    if (arch_irqs_disabled_flags(flags))
>>>>>>> +        halt();
>>>>>>> +    else
>>>>>>> +        safe_halt();
>>>>>>> +
>>>>>>> +out:
>>>>>> So here now interrupts can be either disabled or enabled. Previous
>>>>>> version disabled interrupts here, so are we sure it is safe to
>>>>>> have them
>>>>>> enabled at this point? I do not see any problem yet, will keep
>>>>>> thinking.
>>>>>
>>>>> If we enable interrupt here, then
>>>>>
>>>>>
>>>>>>> +    cpumask_clear_cpu(cpu, &waiting_cpus);
>>>>>
>>>>> and if we start serving lock for an interrupt that came here,
>>>>> cpumask clear and w->lock=null may not happen atomically.
>>>>> if irq spinlock does not take slow path we would have non null value
>>>>> for lock, but with no information in waitingcpu.
>>>>>
>>>>> I am still thinking what would be problem with that.
>>>>>
>>>> Exactly, for kicker waiting_cpus and w->lock updates are
>>>> non atomic anyway.
>>>>
>>>>>>> +    w->lock = NULL;
>>>>>>> +    local_irq_restore(flags);
>>>>>>> +    spin_time_accum_blocked(start);
>>>>>>> +}
>>>>>>> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
>>>>>>> +
>>>>>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>>>>>> +static void kvm_unlock_kick(struct arch_spinlock *lock,
>>>>>>> __ticket_t ticket)
>>>>>>> +{
>>>>>>> +    int cpu;
>>>>>>> +
>>>>>>> +    add_stats(RELEASED_SLOW, 1);
>>>>>>> +    for_each_cpu(cpu, &waiting_cpus) {
>>>>>>> +        const struct kvm_lock_waiting *w =
>>>>>>> &per_cpu(lock_waiting, cpu);
>>>>>>> +        if (ACCESS_ONCE(w->lock) == lock &&
>>>>>>> +            ACCESS_ONCE(w->want) == ticket) {
>>>>>>> +            add_stats(RELEASED_SLOW_KICKED, 1);
>>>>>>> +            kvm_kick_cpu(cpu);
>>>>>> What about using NMI to wake sleepers? I think it was discussed, but
>>>>>> forgot why it was dismissed.
>>>>>
>>>>> I think I have missed that discussion. 'll go back and check. so
>>>>> what is the idea here? we can easily wake up the halted vcpus that
>>>>> have interrupt disabled?
>>>> We can of course. IIRC the objection was that NMI handling path is very
>>>> fragile and handling NMI on each wakeup will be more expensive then
>>>> waking up a guest without injecting an event, but it is still
>>>> interesting
>>>> to see the numbers.
>>>>
>>>
>>> Haam, now I remember, We had tried request based mechanism. (new
>>> request like REQ_UNHALT) and process that. It had worked, but had some
>>> complex hacks in vcpu_enter_guest to avoid guest hang in case of
>>> request cleared.  So had left it there..
>>>
>>> https://lkml.org/lkml/2012/4/30/67
>>>
>>> But I do not remember performance impact though.
>> No, this is something different. Wakeup with NMI does not need KVM
>> changes at
>> all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
>>
>
> True. It was not NMI.
> just to confirm, are you talking about something like this to be tried ?
>
> apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);

When I started benchmark, I started seeing
"Dazed and confused, but trying to continue" from unknown nmi error
handling.
Did I miss anything (because we did not register any NMI handler)? or
is it that spurious NMIs are trouble because we could get spurious NMIs
if next waiter already acquired the lock.

(note: I tried sending APIC_DM_REMRD IPI directly, which worked fine
but hypercall way of handling still performed well from the results I
saw).

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-25  9:15                   ` Gleb Natapov
@ 2013-07-25  9:38                     ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-25  9:38 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On 07/25/2013 02:45 PM, Gleb Natapov wrote:
> On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
>> On 07/24/2013 06:06 PM, Raghavendra K T wrote:
>>> On 07/24/2013 05:36 PM, Gleb Natapov wrote:
>>>> On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
>>>>> On 07/24/2013 04:09 PM, Gleb Natapov wrote:
>>>>>> On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
>>>>>>> On 07/23/2013 08:37 PM, Gleb Natapov wrote:
>>>>>>>> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
>>>>>>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock,
>>>>>>>>> __ticket_t want)
>>>>>>> [...]
>>>>>>>>> +
>>>>>>>>> +    /*
>>>>>>>>> +     * halt until it's our turn and kicked. Note that we do safe
>>>>>>>>> halt
>>>>>>>>> +     * for irq enabled case to avoid hang when lock info is
>>>>>>>>> overwritten
>>>>>>>>> +     * in irq spinlock slowpath and no spurious interrupt occur
>>>>>>>>> to save us.
>>>>>>>>> +     */
>>>>>>>>> +    if (arch_irqs_disabled_flags(flags))
>>>>>>>>> +        halt();
>>>>>>>>> +    else
>>>>>>>>> +        safe_halt();
>>>>>>>>> +
>>>>>>>>> +out:
>>>>>>>> So here now interrupts can be either disabled or enabled. Previous
>>>>>>>> version disabled interrupts here, so are we sure it is safe to
>>>>>>>> have them
>>>>>>>> enabled at this point? I do not see any problem yet, will keep
>>>>>>>> thinking.
>>>>>>>
>>>>>>> If we enable interrupt here, then
>>>>>>>
>>>>>>>
>>>>>>>>> +    cpumask_clear_cpu(cpu, &waiting_cpus);
>>>>>>>
>>>>>>> and if we start serving lock for an interrupt that came here,
>>>>>>> cpumask clear and w->lock=null may not happen atomically.
>>>>>>> if irq spinlock does not take slow path we would have non null value
>>>>>>> for lock, but with no information in waitingcpu.
>>>>>>>
>>>>>>> I am still thinking what would be problem with that.
>>>>>>>
>>>>>> Exactly, for kicker waiting_cpus and w->lock updates are
>>>>>> non atomic anyway.
>>>>>>
>>>>>>>>> +    w->lock = NULL;
>>>>>>>>> +    local_irq_restore(flags);
>>>>>>>>> +    spin_time_accum_blocked(start);
>>>>>>>>> +}
>>>>>>>>> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
>>>>>>>>> +
>>>>>>>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>>>>>>>> +static void kvm_unlock_kick(struct arch_spinlock *lock,
>>>>>>>>> __ticket_t ticket)
>>>>>>>>> +{
>>>>>>>>> +    int cpu;
>>>>>>>>> +
>>>>>>>>> +    add_stats(RELEASED_SLOW, 1);
>>>>>>>>> +    for_each_cpu(cpu, &waiting_cpus) {
>>>>>>>>> +        const struct kvm_lock_waiting *w =
>>>>>>>>> &per_cpu(lock_waiting, cpu);
>>>>>>>>> +        if (ACCESS_ONCE(w->lock) == lock &&
>>>>>>>>> +            ACCESS_ONCE(w->want) == ticket) {
>>>>>>>>> +            add_stats(RELEASED_SLOW_KICKED, 1);
>>>>>>>>> +            kvm_kick_cpu(cpu);
>>>>>>>> What about using NMI to wake sleepers? I think it was discussed, but
>>>>>>>> forgot why it was dismissed.
>>>>>>>
>>>>>>> I think I have missed that discussion. 'll go back and check. so
>>>>>>> what is the idea here? we can easily wake up the halted vcpus that
>>>>>>> have interrupt disabled?
>>>>>> We can of course. IIRC the objection was that NMI handling path is very
>>>>>> fragile and handling NMI on each wakeup will be more expensive then
>>>>>> waking up a guest without injecting an event, but it is still
>>>>>> interesting
>>>>>> to see the numbers.
>>>>>>
>>>>>
>>>>> Haam, now I remember, We had tried request based mechanism. (new
>>>>> request like REQ_UNHALT) and process that. It had worked, but had some
>>>>> complex hacks in vcpu_enter_guest to avoid guest hang in case of
>>>>> request cleared.  So had left it there..
>>>>>
>>>>> https://lkml.org/lkml/2012/4/30/67
>>>>>
>>>>> But I do not remember performance impact though.
>>>> No, this is something different. Wakeup with NMI does not need KVM
>>>> changes at
>>>> all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
>>>>
>>>
>>> True. It was not NMI.
>>> just to confirm, are you talking about something like this to be tried ?
>>>
>>> apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
>>
>> When I started benchmark, I started seeing
>> "Dazed and confused, but trying to continue" from unknown nmi error
>> handling.
>> Did I miss anything (because we did not register any NMI handler)? or
>> is it that spurious NMIs are trouble because we could get spurious NMIs
>> if next waiter already acquired the lock.
> There is a default NMI handler that tries to detect the reason why NMI
> happened (which is no so easy on x86) and prints this message if it
> fails. You need to add logic to detect spinlock slow path there. Check
> bit in waiting_cpus for instance.

aha.. Okay. will check that.

>
>>
>> (note: I tried sending APIC_DM_REMRD IPI directly, which worked fine
>> but hypercall way of handling still performed well from the results I
>> saw).
> You mean better? This is strange. Have you ran guest with x2apic?
>

I had the same doubt, so I ran the full dbench benchmark.
Here is what I see now: 1x was neck and neck (0.9% for hypercall vs
0.7% for IPI, which boils down to no difference considering the noise
factors), but otherwise, sending the IPI gives a few percent gain in
the overcommit cases.
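
For contrast with the IPI experiment, here is a hedged sketch of what the
hypercall-based kick being measured above might look like on the guest
side. The KVM_HC_KICK_CPU name and the overall shape follow the quoted
series, but the body below is reconstructed for illustration and should
not be read as the exact patch:

static void kvm_kick_cpu(int cpu)
{
	int apicid;
	unsigned long flags = 0;	/* reserved for future use */

	apicid = per_cpu(x86_cpu_to_apicid, cpu);
	/* Ask the host to wake the halted vCPU that owns @apicid. */
	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}

Both paths ultimately wake the halted target vCPU; the numbers in this
thread compare their overhead.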

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-25  9:38                     ` Raghavendra K T
@ 2013-07-30 16:43                       ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-30 16:43 UTC (permalink / raw)
  To: Gleb Natapov, mingo
  Cc: jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc, habanero,
	xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On 07/25/2013 03:08 PM, Raghavendra K T wrote:
> On 07/25/2013 02:45 PM, Gleb Natapov wrote:
>> On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
>>> On 07/24/2013 06:06 PM, Raghavendra K T wrote:
>>>> On 07/24/2013 05:36 PM, Gleb Natapov wrote:
>>>>> On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
>>>>>> On 07/24/2013 04:09 PM, Gleb Natapov wrote:
>>>>>>> On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
>>>>>>>> On 07/23/2013 08:37 PM, Gleb Natapov wrote:
>>>>>>>>> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
>>>>>>>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock,
>>>>>>>>>> __ticket_t want)
>>>>>>>> [...]
>>>>>>>>>> +
>>>>>>>>>> +    /*
>>>>>>>>>> +     * halt until it's our turn and kicked. Note that we do safe
>>>>>>>>>> halt
>>>>>>>>>> +     * for irq enabled case to avoid hang when lock info is
>>>>>>>>>> overwritten
>>>>>>>>>> +     * in irq spinlock slowpath and no spurious interrupt occur
>>>>>>>>>> to save us.
>>>>>>>>>> +     */
>>>>>>>>>> +    if (arch_irqs_disabled_flags(flags))
>>>>>>>>>> +        halt();
>>>>>>>>>> +    else
>>>>>>>>>> +        safe_halt();
>>>>>>>>>> +
>>>>>>>>>> +out:
>>>>>>>>> So here now interrupts can be either disabled or enabled. Previous
>>>>>>>>> version disabled interrupts here, so are we sure it is safe to
>>>>>>>>> have them
>>>>>>>>> enabled at this point? I do not see any problem yet, will keep
>>>>>>>>> thinking.
>>>>>>>>
>>>>>>>> If we enable interrupt here, then
>>>>>>>>
>>>>>>>>
>>>>>>>>>> +    cpumask_clear_cpu(cpu, &waiting_cpus);
>>>>>>>>
>>>>>>>> and if we start serving lock for an interrupt that came here,
>>>>>>>> cpumask clear and w->lock=null may not happen atomically.
>>>>>>>> if irq spinlock does not take slow path we would have non null
>>>>>>>> value
>>>>>>>> for lock, but with no information in waitingcpu.
>>>>>>>>
>>>>>>>> I am still thinking what would be problem with that.
>>>>>>>>
>>>>>>> Exactly, for kicker waiting_cpus and w->lock updates are
>>>>>>> non atomic anyway.
>>>>>>>
>>>>>>>>>> +    w->lock = NULL;
>>>>>>>>>> +    local_irq_restore(flags);
>>>>>>>>>> +    spin_time_accum_blocked(start);
>>>>>>>>>> +}
>>>>>>>>>> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
>>>>>>>>>> +
>>>>>>>>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>>>>>>>>> +static void kvm_unlock_kick(struct arch_spinlock *lock,
>>>>>>>>>> __ticket_t ticket)
>>>>>>>>>> +{
>>>>>>>>>> +    int cpu;
>>>>>>>>>> +
>>>>>>>>>> +    add_stats(RELEASED_SLOW, 1);
>>>>>>>>>> +    for_each_cpu(cpu, &waiting_cpus) {
>>>>>>>>>> +        const struct kvm_lock_waiting *w =
>>>>>>>>>> &per_cpu(lock_waiting, cpu);
>>>>>>>>>> +        if (ACCESS_ONCE(w->lock) == lock &&
>>>>>>>>>> +            ACCESS_ONCE(w->want) == ticket) {
>>>>>>>>>> +            add_stats(RELEASED_SLOW_KICKED, 1);
>>>>>>>>>> +            kvm_kick_cpu(cpu);
>>>>>>>>> What about using NMI to wake sleepers? I think it was
>>>>>>>>> discussed, but
>>>>>>>>> forgot why it was dismissed.
>>>>>>>>
>>>>>>>> I think I have missed that discussion. 'll go back and check. so
>>>>>>>> what is the idea here? we can easily wake up the halted vcpus that
>>>>>>>> have interrupt disabled?
>>>>>>> We can of course. IIRC the objection was that NMI handling path
>>>>>>> is very
>>>>>>> fragile and handling NMI on each wakeup will be more expensive then
>>>>>>> waking up a guest without injecting an event, but it is still
>>>>>>> interesting
>>>>>>> to see the numbers.
>>>>>>>
>>>>>>
>>>>>> Haam, now I remember, We had tried request based mechanism. (new
>>>>>> request like REQ_UNHALT) and process that. It had worked, but had
>>>>>> some
>>>>>> complex hacks in vcpu_enter_guest to avoid guest hang in case of
>>>>>> request cleared.  So had left it there..
>>>>>>
>>>>>> https://lkml.org/lkml/2012/4/30/67
>>>>>>
>>>>>> But I do not remember performance impact though.
>>>>> No, this is something different. Wakeup with NMI does not need KVM
>>>>> changes at
>>>>> all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
>>>>>
>>>>
>>>> True. It was not NMI.
>>>> just to confirm, are you talking about something like this to be
>>>> tried ?
>>>>
>>>> apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
>>>
>>> When I started benchmark, I started seeing
>>> "Dazed and confused, but trying to continue" from unknown nmi error
>>> handling.
>>> Did I miss anything (because we did not register any NMI handler)? or
>>> is it that spurious NMIs are trouble because we could get spurious NMIs
>>> if next waiter already acquired the lock.
>> There is a default NMI handler that tries to detect the reason why NMI
>> happened (which is no so easy on x86) and prints this message if it
>> fails. You need to add logic to detect spinlock slow path there. Check
>> bit in waiting_cpus for instance.
>
> aha.. Okay. will check that.

yes. Thanks.. that did the trick.

I did the following in unknown_nmi_error():
if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
    return;
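
As a minimal sketch of where that check sits, assuming the
unknown_nmi_error(unsigned char reason, struct pt_regs *regs) signature
of that era's arch/x86/kernel/nmi.c, and assuming the waiting_cpus mask
from the quoted kvm.c patch has been made reachable from the NMI code
(both are assumptions of this illustration, not part of the posted
series):

static void unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
{
	/*
	 * If this CPU is parked in kvm_lock_spinning(), the NMI is a
	 * pv-spinlock wakeup kick, not an error: swallow it silently.
	 */
	if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
		return;

	/* ... the existing "Dazed and confused" reporting continues ... */
}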

But I believe you suggested the NMI method only for experimental
purposes, to check the upper bound, because as I suspected above, for a
spurious NMI (i.e. when the unlocker kicks after the waiter has already
taken the lock) we would still hit the unknown NMI error.

I hit the spurious NMI over 1656 times over the entire benchmark run,
along with messages such as:
INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long
to run: 24.886 msecs

(And we cannot simply keep that bypass, because it means we skip the
unknown NMI error reporting even in genuine cases.)

Here are the results of my dbench test (32-core machine with a 32-vCPU
guest, HT off):

                  ---------- % improvement --------------
		pvspinlock      pvspin_ipi      pvspin_nmi
dbench_1x	0.9016    	0.7442    	0.7522
dbench_2x	14.7513   	18.0164   	15.9421
dbench_3x	14.7571   	17.0793   	13.3572
dbench_4x	6.3625    	8.7897    	5.3800

So I am seeing a 2-4% improvement with the IPI method in the overcommit
cases.

Gleb,
  does the current series look good to you? [I have resent one patch
with the in_nmi() check.] Or do you think I have to respin the series
with the IPI method, or are there any other concerns I have to address?
Please let me know.

PS: [Sorry for the late reply; I was quickly checking whether an unfair
lock with a lock owner is better. It did not prove to be, and so far
all the results favor this series.]


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-30 16:43                       ` Raghavendra K T
@ 2013-07-31  6:24                         ` Gleb Natapov
  -1 siblings, 0 replies; 121+ messages in thread
From: Gleb Natapov @ 2013-07-31  6:24 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: mingo, jeremy, x86, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On Tue, Jul 30, 2013 at 10:13:12PM +0530, Raghavendra K T wrote:
> On 07/25/2013 03:08 PM, Raghavendra K T wrote:
> >On 07/25/2013 02:45 PM, Gleb Natapov wrote:
> >>On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
> >>>On 07/24/2013 06:06 PM, Raghavendra K T wrote:
> >>>>On 07/24/2013 05:36 PM, Gleb Natapov wrote:
> >>>>>On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
> >>>>>>On 07/24/2013 04:09 PM, Gleb Natapov wrote:
> >>>>>>>On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
> >>>>>>>>On 07/23/2013 08:37 PM, Gleb Natapov wrote:
> >>>>>>>>>On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> >>>>>>>>>>+static void kvm_lock_spinning(struct arch_spinlock *lock,
> >>>>>>>>>>__ticket_t want)
> >>>>>>>>[...]
> >>>>>>>>>>+
> >>>>>>>>>>+    /*
> >>>>>>>>>>+     * halt until it's our turn and kicked. Note that we do safe
> >>>>>>>>>>halt
> >>>>>>>>>>+     * for irq enabled case to avoid hang when lock info is
> >>>>>>>>>>overwritten
> >>>>>>>>>>+     * in irq spinlock slowpath and no spurious interrupt occur
> >>>>>>>>>>to save us.
> >>>>>>>>>>+     */
> >>>>>>>>>>+    if (arch_irqs_disabled_flags(flags))
> >>>>>>>>>>+        halt();
> >>>>>>>>>>+    else
> >>>>>>>>>>+        safe_halt();
> >>>>>>>>>>+
> >>>>>>>>>>+out:
> >>>>>>>>>So here now interrupts can be either disabled or enabled. Previous
> >>>>>>>>>version disabled interrupts here, so are we sure it is safe to
> >>>>>>>>>have them
> >>>>>>>>>enabled at this point? I do not see any problem yet, will keep
> >>>>>>>>>thinking.
> >>>>>>>>
> >>>>>>>>If we enable interrupt here, then
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>+    cpumask_clear_cpu(cpu, &waiting_cpus);
> >>>>>>>>
> >>>>>>>>and if we start serving lock for an interrupt that came here,
> >>>>>>>>cpumask clear and w->lock=null may not happen atomically.
> >>>>>>>>if irq spinlock does not take slow path we would have non null
> >>>>>>>>value
> >>>>>>>>for lock, but with no information in waitingcpu.
> >>>>>>>>
> >>>>>>>>I am still thinking what would be problem with that.
> >>>>>>>>
> >>>>>>>Exactly, for kicker waiting_cpus and w->lock updates are
> >>>>>>>non atomic anyway.
> >>>>>>>
> >>>>>>>>>>+    w->lock = NULL;
> >>>>>>>>>>+    local_irq_restore(flags);
> >>>>>>>>>>+    spin_time_accum_blocked(start);
> >>>>>>>>>>+}
> >>>>>>>>>>+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> >>>>>>>>>>+
> >>>>>>>>>>+/* Kick vcpu waiting on @lock->head to reach value @ticket */
> >>>>>>>>>>+static void kvm_unlock_kick(struct arch_spinlock *lock,
> >>>>>>>>>>__ticket_t ticket)
> >>>>>>>>>>+{
> >>>>>>>>>>+    int cpu;
> >>>>>>>>>>+
> >>>>>>>>>>+    add_stats(RELEASED_SLOW, 1);
> >>>>>>>>>>+    for_each_cpu(cpu, &waiting_cpus) {
> >>>>>>>>>>+        const struct kvm_lock_waiting *w =
> >>>>>>>>>>&per_cpu(lock_waiting, cpu);
> >>>>>>>>>>+        if (ACCESS_ONCE(w->lock) == lock &&
> >>>>>>>>>>+            ACCESS_ONCE(w->want) == ticket) {
> >>>>>>>>>>+            add_stats(RELEASED_SLOW_KICKED, 1);
> >>>>>>>>>>+            kvm_kick_cpu(cpu);
> >>>>>>>>>What about using NMI to wake sleepers? I think it was
> >>>>>>>>>discussed, but
> >>>>>>>>>forgot why it was dismissed.
> >>>>>>>>
> >>>>>>>>I think I have missed that discussion. 'll go back and check. so
> >>>>>>>>what is the idea here? we can easily wake up the halted vcpus that
> >>>>>>>>have interrupt disabled?
> >>>>>>>We can of course. IIRC the objection was that NMI handling path
> >>>>>>>is very
> >>>>>>>fragile and handling NMI on each wakeup will be more expensive then
> >>>>>>>waking up a guest without injecting an event, but it is still
> >>>>>>>interesting
> >>>>>>>to see the numbers.
> >>>>>>>
> >>>>>>
> >>>>>>Haam, now I remember, We had tried request based mechanism. (new
> >>>>>>request like REQ_UNHALT) and process that. It had worked, but had
> >>>>>>some
> >>>>>>complex hacks in vcpu_enter_guest to avoid guest hang in case of
> >>>>>>request cleared.  So had left it there..
> >>>>>>
> >>>>>>https://lkml.org/lkml/2012/4/30/67
> >>>>>>
> >>>>>>But I do not remember performance impact though.
> >>>>>No, this is something different. Wakeup with NMI does not need KVM
> >>>>>changes at
> >>>>>all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
> >>>>>
> >>>>
> >>>>True. It was not NMI.
> >>>>just to confirm, are you talking about something like this to be
> >>>>tried ?
> >>>>
> >>>>apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
> >>>
> >>>When I started benchmark, I started seeing
> >>>"Dazed and confused, but trying to continue" from unknown nmi error
> >>>handling.
> >>>Did I miss anything (because we did not register any NMI handler)? or
> >>>is it that spurious NMIs are trouble because we could get spurious NMIs
> >>>if next waiter already acquired the lock.
> >>There is a default NMI handler that tries to detect the reason why NMI
> >>happened (which is no so easy on x86) and prints this message if it
> >>fails. You need to add logic to detect spinlock slow path there. Check
> >>bit in waiting_cpus for instance.
> >
> >aha.. Okay. will check that.
> 
> yes. Thanks.. that did the trick.
> 
> I did like below in unknown_nmi_error():
> if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
>    return;
> 
> But I believe you asked NMI method only for experimental purpose to
> check the upperbound. because as I doubted above, for spurious NMI
> (i.e. when unlocker kicks when waiter already got the lock), we would
> still hit unknown NMI error.
> 
> I had hit spurious NMI over 1656 times over entire benchmark run.
> along with
> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too
> long to run: 24.886 msecs etc...
> 
I wonder why this happens.

> (and we cannot get away with that too because it means we bypass the
> unknown NMI error even in genuine cases too)
> 
> Here was the result for the my dbench test( 32 core  machine with 32
> vcpu guest HT off)
> 
>                  ---------- % improvement --------------
> 		pvspinlock      pvspin_ipi      pvpsin_nmi
> dbench_1x	0.9016    	0.7442    	0.7522
> dbench_2x	14.7513   	18.0164   	15.9421
> dbench_3x	14.7571   	17.0793   	13.3572
> dbench_4x	6.3625    	8.7897    	5.3800
> 
> So I am seeing over 2-4% improvement with IPI method.
> 
Yeah, this was expected.

> Gleb,
>  do you think the current series looks good to you? [one patch I
> have resent with in_nmi() check] or do you think I have to respin the
> series with IPI method etc. or is there any concerns that I have to
> address. Please let me know..
> 
The current code looks fine to me.

--
			Gleb.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-07-31  6:24                         ` Gleb Natapov
@ 2013-08-01  7:38                           ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-08-01  7:38 UTC (permalink / raw)
  To: Gleb Natapov, mingo, x86, tglx
  Cc: jeremy, konrad.wilk, hpa, pbonzini, linux-doc, habanero,
	xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, kvm, linux-kernel, riel, drjones, virtualization,
	srivatsa.vaddagiri

On 07/31/2013 11:54 AM, Gleb Natapov wrote:
> On Tue, Jul 30, 2013 at 10:13:12PM +0530, Raghavendra K T wrote:
>> On 07/25/2013 03:08 PM, Raghavendra K T wrote:
>>> On 07/25/2013 02:45 PM, Gleb Natapov wrote:
>>>> On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
>>>>> On 07/24/2013 06:06 PM, Raghavendra K T wrote:
>>>>>> On 07/24/2013 05:36 PM, Gleb Natapov wrote:
>>>>>>> On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
>>>>>>>> On 07/24/2013 04:09 PM, Gleb Natapov wrote:
>>>>>>>>> On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
>>>>>>>>>> On 07/23/2013 08:37 PM, Gleb Natapov wrote:
>>>>>>>>>>> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
>>>>>>>>>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock,
>>>>>>>>>>>> __ticket_t want)
>>>>>>>>>> [...]
>>>>>>>>>>>> +
>>>>>>>>>>>> +    /*
>>>>>>>>>>>> +     * halt until it's our turn and kicked. Note that we do safe
>>>>>>>>>>>> halt
>>>>>>>>>>>> +     * for irq enabled case to avoid hang when lock info is
>>>>>>>>>>>> overwritten
>>>>>>>>>>>> +     * in irq spinlock slowpath and no spurious interrupt occur
>>>>>>>>>>>> to save us.
>>>>>>>>>>>> +     */
>>>>>>>>>>>> +    if (arch_irqs_disabled_flags(flags))
>>>>>>>>>>>> +        halt();
>>>>>>>>>>>> +    else
>>>>>>>>>>>> +        safe_halt();
>>>>>>>>>>>> +
>>>>>>>>>>>> +out:
>>>>>>>>>>> So here now interrupts can be either disabled or enabled. Previous
>>>>>>>>>>> version disabled interrupts here, so are we sure it is safe to
>>>>>>>>>>> have them
>>>>>>>>>>> enabled at this point? I do not see any problem yet, will keep
>>>>>>>>>>> thinking.
>>>>>>>>>>
>>>>>>>>>> If we enable interrupt here, then
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> +    cpumask_clear_cpu(cpu, &waiting_cpus);
>>>>>>>>>>
>>>>>>>>>> and if we start serving lock for an interrupt that came here,
>>>>>>>>>> cpumask clear and w->lock=null may not happen atomically.
>>>>>>>>>> if irq spinlock does not take slow path we would have non null
>>>>>>>>>> value
>>>>>>>>>> for lock, but with no information in waitingcpu.
>>>>>>>>>>
>>>>>>>>>> I am still thinking what would be problem with that.
>>>>>>>>>>
>>>>>>>>> Exactly, for kicker waiting_cpus and w->lock updates are
>>>>>>>>> non atomic anyway.
>>>>>>>>>
>>>>>>>>>>>> +    w->lock = NULL;
>>>>>>>>>>>> +    local_irq_restore(flags);
>>>>>>>>>>>> +    spin_time_accum_blocked(start);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
>>>>>>>>>>>> +
>>>>>>>>>>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>>>>>>>>>>> +static void kvm_unlock_kick(struct arch_spinlock *lock,
>>>>>>>>>>>> __ticket_t ticket)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    int cpu;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    add_stats(RELEASED_SLOW, 1);
>>>>>>>>>>>> +    for_each_cpu(cpu, &waiting_cpus) {
>>>>>>>>>>>> +        const struct kvm_lock_waiting *w =
>>>>>>>>>>>> &per_cpu(lock_waiting, cpu);
>>>>>>>>>>>> +        if (ACCESS_ONCE(w->lock) == lock &&
>>>>>>>>>>>> +            ACCESS_ONCE(w->want) == ticket) {
>>>>>>>>>>>> +            add_stats(RELEASED_SLOW_KICKED, 1);
>>>>>>>>>>>> +            kvm_kick_cpu(cpu);
>>>>>>>>>>> What about using NMI to wake sleepers? I think it was
>>>>>>>>>>> discussed, but
>>>>>>>>>>> forgot why it was dismissed.
>>>>>>>>>>
>>>>>>>>>> I think I have missed that discussion. 'll go back and check. so
>>>>>>>>>> what is the idea here? we can easily wake up the halted vcpus that
>>>>>>>>>> have interrupt disabled?
>>>>>>>>> We can of course. IIRC the objection was that NMI handling path
>>>>>>>>> is very
>>>>>>>>> fragile and handling NMI on each wakeup will be more expensive then
>>>>>>>>> waking up a guest without injecting an event, but it is still
>>>>>>>>> interesting
>>>>>>>>> to see the numbers.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Haam, now I remember, We had tried request based mechanism. (new
>>>>>>>> request like REQ_UNHALT) and process that. It had worked, but had
>>>>>>>> some
>>>>>>>> complex hacks in vcpu_enter_guest to avoid guest hang in case of
>>>>>>>> request cleared.  So had left it there..
>>>>>>>>
>>>>>>>> https://lkml.org/lkml/2012/4/30/67
>>>>>>>>
>>>>>>>> But I do not remember performance impact though.
>>>>>>> No, this is something different. Wakeup with NMI does not need KVM
>>>>>>> changes at
>>>>>>> all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
>>>>>>>
>>>>>>
>>>>>> True. It was not NMI.
>>>>>> just to confirm, are you talking about something like this to be
>>>>>> tried ?
>>>>>>
>>>>>> apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
>>>>>
>>>>> When I started benchmark, I started seeing
>>>>> "Dazed and confused, but trying to continue" from unknown nmi error
>>>>> handling.
>>>>> Did I miss anything (because we did not register any NMI handler)? or
>>>>> is it that spurious NMIs are trouble because we could get spurious NMIs
>>>>> if next waiter already acquired the lock.
>>>> There is a default NMI handler that tries to detect the reason why NMI
>>>> happened (which is no so easy on x86) and prints this message if it
>>>> fails. You need to add logic to detect spinlock slow path there. Check
>>>> bit in waiting_cpus for instance.
>>>
>>> aha.. Okay. will check that.
>>
>> yes. Thanks.. that did the trick.
>>
>> I did like below in unknown_nmi_error():
>> if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
>>     return;
>>
>> But I believe you asked NMI method only for experimental purpose to
>> check the upperbound. because as I doubted above, for spurious NMI
>> (i.e. when unlocker kicks when waiter already got the lock), we would
>> still hit unknown NMI error.
>>
>> I had hit spurious NMI over 1656 times over entire benchmark run.
>> along with
>> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too
>> long to run: 24.886 msecs etc...
>>
> I wonder why this happens.
>
>> (and we cannot get away with that too because it means we bypass the
>> unknown NMI error even in genuine cases too)
>>
>> Here was the result for the my dbench test( 32 core  machine with 32
>> vcpu guest HT off)
>>
>>                   ---------- % improvement --------------
>> 		pvspinlock      pvspin_ipi      pvpsin_nmi
>> dbench_1x	0.9016    	0.7442    	0.7522
>> dbench_2x	14.7513   	18.0164   	15.9421
>> dbench_3x	14.7571   	17.0793   	13.3572
>> dbench_4x	6.3625    	8.7897    	5.3800
>>
>> So I am seeing over 2-4% improvement with IPI method.
>>
> Yeah, this was expected.
>
>> Gleb,
>>   do you think the current series looks good to you? [one patch I
>> have resent with in_nmi() check] or do you think I have to respin the
>> series with IPI method etc. or is there any concerns that I have to
>> address. Please let me know..
>>
> The current code looks fine to me.

Gleb,

Shall I consider this as an ack for the kvm part?

Ingo,

Do you have any concerns regarding this series? Please let me know if
it looks good to you now.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-01  7:38                           ` Raghavendra K T
@ 2013-08-01  7:45                             ` Gleb Natapov
  -1 siblings, 0 replies; 121+ messages in thread
From: Gleb Natapov @ 2013-08-01  7:45 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: mingo, x86, tglx, jeremy, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, kvm, linux-kernel, riel, drjones, virtualization,
	srivatsa.vaddagiri

On Thu, Aug 01, 2013 at 01:08:47PM +0530, Raghavendra K T wrote:
> On 07/31/2013 11:54 AM, Gleb Natapov wrote:
> >On Tue, Jul 30, 2013 at 10:13:12PM +0530, Raghavendra K T wrote:
> >>On 07/25/2013 03:08 PM, Raghavendra K T wrote:
> >>>On 07/25/2013 02:45 PM, Gleb Natapov wrote:
> >>>>On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
> >>>>>On 07/24/2013 06:06 PM, Raghavendra K T wrote:
> >>>>>>On 07/24/2013 05:36 PM, Gleb Natapov wrote:
> >>>>>>>On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
> >>>>>>>>On 07/24/2013 04:09 PM, Gleb Natapov wrote:
> >>>>>>>>>On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
> >>>>>>>>>>On 07/23/2013 08:37 PM, Gleb Natapov wrote:
> >>>>>>>>>>>On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> >>>>>>>>>>>>+static void kvm_lock_spinning(struct arch_spinlock *lock,
> >>>>>>>>>>>>__ticket_t want)
> >>>>>>>>>>[...]
> >>>>>>>>>>>>+
> >>>>>>>>>>>>+    /*
> >>>>>>>>>>>>+     * halt until it's our turn and kicked. Note that we do safe
> >>>>>>>>>>>>halt
> >>>>>>>>>>>>+     * for irq enabled case to avoid hang when lock info is
> >>>>>>>>>>>>overwritten
> >>>>>>>>>>>>+     * in irq spinlock slowpath and no spurious interrupt occur
> >>>>>>>>>>>>to save us.
> >>>>>>>>>>>>+     */
> >>>>>>>>>>>>+    if (arch_irqs_disabled_flags(flags))
> >>>>>>>>>>>>+        halt();
> >>>>>>>>>>>>+    else
> >>>>>>>>>>>>+        safe_halt();
> >>>>>>>>>>>>+
> >>>>>>>>>>>>+out:
> >>>>>>>>>>>So here now interrupts can be either disabled or enabled. Previous
> >>>>>>>>>>>version disabled interrupts here, so are we sure it is safe to
> >>>>>>>>>>>have them
> >>>>>>>>>>>enabled at this point? I do not see any problem yet, will keep
> >>>>>>>>>>>thinking.
> >>>>>>>>>>
> >>>>>>>>>>If we enable interrupt here, then
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>+    cpumask_clear_cpu(cpu, &waiting_cpus);
> >>>>>>>>>>
> >>>>>>>>>>and if we start serving lock for an interrupt that came here,
> >>>>>>>>>>cpumask clear and w->lock=null may not happen atomically.
> >>>>>>>>>>if irq spinlock does not take slow path we would have non null
> >>>>>>>>>>value
> >>>>>>>>>>for lock, but with no information in waitingcpu.
> >>>>>>>>>>
> >>>>>>>>>>I am still thinking what would be problem with that.
> >>>>>>>>>>
> >>>>>>>>>Exactly, for kicker waiting_cpus and w->lock updates are
> >>>>>>>>>non atomic anyway.
> >>>>>>>>>
> >>>>>>>>>>>>+    w->lock = NULL;
> >>>>>>>>>>>>+    local_irq_restore(flags);
> >>>>>>>>>>>>+    spin_time_accum_blocked(start);
> >>>>>>>>>>>>+}
> >>>>>>>>>>>>+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
> >>>>>>>>>>>>+
> >>>>>>>>>>>>+/* Kick vcpu waiting on @lock->head to reach value @ticket */
> >>>>>>>>>>>>+static void kvm_unlock_kick(struct arch_spinlock *lock,
> >>>>>>>>>>>>__ticket_t ticket)
> >>>>>>>>>>>>+{
> >>>>>>>>>>>>+    int cpu;
> >>>>>>>>>>>>+
> >>>>>>>>>>>>+    add_stats(RELEASED_SLOW, 1);
> >>>>>>>>>>>>+    for_each_cpu(cpu, &waiting_cpus) {
> >>>>>>>>>>>>+        const struct kvm_lock_waiting *w =
> >>>>>>>>>>>>&per_cpu(lock_waiting, cpu);
> >>>>>>>>>>>>+        if (ACCESS_ONCE(w->lock) == lock &&
> >>>>>>>>>>>>+            ACCESS_ONCE(w->want) == ticket) {
> >>>>>>>>>>>>+            add_stats(RELEASED_SLOW_KICKED, 1);
> >>>>>>>>>>>>+            kvm_kick_cpu(cpu);
> >>>>>>>>>>>What about using NMI to wake sleepers? I think it was
> >>>>>>>>>>>discussed, but
> >>>>>>>>>>>forgot why it was dismissed.
> >>>>>>>>>>
> >>>>>>>>>>I think I have missed that discussion. 'll go back and check. so
> >>>>>>>>>>what is the idea here? we can easily wake up the halted vcpus that
> >>>>>>>>>>have interrupt disabled?
> >>>>>>>>>We can of course. IIRC the objection was that NMI handling path
> >>>>>>>>>is very
> >>>>>>>>>fragile and handling NMI on each wakeup will be more expensive than
> >>>>>>>>>waking up a guest without injecting an event, but it is still
> >>>>>>>>>interesting
> >>>>>>>>>to see the numbers.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>Haam, now I remember, We had tried request based mechanism. (new
> >>>>>>>>request like REQ_UNHALT) and process that. It had worked, but had
> >>>>>>>>some
> >>>>>>>>complex hacks in vcpu_enter_guest to avoid guest hang in case of
> >>>>>>>>request cleared.  So had left it there..
> >>>>>>>>
> >>>>>>>>https://lkml.org/lkml/2012/4/30/67
> >>>>>>>>
> >>>>>>>>But I do not remember performance impact though.
> >>>>>>>No, this is something different. Wakeup with NMI does not need KVM
> >>>>>>>changes at
> >>>>>>>all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
> >>>>>>>
> >>>>>>
> >>>>>>True. It was not NMI.
> >>>>>>just to confirm, are you talking about something like this to be
> >>>>>>tried ?
> >>>>>>
> >>>>>>apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
> >>>>>
> >>>>>When I started benchmark, I started seeing
> >>>>>"Dazed and confused, but trying to continue" from unknown nmi error
> >>>>>handling.
> >>>>>Did I miss anything (because we did not register any NMI handler)? or
> >>>>>is it that spurious NMIs are trouble because we could get spurious NMIs
> >>>>>if next waiter already acquired the lock.
> >>>>There is a default NMI handler that tries to detect the reason why NMI
> >>>>happened (which is not so easy on x86) and prints this message if it
> >>>>fails. You need to add logic to detect spinlock slow path there. Check
> >>>>bit in waiting_cpus for instance.
> >>>
> >>>aha.. Okay. will check that.
> >>
> >>yes. Thanks.. that did the trick.
> >>
> >>I did like below in unknown_nmi_error():
> >>if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
> >>    return;
> >>
> >>But I believe you asked NMI method only for experimental purpose to
> >>check the upperbound. because as I doubted above, for spurious NMI
> >>(i.e. when unlocker kicks when waiter already got the lock), we would
> >>still hit unknown NMI error.
> >>
> >>I had hit spurious NMI over 1656 times over entire benchmark run.
> >>along with
> >>INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too
> >>long to run: 24.886 msecs etc...
> >>
> >I wonder why this happens.
> >
> >>(and we cannot get away with that too because it means we bypass the
> >>unknown NMI error even in genuine cases too)
> >>
> >>Here was the result for the my dbench test( 32 core  machine with 32
> >>vcpu guest HT off)
> >>
> >>                  ---------- % improvement --------------
> >>		pvspinlock      pvspin_ipi      pvspin_nmi
> >>dbench_1x	0.9016    	0.7442    	0.7522
> >>dbench_2x	14.7513   	18.0164   	15.9421
> >>dbench_3x	14.7571   	17.0793   	13.3572
> >>dbench_4x	6.3625    	8.7897    	5.3800
> >>
> >>So I am seeing over 2-4% improvement with IPI method.
> >>
> >Yeah, this was expected.
> >
> >>Gleb,
> >>  do you think the current series looks good to you? [one patch I
> >>have resent with in_nmi() check] or do you think I have to respin the
> >>series with IPI method etc. or is there any concerns that I have to
> >>address. Please let me know..
> >>
> >The current code looks fine to me.
> 
> Gleb,
> 
> Shall I consider this as an ack for kvm part?
> 
For everything except 18/18. For that I still want to see numbers. But
18/18 is pretty independent of the rest of the series, so it should
not stop the rest from going in.

--
			Gleb.

^ permalink raw reply	[flat|nested] 121+ messages in thread
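
A minimal sketch of the NMI-kick experiment discussed in the message above,
for readers who want the two halves side by side. It is illustrative only,
not the posted patch: the helper names (kvm_kick_cpu_nmi, spinlock_kick_nmi),
the use of NMI_VECTOR rather than the APIC_DM_NMI spelling quoted in the
thread, and registering a handler via register_nmi_handler() instead of the
early return patched into unknown_nmi_error() are all assumptions made here
for illustration.

/* Sketch only: experimental NMI-based kick, not the code under review. */
#include <linux/cpumask.h>
#include <linux/smp.h>
#include <linux/nmi.h>
#include <asm/apic.h>

/* CPUs parked in the spinlock slowpath; set before halt(), cleared on wakeup. */
static cpumask_t waiting_cpus;

/* Unlock side: send an NMI IPI instead of the kick hypercall. */
static void kvm_kick_cpu_nmi(int cpu)
{
	apic->send_IPI_mask(cpumask_of(cpu), NMI_VECTOR);
}

/*
 * Halt side: claim NMIs aimed at a CPU that is waiting in the spinlock
 * slowpath, so the unknown-NMI path does not print "Dazed and confused,
 * but trying to continue".
 */
static int spinlock_kick_nmi(unsigned int type, struct pt_regs *regs)
{
	if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
		return NMI_HANDLED;	/* most likely a spinlock kick */
	return NMI_DONE;
}

static void __init spinlock_nmi_init(void)
{
	register_nmi_handler(NMI_UNKNOWN, spinlock_kick_nmi, 0, "pv_lock_kick");
}

The drawback the thread converged on remains in this form as well: a kick
that arrives after the waiter has already taken the lock and left
waiting_cpus is indistinguishable from a genuine unknown NMI.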

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-01  7:45                             ` Gleb Natapov
  (?)
@ 2013-08-01  9:04                             ` Raghavendra K T
  2013-08-02  3:22                                 ` Raghavendra K T
  -1 siblings, 1 reply; 121+ messages in thread
From: Raghavendra K T @ 2013-08-01  9:04 UTC (permalink / raw)
  To: Gleb Natapov, mingo
  Cc: x86, tglx, jeremy, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, kvm, linux-kernel, riel, drjones, virtualization,
	srivatsa.vaddagiri

On 08/01/2013 01:15 PM, Gleb Natapov wrote:
> On Thu, Aug 01, 2013 at 01:08:47PM +0530, Raghavendra K T wrote:
>> On 07/31/2013 11:54 AM, Gleb Natapov wrote:
>>> On Tue, Jul 30, 2013 at 10:13:12PM +0530, Raghavendra K T wrote:
>>>> On 07/25/2013 03:08 PM, Raghavendra K T wrote:
>>>>> On 07/25/2013 02:45 PM, Gleb Natapov wrote:
>>>>>> On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
>>>>>>> On 07/24/2013 06:06 PM, Raghavendra K T wrote:
>>>>>>>> On 07/24/2013 05:36 PM, Gleb Natapov wrote:
>>>>>>>>> On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
>>>>>>>>>> On 07/24/2013 04:09 PM, Gleb Natapov wrote:
>>>>>>>>>>> On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
>>>>>>>>>>>> On 07/23/2013 08:37 PM, Gleb Natapov wrote:
>>>>>>>>>>>>> On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
>>>>>>>>>>>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock,
>>>>>>>>>>>>>> __ticket_t want)
>>>>>>>>>>>> [...]
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    /*
>>>>>>>>>>>>>> +     * halt until it's our turn and kicked. Note that we do safe
>>>>>>>>>>>>>> halt
>>>>>>>>>>>>>> +     * for irq enabled case to avoid hang when lock info is
>>>>>>>>>>>>>> overwritten
>>>>>>>>>>>>>> +     * in irq spinlock slowpath and no spurious interrupt occur
>>>>>>>>>>>>>> to save us.
>>>>>>>>>>>>>> +     */
>>>>>>>>>>>>>> +    if (arch_irqs_disabled_flags(flags))
>>>>>>>>>>>>>> +        halt();
>>>>>>>>>>>>>> +    else
>>>>>>>>>>>>>> +        safe_halt();
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +out:
>>>>>>>>>>>>> So here now interrupts can be either disabled or enabled. Previous
>>>>>>>>>>>>> version disabled interrupts here, so are we sure it is safe to
>>>>>>>>>>>>> have them
>>>>>>>>>>>>> enabled at this point? I do not see any problem yet, will keep
>>>>>>>>>>>>> thinking.
>>>>>>>>>>>>
>>>>>>>>>>>> If we enable interrupt here, then
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> +    cpumask_clear_cpu(cpu, &waiting_cpus);
>>>>>>>>>>>>
>>>>>>>>>>>> and if we start serving lock for an interrupt that came here,
>>>>>>>>>>>> cpumask clear and w->lock=null may not happen atomically.
>>>>>>>>>>>> if irq spinlock does not take slow path we would have non null
>>>>>>>>>>>> value
>>>>>>>>>>>> for lock, but with no information in waitingcpu.
>>>>>>>>>>>>
>>>>>>>>>>>> I am still thinking what would be problem with that.
>>>>>>>>>>>>
>>>>>>>>>>> Exactly, for kicker waiting_cpus and w->lock updates are
>>>>>>>>>>> non atomic anyway.
>>>>>>>>>>>
>>>>>>>>>>>>>> +    w->lock = NULL;
>>>>>>>>>>>>>> +    local_irq_restore(flags);
>>>>>>>>>>>>>> +    spin_time_accum_blocked(start);
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>>>>>>>>>>>>> +static void kvm_unlock_kick(struct arch_spinlock *lock,
>>>>>>>>>>>>>> __ticket_t ticket)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +    int cpu;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    add_stats(RELEASED_SLOW, 1);
>>>>>>>>>>>>>> +    for_each_cpu(cpu, &waiting_cpus) {
>>>>>>>>>>>>>> +        const struct kvm_lock_waiting *w =
>>>>>>>>>>>>>> &per_cpu(lock_waiting, cpu);
>>>>>>>>>>>>>> +        if (ACCESS_ONCE(w->lock) == lock &&
>>>>>>>>>>>>>> +            ACCESS_ONCE(w->want) == ticket) {
>>>>>>>>>>>>>> +            add_stats(RELEASED_SLOW_KICKED, 1);
>>>>>>>>>>>>>> +            kvm_kick_cpu(cpu);
>>>>>>>>>>>>> What about using NMI to wake sleepers? I think it was
>>>>>>>>>>>>> discussed, but
>>>>>>>>>>>>> forgot why it was dismissed.
>>>>>>>>>>>>
>>>>>>>>>>>> I think I have missed that discussion. 'll go back and check. so
>>>>>>>>>>>> what is the idea here? we can easily wake up the halted vcpus that
>>>>>>>>>>>> have interrupt disabled?
>>>>>>>>>>> We can of course. IIRC the objection was that NMI handling path
>>>>>>>>>>> is very
>>>>>>>>>>> fragile and handling NMI on each wakeup will be more expensive than
>>>>>>>>>>> waking up a guest without injecting an event, but it is still
>>>>>>>>>>> interesting
>>>>>>>>>>> to see the numbers.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Haam, now I remember, We had tried request based mechanism. (new
>>>>>>>>>> request like REQ_UNHALT) and process that. It had worked, but had
>>>>>>>>>> some
>>>>>>>>>> complex hacks in vcpu_enter_guest to avoid guest hang in case of
>>>>>>>>>> request cleared.  So had left it there..
>>>>>>>>>>
>>>>>>>>>> https://lkml.org/lkml/2012/4/30/67
>>>>>>>>>>
>>>>>>>>>> But I do not remember performance impact though.
>>>>>>>>> No, this is something different. Wakeup with NMI does not need KVM
>>>>>>>>> changes at
>>>>>>>>> all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.
>>>>>>>>>
>>>>>>>>
>>>>>>>> True. It was not NMI.
>>>>>>>> just to confirm, are you talking about something like this to be
>>>>>>>> tried ?
>>>>>>>>
>>>>>>>> apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
>>>>>>>
>>>>>>> When I started benchmark, I started seeing
>>>>>>> "Dazed and confused, but trying to continue" from unknown nmi error
>>>>>>> handling.
>>>>>>> Did I miss anything (because we did not register any NMI handler)? or
>>>>>>> is it that spurious NMIs are trouble because we could get spurious NMIs
>>>>>>> if next waiter already acquired the lock.
>>>>>> There is a default NMI handler that tries to detect the reason why NMI
>>>>>> happened (which is not so easy on x86) and prints this message if it
>>>>>> fails. You need to add logic to detect spinlock slow path there. Check
>>>>>> bit in waiting_cpus for instance.
>>>>>
>>>>> aha.. Okay. will check that.
>>>>
>>>> yes. Thanks.. that did the trick.
>>>>
>>>> I did like below in unknown_nmi_error():
>>>> if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
>>>>     return;
>>>>
>>>> But I believe you asked NMI method only for experimental purpose to
>>>> check the upperbound. because as I doubted above, for spurious NMI
>>>> (i.e. when unlocker kicks when waiter already got the lock), we would
>>>> still hit unknown NMI error.
>>>>
>>>> I had hit spurious NMI over 1656 times over entire benchmark run.
>>>> along with
>>>> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too
>>>> long to run: 24.886 msecs etc...
>>>>
>>> I wonder why this happens.
>>>
>>>> (and we cannot get away with that too because it means we bypass the
>>>> unknown NMI error even in genuine cases too)
>>>>
>>>> Here was the result for the my dbench test( 32 core  machine with 32
>>>> vcpu guest HT off)
>>>>
>>>>                   ---------- % improvement --------------
>>>> 		pvspinlock      pvspin_ipi      pvspin_nmi
>>>> dbench_1x	0.9016    	0.7442    	0.7522
>>>> dbench_2x	14.7513   	18.0164   	15.9421
>>>> dbench_3x	14.7571   	17.0793   	13.3572
>>>> dbench_4x	6.3625    	8.7897    	5.3800
>>>>
>>>> So I am seeing over 2-4% improvement with IPI method.
>>>>
>>> Yeah, this was expected.
>>>
>>>> Gleb,
>>>>   do you think the current series looks good to you? [one patch I
>>>> have resent with in_nmi() check] or do you think I have to respin the
>>>> series with IPI method etc. or is there any concerns that I have to
>>>> address. Please let me know..
>>>>
>>> The current code looks fine to me.
>>
>> Gleb,
>>
>> Shall I consider this as an ack for kvm part?
>>
> For everything except 18/18. For that I still want to see numbers. But
> 18/18 is pretty independent of the rest of the series, so it should
> not stop the rest from going in.

Yes, agreed.
I am going to evaluate patch 18 separately and come back with results for
that. For now we can consider only patches 1-17.




^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-01  9:04                             ` Raghavendra K T
@ 2013-08-02  3:22                                 ` Raghavendra K T
  0 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-08-02  3:22 UTC (permalink / raw)
  To: Gleb Natapov, mingo
  Cc: x86, tglx, jeremy, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, kvm, linux-kernel, riel, drjones, virtualization,
	srivatsa.vaddagiri

On 08/01/2013 02:34 PM, Raghavendra K T wrote:
> On 08/01/2013 01:15 PM, Gleb Natapov wrote:
>>> Shall I consider this as an ack for kvm part?
>>>
>> For everything except 18/18. For that I still want to see numbers. But
>> 18/18 is pretty independent of the rest of the series, so it should
>> not stop the rest from going in.
>
> Yes. agreed.
> I am going to evaluate patch 18 separately and come with results for
> that. Now we can consider only 1-17 patches.
>

Gleb,

32 core machine with HT off 32 vcpu guests.
base = 3.11-rc + patch 1 -17 pvspinlock_v11
patched = base + patch 18

+-----------+-----------+-----------+------------+-----------+
                   dbench  (Throughput in MB/sec higher is better)
+-----------+-----------+-----------+------------+-----------+
       base      stdev       patched    stdev       %improvement
+-----------+-----------+-----------+------------+-----------+
1x 14584.3800   146.9074   14705.1000   163.1060     0.82773
2x  1713.7300    32.8750    1717.3200    45.5979     0.20948
3x   967.8212    42.0257     971.8855    18.8532     0.41994
4x   685.2764    25.7150     694.5881     8.3907     1.35882
+-----------+-----------+-----------+------------+-----------+

I saw -0.78% to +0.84% changes in ebizzy and a 1-2% improvement in
hackbench.


^ permalink raw reply	[flat|nested] 121+ messages in thread
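
As an aside for anyone re-deriving the table above, the %improvement column
is the relative change of the patched mean over the base mean. A quick
stand-alone check against the 1x row (the variable names are mine, not from
the thread):

#include <stdio.h>

int main(void)
{
	/* 1x dbench row from the table above (MB/sec). */
	double base = 14584.3800, patched = 14705.1000;

	/* Prints 0.82773, matching the table. */
	printf("%%improvement = %.5f\n", 100.0 * (patched - base) / base);
	return 0;
}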

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-02  3:22                                 ` Raghavendra K T
@ 2013-08-02  9:23                                   ` Ingo Molnar
  -1 siblings, 0 replies; 121+ messages in thread
From: Ingo Molnar @ 2013-08-02  9:23 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Gleb Natapov, mingo, x86, tglx, jeremy, konrad.wilk, hpa,
	pbonzini, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri


* Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:

> On 08/01/2013 02:34 PM, Raghavendra K T wrote:
> >On 08/01/2013 01:15 PM, Gleb Natapov wrote:
> >>>Shall I consider this as an ack for kvm part?
> >>>
> >>For everything except 18/18. For that I still want to see numbers. But
> >>18/18 is pretty independent of the rest of the series, so it should
> >>not stop the rest from going in.
> >
> >Yes. agreed.
> >I am going to evaluate patch 18 separately and come with results for
> >that. Now we can consider only 1-17 patches.
> >
> 
> Gleb,
> 
> 32 core machine with HT off 32 vcpu guests.
> base = 3.11-rc + patch 1 -17 pvspinlock_v11
> patched = base + patch 18
> 
> +-----------+-----------+-----------+------------+-----------+
>                   dbench  (Throughput in MB/sec higher is better)
> +-----------+-----------+-----------+------------+-----------+
>       base      stdev       patched    stdev       %improvement
> +-----------+-----------+-----------+------------+-----------+
> 1x 14584.3800   146.9074   14705.1000   163.1060     0.82773
> 2x  1713.7300    32.8750    1717.3200    45.5979     0.20948
> 3x   967.8212    42.0257     971.8855    18.8532     0.41994
> 4x   685.2764    25.7150     694.5881     8.3907     1.35882
> +-----------+-----------+-----------+------------+-----------+

Please list stddev in percentage as well ...

a blind stab gave me these figures:

>       base      stdev       patched    stdev       %improvement
> 3x   967.8212    4.3%     971.8855      1.8%     0.4

That makes the improvement an order of magnitude smaller than the noise of 
the measurement ... i.e. totally inconclusive.

Also please cut the excessive decimal points: with 2-4% noise what point 
is there in 5 decimal point results??

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 121+ messages in thread
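
The figures Ingo derives above are just the sample standard deviation
expressed as a percentage of the corresponding mean. A small stand-alone
check against the 3x row of the quoted table (variable names are mine, not
from the thread):

#include <stdio.h>

int main(void)
{
	/* 3x dbench row from the quoted table (MB/sec). */
	double base = 967.8212, base_sd = 42.0257;
	double patched = 971.8855, patched_sd = 18.8532;

	printf("base  noise: %.1f%%\n", 100.0 * base_sd / base);           /* ~4.3% */
	printf("patch noise: %.1f%%\n", 100.0 * patched_sd / patched);     /* ~1.9% */
	printf("improvement: %.1f%%\n", 100.0 * (patched - base) / base);  /* ~0.4% */
	return 0;
}

This lands close to the 4.3% / 1.8% figures in the reply above (the patched
side rounds to about 1.9% here), and shows the ~0.4% improvement sitting well
inside the measurement noise, which is the point being made.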

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-01  7:38                           ` Raghavendra K T
@ 2013-08-02  9:25                             ` Ingo Molnar
  -1 siblings, 0 replies; 121+ messages in thread
From: Ingo Molnar @ 2013-08-02  9:25 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Gleb Natapov, mingo, x86, tglx, jeremy, konrad.wilk, hpa,
	pbonzini, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri


* Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:

> On 07/31/2013 11:54 AM, Gleb Natapov wrote:
> >On Tue, Jul 30, 2013 at 10:13:12PM +0530, Raghavendra K T wrote:
> >>On 07/25/2013 03:08 PM, Raghavendra K T wrote:
> >>>On 07/25/2013 02:45 PM, Gleb Natapov wrote:
> >>>>On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
> >>>>>On 07/24/2013 06:06 PM, Raghavendra K T wrote:
> >>>>>>On 07/24/2013 05:36 PM, Gleb Natapov wrote:
> >>>>>>>On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
> >>>>>>>>On 07/24/2013 04:09 PM, Gleb Natapov wrote:
> >>>>>>>>>On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
> >>>>>>>>>>On 07/23/2013 08:37 PM, Gleb Natapov wrote:
> >>>>>>>>>>>On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
> >>>>>>>>>>>>+static void kvm_lock_spinning(struct arch_spinlock *lock,
> >>>>>>>>>>>>__ticket_t want)
> >>>>>>>>>>[...]

[ a few hundred lines of unnecessary quotation snipped. ]

> Ingo,
> 
> Do you have any concerns reg this series? please let me know if this 
> looks good now to you.

I'm inclined to NAK it for excessive quotation - who knows how many people 
left the discussion in disgust? Was it done to drive away as many 
reviewers as possible?

Anyway, see my other reply, the measurement results seem hard to interpret 
and inconclusive at the moment.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-02  9:23                                   ` Ingo Molnar
@ 2013-08-02  9:44                                     ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-08-02  9:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gleb Natapov, mingo, x86, tglx, jeremy, konrad.wilk, hpa,
	pbonzini, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri

On 08/02/2013 02:53 PM, Ingo Molnar wrote:
>
> * Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> wrote:
>
>> On 08/01/2013 02:34 PM, Raghavendra K T wrote:
>>> On 08/01/2013 01:15 PM, Gleb Natapov wrote:
>>>>> Shall I consider this as an ack for kvm part?
>>>>>
>>>> For everything except 18/18. For that I still want to see numbers. But
>>>> 18/18 is pretty independent of the rest of the series, so it should
>>>> not stop the rest from going in.
>>>
>>> Yes. agreed.
>>> I am going to evaluate patch 18 separately and come with results for
>>> that. Now we can consider only 1-17 patches.
>>>
>>
>> Gleb,
>>
>> 32 core machine with HT off 32 vcpu guests.
>> base = 3.11-rc + patch 1 -17 pvspinlock_v11
>> patched = base + patch 18
>>
>> +-----------+-----------+-----------+------------+-----------+
>>                    dbench  (Throughput in MB/sec higher is better)
>> +-----------+-----------+-----------+------------+-----------+
>>        base      stdev       patched    stdev       %improvement
>> +-----------+-----------+-----------+------------+-----------+
>> 1x 14584.3800   146.9074   14705.1000   163.1060     0.82773
>> 2x  1713.7300    32.8750    1717.3200    45.5979     0.20948
>> 3x   967.8212    42.0257     971.8855    18.8532     0.41994
>> 4x   685.2764    25.7150     694.5881     8.3907     1.35882
>> +-----------+-----------+-----------+------------+-----------+
>
> Please list stddev in percentage as well ...

Sure, will do this from next time.

>
> a blind stab gave me these figures:
>
>>        base      stdev       patched    stdev       %improvement
>> 3x   967.8212    4.3%     971.8855      1.8%     0.4
>
> That makes the improvement an order of magnitude smaller than the noise of
> the measurement ... i.e. totally inconclusive.

Okay, agreed.

I had always seen a positive effect from the patch, since it uses the PLE
handler heuristics and thus avoids the directed yield to vCPUs in the
halt handler. But the current results clearly do not show anything
in its favor. :(

So please drop patch 18 for now.

>
> Also please cut the excessive decimal points: with 2-4% noise what point
> is there in 5 decimal point results??

Yes.

Ingo, do you think the patch series (patches 1 to 17) is now in good
shape? Please let me know if you have any concerns to be
addressed.




^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-02  9:25                             ` Ingo Molnar
@ 2013-08-02  9:54                               ` Gleb Natapov
  -1 siblings, 0 replies; 121+ messages in thread
From: Gleb Natapov @ 2013-08-02  9:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Raghavendra K T, mingo, x86, tglx, jeremy, konrad.wilk, hpa,
	pbonzini, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri

On Fri, Aug 02, 2013 at 11:25:39AM +0200, Ingo Molnar wrote:
> > Ingo,
> > 
> > Do you have any concerns reg this series? please let me know if this 
> > looks good now to you.
> 
> I'm inclined to NAK it for excessive quotation - who knows how many people 
> left the discussion in disgust? Was it done to drive away as many 
> reviewers as possible?
> 
> Anyway, see my other reply, the measurement results seem hard to interpret 
> and inconclusive at the moment.
> 
That result was only for patch 18 of the series, not pvspinlock in
general.

--
			Gleb.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-02  9:54                               ` Gleb Natapov
@ 2013-08-02 10:57                                 ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-08-02 10:57 UTC (permalink / raw)
  To: Gleb Natapov, Ingo Molnar, peterz
  Cc: mingo, x86, tglx, jeremy, konrad.wilk, hpa, pbonzini, linux-doc,
	habanero, xen-devel, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, kvm, linux-kernel, riel, drjones, virtualization,
	srivatsa.vaddagiri

On 08/02/2013 03:24 PM, Gleb Natapov wrote:
> On Fri, Aug 02, 2013 at 11:25:39AM +0200, Ingo Molnar wrote:
>>> Ingo,
>>>
>>> Do you have any concerns reg this series? please let me know if this
>>> looks good now to you.
>>
>> I'm inclined to NAK it for excessive quotation - who knows how many people
>> left the discussion in disgust? Was it done to drive away as many
>> reviewers as possible?

Ingo, Peter,
Sorry for the confusion caused by the nesting. I should have trimmed
it, as Peter also pointed out in the other thread.
Will ensure that in future mails.


>> Anyway, see my other reply, the measurement results seem hard to interpret
>> and inconclusive at the moment.

As Gleb already pointed out, patches 1-17 as a whole give a good performance
improvement. It was only patch 18 that Gleb had concerns about.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-02  9:54                               ` Gleb Natapov
@ 2013-08-05  9:46                                 ` Ingo Molnar
  -1 siblings, 0 replies; 121+ messages in thread
From: Ingo Molnar @ 2013-08-05  9:46 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Raghavendra K T, mingo, x86, tglx, jeremy, konrad.wilk, hpa,
	pbonzini, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri


* Gleb Natapov <gleb@redhat.com> wrote:

> On Fri, Aug 02, 2013 at 11:25:39AM +0200, Ingo Molnar wrote:
> > > Ingo,
> > > 
> > > Do you have any concerns reg this series? please let me know if this 
> > > looks good now to you.
> > 
> > I'm inclined to NAK it for excessive quotation - who knows how many 
> > people left the discussion in disgust? Was it done to drive away as 
> > many reviewers as possible?
> > 
> > Anyway, see my other reply, the measurement results seem hard to 
> > interpret and inconclusive at the moment.
>
> That result was only for patch 18 of the series, not pvspinlock in 
> general.

Okay - I've re-read the performance numbers and they are impressive, so no 
objections from me.

The x86 impact seems to be a straightforward API change, with most of the 
changes on the virtualization side. So:

Acked-by: Ingo Molnar <mingo@kernel.org>

I guess you'd want to carry this in the KVM tree or so - maybe in a 
separate branch because it changes Xen as well?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-05  9:46                                 ` Ingo Molnar
@ 2013-08-05 10:42                                   ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-08-05 10:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gleb Natapov, mingo, x86, tglx, jeremy, konrad.wilk, hpa,
	pbonzini, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri


>> That result was only for patch 18 of the series, not pvspinlock in
>> general.
>
> Okay - I've re-read the performance numbers and they are impressive, so no
> objections from me.
>
> The x86 impact seems to be a straightforward API change, with most of the
> changes on the virtualization side. So:
>
> Acked-by: Ingo Molnar <mingo@kernel.org>
>
> I guess you'd want to carry this in the KVM tree or so - maybe in a
> separate branch because it changes Xen as well?
>

Thank you, Ingo, for taking another look.

Gleb, please let me know if you want me to resend the first 17 patches
with the Acked-bys, i.e. excluding the 18th patch.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
       [not found]                                 ` <20130805095901.GL2258@redhat.com>
@ 2013-08-05 13:52                                   ` Ingo Molnar
  2013-08-05 14:05                                       ` Paolo Bonzini
  2013-08-05 13:52                                   ` Ingo Molnar
  1 sibling, 1 reply; 121+ messages in thread
From: Ingo Molnar @ 2013-08-05 13:52 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Raghavendra K T, mingo, x86, tglx, jeremy, konrad.wilk, hpa,
	pbonzini, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri


* Gleb Natapov <gleb@redhat.com> wrote:

> On Mon, Aug 05, 2013 at 11:46:03AM +0200, Ingo Molnar wrote:
> > Acked-by: Ingo Molnar <mingo@kernel.org>
> > 
> > I guess you'd want to carry this in the KVM tree or so - maybe in a 
> > separate branch because it changes Xen as well?
> 
> It changes KVM host and guest side, XEN and common x86 spinlock code. I 
> think it would be best to merge common x86 spinlock bits and guest side 
> KVM/XEN bits through tip tree and host KVM part will go through KVM 
> tree. If this is OK with you, Ingo, and XEN folks Raghavendra can send 
> two separate patch series one for the tip and one for KVM host side.

Sure, that's fine - if the initial series works fine in isolation as well 
(i.e. won't break anything).

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-05 13:52                                   ` Ingo Molnar
@ 2013-08-05 14:05                                       ` Paolo Bonzini
  0 siblings, 0 replies; 121+ messages in thread
From: Paolo Bonzini @ 2013-08-05 14:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gleb Natapov, Raghavendra K T, mingo, x86, tglx, jeremy,
	konrad wilk, hpa, linux-doc, habanero, xen-devel, peterz,
	mtosatti, stefano stabellini, andi, attilio rao, ouyang, gregkh,
	agraf, chegu vinod, torvalds, avi kivity, kvm, linux-kernel,
	riel, drjones, virtualization, srivatsa vaddagiri

> > On Mon, Aug 05, 2013 at 11:46:03AM +0200, Ingo Molnar wrote:
> > > Acked-by: Ingo Molnar <mingo@kernel.org>
> > > 
> > > I guess you'd want to carry this in the KVM tree or so - maybe in a
> > > separate branch because it changes Xen as well?
> > 
> > It changes KVM host and guest side, XEN and common x86 spinlock code. I
> > think it would be best to merge common x86 spinlock bits and guest side
> > KVM/XEN bits through tip tree and host KVM part will go through KVM
> > tree. If this is OK with you, Ingo, and XEN folks Raghavendra can send
> > two separate patch series one for the tip and one for KVM host side.
> 
> Sure, that's fine - if the initial series works fine in isolation as well
> (i.e. won't break anything).

It would be a big problem if it didn't!  Raghavendra, please send the
two separate series as Gleb explained above.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-05 14:05                                       ` Paolo Bonzini
  (?)
@ 2013-08-05 14:39                                       ` Raghavendra K T
  2013-08-05 14:45                                         ` Paolo Bonzini
  2013-08-05 14:45                                         ` Paolo Bonzini
  -1 siblings, 2 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-08-05 14:39 UTC (permalink / raw)
  To: Paolo Bonzini, Gleb Natapov
  Cc: Ingo Molnar, mingo, x86, tglx, jeremy, konrad wilk, hpa,
	linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano stabellini, andi, attilio rao, ouyang, gregkh, agraf,
	chegu vinod, torvalds, avi kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa vaddagiri

On 08/05/2013 07:35 PM, Paolo Bonzini wrote:
>>>> I guess you'd want to carry this in the KVM tree or so - maybe in a
>>>> separate branch because it changes Xen as well?
>>>
>>> It changes KVM host and guest side, XEN and common x86 spinlock code. I
>>> think it would be best to merge common x86 spinlock bits and guest side
>>> KVM/XEN bits through tip tree and host KVM part will go through KVM
>>> tree. If this is OK with you, Ingo, and XEN folks Raghavendra can send
>>> two separate patch series one for the tip and one for KVM host side.
>>
>> Sure, that's fine - if the initial series works fine in isolation as well
>> (i.e. won't break anything).
>
> It would be a big problem if it didn't!  Raghavendra, please send the
> two separate series as Gleb explained above.
>

Yes, sure.  The patches have been split in that way.

The only thing I am wondering about is the KVM_FEATURE_PV_UNHALT and
KVM_HC_KICK_CPU definitions in the hunk below, which are needed by the guest
as well. Maybe this header file change can be a separate patch so that the
duplicate can be handled easily during the merge?

I will test all combinations after splitting and then post.

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
  #define KVM_FEATURE_ASYNC_PF           4
  #define KVM_FEATURE_STEAL_TIME         5
  #define KVM_FEATURE_PV_EOI             6
+#define KVM_FEATURE_PV_UNHALT          7

diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..2841f86 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,6 +19,7 @@
  #define KVM_HC_MMU_OP                  2
  #define KVM_HC_FEATURES                        3
  #define KVM_HC_PPC_MAP_MAGIC_PAGE      4
+#define KVM_HC_KICK_CPU                        5
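
As a rough illustration (a sketch only, not code from either split series)
of why both sides need the same constants: the guest would gate its
pv-unhalt setup on the feature bit and issue the kick through the same
hypercall number the host implements, using the existing
kvm_para_has_feature()/kvm_hypercall2() helpers:

/*
 * Rough sketch only: gate the pv-unhalt setup on the CPUID feature bit
 * and kick a halted vcpu via the new hypercall number.
 */
static bool example_pv_unhalt_supported(void)
{
	return kvm_para_available() &&
	       kvm_para_has_feature(KVM_FEATURE_PV_UNHALT);
}

static void example_kick_cpu(int apicid)
{
	kvm_hypercall2(KVM_HC_KICK_CPU, 0 /* flags */, apicid);
}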



^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-05 14:39                                       ` Raghavendra K T
@ 2013-08-05 14:45                                         ` Paolo Bonzini
  2013-08-05 14:45                                         ` Paolo Bonzini
  1 sibling, 0 replies; 121+ messages in thread
From: Paolo Bonzini @ 2013-08-05 14:45 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Gleb Natapov, Ingo Molnar, mingo, x86, tglx, jeremy, konrad wilk,
	hpa, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano stabellini, andi, attilio rao, ouyang, gregkh, agraf,
	chegu vinod, torvalds, avi kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa vaddagiri

> Only thing I am thinking is about KVM_FEATURE_PV_UNHALT, and
> KVM_HC_KICK_CPU definition in the below hunk, that is needed by guest
> as well. may be this header file change can be a separate patch so that
> duplicate can be handled easily during merge?

Sure, good idea.

Paolo

> I do testing of all combination after splitting and post.

> diff --git a/arch/x86/include/uapi/asm/kvm_para.h
> b/arch/x86/include/uapi/asm/kvm_para.h
> index 06fdbd9..94dc8ca 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -23,6 +23,7 @@
>   #define KVM_FEATURE_ASYNC_PF           4
>   #define KVM_FEATURE_STEAL_TIME         5
>   #define KVM_FEATURE_PV_EOI             6
> +#define KVM_FEATURE_PV_UNHALT          7
> 
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index cea2c5c..2841f86 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -19,6 +19,7 @@
>   #define KVM_HC_MMU_OP                  2
>   #define KVM_HC_FEATURES                        3
>   #define KVM_HC_PPC_MAP_MAGIC_PAGE      4
> +#define KVM_HC_KICK_CPU                        5
> 
> 
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
  2013-08-05  9:46                                 ` Ingo Molnar
@ 2013-08-05 15:37                                   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 121+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-05 15:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gleb Natapov, Raghavendra K T, mingo, x86, tglx, jeremy, hpa,
	pbonzini, linux-doc, habanero, xen-devel, peterz, mtosatti,
	stefano.stabellini, andi, attilio.rao, ouyang, gregkh, agraf,
	chegu_vinod, torvalds, avi.kivity, kvm, linux-kernel, riel,
	drjones, virtualization, srivatsa.vaddagiri

On Mon, Aug 05, 2013 at 11:46:03AM +0200, Ingo Molnar wrote:
> 
> * Gleb Natapov <gleb@redhat.com> wrote:
> 
> > On Fri, Aug 02, 2013 at 11:25:39AM +0200, Ingo Molnar wrote:
> > > > Ingo,
> > > > 
> > > > Do you have any concerns reg this series? please let me know if this 
> > > > looks good now to you.
> > > 
> > > I'm inclined to NAK it for excessive quotation - who knows how many 
> > > people left the discussion in disgust? Was it done to drive away as 
> > > many reviewers as possible?
> > > 
> > > Anyway, see my other reply, the measurement results seem hard to 
> > > interpret and inconclusive at the moment.
> >
> > That result was only for patch 18 of the series, not pvspinlock in 
> > general.
> 
> Okay - I've re-read the performance numbers and they are impressive, so no 
> objections from me.
> 
> The x86 impact seems to be a straightforward API change, with most of the 
> changes on the virtualization side. So:
> 
> Acked-by: Ingo Molnar <mingo@kernel.org>
> 
> I guess you'd want to carry this in the KVM tree or so - maybe in a 
> separate branch because it changes Xen as well?

May I suggest an alternate way - perhaps you can put them in a tip/spinlock
tree for v3.12 - since both KVM and Xen maintainers have acked and carefully
reviewed them?

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks
  2013-07-22  6:16 ` Raghavendra K T
@ 2013-08-05 22:50   ` H. Peter Anvin
  -1 siblings, 0 replies; 121+ messages in thread
From: H. Peter Anvin @ 2013-08-05 22:50 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: gleb, mingo, jeremy, x86, konrad.wilk, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

So, having read through the entire thread I *think* this is what the
status of this patchset is:

1. Patches 1-17 are noncontroversial, Raghavendra is going to send an
   update split into two patchsets;
2. There are at least two versions of patch 15; I think the "PATCH
   RESEND RFC" is the right one.
3. Patch 18 is controversial but there are performance numbers; these
   should be integrated in the patch description.
4. People are in general OK with us putting this patchset into -tip for
   testing, once the updated (V12) patchset is posted.

If I'm misunderstanding something, it is because of excessive thread
length as mentioned by Ingo.

Either way, I'm going to hold off on putting it into -tip until tomorrow
unless Ingo beats me to it.

	-hpa


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks
  2013-08-05 22:50   ` H. Peter Anvin
@ 2013-08-06  2:50     ` Raghavendra K T
  -1 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-08-06  2:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: gleb, mingo, jeremy, x86, konrad.wilk, pbonzini, linux-doc,
	habanero, xen-devel, peterz, mtosatti, stefano.stabellini, andi,
	attilio.rao, ouyang, gregkh, agraf, chegu_vinod, torvalds,
	avi.kivity, tglx, kvm, linux-kernel, riel, drjones,
	virtualization, srivatsa.vaddagiri

On 08/06/2013 04:20 AM, H. Peter Anvin wrote:
> So, having read through the entire thread I *think* this is what the
> status of this patchset is:
>
> 1. Patches 1-17 are noncontroversial, Raghavendra is going to send an
>     update split into two patchsets;

Yes.  Only one patch would be common to both the host and guest side, and
it will be sent as a separate patch.
I'll rebase the first patchset to -next and the second patchset to the kvm
tree as needed.

> 2. There are at least two versions of patch 15; I think the "PATCH
>     RESEND RFC" is the right one.

True.

> 3. Patch 18 is controversial but there are performance numbers; these
>     should be integrated in the patch description.

The current plan is to drop patch 18 for now.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks
@ 2013-07-22  6:16 Raghavendra K T
  0 siblings, 0 replies; 121+ messages in thread
From: Raghavendra K T @ 2013-07-22  6:16 UTC (permalink / raw)
  To: gleb, mingo, jeremy, x86, konrad.wilk, hpa, pbonzini
  Cc: gregkh, kvm, linux-doc, peterz, drjones, virtualization, andi,
	xen-devel, Raghavendra K T, habanero, riel, stefano.stabellini,
	ouyang, avi.kivity, tglx, chegu_vinod, linux-kernel,
	srivatsa.vaddagiri, attilio.rao, torvalds


This series replaces the existing paravirtualized spinlock mechanism
with a paravirtualized ticketlock mechanism. The series provides
implementation for both Xen and KVM.

Changes in V11:
 - use safe_halt in lock_spinning path to avoid potential problem 
  in case of irq_handlers taking lock in slowpath (Gleb)
 - add a0 flag for the kick hypercall for future extension  (Gleb)
 - add stubs for missing architecture for kvm_vcpu_schedule() (Gleb)
 - Change hypercall documentation.
 - Rebased to 3.11-rc1

Changes in V10:
Addressed Konrad's review comments:
- Added break in patch 5 since now we know exact cpu to wakeup
- Dropped patch 12 and Konrad needs to revert two patches to enable xen on hvm 
  70dd4998, f10cd522c
- Remove TIMEOUT and corrected spacing in patch 15
- Kicked spelling and correct spacing in patches 17, 18 

Changes in V9:
- Changed spin_threshold to 32k to avoid excess halt exits that are
   causing undercommit degradation (after PLE handler improvement).
- Added  kvm_irq_delivery_to_apic (suggested by Gleb)
- Optimized halt exit path to use PLE handler

V8 of PVspinlock was posted last year. After Avi's suggestions to look
at PLE handler's improvements, various optimizations in PLE handling
have been tried.

With this series we see that we could get little more improvements on top
of that. 

Ticket locks have an inherent problem in a virtualized case, because
the vCPUs are scheduled rather than running concurrently (ignoring
gang scheduled vCPUs).  This can result in catastrophic performance
collapses when the vCPU scheduler doesn't schedule the correct "next"
vCPU, and ends up scheduling a vCPU which burns its entire timeslice
spinning.  (Note that this is not the same problem as lock-holder
preemption, which this series also addresses; that's also a problem,
but not catastrophic).

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

Currently we deal with this by having PV spinlocks, which adds a layer
of indirection in front of all the spinlock functions, and defining a
completely new implementation for Xen (and for other pvops users, but
there are none at present).

PV ticketlocks keeps the existing ticketlock implemenentation
(fastpath) as-is, but adds a couple of pvops for the slow paths:

- If a CPU has been waiting for a spinlock for SPIN_THRESHOLD
  iterations, then call out to the __ticket_lock_spinning() pvop,
  which allows a backend to block the vCPU rather than spinning.  This
  pvop can set the lock into "slowpath state".

- When releasing a lock, if it is in "slowpath state", the call
  __ticket_unlock_kick() to kick the next vCPU in line awake.  If the
  lock is no longer in contention, it also clears the slowpath flag.

The "slowpath state" is stored in the LSB of the within the lock tail
ticket.  This has the effect of reducing the max number of CPUs by
half (so, a "small ticket" can deal with 128 CPUs, and "large ticket"
32768).
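
As a back-of-the-envelope check (an illustration, not code from the series),
assuming the increment-by-2 scheme used once bit 0 of the tail is reserved
as the flag:

	/* Illustration only: the usable ticket space once bit 0 of the
	 * tail is reserved as TICKET_SLOWPATH_FLAG, so tickets advance
	 * in steps of 2. */
	#include <stdio.h>

	#define TICKET_SLOWPATH_FLAG	1u
	#define TICKET_LOCK_INC		2u

	int main(void)
	{
		printf("small ticket (8-bit tail):  %u CPUs\n", (1u << 8)  / TICKET_LOCK_INC);
		printf("large ticket (16-bit tail): %u CPUs\n", (1u << 16) / TICKET_LOCK_INC);
		return 0;
	}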

For KVM, one hypercall is introduced in the hypervisor that allows a vcpu to
kick another vcpu out of halt state.
The blocking of a vcpu is done using halt() in the (lock_spinning) slowpath.
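
To make the shape of the two halves concrete, here is a deliberately
simplified sketch (not the code from the patches) of the guest-side blocking
and kick, assuming the existing kvm_hypercall2() and safe_halt() helpers and
eliding the per-cpu waiter bookkeeping, interrupt handling and statistics
that the real patch adds:

	/*
	 * Illustrative sketch only.  Assumes the waiter has already
	 * published (lock, want) somewhere the unlocker can find it.
	 */
	static void example_lock_spinning(arch_spinlock_t *lock, __ticket_t want)
	{
		/* Re-check before blocking: the ticket may already be ours. */
		if (ACCESS_ONCE(lock->tickets.head) == want)
			return;

		/* Block in the hypervisor until another vcpu kicks us awake. */
		safe_halt();
	}

	static void example_unlock_kick(int apicid)
	{
		/* Ask the host to bring the next waiter out of halt. */
		kvm_hypercall2(KVM_HC_KICK_CPU, 0 /* flags */, apicid);
	}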

Overall, it results in a large reduction in code, it makes the native
and virtualized cases closer, and it removes a layer of indirection
around all the spinlock functions.

The fast path (taking an uncontended lock which isn't in "slowpath"
state) is optimal, identical to the non-paravirtualized case.

The inner part of ticket lock code becomes:
	inc = xadd(&lock->tickets, inc);
	inc.tail &= ~TICKET_SLOWPATH_FLAG;

	if (likely(inc.head == inc.tail))
		goto out;
	for (;;) {
		unsigned count = SPIN_THRESHOLD;
		do {
			if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
				goto out;
			cpu_relax();
		} while (--count);
		__ticket_lock_spinning(lock, inc.tail);
	}
out:	barrier();
which results in:
	push   %rbp
	mov    %rsp,%rbp

	mov    $0x200,%eax
	lock xadd %ax,(%rdi)
	movzbl %ah,%edx
	cmp    %al,%dl
	jne    1f	# Slowpath if lock in contention

	pop    %rbp
	retq   

	### SLOWPATH START
1:	and    $-2,%edx
	movzbl %dl,%esi

2:	mov    $0x800,%eax
	jmp    4f

3:	pause  
	sub    $0x1,%eax
	je     5f

4:	movzbl (%rdi),%ecx
	cmp    %cl,%dl
	jne    3b

	pop    %rbp
	retq   

5:	callq  *__ticket_lock_spinning
	jmp    2b
	### SLOWPATH END

With CONFIG_PARAVIRT_SPINLOCKS=n, the code changes only slightly: the
fastpath case is straight through (taking the lock without
contention), and the spin loop is out of line:

	push   %rbp
	mov    %rsp,%rbp

	mov    $0x100,%eax
	lock xadd %ax,(%rdi)
	movzbl %ah,%edx
	cmp    %al,%dl
	jne    1f

	pop    %rbp
	retq   

	### SLOWPATH START
1:	pause  
	movzbl (%rdi),%eax
	cmp    %dl,%al
	jne    1b

	pop    %rbp
	retq   
	### SLOWPATH END

The unlock code is complicated by the need to both add to the lock's
"head" and fetch the slowpath flag from "tail".  This version of the
patch uses a locked add to do this, followed by a test to see if the
slowflag is set.  The lock prefix acts as a full memory barrier, so we
can be sure that other CPUs will have seen the unlock before we read
the flag (without the barrier the read could be fetched from the
store queue before it hits memory, which could result in a deadlock).

This is all unnecessary complication if you're not using PV ticket
locks, so the code also uses the jump-label machinery to fall back to the
standard "add"-based unlock in the non-PV case.

	if (TICKET_SLOWPATH_FLAG &&
	     static_key_false(&paravirt_ticketlocks_enabled))) {
		arch_spinlock_t prev;
		prev = *lock;
		add_smp(&lock->tickets.head, TICKET_LOCK_INC);

		/* add_smp() is a full mb() */
		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
			__ticket_unlock_slowpath(lock, prev);
	} else
		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
which generates:
	push   %rbp
	mov    %rsp,%rbp

	nop5	# replaced by 5-byte jmp 2f when PV enabled

	# non-PV unlock
	addb   $0x2,(%rdi)

1:	pop    %rbp
	retq   

### PV unlock ###
2:	movzwl (%rdi),%esi	# Fetch prev

	lock addb $0x2,(%rdi)	# Do unlock

	testb  $0x1,0x1(%rdi)	# Test flag
	je     1b		# Finished if not set

### Slow path ###
	add    $2,%sil		# Add "head" in old lock state
	mov    %esi,%edx
	and    $0xfe,%dh	# clear slowflag for comparison
	movzbl %dh,%eax
	cmp    %dl,%al		# If head == tail (uncontended)
	je     4f		# clear slowpath flag

	# Kick next CPU waiting for lock
3:	movzbl %sil,%esi
	callq  *pv_lock_ops.kick

	pop    %rbp
	retq   

	# Lock no longer contended - clear slowflag
4:	mov    %esi,%eax
	lock cmpxchg %dx,(%rdi)	# cmpxchg to clear flag
	cmp    %si,%ax
	jne    3b		# If clear failed, then kick

	pop    %rbp
	retq   

So when not using PV ticketlocks, the unlock sequence just has a
5-byte nop added to it, and the PV case is reasonably straightforward
aside from requiring a "lock add".

Results:
=======
pvspinlock shows benefits for overcommit ratios > 1 in PLE-enabled cases,
and undercommit results are flat.

For non-PLE machines, results are much better for smaller VMs.
http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/01095.html

This series, with 3.11-rc1 as the base, gives a 28-50% improvement for
ebizzy and an 8-22% improvement for dbench on a 32-core machine with HT
disabled, running a 32-vcpu guest.

On a 32-cpu, 16-core machine (HT on) with 16-vcpu guests, ebizzy showed
1%, 3%, 61%, and 77% improvement for 0.5x, 1x, 1.5x, and 2x overcommit
respectively; dbench results were almost flat, at -1% to +2%.

Your suggestions and comments are welcome.

github link: https://github.com/ktraghavendra/linux/tree/pvspinlock_v11

Please note that we set SPIN_THRESHOLD = 32k with this series;
that eats up a little of the overcommit performance on PLE machines
and of the overall performance on non-PLE machines.

The older series [3] was tested by Attilio for the Xen implementation.

Note that Konrad needs to revert the two patches below to enable Xen on HVM:
  70dd4998, f10cd522c

Jeremy Fitzhardinge (9):
 x86/spinlock: Replace pv spinlocks with pv ticketlocks
 x86/ticketlock: Collapse a layer of functions
 xen: Defer spinlock setup until boot CPU setup
 xen/pvticketlock: Xen implementation for PV ticket locks
 xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
 x86/pvticketlock: Use callee-save for lock_spinning
 x86/pvticketlock: When paravirtualizing ticket locks, increment by 2
 x86/ticketlock: Add slowpath logic
 xen/pvticketlock: Allow interrupts to be enabled while blocking

Andrew Jones (1):
 jump_label: Split jumplabel ratelimit

Srivatsa Vaddagiri (3):
 kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
 kvm guest : Add configuration support to enable debug information for KVM Guests
 kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

Raghavendra K T (5):
 x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
 kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
 kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic
 Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
 kvm hypervisor: Add directed yield in vcpu block path

---
The V8 link has pointers to the previous patch series and the whole history.

[1]. V10 PV Ticketspinlock for Xen/KVM link: https://lkml.org/lkml/2013/6/24/252
[2]. V9 PV Ticketspinlock for Xen/KVM link:  https://lkml.org/lkml/2013/6/1/168
[3]. V8 PV Ticketspinlock for Xen/KVM link:  https://lkml.org/lkml/2012/5/2/119

 Documentation/virtual/kvm/cpuid.txt      |   4 +
 Documentation/virtual/kvm/hypercalls.txt |  14 ++
 arch/arm/include/asm/kvm_host.h          |   5 +
 arch/arm64/include/asm/kvm_host.h        |   5 +
 arch/ia64/include/asm/kvm_host.h         |   5 +
 arch/mips/include/asm/kvm_host.h         |   5 +
 arch/powerpc/include/asm/kvm_host.h      |   5 +
 arch/s390/include/asm/kvm_host.h         |   5 +
 arch/x86/Kconfig                         |  10 +
 arch/x86/include/asm/kvm_host.h          |   7 +-
 arch/x86/include/asm/kvm_para.h          |  14 +-
 arch/x86/include/asm/paravirt.h          |  32 +--
 arch/x86/include/asm/paravirt_types.h    |  10 +-
 arch/x86/include/asm/spinlock.h          | 128 ++++++----
 arch/x86/include/asm/spinlock_types.h    |  16 +-
 arch/x86/include/uapi/asm/kvm_para.h     |   1 +
 arch/x86/kernel/kvm.c                    | 259 +++++++++++++++++++++
 arch/x86/kernel/paravirt-spinlocks.c     |  18 +-
 arch/x86/kvm/cpuid.c                     |   3 +-
 arch/x86/kvm/lapic.c                     |   5 +-
 arch/x86/kvm/x86.c                       |  39 +++-
 arch/x86/xen/smp.c                       |   2 +-
 arch/x86/xen/spinlock.c                  | 387 ++++++++++---------------------
 include/linux/jump_label.h               |  26 +--
 include/linux/jump_label_ratelimit.h     |  34 +++
 include/linux/kvm_host.h                 |   2 +-
 include/linux/perf_event.h               |   1 +
 include/uapi/linux/kvm_para.h            |   1 +
 kernel/jump_label.c                      |   1 +
 virt/kvm/kvm_main.c                      |   6 +-
 30 files changed, 665 insertions(+), 385 deletions(-)

^ permalink raw reply	[flat|nested] 121+ messages in thread

end of thread, other threads:[~2013-08-06  2:50 UTC | newest]

Thread overview: 121+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-22  6:16 [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks Raghavendra K T
2013-07-22  6:16 ` Raghavendra K T
2013-07-22  6:16 ` [PATCH RFC V11 1/18] x86/spinlock: Replace pv spinlocks with pv ticketlocks Raghavendra K T
2013-07-22  6:16   ` Raghavendra K T
2013-07-22  6:16   ` Raghavendra K T
2013-07-22  6:17 ` [PATCH RFC V11 2/18] x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:17 ` Raghavendra K T
2013-07-22  6:17 ` [PATCH RFC V11 3/18] x86/ticketlock: Collapse a layer of functions Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:17 ` [PATCH RFC V11 4/18] xen: Defer spinlock setup until boot CPU setup Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:17 ` [PATCH RFC V11 5/18] xen/pvticketlock: Xen implementation for PV ticket locks Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:17 ` [PATCH RFC V11 6/18] xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:17   ` Raghavendra K T
2013-07-22  6:18 ` [PATCH RFC V11 7/18] x86/pvticketlock: Use callee-save for lock_spinning Raghavendra K T
2013-07-22  6:18   ` Raghavendra K T
2013-07-22  6:18   ` Raghavendra K T
2013-07-22  6:18 ` [PATCH RFC V11 8/18] x86/pvticketlock: When paravirtualizing ticket locks, increment by 2 Raghavendra K T
2013-07-22  6:18   ` Raghavendra K T
2013-07-22  6:18   ` Raghavendra K T
2013-07-22  6:18 ` [PATCH RFC V11 9/18] jump_label: Split out rate limiting from jump_label.h Raghavendra K T
2013-07-22  6:18   ` Raghavendra K T
2013-07-22  6:18   ` Raghavendra K T
2013-07-22  6:18 ` [PATCH RFC V11 10/18] x86/ticketlock: Add slowpath logic Raghavendra K T
2013-07-22  6:18   ` Raghavendra K T
2013-07-22  6:18   ` Raghavendra K T
2013-07-22  6:19 ` [PATCH RFC V11 11/18] xen/pvticketlock: Allow interrupts to be enabled while blocking Raghavendra K T
2013-07-22  6:19 ` Raghavendra K T
2013-07-22  6:19   ` Raghavendra K T
2013-07-22  6:19 ` [PATCH RFC V11 12/18] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks Raghavendra K T
2013-07-22  6:19   ` Raghavendra K T
2013-07-22  6:19   ` Raghavendra K T
2013-07-22  6:19 ` [PATCH RFC V11 13/18] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration Raghavendra K T
2013-07-22  6:19   ` Raghavendra K T
2013-07-22  6:19   ` Raghavendra K T
2013-07-22  6:20 ` [PATCH RFC V11 14/18] kvm guest : Add configuration support to enable debug information for KVM Guests Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22  6:20 ` [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-23 15:07   ` Gleb Natapov
2013-07-23 15:07     ` Gleb Natapov
2013-07-24  9:24     ` [PATCH RESEND " Raghavendra K T
2013-07-24  9:24       ` Raghavendra K T
2013-07-24  9:45     ` [PATCH " Raghavendra K T
2013-07-24  9:45       ` Raghavendra K T
2013-07-24 10:39       ` Gleb Natapov
2013-07-24 10:39         ` Gleb Natapov
2013-07-24 12:00         ` Raghavendra K T
2013-07-24 12:00         ` Raghavendra K T
2013-07-24 12:06           ` Gleb Natapov
2013-07-24 12:06           ` Gleb Natapov
2013-07-24 12:36             ` Raghavendra K T
2013-07-24 12:36             ` Raghavendra K T
2013-07-25  9:17               ` Raghavendra K T
2013-07-25  9:17                 ` Raghavendra K T
2013-07-25  9:15                 ` Gleb Natapov
2013-07-25  9:15                   ` Gleb Natapov
2013-07-25  9:38                   ` Raghavendra K T
2013-07-25  9:38                     ` Raghavendra K T
2013-07-30 16:43                     ` Raghavendra K T
2013-07-30 16:43                       ` Raghavendra K T
2013-07-31  6:24                       ` Gleb Natapov
2013-07-31  6:24                         ` Gleb Natapov
2013-08-01  7:38                         ` Raghavendra K T
2013-08-01  7:38                           ` Raghavendra K T
2013-08-01  7:45                           ` Gleb Natapov
2013-08-01  7:45                             ` Gleb Natapov
2013-08-01  9:04                             ` Raghavendra K T
2013-08-02  3:22                               ` Raghavendra K T
2013-08-02  3:22                                 ` Raghavendra K T
2013-08-02  9:23                                 ` Ingo Molnar
2013-08-02  9:23                                   ` Ingo Molnar
2013-08-02  9:44                                   ` Raghavendra K T
2013-08-02  9:44                                     ` Raghavendra K T
2013-08-01  9:04                             ` Raghavendra K T
2013-08-02  9:25                           ` Ingo Molnar
2013-08-02  9:25                             ` Ingo Molnar
2013-08-02  9:54                             ` Gleb Natapov
2013-08-02  9:54                               ` Gleb Natapov
2013-08-02 10:57                               ` Raghavendra K T
2013-08-02 10:57                                 ` Raghavendra K T
2013-08-05  9:46                               ` Ingo Molnar
2013-08-05  9:46                                 ` Ingo Molnar
2013-08-05 10:42                                 ` Raghavendra K T
2013-08-05 10:42                                   ` Raghavendra K T
     [not found]                                 ` <20130805095901.GL2258@redhat.com>
2013-08-05 13:52                                   ` Ingo Molnar
2013-08-05 14:05                                     ` Paolo Bonzini
2013-08-05 14:05                                       ` Paolo Bonzini
2013-08-05 14:39                                       ` Raghavendra K T
2013-08-05 14:45                                         ` Paolo Bonzini
2013-08-05 14:45                                         ` Paolo Bonzini
2013-08-05 14:39                                       ` Raghavendra K T
2013-08-05 13:52                                   ` Ingo Molnar
2013-08-05 15:37                                 ` Konrad Rzeszutek Wilk
2013-08-05 15:37                                   ` Konrad Rzeszutek Wilk
2013-07-22  6:20 ` [PATCH RFC V11 16/18] kvm hypervisor : Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22  6:20 ` [PATCH RFC V11 17/18] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22  6:20 ` [PATCH RFC V11 18/18] kvm hypervisor: Add directed yield in vcpu block path Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22  6:20   ` Raghavendra K T
2013-07-22 19:36 ` [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks Konrad Rzeszutek Wilk
2013-07-22 19:36   ` Konrad Rzeszutek Wilk
2013-07-23  2:50   ` Raghavendra K T
2013-07-23  2:50     ` Raghavendra K T
2013-08-05 22:50 ` H. Peter Anvin
2013-08-05 22:50   ` H. Peter Anvin
2013-08-06  2:50   ` Raghavendra K T
2013-08-06  2:50     ` Raghavendra K T
2013-07-22  6:16 Raghavendra K T
