From: Waiman Long <Waiman.Long@hp.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
Paolo Bonzini <paolo.bonzini@gmail.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
David Vrabel <david.vrabel@citrix.com>,
Oleg Nesterov <oleg@redhat.com>, Gleb Natapov <gleb@redhat.com>,
Scott J Norton <scott.norton@hp.com>,
Chegu Vinod <chegu_vinod@hp.com>,
Waiman Long <Waiman.Long@hp.com>
Subject: [PATCH v11 15/16] pvqspinlock, x86: Enable PV qspinlock PV for KVM
Date: Fri, 30 May 2014 11:44:01 -0400
Message-ID: <1401464642-33890-16-git-send-email-Waiman.Long@hp.com>
In-Reply-To: <1401464642-33890-1-git-send-email-Waiman.Long@hp.com>

This patch adds the necessary KVM-specific code to allow KVM to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.

Two KVM guests of 20 CPU cores (2 nodes) were created for performance
testing in one of the following two configurations:
 1) Only 1 VM is active
 2) Both VMs are active and they share the same 20 physical CPUs
    (200% overcommit)

The tests run included the disk workload of the AIM7 benchmark on both
ext4 and xfs RAM disks at 3000 users on a 3.15-rc7 based kernel. The
"ebizzy -m" test was also run and its performance data were
recorded. With two VMs running, the "idle=poll" kernel option was
added to simulate a busy guest. The entry "unfair + PV qspinlock"
below means that both the unfair lock and PV spinlock configuration
options were turned on.

AIM7 XFS Disk Test (no overcommit)
kernel                      JPM  Real Time  Sys Time  Usr Time
-----                       ---  ---------  --------  --------
PV ticketlock           2521008       7.24    101.02      5.24
qspinlock               2571429       7.00     99.10      5.49
PV qspinlock            2535211       7.10    100.32      5.45
unfair qspinlock        2571429       7.00     99.25      5.40
unfair + PV qspinlock   2549575       7.06     99.81      5.31

AIM7 XFS Disk Test (200% overcommit)
kernel                      JPM  Real Time  Sys Time  Usr Time
-----                       ---  ---------  --------  --------
PV ticketlock            768902      23.41    341.71      3.07
qspinlock                784656      22.94    346.22      2.90
PV qspinlock             773861      23.26    352.47      2.30
unfair qspinlock         835655      21.54    316.52      1.57
unfair + PV qspinlock    797165      22.58    323.95      3.58

AIM7 EXT4 Disk Test (no overcommit)
kernel                      JPM  Real Time  Sys Time  Usr Time
-----                       ---  ---------  --------  --------
PV ticketlock           1956522       9.20    106.58      5.35
qspinlock               1995565       9.02    103.19      5.37
PV qspinlock            1958651       9.19    106.57      5.30
unfair qspinlock        2022472       8.90    103.58      5.37
unfair + PV qspinlock   1991150       9.04    104.41      5.46

AIM7 EXT4 Disk Test (200% overcommit)
kernel                      JPM  Real Time  Sys Time  Usr Time
-----                       ---  ---------  --------  --------
PV ticketlock            576553      31.22    407.44      1.51
qspinlock                609550      29.53    407.14      1.69
PV qspinlock             592105      30.40    410.51      1.67
unfair qspinlock         672897      26.75    359.78      1.66
unfair + PV qspinlock    670391      26.85    357.09      0.63

EBIZZY-M Test (no overcommit)
kernel                    Rec/s  Real Time  Sys Time  Usr Time
-----                     -----  ---------  --------  --------
PV ticketlock              1328      10.00     82.82      1.46
qspinlock                  1679      10.00     65.37      1.80
PV qspinlock               1470      10.00     75.54      1.54
unfair qspinlock           1518      10.00     70.80      1.71
unfair + PV qspinlock      1585      10.00     69.02      1.76

EBIZZY-M Test (200% overcommit)
kernel                    Rec/s  Real Time  Sys Time  Usr Time
-----                     -----  ---------  --------  --------
PV ticketlock               453      10.00     77.11      0.00
qspinlock                   459      10.00     77.50      0.00
PV qspinlock                402      10.00     91.55      0.00
unfair qspinlock            570      10.00     62.98      0.00
unfair + PV qspinlock       586      10.00     59.68      0.00

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Tested-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
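
Note for readers comparing the tables above: the relative change
against the PV ticketlock baseline can be worked out with a small
helper such as the sketch below. It is only an illustration and not
part of the patch. The JPM figures are copied from the AIM7 XFS
(200% overcommit) table; the names jpm_row, xfs_overcommit,
pct_change() and pct_vs_baseline() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the JPM (jobs per minute) column from an AIM7 table. */
struct jpm_row {
	const char *kernel;
	double jpm;
};

/*
 * AIM7 XFS Disk Test, 200% overcommit, values copied from the table.
 * Row 0 is the PV ticketlock baseline.
 */
static const struct jpm_row xfs_overcommit[] = {
	{ "PV ticketlock",         768902 },
	{ "qspinlock",             784656 },
	{ "PV qspinlock",          773861 },
	{ "unfair qspinlock",      835655 },
	{ "unfair + PV qspinlock", 797165 },
};

/* Percentage change of val relative to base; positive means more JPM. */
static double pct_change(double base, double val)
{
	assert(base != 0.0);
	return (val - base) / base * 100.0;
}

/* Percentage change of row idx against row 0 (the baseline row). */
static double pct_vs_baseline(const struct jpm_row *rows, size_t idx)
{
	return pct_change(rows[0].jpm, rows[idx].jpm);
}
```

For example, pct_vs_baseline(xfs_overcommit, 3) gives the change for
the unfair qspinlock row, which works out to roughly 8.7% more JPM
than PV ticketlock in that table.
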
arch/x86/kernel/kvm.c | 135 +++++++++++++++++++++++++++++++++++++++++++++++++
kernel/Kconfig.locks | 2 +-
2 files changed, 136 insertions(+), 1 deletions(-)
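
Note on the debugfs counters: with CONFIG_KVM_DEBUG_FS enabled, the
patch below creates a "kvm-guest/spinlocks" debugfs directory holding
the seven statistics files. A guest-side userspace reader might look
like the following sketch. It is not part of the patch. It assumes
debugfs is mounted at /sys/kernel/debug (the usual location, but an
assumption here) and that each file holds a single decimal number;
PV_STATS_DIR, read_stat() and dump_pv_stats() are hypothetical names.

```c
#include <stdio.h>

/* Assumed location: debugfs mounted at /sys/kernel/debug. */
#define PV_STATS_DIR "/sys/kernel/debug/kvm-guest/spinlocks"

/* File names as created by kvm_spinlock_debugfs() in this patch. */
static const char *const pv_stat_names[] = {
	"kick_nohlt_stats", "halt_qhead_stats", "halt_qnode_stats",
	"halt_abort_stats", "wake_kick_stats",  "wake_spur_stats",
	"time_blocked",
};

/* Read one decimal counter from dir/name. Returns 0 on success, -1 on error. */
static int read_stat(const char *dir, const char *name,
		     unsigned long long *val)
{
	char path[256];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit in the buffer */

	f = fopen(path, "r");
	if (!f)
		return -1;	/* not mounted, no permission, or no such file */

	ok = (fscanf(f, "%llu", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Print every counter that can be read from dir. */
static void dump_pv_stats(const char *dir)
{
	size_t i;
	unsigned long long v;

	for (i = 0; i < sizeof(pv_stat_names) / sizeof(pv_stat_names[0]); i++) {
		if (read_stat(dir, pv_stat_names[i], &v) == 0)
			printf("%-18s %llu\n", pv_stat_names[i], v);
		else
			printf("%-18s (unavailable)\n", pv_stat_names[i]);
	}
}
```

Calling dump_pv_stats(PV_STATS_DIR) as root inside the guest before
and after a benchmark run would show how often CPUs halted, were
kicked, or woke up spuriously.
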
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7ab8ab3..eef427b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -567,6 +567,7 @@ static void kvm_kick_cpu(int cpu)
kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}
+#ifndef CONFIG_QUEUE_SPINLOCK
enum kvm_contention_stat {
TAKEN_SLOW,
TAKEN_SLOW_PICKUP,
@@ -794,6 +795,134 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
}
}
}
+#else /* CONFIG_QUEUE_SPINLOCK */
+
+#ifdef CONFIG_KVM_DEBUG_FS
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+static u32 kick_nohlt_stats; /* Kick but not halt count */
+static u32 halt_qhead_stats; /* Queue head halting count */
+static u32 halt_qnode_stats; /* Queue node halting count */
+static u32 halt_abort_stats; /* Halting abort count */
+static u32 wake_kick_stats; /* Wakeup by kicking count */
+static u32 wake_spur_stats; /* Spurious wakeup count */
+static u64 time_blocked; /* Total blocking time */
+
+static int __init kvm_spinlock_debugfs(void)
+{
+ d_kvm_debug = debugfs_create_dir("kvm-guest", NULL);
+ if (!d_kvm_debug) {
+ printk(KERN_WARNING
+ "Could not create 'kvm-guest' debugfs directory\n");
+ return -ENOMEM;
+ }
+ d_spin_debug = debugfs_create_dir("spinlocks", d_kvm_debug);
+
+ debugfs_create_u32("kick_nohlt_stats",
+ 0644, d_spin_debug, &kick_nohlt_stats);
+ debugfs_create_u32("halt_qhead_stats",
+ 0644, d_spin_debug, &halt_qhead_stats);
+ debugfs_create_u32("halt_qnode_stats",
+ 0644, d_spin_debug, &halt_qnode_stats);
+ debugfs_create_u32("halt_abort_stats",
+ 0644, d_spin_debug, &halt_abort_stats);
+ debugfs_create_u32("wake_kick_stats",
+ 0644, d_spin_debug, &wake_kick_stats);
+ debugfs_create_u32("wake_spur_stats",
+ 0644, d_spin_debug, &wake_spur_stats);
+ debugfs_create_u64("time_blocked",
+ 0644, d_spin_debug, &time_blocked);
+ return 0;
+}
+
+static inline void kvm_halt_stats(enum pv_lock_stats type)
+{
+ if (type == PV_HALT_QHEAD)
+ add_smp(&halt_qhead_stats, 1);
+ else if (type == PV_HALT_QNODE)
+ add_smp(&halt_qnode_stats, 1);
+ else /* type == PV_HALT_ABORT */
+ add_smp(&halt_abort_stats, 1);
+}
+
+static inline void kvm_lock_stats(enum pv_lock_stats type)
+{
+ if (type == PV_WAKE_KICKED)
+ add_smp(&wake_kick_stats, 1);
+ else if (type == PV_WAKE_SPURIOUS)
+ add_smp(&wake_spur_stats, 1);
+ else /* type == PV_KICK_NOHALT */
+ add_smp(&kick_nohlt_stats, 1);
+}
+
+static inline u64 spin_time_start(void)
+{
+ return sched_clock();
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+ u64 delta;
+
+ delta = sched_clock() - start;
+ add_smp(&time_blocked, delta);
+}
+
+fs_initcall(kvm_spinlock_debugfs);
+
+#else /* !CONFIG_KVM_DEBUG_FS */
+static inline void kvm_halt_stats(enum pv_lock_stats type)
+{
+}
+
+static inline void kvm_lock_stats(enum pv_lock_stats type)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+ return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif /* CONFIG_KVM_DEBUG_FS */
+
+/*
+ * Halt the current CPU & release it back to the host
+ */
+static void kvm_halt_cpu(enum pv_lock_stats type, s8 *state, s8 sval)
+{
+ unsigned long flags;
+ u64 start;
+
+ if (in_nmi())
+ return;
+
+ /*
+ * Make sure an interrupt handler can't upset things in a
+ * partially setup state.
+ */
+ local_irq_save(flags);
+ /*
+ * Don't halt if the CPU state has been changed.
+ */
+ if (ACCESS_ONCE(*state) != sval) {
+ kvm_halt_stats(PV_HALT_ABORT);
+ goto out;
+ }
+ start = spin_time_start();
+ kvm_halt_stats(type);
+ if (arch_irqs_disabled_flags(flags))
+ halt();
+ else
+ safe_halt();
+ spin_time_accum_blocked(start);
+out:
+ local_irq_restore(flags);
+}
+#endif /* !CONFIG_QUEUE_SPINLOCK */
/*
* Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
@@ -806,8 +935,14 @@ void __init kvm_spinlock_init(void)
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
+#ifdef CONFIG_QUEUE_SPINLOCK
+ pv_lock_ops.kick_cpu = kvm_kick_cpu;
+ pv_lock_ops.halt_cpu = kvm_halt_cpu;
+ pv_lock_ops.lockstat = kvm_lock_stats;
+#else
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
pv_lock_ops.unlock_kick = kvm_unlock_kick;
+#endif
}
static __init int kvm_spinlock_init_jump(void)
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index f185584..a70fdeb 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -229,4 +229,4 @@ config ARCH_USE_QUEUE_SPINLOCK
config QUEUE_SPINLOCK
def_bool y if ARCH_USE_QUEUE_SPINLOCK
- depends on SMP && !PARAVIRT_SPINLOCKS
+ depends on SMP && (!PARAVIRT_SPINLOCKS || !XEN)
--
1.7.1
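
Illustration only, not part of the patch: the "halt only if the state
is unchanged, otherwise abort" check in kvm_halt_cpu() and its pairing
with kvm_kick_cpu() can be modelled loosely in userspace with a mutex
and a condition variable. The sketch below is an analogy under stated
assumptions, not how the hypervisor implements it. The mutex stands
in for the interrupts-off window, the "kicked" flag stands in for a
pending wakeup from the kick hypercall, and all model_* names and
struct vcpu_model are invented for this example.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for one vCPU waiting on a queue spinlock. */
struct vcpu_model {
	pthread_mutex_t lock;	/* models the irq-off critical window */
	pthread_cond_t wake;	/* models the wakeup delivered by a kick */
	bool kicked;		/* a kick is pending and not yet consumed */
	signed char state;	/* models the s8 state word checked before halting */
};

#define VCPU_MODEL_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 }

/*
 * Block until kicked, but only if the state still equals sval.
 * Returns true if the caller went through the halt path, false if the
 * halt was aborted because the state had already changed.
 * Unlike a real halt(), this model never wakes up spuriously.
 */
static bool model_halt_cpu(struct vcpu_model *v, signed char sval)
{
	bool halted = false;

	pthread_mutex_lock(&v->lock);
	if (v->state == sval) {
		while (!v->kicked)
			pthread_cond_wait(&v->wake, &v->lock);
		v->kicked = false;	/* consume the kick */
		halted = true;
	}
	pthread_mutex_unlock(&v->lock);
	return halted;
}

/* Record a kick and wake the waiter; a kick sent early is not lost. */
static void model_kick_cpu(struct vcpu_model *v)
{
	pthread_mutex_lock(&v->lock);
	v->kicked = true;
	pthread_cond_signal(&v->wake);
	pthread_mutex_unlock(&v->lock);
}
```

The point of the model is that the state check and the decision to
sleep happen without a window in which a kick could slip by unnoticed,
which is the property the interrupt-disabled section in
kvm_halt_cpu() is there to protect.
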