All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code
@ 2016-02-17 19:14 Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 01/14] KVM: x86: change PIT discard tick policy Radim Krčmář
                   ` (14 more replies)
  0 siblings, 15 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

v2
 - Different ordering; the important fix is first, which makes it easier
   to backport as improvements of [1/14] have complex dependencies.
 - Correct choice of atomics in [2/14].
 - Explicit SMP barriers for lockless updates in [3/14].
 - Protection against userspace races in kvm_vm_ioctl_reinject, [8/14].
 - New code churn.  (I tried to annotate locking, which required me to
   really look at the code and I couldn't leave it alone at that point;
   sorry.)

This series only works with the discard policy, do you want to "fix" NMI
in the delay policy as well?
(NMI delivery is still is going to be wrong by any standard, but will
 make some sense, at the cost of ugly code:  we would always inject NMI
 when the timer fires and suppress NMI injection on EOI reinject.)

Anatomy of the series:
 [1/14] fixes legacy NMI watchdog under discard policy.
 [2-7/14] prepare for optimization of the discard policy.
 [8/14] optimizes discard policy by removing notifiers.
 [9-14/14] slightly improve related code.

I'm ok with dropping patches [2-14/14].

v1: http://www.spinics.net/lists/kvm/msg127017.html

Radim Krčmář (14):
  KVM: x86: change PIT discard tick policy
  KVM: x86: simplify atomics in kvm_pit_ack_irq
  KVM: x86: add kvm_pit_reset_reinject
  KVM: x86: use atomic_t instead of pit.inject_lock
  KVM: x86: tone down WARN_ON pit.state_lock
  KVM: x86: pass struct kvm_pit instead of kvm in PIT
  KVM: x86: remove unnecessary uses of PIT state lock
  KVM: x86: remove notifiers from PIT discard policy
  KVM: x86: refactor kvm_create_pit
  KVM: x86: refactor kvm_free_pit
  KVM: x86: remove pit and kvm from kvm_kpit_state
  KVM: x86: remove pointless dereference of PIT
  KVM: x86: don't assume layout of kvm_kpit_state
  KVM: x86: move PIT timer function initialization

 arch/x86/kvm/i8254.c | 318 +++++++++++++++++++++++----------------------------
 arch/x86/kvm/i8254.h |  15 +--
 arch/x86/kvm/x86.c   |  52 ++++++---
 3 files changed, 187 insertions(+), 198 deletions(-)

-- 
2.7.1

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-18 16:13   ` Paolo Bonzini
  2016-02-18 18:49   ` Paolo Bonzini
  2016-02-17 19:14 ` [PATCH v2 02/14] KVM: x86: simplify atomics in kvm_pit_ack_irq Radim Krčmář
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

Discard policy uses ack_notifiers to prevent injection of PIT interrupts
before EOI from the last one.

This patch changes the policy to always try to deliver the interrupt,
which makes a difference when its vector is in ISR.
Old implementation would drop the interrupt, but proposed one injects to
IRR, like real hardware would.

The old policy breaks legacy NMI watchdogs, where PIT is used through
virtual wire (LVT0): PIT never sends an interrupt before receiving EOI,
thus a guest deadlock with disabled interrupts will stop NMIs.

Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt
through IOAPIC.  (KVM's PIT is deeply rotten and luckily not used much
in modern systems.)

Even though there is a chance of regressions, I think we can fix the
LVT0 NMI bug without introducing a new tick policy.

Cc: <stable@vger.kernel.org>
Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: resolve dependencies to make it the first patch

 arch/x86/kvm/i8254.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index b0ea42b78ccd..ab5318727579 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -245,7 +245,7 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 		 * PIC is being reset.  Handle it gracefully here
 		 */
 		atomic_inc(&ps->pending);
-	else if (value > 0)
+	else if (value > 0 && ps->reinject)
 		/* in this case, we had multiple outstanding pit interrupts
 		 * that we needed to inject.  Reinject
 		 */
@@ -288,7 +288,9 @@ static void pit_do_work(struct kthread_work *work)
 	 * last one has been acked.
 	 */
 	spin_lock(&ps->inject_lock);
-	if (ps->irq_ack) {
+	if (!ps->reinject)
+		inject = 1;
+	else if (ps->irq_ack) {
 		ps->irq_ack = 0;
 		inject = 1;
 	}
@@ -317,10 +319,10 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
 	struct kvm_kpit_state *ps = container_of(data, struct kvm_kpit_state, timer);
 	struct kvm_pit *pt = ps->kvm->arch.vpit;
 
-	if (ps->reinject || !atomic_read(&ps->pending)) {
+	if (ps->reinject)
 		atomic_inc(&ps->pending);
-		queue_kthread_work(&pt->worker, &pt->expired);
-	}
+
+	queue_kthread_work(&pt->worker, &pt->expired);
 
 	if (ps->is_periodic) {
 		hrtimer_add_expires_ns(&ps->timer, ps->period);
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 02/14] KVM: x86: simplify atomics in kvm_pit_ack_irq
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 01/14] KVM: x86: change PIT discard tick policy Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-18 18:04   ` Paolo Bonzini
  2016-02-17 19:14 ` [PATCH v2 03/14] KVM: x86: add kvm_pit_reset_reinject Radim Krčmář
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

We already have a helper that does the same thing.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: The pending == 1 means there's only us => don't reinject.

 arch/x86/kvm/i8254.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index ab5318727579..7d694ac7f4a4 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -236,19 +236,9 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
 	struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 						 irq_ack_notifier);
-	int value;
 
 	spin_lock(&ps->inject_lock);
-	value = atomic_dec_return(&ps->pending);
-	if (value < 0)
-		/* spurious acks can be generated if, for example, the
-		 * PIC is being reset.  Handle it gracefully here
-		 */
-		atomic_inc(&ps->pending);
-	else if (value > 0 && ps->reinject)
-		/* in this case, we had multiple outstanding pit interrupts
-		 * that we needed to inject.  Reinject
-		 */
+	if (atomic_dec_if_positive(&ps->pending) > 0 && ps->reinject)
 		queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
 	ps->irq_ack = 1;
 	spin_unlock(&ps->inject_lock);
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 03/14] KVM: x86: add kvm_pit_reset_reinject
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 01/14] KVM: x86: change PIT discard tick policy Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 02/14] KVM: x86: simplify atomics in kvm_pit_ack_irq Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 04/14] KVM: x86: use atomic_t instead of pit.inject_lock Radim Krčmář
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

pit_state.pending and pit_state.irq_ack are always reset at the same
time.  Create a function for them.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: split from a bigger patch.

 arch/x86/kvm/i8254.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 7d694ac7f4a4..bdbb3f076e72 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -321,6 +321,12 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
 		return HRTIMER_NORESTART;
 }
 
+static inline void kvm_pit_reset_reinject(struct kvm_pit *pit)
+{
+	atomic_set(&pit->pit_state.pending, 0);
+	pit->pit_state.irq_ack = 1;
+}
+
 static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
 {
 	struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
@@ -343,8 +349,7 @@ static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
 	ps->timer.function = pit_timer_fn;
 	ps->kvm = ps->pit->kvm;
 
-	atomic_set(&ps->pending, 0);
-	ps->irq_ack = 1;
+	kvm_pit_reset_reinject(ps->pit);
 
 	/*
 	 * Do not allow the guest to program periodic timers with small
@@ -644,18 +649,15 @@ void kvm_pit_reset(struct kvm_pit *pit)
 	}
 	mutex_unlock(&pit->pit_state.lock);
 
-	atomic_set(&pit->pit_state.pending, 0);
-	pit->pit_state.irq_ack = 1;
+	kvm_pit_reset_reinject(pit);
 }
 
 static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask)
 {
 	struct kvm_pit *pit = container_of(kimn, struct kvm_pit, mask_notifier);
 
-	if (!mask) {
-		atomic_set(&pit->pit_state.pending, 0);
-		pit->pit_state.irq_ack = 1;
-	}
+	if (!mask)
+		kvm_pit_reset_reinject(pit);
 }
 
 static const struct kvm_io_device_ops pit_dev_ops = {
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 04/14] KVM: x86: use atomic_t instead of pit.inject_lock
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (2 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 03/14] KVM: x86: add kvm_pit_reset_reinject Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 05/14] KVM: x86: tone down WARN_ON pit.state_lock Radim Krčmář
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

The lock was an overkill, the same can be done with atomics.

A mb() was added in kvm_pit_ack_irq, to pair with implicit barrier
between pit_timer_fn and pit_do_work.  The mb() prevents a race that
could happen if pending == 0 and irq_ack == 0:

  kvm_pit_ack_irq:                | pit_timer_fn:
   p = atomic_read(&ps->pending); |
                                  |  atomic_inc(&ps->pending);
                                  |  queue_work(pit_do_work);
                                  | pit_do_work:
                                  |  atomic_xchg(&ps->irq_ack, 0);
                                  |  return;
   atomic_set(&ps->irq_ack, 1);   |
   if (p == 0) return;            |

where the interrupt would not be delivered in this tick of pit_timer_fn.
PIT would have eventually delivered the interrupt, but we sacrifice
perofmance to make sure that interrupts are not needlessly delayed.

sfence isn't enough: atomic_dec_if_positive does atomic_read first and
x86 can reorder loads before stores.  lfence isn't enough: store can
pass lfence, turning it into a nop.
A compiler barrier would be more than enough as fences would matter only
if the CPU stalled for unbelievably long.

This patch doesn't do anything in kvm_pit_reset_reinject, because any
order of resets can race, but the result differs by at most one
interrupt, which is ok, because it's the same result as if the reset
happened at a slightly different time.  (Original code didn't protect
the reset path with a proper lock, so users have to be robust.)

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2:
 * Add and comment memory barriers.  [Paolo]
 * Adapt to new contexts.

 arch/x86/kvm/i8254.c | 56 +++++++++++++++++++++-------------------------------
 arch/x86/kvm/i8254.h |  3 +--
 2 files changed, 24 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index bdbb3f076e72..cbb911728ac8 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -237,11 +237,13 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 	struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 						 irq_ack_notifier);
 
-	spin_lock(&ps->inject_lock);
+	atomic_set(&ps->irq_ack, 1);
+	/* irq_ack should be set before pending is read.  Order accesses with
+	 * inc(pending) in pit_timer_fn and xchg(irq_ack, 0) in pit_do_work.
+	 */
+	smp_mb();
 	if (atomic_dec_if_positive(&ps->pending) > 0 && ps->reinject)
 		queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
-	ps->irq_ack = 1;
-	spin_unlock(&ps->inject_lock);
 }
 
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
@@ -272,36 +274,25 @@ static void pit_do_work(struct kthread_work *work)
 	struct kvm_vcpu *vcpu;
 	int i;
 	struct kvm_kpit_state *ps = &pit->pit_state;
-	int inject = 0;
 
-	/* Try to inject pending interrupts when
-	 * last one has been acked.
+	if (ps->reinject && !atomic_xchg(&ps->irq_ack, 0))
+		return;
+
+	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
+	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0, false);
+
+	/*
+	 * Provides NMI watchdog support via Virtual Wire mode.
+	 * The route is: PIT -> LVT0 in NMI mode.
+	 *
+	 * Note: Our Virtual Wire implementation does not follow
+	 * the MP specification.  We propagate a PIT interrupt to all
+	 * VCPUs and only when LVT0 is in NMI mode.  The interrupt can
+	 * also be simultaneously delivered through PIC and IOAPIC.
 	 */
-	spin_lock(&ps->inject_lock);
-	if (!ps->reinject)
-		inject = 1;
-	else if (ps->irq_ack) {
-		ps->irq_ack = 0;
-		inject = 1;
-	}
-	spin_unlock(&ps->inject_lock);
-	if (inject) {
-		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
-		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0, false);
-
-		/*
-		 * Provides NMI watchdog support via Virtual Wire mode.
-		 * The route is: PIT -> PIC -> LVT0 in NMI mode.
-		 *
-		 * Note: Our Virtual Wire implementation is simplified, only
-		 * propagating PIT interrupts to all VCPUs when they have set
-		 * LVT0 to NMI delivery. Other PIC interrupts are just sent to
-		 * VCPU0, and only if its LVT0 is in EXTINT mode.
-		 */
-		if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
-			kvm_for_each_vcpu(i, vcpu, kvm)
-				kvm_apic_nmi_wd_deliver(vcpu);
-	}
+	if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			kvm_apic_nmi_wd_deliver(vcpu);
 }
 
 static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
@@ -324,7 +315,7 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
 static inline void kvm_pit_reset_reinject(struct kvm_pit *pit)
 {
 	atomic_set(&pit->pit_state.pending, 0);
-	pit->pit_state.irq_ack = 1;
+	atomic_set(&pit->pit_state.irq_ack, 1);
 }
 
 static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
@@ -691,7 +682,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 
 	mutex_init(&pit->pit_state.lock);
 	mutex_lock(&pit->pit_state.lock);
-	spin_lock_init(&pit->pit_state.inject_lock);
 
 	pid = get_pid(task_tgid(current));
 	pid_nr = pid_vnr(pid);
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index c84990b42b5b..f8cf4b84f435 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -33,8 +33,7 @@ struct kvm_kpit_state {
 	u32    speaker_data_on;
 	struct mutex lock;
 	struct kvm_pit *pit;
-	spinlock_t inject_lock;
-	unsigned long irq_ack;
+	atomic_t irq_ack;
 	struct kvm_irq_ack_notifier irq_ack_notifier;
 };
 
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 05/14] KVM: x86: tone down WARN_ON pit.state_lock
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (3 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 04/14] KVM: x86: use atomic_t instead of pit.inject_lock Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 06/14] KVM: x86: pass struct kvm_pit instead of kvm in PIT Radim Krčmář
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

A guest could hang the host kernel if it could hit the WARN_ON, because
of sheer number of those reports.  Internal callers have to be sensible
anyway, so we now only check for it in an API function.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/i8254.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index cbb911728ac8..328b21f3ab7c 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -76,8 +76,6 @@ static void pit_set_gate(struct kvm *kvm, int channel, u32 val)
 	struct kvm_kpit_channel_state *c =
 		&kvm->arch.vpit->pit_state.channels[channel];
 
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
 	switch (c->mode) {
 	default:
 	case 0:
@@ -99,8 +97,6 @@ static void pit_set_gate(struct kvm *kvm, int channel, u32 val)
 
 static int pit_get_gate(struct kvm *kvm, int channel)
 {
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
 	return kvm->arch.vpit->pit_state.channels[channel].gate;
 }
 
@@ -144,8 +140,6 @@ static int pit_get_count(struct kvm *kvm, int channel)
 	s64 d, t;
 	int counter;
 
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
 	t = kpit_elapsed(kvm, c, channel);
 	d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
@@ -174,8 +168,6 @@ static int pit_get_out(struct kvm *kvm, int channel)
 	s64 d, t;
 	int out;
 
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
 	t = kpit_elapsed(kvm, c, channel);
 	d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
@@ -207,8 +199,6 @@ static void pit_latch_count(struct kvm *kvm, int channel)
 	struct kvm_kpit_channel_state *c =
 		&kvm->arch.vpit->pit_state.channels[channel];
 
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
 	if (!c->count_latched) {
 		c->latched_count = pit_get_count(kvm, channel);
 		c->count_latched = c->rw_mode;
@@ -220,8 +210,6 @@ static void pit_latch_status(struct kvm *kvm, int channel)
 	struct kvm_kpit_channel_state *c =
 		&kvm->arch.vpit->pit_state.channels[channel];
 
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
 	if (!c->status_latched) {
 		/* TODO: Return NULL COUNT (bit 6). */
 		c->status = ((pit_get_out(kvm, channel) << 7) |
@@ -367,8 +355,6 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 {
 	struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
 
-	WARN_ON(!mutex_is_locked(&ps->lock));
-
 	pr_debug("load_count val is %d, channel is %d\n", val, channel);
 
 	/*
@@ -406,6 +392,9 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
 	u8 saved_mode;
+
+	WARN_ON_ONCE(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
+
 	if (hpet_legacy_start) {
 		/* save existing mode for later reenablement */
 		WARN_ON(channel != 0);
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 06/14] KVM: x86: pass struct kvm_pit instead of kvm in PIT
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (4 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 05/14] KVM: x86: tone down WARN_ON pit.state_lock Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 07/14] KVM: x86: remove unnecessary uses of PIT state lock Radim Krčmář
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

This patch passes struct kvm_pit into internal PIT functions.
Those functions used to get PIT through kvm->arch.vpit, even though most
of them never used *kvm for other purposes.  Another benefit is that we
don't need to set kvm->arch.vpit during initialization.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/i8254.c | 112 ++++++++++++++++++++++++---------------------------
 arch/x86/kvm/i8254.h |   4 +-
 arch/x86/kvm/x86.c   |  26 +++++++-----
 3 files changed, 70 insertions(+), 72 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 328b21f3ab7c..c11734dd5092 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -71,10 +71,9 @@ static u64 muldiv64(u64 a, u32 b, u32 c)
 	return res.ll;
 }
 
-static void pit_set_gate(struct kvm *kvm, int channel, u32 val)
+static void pit_set_gate(struct kvm_pit *pit, int channel, u32 val)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 
 	switch (c->mode) {
 	default:
@@ -95,16 +94,16 @@ static void pit_set_gate(struct kvm *kvm, int channel, u32 val)
 	c->gate = val;
 }
 
-static int pit_get_gate(struct kvm *kvm, int channel)
+static int pit_get_gate(struct kvm_pit *pit, int channel)
 {
-	return kvm->arch.vpit->pit_state.channels[channel].gate;
+	return pit->pit_state.channels[channel].gate;
 }
 
-static s64 __kpit_elapsed(struct kvm *kvm)
+static s64 __kpit_elapsed(struct kvm_pit *pit)
 {
 	s64 elapsed;
 	ktime_t remaining;
-	struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
+	struct kvm_kpit_state *ps = &pit->pit_state;
 
 	if (!ps->period)
 		return 0;
@@ -124,23 +123,22 @@ static s64 __kpit_elapsed(struct kvm *kvm)
 	return elapsed;
 }
 
-static s64 kpit_elapsed(struct kvm *kvm, struct kvm_kpit_channel_state *c,
+static s64 kpit_elapsed(struct kvm_pit *pit, struct kvm_kpit_channel_state *c,
 			int channel)
 {
 	if (channel == 0)
-		return __kpit_elapsed(kvm);
+		return __kpit_elapsed(pit);
 
 	return ktime_to_ns(ktime_sub(ktime_get(), c->count_load_time));
 }
 
-static int pit_get_count(struct kvm *kvm, int channel)
+static int pit_get_count(struct kvm_pit *pit, int channel)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 	s64 d, t;
 	int counter;
 
-	t = kpit_elapsed(kvm, c, channel);
+	t = kpit_elapsed(pit, c, channel);
 	d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
 	switch (c->mode) {
@@ -161,14 +159,13 @@ static int pit_get_count(struct kvm *kvm, int channel)
 	return counter;
 }
 
-static int pit_get_out(struct kvm *kvm, int channel)
+static int pit_get_out(struct kvm_pit *pit, int channel)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 	s64 d, t;
 	int out;
 
-	t = kpit_elapsed(kvm, c, channel);
+	t = kpit_elapsed(pit, c, channel);
 	d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
 	switch (c->mode) {
@@ -194,25 +191,23 @@ static int pit_get_out(struct kvm *kvm, int channel)
 	return out;
 }
 
-static void pit_latch_count(struct kvm *kvm, int channel)
+static void pit_latch_count(struct kvm_pit *pit, int channel)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 
 	if (!c->count_latched) {
-		c->latched_count = pit_get_count(kvm, channel);
+		c->latched_count = pit_get_count(pit, channel);
 		c->count_latched = c->rw_mode;
 	}
 }
 
-static void pit_latch_status(struct kvm *kvm, int channel)
+static void pit_latch_status(struct kvm_pit *pit, int channel)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 
 	if (!c->status_latched) {
 		/* TODO: Return NULL COUNT (bit 6). */
-		c->status = ((pit_get_out(kvm, channel) << 7) |
+		c->status = ((pit_get_out(pit, channel) << 7) |
 				(c->rw_mode << 4) |
 				(c->mode << 1) |
 				c->bcd);
@@ -306,9 +301,10 @@ static inline void kvm_pit_reset_reinject(struct kvm_pit *pit)
 	atomic_set(&pit->pit_state.irq_ack, 1);
 }
 
-static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
+static void create_pit_timer(struct kvm_pit *pit, u32 val, int is_period)
 {
-	struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
+	struct kvm_kpit_state *ps = &pit->pit_state;
+	struct kvm *kvm = pit->kvm;
 	s64 interval;
 
 	if (!ioapic_in_kernel(kvm) ||
@@ -326,9 +322,9 @@ static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
 	ps->is_periodic = is_period;
 
 	ps->timer.function = pit_timer_fn;
-	ps->kvm = ps->pit->kvm;
+	ps->kvm = pit->kvm;
 
-	kvm_pit_reset_reinject(ps->pit);
+	kvm_pit_reset_reinject(pit);
 
 	/*
 	 * Do not allow the guest to program periodic timers with small
@@ -351,9 +347,9 @@ static void create_pit_timer(struct kvm *kvm, u32 val, int is_period)
 		      HRTIMER_MODE_ABS);
 }
 
-static void pit_load_count(struct kvm *kvm, int channel, u32 val)
+static void pit_load_count(struct kvm_pit *pit, int channel, u32 val)
 {
-	struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
+	struct kvm_kpit_state *ps = &pit->pit_state;
 
 	pr_debug("load_count val is %d, channel is %d\n", val, channel);
 
@@ -378,32 +374,33 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 	case 1:
         /* FIXME: enhance mode 4 precision */
 	case 4:
-		create_pit_timer(kvm, val, 0);
+		create_pit_timer(pit, val, 0);
 		break;
 	case 2:
 	case 3:
-		create_pit_timer(kvm, val, 1);
+		create_pit_timer(pit, val, 1);
 		break;
 	default:
-		destroy_pit_timer(kvm->arch.vpit);
+		destroy_pit_timer(pit);
 	}
 }
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
+void kvm_pit_load_count(struct kvm_pit *pit, int channel, u32 val,
+		int hpet_legacy_start)
 {
 	u8 saved_mode;
 
-	WARN_ON_ONCE(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
+	WARN_ON_ONCE(!mutex_is_locked(&pit->pit_state.lock));
 
 	if (hpet_legacy_start) {
 		/* save existing mode for later reenablement */
 		WARN_ON(channel != 0);
-		saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
-		kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
-		pit_load_count(kvm, channel, val);
-		kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+		saved_mode = pit->pit_state.channels[0].mode;
+		pit->pit_state.channels[0].mode = 0xff; /* disable timer */
+		pit_load_count(pit, channel, val);
+		pit->pit_state.channels[0].mode = saved_mode;
 	} else {
-		pit_load_count(kvm, channel, val);
+		pit_load_count(pit, channel, val);
 	}
 }
 
@@ -429,7 +426,6 @@ static int pit_ioport_write(struct kvm_vcpu *vcpu,
 {
 	struct kvm_pit *pit = dev_to_pit(this);
 	struct kvm_kpit_state *pit_state = &pit->pit_state;
-	struct kvm *kvm = pit->kvm;
 	int channel, access;
 	struct kvm_kpit_channel_state *s;
 	u32 val = *(u32 *) data;
@@ -453,9 +449,9 @@ static int pit_ioport_write(struct kvm_vcpu *vcpu,
 				s = &pit_state->channels[channel];
 				if (val & (2 << channel)) {
 					if (!(val & 0x20))
-						pit_latch_count(kvm, channel);
+						pit_latch_count(pit, channel);
 					if (!(val & 0x10))
-						pit_latch_status(kvm, channel);
+						pit_latch_status(pit, channel);
 				}
 			}
 		} else {
@@ -463,7 +459,7 @@ static int pit_ioport_write(struct kvm_vcpu *vcpu,
 			s = &pit_state->channels[channel];
 			access = (val >> 4) & KVM_PIT_CHANNEL_MASK;
 			if (access == 0) {
-				pit_latch_count(kvm, channel);
+				pit_latch_count(pit, channel);
 			} else {
 				s->rw_mode = access;
 				s->read_state = access;
@@ -480,17 +476,17 @@ static int pit_ioport_write(struct kvm_vcpu *vcpu,
 		switch (s->write_state) {
 		default:
 		case RW_STATE_LSB:
-			pit_load_count(kvm, addr, val);
+			pit_load_count(pit, addr, val);
 			break;
 		case RW_STATE_MSB:
-			pit_load_count(kvm, addr, val << 8);
+			pit_load_count(pit, addr, val << 8);
 			break;
 		case RW_STATE_WORD0:
 			s->write_latch = val;
 			s->write_state = RW_STATE_WORD1;
 			break;
 		case RW_STATE_WORD1:
-			pit_load_count(kvm, addr, s->write_latch | (val << 8));
+			pit_load_count(pit, addr, s->write_latch | (val << 8));
 			s->write_state = RW_STATE_WORD0;
 			break;
 		}
@@ -506,7 +502,6 @@ static int pit_ioport_read(struct kvm_vcpu *vcpu,
 {
 	struct kvm_pit *pit = dev_to_pit(this);
 	struct kvm_kpit_state *pit_state = &pit->pit_state;
-	struct kvm *kvm = pit->kvm;
 	int ret, count;
 	struct kvm_kpit_channel_state *s;
 	if (!pit_in_range(addr))
@@ -543,20 +538,20 @@ static int pit_ioport_read(struct kvm_vcpu *vcpu,
 		switch (s->read_state) {
 		default:
 		case RW_STATE_LSB:
-			count = pit_get_count(kvm, addr);
+			count = pit_get_count(pit, addr);
 			ret = count & 0xff;
 			break;
 		case RW_STATE_MSB:
-			count = pit_get_count(kvm, addr);
+			count = pit_get_count(pit, addr);
 			ret = (count >> 8) & 0xff;
 			break;
 		case RW_STATE_WORD0:
-			count = pit_get_count(kvm, addr);
+			count = pit_get_count(pit, addr);
 			ret = count & 0xff;
 			s->read_state = RW_STATE_WORD1;
 			break;
 		case RW_STATE_WORD1:
-			count = pit_get_count(kvm, addr);
+			count = pit_get_count(pit, addr);
 			ret = (count >> 8) & 0xff;
 			s->read_state = RW_STATE_WORD0;
 			break;
@@ -577,14 +572,13 @@ static int speaker_ioport_write(struct kvm_vcpu *vcpu,
 {
 	struct kvm_pit *pit = speaker_to_pit(this);
 	struct kvm_kpit_state *pit_state = &pit->pit_state;
-	struct kvm *kvm = pit->kvm;
 	u32 val = *(u32 *) data;
 	if (addr != KVM_SPEAKER_BASE_ADDRESS)
 		return -EOPNOTSUPP;
 
 	mutex_lock(&pit_state->lock);
 	pit_state->speaker_data_on = (val >> 1) & 1;
-	pit_set_gate(kvm, 2, val & 1);
+	pit_set_gate(pit, 2, val & 1);
 	mutex_unlock(&pit_state->lock);
 	return 0;
 }
@@ -595,7 +589,6 @@ static int speaker_ioport_read(struct kvm_vcpu *vcpu,
 {
 	struct kvm_pit *pit = speaker_to_pit(this);
 	struct kvm_kpit_state *pit_state = &pit->pit_state;
-	struct kvm *kvm = pit->kvm;
 	unsigned int refresh_clock;
 	int ret;
 	if (addr != KVM_SPEAKER_BASE_ADDRESS)
@@ -605,8 +598,8 @@ static int speaker_ioport_read(struct kvm_vcpu *vcpu,
 	refresh_clock = ((unsigned int)ktime_to_ns(ktime_get()) >> 14) & 1;
 
 	mutex_lock(&pit_state->lock);
-	ret = ((pit_state->speaker_data_on << 1) | pit_get_gate(kvm, 2) |
-		(pit_get_out(kvm, 2) << 5) | (refresh_clock << 4));
+	ret = ((pit_state->speaker_data_on << 1) | pit_get_gate(pit, 2) |
+		(pit_get_out(pit, 2) << 5) | (refresh_clock << 4));
 	if (len > sizeof(ret))
 		len = sizeof(ret);
 	memcpy(data, (char *)&ret, len);
@@ -625,7 +618,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
 		c = &pit->pit_state.channels[i];
 		c->mode = 0xff;
 		c->gate = (i != 2);
-		pit_load_count(pit->kvm, i, 0);
+		pit_load_count(pit, i, 0);
 	}
 	mutex_unlock(&pit->pit_state.lock);
 
@@ -687,7 +680,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	}
 	init_kthread_work(&pit->expired, pit_do_work);
 
-	kvm->arch.vpit = pit;
 	pit->kvm = kvm;
 
 	pit_state = &pit->pit_state;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index f8cf4b84f435..a6aceaf08df5 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -56,9 +56,11 @@ struct kvm_pit {
 #define KVM_MAX_PIT_INTR_INTERVAL   HZ / 100
 #define KVM_PIT_CHANNEL_MASK	    0x3
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
+
 void kvm_pit_reset(struct kvm_pit *pit);
+void kvm_pit_load_count(struct kvm_pit *pit, int channel, u32 val,
+		int hpet_legacy_start);
 
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf15bc5b0dff..57b296466bf0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3637,11 +3637,13 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
 	int i;
-	mutex_lock(&kvm->arch.vpit->pit_state.lock);
-	memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
+	struct kvm_pit *pit = kvm->arch.vpit;
+
+	mutex_lock(&pit->pit_state.lock);
+	memcpy(&pit->pit_state, ps, sizeof(struct kvm_pit_state));
 	for (i = 0; i < 3; i++)
-		kvm_pit_load_count(kvm, i, ps->channels[i].count, 0);
-	mutex_unlock(&kvm->arch.vpit->pit_state.lock);
+		kvm_pit_load_count(pit, i, ps->channels[i].count, 0);
+	mutex_unlock(&pit->pit_state.lock);
 	return 0;
 }
 
@@ -3661,18 +3663,20 @@ static int kvm_vm_ioctl_set_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps)
 	int start = 0;
 	int i;
 	u32 prev_legacy, cur_legacy;
-	mutex_lock(&kvm->arch.vpit->pit_state.lock);
-	prev_legacy = kvm->arch.vpit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY;
+	struct kvm_pit *pit = kvm->arch.vpit;
+
+	mutex_lock(&pit->pit_state.lock);
+	prev_legacy = pit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY;
 	cur_legacy = ps->flags & KVM_PIT_FLAGS_HPET_LEGACY;
 	if (!prev_legacy && cur_legacy)
 		start = 1;
-	memcpy(&kvm->arch.vpit->pit_state.channels, &ps->channels,
-	       sizeof(kvm->arch.vpit->pit_state.channels));
-	kvm->arch.vpit->pit_state.flags = ps->flags;
+	memcpy(&pit->pit_state.channels, &ps->channels,
+	       sizeof(pit->pit_state.channels));
+	pit->pit_state.flags = ps->flags;
 	for (i = 0; i < 3; i++)
-		kvm_pit_load_count(kvm, i, kvm->arch.vpit->pit_state.channels[i].count,
+		kvm_pit_load_count(pit, i, pit->pit_state.channels[i].count,
 				   start && i == 0);
-	mutex_unlock(&kvm->arch.vpit->pit_state.lock);
+	mutex_unlock(&pit->pit_state.lock);
 	return 0;
 }
 
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 07/14] KVM: x86: remove unnecessary uses of PIT state lock
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (5 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 06/14] KVM: x86: pass struct kvm_pit instead of kvm in PIT Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-18 18:03   ` Paolo Bonzini
  2016-02-17 19:14 ` [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy Radim Krčmář
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

- kvm_create_pit had to lock only because it exposed kvm->arch.vpit very
  early, but initialization doesn't use kvm->arch.vpit since the last
  patch, so we can drop locking.
- kvm_free_pit is only run after there are no users of KVM and therefore
  is the sole actor.
- Locking in kvm_vm_ioctl_reinject doesn't do anything, because reinject
  is only protected at that place.
- kvm_pit_reset isn't used anywhere and its locking can be dropped if we
  hide it.

Removing useless locking allows to see what actually is being protected
by PIT state lock (values accessible from the guest).

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/i8254.c | 7 -------
 arch/x86/kvm/i8254.h | 7 ++++---
 arch/x86/kvm/x86.c   | 4 ++--
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index c11734dd5092..01f9c70ebc21 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -612,7 +612,6 @@ void kvm_pit_reset(struct kvm_pit *pit)
 	int i;
 	struct kvm_kpit_channel_state *c;
 
-	mutex_lock(&pit->pit_state.lock);
 	pit->pit_state.flags = 0;
 	for (i = 0; i < 3; i++) {
 		c = &pit->pit_state.channels[i];
@@ -620,7 +619,6 @@ void kvm_pit_reset(struct kvm_pit *pit)
 		c->gate = (i != 2);
 		pit_load_count(pit, i, 0);
 	}
-	mutex_unlock(&pit->pit_state.lock);
 
 	kvm_pit_reset_reinject(pit);
 }
@@ -663,7 +661,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	}
 
 	mutex_init(&pit->pit_state.lock);
-	mutex_lock(&pit->pit_state.lock);
 
 	pid = get_pid(task_tgid(current));
 	pid_nr = pid_vnr(pid);
@@ -673,7 +670,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	pit->worker_task = kthread_run(kthread_worker_fn, &pit->worker,
 				       "kvm-pit/%d", pid_nr);
 	if (IS_ERR(pit->worker_task)) {
-		mutex_unlock(&pit->pit_state.lock);
 		kvm_free_irq_source_id(kvm, pit->irq_source_id);
 		kfree(pit);
 		return NULL;
@@ -689,7 +685,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
 	kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
 	pit_state->reinject = true;
-	mutex_unlock(&pit->pit_state.lock);
 
 	kvm_pit_reset(pit);
 
@@ -737,13 +732,11 @@ void kvm_free_pit(struct kvm *kvm)
 					       &kvm->arch.vpit->mask_notifier);
 		kvm_unregister_irq_ack_notifier(kvm,
 				&kvm->arch.vpit->pit_state.irq_ack_notifier);
-		mutex_lock(&kvm->arch.vpit->pit_state.lock);
 		timer = &kvm->arch.vpit->pit_state.timer;
 		hrtimer_cancel(timer);
 		flush_kthread_work(&kvm->arch.vpit->expired);
 		kthread_stop(kvm->arch.vpit->worker_task);
 		kvm_free_irq_source_id(kvm, kvm->arch.vpit->irq_source_id);
-		mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 		kfree(kvm->arch.vpit);
 	}
 }
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index a6aceaf08df5..568d87ec3698 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -22,17 +22,19 @@ struct kvm_kpit_channel_state {
 };
 
 struct kvm_kpit_state {
+	/* All members before "struct mutex lock" are protected by the lock. */
 	struct kvm_kpit_channel_state channels[3];
 	u32 flags;
 	bool is_periodic;
 	s64 period; 				/* unit: ns */
 	struct hrtimer timer;
-	atomic_t pending;			/* accumulated triggered timers */
 	bool reinject;
-	struct kvm *kvm;
 	u32    speaker_data_on;
+
 	struct mutex lock;
+	struct kvm *kvm;
 	struct kvm_pit *pit;
+	atomic_t pending; /* accumulated triggered timers */
 	atomic_t irq_ack;
 	struct kvm_irq_ack_notifier irq_ack_notifier;
 };
@@ -59,7 +61,6 @@ struct kvm_pit {
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 
-void kvm_pit_reset(struct kvm_pit *pit);
 void kvm_pit_load_count(struct kvm_pit *pit, int channel, u32 val,
 		int hpet_legacy_start);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 57b296466bf0..f2f7ba74da6b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3685,9 +3685,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 {
 	if (!kvm->arch.vpit)
 		return -ENXIO;
-	mutex_lock(&kvm->arch.vpit->pit_state.lock);
+
 	kvm->arch.vpit->pit_state.reinject = control->pit_reinject;
-	mutex_unlock(&kvm->arch.vpit->pit_state.lock);
+
 	return 0;
 }
 
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (6 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 07/14] KVM: x86: remove unnecessary uses of PIT state lock Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-18 18:05   ` Paolo Bonzini
  2016-02-18 18:08   ` Paolo Bonzini
  2016-02-17 19:14 ` [PATCH v2 09/14] KVM: x86: refactor kvm_create_pit Radim Krčmář
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

Discard policy doesn't rely on information from notifiers, so we don't
need to register notifiers unconditionally.  We kept correct counts in
case userspace switched between policies during runtime, but that can be
avoided by reseting the state.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2:
 * lock the ioctl
 * rebase

 arch/x86/kvm/i8254.c | 38 +++++++++++++++++++++++++++-----------
 arch/x86/kvm/i8254.h |  1 +
 arch/x86/kvm/x86.c   | 12 ++++++++++--
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 01f9c70ebc21..e514f7fa0093 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -225,7 +225,7 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 	 * inc(pending) in pit_timer_fn and xchg(irq_ack, 0) in pit_do_work.
 	 */
 	smp_mb();
-	if (atomic_dec_if_positive(&ps->pending) > 0 && ps->reinject)
+	if (atomic_dec_if_positive(&ps->pending) > 0)
 		queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
 }
 
@@ -301,6 +301,27 @@ static inline void kvm_pit_reset_reinject(struct kvm_pit *pit)
 	atomic_set(&pit->pit_state.irq_ack, 1);
 }
 
+void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject)
+{
+	struct kvm_kpit_state *ps = &pit->pit_state;
+	struct kvm *kvm = pit->kvm;
+
+	if (ps->reinject == reinject)
+		return;
+
+	if (reinject) {
+		/* The initial state is preserved while ps->reinject == 0. */
+		kvm_pit_reset_reinject(pit);
+		kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
+		kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+	} else {
+		kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
+		kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+	}
+
+	ps->reinject = reinject;
+}
+
 static void create_pit_timer(struct kvm_pit *pit, u32 val, int is_period)
 {
 	struct kvm_kpit_state *ps = &pit->pit_state;
@@ -681,15 +702,14 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	pit_state = &pit->pit_state;
 	pit_state->pit = pit;
 	hrtimer_init(&pit_state->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+
 	pit_state->irq_ack_notifier.gsi = 0;
 	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
-	kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
-	pit_state->reinject = true;
+	pit->mask_notifier.func = pit_mask_notifer;
 
 	kvm_pit_reset(pit);
 
-	pit->mask_notifier.func = pit_mask_notifer;
-	kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+	kvm_pit_set_reinject(pit, true);
 
 	kvm_iodevice_init(&pit->dev, &pit_dev_ops);
 	ret = kvm_io_bus_register_dev(kvm, KVM_PIO_BUS, KVM_PIT_BASE_ADDRESS,
@@ -712,8 +732,7 @@ fail_unregister:
 	kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev);
 
 fail:
-	kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
-	kvm_unregister_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
+	kvm_pit_set_reinject(pit, false);
 	kvm_free_irq_source_id(kvm, pit->irq_source_id);
 	kthread_stop(pit->worker_task);
 	kfree(pit);
@@ -728,10 +747,7 @@ void kvm_free_pit(struct kvm *kvm)
 		kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &kvm->arch.vpit->dev);
 		kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
 					      &kvm->arch.vpit->speaker_dev);
-		kvm_unregister_irq_mask_notifier(kvm, 0,
-					       &kvm->arch.vpit->mask_notifier);
-		kvm_unregister_irq_ack_notifier(kvm,
-				&kvm->arch.vpit->pit_state.irq_ack_notifier);
+		kvm_pit_set_reinject(kvm->arch.vpit, false);
 		timer = &kvm->arch.vpit->pit_state.timer;
 		hrtimer_cancel(timer);
 		flush_kthread_work(&kvm->arch.vpit->expired);
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 568d87ec3698..47a96e803aab 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -63,5 +63,6 @@ void kvm_free_pit(struct kvm *kvm);
 
 void kvm_pit_load_count(struct kvm_pit *pit, int channel, u32 val,
 		int hpet_legacy_start);
+void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject);
 
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f2f7ba74da6b..785cc781673d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3683,10 +3683,18 @@ static int kvm_vm_ioctl_set_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps)
 static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 				 struct kvm_reinject_control *control)
 {
-	if (!kvm->arch.vpit)
+	struct kvm_pit *pit = kvm->arch.vpit;
+
+	if (!pit)
 		return -ENXIO;
 
-	kvm->arch.vpit->pit_state.reinject = control->pit_reinject;
+	/* pit->pit_state.lock was overloaded to prevent userspace from getting
+	 * an inconsistent state after running multiple KVM_REINJECT_CONTROL
+	 * ioctls in parallel.  Use a separate lock if that ioctl isn't rare.
+	 */
+	mutex_lock(&pit->pit_state.lock);
+	kvm_pit_set_reinject(pit, control->pit_reinject);
+	mutex_unlock(&pit->pit_state.lock);
 
 	return 0;
 }
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 09/14] KVM: x86: refactor kvm_create_pit
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (7 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 10/14] KVM: x86: refactor kvm_free_pit Radim Krčmář
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

Locks are gone, so we don't need to duplicate error paths.
Use goto everywhere.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/i8254.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index e514f7fa0093..9dd4460033e3 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -676,10 +676,8 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 		return NULL;
 
 	pit->irq_source_id = kvm_request_irq_source_id(kvm);
-	if (pit->irq_source_id < 0) {
-		kfree(pit);
-		return NULL;
-	}
+	if (pit->irq_source_id < 0)
+		goto fail_request;
 
 	mutex_init(&pit->pit_state.lock);
 
@@ -690,11 +688,9 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	init_kthread_worker(&pit->worker);
 	pit->worker_task = kthread_run(kthread_worker_fn, &pit->worker,
 				       "kvm-pit/%d", pid_nr);
-	if (IS_ERR(pit->worker_task)) {
-		kvm_free_irq_source_id(kvm, pit->irq_source_id);
-		kfree(pit);
-		return NULL;
-	}
+	if (IS_ERR(pit->worker_task))
+		goto fail_kthread;
+
 	init_kthread_work(&pit->expired, pit_do_work);
 
 	pit->kvm = kvm;
@@ -715,7 +711,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	ret = kvm_io_bus_register_dev(kvm, KVM_PIO_BUS, KVM_PIT_BASE_ADDRESS,
 				      KVM_PIT_MEM_LENGTH, &pit->dev);
 	if (ret < 0)
-		goto fail;
+		goto fail_register_pit;
 
 	if (flags & KVM_PIT_SPEAKER_DUMMY) {
 		kvm_iodevice_init(&pit->speaker_dev, &speaker_dev_ops);
@@ -723,18 +719,19 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 					      KVM_SPEAKER_BASE_ADDRESS, 4,
 					      &pit->speaker_dev);
 		if (ret < 0)
-			goto fail_unregister;
+			goto fail_register_speaker;
 	}
 
 	return pit;
 
-fail_unregister:
+fail_register_speaker:
 	kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev);
-
-fail:
+fail_register_pit:
 	kvm_pit_set_reinject(pit, false);
-	kvm_free_irq_source_id(kvm, pit->irq_source_id);
 	kthread_stop(pit->worker_task);
+fail_kthread:
+	kvm_free_irq_source_id(kvm, pit->irq_source_id);
+fail_request:
 	kfree(pit);
 	return NULL;
 }
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 10/14] KVM: x86: refactor kvm_free_pit
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (8 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 09/14] KVM: x86: refactor kvm_create_pit Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 11/14] KVM: x86: remove pit and kvm from kvm_kpit_state Radim Krčmář
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

Could be easier to read, but git history will become deeper.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/i8254.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 9dd4460033e3..28ad45abf386 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -738,18 +738,16 @@ fail_request:
 
 void kvm_free_pit(struct kvm *kvm)
 {
-	struct hrtimer *timer;
+	struct kvm_pit *pit = kvm->arch.vpit;
 
-	if (kvm->arch.vpit) {
-		kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &kvm->arch.vpit->dev);
-		kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
-					      &kvm->arch.vpit->speaker_dev);
-		kvm_pit_set_reinject(kvm->arch.vpit, false);
-		timer = &kvm->arch.vpit->pit_state.timer;
-		hrtimer_cancel(timer);
-		flush_kthread_work(&kvm->arch.vpit->expired);
-		kthread_stop(kvm->arch.vpit->worker_task);
-		kvm_free_irq_source_id(kvm, kvm->arch.vpit->irq_source_id);
-		kfree(kvm->arch.vpit);
+	if (pit) {
+		kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev);
+		kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->speaker_dev);
+		kvm_pit_set_reinject(pit, false);
+		hrtimer_cancel(&pit->pit_state.timer);
+		flush_kthread_work(&pit->expired);
+		kthread_stop(pit->worker_task);
+		kvm_free_irq_source_id(kvm, pit->irq_source_id);
+		kfree(pit);
 	}
 }
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 11/14] KVM: x86: remove pit and kvm from kvm_kpit_state
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (9 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 10/14] KVM: x86: refactor kvm_free_pit Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 12/14] KVM: x86: remove pointless dereference of PIT Radim Krčmář
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

kvm isn't ever used and pit can be accessed with container_of.
If you *really* need kvm, pit_state_to_pit(ps)->kvm.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/i8254.c | 14 +++++++++-----
 arch/x86/kvm/i8254.h |  2 --
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 28ad45abf386..d09925e7934a 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -215,10 +215,16 @@ static void pit_latch_status(struct kvm_pit *pit, int channel)
 	}
 }
 
+static inline struct kvm_pit *pit_state_to_pit(struct kvm_kpit_state *ps)
+{
+	return container_of(ps, struct kvm_pit, pit_state);
+}
+
 static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
 	struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 						 irq_ack_notifier);
+	struct kvm_pit *pit = pit_state_to_pit(ps);
 
 	atomic_set(&ps->irq_ack, 1);
 	/* irq_ack should be set before pending is read.  Order accesses with
@@ -226,7 +232,7 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 	 */
 	smp_mb();
 	if (atomic_dec_if_positive(&ps->pending) > 0)
-		queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
+		queue_kthread_work(&pit->worker, &pit->expired);
 }
 
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
@@ -281,7 +287,7 @@ static void pit_do_work(struct kthread_work *work)
 static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
 {
 	struct kvm_kpit_state *ps = container_of(data, struct kvm_kpit_state, timer);
-	struct kvm_pit *pt = ps->kvm->arch.vpit;
+	struct kvm_pit *pt = pit_state_to_pit(ps);
 
 	if (ps->reinject)
 		atomic_inc(&ps->pending);
@@ -338,12 +344,11 @@ static void create_pit_timer(struct kvm_pit *pit, u32 val, int is_period)
 
 	/* TODO The new value only affected after the retriggered */
 	hrtimer_cancel(&ps->timer);
-	flush_kthread_work(&ps->pit->expired);
+	flush_kthread_work(&pit->expired);
 	ps->period = interval;
 	ps->is_periodic = is_period;
 
 	ps->timer.function = pit_timer_fn;
-	ps->kvm = pit->kvm;
 
 	kvm_pit_reset_reinject(pit);
 
@@ -696,7 +701,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	pit->kvm = kvm;
 
 	pit_state = &pit->pit_state;
-	pit_state->pit = pit;
 	hrtimer_init(&pit_state->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 
 	pit_state->irq_ack_notifier.gsi = 0;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 47a96e803aab..2f11f5e55021 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -32,8 +32,6 @@ struct kvm_kpit_state {
 	u32    speaker_data_on;
 
 	struct mutex lock;
-	struct kvm *kvm;
-	struct kvm_pit *pit;
 	atomic_t pending; /* accumulated triggered timers */
 	atomic_t irq_ack;
 	struct kvm_irq_ack_notifier irq_ack_notifier;
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 12/14] KVM: x86: remove pointless dereference of PIT
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (10 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 11/14] KVM: x86: remove pit and kvm from kvm_kpit_state Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 13/14] KVM: x86: don't assume layout of kvm_kpit_state Radim Krčmář
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

PIT is known at that point.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/i8254.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index d09925e7934a..be1072c31d49 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -267,8 +267,8 @@ static void pit_do_work(struct kthread_work *work)
 	if (ps->reinject && !atomic_xchg(&ps->irq_ack, 0))
 		return;
 
-	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
-	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0, false);
+	kvm_set_irq(kvm, pit->irq_source_id, 0, 1, false);
+	kvm_set_irq(kvm, pit->irq_source_id, 0, 0, false);
 
 	/*
 	 * Provides NMI watchdog support via Virtual Wire mode.
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 13/14] KVM: x86: don't assume layout of kvm_kpit_state
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (11 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 12/14] KVM: x86: remove pointless dereference of PIT Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-17 19:14 ` [PATCH v2 14/14] KVM: x86: move PIT timer function initialization Radim Krčmář
  2016-02-18 18:11 ` [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Paolo Bonzini
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

channels has offset 0 and correct size now, but that can change.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/x86.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 785cc781673d..c688a99da5ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3628,9 +3628,13 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
 
 static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
-	mutex_lock(&kvm->arch.vpit->pit_state.lock);
-	memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state));
-	mutex_unlock(&kvm->arch.vpit->pit_state.lock);
+	struct kvm_kpit_state *kps = &kvm->arch.vpit->pit_state;
+
+	BUILD_BUG_ON(sizeof(*ps) != sizeof(kps->channels));
+
+	mutex_lock(&kps->lock);
+	memcpy(ps, &kps->channels, sizeof(*ps));
+	mutex_unlock(&kps->lock);
 	return 0;
 }
 
@@ -3640,7 +3644,7 @@ static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 	struct kvm_pit *pit = kvm->arch.vpit;
 
 	mutex_lock(&pit->pit_state.lock);
-	memcpy(&pit->pit_state, ps, sizeof(struct kvm_pit_state));
+	memcpy(&pit->pit_state.channels, ps, sizeof(*ps));
 	for (i = 0; i < 3; i++)
 		kvm_pit_load_count(pit, i, ps->channels[i].count, 0);
 	mutex_unlock(&pit->pit_state.lock);
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 14/14] KVM: x86: move PIT timer function initialization
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (12 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 13/14] KVM: x86: don't assume layout of kvm_kpit_state Radim Krčmář
@ 2016-02-17 19:14 ` Radim Krčmář
  2016-02-18 18:11 ` [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Paolo Bonzini
  14 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-17 19:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: kvm, Paolo Bonzini, Yuki Shibuya

We can do it just once.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new

 arch/x86/kvm/i8254.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index be1072c31d49..ebede3040e57 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -348,8 +348,6 @@ static void create_pit_timer(struct kvm_pit *pit, u32 val, int is_period)
 	ps->period = interval;
 	ps->is_periodic = is_period;
 
-	ps->timer.function = pit_timer_fn;
-
 	kvm_pit_reset_reinject(pit);
 
 	/*
@@ -702,6 +700,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 
 	pit_state = &pit->pit_state;
 	hrtimer_init(&pit_state->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	pit_state->timer.function = pit_timer_fn;
 
 	pit_state->irq_ack_notifier.gsi = 0;
 	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-17 19:14 ` [PATCH v2 01/14] KVM: x86: change PIT discard tick policy Radim Krčmář
@ 2016-02-18 16:13   ` Paolo Bonzini
  2016-02-18 16:56     ` Radim Krčmář
  2016-02-18 18:49   ` Paolo Bonzini
  1 sibling, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 16:13 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya, Rik van Riel



On 17/02/2016 20:14, Radim Krčmář wrote:
> Discard policy uses ack_notifiers to prevent injection of PIT interrupts
> before EOI from the last one.
> 
> This patch changes the policy to always try to deliver the interrupt,
> which makes a difference when its vector is in ISR.
> Old implementation would drop the interrupt, but proposed one injects to
> IRR, like real hardware would.

This seems like what libvirt calls the "merge" policy:

    Merge the missed tick(s) into one tick and inject. The guest time
    may be delayed, depending on how the OS reacts to the merging of
    ticks

where the merged tick is the one placed into IRR.  Unlike discard,
"merge" can starve the guest through an interrupt storm.

Rik, I think you originally worked on the missed tick policies in Xen.
Is the above correct?

If so, do you recall what would be the reason to use the merge policy
instead of the discard policy?

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-18 16:13   ` Paolo Bonzini
@ 2016-02-18 16:56     ` Radim Krčmář
  2016-02-18 17:33       ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Radim Krčmář @ 2016-02-18 16:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya, Rik van Riel

2016-02-18 17:13+0100, Paolo Bonzini:
> On 17/02/2016 20:14, Radim Krčmář wrote:
>> Discard policy uses ack_notifiers to prevent injection of PIT interrupts
>> before EOI from the last one.
>> 
>> This patch changes the policy to always try to deliver the interrupt,
>> which makes a difference when its vector is in ISR.
>> Old implementation would drop the interrupt, but proposed one injects to
>> IRR, like real hardware would.
> 
> This seems like what libvirt calls the "merge" policy:

Oops, I never looked beyond QEMU after seeing that the naming in libvirt
doesn't even match ...

I think the policy that KVM implements (which I call discard) is "delay"
in libvirt.  (https://libvirt.org/formatdomain.html#elementsTime)

> 
>     Merge the missed tick(s) into one tick and inject. The guest time
>     may be delayed, depending on how the OS reacts to the merging of
>     ticks

The "may be delayed" there makes me feel like the timer has to support a
guest visible counter of missed ticks.
PIT will definitely be delayed if we get another tick while the previous
one is still in IRR and there is nothing that the guest can do with it.

"catchup" is the other KVM policy and "discard" also needs to allow the
guest to handle lost ticks.

> where the merged tick is the one placed into IRR.  Unlike discard,
> "merge" can starve the guest through an interrupt storm.

Yeah, starving a VCPU with an interrupt storm is more likely with the
changed policy.  It's a pretty sad situation if all the time that VCPU
gets isn't even enough to run a PIT handler, so I didn't care.

The NMI watchdog bug can also be solved without changing the policy.
(It's a hack in any case.)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-18 16:56     ` Radim Krčmář
@ 2016-02-18 17:33       ` Paolo Bonzini
  2016-02-18 17:55         ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 17:33 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Yuki Shibuya, Rik van Riel

On 18/02/2016 17:56, Radim Krčmář wrote:
> 2016-02-18 17:13+0100, Paolo Bonzini:
> > On 17/02/2016 20:14, Radim Krčmář wrote:
> > > Discard policy uses ack_notifiers to prevent injection of PIT interrupts
> > > before EOI from the last one.
> > > 
> > > This patch changes the policy to always try to deliver the interrupt,
> > > which makes a difference when its vector is in ISR.
> > > Old implementation would drop the interrupt, but proposed one injects to
> > > IRR, like real hardware would.
> > 
> > This seems like what libvirt calls the "merge" policy:
> 
> Oops, I never looked beyond QEMU after seeing that the naming in libvirt
> doesn't even match ...
> 
> I think the policy that KVM implements (which I call discard) is "delay"
> in libvirt.  (https://libvirt.org/formatdomain.html#elementsTime)

Suppose the scheduled ticks are at times 0, 20, 40, 60, 80.  The EOI for
time 0 is only delivered at time 42, other EOIs are timely.

The resulting injections are:

- for discard: 0, 60, 80.

- for catchup, which QEMU calls slew: 0, 42, 51, 60, 80.

- for merge: 0, 20 (in IRR, delivered at 42), 60, 80.

For delay I *think* it would be 0, 42, 62, 82, 102.

You know the i8254 code better than I do.  Does this make sense to you?
 (Or in other words, does the code *really* do the above?...)

> The "may be delayed" there makes me feel like the timer has to support a
> guest visible counter of missed ticks.

Yes, it depends whether the guest uses PIT to count time, or just to do
periodic stuff (and then it reads the time from e.g. the PMTimer).

In either case, only catchup ensures that time is not delayed, and it's
used for Windows which uses the RTC periodic clock to count time.

>> > where the merged tick is the one placed into IRR.  Unlike discard,
>> > "merge" can starve the guest through an interrupt storm.
> Yeah, starving a VCPU with an interrupt storm is more likely with the
> changed policy.  It's a pretty sad situation if all the time that VCPU
> gets isn't even enough to run a PIT handler, so I didn't care.

True.  On one hand the hardware policy is clearly merge, not discard.
The i8259 has an IRR!  On the other hand I'm a bit wary of changing the
policy without seeing exactly what the old OSes were doing in the PIT
handler.

> The NMI watchdog bug can also be solved without changing the policy.
> (It's a hack in any case.)

Can you send a patch for that?

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-18 17:33       ` Paolo Bonzini
@ 2016-02-18 17:55         ` Paolo Bonzini
  2016-02-19 14:44           ` Radim Krčmář
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 17:55 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Yuki Shibuya, Rik van Riel



On 18/02/2016 18:33, Paolo Bonzini wrote:
> On 18/02/2016 17:56, Radim Krčmář wrote:
>> 2016-02-18 17:13+0100, Paolo Bonzini:
>>> On 17/02/2016 20:14, Radim Krčmář wrote:
>>>> Discard policy uses ack_notifiers to prevent injection of PIT interrupts
>>>> before EOI from the last one.
>>>>
>>>> This patch changes the policy to always try to deliver the interrupt,
>>>> which makes a difference when its vector is in ISR.
>>>> Old implementation would drop the interrupt, but proposed one injects to
>>>> IRR, like real hardware would.
>>>
>>> This seems like what libvirt calls the "merge" policy:
>>
>> Oops, I never looked beyond QEMU after seeing that the naming in libvirt
>> doesn't even match ...
>>
>> I think the policy that KVM implements (which I call discard) is "delay"
>> in libvirt.  (https://libvirt.org/formatdomain.html#elementsTime)
> 
> Suppose the scheduled ticks are at times 0, 20, 40, 60, 80.  The EOI for
> time 0 is only delivered at time 42, other EOIs are timely.
> 
> The resulting injections are:
> 
> - for discard: 0, 60, 80.
> 
> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80.
> 
> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
> 
> For delay I *think* it would be 0, 42, 62, 82, 102.

Wrong: for delay it is something like 0, 42, 43, 60, 80.

Your patch does the right thing, QEMU is wrong in calling the policy
"discard" where it should have been "merge".  In fact both i8254 and RTC
use the same wrong nomenclature.

And it is indeed superfluous to use ack notifiers to implement the
default policy, as the default policy is already baked into the i8259.

Sorry, it shows that I'm swamped as I'm messing up things a bit lately.
 At least I have you to correct me. :)

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 07/14] KVM: x86: remove unnecessary uses of PIT state lock
  2016-02-17 19:14 ` [PATCH v2 07/14] KVM: x86: remove unnecessary uses of PIT state lock Radim Krčmář
@ 2016-02-18 18:03   ` Paolo Bonzini
  2016-02-19 14:45     ` Radim Krčmář
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 18:03 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya



On 17/02/2016 20:14, Radim Krčmář wrote:
> +	/* All members before "struct mutex lock" are protected by the lock. */

... except reinject, according to the commit message. :)

Paolo

>  	struct kvm_kpit_channel_state channels[3];
>  	u32 flags;
>  	bool is_periodic;
>  	s64 period; 				/* unit: ns */
>  	struct hrtimer timer;
> -	atomic_t pending;			/* accumulated triggered timers */
>  	bool reinject;
> -	struct kvm *kvm;
>  	u32    speaker_data_on;
> +

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/14] KVM: x86: simplify atomics in kvm_pit_ack_irq
  2016-02-17 19:14 ` [PATCH v2 02/14] KVM: x86: simplify atomics in kvm_pit_ack_irq Radim Krčmář
@ 2016-02-18 18:04   ` Paolo Bonzini
  2016-02-19 15:51     ` Radim Krčmář
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 18:04 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya



On 17/02/2016 20:14, Radim Krčmář wrote:
> -	value = atomic_dec_return(&ps->pending);
> -	if (value < 0)
> -		/* spurious acks can be generated if, for example, the
> -		 * PIC is being reset.  Handle it gracefully here
> -		 */
> -		atomic_inc(&ps->pending);
> -	else if (value > 0 && ps->reinject)
> -		/* in this case, we had multiple outstanding pit interrupts
> -		 * that we needed to inject.  Reinject
> -		 */
> +	if (atomic_dec_if_positive(&ps->pending) > 0 && ps->reinject)
>  		queue_kthread_work(&ps->pit->worker, &ps->pit->expired);

Here it would have made sense to do already

	if (!ps->reinject) {
		WARN_ON_ONCE(ps->pending || !ps->irq_ack);
		return;
	}
	spin_lock(...)
	if (atomic_dec_if_positive(&ps->pending) > 0)
		queue_kthread_work(...);
	ps->irq_ack = 1;
	spin_unlock(...)
	
because ps->pending is only ever nonzero, and irq_ack is only ever zero,
if ps->reinject.  Not a big deal since the ack notifier is going to
disappear altogether for the discard policy, but the nice thing is that
it lets you remove the ack notifier earlier and disentangle a bit more
discard mode.

So if you want for v3 you can reorder the patches like this:

- patch 1, same

- patch 2, what is outlined above

- patch 3, remove ack notifier for discard

- patch 4..14 the rest

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy
  2016-02-17 19:14 ` [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy Radim Krčmář
@ 2016-02-18 18:05   ` Paolo Bonzini
  2016-02-18 18:08   ` Paolo Bonzini
  1 sibling, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 18:05 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya



On 17/02/2016 20:14, Radim Krčmář wrote:
> Discard policy doesn't rely on information from notifiers, so we don't
> need to register notifiers unconditionally.  We kept correct counts in
> case userspace switched between policies during runtime, but that can be
> avoided by reseting the state.

Ah okay so the WARN_ON could actually be triggered.  But otherwise I
think my suggestion should work.

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy
  2016-02-17 19:14 ` [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy Radim Krčmář
  2016-02-18 18:05   ` Paolo Bonzini
@ 2016-02-18 18:08   ` Paolo Bonzini
  2016-02-19 15:04     ` Radim Krčmář
  1 sibling, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 18:08 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya



On 17/02/2016 20:14, Radim Krčmář wrote:
> +	/* pit->pit_state.lock was overloaded to prevent userspace from getting
> +	 * an inconsistent state after running multiple KVM_REINJECT_CONTROL
> +	 * ioctls in parallel.  Use a separate lock if that ioctl isn't rare.
> +	 */
> +	mutex_lock(&pit->pit_state.lock);
> +	kvm_pit_set_reinject(pit, control->pit_reinject);
> +	mutex_unlock(&pit->pit_state.lock);

... so in patch 7 concurrent _writes_ of reinject are protected by the
lock, but reads are done outside it (in pit_timer_fn).  WDYT about
making reinject an atomic_t?

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code
  2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
                   ` (13 preceding siblings ...)
  2016-02-17 19:14 ` [PATCH v2 14/14] KVM: x86: move PIT timer function initialization Radim Krčmář
@ 2016-02-18 18:11 ` Paolo Bonzini
  14 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 18:11 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya



On 17/02/2016 20:14, Radim Krčmář wrote:
> v2
>  - Different ordering; the important fix is first, which makes it easier
>    to backport as improvements of [1/14] have complex dependencies.
>  - Correct choice of atomics in [2/14].
>  - Explicit SMP barriers for lockless updates in [3/14].
>  - Protection against userspace races in kvm_vm_ioctl_reinject, [8/14].
>  - New code churn.  (I tried to annotate locking, which required me to
>    really look at the code and I couldn't leave it alone at that point;
>    sorry.)
> 
> This series only works with the discard policy, do you want to "fix" NMI
> in the delay policy as well?
> (NMI delivery is still is going to be wrong by any standard, but will
>  make some sense, at the cost of ugly code:  we would always inject NMI
>  when the timer fires and suppress NMI injection on EOI reinject.)
> 
> Anatomy of the series:
>  [1/14] fixes legacy NMI watchdog under discard policy.
>  [2-7/14] prepare for optimization of the discard policy.
>  [8/14] optimizes discard policy by removing notifiers.
>  [9-14/14] slightly improve related code.
> 
> I'm ok with dropping patches [2-14/14].
> 
> v1: http://www.spinics.net/lists/kvm/msg127017.html
> 
> Radim Krčmář (14):
>   KVM: x86: change PIT discard tick policy
>   KVM: x86: simplify atomics in kvm_pit_ack_irq
>   KVM: x86: add kvm_pit_reset_reinject
>   KVM: x86: use atomic_t instead of pit.inject_lock
>   KVM: x86: tone down WARN_ON pit.state_lock
>   KVM: x86: pass struct kvm_pit instead of kvm in PIT
>   KVM: x86: remove unnecessary uses of PIT state lock
>   KVM: x86: remove notifiers from PIT discard policy
>   KVM: x86: refactor kvm_create_pit
>   KVM: x86: refactor kvm_free_pit
>   KVM: x86: remove pit and kvm from kvm_kpit_state
>   KVM: x86: remove pointless dereference of PIT
>   KVM: x86: don't assume layout of kvm_kpit_state
>   KVM: x86: move PIT timer function initialization
> 
>  arch/x86/kvm/i8254.c | 318 +++++++++++++++++++++++----------------------------
>  arch/x86/kvm/i8254.h |  15 +--
>  arch/x86/kvm/x86.c   |  52 ++++++---
>  3 files changed, 187 insertions(+), 198 deletions(-)
> 

Patches 1 and 9-14:

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks!

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-17 19:14 ` [PATCH v2 01/14] KVM: x86: change PIT discard tick policy Radim Krčmář
  2016-02-18 16:13   ` Paolo Bonzini
@ 2016-02-18 18:49   ` Paolo Bonzini
  1 sibling, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-18 18:49 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel; +Cc: kvm, Yuki Shibuya



On 17/02/2016 20:14, Radim Krčmář wrote:
> 
> Even though there is a chance of regressions, I think we can fix the
> LVT0 NMI bug without introducing a new tick policy.

I agree, but please change the KVM_CAP_REINJECT_CONTROL check to return
2.  This way we can make QEMU reject old kernels when the user provides
"-global kvm-pit.lost_tick_policy=merge".

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-18 17:55         ` Paolo Bonzini
@ 2016-02-19 14:44           ` Radim Krčmář
  2016-02-25 12:34             ` Peter Krempa
  2016-02-25 13:38             ` Paolo Bonzini
  0 siblings, 2 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-19 14:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya, Rik van Riel, Peter Krempa

[Cc'd Peter, the last guy that touched timers in libvirt, because he
 might know what tick policies are supposed to be.]

2016-02-18 18:55+0100, Paolo Bonzini:
> On 18/02/2016 18:33, Paolo Bonzini wrote:
>> On 18/02/2016 17:56, Radim Krčmář wrote:
>>> 2016-02-18 17:13+0100, Paolo Bonzini:
>>>> On 17/02/2016 20:14, Radim Krčmář wrote:
>>>>> Discard policy uses ack_notifiers to prevent injection of PIT interrupts
>>>>> before EOI from the last one.
>>>>>
>>>>> This patch changes the policy to always try to deliver the interrupt,
>>>>> which makes a difference when its vector is in ISR.
>>>>> Old implementation would drop the interrupt, but proposed one injects to
>>>>> IRR, like real hardware would.
>>>>
>>>> This seems like what libvirt calls the "merge" policy:
>>>
>>> Oops, I never looked beyond QEMU after seeing that the naming in libvirt
>>> doesn't even match ...
>>>
>>> I think the policy that KVM implements (which I call discard) is "delay"
>>> in libvirt.  (https://libvirt.org/formatdomain.html#elementsTime)

(I looked at libvirt code, but couldn't find any use of merge or discard
 policies, so please bear with me as I disagree wherever it's possible.)

>> Suppose the scheduled ticks are at times 0, 20, 40, 60, 80.  The EOI for
>> time 0 is only delivered at time 42, other EOIs are timely.
>> 
>> The resulting injections are:
>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80.
>> 
>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
>> 
>> For delay I *think* it would be 0, 42, 62, 82, 102.

I could call this "delay".

  Continue to deliver ticks at the normal rate. The guest time will be
  delayed due to the late tick

At 82 time units, the guest thinks it's 60, so the guest will do
everything late.  (Leading us to call it delayed?!)

Few examples of "delay" that I find easier to accept:
 0, 60, 80.
 0, 42, 60, 80.  Because we haven't missed the tick at 20, it just took
                 a while to be delivered.  (Semantics ...)

> Wrong: for delay it is something like 0, 42, 43, 60, 80.

Aargh!  One KVM policy does this and QEMU calls it 'delay'.  I think
that libvirt would call it "catchup".

  Deliver ticks at a higher rate to catch up with the missed tick. The
  guest time should not be delayed once catchup is complete.

At 80, the guest time is 80;  no signs of delay.

> Your patch does the right thing, QEMU is wrong in calling the policy
> "discard" where it should have been "merge".  In fact both i8254 and RTC
> use the same wrong nomenclature.

Terminlogy does suck. (Maybe it stems from the fact that QEMU talks
about lost ticks, but libvirt about ticks?)
Nevertheless, I don't think that libvirt "merge" covers what PIT does in
KVM or real hardware.

  Merge the missed tick(s) into one tick and inject. The guest time may
  be delayed, depending on how the OS reacts to the merging of ticks

No merging is happening in KVM or real hardware:  every tick is exactly
one tick, so the guest cannot tell that we missed some ticks and the
time is delayed.  If a tick made it into clear IRR, it's not missed.

In the example:

>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.

at 80, the guest thinks it's 60.

I think that merge might do:  0, 42, 60, 80.
But the tick at 42 is counted as two ticks (20, 40) in the guest.

The main problem of this interpretation is that discard is a subset of
merge:

>> - for discard: 0, 60, 80.

The tick at 60 has to be counted as three ticks (20, 40, 60).

*throws hands into the air and runs in circles*

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 07/14] KVM: x86: remove unnecessary uses of PIT state lock
  2016-02-18 18:03   ` Paolo Bonzini
@ 2016-02-19 14:45     ` Radim Krčmář
  0 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-19 14:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya

2016-02-18 19:03+0100, Paolo Bonzini:
> On 17/02/2016 20:14, Radim Krčmář wrote:
> > +	/* All members before "struct mutex lock" are protected by the lock. */
> 
> ... except reinject, according to the commit message. :)

Ah, thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy
  2016-02-18 18:08   ` Paolo Bonzini
@ 2016-02-19 15:04     ` Radim Krčmář
  2016-02-19 15:06       ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Radim Krčmář @ 2016-02-19 15:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya

2016-02-18 19:08+0100, Paolo Bonzini:
> On 17/02/2016 20:14, Radim Krčmář wrote:
>> +	/* pit->pit_state.lock was overloaded to prevent userspace from getting
>> +	 * an inconsistent state after running multiple KVM_REINJECT_CONTROL
>> +	 * ioctls in parallel.  Use a separate lock if that ioctl isn't rare.
>> +	 */
>> +	mutex_lock(&pit->pit_state.lock);
>> +	kvm_pit_set_reinject(pit, control->pit_reinject);
>> +	mutex_unlock(&pit->pit_state.lock);
> 
> ... so in patch 7 concurrent _writes_ of reinject are protected by the
> lock, but reads are done outside it (in pit_timer_fn).  WDYT about
> making reinject an atomic_t?

There was/is no harm in having reinject non-atomic.  This patch added
notifiers, which is the reason for re-introducing a mutex.

Userspace can (and SHOULDN'T) call this function multiple times,
concurrently, so the mutex prevents a situations where, e.g. only one
notifier is registered in the end.

I thought about really stupid stuff when doing this series ...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy
  2016-02-19 15:04     ` Radim Krčmář
@ 2016-02-19 15:06       ` Paolo Bonzini
  2016-02-19 15:09         ` Radim Krčmář
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-19 15:06 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Yuki Shibuya



On 19/02/2016 16:04, Radim Krčmář wrote:
>> > 
>> > ... so in patch 7 concurrent _writes_ of reinject are protected by the
>> > lock, but reads are done outside it (in pit_timer_fn).  WDYT about
>> > making reinject an atomic_t?
> There was/is no harm in having reinject non-atomic.  This patch added
> notifiers, which is the reason for re-introducing a mutex.
> 
> Userspace can (and SHOULDN'T) call this function multiple times,
> concurrently, so the mutex prevents a situations where, e.g. only one
> notifier is registered in the end.

Yes, I understand why you added the mutex here; good catch indeed.  The
atomic_t is just to show that it's okay to read it outside the lock.
It's just for a bit of extra documentation.

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy
  2016-02-19 15:06       ` Paolo Bonzini
@ 2016-02-19 15:09         ` Radim Krčmář
  0 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-19 15:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya

2016-02-19 16:06+0100, Paolo Bonzini:
> On 19/02/2016 16:04, Radim Krčmář wrote:
>>>> ... so in patch 7 concurrent _writes_ of reinject are protected by the
>>>> lock, but reads are done outside it (in pit_timer_fn).  WDYT about
>>>> making reinject an atomic_t?
>> There was/is no harm in having reinject non-atomic.  This patch added
>> notifiers, which is the reason for re-introducing a mutex.
>> 
>> Userspace can (and SHOULDN'T) call this function multiple times,
>> concurrently, so the mutex prevents a situations where, e.g. only one
>> notifier is registered in the end.
> 
> Yes, I understand why you added the mutex here; good catch indeed.  The
> atomic_t is just to show that it's okay to read it outside the lock.
> It's just for a bit of extra documentation.

Hm, good point.  I will add it.  Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/14] KVM: x86: simplify atomics in kvm_pit_ack_irq
  2016-02-18 18:04   ` Paolo Bonzini
@ 2016-02-19 15:51     ` Radim Krčmář
  2016-02-19 15:56       ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Radim Krčmář @ 2016-02-19 15:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya

2016-02-18 19:04+0100, Paolo Bonzini:
> On 17/02/2016 20:14, Radim Krčmář wrote:
>> -	value = atomic_dec_return(&ps->pending);
>> -	if (value < 0)
>> -		/* spurious acks can be generated if, for example, the
>> -		 * PIC is being reset.  Handle it gracefully here
>> -		 */
>> -		atomic_inc(&ps->pending);
>> -	else if (value > 0 && ps->reinject)
>> -		/* in this case, we had multiple outstanding pit interrupts
>> -		 * that we needed to inject.  Reinject
>> -		 */
>> +	if (atomic_dec_if_positive(&ps->pending) > 0 && ps->reinject)
>>  		queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
> 
> Here it would have made sense to do already
> 
> 	if (!ps->reinject) {
> 		WARN_ON_ONCE(ps->pending || !ps->irq_ack);
> 		return;
> 	}

I will add the WARN_ON when removing discard notifiers.

> 	spin_lock(...)
> 	if (atomic_dec_if_positive(&ps->pending) > 0)
> 		queue_kthread_work(...);
> 	ps->irq_ack = 1;
> 	spin_unlock(...)
> 	
> because ps->pending is only ever nonzero, and irq_ack is only ever zero,
> if ps->reinject.

(Well, userspace can switch between policies at runtime.)

>                   Not a big deal since the ack notifier is going to
> disappear altogether for the discard policy, but the nice thing is that
> it lets you remove the ack notifier earlier and disentangle a bit more
> discard mode.
> 
> So if you want for v3 you can reorder the patches like this:

The end result is going to be identical.  I had a version that did
something similar and it was pretty tangled as well -- I wanted to
remove useless locks before re-using one for the ioctls.
(We need the protection earlier, because userspace can control notifiers
 while PIT is still being initialized.  And removing the lock had
 dependencies.)

> 
> - patch 1, same
> 
> - patch 2, what is outlined above
> 
> - patch 3, remove ack notifier for discard

I agree that current ordering looks weird.  The dependency tree looked
like this in my mind:

-[1/14]
 ,-[2/14]
-[4/14]
 ,-[3/14]
 |   ,-[5/14]
 | ,-[6/14]
 +-[7/14]
-[8/14]
-[9-14/14]

I added [2-4/14] early (and a bit out of order), because it made diffs
shorter.  Dependency on [7/14] can dropped with correct mutexing inside
initialization, so the v3 order would be:

-[1/14]
 ,-[3/14]
-[8/14]
 ,-[2/14]
-[4/14]
   ,-[5/14]
 ,-[6/14]
-[7/14]
-[9-14/14]

With [8/14] (remove ack notifier for discard) as third.

Would that be ok?

> - patch 4..14 the rest
> 
> Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/14] KVM: x86: simplify atomics in kvm_pit_ack_irq
  2016-02-19 15:51     ` Radim Krčmář
@ 2016-02-19 15:56       ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-19 15:56 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Yuki Shibuya



On 19/02/2016 16:51, Radim Krčmář wrote:
> The end result is going to be identical.  I had a version that did
> something similar and it was pretty tangled as well -- I wanted to
> remove useless locks before re-using one for the ioctls.
> (We need the protection earlier, because userspace can control notifiers
>  while PIT is still being initialized.  And removing the lock had
>  dependencies.)

Yeah, I eventually imagined that cleaning up the locks helps with the
patch that adds/removes the notifiers dynamically.  Then I guess your
current ordering of the patches is good!

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-19 14:44           ` Radim Krčmář
@ 2016-02-25 12:34             ` Peter Krempa
  2016-02-25 13:38             ` Paolo Bonzini
  1 sibling, 0 replies; 37+ messages in thread
From: Peter Krempa @ 2016-02-25 12:34 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, linux-kernel, kvm, Yuki Shibuya, Rik van Riel

[-- Attachment #1: Type: text/plain, Size: 1600 bytes --]

On Fri, Feb 19, 2016 at 15:44:23 +0100, Radim Krčmář wrote:
> [Cc'd Peter, the last guy that touched timers in libvirt, because he
>  might know what tick policies are supposed to be.]

I found the following RFC that describes the design of timer access in
libvirt:

http://www.redhat.com/archives/libvir-list/2010-March/msg00304.html

> 
> 2016-02-18 18:55+0100, Paolo Bonzini:
> > On 18/02/2016 18:33, Paolo Bonzini wrote:
> >> On 18/02/2016 17:56, Radim Krčmář wrote:
> >>> 2016-02-18 17:13+0100, Paolo Bonzini:
> >>>> On 17/02/2016 20:14, Radim Krčmář wrote:
> >>>>> Discard policy uses ack_notifiers to prevent injection of PIT interrupts
> >>>>> before EOI from the last one.
> >>>>>
> >>>>> This patch changes the policy to always try to deliver the interrupt,
> >>>>> which makes a difference when its vector is in ISR.
> >>>>> Old implementation would drop the interrupt, but proposed one injects to
> >>>>> IRR, like real hardware would.
> >>>>
> >>>> This seems like what libvirt calls the "merge" policy:
> >>>
> >>> Oops, I never looked beyond QEMU after seeing that the naming in libvirt
> >>> doesn't even match ...
> >>>
> >>> I think the policy that KVM implements (which I call discard) is "delay"
> >>> in libvirt.  (https://libvirt.org/formatdomain.html#elementsTime)
> 
> (I looked at libvirt code, but couldn't find any use of merge or discard
>  policies, so please bear with me as I disagree wherever it's possible.)

Indeed, it looks like it never was implemented. Unfortunately i'm not
able to assist more in this case.

Peter

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-19 14:44           ` Radim Krčmář
  2016-02-25 12:34             ` Peter Krempa
@ 2016-02-25 13:38             ` Paolo Bonzini
  2016-02-25 17:36               ` Radim Krčmář
  1 sibling, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-25 13:38 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Yuki Shibuya, Rik van Riel, Peter Krempa



On 19/02/2016 15:44, Radim Krčmář wrote:
>> The resulting injections are:
>>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80.

I think we agree that 0, 42, 43, 60, 80 is also a "catchup"/"slew"
policy.  So we can change QEMU's kvm-i8254 to accept "slew" and warn if
"delay" is given.

In fact "slew" means "a large number or quantity of something" and
indeed that's a good word to characterize kvm-i8254's reinjection behavior.

>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
>>>
>>> For delay I *think* it would be 0, 42, 62, 82, 102.
> 
> I could call this "delay".
> 
>   Continue to deliver ticks at the normal rate. The guest time will be
>   delayed due to the late tick
> 
> At 82 time units, the guest thinks it's 60, so the guest will do
> everything late.  (Leading us to call it delayed?!)

Yup, this was my reasoning.

> Few examples of "delay" that I find easier to accept:
>  0, 60, 80.

This is "discard".

>  0, 42, 60, 80.  Because we haven't missed the tick at 20, it just took
>                  a while to be delivered.  (Semantics ...)

This is not discard.  The ideal implementation of the tick policies is
that the timer devices enjoy information from the interrupt controller,
that lets them know when a tick cannot be delivered.  In that case they
do stuff like:

- save it for later (catchup)

- drop it for good (which is discard, and not the same as stashing it in
IRR)

- pause the timer (delay)

- coalesce the ticks into one late tick (merge)

> Terminlogy does suck. (Maybe it stems from the fact that QEMU talks
> about lost ticks, but libvirt about ticks?)
> Nevertheless, I don't think that libvirt "merge" covers what PIT does in
> KVM or real hardware.
> 
>   Merge the missed tick(s) into one tick and inject. The guest time may
>   be delayed, depending on how the OS reacts to the merging of ticks
> 
> No merging is happening in KVM or real hardware:  every tick is exactly
> one tick, so the guest cannot tell that we missed some ticks and the
> time is delayed.  If a tick made it into clear IRR, it's not missed.

The libvirt documentation is written from the point of view of the
guest, ignoring whether the late tick is recorded in some guest-visible
register (IRR) or in the processor state (as is the case for NMI) or in
the timer device state or who knows where.

Therefore, it _also_ happens that thanks to IRR and NMI latching you can
implement "merge" without having that kind of relationship between the
timer device and the interrupt controller.  But libvirt doesn't care,
all the <timer> element wants is the guest-visible behavior in terms of
when the ticks are delivered.

This was my reasoning when proposing to change QEMU regarding "discard"
as well:

- RTC would warn for "discard" and accept "merge"

- kvm-i8254 would warn for "discard", and accept "merge" if the
capability says that your patch is in.  The idea is that "discard" is
such a bad idea, that "merge" should fail if the hypervisor would cause
the watchdog to hang.

> In the example:
> 
>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
> 
> at 80, the guest thinks it's 60.
> 
> I think that merge might do:  0, 42, 60, 80.
> But the tick at 42 is counted as two ticks (20, 40) in the guest.

Yes, merge is a good policy if the guest can somehow realize that 42
stood for (20, 40).  Discard would be a good policy too if the guest can
somehow realize that 60 stood for (20, 40, 60) but it has the problem
that NMIs don't do EOIs.

> The main problem of this interpretation is that discard is a subset of
> merge:
> 
>>> - for discard: 0, 60, 80.
> 
> The tick at 60 has to be counted as three ticks (20, 40, 60).

Why is it a problem?

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-25 13:38             ` Paolo Bonzini
@ 2016-02-25 17:36               ` Radim Krčmář
  2016-02-25 19:11                 ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Radim Krčmář @ 2016-02-25 17:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya, Rik van Riel, Peter Krempa

I think I understand your definitions of tick policies and I remain
convinced that libvirt defines them differently.

Would you prefer that I got a confirmation on the libvirt list before
continuing the discussion here?

Two relevant parts of my reply that don't depend on libvirt's reply are
marked with ** .

2016-02-25 14:38+0100, Paolo Bonzini:
> On 19/02/2016 15:44, Radim Krčmář wrote:
>>> The resulting injections are:
>>>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80.
> 
> I think we agree that 0, 42, 43, 60, 80 is also a "catchup"/"slew"
> policy.

We do.  (Libvirt "catchup" is also QEMU "delay".)

>          So we can change QEMU's kvm-i8254 to accept "slew" and warn if
> "delay" is given.
**
QEMU 4e4fa398db69 ("qdev: Introduce lost tick policy property") defines:

  delay   - replay all lost ticks in a row once the guest accepts them
            again
  slew    - lost ticks are gradually replayed at a higher frequency than
            the original tick

"delay" is exactly how kvm-i8254 behaves (in its "reinject" mode), so I
think we shouldn't change it.

> In fact "slew" means "a large number or quantity of something" and
> indeed that's a good word to characterize kvm-i8254's reinjection behavior.

(Isn't it a verb, with a similar meaning as "drift"? ;])

>>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
>>>>
>>>> For delay I *think* it would be 0, 42, 62, 82, 102.
>> 
>> I could call this "delay".
>> 
>>   Continue to deliver ticks at the normal rate. The guest time will be
>>   delayed due to the late tick
>> 
>> At 82 time units, the guest thinks it's 60, so the guest will do
>> everything late.  (Leading us to call it delayed?!)
> 
> Yup, this was my reasoning.
> 
>> Few examples of "delay" that I find easier to accept:
>>  0, 60, 80.
> 
> This is "discard".

At 80, the guest thinks that the time is 40, so every action it does
will still be delayed.  I don't see why it isn't libvirt "delay":
 - ticks are delivered at the normal rate
 - guest time is delayed

(The original RFC that Peter shared called this policy "none".)

I don't think it is libvirt "discard":
 - missed ticks were thrown away
 - future injection continues normally

which is fine, but
 - the guest time is delayed, because there isn't a way to handle lost
   ticks

and this is incompatible with libvirt's definition of "discard"

  The guest time may be delayed, unless the OS has explicit handling of
  lost ticks.

"may" doesn't fit.  You can only say
 - the guest time is delayed.

which is best described by "delay".

>>  0, 42, 60, 80.  Because we haven't missed the tick at 20, it just took
>>                  a while to be delivered.  (Semantics ...)
> 
> This is not discard.  The ideal implementation of the tick policies is
> that the timer devices enjoy information from the interrupt controller,
> that lets them know when a tick cannot be delivered.  In that case they
> do stuff like:

Yes.  The case where the timer doesn't know about missed ticks also
deserves a tick policy and I think that libvirt calls it "delay", QEMU
"discard".

> - save it for later (catchup)
> 
> - drop it for good (which is discard, and not the same as stashing it in
> IRR)
> 
> - pause the timer (delay)
> 
> - coalesce the ticks into one late tick (merge)
> 
>> Terminlogy does suck. (Maybe it stems from the fact that QEMU talks
>> about lost ticks, but libvirt about ticks?)
>> Nevertheless, I don't think that libvirt "merge" covers what PIT does in
>> KVM or real hardware.
>> 
>>   Merge the missed tick(s) into one tick and inject. The guest time may
>>   be delayed, depending on how the OS reacts to the merging of ticks
>> 
>> No merging is happening in KVM or real hardware:  every tick is exactly
>> one tick, so the guest cannot tell that we missed some ticks and the
>> time is delayed.  If a tick made it into clear IRR, it's not missed.
> 
> The libvirt documentation is written from the point of view of the
> guest, ignoring whether the late tick is recorded in some guest-visible
> register (IRR) or in the processor state (as is the case for NMI) or in
> the timer device state or who knows where.

Yes.

> Therefore, it _also_ happens that thanks to IRR and NMI latching you can
> implement "merge" without having that kind of relationship between the
> timer device and the interrupt controller.

I disagree.  IRR can catch at most one interrupt, so it is insufficient
to implement libvirt's merge.  (libvirt's merge also has the conditional
"The guest time may be delayed".)

>                                             But libvirt doesn't care,
> all the <timer> element wants is the guest-visible behavior in terms of
> when the ticks are delivered.

Yes, and the guest can't behave like the libvirt definition says.

Libvirt mainly cares if guest's time was irreversibly delayed or not and
for how long can the time be delayed.

(3 of 4 policies in libvirt don't need to delay the time.
 Only 1 policy in QEMU lets the guest recover time.  2 if you say that
 libvirt "merge" = QEMU "merge" and keep pit as QEMU "discard".)

> This was my reasoning when proposing to change QEMU regarding "discard"
> as well:
> 
> - RTC would warn for "discard" and accept "merge"
> 
> - kvm-i8254 would warn for "discard", and accept "merge" if the
> capability says that your patch is in.  The idea is that "discard" is
> such a bad idea, that "merge" should fail if the hypervisor would cause
> the watchdog to hang.
**
I think it's better to say that the tick in IRR wasn't lost and policy
is "discard".
  discard - ignore lost ticks (e.g. if the guest compensates for them
            already)

(We don't need to say that a tick is lost just because it couldn't be
 delivered within some interval.)

"merge" also fits the behavior of kvm-i8254 if you treat IRR as a part
of the timer policy.
  merge   - if multiple ticks are lost, all of them are merged into one
            which is replayed once the guest accepts it again

(It means that every tick was "lost" and then replayed, usually right
 after the tick was "lost" -- this is because you can be only be merging
 lost ticks, so if there could be a non-lost tick in IRR while no tick
 was being handled, e.g. disabled interrupts, and you wanted to deliver
 a next one, then you'd need to have another bit that remembers lost
 ticks.)

>> In the example:
>> 
>>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
>> 
>> at 80, the guest thinks it's 60.
>> 
>> I think that merge might do:  0, 42, 60, 80.
>> But the tick at 42 is counted as two ticks (20, 40) in the guest.
> 
> Yes, merge is a good policy if the guest can somehow realize that 42
> stood for (20, 40).  Discard would be a good policy too if the guest can
> somehow realize that 60 stood for (20, 40, 60) but it has the problem
> that NMIs don't do EOIs.

Yes, recognizing lost ticks is a prerequisite for catchup/delay/discard.
Standard i8254 & friends aren't able to implement any policy other than
libvirt "delay", because there is no notion of lost ticks.
(KVM's changes to EOI notifiers were very shortsighted.)

>> The main problem of this interpretation is that discard is a subset of
>> merge:
>> 
>>>> - for discard: 0, 60, 80.
>> 
>> The tick at 60 has to be counted as three ticks (20, 40, 60).
> 
> Why is it a problem?

Defining a subset without using the superset is a major flaw in a spec.
It got clarified in the link that Peter gave, where merge delivers
"asap" while discard just waits for the next tick. :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-25 17:36               ` Radim Krčmář
@ 2016-02-25 19:11                 ` Paolo Bonzini
  2016-02-26 13:44                   ` Radim Krčmář
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-02-25 19:11 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Yuki Shibuya, Rik van Riel, Peter Krempa


> 2016-02-25 14:38+0100, Paolo Bonzini:
>> On 19/02/2016 15:44, Radim Krčmář wrote:
>>>> The resulting injections are:
>>>>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80.
>>
>> I think we agree that 0, 42, 43, 60, 80 is also a "catchup"/"slew"
>> policy.
> 
> We do.  (Libvirt "catchup" is also QEMU "delay".)
> 
>>          So we can change QEMU's kvm-i8254 to accept "slew" and warn if
>> "delay" is given.
> **
> QEMU 4e4fa398db69 ("qdev: Introduce lost tick policy property") defines:
> 
>   delay   - replay all lost ticks in a row once the guest accepts them
>             again
>   slew    - lost ticks are gradually replayed at a higher frequency than
>             the original tick
> 
> "delay" is exactly how kvm-i8254 behaves (in its "reinject" mode), so I
> think we shouldn't change it.

Ooh, I missed this commit message indeed.  Then libvirt delay != QEMU
delay, isn't it?

>> In fact "slew" means "a large number or quantity of something" and
>> indeed that's a good word to characterize kvm-i8254's reinjection behavior.
> 
> (Isn't it a verb, with a similar meaning as "drift"? ;])

It's a noun too, like "I've just gotten a whole slew of bugs assigned to
me".

>>> Few examples of "delay" that I find easier to accept:
>>>  0, 60, 80.
>>
>> This is "discard".
> 
> At 80, the guest thinks that the time is 40, so every action it does
> will still be delayed.  I don't see why it isn't libvirt "delay":
>  - ticks are delivered at the normal rate
>  - guest time is delayed

I can buy this. :)

> I don't think it is libvirt "discard":
>  - missed ticks were thrown away
>  - future injection continues normally
> 
> which is fine, but
>  - the guest time is delayed, because there isn't a way to handle lost
>    ticks
> 
> and this is incompatible with libvirt's definition of "discard"
> 
>   The guest time may be delayed, unless the OS has explicit handling of
>   lost ticks.
> 
> "may" doesn't fit.  You can only say
>  - the guest time is delayed.
> 
> which is best described by "delay".

I think we can safely ignore the "may be" -- you cannot say for sure
that the guest time "will" be delayed since you could always have a very
enlightened guest.

... but then, by removing the handwavy "may be" would you say that
libvirt delay and libvirt discard are the same?  Would 0, 42, 62, 82 be
a valid implementation of the libvirt "delay" policy?

>> Therefore, it _also_ happens that thanks to IRR and NMI latching you can
>> implement "merge" without having that kind of relationship between the
>> timer device and the interrupt controller.
> 
> I disagree.  IRR can catch at most one interrupt, so it is insufficient
> to implement libvirt's merge.  (libvirt's merge also has the conditional
> "The guest time may be delayed".)

Hmm... is your point that the i8254 _alone_ is implementing discard, and
the tick delivery time is _actually_ 0, 20, 60, 80 (and the t=20 tick is
delivered late but not lost due to the i8259 buffer)?  And hence the
QEMU device model should see it as discard.  I can definitely agree with
that.

There is still the matter of:

- improving the documentation

- clarifying the meaning of libvirt delay

- deciding whether it's worth changing the meaning of QEMU delay to
match libvirt's (and the default kvm-pit policy from delay to slew)

But if we can agree on this, I can apply patch 1 as is, even for 4.5.

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy
  2016-02-25 19:11                 ` Paolo Bonzini
@ 2016-02-26 13:44                   ` Radim Krčmář
  0 siblings, 0 replies; 37+ messages in thread
From: Radim Krčmář @ 2016-02-26 13:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Yuki Shibuya, Rik van Riel, Peter Krempa

2016-02-25 20:11+0100, Paolo Bonzini:
>> 2016-02-25 14:38+0100, Paolo Bonzini:
>>> On 19/02/2016 15:44, Radim Krčmář wrote:
>>>          So we can change QEMU's kvm-i8254 to accept "slew" and warn if
>>> "delay" is given.
>> **
>> QEMU 4e4fa398db69 ("qdev: Introduce lost tick policy property") defines:
>> 
>>   delay   - replay all lost ticks in a row once the guest accepts them
>>             again
>>   slew    - lost ticks are gradually replayed at a higher frequency than
>>             the original tick
>> 
>> "delay" is exactly how kvm-i8254 behaves (in its "reinject" mode), so I
>> think we shouldn't change it.
> 
> Ooh, I missed this commit message indeed.  Then libvirt delay != QEMU
> delay, isn't it?

Exactly, it's sad.  libvirt delay -> QEMU discard.

(Lost) tick policy relations look like this
   libvirt  | QEMU
   catchup -> delay
   delay   -> discard
   discard -> n/a
   catchup -> slew
   delay   -> merge (?)
   merge   -> n/a (?)

Delay, discard, and merge are too ambiguous to make sense.

>> [...]
>> and this is incompatible with libvirt's definition of "discard"
>> 
>>   The guest time may be delayed, unless the OS has explicit handling of
>>   lost ticks.
>> 
>> "may" doesn't fit.  You can only say
>>  - the guest time is delayed.
>> 
>> which is best described by "delay".
> 
> I think we can safely ignore the "may be" -- you cannot say for sure
> that the guest time "will" be delayed since you could always have a very
> enlightened guest.
> ... but then, by removing the handwavy "may be" would you say that
> libvirt delay and libvirt discard are the same?

I would, which is why the "may be" is significant -- the timer has to
provide tools to the guest, even if the guest ignores them.

Do you agree that following rephrasing is identical to libvirt discard?
  Lost ticks will delay the guest time, unless the guest OS has explicit
  handling of lost ticks.

>                                                  Would 0, 42, 62, 82 be
> a valid implementation of the libvirt "delay" policy?

Distinguishing it from saner variants (0, 60, 80 or 0, 20/42, 60, 80)
would be nice, because shifting the phase is not "continuing at normal
rate" for me, but I'd frown and accept it ...
The important part is that guest time never recovers from the lost tick.

>>> Therefore, it _also_ happens that thanks to IRR and NMI latching you can
>>> implement "merge" without having that kind of relationship between the
>>> timer device and the interrupt controller.
>> 
>> I disagree.  IRR can catch at most one interrupt, so it is insufficient
>> to implement libvirt's merge.  (libvirt's merge also has the conditional
>> "The guest time may be delayed".)
> 
> Hmm... is your point that the i8254 _alone_ is implementing discard, and
> the tick delivery time is _actually_ 0, 20, 60, 80 (and the t=20 tick is
> delivered late but not lost due to the i8259 buffer)?  And hence the
> QEMU device model should see it as discard.  I can definitely agree with
> that.

Yes.  Only the tick at 40 is lost.

> There is still the matter of:
> 
> - improving the documentation

Absolutely, this email thread shouldn't have existed.

QEMU merge policy isn't defined de facto ... do we say that it falls
into libvirt delay?  (Or just remove it?)

  merge   - if multiple ticks are lost, all of them are merged into one
            which is replayed once the guest accepts it again

> - clarifying the meaning of libvirt delay

Yes, I wouldn't mind excluding 0, 42, 62, 82. :)

> - deciding whether it's worth changing the meaning of QEMU delay to
> match libvirt's (and the default kvm-pit policy from delay to slew)

Changing QEMU's lost_tick_policy seems like a recipe for confusion.
It'd be nice to unify QEMU and libvirt terms, but QEMU does care about
backward compatibility and I think it is not wise to do this change
without a new interface.  I'd rather just document and forget. :)

> But if we can agree on this, I can apply patch 1 as is, even for 4.5.

I think we agree on all parts that affect this series.
I'll start preparing v3.  (Likely posting on Tuesday/Wednesday.)

Thank you.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2016-02-26 13:44 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-17 19:14 [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 01/14] KVM: x86: change PIT discard tick policy Radim Krčmář
2016-02-18 16:13   ` Paolo Bonzini
2016-02-18 16:56     ` Radim Krčmář
2016-02-18 17:33       ` Paolo Bonzini
2016-02-18 17:55         ` Paolo Bonzini
2016-02-19 14:44           ` Radim Krčmář
2016-02-25 12:34             ` Peter Krempa
2016-02-25 13:38             ` Paolo Bonzini
2016-02-25 17:36               ` Radim Krčmář
2016-02-25 19:11                 ` Paolo Bonzini
2016-02-26 13:44                   ` Radim Krčmář
2016-02-18 18:49   ` Paolo Bonzini
2016-02-17 19:14 ` [PATCH v2 02/14] KVM: x86: simplify atomics in kvm_pit_ack_irq Radim Krčmář
2016-02-18 18:04   ` Paolo Bonzini
2016-02-19 15:51     ` Radim Krčmář
2016-02-19 15:56       ` Paolo Bonzini
2016-02-17 19:14 ` [PATCH v2 03/14] KVM: x86: add kvm_pit_reset_reinject Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 04/14] KVM: x86: use atomic_t instead of pit.inject_lock Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 05/14] KVM: x86: tone down WARN_ON pit.state_lock Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 06/14] KVM: x86: pass struct kvm_pit instead of kvm in PIT Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 07/14] KVM: x86: remove unnecessary uses of PIT state lock Radim Krčmář
2016-02-18 18:03   ` Paolo Bonzini
2016-02-19 14:45     ` Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 08/14] KVM: x86: remove notifiers from PIT discard policy Radim Krčmář
2016-02-18 18:05   ` Paolo Bonzini
2016-02-18 18:08   ` Paolo Bonzini
2016-02-19 15:04     ` Radim Krčmář
2016-02-19 15:06       ` Paolo Bonzini
2016-02-19 15:09         ` Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 09/14] KVM: x86: refactor kvm_create_pit Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 10/14] KVM: x86: refactor kvm_free_pit Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 11/14] KVM: x86: remove pit and kvm from kvm_kpit_state Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 12/14] KVM: x86: remove pointless dereference of PIT Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 13/14] KVM: x86: don't assume layout of kvm_kpit_state Radim Krčmář
2016-02-17 19:14 ` [PATCH v2 14/14] KVM: x86: move PIT timer function initialization Radim Krčmář
2016-02-18 18:11 ` [PATCH v2 00/14] KVM: x86: change PIT discard policy and untangle related code Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.