* [PATCH v4 0/7] x86/kvm/hyperv: stable clocksource for L2 guests when running nested KVM on Hyper-V
@ 2018-01-24 13:23 ` Vitaly Kuznetsov
  0 siblings, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 13:23 UTC (permalink / raw)
  To: kvm, x86
  Cc: Stephen Hemminger, Radim Krčmář,
	Haiyang Zhang, linux-kernel, devel, Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Mohammed Gamal

Changes since v3:
- PATCH3:
  - drop 'include <linux/interrupt.h>' (and __visible/__irq_entry from the
    hyperv_reenlightenment_intr declaration -- but not from the implementation)
    from mshyperv.h [kbuild robot]. I'm retaining Thomas' Reviewed-by tag as
    the change is small; hope that's OK.
- PATCH7:
  - add static to kvm_hyperv_tsc_notifier() [kbuild robot]. I'm also
    keeping Paolo's Acked-by as the change is trivial.

So currently only PATCH2 of the series doesn't have a Reviewed-by/
Acked-by tag.

Original description:

Currently, KVM passes PVCLOCK_TSC_STABLE_BIT to its guests when running in
the so-called 'masterclock' mode, and this is only possible when the host
clocksource is TSC. When running nested on Hyper-V, L1 uses a different
clocksource (the Hyper-V TSC page) which can actually be used for the
masterclock as well. This series brings the required support.

Making KVM work with the TSC page clocksource is relatively easy; it is done
in PATCH 6 of the series. All the rest is required to support L1 migration
when the TSC frequency changes; we use a special Hyper-V feature
(reenlightenment notifications) to do the job.
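
As background (a sketch only, not part of any patch): the Hyper-V TSC page
converts a raw TSC reading into the partition reference time using a
per-page scale and offset, guarded by a sequence counter. Roughly:

	u64 scale, offset, tsc, time;
	u32 seq;

	do {
		seq = READ_ONCE(tsc_pg->tsc_sequence);	/* 0 => page invalid */
		scale = READ_ONCE(tsc_pg->tsc_scale);
		offset = READ_ONCE(tsc_pg->tsc_offset);
		tsc = rdtsc_ordered();
	} while (READ_ONCE(tsc_pg->tsc_sequence) != seq);

	time = mul_u64_u64_shr(tsc, scale, 64) + offset;

This is what hv_read_tsc_page()/hv_read_tsc_page_tsc() in PATCH2 implement,
and what KVM reuses in PATCH6.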

Vitaly Kuznetsov (7):
  x86/hyper-v: check for required privileges in hyperv_init()
  x86/hyper-v: add a function to read both TSC and TSC page value
    simultaneously
  x86/hyper-v: reenlightenment notifications support
  x86/hyper-v: redirect reenlightenment notifications on CPU offlining
  x86/irq: Count Hyper-V reenlightenment interrupts
  x86/kvm: pass stable clocksource to guests when running nested on
    Hyper-V
  x86/kvm: support Hyper-V reenlightenment

 arch/x86/entry/entry_32.S          |   3 +
 arch/x86/entry/entry_64.S          |   3 +
 arch/x86/hyperv/hv_init.c          | 123 ++++++++++++++++++++++++++++++++-
 arch/x86/include/asm/hardirq.h     |   3 +
 arch/x86/include/asm/irq_vectors.h |   7 +-
 arch/x86/include/asm/mshyperv.h    |  32 +++++++--
 arch/x86/include/uapi/asm/hyperv.h |  27 ++++++++
 arch/x86/kernel/cpu/mshyperv.c     |   6 ++
 arch/x86/kernel/irq.c              |   9 +++
 arch/x86/kvm/x86.c                 | 138 ++++++++++++++++++++++++++++++-------
 10 files changed, 319 insertions(+), 32 deletions(-)

-- 
2.14.3

* [PATCH v4 1/7] x86/hyper-v: check for required privileges in hyperv_init()
  2018-01-24 13:23 ` Vitaly Kuznetsov
@ 2018-01-24 13:23   ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 13:23 UTC (permalink / raw)
  To: kvm, x86
  Cc: Stephen Hemminger, Radim Krčmář,
	Haiyang Zhang, linux-kernel, devel, Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Mohammed Gamal

In hyperv_init() we presume we always have access to the VP index and
hypercall MSRs, while according to the specification we should check whether
we're allowed to access the corresponding MSRs before accessing them.
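
For context (an illustrative sketch, not part of the patch): ms_hyperv.features
is populated from the EAX output of CPUID leaf 0x40000003 during platform
detection, so the new check boils down to something like:

	u32 features = cpuid_eax(0x40000003);	/* HYPERV_CPUID_FEATURES */

	if (!(features & HV_X64_MSR_HYPERCALL_AVAILABLE) ||
	    !(features & HV_X64_MSR_VP_INDEX_AVAILABLE))
		return;		/* required MSRs not exposed, skip hyperv_init() */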

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/hyperv/hv_init.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 189a398290db..21f9d53d9f00 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -110,12 +110,19 @@ static int hv_cpu_init(unsigned int cpu)
  */
 void hyperv_init(void)
 {
-	u64 guest_id;
+	u64 guest_id, required_msrs;
 	union hv_x64_msr_hypercall_contents hypercall_msr;
 
 	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
 		return;
 
+	/* Absolutely required MSRs */
+	required_msrs = HV_X64_MSR_HYPERCALL_AVAILABLE |
+		HV_X64_MSR_VP_INDEX_AVAILABLE;
+
+	if ((ms_hyperv.features & required_msrs) != required_msrs)
+		return;
+
 	/* Allocate percpu VP index */
 	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
 				    GFP_KERNEL);
-- 
2.14.3

* [PATCH v4 2/7] x86/hyper-v: add a function to read both TSC and TSC page value simultaneously
  2018-01-24 13:23 ` Vitaly Kuznetsov
@ 2018-01-24 13:23   ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 13:23 UTC (permalink / raw)
  To: kvm, x86
  Cc: Stephen Hemminger, Radim Krčmář,
	Haiyang Zhang, linux-kernel, devel, Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Mohammed Gamal

This is going to be used from KVM code where we need to get both the TSC and
the TSC page value.

Nobody is supposed to use the function when Hyper-V code is compiled out;
the stub just BUG()s.
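
A hypothetical caller sketch (KVM's actual use arrives later in the series):
read the TSC page clock and the raw TSC it was derived from in one shot, and
fall back if the page is not valid:

	u64 host_tsc, tsc_pg_val;

	tsc_pg_val = hv_read_tsc_page_tsc(hv_get_tsc_page(), &host_tsc);
	if (tsc_pg_val == U64_MAX) {
		/* TSC page invalid, use some other time source */
	} else {
		/* tsc_pg_val is the reference time, host_tsc the raw TSC */
	}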

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c       |  1 +
 arch/x86/include/asm/mshyperv.h | 23 +++++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 21f9d53d9f00..1a6c63f721bc 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -37,6 +37,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
 	return tsc_pg;
 }
+EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
 static u64 read_hv_clock_tsc(struct clocksource *arg)
 {
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 8bf450b13d9f..6b1d4ea78270 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -325,9 +325,10 @@ static inline void hyperv_setup_mmu_ops(void) {}
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
-static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
 {
-	u64 scale, offset, cur_tsc;
+	u64 scale, offset;
 	u32 sequence;
 
 	/*
@@ -358,7 +359,7 @@ static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
 
 		scale = READ_ONCE(tsc_pg->tsc_scale);
 		offset = READ_ONCE(tsc_pg->tsc_offset);
-		cur_tsc = rdtsc_ordered();
+		*cur_tsc = rdtsc_ordered();
 
 		/*
 		 * Make sure we read sequence after we read all other values
@@ -368,7 +369,14 @@ static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
 
 	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
 
-	return mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+}
+
+static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+{
+	u64 cur_tsc;
+
+	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
 }
 
 #else
@@ -376,5 +384,12 @@ static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
 	return NULL;
 }
+
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
+{
+	BUG();
+	return U64_MAX;
+}
 #endif
 #endif
-- 
2.14.3

* [PATCH v4 3/7] x86/hyper-v: reenlightenment notifications support
  2018-01-24 13:23 ` Vitaly Kuznetsov
@ 2018-01-24 13:23   ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 13:23 UTC (permalink / raw)
  To: kvm, x86
  Cc: Stephen Hemminger, Radim Krčmář,
	Haiyang Zhang, linux-kernel, devel, Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Mohammed Gamal

Hyper-V supports Live Migration notification. This is supposed to be used
in conjunction with TSC emulation: when we are migrated to a host with a
different TSC frequency, the host emulates our TSC accesses for some short
period and sends us an interrupt to notify us about the event. When we're
done updating everything we can disable TSC emulation and everything will
start working fast again.

We didn't need these notifications before as Hyper-V guests are not
supposed to use TSC as a clocksource: in Linux we even mark it as unstable
on boot. Guests normally use the 'tsc page' clocksource and the host updates
its values on migrations automatically.

Things change when we want to run nested virtualization: even when we pass
through PV clocksources (kvm-clock or tsc page) to our guests we need to
know the TSC frequency and when it changes.

The Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies
EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The
right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that
the fix is on the way.
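
To illustrate how the new interface is meant to be consumed (a sketch with a
hypothetical callback; the real KVM hookup is PATCH7 and is not shown here):

	/* Hypothetical consumer of the reenlightenment notification */
	static void my_tsc_change_cb(void)
	{
		/* recompute whatever depends on the (new) TSC frequency... */

		/* ...then ask Hyper-V to stop emulating TSC accesses */
		hyperv_stop_tsc_emulation();
	}

	/* at init: route notifications to the reenlightenment vector */
	set_hv_tscchange_cb(my_tsc_change_cb);

	/* at teardown */
	clear_hv_tscchange_cb();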

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
v3 -> v4:
- drop 'include <linux/interrupt.h>' (and __visible/__irq_entry from the
  hyperv_reenlightenment_intr declaration) from mshyperv.h
  [kbuild robot]
---
 arch/x86/entry/entry_32.S          |  3 ++
 arch/x86/entry/entry_64.S          |  3 ++
 arch/x86/hyperv/hv_init.c          | 89 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/irq_vectors.h |  7 ++-
 arch/x86/include/asm/mshyperv.h    |  9 ++++
 arch/x86/include/uapi/asm/hyperv.h | 27 ++++++++++++
 arch/x86/kernel/cpu/mshyperv.c     |  6 +++
 7 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 60c4c342316c..2938cf89d473 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -894,6 +894,9 @@ BUILD_INTERRUPT3(xen_hvm_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
 BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
 		 hyperv_vector_handler)
 
+BUILD_INTERRUPT3(hyperv_reenlightenment_vector, HYPERV_REENLIGHTENMENT_VECTOR,
+		 hyperv_reenlightenment_intr)
+
 #endif /* CONFIG_HYPERV */
 
 ENTRY(page_fault)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ff6f8022612c..b04cfd7f6e4c 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1244,6 +1244,9 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 #if IS_ENABLED(CONFIG_HYPERV)
 apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 	hyperv_callback_vector hyperv_vector_handler
+
+apicinterrupt3 HYPERV_REENLIGHTENMENT_VECTOR \
+	hyperv_reenlightenment_vector hyperv_reenlightenment_intr
 #endif /* CONFIG_HYPERV */
 
 idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1a6c63f721bc..712ac40081f7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -18,6 +18,8 @@
  */
 
 #include <linux/types.h>
+#include <asm/apic.h>
+#include <asm/desc.h>
 #include <asm/hypervisor.h>
 #include <asm/hyperv.h>
 #include <asm/mshyperv.h>
@@ -102,6 +104,93 @@ static int hv_cpu_init(unsigned int cpu)
 	return 0;
 }
 
+static void (*hv_reenlightenment_cb)(void);
+
+static void hv_reenlightenment_notify(struct work_struct *dummy)
+{
+	struct hv_tsc_emulation_status emu_status;
+
+	rdmsrl(HV_X64_MSR_TSC_EMULATION_STATUS, *(u64 *)&emu_status);
+
+	/* Don't issue the callback if TSC accesses are not emulated */
+	if (hv_reenlightenment_cb && emu_status.inprogress)
+		hv_reenlightenment_cb();
+}
+static DECLARE_DELAYED_WORK(hv_reenlightenment_work, hv_reenlightenment_notify);
+
+void hyperv_stop_tsc_emulation(void)
+{
+	u64 freq;
+	struct hv_tsc_emulation_status emu_status;
+
+	rdmsrl(HV_X64_MSR_TSC_EMULATION_STATUS, *(u64 *)&emu_status);
+	emu_status.inprogress = 0;
+	wrmsrl(HV_X64_MSR_TSC_EMULATION_STATUS, *(u64 *)&emu_status);
+
+	rdmsrl(HV_X64_MSR_TSC_FREQUENCY, freq);
+	tsc_khz = div64_u64(freq, 1000);
+}
+EXPORT_SYMBOL_GPL(hyperv_stop_tsc_emulation);
+
+static inline bool hv_reenlightenment_available(void)
+{
+	/*
+	 * Check for required features and priviliges to make TSC frequency
+	 * change notifications work.
+	 */
+	return ms_hyperv.features & HV_X64_ACCESS_FREQUENCY_MSRS &&
+		ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE &&
+		ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT;
+}
+
+__visible void __irq_entry hyperv_reenlightenment_intr(struct pt_regs *regs)
+{
+	entering_ack_irq();
+
+	schedule_delayed_work(&hv_reenlightenment_work, HZ/10);
+
+	exiting_irq();
+}
+
+void set_hv_tscchange_cb(void (*cb)(void))
+{
+	struct hv_reenlightenment_control re_ctrl = {
+		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
+		.enabled = 1,
+		.target_vp = hv_vp_index[smp_processor_id()]
+	};
+	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
+
+	if (!hv_reenlightenment_available()) {
+		pr_warn("Hyper-V: reenlightenment support is unavailable\n");
+		return;
+	}
+
+	hv_reenlightenment_cb = cb;
+
+	/* Make sure callback is registered before we write to MSRs */
+	wmb();
+
+	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
+}
+EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
+
+void clear_hv_tscchange_cb(void)
+{
+	struct hv_reenlightenment_control re_ctrl;
+
+	if (!hv_reenlightenment_available())
+		return;
+
+	rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *(u64 *)&re_ctrl);
+	re_ctrl.enabled = 0;
+	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *(u64 *)&re_ctrl);
+
+	hv_reenlightenment_cb = NULL;
+}
+EXPORT_SYMBOL_GPL(clear_hv_tscchange_cb);
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 67421f649cfa..e71c1120426b 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -103,7 +103,12 @@
 #endif
 
 #define MANAGED_IRQ_SHUTDOWN_VECTOR	0xef
-#define LOCAL_TIMER_VECTOR		0xee
+
+#if IS_ENABLED(CONFIG_HYPERV)
+#define HYPERV_REENLIGHTENMENT_VECTOR	0xee
+#endif
+
+#define LOCAL_TIMER_VECTOR		0xed
 
 #define NR_VECTORS			 256
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 6b1d4ea78270..1790002a2052 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -160,6 +160,7 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 #define hv_set_synint_state(int_num, val) wrmsrl(int_num, val)
 
 void hyperv_callback_vector(void);
+void hyperv_reenlightenment_vector(void);
 #ifdef CONFIG_TRACING
 #define trace_hyperv_callback_vector hyperv_callback_vector
 #endif
@@ -316,11 +317,19 @@ void hyper_alloc_mmu(void);
 void hyperv_report_panic(struct pt_regs *regs, long err);
 bool hv_is_hypercall_page_setup(void);
 void hyperv_cleanup(void);
+
+void hyperv_reenlightenment_intr(struct pt_regs *regs);
+void set_hv_tscchange_cb(void (*cb)(void));
+void clear_hv_tscchange_cb(void);
+void hyperv_stop_tsc_emulation(void);
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline bool hv_is_hypercall_page_setup(void) { return false; }
 static inline void hyperv_cleanup(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
+static inline void set_hv_tscchange_cb(void (*cb)(void)) {}
+static inline void clear_hv_tscchange_cb(void) {}
+static inline void hyperv_stop_tsc_emulation(void) {};
 #endif /* CONFIG_HYPERV */
 
 #ifdef CONFIG_HYPERV_TSCPAGE
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 1a5bfead93b4..197c2e6c7376 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -40,6 +40,9 @@
  */
 #define HV_X64_ACCESS_FREQUENCY_MSRS		(1 << 11)
 
+/* AccessReenlightenmentControls privilege */
+#define HV_X64_ACCESS_REENLIGHTENMENT		BIT(13)
+
 /*
  * Basic SynIC MSRs (HV_X64_MSR_SCONTROL through HV_X64_MSR_EOM
  * and HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15) available
@@ -234,6 +237,30 @@
 #define HV_X64_MSR_CRASH_PARAMS		\
 		(1 + (HV_X64_MSR_CRASH_P4 - HV_X64_MSR_CRASH_P0))
 
+/* TSC emulation after migration */
+#define HV_X64_MSR_REENLIGHTENMENT_CONTROL	0x40000106
+
+struct hv_reenlightenment_control {
+	u64 vector:8;
+	u64 reserved1:8;
+	u64 enabled:1;
+	u64 reserved2:15;
+	u64 target_vp:32;
+};
+
+#define HV_X64_MSR_TSC_EMULATION_CONTROL	0x40000107
+#define HV_X64_MSR_TSC_EMULATION_STATUS		0x40000108
+
+struct hv_tsc_emulation_control {
+	u64 enabled:1;
+	u64 reserved:63;
+};
+
+struct hv_tsc_emulation_status {
+	u64 inprogress:1;
+	u64 reserved:63;
+};
+
 #define HV_X64_MSR_HYPERCALL_ENABLE		0x00000001
 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT	12
 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK	\
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 85eb5fc180c8..9340f41ce8d3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -251,6 +251,12 @@ static void __init ms_hyperv_init_platform(void)
 	hyperv_setup_mmu_ops();
 	/* Setup the IDT for hypervisor callback */
 	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, hyperv_callback_vector);
+
+	/* Setup the IDT for reenlightenment notifications */
+	if (ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT)
+		alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
+				hyperv_reenlightenment_vector);
+
 #endif
 }
 
-- 
2.14.3

* [PATCH v4 4/7] x86/hyper-v: redirect reenlightenment notifications on CPU offlining
  2018-01-24 13:23 ` Vitaly Kuznetsov
@ 2018-01-24 13:23   ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 13:23 UTC (permalink / raw)
  To: kvm, x86
  Cc: Stephen Hemminger, Radim Krčmář,
	Haiyang Zhang, linux-kernel, devel, Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Mohammed Gamal

It is very unlikely for CPUs to get offlined when we run on Hyper-V, as we
have a protection in the vmbus module which prevents it when any VMBus
devices are assigned. This, however, may change in the future if an option
to reassign an already active channel is added. It is also possible to run
without any Hyper-V devices or to have a CPU with no assigned channels.

Reassign reenlightenment notifications to some other active CPU when
the CPU which is assigned to get them goes offline.
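
The retargeting itself (highlighted from the hunk below) relies on
struct hv_reenlightenment_control being a bitfield overlay of the 64-bit MSR,
and on cpumask_any_but() returning an arbitrary online CPU other than the
dying one:

	struct hv_reenlightenment_control re_ctrl;

	rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *(u64 *)&re_ctrl);
	re_ctrl.target_vp = hv_vp_index[cpumask_any_but(cpu_online_mask, cpu)];
	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *(u64 *)&re_ctrl);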

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/hyperv/hv_init.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 712ac40081f7..e4377e2f2a10 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -191,6 +191,26 @@ void clear_hv_tscchange_cb(void)
 }
 EXPORT_SYMBOL_GPL(clear_hv_tscchange_cb);
 
+static int hv_cpu_die(unsigned int cpu)
+{
+	struct hv_reenlightenment_control re_ctrl;
+	unsigned int new_cpu;
+
+	if (hv_reenlightenment_cb == NULL)
+		return 0;
+
+	rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	if (re_ctrl.target_vp == hv_vp_index[cpu]) {
+		/* Reassign to some other online CPU */
+		new_cpu = cpumask_any_but(cpu_online_mask, cpu);
+
+		re_ctrl.target_vp = hv_vp_index[new_cpu];
+		wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	}
+
+	return 0;
+}
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -220,7 +240,7 @@ void hyperv_init(void)
 		return;
 
 	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
-			      hv_cpu_init, NULL) < 0)
+			      hv_cpu_init, hv_cpu_die) < 0)
 		goto free_vp_index;
 
 	/*
-- 
2.14.3

* [PATCH v4 5/7] x86/irq: Count Hyper-V reenlightenment interrupts
  2018-01-24 13:23 ` Vitaly Kuznetsov
@ 2018-01-24 13:23   ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 13:23 UTC (permalink / raw)
  To: kvm, x86
  Cc: Stephen Hemminger, Radim Krčmář,
	Haiyang Zhang, linux-kernel, devel, Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Mohammed Gamal

Hyper-V reenlightenment interrupts arrive when the VM is migrated; we're
not supposed to see many of them. However, it may be important to know
that the event has happened in case we have L2 nested guests.
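
With this applied, the per-CPU counter shows up in /proc/interrupts under the
'HRE' label, roughly like the following (illustrative values only):

	HRE:          1          0          0          0   Hyper-V reenlightenment interrupts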

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/hyperv/hv_init.c      | 2 ++
 arch/x86/include/asm/hardirq.h | 3 +++
 arch/x86/kernel/irq.c          | 9 +++++++++
 3 files changed, 14 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e4377e2f2a10..a3adece392f1 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -147,6 +147,8 @@ __visible void __irq_entry hyperv_reenlightenment_intr(struct pt_regs *regs)
 {
 	entering_ack_irq();
 
+	inc_irq_stat(irq_hv_reenlightenment_count);
+
 	schedule_delayed_work(&hv_reenlightenment_work, HZ/10);
 
 	exiting_irq();
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 51cc979dd364..7c341a74ec8c 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -38,6 +38,9 @@ typedef struct {
 #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
 	unsigned int irq_hv_callback_count;
 #endif
+#if IS_ENABLED(CONFIG_HYPERV)
+	unsigned int irq_hv_reenlightenment_count;
+#endif
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 68e1867cca80..45fb4d2565f8 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -141,6 +141,15 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 				   irq_stats(j)->irq_hv_callback_count);
 		seq_puts(p, "  Hypervisor callback interrupts\n");
 	}
+#endif
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (test_bit(HYPERV_REENLIGHTENMENT_VECTOR, system_vectors)) {
+		seq_printf(p, "%*s: ", prec, "HRE");
+		for_each_online_cpu(j)
+			seq_printf(p, "%10u ",
+				   irq_stats(j)->irq_hv_reenlightenment_count);
+		seq_puts(p, "  Hyper-V reenlightenment interrupts\n");
+	}
 #endif
 	seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
 #if defined(CONFIG_X86_IO_APIC)
-- 
2.14.3

* [PATCH v4 6/7] x86/kvm: pass stable clocksource to guests when running nested on Hyper-V
  2018-01-24 13:23 ` Vitaly Kuznetsov
@ 2018-01-24 13:23   ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 13:23 UTC (permalink / raw)
  To: kvm, x86
  Cc: Stephen Hemminger, Radim Krčmář,
	Haiyang Zhang, linux-kernel, devel, Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Mohammed Gamal

Currently, KVM is able to work in 'masterclock' mode, passing
PVCLOCK_TSC_STABLE_BIT to guests, when the clocksource we use on the host
is TSC. When running nested on Hyper-V we normally use a different one: the
TSC page, which is resistant to TSC frequency changes on events like L1
migration. Add support for it in KVM.

The only non-trivial change in the patch is in vgettsc(): when updating
our gtod copy we now need to get both the clock read and the TSC value.
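
The crux of the change, pulled out of the diff below for readability (the
CONFIG_X86_64 guard is omitted here): a host clocksource based on either the
raw TSC or the Hyper-V TSC page is now good enough for the masterclock, and
an 'unstable' raw TSC does not matter while the TSC page, which compensates
for frequency changes, is in use:

	static inline int gtod_is_based_on_tsc(int mode)
	{
		return mode == VCLOCK_TSC || mode == VCLOCK_HVCLOCK;
	}

	static inline bool kvm_check_tsc_unstable(void)
	{
		if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_HVCLOCK)
			return false;	/* TSC page is fine even if raw TSC is not */
		return check_tsc_unstable();
	}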

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 93 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 68 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298dfbf50..b1ce368a07af 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -67,6 +67,7 @@
 #include <asm/pvclock.h>
 #include <asm/div64.h>
 #include <asm/irq_remapping.h>
+#include <asm/mshyperv.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -1377,6 +1378,11 @@ static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
 	return tsc;
 }
 
+static inline int gtod_is_based_on_tsc(int mode)
+{
+	return mode == VCLOCK_TSC || mode == VCLOCK_HVCLOCK;
+}
+
 static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
@@ -1396,7 +1402,7 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 	 * perform request to enable masterclock.
 	 */
 	if (ka->use_master_clock ||
-	    (gtod->clock.vclock_mode == VCLOCK_TSC && vcpus_matched))
+	    (gtod_is_based_on_tsc(gtod->clock.vclock_mode) && vcpus_matched))
 		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
 
 	trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
@@ -1459,6 +1465,19 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	vcpu->arch.tsc_offset = offset;
 }
 
+static inline bool kvm_check_tsc_unstable(void)
+{
+#ifdef CONFIG_X86_64
+	/*
+	 * TSC is marked unstable when we're running on Hyper-V,
+	 * 'TSC page' clocksource is good.
+	 */
+	if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_HVCLOCK)
+		return false;
+#endif
+	return check_tsc_unstable();
+}
+
 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
 	struct kvm *kvm = vcpu->kvm;
@@ -1504,7 +1523,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
          */
 	if (synchronizing &&
 	    vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
-		if (!check_tsc_unstable()) {
+		if (!kvm_check_tsc_unstable()) {
 			offset = kvm->arch.cur_tsc_offset;
 			pr_debug("kvm: matched tsc offset for %llu\n", data);
 		} else {
@@ -1604,18 +1623,43 @@ static u64 read_tsc(void)
 	return last;
 }
 
-static inline u64 vgettsc(u64 *cycle_now)
+static inline u64 vgettsc(u64 *tsc_timestamp, int *mode)
 {
 	long v;
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
+	u64 tsc_pg_val;
+
+	switch (gtod->clock.vclock_mode) {
+	case VCLOCK_HVCLOCK:
+		tsc_pg_val = hv_read_tsc_page_tsc(hv_get_tsc_page(),
+						  tsc_timestamp);
+		if (tsc_pg_val != U64_MAX) {
+			/* TSC page valid */
+			*mode = VCLOCK_HVCLOCK;
+			v = (tsc_pg_val - gtod->clock.cycle_last) &
+				gtod->clock.mask;
+		} else {
+			/* TSC page invalid */
+			*mode = VCLOCK_NONE;
+		}
+		break;
+	case VCLOCK_TSC:
+		*mode = VCLOCK_TSC;
+		*tsc_timestamp = read_tsc();
+		v = (*tsc_timestamp - gtod->clock.cycle_last) &
+			gtod->clock.mask;
+		break;
+	default:
+		*mode = VCLOCK_NONE;
+	}
 
-	*cycle_now = read_tsc();
+	if (*mode == VCLOCK_NONE)
+		*tsc_timestamp = v = 0;
 
-	v = (*cycle_now - gtod->clock.cycle_last) & gtod->clock.mask;
 	return v * gtod->clock.mult;
 }
 
-static int do_monotonic_boot(s64 *t, u64 *cycle_now)
+static int do_monotonic_boot(s64 *t, u64 *tsc_timestamp)
 {
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 	unsigned long seq;
@@ -1624,9 +1668,8 @@ static int do_monotonic_boot(s64 *t, u64 *cycle_now)
 
 	do {
 		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
 		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
+		ns += vgettsc(tsc_timestamp, &mode);
 		ns >>= gtod->clock.shift;
 		ns += gtod->boot_ns;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
@@ -1635,7 +1678,7 @@ static int do_monotonic_boot(s64 *t, u64 *cycle_now)
 	return mode;
 }
 
-static int do_realtime(struct timespec *ts, u64 *cycle_now)
+static int do_realtime(struct timespec *ts, u64 *tsc_timestamp)
 {
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 	unsigned long seq;
@@ -1644,10 +1687,9 @@ static int do_realtime(struct timespec *ts, u64 *cycle_now)
 
 	do {
 		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->wall_time_sec;
 		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
+		ns += vgettsc(tsc_timestamp, &mode);
 		ns >>= gtod->clock.shift;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
 
@@ -1657,25 +1699,26 @@ static int do_realtime(struct timespec *ts, u64 *cycle_now)
 	return mode;
 }
 
-/* returns true if host is using tsc clocksource */
-static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
+/* returns true if host is using TSC based clocksource */
+static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
 {
 	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
 		return false;
 
-	return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
+	return gtod_is_based_on_tsc(do_monotonic_boot(kernel_ns,
+						      tsc_timestamp));
 }
 
-/* returns true if host is using tsc clocksource */
+/* returns true if host is using TSC based clocksource */
 static bool kvm_get_walltime_and_clockread(struct timespec *ts,
-					   u64 *cycle_now)
+					   u64 *tsc_timestamp)
 {
 	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
 		return false;
 
-	return do_realtime(ts, cycle_now) == VCLOCK_TSC;
+	return gtod_is_based_on_tsc(do_realtime(ts, tsc_timestamp));
 }
 #endif
 
@@ -2869,13 +2912,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 	}
 
-	if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
+	if (unlikely(vcpu->cpu != cpu) || kvm_check_tsc_unstable()) {
 		s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
 				rdtsc() - vcpu->arch.last_host_tsc;
 		if (tsc_delta < 0)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
-		if (check_tsc_unstable()) {
+		if (kvm_check_tsc_unstable()) {
 			u64 offset = kvm_compute_tsc_offset(vcpu,
 						vcpu->arch.last_guest_tsc);
 			kvm_vcpu_write_tsc_offset(vcpu, offset);
@@ -6110,9 +6153,9 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 	update_pvclock_gtod(tk);
 
 	/* disable master clock if host does not trust, or does not
-	 * use, TSC clocksource
+	 * use, TSC based clocksource.
 	 */
-	if (gtod->clock.vclock_mode != VCLOCK_TSC &&
+	if (!gtod_is_based_on_tsc(gtod->clock.vclock_mode) &&
 	    atomic_read(&kvm_guest_has_master_clock) != 0)
 		queue_work(system_long_wq, &pvclock_gtod_work);
 
@@ -7767,7 +7810,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 {
 	struct kvm_vcpu *vcpu;
 
-	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
+	if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
 		printk_once(KERN_WARNING
 		"kvm: SMP vm created on host with unstable TSC; "
 		"guest TSC will not be reliable\n");
@@ -7924,7 +7967,7 @@ int kvm_arch_hardware_enable(void)
 		return ret;
 
 	local_tsc = rdtsc();
-	stable = !check_tsc_unstable();
+	stable = !kvm_check_tsc_unstable();
 	list_for_each_entry(kvm, &vm_list, vm_list) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (!stable && vcpu->cpu == smp_processor_id())
-- 
2.14.3

+/* returns true if host is using TSC based clocksource */
+static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
 {
 	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
 		return false;
 
-	return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
+	return gtod_is_based_on_tsc(do_monotonic_boot(kernel_ns,
+						      tsc_timestamp));
 }
 
-/* returns true if host is using tsc clocksource */
+/* returns true if host is using TSC based clocksource */
 static bool kvm_get_walltime_and_clockread(struct timespec *ts,
-					   u64 *cycle_now)
+					   u64 *tsc_timestamp)
 {
 	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
 		return false;
 
-	return do_realtime(ts, cycle_now) == VCLOCK_TSC;
+	return gtod_is_based_on_tsc(do_realtime(ts, tsc_timestamp));
 }
 #endif
 
@@ -2869,13 +2912,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 	}
 
-	if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
+	if (unlikely(vcpu->cpu != cpu) || kvm_check_tsc_unstable()) {
 		s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
 				rdtsc() - vcpu->arch.last_host_tsc;
 		if (tsc_delta < 0)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
-		if (check_tsc_unstable()) {
+		if (kvm_check_tsc_unstable()) {
 			u64 offset = kvm_compute_tsc_offset(vcpu,
 						vcpu->arch.last_guest_tsc);
 			kvm_vcpu_write_tsc_offset(vcpu, offset);
@@ -6110,9 +6153,9 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 	update_pvclock_gtod(tk);
 
 	/* disable master clock if host does not trust, or does not
-	 * use, TSC clocksource
+	 * use, TSC based clocksource.
 	 */
-	if (gtod->clock.vclock_mode != VCLOCK_TSC &&
+	if (!gtod_is_based_on_tsc(gtod->clock.vclock_mode) &&
 	    atomic_read(&kvm_guest_has_master_clock) != 0)
 		queue_work(system_long_wq, &pvclock_gtod_work);
 
@@ -7767,7 +7810,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 {
 	struct kvm_vcpu *vcpu;
 
-	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
+	if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
 		printk_once(KERN_WARNING
 		"kvm: SMP vm created on host with unstable TSC; "
 		"guest TSC will not be reliable\n");
@@ -7924,7 +7967,7 @@ int kvm_arch_hardware_enable(void)
 		return ret;
 
 	local_tsc = rdtsc();
-	stable = !check_tsc_unstable();
+	stable = !kvm_check_tsc_unstable();
 	list_for_each_entry(kvm, &vm_list, vm_list) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (!stable && vcpu->cpu == smp_processor_id())
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 7/7] x86/kvm: support Hyper-V reenlightenment
  2018-01-24 13:23 ` Vitaly Kuznetsov
@ 2018-01-24 13:23   ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 13:23 UTC (permalink / raw)
  To: kvm, x86
  Cc: Stephen Hemminger, Radim Krčmář,
	Haiyang Zhang, linux-kernel, devel, Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Thomas Gleixner, Mohammed Gamal

When we run nested KVM on Hyper-V guests we need to update masterclocks for
all guests when L1 migrates to a host with a different TSC frequency.
Implement the procedure in the following way:
- Pause all guests.
- Tell our host (Hyper-V) to stop emulating TSC accesses.
- Update our gtod copy, recompute clocks.
- Unpause all guests.

This is somewhat similar to cpufreq but we have two important differences:
we can only disable TSC emulation globally (on all CPUs), and we don't know
the new TSC frequency until we turn the emulation off, so we can't
'prepare' ourselves for the event.
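
To make the ordering above easier to follow, here is a deliberately toy,
single-threaded C model of just that sequence.  Every helper below is a
stand-in and the tsc_khz numbers are invented; all locking, the KVM request
machinery and the per-VM iteration of the real patch (see the diff) are
left out on purpose.

#include <stdio.h>

static unsigned long tsc_khz = 2500000;		/* made-up L1 frequency */

static void pause_all_guests(void)       { printf("pause guests\n"); }
static void unpause_all_guests(void)     { printf("unpause guests\n"); }
static void recompute_guest_clocks(void) { printf("recompute masterclocks\n"); }

static void stop_tsc_emulation(void)
{
	/* only after turning emulation off is the new frequency visible */
	tsc_khz = 2800000;		/* pretend L1 landed on a 2.8 GHz host */
	printf("TSC emulation off, tsc_khz=%lu\n", tsc_khz);
}

static void hyperv_tsc_change(void)
{
	pause_all_guests();		/* 1. nobody computes clocks meanwhile */
	stop_tsc_emulation();		/* 2. new tsc_khz becomes known here   */
	recompute_guest_clocks();	/* 3. refresh the gtod copies          */
	unpause_all_guests();		/* 4. resume with consistent clocks    */
}

int main(void)
{
	hyperv_tsc_change();
	return 0;
}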

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
v3 -> v4:
  add static to kvm_hyperv_tsc_notifier() [kbuild robot]
---
 arch/x86/kvm/x86.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b1ce368a07af..879a99987401 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -68,6 +68,7 @@
 #include <asm/div64.h>
 #include <asm/irq_remapping.h>
 #include <asm/mshyperv.h>
+#include <asm/hypervisor.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -5932,6 +5933,43 @@ static void tsc_khz_changed(void *data)
 	__this_cpu_write(cpu_tsc_khz, khz);
 }
 
+static void kvm_hyperv_tsc_notifier(void)
+{
+#ifdef CONFIG_X86_64
+	struct kvm *kvm;
+	struct kvm_vcpu *vcpu;
+	int cpu;
+
+	spin_lock(&kvm_lock);
+	list_for_each_entry(kvm, &vm_list, vm_list)
+		kvm_make_mclock_inprogress_request(kvm);
+
+	hyperv_stop_tsc_emulation();
+
+	/* TSC frequency always matches when on Hyper-V */
+	for_each_present_cpu(cpu)
+		per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
+	kvm_max_guest_tsc_khz = tsc_khz;
+
+	list_for_each_entry(kvm, &vm_list, vm_list) {
+		struct kvm_arch *ka = &kvm->arch;
+
+		spin_lock(&ka->pvclock_gtod_sync_lock);
+
+		pvclock_update_vm_gtod_copy(kvm);
+
+		kvm_for_each_vcpu(cpu, vcpu, kvm)
+			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+
+		kvm_for_each_vcpu(cpu, vcpu, kvm)
+			kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
+
+		spin_unlock(&ka->pvclock_gtod_sync_lock);
+	}
+	spin_unlock(&kvm_lock);
+#endif
+}
+
 static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 				     void *data)
 {
@@ -6217,6 +6255,9 @@ int kvm_arch_init(void *opaque)
 	kvm_lapic_init();
 #ifdef CONFIG_X86_64
 	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
+
+	if (x86_hyper_type == X86_HYPER_MS_HYPERV)
+		set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
 #endif
 
 	return 0;
@@ -6229,6 +6270,10 @@ int kvm_arch_init(void *opaque)
 
 void kvm_arch_exit(void)
 {
+#ifdef CONFIG_X86_64
+	if (x86_hyper_type == X86_HYPER_MS_HYPERV)
+		clear_hv_tscchange_cb();
+#endif
 	kvm_lapic_exit();
 	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 5/7] x86/irq: Count Hyper-V reenlightenment interrupts
  2018-01-24 13:23   ` Vitaly Kuznetsov
@ 2018-01-24 14:58     ` Radim Krčmář
  -1 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2018-01-24 14:58 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Thomas Gleixner
  Cc: Stephen Hemminger, kvm, Haiyang Zhang, x86, linux-kernel, devel,
	Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Mohammed Gamal

2018-01-24 14:23+0100, Vitaly Kuznetsov:
> Hyper-V reenlightenment interrupts arrive when the VM is migrated, we're
> not supposed to see many of them. However, it may be important to know
> that the event has happened in case we have L2 nested guests.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> ---

Thomas,

I think the expectation is that this series will go through the KVM
tree.  Would you prefer a topic branch?

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 5/7] x86/irq: Count Hyper-V reenlightenment interrupts
  2018-01-24 14:58     ` Radim Krčmář
@ 2018-01-29 21:48       ` Thomas Gleixner
  -1 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2018-01-29 21:48 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Stephen Hemminger, kvm, Haiyang Zhang, x86, linux-kernel, devel,
	Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Mohammed Gamal

On Wed, 24 Jan 2018, Radim Krčmář wrote:

> 2018-01-24 14:23+0100, Vitaly Kuznetsov:
> > Hyper-V reenlightenment interrupts arrive when the VM is migrated, we're
> > not supposed to see many of them. However, it may be important to know
> > that the event has happened in case we have L2 nested guests.
> > 
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> 
> Thomas,
> 
> I think the expectation is that this series will go through the KVM
> tree.  Would you prefer a topic branch?

Is there any dependency outside of plain 4.15? If not, I'll put it into
x86/hyperv and let KVM folks pull it over.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 5/7] x86/irq: Count Hyper-V reenlightenment interrupts
  2018-01-29 21:48       ` Thomas Gleixner
@ 2018-01-30 14:02         ` Radim Krčmář
  -1 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2018-01-30 14:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Stephen Hemminger, kvm, Haiyang Zhang, x86, linux-kernel, devel,
	Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Mohammed Gamal

2018-01-29 22:48+0100, Thomas Gleixner:
> On Wed, 24 Jan 2018, Radim Krčmář wrote:
> > 2018-01-24 14:23+0100, Vitaly Kuznetsov:
> > > Hyper-V reenlightenment interrupts arrive when the VM is migrated, we're
> > > not supposed to see many of them. However, it may be important to know
> > > that the event has happened in case we have L2 nested guests.
> > > 
> > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > > ---
> > 
> > Thomas,
> > 
> > I think the expectation is that this series will go through the KVM
> > tree.  Would you prefer a topic branch?
> 
> Is there any dependency outside of plain 4.15? If not, I'll put it into
> x86/hyperv and let KVM folks pull it over.

There isn't;  we'll wait for x86/hyperv, thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 5/7] x86/irq: Count Hyper-V reenlightenment interrupts
  2018-01-30 14:02         ` Radim Krčmář
  (?)
@ 2018-01-30 14:35         ` Thomas Gleixner
  2018-01-30 23:48             ` Thomas Gleixner
  -1 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2018-01-30 14:35 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Stephen Hemminger, kvm, Haiyang Zhang, x86, linux-kernel, devel,
	Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Mohammed Gamal

On Tue, 30 Jan 2018, Radim Krčmář wrote:
> 2018-01-29 22:48+0100, Thomas Gleixner:
> > On Wed, 24 Jan 2018, Radim Krčmář wrote:
> > > 2018-01-24 14:23+0100, Vitaly Kuznetsov:
> > > > Hyper-V reenlightenment interrupts arrive when the VM is migrated, we're
> > > > not supposed to see many of them. However, it may be important to know
> > > > that the event has happened in case we have L2 nested guests.
> > > > 
> > > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > > > ---
> > > 
> > > Thomas,
> > > 
> > > I think the expectation is that this series will go through the KVM
> > > tree.  Would you prefer a topic branch?
> > 
> > Is there any dependency outside of plain 4.15? If not, I'll put it into
> > x86/hyperv and let KVM folks pull it over.
> 
> There isn't;  we'll wait for x86/hyperv, thanks.

Will be there tomorrow. I'll let you know.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [tip:x86/hyperv] x86/hyperv: Check for required privileges in hyperv_init()
  2018-01-24 13:23   ` Vitaly Kuznetsov
  (?)
@ 2018-01-30 23:00   ` tip-bot for Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2018-01-30 23:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vkuznets, luto, hpa, mmorsy, pbonzini, sthemmin, rkagan,
	Michael.H.Kelley, mingo, haiyangz, cavery, tglx, kys, rkrcmar,
	linux-kernel

Commit-ID:  89a8f6d4904c8cf3ff8fee9fdaff392a6bbb8bf6
Gitweb:     https://git.kernel.org/tip/89a8f6d4904c8cf3ff8fee9fdaff392a6bbb8bf6
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Wed, 24 Jan 2018 14:23:31 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Jan 2018 23:55:32 +0100

x86/hyperv: Check for required privileges in hyperv_init()

In hyperv_init() it is presumed that the guest always has access to the VP
index and hypercall MSRs, while according to the specification it should be
checked whether it is allowed to access the corresponding MSRs before
accessing them.
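
The check added below is simply "all required feature bits must be present".
A standalone illustration of that mask test is given here; the two bit
values are placeholders, not the real definitions from the Hyper-V headers.

#include <stdint.h>
#include <stdio.h>

#define HV_X64_MSR_HYPERCALL_AVAILABLE	(1u << 5)	/* placeholder bits */
#define HV_X64_MSR_VP_INDEX_AVAILABLE	(1u << 6)

static int has_required_msrs(uint32_t features)
{
	uint32_t required = HV_X64_MSR_HYPERCALL_AVAILABLE |
			    HV_X64_MSR_VP_INDEX_AVAILABLE;

	return (features & required) == required;
}

int main(void)
{
	printf("%d %d\n",
	       has_required_msrs(HV_X64_MSR_HYPERCALL_AVAILABLE),
	       has_required_msrs(HV_X64_MSR_HYPERCALL_AVAILABLE |
				 HV_X64_MSR_VP_INDEX_AVAILABLE));	/* 0 1 */
	return 0;
}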

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-2-vkuznets@redhat.com

---
 arch/x86/hyperv/hv_init.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 189a398..21f9d53 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -110,12 +110,19 @@ static int hv_cpu_init(unsigned int cpu)
  */
 void hyperv_init(void)
 {
-	u64 guest_id;
+	u64 guest_id, required_msrs;
 	union hv_x64_msr_hypercall_contents hypercall_msr;
 
 	if (x86_hyper_type != X86_HYPER_MS_HYPERV)
 		return;
 
+	/* Absolutely required MSRs */
+	required_msrs = HV_X64_MSR_HYPERCALL_AVAILABLE |
+		HV_X64_MSR_VP_INDEX_AVAILABLE;
+
+	if ((ms_hyperv.features & required_msrs) != required_msrs)
+		return;
+
 	/* Allocate percpu VP index */
 	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
 				    GFP_KERNEL);

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip:x86/hyperv] x86/hyperv: Add a function to read both TSC and TSC page value simultaneously
  2018-01-24 13:23   ` Vitaly Kuznetsov
  (?)
@ 2018-01-30 23:01   ` tip-bot for Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2018-01-30 23:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: rkrcmar, tglx, kys, mmorsy, luto, mingo, pbonzini,
	Michael.H.Kelley, cavery, vkuznets, haiyangz, hpa, sthemmin,
	rkagan, linux-kernel

Commit-ID:  e2768eaa1ca4fbb7b778da5615cce3dd310352e6
Gitweb:     https://git.kernel.org/tip/e2768eaa1ca4fbb7b778da5615cce3dd310352e6
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Wed, 24 Jan 2018 14:23:32 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Jan 2018 23:55:32 +0100

x86/hyperv: Add a function to read both TSC and TSC page value simultaneously

This is going to be used from KVM code where both the TSC and the TSC page
value are needed.

Nothing is supposed to use the function when Hyper-V code is compiled out,
just BUG().
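
For readers who want to see the arithmetic in isolation: once
hv_read_tsc_page_tsc() has a consistent (scale, offset, tsc) triple, the
reference time is the high 64 bits of a 128-bit product plus an offset.
The sketch below models just that step in userspace with __int128 (so it
assumes a 64-bit gcc/clang); the sample numbers are made up.

#include <stdint.h>
#include <stdio.h>

/* userspace stand-in for the kernel's mul_u64_u64_shr(a, b, 64) */
static uint64_t mul_u64_u64_shr64(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* returns the Hyper-V reference time and hands back the raw TSC it used */
static uint64_t read_tsc_page(uint64_t scale, uint64_t offset,
			      uint64_t tsc, uint64_t *cur_tsc)
{
	*cur_tsc = tsc;			/* the KVM caller now wants this too */
	return mul_u64_u64_shr64(tsc, scale) + offset;
}

int main(void)
{
	uint64_t raw, scale = 0x8000000000000000ull;	/* i.e. multiply by 0.5 */
	uint64_t ref = read_tsc_page(scale, 100, 1000, &raw);

	printf("ref=%llu raw=%llu\n", (unsigned long long)ref,
	       (unsigned long long)raw);		/* ref=600 raw=1000 */
	return 0;
}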

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-3-vkuznets@redhat.com

---
 arch/x86/hyperv/hv_init.c       |  1 +
 arch/x86/include/asm/mshyperv.h | 23 +++++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 21f9d53..1a6c63f 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -37,6 +37,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
 	return tsc_pg;
 }
+EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
 static u64 read_hv_clock_tsc(struct clocksource *arg)
 {
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 8bf450b..6b1d4ea 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -325,9 +325,10 @@ static inline void hyperv_setup_mmu_ops(void) {}
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
-static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
 {
-	u64 scale, offset, cur_tsc;
+	u64 scale, offset;
 	u32 sequence;
 
 	/*
@@ -358,7 +359,7 @@ static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
 
 		scale = READ_ONCE(tsc_pg->tsc_scale);
 		offset = READ_ONCE(tsc_pg->tsc_offset);
-		cur_tsc = rdtsc_ordered();
+		*cur_tsc = rdtsc_ordered();
 
 		/*
 		 * Make sure we read sequence after we read all other values
@@ -368,7 +369,14 @@ static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
 
 	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
 
-	return mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+}
+
+static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+{
+	u64 cur_tsc;
+
+	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
 }
 
 #else
@@ -376,5 +384,12 @@ static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
 	return NULL;
 }
+
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
+{
+	BUG();
+	return U64_MAX;
+}
 #endif
 #endif

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip:x86/hyperv] x86/hyperv: Reenlightenment notifications support
  2018-01-24 13:23   ` Vitaly Kuznetsov
  (?)
@ 2018-01-30 23:01   ` tip-bot for Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2018-01-30 23:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mingo, kys, luto, rkagan, haiyangz, sthemmin, tglx,
	linux-kernel, vkuznets, cavery, rkrcmar, mmorsy,
	Michael.H.Kelley, pbonzini

Commit-ID:  93286261de1b46339aa27cd4c639b21778f6cade
Gitweb:     https://git.kernel.org/tip/93286261de1b46339aa27cd4c639b21778f6cade
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Wed, 24 Jan 2018 14:23:33 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Jan 2018 23:55:32 +0100

x86/hyperv: Reenlightenment notifications support

Hyper-V supports Live Migration notification. This is supposed to be used
in conjunction with TSC emulation: when a VM is migrated to a host with a
different TSC frequency, the host emulates TSC accesses for a short period
and sends an interrupt to notify about the event. When the guest is done
updating everything it can disable TSC emulation and everything will start
working fast again.

These notifications weren't required until now as Hyper-V guests are not
supposed to use TSC as a clocksource: in Linux the TSC is even marked as
unstable on boot. Guests normally use the 'tsc page' clocksource and the
host updates its values on migration automatically.

Things change with nested virtualization: even when the PV clocksources
(kvm-clock or tsc page) are passed through to the nested guests, the TSC
frequency and frequency changes need to be known.

Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies
EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The
right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that
the fix is on the way.
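
The notification is armed by writing HV_X64_MSR_REENLIGHTENMENT_CONTROL;
the sketch below packs that 64-bit value by hand, following the bitfield
layout of struct hv_reenlightenment_control in the patch, instead of going
through the struct and wrmsrl() as the kernel does.  The vector and VP
numbers are only examples.

#include <stdint.h>
#include <stdio.h>

static uint64_t reenlightenment_control(uint8_t vector, int enabled,
					uint32_t target_vp)
{
	return (uint64_t)vector |		/* bits  0..7  : IDT vector */
	       ((uint64_t)!!enabled << 16) |	/* bit  16     : enabled    */
	       ((uint64_t)target_vp << 32);	/* bits 32..63 : target VP  */
}

int main(void)
{
	/* vector 0xee (HYPERV_REENLIGHTENMENT_VECTOR), enabled, VP index 0 */
	printf("0x%016llx\n",
	       (unsigned long long)reenlightenment_control(0xee, 1, 0));
	return 0;
}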

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com

---
 arch/x86/entry/entry_32.S          |  3 ++
 arch/x86/entry/entry_64.S          |  3 ++
 arch/x86/hyperv/hv_init.c          | 89 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/irq_vectors.h |  7 ++-
 arch/x86/include/asm/mshyperv.h    |  9 ++++
 arch/x86/include/uapi/asm/hyperv.h | 27 ++++++++++++
 arch/x86/kernel/cpu/mshyperv.c     |  6 +++
 7 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2a35b1e..7a796eed 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -895,6 +895,9 @@ BUILD_INTERRUPT3(xen_hvm_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
 BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
 		 hyperv_vector_handler)
 
+BUILD_INTERRUPT3(hyperv_reenlightenment_vector, HYPERV_REENLIGHTENMENT_VECTOR,
+		 hyperv_reenlightenment_intr)
+
 #endif /* CONFIG_HYPERV */
 
 ENTRY(page_fault)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a835704..553aa49 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1245,6 +1245,9 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 #if IS_ENABLED(CONFIG_HYPERV)
 apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 	hyperv_callback_vector hyperv_vector_handler
+
+apicinterrupt3 HYPERV_REENLIGHTENMENT_VECTOR \
+	hyperv_reenlightenment_vector hyperv_reenlightenment_intr
 #endif /* CONFIG_HYPERV */
 
 idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1a6c63f..712ac40 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -18,6 +18,8 @@
  */
 
 #include <linux/types.h>
+#include <asm/apic.h>
+#include <asm/desc.h>
 #include <asm/hypervisor.h>
 #include <asm/hyperv.h>
 #include <asm/mshyperv.h>
@@ -102,6 +104,93 @@ static int hv_cpu_init(unsigned int cpu)
 	return 0;
 }
 
+static void (*hv_reenlightenment_cb)(void);
+
+static void hv_reenlightenment_notify(struct work_struct *dummy)
+{
+	struct hv_tsc_emulation_status emu_status;
+
+	rdmsrl(HV_X64_MSR_TSC_EMULATION_STATUS, *(u64 *)&emu_status);
+
+	/* Don't issue the callback if TSC accesses are not emulated */
+	if (hv_reenlightenment_cb && emu_status.inprogress)
+		hv_reenlightenment_cb();
+}
+static DECLARE_DELAYED_WORK(hv_reenlightenment_work, hv_reenlightenment_notify);
+
+void hyperv_stop_tsc_emulation(void)
+{
+	u64 freq;
+	struct hv_tsc_emulation_status emu_status;
+
+	rdmsrl(HV_X64_MSR_TSC_EMULATION_STATUS, *(u64 *)&emu_status);
+	emu_status.inprogress = 0;
+	wrmsrl(HV_X64_MSR_TSC_EMULATION_STATUS, *(u64 *)&emu_status);
+
+	rdmsrl(HV_X64_MSR_TSC_FREQUENCY, freq);
+	tsc_khz = div64_u64(freq, 1000);
+}
+EXPORT_SYMBOL_GPL(hyperv_stop_tsc_emulation);
+
+static inline bool hv_reenlightenment_available(void)
+{
+	/*
+	 * Check for required features and priviliges to make TSC frequency
+	 * change notifications work.
+	 */
+	return ms_hyperv.features & HV_X64_ACCESS_FREQUENCY_MSRS &&
+		ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE &&
+		ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT;
+}
+
+__visible void __irq_entry hyperv_reenlightenment_intr(struct pt_regs *regs)
+{
+	entering_ack_irq();
+
+	schedule_delayed_work(&hv_reenlightenment_work, HZ/10);
+
+	exiting_irq();
+}
+
+void set_hv_tscchange_cb(void (*cb)(void))
+{
+	struct hv_reenlightenment_control re_ctrl = {
+		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
+		.enabled = 1,
+		.target_vp = hv_vp_index[smp_processor_id()]
+	};
+	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
+
+	if (!hv_reenlightenment_available()) {
+		pr_warn("Hyper-V: reenlightenment support is unavailable\n");
+		return;
+	}
+
+	hv_reenlightenment_cb = cb;
+
+	/* Make sure callback is registered before we write to MSRs */
+	wmb();
+
+	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
+}
+EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
+
+void clear_hv_tscchange_cb(void)
+{
+	struct hv_reenlightenment_control re_ctrl;
+
+	if (!hv_reenlightenment_available())
+		return;
+
+	rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *(u64 *)&re_ctrl);
+	re_ctrl.enabled = 0;
+	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *(u64 *)&re_ctrl);
+
+	hv_reenlightenment_cb = NULL;
+}
+EXPORT_SYMBOL_GPL(clear_hv_tscchange_cb);
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 67421f6..e71c112 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -103,7 +103,12 @@
 #endif
 
 #define MANAGED_IRQ_SHUTDOWN_VECTOR	0xef
-#define LOCAL_TIMER_VECTOR		0xee
+
+#if IS_ENABLED(CONFIG_HYPERV)
+#define HYPERV_REENLIGHTENMENT_VECTOR	0xee
+#endif
+
+#define LOCAL_TIMER_VECTOR		0xed
 
 #define NR_VECTORS			 256
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 6b1d4ea..1790002a 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -160,6 +160,7 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 #define hv_set_synint_state(int_num, val) wrmsrl(int_num, val)
 
 void hyperv_callback_vector(void);
+void hyperv_reenlightenment_vector(void);
 #ifdef CONFIG_TRACING
 #define trace_hyperv_callback_vector hyperv_callback_vector
 #endif
@@ -316,11 +317,19 @@ void hyper_alloc_mmu(void);
 void hyperv_report_panic(struct pt_regs *regs, long err);
 bool hv_is_hypercall_page_setup(void);
 void hyperv_cleanup(void);
+
+void hyperv_reenlightenment_intr(struct pt_regs *regs);
+void set_hv_tscchange_cb(void (*cb)(void));
+void clear_hv_tscchange_cb(void);
+void hyperv_stop_tsc_emulation(void);
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline bool hv_is_hypercall_page_setup(void) { return false; }
 static inline void hyperv_cleanup(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
+static inline void set_hv_tscchange_cb(void (*cb)(void)) {}
+static inline void clear_hv_tscchange_cb(void) {}
+static inline void hyperv_stop_tsc_emulation(void) {};
 #endif /* CONFIG_HYPERV */
 
 #ifdef CONFIG_HYPERV_TSCPAGE
diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
index 1a5bfea..197c2e6 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h
@@ -40,6 +40,9 @@
  */
 #define HV_X64_ACCESS_FREQUENCY_MSRS		(1 << 11)
 
+/* AccessReenlightenmentControls privilege */
+#define HV_X64_ACCESS_REENLIGHTENMENT		BIT(13)
+
 /*
  * Basic SynIC MSRs (HV_X64_MSR_SCONTROL through HV_X64_MSR_EOM
  * and HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15) available
@@ -234,6 +237,30 @@
 #define HV_X64_MSR_CRASH_PARAMS		\
 		(1 + (HV_X64_MSR_CRASH_P4 - HV_X64_MSR_CRASH_P0))
 
+/* TSC emulation after migration */
+#define HV_X64_MSR_REENLIGHTENMENT_CONTROL	0x40000106
+
+struct hv_reenlightenment_control {
+	u64 vector:8;
+	u64 reserved1:8;
+	u64 enabled:1;
+	u64 reserved2:15;
+	u64 target_vp:32;
+};
+
+#define HV_X64_MSR_TSC_EMULATION_CONTROL	0x40000107
+#define HV_X64_MSR_TSC_EMULATION_STATUS		0x40000108
+
+struct hv_tsc_emulation_control {
+	u64 enabled:1;
+	u64 reserved:63;
+};
+
+struct hv_tsc_emulation_status {
+	u64 inprogress:1;
+	u64 reserved:63;
+};
+
 #define HV_X64_MSR_HYPERCALL_ENABLE		0x00000001
 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT	12
 #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK	\
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 85eb5fc..9340f41 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -251,6 +251,12 @@ static void __init ms_hyperv_init_platform(void)
 	hyperv_setup_mmu_ops();
 	/* Setup the IDT for hypervisor callback */
 	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, hyperv_callback_vector);
+
+	/* Setup the IDT for reenlightenment notifications */
+	if (ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT)
+		alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
+				hyperv_reenlightenment_vector);
+
 #endif
 }
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip:x86/hyperv] x86/hyperv: Redirect reenlightenment notifications on CPU offlining
  2018-01-24 13:23   ` Vitaly Kuznetsov
  (?)
@ 2018-01-30 23:02   ` tip-bot for Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2018-01-30 23:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sthemmin, linux-kernel, kys, haiyangz, hpa, rkagan,
	Michael.H.Kelley, mmorsy, pbonzini, tglx, luto, cavery, mingo,
	rkrcmar, vkuznets

Commit-ID:  e7c4e36c447daca2b7df49024f6bf230871cb155
Gitweb:     https://git.kernel.org/tip/e7c4e36c447daca2b7df49024f6bf230871cb155
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Wed, 24 Jan 2018 14:23:34 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Jan 2018 23:55:33 +0100

x86/hyperv: Redirect reenlightenment notifications on CPU offlining

It is very unlikely for CPUs to get offlined when running on Hyper-V as
there is a protection in the vmbus module which prevents it when the guest
has any VMBus devices assigned. This, however, may change in the future if
an option to reassign an already active channel is added. It is also
possible to run without any Hyper-V devices or to have a CPU with no
assigned channels.

Reassign reenlightenment notifications to some other active CPU when the
CPU which is assigned to them goes offline.
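
The reassignment decision itself is tiny: pick any online CPU other than
the one going away and point target_vp at it.  Below is a userspace sketch
of that selection with a 64-bit mask standing in for cpu_online_mask; it is
an illustration, not the cpumask API.

#include <stdint.h>
#include <stdio.h>

static int any_online_but(uint64_t online_mask, int dying_cpu)
{
	uint64_t others = online_mask & ~(1ULL << dying_cpu);
	int cpu;

	if (!others)
		return -1;		/* no other CPU is online */
	for (cpu = 0; !(others & (1ULL << cpu)); cpu++)
		;
	return cpu;
}

int main(void)
{
	/* CPUs 0-3 online, CPU 0 (the current notification target) dies */
	printf("redirect to CPU %d\n", any_online_but(0xf, 0));	/* -> 1 */
	return 0;
}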

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-5-vkuznets@redhat.com

---
 arch/x86/hyperv/hv_init.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 712ac40..e4377e2 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -191,6 +191,26 @@ void clear_hv_tscchange_cb(void)
 }
 EXPORT_SYMBOL_GPL(clear_hv_tscchange_cb);
 
+static int hv_cpu_die(unsigned int cpu)
+{
+	struct hv_reenlightenment_control re_ctrl;
+	unsigned int new_cpu;
+
+	if (hv_reenlightenment_cb == NULL)
+		return 0;
+
+	rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	if (re_ctrl.target_vp == hv_vp_index[cpu]) {
+		/* Reassign to some other online CPU */
+		new_cpu = cpumask_any_but(cpu_online_mask, cpu);
+
+		re_ctrl.target_vp = hv_vp_index[new_cpu];
+		wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	}
+
+	return 0;
+}
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -220,7 +240,7 @@ void hyperv_init(void)
 		return;
 
 	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
-			      hv_cpu_init, NULL) < 0)
+			      hv_cpu_init, hv_cpu_die) < 0)
 		goto free_vp_index;
 
 	/*

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip:x86/hyperv] x86/irq: Count Hyper-V reenlightenment interrupts
  2018-01-24 13:23   ` Vitaly Kuznetsov
  (?)
  (?)
@ 2018-01-30 23:02   ` tip-bot for Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2018-01-30 23:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sthemmin, mmorsy, Michael.H.Kelley, pbonzini, mingo, kys,
	vkuznets, cavery, linux-kernel, luto, hpa, rkrcmar, haiyangz,
	tglx, rkagan

Commit-ID:  51d4e5daa32808df4d50db511d167fde19fa114e
Gitweb:     https://git.kernel.org/tip/51d4e5daa32808df4d50db511d167fde19fa114e
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Wed, 24 Jan 2018 14:23:35 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Jan 2018 23:55:33 +0100

x86/irq: Count Hyper-V reenlightenment interrupts

Hyper-V reenlightenment interrupts arrive when the VM is migrated. While
they are not interesting in general, it is important to count them when L2
nested guests are running.
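
What the patch adds boils down to a per-CPU counter bumped from the
interrupt handler and printed as one more /proc/interrupts row.  The toy
model below uses plain arrays instead of the kernel's irq_cpustat
machinery; the CPU count and the single recorded interrupt are invented.

#include <stdio.h>

#define NR_CPUS 4

static unsigned int irq_hv_reenlightenment_count[NR_CPUS];

static void hyperv_reenlightenment_intr(int cpu)
{
	irq_hv_reenlightenment_count[cpu]++;	/* inc_irq_stat() equivalent */
}

int main(void)
{
	int cpu;

	hyperv_reenlightenment_intr(0);		/* e.g. one migration seen */

	printf("%3s: ", "HRE");
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("%10u ", irq_hv_reenlightenment_count[cpu]);
	printf("  Hyper-V reenlightenment interrupts\n");
	return 0;
}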

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-6-vkuznets@redhat.com

---
 arch/x86/hyperv/hv_init.c      | 2 ++
 arch/x86/include/asm/hardirq.h | 3 +++
 arch/x86/kernel/irq.c          | 9 +++++++++
 3 files changed, 14 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e4377e2..a3adece 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -147,6 +147,8 @@ __visible void __irq_entry hyperv_reenlightenment_intr(struct pt_regs *regs)
 {
 	entering_ack_irq();
 
+	inc_irq_stat(irq_hv_reenlightenment_count);
+
 	schedule_delayed_work(&hv_reenlightenment_work, HZ/10);
 
 	exiting_irq();
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 51cc979..7c341a7 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -38,6 +38,9 @@ typedef struct {
 #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
 	unsigned int irq_hv_callback_count;
 #endif
+#if IS_ENABLED(CONFIG_HYPERV)
+	unsigned int irq_hv_reenlightenment_count;
+#endif
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 68e1867..45fb4d2 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -142,6 +142,15 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 		seq_puts(p, "  Hypervisor callback interrupts\n");
 	}
 #endif
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (test_bit(HYPERV_REENLIGHTENMENT_VECTOR, system_vectors)) {
+		seq_printf(p, "%*s: ", prec, "HRE");
+		for_each_online_cpu(j)
+			seq_printf(p, "%10u ",
+				   irq_stats(j)->irq_hv_reenlightenment_count);
+		seq_puts(p, "  Hyper-V reenlightenment interrupts\n");
+	}
+#endif
 	seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
 #if defined(CONFIG_X86_IO_APIC)
 	seq_printf(p, "%*s: %10u\n", prec, "MIS", atomic_read(&irq_mis_count));

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip:x86/hyperv] x86/kvm: Pass stable clocksource to guests when running nested on Hyper-V
  2018-01-24 13:23   ` Vitaly Kuznetsov
  (?)
@ 2018-01-30 23:02   ` tip-bot for Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2018-01-30 23:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mmorsy, sthemmin, luto, kys, tglx, Michael.H.Kelley, vkuznets,
	hpa, rkrcmar, pbonzini, mingo, linux-kernel, rkagan, haiyangz,
	cavery

Commit-ID:  b0c39dc68e3b3d22bf9d2984f62f6c86788a49e7
Gitweb:     https://git.kernel.org/tip/b0c39dc68e3b3d22bf9d2984f62f6c86788a49e7
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Wed, 24 Jan 2018 14:23:36 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Jan 2018 23:55:34 +0100

x86/kvm: Pass stable clocksource to guests when running nested on Hyper-V

Currently, KVM is able to work in 'masterclock' mode passing
PVCLOCK_TSC_STABLE_BIT to guests when the clocksource which is used on the
host is TSC.

When running nested on Hyper-V the guest normally uses a different one: TSC
page which is resistant to TSC frequency changes on events like L1
migration. Add support for it in KVM.

The only non-trivial change is in vgettsc(): when updating the gtod copy
both the clock readout and the TSC value have to be updated now.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-7-vkuznets@redhat.com

---
 arch/x86/kvm/x86.c | 93 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 68 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..b1ce368 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -67,6 +67,7 @@
 #include <asm/pvclock.h>
 #include <asm/div64.h>
 #include <asm/irq_remapping.h>
+#include <asm/mshyperv.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -1377,6 +1378,11 @@ static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
 	return tsc;
 }
 
+static inline int gtod_is_based_on_tsc(int mode)
+{
+	return mode == VCLOCK_TSC || mode == VCLOCK_HVCLOCK;
+}
+
 static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
@@ -1396,7 +1402,7 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 	 * perform request to enable masterclock.
 	 */
 	if (ka->use_master_clock ||
-	    (gtod->clock.vclock_mode == VCLOCK_TSC && vcpus_matched))
+	    (gtod_is_based_on_tsc(gtod->clock.vclock_mode) && vcpus_matched))
 		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
 
 	trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
@@ -1459,6 +1465,19 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	vcpu->arch.tsc_offset = offset;
 }
 
+static inline bool kvm_check_tsc_unstable(void)
+{
+#ifdef CONFIG_X86_64
+	/*
+	 * TSC is marked unstable when we're running on Hyper-V,
+	 * 'TSC page' clocksource is good.
+	 */
+	if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_HVCLOCK)
+		return false;
+#endif
+	return check_tsc_unstable();
+}
+
 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
 	struct kvm *kvm = vcpu->kvm;
@@ -1504,7 +1523,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
          */
 	if (synchronizing &&
 	    vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
-		if (!check_tsc_unstable()) {
+		if (!kvm_check_tsc_unstable()) {
 			offset = kvm->arch.cur_tsc_offset;
 			pr_debug("kvm: matched tsc offset for %llu\n", data);
 		} else {
@@ -1604,18 +1623,43 @@ static u64 read_tsc(void)
 	return last;
 }
 
-static inline u64 vgettsc(u64 *cycle_now)
+static inline u64 vgettsc(u64 *tsc_timestamp, int *mode)
 {
 	long v;
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
+	u64 tsc_pg_val;
+
+	switch (gtod->clock.vclock_mode) {
+	case VCLOCK_HVCLOCK:
+		tsc_pg_val = hv_read_tsc_page_tsc(hv_get_tsc_page(),
+						  tsc_timestamp);
+		if (tsc_pg_val != U64_MAX) {
+			/* TSC page valid */
+			*mode = VCLOCK_HVCLOCK;
+			v = (tsc_pg_val - gtod->clock.cycle_last) &
+				gtod->clock.mask;
+		} else {
+			/* TSC page invalid */
+			*mode = VCLOCK_NONE;
+		}
+		break;
+	case VCLOCK_TSC:
+		*mode = VCLOCK_TSC;
+		*tsc_timestamp = read_tsc();
+		v = (*tsc_timestamp - gtod->clock.cycle_last) &
+			gtod->clock.mask;
+		break;
+	default:
+		*mode = VCLOCK_NONE;
+	}
 
-	*cycle_now = read_tsc();
+	if (*mode == VCLOCK_NONE)
+		*tsc_timestamp = v = 0;
 
-	v = (*cycle_now - gtod->clock.cycle_last) & gtod->clock.mask;
 	return v * gtod->clock.mult;
 }
 
-static int do_monotonic_boot(s64 *t, u64 *cycle_now)
+static int do_monotonic_boot(s64 *t, u64 *tsc_timestamp)
 {
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 	unsigned long seq;
@@ -1624,9 +1668,8 @@ static int do_monotonic_boot(s64 *t, u64 *cycle_now)
 
 	do {
 		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
 		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
+		ns += vgettsc(tsc_timestamp, &mode);
 		ns >>= gtod->clock.shift;
 		ns += gtod->boot_ns;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
@@ -1635,7 +1678,7 @@ static int do_monotonic_boot(s64 *t, u64 *cycle_now)
 	return mode;
 }
 
-static int do_realtime(struct timespec *ts, u64 *cycle_now)
+static int do_realtime(struct timespec *ts, u64 *tsc_timestamp)
 {
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 	unsigned long seq;
@@ -1644,10 +1687,9 @@ static int do_realtime(struct timespec *ts, u64 *cycle_now)
 
 	do {
 		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->wall_time_sec;
 		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
+		ns += vgettsc(tsc_timestamp, &mode);
 		ns >>= gtod->clock.shift;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
 
@@ -1657,25 +1699,26 @@ static int do_realtime(struct timespec *ts, u64 *cycle_now)
 	return mode;
 }
 
-/* returns true if host is using tsc clocksource */
-static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
+/* returns true if host is using TSC based clocksource */
+static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp)
 {
 	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
 		return false;
 
-	return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
+	return gtod_is_based_on_tsc(do_monotonic_boot(kernel_ns,
+						      tsc_timestamp));
 }
 
-/* returns true if host is using tsc clocksource */
+/* returns true if host is using TSC based clocksource */
 static bool kvm_get_walltime_and_clockread(struct timespec *ts,
-					   u64 *cycle_now)
+					   u64 *tsc_timestamp)
 {
 	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode))
 		return false;
 
-	return do_realtime(ts, cycle_now) == VCLOCK_TSC;
+	return gtod_is_based_on_tsc(do_realtime(ts, tsc_timestamp));
 }
 #endif
 
@@ -2869,13 +2912,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 	}
 
-	if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
+	if (unlikely(vcpu->cpu != cpu) || kvm_check_tsc_unstable()) {
 		s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
 				rdtsc() - vcpu->arch.last_host_tsc;
 		if (tsc_delta < 0)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
-		if (check_tsc_unstable()) {
+		if (kvm_check_tsc_unstable()) {
 			u64 offset = kvm_compute_tsc_offset(vcpu,
 						vcpu->arch.last_guest_tsc);
 			kvm_vcpu_write_tsc_offset(vcpu, offset);
@@ -6110,9 +6153,9 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 	update_pvclock_gtod(tk);
 
 	/* disable master clock if host does not trust, or does not
-	 * use, TSC clocksource
+	 * use, TSC based clocksource.
 	 */
-	if (gtod->clock.vclock_mode != VCLOCK_TSC &&
+	if (!gtod_is_based_on_tsc(gtod->clock.vclock_mode) &&
 	    atomic_read(&kvm_guest_has_master_clock) != 0)
 		queue_work(system_long_wq, &pvclock_gtod_work);
 
@@ -7767,7 +7810,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 {
 	struct kvm_vcpu *vcpu;
 
-	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
+	if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
 		printk_once(KERN_WARNING
 		"kvm: SMP vm created on host with unstable TSC; "
 		"guest TSC will not be reliable\n");
@@ -7924,7 +7967,7 @@ int kvm_arch_hardware_enable(void)
 		return ret;
 
 	local_tsc = rdtsc();
-	stable = !check_tsc_unstable();
+	stable = !kvm_check_tsc_unstable();
 	list_for_each_entry(kvm, &vm_list, vm_list) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (!stable && vcpu->cpu == smp_processor_id())

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip:x86/hyperv] x86/kvm: Support Hyper-V reenlightenment
  2018-01-24 13:23   ` Vitaly Kuznetsov
  (?)
@ 2018-01-30 23:03   ` tip-bot for Vitaly Kuznetsov
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Vitaly Kuznetsov @ 2018-01-30 23:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vkuznets, linux-kernel, sthemmin, mmorsy, kys, tglx, pbonzini,
	hpa, rkrcmar, Michael.H.Kelley, rkagan, haiyangz, mingo, cavery,
	luto

Commit-ID:  0092e4346f49558e5fe5a927c6d78d401dc4ed73
Gitweb:     https://git.kernel.org/tip/0092e4346f49558e5fe5a927c6d78d401dc4ed73
Author:     Vitaly Kuznetsov <vkuznets@redhat.com>
AuthorDate: Wed, 24 Jan 2018 14:23:37 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Jan 2018 23:55:34 +0100

x86/kvm: Support Hyper-V reenlightenment

When running nested KVM on Hyper-V guests, it is required to update the
masterclock for all guests when L1 migrates to a host with a different
TSC frequency.

Implement the procedure in the following way:
  - Pause all guests.
  - Tell the host (Hyper-V) to stop emulating TSC accesses.
  - Update the gtod copy, recompute clocks.
  - Unpause all guests.

This is somewhat similar to cpufreq but there are two important differences:
 - TSC emulation can only be disabled globally (on all CPUs)
 - The new TSC frequency is not known until emulation is turned off so
   there is no way to 'prepare' for the event upfront.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-8-vkuznets@redhat.com

---
 arch/x86/kvm/x86.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b1ce368..879a999 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -68,6 +68,7 @@
 #include <asm/div64.h>
 #include <asm/irq_remapping.h>
 #include <asm/mshyperv.h>
+#include <asm/hypervisor.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -5932,6 +5933,43 @@ static void tsc_khz_changed(void *data)
 	__this_cpu_write(cpu_tsc_khz, khz);
 }
 
+static void kvm_hyperv_tsc_notifier(void)
+{
+#ifdef CONFIG_X86_64
+	struct kvm *kvm;
+	struct kvm_vcpu *vcpu;
+	int cpu;
+
+	spin_lock(&kvm_lock);
+	list_for_each_entry(kvm, &vm_list, vm_list)
+		kvm_make_mclock_inprogress_request(kvm);
+
+	hyperv_stop_tsc_emulation();
+
+	/* TSC frequency always matches when on Hyper-V */
+	for_each_present_cpu(cpu)
+		per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
+	kvm_max_guest_tsc_khz = tsc_khz;
+
+	list_for_each_entry(kvm, &vm_list, vm_list) {
+		struct kvm_arch *ka = &kvm->arch;
+
+		spin_lock(&ka->pvclock_gtod_sync_lock);
+
+		pvclock_update_vm_gtod_copy(kvm);
+
+		kvm_for_each_vcpu(cpu, vcpu, kvm)
+			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+
+		kvm_for_each_vcpu(cpu, vcpu, kvm)
+			kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
+
+		spin_unlock(&ka->pvclock_gtod_sync_lock);
+	}
+	spin_unlock(&kvm_lock);
+#endif
+}
+
 static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 				     void *data)
 {
@@ -6217,6 +6255,9 @@ int kvm_arch_init(void *opaque)
 	kvm_lapic_init();
 #ifdef CONFIG_X86_64
 	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
+
+	if (x86_hyper_type == X86_HYPER_MS_HYPERV)
+		set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
 #endif
 
 	return 0;
@@ -6229,6 +6270,10 @@ out:
 
 void kvm_arch_exit(void)
 {
+#ifdef CONFIG_X86_64
+	if (x86_hyper_type == X86_HYPER_MS_HYPERV)
+		clear_hv_tscchange_cb();
+#endif
 	kvm_lapic_exit();
 	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 5/7] x86/irq: Count Hyper-V reenlightenment interrupts
  2018-01-30 14:35         ` Thomas Gleixner
@ 2018-01-30 23:48             ` Thomas Gleixner
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2018-01-30 23:48 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Stephen Hemminger, kvm, Haiyang Zhang, x86, linux-kernel, devel,
	Michael Kelley (EOSG),
	Ingo Molnar, Roman Kagan, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Vitaly Kuznetsov, Mohammed Gamal

On Tue, 30 Jan 2018, Thomas Gleixner wrote:

> On Tue, 30 Jan 2018, Radim Krčmář wrote:
> > 2018-01-29 22:48+0100, Thomas Gleixner:
> > > On Wed, 24 Jan 2018, Radim Krčmář wrote:
> > > > 2018-01-24 14:23+0100, Vitaly Kuznetsov:
> > > > > Hyper-V reenlightenment interrupts arrive when the VM is migrated, so
> > > > > we're not supposed to see many of them. However, it may be important
> > > > > to know that the event has happened in case we have L2 nested guests.
> > > > > 
> > > > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > > > > ---
> > > > 
> > > > Thomas,
> > > > 
> > > > I think the expectation is that this series will go through the KVM
> > > > tree.  Would you prefer a topic branch?
> > > 
> > > Is there any dependency outside of plain 4.15? If not, I'll put it into
> > > x86/hyperv and let KVM folks pull it over.
> > 
> > There isn't;  we'll wait for x86/hyperv, thanks.
> 
> Will be there tomorrow. I'll let you know.

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/hyperv

Feel free to pull it into your tree.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-01-30 23:48 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-24 13:23 [PATCH v4 0/7] x86/kvm/hyperv: stable clocksource for L2 guests when running nested KVM on Hyper-V Vitaly Kuznetsov
2018-01-24 13:23 ` Vitaly Kuznetsov
2018-01-24 13:23 ` [PATCH v4 1/7] x86/hyper-v: check for required priviliges in hyperv_init() Vitaly Kuznetsov
2018-01-24 13:23   ` Vitaly Kuznetsov
2018-01-30 23:00   ` [tip:x86/hyperv] x86/hyperv: Check " tip-bot for Vitaly Kuznetsov
2018-01-24 13:23 ` [PATCH v4 2/7] x86/hyper-v: add a function to read both TSC and TSC page value simulateneously Vitaly Kuznetsov
2018-01-24 13:23   ` Vitaly Kuznetsov
2018-01-30 23:01   ` [tip:x86/hyperv] x86/hyperv: Add " tip-bot for Vitaly Kuznetsov
2018-01-24 13:23 ` [PATCH v4 3/7] x86/hyper-v: reenlightenment notifications support Vitaly Kuznetsov
2018-01-24 13:23   ` Vitaly Kuznetsov
2018-01-30 23:01   ` [tip:x86/hyperv] x86/hyperv: Reenlightenment " tip-bot for Vitaly Kuznetsov
2018-01-24 13:23 ` [PATCH v4 4/7] x86/hyper-v: redirect reenlightment notifications on CPU offlining Vitaly Kuznetsov
2018-01-24 13:23   ` Vitaly Kuznetsov
2018-01-30 23:02   ` [tip:x86/hyperv] x86/hyperv: Redirect " tip-bot for Vitaly Kuznetsov
2018-01-24 13:23 ` [PATCH v4 5/7] x86/irq: Count Hyper-V reenlightenment interrupts Vitaly Kuznetsov
2018-01-24 13:23   ` Vitaly Kuznetsov
2018-01-24 14:58   ` Radim Krčmář
2018-01-24 14:58     ` Radim Krčmář
2018-01-29 21:48     ` Thomas Gleixner
2018-01-29 21:48       ` Thomas Gleixner
2018-01-30 14:02       ` Radim Krčmář
2018-01-30 14:02         ` Radim Krčmář
2018-01-30 14:35         ` Thomas Gleixner
2018-01-30 23:48           ` Thomas Gleixner
2018-01-30 23:48             ` Thomas Gleixner
2018-01-30 23:02   ` [tip:x86/hyperv] " tip-bot for Vitaly Kuznetsov
2018-01-24 13:23 ` [PATCH v4 6/7] x86/kvm: pass stable clocksource to guests when running nested on Hyper-V Vitaly Kuznetsov
2018-01-24 13:23   ` Vitaly Kuznetsov
2018-01-30 23:02   ` [tip:x86/hyperv] x86/kvm: Pass " tip-bot for Vitaly Kuznetsov
2018-01-24 13:23 ` [PATCH v4 7/7] x86/kvm: support Hyper-V reenlightenment Vitaly Kuznetsov
2018-01-24 13:23   ` Vitaly Kuznetsov
2018-01-30 23:03   ` [tip:x86/hyperv] x86/kvm: Support " tip-bot for Vitaly Kuznetsov
