linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation
@ 2019-08-20  1:51 Dexuan Cui
  2019-08-20  1:51 ` [PATCH v3 01/12] x86/hyper-v: Suspend/resume the hypercall page for hibernation Dexuan Cui
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:51 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

Hi all,
The patchset is to enhance hv_vmbus to support hibernation when Linux VM
runs on Hyper-V. A second patchset to enhance the high-level VSC drivers
(hv_netvsc, hv_storvsc, etc.) for hibernation will be posted after this
patchset is acceped. If you want to test this hibernation feaure, all the
needed patches can be found on my github branch:
https://github.com/dcui/linux/commits/decui/hibernation/2019-0818/v5.3-rc5

This patchset is based on v5.3-rc5.  

Please review.

Hi tglx,
I hope all the 12 patchset, including the below 3 patches,

[PATCH v3 01/12] x86/hyper-v: Suspend/resume the hypercall page for hibernation
[PATCH v3 02/12] x86/hyper-v: Implement hv_is_hibernation_supported()
[PATCH v3 03/12] clocksource/drivers: Suspend/resume Hyper-V

can go through Sasha's hyperv/linux.git tree, since all the changes belong
to the hv stuff. However, if you think it's better for these 3 patches to go
through the tip.git tree, it also works for me.

Hi Michael Kelley,
I added your Reviewed-by's for patch 1, 3, 4, 5, 7 and 9, since you reviewed
these patches. Please review the others.

Looking forward to your comments!

Thanks,
Dexuan

Changes in v2:
  Patch 3: Improved the changelog and added a comment.

  Patch 4: Remove the "hv_stimer_cleanup" in hv_synic_suspend(), because I
           suppose https://lkml.org/lkml/2019/7/27/5 will be accepted. Also
           improved changelog and the comment.

  Patch 5: Fixed the third argument of print_hex_dump_debug(). Also improved
           the changelog.

  Patch 6: Improved the changelog and the comment. Added a check for the
           'vmbus_proto_version' in vmbus_bus_resume().

Changes in v3:
  Patch 2: Add a new API hv_is_hibernation_supported().

  Patch 6: Add a new helper function is_sub_channel().

  Patch 8: Find the old channels via Instance GUID rather than the RELID.

  Patch 10: Add code to clean up  hv_sock channels by force

  Patch 11: Add code to wait in the suspend path: all the hv_sock channels
            and sub-channels should be cleaned up first before Linux sends
            the VMBUS UNLOAD message.

  Patch 12: Add code to fix up the old primary channels before further
            resuming.

Dexuan Cui (12):
  x86/hyper-v: Suspend/resume the hypercall page for hibernation
  x86/hyper-v: Implement hv_is_hibernation_supported()
  clocksource/drivers: Suspend/resume Hyper-V clocksource for
    hibernation
  Drivers: hv: vmbus: Break out synic enable and disable operations
  Drivers: hv: vmbus: Suspend/resume the synic for hibernation
  Drivers: hv: vmbus: Add a helper function is_sub_channel()
  Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for
    hibernation
  Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
  Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
  Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend
  Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels
  Drivers: hv: vmbus: Resume after fixing up old primary channels

 arch/x86/hyperv/hv_init.c          |  41 ++++++
 drivers/clocksource/hyperv_timer.c |  25 ++++
 drivers/hv/channel_mgmt.c          | 127 +++++++++++++++---
 drivers/hv/connection.c            |  35 ++++-
 drivers/hv/hv.c                    |  66 ++++++----
 drivers/hv/hyperv_vmbus.h          |  33 +++++
 drivers/hv/vmbus_drv.c             | 262 +++++++++++++++++++++++++++++++++++++
 include/asm-generic/mshyperv.h     |   2 +
 include/linux/hyperv.h             |  16 ++-
 9 files changed, 558 insertions(+), 49 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 01/12] x86/hyper-v: Suspend/resume the hypercall page for hibernation
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
@ 2019-08-20  1:51 ` Dexuan Cui
  2019-08-20  1:51 ` [PATCH v3 02/12] x86/hyper-v: Implement hv_is_hibernation_supported() Dexuan Cui
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:51 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

This is needed for hibernation, e.g. when we resume the old kernel, we need
to disable the "current" kernel's hypercall page and then resume the old
kernel's.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0d25868..78e53d9 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include <linux/hyperv.h>
 #include <linux/slab.h>
 #include <linux/cpuhotplug.h>
+#include <linux/syscore_ops.h>
 #include <clocksource/hyperv_timer.h>
 
 void *hv_hypercall_pg;
@@ -223,6 +224,34 @@ static int __init hv_pci_init(void)
 	return 1;
 }
 
+static int hv_suspend(void)
+{
+	union hv_x64_msr_hypercall_contents hypercall_msr;
+
+	/* Reset the hypercall page */
+	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+	hypercall_msr.enable = 0;
+	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+
+	return 0;
+}
+
+static void hv_resume(void)
+{
+	union hv_x64_msr_hypercall_contents hypercall_msr;
+
+	/* Re-enable the hypercall page */
+	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+	hypercall_msr.enable = 1;
+	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
+	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+}
+
+static struct syscore_ops hv_syscore_ops = {
+	.suspend = hv_suspend,
+	.resume = hv_resume,
+};
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -303,6 +332,9 @@ void __init hyperv_init(void)
 
 	/* Register Hyper-V specific clocksource */
 	hv_init_clocksource();
+
+	register_syscore_ops(&hv_syscore_ops);
+
 	return;
 
 remove_cpuhp_state:
@@ -322,6 +354,8 @@ void hyperv_cleanup(void)
 {
 	union hv_x64_msr_hypercall_contents hypercall_msr;
 
+	unregister_syscore_ops(&hv_syscore_ops);
+
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 02/12] x86/hyper-v: Implement hv_is_hibernation_supported()
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
  2019-08-20  1:51 ` [PATCH v3 01/12] x86/hyper-v: Suspend/resume the hypercall page for hibernation Dexuan Cui
@ 2019-08-20  1:51 ` Dexuan Cui
  2019-08-23 19:50   ` Michael Kelley
  2019-08-20  1:52 ` [PATCH v3 03/12] clocksource/drivers: Suspend/resume Hyper-V clocksource for hibernation Dexuan Cui
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:51 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

When a Linux VM runs on Hyper-V and hibernates, it must disable the
memory hot-add/remove and balloon up/down capabilities in the hv_balloon
driver.

By default, Hyper-V does not enable the virtual ACPI S4 state for a VM;
on recent Hyper-V hosts, the administrator is able to enable the virtual
ACPI S4 state for a VM, so we hope to use the presence of the virtual ACPI
S4 state as a hint for hv_balloon to disable the aforementioned
capabilities.

The new API will be used by hv_balloon.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/hyperv/hv_init.c      | 7 +++++++
 include/asm-generic/mshyperv.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 78e53d9..6735e45 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -7,6 +7,7 @@
  * Author : K. Y. Srinivasan <kys@microsoft.com>
  */
 
+#include <linux/acpi.h>
 #include <linux/efi.h>
 #include <linux/types.h>
 #include <asm/apic.h>
@@ -453,3 +454,9 @@ bool hv_is_hyperv_initialized(void)
 	return hypercall_msr.enable;
 }
 EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
+
+bool hv_is_hibernation_supported(void)
+{
+	return acpi_sleep_state_supported(ACPI_STATE_S4);
+}
+EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 0becb7d..1cb4001 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -166,9 +166,11 @@ static inline int cpumask_to_vpset(struct hv_vpset *vpset,
 void hyperv_report_panic(struct pt_regs *regs, long err);
 void hyperv_report_panic_msg(phys_addr_t pa, size_t size);
 bool hv_is_hyperv_initialized(void);
+bool hv_is_hibernation_supported(void);
 void hyperv_cleanup(void);
 #else /* CONFIG_HYPERV */
 static inline bool hv_is_hyperv_initialized(void) { return false; }
+static inline bool hv_is_hibernation_supported(void) { return false; }
 static inline void hyperv_cleanup(void) {}
 #endif /* CONFIG_HYPERV */
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 03/12] clocksource/drivers: Suspend/resume Hyper-V clocksource for hibernation
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
  2019-08-20  1:51 ` [PATCH v3 01/12] x86/hyper-v: Suspend/resume the hypercall page for hibernation Dexuan Cui
  2019-08-20  1:51 ` [PATCH v3 02/12] x86/hyper-v: Implement hv_is_hibernation_supported() Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-20  1:52 ` [PATCH v3 04/12] Drivers: hv: vmbus: Break out synic enable and disable operations Dexuan Cui
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

This is needed for hibernation, e.g. when we resume the old kernel, we need
to disable the "current" kernel's TSC page and then resume the old kernel's.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/clocksource/hyperv_timer.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index 17b96f9..8f3422c 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -238,12 +238,37 @@ static u64 read_hv_clock_tsc(struct clocksource *arg)
 	return read_hv_sched_clock_tsc();
 }
 
+static void suspend_hv_clock_tsc(struct clocksource *arg)
+{
+	u64 tsc_msr;
+
+	/* Disable the TSC page */
+	hv_get_reference_tsc(tsc_msr);
+	tsc_msr &= ~BIT_ULL(0);
+	hv_set_reference_tsc(tsc_msr);
+}
+
+
+static void resume_hv_clock_tsc(struct clocksource *arg)
+{
+	phys_addr_t phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
+	u64 tsc_msr;
+
+	/* Re-enable the TSC page */
+	hv_get_reference_tsc(tsc_msr);
+	tsc_msr &= GENMASK_ULL(11, 0);
+	tsc_msr |= BIT_ULL(0) | (u64)phys_addr;
+	hv_set_reference_tsc(tsc_msr);
+}
+
 static struct clocksource hyperv_cs_tsc = {
 	.name	= "hyperv_clocksource_tsc_page",
 	.rating	= 400,
 	.read	= read_hv_clock_tsc,
 	.mask	= CLOCKSOURCE_MASK(64),
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.suspend= suspend_hv_clock_tsc,
+	.resume	= resume_hv_clock_tsc,
 };
 #endif
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 04/12] Drivers: hv: vmbus: Break out synic enable and disable operations
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (2 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 03/12] clocksource/drivers: Suspend/resume Hyper-V clocksource for hibernation Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-20  1:52 ` [PATCH v3 05/12] Drivers: hv: vmbus: Suspend/resume the synic for hibernation Dexuan Cui
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

Break out synic enable and disable operations into separate
hv_synic_disable_regs() and hv_synic_enable_regs() functions for use by a
later patch to support hibernation.

There is no functional change except the unnecessary check
"if (sctrl.enable != 1) return -EFAULT;" which is removed, because when
we're in hv_synic_cleanup(), we're absolutely sure sctrl.enable must be 1.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/hv/hv.c           | 66 ++++++++++++++++++++++++++---------------------
 drivers/hv/hyperv_vmbus.h |  2 ++
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 6188fb7..fcc5279 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -154,7 +154,7 @@ void hv_synic_free(void)
  * retrieve the initialized message and event pages.  Otherwise, we create and
  * initialize the message and event pages.
  */
-int hv_synic_init(unsigned int cpu)
+void hv_synic_enable_regs(unsigned int cpu)
 {
 	struct hv_per_cpu_context *hv_cpu
 		= per_cpu_ptr(hv_context.cpu_context, cpu);
@@ -196,6 +196,11 @@ int hv_synic_init(unsigned int cpu)
 	sctrl.enable = 1;
 
 	hv_set_synic_state(sctrl.as_uint64);
+}
+
+int hv_synic_init(unsigned int cpu)
+{
+	hv_synic_enable_regs(cpu);
 
 	hv_stimer_init(cpu);
 
@@ -205,20 +210,45 @@ int hv_synic_init(unsigned int cpu)
 /*
  * hv_synic_cleanup - Cleanup routine for hv_synic_init().
  */
-int hv_synic_cleanup(unsigned int cpu)
+void hv_synic_disable_regs(unsigned int cpu)
 {
 	union hv_synic_sint shared_sint;
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
 	union hv_synic_scontrol sctrl;
+
+	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
+	shared_sint.masked = 1;
+
+	/* Need to correctly cleanup in the case of SMP!!! */
+	/* Disable the interrupt */
+	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
+	hv_get_simp(simp.as_uint64);
+	simp.simp_enabled = 0;
+	simp.base_simp_gpa = 0;
+
+	hv_set_simp(simp.as_uint64);
+
+	hv_get_siefp(siefp.as_uint64);
+	siefp.siefp_enabled = 0;
+	siefp.base_siefp_gpa = 0;
+
+	hv_set_siefp(siefp.as_uint64);
+
+	/* Disable the global synic bit */
+	hv_get_synic_state(sctrl.as_uint64);
+	sctrl.enable = 0;
+	hv_set_synic_state(sctrl.as_uint64);
+}
+
+int hv_synic_cleanup(unsigned int cpu)
+{
 	struct vmbus_channel *channel, *sc;
 	bool channel_found = false;
 	unsigned long flags;
 
-	hv_get_synic_state(sctrl.as_uint64);
-	if (sctrl.enable != 1)
-		return -EFAULT;
-
 	/*
 	 * Search for channels which are bound to the CPU we're about to
 	 * cleanup. In case we find one and vmbus is still connected we need to
@@ -249,29 +279,7 @@ int hv_synic_cleanup(unsigned int cpu)
 
 	hv_stimer_cleanup(cpu);
 
-	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
-
-	shared_sint.masked = 1;
-
-	/* Need to correctly cleanup in the case of SMP!!! */
-	/* Disable the interrupt */
-	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
-
-	hv_get_simp(simp.as_uint64);
-	simp.simp_enabled = 0;
-	simp.base_simp_gpa = 0;
-
-	hv_set_simp(simp.as_uint64);
-
-	hv_get_siefp(siefp.as_uint64);
-	siefp.siefp_enabled = 0;
-	siefp.base_siefp_gpa = 0;
-
-	hv_set_siefp(siefp.as_uint64);
-
-	/* Disable the global synic bit */
-	sctrl.enable = 0;
-	hv_set_synic_state(sctrl.as_uint64);
+	hv_synic_disable_regs(cpu);
 
 	return 0;
 }
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index fb16a62..9f7fb6d 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -169,8 +169,10 @@ extern int hv_post_message(union hv_connection_id connection_id,
 
 extern void hv_synic_free(void);
 
+extern void hv_synic_enable_regs(unsigned int cpu);
 extern int hv_synic_init(unsigned int cpu);
 
+extern void hv_synic_disable_regs(unsigned int cpu);
 extern int hv_synic_cleanup(unsigned int cpu);
 
 /* Interface */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 05/12] Drivers: hv: vmbus: Suspend/resume the synic for hibernation
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (3 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 04/12] Drivers: hv: vmbus: Break out synic enable and disable operations Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-20  1:52 ` [PATCH v3 06/12] Drivers: hv: vmbus: Add a helper function is_sub_channel() Dexuan Cui
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

This is needed when we resume the old kernel from the "current" kernel.

Note: when hv_synic_suspend() and hv_synic_resume() run, all the
non-boot CPUs have been offlined, and interrupts are disabled on CPU0.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/hv/vmbus_drv.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index ebd35fc..2ef375c 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -30,6 +30,7 @@
 #include <linux/kdebug.h>
 #include <linux/efi.h>
 #include <linux/random.h>
+#include <linux/syscore_ops.h>
 #include <clocksource/hyperv_timer.h>
 #include "hyperv_vmbus.h"
 
@@ -2086,6 +2087,47 @@ static void hv_crash_handler(struct pt_regs *regs)
 	hyperv_cleanup();
 };
 
+static int hv_synic_suspend(void)
+{
+	/*
+	 * When we reach here, all the non-boot CPUs have been offlined, and
+	 * the stimers on them have been unbound in hv_synic_cleanup() ->
+	 * hv_stimer_cleanup() -> clockevents_unbind_device().
+	 *
+	 * hv_synic_suspend() only runs on CPU0 with interrupts disabled. Here
+	 * we do not unbind the stimer on CPU0 because: 1) it's unnecessary
+	 * because the interrupts remain disabled between syscore_suspend()
+	 * and syscore_resume(): see create_image() and resume_target_kernel();
+	 * 2) the stimer on CPU0 is automatically disabled later by
+	 * syscore_suspend() -> timekeeping_suspend() -> tick_suspend() -> ...
+	 * -> clockevents_shutdown() -> ... -> hv_ce_shutdown(); 3) a warning
+	 * would be triggered if we call clockevents_unbind_device(), which
+	 * may sleep, in an interrupts-disabled context. So, we intentionally
+	 * don't call hv_stimer_cleanup(0) here.
+	 */
+
+	hv_synic_disable_regs(0);
+
+	return 0;
+}
+
+static void hv_synic_resume(void)
+{
+	hv_synic_enable_regs(0);
+
+	/*
+	 * Note: we don't need to call hv_stimer_init(0), because the timer
+	 * on CPU0 is not unbound in hv_synic_suspend(), and the timer is
+	 * automatically re-enabled in timekeeping_resume().
+	 */
+}
+
+/* The callbacks run only on CPU0, with irqs_disabled. */
+static struct syscore_ops hv_synic_syscore_ops = {
+	.suspend = hv_synic_suspend,
+	.resume = hv_synic_resume,
+};
+
 static int __init hv_acpi_init(void)
 {
 	int ret, t;
@@ -2116,6 +2158,8 @@ static int __init hv_acpi_init(void)
 	hv_setup_kexec_handler(hv_kexec_handler);
 	hv_setup_crash_handler(hv_crash_handler);
 
+	register_syscore_ops(&hv_synic_syscore_ops);
+
 	return 0;
 
 cleanup:
@@ -2128,6 +2172,8 @@ static void __exit vmbus_exit(void)
 {
 	int cpu;
 
+	unregister_syscore_ops(&hv_synic_syscore_ops);
+
 	hv_remove_kexec_handler();
 	hv_remove_crash_handler();
 	vmbus_connection.conn_state = DISCONNECTED;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 06/12] Drivers: hv: vmbus: Add a helper function is_sub_channel()
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (4 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 05/12] Drivers: hv: vmbus: Suspend/resume the synic for hibernation Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-23 19:51   ` Michael Kelley
  2019-08-20  1:52 ` [PATCH v3 07/12] Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation Dexuan Cui
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

The existing method of telling if a channel is sub-channel in
vmbus_process_offer() is cumbersome. This new simple helper function
is preferred in future.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 include/linux/hyperv.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 6256cc3..2d39248 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -245,7 +245,10 @@ struct vmbus_channel_offer {
 		} pipe;
 	} u;
 	/*
-	 * The sub_channel_index is defined in win8.
+	 * The sub_channel_index is defined in Win8: a value of zero means a
+	 * primary channel and a value of non-zero means a sub-channel.
+	 *
+	 * Before Win8, the field is reserved, meaning it's always zero.
 	 */
 	u16 sub_channel_index;
 	u16 reserved3;
@@ -934,6 +937,11 @@ static inline bool is_hvsock_channel(const struct vmbus_channel *c)
 		  VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
 }
 
+static inline bool is_sub_channel(const struct vmbus_channel *c)
+{
+	return c->offermsg.offer.sub_channel_index != 0;
+}
+
 static inline void set_channel_affinity_state(struct vmbus_channel *c,
 					      enum hv_numa_policy policy)
 {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 07/12] Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (5 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 06/12] Drivers: hv: vmbus: Add a helper function is_sub_channel() Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-20  1:52 ` [PATCH v3 08/12] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation Dexuan Cui
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

The high-level VSC drivers will implement device-specific callbacks.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/hv/vmbus_drv.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/hyperv.h |  3 +++
 2 files changed, 49 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 2ef375c..a30c70a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -911,6 +911,43 @@ static void vmbus_shutdown(struct device *child_device)
 		drv->shutdown(dev);
 }
 
+/*
+ * vmbus_suspend - Suspend a vmbus device
+ */
+static int vmbus_suspend(struct device *child_device)
+{
+	struct hv_driver *drv;
+	struct hv_device *dev = device_to_hv_device(child_device);
+
+	/* The device may not be attached yet */
+	if (!child_device->driver)
+		return 0;
+
+	drv = drv_to_hv_drv(child_device->driver);
+	if (!drv->suspend)
+		return -EOPNOTSUPP;
+
+	return drv->suspend(dev);
+}
+
+/*
+ * vmbus_resume - Resume a vmbus device
+ */
+static int vmbus_resume(struct device *child_device)
+{
+	struct hv_driver *drv;
+	struct hv_device *dev = device_to_hv_device(child_device);
+
+	/* The device may not be attached yet */
+	if (!child_device->driver)
+		return 0;
+
+	drv = drv_to_hv_drv(child_device->driver);
+	if (!drv->resume)
+		return -EOPNOTSUPP;
+
+	return drv->resume(dev);
+}
 
 /*
  * vmbus_device_release - Final callback release of the vmbus child device
@@ -926,6 +963,14 @@ static void vmbus_device_release(struct device *device)
 	kfree(hv_dev);
 }
 
+/*
+ * Note: we must use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS rather than
+ * SET_SYSTEM_SLEEP_PM_OPS: see the comment before vmbus_bus_pm.
+ */
+static const struct dev_pm_ops vmbus_pm = {
+	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(vmbus_suspend, vmbus_resume)
+};
+
 /* The one and only one */
 static struct bus_type  hv_bus = {
 	.name =		"vmbus",
@@ -936,6 +981,7 @@ static void vmbus_device_release(struct device *device)
 	.uevent =		vmbus_uevent,
 	.dev_groups =		vmbus_dev_groups,
 	.drv_groups =		vmbus_drv_groups,
+	.pm =			&vmbus_pm,
 };
 
 struct onmessage_work_context {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2d39248..8a60e77 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1157,6 +1157,9 @@ struct hv_driver {
 	int (*remove)(struct hv_device *);
 	void (*shutdown)(struct hv_device *);
 
+	int (*suspend)(struct hv_device *);
+	int (*resume)(struct hv_device *);
+
 };
 
 /* Base device object */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 08/12] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (6 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 07/12] Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-23 19:56   ` Michael Kelley
  2019-08-20  1:52 ` [PATCH v3 09/12] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation Dexuan Cui
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

When the VM resumes, the host re-sends the offers. We should not add the
offers to the global vmbus_connection.chn_list again.

This patch assumes the RELIDs of the channels don't change across
hibernation. Actually this is not always true, especially in the case of
NIC SR-IOV the VF vmbus device's RELID sometimes can change. A later patch
will address this issue by mapping the new offers to the old channels and
fixing up the old channels, if necessary.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/channel_mgmt.c | 29 ++++++++++++++++++++++++++++-
 drivers/hv/connection.c   | 27 +++++++++++++++++++++++++++
 drivers/hv/hyperv_vmbus.h |  3 +++
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index addcef5..f7a1184 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -854,12 +854,39 @@ void vmbus_initiate_unload(bool crash)
 static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 {
 	struct vmbus_channel_offer_channel *offer;
-	struct vmbus_channel *newchannel;
+	struct vmbus_channel *oldchannel, *newchannel;
+	size_t offer_sz;
 
 	offer = (struct vmbus_channel_offer_channel *)hdr;
 
 	trace_vmbus_onoffer(offer);
 
+	mutex_lock(&vmbus_connection.channel_mutex);
+	oldchannel = find_primary_channel_by_offer(offer);
+	mutex_unlock(&vmbus_connection.channel_mutex);
+
+	if (oldchannel != NULL) {
+		atomic_dec(&vmbus_connection.offer_in_progress);
+
+		/*
+		 * We're resuming from hibernation: we expect the host to send
+		 * exactly the same offers that we had before the hibernation.
+		 */
+		offer_sz = sizeof(*offer);
+		if (memcmp(offer, &oldchannel->offermsg, offer_sz) == 0)
+			return;
+
+		pr_debug("Mismatched offer from the host (relid=%d)\n",
+			 offer->child_relid);
+
+		print_hex_dump_debug("Old vmbus offer: ", DUMP_PREFIX_OFFSET,
+				     16, 4, &oldchannel->offermsg, offer_sz,
+				     false);
+		print_hex_dump_debug("New vmbus offer: ", DUMP_PREFIX_OFFSET,
+				     16, 4, offer, offer_sz, false);
+		return;
+	}
+
 	/* Allocate the channel object and save this offer. */
 	newchannel = alloc_channel();
 	if (!newchannel) {
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 09829e1..6c7a983 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -337,6 +337,33 @@ struct vmbus_channel *relid2channel(u32 relid)
 }
 
 /*
+ * find_primary_channel_by_offer - Get the channel object given the new offer.
+ * This is only used in the resume path of hibernation.
+ */
+struct vmbus_channel *
+find_primary_channel_by_offer(const struct vmbus_channel_offer_channel *offer)
+{
+	struct vmbus_channel *channel;
+	const guid_t *inst1, *inst2;
+
+	WARN_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
+
+	/* Ignore sub-channel offers. */
+	if (offer->offer.sub_channel_index != 0)
+		return NULL;
+
+	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+		inst1 = &channel->offermsg.offer.if_instance;
+		inst2 = &offer->offer.if_instance;
+
+		if (guid_equal(inst1, inst2))
+			return channel;
+	}
+
+	return NULL;
+}
+
+/*
  * vmbus_on_event - Process a channel event notification
  *
  * For batched channels (default) optimize host to guest signaling
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9f7fb6d..c42b46d 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -310,6 +310,9 @@ int vmbus_add_channel_kobj(struct hv_device *device_obj,
 
 struct vmbus_channel *relid2channel(u32 relid);
 
+struct vmbus_channel *
+find_primary_channel_by_offer(const struct vmbus_channel_offer_channel *offer);
+
 void vmbus_free_channels(void);
 
 /* Connection interface */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 09/12] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (7 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 08/12] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-20  1:52 ` [PATCH v3 10/12] Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend Dexuan Cui
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

Before Linux enters hibernation, it sends the CHANNELMSG_UNLOAD message to
the host so all the offers are gone. After hibernation, Linux needs to
re-negotiate with the host using the same vmbus protocol version (which
was in use before hibernation), and ask the host to re-offer the vmbus
devices.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---
 drivers/hv/connection.c   |  3 +--
 drivers/hv/hyperv_vmbus.h |  2 ++
 drivers/hv/vmbus_drv.c    | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6c7a983..701d9a8 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -59,8 +59,7 @@ static __u32 vmbus_get_next_version(__u32 current_version)
 	}
 }
 
-static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
-					__u32 version)
+int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 {
 	int ret = 0;
 	unsigned int cur_cpu;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index c42b46d..4610277 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -272,6 +272,8 @@ struct vmbus_msginfo {
 
 extern struct vmbus_connection vmbus_connection;
 
+int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version);
+
 static inline void vmbus_send_interrupt(u32 relid)
 {
 	sync_set_bit(relid, vmbus_connection.send_int_page);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index a30c70a..ce9974b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2089,6 +2089,51 @@ static int vmbus_acpi_add(struct acpi_device *device)
 	return ret_val;
 }
 
+static int vmbus_bus_suspend(struct device *dev)
+{
+	vmbus_initiate_unload(false);
+
+	vmbus_connection.conn_state = DISCONNECTED;
+
+	return 0;
+}
+
+static int vmbus_bus_resume(struct device *dev)
+{
+	struct vmbus_channel_msginfo *msginfo;
+	size_t msgsize;
+	int ret;
+
+	/*
+	 * We only use the 'vmbus_proto_version', which was in use before
+	 * hibernation, to re-negotiate with the host.
+	 */
+	if (vmbus_proto_version == VERSION_INVAL ||
+	    vmbus_proto_version == 0) {
+		pr_err("Invalid proto version = 0x%x\n", vmbus_proto_version);
+		return -EINVAL;
+	}
+
+	msgsize = sizeof(*msginfo) +
+		  sizeof(struct vmbus_channel_initiate_contact);
+
+	msginfo = kzalloc(msgsize, GFP_KERNEL);
+
+	if (msginfo == NULL)
+		return -ENOMEM;
+
+	ret = vmbus_negotiate_version(msginfo, vmbus_proto_version);
+
+	kfree(msginfo);
+
+	if (ret != 0)
+		return ret;
+
+	vmbus_request_offers();
+
+	return 0;
+}
+
 static const struct acpi_device_id vmbus_acpi_device_ids[] = {
 	{"VMBUS", 0},
 	{"VMBus", 0},
@@ -2096,6 +2141,19 @@ static int vmbus_acpi_add(struct acpi_device *device)
 };
 MODULE_DEVICE_TABLE(acpi, vmbus_acpi_device_ids);
 
+/*
+ * Note: we must use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS rather than
+ * SET_SYSTEM_SLEEP_PM_OPS, otherwise NIC SR-IOV can not work, because the
+ * "pci_dev_pm_ops" uses the "noirq" callbacks: in the resume path, the
+ * pci "noirq" restore callback runs before "non-noirq" callbacks (see
+ * resume_target_kernel() -> dpm_resume_start(), and hibernation_restore() ->
+ * dpm_resume_end()). This means vmbus_bus_resume() and the pci-hyperv's
+ * resume callback must also run via the "noirq" callbacks.
+ */
+static const struct dev_pm_ops vmbus_bus_pm = {
+	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(vmbus_bus_suspend, vmbus_bus_resume)
+};
+
 static struct acpi_driver vmbus_acpi_driver = {
 	.name = "vmbus",
 	.ids = vmbus_acpi_device_ids,
@@ -2103,6 +2161,7 @@ static int vmbus_acpi_add(struct acpi_device *device)
 		.add = vmbus_acpi_add,
 		.remove = vmbus_acpi_remove,
 	},
+	.drv.pm = &vmbus_bus_pm,
 };
 
 static void hv_kexec_handler(void)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 10/12] Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (8 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 09/12] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-23 20:02   ` Michael Kelley
  2019-08-20  1:52 ` [PATCH v3 11/12] Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels Dexuan Cui
  2019-08-20  1:52 ` [PATCH v3 12/12] Drivers: hv: vmbus: Resume after fixing up old primary channels Dexuan Cui
  11 siblings, 1 reply; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
hibernation. There is no better method to clean up the channels since
some of the channels may still be referenced by the userspace apps when
hiberantin is triggered: in this case, the "rescind" fields of the
channels are set, and the apps will thoroughly destroy the channels
after hibernation.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/vmbus_drv.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index ce9974b..2bea669 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -24,6 +24,7 @@
 #include <linux/sched/task_stack.h>
 
 #include <asm/mshyperv.h>
+#include <linux/delay.h>
 #include <linux/notifier.h>
 #include <linux/ptrace.h>
 #include <linux/screen_info.h>
@@ -1069,6 +1070,41 @@ void vmbus_on_msg_dpc(unsigned long data)
 	vmbus_signal_eom(msg, message_type);
 }
 
+/*
+ * Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
+ * hibernation, because hv_sock connections can not persist across hibernation.
+ */
+static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
+{
+	struct onmessage_work_context *ctx;
+	struct vmbus_channel_rescind_offer *rescind;
+
+	WARN_ON(!is_hvsock_channel(channel));
+
+	/*
+	 * sizeof(*ctx) is small and the allocation should really not fail,
+	 * otherwise the state of the hv_sock connections ends up in limbo.
+	 */
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL | __GFP_NOFAIL);
+
+	/*
+	 * So far, these are not really used by Linux. Just set them to the
+	 * reasonable values conforming to the definitions of the fields.
+	 */
+	ctx->msg.header.message_type = 1;
+	ctx->msg.header.payload_size = sizeof(*rescind);
+
+	/* These values are actually used by Linux. */
+	rescind = (struct vmbus_channel_rescind_offer *)ctx->msg.u.payload;
+	rescind->header.msgtype = CHANNELMSG_RESCIND_CHANNELOFFER;
+	rescind->child_relid = channel->offermsg.child_relid;
+
+	INIT_WORK(&ctx->work, vmbus_onmessage_work);
+
+	queue_work_on(vmbus_connection.connect_cpu,
+		      vmbus_connection.work_queue,
+		      &ctx->work);
+}
 
 /*
  * Direct callback for channels using other deferred processing
@@ -2091,6 +2127,25 @@ static int vmbus_acpi_add(struct acpi_device *device)
 
 static int vmbus_bus_suspend(struct device *dev)
 {
+	struct vmbus_channel *channel;
+
+	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {
+		/*
+		 * We wait here until any channel offer is currently
+		 * being processed.
+		 */
+		msleep(1);
+	}
+
+	mutex_lock(&vmbus_connection.channel_mutex);
+	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+		if (!is_hvsock_channel(channel))
+			continue;
+
+		vmbus_force_channel_rescinded(channel);
+	}
+	mutex_unlock(&vmbus_connection.channel_mutex);
+
 	vmbus_initiate_unload(false);
 
 	vmbus_connection.conn_state = DISCONNECTED;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 11/12] Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (9 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 10/12] Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-23 20:16   ` Michael Kelley
  2019-08-20  1:52 ` [PATCH v3 12/12] Drivers: hv: vmbus: Resume after fixing up old primary channels Dexuan Cui
  11 siblings, 1 reply; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

Before suspend, Linux must make sure all the hv_sock channels have been
properly cleaned up, because a hv_sock connection can not persist across
hibernation, and the user-space app must be properly notified of the
state change of the connection.

Before suspend, Linux also must make sure all the sub-channels have been
destroyed, i.e. the related channel structs of the sub-channels must be
properly removed, otherwise they would cause a conflict when the
sub-channels are recreated upon resume.

Add a counter to track such channels, and vmbus_bus_suspend() should wait
for the counter to drop to zero.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/channel_mgmt.c | 28 ++++++++++++++++++++++++++++
 drivers/hv/connection.c   |  3 +++
 drivers/hv/hyperv_vmbus.h | 12 ++++++++++++
 drivers/hv/vmbus_drv.c    | 41 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index f7a1184..8491d1b 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -499,6 +499,8 @@ static void vmbus_add_channel_work(struct work_struct *work)
 	return;
 
 err_deq_chan:
+	WARN_ON_ONCE(1);
+
 	mutex_lock(&vmbus_connection.channel_mutex);
 
 	/*
@@ -545,6 +547,10 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 
 	mutex_lock(&vmbus_connection.channel_mutex);
 
+	/* Remember the channels that should be cleaned up upon suspend. */
+	if (is_hvsock_channel(newchannel) || is_sub_channel(newchannel))
+		atomic_inc(&vmbus_connection.nr_chan_close_on_suspend);
+
 	/*
 	 * Now that we have acquired the channel_mutex,
 	 * we can release the potentially racing rescind thread.
@@ -915,6 +921,16 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 	vmbus_process_offer(newchannel);
 }
 
+static void check_ready_for_suspend_event(void)
+{
+	/*
+	 * If all the sub-channels or hv_sock channels have been cleaned up,
+	 * then it's safe to suspend.
+	 */
+	if (atomic_dec_and_test(&vmbus_connection.nr_chan_close_on_suspend))
+		complete(&vmbus_connection.ready_for_suspend_event);
+}
+
 /*
  * vmbus_onoffer_rescind - Rescind offer handler.
  *
@@ -925,6 +941,7 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
 	struct vmbus_channel_rescind_offer *rescind;
 	struct vmbus_channel *channel;
 	struct device *dev;
+	bool clean_up_chan_for_suspend;
 
 	rescind = (struct vmbus_channel_rescind_offer *)hdr;
 
@@ -964,6 +981,8 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
 		return;
 	}
 
+	clean_up_chan_for_suspend = is_hvsock_channel(channel) ||
+				    is_sub_channel(channel);
 	/*
 	 * Before setting channel->rescind in vmbus_rescind_cleanup(), we
 	 * should make sure the channel callback is not running any more.
@@ -989,6 +1008,10 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
 	if (channel->device_obj) {
 		if (channel->chn_rescind_callback) {
 			channel->chn_rescind_callback(channel);
+
+			if (clean_up_chan_for_suspend)
+				check_ready_for_suspend_event();
+
 			return;
 		}
 		/*
@@ -1021,6 +1044,11 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
 		}
 		mutex_unlock(&vmbus_connection.channel_mutex);
 	}
+
+	/* The "channel" may have been freed. Do not access it any longer. */
+
+	if (clean_up_chan_for_suspend)
+		check_ready_for_suspend_event();
 }
 
 void vmbus_hvsock_device_unregister(struct vmbus_channel *channel)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 701d9a8..f15d3115 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -26,6 +26,9 @@
 struct vmbus_connection vmbus_connection = {
 	.conn_state		= DISCONNECTED,
 	.next_gpadl_handle	= ATOMIC_INIT(0xE1E10),
+
+	.ready_for_suspend_event= COMPLETION_INITIALIZER(
+				  vmbus_connection.ready_for_suspend_event),
 };
 EXPORT_SYMBOL_GPL(vmbus_connection);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 4610277..9f96e23 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -258,6 +258,18 @@ struct vmbus_connection {
 	struct workqueue_struct *work_queue;
 	struct workqueue_struct *handle_primary_chan_wq;
 	struct workqueue_struct *handle_sub_chan_wq;
+
+	/*
+	 * The number of sub-channels and hv_sock channels that should be
+	 * cleaned up upon suspend: sub-channels will be re-created upon
+	 * resume, and hv_sock channels should not survive suspend.
+	 */
+	atomic_t nr_chan_close_on_suspend;
+	/*
+	 * vmbus_bus_suspend() waits for "nr_chan_close_on_suspend" to
+	 * drop to zero.
+	 */
+	struct completion ready_for_suspend_event;
 };
 
 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 2bea669..0507157 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2127,7 +2127,8 @@ static int vmbus_acpi_add(struct acpi_device *device)
 
 static int vmbus_bus_suspend(struct device *dev)
 {
-	struct vmbus_channel *channel;
+	struct vmbus_channel *channel, *sc;
+	unsigned long flags;
 
 	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {
 		/*
@@ -2146,6 +2147,41 @@ static int vmbus_bus_suspend(struct device *dev)
 	}
 	mutex_unlock(&vmbus_connection.channel_mutex);
 
+	/*
+	 * Wait until all the sub-channels and hv_sock channels have been
+	 * cleaned up. Sub-channels should be destroyed upon suspend, otherwise
+	 * they would conflict with the new sub-channels that will be created
+	 * in the resume path. hv_sock channels should also be destroyed, but
+	 * a hv_sock channel of an established hv_sock connection can not be
+	 * really destroyed since it may still be referenced by the userspace
+	 * application, so we just force the hv_sock channel to be rescinded
+	 * by vmbus_force_channel_rescinded(), and the userspace application
+	 * will thoroughly destroy the channel after hibernation.
+	 */
+	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
+		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
+
+	mutex_lock(&vmbus_connection.channel_mutex);
+
+	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+		if (is_hvsock_channel(channel)) {
+			if (!channel->rescind) {
+				pr_err("hv_sock channel not rescinded!\n");
+				WARN_ON_ONCE(1);
+			}
+			continue;
+		}
+
+		spin_lock_irqsave(&channel->lock, flags);
+		list_for_each_entry(sc, &channel->sc_list, sc_list) {
+			pr_err("Sub-channel not deleted!\n");
+			WARN_ON_ONCE(1);
+		}
+		spin_unlock_irqrestore(&channel->lock, flags);
+	}
+
+	mutex_unlock(&vmbus_connection.channel_mutex);
+
 	vmbus_initiate_unload(false);
 
 	vmbus_connection.conn_state = DISCONNECTED;
@@ -2186,6 +2222,9 @@ static int vmbus_bus_resume(struct device *dev)
 
 	vmbus_request_offers();
 
+	/* Reset the event for the next suspend. */
+	reinit_completion(&vmbus_connection.ready_for_suspend_event);
+
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 12/12] Drivers: hv: vmbus: Resume after fixing up old primary channels
  2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
                   ` (10 preceding siblings ...)
  2019-08-20  1:52 ` [PATCH v3 11/12] Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels Dexuan Cui
@ 2019-08-20  1:52 ` Dexuan Cui
  2019-08-23 20:25   ` Michael Kelley
  11 siblings, 1 reply; 25+ messages in thread
From: Dexuan Cui @ 2019-08-20  1:52 UTC (permalink / raw)
  To: linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin, sashal,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, tglx
  Cc: linux-kernel, Dexuan Cui

When the host re-offers the primary channels upon resume, the host only
guarantees the Instance GUID  doesn't change, so vmbus_bus_suspend()
should invalidate channel->offermsg.child_relid and figure out the
number of primary channels that need to be fixed up upon resume.

Upon resume, vmbus_onoffer() finds the old channel structs, and maps
the new offers to the old channels, and fixes up the old structs,
and finally the resume callbacks of the VSC drivers will re-open
the channels.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/channel_mgmt.c | 76 +++++++++++++++++++++++++++++++++++------------
 drivers/hv/connection.c   |  2 ++
 drivers/hv/hyperv_vmbus.h | 14 +++++++++
 drivers/hv/vmbus_drv.c    | 17 +++++++++++
 include/linux/hyperv.h    |  3 ++
 5 files changed, 93 insertions(+), 19 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 8491d1b..61b6288 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -407,7 +407,15 @@ void hv_process_channel_removal(struct vmbus_channel *channel)
 		cpumask_clear_cpu(channel->target_cpu,
 				  &primary_channel->alloced_cpus_in_node);
 
-	vmbus_release_relid(channel->offermsg.child_relid);
+	/*
+	 * Upon suspend, an in-use hv_sock channel is marked as "rescinded" and
+	 * the relid is invalidated; after hibernation, when the user-space app
+	 * destroys the channel, the relid is INVALID_RELID, and in this case
+	 * it's unnecessary and unsafe to release the old relid, since the same
+	 * relid can refer to a completely different channel now.
+	 */
+	if (channel->offermsg.child_relid != INVALID_RELID)
+		vmbus_release_relid(channel->offermsg.child_relid);
 
 	free_channel(channel);
 }
@@ -853,6 +861,36 @@ void vmbus_initiate_unload(bool crash)
 		vmbus_wait_for_unload();
 }
 
+static void check_ready_for_resume_event(void)
+{
+	/*
+	 * If all the old primary channels have been fixed up, then it's safe
+	 * to resume.
+	 */
+	if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
+		complete(&vmbus_connection.ready_for_resume_event);
+}
+
+static void vmbus_setup_channel_state(struct vmbus_channel *channel,
+				      struct vmbus_channel_offer_channel *offer)
+{
+	/*
+	 * Setup state for signalling the host.
+	 */
+	channel->sig_event = VMBUS_EVENT_CONNECTION_ID;
+
+	if (vmbus_proto_version != VERSION_WS2008) {
+		channel->is_dedicated_interrupt =
+				(offer->is_dedicated_interrupt != 0);
+		channel->sig_event = offer->connection_id;
+	}
+
+	memcpy(&channel->offermsg, offer,
+	       sizeof(struct vmbus_channel_offer_channel));
+	channel->monitor_grp = (u8)offer->monitorid / 32;
+	channel->monitor_bit = (u8)offer->monitorid % 32;
+}
+
 /*
  * vmbus_onoffer - Handler for channel offers from vmbus in parent partition.
  *
@@ -875,12 +913,21 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 		atomic_dec(&vmbus_connection.offer_in_progress);
 
 		/*
-		 * We're resuming from hibernation: we expect the host to send
-		 * exactly the same offers that we had before the hibernation.
+		 * We're resuming from hibernation: all the sub-channel and
+		 * hv_sock channels we had before the hibernation should have
+		 * been cleaned up, and now we must be seeing a re-offered
+		 * primary channel that we had before the hibernation.
 		 */
+
+		WARN_ON(oldchannel->offermsg.child_relid != INVALID_RELID);
+		/* Fix up the relid. */
+		oldchannel->offermsg.child_relid = offer->child_relid;
+
 		offer_sz = sizeof(*offer);
-		if (memcmp(offer, &oldchannel->offermsg, offer_sz) == 0)
+		if (memcmp(offer, &oldchannel->offermsg, offer_sz) == 0) {
+			check_ready_for_resume_event();
 			return;
+		}
 
 		pr_debug("Mismatched offer from the host (relid=%d)\n",
 			 offer->child_relid);
@@ -890,6 +937,11 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 				     false);
 		print_hex_dump_debug("New vmbus offer: ", DUMP_PREFIX_OFFSET,
 				     16, 4, offer, offer_sz, false);
+
+		vmbus_setup_channel_state(oldchannel, offer);
+
+		check_ready_for_resume_event();
+
 		return;
 	}
 
@@ -902,21 +954,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 		return;
 	}
 
-	/*
-	 * Setup state for signalling the host.
-	 */
-	newchannel->sig_event = VMBUS_EVENT_CONNECTION_ID;
-
-	if (vmbus_proto_version != VERSION_WS2008) {
-		newchannel->is_dedicated_interrupt =
-				(offer->is_dedicated_interrupt != 0);
-		newchannel->sig_event = offer->connection_id;
-	}
-
-	memcpy(&newchannel->offermsg, offer,
-	       sizeof(struct vmbus_channel_offer_channel));
-	newchannel->monitor_grp = (u8)offer->monitorid / 32;
-	newchannel->monitor_bit = (u8)offer->monitorid % 32;
+	vmbus_setup_channel_state(newchannel, offer);
 
 	vmbus_process_offer(newchannel);
 }
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index f15d3115..ebfa9aa 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -29,6 +29,8 @@ struct vmbus_connection vmbus_connection = {
 
 	.ready_for_suspend_event= COMPLETION_INITIALIZER(
 				  vmbus_connection.ready_for_suspend_event),
+	.ready_for_resume_event	= COMPLETION_INITIALIZER(
+				  vmbus_connection.ready_for_resume_event),
 };
 EXPORT_SYMBOL_GPL(vmbus_connection);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9f96e23..fb00ffd 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -270,6 +270,20 @@ struct vmbus_connection {
 	 * drop to zero.
 	 */
 	struct completion ready_for_suspend_event;
+
+	/*
+	 * The number of primary channels that should be "fixed up"
+	 * upon resume: these channels are re-offered upon resume, and some
+	 * fields of the channel offers (i.e. child_relid and connection_id)
+	 * can change, so the old offermsg must be fixed up, before the resume
+	 * callbacks of the VSC drivers start to further touch the channels.
+	 */
+	atomic_t nr_chan_fixup_on_resume;
+	/*
+	 * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
+	 * drop to zero.
+	 */
+	struct completion ready_for_resume_event;
 };
 
 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0507157..edec984 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2161,9 +2161,17 @@ static int vmbus_bus_suspend(struct device *dev)
 	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
 		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
 
+	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0);
+
 	mutex_lock(&vmbus_connection.channel_mutex);
 
 	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+		/*
+		 * Invalidate the field. Upon resume, vmbus_onoffer() will fix
+		 * up the field, and the other fields (if necessary).
+		 */
+		channel->offermsg.child_relid = INVALID_RELID;
+
 		if (is_hvsock_channel(channel)) {
 			if (!channel->rescind) {
 				pr_err("hv_sock channel not rescinded!\n");
@@ -2178,6 +2186,8 @@ static int vmbus_bus_suspend(struct device *dev)
 			WARN_ON_ONCE(1);
 		}
 		spin_unlock_irqrestore(&channel->lock, flags);
+
+		atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
 	}
 
 	mutex_unlock(&vmbus_connection.channel_mutex);
@@ -2186,6 +2196,9 @@ static int vmbus_bus_suspend(struct device *dev)
 
 	vmbus_connection.conn_state = DISCONNECTED;
 
+	/* Reset the event for the next resume. */
+	reinit_completion(&vmbus_connection.ready_for_resume_event);
+
 	return 0;
 }
 
@@ -2220,8 +2233,12 @@ static int vmbus_bus_resume(struct device *dev)
 	if (ret != 0)
 		return ret;
 
+	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
+
 	vmbus_request_offers();
 
+	wait_for_completion(&vmbus_connection.ready_for_resume_event);
+
 	/* Reset the event for the next suspend. */
 	reinit_completion(&vmbus_connection.ready_for_suspend_event);
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 8a60e77..a3aa9e9 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -426,6 +426,9 @@ enum vmbus_channel_message_type {
 	CHANNELMSG_COUNT
 };
 
+/* Hyper-V supports about 2048 channels, and the RELIDs start with 1. */
+#define INVALID_RELID	U32_MAX
+
 struct vmbus_channel_message_header {
 	enum vmbus_channel_message_type msgtype;
 	u32 padding;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 02/12] x86/hyper-v: Implement hv_is_hibernation_supported()
  2019-08-20  1:51 ` [PATCH v3 02/12] x86/hyper-v: Implement hv_is_hibernation_supported() Dexuan Cui
@ 2019-08-23 19:50   ` Michael Kelley
  2019-08-30 23:37     ` Dexuan Cui
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2019-08-23 19:50 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin,
	sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, August 19, 2019 6:52 PM
> 
> When a Linux VM runs on Hyper-V and hibernates, it must disable the
> memory hot-add/remove and balloon up/down capabilities in the hv_balloon
> driver.

I'm unclear on the above statement.  I think the requirement is that
ballooning must not be active when hibernation is initiated.  Is hibernation
blocked in that case?  If not, what happens?

> 
> By default, Hyper-V does not enable the virtual ACPI S4 state for a VM;
> on recent Hyper-V hosts, the administrator is able to enable the virtual
> ACPI S4 state for a VM, so we hope to use the presence of the virtual ACPI

"we hope" sounds very indefinite. :-(    Does ACPI S4 have to be enabled for
hibernation to be initiated?  Goes back to my first question ....

> S4 state as a hint for hv_balloon to disable the aforementioned
> capabilities.
> 
> The new API will be used by hv_balloon.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c      | 7 +++++++
>  include/asm-generic/mshyperv.h | 2 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 78e53d9..6735e45 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -7,6 +7,7 @@
>   * Author : K. Y. Srinivasan <kys@microsoft.com>
>   */
> 
> +#include <linux/acpi.h>
>  #include <linux/efi.h>
>  #include <linux/types.h>
>  #include <asm/apic.h>
> @@ -453,3 +454,9 @@ bool hv_is_hyperv_initialized(void)
>  	return hypercall_msr.enable;
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> +
> +bool hv_is_hibernation_supported(void)
> +{
> +	return acpi_sleep_state_supported(ACPI_STATE_S4);
> +}
> +EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 0becb7d..1cb4001 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -166,9 +166,11 @@ static inline int cpumask_to_vpset(struct hv_vpset *vpset,
>  void hyperv_report_panic(struct pt_regs *regs, long err);
>  void hyperv_report_panic_msg(phys_addr_t pa, size_t size);
>  bool hv_is_hyperv_initialized(void);
> +bool hv_is_hibernation_supported(void);
>  void hyperv_cleanup(void);
>  #else /* CONFIG_HYPERV */
>  static inline bool hv_is_hyperv_initialized(void) { return false; }
> +static inline bool hv_is_hibernation_supported(void) { return false; }
>  static inline void hyperv_cleanup(void) {}
>  #endif /* CONFIG_HYPERV */
> 
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 06/12] Drivers: hv: vmbus: Add a helper function is_sub_channel()
  2019-08-20  1:52 ` [PATCH v3 06/12] Drivers: hv: vmbus: Add a helper function is_sub_channel() Dexuan Cui
@ 2019-08-23 19:51   ` Michael Kelley
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Kelley @ 2019-08-23 19:51 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin,
	sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, August 19, 2019 6:52 PM
> 
> The existing method of telling if a channel is sub-channel in
> vmbus_process_offer() is cumbersome. This new simple helper function
> is preferred in future.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  include/linux/hyperv.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 6256cc3..2d39248 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -245,7 +245,10 @@ struct vmbus_channel_offer {
>  		} pipe;
>  	} u;
>  	/*
> -	 * The sub_channel_index is defined in win8.
> +	 * The sub_channel_index is defined in Win8: a value of zero means a
> +	 * primary channel and a value of non-zero means a sub-channel.
> +	 *
> +	 * Before Win8, the field is reserved, meaning it's always zero.
>  	 */
>  	u16 sub_channel_index;
>  	u16 reserved3;
> @@ -934,6 +937,11 @@ static inline bool is_hvsock_channel(const struct vmbus_channel
> *c)
>  		  VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
>  }
> 
> +static inline bool is_sub_channel(const struct vmbus_channel *c)
> +{
> +	return c->offermsg.offer.sub_channel_index != 0;
> +}
> +
>  static inline void set_channel_affinity_state(struct vmbus_channel *c,
>  					      enum hv_numa_policy policy)
>  {
> --
> 1.8.3.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 08/12] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
  2019-08-20  1:52 ` [PATCH v3 08/12] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation Dexuan Cui
@ 2019-08-23 19:56   ` Michael Kelley
  2019-08-31  0:23     ` Dexuan Cui
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2019-08-23 19:56 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin,
	sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

From: Dexuan Cui <decui@microsoft.com>  Sent: Monday, August 19, 2019 6:52 PM
> 
> When the VM resumes, the host re-sends the offers. We should not add the
> offers to the global vmbus_connection.chn_list again.
> 
> This patch assumes the RELIDs of the channels don't change across
> hibernation. Actually this is not always true, especially in the case of
> NIC SR-IOV the VF vmbus device's RELID sometimes can change. A later patch
> will address this issue by mapping the new offers to the old channels and
> fixing up the old channels, if necessary.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 29 ++++++++++++++++++++++++++++-
>  drivers/hv/connection.c   | 27 +++++++++++++++++++++++++++
>  drivers/hv/hyperv_vmbus.h |  3 +++
>  3 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index addcef5..f7a1184 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -854,12 +854,39 @@ void vmbus_initiate_unload(bool crash)
>  static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>  {
>  	struct vmbus_channel_offer_channel *offer;
> -	struct vmbus_channel *newchannel;
> +	struct vmbus_channel *oldchannel, *newchannel;
> +	size_t offer_sz;
> 
>  	offer = (struct vmbus_channel_offer_channel *)hdr;
> 
>  	trace_vmbus_onoffer(offer);
> 
> +	mutex_lock(&vmbus_connection.channel_mutex);
> +	oldchannel = find_primary_channel_by_offer(offer);
> +	mutex_unlock(&vmbus_connection.channel_mutex);
> +
> +	if (oldchannel != NULL) {
> +		atomic_dec(&vmbus_connection.offer_in_progress);
> +
> +		/*
> +		 * We're resuming from hibernation: we expect the host to send
> +		 * exactly the same offers that we had before the hibernation.
> +		 */
> +		offer_sz = sizeof(*offer);
> +		if (memcmp(offer, &oldchannel->offermsg, offer_sz) == 0)
> +			return;
> +
> +		pr_debug("Mismatched offer from the host (relid=%d)\n",
> +			 offer->child_relid);
> +
> +		print_hex_dump_debug("Old vmbus offer: ", DUMP_PREFIX_OFFSET,
> +				     16, 4, &oldchannel->offermsg, offer_sz,
> +				     false);
> +		print_hex_dump_debug("New vmbus offer: ", DUMP_PREFIX_OFFSET,
> +				     16, 4, offer, offer_sz, false);
> +		return;
> +	}
> +
>  	/* Allocate the channel object and save this offer. */
>  	newchannel = alloc_channel();
>  	if (!newchannel) {
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 09829e1..6c7a983 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -337,6 +337,33 @@ struct vmbus_channel *relid2channel(u32 relid)
>  }
> 
>  /*
> + * find_primary_channel_by_offer - Get the channel object given the new offer.
> + * This is only used in the resume path of hibernation.
> + */
> +struct vmbus_channel *
> +find_primary_channel_by_offer(const struct vmbus_channel_offer_channel *offer)
> +{
> +	struct vmbus_channel *channel;
> +	const guid_t *inst1, *inst2;
> +
> +	WARN_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
> +
> +	/* Ignore sub-channel offers. */
> +	if (offer->offer.sub_channel_index != 0)
> +		return NULL;
> +
> +	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> +		inst1 = &channel->offermsg.offer.if_instance;
> +		inst2 = &offer->offer.if_instance;
> +
> +		if (guid_equal(inst1, inst2))
> +			return channel;
> +	}
> +
> +	return NULL;
> +}

Any particular reason this new function is in connection.c instead of
putting it in channel_mgmt.c where it is called?

> +
> +/*
>   * vmbus_on_event - Process a channel event notification
>   *
>   * For batched channels (default) optimize host to guest signaling
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 9f7fb6d..c42b46d 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -310,6 +310,9 @@ int vmbus_add_channel_kobj(struct hv_device *device_obj,
> 
>  struct vmbus_channel *relid2channel(u32 relid);
> 
> +struct vmbus_channel *
> +find_primary_channel_by_offer(const struct vmbus_channel_offer_channel *offer);
> +
>  void vmbus_free_channels(void);
> 
>  /* Connection interface */
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 10/12] Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend
  2019-08-20  1:52 ` [PATCH v3 10/12] Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend Dexuan Cui
@ 2019-08-23 20:02   ` Michael Kelley
  2019-08-31  2:00     ` Dexuan Cui
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2019-08-23 20:02 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin,
	sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, August 19, 2019 6:52 PM
> 
> Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
> hibernation. There is no better method to clean up the channels since
> some of the channels may still be referenced by the userspace apps when
> hiberantin is triggered: in this case, the "rescind" fields of the

s/hiberantin/hibernation/

> channels are set, and the apps will thoroughly destroy the channels
> after hibernation.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/vmbus_drv.c | 55
> ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index ce9974b..2bea669 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -24,6 +24,7 @@
>  #include <linux/sched/task_stack.h>
> 
>  #include <asm/mshyperv.h>
> +#include <linux/delay.h>
>  #include <linux/notifier.h>
>  #include <linux/ptrace.h>
>  #include <linux/screen_info.h>
> @@ -1069,6 +1070,41 @@ void vmbus_on_msg_dpc(unsigned long data)
>  	vmbus_signal_eom(msg, message_type);
>  }
> 
> +/*
> + * Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
> + * hibernation, because hv_sock connections can not persist across hibernation.
> + */
> +static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
> +{
> +	struct onmessage_work_context *ctx;
> +	struct vmbus_channel_rescind_offer *rescind;
> +
> +	WARN_ON(!is_hvsock_channel(channel));
> +
> +	/*
> +	 * sizeof(*ctx) is small and the allocation should really not fail,
> +	 * otherwise the state of the hv_sock connections ends up in limbo.
> +	 */
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL | __GFP_NOFAIL);
> +
> +	/*
> +	 * So far, these are not really used by Linux. Just set them to the
> +	 * reasonable values conforming to the definitions of the fields.
> +	 */
> +	ctx->msg.header.message_type = 1;
> +	ctx->msg.header.payload_size = sizeof(*rescind);
> +
> +	/* These values are actually used by Linux. */
> +	rescind = (struct vmbus_channel_rescind_offer *)ctx->msg.u.payload;
> +	rescind->header.msgtype = CHANNELMSG_RESCIND_CHANNELOFFER;
> +	rescind->child_relid = channel->offermsg.child_relid;
> +
> +	INIT_WORK(&ctx->work, vmbus_onmessage_work);
> +
> +	queue_work_on(vmbus_connection.connect_cpu,
> +		      vmbus_connection.work_queue,
> +		      &ctx->work);
> +}
> 
>  /*
>   * Direct callback for channels using other deferred processing
> @@ -2091,6 +2127,25 @@ static int vmbus_acpi_add(struct acpi_device *device)
> 
>  static int vmbus_bus_suspend(struct device *dev)
>  {
> +	struct vmbus_channel *channel;
> +
> +	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {
> +		/*
> +		 * We wait here until any channel offer is currently
> +		 * being processed.
> +		 */

The wording of the comment is a bit off.  Maybe

		/*
		 * We wait here until the completion of any channel
		 * offers that are currently in progress.
		 */

> +		msleep(1);
> +	}
> +
> +	mutex_lock(&vmbus_connection.channel_mutex);
> +	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> +		if (!is_hvsock_channel(channel))
> +			continue;
> +
> +		vmbus_force_channel_rescinded(channel);
> +	}
> +	mutex_unlock(&vmbus_connection.channel_mutex);
> +
>  	vmbus_initiate_unload(false);
> 
>  	vmbus_connection.conn_state = DISCONNECTED;
> --
> 1.8.3.1

Modulo the nits:

Reviewed-by: Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 11/12] Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels
  2019-08-20  1:52 ` [PATCH v3 11/12] Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels Dexuan Cui
@ 2019-08-23 20:16   ` Michael Kelley
  2019-08-31  2:54     ` Dexuan Cui
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2019-08-23 20:16 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin,
	sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, August 19, 2019 6:52 PM
> 
> Before suspend, Linux must make sure all the hv_sock channels have been
> properly cleaned up, because a hv_sock connection can not persist across
> hibernation, and the user-space app must be properly notified of the
> state change of the connection.
> 
> Before suspend, Linux also must make sure all the sub-channels have been
> destroyed, i.e. the related channel structs of the sub-channels must be
> properly removed, otherwise they would cause a conflict when the
> sub-channels are recreated upon resume.
> 
> Add a counter to track such channels, and vmbus_bus_suspend() should wait
> for the counter to drop to zero.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 28 ++++++++++++++++++++++++++++
>  drivers/hv/connection.c   |  3 +++
>  drivers/hv/hyperv_vmbus.h | 12 ++++++++++++
>  drivers/hv/vmbus_drv.c    | 41 ++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 83 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index f7a1184..8491d1b 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -499,6 +499,8 @@ static void vmbus_add_channel_work(struct work_struct *work)
>  	return;
> 
>  err_deq_chan:
> +	WARN_ON_ONCE(1);
> +

Why this change?  I was thinking maybe it's a debug statement that got
left in.

>  	mutex_lock(&vmbus_connection.channel_mutex);
> 
>  	/*
> @@ -545,6 +547,10 @@ static void vmbus_process_offer(struct vmbus_channel
> *newchannel)
> 
>  	mutex_lock(&vmbus_connection.channel_mutex);
> 
> +	/* Remember the channels that should be cleaned up upon suspend. */
> +	if (is_hvsock_channel(newchannel) || is_sub_channel(newchannel))
> +		atomic_inc(&vmbus_connection.nr_chan_close_on_suspend);
> +
>  	/*
>  	 * Now that we have acquired the channel_mutex,
>  	 * we can release the potentially racing rescind thread.
> @@ -915,6 +921,16 @@ static void vmbus_onoffer(struct
> vmbus_channel_message_header *hdr)
>  	vmbus_process_offer(newchannel);
>  }
> 
> +static void check_ready_for_suspend_event(void)
> +{
> +	/*
> +	 * If all the sub-channels or hv_sock channels have been cleaned up,
> +	 * then it's safe to suspend.
> +	 */
> +	if (atomic_dec_and_test(&vmbus_connection.nr_chan_close_on_suspend))
> +		complete(&vmbus_connection.ready_for_suspend_event);
> +}
> +
>  /*
>   * vmbus_onoffer_rescind - Rescind offer handler.
>   *
> @@ -925,6 +941,7 @@ static void vmbus_onoffer_rescind(struct
> vmbus_channel_message_header *hdr)
>  	struct vmbus_channel_rescind_offer *rescind;
>  	struct vmbus_channel *channel;
>  	struct device *dev;
> +	bool clean_up_chan_for_suspend;
> 
>  	rescind = (struct vmbus_channel_rescind_offer *)hdr;
> 
> @@ -964,6 +981,8 @@ static void vmbus_onoffer_rescind(struct
> vmbus_channel_message_header *hdr)
>  		return;
>  	}
> 
> +	clean_up_chan_for_suspend = is_hvsock_channel(channel) ||
> +				    is_sub_channel(channel);
>  	/*
>  	 * Before setting channel->rescind in vmbus_rescind_cleanup(), we
>  	 * should make sure the channel callback is not running any more.
> @@ -989,6 +1008,10 @@ static void vmbus_onoffer_rescind(struct
> vmbus_channel_message_header *hdr)
>  	if (channel->device_obj) {
>  		if (channel->chn_rescind_callback) {
>  			channel->chn_rescind_callback(channel);
> +
> +			if (clean_up_chan_for_suspend)
> +				check_ready_for_suspend_event();
> +
>  			return;
>  		}
>  		/*
> @@ -1021,6 +1044,11 @@ static void vmbus_onoffer_rescind(struct
> vmbus_channel_message_header *hdr)
>  		}
>  		mutex_unlock(&vmbus_connection.channel_mutex);
>  	}
> +
> +	/* The "channel" may have been freed. Do not access it any longer. */
> +
> +	if (clean_up_chan_for_suspend)
> +		check_ready_for_suspend_event();
>  }

Having to add the above lines twice is a bit clumsy, but the problem is
the overall structure of the vmbus_onoffer_rescind.  The early return in
the case of a rescind_callback function is a bit weird, but I guess it makes
sense since from what I can tell, only uio and hv_sock have rescind callback
functions.   Some minor restructuring might be warranted, but I don't feel
strongly about it.

> 
>  void vmbus_hvsock_device_unregister(struct vmbus_channel *channel)
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 701d9a8..f15d3115 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -26,6 +26,9 @@
>  struct vmbus_connection vmbus_connection = {
>  	.conn_state		= DISCONNECTED,
>  	.next_gpadl_handle	= ATOMIC_INIT(0xE1E10),
> +
> +	.ready_for_suspend_event= COMPLETION_INITIALIZER(
> +				  vmbus_connection.ready_for_suspend_event),
>  };
>  EXPORT_SYMBOL_GPL(vmbus_connection);
> 
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 4610277..9f96e23 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -258,6 +258,18 @@ struct vmbus_connection {
>  	struct workqueue_struct *work_queue;
>  	struct workqueue_struct *handle_primary_chan_wq;
>  	struct workqueue_struct *handle_sub_chan_wq;
> +
> +	/*
> +	 * The number of sub-channels and hv_sock channels that should be
> +	 * cleaned up upon suspend: sub-channels will be re-created upon
> +	 * resume, and hv_sock channels should not survive suspend.
> +	 */
> +	atomic_t nr_chan_close_on_suspend;
> +	/*
> +	 * vmbus_bus_suspend() waits for "nr_chan_close_on_suspend" to
> +	 * drop to zero.
> +	 */
> +	struct completion ready_for_suspend_event;
>  };
> 
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 2bea669..0507157 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2127,7 +2127,8 @@ static int vmbus_acpi_add(struct acpi_device *device)
> 
>  static int vmbus_bus_suspend(struct device *dev)
>  {
> -	struct vmbus_channel *channel;
> +	struct vmbus_channel *channel, *sc;
> +	unsigned long flags;
> 
>  	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {
>  		/*
> @@ -2146,6 +2147,41 @@ static int vmbus_bus_suspend(struct device *dev)
>  	}
>  	mutex_unlock(&vmbus_connection.channel_mutex);
> 
> +	/*
> +	 * Wait until all the sub-channels and hv_sock channels have been
> +	 * cleaned up. Sub-channels should be destroyed upon suspend, otherwise
> +	 * they would conflict with the new sub-channels that will be created
> +	 * in the resume path. hv_sock channels should also be destroyed, but
> +	 * a hv_sock channel of an established hv_sock connection can not be
> +	 * really destroyed since it may still be referenced by the userspace
> +	 * application, so we just force the hv_sock channel to be rescinded
> +	 * by vmbus_force_channel_rescinded(), and the userspace application
> +	 * will thoroughly destroy the channel after hibernation.
> +	 */
> +	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)

At first glance, the above line seemed useless to me.  You could just do the
wait_for_completion() unconditionally.  But is the intent to handle the case where
the VM never had any sub-channels or hv_sock channels, and so
nr_chan_close_on_suspend never went above 0?  If so, a comment might
be helpful.

> +		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
> +
> +	mutex_lock(&vmbus_connection.channel_mutex);
> +
> +	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> +		if (is_hvsock_channel(channel)) {
> +			if (!channel->rescind) {
> +				pr_err("hv_sock channel not rescinded!\n");
> +				WARN_ON_ONCE(1);
> +			}
> +			continue;
> +		}
> +
> +		spin_lock_irqsave(&channel->lock, flags);
> +		list_for_each_entry(sc, &channel->sc_list, sc_list) {
> +			pr_err("Sub-channel not deleted!\n");
> +			WARN_ON_ONCE(1);
> +		}
> +		spin_unlock_irqrestore(&channel->lock, flags);
> +	}
> +
> +	mutex_unlock(&vmbus_connection.channel_mutex);
> +
>  	vmbus_initiate_unload(false);
> 
>  	vmbus_connection.conn_state = DISCONNECTED;
> @@ -2186,6 +2222,9 @@ static int vmbus_bus_resume(struct device *dev)
> 
>  	vmbus_request_offers();
> 
> +	/* Reset the event for the next suspend. */
> +	reinit_completion(&vmbus_connection.ready_for_suspend_event);
> +
>  	return 0;
>  }
> 
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 12/12] Drivers: hv: vmbus: Resume after fixing up old primary channels
  2019-08-20  1:52 ` [PATCH v3 12/12] Drivers: hv: vmbus: Resume after fixing up old primary channels Dexuan Cui
@ 2019-08-23 20:25   ` Michael Kelley
  2019-08-31  4:37     ` Dexuan Cui
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Kelley @ 2019-08-23 20:25 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv, gregkh, Stephen Hemminger, Sasha Levin,
	sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, August 19, 2019 6:52 PM
> 
> When the host re-offers the primary channels upon resume, the host only
> guarantees the Instance GUID  doesn't change, so vmbus_bus_suspend()
> should invalidate channel->offermsg.child_relid and figure out the
> number of primary channels that need to be fixed up upon resume.
> 
> Upon resume, vmbus_onoffer() finds the old channel structs, and maps
> the new offers to the old channels, and fixes up the old structs,
> and finally the resume callbacks of the VSC drivers will re-open
> the channels.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 76 +++++++++++++++++++++++++++++++++++------------
>  drivers/hv/connection.c   |  2 ++
>  drivers/hv/hyperv_vmbus.h | 14 +++++++++
>  drivers/hv/vmbus_drv.c    | 17 +++++++++++
>  include/linux/hyperv.h    |  3 ++
>  5 files changed, 93 insertions(+), 19 deletions(-)
> 
> @@ -875,12 +913,21 @@ static void vmbus_onoffer(struct
> vmbus_channel_message_header *hdr)
>  		atomic_dec(&vmbus_connection.offer_in_progress);
> 
>  		/*
> -		 * We're resuming from hibernation: we expect the host to send
> -		 * exactly the same offers that we had before the hibernation.
> +		 * We're resuming from hibernation: all the sub-channel and
> +		 * hv_sock channels we had before the hibernation should have
> +		 * been cleaned up, and now we must be seeing a re-offered
> +		 * primary channel that we had before the hibernation.
>  		 */
> +
> +		WARN_ON(oldchannel->offermsg.child_relid != INVALID_RELID);
> +		/* Fix up the relid. */
> +		oldchannel->offermsg.child_relid = offer->child_relid;
> +
>  		offer_sz = sizeof(*offer);
> -		if (memcmp(offer, &oldchannel->offermsg, offer_sz) == 0)
> +		if (memcmp(offer, &oldchannel->offermsg, offer_sz) == 0) {
> +			check_ready_for_resume_event();
>  			return;
> +		}
> 
>  		pr_debug("Mismatched offer from the host (relid=%d)\n",
>  			 offer->child_relid);
> @@ -890,6 +937,11 @@ static void vmbus_onoffer(struct
> vmbus_channel_message_header *hdr)
>  				     false);
>  		print_hex_dump_debug("New vmbus offer: ", DUMP_PREFIX_OFFSET,
>  				     16, 4, offer, offer_sz, false);
> +
> +		vmbus_setup_channel_state(oldchannel, offer);
> +
> +		check_ready_for_resume_event();

This is the error case where the new offer didn't match some aspect of
the old offer.  Is the intent to proceed and use the new offer?   I can see
that check_ready_for_resume_event() has to be called in the error case,
otherwise the resume operation will hang forever, but I'm not sure about
setting up the channel state and then proceeding as if all is good.

> +
>  		return;
>  	}
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 02/12] x86/hyper-v: Implement hv_is_hibernation_supported()
  2019-08-23 19:50   ` Michael Kelley
@ 2019-08-30 23:37     ` Dexuan Cui
  0 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-30 23:37 UTC (permalink / raw)
  To: Michael Kelley, linux-hyperv, gregkh, Stephen Hemminger,
	Sasha Levin, sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Friday, August 23, 2019 12:50 PM
> 
> From: Dexuan Cui <decui@microsoft.com> Sent: August 19, 2019 6:52 PM
> >
> > When a Linux VM runs on Hyper-V and hibernates, it must disable the
> > memory hot-add/remove and balloon up/down capabilities in the hv_balloon
> > driver.
> 
> I'm unclear on the above statement.  I think the requirement is that
> ballooning must not be active when hibernation is initiated.  Is hibernation
> blocked in that case?  If not, what happens?

Ballooning (and hot-addition of memory) must not be active if the user
wants the Linux VM to support hibernation, not just when hibernation is
initiated. This is because ballooning and hot-addition of memory are
incompatible with hibernation according to Hyper-V team, e.g. upon
hibernation the balloon VSP doesn't save any info about ballooned-out
pages (if any), so after Linux resumes, Linux balloon VSC expects that the
VSP will return the pages if Linux is under memory pressure, but the VSP
will never return the pages, since the VSP thinks it never stole the
pages from the VM.

So, if the user wants Linux VM to support hibernation, Linux must completely
forbid ballooning and hot-addition of memory, and hence the only
functionality of balloon VSC is reporting the memory pressure to the host.

Ideally, when Linux detects that the user wants it to support hibernation,
the balloon VSC should tell the VSP that it does not support ballooning and
hot-addition; however, the current version of the VSP requires
the VSC should support these capabilities, otherwise the capability negotiation
fails and the VSC can not load at all, so in my changes to the VSC driver, I
still report to the VSP that Linux supports the capabilities, but the VSC
ignores the host's requests of balloon up/down and hot add, and returns an
error to the VSP, when applicable.

BTW, in the future the balloon VSP driver will allow the VSC to not support
the capabilities of balloon up/down and hot add.

The remaining problem is: how can Linux know the user wants Linux VM
to support hibernation?

The ACPI S4 state is not a must for hibernation to work, because Linux is
able to hibernate as long as the system can shut down.

My decision is that: we artificially use the presence of the virtual
ACPI S4 state as the indicator of the user's intent of using hibernation.
BTW, I believe the Windows team made the same decision.

Once all the vmbus and VSC patches are accepted, I'll submit a patch to
forbid hibernation if the virtual ACPI S4 state is absent, e.g.
hv_is_hibernation_supported() is false.

> > By default, Hyper-V does not enable the virtual ACPI S4 state for a VM;
> > on recent Hyper-V hosts, the administrator is able to enable the virtual
> > ACPI S4 state for a VM, so we hope to use the presence of the virtual ACPI
> 
> "we hope" sounds very indefinite. :-(    Does ACPI S4 have to be enabled for
> hibernation to be initiated?  Goes back to my first question ....

Technically, we don't have to enable ACPI S4, but in practice, as I mentioned,
I'll submit a patch to forbid hibernation if ACPI S4 is absent.

> 
> > S4 state as a hint for hv_balloon to disable the aforementioned
> > capabilities.
> >
> > The new API will be used by hv_balloon.

I'll add my above explanation into the changelog in v4.

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 08/12] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
  2019-08-23 19:56   ` Michael Kelley
@ 2019-08-31  0:23     ` Dexuan Cui
  0 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-31  0:23 UTC (permalink / raw)
  To: Michael Kelley, linux-hyperv, gregkh, Stephen Hemminger,
	Sasha Levin, sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Friday, August 23, 2019 12:57 PM
> 
> From: Dexuan Cui <decui@microsoft.com>  Sent: August 19, 2019 6:52 PM
> >
> > diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> > @@ -337,6 +337,33 @@ struct vmbus_channel *relid2channel(u32 relid)
> >  }
> >
> >  /*
> > + * find_primary_channel_by_offer - Get the channel object given the new
> offer.
> > + * This is only used in the resume path of hibernation.
> > + */
> > +struct vmbus_channel *
> > +find_primary_channel_by_offer(const struct vmbus_channel_offer_channel
> *offer)
> > +{
> > +	struct vmbus_channel *channel;
> > +	const guid_t *inst1, *inst2;
> > +
> > +	WARN_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
> > +
> > +	/* Ignore sub-channel offers. */
> > +	if (offer->offer.sub_channel_index != 0)
> > +		return NULL;
> > +
> > +	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> > +		inst1 = &channel->offermsg.offer.if_instance;
> > +		inst2 = &offer->offer.if_instance;
> > +
> > +		if (guid_equal(inst1, inst2))
> > +			return channel;
> > +	}
> > +
> > +	return NULL;
> > +}
> 
> Any particular reason this new function is in connection.c instead of
> putting it in channel_mgmt.c where it is called?

There is a similar function relid2channel(), which is in connection.c.

Since the new function is only used in channel_mgmt.c. I'll move it to
channel_mgmt.c in v4.

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 10/12] Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend
  2019-08-23 20:02   ` Michael Kelley
@ 2019-08-31  2:00     ` Dexuan Cui
  0 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-31  2:00 UTC (permalink / raw)
  To: Michael Kelley, linux-hyperv, gregkh, Stephen Hemminger,
	Sasha Levin, sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Friday, August 23, 2019 1:02 PM
> 
> From: Dexuan Cui Sent: Monday, August 19, 2019 6:52 PM
> >
> > Fake RESCIND_CHANNEL messages to clean up hv_sock channels by force for
> > hibernation. There is no better method to clean up the channels since
> > some of the channels may still be referenced by the userspace apps when
> > hiberantin is triggered: in this case, the "rescind" fields of the
> 
> s/hiberantin/hibernation/

Thanks! Will fix this in v4.

> > @@ -2091,6 +2127,25 @@ static int vmbus_acpi_add(struct acpi_device
> *device)
> >
> >  static int vmbus_bus_suspend(struct device *dev)
> >  {
> > +	struct vmbus_channel *channel;
> > +
> > +	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {
> > +		/*
> > +		 * We wait here until any channel offer is currently
> > +		 * being processed.
> > +		 */
> 
> The wording of the comment is a bit off.  Maybe
> 
> 		/*
> 		 * We wait here until the completion of any channel
> 		 * offers that are currently in progress.
> 		 */

Will use the better version this in v4. 

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 11/12] Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels
  2019-08-23 20:16   ` Michael Kelley
@ 2019-08-31  2:54     ` Dexuan Cui
  0 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-31  2:54 UTC (permalink / raw)
  To: Michael Kelley, linux-hyperv, gregkh, Stephen Hemminger,
	Sasha Levin, sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Friday, August 23, 2019 1:17 PM
> 
> From: Dexuan Cui Sent: Monday, August 19, 2019 6:52 PM
> >
> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > @@ -499,6 +499,8 @@ static void vmbus_add_channel_work(struct
> work_struct *work)
> >  	return;
> >
> >  err_deq_chan:
> > +	WARN_ON_ONCE(1);
> > +
> 
> Why this change?  I was thinking maybe it's a debug statement that got
> left in.

This is indeed a debug statement. :-) 
I was not so sure if the error handling is absolutely correct there.
I think I can remove this WARN_ON_ONCE() since almost all of the possible
error paths have a pr_err(). We can make an extra patch to fix any bug in
the existing error handling code. BTW, it should be pretty unlikely to reach
the "err_deq_chan label" in practice.

> > @@ -1021,6 +1044,11 @@ static void vmbus_onoffer_rescind(struct
> > vmbus_channel_message_header *hdr)
> >  		}
> >  		mutex_unlock(&vmbus_connection.channel_mutex);
> >  	}
> > +
> > +	/* The "channel" may have been freed. Do not access it any longer. */
> > +
> > +	if (clean_up_chan_for_suspend)
> > +		check_ready_for_suspend_event();
> >  }
> 
> Having to add the above lines twice is a bit clumsy, but the problem is
> the overall structure of the vmbus_onoffer_rescind.  The early return in
> the case of a rescind_callback function is a bit weird, but I guess it makes
> sense since from what I can tell, only uio and hv_sock have rescind callback
> functions. Some minor restructuring might be warranted, but I don't feel
> strongly about it.

I agree. We should restructure vmbus_onoffer_rescind(), but I think we can
do it in a separate patch. Here in this patch I'm focused on the minimal
required change for hibernation.

> > +	/*
> > +	 * Wait until all the sub-channels and hv_sock channels have been
> > +	 * cleaned up. Sub-channels should be destroyed upon suspend,
> otherwise
> > +	 * they would conflict with the new sub-channels that will be created
> > +	 * in the resume path. hv_sock channels should also be destroyed, but
> > +	 * a hv_sock channel of an established hv_sock connection can not be
> > +	 * really destroyed since it may still be referenced by the userspace
> > +	 * application, so we just force the hv_sock channel to be rescinded
> > +	 * by vmbus_force_channel_rescinded(), and the userspace application
> > +	 * will thoroughly destroy the channel after hibernation.
> > +	 */
> > +	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
> 
> At first glance, the above line seemed useless to me.  You could just do the
> wait_for_completion() unconditionally.  But is the intent to handle the case
> where the VM never had any sub-channels or hv_sock channels, and so
> nr_chan_close_on_suspend never went above 0?  

Yes.

> If so, a comment might be helpful.
> > +wait_for_completion(&vmbus_connection.ready_for_suspend_event);

Ok. I'll add a commnt for this in v4.

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 12/12] Drivers: hv: vmbus: Resume after fixing up old primary channels
  2019-08-23 20:25   ` Michael Kelley
@ 2019-08-31  4:37     ` Dexuan Cui
  2019-08-31  5:08       ` Dexuan Cui
  0 siblings, 1 reply; 25+ messages in thread
From: Dexuan Cui @ 2019-08-31  4:37 UTC (permalink / raw)
  To: Michael Kelley, linux-hyperv, gregkh, Stephen Hemminger,
	Sasha Levin, sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Friday, August 23, 2019 1:25 PM
> 
> From: Dexuan Cui Sent: Monday, August 19, 2019 6:52  PM
> > @@ -890,6 +937,11 @@ static void vmbus_onoffer(struct
> > vmbus_channel_message_header *hdr)
> >  				     false);
> >  		print_hex_dump_debug("New vmbus offer: ",
> DUMP_PREFIX_OFFSET,
> >  				     16, 4, offer, offer_sz, false);
> > +
> > +		vmbus_setup_channel_state(oldchannel, offer);
> > +
> > +		check_ready_for_resume_event();
> 
> This is the error case where the new offer didn't match some aspect of
> the old offer. 

Actually, this is not an error: besides the RELID, the host can also change
the offer->connection_id when it re-offers a device to the guest: so far,
I only see this host behavior for the VF vmbus device, and in this case, the
first vmbus_setup_channel_state() in vmbus_onoffer() is used to do the
fix-up:
    channel->sig_event = offer->connection_id;
and later channel->sig_event is used in vmbus_set_event().

Despite the host behavior, it looks the VF vmbus device still works fine,
so (IMO) this is not an error. I'll write a separate email to report this to
Hyper-V team.

> Is the intent to proceed and use the new offer? 
Yes, since this is not an error.

I'll add a comment before the "Mismatched offer from the host" for this.

BTW, the 3 debug lines here output nothing, unless we enable the output
by 
  cd /sys/kernel/debug/dynamic_debug/
  echo 'file drivers/hv/channel_mgmt.c +p' > control
.

> I can see that check_ready_for_resume_event() has to be called in
> the error case, otherwise the resume operation will hang forever, but 
> I'm not sure about setting up the channel state and then proceeding as
> if all is good.
> > +
> > 		return;

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v3 12/12] Drivers: hv: vmbus: Resume after fixing up old primary channels
  2019-08-31  4:37     ` Dexuan Cui
@ 2019-08-31  5:08       ` Dexuan Cui
  0 siblings, 0 replies; 25+ messages in thread
From: Dexuan Cui @ 2019-08-31  5:08 UTC (permalink / raw)
  To: Michael Kelley, linux-hyperv, gregkh, Stephen Hemminger,
	Sasha Levin, sashal, Haiyang Zhang, KY Srinivasan, tglx
  Cc: linux-kernel

> From: Dexuan Cui
> Sent: Friday, August 30, 2019 9:37 PM
> > Is the intent to proceed and use the new offer?
> Yes, since this is not an error.
> 
> I'll add a comment before the "Mismatched offer from the host" for this.

Hi Michael, 
I'm going to make the below change in v4.

--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -956,7 +956,13 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
                        return;
                }

-               pr_debug("Mismatched offer from the host (relid=%d)\n",
+               /*
+                * This is not an error, since the host can also change the
+                * other field(s) of the offer, e.g. on WS RS5 (Build 17763),
+                * the offer->connection_id of the Mellanox VF vmbus device
+                * can change when the host reoffers the device upon resume.
+                */
+               pr_debug("vmbus offer changed: relid=%d\n",
                         offer->child_relid);

                print_hex_dump_debug("Old vmbus offer: ", DUMP_PREFIX_OFFSET,
@@ -965,6 +971,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
                print_hex_dump_debug("New vmbus offer: ", DUMP_PREFIX_OFFSET,
                                     16, 4, offer, offer_sz, false);

+               /* Fix up the old channel. */
                vmbus_setup_channel_state(oldchannel, offer);

                check_ready_for_resume_event();
Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-08-31  5:08 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20  1:51 [PATCH v3 00/12] Enhance the hv_vmbus driver to support hibernation Dexuan Cui
2019-08-20  1:51 ` [PATCH v3 01/12] x86/hyper-v: Suspend/resume the hypercall page for hibernation Dexuan Cui
2019-08-20  1:51 ` [PATCH v3 02/12] x86/hyper-v: Implement hv_is_hibernation_supported() Dexuan Cui
2019-08-23 19:50   ` Michael Kelley
2019-08-30 23:37     ` Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 03/12] clocksource/drivers: Suspend/resume Hyper-V clocksource for hibernation Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 04/12] Drivers: hv: vmbus: Break out synic enable and disable operations Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 05/12] Drivers: hv: vmbus: Suspend/resume the synic for hibernation Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 06/12] Drivers: hv: vmbus: Add a helper function is_sub_channel() Dexuan Cui
2019-08-23 19:51   ` Michael Kelley
2019-08-20  1:52 ` [PATCH v3 07/12] Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 08/12] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation Dexuan Cui
2019-08-23 19:56   ` Michael Kelley
2019-08-31  0:23     ` Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 09/12] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 10/12] Drivers: hv: vmbus: Clean up hv_sock channels by force upon suspend Dexuan Cui
2019-08-23 20:02   ` Michael Kelley
2019-08-31  2:00     ` Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 11/12] Drivers: hv: vmbus: Suspend after cleaning up hv_sock and sub channels Dexuan Cui
2019-08-23 20:16   ` Michael Kelley
2019-08-31  2:54     ` Dexuan Cui
2019-08-20  1:52 ` [PATCH v3 12/12] Drivers: hv: vmbus: Resume after fixing up old primary channels Dexuan Cui
2019-08-23 20:25   ` Michael Kelley
2019-08-31  4:37     ` Dexuan Cui
2019-08-31  5:08       ` Dexuan Cui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).