linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] Fixes hibernation bugs on x86-32 system
@ 2018-08-27  9:41 Gu Zhimin
  2018-08-27  9:42 ` [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation Gu Zhimin
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Gu Zhimin @ 2018-08-27  9:41 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Machek,
	Yu Chen, x86, linux-pm, linux-kernel, Zhimin Gu

From: Zhimin Gu <kookoo.gu@intel.com>

There are currently three main bugs on x86-32 systems when doing
hibernation:
1. The page copy code does not run in a safe page, which might
   cause a hang during resume.
2. There is no text mapping for the final jump address
   of the original kernel, which might make the system jump to an
   illegal address and hang during resume.
3. The restore kernel switches to its own page table (swapper_pg_dir)
   rather than the original kernel's page table after all the pages
   have been copied back, which might cause invalid virtual-to-physical
   mappings during resume.

To solve these problems:

1. Copy core_restore_code to a safe page, so that the instructions are
   not overwritten while the image kernel pages are being copied.
2. Set up a temporary text mapping for the image kernel's jump address,
   so that after all the pages have been copied back, the system can
   jump to that address.
3. Switch to the original kernel's page table during resume.

Furthermore, the MD5 hash check for the e820 map is also backported
from the 64-bit implementation.
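
For reference, the resulting 32-bit resume path (condensed from patch
3/3 below, using only names introduced by this series) ends up looking
like:

asmlinkage int swsusp_arch_resume(void)
{
	int error;

	/* build temporary page tables, including a text mapping for the jump address */
	error = set_up_temporary_mappings();
	if (error)
		return error;

	/* copy core_restore_code to a safe page; restore_image() jumps to it */
	error = relocate_restore_code();
	if (error)
		return error;

	restore_image();
	return 0;
}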

Zhimin Gu (3):
  x86, hibernate: Fix nosave_regions setup for hibernation
  x86, hibernate: Extract the common code of 64/32 bit system
  x86, hibernate: Backport several fixes from 64bits to 32bits
    hibernation

 arch/x86/Kconfig                  |   2 +-
 arch/x86/include/asm/suspend_32.h |   4 +
 arch/x86/kernel/setup.c           |   2 +-
 arch/x86/power/hibernate.c        | 253 ++++++++++++++++++++++++++++++++++++++
 arch/x86/power/hibernate_32.c     |  76 ++++++++----
 arch/x86/power/hibernate_64.c     | 241 +-----------------------------------
 arch/x86/power/hibernate_asm_32.S |  49 ++++++--
 arch/x86/power/hibernate_asm_64.S |   2 +-
 8 files changed, 353 insertions(+), 276 deletions(-)
 create mode 100644 arch/x86/power/hibernate.c

-- 
2.7.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation
  2018-08-27  9:41 [PATCH 0/3] Fixes hibernation bugs on x86-32 system Gu Zhimin
@ 2018-08-27  9:42 ` Gu Zhimin
  2018-08-27  9:48   ` Pavel Machek
  2018-08-30 12:45   ` Thomas Gleixner
  2018-08-27  9:42 ` [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system Gu Zhimin
  2018-08-27  9:42 ` [PATCH 3/3] x86, hibernate: Backport several fixes from 64bits to 32bits hibernation Gu Zhimin
  2 siblings, 2 replies; 10+ messages in thread
From: Gu Zhimin @ 2018-08-27  9:42 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Machek,
	Yu Chen, x86, linux-pm, linux-kernel, Zhimin Gu

From: Zhimin Gu <kookoo.gu@intel.com>

On 32-bit systems, nosave regions (non-RAM areas) located between
max_low_pfn and max_pfn are currently not excluded from the hibernation
snapshot, which may result in a machine check exception when these
unsafe regions are accessed during hibernation:

[  612.800453] Disabling lock debugging due to kernel taint
[  612.805786] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: fe00000000801136
[  612.814344] mce: [Hardware Error]: RIP !INEXACT! 60:<00000000d90be566> {swsusp_save+0x436/0x560}
[  612.823167] mce: [Hardware Error]: TSC 1f5939fe276 ADDR dd000000 MISC 30e0000086
[  612.830677] mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1529487426 SOCKET 0 APIC 0 microcode 24
[  612.839581] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[  612.846394] mce: [Hardware Error]: Machine check: Processor context corrupt
[  612.853380] Kernel panic - not syncing: Fatal machine check
[  612.858978] Kernel Offset: 0x18000000 from 0xc1000000 (relocation range: 0xc0000000-0xf7ffdfff)

This is because on 32-bit systems, pages above max_low_pfn are regarded
as high memory, and accessing these unsafe pages may trigger the MCE.
On the problematic 32-bit system, there is reserved memory above low
memory, which triggered the MCE:

e820 memory mapping:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000d160cfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000d160d000-0x00000000d1613fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000d1614000-0x00000000d1a44fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000d1a45000-0x00000000d1ecffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000d1ed0000-0x00000000d7eeafff] usable
[    0.000000] BIOS-e820: [mem 0x00000000d7eeb000-0x00000000d7ffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000d8000000-0x00000000d875ffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000d8760000-0x00000000d87fffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000d8800000-0x00000000d8fadfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000d8fae000-0x00000000d8ffffff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000d9000000-0x00000000da71bfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000da71c000-0x00000000da7fffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000da800000-0x00000000dbb8bfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000dbb8c000-0x00000000dbffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000dd000000-0x00000000df1fffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000041edfffff] usable

Fix this problem by changing pfn limit from max_low_pfn to max_pfn.
This issue should also exist on 64bits systems, if there are reserved
regions above 4GB.

Acked-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Zhimin Gu <kookoo.gu@intel.com>
---
 arch/x86/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b4866ba..90ecc10 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1251,7 +1251,7 @@ void __init setup_arch(char **cmdline_p)
 	x86_init.hyper.guest_late_init();
 
 	e820__reserve_resources();
-	e820__register_nosave_regions(max_low_pfn);
+	e820__register_nosave_regions(max_pfn);
 
 	x86_init.resources.reserve_resources();
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system
  2018-08-27  9:41 [PATCH 0/3] Fixes hibernation bugs on x86-32 system Gu Zhimin
  2018-08-27  9:42 ` [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation Gu Zhimin
@ 2018-08-27  9:42 ` Gu Zhimin
  2018-08-27  9:48   ` Pavel Machek
  2018-08-30 12:59   ` Thomas Gleixner
  2018-08-27  9:42 ` [PATCH 3/3] x86, hibernate: Backport several fixes from 64bits to 32bits hibernation Gu Zhimin
  2 siblings, 2 replies; 10+ messages in thread
From: Gu Zhimin @ 2018-08-27  9:42 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Machek,
	Yu Chen, x86, linux-pm, linux-kernel, Zhimin Gu

From: Zhimin Gu <kookoo.gu@intel.com>

Reduce the hibernation code duplication between x86-32 and x86-64
by extracting the common code into hibernate.c.

No functional change.

Suggested-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Zhimin Gu <kookoo.gu@intel.com>
---
 arch/x86/power/hibernate.c        | 255 ++++++++++++++++++++++++++++++++++++++
 arch/x86/power/hibernate_32.c     |  22 +---
 arch/x86/power/hibernate_64.c     | 241 +----------------------------------
 arch/x86/power/hibernate_asm_64.S |   2 +-
 4 files changed, 259 insertions(+), 261 deletions(-)
 create mode 100644 arch/x86/power/hibernate.c

diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
new file mode 100644
index 0000000..6f91f7b
--- /dev/null
+++ b/arch/x86/power/hibernate.c
@@ -0,0 +1,255 @@
+/*
+ * Hibernation support for x86
+ *
+ * Distribute under GPLv2
+ *
+ * Copyright (c) 2007 Rafael J. Wysocki <rjw@sisk.pl>
+ * Copyright (c) 2002 Pavel Machek <pavel@ucw.cz>
+ * Copyright (c) 2001 Patrick Mochel <mochel@osdl.org>
+ */
+
+#include <linux/gfp.h>
+#include <linux/smp.h>
+#include <linux/suspend.h>
+#include <linux/scatterlist.h>
+#include <linux/kdebug.h>
+#include <linux/bootmem.h>
+
+#include <crypto/hash.h>
+
+#include <asm/e820/api.h>
+#include <asm/init.h>
+#include <asm/proto.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+#include <asm/mtrr.h>
+#include <asm/sections.h>
+#include <asm/suspend.h>
+#include <asm/tlbflush.h>
+#include <asm/mmzone.h>
+
+/* Defined in hibernate_asm_64.S or hibernate_asm_32.S */
+extern asmlinkage __visible int restore_image(void);
+
+/*
+ * Address to jump to in the last phase of restore in order to get to the image
+ * kernel's text (this value is passed in the image header).
+ */
+unsigned long restore_jump_address __visible;
+unsigned long jump_address_phys;
+
+/*
+ * Value of the cr3 register from before the hibernation (this value is passed
+ * in the image header).
+ */
+unsigned long restore_cr3 __visible;
+unsigned long temp_pgt __visible;
+unsigned long relocated_restore_code __visible;
+
+#ifdef CONFIG_X86_32
+	#define RESTORE_MAGIC	0x12345678UL
+#else
+	#define RESTORE_MAGIC	0x23456789ABCDEF01UL
+#endif
+
+/*
+ *	pfn_is_nosave - check if given pfn is in the 'nosave' section
+ */
+
+int pfn_is_nosave(unsigned long pfn)
+{
+	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
+	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
+	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
+}
+
+#ifdef CONFIG_X86_64
+static int relocate_restore_code(void)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	relocated_restore_code = get_safe_page(GFP_ATOMIC);
+	if (!relocated_restore_code)
+		return -ENOMEM;
+
+	memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE);
+
+	/* Make the page containing the relocated code executable */
+	pgd = (pgd_t *)__va(read_cr3_pa()) +
+		pgd_index(relocated_restore_code);
+	p4d = p4d_offset(pgd, relocated_restore_code);
+	if (p4d_large(*p4d)) {
+		set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
+		goto out;
+	}
+	pud = pud_offset(p4d, relocated_restore_code);
+	if (pud_large(*pud)) {
+		set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
+		goto out;
+	}
+	pmd = pmd_offset(pud, relocated_restore_code);
+	if (pmd_large(*pmd)) {
+		set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
+		goto out;
+	}
+	pte = pte_offset_kernel(pmd, relocated_restore_code);
+	set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
+out:
+	__flush_tlb_all();
+	return 0;
+}
+
+#define MD5_DIGEST_SIZE 16
+
+struct restore_data_record {
+	unsigned long jump_address;
+	unsigned long jump_address_phys;
+	unsigned long cr3;
+	unsigned long magic;
+	u8 e820_digest[MD5_DIGEST_SIZE];
+};
+
+#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
+/**
+ * get_e820_md5 - calculate md5 according to given e820 table
+ *
+ * @table: the e820 table to be calculated
+ * @buf: the md5 result to be stored to
+ */
+static int get_e820_md5(struct e820_table *table, void *buf)
+{
+	struct crypto_shash *tfm;
+	struct shash_desc *desc;
+	int size;
+	int ret = 0;
+
+	tfm = crypto_alloc_shash("md5", 0, 0);
+	if (IS_ERR(tfm))
+		return -ENOMEM;
+
+	desc = kmalloc(sizeof(struct shash_desc) + crypto_shash_descsize(tfm),
+		       GFP_KERNEL);
+	if (!desc) {
+		ret = -ENOMEM;
+		goto free_tfm;
+	}
+
+	desc->tfm = tfm;
+	desc->flags = 0;
+
+	size = offsetof(struct e820_table, entries) +
+		sizeof(struct e820_entry) * table->nr_entries;
+
+	if (crypto_shash_digest(desc, (u8 *)table, size, buf))
+		ret = -EINVAL;
+
+	kzfree(desc);
+
+free_tfm:
+	crypto_free_shash(tfm);
+	return ret;
+}
+
+static void hibernation_e820_save(void *buf)
+{
+	get_e820_md5(e820_table_firmware, buf);
+}
+
+static bool hibernation_e820_mismatch(void *buf)
+{
+	int ret;
+	u8 result[MD5_DIGEST_SIZE];
+
+	memset(result, 0, MD5_DIGEST_SIZE);
+	/* If there is no digest in suspend kernel, let it go. */
+	if (!memcmp(result, buf, MD5_DIGEST_SIZE))
+		return false;
+
+	ret = get_e820_md5(e820_table_firmware, result);
+	if (ret)
+		return true;
+
+	return memcmp(result, buf, MD5_DIGEST_SIZE) ? true : false;
+}
+#else
+static void hibernation_e820_save(void *buf)
+{
+}
+
+static bool hibernation_e820_mismatch(void *buf)
+{
+	/* If md5 is not builtin for restore kernel, let it go. */
+	return false;
+}
+#endif
+
+/**
+ *	arch_hibernation_header_save - populate the architecture specific part
+ *		of a hibernation image header
+ *	@addr: address to save the data at
+ */
+int arch_hibernation_header_save(void *addr, unsigned int max_size)
+{
+	struct restore_data_record *rdr = addr;
+
+	if (max_size < sizeof(struct restore_data_record))
+		return -EOVERFLOW;
+	rdr->jump_address = (unsigned long)restore_registers;
+	rdr->jump_address_phys = __pa_symbol(restore_registers);
+
+	/*
+	 * The restore code fixes up CR3 and CR4 in the following sequence:
+	 *
+	 * [in hibernation asm]
+	 * 1. CR3 <= temporary page tables
+	 * 2. CR4 <= mmu_cr4_features (from the kernel that restores us)
+	 * 3. CR3 <= rdr->cr3
+	 * 4. CR4 <= mmu_cr4_features (from us, i.e. the image kernel)
+	 * [in restore_processor_state()]
+	 * 5. CR4 <= saved CR4
+	 * 6. CR3 <= saved CR3
+	 *
+	 * Our mmu_cr4_features has CR4.PCIDE=0, and toggling
+	 * CR4.PCIDE while CR3's PCID bits are nonzero is illegal, so
+	 * rdr->cr3 needs to point to valid page tables but must not
+	 * have any of the PCID bits set.
+	 */
+	rdr->cr3 = restore_cr3 & ~CR3_PCID_MASK;
+
+	rdr->magic = RESTORE_MAGIC;
+
+	hibernation_e820_save(rdr->e820_digest);
+
+	return 0;
+}
+
+/**
+ *	arch_hibernation_header_restore - read the architecture specific data
+ *		from the hibernation image header
+ *	@addr: address to read the data from
+ */
+int arch_hibernation_header_restore(void *addr)
+{
+	struct restore_data_record *rdr = addr;
+
+	restore_jump_address = rdr->jump_address;
+	jump_address_phys = rdr->jump_address_phys;
+	restore_cr3 = rdr->cr3;
+
+	if (rdr->magic != RESTORE_MAGIC) {
+		pr_crit("Unrecognized hibernate image header format!\n");
+		return -EINVAL;
+	}
+
+	if (hibernation_e820_mismatch(rdr->e820_digest)) {
+		pr_crit("Hibernate inconsistent memory map detected!\n");
+		return -ENODEV;
+	}
+
+	return 0;
+}
+#endif
diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c
index afc4ed7..7922e11 100644
--- a/arch/x86/power/hibernate_32.c
+++ b/arch/x86/power/hibernate_32.c
@@ -6,17 +6,7 @@
  * Copyright (c) 2006 Rafael J. Wysocki <rjw@sisk.pl>
  */
 
-#include <linux/gfp.h>
-#include <linux/suspend.h>
-#include <linux/bootmem.h>
-
-#include <asm/page.h>
-#include <asm/pgtable.h>
-#include <asm/mmzone.h>
-#include <asm/sections.h>
-
-/* Defined in hibernate_asm_32.S */
-extern int restore_image(void);
+#include "hibernate.c"
 
 /* Pointer to the temporary resume page tables */
 pgd_t *resume_pg_dir;
@@ -163,13 +153,3 @@ asmlinkage int swsusp_arch_resume(void)
 	return 0;
 }
 
-/*
- *	pfn_is_nosave - check if given pfn is in the 'nosave' section
- */
-
-int pfn_is_nosave(unsigned long pfn)
-{
-	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
-	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
-	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
-}
diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index f8e3b66..77037c3 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -8,43 +8,7 @@
  * Copyright (c) 2001 Patrick Mochel <mochel@osdl.org>
  */
 
-#include <linux/gfp.h>
-#include <linux/smp.h>
-#include <linux/suspend.h>
-#include <linux/scatterlist.h>
-#include <linux/kdebug.h>
-
-#include <crypto/hash.h>
-
-#include <asm/e820/api.h>
-#include <asm/init.h>
-#include <asm/proto.h>
-#include <asm/page.h>
-#include <asm/pgtable.h>
-#include <asm/mtrr.h>
-#include <asm/sections.h>
-#include <asm/suspend.h>
-#include <asm/tlbflush.h>
-
-/* Defined in hibernate_asm_64.S */
-extern asmlinkage __visible int restore_image(void);
-
-/*
- * Address to jump to in the last phase of restore in order to get to the image
- * kernel's text (this value is passed in the image header).
- */
-unsigned long restore_jump_address __visible;
-unsigned long jump_address_phys;
-
-/*
- * Value of the cr3 register from before the hibernation (this value is passed
- * in the image header).
- */
-unsigned long restore_cr3 __visible;
-
-unsigned long temp_level4_pgt __visible;
-
-unsigned long relocated_restore_code __visible;
+#include "hibernate.c"
 
 static int set_up_temporary_text_mapping(pgd_t *pgd)
 {
@@ -141,46 +105,7 @@ static int set_up_temporary_mappings(void)
 			return result;
 	}
 
-	temp_level4_pgt = __pa(pgd);
-	return 0;
-}
-
-static int relocate_restore_code(void)
-{
-	pgd_t *pgd;
-	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-
-	relocated_restore_code = get_safe_page(GFP_ATOMIC);
-	if (!relocated_restore_code)
-		return -ENOMEM;
-
-	memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE);
-
-	/* Make the page containing the relocated code executable */
-	pgd = (pgd_t *)__va(read_cr3_pa()) +
-		pgd_index(relocated_restore_code);
-	p4d = p4d_offset(pgd, relocated_restore_code);
-	if (p4d_large(*p4d)) {
-		set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
-		goto out;
-	}
-	pud = pud_offset(p4d, relocated_restore_code);
-	if (pud_large(*pud)) {
-		set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
-		goto out;
-	}
-	pmd = pmd_offset(pud, relocated_restore_code);
-	if (pmd_large(*pmd)) {
-		set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
-		goto out;
-	}
-	pte = pte_offset_kernel(pmd, relocated_restore_code);
-	set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
-out:
-	__flush_tlb_all();
+	temp_pgt = __pa(pgd);
 	return 0;
 }
 
@@ -201,165 +126,3 @@ asmlinkage int swsusp_arch_resume(void)
 	return 0;
 }
 
-/*
- *	pfn_is_nosave - check if given pfn is in the 'nosave' section
- */
-
-int pfn_is_nosave(unsigned long pfn)
-{
-	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
-	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
-	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
-}
-
-#define MD5_DIGEST_SIZE 16
-
-struct restore_data_record {
-	unsigned long jump_address;
-	unsigned long jump_address_phys;
-	unsigned long cr3;
-	unsigned long magic;
-	u8 e820_digest[MD5_DIGEST_SIZE];
-};
-
-#define RESTORE_MAGIC	0x23456789ABCDEF01UL
-
-#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
-/**
- * get_e820_md5 - calculate md5 according to given e820 table
- *
- * @table: the e820 table to be calculated
- * @buf: the md5 result to be stored to
- */
-static int get_e820_md5(struct e820_table *table, void *buf)
-{
-	struct crypto_shash *tfm;
-	struct shash_desc *desc;
-	int size;
-	int ret = 0;
-
-	tfm = crypto_alloc_shash("md5", 0, 0);
-	if (IS_ERR(tfm))
-		return -ENOMEM;
-
-	desc = kmalloc(sizeof(struct shash_desc) + crypto_shash_descsize(tfm),
-		       GFP_KERNEL);
-	if (!desc) {
-		ret = -ENOMEM;
-		goto free_tfm;
-	}
-
-	desc->tfm = tfm;
-	desc->flags = 0;
-
-	size = offsetof(struct e820_table, entries) +
-		sizeof(struct e820_entry) * table->nr_entries;
-
-	if (crypto_shash_digest(desc, (u8 *)table, size, buf))
-		ret = -EINVAL;
-
-	kzfree(desc);
-
-free_tfm:
-	crypto_free_shash(tfm);
-	return ret;
-}
-
-static void hibernation_e820_save(void *buf)
-{
-	get_e820_md5(e820_table_firmware, buf);
-}
-
-static bool hibernation_e820_mismatch(void *buf)
-{
-	int ret;
-	u8 result[MD5_DIGEST_SIZE];
-
-	memset(result, 0, MD5_DIGEST_SIZE);
-	/* If there is no digest in suspend kernel, let it go. */
-	if (!memcmp(result, buf, MD5_DIGEST_SIZE))
-		return false;
-
-	ret = get_e820_md5(e820_table_firmware, result);
-	if (ret)
-		return true;
-
-	return memcmp(result, buf, MD5_DIGEST_SIZE) ? true : false;
-}
-#else
-static void hibernation_e820_save(void *buf)
-{
-}
-
-static bool hibernation_e820_mismatch(void *buf)
-{
-	/* If md5 is not builtin for restore kernel, let it go. */
-	return false;
-}
-#endif
-
-/**
- *	arch_hibernation_header_save - populate the architecture specific part
- *		of a hibernation image header
- *	@addr: address to save the data at
- */
-int arch_hibernation_header_save(void *addr, unsigned int max_size)
-{
-	struct restore_data_record *rdr = addr;
-
-	if (max_size < sizeof(struct restore_data_record))
-		return -EOVERFLOW;
-	rdr->jump_address = (unsigned long)restore_registers;
-	rdr->jump_address_phys = __pa_symbol(restore_registers);
-
-	/*
-	 * The restore code fixes up CR3 and CR4 in the following sequence:
-	 *
-	 * [in hibernation asm]
-	 * 1. CR3 <= temporary page tables
-	 * 2. CR4 <= mmu_cr4_features (from the kernel that restores us)
-	 * 3. CR3 <= rdr->cr3
-	 * 4. CR4 <= mmu_cr4_features (from us, i.e. the image kernel)
-	 * [in restore_processor_state()]
-	 * 5. CR4 <= saved CR4
-	 * 6. CR3 <= saved CR3
-	 *
-	 * Our mmu_cr4_features has CR4.PCIDE=0, and toggling
-	 * CR4.PCIDE while CR3's PCID bits are nonzero is illegal, so
-	 * rdr->cr3 needs to point to valid page tables but must not
-	 * have any of the PCID bits set.
-	 */
-	rdr->cr3 = restore_cr3 & ~CR3_PCID_MASK;
-
-	rdr->magic = RESTORE_MAGIC;
-
-	hibernation_e820_save(rdr->e820_digest);
-
-	return 0;
-}
-
-/**
- *	arch_hibernation_header_restore - read the architecture specific data
- *		from the hibernation image header
- *	@addr: address to read the data from
- */
-int arch_hibernation_header_restore(void *addr)
-{
-	struct restore_data_record *rdr = addr;
-
-	restore_jump_address = rdr->jump_address;
-	jump_address_phys = rdr->jump_address_phys;
-	restore_cr3 = rdr->cr3;
-
-	if (rdr->magic != RESTORE_MAGIC) {
-		pr_crit("Unrecognized hibernate image header format!\n");
-		return -EINVAL;
-	}
-
-	if (hibernation_e820_mismatch(rdr->e820_digest)) {
-		pr_crit("Hibernate inconsistent memory map detected!\n");
-		return -ENODEV;
-	}
-
-	return 0;
-}
diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index fd369a6..3008baa 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -59,7 +59,7 @@ ENTRY(restore_image)
 	movq	restore_cr3(%rip), %r9
 
 	/* prepare to switch to temporary page tables */
-	movq	temp_level4_pgt(%rip), %rax
+	movq	temp_pgt(%rip), %rax
 	movq	mmu_cr4_features(%rip), %rbx
 
 	/* prepare to copy image data to their original locations */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] x86, hibernate: Backport several fixes from 64bits to 32bits hibernation
  2018-08-27  9:41 [PATCH 0/3] Fixes hibernation bugs on x86-32 system Gu Zhimin
  2018-08-27  9:42 ` [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation Gu Zhimin
  2018-08-27  9:42 ` [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system Gu Zhimin
@ 2018-08-27  9:42 ` Gu Zhimin
  2 siblings, 0 replies; 10+ messages in thread
From: Gu Zhimin @ 2018-08-27  9:42 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Pavel Machek,
	Yu Chen, x86, linux-pm, linux-kernel, Zhimin Gu

From: Zhimin Gu <kookoo.gu@intel.com>

Currently there are three main bugs on 32-bit systems when doing
hibernation:
1. The page copy code does not run in a safe page, which might
   cause a hang during resume.
2. There is no text mapping for the final jump address
   of the original kernel, which might make the system jump to an
   illegal address and hang during resume.
3. The restore kernel switches to its own page table (swapper_pg_dir)
   rather than the original kernel's page table after all the pages
   have been copied back, which might cause invalid virtual-to-physical
   mappings during resume.

To solve these problems:

1. Copy core_restore_code to a safe page, so that the instructions are
   not overwritten while the image kernel pages are being copied.
2. Set up a temporary text mapping for the image kernel's jump address,
   so that after all the pages have been copied back, the system can
   jump to that address.
3. Switch to the original kernel's page table during resume.

Furthermore, the MD5 hash check for the e820 map is also backported
from the 64-bit implementation.

Acked-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Zhimin Gu <kookoo.gu@intel.com>
---
 arch/x86/Kconfig                  |  2 +-
 arch/x86/include/asm/suspend_32.h |  4 +++
 arch/x86/power/hibernate.c        |  2 --
 arch/x86/power/hibernate_32.c     | 54 ++++++++++++++++++++++++++++++++++++---
 arch/x86/power/hibernate_asm_32.S | 49 ++++++++++++++++++++++++++++-------
 5 files changed, 95 insertions(+), 16 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c5ff296..d1c3c9d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2422,7 +2422,7 @@ menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
 	def_bool y
-	depends on X86_64 && HIBERNATION
+	depends on HIBERNATION
 
 source "kernel/power/Kconfig"
 
diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 8be6afb..fdbd9d7 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -32,4 +32,8 @@ struct saved_context {
 	unsigned long return_address;
 } __attribute__((packed));
 
+/* routines for saving/restoring kernel state */
+extern char core_restore_code[];
+extern char restore_registers[];
+
 #endif /* _ASM_X86_SUSPEND_32_H */
diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
index 6f91f7b..d3ef08d 100644
--- a/arch/x86/power/hibernate.c
+++ b/arch/x86/power/hibernate.c
@@ -63,7 +63,6 @@ int pfn_is_nosave(unsigned long pfn)
 	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
 }
 
-#ifdef CONFIG_X86_64
 static int relocate_restore_code(void)
 {
 	pgd_t *pgd;
@@ -252,4 +251,3 @@ int arch_hibernation_header_restore(void *addr)
 
 	return 0;
 }
-#endif
diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c
index 7922e11..d0a41ed 100644
--- a/arch/x86/power/hibernate_32.c
+++ b/arch/x86/power/hibernate_32.c
@@ -8,9 +8,6 @@
 
 #include "hibernate.c"
 
-/* Pointer to the temporary resume page tables */
-pgd_t *resume_pg_dir;
-
 /* The following three functions are based on the analogous code in
  * arch/x86/mm/init_32.c
  */
@@ -135,20 +132,69 @@ static inline void resume_init_first_level_page_table(pgd_t *pg_dir)
 #endif
 }
 
-asmlinkage int swsusp_arch_resume(void)
+static int set_up_temporary_text_mapping(pgd_t *pgd_base)
+{
+	pgd_t *pgd;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pgd = pgd_base + pgd_index(restore_jump_address);
+
+	pmd = resume_one_md_table_init(pgd);
+	if (!pmd)
+		return -ENOMEM;
+
+	if (boot_cpu_has(X86_FEATURE_PSE)) {
+		set_pmd(pmd + pmd_index(restore_jump_address),
+		__pmd((jump_address_phys & PMD_MASK) | pgprot_val(PAGE_KERNEL_LARGE_EXEC)));
+	} else {
+		pte = resume_one_page_table_init(pmd);
+		if (!pte)
+			return -ENOMEM;
+		set_pte(pte + pte_index(restore_jump_address),
+		__pte((jump_address_phys & PAGE_MASK) | pgprot_val(PAGE_KERNEL_EXEC)));
+	}
+
+	return 0;
+}
+
+/* Set up the temporary kernel text and direct mapping. */
+static int set_up_temporary_mappings(void)
 {
 	int error;
+	pgd_t *resume_pg_dir;
 
 	resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
 	if (!resume_pg_dir)
 		return -ENOMEM;
 
 	resume_init_first_level_page_table(resume_pg_dir);
+	error = set_up_temporary_text_mapping(resume_pg_dir);
+	if (error)
+		return error;
+
 	error = resume_physical_mapping_init(resume_pg_dir);
 	if (error)
 		return error;
 
+	temp_pgt = __pa(resume_pg_dir);
+
+	return 0;
+}
+
+asmlinkage int swsusp_arch_resume(void)
+{
+	int error;
+
 	/* We have got enough memory and from now on we cannot recover */
+	error = set_up_temporary_mappings();
+	if (error)
+		return error;
+
+	error = relocate_restore_code();
+	if (error)
+		return error;
+
 	restore_image();
 	return 0;
 }
diff --git a/arch/x86/power/hibernate_asm_32.S b/arch/x86/power/hibernate_asm_32.S
index 6e56815..a53b4a4 100644
--- a/arch/x86/power/hibernate_asm_32.S
+++ b/arch/x86/power/hibernate_asm_32.S
@@ -24,21 +24,40 @@ ENTRY(swsusp_arch_suspend)
 	pushfl
 	popl saved_context_eflags
 
+	/* save cr3 */
+	movl	%cr3, %eax
+	movl	%eax, restore_cr3
+
 	call swsusp_save
 	ret
+ENDPROC(swsusp_arch_suspend)
 
 ENTRY(restore_image)
-	movl	mmu_cr4_features, %ecx
-	movl	resume_pg_dir, %eax
-	subl	$__PAGE_OFFSET, %eax
+	/* prepare to jump to the image kernel */
+	movl	restore_jump_address, %ebx
+	movl	restore_cr3, %ebp
+
+	movl	mmu_cr4_features, %edx
+
+	/* jump to relocated restore code */
+	movl	relocated_restore_code, %eax
+	jmpl	*%eax
+
+	/* code below has been relocated to a safe page */
+ENTRY(core_restore_code)
+	movl	temp_pgt, %eax
 	movl	%eax, %cr3
 
+	/* flush TLB */
+	movl	%edx, %ecx
 	jecxz	1f	# cr4 Pentium and higher, skip if zero
 	andl	$~(X86_CR4_PGE), %ecx
 	movl	%ecx, %cr4;  # turn off PGE
 	movl	%cr3, %eax;  # flush TLB
 	movl	%eax, %cr3
+	movl	%edx, %cr4;  # turn PGE back on
 1:
+	/* prepare to copy image data to their original locations */
 	movl	restore_pblist, %edx
 	.p2align 4,,7
 
@@ -49,7 +68,7 @@ copy_loop:
 	movl	pbe_address(%edx), %esi
 	movl	pbe_orig_address(%edx), %edi
 
-	movl	$1024, %ecx
+	movl	$(PAGE_SIZE >> 2), %ecx
 	rep
 	movsl
 
@@ -58,13 +77,22 @@ copy_loop:
 	.p2align 4,,7
 
 done:
+	jmpl	*%ebx
+	.align PAGE_SIZE
+
+ENTRY(restore_registers)
 	/* go back to the original page tables */
-	movl	$swapper_pg_dir, %eax
-	subl	$__PAGE_OFFSET, %eax
-	movl	%eax, %cr3
-	movl	mmu_cr4_features, %ecx
+	movl	%ebp, %cr3
+
+	/* flush TLB */
+	movl	mmu_cr4_features, %edx
+	movl	%edx, %ecx
 	jecxz	1f	# cr4 Pentium and higher, skip if zero
-	movl	%ecx, %cr4;  # turn PGE back on
+	andl	$~(X86_CR4_PGE), %ecx
+	movl	%ecx, %cr4;  # turn off PGE
+	movl	%cr3, %ecx;  # flush TLB
+	movl	%ecx, %cr3;
+	movl	%edx, %cr4;  # turn PGE back on
 1:
 
 	movl saved_context_esp, %esp
@@ -82,4 +110,7 @@ done:
 
 	xorl	%eax, %eax
 
+	movl	%eax, in_suspend
+
 	ret
+ENDPROC(restore_registers)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation
  2018-08-27  9:42 ` [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation Gu Zhimin
@ 2018-08-27  9:48   ` Pavel Machek
  2018-08-30 12:45   ` Thomas Gleixner
  1 sibling, 0 replies; 10+ messages in thread
From: Pavel Machek @ 2018-08-27  9:48 UTC (permalink / raw)
  To: Gu Zhimin
  Cc: Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Yu Chen, x86, linux-pm, linux-kernel

On Mon 2018-08-27 17:42:05, Gu Zhimin wrote:
> From: Zhimin Gu <kookoo.gu@intel.com>
> 
> On 32-bit systems, nosave regions (non-RAM areas) located between
> max_low_pfn and max_pfn are currently not excluded from the hibernation
> snapshot, which may result in a machine check exception when these
> unsafe regions are accessed during hibernation:
> 
> [  612.800453] Disabling lock debugging due to kernel taint
> [  612.805786] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: fe00000000801136
> [  612.814344] mce: [Hardware Error]: RIP !INEXACT! 60:<00000000d90be566> {swsusp_save+0x436/0x560}
> [  612.823167] mce: [Hardware Error]: TSC 1f5939fe276 ADDR dd000000 MISC 30e0000086
> [  612.830677] mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1529487426 SOCKET 0 APIC 0 microcode 24
> [  612.839581] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> [  612.846394] mce: [Hardware Error]: Machine check: Processor context corrupt
> [  612.853380] Kernel panic - not syncing: Fatal machine check
> [  612.858978] Kernel Offset: 0x18000000 from 0xc1000000 (relocation range: 0xc0000000-0xf7ffdfff)
> 
> This is because on 32-bit systems, pages above max_low_pfn are regarded
> as high memory, and accessing these unsafe pages may trigger the MCE.
> On the problematic 32-bit system, there is reserved memory above low
> memory, which triggered the MCE:
> 
> e820 memory mapping:
> [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
> [    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000d160cfff] usable
> [    0.000000] BIOS-e820: [mem 0x00000000d160d000-0x00000000d1613fff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x00000000d1614000-0x00000000d1a44fff] usable
> [    0.000000] BIOS-e820: [mem 0x00000000d1a45000-0x00000000d1ecffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000d1ed0000-0x00000000d7eeafff] usable
> [    0.000000] BIOS-e820: [mem 0x00000000d7eeb000-0x00000000d7ffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000d8000000-0x00000000d875ffff] usable
> [    0.000000] BIOS-e820: [mem 0x00000000d8760000-0x00000000d87fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000d8800000-0x00000000d8fadfff] usable
> [    0.000000] BIOS-e820: [mem 0x00000000d8fae000-0x00000000d8ffffff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x00000000d9000000-0x00000000da71bfff] usable
> [    0.000000] BIOS-e820: [mem 0x00000000da71c000-0x00000000da7fffff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x00000000da800000-0x00000000dbb8bfff] usable
> [    0.000000] BIOS-e820: [mem 0x00000000dbb8c000-0x00000000dbffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000dd000000-0x00000000df1fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000041edfffff] usable
> 
> Fix this problem by changing pfn limit from max_low_pfn to max_pfn.
> This issue should also exist on 64bits systems, if there are reserved
> regions above 4GB.
> 
> Acked-by: Chen Yu <yu.c.chen@intel.com>
> Signed-off-by: Zhimin Gu <kookoo.gu@intel.com>

Acked-by: Pavel Machek <pavel@ucw.cz>


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system
  2018-08-27  9:42 ` [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system Gu Zhimin
@ 2018-08-27  9:48   ` Pavel Machek
  2018-08-30 12:59   ` Thomas Gleixner
  1 sibling, 0 replies; 10+ messages in thread
From: Pavel Machek @ 2018-08-27  9:48 UTC (permalink / raw)
  To: Gu Zhimin
  Cc: Rafael J. Wysocki, Len Brown, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Yu Chen, x86, linux-pm, linux-kernel

On Mon 2018-08-27 17:42:19, Gu Zhimin wrote:
> From: Zhimin Gu <kookoo.gu@intel.com>
> 
> Reduce the hibernation code duplication between x86-32 and x86-64
> by extracting the common code into hibernate.c.
> 
> No functional change.
> 
> Suggested-by: Pavel Machek <pavel@ucw.cz>
> Acked-by: Chen Yu <yu.c.chen@intel.com>
> Signed-off-by: Zhimin Gu <kookoo.gu@intel.com>

Thanks a lot for doing this.

Acked-by: Pavel Machek <pavel@ucw.cz>


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation
  2018-08-27  9:42 ` [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation Gu Zhimin
  2018-08-27  9:48   ` Pavel Machek
@ 2018-08-30 12:45   ` Thomas Gleixner
  2018-09-01 15:38     ` Yu Chen
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2018-08-30 12:45 UTC (permalink / raw)
  To: Gu Zhimin
  Cc: Rafael J. Wysocki, Len Brown, Ingo Molnar, H. Peter Anvin,
	Pavel Machek, Yu Chen, x86, linux-pm, linux-kernel

On Mon, 27 Aug 2018, Gu Zhimin wrote:
> 
> Fix this problem by changing pfn limit from max_low_pfn to max_pfn.
> This issue should also exist on 64bits systems, if there are reserved
> regions above 4GB.

Should? Can we please have facts and not some half-baked assumptions?

On 64bit max_low_pfn is the same as max_pfn. See init_mem_mapping().
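
The relevant piece of init_mem_mapping() in arch/x86/mm/init.c (roughly,
as of kernels in this timeframe) is:

#ifdef CONFIG_X86_64
	if (max_pfn > max_low_pfn)
		max_low_pfn = max_pfn;
#endif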

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system
  2018-08-27  9:42 ` [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system Gu Zhimin
  2018-08-27  9:48   ` Pavel Machek
@ 2018-08-30 12:59   ` Thomas Gleixner
  2018-09-01 15:45     ` Yu Chen
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2018-08-30 12:59 UTC (permalink / raw)
  To: Gu Zhimin
  Cc: Rafael J. Wysocki, Len Brown, Ingo Molnar, H. Peter Anvin,
	Pavel Machek, Yu Chen, x86, linux-pm, linux-kernel

On Mon, 27 Aug 2018, Gu Zhimin wrote:
> diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
> new file mode 100644
> index 0000000..6f91f7b
> --- /dev/null
> +++ b/arch/x86/power/hibernate.c
> @@ -0,0 +1,255 @@
> +/*
> + * Hibernation support for x86
> + *
> + * Distribute under GPLv2

We have SPDX identifiers for that and not some randomly chosen license
hint.
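
I.e. the very first line of the new .c file should be the tag itself,
e.g. (assuming plain GPL-2.0 is what is intended here):

// SPDX-License-Identifier: GPL-2.0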

> +
> +/*
> + *	pfn_is_nosave - check if given pfn is in the 'nosave' section

This is a half-baked kernel-doc header.

> + */
> +

Random new line.

> +int pfn_is_nosave(unsigned long pfn)
> +{
> +	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
> +	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;

Instead of blindly copying stuff please fix it so it matches kernel coding
rules.

> +	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);

The brackets are pointless

> +}
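
Both points could be addressed along these lines (untested sketch):

int pfn_is_nosave(unsigned long pfn)
{
	unsigned long nosave_begin_pfn;
	unsigned long nosave_end_pfn;

	nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
	nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;

	return pfn >= nosave_begin_pfn && pfn < nosave_end_pfn;
}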
> +
> +#ifdef CONFIG_X86_64
> +static int relocate_restore_code(void)
> +{
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	relocated_restore_code = get_safe_page(GFP_ATOMIC);
> +	if (!relocated_restore_code)
> +		return -ENOMEM;
> +
> +	memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE);
> +
> +	/* Make the page containing the relocated code executable */
> +	pgd = (pgd_t *)__va(read_cr3_pa()) +
> +		pgd_index(relocated_restore_code);
> +	p4d = p4d_offset(pgd, relocated_restore_code);
> +	if (p4d_large(*p4d)) {
> +		set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
> +		goto out;
> +	}
> +	pud = pud_offset(p4d, relocated_restore_code);
> +	if (pud_large(*pud)) {
> +		set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
> +		goto out;
> +	}
> +	pmd = pmd_offset(pud, relocated_restore_code);
> +	if (pmd_large(*pmd)) {
> +		set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
> +		goto out;
> +	}
> +	pte = pte_offset_kernel(pmd, relocated_restore_code);
> +	set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
> +out:
> +	__flush_tlb_all();
> +	return 0;
> +}
> +
> +#define MD5_DIGEST_SIZE 16
> +
> +struct restore_data_record {
> +	unsigned long jump_address;
> +	unsigned long jump_address_phys;
> +	unsigned long cr3;
> +	unsigned long magic;
> +	u8 e820_digest[MD5_DIGEST_SIZE];
> +};
> +
> +#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
> +/**
> + * get_e820_md5 - calculate md5 according to given e820 table
> + *
> + * @table: the e820 table to be calculated
> + * @buf: the md5 result to be stored to
> + */
> +static int get_e820_md5(struct e820_table *table, void *buf)
> +{
> +	struct crypto_shash *tfm;
> +	struct shash_desc *desc;
> +	int size;
> +	int ret = 0;
> +
> +	tfm = crypto_alloc_shash("md5", 0, 0);
> +	if (IS_ERR(tfm))
> +		return -ENOMEM;
> +
> +	desc = kmalloc(sizeof(struct shash_desc) + crypto_shash_descsize(tfm),
> +		       GFP_KERNEL);
> +	if (!desc) {
> +		ret = -ENOMEM;
> +		goto free_tfm;
> +	}
> +
> +	desc->tfm = tfm;
> +	desc->flags = 0;
> +
> +	size = offsetof(struct e820_table, entries) +
> +		sizeof(struct e820_entry) * table->nr_entries;
> +
> +	if (crypto_shash_digest(desc, (u8 *)table, size, buf))
> +		ret = -EINVAL;
> +
> +	kzfree(desc);
> +
> +free_tfm:
> +	crypto_free_shash(tfm);
> +	return ret;
> +}
> +
> +static void hibernation_e820_save(void *buf)
> +{
> +	get_e820_md5(e820_table_firmware, buf);

So if get_e820_md5() fails, then it will hibernate nevertheless. Why is
that error code not propagated?
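
One possible shape of the fix (sketch, not tested): let the helper
return the error so the caller can act on it:

static int hibernation_e820_save(void *buf)
{
	return get_e820_md5(e820_table_firmware, buf);
}

with arch_hibernation_header_save() then checking that return value
instead of ignoring it, and the !CONFIG_CRYPTO_MD5 stub returning 0.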

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation
  2018-08-30 12:45   ` Thomas Gleixner
@ 2018-09-01 15:38     ` Yu Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Yu Chen @ 2018-09-01 15:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gu Zhimin, Rafael J. Wysocki, Len Brown, Ingo Molnar,
	H. Peter Anvin, Pavel Machek, x86, linux-pm, linux-kernel

Hi,
On Thu, Aug 30, 2018 at 02:45:26PM +0200, Thomas Gleixner wrote:
> On Mon, 27 Aug 2018, Gu Zhimin wrote:
> > 
> > Fix this problem by changing pfn limit from max_low_pfn to max_pfn.
> > This issue should also exist on 64bits systems, if there are reserved
> > regions above 4GB.
> 
> Should? Can we please have facts and not some half-baked assumptions?
> 
> On 64bit max_low_pfn is the same as max_pfn. See init_mem_mapping().
>
Thanks for pointing this out. We overlooked the reassignment of
max_low_pfn and had only glanced at setup_arch().
Best,
Yu

> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system
  2018-08-30 12:59   ` Thomas Gleixner
@ 2018-09-01 15:45     ` Yu Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Yu Chen @ 2018-09-01 15:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gu Zhimin, Rafael J. Wysocki, Len Brown, Ingo Molnar,
	H. Peter Anvin, Pavel Machek, x86, linux-pm, linux-kernel

On Thu, Aug 30, 2018 at 02:59:09PM +0200, Thomas Gleixner wrote:
> On Mon, 27 Aug 2018, Gu Zhimin wrote:
> > diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
> > new file mode 100644
> > index 0000000..6f91f7b
> > --- /dev/null
> > +++ b/arch/x86/power/hibernate.c
> > @@ -0,0 +1,255 @@
> > +/*
> > + * Hibernation support for x86
> > + *
> > + * Distribute under GPLv2
> 
> We have SPDX identifiers for that and not some randomly chosen license
> hint.
>
> > +
> > +/*
> > + *	pfn_is_nosave - check if given pfn is in the 'nosave' section
> 
> This is a half-baked kernel-doc header.
> 
> > + */
> > +
> 
> Random new line.
> 
> > +int pfn_is_nosave(unsigned long pfn)
> > +{
> > +	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
> > +	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
> 
> Instead of blindly copying stuff please fix it so it matches kernel coding
> rules.
> 
> > +	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
> 
> The brackets are pointless
> 
> > +}
> > +
> > +#ifdef CONFIG_X86_64
> > +static int relocate_restore_code(void)
> > +{
> > +	pgd_t *pgd;
> > +	p4d_t *p4d;
> > +	pud_t *pud;
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +
> > +	relocated_restore_code = get_safe_page(GFP_ATOMIC);
> > +	if (!relocated_restore_code)
> > +		return -ENOMEM;
> > +
> > +	memcpy((void *)relocated_restore_code, core_restore_code, PAGE_SIZE);
> > +
> > +	/* Make the page containing the relocated code executable */
> > +	pgd = (pgd_t *)__va(read_cr3_pa()) +
> > +		pgd_index(relocated_restore_code);
> > +	p4d = p4d_offset(pgd, relocated_restore_code);
> > +	if (p4d_large(*p4d)) {
> > +		set_p4d(p4d, __p4d(p4d_val(*p4d) & ~_PAGE_NX));
> > +		goto out;
> > +	}
> > +	pud = pud_offset(p4d, relocated_restore_code);
> > +	if (pud_large(*pud)) {
> > +		set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
> > +		goto out;
> > +	}
> > +	pmd = pmd_offset(pud, relocated_restore_code);
> > +	if (pmd_large(*pmd)) {
> > +		set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
> > +		goto out;
> > +	}
> > +	pte = pte_offset_kernel(pmd, relocated_restore_code);
> > +	set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
> > +out:
> > +	__flush_tlb_all();
> > +	return 0;
> > +}
> > +
> > +#define MD5_DIGEST_SIZE 16
> > +
> > +struct restore_data_record {
> > +	unsigned long jump_address;
> > +	unsigned long jump_address_phys;
> > +	unsigned long cr3;
> > +	unsigned long magic;
> > +	u8 e820_digest[MD5_DIGEST_SIZE];
> > +};
> > +
> > +#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
> > +/**
> > + * get_e820_md5 - calculate md5 according to given e820 table
> > + *
> > + * @table: the e820 table to be calculated
> > + * @buf: the md5 result to be stored to
> > + */
> > +static int get_e820_md5(struct e820_table *table, void *buf)
> > +{
> > +	struct crypto_shash *tfm;
> > +	struct shash_desc *desc;
> > +	int size;
> > +	int ret = 0;
> > +
> > +	tfm = crypto_alloc_shash("md5", 0, 0);
> > +	if (IS_ERR(tfm))
> > +		return -ENOMEM;
> > +
> > +	desc = kmalloc(sizeof(struct shash_desc) + crypto_shash_descsize(tfm),
> > +		       GFP_KERNEL);
> > +	if (!desc) {
> > +		ret = -ENOMEM;
> > +		goto free_tfm;
> > +	}
> > +
> > +	desc->tfm = tfm;
> > +	desc->flags = 0;
> > +
> > +	size = offsetof(struct e820_table, entries) +
> > +		sizeof(struct e820_entry) * table->nr_entries;
> > +
> > +	if (crypto_shash_digest(desc, (u8 *)table, size, buf))
> > +		ret = -EINVAL;
> > +
> > +	kzfree(desc);
> > +
> > +free_tfm:
> > +	crypto_free_shash(tfm);
> > +	return ret;
> > +}
> > +
> > +static void hibernation_e820_save(void *buf)
> > +{
> > +	get_e820_md5(e820_table_firmware, buf);
> 
> So if get_e820_md5() fails, then it will hibernate nevertheless. Why is
> that error code not propagated?
>
Thomas, thanks for reviewing this; we will fix these issues one by one.
Best,
Yu

> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-09-01 15:39 UTC | newest]

Thread overview: 10+ messages
2018-08-27  9:41 [PATCH 0/3] Fixes hibernation bugs on x86-32 system Gu Zhimin
2018-08-27  9:42 ` [PATCH 1/3] x86, hibernate: Fix nosave_regions setup for hibernation Gu Zhimin
2018-08-27  9:48   ` Pavel Machek
2018-08-30 12:45   ` Thomas Gleixner
2018-09-01 15:38     ` Yu Chen
2018-08-27  9:42 ` [PATCH 2/3] x86, hibernate: Extract the common code of 64/32 bit system Gu Zhimin
2018-08-27  9:48   ` Pavel Machek
2018-08-30 12:59   ` Thomas Gleixner
2018-09-01 15:45     ` Yu Chen
2018-08-27  9:42 ` [PATCH 3/3] x86, hibernate: Backport several fixes from 64bits to 32bits hibernation Gu Zhimin
