* [PATCH 0/2][RFC] Introduce e820_table_ori to fix the memory inconsistent problem during hibernation @ 2017-06-17 5:21 Chen Yu 2017-06-17 5:22 ` [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout Chen Yu 2017-06-17 5:23 ` [PATCH 2/2][RFC] PM / hibernate: Utilize the original e820 map for consistent check Chen Yu 0 siblings, 2 replies; 7+ messages in thread From: Chen Yu @ 2017-06-17 5:21 UTC (permalink / raw) To: x86; +Cc: linux-kernel, Yu Chen, Ingo Molnar, Rafael J . Wysocki This is a patch set to fix the issue found during hibernation restore that, the MD5 fingerprint of the physical memory layout checking code has reported a false-positive failure due to incorrect input from the e820 map. Chen Yu (2): x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout PM / hibernate: Utilize the original e820 map for consistent check arch/x86/include/asm/e820/api.h | 1 + arch/x86/kernel/e820.c | 24 ++++++++++++++++++++---- arch/x86/power/hibernate_64.c | 4 ++-- 3 files changed, 23 insertions(+), 6 deletions(-) -- 2.7.4 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout 2017-06-17 5:21 [PATCH 0/2][RFC] Introduce e820_table_ori to fix the memory inconsistent problem during hibernation Chen Yu @ 2017-06-17 5:22 ` Chen Yu 2017-06-22 9:40 ` Ingo Molnar 2017-06-17 5:23 ` [PATCH 2/2][RFC] PM / hibernate: Utilize the original e820 map for consistent check Chen Yu 1 sibling, 1 reply; 7+ messages in thread From: Chen Yu @ 2017-06-17 5:22 UTC (permalink / raw) To: x86 Cc: linux-kernel, Yu Chen, Ingo Molnar, Rafael J . Wysocki, Ingo Molnar, Thomas Gleixner, Len Brown, Ying Huang Currently we try to have e820_table_firmware to represent the original firmware memory layout passed to us by the bootloader, however it is not the case, the e820_table_firmware might still be modified by linux: 1. During bootup, the efi boot stub might allocate memory via efi service for the PCI device information structure, then later e820_reserve_setup_data() reserved these dynamically allocated structures(AKA, setup_data) in e820_table_firmware accordingly. 2. The kexec might also modify the e820_table_firmware. This brings problem to the memory layout checking logic during hibernation, because in theory that code expects to get input from the original memory layout passed by the BIOS, otherwise we get a force-positive failure. So introduce a new variable e820_table_ori to record the original BIOS provided memory layout, which is composed of two parts: the BIOS-provided physical RAM map and extended physical RAM map. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Len Brown <lenb@kernel.org> Cc: Ying Huang <ying.huang@intel.com> Cc: x86@kernel.org Signed-off-by: Chen Yu <yu.c.chen@intel.com> --- arch/x86/include/asm/e820/api.h | 1 + arch/x86/kernel/e820.c | 24 ++++++++++++++++++++---- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h index 8e0f8b8..d30114b 100644 --- a/arch/x86/include/asm/e820/api.h +++ b/arch/x86/include/asm/e820/api.h @@ -5,6 +5,7 @@ extern struct e820_table *e820_table; extern struct e820_table *e820_table_firmware; +extern struct e820_table *e820_table_ori; extern unsigned long pci_mem_start; diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index d78a586..29dcb4c 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -20,7 +20,7 @@ #include <asm/setup.h> /* - * We organize the E820 table into two main data structures: + * We organize the E820 table into three main data structures: * * - 'e820_table_firmware': the original firmware version passed to us by the * bootloader - not modified by the kernel. We use this to: @@ -28,14 +28,18 @@ * - inform the user about the firmware's notion of memory layout * via /sys/firmware/memmap * - * - the hibernation code uses it to generate a kernel-independent MD5 - * fingerprint of the physical memory layout of a system. - * * - kexec, which is a bootloader in disguise, uses the original E820 * layout to pass to the kexec-ed kernel. This way the original kernel * can have a restricted E820 map while the kexec()-ed kexec-kernel * can have access to full memory - etc. * + * - 'e820_table_ori': the original firmware version passed to us by the + * bootloader - not modified by the kernel or the efi boot stub. + * We use this to: + * + * - the hibernation code uses it to generate a kernel-independent MD5 + * fingerprint of the physical memory layout of a system. + * * - 'e820_table': this is the main E820 table that is massaged by the * low level x86 platform code, or modified by boot parameters, before * passed on to higher level MM layers. @@ -47,9 +51,11 @@ */ static struct e820_table e820_table_init __initdata; static struct e820_table e820_table_firmware_init __initdata; +static struct e820_table e820_table_ori_init __initdata; struct e820_table *e820_table __refdata = &e820_table_init; struct e820_table *e820_table_firmware __refdata = &e820_table_firmware_init; +struct e820_table *e820_table_ori __refdata = &e820_table_ori_init; /* For PCI or other memory-mapped resources */ unsigned long pci_mem_start = 0xaeedbabe; @@ -648,6 +654,12 @@ __init void e820__reallocate_tables(void) BUG_ON(!n); memcpy(n, e820_table_firmware, size); e820_table_firmware = n; + + size = offsetof(struct e820_table, entries) + sizeof(struct e820_entry)*e820_table_ori->nr_entries; + n = kmalloc(size, GFP_KERNEL); + BUG_ON(!n); + memcpy(n, e820_table_ori, size); + e820_table_ori = n; } /* @@ -669,6 +681,9 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len) __append_e820_table(extmap, entries); e820__update_table(e820_table); + /* Update the original table if there's any extended memory. */ + memcpy(e820_table_ori, e820_table, sizeof(*e820_table_ori)); + early_memunmap(sdata, data_len); pr_info("e820: extended physical RAM map:\n"); e820__print_table("extended"); @@ -1176,6 +1191,7 @@ void __init e820__memory_setup(void) who = x86_init.resources.memory_setup(); memcpy(e820_table_firmware, e820_table, sizeof(*e820_table_firmware)); + memcpy(e820_table_ori, e820_table, sizeof(*e820_table_ori)); pr_info("e820: BIOS-provided physical RAM map:\n"); e820__print_table(who); -- 2.7.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout 2017-06-17 5:22 ` [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout Chen Yu @ 2017-06-22 9:40 ` Ingo Molnar 2017-06-23 4:13 ` Chen Yu 0 siblings, 1 reply; 7+ messages in thread From: Ingo Molnar @ 2017-06-22 9:40 UTC (permalink / raw) To: Chen Yu Cc: x86, linux-kernel, Rafael J . Wysocki, Ingo Molnar, Thomas Gleixner, Len Brown, Ying Huang * Chen Yu <yu.c.chen@intel.com> wrote: > Currently we try to have e820_table_firmware to represent the > original firmware memory layout passed to us by the bootloader, > however it is not the case, the e820_table_firmware might still > be modified by linux: > 1. During bootup, the efi boot stub might allocate memory via > efi service for the PCI device information structure, then > later e820_reserve_setup_data() reserved these dynamically > allocated structures(AKA, setup_data) in e820_table_firmware > accordingly. > 2. The kexec might also modify the e820_table_firmware. Hm, so why does the EFI code modify e280_table_firmware - why doesn't it modify e820_table? I.e. what is the point of having 3 different versions of the memory layout table? Thanks, Ingo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout 2017-06-22 9:40 ` Ingo Molnar @ 2017-06-23 4:13 ` Chen Yu 2017-06-23 8:42 ` Ingo Molnar 0 siblings, 1 reply; 7+ messages in thread From: Chen Yu @ 2017-06-23 4:13 UTC (permalink / raw) To: Ingo Molnar Cc: x86, linux-kernel, Rafael J . Wysocki, Ingo Molnar, Thomas Gleixner, Len Brown, Ying Huang, Xunlei Pang Hi Ingo, On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote: > > * Chen Yu <yu.c.chen@intel.com> wrote: > > > Currently we try to have e820_table_firmware to represent the > > original firmware memory layout passed to us by the bootloader, > > however it is not the case, the e820_table_firmware might still > > be modified by linux: > > 1. During bootup, the efi boot stub might allocate memory via > > efi service for the PCI device information structure, then > > later e820_reserve_setup_data() reserved these dynamically > > allocated structures(AKA, setup_data) in e820_table_firmware > > accordingly. > > 2. The kexec might also modify the e820_table_firmware. > > Hm, so why does the EFI code modify e280_table_firmware - why doesn't > it modify e820_table? > Both the e820_table and e820_table_firmware will be updated in e820__reserve_setup_data(): Changing the PCI device information structures from E820_TYPE_RAM to E820_TYPE_RESERVED_KERN. > I.e. what is the point of having 3 different versions of the > memory layout table? My original thought was that, we should not record the modification from the efi boot stub into the e820_tabel_firmware and we are done. But after checking the code, I realized that if we do so the kexec might have potiential problem. The e820_table_firmware was introduced mainly for kexec and was used to pass the original memory layout to the second kernel: commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a Author: Bernhard Walle <bwalle@suse.de> Date: Fri Jun 27 13:12:55 2008 +0200 x86: use FIRMWARE_MEMMAP on x86/E820 Besides, the second kernel will not re-enter the efi boot stub code and it will reuse the PCI device information structure created by the first kernel, which is stored in the E820_TYPE_RESERVED_KERN region. So these PCI device information structures will not be modified by the second kernel, as kexec will only pass the E820_TYPE_RAM to the second kernel, thus the latter could leverage ioremap to access the PCI information. So the problem is, if we do not record the PCI information in the e820_table_firmware, the PCI information will be kept as type E820_TYPE_RAM, and all the E820_TYPE_RAM type regions will be passed to the second kernel and might be allocated for ordinary use in the second kernel, as a result the second kernel might not get valid PCI information(might be overwritten by others). So currently we try to introduce a new e820_table_ori to represent the original one provided by the BIOS(mainly for hibernation memory layout md5 checking). Thanks, Yu > > Thanks, > > Ingo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout 2017-06-23 4:13 ` Chen Yu @ 2017-06-23 8:42 ` Ingo Molnar 2017-07-02 17:01 ` Chen Yu 0 siblings, 1 reply; 7+ messages in thread From: Ingo Molnar @ 2017-06-23 8:42 UTC (permalink / raw) To: Chen Yu Cc: x86, linux-kernel, Rafael J . Wysocki, Ingo Molnar, Thomas Gleixner, Len Brown, Ying Huang, Xunlei Pang * Chen Yu <yu.c.chen@intel.com> wrote: > Hi Ingo, > On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote: > > > > * Chen Yu <yu.c.chen@intel.com> wrote: > > > > > Currently we try to have e820_table_firmware to represent the > > > original firmware memory layout passed to us by the bootloader, > > > however it is not the case, the e820_table_firmware might still > > > be modified by linux: > > > 1. During bootup, the efi boot stub might allocate memory via > > > efi service for the PCI device information structure, then > > > later e820_reserve_setup_data() reserved these dynamically > > > allocated structures(AKA, setup_data) in e820_table_firmware > > > accordingly. > > > 2. The kexec might also modify the e820_table_firmware. > > > > Hm, so why does the EFI code modify e280_table_firmware - why doesn't > > it modify e820_table? > > > Both the e820_table and e820_table_firmware will be updated in > e820__reserve_setup_data(): > Changing the PCI device information structures from E820_TYPE_RAM > to E820_TYPE_RESERVED_KERN. > > I.e. what is the point of having 3 different versions of the > > memory layout table? > My original thought was that, we should not record the modification > from the efi boot stub into the e820_tabel_firmware and we are done. > But after checking the code, I realized that if we do so the > kexec might have potiential problem. > > The e820_table_firmware was introduced mainly for kexec and > was used to pass the original memory layout to the second > kernel: > > commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a > Author: Bernhard Walle <bwalle@suse.de> > Date: Fri Jun 27 13:12:55 2008 +0200 > > x86: use FIRMWARE_MEMMAP on x86/E820 > > Besides, the second kernel will not re-enter the efi boot stub > code and it will reuse the PCI device information structure created > by the first kernel, which is stored in the E820_TYPE_RESERVED_KERN > region. So these PCI device information structures will not be > modified by the second kernel, as kexec will only pass the E820_TYPE_RAM > to the second kernel, thus the latter could leverage ioremap to access > the PCI information. > > So the problem is, if we do not record the PCI information in > the e820_table_firmware, the PCI information will be kept as > type E820_TYPE_RAM, and all the E820_TYPE_RAM type regions will > be passed to the second kernel and might be allocated for ordinary > use in the second kernel, as a result the second kernel might not > get valid PCI information(might be overwritten by others). So > currently we try to introduce a new e820_table_ori to represent > the original one provided by the BIOS(mainly for hibernation > memory layout md5 checking). So there's 3 versions we need: - the original 'firmware' table as-is - for MD5 check and other potential purposes - some intermediate version of the table for kexec: what is the exact definition of that table, what changes from the real table does it _not_ want? - the 'real' table all the naming should reflect that. I.e. instead of some nonsensical "_ori" postfix, that is really the _firmware table. If kexec needs a separate one then name it _kexec and copy it at the right stage. Ok? Thanks, Ingo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout 2017-06-23 8:42 ` Ingo Molnar @ 2017-07-02 17:01 ` Chen Yu 0 siblings, 0 replies; 7+ messages in thread From: Chen Yu @ 2017-07-02 17:01 UTC (permalink / raw) To: Ingo Molnar Cc: x86, linux-kernel, Rafael J . Wysocki, Ingo Molnar, Thomas Gleixner, Len Brown, Ying Huang, Xunlei Pang On Fri, Jun 23, 2017 at 10:42:10AM +0200, Ingo Molnar wrote: > > * Chen Yu <yu.c.chen@intel.com> wrote: > > > Hi Ingo, > > On Thu, Jun 22, 2017 at 11:40:30AM +0200, Ingo Molnar wrote: > > > > > > * Chen Yu <yu.c.chen@intel.com> wrote: > > > > > > > Currently we try to have e820_table_firmware to represent the > > > > original firmware memory layout passed to us by the bootloader, > > > > however it is not the case, the e820_table_firmware might still > > > > be modified by linux: > > > > 1. During bootup, the efi boot stub might allocate memory via > > > > efi service for the PCI device information structure, then > > > > later e820_reserve_setup_data() reserved these dynamically > > > > allocated structures(AKA, setup_data) in e820_table_firmware > > > > accordingly. > > > > 2. The kexec might also modify the e820_table_firmware. > > > > > > Hm, so why does the EFI code modify e280_table_firmware - why doesn't > > > it modify e820_table? > > > > > Both the e820_table and e820_table_firmware will be updated in > > e820__reserve_setup_data(): > > Changing the PCI device information structures from E820_TYPE_RAM > > to E820_TYPE_RESERVED_KERN. > > > I.e. what is the point of having 3 different versions of the > > > memory layout table? > > My original thought was that, we should not record the modification > > from the efi boot stub into the e820_tabel_firmware and we are done. > > But after checking the code, I realized that if we do so the > > kexec might have potiential problem. > > > > The e820_table_firmware was introduced mainly for kexec and > > was used to pass the original memory layout to the second > > kernel: > > > > commit 5dfcf14d5b28174f94cbe9b4fb35d415db61c64a > > Author: Bernhard Walle <bwalle@suse.de> > > Date: Fri Jun 27 13:12:55 2008 +0200 > > > > x86: use FIRMWARE_MEMMAP on x86/E820 > > > > Besides, the second kernel will not re-enter the efi boot stub > > code and it will reuse the PCI device information structure created > > by the first kernel, which is stored in the E820_TYPE_RESERVED_KERN > > region. So these PCI device information structures will not be > > modified by the second kernel, as kexec will only pass the E820_TYPE_RAM > > to the second kernel, thus the latter could leverage ioremap to access > > the PCI information. > > > > So the problem is, if we do not record the PCI information in > > the e820_table_firmware, the PCI information will be kept as > > type E820_TYPE_RAM, and all the E820_TYPE_RAM type regions will > > be passed to the second kernel and might be allocated for ordinary > > use in the second kernel, as a result the second kernel might not > > get valid PCI information(might be overwritten by others). So > > currently we try to introduce a new e820_table_ori to represent > > the original one provided by the BIOS(mainly for hibernation > > memory layout md5 checking). > > So there's 3 versions we need: > > - the original 'firmware' table as-is - for MD5 check and other potential > purposes > > - some intermediate version of the table for kexec: what is the exact definition > of that table, what changes from the real table does it _not_ want? > Some boot options such as 'mem=' are not wanted by kexec, because the kexec wants to let the second kernel see the whole memory layout passed by the bootloader. I think this is why e820_table_firmware was introduced. > - the 'real' table > > all the naming should reflect that. I.e. instead of some nonsensical "_ori" > postfix, that is really the _firmware table. If kexec needs a separate one then > name it _kexec and copy it at the right stage. > > Ok? > Ok. I'm sending V2 of this patch. I tried not to break the old behavior and split the patch into three, thus the logic might look more clear. > Thanks, > > Ingo Thanks, Yu ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2][RFC] PM / hibernate: Utilize the original e820 map for consistent check 2017-06-17 5:21 [PATCH 0/2][RFC] Introduce e820_table_ori to fix the memory inconsistent problem during hibernation Chen Yu 2017-06-17 5:22 ` [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout Chen Yu @ 2017-06-17 5:23 ` Chen Yu 1 sibling, 0 replies; 7+ messages in thread From: Chen Yu @ 2017-06-17 5:23 UTC (permalink / raw) To: linux-pm Cc: linux-kernel, Yu Chen, Ingo Molnar, Rafael J . Wysocki, Len Brown, Matt Fleming, Ingo Molnar, Thomas Gleixner Use the e820_table_ori instead of e820_table_firmware to check the consistence of memory layout provided by BIOS, because the e820_table_firmware might be modified by the kernel such as efi boot stub. To be more specific, during bootup, the efi boot stub might allocate memory via efi service for the PCI device information structure, then e820_reserve_setup_data() reserved these dynamically allocated structures(AKA, setup_data) in e820_table_firmware accordingly, changing their attribute from E820_TYPE_RAM to E820_TYPE_RESERVED_KERN. So e820_table_firmware is not the original BIOS-provided memory layout anymore. As a result, we might get false-positive report that the memory layout is inconsistent across hibernation: The suspend kernel: [ 0.000000] e820: update [mem 0x76671018-0x76679457] usable ==> usable The resume kernel: [ 0.000000] e820: update [mem 0x7666f018-0x76677457] usable ==> usable ... [ 15.752088] PM: Using 3 thread(s) for decompression. [ 15.752088] PM: Loading and decompressing image data (471870 pages)... [ 15.764971] Hibernate inconsistent memory map detected! [ 15.770833] PM: Image mismatch: architecture specific data Actually it is safe to restore these pages because E820_TYPE_RAM and E820_TYPE_RESERVED_KERN are treated the same during hibernation. Reported-by: James Ren <james.ren@intel.com> Reported-by: "Mejia, Leonidas A" <leonidas.a.mejia@intel.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Len Brown <lenb@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-pm@vger.kernel.org Signed-off-by: Chen Yu <yu.c.chen@intel.com> --- arch/x86/power/hibernate_64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c index a6e21fe..4bf087d 100644 --- a/arch/x86/power/hibernate_64.c +++ b/arch/x86/power/hibernate_64.c @@ -250,7 +250,7 @@ static int get_e820_md5(struct e820_table *table, void *buf) static void hibernation_e820_save(void *buf) { - get_e820_md5(e820_table_firmware, buf); + get_e820_md5(e820_table_ori, buf); } static bool hibernation_e820_mismatch(void *buf) @@ -263,7 +263,7 @@ static bool hibernation_e820_mismatch(void *buf) if (!memcmp(result, buf, MD5_DIGEST_SIZE)) return false; - ret = get_e820_md5(e820_table_firmware, result); + ret = get_e820_md5(e820_table_ori, result); if (ret) return true; -- 2.7.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-07-02 17:00 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-17 5:21 [PATCH 0/2][RFC] Introduce e820_table_ori to fix the memory inconsistent problem during hibernation Chen Yu 2017-06-17 5:22 ` [PATCH 1/2][RFC] x86/boot/e820: Introduce e820_table_ori to represent the real original e820 layout Chen Yu 2017-06-22 9:40 ` Ingo Molnar 2017-06-23 4:13 ` Chen Yu 2017-06-23 8:42 ` Ingo Molnar 2017-07-02 17:01 ` Chen Yu 2017-06-17 5:23 ` [PATCH 2/2][RFC] PM / hibernate: Utilize the original e820 map for consistent check Chen Yu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).