From: ebiederm@xmission.com (Eric W. Biederman)
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>,
nigel@nigel.suspend2.net, "Rafael J. Wysocki" <rjw@sisk.pl>,
Andrew Morton <akpm@linux-foundation.org>,
Jeremy Maitin-Shepard <jbms@cmu.edu>,
linux-kernel@vger.kernel.org,
linux-pm@lists.linux-foundation.org,
Kexec Mailing List <kexec@lists.infradead.org>
Subject: Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
Date: Mon, 10 Dec 2007 19:25:43 -0700 [thread overview]
Message-ID: <m1tzmq0xy0.fsf@ebiederm.dsl.xmission.com> (raw)
In-Reply-To: <1197042810.24045.61.camel@caritas-dev.intel.com> (Ying Huang's message of "Fri, 07 Dec 2007 15:53:30 +0000")
"Huang, Ying" <ying.huang@intel.com> writes:
> This patch implements the functionality of jumping between the kexeced
> kernel and the original kernel.
>
> To support jumping between two kernels, before jumping to (executing)
> the new kernel and jumping back to the original kernel, the devices
> are put into quiescent state, and the state of devices and CPU is
> saved. After jumping back from kexeced kernel and jumping to the new
> kernel, the state of devices and CPU are restored accordingly. The
> devices/CPU state save/restore code of software suspend is called to
> implement corresponding function.
>
> To support jumping without reserving memory. One shadow backup page
> (source page) is allocated for each page used by new (kexeced) kernel
> (destination page). When do kexec_load, the image of new kernel is
> loaded into source pages, and before executing, the destination pages
> and the source pages are swapped, so the contents of destination pages
> are backupped. Before jumping to the new (kexeced) kernel and after
> jumping back to the original kernel, the destination pages and the
> source pages are swapped too.
>
> A jump back protocol for kexec is defined and documented. It is an
> extension to ordinary function calling protocol. So, the facility
> provided by this patch can be used to call ordinary C function in real
> mode.
>
> A set of flags for sys_kexec_load are added to control which state are
> saved/restored before/after real mode code executing. For example, you
> can specify the device state and FPU state are saved/restored
> before/after real mode code executing.
>
> The states (exclude CPU state) save/restore code can be overridden
> based on the "command" parameter of kexec jump. Because more states
> need to be saved/restored by hibernating/resuming.
>
> Signed-off-by: Huang Ying <ying.huang@intel.com>
>
> ---
> Documentation/i386/jump_back_protocol.txt | 103 ++++++++++++++
> arch/powerpc/kernel/machine_kexec.c | 2
> arch/ppc/kernel/machine_kexec.c | 2
> arch/sh/kernel/machine_kexec.c | 2
> arch/x86/kernel/machine_kexec_32.c | 88 +++++++++---
> arch/x86/kernel/machine_kexec_64.c | 2
> arch/x86/kernel/relocate_kernel_32.S | 214 +++++++++++++++++++++++++++---
> include/asm-x86/kexec_32.h | 39 ++++-
> include/linux/kexec.h | 40 +++++
> kernel/kexec.c | 188 ++++++++++++++++++++++++++
> kernel/power/Kconfig | 2
> kernel/sys.c | 35 +++-
> 12 files changed, 648 insertions(+), 69 deletions(-)
>
> --- a/arch/x86/kernel/machine_kexec_32.c
> +++ b/arch/x86/kernel/machine_kexec_32.c
> @@ -20,6 +20,7 @@
> #include <asm/cpufeature.h>
> #include <asm/desc.h>
> #include <asm/system.h>
> +#include <asm/cacheflush.h>
>
> #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
> static u32 kexec_pgd[1024] PAGE_ALIGNED;
> @@ -83,10 +84,14 @@ static void load_segments(void)
> * reboot code buffer to allow us to avoid allocations
> * later.
> *
> - * Currently nothing.
> + * Turn off NX bit for control page.
> */
> int machine_kexec_prepare(struct kimage *image)
> {
> + if (nx_enabled) {
> + change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC);
> + global_flush_tlb();
> + }
> return 0;
> }
>
> @@ -96,25 +101,59 @@ int machine_kexec_prepare(struct kimage
> */
> void machine_kexec_cleanup(struct kimage *image)
> {
> + if (nx_enabled) {
> + change_page_attr(image->control_code_page, 1, PAGE_KERNEL);
> + global_flush_tlb();
> + }
> +}
> +
> +void machine_kexec(struct kimage *image)
> +{
> + machine_kexec_call(image, NULL, 0);
> }
>
> /*
> * Do not allocate memory (or fail in any way) in machine_kexec().
> * We are past the point of no return, committed to rebooting now.
> */
> -NORET_TYPE void machine_kexec(struct kimage *image)
> +int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
> + unsigned int argc, va_list args)
> {
Why do we need var arg support?
Can't we do that with a shim we load from user space?
> unsigned long page_list[PAGES_NR];
> void *control_page;
> + asmlinkage NORET_TYPE void
> + (*relocate_kernel_ptr)(unsigned long indirection_page,
> + unsigned long control_page,
> + unsigned long start_address,
> + unsigned int has_pae) ATTRIB_NORET;
>
> /* Interrupts aren't acceptable while we reboot */
> local_irq_disable();
>
> control_page = page_address(image->control_code_page);
> - memcpy(control_page, relocate_kernel, PAGE_SIZE);
> + memcpy(control_page, relocate_page, PAGE_SIZE/2);
> + KCALL_MAGIC(control_page) = 0;
>
> + if (image->preserve_cpu) {
> + unsigned int i;
> + KCALL_MAGIC(control_page) = KCALL_MAGIC_NUMBER;
> + KCALL_ARGC(control_page) = argc;
> + for (i = 0; i < argc; i++)
> + KCALL_ARGS(control_page)[i] = \
> + va_arg(args, unsigned long);
> +
> + if (kexec_call_save_cpu(control_page)) {
> + image->start = KCALL_ENTRY(control_page);
> + if (ret)
> + *ret = KCALL_ARGS(control_page)[0];
> + return 0;
> + }
> + }
> +
> + relocate_kernel_ptr = control_page +
> + ((void *)relocate_kernel - (void *)relocate_page);
> page_list[PA_CONTROL_PAGE] = __pa(control_page);
> - page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
> + page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
> page_list[PA_PGD] = __pa(kexec_pgd);
> page_list[VA_PGD] = (unsigned long)kexec_pgd;
> #ifdef CONFIG_X86_PAE
> @@ -127,26 +166,33 @@ NORET_TYPE void machine_kexec(struct kim
> page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
> page_list[PA_PTE_1] = __pa(kexec_pte1);
> page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
> + page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
>
> - /* The segment registers are funny things, they have both a
> - * visible and an invisible part. Whenever the visible part is
> - * set to a specific selector, the invisible part is loaded
> - * with from a table in memory. At no other time is the
> - * descriptor table in memory accessed.
> - *
> - * I take advantage of this here by force loading the
> - * segments, before I zap the gdt with an invalid value.
> - */
> - load_segments();
> - /* The gdt & idt are now invalid.
> - * If you want to load them you must set up your own idt & gdt.
> - */
> - set_gdt(phys_to_virt(0),0);
> - set_idt(phys_to_virt(0),0);
> + if (image->preserve_cpu_ext) {
> + /* The segment registers are funny things, they have
> + * both a visible and an invisible part. Whenever the
> + * visible part is set to a specific selector, the
> + * invisible part is loaded with from a table in
> + * memory. At no other time is the descriptor table
> + * in memory accessed.
> + *
> + * I take advantage of this here by force loading the
> + * segments, before I zap the gdt with an invalid
> + * value.
> + */
> + load_segments();
> + /* The gdt & idt are now invalid. If you want to load
> + * them you must set up your own idt & gdt.
> + */
> + set_gdt(phys_to_virt(0), 0);
> + set_idt(phys_to_virt(0), 0);
> + }
We can't keep the same idt and gdt as the pages they are on will be
overwritten/reused. So explictily stomping on them sounds better
so they never work. We can restore them on kernel reentry.
> /* now call it */
> - relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
> - image->start, cpu_has_pae);
> + relocate_kernel_ptr((unsigned long)image->head,
> + (unsigned long)page_list,
> + image->start, cpu_has_pae);
Why rename relocate_kernel?
Ah. I see. You need to make it into a pointer again. The crazy don't
stop the pgd support strikes again. It used to be named rnk.
More later.
Eric
next prev parent reply other threads:[~2007-12-11 2:28 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-12-07 15:53 [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump Huang, Ying
2007-12-08 23:53 ` Pavel Machek
2007-12-09 0:19 ` Rafael J. Wysocki
2007-12-09 1:06 ` Eric W. Biederman
2007-12-10 19:55 ` Vivek Goyal
2007-12-11 8:51 ` Huang, Ying
2007-12-10 22:31 ` Vivek Goyal
2007-12-11 8:55 ` Huang, Ying
2007-12-11 2:25 ` Eric W. Biederman [this message]
2007-12-11 15:50 ` Huang, Ying
2007-12-11 9:27 ` Eric W. Biederman
2007-12-12 6:27 ` Huang, Ying
2007-12-18 8:34 ` Huang, Ying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m1tzmq0xy0.fsf@ebiederm.dsl.xmission.com \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=jbms@cmu.edu \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
--cc=nigel@nigel.suspend2.net \
--cc=pavel@ucw.cz \
--cc=rjw@sisk.pl \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).