* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
@ 2009-02-07  0:17 Nauman Rafique
       [not found] ` <20090207001609.8168.14884.stgit-AP77eCFSSktSzHKm+aFRNNkmqwFzkYv6@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Nauman Rafique @ 2009-02-07  0:17 UTC (permalink / raw)
  To: m-takahashi-kvZsz0w9TrB8UrSeD/g0lQ,
	dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX

The patch sent by Masahiko assumes that all the user-space registers are saved on
the kernel stack on a system call. This is not true for the majority
of the system calls. The callee saved registers (as defined by x86_64
ABI) - rbx, rbp, r12, r13, r14, r15 - are saved only in some special
cases. That means that these registers would not be available to
checkpoint code. Moreover, the restore code would have no space in
stack to restore those registers.
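
To make the layout concrete, here is a minimal user-space sketch of the x86_64 pt_regs frame as I understand it for kernels of this era. The struct and function names (model_pt_regs, model_unsaved_bytes, model_unsaved_regs) are made up for illustration and are not kernel symbols. It only shows that the six callee-saved registers occupy the first slots of the frame, which are the ones a partial frame leaves unfilled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * User-space model of the x86_64 register frame. This is an illustration
 * under the assumptions stated above, not the kernel header itself.
 */
struct model_pt_regs {
	/* Callee-saved registers: only filled in on "full frame" paths. */
	uint64_t r15, r14, r13, r12, bp, bx;
	/* Saved on every system call entry. */
	uint64_t r11, r10, r9, r8, ax, cx, dx, si, di, orig_ax;
	/* Return frame. */
	uint64_t ip, cs, flags, sp, ss;
};

/* Bytes at the start of the frame that a partial frame does not fill in. */
size_t model_unsaved_bytes(void)
{
	return offsetof(struct model_pt_regs, r11);
}

/* Number of callee-saved registers missing from a partial frame. */
size_t model_unsaved_regs(void)
{
	return model_unsaved_bytes() / sizeof(uint64_t);
}
```

In this model, checkpoint code that reads regs->bx or regs->r12 after an ordinary system call would be reading slots that were never written.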

This patch partially solves that problem by using a stub around the
checkpoint/restart system calls. This stub saves/restores those
callee-saved registers to/from the kernel stack. This solves the problem
in the case of self checkpoint and restore.

In case of external checkpoint, there is no clean way to have access
to these callee saved registers. We freeze or SIGSTOP the process that
has to be checkpointed. The process could have entered the kernel
space via any arbitrary code path before it was stopped or
frozen. Thus the callee saved registers were not saved in pt_regs
(i.e. the bottom of the kernel mode stack). They would be saved at
some arbitrary place in the kernel mode stack. And when we want to
checkpoint that process, we cannot find those registers and save them
in the checkpoint.

Possible solutions to this external checkpointing problem include
saving/restoring all registers (not feasible as it would have
performance penalty for every code path), and overloading a signal for
achieving external checkpointing. Any ideas?
---

 arch/x86/include/asm/unistd_64.h |    4 ++--
 arch/x86/kernel/entry_64.S       |   10 ++++++++++
 arch/x86/mm/checkpoint.c         |    3 +--
 arch/x86/mm/restart.c            |    5 ++---
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index fe7174d..76aa903 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -654,9 +654,9 @@ __SYSCALL(__NR_pipe2, sys_pipe2)
 #define __NR_inotify_init1			294
 __SYSCALL(__NR_inotify_init1, sys_inotify_init1)
 #define __NR_checkpoint				295
-__SYSCALL(__NR_checkpoint, sys_checkpoint)
+__SYSCALL(__NR_checkpoint, stub_checkpoint)
 #define __NR_restart				296
-__SYSCALL(__NR_restart, sys_restart)
+__SYSCALL(__NR_restart, stub_restart)
 
 
 #ifndef __NO_STUBS
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b86f332..0369267 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -545,6 +545,14 @@ END(system_call)
 END(\label)
 	.endm
 
+	.macro FULLSTACKCALL label,func
+	.globl \label
+	\label:
+	leaq    \func(%rip),%rax
+	jmp     ptregscall_common
+	END(\label)
+	.endm
+	
 	CFI_STARTPROC
 
 	PTREGSCALL stub_clone, sys_clone, %r8
@@ -552,6 +560,8 @@ END(\label)
 	PTREGSCALL stub_vfork, sys_vfork, %rdi
 	PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx
 	PTREGSCALL stub_iopl, sys_iopl, %rsi
+	FULLSTACKCALL stub_restart, sys_restart
+	FULLSTACKCALL stub_checkpoint, sys_checkpoint
 
 ENTRY(ptregscall_common)
 	popq %r11
diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
index 2514f14..a26332d 100644
--- a/arch/x86/mm/checkpoint.c
+++ b/arch/x86/mm/checkpoint.c
@@ -75,10 +75,10 @@ static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
 	hh->ip = regs->ip;
 	hh->cs = regs->cs;
 	hh->flags = regs->flags;
+	hh->sp = regs->sp;
 	hh->ss = regs->ss;
 
 #ifdef CONFIG_X86_64
-	hh->sp = read_pda (oldrsp);
 	hh->r8 = regs->r8;
 	hh->r9 = regs->r9;
 	hh->r10 = regs->r10;
@@ -90,7 +90,6 @@ static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
 	hh->ds = thread->ds;
 	hh->es = thread->es;
 #else /* !CONFIG_X86_64 */
-	hh->sp = regs->sp;
 	hh->ds = regs->ds;
 	hh->es = regs->es;
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/mm/restart.c b/arch/x86/mm/restart.c
index a10d63e..329f938 100644
--- a/arch/x86/mm/restart.c
+++ b/arch/x86/mm/restart.c
@@ -111,15 +111,14 @@ static int cr_load_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
 	regs->cs = hh->cs;
 	regs->flags = hh->flags;
 	regs->sp = hh->sp;
-	write_pda(oldrsp, hh->sp);
 	regs->ss = hh->ss;
 
-	thread->gs = hh->gs;
-	thread->fs = hh->fs;
 #ifdef CONFIG_X86_64
 	do_arch_prctl(t, ARCH_SET_FS, hh->fs);
 	do_arch_prctl(t, ARCH_SET_GS, hh->gs);
 #else
+	thread->gs = hh->gs;
+	thread->fs = hh->fs;
 	loadsegment(gs, hh->gs);
 	loadsegment(fs, hh->fs);
 #endif


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found] ` <20090207001609.8168.14884.stgit-AP77eCFSSktSzHKm+aFRNNkmqwFzkYv6@public.gmane.org>
@ 2009-02-09 17:53   ` Jim Winget
       [not found]     ` <f4192e520902090953x43a98134hfaa8443d586a32a6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-02-09 18:02   ` Dave Hansen
  1 sibling, 1 reply; 16+ messages in thread
From: Jim Winget @ 2009-02-09 17:53 UTC (permalink / raw)
  To: nauman-hpIqsD4AKlfQT0dZR+AlfA
  Cc: Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Any way to use a delayed checkpoint signal (perhaps somewhat
non-deterministic, e.g. "do it now" really means "do it pretty soon") that
is only taken on return to user space thus allowing a deterministic
solution?
Jim


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found] ` <20090207001609.8168.14884.stgit-AP77eCFSSktSzHKm+aFRNNkmqwFzkYv6@public.gmane.org>
  2009-02-09 17:53   ` Jim Winget
@ 2009-02-09 18:02   ` Dave Hansen
  2009-02-09 18:06     ` Dave Hansen
  2009-02-10 22:27     ` Nauman Rafique
  1 sibling, 2 replies; 16+ messages in thread
From: Dave Hansen @ 2009-02-09 18:02 UTC (permalink / raw)
  To: Nauman Rafique
  Cc: Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA

On Fri, 2009-02-06 at 16:17 -0800, Nauman Rafique wrote:
> The patch sent by Masahiko assumes that all the user-space registers
> are saved on
> the kernel stack on a system call. This is not true for the majority
> of the system calls. The callee saved registers (as defined by x86_64
> ABI) - rbx, rbp, r12, r13, r14, r15 - are saved only in some special
> cases. That means that these registers would not be available to
> checkpoint code. Moreover, the restore code would have no space in
> stack to restore those registers.

According to this:

http://msdn.microsoft.com/en-us/library/6t169e9c(VS.80).aspx

Those registers all get clobbered on all function calls.  I assume that
userspace also considers them to get clobbered on system calls as
well.  

What are those special cases you are talking about?  Certain special
cases for entering the kernel where we do save those registers?

Signal handling and ptrace single stepping are two places I would
imagine we have to enter the kernel and preserve those registers.  Is
that why you were suggesting overloading signal delivery?

Thanks for pointing out the problem, though.  This one will be
interesting. :)

-- Dave


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
  2009-02-09 18:02   ` Dave Hansen
@ 2009-02-09 18:06     ` Dave Hansen
  2009-02-10 22:27     ` Nauman Rafique
  1 sibling, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2009-02-09 18:06 UTC (permalink / raw)
  To: Nauman Rafique
  Cc: Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA

On Mon, 2009-02-09 at 10:02 -0800, Dave Hansen wrote:
> Signal handling and ptrace single stepping are two places I would
> imagine we have to enter the kernel and preserve those registers.  Is
> that why you were suggesting overloading signal delivery?

There is also, of course, good old interrupt handling.  Even if the
process is running and not making any system calls, its timeslice has to
expire sometime.

-- Dave


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]     ` <f4192e520902090953x43a98134hfaa8443d586a32a6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-02-09 19:25       ` Mike Waychison
       [not found]         ` <49908323.3090606-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Waychison @ 2009-02-09 19:25 UTC (permalink / raw)
  To: Jim Winget
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Jim Winget wrote:
> Any way to use a delayed checkpoint signal (perhaps somewhat
> non-deterministic, e.g. "do it now" really means "do it pretty soon") that
> is only taken on return to user space thus allowing a deterministic
> solution?

Ya, I'm thinking that a 'checkpoint' signal would be advisory, with the 
SIG_DFL action performing the checkpoint itself.

Considering that we'd need to cleanly get access to all registers, the 
checkpoint itself needs to be a well defined path from 
userland->kernelland.  I'm wondering if sys_checkpoint could be this 
well-defined path using the PTREGSCALL stub macro.

For tasks that aren't checkpoint-aware, SIG_DFL could possibly be done 
by having the vsyscall page/vdso implement the userland sighandler that 
calls sys_checkpoint.

What this means though is that we won't be able to freeze or SIGSTOP 
tasks before checkpoint.  Both of these paths can be entered via a 
variety of kernel entry points and unless we start dumping the full 
ptregs on each entry point, we'll never be able to reliably get access 
to all registers.

sys_checkpoint itself would have to have its own method to quiesce all
the tasks (basically wait for all tasks to enter sys_checkpoint so that
a multi-task checkpoint is self-consistent).  The nice thing about a
signal too is that userland can block it and ignore it in a
deterministic way.  The failure logic for a signal that is ignored or
blocked for a long time can be pushed back down to userland.
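
As a rough illustration of the "userland can block it in a deterministic way" point, here is a small user-space sketch using only standard POSIX signal calls. SIGUSR1 stands in for a checkpoint signal that does not exist yet, and every model_* name is hypothetical. While the signal is blocked, a request stays pending; it is delivered when the task unblocks it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Assumption: SIGUSR1 stands in for a hypothetical checkpoint signal. */
#define MODEL_SIGCKPT SIGUSR1

static volatile sig_atomic_t model_ckpt_requests;

/* A real handler would call the checkpoint system call here. */
static void model_ckpt_handler(int sig)
{
	(void)sig;
	model_ckpt_requests++;
}

/* Install the handler; returns 0 on success. */
int model_install(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = model_ckpt_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(MODEL_SIGCKPT, &sa, NULL);
}

/* Block (nonzero) or unblock (zero) the stand-in signal. */
int model_set_blocked(int blocked)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, MODEL_SIGCKPT);
	return sigprocmask(blocked ? SIG_BLOCK : SIG_UNBLOCK, &set, NULL);
}

/* Returns 1 if a request was sent but not yet delivered, -1 on error. */
int model_is_pending(void)
{
	sigset_t pending;

	sigemptyset(&pending);
	if (sigpending(&pending) != 0)
		return -1;
	return sigismember(&pending, MODEL_SIGCKPT);
}

/* Number of checkpoint requests the handler has seen so far. */
int model_request_count(void)
{
	return (int)model_ckpt_requests;
}
```

A task in a critical region would call model_set_blocked(1), do its work, and then call model_set_blocked(0); any request sent in between is handled at that point rather than lost.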

This is all a dramatic shift from the current way things are done, so 
we'd be best getting a better feel for our options though..


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]         ` <49908323.3090606-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2009-02-09 20:14           ` Cedric Le Goater
       [not found]             ` <49908EA0.2080901-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Cedric Le Goater @ 2009-02-09 20:14 UTC (permalink / raw)
  To: Mike Waychison
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX, Jim Winget

Mike Waychison wrote:
> Jim Winget wrote:
>> Any way to use a delayed checkpoint signal (perhaps somewhat
>> non-deterministic, e.g. "do it now" really means "do it pretty soon") that
>> is only taken on return to user space thus allowing a deterministic
>> solution?
> 
> Ya, I'm thinking that a 'checkpoint' signal would be advisory, with the 
> SIG_DFL action performing the checkpoint itself.
> 
> Considering that we'd need to cleanly get access to all registers, the 
> checkpoint itself needs to be a well defined path from 
> userland->kernelland.  I'm wondering if sys_checkpoint could be this 
> well-defined path using the PTREGSCALL stub macro.
> 
> For tasks that aren't checkpoint-aware, SIG_DFL could possibly be done 
> by having the vsyscall page/vdso implement the userland sighandler that 
> calls sys_checkpoint.
> 
> What this means though is that we won't be able to freeze or SIGSTOP 
> tasks before checkpoint. 

The sys_checkpoint() call in the userland sighandler you are proposing is
how you would freeze all the tasks of a container. Once all the tasks have
entered sys_checkpoint() and are blocked on a wait queue, you can start
gathering state.

This means that you need to count how many tasks should enter sys_checkpoint().
The cgroup fork callback can be used to signal newcomers and maintain
a coherent count of tasks. But we would also need an exit callback, which
is not available.
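
The counting described above can be sketched as plain bookkeeping. This is a single-threaded user-space model with no locking or wait queue, and all names (model_quiesce, model_on_fork, model_on_exit, model_enter) are hypothetical; it only shows why both a fork and an exit callback are needed to keep the expected count coherent.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one container-wide checkpoint. */
struct model_quiesce {
	int expected;	/* tasks that must enter the checkpoint call */
	int arrived;	/* tasks currently parked in it */
};

void model_quiesce_init(struct model_quiesce *q, int ntasks)
{
	q->expected = ntasks;
	q->arrived = 0;
}

/* True once every expected task is parked and state can be gathered. */
bool model_ready(const struct model_quiesce *q)
{
	return q->arrived >= q->expected;
}

/* Fork callback: a newcomer must be waited for as well. */
void model_on_fork(struct model_quiesce *q)
{
	q->expected++;
}

/* Exit callback: a task that died will never arrive, so stop waiting. */
void model_on_exit(struct model_quiesce *q)
{
	q->expected--;
}

/* A task enters the checkpoint call; returns true if it was the last. */
bool model_enter(struct model_quiesce *q)
{
	q->arrived++;
	return model_ready(q);
}
```

Without model_on_exit(), a task that exits before reaching the checkpoint call would leave the others parked forever in this model.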

> Both of these paths can be entered via a 
> variety of kernel entry points and unless we start dumping the full 
> ptregs on each entry point, we'll never be able to reliably get access 
> to all registers.
> 
> sys_checkpoint itself would have to have its own method to quiesce all 
> the tasks (basically wait for all tasks to enter sys_checkpoint so that 
> a multi-task checkpoint is self-consistent).  

yes.

sys_restart() works the same way: all the tasks are told in advance how
many should enter the wait queue. Once the task state is restored, you
let each task restart from its signal handler using the cpu state that
was saved on the user stack at checkpoint time.

> The nice thing about a signal too is that userland can block it and 
> ignore it in a deterministic way.

Yes, and the *very* nice thing about a signal handler is that you don't
have to worry about your cpu state. I don't think it's a good idea to
duplicate this code in the C/R framework. It is *very* arch dependent.
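
To illustrate the point about cpu state, here is a minimal user-space sketch with a standard SA_SIGINFO handler. SIGUSR2 is only a stand-in for a checkpoint signal and the model_* names are hypothetical. The third handler argument points at the context that was saved on the user stack when the signal was delivered, so the handler does not have to collect registers itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t model_saw_context;

/*
 * uctx points at the saved context of the interrupted code. A real
 * checkpoint handler would enter the checkpoint system call from here;
 * this sketch only records that the context was provided.
 */
static void model_handler(int sig, siginfo_t *info, void *uctx)
{
	(void)sig;
	(void)info;
	if (uctx != NULL)
		model_saw_context = 1;
}

/* Install the SA_SIGINFO handler; returns 0 on success. */
int model_install_siginfo(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = model_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	return sigaction(SIGUSR2, &sa, NULL);
}

/* Send the stand-in signal to ourselves; returns 1 if a context was seen. */
int model_trigger(void)
{
	if (raise(SIGUSR2) != 0)
		return -1;
	return (int)model_saw_context;
}
```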

> The failure logic for ignored or blocked-for-a-long time can be pushed 
> back down to userland.
> 
> This is all a dramatic shift from the current way things are done, so 
> we'd be best getting a better feel for our options though..

I think that the current way of doing things is work in progress and needs
to be reviewed. The way checkpoint/restart is triggered has always been
controversial among the stakeholders.

We've been maintaining a C/R solution on ppc32, ppc64, x86, x86_64, ia64, 
s390, s390x since 2002 working on the above principles you are describing.
UNICOS and later IRIX used similar principles, following the POSIX draft
on checkpoint/restart.

For the signal, we have 'hijacked' SIGSTOP, but new signals SIGCKPT and
SIGRESTART would definitely be nicer for a mainline solution.

Cheers,

C.


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]             ` <49908EA0.2080901-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
@ 2009-02-10 10:25               ` Louis Rilling
  0 siblings, 0 replies; 16+ messages in thread
From: Louis Rilling @ 2009-02-10 10:25 UTC (permalink / raw)
  To: Cedric Le Goater
  Cc: Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Jim Winget


On 09/02/09 21:14 +0100, Cedric Le Goater wrote:

[...]

> I think that the current way of doing things is work in progress and needs
> to be reviewed. The way checkpoint/restart is triggered has always been
> controversial among the stakeholders.
> 
> We've been maintaining a C/R solution on ppc32, ppc64, x86, x86_64, ia64, 
> s390, s390x since 2002 working on the above principles you are describing.
> UNICOS and later IRIX used similar principles, following the POSIX draft
> on checkpoint/restart.
> 
> For the signal, we have 'hijacked' SIGSTOP but new signals SIGCKPT and 
> SIGRESTART would definitely be a nicer solution for a mainline solution.

In Kerrighed we implemented a variant of the signal handler approach, which
is more transparent to userspace. Instead of using userspace signals (for
instance SIGRTMIN), we send a signal with si_code = SI_KERRIGHED. When
dequeuing such a signal, get_signal_to_deliver() calls the appropriate kernel
callback (e.g. task_checkpoint()), and then continues as if the signal were
ignored.

This solution has two nice properties:
- no userspace signal is overloaded, so that applications can use whatever
  signals they want;
- every task will eventually handle the signal with the right callback, whether
  they use the VDSO page or not.

Of course there is the matching drawback: userspace cannot use the checkpoint
signal to perform application-dependent control. However, this kind of thing
could equally be done by having the checkpoint callback set up an
application-defined signal handler.
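
The dispatch described above can be sketched in user space as follows. This is not the Kerrighed code: the si_code value, the struct types and the model_* functions are all hypothetical stand-ins, and the real logic lives in the kernel's signal dequeue path. The sketch only shows the control flow: a signal carrying the special code runs the kernel callback and is then treated as ignored, while every other signal goes to userspace as usual.

```c
#include <stddef.h>

/* Hypothetical value; the real SI_KERRIGHED constant is not shown here. */
#define MODEL_SI_KERNEL_CB (-100)

/* Simplified stand-ins for siginfo and the task structure. */
struct model_siginfo {
	int signo;
	int code;
};

struct model_task {
	int checkpoints_taken;
};

enum model_action {
	MODEL_DELIVER_TO_USER,	/* normal userspace signal delivery */
	MODEL_CONSUMED		/* handled in-kernel, then treated as ignored */
};

/* Stand-in for the kernel checkpoint callback. */
static void model_task_checkpoint(struct model_task *t)
{
	t->checkpoints_taken++;
}

/* Model of the dequeue step: dispatch on the signal's code field. */
enum model_action model_dequeue(struct model_task *t,
				const struct model_siginfo *info)
{
	if (info->code == MODEL_SI_KERNEL_CB) {
		model_task_checkpoint(t);
		return MODEL_CONSUMED;
	}
	return MODEL_DELIVER_TO_USER;
}
```

Because the decision is made on the code field rather than the signal number, an application that uses the same signal number for its own purposes is unaffected in this model.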

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
  2009-02-09 18:02   ` Dave Hansen
  2009-02-09 18:06     ` Dave Hansen
@ 2009-02-10 22:27     ` Nauman Rafique
       [not found]       ` <e98e18940902101427i7459a7edke4fdd8404e2ef642-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Nauman Rafique @ 2009-02-10 22:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA

On Mon, Feb 9, 2009 at 10:02 AM, Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
> On Fri, 2009-02-06 at 16:17 -0800, Nauman Rafique wrote:
>> The patch sent by Masahiko assumes that all the user-space registers
>> are saved on
>> the kernel stack on a system call. This is not true for the majority
>> of the system calls. The callee saved registers (as defined by x86_64
>> ABI) - rbx, rbp, r12, r13, r14, r15 - are saved only in some special
>> cases. That means that these registers would not be available to
>> checkpoint code. Moreover, the restore code would have no space in
>> stack to restore those registers.
>
> According to this:
>
> http://msdn.microsoft.com/en-us/library/6t169e9c(VS.80).aspx
>
> Those registers all get clobbered on all function calls.  I assume that
> userspace also considers them to get clobbered on system calls as
> well.
>
> What are those special cases you are talking about?  Certain special
> cases for entering the kernel where we do save those registers?

These are the system calls that use the same stub that I have used to
save the full stack (and thus all the registers):
	sys_clone
	sys_fork
	sys_vfork
	sys_sigaltstack
	sys_iopl

>
> Signal handling and ptrace single stepping are two places I would
> imagine we have to enter the kernel and preserve those registers.  Is
> that why you were suggesting overloading signal delivery?
>
> Thanks for pointing out the problem, though.  This one will be
> interesting. :)
>
> -- Dave
>
>


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]       ` <e98e18940902101427i7459a7edke4fdd8404e2ef642-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-02-11  3:34         ` Nauman Rafique
       [not found]           ` <e98e18940902101934o6f93230ag7226da6013afd20-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Nauman Rafique @ 2009-02-11  3:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA

Actually, looking at the code in entry_64.S again more closely, external
checkpointing should work with my patch too. The callee-saved registers
-- rbx, rbp, r12, r13, r14, r15 -- are saved on the kernel stack
before calling the signal handling code (i.e. right before switching from
kernel to user mode). This signal handling code is called
whenever we try to checkpoint a process with SIGSTOP or the cgroup
freezer. Thus these registers will be on the kernel stack of the
checkpointed process, and we don't need any user-level signal handling
for external checkpointing to work on x86_64. Sorry for causing
confusion.

On Tue, Feb 10, 2009 at 2:27 PM, Nauman Rafique <nauman-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, Feb 9, 2009 at 10:02 AM, Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
>> On Fri, 2009-02-06 at 16:17 -0800, Nauman Rafique wrote:
>>> The patch sent by Masahiko assumes that all the user-space registers
>>> are saved on
>>> the kernel stack on a system call. This is not true for the majority
>>> of the system calls. The callee saved registers (as defined by x86_64
>>> ABI) - rbx, rbp, r12, r13, r14, r15 - are saved only in some special
>>> cases. That means that these registers would not be available to
>>> checkpoint code. Moreover, the restore code would have no space in
>>> stack to restore those registers.
>>
>> According to this:
>>
>> http://msdn.microsoft.com/en-us/library/6t169e9c(VS.80).aspx
>>
>> Those registers all get clobbered on all function calls.  I assume that
>> userspace also considers them to get clobbered on system calls as
>> well.
>>
>> What are those special cases you are talking about?  Certain special
>> cases for entering the kernel where we do save those registers?
>
> These are the system calls that use the same stub that I have used to
> save the full stack (and thus all the registers).
>        sys_clone
>        sys_fork
>        sys_vfork
>        sys_sigaltstack
>        sys_iopl
>
>>
>> Signal handling and ptrace single stepping are two places I would
>> imagine we have to enter the kernel and preserve those registers.  Is
>> that why you were suggesting overloading signal delivery?
>>
>> Thanks for pointing out the problem, though.  This one will be
>> interesting. :)
>>
>> -- Dave
>>
>>
>


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]           ` <e98e18940902101934o6f93230ag7226da6013afd20-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-03-18  6:56             ` Oren Laadan
       [not found]               ` <49C09B03.6040403-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Oren Laadan @ 2009-03-18  6:56 UTC (permalink / raw)
  To: Nauman Rafique
  Cc: Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA, Dave Hansen


I was very confused by the original post; there is no need for a special
signal (handling) for checkpoint.

To checkpoint, we first freeze (or stop) the processes, meaning that
they are kept with an empty kernel stack before returning to user-mode.

We then rely on the fact that a process saves everything that it needs
before entering a syscall - so whatever is on the stack when it enters
the kernel must be preserved, the rest can be overwritten. Otherwise,
processes wouldn't survive context switches while in syscalls ...

Oren.

Nauman Rafique wrote:
> Actually looking at the code in entry_64.S again closely, external
> checkpointing should work with my patch too. The callee save registers
> -- rbx, rbp, r12, r13, r14, r15 -- are saved on the kernel stack
> before calling signal handling code (i.e. right before switching from
> kernel to user mode). This signal handling code would be called
> whenever we are trying to checkpoint a process with SIGSTOP or cgroup
> freezer. Thus these registers would be on the kernel stack of
> checkpointed process. And we don't need any user level signal handling
> for external checkpointing to work in x86_64. Sorry for causing
> confusion.
> 
> On Tue, Feb 10, 2009 at 2:27 PM, Nauman Rafique <nauman-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> On Mon, Feb 9, 2009 at 10:02 AM, Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
>>> On Fri, 2009-02-06 at 16:17 -0800, Nauman Rafique wrote:
>>>> The patch sent by Masahiko assumes that all the user-space registers
>>>> are saved on
>>>> the kernel stack on a system call. This is not true for the majority
>>>> of the system calls. The callee saved registers (as defined by x86_64
>>>> ABI) - rbx, rbp, r12, r13, r14, r15 - are saved only in some special
>>>> cases. That means that these registers would not be available to
>>>> checkpoint code. Moreover, the restore code would have no space in
>>>> stack to restore those registers.
>>> According to this:
>>>
>>> http://msdn.microsoft.com/en-us/library/6t169e9c(VS.80).aspx
>>>
>>> Those registers all get clobbered on all function calls.  I assume that
>>> userspace also considers them to get clobbered on system calls as
>>> well.
>>>
>>> What are those special cases you are talking about?  Certain special
>>> cases for entering the kernel where we do save those registers?
>> These are the system calls that use the same stub that I have used to
>> save the full stack (and thus all the registers).
>>        sys_clone
>>        sys_fork
>>        sys_vfork
>>        sys_sigaltstack
>>        sys_iopl
>>
>>> Signal handling and ptrace single stepping are two places I would
>>> imagine we have to enter the kernel and preserve those registers.  Is
>>> that why you were suggesting overloading signal delivery?
>>>
>>> Thanks for pointing out the problem, though.  This one will be
>>> interesting. :)
>>>
>>> -- Dave
>>>
>>>
> 


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]               ` <49C09B03.6040403-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-03-20 17:21                 ` Nauman Rafique
  0 siblings, 0 replies; 16+ messages in thread
From: Nauman Rafique @ 2009-03-20 17:21 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA, Dave Hansen

On Tue, Mar 17, 2009 at 11:56 PM, Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> wrote:
>
> I was very confused by the original post; there is no need for a special
> signal (handling) for checkpoint.
>
> To checkpoint, we first freeze (or stop) the processes, meaning that
> they are kept with empty kernel stack before returning to user-mode.
>
> We then rely on the fact that a process saves everything that it needs
> before entering a syscall - so whatever is on the stack when it enters
> the kernel must be preserved, the rest can be overwritten. Otherwise,
> processes wouldn't survive context switches while in syscalls ...

Well, on x86_64 not everything the process needs is saved on the
stack before entering the system call; the callee-saved registers
(rbx, rbp, r12, r13, r14, r15), for example, are not. If these registers
are used anywhere in the kernel, they are saved to and restored from the
stack. On context switch, these registers are explicitly clobbered, so
that they are saved on the kernel stack of the outgoing process.

Anyway, with the stubs that I introduced in my patch, these registers
are saved before entering the system call, so the problem is solved. I
am now working on checkpointing/restoring 32-bit binaries on a 64-bit
kernel (i.e., compatibility mode). It works with internal
checkpointing, but with external checkpointing the restored process
seg faults in user mode. I will post the patches as soon as I nail it
down.


>
> Oren.
>
> Nauman Rafique wrote:
>> Actually looking at the code in entry_64.S again closely, external
>> checkpointing should work with my patch too. The callee save registers
>> -- rbx, rbp, r12, r13, r14, r15 -- are saved on the kernel stack
>> before calling signal handling code (i.e. right before switching from
>> kernel to user mode). This signal handling code would be called
>> whenever we are trying to checkpoint a process with SIGSTOP or cgroup
>> freezer. Thus these registers would be on the kernel stack of
>> checkpointed process. And we don't need any user level signal handling
>> for external checkpointing to work in x86_64. Sorry for causing
>> confusion.
>>
>> On Tue, Feb 10, 2009 at 2:27 PM, Nauman Rafique <nauman-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>> On Mon, Feb 9, 2009 at 10:02 AM, Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:
>>>> On Fri, 2009-02-06 at 16:17 -0800, Nauman Rafique wrote:
>>>>> The patch sent by Masahiko assumes that all the user-space registers
>>>>> are saved on
>>>>> the kernel stack on a system call. This is not true for the majority
>>>>> of the system calls. The callee saved registers (as defined by x86_64
>>>>> ABI) - rbx, rbp, r12, r13, r14, r15 - are saved only in some special
>>>>> cases. That means that these registers would not be available to
>>>>> checkpoint code. Moreover, the restore code would have no space in
>>>>> stack to restore those registers.
>>>> According to this:
>>>>
>>>> http://msdn.microsoft.com/en-us/library/6t169e9c(VS.80).aspx
>>>>
>>>> Those registers all get clobbered on all function calls.  I assume that
>>>> userspace also considers them to get clobbered on system calls as
>>>> well.
>>>>
>>>> What are those special cases you are talking about?  Certain special
>>>> cases for entering the kernel where we do save those registers?
>>> These are the system calls that use the same stub that I have used to
>>> save the full stack (and thus all the registers).
>>>        sys_clone
>>>        sys_fork
>>>        sys_vfork
>>>        sys_sigaltstack
>>>        sys_iopl
>>>
>>>> Signal handling and ptrace single stepping are two places I would
>>>> imagine we have to enter the kernel and preserve those registers.  Is
>>>> that why you were suggesting overloading signal delivery?
>>>>
>>>> Thanks for pointing out the problem, though.  This one will be
>>>> interesting. :)
>>>>
>>>> -- Dave
>>>>
>>>>
>>
>


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
  2009-02-04 16:21         ` Dave Hansen
@ 2009-02-05  1:13           ` Masahiko Takahashi
  0 siblings, 0 replies; 16+ messages in thread
From: Masahiko Takahashi @ 2009-02-05  1:13 UTC (permalink / raw)
  To: Dave Hansen
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX

Hi Dave,

Actually, that was the latest patch I made.
You're right, it needs cleanups and I'm sorry
for using #ifdefs.
If there is anything I can do, please let me know.


Thanks,

Masahiko.


On 2009-02-04 at 08:21 -0800, Dave Hansen wrote:
> On Wed, 2009-01-28 at 11:10 +0900, Masahiko Takahashi wrote:
> > I'm now working on porting to x86_64 with help from Nauman Rafique.
> > Here is the preliminary patch. If there is someone who is interested
> > in x86_64 support, please join.
> 
> Do you have anything more recent than this?  I think there are a few
> cleanups we can add to get rid of some of the #ifdef mess.


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]       ` <090128111035.M0106630-n+Fz6uxiQ6t02ytvwG4l7tBPR1lH4CV8@public.gmane.org>
  2009-01-28 21:59         ` Serge E. Hallyn
@ 2009-02-04 16:21         ` Dave Hansen
  2009-02-05  1:13           ` Masahiko Takahashi
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2009-02-04 16:21 UTC (permalink / raw)
  To: m-takahashi-kvZsz0w9TrB8UrSeD/g0lQ
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX

On Wed, 2009-01-28 at 11:10 +0900, Masahiko Takahashi wrote:
> I'm now working on porting to x86_64 with help from Nauman Rafique.
> Here is the preliminary patch. If there is someone who is interested
> in x86_64 support, please join.

Do you have anything more recent than this?  I think there are a few
cleanups we can add to get rid of some of the #ifdef mess.  

-- Dave


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]           ` <20090128215902.GA5635-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-01-29  1:45             ` Masahiko Takahashi
  0 siblings, 0 replies; 16+ messages in thread
From: Masahiko Takahashi @ 2009-01-29  1:45 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX

Hi Serge,

Thank you for the comment.
I have checked that the patch also works on Oren's latest patchset (v13).

And, ... I notice the email's subject was incorrect. Oops.
As you know, "x86_64 support" is correct.


Thanks,

Masahiko.


On 2009-01-28 at 15:59 -0600, Serge E. Hallyn wrote:
> Quoting Masahiko Takahashi (m-takahashi-kvZsz0w9TrB8UrSeD/g0lQ@public.gmane.org):
> > Hi,
> > 
> > I'm now working on porting to x86_64 with help from Nauman Rafique.
> > Here is the preliminary patch. If there is someone who is interested
> > in x86_64 support, please join.
> 
> Cool.  Looks nice and minimal.  Thanks for doing this.
> 
> > This patch is to support x86_64 on Oren's checkpoint/restart patchset
> > (v12 on December 29th). His patchset is well implemented, so the x86_64
> > patch is only handling architecture specific registers. (Maybe I'm
> > missing something important...)
> > 
> > I've tested this patch with his test suite (self.c and rstr.c) but I'm
> > not confident especially in handling segment registers (%fs/%gs).
> >
> > This patch doesn't support external checkpointing, nor 32bit binary
> 
> Heh, and I just realized that my s390 port doesn't do self-checkpoint,
> I'd only been doing external.  Oops.
> 
> > checkpoint/restart on 64bit Linux.
> > 
> > I would appreciate any comments and feedback.
> 
> -serge


* Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]       ` <090128111035.M0106630-n+Fz6uxiQ6t02ytvwG4l7tBPR1lH4CV8@public.gmane.org>
@ 2009-01-28 21:59         ` Serge E. Hallyn
       [not found]           ` <20090128215902.GA5635-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-02-04 16:21         ` Dave Hansen
  1 sibling, 1 reply; 16+ messages in thread
From: Serge E. Hallyn @ 2009-01-28 21:59 UTC (permalink / raw)
  To: Masahiko Takahashi
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX

Quoting Masahiko Takahashi (m-takahashi-kvZsz0w9TrB8UrSeD/g0lQ@public.gmane.org):
> Hi,
> 
> I'm now working on porting to x86_64 with help from Nauman Rafique.
> Here is the preliminary patch. If there is someone who is interested
> in x86_64 support, please join.

Cool.  Looks nice and minimal.  Thanks for doing this.

> This patch is to support x86_64 on Oren's checkpoint/restart patchset
> (v12 on December 29th). His patchset is well implemented, so the x86_64
> patch is only handling architecture specific registers. (Maybe I'm
> missing something important...)
> 
> I've tested this patch with his test suite (self.c and rstr.c) but I'm
> not confident especially in handling segment registers (%fs/%gs).
>
> This patch doesn't support external checkpointing, nor 32bit binary

Heh, and I just realized that my s390 port doesn't do self-checkpoint,
I'd only been doing external.  Oops.

> checkpoint/restart on 64bit Linux.
> 
> I would appreciate any comments and feedback.

-serge


* [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
       [not found]   ` <20090127155947.GB10039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-01-28  2:10     ` Masahiko Takahashi
       [not found]       ` <090128111035.M0106630-n+Fz6uxiQ6t02ytvwG4l7tBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Masahiko Takahashi @ 2009-01-28  2:10 UTC (permalink / raw)
  To: orenl-eQaUEPhvms7ENvBUuze7eA, serue-r/Jw6+rmf7HQT0dZR+AlfA,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA

Hi,

I'm now working on porting to x86_64 with help from Nauman Rafique.
Here is the preliminary patch. If there is someone who is interested
in x86_64 support, please join.

This patch is to support x86_64 on Oren's checkpoint/restart patchset
(v12 on December 29th). His patchset is well implemented, so the x86_64
patch is only handling architecture specific registers. (Maybe I'm
missing something important...)

I've tested this patch with his test suite (self.c and rstr.c), but I'm
not confident about it, especially the handling of segment registers
(%fs/%gs).
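
For reference, the %fs base selection in the x86_64 part of the patch
boils down to the decision sketched here. This is a user-space model
with hypothetical names (model_thread, model_fs_base), and the selector
constant is only illustrative; the real code uses FS_TLS_SEL,
get_desc_base() and rdmsrl(MSR_FS_BASE, ...). The %gs case follows the
same pattern, with an extra check of the selector currently loaded.

```c
#include <stdint.h>

/* Illustrative stand-in for the kernel's FS_TLS_SEL selector value. */
#define MODEL_FS_TLS_SEL 0x63

/* Hypothetical subset of the per-thread state the patch consults. */
struct model_thread {
	unsigned fsindex;	/* selector loaded in %fs */
	uint64_t fs;		/* base saved in the thread structure */
	uint64_t tls_base;	/* base encoded in the TLS descriptor */
};

/*
 * Pick the %fs base to write into the checkpoint image:
 *  1. a TLS selector means the base lives in the TLS descriptor;
 *  2. otherwise, for the running task, the hardware MSR is authoritative;
 *  3. otherwise use the value saved at the last context switch.
 */
uint64_t model_fs_base(const struct model_thread *t, int is_current,
		       uint64_t msr_fs_base)
{
	if (t->fsindex == MODEL_FS_TLS_SEL)
		return t->tls_base;
	if (is_current)
		return msr_fs_base;
	return t->fs;
}
```

The third branch is the one that matters for external checkpointing,
where the task being saved is not the current one.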

This patch doesn't support external checkpointing, nor 32bit binary
checkpoint/restart on 64bit Linux.

I would appreciate any comments and feedback.


Thanks,

Masahiko.


In article <20090127155947.GB10039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:

> Quoting Ralph-Gordon Paul (Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX@public.gmane.org):
> > Hello,
> >
> > i'm searching for the right Mailing List for Linux checkpoint / restart.
> >
> > I'm working for the XtreemOS Project (http://www.xtreemos.eu). We want
> > to include the linux native checkpoint / restart, but it seems to not
> > work. I tested it with the delivered test applications, but they don't
> > work. The restart application (rstr) throws a segmentation fault. I
> > tried both tests from the documentation, checkpointing self and
> > checkpoint from another process. My Architecture is a Pentium D.
> >
> > Sorry if this is the wrong Mailing List, i couldn't find the right
> > place (forum or mailing list).
> >
> > Thanks for Help,
>
> Hi,
>
> Yup, this is the right list - well either this or lkml.
>
> IIUC you're saying you tested this on a 64-bit x86.  That is
> in fact not yet supported.  With Oren's patchset, you can
> only use x86_32.  (With the patch I sent last week, you could
> also do s390x.)  Support for x86_64 shouldn't take much work -
> I think Dave was going to see about such a port, and Oren
> may have one at the ready, but you could also do it yourself
> if you're so inclined.



Against Oren's patchset v12.

ToDo:
1. support external checkpoint (Nauman Rafique is working on it now)
2. support 32bit binary checkpoint/restart on 64bit Linux


Signed-off-by: Masahiko Takahashi <m-takahashi-kvZsz0w9TrB8UrSeD/g0lQ@public.gmane.org>
---
 arch/x86/include/asm/unistd_64.h |    4 +++
 arch/x86/mm/checkpoint.c         |   47 ++++++++++++++++++++++++++++++-------
 arch/x86/mm/restart.c            |   29 +++++++++++++++++------
 checkpoint/Kconfig               |    2 +-
 4 files changed, 64 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index d2e415e..fe7174d 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -653,6 +653,10 @@ __SYSCALL(__NR_dup3, sys_dup3)
 __SYSCALL(__NR_pipe2, sys_pipe2)
 #define __NR_inotify_init1			294
 __SYSCALL(__NR_inotify_init1, sys_inotify_init1)
+#define __NR_checkpoint				295
+__SYSCALL(__NR_checkpoint, sys_checkpoint)
+#define __NR_restart				296
+__SYSCALL(__NR_restart, sys_restart)
 
 
 #ifndef __NO_STUBS
diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
index 50bde9a..28fb08a 100644
--- a/arch/x86/mm/checkpoint.c
+++ b/arch/x86/mm/checkpoint.c
@@ -10,6 +10,7 @@
 
 #include <asm/desc.h>
 #include <asm/i387.h>
+#include <asm/prctl.h>
 
 #include <linux/checkpoint.h>
 #include <linux/checkpoint_hdr.h>
@@ -62,12 +63,6 @@ int cr_write_thread(struct cr_ctx *ctx, struct task_struct *t)
 	return ret;
 }
 
-#ifdef CONFIG_X86_64
-
-#error "CONFIG_X86_64 unsupported yet."
-
-#else	/* !CONFIG_X86_64 */
-
 static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
 {
 	struct thread_struct *thread = &t->thread;
@@ -84,17 +79,52 @@ static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
 	hh->ip = regs->ip;
 	hh->cs = regs->cs;
 	hh->flags = regs->flags;
-	hh->sp = regs->sp;
 	hh->ss = regs->ss;
 
+#ifdef CONFIG_X86_64
+	hh->sp = read_pda (oldrsp);
+	hh->r8 = regs->r8;
+	hh->r9 = regs->r9;
+	hh->r10 = regs->r10;
+	hh->r11 = regs->r11;
+	hh->r12 = regs->r12;
+	hh->r13 = regs->r13;
+	hh->r14 = regs->r14;
+	hh->r15 = regs->r15;
+	hh->ds = thread->ds;
+	hh->es = thread->es;
+#else /* !CONFIG_X86_64 */
+	hh->sp = regs->sp;
 	hh->ds = regs->ds;
 	hh->es = regs->es;
+#endif /* CONFIG_X86_64 */
 
 	/*
 	 * for checkpoint in process context (from within a container)
 	 * the GS and FS registers should be saved from the hardware;
 	 * otherwise they are already sabed on the thread structure
 	 */
+#ifdef CONFIG_X86_64
+	if (thread->fsindex == FS_TLS_SEL)
+		hh->fs = get_desc_base(&thread->tls_array[FS_TLS]);
+	else if (t == current)
+		rdmsrl(MSR_FS_BASE, hh->fs);
+	else
+		hh->fs = thread->fs;
+
+	if (thread->gsindex == GS_TLS_SEL)
+		hh->gs = get_desc_base(&thread->tls_array[GS_TLS]);
+	else if (t == current) {
+		unsigned gsindex;
+
+		savesegment(gs, gsindex);
+		if (gsindex)
+			rdmsrl(MSR_KERNEL_GS_BASE, hh->gs);
+		else
+			hh->gs = thread->gs;
+	} else
+		hh->gs = thread->gs;
+#else /* !CONFIG_X86_64 */
 	if (t == current) {
 		savesegment(gs, hh->gs);
 		savesegment(fs, hh->fs);
@@ -102,6 +132,7 @@ static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
 		hh->gs = thread->gs;
 		hh->fs = thread->fs;
 	}
+#endif /* CONFIG_X86_64 */
 
 	/*
 	 * for checkpoint in process context (from within a container),
@@ -184,8 +215,6 @@ static int cr_write_cpu_fpu(struct cr_ctx *ctx, struct task_struct *t)
 	return ret;
 }
 
-#endif	/* CONFIG_X86_64 */
-
 /* dump the cpu state and registers of a given task */
 int cr_write_cpu(struct cr_ctx *ctx, struct task_struct *t)
 {
diff --git a/arch/x86/mm/restart.c b/arch/x86/mm/restart.c
index a682a1d..032ffd1 100644
--- a/arch/x86/mm/restart.c
+++ b/arch/x86/mm/restart.c
@@ -10,6 +10,8 @@
 
 #include <asm/desc.h>
 #include <asm/i387.h>
+#include <asm/prctl.h>
+#include <asm/proto.h>
 
 #include <linux/checkpoint.h>
 #include <linux/checkpoint_hdr.h>
@@ -77,12 +79,6 @@ int cr_read_thread(struct cr_ctx *ctx)
 	return ret;
 }
 
-#ifdef CONFIG_X86_64
-
-#error "CONFIG_X86_64 unsupported yet."
-
-#else	/* !CONFIG_X86_64 */
-
 static int cr_load_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
 {
 	struct thread_struct *thread = &t->thread;
@@ -95,19 +91,38 @@ static int cr_load_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
 	regs->di = hh->di;
 	regs->bp = hh->bp;
 	regs->ax = hh->ax;
+#ifdef CONFIG_X86_64
+	regs->r8 = hh->r8;
+	regs->r9 = hh->r9;
+	regs->r10 = hh->r10;
+	regs->r11 = hh->r11;
+	regs->r12 = hh->r12;
+	regs->r13 = hh->r13;
+	regs->r14 = hh->r14;
+	regs->r15 = hh->r15;
+	thread->ds = hh->ds;
+	thread->es = hh->es;
+#else
 	regs->ds = hh->ds;
 	regs->es = hh->es;
+#endif /* CONFIG_X86_64 */
 	regs->orig_ax = hh->orig_ax;
 	regs->ip = hh->ip;
 	regs->cs = hh->cs;
 	regs->flags = hh->flags;
 	regs->sp = hh->sp;
+	write_pda(oldrsp, hh->sp);
 	regs->ss = hh->ss;
 
 	thread->gs = hh->gs;
 	thread->fs = hh->fs;
+#ifdef CONFIG_X86_64
+	do_arch_prctl(t, ARCH_SET_FS, hh->fs);
+	do_arch_prctl(t, ARCH_SET_GS, hh->gs);
+#else
 	loadsegment(gs, hh->gs);
 	loadsegment(fs, hh->fs);
+#endif
 
 	return 0;
 }
@@ -166,8 +181,6 @@ static int cr_read_cpu_fpu(struct cr_ctx *ctx, struct task_struct *t)
 	return 0;
 }
 
-#endif	/* CONFIG_X86_64 */
-
 /* read the cpu state and registers for the current task */
 int cr_read_cpu(struct cr_ctx *ctx)
 {
diff --git a/checkpoint/Kconfig b/checkpoint/Kconfig
index ffaa635..bc7f09f 100644
--- a/checkpoint/Kconfig
+++ b/checkpoint/Kconfig
@@ -1,7 +1,7 @@
 config CHECKPOINT_RESTART
 	prompt "Enable checkpoint/restart (EXPERIMENTAL)"
 	def_bool n
-	depends on X86_32 && EXPERIMENTAL
+	depends on X86 && EXPERIMENTAL
 	help
 	  Application checkpoint/restart is the ability to save the
 	  state of a running application so that it can later resume
-- 
1.5.2.5


end of thread, other threads:[~2009-03-20 17:21 UTC | newest]

Thread overview: 16+ messages
2009-02-07  0:17 [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart) Nauman Rafique
     [not found] ` <20090207001609.8168.14884.stgit-AP77eCFSSktSzHKm+aFRNNkmqwFzkYv6@public.gmane.org>
2009-02-09 17:53   ` Jim Winget
     [not found]     ` <f4192e520902090953x43a98134hfaa8443d586a32a6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-02-09 19:25       ` Mike Waychison
     [not found]         ` <49908323.3090606-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2009-02-09 20:14           ` Cedric Le Goater
     [not found]             ` <49908EA0.2080901-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-02-10 10:25               ` Louis Rilling
2009-02-09 18:02   ` Dave Hansen
2009-02-09 18:06     ` Dave Hansen
2009-02-10 22:27     ` Nauman Rafique
     [not found]       ` <e98e18940902101427i7459a7edke4fdd8404e2ef642-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-02-11  3:34         ` Nauman Rafique
     [not found]           ` <e98e18940902101934o6f93230ag7226da6013afd20-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-03-18  6:56             ` Oren Laadan
     [not found]               ` <49C09B03.6040403-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-03-20 17:21                 ` Nauman Rafique
  -- strict thread matches above, loose matches on Subject: below --
2009-01-27  9:12 Checkpoint / Restart Ralph-Gordon Paul
2009-01-27 15:59 ` Serge E. Hallyn
     [not found]   ` <20090127155947.GB10039-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-01-28  2:10     ` [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart) Masahiko Takahashi
     [not found]       ` <090128111035.M0106630-n+Fz6uxiQ6t02ytvwG4l7tBPR1lH4CV8@public.gmane.org>
2009-01-28 21:59         ` Serge E. Hallyn
     [not found]           ` <20090128215902.GA5635-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-01-29  1:45             ` Masahiko Takahashi
2009-02-04 16:21         ` Dave Hansen
2009-02-05  1:13           ` Masahiko Takahashi
