* [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
@ 2010-01-27  7:16 Serge E. Hallyn
       [not found] ` <20100127071636.GA16624-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Serge E. Hallyn @ 2010-01-27  7:16 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

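arch_setup_additional_pages() on x86_64 always fills in a 64-bit
vdso page, which is wrong for a 32-bit (TIF_IA32) task.  Call
syscall32_setup_pages() instead when restoring such a task.

With this patch, restart of 32-bit tasks (both self- and external
checkpoints) on x86-64 succeeds.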

Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 mm/mmap.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1759a7f..e3d4178 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2324,6 +2324,10 @@ int special_mapping_restore(struct ckpt_ctx *ctx,
 	 * Even that, is very basic - call arch_setup_additional_pages
 	 * requiring the same mapping (start address) as before.
 	 */
+#if defined(CONFIG_X86_64) && defined(CONFIG_COMPAT)
+	if (test_thread_flag(TIF_IA32))
+		return syscall32_setup_pages(NULL, h->vm_start, 0);
+#endif
 	return arch_setup_additional_pages(NULL, h->vm_start, 0);
 }
 #else /* !CONFIG_CHECKPOINT */
-- 
1.6.0.6


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found] ` <20100127071636.GA16624-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-01-27 14:59   ` Oren Laadan
       [not found]     ` <Pine.LNX.4.64.1001270954120.8974-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Oren Laadan @ 2010-01-27 14:59 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers


Cool !

So what do we have working now for 64 bit kernel (for 32 bit kernel
we know it works...):

	'restart'	checkpointed
	 program	  program
	----------------------------------------
	  64bit		  64bit		-> works
	  32bit		  32bit		-> works

	  64bit		  32bit		-> ?????

Does it make sense to allow the opposite transition: 'restart' starts
as a 32bit and becomes a 64bit after it restores the state from the
image ?

And what about if you checkpoint on a 32 bit kernel and try to 
restart on a 64 bit kernel, and vice versa ?  (in both cases, the
program of course is 32bit, and we can assume same physical host
for now). 

Oren.



On Wed, 27 Jan 2010, Serge E. Hallyn wrote:

> arch_setup_additional_pages() on x86_64 fills in a 64-bit
> vdso page.
> 
> With this patch, restart of 32-bit tasks (both self- and external
> checkpoints) on x86-64 succeeds.
> 
> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  mm/mmap.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1759a7f..e3d4178 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2324,6 +2324,10 @@ int special_mapping_restore(struct ckpt_ctx *ctx,
>  	 * Even that, is very basic - call arch_setup_additional_pages
>  	 * requiring the same mapping (start address) as before.
>  	 */
> +#if defined(CONFIG_X86_64) && defined(CONFIG_COMPAT)
> +	if (test_thread_flag(TIF_IA32))
> +		return syscall32_setup_pages(NULL, h->vm_start, 0);
> +#endif
>  	return arch_setup_additional_pages(NULL, h->vm_start, 0);
>  }
>  #else /* !CONFIG_CHECKPOINT */
> 


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]     ` <Pine.LNX.4.64.1001270954120.8974-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2010-01-27 20:10       ` Serge E. Hallyn
       [not found]         ` <20100127201037.GA23119-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Serge E. Hallyn @ 2010-01-27 20:10 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> Cool !
> 
> So what do we have working now for 64 bit kernel (for 32 bit kernel
> we know it works...):
> 
> 	'restart'	checkpointed
> 	 program	  program
> 	----------------------------------------
> 	  64bit		  64bit		-> works
> 	  32bit		  32bit		-> works
> 
> 	  64bit		  32bit		-> ?????

Actually the other way around works - /bin/restart_32 < 64bit.out
works just fine.  /bin/restart_64 < 32bit.out does not.  The reason
is that destroy_mm() ends up calling do_munmap on a 64-bit mapping
after the switch to 32-bit had been made, and it refuses bc
vma->vm_start > TASK_SIZE.

Perhaps getting it to work will be as simple as temporarily switching
back to 64-bit during destroy_mm().
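
The failure described above can be modelled in user space.  This is
only a sketch under stated assumptions: the MODEL_TASK_SIZE_* limits
and the model_* names are made up for illustration and are not the
kernel's definitions; the point is just that the address limit is
evaluated against the current personality, so a 64-bit mapping is
out of range once the task has been flipped to 32-bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limits only (assumptions); the real TASK_SIZE values
 * come from the kernel and depend on its configuration. */
#define MODEL_TASK_SIZE_32 0xFFFFE000ULL
#define MODEL_TASK_SIZE_64 0x00007FFFFFFFF000ULL

struct model_task {
	bool ia32;	/* stands in for test_thread_flag(TIF_IA32) */
};

/* The limit follows the task's *current* personality. */
static uint64_t model_task_size(const struct model_task *t)
{
	return t->ia32 ? MODEL_TASK_SIZE_32 : MODEL_TASK_SIZE_64;
}

/* Range check of the kind that makes the unmap fail.
 * Returns 0 if the range is acceptable, -1 otherwise. */
static int model_munmap_check(const struct model_task *t,
			      uint64_t start, uint64_t len)
{
	uint64_t limit = model_task_size(t);

	assert(len > 0);
	if (start > limit || len > limit - start)
		return -1;	/* the kernel would refuse here */
	return 0;
}
```

With these assumed limits, a mapping near the top of the 64-bit
address space passes the check for a 64-bit task and fails it for
the same task after the switch to 32-bit.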

> Does it make sense to allow the opposite transition: 'restart' starts
> as a 32bit and becomes a 64bit after it restores the state from the
> image ?
> 
> And what about if you checkpoint on a 32 bit kernel and try to 
> restart on a 64 bit kernel, and vice versa ?  (in both cases, the
> program of course is 32bit, and we can assume same physical host
> for now). 

I have no hw right now where I could test such a thing.  Do you?


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]         ` <20100127201037.GA23119-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-01-27 20:51           ` Oren Laadan
       [not found]             ` <4B60A763.4030806-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Oren Laadan @ 2010-01-27 20:51 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>> Cool !
>>
>> So what do we have working now for 64 bit kernel (for 32 bit kernel
>> we know it works...):
>>
>> 	'restart'	checkpointed
>> 	 program	  program
>> 	----------------------------------------
>> 	  64bit		  64bit		-> works
>> 	  32bit		  32bit		-> works
>>
>> 	  64bit		  32bit		-> ?????
> 
> Actually the other way around works - /bin/restart_32 < 64bit.out
> works just fine.  /bin/restart_64 < 32bit.out does not.  The reason
> is that destroy_mm() ends up calling do_munmap on a 64-bit mapping
> after the switch to 32-bit had been made, and it refuses bc
> vma->vm_start > TASK_SIZE.
> 
> Perhaps getting it to work will be as simple as temporarily switching
> back to 64-bit during destroy_mm().

Interesting, I didn't think about it.

So yes, switching temporarily should work. Alternatively, we can
call destroy_mm() earlier, if that suits us.

> 
>> Does it make sense to allow the opposite transition: 'restart' starts
>> as a 32bit and becomes a 64bit after it restores the state from the
>> image ?
>>
>> And what about if you checkpoint on a 32 bit kernel and try to 
>> restart on a 64 bit kernel, and vice versa ?  (in both cases, the
>> program of course is 32bit, and we can assume same physical host
>> for now). 
> 
> I have no hw right now where I could test such a thing.  Do you?
> 

Couldn't you use the same machine you are using - just reboot it
between checkpoint and restart with a 32bit kernel ... ?

Oren.


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]             ` <4B60A763.4030806-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-01-27 21:10               ` Serge E. Hallyn
       [not found]                 ` <20100127211052.GA27579-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Serge E. Hallyn @ 2010-01-27 21:10 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >> Cool !
> >>
> >> So what do we have working now for 64 bit kernel (for 32 bit kernel
> >> we know it works...):
> >>
> >> 	'restart'	checkpointed
> >> 	 program	  program
> >> 	----------------------------------------
> >> 	  64bit		  64bit		-> works
> >> 	  32bit		  32bit		-> works
> >>
> >> 	  64bit		  32bit		-> ?????
> > 
> > Actually the other way around works - /bin/restart_32 < 64bit.out
> > works just fine.  /bin/restart_64 < 32bit.out does not.  The reason
> > is that destroy_mm() ends up calling do_munmap on a 64-bit mapping
> > after the switch to 32-bit had been made, and it refuses bc
> > vma->vm_start > TASK_SIZE.
> > 
> > Perhaps getting it to work will be as simple as temporarily switching
> > back to 64-bit during destroy_mm().
> 
> Interesting, I didn't think about it.
> 
> So yes, switching temporarily should work. Alternatively, we can
> call destroy_mm() earlier, if that suits us.
> 
> > 
> >> Does it make sense to allow the opposite transition: 'restart' starts
> >> as a 32bit and becomes a 64bit after it restores the state from the
> >> image ?
> >>
> >> And what about if you checkpoint on a 32 bit kernel and try to 
> >> restart on a 64 bit kernel, and vice versa ?  (in both cases, the
> >> program of course is 32bit, and we can assume same physical host
> >> for now). 
> > 
> > I have no hw right now where I could test such a thing.  Do you?
> > 
> 
> Couldn't you use the same machine you are using - just reboot it
> between checkpoint and restart with a 32bit kernel ... ?

maybe - but i'm borrowing this machine (with no phys access) so don't
want to get too risky :)

can give it a shot i guess


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                 ` <20100127211052.GA27579-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-01-27 21:13                   ` Oren Laadan
       [not found]                     ` <4B60AC7E.2010908-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Oren Laadan @ 2010-01-27 21:13 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>> Cool !
>>>>
>>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
>>>> we know it works...):
>>>>
>>>> 	'restart'	checkpointed
>>>> 	 program	  program
>>>> 	----------------------------------------
>>>> 	  64bit		  64bit		-> works
>>>> 	  32bit		  32bit		-> works
>>>>
>>>> 	  64bit		  32bit		-> ?????
>>> Actually the other way around works - /bin/restart_32 < 64bit.out
>>> works just fine.  /bin/restart_64 < 32bit.out does not.  The reason
>>> is that destroy_mm() ends up calling do_munmap on a 64-bit mapping
>>> after the switch to 32-bit had been made, and it refuses bc
>>> vma->vm_start > TASK_SIZE.
>>>
>>> Perhaps getting it to work will be as simple as temporarily switching
>>> back to 64-bit during destroy_mm().
>> Interesting, I didn't think about it.
>>
>> So yes, switching temporarily should work. Alternatively, we can
>> call destroy_mm() earlier, if that suits us.
>>
>>>> Does it make sense to allow the opposite transition: 'restart' starts
>>>> as a 32bit and becomes a 64bit after it restores the state from the
>>>> image ?
>>>>
>>>> And what about if you checkpoint on a 32 bit kernel and try to 
>>>> restart on a 64 bit kernel, and vice versa ?  (in both cases, the
>>>> program of course is 32bit, and we can assume same physical host
>>>> for now). 
>>> I have no hw right now where I could test such a thing.  Do you?
>>>
>> Couldn't you use the same machine you are using - just reboot it
>> between checkpoint and restart with a 32bit kernel ... ?
> 
> maybe - but i'm borrowing this machine (with no phys access) so don't
> want to get too risky :)
> 
> can give it a shot i guess
> 

Or you can run the 32bit kernel inside a VM on that machine...


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                     ` <4B60AC7E.2010908-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-02-05 23:38                       ` Serge E. Hallyn
       [not found]                         ` <20100205233800.GA17057-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Serge E. Hallyn @ 2010-02-05 23:38 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>
> >> Serge E. Hallyn wrote:
> >>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>> Cool !
> >>>>
> >>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
> >>>> we know it works...):
> >>>>
> >>>> 	'restart'	checkpointed
> >>>> 	 program	  program
> >>>> 	----------------------------------------
> >>>> 	  64bit		  64bit		-> works
> >>>> 	  32bit		  32bit		-> works
> >>>>
> >>>> 	  64bit		  32bit		-> ?????

s/?????/Rejected/

CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
we refuse restart in restore_read_header().

-serge


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                         ` <20100205233800.GA17057-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-02-06  1:04                           ` Oren Laadan
       [not found]                             ` <4B6CC00C.2090509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Oren Laadan @ 2010-02-06  1:04 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>> Serge E. Hallyn wrote:
>>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>>>> Cool !
>>>>>>
>>>>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
>>>>>> we know it works...):
>>>>>>
>>>>>> 	'restart'	checkpointed
>>>>>> 	 program	  program
>>>>>> 	----------------------------------------
>>>>>> 	  64bit		  64bit		-> works
>>>>>> 	  32bit		  32bit		-> works
>>>>>>
>>>>>> 	  64bit		  32bit		-> ?????
> 
> s/?????/Rejected/
> 
> CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
> we refuse restart in restore_read_header().
> 
> -serge
> 

lol ... that's actually funny !

Anyway, in light of the IRC discussions, here are the cases again:


original	original	restart		target
program		kernel		program		kernel
--------	---------	--------	--------
64 bit		64 bit		64 bit		64 bit	  [0] works

32 bit		32 bit		32 bit		32 bit	  [0] works
32 bit		64 bit		32 bit		64 bit	  [0] works

32 bit		32 bit		32 bit		64 bit	  [1]
32 bit		64 bit		32 bit		32 bit	  [1]

32 bit		any		64 bit		64 bit	  [2]
64 bit		64 bit		32 bit		64 bit	  [2]

[0] The first 3 cases are "homogeneous", with conditions equal at
checkpoint and restart. AFAIK, they work.

[1] The next two cases consider a 32 bit program, and vary only the
environment - the kernel may change from 32 to 64 or back. We want
them to work.

IIUC, your comment above means that they don't work because the
CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
make 'restart' modify it, or make the kernel tolerate it.
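
A minimal sketch of the two policies, assuming a header that records
an architecture ID plus the bit-ness of the image.  The enum values,
the prog_is_32bit field and all model_* names are hypothetical; the
real CKPT_ARCH_ID constants and restore_read_header() are not shown
in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real CKPT_ARCH_ID constants are not
 * shown in this thread. */
enum model_arch_id {
	MODEL_ARCH_X86_32 = 1,
	MODEL_ARCH_X86_64 = 2,
};

struct model_ckpt_hdr {
	uint32_t arch_id;	/* architecture recorded at checkpoint */
	bool prog_is_32bit;	/* assumed field: bit-ness of the image */
};

/* Behaviour as described: any architecture mismatch is rejected. */
static int model_check_strict(const struct model_ckpt_hdr *h,
			      enum model_arch_id running)
{
	assert(h != NULL);
	return h->arch_id == (uint32_t)running ? 0 : -1;
}

/* "Make the kernel tolerate it": additionally accept a 32-bit image
 * that crosses between an x86-32 and an x86-64 kernel. */
static int model_check_tolerant(const struct model_ckpt_hdr *h,
				enum model_arch_id running)
{
	bool both_x86;

	if (model_check_strict(h, running) == 0)
		return 0;
	both_x86 = (h->arch_id == MODEL_ARCH_X86_32 ||
		    h->arch_id == MODEL_ARCH_X86_64) &&
		   (running == MODEL_ARCH_X86_32 ||
		    running == MODEL_ARCH_X86_64);
	return (both_x86 && h->prog_is_32bit) ? 0 : -1;
}
```

The other option, having 'restart' rewrite the ID before handing the
image to the kernel, would keep the strict check unchanged.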

[2] The last two cases consider the case when the restart program
itself has different bit-ness than the checkpointed program (and
transition may occur in either direction). While lower priority,
we would like this to work, too.
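
The table can also be written down as a small classifier, which may
be handy for a test matrix.  This is a sketch only: the struct and
function names are hypothetical, and the -1 result for combinations
that cannot occur (a 64 bit program on a 32 bit kernel) is an
assumption added here, not something from the table.

```c
#include <assert.h>
#include <stddef.h>

enum model_bits { MODEL_32 = 32, MODEL_64 = 64 };

struct model_case {
	enum model_bits orig_prog;	/* checkpointed program */
	enum model_bits orig_kernel;	/* kernel at checkpoint */
	enum model_bits restart_prog;	/* the 'restart' binary */
	enum model_bits target_kernel;	/* kernel at restart */
};

/* Returns 0, 1 or 2 to match footnotes [0], [1] and [2], or -1 for
 * a combination that cannot occur. */
static int model_classify(const struct model_case *c)
{
	assert(c != NULL);

	/* A 64 bit program cannot run on a 32 bit kernel. */
	if (c->orig_prog == MODEL_64 && c->orig_kernel == MODEL_32)
		return -1;
	if (c->restart_prog == MODEL_64 && c->target_kernel == MODEL_32)
		return -1;
	if (c->orig_prog == MODEL_64 && c->target_kernel == MODEL_32)
		return -1;

	if (c->orig_prog != c->restart_prog)
		return 2;	/* 'restart' differs from the image */
	if (c->orig_kernel != c->target_kernel)
		return 1;	/* only the kernel changed */
	return 0;		/* homogeneous */
}
```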

The question is whether the transition 64 -> 32 (or 32 ->64) from
the 'restart' program to the restarting task should happen in the
kernel as part of sys_restart(), or in user space using an execve()
syscall before calling sys_restart().

Doing so in user space is not trivial when threads are involved,
since the exec must then happen before the creation of threads (or
it will kill them). This will complicate the implementation of the
MakeForest() algorithm, which relies on all descendants seeing
the same data structures.

Doing so in kernel should have been easy in theory, but in practice
so far it isn't working; and it may be frowned upon by kernel people
(allowing such transition not in exec).

Oren.


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                             ` <4B6CC00C.2090509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-02-06  6:26                               ` Matt Helsley
       [not found]                                 ` <20100206062650.GG3714-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
  2010-02-06 17:09                               ` Serge E. Hallyn
  1 sibling, 1 reply; 16+ messages in thread
From: Matt Helsley @ 2010-02-06  6:26 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

On Fri, Feb 05, 2010 at 08:04:12PM -0500, Oren Laadan wrote:
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>
> >> Serge E. Hallyn wrote:
> >>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>> Serge E. Hallyn wrote:
> >>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>>> Cool !
> >>>>>>
> >>>>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
> >>>>>> we know it works...):
> >>>>>>
> >>>>>> 	'restart'	checkpointed
> >>>>>> 	 program	  program
> >>>>>> 	----------------------------------------
> >>>>>> 	  64bit		  64bit		-> works
> >>>>>> 	  32bit		  32bit		-> works
> >>>>>>
> >>>>>> 	  64bit		  32bit		-> ?????
> > 
> > s/?????/Rejected/
> > 
> > CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
> > we refuse restart in restore_read_header().
> > 
> > -serge
> > 
> 
> lol ... that's actually funny !
> 
> Anyway, in light of the IRC discussions, here are the cases again:
> 
> 
> original	original	restart		target
> program		kernel		program		kernel
> --------	---------	--------	--------
> 64 bit		64 bit		64 bit		64 bit	  [0] works
> 
> 32 bit		32 bit		32 bit		32 bit	  [0] works
> 32 bit		64 bit		32 bit		64 bit	  [0] works
> 
> 32 bit		32 bit		32 bit		64 bit	  [1]
> 32 bit		64 bit		32 bit		32 bit	  [1]
> 
> 32 bit		any		64 bit		64 bit	  [2]
> 64 bit		64 bit		32 bit		64 bit	  [2]
> 
> [0] The first 3 cases are "homogeneous", with conditions equal at
> checkpoint and restart. AFAIK, they work.
> 
> [1] The next two cases consider 32 bit program, and vary only the
> environment - the kernel may change from 32 to 64 or back. We want
> them to work.
> 
> IIUC, your comment above means that they don't work because the
> CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
> make 'restart' modify it, or make the kernel tolerate it.
> 
> [2] The last two cases consider the case when the restart program
> itself has different bit-ness than the checkpointed program (and
> transition may occur in either direction). While lower priority,
> we would like this to work, too.

Great table. Is it posted in the ckpt wiki too?

http://ckpt.wiki.kernel.org

I could take care of that for you if not. Perhaps it belongs under
the "Checklist"?

> The question is whether the transition 64 -> 32 (or 32 ->64) from
> the 'restart' program to the restarting task should happen in the
> kernel as part of sys_restart(), or in user space using an execve()
> syscall before calling sys_restart().

The recent exec bug while switching personalities highlights the value,
in my opinion, of keeping these transitions out of the restart syscall.
There's great potential for nasty, long-term bugs in any code that
deals with those kinds of switches. Keeping that code "in one place" is
the best way to avoid adding similar bugs.

> Doing so in user space is not trivial when threads are involved,
> since the exec must then happen before the creation of threads (or
> it will kill them). This will complicate the implementation of the
> MakeForest() algorithm, which relies on all descendants seeing
> the same data structures.

True -- MakeForest is already rather complicated.

As for seeing the same data structures across exec, perhaps we should
keep an fd open across exec and read/map the table from that. That means
converting from struct task* to indices in the table for one thing. I
have some RFC patches for that. It also means the table contents have to
use the same layout between 32 and 64-bit -- also quite easy.
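
To illustrate the layout point: a sketch of a table entry that uses
only fixed-width fields and table indices instead of pointers, so a
32-bit and a 64-bit 'restart' would see identical bytes.  The struct,
its fields and the helper are hypothetical and are not taken from the
RFC patches mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NO_TASK (-1)

/* Hypothetical shared-table entry: fixed-width fields only, and
 * relations expressed as indices rather than struct task pointers. */
struct model_task_entry {
	int32_t pid;
	int32_t parent_idx;	/* index into the table, or MODEL_NO_TASK */
	int32_t sibling_idx;	/* index into the table, or MODEL_NO_TASK */
	uint32_t flags;
};

/* The size must not depend on the word size of the reader. */
_Static_assert(sizeof(struct model_task_entry) == 16,
	       "entry layout must be identical for 32 and 64-bit");

/* Resolve an index back to an entry; NULL if there is no parent or
 * the stored index is out of range. */
static const struct model_task_entry *
model_parent(const struct model_task_entry *table, size_t count, size_t idx)
{
	int32_t p;

	assert(idx < count);
	p = table[idx].parent_idx;
	if (p < 0 || (size_t)p >= count)
		return NULL;
	return &table[p];
}
```

Because nothing in the entry is a pointer or a long, the table could
be read or mapped from the inherited fd by either flavour of the
binary without conversion.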

What I couldn't see was a good place to do the exec itself.

Cheers,
	-Matt Helsley


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                                 ` <20100206062650.GG3714-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
@ 2010-02-06 15:43                                   ` Oren Laadan
  2010-02-08 17:40                                   ` Oren Laadan
  1 sibling, 0 replies; 16+ messages in thread
From: Oren Laadan @ 2010-02-06 15:43 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Linux Containers

On Fri, 5 Feb 2010, Matt Helsley wrote:

> On Fri, Feb 05, 2010 at 08:04:12PM -0500, Oren Laadan wrote:
> > 
> > 
> > Serge E. Hallyn wrote:
> > > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> > >>
> > >> Serge E. Hallyn wrote:
> > >>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> > >>>> Serge E. Hallyn wrote:
> > >>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> > >>>>>> Cool !
> > >>>>>>
> > >>>>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
> > >>>>>> we know it works...):
> > >>>>>>
> > >>>>>> 	'restart'	checkpointed
> > >>>>>> 	 program	  program
> > >>>>>> 	----------------------------------------
> > >>>>>> 	  64bit		  64bit		-> works
> > >>>>>> 	  32bit		  32bit		-> works
> > >>>>>>
> > >>>>>> 	  64bit		  32bit		-> ?????
> > > 
> > > s/?????/Rejected/
> > > 
> > > CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
> > > we refuse restart in restore_read_header().
> > > 
> > > -serge
> > > 
> > 
> > lol ... that's actually funny !
> > 
> > Anyway, in light of the IRC discussions, here are the cases again:
> > 
> > 
> > original	original	restart		target
> > program		kernel		program		kernel
> > --------	---------	--------	--------
> > 64 bit		64 bit		64 bit		64 bit	  [0] works
> > 
> > 32 bit		32 bit		32 bit		32 bit	  [0] works
> > 32 bit		64 bit		32 bit		64 bit	  [0] works
> > 
> > 32 bit		32 bit		32 bit		64 bit	  [1]
> > 32 bit		64 bit		32 bit		32 bit	  [1]
> > 
> > 32 bit		any		64 bit		64 bit	  [2]
> > 64 bit		64 bit		32 bit		64 bit	  [2]
> > 
> > [0] The first 3 cases are "homogeneous", with conditions equal at
> > checkpoint and restart. AFAIK, they work.
> > 
> > [1] The next two cases consider 32 bit program, and vary only the
> > environment - the kernel may change from 32 to 64 or back. We want
> > them to work.
> > 
> > IIUC, your comment above means that they don't work because the
> > CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
> > make 'restart' modify it, or make the kernel tolerate it.
> > 
> > [2] The last two cases consider the case when the restart program
> > itself has different bit-ness than the checkpointed program (and
> > transition may occur in either direction). While lower priority,
> > we would like this to work, too.
> 
> Great table. Is it posted in the ckpt wiki too?
> 
> http://ckpt.wiki.kernel.org
> 
> I could take care of that for you if not. Perhaps it belongs under
> the "Checklist"?

Yep, I'll add it. 

> > The question is whether the transition 64 -> 32 (or 32 ->64) from
> > the 'restart' program to the restarting task should happen in the
> > kernel as part of sys_restart(), or in user space using an execve()
> > syscall before calling sys_restart().
> 
> The recent exec bug while switching personalities highlights the value,
> in my opinion, of keeping these transitions out of the restart syscall.
> There's great potential for nasty, long-term bugs in any code that
> deals with those kinds of switches. Keeping that code "in one place" is
> the best way to avoid adding similar bugs.

I agree with the concern.

There is an even stronger argument: doing it in user space via exec
removes the need to repeat the kernel code for each arch.

The only caveat is that restarting a mix of 32- and 64-bit programs
will be slower because of the exec calls. Then again, it can be
optimized when someone complains (and provides kernel code for this).

> 
> > Doing so in user space is not trivial when threads are involved,
> > since the exec must then happen before the creation of threads (or
> > it will kill them). This will complicate the implementation of the
> > MakeForest() algorithm, which relies on all descendants seeing
> > the same data structures.
> 
> True -- MakeForest is already rather complicated.
> 
> As for seeing the same data structures across exec, perhaps we should
> keep an fd open across exec and read/map the table from that. That means
> converting from struct task* to indices in the table for one thing. I
> have some RFC patches for that. It also means the table contents have to
> use the same layout between 32 and 64-bit -- also quite easy.
> 
> What I couldn't see was a good place to do the exec itself.

Can you send me the patches that you already have ?

There are a couple of more details:

* Checkpoint needs to also record the bit-ness of each process in the
tasks table

* On restart, if all tasks have the same bit-ness, then at most one
exec is needed.

* Where to place the exec ?  Ideally, for a process without threads
it would happen right before sys_restart(), but for processes that
have threads it should happen right before the fork().

* Passing an fd is a good idea - anonymous shared memory should do
the trick.

* I'm a bit concerned about security, because 'restart' will likely
be a setuid utility:
 1) need to ensure that we are exec'ing the right program (e.g. use
   the /proc/self/exe link)
 2) a new switch to 'restart' will say "use fd N for the data", but
   a user may provide arbitrary data - can they do harm ?  we'll 
   definitely need to be more cautious in handling the tasks array
   because in this case we don't construct it ourselves.
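
A rough sketch of two of the points above: sanity-checking a tasks
table that arrived through an fd, and detecting the "same bit-ness
everywhere" case.  The record layout and names are hypothetical, and
the rule that a parent must appear before its children is an
assumption made here to rule out dangling indices and cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task record as read from the inherited fd. */
struct model_task_rec {
	int32_t parent_idx;	/* -1 for a root task */
	uint32_t bits;		/* 32 or 64, recorded at checkpoint */
};

/* Reject anything 'restart' itself could not have produced.
 * Assumption: parents are stored before their children. */
static bool model_table_valid(const struct model_task_rec *t, size_t n)
{
	size_t i;

	if (t == NULL || n == 0)
		return false;
	for (i = 0; i < n; i++) {
		if (t[i].bits != 32 && t[i].bits != 64)
			return false;
		if (t[i].parent_idx < -1)
			return false;
		if (t[i].parent_idx >= 0 && (size_t)t[i].parent_idx >= i)
			return false;
	}
	return true;
}

/* True if every task has the same bit-ness, in which case at most
 * one exec is needed. */
static bool model_uniform_bits(const struct model_task_rec *t, size_t n)
{
	size_t i;

	assert(t != NULL && n > 0);
	for (i = 1; i < n; i++)
		if (t[i].bits != t[0].bits)
			return false;
	return true;
}
```

A real check would have to cover every field of the table, since a
setuid 'restart' cannot trust any of it.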

Oren.


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                             ` <4B6CC00C.2090509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2010-02-06  6:26                               ` Matt Helsley
@ 2010-02-06 17:09                               ` Serge E. Hallyn
       [not found]                                 ` <20100206170902.GA20497-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Serge E. Hallyn @ 2010-02-06 17:09 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>
> >> Serge E. Hallyn wrote:
> >>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>> Serge E. Hallyn wrote:
> >>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>>> Cool !
> >>>>>>
> >>>>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
> >>>>>> we know it works...):
> >>>>>>
> >>>>>> 	'restart'	checkpointed
> >>>>>> 	 program	  program
> >>>>>> 	----------------------------------------
> >>>>>> 	  64bit		  64bit		-> works
> >>>>>> 	  32bit		  32bit		-> works
> >>>>>>
> >>>>>> 	  64bit		  32bit		-> ?????
> > 
> > s/?????/Rejected/
> > 
> > CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
> > we refuse restart in restore_read_header().
> > 
> > -serge
> > 
> 
> lol ... that's actually funny !
> 
> Anyway, in light of the IRC discussions, here are the cases again:
> 
> 
> original	original	restart		target
> program		kernel		program		kernel
> --------	---------	--------	--------
> 64 bit		64 bit		64 bit		64 bit	  [0] works
> 
> 32 bit		32 bit		32 bit		32 bit	  [0] works
> 32 bit		64 bit		32 bit		64 bit	  [0] works
> 
> 32 bit		32 bit		32 bit		64 bit	  [1]
> 32 bit		64 bit		32 bit		32 bit	  [1]
> 
> 32 bit		any		64 bit		64 bit	  [2]
> 64 bit		64 bit		32 bit		64 bit	  [2]
> 
> [0] The first 3 cases are "homogeneous", with conditions equal at
> checkpoint and restart. AFAIK, they work.
> 
> [1] The next two cases consider 32 bit program, and vary only the
> environment - the kernel may change from 32 to 64 or back. We want
> them to work.
> 
> IIUC, your comment above means that they don't work because the
> CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
> make 'restart' modify it, or make the kernel tolerate it.

Well, you'd think so, but we also check for uts->machine, and want
to eventually check for kernel config, both of which are obviously
different.

After I comment out the obvious offending checks, it still fails to
restart from x86-32->x86-64.  I can spend some time next week figuring
out what we're not quite doing right, as there shouldn't be a
problem really.  But do we definitely want to go out of our way to try
and mask out the differences in this case, while trying to detect
cpu differences between two x86-32's for instance?

-serge


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                                 ` <20100206170902.GA20497-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2010-02-08 14:43                                   ` Oren Laadan
       [not found]                                     ` <4B7022FE.4060704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Oren Laadan @ 2010-02-08 14:43 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>> Serge E. Hallyn wrote:
>>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>>>> Serge E. Hallyn wrote:
>>>>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>>>>>> Cool !
>>>>>>>>
>>>>>>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
>>>>>>>> we know it works...):
>>>>>>>>
>>>>>>>> 	'restart'	checkpointed
>>>>>>>> 	 program	  program
>>>>>>>> 	----------------------------------------
>>>>>>>> 	  64bit		  64bit		-> works
>>>>>>>> 	  32bit		  32bit		-> works
>>>>>>>>
>>>>>>>> 	  64bit		  32bit		-> ?????
>>> s/?????/Rejected/
>>>
>>> CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
>>> we refuse restart in restore_read_header().
>>>
>>> -serge
>>>
>> lol ... that's actually funny !
>>
>> Anyway, in light of the IRC discussions, here are the cases again:
>>
>>
>> original	original	restart		target
>> program		kernel		program		kernel
>> --------	---------	--------	--------
>> 64 bit		64 bit		64 bit		64 bit	  [0] works
>>
>> 32 bit		32 bit		32 bit		32 bit	  [0] works
>> 32 bit		64 bit		32 bit		64 bit	  [0] works
>>
>> 32 bit		32 bit		32 bit		64 bit	  [1]
>> 32 bit		64 bit		32 bit		32 bit	  [1]
>>
>> 32 bit		any		64 bit		64 bit	  [2]
>> 64 bit		64 bit		32 bit		64 bit	  [2]
>>
>> [0] The first 3 cases are "homogeneous", with conditions equal at
>> checkpoint and restart. AFAIK, they work.
>>
>> [1] The next two cases consider 32 bit program, and vary only the
>> environment - the kernel may change from 32 to 64 or back. We want
>> them to work.
>>
>> IIUC, your comment above means that they don't work because the
>> CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
>> make 'restart' modify it, or make the kernel tolerate it.
> 
> Well, you'd think so, but we also check for uts->machine, and want
> to eventually check for kernel config, both of which are obviously
> different.

Then we'll have to take that into account when we get to
checking those other fields as well.

> 
> After I comment out the obvious offending checks, it still fails to
> restart from x86-32 -> x86-64.  I can spend some time next week figuring
> out what we're not quite doing right as there shouldn't be a
> problem really.  But do we definitely want to go out of our way to try
> and mask out the differences in this case, while trying to detect
> cpu differences between two x86-32's for instance?

I agree, there shouldn't be a problem really, and I expect this to
be a very useful feature for migration/fault-tolerance.

Checking for differences between CPUs is a separate issue, and is
orthogonal to migration (of 32bit programs) between 32 and 64 bit
kernels.

I tend to answer "yes" - we should eventually refuse restart if we
detect that the "configuration" at restart time differs from that at
checkpoint time "sufficiently".

For now, "configuration" is very basic: just the architecture. I would
like it to also include cpu features, kernel features, fpu capabilities...
And "sufficiently" is only vaguely defined, because I don't know enough
to describe it more precisely.

Ideally there will be some clever user-space logic that will detect
the differences and make a decision. And, yes, it will have to take a
lot of details into account...

Oren.


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                                     ` <4B7022FE.4060704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-02-08 15:31                                       ` Serge E. Hallyn
       [not found]                                         ` <20100208153145.GB9120-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Serge E. Hallyn @ 2010-02-08 15:31 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> >Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>
> >>Serge E. Hallyn wrote:
> >>>Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>Serge E. Hallyn wrote:
> >>>>>Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>>>Serge E. Hallyn wrote:
> >>>>>>>Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>>>>>Cool !
> >>>>>>>>
> >>>>>>>>So what do we have working now for 64 bit kernel (for 32 bit kernel
> >>>>>>>>we know it works...):
> >>>>>>>>
> >>>>>>>>	'restart'	checkpointed
> >>>>>>>>	 program	  program
> >>>>>>>>	----------------------------------------
> >>>>>>>>	  64bit		  64bit		-> works
> >>>>>>>>	  32bit		  32bit		-> works
> >>>>>>>>
> >>>>>>>>	  64bit		  32bit		-> ?????
> >>>s/?????/Rejected/
> >>>
> >>>CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
> >>>we refuse restart in restore_read_header().
> >>>
> >>>-serge
> >>>
> >>lol ... that's actually funny !
> >>
> >>Anyway, in light of the IRC discussions, here are the cases again:
> >>
> >>
> >>original	original	restart		target
> >>program		kernel		program		kernel
> >>--------	---------	--------	--------
> >>64 bit		64 bit		64 bit		64 bit	  [0] works
> >>
> >>32 bit		32 bit		32 bit		32 bit	  [0] works
> >>32 bit		64 bit		32 bit		64 bit	  [0] works
> >>
> >>32 bit		32 bit		32 bit		64 bit	  [1]
> >>32 bit		64 bit		32 bit		32 bit	  [1]
> >>
> >>32 bit		any		64 bit		64 bit	  [2]
> >>64 bit		64 bit		32 bit		64 bit	  [2]
> >>
> >>[0] The first 3 cases are "homogeneous", with conditions equal at
> >>checkpoint and restart. AFAIK, they work.
> >>
> >>[1] The next two cases consider 32 bit program, and vary only the
> >>environment - the kernel may change from 32 to 64 or back. We want
> >>them to work.
> >>
> >>IIUC, your comment above means that they don't work because the
> >>CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
> >>make 'restart' modify it, or make the kernel tolerate it.
> >
> >Well, you'd think so, but we also check for uts->machine, and want
> >to eventually check for kernel config, both of which are obviously
> >different.
> 
> Then we'll have to take that into account when we get to also
> check those other fields.
> 
> >
> >After I comment out the obvious offending checks, it still fails to
> >restart from x86-32 -> x86-64.  I can spend some time next week figuring
> >out what we're not quite doing right as there shouldn't be a
> >problem really.  But do we definitely want to go out of our way to try
> >and mask out the differences in this case, while trying to detect
> >cpu differences between two x86-32's for instance?
> 
> I agree, there shouldn't be a problem really, and I expect this to
> be a very useful feature for migration/fault-tolerance.

Maybe, but then perhaps this is the first case where we should be
using a userspace checkpoint image rewriter to help us out.  Otherwise
we'll need to hardcode in the kernel that a task which was
checkpointed on x86_32 may be restarted on x86_64, but must have
TIF_IA32 added to its thread_flags;  etc.  Should be doable, but
kind of ugly...
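
To make the "hardcode it in the kernel" option concrete, here is a
rough user-space sketch of what such a header check might look like.
The enum values, the function name check_image_arch() and the
set_compat out-parameter are all made up for illustration; this is
not the existing restore_read_header() code, and it only covers the
x86_32 -> x86_64 direction mentioned above.

```c
#include <assert.h>

/* Hypothetical arch ids; the real CKPT_ARCH_ID values are not shown here. */
enum arch_id {
	ARCH_X86_32 = 1,
	ARCH_X86_64 = 2,
};

/*
 * Decide whether an image taken on 'image' may be restarted on 'host'.
 * Returns 0 if the restart may proceed, -1 if it must be refused.
 * *set_compat is set to 1 when every restored task would need the
 * 32-bit (TIF_IA32-style) personality on the host.
 */
int check_image_arch(enum arch_id image, enum arch_id host, int *set_compat)
{
	assert(set_compat);
	*set_compat = 0;

	if (image == host)
		return 0;	/* homogeneous: nothing special to do */

	if (image == ARCH_X86_32 && host == ARCH_X86_64) {
		*set_compat = 1;	/* 32-bit image on a 64-bit kernel */
		return 0;
	}

	/*
	 * x86_64 -> x86_32 would need per-task information (are all
	 * tasks 32-bit?), which this sketch does not have: refuse.
	 */
	return -1;
}
```

The special case in the middle is the ugly part: the kernel has to
know about one particular pair of architectures.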

-serge


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                                         ` <20100208153145.GB9120-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-02-08 16:17                                           ` Oren Laadan
       [not found]                                             ` <4B703936.3010200-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Oren Laadan @ 2010-02-08 16:17 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>> Serge E. Hallyn wrote:
>>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>>>> Serge E. Hallyn wrote:
>>>>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>>>>>> Serge E. Hallyn wrote:
>>>>>>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>>>>>>>> Cool !
>>>>>>>>>>
>>>>>>>>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
>>>>>>>>>> we know it works...):
>>>>>>>>>>
>>>>>>>>>> 	'restart'	checkpointed
>>>>>>>>>> 	 program	  program
>>>>>>>>>> 	----------------------------------------
>>>>>>>>>> 	  64bit		  64bit		-> works
>>>>>>>>>> 	  32bit		  32bit		-> works
>>>>>>>>>>
>>>>>>>>>> 	  64bit		  32bit		-> ?????
>>>>> s/?????/Rejected/
>>>>>
>>>>> CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
>>>>> we refuse restart in restore_read_header().
>>>>>
>>>>> -serge
>>>>>
>>>> lol ... that's actually funny !
>>>>
>>>> Anyway, in light of the IRC discussions, here are the cases again:
>>>>
>>>>
>>>> original	original	restart		target
>>>> program		kernel		program		kernel
>>>> --------	---------	--------	--------
>>>> 64 bit		64 bit		64 bit		64 bit	  [0] works
>>>>
>>>> 32 bit		32 bit		32 bit		32 bit	  [0] works
>>>> 32 bit		64 bit		32 bit		64 bit	  [0] works
>>>>
>>>> 32 bit		32 bit		32 bit		64 bit	  [1]
>>>> 32 bit		64 bit		32 bit		32 bit	  [1]
>>>>
>>>> 32 bit		any		64 bit		64 bit	  [2]
>>>> 64 bit		64 bit		32 bit		64 bit	  [2]
>>>>
>>>> [0] The first 3 cases are "homogeneous", with conditions equal at
>>>> checkpoint and restart. AFAIK, they work.
>>>>
>>>> [1] The next two cases consider 32 bit program, and vary only the
>>>> environment - the kernel may change from 32 to 64 or back. We want
>>>> them to work.
>>>>
>>>> IIUC, your comment above means that they don't work because the
>>>> CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
>>>> make 'restart' modify it, or make the kernel tolerate it.
      ^^^^^^^^^^^^^^^^^^^^^^^^
---->

>>> Well, you'd think so, but we also check for uts->machine, and want
>>> to eventually check for kernel config, both of which are obviously
>>> different.
>> Then we'll have to take that into account when we get to also
>> check those other fields.
>>
>>> After I comment out the obvious offending checks, it still fails to
>>> restart from x86-32 -> x86-64.  I can spend some time next week figuring
>>> out what we're not quite doing right as there shouldn't be a
>>> problem really.  But do we definitely want to go out of our way to try
>>> and mask out the differences in this case, while trying to detect
>>> cpu differences between two x86-32's for instance?
>> I agree, there shouldn't be a problem really, and I expect this to
>> be a very useful feature for migration/fault-tolerance.
> 
> Maybe, but then perhaps this is the first case where we should be
> using a userspace checkpoint image rewriter to help us out.  Otherwise
> we'll need to hardcode in the kernel that a task which was
> checkpointed on X86_32 should, on x86_64, have TIF_IA32 added to
> the thread_flags but may be restarted;  etc.  Should be doable, but
> kind of ugly...

Indeed. I offered that path above :)

Since we are going to need the bit-ness of a task for the tree
creation as well, how about:

1) Add the bit-ness property to the pids_arr[], e.g. as a flags
field (we may need to use it for other stuff later).

2) 'restart' already examines and possibly modifies pids_arr[],
so in transition from 32->64 it will add that flag, and in the
opposite transition it will check/remove that flag.

3) 'restart' will also change the header architecture as needed.

4) The kernel will verify that the bitness reported in pids_arr[]
is the same as the actual process. (This is just a sanity check,
of course).

Later we'll also make 'restart' use that bit-ness information to
decide whether an exec() is needed to change its own bit-ness.
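
A rough sketch of steps 1, 2 and 4, written as plain user-space C.
Everything here is an assumption for illustration: the simplified
struct pids_entry, the PIDS_FLAG_COMPAT bit, and the helper names do
not exist in the current code. The flag is taken to mean "32-bit task
on a 64-bit kernel", so it is absent in images from a 32-bit kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIDS_FLAG_COMPAT 0x1u	/* hypothetical: 32-bit task on 64-bit kernel */

struct pids_entry {		/* hypothetical, simplified pids_arr[] element */
	int32_t vpid;
	uint32_t flags;
};

/*
 * Step 2: 'restart' adjusts the flag when the kernel bit-ness changes.
 * Returns 0 on success, -1 if the image cannot run on the target.
 */
int fixup_bitness(struct pids_entry *arr, size_t n, int from_bits, int to_bits)
{
	size_t i;

	assert(arr || n == 0);
	if (from_bits == to_bits)
		return 0;

	if (from_bits == 32 && to_bits == 64) {
		/* every task becomes a compat task on the new kernel */
		for (i = 0; i < n; i++)
			arr[i].flags |= PIDS_FLAG_COMPAT;
		return 0;
	}

	/* 64 -> 32: check first, so the array is untouched on failure */
	for (i = 0; i < n; i++)
		if (!(arr[i].flags & PIDS_FLAG_COMPAT))
			return -1;	/* a 64-bit task cannot move to 32-bit */
	for (i = 0; i < n; i++)
		arr[i].flags &= ~PIDS_FLAG_COMPAT;
	return 0;
}

/* Step 4: kernel-side sanity check of one entry against the real task. */
int bitness_matches(const struct pids_entry *e, int task_is_compat)
{
	return !!(e->flags & PIDS_FLAG_COMPAT) == !!task_is_compat;
}
```

Step 3 (rewriting the header architecture) would sit next to the
call to fixup_bitness() in 'restart'.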

Oren.


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                                 ` <20100206062650.GG3714-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
  2010-02-06 15:43                                   ` Oren Laadan
@ 2010-02-08 17:40                                   ` Oren Laadan
  1 sibling, 0 replies; 16+ messages in thread
From: Oren Laadan @ 2010-02-08 17:40 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Linux Containers, Nathan Lynch


[snip]

>> original	original	restart		target
>> program		kernel		program		kernel
>> --------	---------	--------	--------
>> 64 bit		64 bit		64 bit		64 bit	  [0] works
>>
>> 32 bit		32 bit		32 bit		32 bit	  [0] works
>> 32 bit		64 bit		32 bit		64 bit	  [0] works
>>
>> 32 bit		32 bit		32 bit		64 bit	  [1]
>> 32 bit		64 bit		32 bit		32 bit	  [1]
>>
>> 32 bit		any		64 bit		64 bit	  [2]
>> 64 bit		64 bit		32 bit		64 bit	  [2]
>>
>> [0] The first 3 cases are "homogeneous", with conditions equal at
>> checkpoint and restart. AFAIK, they work.
>>
>> [1] The next two cases consider 32 bit program, and vary only the
>> environment - the kernel may change from 32 to 64 or back. We want
>> them to work.
>>
>> IIUC, your comment above means that they don't work because the
>> CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
>> make 'restart' modify it, or make the kernel tolerate it.
>>
>> [2] The last two cases consider the case when the restart program
>> itself has different bit-ness than the checkpointed program (and
>> transition may occur in either direction). While lower priority,
>> we would like this to work, too.
> 
> Great table. Is it posted in the ckpt wiki too?
> 

I updated the wiki:
     http://ckpt.wiki.kernel.org/index.php/Architecture

For powerpc I just assumed it's like x86 ... :)
Can you please approve or modify for archs other than x86-{32,64} ?
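
For anyone checking the table for another arch, it can be read as a
small classification function. The sketch below is only my reading of
the table; the enum and the function name classify_restart() are made
up, and the "invalid" case (a 64-bit task on a 32-bit kernel) is not
in the table but added for completeness.

```c
#include <assert.h>

/* Hypothetical names; the numbers match the [0]/[1]/[2] notes in the table. */
enum restart_case {
	CASE_INVALID = -1,		/* 64-bit task on a 32-bit kernel */
	CASE_HOMOGENEOUS = 0,		/* [0] same conditions at both ends */
	CASE_KERNEL_CHANGED = 1,	/* [1] only the kernel bit-ness differs */
	CASE_RESTART_MISMATCH = 2,	/* [2] 'restart' and program differ */
};

enum restart_case classify_restart(int prog_bits, int orig_kernel_bits,
				   int restart_bits, int target_kernel_bits)
{
	assert(prog_bits == 32 || prog_bits == 64);
	assert(restart_bits == 32 || restart_bits == 64);

	/* nothing 64-bit can run on a 32-bit kernel */
	if (prog_bits > orig_kernel_bits ||
	    prog_bits > target_kernel_bits ||
	    restart_bits > target_kernel_bits)
		return CASE_INVALID;

	if (restart_bits != prog_bits)
		return CASE_RESTART_MISMATCH;

	if (orig_kernel_bits != target_kernel_bits)
		return CASE_KERNEL_CHANGED;

	return CASE_HOMOGENEOUS;
}
```

For example, classify_restart(32, 32, 32, 64) lands in case [1], the
one we want working next.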

Thanks,

Oren.


* Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
       [not found]                                             ` <4B703936.3010200-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-02-09 14:54                                               ` Serge E. Hallyn
  0 siblings, 0 replies; 16+ messages in thread
From: Serge E. Hallyn @ 2010-02-09 14:54 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> >Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>
> >>Serge E. Hallyn wrote:
> >>>Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>Serge E. Hallyn wrote:
> >>>>>Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>>>Serge E. Hallyn wrote:
> >>>>>>>Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>>>>>Serge E. Hallyn wrote:
> >>>>>>>>>Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>>>>>>>>Cool !
> >>>>>>>>>>
> >>>>>>>>>>So what do we have working now for 64 bit kernel (for 32 bit kernel
> >>>>>>>>>>we know it works...):
> >>>>>>>>>>
> >>>>>>>>>>	'restart'	checkpointed
> >>>>>>>>>>	 program	  program
> >>>>>>>>>>	----------------------------------------
> >>>>>>>>>>	  64bit		  64bit		-> works
> >>>>>>>>>>	  32bit		  32bit		-> works
> >>>>>>>>>>
> >>>>>>>>>>	  64bit		  32bit		-> ?????
> >>>>>s/?????/Rejected/
> >>>>>
> >>>>>CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
> >>>>>we refuse restart in restore_read_header().
> >>>>>
> >>>>>-serge
> >>>>>
> >>>>lol ... that's actually funny !
> >>>>
> >>>>Anyway, in light of the IRC discussions, here are the cases again:
> >>>>
> >>>>
> >>>>original	original	restart		target
> >>>>program		kernel		program		kernel
> >>>>--------	---------	--------	--------
> >>>>64 bit		64 bit		64 bit		64 bit	  [0] works
> >>>>
> >>>>32 bit		32 bit		32 bit		32 bit	  [0] works
> >>>>32 bit		64 bit		32 bit		64 bit	  [0] works
> >>>>
> >>>>32 bit		32 bit		32 bit		64 bit	  [1]
> >>>>32 bit		64 bit		32 bit		32 bit	  [1]
> >>>>
> >>>>32 bit		any		64 bit		64 bit	  [2]
> >>>>64 bit		64 bit		32 bit		64 bit	  [2]
> >>>>
> >>>>[0] The first 3 cases are "homogeneous", with conditions equal at
> >>>>checkpoint and restart. AFAIK, they work.
> >>>>
> >>>>[1] The next two cases consider 32 bit program, and vary only the
> >>>>environment - the kernel may change from 32 to 64 or back. We want
> >>>>them to work.
> >>>>
> >>>>IIUC, your comment above means that they don't work because the
> >>>>CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
> >>>>make 'restart' modify it, or make the kernel tolerate it.
>      ^^^^^^^^^^^^^^^^^^^^^^^^
> ---->
> 
> >>>Well, you'd think so, but we also check for uts->machine, and want
> >>>to eventually check for kernel config, both of which are obviously
> >>>different.
> >>Then we'll have to take that into account when we get to also
> >>check those other fields.
> >>
> >>>After I comment out the obvious offending checks, it still fails to
> >>>restart from x86-32 -> x86-64.  I can spend some time next week figuring
> >>>out what we're not quite doing right as there shouldn't be a
> >>>problem really.  But do we definitely want to go out of our way to try
> >>>and mask out the differences in this case, while trying to detect
> >>>cpu differences between two x86-32's for instance?
> >>I agree, there shouldn't be a problem really, and I expect this to
> >>be a very useful feature for migration/fault-tolerance.
> >
> >Maybe, but then perhaps this is the first case where we should be
> >using a userspace checkpoint image rewriter to help us out.  Otherwise
> >we'll need to hardcode in the kernel that a task which was
> >checkpointed on X86_32 should, on x86_64, have TIF_IA32 added to
> >the thread_flags but may be restarted;  etc.  Should be doable, but
> >kind of ugly...
> 
> Indeed. I offered that path above :)
> 
> Since we are going to need the bit-ness of a task for the tree
> creation as well, how about:
> 
> 1) Add the bit-ness property to the pids_arr[], e.g. as a flags
> field (we may need to use it for other stuff later).
> 
> 2) 'restart' already examines and possibly modifies pids_arr[],
> so in transition from 32->64 it will add that flag, and in the
> opposite transition it will check/remove that flag.
>
> 3) 'restart' will also change the header architecture as needed.
> 
> 4) The kernel will verify that the bitness reported in pids_arr[]
> is the same as the actual process. (This is just a sanity check,
> of course).
> 
> Later we'll also make 'restart' use that bit-ness information to
> decide whether an exec() is needed to change its own bit-ness.

It'll mean yet another arch-dependent hook used early in the
checkpoint path, but if we want to restart mixed-bit containers
I guess it's what we'll need.

Still, I really don't think it's all that mean to just say we
don't support it:  at checkpoint we refuse with a meaningful
log message including the pids of the tasks which are COMPAT, and
the end-user can use that info to checkpoint those applications
separately as subtrees, kill them, then checkpoint the container,
then restart the applications.
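
Roughly what I have in mind for the refusal, as a user-space sketch.
The struct task_info and the function list_compat_tasks() are made up
for illustration; in the kernel the compat test would come from the
task's thread flags and the message would go to the checkpoint log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct task_info {		/* hypothetical stand-in for a task */
	int pid;
	int is_compat;		/* nonzero for a 32-bit task on a 64-bit kernel */
};

/*
 * Collect the pids of all compat tasks into msg (space separated,
 * silently truncated if msg is too small) and return how many were
 * found.  A nonzero result means the checkpoint should be refused.
 */
int list_compat_tasks(const struct task_info *tasks, size_t n,
		      char *msg, size_t len)
{
	int count = 0;
	size_t i;

	assert(msg != NULL && len > 0);
	msg[0] = '\0';

	for (i = 0; i < n; i++) {
		size_t used;

		if (!tasks[i].is_compat)
			continue;
		count++;

		used = strlen(msg);
		if (used + 1 < len)
			snprintf(msg + used, len - used, "%s%d",
				 used ? " " : "", tasks[i].pid);
	}
	return count;
}
```

The caller would refuse the checkpoint when the count is nonzero and
print msg, so the user knows which subtrees to handle separately.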

If, to my surprise, there turn out to be people who care, then
we can make the necessary changes to accommodate them.  But IMO
we have enough to worry about right now.

-serge


end of thread, other threads:[~2010-02-09 14:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-27  7:16 [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64 Serge E. Hallyn
     [not found] ` <20100127071636.GA16624-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-01-27 14:59   ` Oren Laadan
     [not found]     ` <Pine.LNX.4.64.1001270954120.8974-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2010-01-27 20:10       ` Serge E. Hallyn
     [not found]         ` <20100127201037.GA23119-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-01-27 20:51           ` Oren Laadan
     [not found]             ` <4B60A763.4030806-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-01-27 21:10               ` Serge E. Hallyn
     [not found]                 ` <20100127211052.GA27579-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-01-27 21:13                   ` Oren Laadan
     [not found]                     ` <4B60AC7E.2010908-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-02-05 23:38                       ` Serge E. Hallyn
     [not found]                         ` <20100205233800.GA17057-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-06  1:04                           ` Oren Laadan
     [not found]                             ` <4B6CC00C.2090509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-02-06  6:26                               ` Matt Helsley
     [not found]                                 ` <20100206062650.GG3714-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2010-02-06 15:43                                   ` Oren Laadan
2010-02-08 17:40                                   ` Oren Laadan
2010-02-06 17:09                               ` Serge E. Hallyn
     [not found]                                 ` <20100206170902.GA20497-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2010-02-08 14:43                                   ` Oren Laadan
     [not found]                                     ` <4B7022FE.4060704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-02-08 15:31                                       ` Serge E. Hallyn
     [not found]                                         ` <20100208153145.GB9120-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-08 16:17                                           ` Oren Laadan
     [not found]                                             ` <4B703936.3010200-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-02-09 14:54                                               ` Serge E. Hallyn
