linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oren Laadan <orenl@cs.columbia.edu>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: containers@lists.linux-foundation.org, jeremy@goop.org,
	linux-kernel@vger.kernel.org, arnd@arndb.de
Subject: Re: [RFC v3][PATCH 5/9] Memory managemnet (restore)
Date: Tue, 09 Sep 2008 02:01:30 -0400	[thread overview]
Message-ID: <48C6113A.3080804@cs.columbia.edu> (raw)
In-Reply-To: <1220892596.23386.166.camel@nimitz>



Dave Hansen wrote:
> On Sat, 2008-09-06 at 23:09 -0400, Oren Laadan wrote:

[...]

>>>> +	if (writable && !(vma->vm_flags & VM_WRITE))
>>>> +		flags = vma->vm_flags | VM_WRITE;
>>>> +	else if (!writable && (vma->vm_flags & VM_WRITE))
>>>> +		flags = vma->vm_flags & ~VM_WRITE;
>>>> +	cr_debug("flags %#lx\n", flags);
>>>> +	if (flags)
>>>> +		ret = mprotect_fixup(vma, &prev, vma->vm_start,
>>>> +				     vma->vm_end, flags);
>>> Is this to fixup the same things that setup_arg_pages() uses it for?  We
>>> should probably consolidate those calls somehow.  
>> This is needed for a VMA that has modified pages but is read-only: e.g. a
>> task modifies memory then calls mprotect() to make it read-only.
>> Since during restart we will recreate this VMA as read-only, we need to
>> temporarily make it read-write to be able to read the saved contents into
>> it, and then restore the read-only protection.
>>
>> setup_arg_pages() is unrelated, and the code doesn't seem to have much in
>> common.
> 
> Have you looked at mprotect_fixup()?  It deals with two things:
> 1. altering the commit charge against RSS if the mapping is actually
>    writable.
> 2. Merging the VMA with an adjacent one if possible
> 
> We don't want to do either of these two things.  Even if we do merge the
> VMA, it will be a waste of time and energy since we'll just re-split it
> when we mprotect() again.

Your observation is correct; I chose this interface because it's really
simple and handy. I'm not worried about the performance because such VMAs
(read only but modified) are really rare, and the code can be optimized
later on.

[...]

>>>> +	/* restore original protection for this vma */
>>>> +	if (!(cr_vma->vm_flags & VM_WRITE))
>>>> +		ret = cr_vma_writable(mm, cr_vma->vm_start, cr_vma->vm_end, 0);
>>>> +
>>>> + out:
>>>> +	return ret;
>>>> +}
>>> Ugh.  Is this a security hole?  What if the user was not allowed to
>>> write to the file being mmap()'d by this VMA?  Is this a window where
>>> someone could come in and (using ptrace or something similar) write to
>>> the file?
>> Not a security hole: this is only for private memory, so it never
>> modifies the underlying file. This is related to what I explained before
>> about read-only VMAs that have modified pages.
> 
> OK, so a shared, read-only mmap() should never get into this code path.
> What if an attacker modified the checkpoint file to pretend to have
> pages for a read-only, but shared mmap().  Would this code be tricked?

VMAs of shared maps (IPC, anonymous shared) will be treated differently.

VMAs of shared files (mapped shared) are saved without their contents,
as the contents remains available on the file system !  (yes, for that
we will eventually need file system snapshots).

As for an attack that provides an altered checkpoint image: since we
(currently) don't escalate privileges, the attacker will not be able
to modify something that it doesn't have access to in the first place.

> 
>> The process is restarting, inside a container that is restarting. All
>> tasks inside should be calling sys_restart() (by design) and no other
>> process from outside should be allowed to ptrace them at this point.
> 
> Are there plans to implement this, or is it already in here somehow?

Once we get positive responses about the current patchset, the next
step is to handle multiple processes: I plan to extend the freezer
with two more state for this purpose (dumping, restarting).

> 
>> (In any case, if some other tasks ptraces this task, it can make it do
>> anything anyhow).
> 
> No.  I'm suggesting that since this lets you effectively write to
> something that is not writable, it may be a hole with which to bypass
> permissions which were set up at an earlier time.

That's a good comment, but here all we are doing here is to modify a
privately mapped/anonymous memory.

> 
>>> We copy into the process address space all the time when not in its
>>> context explicitly.  
>> Huh ?
> 
> I'm just saying that you don't need to be in a process's context in
> order to copy contents into its virtual address space.  Check out
> access_process_vm().
> 

That would be the other way to implement the restart. But, since restart
executes in task's context, it's simpler and more efficient to leverage
copy-to-user().
In terms of security, both methods brings about the same end results: the
memory is modified (perhaps bypassing the read-only property of the VMA)

Oren.

  reply	other threads:[~2008-09-09  6:01 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-04  7:57 [RFC v3][PATCH 0/9] Kernel based checkpoint/restart Oren Laadan
2008-09-04  8:02 ` [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-09-04  8:37   ` Cedric Le Goater
2008-09-04 14:42   ` Serge E. Hallyn
2008-09-04 17:32     ` Oren Laadan
2008-09-04 20:37       ` Serge E. Hallyn
2008-09-04 21:05         ` Oren Laadan
2008-09-04 22:03           ` Serge E. Hallyn
2008-09-08 15:02     ` [Devel] " Andrey Mirkin
2008-09-08 16:07       ` Cedric Le Goater
2008-09-04  8:02 ` [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart Oren Laadan
2008-09-04  9:12   ` Louis Rilling
2008-09-04 16:00     ` Serge E. Hallyn
2008-09-04 16:03   ` Serge E. Hallyn
2008-09-04 16:09     ` Dave Hansen
2008-09-04  8:03 ` [RFC v3][PATCH 3/9] x86 support for checkpoint/restart Oren Laadan
2008-09-04  8:03 ` [RFC v3][PATCH 4/9] Memory management (dump) Oren Laadan
2008-09-04 18:25   ` Dave Hansen
2008-09-07  1:54     ` Oren Laadan
2008-09-08 15:55       ` Dave Hansen
2008-09-04  8:04 ` [RFC v3][PATCH 5/9] Memory managemnet (restore) Oren Laadan
2008-09-04 18:08   ` Dave Hansen
2008-09-07  3:09     ` Oren Laadan
2008-09-08 16:49       ` Dave Hansen
2008-09-09  6:01         ` Oren Laadan [this message]
2008-09-10 21:42           ` Dave Hansen
2008-09-10 22:00             ` Cleanups for: [PATCH " Dave Hansen
2008-09-11  7:37             ` [RFC v3][PATCH " Oren Laadan
2008-09-11 15:38               ` Serge E. Hallyn
2008-09-12 16:34               ` Dave Hansen
2008-09-04  8:04 ` [RFC v3][PATCH 6/9] Checkpoint/restart: initial documentation Oren Laadan
2008-09-04  8:05 ` [RFC v3][PATCH 7/9] Infrastructure for shared objects Oren Laadan
2008-09-04  9:38   ` Louis Rilling
2008-09-04 14:23     ` Oren Laadan
2008-09-04 18:14   ` Dave Hansen
2008-09-04  8:05 ` [RFC v3][PATCH 8/9] File descriprtors (dump) Oren Laadan
2008-09-04  9:47   ` Louis Rilling
2008-09-04 14:43     ` Oren Laadan
2008-09-04 15:01   ` Dave Hansen
2008-09-04 18:41   ` Dave Hansen
2008-09-07  4:52     ` Oren Laadan
2008-09-08 16:57       ` Dave Hansen
2008-09-04  8:06 ` [RFC v3][PATCH 9/9] File descriprtors (restore) Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48C6113A.3080804@cs.columbia.edu \
    --to=orenl@cs.columbia.edu \
    --cc=arnd@arndb.de \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).