All of lore.kernel.org
 help / color / mirror / Atom feed
* [patches] VM-related fixes
@ 2012-03-05  6:37 Al Viro
  2012-03-05  6:38 ` [PATCH 1/3] aout: move setup_arg_pages() prior to reading/mapping the binary Al Viro
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Al Viro @ 2012-03-05  6:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, arve, airlied, carsteno, steiner

	I'd been crawling through VMA-related code for the last couple
of weeks; the obvious parts of fallout are in followups and IMO they
need to go into -stable as well; there's more, including some bugs I
don't know what to do about and patches that need more testing...

* aout handling needs to do setup_arg_pages() _before_ it does any mmap();
otherwise we might easily end up with bprm->vma freed by the time we
get to setup_arg_pages().  Sure, long-term we want setup_new_exec()
merged with setup_arg_pages(), but not this late in the cycle...

* anonymous shared mapping should *not* get VM_GROWS{UP,DOWN}, or we'll
end up with very unpleasant things happening.

* __unmap_hugepage_range() calls flush_tlb_range() without ->mmap_sem,
which means that we need ->page_table_lock.  Call it before dropping
->page_table_lock, not after that...

[unsolved] binder is insane, even more than usual for drivers/staging.
It stores a reference to mm at open() time, then it stores a reference to
vma at mmap() time, then it cheerfully works with that ->vma assuming that
->mmap_sem on stored reference to ->mm would suffice.  Guess what happens
if somebody opens it and then forks and does mmap in child?  Moreover, if
we fork() and have child exit(), we get ->close() called on each VMA we'd
copied into child.  Since they have stored reference to vma invalidated by
->close(), that has unpleasant side effects, to put it mildly.  And yes,
I realize that android userland probably doesn't do anything of that kind;
fat lot of good it does us...

[unsolved] a bunch of ->fault() instances are doing things that need
->mmap_sem exclusive; e.g. vm_insert_page() is called by xip_file_fault(),
drivers/misc/sgi-gru/grumain.c:gru_fault() does remap_pfn_range().
drivers/gpu/drm/gma500/framebuffer.c:psbfb_vm_fault() does
        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
and drivers/gpu/drm/ttm/ttm_bo_vm.c:ttm_bo_vm_fault() does very similar
things, same for spufs ->fault() instances.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] aout: move setup_arg_pages() prior to reading/mapping the binary
  2012-03-05  6:37 [patches] VM-related fixes Al Viro
@ 2012-03-05  6:38 ` Al Viro
  2012-03-05  6:39 ` [PATCH 2/3] VM_GROWS{UP,DOWN} shouldn't be set on shmem VMAs Al Viro
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Al Viro @ 2012-03-05  6:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/x86/ia32/ia32_aout.c |   14 +++++++-------
 fs/binfmt_aout.c          |   14 +++++++-------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index fd84387..39e4909 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -315,6 +315,13 @@ static int load_aout_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 	current->mm->free_area_cache = TASK_UNMAPPED_BASE;
 	current->mm->cached_hole_size = 0;
 
+	retval = setup_arg_pages(bprm, IA32_STACK_TOP, EXSTACK_DEFAULT);
+	if (retval < 0) {
+		/* Someone check-me: is this error path enough? */
+		send_sig(SIGKILL, current, 0);
+		return retval;
+	}
+
 	install_exec_creds(bprm);
 	current->flags &= ~PF_FORKNOEXEC;
 
@@ -410,13 +417,6 @@ beyond_if:
 
 	set_brk(current->mm->start_brk, current->mm->brk);
 
-	retval = setup_arg_pages(bprm, IA32_STACK_TOP, EXSTACK_DEFAULT);
-	if (retval < 0) {
-		/* Someone check-me: is this error path enough? */
-		send_sig(SIGKILL, current, 0);
-		return retval;
-	}
-
 	current->mm->start_stack =
 		(unsigned long)create_aout_tables((char __user *)bprm->p, bprm);
 	/* start thread */
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index a6395bd..0b39e46 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -259,6 +259,13 @@ static int load_aout_binary(struct linux_binprm * bprm, struct pt_regs * regs)
 	current->mm->free_area_cache = current->mm->mmap_base;
 	current->mm->cached_hole_size = 0;
 
+	retval = setup_arg_pages(bprm, STACK_TOP, EXSTACK_DEFAULT);
+	if (retval < 0) { 
+		/* Someone check-me: is this error path enough? */ 
+		send_sig(SIGKILL, current, 0); 
+		return retval;
+	}
+
 	install_exec_creds(bprm);
  	current->flags &= ~PF_FORKNOEXEC;
 
@@ -352,13 +359,6 @@ beyond_if:
 		return retval;
 	}
 
-	retval = setup_arg_pages(bprm, STACK_TOP, EXSTACK_DEFAULT);
-	if (retval < 0) { 
-		/* Someone check-me: is this error path enough? */ 
-		send_sig(SIGKILL, current, 0); 
-		return retval;
-	}
-
 	current->mm->start_stack =
 		(unsigned long) create_aout_tables((char __user *) bprm->p, bprm);
 #ifdef __alpha__
-- 
1.7.2.5



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] VM_GROWS{UP,DOWN} shouldn't be set on shmem VMAs
  2012-03-05  6:37 [patches] VM-related fixes Al Viro
  2012-03-05  6:38 ` [PATCH 1/3] aout: move setup_arg_pages() prior to reading/mapping the binary Al Viro
@ 2012-03-05  6:39 ` Al Viro
  2012-03-05  6:40 ` [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held Al Viro
  2012-03-06  3:38 ` [patches] VM-related fixes Arve Hjønnevåg
  3 siblings, 0 replies; 13+ messages in thread
From: Al Viro @ 2012-03-05  6:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 mm/mmap.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 3f758c7..22e1a0b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1293,6 +1293,8 @@ munmap_back:
 		pgoff = vma->vm_pgoff;
 		vm_flags = vma->vm_flags;
 	} else if (vm_flags & VM_SHARED) {
+		if (unlikely(vm_flags & (VM_GROWSDOWN|VM_GROWSUP)))
+			goto free_vma;
 		error = shmem_zero_setup(vma);
 		if (error)
 			goto free_vma;
-- 
1.7.2.5



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held
  2012-03-05  6:37 [patches] VM-related fixes Al Viro
  2012-03-05  6:38 ` [PATCH 1/3] aout: move setup_arg_pages() prior to reading/mapping the binary Al Viro
  2012-03-05  6:39 ` [PATCH 2/3] VM_GROWS{UP,DOWN} shouldn't be set on shmem VMAs Al Viro
@ 2012-03-05  6:40 ` Al Viro
  2012-03-05 20:30   ` Linus Torvalds
  2012-03-06  3:38 ` [patches] VM-related fixes Arve Hjønnevåg
  3 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2012-03-05  6:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 mm/hugetlb.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5f34bd8..a876871 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2277,8 +2277,8 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			set_page_dirty(page);
 		list_add(&page->lru, &page_list);
 	}
-	spin_unlock(&mm->page_table_lock);
 	flush_tlb_range(vma, start, end);
+	spin_unlock(&mm->page_table_lock);
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	list_for_each_entry_safe(page, tmp, &page_list, lru) {
 		page_remove_rmap(page);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held
  2012-03-05  6:40 ` [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held Al Viro
@ 2012-03-05 20:30   ` Linus Torvalds
  2012-03-05 20:53     ` Al Viro
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2012-03-05 20:30 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel

Is this safe? And why does it need it? Please add more explanations.

                  Linus

On Sun, Mar 4, 2012 at 10:40 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  mm/hugetlb.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5f34bd8..a876871 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2277,8 +2277,8 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
>                        set_page_dirty(page);
>                list_add(&page->lru, &page_list);
>        }
> -       spin_unlock(&mm->page_table_lock);
>        flush_tlb_range(vma, start, end);
> +       spin_unlock(&mm->page_table_lock);
>        mmu_notifier_invalidate_range_end(mm, start, end);
>        list_for_each_entry_safe(page, tmp, &page_list, lru) {
>                page_remove_rmap(page);
> --
> 1.7.2.5
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held
  2012-03-05 20:30   ` Linus Torvalds
@ 2012-03-05 20:53     ` Al Viro
  2012-03-09 21:06       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2012-03-05 20:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Mon, Mar 05, 2012 at 12:30:19PM -0800, Linus Torvalds wrote:
> Is this safe? And why does it need it? Please add more explanations.

a) safety - as the matter of fact, all other callers either hold either
->mmap_sem (exclusive) or ->page_table_lock.  flush_tlb_range() is
called under ->page_table_lock in a lot of places, e.g.
page_referenced_one() -> pmdp_clear_flush_young_notify() ->
-> pmdp_clear_flush_young() -> flush_tlb_range(), with
                /* go ahead even if the pmd is pmd_trans_splitting() */
                if (pmdp_clear_flush_young_notify(vma, address, pmd))
                        referenced++;
                spin_unlock(&mm->page_table_lock);
in page_referenced_one().

b) there are instances that work with page tables.  See e.g.
arch/powerpc/mm/tlb_hash32.c, flush_tlb_range() and flush_range() in there.
The same goes for uml, with a lot more extensive playing with page tables.

Almost all callers are actually fine - flush_tlb_range() may have no need
to bother playing with page tables, but it can do so safely; again, this
caller is the sole exception - everything else either has exclusive ->mmap_sem
on the mm in question, or mm->page_table_lock is held.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patches] VM-related fixes
  2012-03-05  6:37 [patches] VM-related fixes Al Viro
                   ` (2 preceding siblings ...)
  2012-03-05  6:40 ` [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held Al Viro
@ 2012-03-06  3:38 ` Arve Hjønnevåg
  2012-03-06  4:10   ` Al Viro
  3 siblings, 1 reply; 13+ messages in thread
From: Arve Hjønnevåg @ 2012-03-06  3:38 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, airlied, carsteno, steiner

On Sun, Mar 4, 2012 at 10:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>        I'd been crawling through VMA-related code for the last couple
> of weeks; the obvious parts of fallout are in followups and IMO they
> need to go into -stable as well; there's more, including some bugs I
> don't know what to do about and patches that need more testing...
>
...
> [unsolved] binder is insane, even more than usual for drivers/staging.
> It stores a reference to mm at open() time,

It stores a reference to the task struct, not the mm.

> then it stores a reference to
> vma at mmap() time, then it cheerfully works with that ->vma assuming that
> ->mmap_sem on stored reference to ->mm would suffice.  Guess what happens
> if somebody opens it and then forks and does mmap in child?

mmap succeeds, but binder_update_page_range will fail when trying to
create a transaction? (assuming you are talking about the current
version of the driver)

>  Moreover, if
> we fork() and have child exit(), we get ->close() called on each VMA we'd
> copied into child.

The driver sets VM_DONTCOPY which should avoid this.

>  Since they have stored reference to vma invalidated by
> ->close(), that has unpleasant side effects, to put it mildly.

The side effect of ->close() should be the same as when the process
that opened the driver exits. In other words, the process that
misbehaved is considered dying. It does not however send death
notifications until the driver is closed, so other processes do not
notice until they try to talk to it.

>  And yes,
> I realize that android userland probably doesn't do anything of that kind;
> fat lot of good it does us...
>

What usage case do you have in mind? The binder driver uses the file
pointer as the binder process identifier, so you need to open and mmap
the driver in every process that want to use the driver. While it is
not common to fork a process in android, the driver should work as
expected if you do (assuming each process opens and mmaps the driver
on its own and don't share the file).

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patches] VM-related fixes
  2012-03-06  3:38 ` [patches] VM-related fixes Arve Hjønnevåg
@ 2012-03-06  4:10   ` Al Viro
  2012-03-06  5:25     ` Arve Hjønnevåg
  2012-03-08 23:43     ` [PATCH] Staging: android: binder: Fix use-after-free bug Arve Hjønnevåg
  0 siblings, 2 replies; 13+ messages in thread
From: Al Viro @ 2012-03-06  4:10 UTC (permalink / raw)
  To: Arve Hj?nnev?g; +Cc: Linus Torvalds, linux-kernel, airlied, carsteno, steiner

On Mon, Mar 05, 2012 at 07:38:21PM -0800, Arve Hj?nnev?g wrote:
> > [unsolved] binder is insane, even more than usual for drivers/staging.
> > It stores a reference to mm at open() time,
> 
> It stores a reference to the task struct, not the mm.

Equivalent, for our purposes.  OK, almost - strictly speaking that avoids
some extra nasty scenarios with open/exec/mmap, but the real PITA comes from
much simpler open/fork/mmap in child anyway...

> > then it stores a reference to
> > vma at mmap() time, then it cheerfully works with that ->vma assuming that
> > ->mmap_sem on stored reference to ->mm would suffice. ?Guess what happens
> > if somebody opens it and then forks and does mmap in child?
> 
> mmap succeeds, but binder_update_page_range will fail when trying to
> create a transaction? (assuming you are talking about the current
> version of the driver)
 
Sorry, no.
                down_write(&mm->mmap_sem);
                vma = proc->vma;
                if (vma && mm != vma->vm_mm) {
does *not* do what you seem to describe; there's nothing to protect you
from proc->vma getting freed under you right between load from proc->vma
and check of vma->mm.  ->mmap_sem on the right mm would prevent that,
but this one doesn't guarantee anything.  Get preempted after the second
line quoted above and by the time you get the timeslice back, you might
have had munmap() done by another thread, with vma freed, its memory
recycled, etc.

> > ?Moreover, if
> > we fork() and have child exit(), we get ->close() called on each VMA we'd
> > copied into child.
> 
> The driver sets VM_DONTCOPY which should avoid this.

Umm...  Point taken - missed that.

> > ?Since they have stored reference to vma invalidated by
> > ->close(), that has unpleasant side effects, to put it mildly.
> 
> The side effect of ->close() should be the same as when the process
> that opened the driver exits. In other words, the process that
> misbehaved is considered dying. It does not however send death
> notifications until the driver is closed, so other processes do not
> notice until they try to talk to it.
> 
> > ?And yes,
> > I realize that android userland probably doesn't do anything of that kind;
> > fat lot of good it does us...
> >
> 
> What usage case do you have in mind? The binder driver uses the file
> pointer as the binder process identifier, so you need to open and mmap
> the driver in every process that want to use the driver. While it is
> not common to fork a process in android, the driver should work as
> expected if you do (assuming each process opens and mmaps the driver
> on its own and don't share the file).

Except that we can't make that assumption...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patches] VM-related fixes
  2012-03-06  4:10   ` Al Viro
@ 2012-03-06  5:25     ` Arve Hjønnevåg
  2012-03-06  5:57       ` Al Viro
  2012-03-08 23:43     ` [PATCH] Staging: android: binder: Fix use-after-free bug Arve Hjønnevåg
  1 sibling, 1 reply; 13+ messages in thread
From: Arve Hjønnevåg @ 2012-03-06  5:25 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, airlied, carsteno, steiner

On Mon, Mar 5, 2012 at 8:10 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Mar 05, 2012 at 07:38:21PM -0800, Arve Hj?nnev?g wrote:
>> > [unsolved] binder is insane, even more than usual for drivers/staging.
>> > It stores a reference to mm at open() time,
>>
>> It stores a reference to the task struct, not the mm.
>
> Equivalent, for our purposes.  OK, almost - strictly speaking that avoids
> some extra nasty scenarios with open/exec/mmap, but the real PITA comes from
> much simpler open/fork/mmap in child anyway...
>
>> > then it stores a reference to
>> > vma at mmap() time, then it cheerfully works with that ->vma assuming that
>> > ->mmap_sem on stored reference to ->mm would suffice. ?Guess what happens
>> > if somebody opens it and then forks and does mmap in child?
>>
>> mmap succeeds, but binder_update_page_range will fail when trying to
>> create a transaction? (assuming you are talking about the current
>> version of the driver)
>
> Sorry, no.
>                down_write(&mm->mmap_sem);
>                vma = proc->vma;
>                if (vma && mm != vma->vm_mm) {
> does *not* do what you seem to describe; there's nothing to protect you
> from proc->vma getting freed under you right between load from proc->vma
> and check of vma->mm.  ->mmap_sem on the right mm would prevent that,
> but this one doesn't guarantee anything.  Get preempted after the second
> line quoted above and by the time you get the timeslice back, you might
> have had munmap() done by another thread, with vma freed, its memory
> recycled, etc.
>

OK, if the memory got freed and then re-used by someone who stored a
value that matched a pointer to the mm struct that was just locked,
this check will fail to catch it. I can check against a cached vm_mm
member from mmap instead, assuming this will not change before
->close() is called. Does that sound reasonable, or is there a better
way to check this?

>> > ?Moreover, if
>> > we fork() and have child exit(), we get ->close() called on each VMA we'd
>> > copied into child.
>>
>> The driver sets VM_DONTCOPY which should avoid this.
>
> Umm...  Point taken - missed that.
>
>> > ?Since they have stored reference to vma invalidated by
>> > ->close(), that has unpleasant side effects, to put it mildly.
>>
>> The side effect of ->close() should be the same as when the process
>> that opened the driver exits. In other words, the process that
>> misbehaved is considered dying. It does not however send death
>> notifications until the driver is closed, so other processes do not
>> notice until they try to talk to it.
>>
>> > ?And yes,
>> > I realize that android userland probably doesn't do anything of that kind;
>> > fat lot of good it does us...
>> >
>>
>> What usage case do you have in mind? The binder driver uses the file
>> pointer as the binder process identifier, so you need to open and mmap
>> the driver in every process that want to use the driver. While it is
>> not common to fork a process in android, the driver should work as
>> expected if you do (assuming each process opens and mmaps the driver
>> on its own and don't share the file).
>
> Except that we can't make that assumption...

I'm not sure what your point is. You cannot expect the driver to
behave as expected if you don't follow its api. However, the driver
should prevent misbehaving processes from affecting other processes or
crashing the kernel. Apart from the situation described above, do you
know of any situation where this is not the case?

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patches] VM-related fixes
  2012-03-06  5:25     ` Arve Hjønnevåg
@ 2012-03-06  5:57       ` Al Viro
  2012-03-06  6:36         ` Arve Hjønnevåg
  0 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2012-03-06  5:57 UTC (permalink / raw)
  To: Arve Hj?nnev?g; +Cc: Linus Torvalds, linux-kernel, airlied, carsteno, steiner

On Mon, Mar 05, 2012 at 09:25:05PM -0800, Arve Hj?nnev?g wrote:
> > Sorry, no.
> > ? ? ? ? ? ? ? ?down_write(&mm->mmap_sem);
> > ? ? ? ? ? ? ? ?vma = proc->vma;
> > ? ? ? ? ? ? ? ?if (vma && mm != vma->vm_mm) {
> > does *not* do what you seem to describe; there's nothing to protect you
> > from proc->vma getting freed under you right between load from proc->vma
> > and check of vma->mm. ?->mmap_sem on the right mm would prevent that,
> > but this one doesn't guarantee anything. ?Get preempted after the second
> > line quoted above and by the time you get the timeslice back, you might
> > have had munmap() done by another thread, with vma freed, its memory
> > recycled, etc.
> >
> 
> OK, if the memory got freed and then re-used by someone who stored a
> value that matched a pointer to the mm struct that was just locked,
> this check will fail to catch it. I can check against a cached vm_mm
> member from mmap instead, assuming this will not change before
> ->close() is called. Does that sound reasonable, or is there a better
> way to check this?

Huh?  Sorry, I hadn't been able to parse that - what do you want to cache,
where and what do you want to check?  Again, at that point *(proc->vma)
may very well be random garbage, so looking at it would be pointless;
the value you had at ->mmap() time would be simply current->mm of mmap(2)
caller; if you want to check that it matches that of opener, fine,
but then why not do just that in ->mmap()?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patches] VM-related fixes
  2012-03-06  5:57       ` Al Viro
@ 2012-03-06  6:36         ` Arve Hjønnevåg
  0 siblings, 0 replies; 13+ messages in thread
From: Arve Hjønnevåg @ 2012-03-06  6:36 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, airlied, carsteno, steiner

On Mon, Mar 5, 2012 at 9:57 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Mar 05, 2012 at 09:25:05PM -0800, Arve Hj?nnev?g wrote:
>> > Sorry, no.
>> > ? ? ? ? ? ? ? ?down_write(&mm->mmap_sem);
>> > ? ? ? ? ? ? ? ?vma = proc->vma;
>> > ? ? ? ? ? ? ? ?if (vma && mm != vma->vm_mm) {
>> > does *not* do what you seem to describe; there's nothing to protect you
>> > from proc->vma getting freed under you right between load from proc->vma
>> > and check of vma->mm. ?->mmap_sem on the right mm would prevent that,
>> > but this one doesn't guarantee anything. ?Get preempted after the second
>> > line quoted above and by the time you get the timeslice back, you might
>> > have had munmap() done by another thread, with vma freed, its memory
>> > recycled, etc.
>> >
>>
>> OK, if the memory got freed and then re-used by someone who stored a
>> value that matched a pointer to the mm struct that was just locked,
>> this check will fail to catch it. I can check against a cached vm_mm
>> member from mmap instead, assuming this will not change before
>> ->close() is called. Does that sound reasonable, or is there a better
>> way to check this?
>
> Huh?  Sorry, I hadn't been able to parse that - what do you want to cache,

The vm_mm member of the vma.

> where and what do you want to check?  Again, at that point *(proc->vma)
> may very well be random garbage, so looking at it would be pointless;

Yes it may be random garbage, but we only have a problem if that
random garbage happens to match the mm pointer we just used.

> the value you had at ->mmap() time would be simply current->mm of mmap(2)
> caller; if you want to check that it matches that of opener, fine,
> but then why not do just that in ->mmap()?

If the opener calls exec does the mm returned by get_task_mm change?
If so, the mmap_sem I locked will not match the vma saved in mmap() so
I need the check above. If exec creates a new task struct then a check
in mmap would be sufficient, but I was not under the impression that
this happens.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] Staging: android: binder: Fix use-after-free bug
  2012-03-06  4:10   ` Al Viro
  2012-03-06  5:25     ` Arve Hjønnevåg
@ 2012-03-08 23:43     ` Arve Hjønnevåg
  1 sibling, 0 replies; 13+ messages in thread
From: Arve Hjønnevåg @ 2012-03-08 23:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, gregkh, Arve Hjønnevåg

binder_update_page_range could read freed memory if the vma of the
selected process was freed right before the check that the vma
belongs to the mm struct it just locked.

If the vm_mm pointer in that freed vma struct had also been rewritten
with a value that matched the locked mm struct, then the code would
proceed and possibly modify the freed vma.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
 drivers/staging/android/binder.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/android/binder.c b/drivers/staging/android/binder.c
index 4350425..be42bf2 100644
--- a/drivers/staging/android/binder.c
+++ b/drivers/staging/android/binder.c
@@ -288,6 +288,7 @@ struct binder_proc {
 	struct rb_root refs_by_node;
 	int pid;
 	struct vm_area_struct *vma;
+	struct mm_struct *vma_vm_mm;
 	struct task_struct *tsk;
 	struct files_struct *files;
 	struct hlist_node deferred_work_node;
@@ -633,7 +634,7 @@ static int binder_update_page_range(struct binder_proc *proc, int allocate,
 	if (mm) {
 		down_write(&mm->mmap_sem);
 		vma = proc->vma;
-		if (vma && mm != vma->vm_mm) {
+		if (vma && mm != proc->vma_vm_mm) {
 			pr_err("binder: %d: vma mm and task mm mismatch\n",
 				proc->pid);
 			vma = NULL;
@@ -2776,6 +2777,7 @@ static void binder_vma_close(struct vm_area_struct *vma)
 		     (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags,
 		     (unsigned long)pgprot_val(vma->vm_page_prot));
 	proc->vma = NULL;
+	proc->vma_vm_mm = NULL;
 	binder_defer_work(proc, BINDER_DEFERRED_PUT_FILES);
 }
 
@@ -2858,6 +2860,7 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma)
 	barrier();
 	proc->files = get_files_struct(proc->tsk);
 	proc->vma = vma;
+	proc->vma_vm_mm = vma->vm_mm;
 
 	/*printk(KERN_INFO "binder_mmap: %d %lx-%lx maps %p\n",
 		 proc->pid, vma->vm_start, vma->vm_end, proc->buffer);*/
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held
  2012-03-05 20:53     ` Al Viro
@ 2012-03-09 21:06       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2012-03-09 21:06 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel

On Mon, 2012-03-05 at 20:53 +0000, Al Viro wrote:
> On Mon, Mar 05, 2012 at 12:30:19PM -0800, Linus Torvalds wrote:
> > Is this safe? And why does it need it? Please add more explanations.
> 
> a) safety - as the matter of fact, all other callers either hold either
> ->mmap_sem (exclusive) or ->page_table_lock.  flush_tlb_range() is
> called under ->page_table_lock in a lot of places, e.g.
> page_referenced_one() -> pmdp_clear_flush_young_notify() ->
> -> pmdp_clear_flush_young() -> flush_tlb_range(), with
>                 /* go ahead even if the pmd is pmd_trans_splitting() */
>                 if (pmdp_clear_flush_young_notify(vma, address, pmd))
>                         referenced++;
>                 spin_unlock(&mm->page_table_lock);
> in page_referenced_one().
>
> b) there are instances that work with page tables.  See e.g.
> arch/powerpc/mm/tlb_hash32.c, flush_tlb_range() and flush_range() in there.
> The same goes for uml, with a lot more extensive playing with page tables.

Yes, we need to make sure they don't go away. Without any of these locks
page table pages may be freed... however, I don't see the page table
lock ensuring that anymore. The hugetlb_free_pgd_range implementation in
powerpc seemed to have old comments about expecting the PTL to be held
but that doesn't appear to be the case anymore.

In fact I always worry with the whole walking of page tables vs. freeing
them. We use sched RCU to delay the freeing so we -should- be ok if we
keep interrupts off on the walking side but it's fishy.

> Almost all callers are actually fine - flush_tlb_range() may have no need
> to bother playing with page tables, but it can do so safely; again, this
> caller is the sole exception - everything else either has exclusive ->mmap_sem
> on the mm in question, or mm->page_table_lock is held.

mmap_sem will protect vs. page tables freeing. page_table_lock on the
other hand...

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-03-09 21:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-05  6:37 [patches] VM-related fixes Al Viro
2012-03-05  6:38 ` [PATCH 1/3] aout: move setup_arg_pages() prior to reading/mapping the binary Al Viro
2012-03-05  6:39 ` [PATCH 2/3] VM_GROWS{UP,DOWN} shouldn't be set on shmem VMAs Al Viro
2012-03-05  6:40 ` [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held Al Viro
2012-03-05 20:30   ` Linus Torvalds
2012-03-05 20:53     ` Al Viro
2012-03-09 21:06       ` Benjamin Herrenschmidt
2012-03-06  3:38 ` [patches] VM-related fixes Arve Hjønnevåg
2012-03-06  4:10   ` Al Viro
2012-03-06  5:25     ` Arve Hjønnevåg
2012-03-06  5:57       ` Al Viro
2012-03-06  6:36         ` Arve Hjønnevåg
2012-03-08 23:43     ` [PATCH] Staging: android: binder: Fix use-after-free bug Arve Hjønnevåg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.