* [PATCH] mm: Limit pgd range freeing to mm->task_size
From: Catalin Marinas @ 2013-02-13 11:39 UTC
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, linux-arch, Andrea Arcangeli, Russell King

ARM processors with LPAE enabled use 3 levels of page tables, with an
entry in the top level (pgd) covering 1GB of virtual space. Because of
the branch relocation limitations on ARM, the loadable modules are
mapped 16MB below PAGE_OFFSET, making the corresponding 1GB pgd shared
between kernel modules and user space.

Since free_pgtables() is called with ceiling == 0, free_pgd_range() (and
subsequently called functions) also frees the page table
shared between user space and kernel modules (which is normally handled
by the ARM-specific pgd_free() function).

This patch changes the ceiling argument to mm->task_size for the
free_pgtables() and free_pgd_range() function calls. We cannot use
TASK_SIZE since this macro may not be a run-time constant on 64-bit
systems supporting compat applications.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Russell King <linux@arm.linux.org.uk>
---

Hi Andrew,

I posted this patch a couple of times in the past. The latest
incarnation (using mm->task_size instead of TASK_SIZE) is a result of
discussions I had with Andrea and benh at the last KS.

Do you have any comments on it? It fixes a problem on ARM (32-bit) with
LPAE.

Thanks.

 fs/exec.c | 4 ++--
 mm/mmap.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 20df02c..04c1534 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -613,7 +613,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		 * when the old and new regions overlap clear from new_end.
 		 */
 		free_pgd_range(&tlb, new_end, old_end, new_end,
-			vma->vm_next ? vma->vm_next->vm_start : 0);
+			vma->vm_next ? vma->vm_next->vm_start : mm->task_size);
 	} else {
 		/*
 		 * otherwise, clean from old_start; this is done to not touch
@@ -622,7 +622,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		 * for the others its just a little faster.
 		 */
 		free_pgd_range(&tlb, old_start, old_end, new_end,
-			vma->vm_next ? vma->vm_next->vm_start : 0);
+			vma->vm_next ? vma->vm_next->vm_start : mm->task_size);
 	}
 	tlb_finish_mmu(&tlb, new_end, old_end);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 35730ee..e15d294 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2262,7 +2262,7 @@ static void unmap_region(struct mm_struct *mm,
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, vma, start, end);
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
-				 next ? next->vm_start : 0);
+				 next ? next->vm_start : mm->task_size);
 	tlb_finish_mmu(&tlb, start, end);
 }
 
@@ -2640,7 +2640,7 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
 
-	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
+	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, mm->task_size);
 	tlb_finish_mmu(&tlb, 0, -1);
 
 	/*
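
To make the layout in the commit message concrete, here is a small
user-space sketch (not kernel code) of the address arithmetic. It assumes
a 3G/1G split with PAGE_OFFSET at 0xC0000000 and a task size equal to the
start of the module area; neither value is stated in the thread. The
helper names pgd_index32() and modules_share_pgd_with_user() are made up
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a 3G/1G split; not taken from the thread. */
#define PAGE_OFFSET   UINT32_C(0xC0000000)
#define MODULES_SIZE  UINT32_C(0x01000000)	/* 16MB, per the commit message */
#define MODULES_VADDR (PAGE_OFFSET - MODULES_SIZE)
#define PGDIR_SHIFT   30			/* one pgd entry covers 1GB */

/* Index of the top-level (pgd) entry that maps a 32-bit virtual address. */
static unsigned int pgd_index32(uint32_t addr)
{
	return addr >> PGDIR_SHIFT;
}

/* True when the last user byte and the first module byte share a pgd entry. */
static int modules_share_pgd_with_user(uint32_t task_size)
{
	assert(task_size != 0 && task_size <= MODULES_VADDR);
	return pgd_index32(task_size - 1) == pgd_index32(MODULES_VADDR);
}
```

With these assumed values, the last user byte and the first module byte
both fall in pgd entry 2 (0x80000000-0xBFFFFFFF), which is the sharing the
commit message describes.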


* Re: [PATCH] mm: Limit pgd range freeing to mm->task_size
From: Andrew Morton @ 2013-02-13 21:47 UTC
  To: Catalin Marinas
  Cc: linux-kernel, linux-mm, linux-arch, Andrea Arcangeli, Russell King

On Wed, 13 Feb 2013 11:39:29 +0000
Catalin Marinas <catalin.marinas@arm.com> wrote:

> ARM processors with LPAE enabled use 3 levels of page tables, with an
> entry in the top level (pgd) covering 1GB of virtual space. Because of
> the branch relocation limitations on ARM, the loadable modules are
> mapped 16MB below PAGE_OFFSET, making the corresponding 1GB pgd shared
> between kernel modules and user space.
> 
> Since free_pgtables() is called with ceiling == 0, free_pgd_range() (and
> subsequently called functions) also frees the page table
> shared between user space and kernel modules (which is normally handled
> by the ARM-specific pgd_free() function).
> 
> This patch changes the ceiling argument to mm->task_size for the
> free_pgtables() and free_pgd_range() function calls. We cannot use
> TASK_SIZE since this macro may not be a run-time constant on 64-bit
> systems supporting compat applications.

I'm trying to work out why we're using 0 in there at all, rather than
->task_size.  But that's lost in the mists of time.

As you've discovered, handling of task_size and TASK_SIZE is somewhat
inconsistent across architectures and with compat tasks.  I guess we
toss it in there and see if anything breaks...


* Re: [PATCH] mm: Limit pgd range freeing to mm->task_size
From: Hugh Dickins @ 2013-02-14 21:24 UTC
  To: Andrew Morton
  Cc: Catalin Marinas, linux-kernel, linux-mm, linux-arch,
	Andrea Arcangeli, Russell King

On Wed, 13 Feb 2013, Andrew Morton wrote:
> On Wed, 13 Feb 2013 11:39:29 +0000
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> > ARM processors with LPAE enabled use 3 levels of page tables, with an
> > entry in the top level (pgd) covering 1GB of virtual space. Because of
> > the branch relocation limitations on ARM, the loadable modules are
> > mapped 16MB below PAGE_OFFSET, making the corresponding 1GB pgd shared
> > between kernel modules and user space.
> > 
> > Since free_pgtables() is called with ceiling == 0, free_pgd_range() (and
> > subsequently called functions) also frees the page table
> > shared between user space and kernel modules (which is normally handled
> > by the ARM-specific pgd_free() function).
> > 
> > This patch changes the ceiling argument to mm->task_size for the
> > free_pgtables() and free_pgd_range() function calls. We cannot use
> > TASK_SIZE since this macro may not be a run-time constant on 64-bit
> > systems supporting compat applications.
> 
> I'm trying to work out why we're using 0 in there at all, rather than
> ->task_size.  But that's lost in the mists of time.
> 
> As you've discovered, handling of task_size and TASK_SIZE is somewhat
> inconsistent across architectures and with compat tasks.  I guess we
> toss it in there and see if anything breaks...

... and an x86_64 kernel quickly shows,
with either 64-bit or 32-bit userspace, that exit_mmap() breaks at
WARN_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);

We couldn't think of using mm->task_size in 2.6.12 because it didn't
exist then; but although it sounds plausible, and on many architectures
(x86_32?) it should be fine, in general it's not quite the right thing
to use.  0 is an easy rounded-up-whatever-the-increment version of
TASK_SIZE (okay, it's missing an implicit 1 before all its 0s).

The ceiling passed to free_pgtables() says how far up it can go in
freeing pts and pmds and puds and pgds: when doing munmap(), you have
to be careful not to stray beyond the range you're freeing; when doing
exit_mmap(), you have to be careful to free all the areas you might
have had to avoid before.

mm->task_size does not necessarily fall on a nice boundary: use it
instead of 0 and exit_mmap() is liable to leave unfreed page tables
at several levels.
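
The boundary problem can be illustrated with a simplified user-space
model. The sketch below is loosely modeled on the floor/ceiling checks in
mm/memory.c as I recall them, reduced to a single level; it is not the
kernel code. The 2MB PMD size and the name may_free_pte_page() are
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SHIFT 21				/* assumed: one pte page maps 2MB */
#define PMD_SIZE  (UINT64_C(1) << PMD_SHIFT)
#define PMD_MASK  (~(PMD_SIZE - 1))

/*
 * Decide whether the pte page covering [addr, end) may be freed, given the
 * floor and ceiling. A ceiling of 0 acts as "no upper limit" because
 * 0 - 1 wraps around to the largest unsigned value.
 */
static int may_free_pte_page(uint64_t addr, uint64_t end,
			     uint64_t floor, uint64_t ceiling)
{
	uint64_t start = addr & PMD_MASK;

	assert(addr < end);
	if (start < floor)
		return 0;		/* table also covers space below floor */
	if (ceiling) {
		ceiling &= PMD_MASK;	/* round the ceiling down */
		if (!ceiling)
			return 0;
	}
	if (end - 1 > ceiling - 1)
		return 0;		/* range extends past rounded ceiling */
	return 1;
}
```

As a usage example, take a stack mapping [0x7ffffffde000, 0x7ffffffff000)
and an assumed task size of 0x7ffffffff000. With that task size as the
ceiling, rounding down to 0x7fffffe00000 puts the end of the mapping above
the ceiling, so the pte page is kept; with a ceiling of 0 it is freed.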

I'm sure that Catalin is right that he needs to adjust that ceiling arg
to free_pgtables() to cope with a level shared between user and kernel.

I met the same problem two years ago, when doing a patch (which worked
but went nowhere: x86 people kept on changing the early pagetable setup)
to make CONFIG_VMSPLIT_2G_OPT and 3G_OPT compatible with CONFIG_X86_PAE.
That shared a level between user and kernel too: everything could be
handled down in the arch code, except this free_pgtables() ceiling arg.

(I did not make any change to the free_pgd_range() calls in fs/exec.c,
I'm not familiar with those at all: my patch appeared to work fine
without touching them, but now I wonder.)

Here's the mm/mmap.c part of my patch (but it now looks like the
default should go into include/asm-generic):

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -38,6 +38,16 @@
 
 #include "internal.h"
 
+/*
+ * On almost all architectures and configurations, 0 can be used as the
+ * upper ceiling to free_pgtables(): on many architectures it has the same
+ * effect as using TASK_SIZE.  However, there is one configuration which
+ * must impose a more careful limit, to avoid freeing kernel pgtables.
+ */
+#ifndef USER_PGTABLES_CEILING
+#define USER_PGTABLES_CEILING	0UL
+#endif
+
 #ifndef arch_mmap_check
 #define arch_mmap_check(addr, len, flags)	(0)
 #endif
@@ -1888,8 +1898,8 @@ static void unmap_region(struct mm_struct *mm,
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
-	free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
-				 next? next->vm_start: 0);
+	free_pgtables(tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
+				 next ? next->vm_start : USER_PGTABLES_CEILING);
 	tlb_finish_mmu(tlb, start, end);
 }
 
@@ -2221,7 +2231,7 @@ void exit_mmap(struct mm_struct *mm)
 	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 
-	free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
+	free_pgtables(tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(tlb, 0, end);
 	arch_flush_exec_range(mm);
 

Then arch/x86/include/asm/pgtable-3level_types.h had to
#define USER_PGTABLES_CEILING PAGE_OFFSET
in the special configuration.

In other words: to be safe, I believe you have to keep using 0 for the
ceiling on all the architectures and configurations that you're not
adding special new code to handle the user/kernel shared pagetables.

Hugh
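
The override pattern proposed above can be tried outside the kernel with
a minimal sketch like the following. struct toy_vma and
pgtables_ceiling() are hypothetical stand-ins invented for this example;
only USER_PGTABLES_CEILING and its 0UL default come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * An architecture header would define USER_PGTABLES_CEILING before this
 * point (for instance to PAGE_OFFSET in the shared-level configuration).
 * Everyone else gets the historical default of 0.
 */
#ifndef USER_PGTABLES_CEILING
#define USER_PGTABLES_CEILING	0UL
#endif

/* Toy stand-in for vm_area_struct; only the fields used here. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Ceiling for freeing page tables above a region: stop at the next
 * mapping if there is one, otherwise use the per-arch ceiling.
 */
static unsigned long pgtables_ceiling(const struct toy_vma *next)
{
	assert(!next || next->vm_start < next->vm_end);
	return next ? next->vm_start : USER_PGTABLES_CEILING;
}
```

Building with -DUSER_PGTABLES_CEILING=0xC0000000UL would model the
special configuration; without it the sketch keeps the existing behaviour
of passing 0.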


* Re: [PATCH] mm: Limit pgd range freeing to mm->task_size
From: Catalin Marinas @ 2013-02-18 15:49 UTC
  To: Hugh Dickins
  Cc: Andrew Morton, linux-kernel, linux-mm, linux-arch,
	Andrea Arcangeli, Russell King

Hugh,

On Thu, Feb 14, 2013 at 09:24:09PM +0000, Hugh Dickins wrote:
> On Wed, 13 Feb 2013, Andrew Morton wrote:
> > On Wed, 13 Feb 2013 11:39:29 +0000
> > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > 
> > > ARM processors with LPAE enabled use 3 levels of page tables, with an
> > > entry in the top level (pgd) covering 1GB of virtual space. Because of
> > > the branch relocation limitations on ARM, the loadable modules are
> > > mapped 16MB below PAGE_OFFSET, making the corresponding 1GB pgd shared
> > > between kernel modules and user space.
> > > 
> > > Since free_pgtables() is called with ceiling == 0, free_pgd_range() (and
> > > subsequently called functions) also frees the page table
> > > shared between user space and kernel modules (which is normally handled
> > > by the ARM-specific pgd_free() function).
> > > 
> > > This patch changes the ceiling argument to mm->task_size for the
> > > free_pgtables() and free_pgd_range() function calls. We cannot use
> > > TASK_SIZE since this macro may not be a run-time constant on 64-bit
> > > systems supporting compat applications.
> > 
> > I'm trying to work out why we're using 0 in there at all, rather than
> > ->task_size.  But that's lost in the mists of time.
> > 
> > As you've discovered, handling of task_size and TASK_SIZE is somewhat
> > inconsistent across architectures and with compat tasks.  I guess we
> > toss it in there and see if anything breaks...
> 
> ... and an x86_64 kernel quickly shows,
> with either 64-bit or 32-bit userspace, that exit_mmap() breaks at
> WARN_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
> 
> We couldn't think of using mm->task_size in 2.6.12 because it didn't
> exist then; but although it sounds plausible, and on many architectures
> (x86_32?) it should be fine, in general it's not quite the right thing
> to use.  0 is an easy rounded-up-whatever-the-increment version of
> TASK_SIZE (okay, it's missing an implicit 1 before all its 0s).
> 
> The ceiling passed to free_pgtables() says how far up it can go in
> freeing pts and pmds and puds and pgds: when doing munmap(), you have
> to be careful not to stray beyond the range you're freeing; when doing
> exit_mmap(), you have to be careful to free all the areas you might
> have had to avoid before.

Yes, on ARM+LPAE we make sure we free what's left of the shared pgd (a
pmd page).

> mm->task_size does not necessarily fall on a nice boundary: use it
> instead of 0 and exit_mmap() is liable to leave unfreed page tables
> at several levels.
> 
> I'm sure that Catalin is right that he needs to adjust that ceiling arg
> to free_pgtables() to cope with a level shared between user and kernel.
> 
> I met the same problem two years ago, when doing a patch (which worked
> but went nowhere: x86 people kept on changing the early pagetable setup)
> to make CONFIG_VMSPLIT_2G_OPT and 3G_OPT compatible with CONFIG_X86_PAE.
> That shared a level between user and kernel too: everything could be
> handled down in the arch code, except this free_pgtables() ceiling arg.
> 
> (I did not make any change to the free_pgd_range() calls in fs/exec.c,
> I'm not familiar with those at all: my patch appeared to work fine
> without touching them, but now I wonder.)
> 
> Here's the mm/mmap.c part of my patch (but it now looks like the
> default should go into include/asm-generic):

Thanks for the patch. It is related to FIRST_USER_ADDRESS which is
defined in asm/pgtable.h, so asm-generic/pgtable.h looks like a good
place. We can actually make FIRST_USER_ADDRESS generic as well since
apart from arm and unicore32 all the other architectures define it as 0.

I'll shortly post a series of two patches with your patch and the ARM
definition of USER_PGTABLES_CEILING.

-- 
Catalin

