linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Add mmap_assert_locked() annotations to find_vma*()
@ 2021-07-31 17:53 Luigi Rizzo
  2021-08-01 19:33 ` Andrew Morton
  2021-08-03 16:08 ` Jason Gunthorpe
  0 siblings, 2 replies; 15+ messages in thread
From: Luigi Rizzo @ 2021-07-31 17:53 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, David Rientjes; +Cc: linux-mm, Luigi Rizzo

find_vma() and variants need protection when used.
This patch adds mmap_assert_lock() calls in the functions.

To make sure the invariant is satisfied, we also need to add a
mmap_read_loc() around the get_user_pages_remote() call in
get_arg_page(). The lock is not strictly necessary because the mm
has been newly created, but the extra cost is limited because
the same mutex was also acquired shortly before in __bprm_mm_init(),
so it is hot and uncontended.

Signed-off-by: Luigi Rizzo <lrizzo@google.com>
---
 fs/exec.c | 2 ++
 mm/mmap.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 38f63451b928..ac7603e985b4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 	 * We are doing an exec().  'current' is the process
 	 * doing the exec and bprm->mm is the new process's mm.
 	 */
+	mmap_read_lock(bprm->mm);
 	ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
 			&page, NULL, NULL);
+	mmap_read_unlock(bprm->mm);
 	if (ret <= 0)
 		return NULL;
 
diff --git a/mm/mmap.c b/mm/mmap.c
index ca54d36d203a..79f4f8ae43ec 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -534,6 +534,7 @@ static int find_vma_links(struct mm_struct *mm, unsigned long addr,
 {
 	struct rb_node **__rb_link, *__rb_parent, *rb_prev;
 
+	mmap_assert_locked(mm);
 	__rb_link = &mm->mm_rb.rb_node;
 	rb_prev = __rb_parent = NULL;
 
@@ -2303,6 +2304,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
 	struct rb_node *rb_node;
 	struct vm_area_struct *vma;
 
+	mmap_assert_locked(mm);
 	/* Check the cache first. */
 	vma = vmacache_find(mm, addr);
 	if (likely(vma))
-- 
2.32.0.554.ge1b32706d8-goog



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-07-31 17:53 [PATCH] Add mmap_assert_locked() annotations to find_vma*() Luigi Rizzo
@ 2021-08-01 19:33 ` Andrew Morton
  2021-08-02  0:11   ` Luigi Rizzo
  2021-08-02  0:16   ` Luigi Rizzo
  2021-08-03 16:08 ` Jason Gunthorpe
  1 sibling, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2021-08-01 19:33 UTC (permalink / raw)
  To: Luigi Rizzo; +Cc: linux-kernel, David Rientjes, linux-mm

On Sat, 31 Jul 2021 10:53:41 -0700 Luigi Rizzo <lrizzo@google.com> wrote:

> find_vma() and variants need protection when used.
> This patch adds mmap_assert_lock() calls in the functions.
> 
> To make sure the invariant is satisfied, we also need to add a
> mmap_read_loc() around the get_user_pages_remote() call in
> get_arg_page(). The lock is not strictly necessary because the mm
> has been newly created, but the extra cost is limited because
> the same mutex was also acquired shortly before in __bprm_mm_init(),
> so it is hot and uncontended.
> 

Well, it isn't cost-free.  find_vma() is called a lot and a surprising
number of systems apparently run with CONFIG_DEBUG_VM.  Why do you
think this cost is justified?



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-01 19:33 ` Andrew Morton
@ 2021-08-02  0:11   ` Luigi Rizzo
  2021-08-02  0:16   ` Luigi Rizzo
  1 sibling, 0 replies; 15+ messages in thread
From: Luigi Rizzo @ 2021-08-02  0:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, David Rientjes, linux-mm

[-- Attachment #1: Type: text/plain, Size: 2376 bytes --]

On Sun, Aug 1, 2021 at 9:33 PM Andrew Morton <akpm@linux-foundation.org>
wrote:

> On Sat, 31 Jul 2021 10:53:41 -0700 Luigi Rizzo <lrizzo@google.com> wrote:
>
> > find_vma() and variants need protection when used.
> > This patch adds mmap_assert_lock() calls in the functions.
> >
> > To make sure the invariant is satisfied, we also need to add a
> > mmap_read_loc() around the get_user_pages_remote() call in
> > get_arg_page(). The lock is not strictly necessary because the mm
> > has been newly created, but the extra cost is limited because
> > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > so it is hot and uncontended.
> >
>
> Well, it isn't cost-free.  find_vma() is called a lot and a surprising
> number of systems apparently run with CONFIG_DEBUG_VM.  Why do you
> think this cost is justified?
>

I assume you are concerned with the cost of mmap_assert_locked() ?

I'd say the justification is the same as for all asserts:
at some point some code change may miss the required lock, and the
asserts are there to catch elusive race conditions,

There are in fact already instances of mmap_locked_assert()
right before find_vma() in walk_page_range(), and a couple before
calls to __get_user_pages().

As for the cost, I'd think that if CONFIG_DEBUG_VM is set,
one does it on purpose to catch errors and is prepared to pay
the cost (in this case the atomic_read(counter) in rwsem_is_locked(),
the counter should be hot).

FWIW I have instrumented find_vma() on a fast machine using kstats

   https://github.com/luigirizzo/lr-cstats

(load the module then enable the trace with
  echo "trace pcpu:find_vma bits 3" > /sys/kernel/debug/kstats/_control
and monitor the time with
  watch "grep CPUS /sys/kernel/debug/kstats/find_vma"

I didn't run anything especially intensive except some network
benchmarks, but I have collected ~2M samples with the following
distribution of find_vma() time in nanoseconds in 3 configs:

CONFIGURATION         p10   p50   p90   p95   p98

no-debug               89   109   214   332    605
debug                 331   369   603   862   1338
debug+this patch      337   369   603   863   1339

As you can see, just compiling a debug kernel, even without this patch,
makes the function 3x more expensive. The effect of this patch is
not measurable (the differences are below measurement error).

cheers
luigi

[-- Attachment #2: Type: text/html, Size: 7639 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-01 19:33 ` Andrew Morton
  2021-08-02  0:11   ` Luigi Rizzo
@ 2021-08-02  0:16   ` Luigi Rizzo
  2021-08-02 21:11     ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Luigi Rizzo @ 2021-08-02  0:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, David Rientjes, linux-mm

(repost in plain text)

On Sun, Aug 1, 2021 at 9:33 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sat, 31 Jul 2021 10:53:41 -0700 Luigi Rizzo <lrizzo@google.com> wrote:
>
> > find_vma() and variants need protection when used.
> > This patch adds mmap_assert_lock() calls in the functions.
> >
> > To make sure the invariant is satisfied, we also need to add a
> > mmap_read_loc() around the get_user_pages_remote() call in
> > get_arg_page(). The lock is not strictly necessary because the mm
> > has been newly created, but the extra cost is limited because
> > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > so it is hot and uncontended.
> >
>
> Well, it isn't cost-free.  find_vma() is called a lot and a surprising
> number of systems apparently run with CONFIG_DEBUG_VM.  Why do you
> think this cost is justified?

I assume you are concerned with the cost of mmap_assert_locked() ?

I'd say the justification is the same as for all asserts:
at some point some code change may miss the required lock, and the
asserts are there to catch elusive race conditions,

There are in fact already instances of mmap_locked_assert()
right before find_vma() in walk_page_range(), and a couple before
calls to __get_user_pages().

As for the cost, I'd think that if CONFIG_DEBUG_VM is set,
one does it on purpose to catch errors and is prepared to pay
the cost (in this case the atomic_read(counter) in rwsem_is_locked(),
the counter should be hot).

FWIW I have instrumented find_vma() on a fast machine using kstats

   https://github.com/luigirizzo/lr-cstats

(load the module then enable the trace with
  echo "trace pcpu:find_vma bits 3" > /sys/kernel/debug/kstats/_control
and monitor the time with
  watch "grep CPUS /sys/kernel/debug/kstats/find_vma"

I didn't run anything especially intensive except some network
benchmarks, but I have collected ~2M samples with the following
distribution of find_vma() time in nanoseconds in 3 configs:

CONFIGURATION         p10   p50   p90   p95   p98

no-debug               89   109   214   332    605
debug                 331   369   603   862   1338
debug+this patch      337   369   603   863   1339

As you can see, just compiling a debug kernel, even without this patch,
makes the function 3x more expensive. The effect of this patch is
not measurable (the differences are below measurement error).

cheers
luigi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-02  0:16   ` Luigi Rizzo
@ 2021-08-02 21:11     ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2021-08-02 21:11 UTC (permalink / raw)
  To: Luigi Rizzo; +Cc: linux-kernel, David Rientjes, linux-mm

On Mon, 2 Aug 2021 02:16:14 +0200 Luigi Rizzo <lrizzo@google.com> wrote:

> > Well, it isn't cost-free.  find_vma() is called a lot and a surprising
> > number of systems apparently run with CONFIG_DEBUG_VM.  Why do you
> > think this cost is justified?
> 
> I assume you are concerned with the cost of mmap_assert_locked() ?
> 
> I'd say the justification is the same as for all asserts:
> at some point some code change may miss the required lock, and the
> asserts are there to catch elusive race conditions,
> 
> There are in fact already instances of mmap_locked_assert()
> right before find_vma() in walk_page_range(), and a couple before
> calls to __get_user_pages().
> 
> As for the cost, I'd think that if CONFIG_DEBUG_VM is set,
> one does it on purpose to catch errors and is prepared to pay
> the cost (in this case the atomic_read(counter) in rwsem_is_locked(),
> the counter should be hot).
> 
> FWIW I have instrumented find_vma() on a fast machine using kstats
> 
>    https://github.com/luigirizzo/lr-cstats
> 
> (load the module then enable the trace with
>   echo "trace pcpu:find_vma bits 3" > /sys/kernel/debug/kstats/_control
> and monitor the time with
>   watch "grep CPUS /sys/kernel/debug/kstats/find_vma"
> 
> I didn't run anything especially intensive except some network
> benchmarks, but I have collected ~2M samples with the following
> distribution of find_vma() time in nanoseconds in 3 configs:
> 
> CONFIGURATION         p10   p50   p90   p95   p98
> 
> no-debug               89   109   214   332    605
> debug                 331   369   603   862   1338
> debug+this patch      337   369   603   863   1339
> 
> As you can see, just compiling a debug kernel, even without this patch,
> makes the function 3x more expensive. The effect of this patch is
> not measurable (the differences are below measurement error).

Cool, thanks, that's convincing.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-07-31 17:53 [PATCH] Add mmap_assert_locked() annotations to find_vma*() Luigi Rizzo
  2021-08-01 19:33 ` Andrew Morton
@ 2021-08-03 16:08 ` Jason Gunthorpe
  2021-08-03 21:48   ` Luigi Rizzo
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2021-08-03 16:08 UTC (permalink / raw)
  To: Luigi Rizzo, Jann Horn
  Cc: linux-kernel, Andrew Morton, David Rientjes, linux-mm

On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> find_vma() and variants need protection when used.
> This patch adds mmap_assert_lock() calls in the functions.
> 
> To make sure the invariant is satisfied, we also need to add a
> mmap_read_loc() around the get_user_pages_remote() call in
> get_arg_page(). The lock is not strictly necessary because the mm
> has been newly created, but the extra cost is limited because
> the same mutex was also acquired shortly before in __bprm_mm_init(),
> so it is hot and uncontended.
> 
> Signed-off-by: Luigi Rizzo <lrizzo@google.com>
>  fs/exec.c | 2 ++
>  mm/mmap.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 38f63451b928..ac7603e985b4 100644
> +++ b/fs/exec.c
> @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
>  	 * We are doing an exec().  'current' is the process
>  	 * doing the exec and bprm->mm is the new process's mm.
>  	 */
> +	mmap_read_lock(bprm->mm);
>  	ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
>  			&page, NULL, NULL);
> +	mmap_read_unlock(bprm->mm);
>  	if (ret <= 0)
>  		return NULL;

Wasn't Jann Horn working on something like this too?

https://lore.kernel.org/linux-mm/20201016225713.1971256-1-jannh@google.com/

IIRC it was very tricky here, are you sure it is OK to obtain this lock
here?

I would much rather see Jann's complete solution be merged then
hacking at the exec problem on the side..

Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-03 16:08 ` Jason Gunthorpe
@ 2021-08-03 21:48   ` Luigi Rizzo
  2021-08-03 23:07     ` Liam Howlett
  0 siblings, 1 reply; 15+ messages in thread
From: Luigi Rizzo @ 2021-08-03 21:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jann Horn, linux-kernel, Andrew Morton, David Rientjes, linux-mm

On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > find_vma() and variants need protection when used.
> > This patch adds mmap_assert_lock() calls in the functions.
> >
> > To make sure the invariant is satisfied, we also need to add a
> > mmap_read_loc() around the get_user_pages_remote() call in
> > get_arg_page(). The lock is not strictly necessary because the mm
> > has been newly created, but the extra cost is limited because
> > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > so it is hot and uncontended.
> >
> > Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> >  fs/exec.c | 2 ++
> >  mm/mmap.c | 2 ++
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 38f63451b928..ac7603e985b4 100644
> > +++ b/fs/exec.c
> > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> >        * We are doing an exec().  'current' is the process
> >        * doing the exec and bprm->mm is the new process's mm.
> >        */
> > +     mmap_read_lock(bprm->mm);
> >       ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> >                       &page, NULL, NULL);
> > +     mmap_read_unlock(bprm->mm);
> >       if (ret <= 0)
> >               return NULL;
>
> Wasn't Jann Horn working on something like this too?
>
> https://lore.kernel.org/linux-mm/20201016225713.1971256-1-jannh@google.com/
>
> IIRC it was very tricky here, are you sure it is OK to obtain this lock
> here?

I cannot comment on Jann's patch series but no other thread knows
about this mm at this point in the code so the lock is definitely
safe to acquire (shortly before there was also a write lock acquired
on the same mm, in the same conditions).

cheers
luigi

>
> I would much rather see Jann's complete solution be merged then
> hacking at the exec problem on the side..
>
> Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-03 21:48   ` Luigi Rizzo
@ 2021-08-03 23:07     ` Liam Howlett
  2021-08-03 23:35       ` Jason Gunthorpe
  2021-08-04 14:42       ` Jann Horn
  0 siblings, 2 replies; 15+ messages in thread
From: Liam Howlett @ 2021-08-03 23:07 UTC (permalink / raw)
  To: Luigi Rizzo
  Cc: Jason Gunthorpe, Jann Horn, linux-kernel, Andrew Morton,
	David Rientjes, linux-mm

* Luigi Rizzo <lrizzo@google.com> [210803 17:49]:
> On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > find_vma() and variants need protection when used.
> > > This patch adds mmap_assert_lock() calls in the functions.
> > >
> > > To make sure the invariant is satisfied, we also need to add a
> > > mmap_read_loc() around the get_user_pages_remote() call in
> > > get_arg_page(). The lock is not strictly necessary because the mm
> > > has been newly created, but the extra cost is limited because
> > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > so it is hot and uncontended.
> > >
> > > Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> > >  fs/exec.c | 2 ++
> > >  mm/mmap.c | 2 ++
> > >  2 files changed, 4 insertions(+)
> > >
> > > diff --git a/fs/exec.c b/fs/exec.c
> > > index 38f63451b928..ac7603e985b4 100644
> > > +++ b/fs/exec.c
> > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > >        * We are doing an exec().  'current' is the process
> > >        * doing the exec and bprm->mm is the new process's mm.
> > >        */
> > > +     mmap_read_lock(bprm->mm);
> > >       ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > >                       &page, NULL, NULL);
> > > +     mmap_read_unlock(bprm->mm);
> > >       if (ret <= 0)
> > >               return NULL;
> >
> > Wasn't Jann Horn working on something like this too?
> >
> > https://lore.kernel.org/linux-mm/20201016225713.1971256-1-jannh@google.com/
> >
> > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > here?
> 
> I cannot comment on Jann's patch series but no other thread knows
> about this mm at this point in the code so the lock is definitely
> safe to acquire (shortly before there was also a write lock acquired
> on the same mm, in the same conditions).

If there is no other code that knows about this mm, then does one need
the lock at all?  Is this just to satisfy the new check you added?

If you want to make this change, I would suggest writing it in a way to
ensure the call to expand_downwards() in the same function also holds
the lock.  I believe this is technically required as well?  What do you
think?

Thanks,
Liam

> 
> cheers
> luigi
> 
> >
> > I would much rather see Jann's complete solution be merged then
> > hacking at the exec problem on the side..
> >
> > Jason
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-03 23:07     ` Liam Howlett
@ 2021-08-03 23:35       ` Jason Gunthorpe
  2021-08-03 23:57         ` Luigi Rizzo
  2021-08-04 14:42       ` Jann Horn
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2021-08-03 23:35 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Luigi Rizzo, Jann Horn, linux-kernel, Andrew Morton,
	David Rientjes, linux-mm

On Tue, Aug 03, 2021 at 11:07:35PM +0000, Liam Howlett wrote:
> * Luigi Rizzo <lrizzo@google.com> [210803 17:49]:
> > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > find_vma() and variants need protection when used.
> > > > This patch adds mmap_assert_lock() calls in the functions.
> > > >
> > > > To make sure the invariant is satisfied, we also need to add a
> > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > has been newly created, but the extra cost is limited because
> > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > so it is hot and uncontended.
> > > >
> > > > Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> > > >  fs/exec.c | 2 ++
> > > >  mm/mmap.c | 2 ++
> > > >  2 files changed, 4 insertions(+)
> > > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 38f63451b928..ac7603e985b4 100644
> > > > +++ b/fs/exec.c
> > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > >        * We are doing an exec().  'current' is the process
> > > >        * doing the exec and bprm->mm is the new process's mm.
> > > >        */
> > > > +     mmap_read_lock(bprm->mm);
> > > >       ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > >                       &page, NULL, NULL);
> > > > +     mmap_read_unlock(bprm->mm);
> > > >       if (ret <= 0)
> > > >               return NULL;
> > >
> > > Wasn't Jann Horn working on something like this too?
> > >
> > > https://lore.kernel.org/linux-mm/20201016225713.1971256-1-jannh@google.com/
> > >
> > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > here?
> > 
> > I cannot comment on Jann's patch series but no other thread knows
> > about this mm at this point in the code so the lock is definitely
> > safe to acquire (shortly before there was also a write lock acquired
> > on the same mm, in the same conditions).
> 
> If there is no other code that knows about this mm, then does one need
> the lock at all?  Is this just to satisfy the new check you added?
> 
> If you want to make this change, I would suggest writing it in a way to
> ensure the call to expand_downwards() in the same function also holds
> the lock.  I believe this is technically required as well?  What do you
> think?

This is essentially what Jann was doing. Since the mm is newly created
we can create it write locked and then we can add proper locking tests
to many of the functions called along this path.

Adding useless locks around each troublesome callsite just seems
really confusing to me.

Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-03 23:35       ` Jason Gunthorpe
@ 2021-08-03 23:57         ` Luigi Rizzo
  2021-08-04  5:12           ` Liam Howlett
  0 siblings, 1 reply; 15+ messages in thread
From: Luigi Rizzo @ 2021-08-03 23:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liam Howlett, Jann Horn, linux-kernel, Andrew Morton,
	David Rientjes, linux-mm

On Wed, Aug 4, 2021 at 1:35 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Aug 03, 2021 at 11:07:35PM +0000, Liam Howlett wrote:
> > * Luigi Rizzo <lrizzo@google.com> [210803 17:49]:
> > > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > > find_vma() and variants need protection when used.
> > > > > This patch adds mmap_assert_lock() calls in the functions.
> > > > >
> > > > > To make sure the invariant is satisfied, we also need to add a
> > > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > > has been newly created, but the extra cost is limited because
> > > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > > so it is hot and uncontended.
> > > > >
> > > > > Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> > > > >  fs/exec.c | 2 ++
> > > > >  mm/mmap.c | 2 ++
> > > > >  2 files changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 38f63451b928..ac7603e985b4 100644
> > > > > +++ b/fs/exec.c
> > > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > > >        * We are doing an exec().  'current' is the process
> > > > >        * doing the exec and bprm->mm is the new process's mm.
> > > > >        */
> > > > > +     mmap_read_lock(bprm->mm);
> > > > >       ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > > >                       &page, NULL, NULL);
> > > > > +     mmap_read_unlock(bprm->mm);
> > > > >       if (ret <= 0)
> > > > >               return NULL;
> > > >
> > > > Wasn't Jann Horn working on something like this too?
> > > >
> > > > https://lore.kernel.org/linux-mm/20201016225713.1971256-1-jannh@google.com/
> > > >
> > > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > > here?
> > >
> > > I cannot comment on Jann's patch series but no other thread knows
> > > about this mm at this point in the code so the lock is definitely
> > > safe to acquire (shortly before there was also a write lock acquired
> > > on the same mm, in the same conditions).
> >
> > If there is no other code that knows about this mm, then does one need
> > the lock at all?  Is this just to satisfy the new check you added?
> >
> > If you want to make this change, I would suggest writing it in a way to
> > ensure the call to expand_downwards() in the same function also holds
> > the lock.  I believe this is technically required as well?  What do you
> > think?
>
> This is essentially what Jann was doing. Since the mm is newly created
> we can create it write locked and then we can add proper locking tests
> to many of the functions called along this path.
>
> Adding useless locks around each troublesome callsite just seems
> really confusing to me.

Uhm... by that reasoning, even creating the mm locked (and unlocking
at the end) is equally unnecessary.

My goal was to add asserts and invariants that are easy
to understand and get right, rather than optimize a path
that does not appear to be critical.

Adding one read lock pair around the one function we annotate
is easy to understand and clearly a leaf lock.

Having alloc_bprm return a locked object is a bit unconventional,
and also passing it to other methods raises the question of whether
they take other lock possibly causing lock order reversals
in the future.

cheers
luigi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-03 23:57         ` Luigi Rizzo
@ 2021-08-04  5:12           ` Liam Howlett
  0 siblings, 0 replies; 15+ messages in thread
From: Liam Howlett @ 2021-08-04  5:12 UTC (permalink / raw)
  To: Luigi Rizzo
  Cc: Jason Gunthorpe, Jann Horn, linux-kernel, Andrew Morton,
	David Rientjes, linux-mm

* Luigi Rizzo <lrizzo@google.com> [210803 19:58]:
> On Wed, Aug 4, 2021 at 1:35 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Aug 03, 2021 at 11:07:35PM +0000, Liam Howlett wrote:
> > > * Luigi Rizzo <lrizzo@google.com> [210803 17:49]:
> > > > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > > > find_vma() and variants need protection when used.
> > > > > > This patch adds mmap_assert_lock() calls in the functions.
> > > > > >
> > > > > > To make sure the invariant is satisfied, we also need to add a
> > > > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > > > has been newly created, but the extra cost is limited because
> > > > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > > > so it is hot and uncontended.
> > > > > >
> > > > > > Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> > > > > >  fs/exec.c | 2 ++
> > > > > >  mm/mmap.c | 2 ++
> > > > > >  2 files changed, 4 insertions(+)
> > > > > >
> > > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > > index 38f63451b928..ac7603e985b4 100644
> > > > > > +++ b/fs/exec.c
> > > > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > > > >        * We are doing an exec().  'current' is the process
> > > > > >        * doing the exec and bprm->mm is the new process's mm.
> > > > > >        */
> > > > > > +     mmap_read_lock(bprm->mm);
> > > > > >       ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > > > >                       &page, NULL, NULL);
> > > > > > +     mmap_read_unlock(bprm->mm);
> > > > > >       if (ret <= 0)
> > > > > >               return NULL;
> > > > >
> > > > > Wasn't Jann Horn working on something like this too?
> > > > >
> > > > > https://lore.kernel.org/linux-mm/20201016225713.1971256-1-jannh@google.com/
> > > > >
> > > > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > > > here?
> > > >
> > > > I cannot comment on Jann's patch series but no other thread knows
> > > > about this mm at this point in the code so the lock is definitely
> > > > safe to acquire (shortly before there was also a write lock acquired
> > > > on the same mm, in the same conditions).
> > >
> > > If there is no other code that knows about this mm, then does one need
> > > the lock at all?  Is this just to satisfy the new check you added?
> > >
> > > If you want to make this change, I would suggest writing it in a way to
> > > ensure the call to expand_downwards() in the same function also holds
> > > the lock.  I believe this is technically required as well?  What do you
> > > think?
> >
> > This is essentially what Jann was doing. Since the mm is newly created
> > we can create it write locked and then we can add proper locking tests
> > to many of the functions called along this path.

That sounds good.  Jann has left the patch as pending a fix since
November 2020.  Can't the removal of the lock/unlock be added to the
next iteration of the patch?  Was there a v4 of that patch?

> >
> > Adding useless locks around each troublesome callsite just seems
> > really confusing to me.
> 
> Uhm... by that reasoning, even creating the mm locked (and unlocking
> at the end) is equally unnecessary.

I think taking the lock is more clear than leaving it the way it's
currently written.  It is actually confusing to see the lock taken after
calling expand_downwards() which explicitly mentions the lock as
required in the comments though.  This should at least have a comment
about early creation not requiring the lock.

> 
> My goal was to add asserts and invariants that are easy
> to understand and get right, rather than optimize a path
> that does not appear to be critical.
> 
> Adding one read lock pair around the one function we annotate
> is easy to understand and clearly a leaf lock.
> 
> Having alloc_bprm return a locked object is a bit unconventional,
> and also passing it to other methods raises the question of whether
> they take other lock possibly causing lock order reversals
> in the future.

We are (probably?) okay as the usual order right now is to take the mmap
sem before the pte and interval tree.  It's also just for the set up, so
unless there is a special case that could cause trouble... or maybe I
should ask which cases will cause trouble?

Thanks,
Liam


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-03 23:07     ` Liam Howlett
  2021-08-03 23:35       ` Jason Gunthorpe
@ 2021-08-04 14:42       ` Jann Horn
  2021-08-04 15:21         ` Jason Gunthorpe
  2021-08-04 17:32         ` Liam Howlett
  1 sibling, 2 replies; 15+ messages in thread
From: Jann Horn @ 2021-08-04 14:42 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Luigi Rizzo, Jason Gunthorpe, linux-kernel, Andrew Morton,
	David Rientjes, linux-mm

On Wed, Aug 4, 2021 at 1:07 AM Liam Howlett <liam.howlett@oracle.com> wrote:
> * Luigi Rizzo <lrizzo@google.com> [210803 17:49]:
> > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > find_vma() and variants need protection when used.
> > > > This patch adds mmap_assert_lock() calls in the functions.
> > > >
> > > > To make sure the invariant is satisfied, we also need to add a
> > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > has been newly created, but the extra cost is limited because
> > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > so it is hot and uncontended.
> > > >
> > > > Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> > > >  fs/exec.c | 2 ++
> > > >  mm/mmap.c | 2 ++
> > > >  2 files changed, 4 insertions(+)
> > > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 38f63451b928..ac7603e985b4 100644
> > > > +++ b/fs/exec.c
> > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > >        * We are doing an exec().  'current' is the process
> > > >        * doing the exec and bprm->mm is the new process's mm.
> > > >        */
> > > > +     mmap_read_lock(bprm->mm);
> > > >       ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > >                       &page, NULL, NULL);
> > > > +     mmap_read_unlock(bprm->mm);
> > > >       if (ret <= 0)
> > > >               return NULL;
> > >
> > > Wasn't Jann Horn working on something like this too?
> > >
> > > https://lore.kernel.org/linux-mm/20201016225713.1971256-1-jannh@google.com/
> > >
> > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > here?
> >
> > I cannot comment on Jann's patch series but no other thread knows
> > about this mm at this point in the code so the lock is definitely
> > safe to acquire (shortly before there was also a write lock acquired
> > on the same mm, in the same conditions).
>
> If there is no other code that knows about this mm, then does one need
> the lock at all?  Is this just to satisfy the new check you added?
>
> If you want to make this change, I would suggest writing it in a way to
> ensure the call to expand_downwards() in the same function also holds
> the lock.  I believe this is technically required as well?  What do you
> think?

The call to expand_downwards() takes a VMA pointer as argument, and
the mmap lock is the only thing that normally prevents concurrent
freeing of VMA structs. Taking a lock there would be of limited utility - either
the lock is not necessary because nobody else can access the MM, or
the lock is insufficient because someone could have freed the VMA
pointer before the lock was taken. So I think that taking a lock
around the expand_downwards() call would just be obfuscating things,
unless you specifically want to prevent concurrent *reads* while
concurrent *writes* are impossible.

Since I haven't sent a new version of my old series for almost a year,
I think it'd be fine to take Luigi's patch for now, and undo it at a
later point when/if we want to actually use proper locking here
because we're worried about concurrent access to the MM.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-04 14:42       ` Jann Horn
@ 2021-08-04 15:21         ` Jason Gunthorpe
  2021-08-04 21:22           ` Jann Horn
  2021-08-04 17:32         ` Liam Howlett
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2021-08-04 15:21 UTC (permalink / raw)
  To: Jann Horn
  Cc: Liam Howlett, Luigi Rizzo, linux-kernel, Andrew Morton,
	David Rientjes, linux-mm

On Wed, Aug 04, 2021 at 04:42:23PM +0200, Jann Horn wrote:
> Since I haven't sent a new version of my old series for almost a year,
> I think it'd be fine to take Luigi's patch for now, and undo it at a
> later point when/if we want to actually use proper locking here
> because we're worried about concurrent access to the MM.

IIRC one of the major points of that work was not "proper locking" but
to have enough locking to be complatible with lockdep so we could add
assertions like in get_user_pages and find_vma.

Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-04 14:42       ` Jann Horn
  2021-08-04 15:21         ` Jason Gunthorpe
@ 2021-08-04 17:32         ` Liam Howlett
  1 sibling, 0 replies; 15+ messages in thread
From: Liam Howlett @ 2021-08-04 17:32 UTC (permalink / raw)
  To: Jann Horn
  Cc: Luigi Rizzo, Jason Gunthorpe, linux-kernel, Andrew Morton,
	David Rientjes, linux-mm

* Jann Horn <jannh@google.com> [210804 10:42]:
> On Wed, Aug 4, 2021 at 1:07 AM Liam Howlett <liam.howlett@oracle.com> wrote:
> > * Luigi Rizzo <lrizzo@google.com> [210803 17:49]:
> > > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > > find_vma() and variants need protection when used.
> > > > > This patch adds mmap_assert_lock() calls in the functions.
> > > > >
> > > > > To make sure the invariant is satisfied, we also need to add a
> > > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > > has been newly created, but the extra cost is limited because
> > > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > > so it is hot and uncontended.
> > > > >
> > > > > Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> > > > >  fs/exec.c | 2 ++
> > > > >  mm/mmap.c | 2 ++
> > > > >  2 files changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 38f63451b928..ac7603e985b4 100644
> > > > > +++ b/fs/exec.c
> > > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > > >        * We are doing an exec().  'current' is the process
> > > > >        * doing the exec and bprm->mm is the new process's mm.
> > > > >        */
> > > > > +     mmap_read_lock(bprm->mm);
> > > > >       ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > > >                       &page, NULL, NULL);
> > > > > +     mmap_read_unlock(bprm->mm);
> > > > >       if (ret <= 0)
> > > > >               return NULL;
> > > >
> > > > Wasn't Jann Horn working on something like this too?
> > > >
> > > > https://lore.kernel.org/linux-mm/20201016225713.1971256-1-jannh@google.com/
> > > >
> > > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > > here?
> > >
> > > I cannot comment on Jann's patch series but no other thread knows
> > > about this mm at this point in the code so the lock is definitely
> > > safe to acquire (shortly before there was also a write lock acquired
> > > on the same mm, in the same conditions).
> >
> > If there is no other code that knows about this mm, then does one need
> > the lock at all?  Is this just to satisfy the new check you added?
> >
> > If you want to make this change, I would suggest writing it in a way to
> > ensure the call to expand_downwards() in the same function also holds
> > the lock.  I believe this is technically required as well?  What do you
> > think?
> 
> The call to expand_downwards() takes a VMA pointer as argument, and
> the mmap lock is the only thing that normally prevents concurrent
> freeing of VMA structs. Taking a lock there would be of limited utility - either
> the lock is not necessary because nobody else can access the MM, or
> the lock is insufficient because someone could have freed the VMA
> pointer before the lock was taken. So I think that taking a lock
> around the expand_downwards() call would just be obfuscating things,
> unless you specifically want to prevent concurrent *reads* while
> concurrent *writes* are impossible.

Good point on the VMA being passed in, that certainly points to your
previous patch being a better approach.  That resolves my questions
around the patch.

> 
> Since I haven't sent a new version of my old series for almost a year,
> I think it'd be fine to take Luigi's patch for now, and undo it at a
> later point when/if we want to actually use proper locking here
> because we're worried about concurrent access to the MM.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()
  2021-08-04 15:21         ` Jason Gunthorpe
@ 2021-08-04 21:22           ` Jann Horn
  0 siblings, 0 replies; 15+ messages in thread
From: Jann Horn @ 2021-08-04 21:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liam Howlett, Luigi Rizzo, linux-kernel, Andrew Morton,
	David Rientjes, linux-mm

On Wed, Aug 4, 2021 at 5:21 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Wed, Aug 04, 2021 at 04:42:23PM +0200, Jann Horn wrote:
> > Since I haven't sent a new version of my old series for almost a year,
> > I think it'd be fine to take Luigi's patch for now, and undo it at a
> > later point when/if we want to actually use proper locking here
> > because we're worried about concurrent access to the MM.
>
> IIRC one of the major points of that work was not "proper locking" but
> to have enough locking to be complatible with lockdep so we could add
> assertions like in get_user_pages and find_vma.

That's part of it; but it's also for making the code more clearly
correct and future-proofing it. Looking at it now, I think
process_madvise() might actually already be able to race with execve()
to some degree; and if you made a change like this to the current
kernel:

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d3d348b17f4..3648c198673c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1043,12 +1043,14 @@ madvise_behavior_valid(int behavior)
 static bool
 process_madvise_behavior_valid(int behavior)
 {
        switch (behavior) {
        case MADV_COLD:
        case MADV_PAGEOUT:
+       case MADV_DOFORK:
+       case MADV_DONTFORK:
                return true;
        default:
                return false;
        }
 }

it would probably introduce a memory corruption bug, because then
someone might be able to destroy the stack VMA between
setup_new_exec() and setup_arg_pages() by using process_madvise() to
trigger VMA splitting/merging in the right pattern.


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-08-04 21:22 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-31 17:53 [PATCH] Add mmap_assert_locked() annotations to find_vma*() Luigi Rizzo
2021-08-01 19:33 ` Andrew Morton
2021-08-02  0:11   ` Luigi Rizzo
2021-08-02  0:16   ` Luigi Rizzo
2021-08-02 21:11     ` Andrew Morton
2021-08-03 16:08 ` Jason Gunthorpe
2021-08-03 21:48   ` Luigi Rizzo
2021-08-03 23:07     ` Liam Howlett
2021-08-03 23:35       ` Jason Gunthorpe
2021-08-03 23:57         ` Luigi Rizzo
2021-08-04  5:12           ` Liam Howlett
2021-08-04 14:42       ` Jann Horn
2021-08-04 15:21         ` Jason Gunthorpe
2021-08-04 21:22           ` Jann Horn
2021-08-04 17:32         ` Liam Howlett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).