* [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
       [not found] <20200930011944.19869-1-jannh@google.com>
@ 2020-09-30  1:20   ` Jann Horn
  2020-09-30  1:20   ` Jann Horn
  2020-09-30  1:20   ` Jann Horn
  2 siblings, 0 replies; 27+ messages in thread
From: Jann Horn @ 2020-09-30  1:20 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Eric W . Biederman, Michel Lespinasse,
	Mauro Carvalho Chehab, Sakari Ailus

In preparation for adding a mmap_assert_locked() check in
__get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
to operate on an mm without locking in the middle of execve() as long as
it hasn't been installed on a process yet.

Existing code paths that do this are (reverse callgraph):

  get_user_pages_remote
    get_arg_page
      copy_strings
      copy_string_kernel
      remove_arg_zero
    tomoyo_dump_page
      tomoyo_print_bprm
      tomoyo_scan_bprm
      tomoyo_environ

Signed-off-by: Jann Horn <jannh@google.com>
---
 fs/exec.c                 |  8 ++++++++
 include/linux/mm_types.h  |  9 +++++++++
 include/linux/mmap_lock.h | 16 ++++++++++++----
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index a91003e28eaa..c02b0e8e1c0b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1129,6 +1129,14 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}

+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_DEBUG_VM)
+	/*
+	 * From here on, the mm may be accessed concurrently, and proper locking
+	 * is required for things like get_user_pages_remote().
+	 */
+	mm->mmap_lock_required = 1;
+#endif
+
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ed028af3cb19..89fee0d0d652 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -552,6 +552,15 @@ struct mm_struct {
 		atomic_long_t hugetlb_usage;
 #endif
 		struct work_struct async_put_work;
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_DEBUG_VM)
+		/*
+		 * Notes whether this mm has been installed on a process yet.
+		 * If not, only the task going through execve() can access this
+		 * mm, and no locking is needed around get_user_pages_remote().
+		 * This flag is only used for debug checks.
+		 */
+		bool mmap_lock_required;
+#endif
 	} __randomize_layout;

 	/*
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 0707671851a8..c4fd874954d7 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -77,14 +77,22 @@ static inline void mmap_read_unlock_non_owner(struct mm_struct *mm)

 static inline void mmap_assert_locked(struct mm_struct *mm)
 {
-	lockdep_assert_held(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_DEBUG_VM)
+	if (mm->mmap_lock_required) {
+		lockdep_assert_held(&mm->mmap_lock);
+		VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+	}
+#endif
 }

 static inline void mmap_assert_write_locked(struct mm_struct *mm)
 {
-	lockdep_assert_held_write(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_DEBUG_VM)
+	if (mm->mmap_lock_required) {
+		lockdep_assert_held_write(&mm->mmap_lock);
+		VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+	}
+#endif
 }

 #endif /* _LINUX_MMAP_LOCK_H */
-- 
2.28.0.709.gb0816b6eb0-goog

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/4] binfmt_elf: Take the mmap lock around find_extend_vma()
       [not found] <20200930011944.19869-1-jannh@google.com>
@ 2020-09-30  1:20   ` Jann Horn
  2020-09-30  1:20   ` Jann Horn
  2020-09-30  1:20   ` Jann Horn
  2 siblings, 0 replies; 27+ messages in thread
From: Jann Horn @ 2020-09-30  1:20 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Eric W . Biederman, Michel Lespinasse,
	Mauro Carvalho Chehab, Sakari Ailus

create_elf_tables() runs after setup_new_exec(), so other tasks can
already access our new mm and do things like process_madvise() on it.
(At the time I'm writing this commit, process_madvise() is not in mainline
yet, but has been in akpm's tree for some time.)

While I believe that there are currently no APIs that would actually allow
another process to mess up our VMA tree (process_madvise() is limited to
MADV_COLD and MADV_PAGEOUT, and uring and userfaultfd cannot reach an mm
under which no syscalls have been executed yet), this seems like an
accident waiting to happen.

Let's make sure that we always take the mmap lock around GUP paths as long
as another process might be able to see the mm.

(Yes, this diff looks suspicious because we drop the lock before doing
anything with `vma`, but that's because we actually don't do anything with
it apart from the NULL check.)

Signed-off-by: Jann Horn <jannh@google.com>
---
 fs/binfmt_elf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 40ec0b9b4b4f..cd7c574a91a4 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -309,7 +309,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 	 * Grow the stack manually; some architectures have a limit on how
 	 * far ahead a user-space access may be in order to grow the stack.
 	 */
+	if (mmap_read_lock_killable(mm))
+		return -EINTR;
 	vma = find_extend_vma(mm, bprm->p);
+	mmap_read_unlock(mm);
 	if (!vma)
 		return -EFAULT;

-- 
2.28.0.709.gb0816b6eb0-goog

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/4] mm/gup: Assert that the mmap lock is held in __get_user_pages()
       [not found] <20200930011944.19869-1-jannh@google.com>
@ 2020-09-30  1:20   ` Jann Horn
  2020-09-30  1:20   ` Jann Horn
  2020-09-30  1:20   ` Jann Horn
  2 siblings, 0 replies; 27+ messages in thread
From: Jann Horn @ 2020-09-30  1:20 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Eric W . Biederman, Michel Lespinasse,
	Mauro Carvalho Chehab, Sakari Ailus

After having cleaned up all GUP callers (except for the atomisp staging
driver, which currently gets mmap locking completely wrong [1]) to always
ensure that they hold the mmap lock when calling into GUP (unless the mm is
not yet globally visible), add an assertion to make sure it stays that way
going forward.

[1] https://lore.kernel.org/lkml/CAG48ez3tZAb9JVhw4T5e-i=h2_DUZxfNRTDsagSRCVazNXx5qA@mail.gmail.com/

Signed-off-by: Jann Horn <jannh@google.com>
---
 mm/gup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index f11d39867cf5..3e5d843215b9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1020,6 +1020,8 @@ static long __get_user_pages(struct mm_struct *mm,
 	struct vm_area_struct *vma = NULL;
 	struct follow_page_context ctx = { NULL };

+	mmap_assert_locked(mm);
+
 	if (!nr_pages)
 		return 0;

-- 
2.28.0.709.gb0816b6eb0-goog

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-09-30  1:20   ` Jann Horn
  (?)
@ 2020-09-30 12:30   ` Jason Gunthorpe
  2020-09-30 12:50       ` Jann Horn
  -1 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2020-09-30 12:30 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, linux-mm, linux-kernel, Eric W . Biederman,
	Michel Lespinasse, Mauro Carvalho Chehab, Sakari Ailus

On Tue, Sep 29, 2020 at 06:20:00PM -0700, Jann Horn wrote:
> In preparation for adding a mmap_assert_locked() check in
> __get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
> to operate on an mm without locking in the middle of execve() as long as
> it hasn't been installed on a process yet.

I'm happy to see lockdep being added here, but can you elaborate on
why add this mmap_lock_required instead of obtaining the lock in the
execve path?

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] mm/gup: Assert that the mmap lock is held in __get_user_pages()
  2020-09-30  1:20   ` Jann Horn
  (?)
@ 2020-09-30 12:32   ` Jason Gunthorpe
  2020-09-30 23:24       ` Michel Lespinasse
  -1 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2020-09-30 12:32 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, linux-mm, linux-kernel, Eric W . Biederman,
	Michel Lespinasse, Mauro Carvalho Chehab, Sakari Ailus

On Tue, Sep 29, 2020 at 06:20:01PM -0700, Jann Horn wrote:
> After having cleaned up all GUP callers (except for the atomisp staging
> driver, which currently gets mmap locking completely wrong [1]) to always
> ensure that they hold the mmap lock when calling into GUP (unless the mm is
> not yet globally visible), add an assertion to make sure it stays that way
> going forward.
> 
> [1] https://lore.kernel.org/lkml/CAG48ez3tZAb9JVhw4T5e-i=h2_DUZxfNRTDsagSRCVazNXx5qA@mail.gmail.com/
> 
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>  mm/gup.c | 2 ++
>  1 file changed, 2 insertions(+)

I'm happy to see this; I have observed many cases of missing locking
here.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks,
Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-09-30 12:30   ` Jason Gunthorpe
@ 2020-09-30 12:50       ` Jann Horn
  0 siblings, 0 replies; 27+ messages in thread
From: Jann Horn @ 2020-09-30 12:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Morton, Linux-MM, kernel list, Eric W . Biederman,
	Michel Lespinasse, Mauro Carvalho Chehab, Sakari Ailus

On Wed, Sep 30, 2020 at 2:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, Sep 29, 2020 at 06:20:00PM -0700, Jann Horn wrote:
> > In preparation for adding a mmap_assert_locked() check in
> > __get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
> > to operate on an mm without locking in the middle of execve() as long as
> > it hasn't been installed on a process yet.
>
> I'm happy to see lockdep being added here, but can you elaborate on
> why add this mmap_lock_required instead of obtaining the lock in the
> execve path?

My thinking was: At that point, we're logically still in the
single-owner initialization phase of the mm_struct. Almost any object
has initialization and teardown steps that occur in a context where
the object only has a single owner, and therefore no locking is
required. It seems to me that adding locking in places like
get_arg_page() would be confusing because it would suggest the
existence of concurrency where there is no actual concurrency, and it
might be annoying in terms of lockdep if someone tries to use
something like get_arg_page() while holding the mmap_sem of the
calling process. It would also mean that we'd be doing extra locking
in normal kernel builds that isn't actually logically required.

Hmm, on the other hand, dup_mmap() already locks the child mm (with
mmap_write_lock_nested()), so I guess it wouldn't be too bad to also
do it in get_arg_page() and tomoyo_dump_page(), with comments that
note that we're doing this for lockdep consistency... I guess I can go
change this in v2.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-09-30 12:50       ` Jann Horn
@ 2020-09-30 20:14         ` Jann Horn
  -1 siblings, 0 replies; 27+ messages in thread
From: Jann Horn @ 2020-09-30 20:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Morton, Linux-MM, kernel list, Eric W . Biederman,
	Michel Lespinasse, Mauro Carvalho Chehab, Sakari Ailus

On Wed, Sep 30, 2020 at 2:50 PM Jann Horn <jannh@google.com> wrote:
> On Wed, Sep 30, 2020 at 2:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Tue, Sep 29, 2020 at 06:20:00PM -0700, Jann Horn wrote:
> > > In preparation for adding a mmap_assert_locked() check in
> > > __get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
> > > to operate on an mm without locking in the middle of execve() as long as
> > > it hasn't been installed on a process yet.
> >
> > I'm happy to see lockdep being added here, but can you elaborate on
> > why add this mmap_lock_required instead of obtaining the lock in the
> > execve path?
>
> My thinking was: At that point, we're logically still in the
> single-owner initialization phase of the mm_struct. Almost any object
> has initialization and teardown steps that occur in a context where
> the object only has a single owner, and therefore no locking is
> required. It seems to me that adding locking in places like
> get_arg_page() would be confusing because it would suggest the
> existence of concurrency where there is no actual concurrency, and it
> might be annoying in terms of lockdep if someone tries to use
> something like get_arg_page() while holding the mmap_sem of the
> calling process. It would also mean that we'd be doing extra locking
> in normal kernel builds that isn't actually logically required.
>
> Hmm, on the other hand, dup_mmap() already locks the child mm (with
> mmap_write_lock_nested()), so I guess it wouldn't be too bad to also
> do it in get_arg_page() and tomoyo_dump_page(), with comments that
> note that we're doing this for lockdep consistency... I guess I can go
> change this in v2.

Actually, I'm taking that back. There's an extra problem:
get_arg_page() accesses bprm->vma, which is set all the way back in
__bprm_mm_init(). We really shouldn't be pretending that we're
properly taking the mmap_sem when actually, we keep reusing a
vm_area_struct pointer.

So for that reason I prefer the approach in the existing patch, where
we make it clear that mm_struct has two different lifetime phases in
which GUP works, and that those lifetime phases have very different
locking requirements.

Does that sound reasonable?
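
The two lifetime phases described above can be modelled in user space. What follows is a minimal sketch, not code from the patch series: `toy_mm`, `toy_install_mm()`, `toy_mmap_access_ok()` and `toy_mmap_assert_locked()` are hypothetical names, and the "lock" is a plain holder count standing in for the real rwsem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct mm_struct; not kernel code. */
struct toy_mm {
	int lock_holders;        /* > 0 models "the mmap lock is held" */
	bool mmap_lock_required; /* false while only the exec'ing task sees the mm */
};

/* Phase transition, modelled on the assignment the patch adds to exec_mmap(). */
static inline void toy_install_mm(struct toy_mm *mm)
{
	mm->mmap_lock_required = true;
}

/* True if touching the mm right now obeys the rules of its current phase. */
static inline bool toy_mmap_access_ok(const struct toy_mm *mm)
{
	if (!mm->mmap_lock_required)
		return true;             /* phase 1: single owner, no lock needed */
	return mm->lock_holders > 0;     /* phase 2: shared, lock must be held */
}

/* Debug-style check in the spirit of mmap_assert_locked(). */
static inline void toy_mmap_assert_locked(const struct toy_mm *mm)
{
	assert(toy_mmap_access_ok(mm));
}
```

In this model an unlocked access passes the check before `toy_install_mm()` runs and fails it afterwards, which is the distinction the flag is meant to encode.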

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] binfmt_elf: Take the mmap lock around find_extend_vma()
  2020-09-30  1:20   ` Jann Horn
@ 2020-09-30 23:22     ` Michel Lespinasse
  -1 siblings, 0 replies; 27+ messages in thread
From: Michel Lespinasse @ 2020-09-30 23:22 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, linux-mm, LKML, Eric W . Biederman,
	Mauro Carvalho Chehab, Sakari Ailus

On Tue, Sep 29, 2020 at 6:20 PM Jann Horn <jannh@google.com> wrote:
> create_elf_tables() runs after setup_new_exec(), so other tasks can
> already access our new mm and do things like process_madvise() on it.
> (At the time I'm writing this commit, process_madvise() is not in mainline
> yet, but has been in akpm's tree for some time.)
>
> While I believe that there are currently no APIs that would actually allow
> another process to mess up our VMA tree (process_madvise() is limited to
> MADV_COLD and MADV_PAGEOUT, and uring and userfaultfd cannot reach an mm
> under which no syscalls have been executed yet), this seems like an
> accident waiting to happen.
>
> Let's make sure that we always take the mmap lock around GUP paths as long
> as another process might be able to see the mm.
>
> (Yes, this diff looks suspicious because we drop the lock before doing
> anything with `vma`, but that's because we actually don't do anything with
> it apart from the NULL check.)
>
> Signed-off-by: Jann Horn <jannh@google.com>

Thanks for these cleanups :)
Acked-by: Michel Lespinasse <walken@google.com>
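
The pattern the quoted commit message defends (lock, look up, unlock, then test only the pointer value) can be illustrated with a small user-space model. This is a sketch under stated assumptions: `toy_mm`, `toy_vma`, `toy_find_vma()` and `toy_check_stack()` are hypothetical names, the lock is a bare reader count rather than a real rwsem, and the lookup does not extend anything the way find_extend_vma() may.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the "lock" is a reader count, not a real rwsem. */
struct toy_vma {
	unsigned long start, end;    /* covers [start, end) */
};

struct toy_mm {
	int readers;                 /* models mmap_read_lock()/mmap_read_unlock() */
	const struct toy_vma *vmas;
	size_t nr_vmas;
};

/* Linear search for the area containing addr; insists the lock is held. */
static inline const struct toy_vma *toy_find_vma(struct toy_mm *mm,
						 unsigned long addr)
{
	size_t i;

	assert(mm->readers > 0);
	for (i = 0; i < mm->nr_vmas; i++)
		if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Lock, look up, unlock, then test only the pointer value. */
static inline int toy_check_stack(struct toy_mm *mm, unsigned long addr)
{
	const struct toy_vma *vma;

	mm->readers++;               /* the kernel's killable lock may fail here */
	vma = toy_find_vma(mm, addr);
	mm->readers--;
	if (!vma)                    /* vma is compared, never dereferenced */
		return -EFAULT;
	return 0;
}
```

Dropping the lock before the NULL check is only safe here because nothing reads through `vma` afterwards; a caller that dereferenced it would need to keep the lock held.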

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/4] mm/gup: Assert that the mmap lock is held in __get_user_pages()
  2020-09-30 12:32   ` Jason Gunthorpe
@ 2020-09-30 23:24       ` Michel Lespinasse
  0 siblings, 0 replies; 27+ messages in thread
From: Michel Lespinasse @ 2020-09-30 23:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jann Horn, Andrew Morton, linux-mm, LKML, Eric W . Biederman,
	Mauro Carvalho Chehab, Sakari Ailus

On Wed, Sep 30, 2020 at 5:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, Sep 29, 2020 at 06:20:01PM -0700, Jann Horn wrote:
> > After having cleaned up all GUP callers (except for the atomisp staging
> > driver, which currently gets mmap locking completely wrong [1]) to always
> > ensure that they hold the mmap lock when calling into GUP (unless the mm is
> > not yet globally visible), add an assertion to make sure it stays that way
> > going forward.

Thanks for doing this; there is a lot of value in ensuring that a
function's callers follow the prerequisites.

Acked-by: Michel Lespinasse <walken@google.com>
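
One detail of the diff is that the assertion sits ahead of the `if (!nr_pages)` early return, so even a zero-length call from an unlocked caller is flagged. A minimal user-space sketch of that placement follows; `toy_mm`, `toy_assert_locked()` and `toy_get_user_pages()` are hypothetical names, and violations are counted rather than crashing so that the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct; not kernel code. */
struct toy_mm {
	bool locked;      /* models "the mmap lock is held" */
	int violations;   /* counted instead of crashing, so callers can inspect it */
};

/* Records a violation when the caller did not take the lock. */
static inline void toy_assert_locked(struct toy_mm *mm)
{
	if (!mm->locked)
		mm->violations++;
}

/* The prerequisite check runs first, ahead of any early return. */
static inline long toy_get_user_pages(struct toy_mm *mm, unsigned long nr_pages)
{
	assert(mm != NULL);
	toy_assert_locked(mm);

	if (!nr_pages)
		return 0;            /* the early return cannot hide a bad caller */

	return (long)nr_pages;       /* pretend every page was pinned */
}
```

Had the check been placed after the early return, a caller that only ever passed zero pages in testing would never trip it.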

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-09-30 20:14         ` Jann Horn
  (?)
@ 2020-09-30 23:26         ` Jason Gunthorpe
  2020-09-30 23:51             ` Jann Horn
  -1 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2020-09-30 23:26 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, Linux-MM, kernel list, Eric W . Biederman,
	Michel Lespinasse, Mauro Carvalho Chehab, Sakari Ailus

On Wed, Sep 30, 2020 at 10:14:57PM +0200, Jann Horn wrote:
> On Wed, Sep 30, 2020 at 2:50 PM Jann Horn <jannh@google.com> wrote:
> > On Wed, Sep 30, 2020 at 2:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Tue, Sep 29, 2020 at 06:20:00PM -0700, Jann Horn wrote:
> > > > In preparation for adding a mmap_assert_locked() check in
> > > > __get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
> > > > to operate on an mm without locking in the middle of execve() as long as
> > > > it hasn't been installed on a process yet.
> > >
> > > I'm happy to see lockdep being added here, but can you elaborate on
> > > why add this mmap_lock_required instead of obtaining the lock in the
> > > execve path?
> >
> > My thinking was: At that point, we're logically still in the
> > single-owner initialization phase of the mm_struct. Almost any object
> > has initialization and teardown steps that occur in a context where
> > the object only has a single owner, and therefore no locking is
> > required. It seems to me that adding locking in places like
> > get_arg_page() would be confusing because it would suggest the
> > existence of concurrency where there is no actual concurrency, and it
> > might be annoying in terms of lockdep if someone tries to use
> > something like get_arg_page() while holding the mmap_sem of the
> > calling process. It would also mean that we'd be doing extra locking
> > in normal kernel builds that isn't actually logically required.
> >
> > Hmm, on the other hand, dup_mmap() already locks the child mm (with
> > mmap_write_lock_nested()), so I guess it wouldn't be too bad to also
> > do it in get_arg_page() and tomoyo_dump_page(), with comments that
> > note that we're doing this for lockdep consistency... I guess I can go
> > change this in v2.
> 
> Actually, I'm taking that back. There's an extra problem:
> get_arg_page() accesses bprm->vma, which is set all the way back in
> __bprm_mm_init(). We really shouldn't be pretending that we're
> properly taking the mmap_sem when actually, we keep reusing a
> vm_area_struct pointer.

Any chance the mmap lock can just be held from mm_struct allocation
till exec inserts it into the process?

> Does that sound reasonable?

My only concern is how weird it is to do this with a variable, I've
never seen something like this before

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-09-30 20:14         ` Jann Horn
@ 2020-09-30 23:42           ` Michel Lespinasse
  -1 siblings, 0 replies; 27+ messages in thread
From: Michel Lespinasse @ 2020-09-30 23:42 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jason Gunthorpe, Andrew Morton, Linux-MM, kernel list,
	Eric W . Biederman, Mauro Carvalho Chehab, Sakari Ailus

On Wed, Sep 30, 2020 at 1:15 PM Jann Horn <jannh@google.com> wrote:
> On Wed, Sep 30, 2020 at 2:50 PM Jann Horn <jannh@google.com> wrote:
> > On Wed, Sep 30, 2020 at 2:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Tue, Sep 29, 2020 at 06:20:00PM -0700, Jann Horn wrote:
> > > > In preparation for adding a mmap_assert_locked() check in
> > > > __get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
> > > > to operate on an mm without locking in the middle of execve() as long as
> > > > it hasn't been installed on a process yet.
> > >
> > > I'm happy to see lockdep being added here, but can you elaborate on
> > > why add this mmap_locked_required instead of obtaining the lock in the
> > > execv path?
> >
> > My thinking was: At that point, we're logically still in the
> > single-owner initialization phase of the mm_struct. Almost any object
> > has initialization and teardown steps that occur in a context where
> > the object only has a single owner, and therefore no locking is
> > required. It seems to me that adding locking in places like
> > get_arg_page() would be confusing because it would suggest the
> > existence of concurrency where there is no actual concurrency, and it
> > might be annoying in terms of lockdep if someone tries to use
> > something like get_arg_page() while holding the mmap_sem of the
> > calling process. It would also mean that we'd be doing extra locking
> > in normal kernel builds that isn't actually logically required.
> >
> > Hmm, on the other hand, dup_mmap() already locks the child mm (with
> > mmap_write_lock_nested()), so I guess it wouldn't be too bad to also
> > do it in get_arg_page() and tomoyo_dump_page(), with comments that
> > note that we're doing this for lockdep consistency... I guess I can go
> > change this in v2.
>
> Actually, I'm taking that back. There's an extra problem:
> get_arg_page() accesses bprm->vma, which is set all the way back in
> __bprm_mm_init(). We really shouldn't be pretending that we're
> properly taking the mmap_sem when actually, we keep reusing a
> vm_area_struct pointer.
>
> So for that reason I prefer the approach in the existing patch, where
> we make it clear that mm_struct has two different lifetime phases in
> which GUP works, and that those lifetime phases have very different
> locking requirements.
>
> Does that sound reasonable?

I'm really not a fan of adding such exceptions; I think it's both
unusual and adds complexity that is not strictly contained within the
init paths.

I don't really understand the concern with the bprm vma in
get_arg_page(); I'm not super familiar with this code but isn't it a
normal vma within the process that __do_execve_file() is creating? I
received Jason's last email while I was composing this one, but I
think I have the same concern/approach as him, i.e. I think it would
be simplest to keep the new MM locked through the __do_execve_file()
call and avoid adding the mmap_lock_required exception to the
mmap_assert_locked rule.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
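
For readers following the disagreement, here is a minimal userspace sketch of
the exception being debated: an assertion helper that only enforces the
locking rule once the mm has been made live. It is a toy model, not the
patch's code. The struct toy_mm type and every toy_* function are
hypothetical names invented for this illustration; only the field name
mmap_lock_required and the idea of setting it in exec_mmap() come from the
patch quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mm_struct (hypothetical, not the kernel type). */
struct toy_mm {
	int lock_depth;          /* > 0 while the toy "mmap lock" is held */
	bool mmap_lock_required; /* false during single-owner setup */
};

/* Toy lock/unlock: just count holders, no real synchronization. */
static void toy_mmap_lock(struct toy_mm *mm)
{
	mm->lock_depth++;
}

static void toy_mmap_unlock(struct toy_mm *mm)
{
	mm->lock_depth--;
}

/*
 * Models the point where execve() installs the new mm on the task.
 * From here on, other contexts may reach the mm, so locking is required.
 */
static void toy_exec_mmap(struct toy_mm *mm)
{
	mm->mmap_lock_required = true;
}

/*
 * Returns true if the locking rule is satisfied.
 * Before the mm is live, the check is skipped entirely; afterwards,
 * the caller must hold the lock.
 */
static bool toy_mmap_assert_locked(const struct toy_mm *mm)
{
	if (!mm->mmap_lock_required)
		return true;
	return mm->lock_depth > 0;
}
```

In this model an unlocked access passes the check before toy_exec_mmap() and
fails it afterwards. The alternative favoured by Jason and Michel would drop
the mmap_lock_required branch and instead hold the lock for the whole setup
phase, so the check has no special case.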

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-09-30 23:26         ` Jason Gunthorpe
@ 2020-09-30 23:51             ` Jann Horn
  0 siblings, 0 replies; 27+ messages in thread
From: Jann Horn @ 2020-09-30 23:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Michel Lespinasse
  Cc: Andrew Morton, Linux-MM, kernel list, Eric W . Biederman,
	Mauro Carvalho Chehab, Sakari Ailus

On Thu, Oct 1, 2020 at 1:26 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Wed, Sep 30, 2020 at 10:14:57PM +0200, Jann Horn wrote:
> > On Wed, Sep 30, 2020 at 2:50 PM Jann Horn <jannh@google.com> wrote:
> > > On Wed, Sep 30, 2020 at 2:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > On Tue, Sep 29, 2020 at 06:20:00PM -0700, Jann Horn wrote:
> > > > > In preparation for adding a mmap_assert_locked() check in
> > > > > __get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
> > > > > to operate on an mm without locking in the middle of execve() as long as
> > > > > it hasn't been installed on a process yet.
> > > >
> > > > I'm happy to see lockdep being added here, but can you elaborate on
> > > > why add this mmap_locked_required instead of obtaining the lock in the
> > > > execv path?
> > >
> > > My thinking was: At that point, we're logically still in the
> > > single-owner initialization phase of the mm_struct. Almost any object
> > > has initialization and teardown steps that occur in a context where
> > > the object only has a single owner, and therefore no locking is
> > > required. It seems to me that adding locking in places like
> > > get_arg_page() would be confusing because it would suggest the
> > > existence of concurrency where there is no actual concurrency, and it
> > > might be annoying in terms of lockdep if someone tries to use
> > > something like get_arg_page() while holding the mmap_sem of the
> > > calling process. It would also mean that we'd be doing extra locking
> > > in normal kernel builds that isn't actually logically required.
> > >
> > > Hmm, on the other hand, dup_mmap() already locks the child mm (with
> > > mmap_write_lock_nested()), so I guess it wouldn't be too bad to also
> > > do it in get_arg_page() and tomoyo_dump_page(), with comments that
> > > note that we're doing this for lockdep consistency... I guess I can go
> > > change this in v2.
> >
> > Actually, I'm taking that back. There's an extra problem:
> > get_arg_page() accesses bprm->vma, which is set all the way back in
> > __bprm_mm_init(). We really shouldn't be pretending that we're
> > properly taking the mmap_sem when actually, we keep reusing a
> > vm_area_struct pointer.
>
> Any chance the mmap lock can just be held from mm_struct allocation
> till exec inserts it into the process?

Hm... it should work if we define a lockdep subclass for this so that
lockdep is happy when we call get_user() on the old mm_struct while
holding that mmap lock.

> > Does that sound reasonable?
>
> My only concern is how weird it is to do this with a variable, I've
> never seen something like this before

It seems clearer to me this way than taking locks when there is no
concurrency that we actually need to guard against. But since both you
and Michel seem to hate it, I'll go and code up the version with a
lockdep subclass. Under protest. :P

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-09-30 23:51             ` Jann Horn
  (?)
@ 2020-10-01 19:15             ` Jason Gunthorpe
  2020-10-01 20:16                 ` Jann Horn
  -1 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2020-10-01 19:15 UTC (permalink / raw)
  To: Jann Horn
  Cc: Michel Lespinasse, Andrew Morton, Linux-MM, kernel list,
	Eric W . Biederman, Mauro Carvalho Chehab, Sakari Ailus

On Thu, Oct 01, 2020 at 01:51:33AM +0200, Jann Horn wrote:
> On Thu, Oct 1, 2020 at 1:26 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Wed, Sep 30, 2020 at 10:14:57PM +0200, Jann Horn wrote:
> > > On Wed, Sep 30, 2020 at 2:50 PM Jann Horn <jannh@google.com> wrote:
> > > > On Wed, Sep 30, 2020 at 2:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > > On Tue, Sep 29, 2020 at 06:20:00PM -0700, Jann Horn wrote:
> > > > > > In preparation for adding a mmap_assert_locked() check in
> > > > > > __get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
> > > > > > to operate on an mm without locking in the middle of execve() as long as
> > > > > > it hasn't been installed on a process yet.
> > > > >
> > > > > I'm happy to see lockdep being added here, but can you elaborate on
> > > > > why add this mmap_locked_required instead of obtaining the lock in the
> > > > > execv path?
> > > >
> > > > My thinking was: At that point, we're logically still in the
> > > > single-owner initialization phase of the mm_struct. Almost any object
> > > > has initialization and teardown steps that occur in a context where
> > > > the object only has a single owner, and therefore no locking is
> > > > required. It seems to me that adding locking in places like
> > > > get_arg_page() would be confusing because it would suggest the
> > > > existence of concurrency where there is no actual concurrency, and it
> > > > might be annoying in terms of lockdep if someone tries to use
> > > > something like get_arg_page() while holding the mmap_sem of the
> > > > calling process. It would also mean that we'd be doing extra locking
> > > > in normal kernel builds that isn't actually logically required.
> > > >
> > > > Hmm, on the other hand, dup_mmap() already locks the child mm (with
> > > > mmap_write_lock_nested()), so I guess it wouldn't be too bad to also
> > > > do it in get_arg_page() and tomoyo_dump_page(), with comments that
> > > > note that we're doing this for lockdep consistency... I guess I can go
> > > > change this in v2.
> > >
> > > Actually, I'm taking that back. There's an extra problem:
> > > get_arg_page() accesses bprm->vma, which is set all the way back in
> > > __bprm_mm_init(). We really shouldn't be pretending that we're
> > > properly taking the mmap_sem when actually, we keep reusing a
> > > vm_area_struct pointer.
> >
> > Any chance the mmap lock can just be held from mm_struct allocation
> > till exec inserts it into the process?
> 
> Hm... it should work if we define a lockdep subclass for this so that
> lockdep is happy when we call get_user() on the old mm_struct while
> holding that mmap lock.

A subclass isn't right, it has to be a _nested annotation.

nested locking is a pretty good reason to not be able to do this, this
is something lockdep does struggle to model.

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-10-01 19:15             ` Jason Gunthorpe
@ 2020-10-01 20:16                 ` Jann Horn
  0 siblings, 0 replies; 27+ messages in thread
From: Jann Horn @ 2020-10-01 20:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michel Lespinasse, Andrew Morton, Linux-MM, kernel list,
	Eric W . Biederman, Mauro Carvalho Chehab, Sakari Ailus

On Thu, Oct 1, 2020 at 9:15 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, Oct 01, 2020 at 01:51:33AM +0200, Jann Horn wrote:
> > On Thu, Oct 1, 2020 at 1:26 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Wed, Sep 30, 2020 at 10:14:57PM +0200, Jann Horn wrote:
> > > > On Wed, Sep 30, 2020 at 2:50 PM Jann Horn <jannh@google.com> wrote:
> > > > > On Wed, Sep 30, 2020 at 2:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > > > On Tue, Sep 29, 2020 at 06:20:00PM -0700, Jann Horn wrote:
> > > > > > > In preparation for adding a mmap_assert_locked() check in
> > > > > > > __get_user_pages(), teach the mmap_assert_*locked() helpers that it's fine
> > > > > > > to operate on an mm without locking in the middle of execve() as long as
> > > > > > > it hasn't been installed on a process yet.
> > > > > >
> > > > > > I'm happy to see lockdep being added here, but can you elaborate on
> > > > > > why add this mmap_locked_required instead of obtaining the lock in the
> > > > > > execv path?
> > > > >
> > > > > My thinking was: At that point, we're logically still in the
> > > > > single-owner initialization phase of the mm_struct. Almost any object
> > > > > has initialization and teardown steps that occur in a context where
> > > > > the object only has a single owner, and therefore no locking is
> > > > > required. It seems to me that adding locking in places like
> > > > > get_arg_page() would be confusing because it would suggest the
> > > > > existence of concurrency where there is no actual concurrency, and it
> > > > > might be annoying in terms of lockdep if someone tries to use
> > > > > something like get_arg_page() while holding the mmap_sem of the
> > > > > calling process. It would also mean that we'd be doing extra locking
> > > > > in normal kernel builds that isn't actually logically required.
> > > > >
> > > > > Hmm, on the other hand, dup_mmap() already locks the child mm (with
> > > > > mmap_write_lock_nested()), so I guess it wouldn't be too bad to also
> > > > > do it in get_arg_page() and tomoyo_dump_page(), with comments that
> > > > > note that we're doing this for lockdep consistency... I guess I can go
> > > > > change this in v2.
> > > >
> > > > Actually, I'm taking that back. There's an extra problem:
> > > > get_arg_page() accesses bprm->vma, which is set all the way back in
> > > > __bprm_mm_init(). We really shouldn't be pretending that we're
> > > > properly taking the mmap_sem when actually, we keep reusing a
> > > > vm_area_struct pointer.
> > >
> > > Any chance the mmap lock can just be held from mm_struct allocation
> > > till exec inserts it into the process?
> >
> > Hm... it should work if we define a lockdep subclass for this so that
> > lockdep is happy when we call get_user() on the old mm_struct while
> > holding that mmap lock.
>
> A subclass isn't right, it has to be a _nested annotation.
>
> nested locking is a pretty good reason to not be able to do this, this
> is something lockdep does struggle to model.

Did I get the terminology wrong? I thought they were the same. The
down_*_nested() APIs take an argument "subclass", with the default
subclass for the functions without "_nested" being 0.

Anyway, I wrote a patch for this yesterday, I'll send it out later
today after testing that it still boots without lockdep warnings. Then
you can decide whether you prefer it to the current patch.
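
The subclass point can be illustrated with a toy userspace model of a
lock-class checker. This is a sketch under simplifying assumptions, not how
lockdep is implemented: all toy_* names are hypothetical, and the model only
tracks (class, subclass) pairs held by one task. It shows that the subclass
is passed per acquisition (as with down_write_nested(sem, subclass)) rather
than stored in the lock, and that a plain acquisition uses subclass 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* A lock instance only knows its class; it carries no subclass. */
struct toy_lock {
	int class_id;
};

/* One entry per lock currently held by the task. */
struct toy_held {
	int class_id;
	int subclass;
};

struct toy_task {
	struct toy_held held[TOY_MAX_HELD];
	size_t nr_held;
};

/*
 * Try to acquire 'lock' with an explicit subclass.
 * Returns false (the toy checker "warns") if the task already holds a
 * lock with the same class and subclass, or if the held table is full.
 * Returns true and records the acquisition otherwise.
 */
static bool toy_lock_nested(struct toy_task *task,
			    const struct toy_lock *lock, int subclass)
{
	for (size_t i = 0; i < task->nr_held; i++) {
		if (task->held[i].class_id == lock->class_id &&
		    task->held[i].subclass == subclass)
			return false;
	}
	if (task->nr_held == TOY_MAX_HELD)
		return false;
	task->held[task->nr_held++] =
		(struct toy_held){ lock->class_id, subclass };
	return true;
}

/* The non-nested variant is the same call with subclass 0. */
static bool toy_lock(struct toy_task *task, const struct toy_lock *lock)
{
	return toy_lock_nested(task, lock, 0);
}
```

In this model, taking the old mm's lock and then the new mm's lock with the
plain call is flagged, because both belong to the same class with subclass 0.
Taking the second one with subclass 1 is accepted, which mirrors the
mmap_write_lock_nested() call in dup_mmap() mentioned earlier in the thread.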

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-10-01 20:16                 ` Jann Horn
  (?)
@ 2020-10-01 23:41                 ` Jason Gunthorpe
  2020-10-01 23:55                     ` Jann Horn
  -1 siblings, 1 reply; 27+ messages in thread
From: Jason Gunthorpe @ 2020-10-01 23:41 UTC (permalink / raw)
  To: Jann Horn
  Cc: Michel Lespinasse, Andrew Morton, Linux-MM, kernel list,
	Eric W . Biederman, Mauro Carvalho Chehab, Sakari Ailus

On Thu, Oct 01, 2020 at 10:16:35PM +0200, Jann Horn wrote:

> > A subclass isn't right, it has to be a _nested annotation.
> >
> > nested locking is a pretty good reason to not be able to do this, this
> > is something lockdep does struggle to model.
> 
> Did I get the terminology wrong? I thought they were the same. The
> down_*_nested() APIs take an argument "subclass", with the default
> subclass for the functions without "_nested" being 0.

AFAIK a subclass at init time sticks with the lock forever, the
_nested ones are temporary overrides.

I think what you kind of want is to start out with
lockdep_set_novalidate_class() then switch to a real class once things
are finished. Not sure exactly how :)

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet
  2020-10-01 23:41                 ` Jason Gunthorpe
@ 2020-10-01 23:55                     ` Jann Horn
  0 siblings, 0 replies; 27+ messages in thread
From: Jann Horn @ 2020-10-01 23:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michel Lespinasse, Andrew Morton, Linux-MM, kernel list,
	Eric W . Biederman, Mauro Carvalho Chehab, Sakari Ailus

On Fri, Oct 2, 2020 at 1:41 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, Oct 01, 2020 at 10:16:35PM +0200, Jann Horn wrote:
> > > A subclass isn't right, it has to be a _nested annotation.
> > >
> > > nested locking is a pretty good reason to not be able to do this, this
> > > is something lockdep does struggle to model.
> >
> > Did I get the terminology wrong? I thought they were the same. The
> > down_*_nested() APIs take an argument "subclass", with the default
> > subclass for the functions without "_nested" being 0.
>
> AFAIK a subclass at init time sticks with the lock forever, the
> _nested ones are temporary overrides.
>
> I think what you kind of want is to start out with
> lockdep_set_novalidate_class() then switch to a real class once things
> are finished. Not sure exactly how :)

Huh, is there an API that sets a *subclass* (not a class) at init
time? I don't think there is.

Anyway, I'm pretty sure I just need to use the normal _nested()
locking API. I'm still cleaning up and testing a little bit, but I'll
send it out in a short while, unless I run into unexpected trouble.
Let's continue this if necessary once there's a concrete patch to talk
about. :)

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-10-01 23:56 UTC | newest]

Thread overview: 27+ messages
     [not found] <20200930011944.19869-1-jannh@google.com>
2020-09-30  1:20 ` [PATCH 2/4] binfmt_elf: Take the mmap lock around find_extend_vma() Jann Horn
2020-09-30  1:20   ` Jann Horn
2020-09-30 23:22   ` Michel Lespinasse
2020-09-30 23:22     ` Michel Lespinasse
2020-09-30  1:20 ` [PATCH 3/4] mmap locking API: Don't check locking if the mm isn't live yet Jann Horn
2020-09-30  1:20   ` Jann Horn
2020-09-30 12:30   ` Jason Gunthorpe
2020-09-30 12:50     ` Jann Horn
2020-09-30 12:50       ` Jann Horn
2020-09-30 20:14       ` Jann Horn
2020-09-30 20:14         ` Jann Horn
2020-09-30 23:26         ` Jason Gunthorpe
2020-09-30 23:51           ` Jann Horn
2020-09-30 23:51             ` Jann Horn
2020-10-01 19:15             ` Jason Gunthorpe
2020-10-01 20:16               ` Jann Horn
2020-10-01 20:16                 ` Jann Horn
2020-10-01 23:41                 ` Jason Gunthorpe
2020-10-01 23:55                   ` Jann Horn
2020-10-01 23:55                     ` Jann Horn
2020-09-30 23:42         ` Michel Lespinasse
2020-09-30 23:42           ` Michel Lespinasse
2020-09-30  1:20 ` [PATCH 4/4] mm/gup: Assert that the mmap lock is held in __get_user_pages() Jann Horn
2020-09-30  1:20   ` Jann Horn
2020-09-30 12:32   ` Jason Gunthorpe
2020-09-30 23:24     ` Michel Lespinasse
2020-09-30 23:24       ` Michel Lespinasse
