linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
@ 2021-11-16 21:57 Suren Baghdasaryan
  2021-11-16 21:57 ` [PATCH 2/2] mm/oom_kill: allow process_mrelease to run under mmap_lock protection Suren Baghdasaryan
From: Suren Baghdasaryan @ 2021-11-16 21:57 UTC
  To: akpm
  Cc: mhocko, mhocko, rientjes, willy, hannes, guro, riel, minchan,
	kirill, aarcange, christian, hch, oleg, david, jannh, shakeelb,
	luto, christian.brauner, fweimer, jengelh, timmurray, linux-mm,
	linux-kernel, kernel-team, surenb

oom-reaper and the process_mrelease system call should protect against
races with exit_mmap, which can destroy page tables while they walk
the VMA tree. oom-reaper protects against that race by setting
MMF_OOM_VICTIM and by relying on exit_mmap to set MMF_OOM_SKIP before
taking and releasing mmap_write_lock. process_mrelease instead has to
elevate mm->mm_users to prevent such a race. Both oom-reaper and
process_mrelease hold mmap_read_lock when walking the VMA tree. The
locking rules and mechanisms could be simpler if exit_mmap took
mmap_write_lock while executing destructive operations such as
free_pgtables.
Change exit_mmap to hold mmap_write_lock when calling free_pgtables.
Operations like unmap_vmas() and unlock_range() are not destructive
and could run under mmap_read_lock, but for simplicity we take one
mmap_write_lock during almost the entire operation. Note also that
because oom-reaper checks the VM_LOCKED flag, unlock_range() must not
be allowed to race with it.
In most cases this lock should be uncontended. Previously, Kirill
reported a ~4% regression caused by a similar change [1]. We reran the
same test and, although the individual results are quite noisy, the
percentiles show a lower regression, with 1.6% being the worst case
[2]. The change allows oom-reaper and process_mrelease to execute
safely under mmap_read_lock, without worrying that exit_mmap might
destroy page tables from under them.

[1] https://lore.kernel.org/all/20170725141723.ivukwhddk2voyhuc@node.shutemov.name/
[2] https://lore.kernel.org/all/CAJuCfpGC9-c9P40x7oy=jy5SphMcd0o0G_6U1-+JAziGKG6dGA@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
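For illustration, this is the reader-side pattern the change enables;
a minimal sketch of a hypothetical caller, mirroring what patch 2/2
does in process_mrelease (not code from this patch):

	mmap_read_lock(mm);
	if (!test_bit(MMF_OOM_SKIP, &mm->flags)) {
		/*
		 * While mmap_read_lock is held, exit_mmap cannot enter
		 * free_pgtables, so the VMA tree and the page tables
		 * stay stable for the duration of the walk.
		 */
		(void)__oom_reap_task_mm(mm);
	}
	mmap_read_unlock(mm);
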
 mm/mmap.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index bfb0ea164a90..69b3036c6dee 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3142,25 +3142,27 @@ void exit_mmap(struct mm_struct *mm)
 		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
 		 * __oom_reap_task_mm() will not block.
 		 *
-		 * This needs to be done before calling munlock_vma_pages_all(),
+		 * This needs to be done before calling unlock_range(),
 		 * which clears VM_LOCKED, otherwise the oom reaper cannot
 		 * reliably test it.
 		 */
 		(void)__oom_reap_task_mm(mm);
 
 		set_bit(MMF_OOM_SKIP, &mm->flags);
-		mmap_write_lock(mm);
-		mmap_write_unlock(mm);
 	}
 
+	mmap_write_lock(mm);
 	if (mm->locked_vm)
 		unlock_range(mm->mmap, ULONG_MAX);
 
 	arch_exit_mmap(mm);
 
 	vma = mm->mmap;
-	if (!vma)	/* Can happen if dup_mmap() received an OOM */
+	if (!vma) {
+		/* Can happen if dup_mmap() received an OOM */
+		mmap_write_unlock(mm);
 		return;
+	}
 
 	lru_add_drain();
 	flush_cache_mm(mm);
@@ -3170,6 +3172,7 @@ void exit_mmap(struct mm_struct *mm)
 	unmap_vmas(&tlb, vma, 0, -1);
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb);
+	mmap_write_unlock(mm);
 
 	/*
 	 * Walk the list again, actually closing and freeing it,
-- 
2.34.0.rc1.387.gb447b232ab-goog



* [PATCH 2/2] mm/oom_kill: allow process_mrelease to run under mmap_lock protection
  2021-11-16 21:57 [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap Suren Baghdasaryan
@ 2021-11-16 21:57 ` Suren Baghdasaryan
  2021-11-23  1:47 ` [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap Suren Baghdasaryan
  2021-11-23 13:19 ` Matthew Wilcox
From: Suren Baghdasaryan @ 2021-11-16 21:57 UTC
  To: akpm
  Cc: mhocko, mhocko, rientjes, willy, hannes, guro, riel, minchan,
	kirill, aarcange, christian, hch, oleg, david, jannh, shakeelb,
	luto, christian.brauner, fweimer, jengelh, timmurray, linux-mm,
	linux-kernel, kernel-team, surenb

With exit_mmap holding mmap_write_lock during the free_pgtables call,
process_mrelease does not need to elevate mm->mm_users in order to
prevent exit_mmap from destroying page tables while __oom_reap_task_mm
is walking the VMA tree. The change also prevents process_mrelease
from calling the last mmput, which could otherwise block waiting for
IO completion in exit_aio.

Fixes: 337546e83fc7 ("mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
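For reference, the pinning semantics this relies on, in outline (a
sketch of the existing mmgrab/mmput contract, not code from this
patch):

	mmgrab(mm);	/* pins the mm_struct itself (mm->mm_count) */
	...		/*
			 * The address space may still be torn down by
			 * exit_mmap, so any VMA walk is guarded by
			 * mmap_read_lock plus the MMF_OOM_SKIP check.
			 */
	mmdrop(mm);	/* never runs __mmput()/exit_aio() */

versus the previous approach:

	if (mmget_not_zero(mm)) {	/* pins the address space (mm->mm_users) */
		...
		mmput(mm);	/* the last mmput() runs __mmput(), which
				 * can wait for IO completion in exit_aio() */
	}
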
 mm/oom_kill.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1ddabefcfb5a..67780386f478 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1169,15 +1169,15 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 		goto put_task;
 	}
 
-	if (mmget_not_zero(p->mm)) {
-		mm = p->mm;
-		if (task_will_free_mem(p))
-			reap = true;
-		else {
-			/* Error only if the work has not been done already */
-			if (!test_bit(MMF_OOM_SKIP, &mm->flags))
-				ret = -EINVAL;
-		}
+	mm = p->mm;
+	mmgrab(mm);
+
+	if (task_will_free_mem(p))
+		reap = true;
+	else {
+		/* Error only if the work has not been done already */
+		if (!test_bit(MMF_OOM_SKIP, &mm->flags))
+			ret = -EINVAL;
 	}
 	task_unlock(p);
 
@@ -1188,13 +1188,16 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 		ret = -EINTR;
 		goto drop_mm;
 	}
-	if (!__oom_reap_task_mm(mm))
+	/*
+	 * Check MMF_OOM_SKIP again under mmap_read_lock protection to ensure
+	 * possible change in exit_mmap is seen
+	 */
+	if (!test_bit(MMF_OOM_SKIP, &mm->flags) && !__oom_reap_task_mm(mm))
 		ret = -EAGAIN;
 	mmap_read_unlock(mm);
 
 drop_mm:
-	if (mm)
-		mmput(mm);
+	mmdrop(mm);
 put_task:
 	put_task_struct(task);
 	return ret;
-- 
2.34.0.rc1.387.gb447b232ab-goog



* Re: [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
  2021-11-16 21:57 [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap Suren Baghdasaryan
  2021-11-16 21:57 ` [PATCH 2/2] mm/oom_kill: allow process_mrelease to run under mmap_lock protection Suren Baghdasaryan
@ 2021-11-23  1:47 ` Suren Baghdasaryan
  2021-11-23 13:19 ` Matthew Wilcox
From: Suren Baghdasaryan @ 2021-11-23  1:47 UTC
  To: akpm
  Cc: mhocko, mhocko, rientjes, willy, hannes, guro, riel, minchan,
	kirill, aarcange, christian, hch, oleg, david, jannh, shakeelb,
	luto, christian.brauner, fweimer, jengelh, timmurray, linux-mm,
	linux-kernel, kernel-team

On Tue, Nov 16, 2021 at 1:57 PM Suren Baghdasaryan <surenb@google.com> wrote:
> [full patch quoted; snipped]

Friendly nudge.
Michal, Matthew, from our discussion in
https://lore.kernel.org/all/YXKhOKIIngIuJaYi@casper.infradead.org I
was under the impression this change would be of interest to you. Any
feedback?



* Re: [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
  2021-11-16 21:57 [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap Suren Baghdasaryan
  2021-11-16 21:57 ` [PATCH 2/2] mm/oom_kill: allow process_mrelease to run under mmap_lock protection Suren Baghdasaryan
  2021-11-23  1:47 ` [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap Suren Baghdasaryan
@ 2021-11-23 13:19 ` Matthew Wilcox
  2021-11-23 17:56   ` Suren Baghdasaryan
From: Matthew Wilcox @ 2021-11-23 13:19 UTC
  To: Suren Baghdasaryan
  Cc: akpm, mhocko, mhocko, rientjes, hannes, guro, riel, minchan,
	kirill, aarcange, christian, hch, oleg, david, jannh, shakeelb,
	luto, christian.brauner, fweimer, jengelh, timmurray, linux-mm,
	linux-kernel, kernel-team

On Tue, Nov 16, 2021 at 01:57:14PM -0800, Suren Baghdasaryan wrote:
> @@ -3170,6 +3172,7 @@ void exit_mmap(struct mm_struct *mm)
>  	unmap_vmas(&tlb, vma, 0, -1);
>  	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
>  	tlb_finish_mmu(&tlb);
> +	mmap_write_unlock(mm);
>  
>  	/*
>  	 * Walk the list again, actually closing and freeing it,

Is there a reason to unlock here instead of after the remove_vma loop?
We'll need the mmap sem held during that loop when VMAs are stored in
the maple tree.
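
i.e., something like this (an untested sketch):

	unmap_vmas(&tlb, vma, 0, -1);
	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
	tlb_finish_mmu(&tlb);

	/* Walk the list again, actually closing and freeing it. */
	while (vma) {
		if (vma->vm_flags & VM_ACCOUNT)
			nr_accounted += vma_pages(vma);
		vma = remove_vma(vma);
	}
	mmap_write_unlock(mm);	/* dropped only after the remove_vma() loop */
	vm_unacct_memory(nr_accounted);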


* Re: [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
  2021-11-23 13:19 ` Matthew Wilcox
@ 2021-11-23 17:56   ` Suren Baghdasaryan
  2021-11-24 12:20     ` Michal Hocko
From: Suren Baghdasaryan @ 2021-11-23 17:56 UTC
  To: Matthew Wilcox
  Cc: akpm, mhocko, mhocko, rientjes, hannes, guro, riel, minchan,
	kirill, aarcange, christian, hch, oleg, david, jannh, shakeelb,
	luto, christian.brauner, fweimer, jengelh, timmurray, linux-mm,
	linux-kernel, kernel-team

On Tue, Nov 23, 2021 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 16, 2021 at 01:57:14PM -0800, Suren Baghdasaryan wrote:
> > @@ -3170,6 +3172,7 @@ void exit_mmap(struct mm_struct *mm)
> >       unmap_vmas(&tlb, vma, 0, -1);
> >       free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> >       tlb_finish_mmu(&tlb);
> > +     mmap_write_unlock(mm);
> >
> >       /*
> >        * Walk the list again, actually closing and freeing it,
>
> Is there a reason to unlock here instead of after the remove_vma loop?
> We'll need the mmap sem held during that loop when VMAs are stored in
> the maple tree.

I didn't realize remove_vma() would need to be protected as well. I
think I can move mmap_write_unlock() down to cover the last walk too
with no impact.
Does anyone know if there was any specific reason to perform that last
walk with no locks held (as the comment states)? I can track that
comment back to the Linux-2.6.12-rc2 merge with no earlier history, so
I'm not sure if it's critical not to hold any locks at this point. It
seems to me it's ok to keep holding mmap_write_lock there, but maybe
I'm missing something?
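
For reference, the walk in question, roughly as it reads in current
mm/mmap.c:

	/*
	 * Walk the list again, actually closing and freeing it,
	 * with preemption enabled, without holding any MM locks.
	 */
	while (vma) {
		if (vma->vm_flags & VM_ACCOUNT)
			nr_accounted += vma_pages(vma);
		vma = remove_vma(vma);
	}
	vm_unacct_memory(nr_accounted);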


* Re: [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
  2021-11-23 17:56   ` Suren Baghdasaryan
@ 2021-11-24 12:20     ` Michal Hocko
  2021-11-24 15:25       ` Suren Baghdasaryan
From: Michal Hocko @ 2021-11-24 12:20 UTC
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox, akpm, rientjes, hannes, guro, riel, minchan,
	kirill, aarcange, christian, hch, oleg, david, jannh, shakeelb,
	luto, christian.brauner, fweimer, jengelh, timmurray, linux-mm,
	linux-kernel, kernel-team

On Tue 23-11-21 09:56:41, Suren Baghdasaryan wrote:
> On Tue, Nov 23, 2021 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Nov 16, 2021 at 01:57:14PM -0800, Suren Baghdasaryan wrote:
> > > @@ -3170,6 +3172,7 @@ void exit_mmap(struct mm_struct *mm)
> > >       unmap_vmas(&tlb, vma, 0, -1);
> > >       free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> > >       tlb_finish_mmu(&tlb);
> > > +     mmap_write_unlock(mm);
> > >
> > >       /*
> > >        * Walk the list again, actually closing and freeing it,
> >
> > Is there a reason to unlock here instead of after the remove_vma loop?
> > We'll need the mmap sem held during that loop when VMAs are stored in
> > the maple tree.
> 
> I didn't realize remove_vma() would need to be protected as well. I
> think I can move mmap_write_unlock() down to cover the last walk too
> with no impact.
> Does anyone know if there was any specific reason to perform that last
> walk with no locks held (as the comment states)? I can track that
> comment back to the Linux-2.6.12-rc2 merge with no earlier history, so
> I'm not sure if it's critical not to hold any locks at this point. It
> seems to me it's ok to keep holding mmap_write_lock there, but maybe
> I'm missing something?

I suspect the primary reason was that neither fput (and callbacks
invoked from it) nor vm_ops->close would need to be very careful about
interacting with mm locks. fput is async these days, so it shouldn't be
problematic. vm_ops->close doesn't have any real contract definition
AFAIK, but taking mmap_sem from those would be really surprising. They
should mostly be tearing down internal vma state, and that shouldn't
really require address space protection.
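
For context, remove_vma() roughly does the following (abbreviated from
current mm/mmap.c):

	static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
	{
		struct vm_area_struct *next = vma->vm_next;

		might_sleep();
		if (vma->vm_ops && vma->vm_ops->close)
			vma->vm_ops->close(vma);	/* no defined locking contract */
		if (vma->vm_file)
			fput(vma->vm_file);		/* deferred via task_work */
		mpol_put(vma_policy(vma));
		vm_area_free(vma);
		return next;
	}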
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
  2021-11-24 12:20     ` Michal Hocko
@ 2021-11-24 15:25       ` Suren Baghdasaryan
  2021-11-25  0:01         ` Suren Baghdasaryan
From: Suren Baghdasaryan @ 2021-11-24 15:25 UTC
  To: Michal Hocko
  Cc: Matthew Wilcox, akpm, rientjes, hannes, guro, riel, minchan,
	kirill, aarcange, christian, hch, oleg, david, jannh, shakeelb,
	luto, christian.brauner, fweimer, jengelh, timmurray, linux-mm,
	linux-kernel, kernel-team

On Wed, Nov 24, 2021 at 4:20 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 23-11-21 09:56:41, Suren Baghdasaryan wrote:
> > On Tue, Nov 23, 2021 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Nov 16, 2021 at 01:57:14PM -0800, Suren Baghdasaryan wrote:
> > > > @@ -3170,6 +3172,7 @@ void exit_mmap(struct mm_struct *mm)
> > > >       unmap_vmas(&tlb, vma, 0, -1);
> > > >       free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> > > >       tlb_finish_mmu(&tlb);
> > > > +     mmap_write_unlock(mm);
> > > >
> > > >       /*
> > > >        * Walk the list again, actually closing and freeing it,
> > >
> > > Is there a reason to unlock here instead of after the remove_vma loop?
> > > We'll need the mmap sem held during that loop when VMAs are stored in
> > > the maple tree.
> >
> > I didn't realize remove_vma() would need to be protected as well. I
> > think I can move mmap_write_unlock() down to cover the last walk too
> > with no impact.
> > Does anyone know if there was any specific reason to perform that last
> > walk with no locks held (as the comment states)? I can track that
> > comment back to the Linux-2.6.12-rc2 merge with no earlier history, so
> > I'm not sure if it's critical not to hold any locks at this point. It
> > seems to me it's ok to keep holding mmap_write_lock there, but maybe
> > I'm missing something?
>
> I suspect the primary reason was that neither fput (and callbacks
> invoked from it) nor vm_ops->close would need to be very careful about
> interacting with mm locks. fput is async these days, so it shouldn't be
> problematic. vm_ops->close doesn't have any real contract definition
> AFAIK, but taking mmap_sem from those would be really surprising. They
> should mostly be tearing down internal vma state, and that shouldn't
> really require address space protection.

Thanks for the clarification, Michal. I'll post an updated patch with
the remove_vma() loop executed under mmap_write_lock protection.

> --
> Michal Hocko
> SUSE Labs


* Re: [PATCH 1/2] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
  2021-11-24 15:25       ` Suren Baghdasaryan
@ 2021-11-25  0:01         ` Suren Baghdasaryan
From: Suren Baghdasaryan @ 2021-11-25  0:01 UTC
  To: Michal Hocko
  Cc: Matthew Wilcox, akpm, rientjes, hannes, guro, riel, minchan,
	kirill, aarcange, christian, hch, oleg, david, jannh, shakeelb,
	luto, christian.brauner, fweimer, jengelh, timmurray, linux-mm,
	linux-kernel, kernel-team

On Wed, Nov 24, 2021 at 7:25 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Nov 24, 2021 at 4:20 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 23-11-21 09:56:41, Suren Baghdasaryan wrote:
> > > On Tue, Nov 23, 2021 at 5:19 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Tue, Nov 16, 2021 at 01:57:14PM -0800, Suren Baghdasaryan wrote:
> > > > > @@ -3170,6 +3172,7 @@ void exit_mmap(struct mm_struct *mm)
> > > > >       unmap_vmas(&tlb, vma, 0, -1);
> > > > >       free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> > > > >       tlb_finish_mmu(&tlb);
> > > > > +     mmap_write_unlock(mm);
> > > > >
> > > > >       /*
> > > > >        * Walk the list again, actually closing and freeing it,
> > > >
> > > > Is there a reason to unlock here instead of after the remove_vma loop?
> > > > We'll need the mmap sem held during that loop when VMAs are stored in
> > > > the maple tree.
> > >
> > > I didn't realize remove_vma() would need to be protected as well. I
> > > think I can move mmap_write_unlock() down to cover the last walk too
> > > with no impact.
> > > Does anyone know if there was any specific reason to perform that last
> > > walk with no locks held (as the comment states)? I can track that
> > > comment back to the Linux-2.6.12-rc2 merge with no earlier history, so
> > > I'm not sure if it's critical not to hold any locks at this point. It
> > > seems to me it's ok to keep holding mmap_write_lock there, but maybe
> > > I'm missing something?
> >
> > I suspect the primary reason was that neither fput (and callbacks
> > invoked from it) nor vm_ops->close would need to be very careful about
> > interacting with mm locks. fput is async these days, so it shouldn't be
> > problematic. vm_ops->close doesn't have any real contract definition
> > AFAIK, but taking mmap_sem from those would be really surprising. They
> > should mostly be tearing down internal vma state, and that shouldn't
> > really require address space protection.
>
> Thanks for the clarification, Michal. I'll post an updated patch with
> the remove_vma() loop executed under mmap_write_lock protection.

v2 is posted at
https://lore.kernel.org/all/20211124235906.14437-1-surenb@google.com/

>
> > --
> > Michal Hocko
> > SUSE Labs

