All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, rientjes@google.com,
	willy@infradead.org, hannes@cmpxchg.org, guro@fb.com,
	riel@surriel.com, minchan@kernel.org, christian@brauner.io,
	hch@infradead.org, oleg@redhat.com, david@redhat.com,
	jannh@google.com, shakeelb@google.com, luto@kernel.org,
	christian.brauner@ubuntu.com, fweimer@redhat.com,
	jengelh@inai.de, linux-api@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 1/1] mm: prevent a race between process_mrelease and exit_mmap
Date: Fri, 22 Oct 2021 10:03:29 +0200	[thread overview]
Message-ID: <YXJwUUPjfg9wV6MQ@dhcp22.suse.cz> (raw)
In-Reply-To: <20211022014658.263508-1-surenb@google.com>

On Thu 21-10-21 18:46:58, Suren Baghdasaryan wrote:
> Race between process_mrelease and exit_mmap, where free_pgtables is
> called while __oom_reap_task_mm is in progress, leads to kernel crash
> during pte_offset_map_lock call. oom-reaper avoids this race by setting
> MMF_OOM_VICTIM flag and causing exit_mmap to take and release
> mmap_write_lock, blocking it until oom-reaper releases mmap_read_lock.
> Reusing MMF_OOM_VICTIM for process_mrelease would be the simplest way to
> fix this race, however that would be considered a hack. Fix this race
> by elevating mm->mm_users and preventing exit_mmap from executing until
> process_mrelease is finished. Patch slightly refactors the code to adapt
> for a possible mmget_not_zero failure.
> This fix has considerable negative impact on process_mrelease performance
> and will likely need later optimization.

I am not sure there is any promise that process_mrelease will run in
parallel with the exiting process. In fact the primary purpose of this
syscall is to provide a reliable way to oom kill from user space. If you
want to optimize process exit resp. its exit_mmap part then you should
be using other means. So I would be careful calling this a regression.

I do agree that taking the reference count is the right approach here. I
was wrong previously [1] when saying that pinning the mm struct is
sufficient. I have completely forgot about the subtle sync in exit_mmap.
One way we can approach that would be to take exclusive mmap_sem
throughout the exit_mmap unconditionally. There was a push back against
that though so arguments would have to be re-evaluated.

[1] http://lkml.kernel.org/r/YQzZqFwDP7eUxwcn@dhcp22.suse.cz

That being said
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> Fixes: 884a7e5964e0 ("mm: introduce process_mrelease system call")
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/oom_kill.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 831340e7ad8b..989f35a2bbb1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1150,7 +1150,7 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  	struct task_struct *task;
>  	struct task_struct *p;
>  	unsigned int f_flags;
> -	bool reap = true;
> +	bool reap = false;
>  	struct pid *pid;
>  	long ret = 0;
>  
> @@ -1177,15 +1177,15 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  		goto put_task;
>  	}
>  
> -	mm = p->mm;
> -	mmgrab(mm);
> -
> -	/* If the work has been done already, just exit with success */
> -	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> -		reap = false;
> -	else if (!task_will_free_mem(p)) {
> -		reap = false;
> -		ret = -EINVAL;
> +	if (mmget_not_zero(p->mm)) {
> +		mm = p->mm;
> +		if (task_will_free_mem(p))
> +			reap = true;
> +		else {
> +			/* Error only if the work has not been done already */
> +			if (!test_bit(MMF_OOM_SKIP, &mm->flags))
> +				ret = -EINVAL;
> +		}
>  	}
>  	task_unlock(p);
>  
> @@ -1201,7 +1201,8 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  	mmap_read_unlock(mm);
>  
>  drop_mm:
> -	mmdrop(mm);
> +	if (mm)
> +		mmput(mm);
>  put_task:
>  	put_task_struct(task);
>  put_pid:
> -- 
> 2.33.0.1079.g6e70778dc9-goog

-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2021-10-22  8:03 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-22  1:46 [PATCH 1/1] mm: prevent a race between process_mrelease and exit_mmap Suren Baghdasaryan
2021-10-22  2:24 ` Andrew Morton
2021-10-22  5:23   ` Suren Baghdasaryan
2021-10-22  8:03 ` Michal Hocko [this message]
2021-10-22 11:32   ` Matthew Wilcox
2021-10-22 12:04     ` Michal Hocko
2021-10-22 17:38   ` Suren Baghdasaryan
2021-10-27 16:08     ` Suren Baghdasaryan
2021-10-27 17:33       ` Matthew Wilcox
2021-10-27 17:42         ` Suren Baghdasaryan
2021-10-27 17:51           ` Matthew Wilcox
2021-10-27 18:00             ` Suren Baghdasaryan
2021-10-29 13:03       ` Michal Hocko
2021-10-29 16:07         ` Suren Baghdasaryan
2021-11-01  8:37           ` Michal Hocko
2021-11-01 15:44             ` Suren Baghdasaryan
2021-11-01 19:59               ` Suren Baghdasaryan
2021-11-02  7:58               ` Michal Hocko
2021-11-02 15:14                 ` Suren Baghdasaryan
2021-11-09 19:01                   ` Suren Baghdasaryan
2021-11-09 19:26                     ` Michal Hocko
2021-11-09 19:37                       ` Suren Baghdasaryan
2021-11-09 19:50                         ` Michal Hocko
2021-11-09 20:02                           ` Suren Baghdasaryan
2021-11-09 20:10                             ` Michal Hocko
2021-11-09 21:10                               ` Suren Baghdasaryan
2021-11-11  1:49                                 ` Suren Baghdasaryan
2021-11-11  9:20                                   ` Michal Hocko
2021-11-11 15:02                                     ` Suren Baghdasaryan
2021-11-12  8:58                                       ` Michal Hocko
2021-11-12 16:00                                         ` Suren Baghdasaryan
2021-11-09 19:41                       ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXJwUUPjfg9wV6MQ@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=christian.brauner@ubuntu.com \
    --cc=christian@brauner.io \
    --cc=david@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=jannh@google.com \
    --cc=jengelh@inai.de \
    --cc=kernel-team@android.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=minchan@kernel.org \
    --cc=oleg@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.