All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Nico Pache <npache@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rafael Aquini <aquini@redhat.com>,
	Waiman Long <longman@redhat.com>, Baoquan He <bhe@redhat.com>,
	Christoph von Recklinghausen <crecklin@redhat.com>,
	Don Dutile <ddutile@redhat.com>,
	"Herton R . Krzesinski" <herton@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Darren Hart <dvhart@infradead.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Andre Almeida <andrealmeid@collabora.com>,
	David Rientjes <rientjes@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joel Savitz <jsavitz@redhat.com>
Subject: Re: [PATCH v5] mm/oom_kill.c: futex: Close a race between do_exit and the oom_reaper
Date: Mon, 21 Mar 2022 09:55:57 +0100	[thread overview]
Message-ID: <Yjg9ncgep58gFLiN@dhcp22.suse.cz> (raw)
In-Reply-To: <20220318033621.626006-1-npache@redhat.com>

On Thu 17-03-22 21:36:21, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper. This mapping is used to store the futex
> robust list; the kernel does not keep a copy of the robust list and instead
> references a userspace address to maintain the robustness during a process
> death. A race can occur between exit_mm and the oom reaper that allows
> the oom reaper to free the memory of the futex robust list before the
> exit path has handled the futex death:
> 
>     CPU1                               CPU2
> ------------------------------------------------------------------------
>     page_fault
>     out_of_memory
>     do_exit "signal"
>     wake_oom_reaper
> 					oom_reaper
>                         		oom_reap_task_mm (invalidates mm)
>     exit_mm
>     exit_mm_release
>     futex_exit_release
>     futex_cleanup
>     exit_robust_list
>     get_user (EFAULT- can't access memory)

I still think it is useful to be explicit about the consequences of the
EFAULT here. Did you want to mention that a failing get_user in this
path would result in a hang because nobody is woken up when the current
holder of the lock terminates.

> While in the oom reaper thread, we must handle the futex cleanup without
> sleeping. To achieve this, add the boolean `try` to futex_exit_begin().
> This will control weather or not we use a trylock. Also introduce
> try_futex_exit_release() which will utilize the trylock version of the
> futex_cleanup_begin(). Also call kthread_use_mm in this context to assure
> the get_user call in futex_cleanup() does not return with EFAULT.

This alone is not sufficient. get_user can sleep in the #PF handler path
(e.g. by waiting for swap in). Or is there any guarantee that the page
is never swapped out? If we cannot rule out #PF then this is not a
viable way to address the problem I am afraid.

[...]
> @@ -587,6 +588,18 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
>  		goto out_unlock;
>  	}
>  
> +	/* We can't reap a process holding a robust_list; the pthread
> +	 * struct is allocated in userspace using PRIVATE | ANONYMOUS
> +	 * memory which when reaped before futex_cleanup() can leave
> +	 * the waiting process stuck. Try to perform the futex_cleanup,
> +	 * and if unsuccessful, skip the OOM reaping.
> +	 */
> +	if (task_has_robust_list(tsk) && !try_futex_exit_release(tsk)) {
> +		trace_skip_task_reaping(tsk->pid);
> +		pr_info("oom_reaper: skipping task as it contains a robust list");
> +		goto out_finish;
> +	}
> +
>  	trace_start_task_reaping(tsk->pid);
>  
>  	/* failed to reap part of the address space. Try again later */

Please also note that this all is done after mmap_lock has been already
taken so a page fault could deadlock on the mmap_lock.

The more I am thinking about this the more I am getting convinced that
we should rather approach this differently and skip over vmas which can
be holding the list. Have you considered this option?
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2022-03-21  8:56 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-18  3:36 [PATCH v5] mm/oom_kill.c: futex: Close a race between do_exit and the oom_reaper Nico Pache
2022-03-21  8:55 ` Michal Hocko [this message]
2022-03-21 22:45   ` Nico Pache
2022-03-22  0:42   ` Davidlohr Bueso
2022-03-22  1:53     ` Nico Pache
2022-03-22  2:57       ` Davidlohr Bueso
2022-03-22  3:09         ` Nico Pache
2022-03-22  8:26         ` Michal Hocko
2022-03-22 15:17           ` Thomas Gleixner
2022-03-22 16:36             ` Michal Hocko
2022-03-22 22:43               ` Thomas Gleixner
2022-03-23  9:17                 ` Michal Hocko
2022-03-23 10:30                   ` Thomas Gleixner
2022-03-23 11:11                   ` Peter Zijlstra
2022-03-30  9:18                   ` Michal Hocko
2022-03-30 18:18                     ` Nico Pache
2022-03-30 21:36                       ` Nico Pache
2022-04-06 17:22             ` Nico Pache
2022-04-06 17:36               ` Nico Pache

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yjg9ncgep58gFLiN@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrealmeid@collabora.com \
    --cc=aquini@redhat.com \
    --cc=bhe@redhat.com \
    --cc=crecklin@redhat.com \
    --cc=dave@stgolabs.net \
    --cc=ddutile@redhat.com \
    --cc=dvhart@infradead.org \
    --cc=herton@redhat.com \
    --cc=jsavitz@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=npache@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.