linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup
@ 2022-04-14 14:40 Nico Pache
  2022-04-20 12:50 ` Michal Hocko
  2022-04-21 14:40 ` Thomas Gleixner
  0 siblings, 2 replies; 6+ messages in thread
From: Nico Pache @ 2022-04-14 14:40 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Rafael Aquini, Waiman Long, Herton R . Krzesinski, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, David Rientjes,
	Michal Hocko, Andrea Arcangeli, Andrew Morton, Davidlohr Bueso,
	Peter Zijlstra, Ingo Molnar, Joel Savitz, Darren Hart, stable,
	Thomas Gleixner

The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
be targeted by the oom reaper. This mapping is used to store the futex
robust list head; the kernel does not keep a copy of the robust list and
instead references a userspace address to maintain the robustness during
a process death. A race can occur between exit_mm and the oom reaper that
allows the oom reaper to free the memory of the futex robust list before
the exit path has handled the futex death:

    CPU1                               CPU2
------------------------------------------------------------------------
    page_fault
    do_exit "signal"
    wake_oom_reaper
                                        oom_reaper
                                        oom_reap_task_mm (invalidates mm)
    exit_mm
    exit_mm_release
    futex_exit_release
    futex_cleanup
    exit_robust_list
    get_user (EFAULT- can't access memory)

If the get_user EFAULT's, the kernel will be unable to recover the
waiters on the robust_list, leaving userspace mutexes hung indefinitely.

Delay the OOM reaper, allowing more time for the exit path to perform
the futex cleanup.

Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer

[1] https://elixir.bootlin.com/glibc/latest/source/nptl/allocatestack.c#L370

Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Herton R. Krzesinski <herton@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: stable@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
[ Based on a patch by Michal Hocko ]
Co-developed-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 include/linux/sched.h |  1 +
 mm/oom_kill.c         | 54 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..a8911b1f35aa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1443,6 +1443,7 @@ struct task_struct {
 	int				pagefault_disabled;
 #ifdef CONFIG_MMU
 	struct task_struct		*oom_reaper_list;
+	struct timer_list		oom_reaper_timer;
 #endif
 #ifdef CONFIG_VMAP_STACK
 	struct vm_struct		*stack_vm_area;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7ec38194f8e1..49d7df39b02d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -632,7 +632,7 @@ static void oom_reap_task(struct task_struct *tsk)
 	 */
 	set_bit(MMF_OOM_SKIP, &mm->flags);
 
-	/* Drop a reference taken by wake_oom_reaper */
+	/* Drop a reference taken by queue_oom_reaper */
 	put_task_struct(tsk);
 }
 
@@ -644,12 +644,12 @@ static int oom_reaper(void *unused)
 		struct task_struct *tsk = NULL;
 
 		wait_event_freezable(oom_reaper_wait, oom_reaper_list != NULL);
-		spin_lock(&oom_reaper_lock);
+		spin_lock_irq(&oom_reaper_lock);
 		if (oom_reaper_list != NULL) {
 			tsk = oom_reaper_list;
 			oom_reaper_list = tsk->oom_reaper_list;
 		}
-		spin_unlock(&oom_reaper_lock);
+		spin_unlock_irq(&oom_reaper_lock);
 
 		if (tsk)
 			oom_reap_task(tsk);
@@ -658,22 +658,48 @@ static int oom_reaper(void *unused)
 	return 0;
 }
 
-static void wake_oom_reaper(struct task_struct *tsk)
+static void wake_oom_reaper(struct timer_list *timer)
 {
-	/* mm is already queued? */
-	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
-		return;
+	struct task_struct *tsk = container_of(timer, struct task_struct,
+			oom_reaper_timer);
+	struct mm_struct *mm = tsk->signal->oom_mm;
+	unsigned long flags;
 
-	get_task_struct(tsk);
+	/* The victim managed to terminate on its own - see exit_mmap */
+	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+		put_task_struct(tsk);
+		return;
+	}
 
-	spin_lock(&oom_reaper_lock);
+	spin_lock_irqsave(&oom_reaper_lock, flags);
 	tsk->oom_reaper_list = oom_reaper_list;
 	oom_reaper_list = tsk;
-	spin_unlock(&oom_reaper_lock);
+	spin_unlock_irqrestore(&oom_reaper_lock, flags);
 	trace_wake_reaper(tsk->pid);
 	wake_up(&oom_reaper_wait);
 }
 
+/*
+ * Give the OOM victim time to exit naturally before invoking the oom_reaping.
+ * The timers timeout is arbitrary... the longer it is, the longer the worst
+ * case scenario for the OOM can take. If it is too small, the oom_reaper can
+ * get in the way and release resources needed by the process exit path.
+ * e.g. The futex robust list can sit in Anon|Private memory that gets reaped
+ * before the exit path is able to wake the futex waiters.
+ */
+#define OOM_REAPER_DELAY (2*HZ)
+static void queue_oom_reaper(struct task_struct *tsk)
+{
+	/* mm is already queued? */
+	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
+		return;
+
+	get_task_struct(tsk);
+	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
+	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
+	add_timer(&tsk->oom_reaper_timer);
+}
+
 static int __init oom_init(void)
 {
 	oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper");
@@ -681,7 +707,7 @@ static int __init oom_init(void)
 }
 subsys_initcall(oom_init)
 #else
-static inline void wake_oom_reaper(struct task_struct *tsk)
+static inline void queue_oom_reaper(struct task_struct *tsk)
 {
 }
 #endif /* CONFIG_MMU */
@@ -932,7 +958,7 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
 	rcu_read_unlock();
 
 	if (can_oom_reap)
-		wake_oom_reaper(victim);
+		queue_oom_reaper(victim);
 
 	mmdrop(mm);
 	put_task_struct(victim);
@@ -968,7 +994,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	task_lock(victim);
 	if (task_will_free_mem(victim)) {
 		mark_oom_victim(victim);
-		wake_oom_reaper(victim);
+		queue_oom_reaper(victim);
 		task_unlock(victim);
 		put_task_struct(victim);
 		return;
@@ -1067,7 +1093,7 @@ bool out_of_memory(struct oom_control *oc)
 	 */
 	if (task_will_free_mem(current)) {
 		mark_oom_victim(current);
-		wake_oom_reaper(current);
+		queue_oom_reaper(current);
 		return true;
 	}
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup
  2022-04-14 14:40 [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup Nico Pache
@ 2022-04-20 12:50 ` Michal Hocko
  2022-04-21 14:40 ` Thomas Gleixner
  1 sibling, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2022-04-20 12:50 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-mm, linux-kernel, Rafael Aquini, Waiman Long,
	Herton R . Krzesinski, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, David Rientjes, Andrea Arcangeli,
	Andrew Morton, Davidlohr Bueso, Peter Zijlstra, Ingo Molnar,
	Joel Savitz, Darren Hart, stable, Thomas Gleixner

On Thu 14-04-22 10:40:42, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper. This mapping is used to store the futex
> robust list head; the kernel does not keep a copy of the robust list and
> instead references a userspace address to maintain the robustness during
> a process death. A race can occur between exit_mm and the oom reaper that
> allows the oom reaper to free the memory of the futex robust list before
> the exit path has handled the futex death:
> 
>     CPU1                               CPU2
> ------------------------------------------------------------------------
>     page_fault
>     do_exit "signal"
>     wake_oom_reaper
>                                         oom_reaper
>                                         oom_reap_task_mm (invalidates mm)
>     exit_mm
>     exit_mm_release
>     futex_exit_release
>     futex_cleanup
>     exit_robust_list
>     get_user (EFAULT- can't access memory)
> 
> If the get_user EFAULT's, the kernel will be unable to recover the
> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
> 
> Delay the OOM reaper, allowing more time for the exit path to perform
> the futex cleanup.
> 
> Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer
> 
> [1] https://elixir.bootlin.com/glibc/latest/source/nptl/allocatestack.c#L370
> 
> Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
> Cc: Rafael Aquini <aquini@redhat.com>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Herton R. Krzesinski <herton@redhat.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Joel Savitz <jsavitz@redhat.com>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: stable@kernel.org
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> [ Based on a patch by Michal Hocko ]
> Co-developed-by: Joel Savitz <jsavitz@redhat.com>
> Signed-off-by: Joel Savitz <jsavitz@redhat.com>
> Signed-off-by: Nico Pache <npache@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!

> ---
>  include/linux/sched.h |  1 +
>  mm/oom_kill.c         | 54 ++++++++++++++++++++++++++++++++-----------
>  2 files changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d5e3c00b74e1..a8911b1f35aa 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1443,6 +1443,7 @@ struct task_struct {
>  	int				pagefault_disabled;
>  #ifdef CONFIG_MMU
>  	struct task_struct		*oom_reaper_list;
> +	struct timer_list		oom_reaper_timer;
>  #endif
>  #ifdef CONFIG_VMAP_STACK
>  	struct vm_struct		*stack_vm_area;
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 7ec38194f8e1..49d7df39b02d 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -632,7 +632,7 @@ static void oom_reap_task(struct task_struct *tsk)
>  	 */
>  	set_bit(MMF_OOM_SKIP, &mm->flags);
>  
> -	/* Drop a reference taken by wake_oom_reaper */
> +	/* Drop a reference taken by queue_oom_reaper */
>  	put_task_struct(tsk);
>  }
>  
> @@ -644,12 +644,12 @@ static int oom_reaper(void *unused)
>  		struct task_struct *tsk = NULL;
>  
>  		wait_event_freezable(oom_reaper_wait, oom_reaper_list != NULL);
> -		spin_lock(&oom_reaper_lock);
> +		spin_lock_irq(&oom_reaper_lock);
>  		if (oom_reaper_list != NULL) {
>  			tsk = oom_reaper_list;
>  			oom_reaper_list = tsk->oom_reaper_list;
>  		}
> -		spin_unlock(&oom_reaper_lock);
> +		spin_unlock_irq(&oom_reaper_lock);
>  
>  		if (tsk)
>  			oom_reap_task(tsk);
> @@ -658,22 +658,48 @@ static int oom_reaper(void *unused)
>  	return 0;
>  }
>  
> -static void wake_oom_reaper(struct task_struct *tsk)
> +static void wake_oom_reaper(struct timer_list *timer)
>  {
> -	/* mm is already queued? */
> -	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
> -		return;
> +	struct task_struct *tsk = container_of(timer, struct task_struct,
> +			oom_reaper_timer);
> +	struct mm_struct *mm = tsk->signal->oom_mm;
> +	unsigned long flags;
>  
> -	get_task_struct(tsk);
> +	/* The victim managed to terminate on its own - see exit_mmap */
> +	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
> +		put_task_struct(tsk);
> +		return;
> +	}
>  
> -	spin_lock(&oom_reaper_lock);
> +	spin_lock_irqsave(&oom_reaper_lock, flags);
>  	tsk->oom_reaper_list = oom_reaper_list;
>  	oom_reaper_list = tsk;
> -	spin_unlock(&oom_reaper_lock);
> +	spin_unlock_irqrestore(&oom_reaper_lock, flags);
>  	trace_wake_reaper(tsk->pid);
>  	wake_up(&oom_reaper_wait);
>  }
>  
> +/*
> + * Give the OOM victim time to exit naturally before invoking the oom_reaping.
> + * The timers timeout is arbitrary... the longer it is, the longer the worst
> + * case scenario for the OOM can take. If it is too small, the oom_reaper can
> + * get in the way and release resources needed by the process exit path.
> + * e.g. The futex robust list can sit in Anon|Private memory that gets reaped
> + * before the exit path is able to wake the futex waiters.
> + */
> +#define OOM_REAPER_DELAY (2*HZ)
> +static void queue_oom_reaper(struct task_struct *tsk)
> +{
> +	/* mm is already queued? */
> +	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
> +		return;
> +
> +	get_task_struct(tsk);
> +	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
> +	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
> +	add_timer(&tsk->oom_reaper_timer);
> +}
> +
>  static int __init oom_init(void)
>  {
>  	oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper");
> @@ -681,7 +707,7 @@ static int __init oom_init(void)
>  }
>  subsys_initcall(oom_init)
>  #else
> -static inline void wake_oom_reaper(struct task_struct *tsk)
> +static inline void queue_oom_reaper(struct task_struct *tsk)
>  {
>  }
>  #endif /* CONFIG_MMU */
> @@ -932,7 +958,7 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
>  	rcu_read_unlock();
>  
>  	if (can_oom_reap)
> -		wake_oom_reaper(victim);
> +		queue_oom_reaper(victim);
>  
>  	mmdrop(mm);
>  	put_task_struct(victim);
> @@ -968,7 +994,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>  	task_lock(victim);
>  	if (task_will_free_mem(victim)) {
>  		mark_oom_victim(victim);
> -		wake_oom_reaper(victim);
> +		queue_oom_reaper(victim);
>  		task_unlock(victim);
>  		put_task_struct(victim);
>  		return;
> @@ -1067,7 +1093,7 @@ bool out_of_memory(struct oom_control *oc)
>  	 */
>  	if (task_will_free_mem(current)) {
>  		mark_oom_victim(current);
> -		wake_oom_reaper(current);
> +		queue_oom_reaper(current);
>  		return true;
>  	}
>  
> -- 
> 2.35.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup
  2022-04-14 14:40 [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup Nico Pache
  2022-04-20 12:50 ` Michal Hocko
@ 2022-04-21 14:40 ` Thomas Gleixner
  2022-04-21 16:25   ` Nico Pache
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2022-04-21 14:40 UTC (permalink / raw)
  To: Nico Pache, linux-mm, linux-kernel
  Cc: Rafael Aquini, Waiman Long, Herton R . Krzesinski, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, David Rientjes,
	Michal Hocko, Andrea Arcangeli, Andrew Morton, Davidlohr Bueso,
	Peter Zijlstra, Ingo Molnar, Joel Savitz, Darren Hart, stable

On Thu, Apr 14 2022 at 10:40, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper. This mapping is used to store the futex
> robust list head; the kernel does not keep a copy of the robust list and
> instead references a userspace address to maintain the robustness during
> a process death. A race can occur between exit_mm and the oom reaper that
> allows the oom reaper to free the memory of the futex robust list before
> the exit path has handled the futex death:
>
>     CPU1                               CPU2
> ------------------------------------------------------------------------
>     page_fault
>     do_exit "signal"
>     wake_oom_reaper
>                                         oom_reaper
>                                         oom_reap_task_mm (invalidates mm)
>     exit_mm
>     exit_mm_release
>     futex_exit_release
>     futex_cleanup
>     exit_robust_list
>     get_user (EFAULT- can't access memory)
>
> If the get_user EFAULT's, the kernel will be unable to recover the
> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
>
> Delay the OOM reaper, allowing more time for the exit path to perform
> the futex cleanup.
>
> Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer
>
> [1] https://elixir.bootlin.com/glibc/latest/source/nptl/allocatestack.c#L370

A link to the original discussion about this would be more useful than a
code reference which is stale tomorrow. The above explanation is good
enough to describe the problem.

>  
> +/*
> + * Give the OOM victim time to exit naturally before invoking the oom_reaping.
> + * The timers timeout is arbitrary... the longer it is, the longer the worst
> + * case scenario for the OOM can take. If it is too small, the oom_reaper can
> + * get in the way and release resources needed by the process exit path.
> + * e.g. The futex robust list can sit in Anon|Private memory that gets reaped
> + * before the exit path is able to wake the futex waiters.
> + */
> +#define OOM_REAPER_DELAY (2*HZ)
> +static void queue_oom_reaper(struct task_struct *tsk)

Bah. Did you run out of newlines? Glueing that define between the
comment and the function is unreadable.

Other than that.

Acked-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup
  2022-04-21 14:40 ` Thomas Gleixner
@ 2022-04-21 16:25   ` Nico Pache
  2022-04-21 18:49     ` Andrew Morton
  2022-04-21 20:43     ` Thomas Gleixner
  0 siblings, 2 replies; 6+ messages in thread
From: Nico Pache @ 2022-04-21 16:25 UTC (permalink / raw)
  To: Thomas Gleixner, linux-mm, linux-kernel, Andrew Morton
  Cc: Rafael Aquini, Waiman Long, Herton R . Krzesinski, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, David Rientjes,
	Michal Hocko, Andrea Arcangeli, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Joel Savitz, Darren Hart, stable



On 4/21/22 10:40, Thomas Gleixner wrote:
> On Thu, Apr 14 2022 at 10:40, Nico Pache wrote:
>> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
>> be targeted by the oom reaper. This mapping is used to store the futex
>> robust list head; the kernel does not keep a copy of the robust list and
>> instead references a userspace address to maintain the robustness during
>> a process death. A race can occur between exit_mm and the oom reaper that
>> allows the oom reaper to free the memory of the futex robust list before
>> the exit path has handled the futex death:
>>
>>     CPU1                               CPU2
>> ------------------------------------------------------------------------
>>     page_fault
>>     do_exit "signal"
>>     wake_oom_reaper
>>                                         oom_reaper
>>                                         oom_reap_task_mm (invalidates mm)
>>     exit_mm
>>     exit_mm_release
>>     futex_exit_release
>>     futex_cleanup
>>     exit_robust_list
>>     get_user (EFAULT- can't access memory)
>>
>> If the get_user EFAULT's, the kernel will be unable to recover the
>> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
>>
>> Delay the OOM reaper, allowing more time for the exit path to perform
>> the futex cleanup.
>>
>> Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer
>>
>> [1] https://elixir.bootlin.com/glibc/latest/source/nptl/allocatestack.c#L370
> 
> A link to the original discussion about this would be more useful than a
> code reference which is stale tomorrow. The above explanation is good
> enough to describe the problem.

Hi Andrew,

can you please update the link when you add the ACKs.

Here is a more stable link:
[1] https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370

> 
>>  
>> +/*
>> + * Give the OOM victim time to exit naturally before invoking the oom_reaping.
>> + * The timers timeout is arbitrary... the longer it is, the longer the worst
>> + * case scenario for the OOM can take. If it is too small, the oom_reaper can
>> + * get in the way and release resources needed by the process exit path.
>> + * e.g. The futex robust list can sit in Anon|Private memory that gets reaped
>> + * before the exit path is able to wake the futex waiters.
>> + */
>> +#define OOM_REAPER_DELAY (2*HZ)
>> +static void queue_oom_reaper(struct task_struct *tsk)
> 
> Bah. Did you run out of newlines? Glueing that define between the
> comment and the function is unreadable.

My Enter key hit its cgroup limit for newlines.

Andrew, would it be possible to also add a new line when you make the other
changes. Sorry about that.

> 
> Other than that.
> 
> Acked-by: Thomas Gleixner <tglx@linutronix.de>

Thanks!


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup
  2022-04-21 16:25   ` Nico Pache
@ 2022-04-21 18:49     ` Andrew Morton
  2022-04-21 20:43     ` Thomas Gleixner
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2022-04-21 18:49 UTC (permalink / raw)
  To: Nico Pache
  Cc: Thomas Gleixner, linux-mm, linux-kernel, Rafael Aquini,
	Waiman Long, Herton R . Krzesinski, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, David Rientjes, Michal Hocko,
	Andrea Arcangeli, Davidlohr Bueso, Peter Zijlstra, Ingo Molnar,
	Joel Savitz, Darren Hart, stable

On Thu, 21 Apr 2022 12:25:58 -0400 Nico Pache <npache@redhat.com> wrote:

> can you please update the link when you add the ACKs.
> 
> Here is a more stable link:
> [1] https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370

Done.  Thanks, all.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup
  2022-04-21 16:25   ` Nico Pache
  2022-04-21 18:49     ` Andrew Morton
@ 2022-04-21 20:43     ` Thomas Gleixner
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2022-04-21 20:43 UTC (permalink / raw)
  To: Nico Pache, linux-mm, linux-kernel, Andrew Morton
  Cc: Rafael Aquini, Waiman Long, Herton R . Krzesinski, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, David Rientjes,
	Michal Hocko, Andrea Arcangeli, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Joel Savitz, Darren Hart, stable

On Thu, Apr 21 2022 at 12:25, Nico Pache wrote:
> On 4/21/22 10:40, Thomas Gleixner wrote:
>>> Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer
>>>
>>> [1] https://elixir.bootlin.com/glibc/latest/source/nptl/allocatestack.c#L370
>> 
>> A link to the original discussion about this would be more useful than a
>> code reference which is stale tomorrow. The above explanation is good
>> enough to describe the problem.
>
> Hi Andrew,
>
> can you please update the link when you add the ACKs.
>
> Here is a more stable link:
> [1] https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370

That link is still uninteresting and has nothing to do with what I was
asking for, i.e. replacing it with a link to the original discussion
which led to this patch.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-21 20:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-14 14:40 [PATCH v9] oom_kill.c: futex: Delay the OOM reaper to allow time for proper futex cleanup Nico Pache
2022-04-20 12:50 ` Michal Hocko
2022-04-21 14:40 ` Thomas Gleixner
2022-04-21 16:25   ` Nico Pache
2022-04-21 18:49     ` Andrew Morton
2022-04-21 20:43     ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).