linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns
@ 2022-11-25 13:54 Frederic Weisbecker
  2022-11-25 13:54 ` [PATCH 1/3] rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2022-11-25 13:54 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Eric W . Biederman, Neeraj Upadhyay,
	Oleg Nesterov, Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

Pengfei Xu has reported a deadlock involving calls to unshare(),
perf_event_open() and clone3() calls. It requires CAP_SYS_ADMIN
to reproduce (at least I don't see a way for a non privilege process to
reproduce).

See this thread for details: https://lore.kernel.org/all/Y3sOgrOmMQqPMItu@xpf.sh.intel.com/
And this document for the collaborative analysis with Boqun, Paul and Neeraj:
https://docs.google.com/document/d/1hJxgiZ5TMZ4YJkdJPLAkRvq7sYQ-A7svgA8no6i-v8k

The two first patches are small improvements. The fix is in the last patch.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	rcu/dev

HEAD: 45ef5a0a4be4e0db9eadcc86e8f346d34c62e744

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose
      rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls
      rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()


 include/linux/rcupdate.h |  2 ++
 kernel/pid_namespace.c   | 17 +++++++++++++++
 kernel/rcu/tasks.h       | 55 ++++++++++++++++++++++++++++++++++++------------
 3 files changed, 60 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose
  2022-11-25 13:54 [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Frederic Weisbecker
@ 2022-11-25 13:54 ` Frederic Weisbecker
  2022-11-25 13:54 ` [PATCH 2/3] rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2022-11-25 13:54 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Eric W . Biederman, Neeraj Upadhyay,
	Oleg Nesterov, Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

Make sure we don't need to look again into the depths of git blame in
order not to miss a subtle part about how rcu-tasks is dealing with
exiting tasks.

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tasks.h | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 4a991311be9b..7deed6135f73 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -827,11 +827,21 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
 static void rcu_tasks_postscan(struct list_head *hop)
 {
 	/*
-	 * Wait for tasks that are in the process of exiting.  This
-	 * does only part of the job, ensuring that all tasks that were
-	 * previously exiting reach the point where they have disabled
-	 * preemption, allowing the later synchronize_rcu() to finish
-	 * the job.
+	 * Exiting tasks may escape the tasklist scan. Those are vulnerable
+	 * until their final schedule() with TASK_DEAD state. To cope with
+	 * this, divide the fragile exit path part in two intersecting
+	 * read side critical sections:
+	 *
+	 * 1) An _SRCU_ read side starting before calling exit_notify(),
+	 *    which may remove the task from the tasklist, and ending after
+	 *    the final preempt_disable() call in do_exit().
+	 *
+	 * 2) An _RCU_ read side starting with the final preempt_disable()
+	 *    call in do_exit() and ending with the final call to schedule()
+	 *    with TASK_DEAD state.
+	 *
+	 * This handles the part 1). And postgp will handle part 2) with a
+	 * call to synchronize_rcu().
 	 */
 	synchronize_srcu(&tasks_rcu_exit_srcu);
 }
@@ -898,7 +908,10 @@ static void rcu_tasks_postgp(struct rcu_tasks *rtp)
 	 *
 	 * In addition, this synchronize_rcu() waits for exiting tasks
 	 * to complete their final preempt_disable() region of execution,
-	 * cleaning up after the synchronize_srcu() above.
+	 * cleaning up after synchronize_srcu(&tasks_rcu_exit_srcu),
+	 * enforcing the whole region before tasklist removal until
+	 * the final schedule() with TASK_DEAD state to be an RCU TASKS
+	 * read side critical section.
 	 */
 	synchronize_rcu();
 }
@@ -988,7 +1001,11 @@ void show_rcu_tasks_classic_gp_kthread(void)
 EXPORT_SYMBOL_GPL(show_rcu_tasks_classic_gp_kthread);
 #endif // !defined(CONFIG_TINY_RCU)
 
-/* Do the srcu_read_lock() for the above synchronize_srcu().  */
+/*
+ * Contribute to protect against tasklist scan blind spot while the
+ * task is exiting and may be removed from the tasklist. See
+ * corresponding synchronize_srcu() for further details.
+ */
 void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
 {
 	preempt_disable();
@@ -996,7 +1013,11 @@ void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
 	preempt_enable();
 }
 
-/* Do the srcu_read_unlock() for the above synchronize_srcu().  */
+/*
+ * Contribute to protect against tasklist scan blind spot while the
+ * task is exiting and may be removed from the tasklist. See
+ * corresponding synchronize_srcu() for further details.
+ */
 void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu)
 {
 	struct task_struct *t = current;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls
  2022-11-25 13:54 [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Frederic Weisbecker
  2022-11-25 13:54 ` [PATCH 1/3] rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose Frederic Weisbecker
@ 2022-11-25 13:54 ` Frederic Weisbecker
  2022-11-25 13:55 ` [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() Frederic Weisbecker
  2022-11-29  0:22 ` [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Paul E. McKenney
  3 siblings, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2022-11-25 13:54 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Eric W . Biederman, Neeraj Upadhyay,
	Oleg Nesterov, Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

Ever since the following commit:

	5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()")

SRCU doesn't rely anymore on preemption to be disabled in order to
modify the per-CPU counter. And even then it used to be done from the API
itself.

Therefore and after checking further, it appears to be safe to remove
the preemption disablement around __srcu_read_[un]lock() in
exit_tasks_rcu_start() and exit_tasks_rcu_finish()

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tasks.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 7deed6135f73..9a8114114b48 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1008,9 +1008,7 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_classic_gp_kthread);
  */
 void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
 {
-	preempt_disable();
 	current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
-	preempt_enable();
 }
 
 /*
@@ -1022,9 +1020,7 @@ void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu)
 {
 	struct task_struct *t = current;
 
-	preempt_disable();
 	__srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
-	preempt_enable();
 	exit_tasks_rcu_finish_trace(t);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-11-25 13:54 [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Frederic Weisbecker
  2022-11-25 13:54 ` [PATCH 1/3] rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose Frederic Weisbecker
  2022-11-25 13:54 ` [PATCH 2/3] rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls Frederic Weisbecker
@ 2022-11-25 13:55 ` Frederic Weisbecker
  2022-11-30 18:37   ` Eric W. Biederman
  2022-11-29  0:22 ` [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Paul E. McKenney
  3 siblings, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2022-11-25 13:55 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: LKML, Frederic Weisbecker, Eric W . Biederman, Neeraj Upadhyay,
	Oleg Nesterov, Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

RCU Tasks and PID-namespace unshare can interact in do_exit() in a
complicated circular dependency:

1) TASK A calls unshare(CLONE_NEWPID), this creates a new PID namespace
   that every subsequent child of TASK A will belong to. But TASK A
   doesn't itself belong to that new PID namespace.

2) TASK A forks() and creates TASK B. TASK A stays attached to its PID
   namespace (let's say PID_NS1) and TASK B is the first task belonging
   to the new PID namespace created by unshare()  (let's call it PID_NS2).

3) Since TASK B is the first task attached to PID_NS2, it becomes the
   PID_NS2 child reaper.

4) TASK A forks() again and creates TASK C which get attached to PID_NS2.
   Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has
   TASK B (belonging to PID_NS2) as a pid_namespace child_reaper.

5) TASK B exits and since it is the child reaper for PID_NS2, it has to
   kill all other tasks attached to PID_NS2, and wait for all of them to
   die before getting reaped itself (zap_pid_ns_process()).

6) TASK A calls synchronize_rcu_tasks() which leads to
   synchronize_srcu(&tasks_rcu_exit_srcu).

7) TASK B is waiting for TASK C to get reaped. But TASK B is under a
   tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between
   exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A.

8) TASK C exits and since TASK A is its parent, it waits for it to reap
   TASK C, but it can't because TASK A waits for TASK B that waits for
   TASK C.

Pid_namespace semantics can hardly be changed at this point. But the
coverage of tasks_rcu_exit_srcu can be reduced instead.

The current task is assumed not to be concurrently reapable at this
stage of exit_notify() and therefore tasks_rcu_exit_srcu can be
temporarily relaxed without breaking its constraints, providing a way
out of the deadlock scenario.

Fixes: 3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Eric W . Biederman <ebiederm@xmission.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/rcupdate.h |  2 ++
 kernel/pid_namespace.c   | 17 +++++++++++++++++
 kernel/rcu/tasks.h       | 14 ++++++++++++--
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 89b3036746d2..a19d91d5461c 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -238,6 +238,7 @@ void synchronize_rcu_tasks_rude(void);
 
 #define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t, false)
 void exit_tasks_rcu_start(void);
+void exit_tasks_rcu_stop(void);
 void exit_tasks_rcu_finish(void);
 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
 #define rcu_tasks_classic_qs(t, preempt) do { } while (0)
@@ -246,6 +247,7 @@ void exit_tasks_rcu_finish(void);
 #define call_rcu_tasks call_rcu
 #define synchronize_rcu_tasks synchronize_rcu
 static inline void exit_tasks_rcu_start(void) { }
+static inline void exit_tasks_rcu_stop(void) { }
 static inline void exit_tasks_rcu_finish(void) { }
 #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index f4f8cb0435b4..fc21c5d5fd5d 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -244,7 +244,24 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (pid_ns->pid_allocated == init_pids)
 			break;
+		/*
+		 * Release tasks_rcu_exit_srcu to avoid following deadlock:
+		 *
+		 * 1) TASK A unshare(CLONE_NEWPID)
+		 * 2) TASK A fork() twice -> TASK B (child reaper for new ns)
+		 *    and TASK C
+		 * 3) TASK B exits, kills TASK C, waits for TASK A to reap it
+		 * 4) TASK A calls synchronize_rcu_tasks()
+		 *                   -> synchronize_srcu(tasks_rcu_exit_srcu)
+		 * 5) *DEADLOCK*
+		 *
+		 * It is considered safe to release tasks_rcu_exit_srcu here
+		 * because we assume the current task can not be concurrently
+		 * reaped at this point.
+		 */
+		exit_tasks_rcu_stop();
 		schedule();
+		exit_tasks_rcu_start();
 	}
 	__set_current_state(TASK_RUNNING);
 
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 9a8114114b48..f9dfc2ece287 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1016,12 +1016,22 @@ void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
  * task is exiting and may be removed from the tasklist. See
  * corresponding synchronize_srcu() for further details.
  */
-void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu)
+void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu)
 {
 	struct task_struct *t = current;
 
 	__srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
-	exit_tasks_rcu_finish_trace(t);
+}
+
+/*
+ * Contribute to protect against tasklist scan blind spot while the
+ * task is exiting and may be removed from the tasklist. See
+ * corresponding synchronize_srcu() for further details.
+ */
+void exit_tasks_rcu_finish(void)
+{
+	exit_tasks_rcu_stop();
+	exit_tasks_rcu_finish_trace(current);
 }
 
 #else /* #ifdef CONFIG_TASKS_RCU */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns
  2022-11-25 13:54 [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2022-11-25 13:55 ` [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() Frederic Weisbecker
@ 2022-11-29  0:22 ` Paul E. McKenney
  2022-11-29  9:55   ` Frederic Weisbecker
  3 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-29  0:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Eric W . Biederman, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Fri, Nov 25, 2022 at 02:54:57PM +0100, Frederic Weisbecker wrote:
> Pengfei Xu has reported a deadlock involving calls to unshare(),
> perf_event_open() and clone3() calls. It requires CAP_SYS_ADMIN
> to reproduce (at least I don't see a way for a non privilege process to
> reproduce).
> 
> See this thread for details: https://lore.kernel.org/all/Y3sOgrOmMQqPMItu@xpf.sh.intel.com/
> And this document for the collaborative analysis with Boqun, Paul and Neeraj:
> https://docs.google.com/document/d/1hJxgiZ5TMZ4YJkdJPLAkRvq7sYQ-A7svgA8no6i-v8k
> 
> The two first patches are small improvements. The fix is in the last patch.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	rcu/dev
> 
> HEAD: 45ef5a0a4be4e0db9eadcc86e8f346d34c62e744

Hearing no objections, queued for further review and testing.

And thank you very much!  That race between synchronize_rcu_tasks() and
zap_pid_ns_processes() certainly was more than a bit on the non-trivial
side.  Good show!!!

							Thanx, Paul

> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (3):
>       rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose
>       rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls
>       rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
> 
> 
>  include/linux/rcupdate.h |  2 ++
>  kernel/pid_namespace.c   | 17 +++++++++++++++
>  kernel/rcu/tasks.h       | 55 ++++++++++++++++++++++++++++++++++++------------
>  3 files changed, 60 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns
  2022-11-29  0:22 ` [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Paul E. McKenney
@ 2022-11-29  9:55   ` Frederic Weisbecker
  2022-11-29 14:48     ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2022-11-29  9:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Eric W . Biederman, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Mon, Nov 28, 2022 at 04:22:40PM -0800, Paul E. McKenney wrote:
> On Fri, Nov 25, 2022 at 02:54:57PM +0100, Frederic Weisbecker wrote:
> > Pengfei Xu has reported a deadlock involving calls to unshare(),
> > perf_event_open() and clone3() calls. It requires CAP_SYS_ADMIN
> > to reproduce (at least I don't see a way for a non privilege process to
> > reproduce).
> > 
> > See this thread for details: https://lore.kernel.org/all/Y3sOgrOmMQqPMItu@xpf.sh.intel.com/
> > And this document for the collaborative analysis with Boqun, Paul and Neeraj:
> > https://docs.google.com/document/d/1hJxgiZ5TMZ4YJkdJPLAkRvq7sYQ-A7svgA8no6i-v8k
> > 
> > The two first patches are small improvements. The fix is in the last patch.
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > 	rcu/dev
> > 
> > HEAD: 45ef5a0a4be4e0db9eadcc86e8f346d34c62e744
> 
> Hearing no objections, queued for further review and testing.
> 
> And thank you very much!  That race between synchronize_rcu_tasks() and
> zap_pid_ns_processes() certainly was more than a bit on the non-trivial
> side.  Good show!!!

Thanks!

Also please replace the last patch with the following to fix
a !CONFIG_RCU_TASKS issue:

---
From: Frederic Weisbecker <frederic@kernel.org>
Date: Thu, 24 Nov 2022 18:15:46 +0100
Subject: [PATCH] rcu-tasks: Fix synchronize_rcu_tasks() VS
 zap_pid_ns_processes()

RCU Tasks and PID-namespace unshare can interact in do_exit() in a
complicated circular dependency:

1) TASK A calls unshare(CLONE_NEWPID), this creates a new PID namespace
   that every subsequent child of TASK A will belong to. But TASK A
   doesn't itself belong to that new PID namespace.

2) TASK A forks() and creates TASK B. TASK A stays attached to its PID
   namespace (let's say PID_NS1) and TASK B is the first task belonging
   to the new PID namespace created by unshare()  (let's call it PID_NS2).

3) Since TASK B is the first task attached to PID_NS2, it becomes the
   PID_NS2 child reaper.

4) TASK A forks() again and creates TASK C which get attached to PID_NS2.
   Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has
   TASK B (belonging to PID_NS2) as a pid_namespace child_reaper.

5) TASK B exits and since it is the child reaper for PID_NS2, it has to
   kill all other tasks attached to PID_NS2, and wait for all of them to
   die before getting reaped itself (zap_pid_ns_process()).

6) TASK A calls synchronize_rcu_tasks() which leads to
   synchronize_srcu(&tasks_rcu_exit_srcu).

7) TASK B is waiting for TASK C to get reaped. But TASK B is under a
   tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between
   exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A.

8) TASK C exits and since TASK A is its parent, it waits for it to reap
   TASK C, but it can't because TASK A waits for TASK B that waits for
   TASK C.

Pid_namespace semantics can hardly be changed at this point. But the
coverage of tasks_rcu_exit_srcu can be reduced instead.

The current task is assumed not to be concurrently reapable at this
stage of exit_notify() and therefore tasks_rcu_exit_srcu can be
temporarily relaxed without breaking its constraints, providing a way
out of the deadlock scenario.

Fixes: 3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Eric W . Biederman <ebiederm@xmission.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/rcupdate.h |  2 ++
 kernel/pid_namespace.c   | 17 +++++++++++++++++
 kernel/rcu/tasks.h       | 15 +++++++++++++--
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 89b3036746d2..a19d91d5461c 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -238,6 +238,7 @@ void synchronize_rcu_tasks_rude(void);
 
 #define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t, false)
 void exit_tasks_rcu_start(void);
+void exit_tasks_rcu_stop(void);
 void exit_tasks_rcu_finish(void);
 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
 #define rcu_tasks_classic_qs(t, preempt) do { } while (0)
@@ -246,6 +247,7 @@ void exit_tasks_rcu_finish(void);
 #define call_rcu_tasks call_rcu
 #define synchronize_rcu_tasks synchronize_rcu
 static inline void exit_tasks_rcu_start(void) { }
+static inline void exit_tasks_rcu_stop(void) { }
 static inline void exit_tasks_rcu_finish(void) { }
 #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
 
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index f4f8cb0435b4..fc21c5d5fd5d 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -244,7 +244,24 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (pid_ns->pid_allocated == init_pids)
 			break;
+		/*
+		 * Release tasks_rcu_exit_srcu to avoid following deadlock:
+		 *
+		 * 1) TASK A unshare(CLONE_NEWPID)
+		 * 2) TASK A fork() twice -> TASK B (child reaper for new ns)
+		 *    and TASK C
+		 * 3) TASK B exits, kills TASK C, waits for TASK A to reap it
+		 * 4) TASK A calls synchronize_rcu_tasks()
+		 *                   -> synchronize_srcu(tasks_rcu_exit_srcu)
+		 * 5) *DEADLOCK*
+		 *
+		 * It is considered safe to release tasks_rcu_exit_srcu here
+		 * because we assume the current task can not be concurrently
+		 * reaped at this point.
+		 */
+		exit_tasks_rcu_stop();
 		schedule();
+		exit_tasks_rcu_start();
 	}
 	__set_current_state(TASK_RUNNING);
 
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 9a8114114b48..4dda8e6e5707 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1016,16 +1016,27 @@ void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
  * task is exiting and may be removed from the tasklist. See
  * corresponding synchronize_srcu() for further details.
  */
-void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu)
+void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu)
 {
 	struct task_struct *t = current;
 
 	__srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
-	exit_tasks_rcu_finish_trace(t);
+}
+
+/*
+ * Contribute to protect against tasklist scan blind spot while the
+ * task is exiting and may be removed from the tasklist. See
+ * corresponding synchronize_srcu() for further details.
+ */
+void exit_tasks_rcu_finish(void)
+{
+	exit_tasks_rcu_stop();
+	exit_tasks_rcu_finish_trace(current);
 }
 
 #else /* #ifdef CONFIG_TASKS_RCU */
 void exit_tasks_rcu_start(void) { }
+void exit_tasks_rcu_stop(void) { }
 void exit_tasks_rcu_finish(void) { exit_tasks_rcu_finish_trace(current); }
 #endif /* #else #ifdef CONFIG_TASKS_RCU */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns
  2022-11-29  9:55   ` Frederic Weisbecker
@ 2022-11-29 14:48     ` Paul E. McKenney
  2022-12-02 22:55       ` Frederic Weisbecker
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-11-29 14:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Eric W . Biederman, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Tue, Nov 29, 2022 at 10:55:00AM +0100, Frederic Weisbecker wrote:
> On Mon, Nov 28, 2022 at 04:22:40PM -0800, Paul E. McKenney wrote:
> > On Fri, Nov 25, 2022 at 02:54:57PM +0100, Frederic Weisbecker wrote:
> > > Pengfei Xu has reported a deadlock involving calls to unshare(),
> > > perf_event_open() and clone3() calls. It requires CAP_SYS_ADMIN
> > > to reproduce (at least I don't see a way for a non privilege process to
> > > reproduce).
> > > 
> > > See this thread for details: https://lore.kernel.org/all/Y3sOgrOmMQqPMItu@xpf.sh.intel.com/
> > > And this document for the collaborative analysis with Boqun, Paul and Neeraj:
> > > https://docs.google.com/document/d/1hJxgiZ5TMZ4YJkdJPLAkRvq7sYQ-A7svgA8no6i-v8k
> > > 
> > > The two first patches are small improvements. The fix is in the last patch.
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > 	rcu/dev
> > > 
> > > HEAD: 45ef5a0a4be4e0db9eadcc86e8f346d34c62e744
> > 
> > Hearing no objections, queued for further review and testing.
> > 
> > And thank you very much!  That race between synchronize_rcu_tasks() and
> > zap_pid_ns_processes() certainly was more than a bit on the non-trivial
> > side.  Good show!!!
> 
> Thanks!
> 
> Also please replace the last patch with the following to fix
> a !CONFIG_RCU_TASKS issue:

Like this?  ;-)

a0c355bbdfee ("squash! rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()")

							Thanx, Paul

> ---
> From: Frederic Weisbecker <frederic@kernel.org>
> Date: Thu, 24 Nov 2022 18:15:46 +0100
> Subject: [PATCH] rcu-tasks: Fix synchronize_rcu_tasks() VS
>  zap_pid_ns_processes()
> 
> RCU Tasks and PID-namespace unshare can interact in do_exit() in a
> complicated circular dependency:
> 
> 1) TASK A calls unshare(CLONE_NEWPID), this creates a new PID namespace
>    that every subsequent child of TASK A will belong to. But TASK A
>    doesn't itself belong to that new PID namespace.
> 
> 2) TASK A forks() and creates TASK B. TASK A stays attached to its PID
>    namespace (let's say PID_NS1) and TASK B is the first task belonging
>    to the new PID namespace created by unshare()  (let's call it PID_NS2).
> 
> 3) Since TASK B is the first task attached to PID_NS2, it becomes the
>    PID_NS2 child reaper.
> 
> 4) TASK A forks() again and creates TASK C which get attached to PID_NS2.
>    Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has
>    TASK B (belonging to PID_NS2) as a pid_namespace child_reaper.
> 
> 5) TASK B exits and since it is the child reaper for PID_NS2, it has to
>    kill all other tasks attached to PID_NS2, and wait for all of them to
>    die before getting reaped itself (zap_pid_ns_process()).
> 
> 6) TASK A calls synchronize_rcu_tasks() which leads to
>    synchronize_srcu(&tasks_rcu_exit_srcu).
> 
> 7) TASK B is waiting for TASK C to get reaped. But TASK B is under a
>    tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between
>    exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A.
> 
> 8) TASK C exits and since TASK A is its parent, it waits for it to reap
>    TASK C, but it can't because TASK A waits for TASK B that waits for
>    TASK C.
> 
> Pid_namespace semantics can hardly be changed at this point. But the
> coverage of tasks_rcu_exit_srcu can be reduced instead.
> 
> The current task is assumed not to be concurrently reapable at this
> stage of exit_notify() and therefore tasks_rcu_exit_srcu can be
> temporarily relaxed without breaking its constraints, providing a way
> out of the deadlock scenario.
> 
> Fixes: 3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting")
> Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> Suggested-by: Boqun Feng <boqun.feng@gmail.com>
> Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Eric W . Biederman <ebiederm@xmission.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  include/linux/rcupdate.h |  2 ++
>  kernel/pid_namespace.c   | 17 +++++++++++++++++
>  kernel/rcu/tasks.h       | 15 +++++++++++++--
>  3 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 89b3036746d2..a19d91d5461c 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -238,6 +238,7 @@ void synchronize_rcu_tasks_rude(void);
>  
>  #define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t, false)
>  void exit_tasks_rcu_start(void);
> +void exit_tasks_rcu_stop(void);
>  void exit_tasks_rcu_finish(void);
>  #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
>  #define rcu_tasks_classic_qs(t, preempt) do { } while (0)
> @@ -246,6 +247,7 @@ void exit_tasks_rcu_finish(void);
>  #define call_rcu_tasks call_rcu
>  #define synchronize_rcu_tasks synchronize_rcu
>  static inline void exit_tasks_rcu_start(void) { }
> +static inline void exit_tasks_rcu_stop(void) { }
>  static inline void exit_tasks_rcu_finish(void) { }
>  #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
>  
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index f4f8cb0435b4..fc21c5d5fd5d 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -244,7 +244,24 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		if (pid_ns->pid_allocated == init_pids)
>  			break;
> +		/*
> +		 * Release tasks_rcu_exit_srcu to avoid following deadlock:
> +		 *
> +		 * 1) TASK A unshare(CLONE_NEWPID)
> +		 * 2) TASK A fork() twice -> TASK B (child reaper for new ns)
> +		 *    and TASK C
> +		 * 3) TASK B exits, kills TASK C, waits for TASK A to reap it
> +		 * 4) TASK A calls synchronize_rcu_tasks()
> +		 *                   -> synchronize_srcu(tasks_rcu_exit_srcu)
> +		 * 5) *DEADLOCK*
> +		 *
> +		 * It is considered safe to release tasks_rcu_exit_srcu here
> +		 * because we assume the current task can not be concurrently
> +		 * reaped at this point.
> +		 */
> +		exit_tasks_rcu_stop();
>  		schedule();
> +		exit_tasks_rcu_start();
>  	}
>  	__set_current_state(TASK_RUNNING);
>  
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index 9a8114114b48..4dda8e6e5707 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -1016,16 +1016,27 @@ void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
>   * task is exiting and may be removed from the tasklist. See
>   * corresponding synchronize_srcu() for further details.
>   */
> -void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu)
> +void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu)
>  {
>  	struct task_struct *t = current;
>  
>  	__srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
> -	exit_tasks_rcu_finish_trace(t);
> +}
> +
> +/*
> + * Contribute to protect against tasklist scan blind spot while the
> + * task is exiting and may be removed from the tasklist. See
> + * corresponding synchronize_srcu() for further details.
> + */
> +void exit_tasks_rcu_finish(void)
> +{
> +	exit_tasks_rcu_stop();
> +	exit_tasks_rcu_finish_trace(current);
>  }
>  
>  #else /* #ifdef CONFIG_TASKS_RCU */
>  void exit_tasks_rcu_start(void) { }
> +void exit_tasks_rcu_stop(void) { }
>  void exit_tasks_rcu_finish(void) { exit_tasks_rcu_finish_trace(current); }
>  #endif /* #else #ifdef CONFIG_TASKS_RCU */
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-11-25 13:55 ` [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() Frederic Weisbecker
@ 2022-11-30 18:37   ` Eric W. Biederman
  2022-12-02 19:51     ` Paul E. McKenney
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Eric W. Biederman @ 2022-11-30 18:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

Frederic Weisbecker <frederic@kernel.org> writes:

> RCU Tasks and PID-namespace unshare can interact in do_exit() in a
> complicated circular dependency:
>
> 1) TASK A calls unshare(CLONE_NEWPID), this creates a new PID namespace
>    that every subsequent child of TASK A will belong to. But TASK A
>    doesn't itself belong to that new PID namespace.
>
> 2) TASK A forks() and creates TASK B. TASK A stays attached to its PID
>    namespace (let's say PID_NS1) and TASK B is the first task belonging
>    to the new PID namespace created by unshare()  (let's call it PID_NS2).
>
> 3) Since TASK B is the first task attached to PID_NS2, it becomes the
>    PID_NS2 child reaper.
>
> 4) TASK A forks() again and creates TASK C which get attached to PID_NS2.
>    Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has
>    TASK B (belonging to PID_NS2) as a pid_namespace child_reaper.
>
> 5) TASK B exits and since it is the child reaper for PID_NS2, it has to
>    kill all other tasks attached to PID_NS2, and wait for all of them to
>    die before getting reaped itself (zap_pid_ns_process()).
>
> 6) TASK A calls synchronize_rcu_tasks() which leads to
>    synchronize_srcu(&tasks_rcu_exit_srcu).
>
> 7) TASK B is waiting for TASK C to get reaped. But TASK B is under a
>    tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between
>    exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A.
>
> 8) TASK C exits and since TASK A is its parent, it waits for it to reap
>    TASK C, but it can't because TASK A waits for TASK B that waits for
>    TASK C.
>
> Pid_namespace semantics can hardly be changed at this point. But the
> coverage of tasks_rcu_exit_srcu can be reduced instead.
>
> The current task is assumed not to be concurrently reapable at this
> stage of exit_notify() and therefore tasks_rcu_exit_srcu can be
> temporarily relaxed without breaking its constraints, providing a way
> out of the deadlock scenario.
>
> Fixes: 3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting")
> Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> Suggested-by: Boqun Feng <boqun.feng@gmail.com>
> Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Eric W . Biederman <ebiederm@xmission.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  include/linux/rcupdate.h |  2 ++
>  kernel/pid_namespace.c   | 17 +++++++++++++++++
>  kernel/rcu/tasks.h       | 14 ++++++++++++--
>  3 files changed, 31 insertions(+), 2 deletions(-)

> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index f4f8cb0435b4..fc21c5d5fd5d 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -244,7 +244,24 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		if (pid_ns->pid_allocated == init_pids)
>  			break;
> +		/*
> +		 * Release tasks_rcu_exit_srcu to avoid following deadlock:
> +		 *
> +		 * 1) TASK A unshare(CLONE_NEWPID)
> +		 * 2) TASK A fork() twice -> TASK B (child reaper for new ns)
> +		 *    and TASK C
> +		 * 3) TASK B exits, kills TASK C, waits for TASK A to reap it
> +		 * 4) TASK A calls synchronize_rcu_tasks()
> +		 *                   -> synchronize_srcu(tasks_rcu_exit_srcu)
> +		 * 5) *DEADLOCK*
> +		 *
> +		 * It is considered safe to release tasks_rcu_exit_srcu here
> +		 * because we assume the current task can not be concurrently
> +		 * reaped at this point.
> +		 */
> +		exit_tasks_rcu_stop();
>  		schedule();
> +		exit_tasks_rcu_start();
>  	}
>  	__set_current_state(TASK_RUNNING);

Two questions.

1) Is there any chance you need the exit_task_rcu_stop() and
   exit_tasks_rcu_start() around schedule in the part of this code that
   calls kernel_wait4.

2) I keep thinking zap_pid_ns_processes() should be changed so that
   after it sends SIGKILL to all of the relevant processes to not wait,
   and instead have wait_consider_task simply not allow the 
   init process of the pid namespace to be reaped.

   Am I right in thinking that such a change were to be made it would
   make remove the deadlock without having to have any special code?

   It is just tricky enough to do that I don't want to discourage your
   simpler change but this looks like a case that makes the pain of
   changing zap_pid_ns_processes worthwhile in the practice.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-11-30 18:37   ` Eric W. Biederman
@ 2022-12-02 19:51     ` Paul E. McKenney
  2022-12-02 22:54     ` Frederic Weisbecker
  2022-12-06 16:49     ` Oleg Nesterov
  2 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-02 19:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Frederic Weisbecker, LKML, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Wed, Nov 30, 2022 at 12:37:15PM -0600, Eric W. Biederman wrote:
> Frederic Weisbecker <frederic@kernel.org> writes:
> 
> > RCU Tasks and PID-namespace unshare can interact in do_exit() in a
> > complicated circular dependency:
> >
> > 1) TASK A calls unshare(CLONE_NEWPID), this creates a new PID namespace
> >    that every subsequent child of TASK A will belong to. But TASK A
> >    doesn't itself belong to that new PID namespace.
> >
> > 2) TASK A forks() and creates TASK B. TASK A stays attached to its PID
> >    namespace (let's say PID_NS1) and TASK B is the first task belonging
> >    to the new PID namespace created by unshare()  (let's call it PID_NS2).
> >
> > 3) Since TASK B is the first task attached to PID_NS2, it becomes the
> >    PID_NS2 child reaper.
> >
> > 4) TASK A forks() again and creates TASK C which get attached to PID_NS2.
> >    Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has
> >    TASK B (belonging to PID_NS2) as a pid_namespace child_reaper.
> >
> > 5) TASK B exits and since it is the child reaper for PID_NS2, it has to
> >    kill all other tasks attached to PID_NS2, and wait for all of them to
> >    die before getting reaped itself (zap_pid_ns_process()).
> >
> > 6) TASK A calls synchronize_rcu_tasks() which leads to
> >    synchronize_srcu(&tasks_rcu_exit_srcu).
> >
> > 7) TASK B is waiting for TASK C to get reaped. But TASK B is under a
> >    tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between
> >    exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A.
> >
> > 8) TASK C exits and since TASK A is its parent, it waits for it to reap
> >    TASK C, but it can't because TASK A waits for TASK B that waits for
> >    TASK C.
> >
> > Pid_namespace semantics can hardly be changed at this point. But the
> > coverage of tasks_rcu_exit_srcu can be reduced instead.
> >
> > The current task is assumed not to be concurrently reapable at this
> > stage of exit_notify() and therefore tasks_rcu_exit_srcu can be
> > temporarily relaxed without breaking its constraints, providing a way
> > out of the deadlock scenario.
> >
> > Fixes: 3f95aa81d265 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting")
> > Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> > Suggested-by: Boqun Feng <boqun.feng@gmail.com>
> > Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Eric W . Biederman <ebiederm@xmission.com>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  include/linux/rcupdate.h |  2 ++
> >  kernel/pid_namespace.c   | 17 +++++++++++++++++
> >  kernel/rcu/tasks.h       | 14 ++++++++++++--
> >  3 files changed, 31 insertions(+), 2 deletions(-)
> 
> > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> > index f4f8cb0435b4..fc21c5d5fd5d 100644
> > --- a/kernel/pid_namespace.c
> > +++ b/kernel/pid_namespace.c
> > @@ -244,7 +244,24 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> >  		set_current_state(TASK_INTERRUPTIBLE);
> >  		if (pid_ns->pid_allocated == init_pids)
> >  			break;
> > +		/*
> > +		 * Release tasks_rcu_exit_srcu to avoid following deadlock:
> > +		 *
> > +		 * 1) TASK A unshare(CLONE_NEWPID)
> > +		 * 2) TASK A fork() twice -> TASK B (child reaper for new ns)
> > +		 *    and TASK C
> > +		 * 3) TASK B exits, kills TASK C, waits for TASK A to reap it
> > +		 * 4) TASK A calls synchronize_rcu_tasks()
> > +		 *                   -> synchronize_srcu(tasks_rcu_exit_srcu)
> > +		 * 5) *DEADLOCK*
> > +		 *
> > +		 * It is considered safe to release tasks_rcu_exit_srcu here
> > +		 * because we assume the current task can not be concurrently
> > +		 * reaped at this point.
> > +		 */
> > +		exit_tasks_rcu_stop();
> >  		schedule();
> > +		exit_tasks_rcu_start();
> >  	}
> >  	__set_current_state(TASK_RUNNING);
> 
> Two questions.
> 
> 1) Is there any chance you need the exit_task_rcu_stop() and
>    exit_tasks_rcu_start() around schedule in the part of this code that
>    calls kernel_wait4.

Quite possibly, but I must defer to Frederic on this one.

> 2) I keep thinking zap_pid_ns_processes() should be changed so that
>    after it sends SIGKILL to all of the relevant processes to not wait,
>    and instead have wait_consider_task simply not allow the 
>    init process of the pid namespace to be reaped.
> 
>    Am I right in thinking that such a change were to be made it would
>    make remove the deadlock without having to have any special code?
> 
>    It is just tricky enough to do that I don't want to discourage your
>    simpler change but this looks like a case that makes the pain of
>    changing zap_pid_ns_processes worthwhile in the practice.

I would dearly love for there to be a fix that allowed the RCU-related
code to go back to what it was originally.  But there is apparently some
concern that users might be relying on the current sleep-while-exiting
semantics.  :-/

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-11-30 18:37   ` Eric W. Biederman
  2022-12-02 19:51     ` Paul E. McKenney
@ 2022-12-02 22:54     ` Frederic Weisbecker
  2022-12-02 23:28       ` Eric W. Biederman
  2022-12-06 16:49     ` Oleg Nesterov
  2 siblings, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2022-12-02 22:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Paul E . McKenney, LKML, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Wed, Nov 30, 2022 at 12:37:15PM -0600, Eric W. Biederman wrote:
> Frederic Weisbecker <frederic@kernel.org> writes:
> Two questions.
> 
> 1) Is there any chance you need the exit_task_rcu_stop() and
>    exit_tasks_rcu_start() around schedule in the part of this code that
>    calls kernel_wait4.

Indeed it could be relaxed there too if necessary.

> 
> 2) I keep thinking zap_pid_ns_processes() should be changed so that
>    after it sends SIGKILL to all of the relevant processes to not wait,
>    and instead have wait_consider_task simply not allow the 
>    init process of the pid namespace to be reaped.
> 
>    Am I right in thinking that such a change were to be made it would
>    make remove the deadlock without having to have any special code?
> 
>    It is just tricky enough to do that I don't want to discourage your
>    simpler change but this looks like a case that makes the pain of
>    changing zap_pid_ns_processes worthwhile in the practice.

So you mean it still reaps those that were EXIT_ZOMBIE before ignoring
SIGCHLD (the kernel_wait4() call) but it doesn't sleep anymore on those
that autoreap (or get reaped by a parent outside that namespace) after
ignoring SIGCHLD? Namely it doesn't do the schedule() loop I'm working
around here and proceeds with exit_notify() and notifies its parent?

And then in this case the responsibility of sleeping, until the init_process
of the namespace is the last task in the namespace, goes to the parent while
waiting that init_process, right?

But what if the init_process of the given namespace autoreaps? Should it then
wait itself until the namespace is empty? And then aren't we back to the initial
issue?

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns
  2022-11-29 14:48     ` Paul E. McKenney
@ 2022-12-02 22:55       ` Frederic Weisbecker
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2022-12-02 22:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Eric W . Biederman, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Tue, Nov 29, 2022 at 06:48:34AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 29, 2022 at 10:55:00AM +0100, Frederic Weisbecker wrote:
> > On Mon, Nov 28, 2022 at 04:22:40PM -0800, Paul E. McKenney wrote:
> > > On Fri, Nov 25, 2022 at 02:54:57PM +0100, Frederic Weisbecker wrote:
> > > > Pengfei Xu has reported a deadlock involving calls to unshare(),
> > > > perf_event_open() and clone3() calls. It requires CAP_SYS_ADMIN
> > > > to reproduce (at least I don't see a way for a non privilege process to
> > > > reproduce).
> > > > 
> > > > See this thread for details: https://lore.kernel.org/all/Y3sOgrOmMQqPMItu@xpf.sh.intel.com/
> > > > And this document for the collaborative analysis with Boqun, Paul and Neeraj:
> > > > https://docs.google.com/document/d/1hJxgiZ5TMZ4YJkdJPLAkRvq7sYQ-A7svgA8no6i-v8k
> > > > 
> > > > The two first patches are small improvements. The fix is in the last patch.
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > > 	rcu/dev
> > > > 
> > > > HEAD: 45ef5a0a4be4e0db9eadcc86e8f346d34c62e744
> > > 
> > > Hearing no objections, queued for further review and testing.
> > > 
> > > And thank you very much!  That race between synchronize_rcu_tasks() and
> > > zap_pid_ns_processes() certainly was more than a bit on the non-trivial
> > > side.  Good show!!!
> > 
> > Thanks!
> > 
> > Also please replace the last patch with the following to fix
> > a !CONFIG_RCU_TASKS issue:
> 
> Like this?  ;-)
> 
> a0c355bbdfee ("squash! rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()")

You got it! :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-12-02 22:54     ` Frederic Weisbecker
@ 2022-12-02 23:28       ` Eric W. Biederman
  2022-12-04  0:03         ` Frederic Weisbecker
  0 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2022-12-02 23:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E . McKenney, LKML, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

Frederic Weisbecker <frederic@kernel.org> writes:

> On Wed, Nov 30, 2022 at 12:37:15PM -0600, Eric W. Biederman wrote:
>> Frederic Weisbecker <frederic@kernel.org> writes:
>> Two questions.
>> 
>> 1) Is there any chance you need the exit_task_rcu_stop() and
>>    exit_tasks_rcu_start() around schedule in the part of this code that
>>    calls kernel_wait4.
>
> Indeed it could be relaxed there too if necessary.

I was just wondering how you tell if it is necessary and if perhaps
the kernel_wait4 was overlooked.

>> 2) I keep thinking zap_pid_ns_processes() should be changed so that
>>    after it sends SIGKILL to all of the relevant processes to not wait,
>>    and instead have wait_consider_task simply not allow the 
>>    init process of the pid namespace to be reaped.
>> 
>>    Am I right in thinking that such a change were to be made it would
>>    make remove the deadlock without having to have any special code?
>> 
>>    It is just tricky enough to do that I don't want to discourage your
>>    simpler change but this looks like a case that makes the pain of
>>    changing zap_pid_ns_processes worthwhile in the practice.
>
> So you mean it still reaps those that were EXIT_ZOMBIE before ignoring
> SIGCHLD (the kernel_wait4() call) but it doesn't sleep anymore on those
> that autoreap (or get reaped by a parent outside that namespace) after
> ignoring SIGCHLD? Namely it doesn't do the schedule() loop I'm working
> around here and proceeds with exit_notify() and notifies its parent?

Yes.  A change to make it work like when the thread group leader exits
before the other threads.  So any actual sleeping happens in the reaper
of the init process when the reaper calls wait(2).

> And then in this case the responsibility of sleeping, until the init_process
> of the namespace is the last task in the namespace, goes to the parent while
> waiting that init_process, right?
>
> But what if the init_process of the given namespace autoreaps? Should it then
> wait itself until the namespace is empty? And then aren't we back to the initial
> issue?

The idea is that we only care when the userspace cares.  I don't think
there is any kernel code that fundamentally cares.  There might be a few
bits that accidentally care and those would need to be take care of when
making the proposed change.  The wait would only exist for userspace so
the same semantics are observed.

Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-12-02 23:28       ` Eric W. Biederman
@ 2022-12-04  0:03         ` Frederic Weisbecker
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2022-12-04  0:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Paul E . McKenney, LKML, Neeraj Upadhyay, Oleg Nesterov,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Fri, Dec 02, 2022 at 05:28:54PM -0600, Eric W. Biederman wrote:
> Frederic Weisbecker <frederic@kernel.org> writes:
> 
> > On Wed, Nov 30, 2022 at 12:37:15PM -0600, Eric W. Biederman wrote:
> >> Frederic Weisbecker <frederic@kernel.org> writes:
> >> Two questions.
> >> 
> >> 1) Is there any chance you need the exit_task_rcu_stop() and
> >>    exit_tasks_rcu_start() around schedule in the part of this code that
> >>    calls kernel_wait4.
> >
> > Indeed it could be relaxed there too if necessary.
> 
> I was just wondering how you tell if it is necessary and if perhaps
> the kernel_wait4 was overlooked.

As for the deadlock described in this patch, it involves waiting for
another task to reap yet another task. So it doesn't look necessary to
relax the lock when the current task does the reaping.


Now the following looks possible:

      TASK A                                    TASK B                       TASK C
      -----                                   -------                       ------
      mutex_lock(&lock1)
      synchronize_srcu(tasks_rcu_exit_srcu)
                                                                            mutex_lock(&lock1)
                                                                            //wait for TASK A to release &lock1
                                              exit_tasks_rcu_start();
                                              ...
                                              zap_pid_ns_processes()
                                                  kernel_wait4()
                                                      //wait for TASK C

> > So you mean it still reaps those that were EXIT_ZOMBIE before ignoring
> > SIGCHLD (the kernel_wait4() call) but it doesn't sleep anymore on those
> > that autoreap (or get reaped by a parent outside that namespace) after
> > ignoring SIGCHLD? Namely it doesn't do the schedule() loop I'm working
> > around here and proceeds with exit_notify() and notifies its parent?
> 
> Yes.  A change to make it work like when the thread group leader exits
> before the other threads.  So any actual sleeping happens in the reaper
> of the init process when the reaper calls wait(2).
> 
> > And then in this case the responsibility of sleeping, until the init_process
> > of the namespace is the last task in the namespace, goes to the parent while
> > waiting that init_process, right?
> >
> > But what if the init_process of the given namespace autoreaps? Should it then
> > wait itself until the namespace is empty? And then aren't we back to the initial
> > issue?
> 
> The idea is that we only care when the userspace cares.  I don't think
> there is any kernel code that fundamentally cares.  There might be a few
> bits that accidentally care and those would need to be take care of when
> making the proposed change.  The wait would only exist for userspace so
> the same semantics are observed.

I mean the issue is that if the pid_namespace reaper goes for autoreaping, then
it still has to go through that loop waiting for the remaining tasks of the
pid_namespace to be reaped. The loop will just happen later in exit_notify() but
eventually the issue remains: we'll have to relax tasks_rcu_exit_srcu
around that loop.

Thanks.

> 
> Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-11-30 18:37   ` Eric W. Biederman
  2022-12-02 19:51     ` Paul E. McKenney
  2022-12-02 22:54     ` Frederic Weisbecker
@ 2022-12-06 16:49     ` Oleg Nesterov
  2022-12-07 14:34       ` Paul E. McKenney
  2022-12-07 20:01       ` Frederic Weisbecker
  2 siblings, 2 replies; 18+ messages in thread
From: Oleg Nesterov @ 2022-12-06 16:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Frederic Weisbecker, Paul E . McKenney, LKML, Neeraj Upadhyay,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On 11/30, Eric W. Biederman wrote:
>
> 2) I keep thinking zap_pid_ns_processes() should be changed so that
>    after it sends SIGKILL to all of the relevant processes to not wait,

At least I think it should not wait for the tasks injected into this ns.

Because this looks like a kernel bug even if we forget about this deadlock.

Say we create a task P using clone(CLONE_NEWPID), then inject a task T into
P's pid-namespace via setns/fork. This make the process P "unkillable", it
will hang in zap_pid_ns_processes() "forever" until T->parent reaps a zombie
task T killed by P.

Oleg.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-12-06 16:49     ` Oleg Nesterov
@ 2022-12-07 14:34       ` Paul E. McKenney
  2022-12-07 20:01       ` Frederic Weisbecker
  1 sibling, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-07 14:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Frederic Weisbecker, LKML, Neeraj Upadhyay,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Tue, Dec 06, 2022 at 05:49:28PM +0100, Oleg Nesterov wrote:
> On 11/30, Eric W. Biederman wrote:
> >
> > 2) I keep thinking zap_pid_ns_processes() should be changed so that
> >    after it sends SIGKILL to all of the relevant processes to not wait,
> 
> At least I think it should not wait for the tasks injected into this ns.
> 
> Because this looks like a kernel bug even if we forget about this deadlock.
> 
> Say we create a task P using clone(CLONE_NEWPID), then inject a task T into
> P's pid-namespace via setns/fork. This make the process P "unkillable", it
> will hang in zap_pid_ns_processes() "forever" until T->parent reaps a zombie
> task T killed by P.

Given that this is not the first time that Tasks Trace RCU has been
involved in a deadlock involving this exit path, I could certainly get
behind this approach.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-12-06 16:49     ` Oleg Nesterov
  2022-12-07 14:34       ` Paul E. McKenney
@ 2022-12-07 20:01       ` Frederic Weisbecker
  2022-12-07 20:39         ` Oleg Nesterov
  1 sibling, 1 reply; 18+ messages in thread
From: Frederic Weisbecker @ 2022-12-07 20:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Paul E . McKenney, LKML, Neeraj Upadhyay,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Tue, Dec 06, 2022 at 05:49:28PM +0100, Oleg Nesterov wrote:
> On 11/30, Eric W. Biederman wrote:
> >
> > 2) I keep thinking zap_pid_ns_processes() should be changed so that
> >    after it sends SIGKILL to all of the relevant processes to not wait,
> 
> At least I think it should not wait for the tasks injected into this ns.
> 
> Because this looks like a kernel bug even if we forget about this deadlock.
> 
> Say we create a task P using clone(CLONE_NEWPID), then inject a task T into
> P's pid-namespace via setns/fork. This make the process P "unkillable", it
> will hang in zap_pid_ns_processes() "forever" until T->parent reaps a zombie
> task T killed by P.

I think this was made that way on purpose, see the comment in
zap_pid_ns_processes():

	/*
	 * kernel_wait4() misses EXIT_DEAD children, and EXIT_ZOMBIE
	 * process whose parents processes are outside of the pid
	 * namespace.  Such processes are created with setns()+fork().
	 *
	 * If those EXIT_ZOMBIE processes are not reaped by their
	 * parents before their parents exit, they will be reparented
	 * to pid_ns->child_reaper.  Thus pidns->child_reaper needs to
	 * stay valid until they all go away.
	 *
	 * The code relies on the pid_ns->child_reaper ignoring
	 * SIGCHILD to cause those EXIT_ZOMBIE processes to be
	 * autoreaped if reparented.
	 *
	 * Semantically it is also desirable to wait for EXIT_ZOMBIE
	 * processes before allowing the child_reaper to be reaped, as
	 * that gives the invariant that when the init process of a
	 * pid namespace is reaped all of the processes in the pid
	 * namespace are gone.

I can't say I like the fact that a parent not belonging to a new namespace
can create more than one child within that namespace but anyway this all look
like an ABI that can't be reverted now.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-12-07 20:01       ` Frederic Weisbecker
@ 2022-12-07 20:39         ` Oleg Nesterov
  2022-12-09 20:26           ` Frederic Weisbecker
  0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2022-12-07 20:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Eric W. Biederman, Paul E . McKenney, LKML, Neeraj Upadhyay,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

Frederic,

before anything else, your patches look good to me, it is not that
I tried to argue!

On 12/07, Frederic Weisbecker wrote:
>
> On Tue, Dec 06, 2022 at 05:49:28PM +0100, Oleg Nesterov wrote:
> >
> > At least I think it should not wait for the tasks injected into this ns.
> >
> > Because this looks like a kernel bug even if we forget about this deadlock.
> >
> I think this was made that way on purpose,

Well maybe. But to me we have this behaviour only because we (me at least)
do not know how to avoid the "hang" in this case.

> see the comment in zap_pid_ns_processes():

Heh ;) I wrote this comment in a53b83154914 ("exit: pidns: fix/update the
comments in zap_pid_ns_processes()") exactly because I didn't like this
behaviour, but I thought it must be documented.

> I can't say I like the fact that a parent not belonging to a new namespace
> can create more than one child within that namespace

not sure I understand but this looks fine and useful to me,

> but anyway this all look like an ABI that can't be reverted now.

perhaps... But you know, I wrote my previous email because 2 weeks ago
I had to investigate a bug report which blamed the kernel, while the
problem (unkillable process sleeping in zap_pid_ns_processes) was caused
by the dangling zombie injected into that process's namespace. And I am
still trying to convince the customer they need to fix userspace.

Thanks,

Oleg.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()
  2022-12-07 20:39         ` Oleg Nesterov
@ 2022-12-09 20:26           ` Frederic Weisbecker
  0 siblings, 0 replies; 18+ messages in thread
From: Frederic Weisbecker @ 2022-12-09 20:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, Paul E . McKenney, LKML, Neeraj Upadhyay,
	Pengfei Xu, Boqun Feng, Lai Jiangshan, rcu

On Wed, Dec 07, 2022 at 09:39:00PM +0100, Oleg Nesterov wrote:
> On 12/07, Frederic Weisbecker wrote:
> >
> > On Tue, Dec 06, 2022 at 05:49:28PM +0100, Oleg Nesterov wrote:
> > >
> > > At least I think it should not wait for the tasks injected into this ns.
> > >
> > > Because this looks like a kernel bug even if we forget about this deadlock.
> > >
> > I think this was made that way on purpose,
> 
> Well maybe. But to me we have this behaviour only because we (me at least)
> do not know how to avoid the "hang" in this case.
> 
> > see the comment in zap_pid_ns_processes():
> 
> Heh ;) I wrote this comment in a53b83154914 ("exit: pidns: fix/update the
> comments in zap_pid_ns_processes()") exactly because I didn't like this
> behaviour, but I thought it must be documented.

Bah! I should have guessed ;-)

> 
> > I can't say I like the fact that a parent not belonging to a new namespace
> > can create more than one child within that namespace
> 
> not sure I understand but this looks fine and useful to me,

I mean if only one task could be injected within a new namespace, we could be sure
that all subsequent tasks belonging to that namespace would be descendents of
that first task (the same way that every task in the default namespace is a
descendant of the real init_task) and thus we wouldn't be bothered with such
deadlocks. But I guess namespaces aren't designed to work like that. I don't
know much about them so what I'm saying is very likely irrelevant.

> > but anyway this all look like an ABI that can't be reverted now.
> 
> perhaps... But you know, I wrote my previous email because 2 weeks ago
> I had to investigate a bug report which blamed the kernel, while the
> problem (unkillable process sleeping in zap_pid_ns_processes) was caused
> by the dangling zombie injected into that process's namespace. And I am
> still trying to convince the customer they need to fix userspace.

Heh :-/

I wish we could fix this but I have no idea how. I guess the child_reaper
of an ns could avoid waiting for the rest of the ns and designate its parent
as the new child reaper.

Or we could arrange for all tasks in the ns to autoreap if they ever fall back
to be reaped by their ns->child_reaper and that child_reaper is dead.

But that would look like ABI breakages...

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-12-09 20:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-25 13:54 [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Frederic Weisbecker
2022-11-25 13:54 ` [PATCH 1/3] rcu-tasks: Improve comments explaining tasks_rcu_exit_srcu purpose Frederic Weisbecker
2022-11-25 13:54 ` [PATCH 2/3] rcu-tasks: Remove preemption disablement around srcu_read_[un]lock() calls Frederic Weisbecker
2022-11-25 13:55 ` [PATCH 3/3] rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() Frederic Weisbecker
2022-11-30 18:37   ` Eric W. Biederman
2022-12-02 19:51     ` Paul E. McKenney
2022-12-02 22:54     ` Frederic Weisbecker
2022-12-02 23:28       ` Eric W. Biederman
2022-12-04  0:03         ` Frederic Weisbecker
2022-12-06 16:49     ` Oleg Nesterov
2022-12-07 14:34       ` Paul E. McKenney
2022-12-07 20:01       ` Frederic Weisbecker
2022-12-07 20:39         ` Oleg Nesterov
2022-12-09 20:26           ` Frederic Weisbecker
2022-11-29  0:22 ` [PATCH 0/3] rcu-tasks: Fix race against exiting pid_ns Paul E. McKenney
2022-11-29  9:55   ` Frederic Weisbecker
2022-11-29 14:48     ` Paul E. McKenney
2022-12-02 22:55       ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).