* [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite
@ 2022-04-21 15:02 Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Find here a new posting of the ptrace and freezer patches :-)

The majority of the changes are in patch 2, which has changed a lot
thanks to much feedback from Oleg and Eric.

I'm hoping we're converging on something agreeable.

---
 drivers/acpi/x86/s2idle.c         |  12 +-
 drivers/android/binder.c          |   4 +-
 drivers/media/pci/pt3/pt3.c       |   4 +-
 drivers/scsi/scsi_transport_spi.c |   7 +-
 fs/cifs/inode.c                   |   4 +-
 fs/cifs/transport.c               |   5 +-
 fs/coredump.c                     |   5 +-
 fs/nfs/file.c                     |   3 +-
 fs/nfs/inode.c                    |  12 +-
 fs/nfs/nfs3proc.c                 |   3 +-
 fs/nfs/nfs4proc.c                 |  14 +--
 fs/nfs/nfs4state.c                |   3 +-
 fs/nfs/pnfs.c                     |   4 +-
 fs/xfs/xfs_trans_ail.c            |   8 +-
 include/linux/completion.h        |   1 +
 include/linux/freezer.h           | 244 ++------------------------------------
 include/linux/sched.h             |  49 ++++----
 include/linux/sched/jobctl.h      |  10 ++
 include/linux/sched/signal.h      |   5 +-
 include/linux/sunrpc/sched.h      |   7 +-
 include/linux/suspend.h           |   8 +-
 include/linux/umh.h               |   9 +-
 include/linux/wait.h              |  40 ++++++-
 init/do_mounts_initrd.c           |  10 +-
 kernel/cgroup/legacy_freezer.c    |  23 ++--
 kernel/exit.c                     |   4 +-
 kernel/fork.c                     |   5 +-
 kernel/freezer.c                  | 137 +++++++++++++++------
 kernel/futex/waitwake.c           |   8 +-
 kernel/hung_task.c                |   4 +-
 kernel/power/hibernate.c          |  35 ++++--
 kernel/power/main.c               |  18 +--
 kernel/power/process.c            |  10 +-
 kernel/power/suspend.c            |  12 +-
 kernel/power/user.c               |  24 ++--
 kernel/ptrace.c                   | 114 ++++++++++--------
 kernel/sched/completion.c         |   9 ++
 kernel/sched/core.c               |  24 ++--
 kernel/signal.c                   |  62 +++++++---
 kernel/time/hrtimer.c             |   4 +-
 kernel/umh.c                      |  18 ++-
 mm/khugepaged.c                   |   4 +-
 net/sunrpc/sched.c                |  12 +-
 net/unix/af_unix.c                |   8 +-
 44 files changed, 478 insertions(+), 528 deletions(-)


^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-26 23:34   ` Eric W. Biederman
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED respectively to keep their state unique.
That is, this state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and would
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched.h        |    8 +++-----
 include/linux/sched/jobctl.h |    6 ++++++
 include/linux/sched/signal.h |    5 ++++-
 kernel/ptrace.c              |   26 +++++++++++++++-----------
 kernel/signal.c              |   16 ++++++++++++----
 5 files changed, 40 insertions(+), 21 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -20,6 +20,9 @@ struct task_struct;
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 
+#define JOBCTL_STOPPED_BIT	24	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	25	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -29,6 +32,9 @@ struct task_struct;
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -439,6 +441,7 @@ static inline void signal_wake_up(struct
 {
 	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
 }
+
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
 	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(st
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress,
+ * so that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -218,9 +223,10 @@ static void ptrace_unfreeze_traced(struc
 	 */
 	spin_lock_irq(&task->sighand->siglock);
 	if (READ_ONCE(task->__state) == __TASK_TRACED) {
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
-		else
+		} else
 			WRITE_ONCE(task->__state, TASK_TRACED);
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -475,8 +481,10 @@ static int ptrace_attach(struct task_str
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -850,8 +858,6 @@ static long ptrace_get_rseq_configuratio
 static int ptrace_resume(struct task_struct *child, long request,
 			 unsigned long data)
 {
-	bool need_siglock;
-
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -892,13 +898,11 @@ static int ptrace_resume(struct task_str
 	 * status and clears the code too; this can't race with the tracee, it
 	 * takes siglock after resume.
 	 */
-	need_siglock = data && !thread_group_empty(current);
-	if (need_siglock)
-		spin_lock_irq(&child->sighand->siglock);
+	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
-	if (need_siglock)
-		spin_unlock_irq(&child->sighand->siglock);
+	spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(ke
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -770,7 +773,9 @@ void signal_wake_up_state(struct task_st
 	 * By using wake_up_state, we ensure the process will wake up and
 	 * handle its death signal.
 	 */
-	if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
+	if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+	else
 		kick_process(t);
 }
 
@@ -884,7 +889,7 @@ static int check_kill_permission(int sig
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -930,9 +935,10 @@ static bool prepare_signal(int sig, stru
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2219,6 +2225,7 @@ static int ptrace_stop(int exit_code, in
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
 	 */
+	current->jobctl |= JOBCTL_TRACED;
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2460,6 +2467,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 




* [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-21 18:23   ` Oleg Nesterov
                     ` (4 more replies)
  2022-04-21 15:02 ` [PATCH v2 3/5] freezer: Have {,un}lock_system_sleep() save/restore flags Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 5 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
task->__state as much.

Due to how PREEMPT_RT is changing the rules vs task->__state with the
introduction of task->saved_state while TASK_RTLOCK_WAIT (the whole
blocking spinlock thing), the way ptrace freeze tries to do things no
longer works.

Specifically there are two problems:

 - due to ->saved_state, the ->__state modification removing
   TASK_WAKEKILL no longer works reliably.

 - due to ->saved_state, wait_task_inactive() also no longer works
   reliably.

The first problem is solved by a suggestion from Eric: instead of
changing __state, delay the TASK_WAKEKILL wakeup.

The second problem is solved by a suggestion from Oleg: add
JOBCTL_TRACED_QUIESCE to cover the chunk of code between
set_current_state(TASK_TRACED) and schedule(), such that
ptrace_check_attach() can first wait for JOBCTL_TRACED_QUIESCE to get
cleared, and then use wait_task_inactive().

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched/jobctl.h |    8 ++-
 kernel/ptrace.c              |   90 ++++++++++++++++++++++---------------------
 kernel/sched/core.c          |    5 --
 kernel/signal.c              |   36 ++++++++++++++---
 4 files changed, 86 insertions(+), 53 deletions(-)

--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,9 +19,11 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_DELAY_WAKEKILL_BIT 24	/* delay killable wakeups */
 
-#define JOBCTL_STOPPED_BIT	24	/* do_signal_stop() */
-#define JOBCTL_TRACED_BIT	25	/* ptrace_stop() */
+#define JOBCTL_STOPPED_BIT	25	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	26	/* ptrace_stop() */
+#define JOBCTL_TRACED_QUIESCE_BIT 27
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -31,9 +33,11 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_DELAY_WAKEKILL	(1UL << JOBCTL_DELAY_WAKEKILL_BIT)
 
 #define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
 #define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+#define JOBCTL_TRACED_QUIESCE	(1UL << JOBCTL_TRACED_QUIESCE_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -193,41 +193,44 @@ static bool looks_like_a_spurious_pid(st
  */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
+	unsigned long flags;
 	bool ret = false;
 
 	/* Lockless, nobody but us can set this flag */
 	if (task->jobctl & JOBCTL_LISTENING)
 		return ret;
 
-	spin_lock_irq(&task->sighand->siglock);
-	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
+	if (!lock_task_sighand(task, &flags))
+		return ret;
+
+	if (task_is_traced(task) &&
+	    !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		WARN_ON_ONCE(READ_ONCE(task->__state) != TASK_TRACED);
+		WARN_ON_ONCE(task->jobctl & JOBCTL_DELAY_WAKEKILL);
+		task->jobctl |= JOBCTL_DELAY_WAKEKILL;
 		ret = true;
 	}
-	spin_unlock_irq(&task->sighand->siglock);
+	unlock_task_sighand(task, &flags);
 
 	return ret;
 }
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
+	if (!task_is_traced(task))
 		return;
 
 	WARN_ON(!task->ptrace || task->parent != current);
 
-	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
-	 */
 	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (task_is_traced(task)) {
+//		WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));
+		task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
 		if (__fatal_signal_pending(task)) {
 			task->jobctl &= ~JOBCTL_TRACED;
-			wake_up_state(task, __TASK_TRACED);
-		} else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+			wake_up_state(task, TASK_WAKEKILL);
+		}
 	}
 	spin_unlock_irq(&task->sighand->siglock);
 }
@@ -251,40 +254,45 @@ static void ptrace_unfreeze_traced(struc
  */
 static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 {
-	int ret = -ESRCH;
+	int traced;
 
 	/*
 	 * We take the read lock around doing both checks to close a
-	 * possible race where someone else was tracing our child and
-	 * detached between these two checks.  After this locked check,
-	 * we are sure that this is our traced child and that can only
-	 * be changed by us so it's not changing right after this.
+	 * possible race where someone else attaches or detaches our
+	 * natural child.
 	 */
 	read_lock(&tasklist_lock);
-	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-		/*
-		 * child->sighand can't be NULL, release_task()
-		 * does ptrace_unlink() before __exit_signal().
-		 */
-		if (ignore_state || ptrace_freeze_traced(child))
-			ret = 0;
-	}
+	traced = child->ptrace && child->parent == current;
 	read_unlock(&tasklist_lock);
+	if (!traced)
+		return -ESRCH;
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
+	if (ignore_state)
+		return 0;
+
+	if (!task_is_traced(child))
+		return -ESRCH;
+
+	WARN_ON_ONCE(READ_ONCE(child->jobctl) & JOBCTL_DELAY_WAKEKILL);
+
+	/* Wait for JOBCTL_TRACED_QUIESCE to go away, see ptrace_stop(). */
+	for (;;) {
+		if (fatal_signal_pending(current))
+			return -EINTR;
+
+		set_current_state(TASK_KILLABLE);
+		if (!(READ_ONCE(child->jobctl) & JOBCTL_TRACED_QUIESCE))
+			break;
+
+		schedule();
 	}
+	__set_current_state(TASK_RUNNING);
 
-	return ret;
+	if (!wait_task_inactive(child, TASK_TRACED) ||
+	    !ptrace_freeze_traced(child))
+		return -ESRCH;
+
+	return 0;
 }
 
 static bool ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
@@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
 		goto out_put_task_struct;
 
 	ret = arch_ptrace(child, request, addr, data);
-	if (ret || request != PTRACE_DETACH)
-		ptrace_unfreeze_traced(child);
+	ptrace_unfreeze_traced(child);
 
  out_put_task_struct:
 	put_task_struct(child);
@@ -1472,8 +1479,7 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_lo
 				  request == PTRACE_INTERRUPT);
 	if (!ret) {
 		ret = compat_arch_ptrace(child, request, addr, data);
-		if (ret || request != PTRACE_DETACH)
-			ptrace_unfreeze_traced(child);
+		ptrace_unfreeze_traced(child);
 	}
 
  out_put_task_struct:
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6310,10 +6310,7 @@ static void __sched notrace __schedule(u
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -764,6 +764,10 @@ void signal_wake_up_state(struct task_st
 {
 	lockdep_assert_held(&t->sighand->siglock);
 
+	/* Suppress wakekill? */
+	if (t->jobctl & JOBCTL_DELAY_WAKEKILL)
+		state &= ~TASK_WAKEKILL;
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
 
 	/*
@@ -774,7 +778,7 @@ void signal_wake_up_state(struct task_st
 	 * handle its death signal.
 	 */
 	if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
-		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE);
 	else
 		kick_process(t);
 }
@@ -2187,6 +2191,15 @@ static void do_notify_parent_cldstop(str
 	spin_unlock_irqrestore(&sighand->siglock, flags);
 }
 
+static void clear_traced_quiesce(void)
+{
+	spin_lock_irq(&current->sighand->siglock);
+	WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
+	current->jobctl &= ~JOBCTL_TRACED_QUIESCE;
+	wake_up_state(current->parent, TASK_KILLABLE);
+	spin_unlock_irq(&current->sighand->siglock);
+}
+
 /*
  * This must be called with current->sighand->siglock held.
  *
@@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
 	 */
-	current->jobctl |= JOBCTL_TRACED;
+	current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
 		/*
 		 * Don't want to allow preemption here, because
 		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
 		 */
 		preempt_disable();
 		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
+		cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
+
+		/*
+		 * JOBCTL_TRACED_QUIESCE bridges the gap between
+		 * set_current_state(TASK_TRACED) above and schedule() below.
+		 * There must not be any blocking (specifically anything that
+		 * touched ->saved_state on PREEMPT_RT) between here and
+		 * schedule().
+		 *
+		 * ptrace_check_attach() relies on this with its
+		 * wait_task_inactive() usage.
+		 */
+		clear_traced_quiesce();
+
 		preempt_enable_no_resched();
 		freezable_schedule();
+
 		cgroup_leave_frozen(true);
 	} else {
 		/*
@@ -2335,6 +2360,7 @@ static int ptrace_stop(int exit_code, in
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.




* [PATCH v2 3/5] freezer: Have {,un}lock_system_sleep() save/restore flags
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 4/5] freezer,umh: Clean up freezer/initrd interaction Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Rafael explained that the reason for having both PF_NOFREEZE and
PF_FREEZER_SKIP is that {,un}lock_system_sleep() is callable from
kthread context that has previously called set_freezable().

In preparation for merging the flags, have {,un}lock_system_sleep()
save and restore current->flags.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/acpi/x86/s2idle.c         |   12 ++++++++----
 drivers/scsi/scsi_transport_spi.c |    7 ++++---
 include/linux/suspend.h           |    8 ++++----
 kernel/power/hibernate.c          |   35 ++++++++++++++++++++++-------------
 kernel/power/main.c               |   16 ++++++++++------
 kernel/power/suspend.c            |   12 ++++++++----
 kernel/power/user.c               |   24 ++++++++++++++----------
 7 files changed, 70 insertions(+), 44 deletions(-)

--- a/drivers/acpi/x86/s2idle.c
+++ b/drivers/acpi/x86/s2idle.c
@@ -538,12 +538,14 @@ void acpi_s2idle_setup(void)
 
 int acpi_register_lps0_dev(struct acpi_s2idle_dev_ops *arg)
 {
+	unsigned int sleep_flags;
+
 	if (!lps0_device_handle || sleep_no_lps0)
 		return -ENODEV;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	list_add(&arg->list_node, &lps0_s2idle_devops_head);
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return 0;
 }
@@ -551,12 +553,14 @@ EXPORT_SYMBOL_GPL(acpi_register_lps0_dev
 
 void acpi_unregister_lps0_dev(struct acpi_s2idle_dev_ops *arg)
 {
+	unsigned int sleep_flags;
+
 	if (!lps0_device_handle || sleep_no_lps0)
 		return;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	list_del(&arg->list_node);
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 EXPORT_SYMBOL_GPL(acpi_unregister_lps0_dev);
 
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -998,8 +998,9 @@ void
 spi_dv_device(struct scsi_device *sdev)
 {
 	struct scsi_target *starget = sdev->sdev_target;
-	u8 *buffer;
 	const int len = SPI_MAX_ECHO_BUFFER_SIZE*2;
+	unsigned int sleep_flags;
+	u8 *buffer;
 
 	/*
 	 * Because this function and the power management code both call
@@ -1007,7 +1008,7 @@ spi_dv_device(struct scsi_device *sdev)
 	 * while suspend or resume is in progress. Hence the
 	 * lock/unlock_system_sleep() calls.
 	 */
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	if (scsi_autopm_get_device(sdev))
 		goto unlock_system_sleep;
@@ -1058,7 +1059,7 @@ spi_dv_device(struct scsi_device *sdev)
 	scsi_autopm_put_device(sdev);
 
 unlock_system_sleep:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 EXPORT_SYMBOL(spi_dv_device);
 
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -510,8 +510,8 @@ extern bool pm_save_wakeup_count(unsigne
 extern void pm_wakep_autosleep_enabled(bool set);
 extern void pm_print_active_wakeup_sources(void);
 
-extern void lock_system_sleep(void);
-extern void unlock_system_sleep(void);
+extern unsigned int lock_system_sleep(void);
+extern void unlock_system_sleep(unsigned int);
 
 #else /* !CONFIG_PM_SLEEP */
 
@@ -534,8 +534,8 @@ static inline void pm_system_wakeup(void
 static inline void pm_wakeup_clear(bool reset) {}
 static inline void pm_system_irq_wakeup(unsigned int irq_number) {}
 
-static inline void lock_system_sleep(void) {}
-static inline void unlock_system_sleep(void) {}
+static inline unsigned int lock_system_sleep(void) { return 0; }
+static inline void unlock_system_sleep(unsigned int flags) {}
 
 #endif /* !CONFIG_PM_SLEEP */
 
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -92,20 +92,24 @@ bool hibernation_available(void)
  */
 void hibernation_set_ops(const struct platform_hibernation_ops *ops)
 {
+	unsigned int sleep_flags;
+
 	if (ops && !(ops->begin && ops->end &&  ops->pre_snapshot
 	    && ops->prepare && ops->finish && ops->enter && ops->pre_restore
 	    && ops->restore_cleanup && ops->leave)) {
 		WARN_ON(1);
 		return;
 	}
-	lock_system_sleep();
+
+	sleep_flags = lock_system_sleep();
+
 	hibernation_ops = ops;
 	if (ops)
 		hibernation_mode = HIBERNATION_PLATFORM;
 	else if (hibernation_mode == HIBERNATION_PLATFORM)
 		hibernation_mode = HIBERNATION_SHUTDOWN;
 
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 EXPORT_SYMBOL_GPL(hibernation_set_ops);
 
@@ -713,6 +717,7 @@ static int load_image_and_restore(void)
 int hibernate(void)
 {
 	bool snapshot_test = false;
+	unsigned int sleep_flags;
 	int error;
 
 	if (!hibernation_available()) {
@@ -720,7 +725,7 @@ int hibernate(void)
 		return -EPERM;
 	}
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	/* The snapshot device should not be opened while we're running */
 	if (!hibernate_acquire()) {
 		error = -EBUSY;
@@ -794,7 +799,7 @@ int hibernate(void)
 	pm_restore_console();
 	hibernate_release();
  Unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 	pr_info("hibernation exit\n");
 
 	return error;
@@ -809,9 +814,10 @@ int hibernate(void)
  */
 int hibernate_quiet_exec(int (*func)(void *data), void *data)
 {
+	unsigned int sleep_flags;
 	int error;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	if (!hibernate_acquire()) {
 		error = -EBUSY;
@@ -891,7 +897,7 @@ int hibernate_quiet_exec(int (*func)(voi
 	hibernate_release();
 
 unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return error;
 }
@@ -1100,11 +1106,12 @@ static ssize_t disk_show(struct kobject
 static ssize_t disk_store(struct kobject *kobj, struct kobj_attribute *attr,
 			  const char *buf, size_t n)
 {
+	int mode = HIBERNATION_INVALID;
+	unsigned int sleep_flags;
 	int error = 0;
-	int i;
 	int len;
 	char *p;
-	int mode = HIBERNATION_INVALID;
+	int i;
 
 	if (!hibernation_available())
 		return -EPERM;
@@ -1112,7 +1119,7 @@ static ssize_t disk_store(struct kobject
 	p = memchr(buf, '\n', n);
 	len = p ? p - buf : n;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
 		if (len == strlen(hibernation_modes[i])
 		    && !strncmp(buf, hibernation_modes[i], len)) {
@@ -1142,7 +1149,7 @@ static ssize_t disk_store(struct kobject
 	if (!error)
 		pm_pr_dbg("Hibernation mode set to '%s'\n",
 			       hibernation_modes[mode]);
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 	return error ? error : n;
 }
 
@@ -1158,9 +1165,10 @@ static ssize_t resume_show(struct kobjec
 static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
 			    const char *buf, size_t n)
 {
-	dev_t res;
+	unsigned int sleep_flags;
 	int len = n;
 	char *name;
+	dev_t res;
 
 	if (len && buf[len-1] == '\n')
 		len--;
@@ -1173,9 +1181,10 @@ static ssize_t resume_store(struct kobje
 	if (!res)
 		return -EINVAL;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	swsusp_resume_device = res;
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
+
 	pm_pr_dbg("Configured hibernation resume from disk to %u\n",
 		  swsusp_resume_device);
 	noresume = 0;
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -21,14 +21,16 @@
 
 #ifdef CONFIG_PM_SLEEP
 
-void lock_system_sleep(void)
+unsigned int lock_system_sleep(void)
 {
+	unsigned int flags = current->flags;
 	current->flags |= PF_FREEZER_SKIP;
 	mutex_lock(&system_transition_mutex);
+	return flags;
 }
 EXPORT_SYMBOL_GPL(lock_system_sleep);
 
-void unlock_system_sleep(void)
+void unlock_system_sleep(unsigned int flags)
 {
 	/*
 	 * Don't use freezer_count() because we don't want the call to
@@ -46,7 +48,8 @@ void unlock_system_sleep(void)
 	 * Which means, if we use try_to_freeze() here, it would make them
 	 * enter the refrigerator, thus causing hibernation to lockup.
 	 */
-	current->flags &= ~PF_FREEZER_SKIP;
+	if (!(flags & PF_FREEZER_SKIP))
+		current->flags &= ~PF_FREEZER_SKIP;
 	mutex_unlock(&system_transition_mutex);
 }
 EXPORT_SYMBOL_GPL(unlock_system_sleep);
@@ -260,16 +263,17 @@ static ssize_t pm_test_show(struct kobje
 static ssize_t pm_test_store(struct kobject *kobj, struct kobj_attribute *attr,
 				const char *buf, size_t n)
 {
+	unsigned int sleep_flags;
 	const char * const *s;
+	int error = -EINVAL;
 	int level;
 	char *p;
 	int len;
-	int error = -EINVAL;
 
 	p = memchr(buf, '\n', n);
 	len = p ? p - buf : n;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	level = TEST_FIRST;
 	for (s = &pm_tests[level]; level <= TEST_MAX; s++, level++)
@@ -279,7 +283,7 @@ static ssize_t pm_test_store(struct kobj
 			break;
 		}
 
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return error ? error : n;
 }
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -75,9 +75,11 @@ EXPORT_SYMBOL_GPL(pm_suspend_default_s2i
 
 void s2idle_set_ops(const struct platform_s2idle_ops *ops)
 {
-	lock_system_sleep();
+	unsigned int sleep_flags;
+
+	sleep_flags = lock_system_sleep();
 	s2idle_ops = ops;
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 
 static void s2idle_begin(void)
@@ -200,7 +202,9 @@ __setup("mem_sleep_default=", mem_sleep_
  */
 void suspend_set_ops(const struct platform_suspend_ops *ops)
 {
-	lock_system_sleep();
+	unsigned int sleep_flags;
+
+	sleep_flags = lock_system_sleep();
 
 	suspend_ops = ops;
 
@@ -216,7 +220,7 @@ void suspend_set_ops(const struct platfo
 			mem_sleep_current = PM_SUSPEND_MEM;
 	}
 
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 EXPORT_SYMBOL_GPL(suspend_set_ops);
 
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -46,12 +46,13 @@ int is_hibernate_resume_dev(dev_t dev)
 static int snapshot_open(struct inode *inode, struct file *filp)
 {
 	struct snapshot_data *data;
+	unsigned int sleep_flags;
 	int error;
 
 	if (!hibernation_available())
 		return -EPERM;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	if (!hibernate_acquire()) {
 		error = -EBUSY;
@@ -97,7 +98,7 @@ static int snapshot_open(struct inode *i
 	data->dev = 0;
 
  Unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return error;
 }
@@ -105,8 +106,9 @@ static int snapshot_open(struct inode *i
 static int snapshot_release(struct inode *inode, struct file *filp)
 {
 	struct snapshot_data *data;
+	unsigned int sleep_flags;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	swsusp_free();
 	data = filp->private_data;
@@ -123,7 +125,7 @@ static int snapshot_release(struct inode
 			PM_POST_HIBERNATION : PM_POST_RESTORE);
 	hibernate_release();
 
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return 0;
 }
@@ -131,11 +133,12 @@ static int snapshot_release(struct inode
 static ssize_t snapshot_read(struct file *filp, char __user *buf,
                              size_t count, loff_t *offp)
 {
+	loff_t pg_offp = *offp & ~PAGE_MASK;
 	struct snapshot_data *data;
+	unsigned int sleep_flags;
 	ssize_t res;
-	loff_t pg_offp = *offp & ~PAGE_MASK;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	data = filp->private_data;
 	if (!data->ready) {
@@ -156,7 +159,7 @@ static ssize_t snapshot_read(struct file
 		*offp += res;
 
  Unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return res;
 }
@@ -164,11 +167,12 @@ static ssize_t snapshot_read(struct file
 static ssize_t snapshot_write(struct file *filp, const char __user *buf,
                               size_t count, loff_t *offp)
 {
+	loff_t pg_offp = *offp & ~PAGE_MASK;
 	struct snapshot_data *data;
+	unsigned int sleep_flags;
 	ssize_t res;
-	loff_t pg_offp = *offp & ~PAGE_MASK;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	data = filp->private_data;
 
@@ -190,7 +194,7 @@ static ssize_t snapshot_write(struct fil
 	if (res > 0)
 		*offp += res;
 unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return res;
 }



^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH v2 4/5] freezer,umh: Clean up freezer/initrd interaction
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
                   ` (2 preceding siblings ...)
  2022-04-21 15:02 ` [PATCH v2 3/5] freezer: Have {,un}lock_system_sleep() save/restore flags Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic Peter Zijlstra
  2022-04-22 17:43 ` [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Sebastian Andrzej Siewior
  5 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

handle_initrd() marks itself with PF_FREEZER_SKIP in order to ensure
that the UMH, which is going to freeze the system, doesn't
wait indefinitely for its caller.

Rework things by adding UMH_FREEZABLE to indicate the completion is
freezable.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/umh.h     |    9 +++++----
 init/do_mounts_initrd.c |   10 +---------
 kernel/umh.c            |    8 ++++++++
 3 files changed, 14 insertions(+), 13 deletions(-)

--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -11,10 +11,11 @@
 struct cred;
 struct file;
 
-#define UMH_NO_WAIT	0	/* don't wait at all */
-#define UMH_WAIT_EXEC	1	/* wait for the exec, but not the process */
-#define UMH_WAIT_PROC	2	/* wait for the process to complete */
-#define UMH_KILLABLE	4	/* wait for EXEC/PROC killable */
+#define UMH_NO_WAIT	0x00	/* don't wait at all */
+#define UMH_WAIT_EXEC	0x01	/* wait for the exec, but not the process */
+#define UMH_WAIT_PROC	0x02	/* wait for the process to complete */
+#define UMH_KILLABLE	0x04	/* wait for EXEC/PROC killable */
+#define UMH_FREEZABLE	0x08	/* wait for EXEC/PROC freezable */
 
 struct subprocess_info {
 	struct work_struct work;
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -79,19 +79,11 @@ static void __init handle_initrd(void)
 	init_mkdir("/old", 0700);
 	init_chdir("/old");
 
-	/*
-	 * In case that a resume from disk is carried out by linuxrc or one of
-	 * its children, we need to tell the freezer not to wait for us.
-	 */
-	current->flags |= PF_FREEZER_SKIP;
-
 	info = call_usermodehelper_setup("/linuxrc", argv, envp_init,
 					 GFP_KERNEL, init_linuxrc, NULL, NULL);
 	if (!info)
 		return;
-	call_usermodehelper_exec(info, UMH_WAIT_PROC);
-
-	current->flags &= ~PF_FREEZER_SKIP;
+	call_usermodehelper_exec(info, UMH_WAIT_PROC|UMH_FREEZABLE);
 
 	/* move initrd to rootfs' /old */
 	init_mount("..", ".", NULL, MS_MOVE, NULL);
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -28,6 +28,7 @@
 #include <linux/async.h>
 #include <linux/uaccess.h>
 #include <linux/initrd.h>
+#include <linux/freezer.h>
 
 #include <trace/events/module.h>
 
@@ -436,6 +437,9 @@ int call_usermodehelper_exec(struct subp
 	if (wait == UMH_NO_WAIT)	/* task has freed sub_info */
 		goto unlock;
 
+	if (wait & UMH_FREEZABLE)
+		freezer_do_not_count();
+
 	if (wait & UMH_KILLABLE) {
 		retval = wait_for_completion_killable(&done);
 		if (!retval)
@@ -448,6 +452,10 @@ int call_usermodehelper_exec(struct subp
 	}
 
 	wait_for_completion(&done);
+
+	if (wait & UMH_FREEZABLE)
+		freezer_count();
+
 wait_done:
 	retval = sub_info->retval;
 out:



^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
                   ` (3 preceding siblings ...)
  2022-04-21 15:02 ` [PATCH v2 4/5] freezer,umh: Clean up freezer/initrd interaction Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-21 17:26   ` Eric W. Biederman
  2022-04-22 17:43 ` [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Sebastian Andrzej Siewior
  5 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Rewrite the core freezer to behave better wrt thawing and be simpler
in general.

By replacing PF_FROZEN with TASK_FROZEN, a special blocked state, it is
ensured that frozen tasks stay frozen until thawed and don't randomly
wake up early, as is currently possible.

As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay).

Specifically, the current scheme works a little like:

	freezer_do_not_count();
	schedule();
	freezer_count();

And either the task is blocked, or it lands in try_to_freeze()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.

However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.

That is, thawing tries to thaw things in an explicit order: kernel
threads and workqueues, then bringing SMP back, then userspace, etc.
However, due to the above-mentioned race it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.

This can be a fatal problem on asymmetric ISA architectures (e.g. ARMv9)
where the userspace task requires a special CPU to run.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/android/binder.c       |    4 
 drivers/media/pci/pt3/pt3.c    |    4 
 fs/cifs/inode.c                |    4 
 fs/cifs/transport.c            |    5 
 fs/coredump.c                  |    5 
 fs/nfs/file.c                  |    3 
 fs/nfs/inode.c                 |   12 --
 fs/nfs/nfs3proc.c              |    3 
 fs/nfs/nfs4proc.c              |   14 +-
 fs/nfs/nfs4state.c             |    3 
 fs/nfs/pnfs.c                  |    4 
 fs/xfs/xfs_trans_ail.c         |    8 -
 include/linux/completion.h     |    1 
 include/linux/freezer.h        |  244 +----------------------------------------
 include/linux/sched.h          |   41 +++---
 include/linux/sunrpc/sched.h   |    7 -
 include/linux/wait.h           |   40 +++++-
 kernel/cgroup/legacy_freezer.c |   23 +--
 kernel/exit.c                  |    4 
 kernel/fork.c                  |    5 
 kernel/freezer.c               |  137 ++++++++++++++++-------
 kernel/futex/waitwake.c        |    8 -
 kernel/hung_task.c             |    4 
 kernel/power/main.c            |    6 -
 kernel/power/process.c         |   10 -
 kernel/ptrace.c                |    2 
 kernel/sched/completion.c      |    9 +
 kernel/sched/core.c            |   19 ++-
 kernel/signal.c                |   14 +-
 kernel/time/hrtimer.c          |    4 
 kernel/umh.c                   |   20 +--
 mm/khugepaged.c                |    4 
 net/sunrpc/sched.c             |   12 --
 net/unix/af_unix.c             |    8 -
 34 files changed, 281 insertions(+), 410 deletions(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4034,10 +4034,9 @@ static int binder_wait_for_work(struct b
 	struct binder_proc *proc = thread->proc;
 	int ret = 0;
 
-	freezer_do_not_count();
 	binder_inner_proc_lock(proc);
 	for (;;) {
-		prepare_to_wait(&thread->wait, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(&thread->wait, &wait, TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 		if (binder_has_work_ilocked(thread, do_proc_work))
 			break;
 		if (do_proc_work)
@@ -4054,7 +4053,6 @@ static int binder_wait_for_work(struct b
 	}
 	finish_wait(&thread->wait, &wait);
 	binder_inner_proc_unlock(proc);
-	freezer_count();
 
 	return ret;
 }
--- a/drivers/media/pci/pt3/pt3.c
+++ b/drivers/media/pci/pt3/pt3.c
@@ -445,8 +445,8 @@ static int pt3_fetch_thread(void *data)
 		pt3_proc_dma(adap);
 
 		delay = ktime_set(0, PT3_FETCH_DELAY * NSEC_PER_MSEC);
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		freezable_schedule_hrtimeout_range(&delay,
+		set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
+		schedule_hrtimeout_range(&delay,
 					PT3_FETCH_DELAY_DELTA * NSEC_PER_MSEC,
 					HRTIMER_MODE_REL);
 	}
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -2286,7 +2286,7 @@ cifs_invalidate_mapping(struct inode *in
 static int
 cifs_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
@@ -2304,7 +2304,7 @@ cifs_revalidate_mapping(struct inode *in
 		return 0;
 
 	rc = wait_on_bit_lock_action(flags, CIFS_INO_LOCK, cifs_wait_bit_killable,
-				     TASK_KILLABLE);
+				     TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 	if (rc)
 		return rc;
 
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -760,8 +760,9 @@ wait_for_response(struct TCP_Server_Info
 {
 	int error;
 
-	error = wait_event_freezekillable_unsafe(server->response_q,
-				    midQ->mid_state != MID_REQUEST_SUBMITTED);
+	error = wait_event_state(server->response_q,
+				 midQ->mid_state != MID_REQUEST_SUBMITTED,
+				 (TASK_KILLABLE|TASK_FREEZABLE_UNSAFE));
 	if (error < 0)
 		return -ERESTARTSYS;
 
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -402,9 +402,8 @@ static int coredump_wait(int exit_code,
 	if (core_waiters > 0) {
 		struct core_thread *ptr;
 
-		freezer_do_not_count();
-		wait_for_completion(&core_state->startup);
-		freezer_count();
+		wait_for_completion_state(&core_state->startup,
+					  TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
 		/*
 		 * Wait for all the threads to become inactive, so that
 		 * all the thread context (extended register state, like
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -565,7 +565,8 @@ static vm_fault_t nfs_vm_page_mkwrite(st
 	}
 
 	wait_on_bit_action(&NFS_I(inode)->flags, NFS_INO_INVALIDATING,
-			nfs_wait_bit_killable, TASK_KILLABLE);
+			   nfs_wait_bit_killable,
+			   TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 
 	lock_page(page);
 	mapping = page_file_mapping(page);
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -72,18 +72,13 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fat
 	return nfs_fileid_to_ino_t(fattr->fileid);
 }
 
-static int nfs_wait_killable(int mode)
+int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
 }
-
-int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
-{
-	return nfs_wait_killable(mode);
-}
 EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
 
 /**
@@ -1331,7 +1326,8 @@ int nfs_clear_invalid_mapping(struct add
 	 */
 	for (;;) {
 		ret = wait_on_bit_action(bitlock, NFS_INO_INVALIDATING,
-					 nfs_wait_bit_killable, TASK_KILLABLE);
+					 nfs_wait_bit_killable,
+					 TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 		if (ret)
 			goto out;
 		spin_lock(&inode->i_lock);
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -36,7 +36,8 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt,
 		res = rpc_call_sync(clnt, msg, flags);
 		if (res != -EJUKEBOX)
 			break;
-		freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
+		__set_current_state(TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
+		schedule_timeout(NFS_JUKEBOX_RETRY_TIME);
 		res = -ERESTARTSYS;
 	} while (!fatal_signal_pending(current));
 	return res;
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -408,8 +408,8 @@ static int nfs4_delay_killable(long *tim
 {
 	might_sleep();
 
-	freezable_schedule_timeout_killable_unsafe(
-		nfs4_update_delay(timeout));
+	__set_current_state(TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
+	schedule_timeout(nfs4_update_delay(timeout));
 	if (!__fatal_signal_pending(current))
 		return 0;
 	return -EINTR;
@@ -419,7 +419,8 @@ static int nfs4_delay_interruptible(long
 {
 	might_sleep();
 
-	freezable_schedule_timeout_interruptible_unsafe(nfs4_update_delay(timeout));
+	__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE_UNSAFE);
+	schedule_timeout(nfs4_update_delay(timeout));
 	if (!signal_pending(current))
 		return 0;
 	return __fatal_signal_pending(current) ? -EINTR :-ERESTARTSYS;
@@ -7363,7 +7364,8 @@ nfs4_retry_setlk_simple(struct nfs4_stat
 		status = nfs4_proc_setlk(state, cmd, request);
 		if ((status != -EAGAIN) || IS_SETLK(cmd))
 			break;
-		freezable_schedule_timeout_interruptible(timeout);
+		__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
+		schedule_timeout(timeout);
 		timeout *= 2;
 		timeout = min_t(unsigned long, NFS4_LOCK_MAXTIMEOUT, timeout);
 		status = -ERESTARTSYS;
@@ -7431,10 +7433,8 @@ nfs4_retry_setlk(struct nfs4_state *stat
 			break;
 
 		status = -ERESTARTSYS;
-		freezer_do_not_count();
-		wait_woken(&waiter.wait, TASK_INTERRUPTIBLE,
+		wait_woken(&waiter.wait, TASK_INTERRUPTIBLE|TASK_FREEZABLE,
 			   NFS4_LOCK_MAXTIMEOUT);
-		freezer_count();
 	} while (!signalled());
 
 	remove_wait_queue(q, &waiter.wait);
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1314,7 +1314,8 @@ int nfs4_wait_clnt_recover(struct nfs_cl
 
 	refcount_inc(&clp->cl_count);
 	res = wait_on_bit_action(&clp->cl_state, NFS4CLNT_MANAGER_RUNNING,
-				 nfs_wait_bit_killable, TASK_KILLABLE);
+				 nfs_wait_bit_killable,
+				 TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 	if (res)
 		goto out;
 	if (clp->cl_cons_state < 0)
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1907,7 +1907,7 @@ static int pnfs_prepare_to_retry_layoutg
 	pnfs_layoutcommit_inode(lo->plh_inode, false);
 	return wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
 				   nfs_wait_bit_killable,
-				   TASK_KILLABLE);
+				   TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 }
 
 static void nfs_layoutget_begin(struct pnfs_layout_hdr *lo)
@@ -3182,7 +3182,7 @@ pnfs_layoutcommit_inode(struct inode *in
 		status = wait_on_bit_lock_action(&nfsi->flags,
 				NFS_INO_LAYOUTCOMMITTING,
 				nfs_wait_bit_killable,
-				TASK_KILLABLE);
+				TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 		if (status)
 			goto out;
 	}
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -602,9 +602,9 @@ xfsaild(
 
 	while (1) {
 		if (tout && tout <= 20)
-			set_current_state(TASK_KILLABLE);
+			set_current_state(TASK_KILLABLE|TASK_FREEZABLE);
 		else
-			set_current_state(TASK_INTERRUPTIBLE);
+			set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 
 		/*
 		 * Check kthread_should_stop() after we set the task state to
@@ -653,14 +653,14 @@ xfsaild(
 		    ailp->ail_target == ailp->ail_target_prev &&
 		    list_empty(&ailp->ail_buf_list)) {
 			spin_unlock(&ailp->ail_lock);
-			freezable_schedule();
+			schedule();
 			tout = 0;
 			continue;
 		}
 		spin_unlock(&ailp->ail_lock);
 
 		if (tout)
-			freezable_schedule_timeout(msecs_to_jiffies(tout));
+			schedule_timeout(msecs_to_jiffies(tout));
 
 		__set_current_state(TASK_RUNNING);
 
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -103,6 +103,7 @@ extern void wait_for_completion(struct c
 extern void wait_for_completion_io(struct completion *);
 extern int wait_for_completion_interruptible(struct completion *x);
 extern int wait_for_completion_killable(struct completion *x);
+extern int wait_for_completion_state(struct completion *x, unsigned int state);
 extern unsigned long wait_for_completion_timeout(struct completion *x,
 						   unsigned long timeout);
 extern unsigned long wait_for_completion_io_timeout(struct completion *x,
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -8,9 +8,11 @@
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
+#include <linux/jump_label.h>
 
 #ifdef CONFIG_FREEZER
-extern atomic_t system_freezing_cnt;	/* nr of freezing conds in effect */
+DECLARE_STATIC_KEY_FALSE(freezer_active);
+
 extern bool pm_freezing;		/* PM freezing in effect */
 extern bool pm_nosig_freezing;		/* PM nosig freezing in effect */
 
@@ -22,10 +24,7 @@ extern unsigned int freeze_timeout_msecs
 /*
  * Check if a process has been frozen
  */
-static inline bool frozen(struct task_struct *p)
-{
-	return p->flags & PF_FROZEN;
-}
+extern bool frozen(struct task_struct *p);
 
 extern bool freezing_slow_path(struct task_struct *p);
 
@@ -34,9 +33,10 @@ extern bool freezing_slow_path(struct ta
  */
 static inline bool freezing(struct task_struct *p)
 {
-	if (likely(!atomic_read(&system_freezing_cnt)))
-		return false;
-	return freezing_slow_path(p);
+	if (static_branch_unlikely(&freezer_active))
+		return freezing_slow_path(p);
+
+	return false;
 }
 
 /* Takes and releases task alloc lock using task_lock() */
@@ -48,23 +48,14 @@ extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
 extern void thaw_kernel_threads(void);
 
-/*
- * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
- * If try_to_freeze causes a lockdep warning it means the caller may deadlock
- */
-static inline bool try_to_freeze_unsafe(void)
+static inline bool try_to_freeze(void)
 {
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
-	return __refrigerator(false);
-}
-
-static inline bool try_to_freeze(void)
-{
 	if (!(current->flags & PF_NOFREEZE))
 		debug_check_no_locks_held();
-	return try_to_freeze_unsafe();
+	return __refrigerator(false);
 }
 
 extern bool freeze_task(struct task_struct *p);
@@ -79,195 +70,6 @@ static inline bool cgroup_freezing(struc
 }
 #endif /* !CONFIG_CGROUP_FREEZER */
 
-/*
- * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
- * calls wait_for_completion(&vfork) and reset right after it returns from this
- * function.  Next, the parent should call try_to_freeze() to freeze itself
- * appropriately in case the child has exited before the freezing of tasks is
- * complete.  However, we don't want kernel threads to be frozen in unexpected
- * places, so we allow them to block freeze_processes() instead or to set
- * PF_NOFREEZE if needed. Fortunately, in the ____call_usermodehelper() case the
- * parent won't really block freeze_processes(), since ____call_usermodehelper()
- * (the child) does a little before exec/exit and it can't be frozen before
- * waking up the parent.
- */
-
-
-/**
- * freezer_do_not_count - tell freezer to ignore %current
- *
- * Tell freezers to ignore the current task when determining whether the
- * target frozen state is reached.  IOW, the current task will be
- * considered frozen enough by freezers.
- *
- * The caller shouldn't do anything which isn't allowed for a frozen task
- * until freezer_cont() is called.  Usually, freezer[_do_not]_count() pair
- * wrap a scheduling operation and nothing much else.
- */
-static inline void freezer_do_not_count(void)
-{
-	current->flags |= PF_FREEZER_SKIP;
-}
-
-/**
- * freezer_count - tell freezer to stop ignoring %current
- *
- * Undo freezer_do_not_count().  It tells freezers that %current should be
- * considered again and tries to freeze if freezing condition is already in
- * effect.
- */
-static inline void freezer_count(void)
-{
-	current->flags &= ~PF_FREEZER_SKIP;
-	/*
-	 * If freezing is in progress, the following paired with smp_mb()
-	 * in freezer_should_skip() ensures that either we see %true
-	 * freezing() or freezer_should_skip() sees !PF_FREEZER_SKIP.
-	 */
-	smp_mb();
-	try_to_freeze();
-}
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-static inline void freezer_count_unsafe(void)
-{
-	current->flags &= ~PF_FREEZER_SKIP;
-	smp_mb();
-	try_to_freeze_unsafe();
-}
-
-/**
- * freezer_should_skip - whether to skip a task when determining frozen
- *			 state is reached
- * @p: task in quesion
- *
- * This function is used by freezers after establishing %true freezing() to
- * test whether a task should be skipped when determining the target frozen
- * state is reached.  IOW, if this function returns %true, @p is considered
- * frozen enough.
- */
-static inline bool freezer_should_skip(struct task_struct *p)
-{
-	/*
-	 * The following smp_mb() paired with the one in freezer_count()
-	 * ensures that either freezer_count() sees %true freezing() or we
-	 * see cleared %PF_FREEZER_SKIP and return %false.  This makes it
-	 * impossible for a task to slip frozen state testing after
-	 * clearing %PF_FREEZER_SKIP.
-	 */
-	smp_mb();
-	return p->flags & PF_FREEZER_SKIP;
-}
-
-/*
- * These functions are intended to be used whenever you want allow a sleeping
- * task to be frozen. Note that neither return any clear indication of
- * whether a freeze event happened while in this function.
- */
-
-/* Like schedule(), but should not block the freezer. */
-static inline void freezable_schedule(void)
-{
-	freezer_do_not_count();
-	schedule();
-	freezer_count();
-}
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-static inline void freezable_schedule_unsafe(void)
-{
-	freezer_do_not_count();
-	schedule();
-	freezer_count_unsafe();
-}
-
-/*
- * Like schedule_timeout(), but should not block the freezer.  Do not
- * call this with locks held.
- */
-static inline long freezable_schedule_timeout(long timeout)
-{
-	long __retval;
-	freezer_do_not_count();
-	__retval = schedule_timeout(timeout);
-	freezer_count();
-	return __retval;
-}
-
-/*
- * Like schedule_timeout_interruptible(), but should not block the freezer.  Do not
- * call this with locks held.
- */
-static inline long freezable_schedule_timeout_interruptible(long timeout)
-{
-	long __retval;
-	freezer_do_not_count();
-	__retval = schedule_timeout_interruptible(timeout);
-	freezer_count();
-	return __retval;
-}
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-static inline long freezable_schedule_timeout_interruptible_unsafe(long timeout)
-{
-	long __retval;
-
-	freezer_do_not_count();
-	__retval = schedule_timeout_interruptible(timeout);
-	freezer_count_unsafe();
-	return __retval;
-}
-
-/* Like schedule_timeout_killable(), but should not block the freezer. */
-static inline long freezable_schedule_timeout_killable(long timeout)
-{
-	long __retval;
-	freezer_do_not_count();
-	__retval = schedule_timeout_killable(timeout);
-	freezer_count();
-	return __retval;
-}
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-static inline long freezable_schedule_timeout_killable_unsafe(long timeout)
-{
-	long __retval;
-	freezer_do_not_count();
-	__retval = schedule_timeout_killable(timeout);
-	freezer_count_unsafe();
-	return __retval;
-}
-
-/*
- * Like schedule_hrtimeout_range(), but should not block the freezer.  Do not
- * call this with locks held.
- */
-static inline int freezable_schedule_hrtimeout_range(ktime_t *expires,
-		u64 delta, const enum hrtimer_mode mode)
-{
-	int __retval;
-	freezer_do_not_count();
-	__retval = schedule_hrtimeout_range(expires, delta, mode);
-	freezer_count();
-	return __retval;
-}
-
-/*
- * Freezer-friendly wrappers around wait_event_interruptible(),
- * wait_event_killable() and wait_event_interruptible_timeout(), originally
- * defined in <linux/wait.h>
- */
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-#define wait_event_freezekillable_unsafe(wq, condition)			\
-({									\
-	int __retval;							\
-	freezer_do_not_count();						\
-	__retval = wait_event_killable(wq, (condition));		\
-	freezer_count_unsafe();						\
-	__retval;							\
-})
-
 #else /* !CONFIG_FREEZER */
 static inline bool frozen(struct task_struct *p) { return false; }
 static inline bool freezing(struct task_struct *p) { return false; }
@@ -281,35 +83,9 @@ static inline void thaw_kernel_threads(v
 
 static inline bool try_to_freeze(void) { return false; }
 
-static inline void freezer_do_not_count(void) {}
 static inline void freezer_count(void) {}
-static inline int freezer_should_skip(struct task_struct *p) { return 0; }
 static inline void set_freezable(void) {}
 
-#define freezable_schedule()  schedule()
-
-#define freezable_schedule_unsafe()  schedule()
-
-#define freezable_schedule_timeout(timeout)  schedule_timeout(timeout)
-
-#define freezable_schedule_timeout_interruptible(timeout)		\
-	schedule_timeout_interruptible(timeout)
-
-#define freezable_schedule_timeout_interruptible_unsafe(timeout)	\
-	schedule_timeout_interruptible(timeout)
-
-#define freezable_schedule_timeout_killable(timeout)			\
-	schedule_timeout_killable(timeout)
-
-#define freezable_schedule_timeout_killable_unsafe(timeout)		\
-	schedule_timeout_killable(timeout)
-
-#define freezable_schedule_hrtimeout_range(expires, delta, mode)	\
-	schedule_hrtimeout_range(expires, delta, mode)
-
-#define wait_event_freezekillable_unsafe(wq, condition)			\
-		wait_event_killable(wq, condition)
-
 #endif /* !CONFIG_FREEZER */
 
 #endif	/* FREEZER_H_INCLUDED */
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -80,25 +80,32 @@ struct task_group;
  */
 
 /* Used in tsk->state: */
-#define TASK_RUNNING			0x0000
-#define TASK_INTERRUPTIBLE		0x0001
-#define TASK_UNINTERRUPTIBLE		0x0002
-#define __TASK_STOPPED			0x0004
-#define __TASK_TRACED			0x0008
+#define TASK_RUNNING			0x000000
+#define TASK_INTERRUPTIBLE		0x000001
+#define TASK_UNINTERRUPTIBLE		0x000002
+#define __TASK_STOPPED			0x000004
+#define __TASK_TRACED			0x000008
 /* Used in tsk->exit_state: */
-#define EXIT_DEAD			0x0010
-#define EXIT_ZOMBIE			0x0020
+#define EXIT_DEAD			0x000010
+#define EXIT_ZOMBIE			0x000020
 #define EXIT_TRACE			(EXIT_ZOMBIE | EXIT_DEAD)
 /* Used in tsk->state again: */
-#define TASK_PARKED			0x0040
-#define TASK_DEAD			0x0080
-#define TASK_WAKEKILL			0x0100
-#define TASK_WAKING			0x0200
-#define TASK_NOLOAD			0x0400
-#define TASK_NEW			0x0800
-/* RT specific auxilliary flag to mark RT lock waiters */
-#define TASK_RTLOCK_WAIT		0x1000
-#define TASK_STATE_MAX			0x2000
+#define TASK_PARKED			0x000040
+#define TASK_DEAD			0x000080
+#define TASK_WAKEKILL			0x000100
+#define TASK_WAKING			0x000200
+#define TASK_NOLOAD			0x000400
+#define TASK_NEW			0x000800
+#define TASK_FREEZABLE			0x001000
+#define __TASK_FREEZABLE_UNSAFE	       (0x002000 * IS_ENABLED(CONFIG_LOCKDEP))
+#define TASK_FROZEN			0x004000
+#define TASK_RTLOCK_WAIT		0x008000
+#define TASK_STATE_MAX			0x010000
+
+/*
+ * DO NOT ADD ANY NEW USERS !
+ */
+#define TASK_FREEZABLE_UNSAFE		(TASK_FREEZABLE | __TASK_FREEZABLE_UNSAFE)
 
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
@@ -1698,7 +1705,6 @@ extern struct pid *cad_pid;
 #define PF_NPROC_EXCEEDED	0x00001000	/* set_user() noticed that RLIMIT_NPROC was exceeded */
 #define PF_USED_MATH		0x00002000	/* If unset the fpu must be initialized before use */
 #define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
-#define PF_FROZEN		0x00010000	/* Frozen for system suspend */
 #define PF_KSWAPD		0x00020000	/* I am kswapd */
 #define PF_MEMALLOC_NOFS	0x00040000	/* All allocation requests will inherit GFP_NOFS */
 #define PF_MEMALLOC_NOIO	0x00080000	/* All allocation requests will inherit GFP_NOIO */
@@ -1709,7 +1715,6 @@ extern struct pid *cad_pid;
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
 #define PF_MEMALLOC_PIN		0x10000000	/* Allocation context constrained to zones which allow long term pinning. */
-#define PF_FREEZER_SKIP		0x40000000	/* Freezer should not count it as freezable */
 #define PF_SUSPEND_TASK		0x80000000      /* This thread called freeze_processes() and should not be frozen */
 
 /*
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -252,7 +252,7 @@ int		rpc_malloc(struct rpc_task *);
 void		rpc_free(struct rpc_task *);
 int		rpciod_up(void);
 void		rpciod_down(void);
-int		__rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *);
+int		rpc_wait_for_completion_task(struct rpc_task *task);
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 struct net;
 void		rpc_show_tasks(struct net *);
@@ -264,11 +264,6 @@ extern struct workqueue_struct *xprtiod_
 void		rpc_prepare_task(struct rpc_task *task);
 gfp_t		rpc_task_gfp_mask(void);
 
-static inline int rpc_wait_for_completion_task(struct rpc_task *task)
-{
-	return __rpc_wait_for_completion_task(task, NULL);
-}
-
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) || IS_ENABLED(CONFIG_TRACEPOINTS)
 static inline const char * rpc_qname(const struct rpc_wait_queue *q)
 {
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -361,8 +361,8 @@ do {										\
 } while (0)
 
 #define __wait_event_freezable(wq_head, condition)				\
-	___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0,		\
-			    freezable_schedule())
+	___wait_event(wq_head, condition, (TASK_INTERRUPTIBLE|TASK_FREEZABLE),	\
+			0, 0, schedule())
 
 /**
  * wait_event_freezable - sleep (or freeze) until a condition gets true
@@ -420,8 +420,8 @@ do {										\
 
 #define __wait_event_freezable_timeout(wq_head, condition, timeout)		\
 	___wait_event(wq_head, ___wait_cond_timeout(condition),			\
-		      TASK_INTERRUPTIBLE, 0, timeout,				\
-		      __ret = freezable_schedule_timeout(__ret))
+		      (TASK_INTERRUPTIBLE|TASK_FREEZABLE), 0, timeout,		\
+		      __ret = schedule_timeout(__ret))
 
 /*
  * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid
@@ -641,8 +641,8 @@ do {										\
 
 
 #define __wait_event_freezable_exclusive(wq, condition)				\
-	___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0,			\
-			freezable_schedule())
+	___wait_event(wq, condition, (TASK_INTERRUPTIBLE|TASK_FREEZABLE), 1, 0,\
+			schedule())
 
 #define wait_event_freezable_exclusive(wq, condition)				\
 ({										\
@@ -931,6 +931,34 @@ extern int do_wait_intr_irq(wait_queue_h
 	__ret;									\
 })
 
+#define __wait_event_state(wq, condition, state)				\
+	___wait_event(wq, condition, state, 0, 0, schedule())
+
+/**
+ * wait_event_state - sleep until a condition gets true
+ * @wq_head: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ * @state: state to sleep in
+ *
+ * The process is put to sleep (@state) until the @condition evaluates to true
+ * or a signal is received.  The @condition is checked each time the waitqueue
+ * @wq_head is woken up.
+ *
+ * wake_up() has to be called after changing any variable that could
+ * change the result of the wait condition.
+ *
+ * The function will return -ERESTARTSYS if it was interrupted by a
+ * signal and 0 if @condition evaluated to true.
+ */
+#define wait_event_state(wq_head, condition, state)				\
+({										\
+	int __ret = 0;								\
+	might_sleep();								\
+	if (!(condition))							\
+		__ret = __wait_event_state(wq_head, condition, state);		\
+	__ret;									\
+})
+
 #define __wait_event_killable_timeout(wq_head, condition, timeout)		\
 	___wait_event(wq_head, ___wait_cond_timeout(condition),			\
 		      TASK_KILLABLE, 0, timeout,				\
--- a/kernel/cgroup/legacy_freezer.c
+++ b/kernel/cgroup/legacy_freezer.c
@@ -113,7 +113,7 @@ static int freezer_css_online(struct cgr
 
 	if (parent && (parent->state & CGROUP_FREEZING)) {
 		freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
-		atomic_inc(&system_freezing_cnt);
+		static_branch_inc(&freezer_active);
 	}
 
 	mutex_unlock(&freezer_mutex);
@@ -134,7 +134,7 @@ static void freezer_css_offline(struct c
 	mutex_lock(&freezer_mutex);
 
 	if (freezer->state & CGROUP_FREEZING)
-		atomic_dec(&system_freezing_cnt);
+		static_branch_dec(&freezer_active);
 
 	freezer->state = 0;
 
@@ -179,6 +179,7 @@ static void freezer_attach(struct cgroup
 			__thaw_task(task);
 		} else {
 			freeze_task(task);
+
 			/* clear FROZEN and propagate upwards */
 			while (freezer && (freezer->state & CGROUP_FROZEN)) {
 				freezer->state &= ~CGROUP_FROZEN;
@@ -271,16 +272,8 @@ static void update_if_frozen(struct cgro
 	css_task_iter_start(css, 0, &it);
 
 	while ((task = css_task_iter_next(&it))) {
-		if (freezing(task)) {
-			/*
-			 * freezer_should_skip() indicates that the task
-			 * should be skipped when determining freezing
-			 * completion.  Consider it frozen in addition to
-			 * the usual frozen condition.
-			 */
-			if (!frozen(task) && !freezer_should_skip(task))
-				goto out_iter_end;
-		}
+		if (freezing(task) && !frozen(task))
+			goto out_iter_end;
 	}
 
 	freezer->state |= CGROUP_FROZEN;
@@ -357,7 +350,7 @@ static void freezer_apply_state(struct f
 
 	if (freeze) {
 		if (!(freezer->state & CGROUP_FREEZING))
-			atomic_inc(&system_freezing_cnt);
+			static_branch_inc(&freezer_active);
 		freezer->state |= state;
 		freeze_cgroup(freezer);
 	} else {
@@ -366,9 +359,9 @@ static void freezer_apply_state(struct f
 		freezer->state &= ~state;
 
 		if (!(freezer->state & CGROUP_FREEZING)) {
-			if (was_freezing)
-				atomic_dec(&system_freezing_cnt);
 			freezer->state &= ~CGROUP_FROZEN;
+			if (was_freezing)
+				static_branch_dec(&freezer_active);
 			unfreeze_cgroup(freezer);
 		}
 	}
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -374,10 +374,10 @@ static void coredump_task_exit(struct ta
 			complete(&core_state->startup);
 
 		for (;;) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
 			if (!self.task) /* see coredump_finish() */
 				break;
-			freezable_schedule();
+			schedule();
 		}
 		__set_current_state(TASK_RUNNING);
 	}
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1417,13 +1417,12 @@ static void complete_vfork_done(struct t
 static int wait_for_vfork_done(struct task_struct *child,
 				struct completion *vfork)
 {
+	unsigned int state = TASK_UNINTERRUPTIBLE|TASK_KILLABLE|TASK_FREEZABLE;
 	int killed;
 
-	freezer_do_not_count();
 	cgroup_enter_frozen();
-	killed = wait_for_completion_killable(vfork);
+	killed = wait_for_completion_state(vfork, state);
 	cgroup_leave_frozen(false);
-	freezer_count();
 
 	if (killed) {
 		task_lock(child);
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -13,10 +13,11 @@
 #include <linux/kthread.h>
 
 /* total number of freezing conditions in effect */
-atomic_t system_freezing_cnt = ATOMIC_INIT(0);
-EXPORT_SYMBOL(system_freezing_cnt);
+DEFINE_STATIC_KEY_FALSE(freezer_active);
+EXPORT_SYMBOL(freezer_active);
 
-/* indicate whether PM freezing is in effect, protected by
+/*
+ * indicate whether PM freezing is in effect, protected by
  * system_transition_mutex
  */
 bool pm_freezing;
@@ -29,7 +30,7 @@ static DEFINE_SPINLOCK(freezer_lock);
  * freezing_slow_path - slow path for testing whether a task needs to be frozen
  * @p: task to be tested
  *
- * This function is called by freezing() if system_freezing_cnt isn't zero
+ * This function is called by freezing() if freezer_active isn't zero
  * and tests whether @p needs to enter and stay in frozen state.  Can be
  * called under any context.  The freezers are responsible for ensuring the
  * target tasks see the updated state.
@@ -52,41 +53,40 @@ bool freezing_slow_path(struct task_stru
 }
 EXPORT_SYMBOL(freezing_slow_path);
 
+bool frozen(struct task_struct *p)
+{
+	return READ_ONCE(p->__state) & TASK_FROZEN;
+}
+
 /* Refrigerator is place where frozen processes are stored :-). */
 bool __refrigerator(bool check_kthr_stop)
 {
-	/* Hmm, should we be allowed to suspend when there are realtime
-	   processes around? */
+	unsigned int state = get_current_state();
 	bool was_frozen = false;
-	unsigned int save = get_current_state();
 
 	pr_debug("%s entered refrigerator\n", current->comm);
 
+	WARN_ON_ONCE(state && !(state & TASK_NORMAL));
+
 	for (;;) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		bool freeze;
+
+		set_current_state(TASK_FROZEN);
 
 		spin_lock_irq(&freezer_lock);
-		current->flags |= PF_FROZEN;
-		if (!freezing(current) ||
-		    (check_kthr_stop && kthread_should_stop()))
-			current->flags &= ~PF_FROZEN;
+		freeze = freezing(current) && !(check_kthr_stop && kthread_should_stop());
 		spin_unlock_irq(&freezer_lock);
 
-		if (!(current->flags & PF_FROZEN))
+		if (!freeze)
 			break;
+
 		was_frozen = true;
 		schedule();
 	}
+	__set_current_state(TASK_RUNNING);
 
 	pr_debug("%s left refrigerator\n", current->comm);
 
-	/*
-	 * Restore saved task state before returning.  The mb'd version
-	 * needs to be used; otherwise, it might silently break
-	 * synchronization which depends on ordered task state change.
-	 */
-	set_current_state(save);
-
 	return was_frozen;
 }
 EXPORT_SYMBOL(__refrigerator);
@@ -101,6 +101,44 @@ static void fake_signal_wake_up(struct t
 	}
 }
 
+static int __set_task_frozen(struct task_struct *p, void *arg)
+{
+	unsigned int state = READ_ONCE(p->__state);
+
+	if (p->on_rq)
+		return 0;
+
+	if (p != current && task_curr(p))
+		return 0;
+
+	if (!(state & (TASK_FREEZABLE | __TASK_STOPPED | __TASK_TRACED)))
+		return 0;
+
+	/*
+	 * Only TASK_NORMAL can be augmented with TASK_FREEZABLE, since they
+	 * can suffer spurious wakeups.
+	 */
+	if (state & TASK_FREEZABLE)
+		WARN_ON_ONCE(!(state & TASK_NORMAL));
+
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * It's dangerous to freeze with locks held; there be dragons there.
+	 */
+	if (!(state & __TASK_FREEZABLE_UNSAFE))
+		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
+#endif
+
+	WRITE_ONCE(p->__state, TASK_FROZEN);
+	return TASK_FROZEN;
+}
+
+static bool __freeze_task(struct task_struct *p)
+{
+	/* TASK_FREEZABLE|TASK_STOPPED|TASK_TRACED -> TASK_FROZEN */
+	return task_call_func(p, __set_task_frozen, NULL);
+}
+
 /**
  * freeze_task - send a freeze request to given task
  * @p: task to send the request to
@@ -116,20 +154,8 @@ bool freeze_task(struct task_struct *p)
 {
 	unsigned long flags;
 
-	/*
-	 * This check can race with freezer_do_not_count, but worst case that
-	 * will result in an extra wakeup being sent to the task.  It does not
-	 * race with freezer_count(), the barriers in freezer_count() and
-	 * freezer_should_skip() ensure that either freezer_count() sees
-	 * freezing == true in try_to_freeze() and freezes, or
-	 * freezer_should_skip() sees !PF_FREEZE_SKIP and freezes the task
-	 * normally.
-	 */
-	if (freezer_should_skip(p))
-		return false;
-
 	spin_lock_irqsave(&freezer_lock, flags);
-	if (!freezing(p) || frozen(p)) {
+	if (!freezing(p) || frozen(p) || __freeze_task(p)) {
 		spin_unlock_irqrestore(&freezer_lock, flags);
 		return false;
 	}
@@ -137,19 +163,56 @@ bool freeze_task(struct task_struct *p)
 	if (!(p->flags & PF_KTHREAD))
 		fake_signal_wake_up(p);
 	else
-		wake_up_state(p, TASK_INTERRUPTIBLE);
+		wake_up_state(p, TASK_NORMAL);
 
 	spin_unlock_irqrestore(&freezer_lock, flags);
 	return true;
 }
 
+/*
+ * The special task states (TASK_STOPPED, TASK_TRACED) keep their canonical
+ * state in p->jobctl. If either of them got a wakeup that was missed because
+ * TASK_FROZEN, then their canonical state reflects that and the below will
+ * refuse to restore the special state and instead issue the wakeup.
+ */
+static int __set_task_special(struct task_struct *p, void *arg)
+{
+	unsigned int state = 0;
+
+	if (p->jobctl & JOBCTL_TRACED)
+		state = TASK_TRACED;
+
+	else if (p->jobctl & JOBCTL_STOPPED)
+		state = TASK_STOPPED;
+
+	if (__fatal_signal_pending(p) &&
+	    !(p->jobctl & JOBCTL_DELAY_WAKEKILL))
+		state = 0;
+
+	if (state)
+		WRITE_ONCE(p->__state, state);
+
+	return state;
+}
+
 void __thaw_task(struct task_struct *p)
 {
-	unsigned long flags;
+	unsigned long flags, flags2;
 
 	spin_lock_irqsave(&freezer_lock, flags);
-	if (frozen(p))
-		wake_up_process(p);
+	if (WARN_ON_ONCE(freezing(p)))
+		goto unlock;
+
+	if (lock_task_sighand(p, &flags2)) {
+		/* TASK_FROZEN -> TASK_{STOPPED,TRACED} */
+		bool ret = task_call_func(p, __set_task_special, NULL);
+		unlock_task_sighand(p, &flags2);
+		if (ret)
+			goto unlock;
+	}
+
+	wake_up_state(p, TASK_FROZEN);
+unlock:
 	spin_unlock_irqrestore(&freezer_lock, flags);
 }
 
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -334,7 +334,7 @@ void futex_wait_queue(struct futex_hash_
 	 * futex_queue() calls spin_unlock() upon completion, both serializing
 	 * access to the hash list and forcing another memory barrier.
 	 */
-	set_current_state(TASK_INTERRUPTIBLE);
+	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 	futex_queue(q, hb);
 
 	/* Arm the timer */
@@ -352,7 +352,7 @@ void futex_wait_queue(struct futex_hash_
 		 * is no timeout, or if it has yet to expire.
 		 */
 		if (!timeout || timeout->task)
-			freezable_schedule();
+			schedule();
 	}
 	__set_current_state(TASK_RUNNING);
 }
@@ -430,7 +430,7 @@ static int futex_wait_multiple_setup(str
 			return ret;
 	}
 
-	set_current_state(TASK_INTERRUPTIBLE);
+	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 
 	for (i = 0; i < count; i++) {
 		u32 __user *uaddr = (u32 __user *)(unsigned long)vs[i].w.uaddr;
@@ -504,7 +504,7 @@ static void futex_sleep_multiple(struct
 			return;
 	}
 
-	freezable_schedule();
+	schedule();
 }
 
 /**
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -95,8 +95,8 @@ static void check_hung_task(struct task_
 	 * Ensure the task is not frozen.
 	 * Also, skip vfork and any other user process that freezer should skip.
 	 */
-	if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
-	    return;
+	if (unlikely(READ_ONCE(t->__state) & (TASK_FREEZABLE | TASK_FROZEN)))
+		return;
 
 	/*
 	 * When a freshly created task is scheduled once, changes its state to
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -24,7 +24,7 @@
 unsigned int lock_system_sleep(void)
 {
 	unsigned int flags = current->flags;
-	current->flags |= PF_FREEZER_SKIP;
+	current->flags |= PF_NOFREEZE;
 	mutex_lock(&system_transition_mutex);
 	return flags;
 }
@@ -48,8 +48,8 @@ void unlock_system_sleep(unsigned int fl
 	 * Which means, if we use try_to_freeze() here, it would make them
 	 * enter the refrigerator, thus causing hibernation to lockup.
 	 */
-	if (!(flags & PF_FREEZER_SKIP))
-		current->flags &= ~PF_FREEZER_SKIP;
+	if (!(flags & PF_NOFREEZE))
+		current->flags &= ~PF_NOFREEZE;
 	mutex_unlock(&system_transition_mutex);
 }
 EXPORT_SYMBOL_GPL(unlock_system_sleep);
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -53,8 +53,7 @@ static int try_to_freeze_tasks(bool user
 			if (p == current || !freeze_task(p))
 				continue;
 
-			if (!freezer_should_skip(p))
-				todo++;
+			todo++;
 		}
 		read_unlock(&tasklist_lock);
 
@@ -99,8 +98,7 @@ static int try_to_freeze_tasks(bool user
 		if (!wakeup || pm_debug_messages_on) {
 			read_lock(&tasklist_lock);
 			for_each_process_thread(g, p) {
-				if (p != current && !freezer_should_skip(p)
-				    && freezing(p) && !frozen(p))
+				if (p != current && freezing(p) && !frozen(p))
 					sched_show_task(p);
 			}
 			read_unlock(&tasklist_lock);
@@ -132,7 +130,7 @@ int freeze_processes(void)
 	current->flags |= PF_SUSPEND_TASK;
 
 	if (!pm_freezing)
-		atomic_inc(&system_freezing_cnt);
+		static_branch_inc(&freezer_active);
 
 	pm_wakeup_clear(0);
 	pr_info("Freezing user space processes ... ");
@@ -193,7 +191,7 @@ void thaw_processes(void)
 
 	trace_suspend_resume(TPS("thaw_processes"), 0, true);
 	if (pm_freezing)
-		atomic_dec(&system_freezing_cnt);
+		static_branch_dec(&freezer_active);
 	pm_freezing = false;
 	pm_nosig_freezing = false;
 
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
 	}
 	__set_current_state(TASK_RUNNING);
 
-	if (!wait_task_inactive(child, TASK_TRACED) ||
+	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
 	    !ptrace_freeze_traced(child))
 		return -ESRCH;
 
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -247,6 +247,15 @@ int __sched wait_for_completion_killable
 }
 EXPORT_SYMBOL(wait_for_completion_killable);
 
+int __sched wait_for_completion_state(struct completion *x, unsigned int state)
+{
+	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, state);
+	if (t == -ERESTARTSYS)
+		return t;
+	return 0;
+}
+EXPORT_SYMBOL(wait_for_completion_state);
+
 /**
  * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
  * @x:  holds the state of this particular completion
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3260,6 +3260,19 @@ int migrate_swap(struct task_struct *cur
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
+static inline bool __wti_match(struct task_struct *p, unsigned int match_state)
+{
+	unsigned int state = READ_ONCE(p->__state);
+
+	if ((match_state & TASK_FREEZABLE) && state == TASK_FROZEN)
+		return true;
+
+	if (state == (match_state & ~TASK_FREEZABLE))
+		return true;
+
+	return false;
+}
+
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
@@ -3304,7 +3317,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && !__wti_match(p, match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3319,7 +3332,7 @@ unsigned long wait_task_inactive(struct
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+		if (!match_state || __wti_match(p, match_state))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
 
@@ -6320,7 +6333,7 @@ static void __sched notrace __schedule(u
 			prev->sched_contributes_to_load =
 				(prev_state & TASK_UNINTERRUPTIBLE) &&
 				!(prev_state & TASK_NOLOAD) &&
-				!(prev->flags & PF_FROZEN);
+				!(prev_state & TASK_FROZEN);
 
 			if (prev->sched_contributes_to_load)
 				rq->nr_uninterruptible++;
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2321,7 +2321,7 @@ static int ptrace_stop(int exit_code, in
 		clear_traced_quiesce();
 
 		preempt_enable_no_resched();
-		freezable_schedule();
+		schedule();
 
 		cgroup_leave_frozen(true);
 	} else {
@@ -2514,7 +2514,7 @@ static bool do_signal_stop(int signr)
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
-		freezable_schedule();
+		schedule();
 		return true;
 	} else {
 		/*
@@ -2589,11 +2589,11 @@ static void do_freezer_trap(void)
 	 * immediately (if there is a non-fatal signal pending), and
 	 * put the task into sleep.
 	 */
-	__set_current_state(TASK_INTERRUPTIBLE);
+	__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 	clear_thread_flag(TIF_SIGPENDING);
 	spin_unlock_irq(&current->sighand->siglock);
 	cgroup_enter_frozen();
-	freezable_schedule();
+	schedule();
 }
 
 static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
@@ -3639,9 +3639,9 @@ static int do_sigtimedwait(const sigset_
 		recalc_sigpending();
 		spin_unlock_irq(&tsk->sighand->siglock);
 
-		__set_current_state(TASK_INTERRUPTIBLE);
-		ret = freezable_schedule_hrtimeout_range(to, tsk->timer_slack_ns,
-							 HRTIMER_MODE_REL);
+		__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
+		ret = schedule_hrtimeout_range(to, tsk->timer_slack_ns,
+					       HRTIMER_MODE_REL);
 		spin_lock_irq(&tsk->sighand->siglock);
 		__set_task_blocked(tsk, &tsk->real_blocked);
 		sigemptyset(&tsk->real_blocked);
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2037,11 +2037,11 @@ static int __sched do_nanosleep(struct h
 	struct restart_block *restart;
 
 	do {
-		set_current_state(TASK_INTERRUPTIBLE);
+		set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 		hrtimer_sleeper_start_expires(t, mode);
 
 		if (likely(t->task))
-			freezable_schedule();
+			schedule();
 
 		hrtimer_cancel(&t->timer);
 		mode = HRTIMER_MODE_ABS;
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -404,6 +404,7 @@ EXPORT_SYMBOL(call_usermodehelper_setup)
  */
 int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 {
+	unsigned int state = TASK_UNINTERRUPTIBLE;
 	DECLARE_COMPLETION_ONSTACK(done);
 	int retval = 0;
 
@@ -437,25 +438,22 @@ int call_usermodehelper_exec(struct subp
 	if (wait == UMH_NO_WAIT)	/* task has freed sub_info */
 		goto unlock;
 
+	if (wait & UMH_KILLABLE)
+		state |= TASK_KILLABLE;
+
 	if (wait & UMH_FREEZABLE)
-		freezer_do_not_count();
+		state |= TASK_FREEZABLE;
 
-	if (wait & UMH_KILLABLE) {
-		retval = wait_for_completion_killable(&done);
-		if (!retval)
-			goto wait_done;
+	retval = wait_for_completion_state(&done, state);
+	if (!retval)
+		goto wait_done;
 
+	if (wait & UMH_KILLABLE) {
 		/* umh_complete() will see NULL and free sub_info */
 		if (xchg(&sub_info->complete, NULL))
 			goto unlock;
-		/* fallthrough, umh_complete() was already called */
 	}
 
-	wait_for_completion(&done);
-
-	if (wait & UMH_FREEZABLE)
-		freezer_count();
-
 wait_done:
 	retval = sub_info->retval;
 out:
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -780,8 +780,8 @@ static void khugepaged_alloc_sleep(void)
 	DEFINE_WAIT(wait);
 
 	add_wait_queue(&khugepaged_wait, &wait);
-	freezable_schedule_timeout_interruptible(
-		msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
+	__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
+	schedule_timeout(msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
 	remove_wait_queue(&khugepaged_wait, &wait);
 }
 
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -268,7 +268,7 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue
 
 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
@@ -332,14 +332,12 @@ static int rpc_complete_task(struct rpc_
  * to enforce taking of the wq->lock and hence avoid races with
  * rpc_complete_task().
  */
-int __rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *action)
+int rpc_wait_for_completion_task(struct rpc_task *task)
 {
-	if (action == NULL)
-		action = rpc_wait_bit_killable;
 	return out_of_line_wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE,
-			action, TASK_KILLABLE);
+			rpc_wait_bit_killable, TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 }
-EXPORT_SYMBOL_GPL(__rpc_wait_for_completion_task);
+EXPORT_SYMBOL_GPL(rpc_wait_for_completion_task);
 
 /*
  * Make an RPC task runnable.
@@ -963,7 +961,7 @@ static void __rpc_execute(struct rpc_tas
 		trace_rpc_task_sync_sleep(task, task->tk_action);
 		status = out_of_line_wait_on_bit(&task->tk_runstate,
 				RPC_TASK_QUEUED, rpc_wait_bit_killable,
-				TASK_KILLABLE);
+				TASK_KILLABLE|TASK_FREEZABLE);
 		if (status < 0) {
 			/*
 			 * When a sync task receives a signal, it exits with
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2530,13 +2530,14 @@ static long unix_stream_data_wait(struct
 				  struct sk_buff *last, unsigned int last_len,
 				  bool freezable)
 {
+	unsigned int state = TASK_INTERRUPTIBLE | freezable * TASK_FREEZABLE;
 	struct sk_buff *tail;
 	DEFINE_WAIT(wait);
 
 	unix_state_lock(sk);
 
 	for (;;) {
-		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, state);
 
 		tail = skb_peek_tail(&sk->sk_receive_queue);
 		if (tail != last ||
@@ -2549,10 +2550,7 @@ static long unix_stream_data_wait(struct
 
 		sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 		unix_state_unlock(sk);
-		if (freezable)
-			timeo = freezable_schedule_timeout(timeo);
-		else
-			timeo = schedule_timeout(timeo);
+		timeo = schedule_timeout(timeo);
 		unix_state_lock(sk);
 
 		if (sock_flag(sk, SOCK_DEAD))



^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 15:02 ` [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic Peter Zijlstra
@ 2022-04-21 17:26   ` Eric W. Biederman
  2022-04-21 17:57     ` Oleg Nesterov
  2022-04-21 19:55     ` Peter Zijlstra
  0 siblings, 2 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-21 17:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
>  	}
>  	__set_current_state(TASK_RUNNING);
>  
> -	if (!wait_task_inactive(child, TASK_TRACED) ||
> +	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
>  	    !ptrace_freeze_traced(child))
>  		return -ESRCH;

Do we mind that this is going to fail if the child is frozen
during ptrace_check_attach?

I think to avoid that we need to safely get this to
wait_task_inactive(child, 0), like the coredump code uses.

I would like to say that we can do without the wait_task_inactive,
but it looks like it is necessary to ensure that all of the userspace
registers are saved where the tracer can get at them.

Eric


* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 17:26   ` Eric W. Biederman
@ 2022-04-21 17:57     ` Oleg Nesterov
  2022-04-21 19:55     ` Peter Zijlstra
  1 sibling, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-21 17:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On 04/21, Eric W. Biederman wrote:
>
> I would like to say that we can do without the wait_task_inactive,
> but it looks like it is necessary to ensure that all of the userspace
> registers are saved where the tracer can get at them.

Yes, for example, fpu regs.

But there are more problems. For example, if the debugger changes
TIF_BLOCKSTEP, we need to ensure the child is already inactive and that
it will do another switch_to() after that.

Oleg.



* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
@ 2022-04-21 18:23   ` Oleg Nesterov
  2022-04-21 19:58     ` Peter Zijlstra
  2022-04-21 18:40   ` Eric W. Biederman
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-21 18:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On 04/21, Peter Zijlstra wrote:
>
> Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> task->__state as much.

Looks good after a quick glance... but to be honest I got lost and
I need to apply these patches and read the code carefully.

However, I am not able to do this until Monday, sorry.

Just one nit for now,

>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> +	if (!task_is_traced(task))
>  		return;
>  
>  	WARN_ON(!task->ptrace || task->parent != current);
>  
> -	/*
> -	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> -	 * Recheck state under the lock to close this race.
> -	 */
>  	spin_lock_irq(&task->sighand->siglock);
> -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
> +	if (task_is_traced(task)) {

I think ptrace_unfreeze_traced() should not use task_is_traced() at all.
I think a single lockless

	if (task->jobctl & JOBCTL_DELAY_WAKEKILL)
		return;

at the start should be enough?

Nobody else can set this flag. It can be cleared by the tracee if it was
woken up, so perhaps we can check it again but afaics this is not strictly
needed.

> +//		WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));

Did you really want to add the commented WARN_ON_ONCE?

Oleg.
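
For reference, the early-out Oleg describes would look roughly like this.
This is a sketch only, not the actual posted code: it assumes the
JOBCTL_DELAY_WAKEKILL handling introduced in patch 2 of this series and is
not compilable standalone.

```c
/* Sketch of the suggested simplification (hypothetical, not the patch).
 * JOBCTL_DELAY_WAKEKILL is only ever set by the tracer (us), so an
 * unlocked read is a stable early-out. */
static void ptrace_unfreeze_traced(struct task_struct *task)
{
	if (!(task->jobctl & JOBCTL_DELAY_WAKEKILL))
		return;

	WARN_ON(!task->ptrace || task->parent != current);

	spin_lock_irq(&task->sighand->siglock);
	task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
	if (__fatal_signal_pending(task))
		wake_up_state(task, __TASK_TRACED);
	spin_unlock_irq(&task->sighand->siglock);
}
```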



* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
  2022-04-21 18:23   ` Oleg Nesterov
@ 2022-04-21 18:40   ` Eric W. Biederman
  2022-04-26 22:50       ` Eric W. Biederman
  2022-04-25 14:35   ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Oleg Nesterov
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-21 18:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> task->__state as much.
>
> Due to how PREEMPT_RT is changing the rules vs task->__state with the
> introduction of task->saved_state while TASK_RTLOCK_WAIT (the whole
> blocking spinlock thing), the way ptrace freeze tries to do things no
> longer works.


The problem that forces ptrace_stop and do_signal_stop to drop
siglock and grab tasklist_lock is that do_notify_parent_cldstop
needs tasklist_lock to keep parent and real_parent stable.

With just some very modest code changes it looks like we can use a
process's own siglock to keep parent and real_parent stable.  The
siglock is already acquired in all of those places; it is just not
held across the changes to parent and real_parent.

Then make a rule that a child's siglock must be grabbed before a parent's
siglock, and that do_notify_parent_cldstop can always be called under the
child's siglock.

This means ptrace_stop can be significantly simplified, and the
notifications can be moved far enough up that set_special_state
can be called after do_notify_parent_cldstop.  The result is that
there is simply no PREEMPT_RT issue to worry about, and
wait_task_inactive can be used as is.
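
The ordering rule can be illustrated with a small userspace model. This is
only an analogy, not kernel code: the two "locks" here stand in for the
child's and the parent's siglock, and only the acquisition order is
modelled, with an assertion standing in for lockdep.

```c
#include <assert.h>

/* Stand-ins for the child's and the parent's siglock; only the
 * acquisition order is modelled, not real locking. */
enum lock_id { CHILD_SIGLOCK = 1, PARENT_SIGLOCK = 2 };

static int acquired[2];
static int depth;

static void take(enum lock_id id)
{
	/* The proposed rule: a child's siglock is always taken before
	 * its parent's, so taking the parent's lock first is a bug. */
	if (id == PARENT_SIGLOCK)
		assert(depth == 1 && acquired[0] == CHILD_SIGLOCK);
	acquired[depth++] = id;
}

static void drop(void)
{
	depth--;
}

/* A do_notify_parent_cldstop() caller under the rule: child first,
 * then parent, with the notification done under both locks. */
static void notify_parent_cldstop_model(void)
{
	take(CHILD_SIGLOCK);
	take(PARENT_SIGLOCK);
	/* ... deliver the CLD_STOPPED notification here ... */
	drop();
	drop();
}
```

As long as every path honors the same child-before-parent order, no
lock-order inversion (and hence no ABBA deadlock) is possible.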

I remember Oleg suggesting a change something like this a long
time ago.


I need to handle the case where the parent and the child share
the same sighand, but that is just a matter of remembering to handle
it in do_notify_parent_cldstop, as the handling is simply not taking
the lock twice.

I am going to play with that and see if there are any gotchas
I missed when looking through the code.

Eric


* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 17:26   ` Eric W. Biederman
  2022-04-21 17:57     ` Oleg Nesterov
@ 2022-04-21 19:55     ` Peter Zijlstra
  2022-04-21 20:07       ` Peter Zijlstra
  1 sibling, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 19:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Thu, Apr 21, 2022 at 12:26:44PM -0500, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > --- a/kernel/ptrace.c
> > +++ b/kernel/ptrace.c
> > @@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
> >  	}
> >  	__set_current_state(TASK_RUNNING);
> >  
> > -	if (!wait_task_inactive(child, TASK_TRACED) ||
> > +	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
> >  	    !ptrace_freeze_traced(child))
> >  		return -ESRCH;
> 
> Do we mind that this is going to fail if the child is frozen
> during ptrace_check_attach?

Why should this fail? wait_task_inactive() will in fact succeed if it is
frozen due to the added TASK_FREEZABLE and some wait_task_inactive()
changes elsewhere in this patch.
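
The matching rule being described — a TASK_FREEZABLE wait also accepts a
task that has since become TASK_FROZEN — can be modelled in userspace C.
The bit values below are illustrative stand-ins mirroring __wti_match()
from the patch, not the kernel's authoritative state encodings.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's task-state bits. */
#define TASK_TRACED    0x0008u
#define TASK_FREEZABLE 0x2000u
#define TASK_FROZEN    0x8000u

/* Mirrors __wti_match() from the patch: a match_state carrying
 * TASK_FREEZABLE also accepts a task that has meanwhile been frozen;
 * otherwise the state must match exactly (FREEZABLE masked out). */
static bool wti_match(unsigned int state, unsigned int match_state)
{
	if ((match_state & TASK_FREEZABLE) && state == TASK_FROZEN)
		return true;

	if (state == (match_state & ~TASK_FREEZABLE))
		return true;

	return false;
}
```

With those stand-ins, a wait for TASK_TRACED|TASK_FREEZABLE matches both a
traced and a frozen child, which is why wait_task_inactive() succeeds here.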

And I don't see why ptrace_freeze_traced() should fail. It'll warn
though; I should extend/remove that WARN_ON_ONCE() looking at __state,
but it should work.


* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 18:23   ` Oleg Nesterov
@ 2022-04-21 19:58     ` Peter Zijlstra
  0 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 19:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Thu, Apr 21, 2022 at 08:23:26PM +0200, Oleg Nesterov wrote:
> On 04/21, Peter Zijlstra wrote:
> >
> > Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> > task->__state as much.
> 
> Looks good after a quick glance... but to be honest I got lost and
> I need to apply these patches and read the code carefully.
> 
> However, I am not able to do this until Monday, sorry.

Sure, no worries. Take your time.

> Just one nit for now,
> 
> >  static void ptrace_unfreeze_traced(struct task_struct *task)
> >  {
> > -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> > +	if (!task_is_traced(task))
> >  		return;
> >  
> >  	WARN_ON(!task->ptrace || task->parent != current);
> >  
> > -	/*
> > -	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> > -	 * Recheck state under the lock to close this race.
> > -	 */
> >  	spin_lock_irq(&task->sighand->siglock);
> > -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
> > +	if (task_is_traced(task)) {
> 
> I think ptrace_unfreeze_traced() should not use task_is_traced() at all.
> I think a single lockless
> 
> 	if (task->jobctl & JOBCTL_DELAY_WAKEKILL)
> 		return;
> 
> at the start should be enough?

I think so. That is indeed cleaner. I'll make the change if I don't see
anything wrong with it in the morning when the brain has woken up again
;-)

> 
> Nobody else can set this flag. It can be cleared by the tracee if it was
> woken up, so perhaps we can check it again but afaics this is not strictly
> needed.
> 
> > +//		WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));
> 
> Did you really want to add the commented WARN_ON_ONCE?

I did that because:

@@ -1472,8 +1479,7 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_lo
                                  request == PTRACE_INTERRUPT);
        if (!ret) {
                ret = compat_arch_ptrace(child, request, addr, data);
-               if (ret || request != PTRACE_DETACH)
-                       ptrace_unfreeze_traced(child);
+               ptrace_unfreeze_traced(child);
        }

Can now call unfreeze too often. I left the comment in because I need to
think more about why Eric did that and see if it really is needed.

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 19:55     ` Peter Zijlstra
@ 2022-04-21 20:07       ` Peter Zijlstra
  2022-04-22 15:52         ` Eric W. Biederman
  0 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 20:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Thu, Apr 21, 2022 at 09:55:51PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 21, 2022 at 12:26:44PM -0500, Eric W. Biederman wrote:
> > Peter Zijlstra <peterz@infradead.org> writes:
> > 
> > > --- a/kernel/ptrace.c
> > > +++ b/kernel/ptrace.c
> > > @@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
> > >  	}
> > >  	__set_current_state(TASK_RUNNING);
> > >  
> > > -	if (!wait_task_inactive(child, TASK_TRACED) ||
> > > +	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
> > >  	    !ptrace_freeze_traced(child))
> > >  		return -ESRCH;
> > 
> > Do we mind that this is going to fail if the child is frozen
> > during ptrace_check_attach?
> 
> Why should this fail? wait_task_inactive() will in fact succeed if it is
> frozen due to the added TASK_FREEZABLE and some wait_task_inactive()
> changes elsewhere in this patch.

These:

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3260,6 +3260,19 @@ int migrate_swap(struct task_struct *cur
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
+static inline bool __wti_match(struct task_struct *p, unsigned int match_state)
+{
+	unsigned int state = READ_ONCE(p->__state);
+
+	if ((match_state & TASK_FREEZABLE) && state == TASK_FROZEN)
+		return true;
+
+	if (state == (match_state & ~TASK_FREEZABLE))
+		return true;
+
+	return false;
+}
+
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
@@ -3304,7 +3317,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && !__wti_match(p, match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3319,7 +3332,7 @@ unsigned long wait_task_inactive(struct
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+		if (!match_state || __wti_match(p, match_state))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
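The state-matching rule in __wti_match() can be modeled as a pure function and checked in userspace; the flag values and helper name below are stand-ins for this sketch, not the kernel's actual encodings:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in state bits for this sketch (not the kernel's real values). */
#define TASK_TRACED    0x0008u
#define TASK_FREEZABLE 0x2000u
#define TASK_FROZEN    0x8000u

/* Userspace model of __wti_match(): a task whose state is TASK_FROZEN
 * matches any match_state that carries TASK_FREEZABLE; otherwise the
 * state must equal match_state with TASK_FREEZABLE masked out. */
static bool wti_match(unsigned int state, unsigned int match_state)
{
	if ((match_state & TASK_FREEZABLE) && state == TASK_FROZEN)
		return true;

	return state == (match_state & ~TASK_FREEZABLE);
}
```

So wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) succeeds both for a child that is still traced and for one that has since been frozen.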
 


> And I don't see why ptrace_freeze_traced() should fail. It'll warn
> though, I should extend/remove that WARN_ON_ONCE() looking at __state,
> but it should work.

And that looks like (after removal of the one WARN):

static bool ptrace_freeze_traced(struct task_struct *task)
{
	unsigned long flags;
	bool ret = false;

	/* Lockless, nobody but us can set this flag */
	if (task->jobctl & JOBCTL_LISTENING)
		return ret;

	if (!lock_task_sighand(task, &flags))
		return ret;

	if (task_is_traced(task) &&
	    !looks_like_a_spurious_pid(task) &&
	    !__fatal_signal_pending(task)) {
		WARN_ON_ONCE(task->jobctl & JOBCTL_DELAY_WAKEKILL);
		task->jobctl |= JOBCTL_DELAY_WAKEKILL;
		ret = true;
	}
	unlock_task_sighand(task, &flags);

	return ret;
}

And nothing there cares about ->__state.
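As a purely illustrative userspace model (the struct, helper names, and flag bit are invented for the sketch), the freeze/unfreeze pairing on ->jobctl, including the lockless early-out Oleg suggested, behaves like this:

```c
#include <assert.h>
#include <stdbool.h>

#define JOBCTL_DELAY_WAKEKILL (1ul << 24)	/* stand-in bit */

struct task {
	unsigned long jobctl;
	bool traced;		/* models task_is_traced() */
	bool fatal_pending;	/* models __fatal_signal_pending() */
};

/* Freeze succeeds only for a traced task with no fatal signal pending;
 * it records that fact in jobctl instead of touching ->__state. */
static bool freeze_traced(struct task *t)
{
	if (t->traced && !t->fatal_pending) {
		t->jobctl |= JOBCTL_DELAY_WAKEKILL;
		return true;
	}
	return false;
}

/* Unfreeze is a no-op unless a matching freeze set the flag: nobody
 * else sets it, so a lockless check at the start suffices. */
static void unfreeze_traced(struct task *t)
{
	if (!(t->jobctl & JOBCTL_DELAY_WAKEKILL))
		return;
	t->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
}
```

This also shows why calling unfreeze "too often" is harmless in the model: the early-out makes the unpaired calls no-ops.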

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 20:07       ` Peter Zijlstra
@ 2022-04-22 15:52         ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-22 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Apr 21, 2022 at 09:55:51PM +0200, Peter Zijlstra wrote:
>> On Thu, Apr 21, 2022 at 12:26:44PM -0500, Eric W. Biederman wrote:
>> > Peter Zijlstra <peterz@infradead.org> writes:
>> > 
>> > > --- a/kernel/ptrace.c
>> > > +++ b/kernel/ptrace.c
>> > > @@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
>> > >  	}
>> > >  	__set_current_state(TASK_RUNNING);
>> > >  
>> > > -	if (!wait_task_inactive(child, TASK_TRACED) ||
>> > > +	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
>> > >  	    !ptrace_freeze_traced(child))
>> > >  		return -ESRCH;
>> > 
>> > Do we mind that this is going to fail if the child is frozen
>> > during ptrace_check_attach?
>> 
>> Why should this fail? wait_task_inactive() will in fact succeed if it is
>> frozen due to the added TASK_FREEZABLE and some wait_task_inactive()
>> changes elsewhere in this patch.
>
> These:

I had missed that change to wait_task_inactive.

Still that change to wait_task_inactive fundamentally depends upon the
fact that we don't care about the state we are passing into
wait_task_inactive.  So I think it would be better to simply have a
precursor patch that changes wait_task_inactive(child, TASK_TRACED) to
wait_task_inactive(child, 0) and say so explicitly.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
                   ` (4 preceding siblings ...)
  2022-04-21 15:02 ` [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic Peter Zijlstra
@ 2022-04-22 17:43 ` Sebastian Andrzej Siewior
  2022-04-22 19:15   ` Eric W. Biederman
  5 siblings, 1 reply; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-04-22 17:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, Will Deacon, linux-kernel, tj, linux-pm

On 2022-04-21 17:02:48 [+0200], Peter Zijlstra wrote:
> Find here a new posting of the ptrace and freezer patches :-)
> 
> The majority of the changes are in patch 2, which with much feedback from Oleg
> and Eric has changed lots.
> 
> I'm hoping we're converging on something agreeable.

I tested this under RT (had to remove the preempt-disable section in
ptrace_stop()) with ssdd [0]. It forks a few tasks and then
PTRACE_SINGLESTEPs them for a few iterations.

The following failures were reported by that tool:
| forktest#27/3790: EXITING, ERROR: wait on PTRACE_ATTACH saw a SIGCHLD count of 0, should be 1
| forktest#225/40029: EXITING, ERROR: wait on PTRACE_SINGLESTEP #22241: no SIGCHLD seen (signal count == 0), signo 5

very rarely. Then I managed to figure out that the latter error triggers
if I compile something large with an RT priority. Sadly it also happens
with my old ptrace hack (but I only just noticed it). It didn't happen
without RT (just the 5 patches applied).

I also managed to trigger this backtrace with RT:
|WARNING: CPU: 1 PID: 3748 at kernel/signal.c:2237 ptrace_stop+0x356/0x370
|Modules linked in:
|CPU: 1 PID: 3748 Comm: ssdd Not tainted 5.18.0-rc3-rt1+ #1
|Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
|RIP: 0010:ptrace_stop+0x356/0x370
|RSP: 0000:ffffc9000d277d98 EFLAGS: 00010246
|RAX: ffff888116d1e100 RBX: ffff888116d1e100 RCX: 0000000000000001
|RDX: 0000000000000001 RSI: 000000000000002e RDI: ffffffff822bdcc3
|RBP: ffff888116d1e100 R08: ffff88811ca99870 R09: 0000000000000001
|R10: ffff88811ca99910 R11: ffff88852ade2680 R12: ffffc9000d277e90
|R13: 0000000000000004 R14: ffff888116d1ed48 R15: 0000000000000000
|FS:  00007f0afdad4580(0000) GS:ffff88852aa40000(0000) knlGS:0000000000000000
|CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
|CR2: 00007f0afdad4508 CR3: 0000000558198006 CR4: 00000000000606e0
|Call Trace:
| <TASK>
| get_signal+0x553/0x870
| arch_do_signal_or_restart+0x31/0x7b0
| exit_to_user_mode_prepare+0xe4/0x110
| irqentry_exit_to_user_mode+0x5/0x20
| noist_exc_debug+0xe0/0x120
| asm_exc_debug+0x2b/0x30
|RSP: 002b:00007fffae964b70 EFLAGS: 00000346
|RAX: 0000000000000000 RBX: 00000000000000fc RCX: 00007f0afd9c0d35
|RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
|RBP: 00007fffae964e38 R08: 0000000000000000 R09: 00007fffae962a82
|R10: 00007f0afdad4850 R11: 0000000000000246 R12: 0000000000000000
|R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
| </TASK>

which is the WARN_ON_ONCE() in clear_traced_quiesce().

[0] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/tree/src/ssdd/ssdd.c

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite
  2022-04-22 17:43 ` [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Sebastian Andrzej Siewior
@ 2022-04-22 19:15   ` Eric W. Biederman
  2022-04-22 21:13     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-22 19:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, linux-kernel,
	tj, linux-pm

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-04-21 17:02:48 [+0200], Peter Zijlstra wrote:
>> Find here a new posting of the ptrace and freezer patches :-)
>> 
>> The majority of the changes are in patch 2, which with much feedback from Oleg
>> and Eric has changed lots.
>> 
>> I'm hoping we're converging on something agreeable.
>
> I tested this under RT (had to remove the preempt-disable section in
> ptrace_stop()) with ssdd [0]. It forks a few tasks and then
> PTRACE_SINGLESTEPs them for a few iterations.

Out of curiosity why did you need to remove the preempt_disable section
on PREEMPT_RT?  It should have lasted for just a moment until schedule
was called.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite
  2022-04-22 19:15   ` Eric W. Biederman
@ 2022-04-22 21:13     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-04-22 21:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, linux-kernel,
	tj, linux-pm

On 2022-04-22 14:15:35 [-0500], Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > On 2022-04-21 17:02:48 [+0200], Peter Zijlstra wrote:
> >> Find here a new posting of the ptrace and freezer patches :-)
> >> 
> >> The majority of the changes are in patch 2, which with much feedback from Oleg
> >> and Eric has changed lots.
> >> 
> >> I'm hoping we're converging on something agreeable.
> >
> > I tested this under RT (had to remove the preempt-disable section in
> > ptrace_stop()) with ssdd [0]. It forks a few tasks and then
> > PTRACE_SINGLESTEPs them for a few iterations.
> 
> Out of curiosity why did you need to remove the preempt_disable section
> on PREEMPT_RT?  It should have lasted for just a moment until schedule
> was called.

Within that section spinlock_t locks are acquired. These locks are
sleeping locks on PREEMPT_RT and must not be acquired within a
preempt-disable section. (A spinlock_t lock does not disable preemption
on PREEMPT_RT.)

> Eric

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
  2022-04-21 18:23   ` Oleg Nesterov
  2022-04-21 18:40   ` Eric W. Biederman
@ 2022-04-25 14:35   ` Oleg Nesterov
  2022-04-25 18:33     ` Peter Zijlstra
  2022-04-25 17:47   ` Oleg Nesterov
  2022-04-27 15:53   ` Oleg Nesterov
  4 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-25 14:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On 04/21, Peter Zijlstra wrote:
>
> +static void clear_traced_quiesce(void)
> +{
> +	spin_lock_irq(&current->sighand->siglock);
> +	WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));

This WARN_ON_ONCE() doesn't look right, the task can be killed right
after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
drops siglock.

> @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
>  		/*
>  		 * Don't want to allow preemption here, because
>  		 * sys_ptrace() needs this task to be inactive.
> -		 *
> -		 * XXX: implement read_unlock_no_resched().
>  		 */
>  		preempt_disable();
>  		read_unlock(&tasklist_lock);
> -		cgroup_enter_frozen();
> +		cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
> +
> +		/*
> +		 * JOBCTL_TRACED_QUIESCE bridges the gap between
> +		 * set_current_state(TASK_TRACED) above and schedule() below.
> +		 * There must not be any blocking (specifically anything that
> +		 * touched ->saved_state on PREEMPT_RT) between here and
> +		 * schedule().
> +		 *
> +		 * ptrace_check_attach() relies on this with its
> +		 * wait_task_inactive() usage.
> +		 */
> +		clear_traced_quiesce();

Well, I think it should be called earlier under tasklist_lock,
before preempt_disable() above.

We need tasklist_lock to protect ->parent, debugger can be killed
and go away right after read_unlock(&tasklist_lock).

Still trying to convince myself everything is right with
JOBCTL_STOPPED/TRACED ...

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
                     ` (2 preceding siblings ...)
  2022-04-25 14:35   ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Oleg Nesterov
@ 2022-04-25 17:47   ` Oleg Nesterov
  2022-04-27  0:24     ` Eric W. Biederman
  2022-04-27 15:53   ` Oleg Nesterov
  4 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-25 17:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On 04/21, Peter Zijlstra wrote:
>
> @@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
>  	 * schedule() will not sleep if there is a pending signal that
>  	 * can awaken the task.
>  	 */
> -	current->jobctl |= JOBCTL_TRACED;
> +	current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
>  	set_special_state(TASK_TRACED);

OK, this looks wrong. I actually mean the previous patch which sets
JOBCTL_TRACED.

The problem is that the tracee can be already killed, so that
fatal_signal_pending(current) is true. In this case we can't rely on
signal_wake_up_state() which should clear JOBCTL_TRACED, or the
callers of ptrace_signal_wake_up/etc which clear this flag by hand.

In this case schedule() won't block and ptrace_stop() will leak
JOBCTL_TRACED. Unless I missed something.

We could check fatal_signal_pending() and damn! this is what I think
ptrace_stop() should have done from the very beginning. But for now
I'd suggest to simply clear this flag before return, along with
DELAY_WAKEKILL and LISTENING.

>  	current->jobctl &= ~JOBCTL_LISTENING;
> +	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;

	current->jobctl &=
		~(JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);
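The combined clear is a single masked AND; a minimal userspace check (bit positions and the helper are stand-ins for this sketch) confirms it clears exactly those three flags and leaves unrelated jobctl bits intact:

```c
#include <assert.h>

/* Stand-in bit positions for this sketch. */
#define JOBCTL_TRACED         (1ul << 21)
#define JOBCTL_DELAY_WAKEKILL (1ul << 22)
#define JOBCTL_LISTENING      (1ul << 23)
#define JOBCTL_OTHER          (1ul << 5)	/* unrelated flag, must survive */

static unsigned long clear_trace_flags(unsigned long jobctl)
{
	/* One masked AND clears all three flags and leaves the rest alone. */
	return jobctl & ~(JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);
}
```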

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-25 14:35   ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Oleg Nesterov
@ 2022-04-25 18:33     ` Peter Zijlstra
  2022-04-26  0:38       ` Eric W. Biederman
  0 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-25 18:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Mon, Apr 25, 2022 at 04:35:37PM +0200, Oleg Nesterov wrote:
> On 04/21, Peter Zijlstra wrote:
> >
> > +static void clear_traced_quiesce(void)
> > +{
> > +	spin_lock_irq(&current->sighand->siglock);
> > +	WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
> 
> This WARN_ON_ONCE() doesn't look right, the task can be killed right
> after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
> drops siglock.

OK, will look at that.

> > @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
> >  		/*
> >  		 * Don't want to allow preemption here, because
> >  		 * sys_ptrace() needs this task to be inactive.
> > -		 *
> > -		 * XXX: implement read_unlock_no_resched().
> >  		 */
> >  		preempt_disable();
> >  		read_unlock(&tasklist_lock);
> > -		cgroup_enter_frozen();
> > +		cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
> > +
> > +		/*
> > +		 * JOBCTL_TRACED_QUIESCE bridges the gap between
> > +		 * set_current_state(TASK_TRACED) above and schedule() below.
> > +		 * There must not be any blocking (specifically anything that
> > +		 * touched ->saved_state on PREEMPT_RT) between here and
> > +		 * schedule().
> > +		 *
> > +		 * ptrace_check_attach() relies on this with its
> > +		 * wait_task_inactive() usage.
> > +		 */
> > +		clear_traced_quiesce();
> 
> Well, I think it should be called earlier under tasklist_lock,
> before preempt_disable() above.
> 
> We need tasklist_lock to protect ->parent, debugger can be killed
> and go away right after read_unlock(&tasklist_lock).
> 
> Still trying to convince myself everything is right with
> JOBCTL_STOPPED/TRACED ...

Can't do it earlier, since cgroup_enter_frozen() can take a spinlock
(e.g. use ->saved_state).

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-25 18:33     ` Peter Zijlstra
@ 2022-04-26  0:38       ` Eric W. Biederman
  2022-04-26  5:51         ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26  0:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Apr 25, 2022 at 04:35:37PM +0200, Oleg Nesterov wrote:
>> On 04/21, Peter Zijlstra wrote:
>> >
>> > +static void clear_traced_quiesce(void)
>> > +{
>> > +	spin_lock_irq(&current->sighand->siglock);
>> > +	WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
>> 
>> This WARN_ON_ONCE() doesn't look right, the task can be killed right
>> after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
>> drops siglock.
>
> OK, will look at that.
>
>> > @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
>> >  		/*
>> >  		 * Don't want to allow preemption here, because
>> >  		 * sys_ptrace() needs this task to be inactive.
>> > -		 *
>> > -		 * XXX: implement read_unlock_no_resched().
>> >  		 */
>> >  		preempt_disable();
>> >  		read_unlock(&tasklist_lock);
>> > -		cgroup_enter_frozen();
>> > +		cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
>> > +
>> > +		/*
>> > +		 * JOBCTL_TRACED_QUIESCE bridges the gap between
>> > +		 * set_current_state(TASK_TRACED) above and schedule() below.
>> > +		 * There must not be any blocking (specifically anything that
>> > +		 * touched ->saved_state on PREEMPT_RT) between here and
>> > +		 * schedule().
>> > +		 *
>> > +		 * ptrace_check_attach() relies on this with its
>> > +		 * wait_task_inactive() usage.
>> > +		 */
>> > +		clear_traced_quiesce();
>> 
>> Well, I think it should be called earlier under tasklist_lock,
>> before preempt_disable() above.
>> 
>> We need tasklist_lock to protect ->parent, debugger can be killed
>> and go away right after read_unlock(&tasklist_lock).
>> 
>> Still trying to convince myself everything is right with
>> JOBCTL_STOPPED/TRACED ...
>
> Can't do it earlier, since cgroup_enter_frozen() can do spinlock (eg.
> use ->saved_state).

There are some other issues in this part of ptrace_stop().


I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".


Currently in ptrace_check_attach a parameter of __TASK_TRACED is passed
so that wait_task_inactive can fail if the "!current->ptrace" branch
of ptrace_stop is taken and ptrace_stop does not stop.  With the
TASK_FROZEN state it appears that the "!current->ptrace" branch can
continue and freeze somewhere else, and wait_task_inactive could decide
it was fine.


I have to run, but hopefully tomorrow I will post the patches that
remove the "!current->ptrace" case altogether and basically
remove the need for quiesce and wait_task_inactive detecting
which branch is taken.

The spinlock in cgroup_enter_frozen remains an issue for PREEMPT_RT.
But the rest of the issues are cleared up by using siglock instead
of tasklist_lock.  Plus the code is just easier to read and understand.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-26  0:38       ` Eric W. Biederman
@ 2022-04-26  5:51         ` Oleg Nesterov
  2022-04-26 17:19           ` Eric W. Biederman
  0 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-26  5:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On 04/25, Eric W. Biederman wrote:
>
> I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".

As Peter explained, in this case we can rely on __ptrace_unlink() which
should clear this flag.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-26  5:51         ` Oleg Nesterov
@ 2022-04-26 17:19           ` Eric W. Biederman
  2022-04-26 18:11             ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 17:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/25, Eric W. Biederman wrote:
>>
>> I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".
>
> As Peter explained, in this case we can rely on __ptrace_unlink() which
> should clear this flag.

I had missed that that signal_wake_up_state was clearing
JOBCTL_TRACED_QUIESCE.

Relying on __ptrace_unlink assumes that __ptrace_unlink happens after
siglock is taken and before ptrace_stop is called.  Especially with the
ptrace_notify in signal_delivered, that does not look guaranteed.

The __ptrace_unlink could also happen during arch_ptrace_stop.

Relying on siglock is sufficient because __ptrace_unlink holds siglock
over clearing task->ptrace.  Which means that the simple fix for this is
to just test task->ptrace before we set JOBCTL_TRACED_QUIESCE.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-26 17:19           ` Eric W. Biederman
@ 2022-04-26 18:11             ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-26 18:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On 04/26, Eric W. Biederman wrote:
>
> Relying on __ptrace_unlink assumes the __ptrace_unlink happens after
> siglock is taken before calling ptrace_stop.  Especially with the
> ptrace_notify in signal_delivered that does not look guaranteed.
>
> The __ptrace_unlink could also happen during arch_ptrace_stop.
>
> Relying on siglock is sufficient because __ptrace_unlink holds siglock
> over clearing task->ptrace.  Which means that the simple fix for this is
> to just test task->ptrace before we set JOBCTL_TRACED_QUEIESCE.

Or simply clear _QUIESCE along with _TRACED/DELAY_WAKEKILL before return?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 0/9] ptrace: cleaning up ptrace_stop
  2022-04-21 18:40   ` Eric W. Biederman
@ 2022-04-26 22:50       ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Jann Horn, Kees Cook


While looking at how ptrace is broken on PREEMPT_RT I realized
that ptrace_stop would be much simpler and more maintainable
if tsk->ptrace, tsk->parent, and tsk->real_parent were protected
by siglock.  Most of the changes are general cleanups in support
of this locking change.

While making the necessary changes to protect tsk->ptrace with
siglock I discovered we have two architectures xtensa and um
that were using tsk->ptrace for what most other architectures
use TIF_SIGPENDING for and not protecting tsk->ptrace with any lock.

By the end of this series ptrace should work on PREEMPT_RT with
CONFIG_FREEZER and CONFIG_CGROUPS disabled, by the simple fact that the
ptrace_stop code becomes less special.  The function cgroup_enter_frozen
definitely remains a problem, because it takes a lock which is a
sleeping lock on PREEMPT_RT while preemption is disabled.  Peter
Zijlstra has been rewriting the classic freezer, and it came up in
earlier parts of this discussion, so I presume the freezer is also a
problem for PREEMPT_RT.
Peter's series rewriting the freezer[1] should work on top of this
series with minimal changes and patch 2/5 removed.

Eric W. Biederman (9):
      signal: Rename send_signal send_signal_locked
      signal: Replace __group_send_sig_info with send_signal_locked
      ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
      ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
      signal: Protect parent child relationships by childs siglock
      signal: Always call do_notify_parent_cldstop with siglock held
      ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
      ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
      ptrace: Don't change __state

 arch/um/include/asm/thread_info.h |   2 +
 arch/um/kernel/exec.c             |   2 +-
 arch/um/kernel/process.c          |   2 +-
 arch/um/kernel/ptrace.c           |   8 +-
 arch/um/kernel/signal.c           |   4 +-
 arch/xtensa/kernel/ptrace.c       |   4 +-
 arch/xtensa/kernel/signal.c       |   4 +-
 drivers/tty/tty_jobctrl.c         |   4 +-
 include/linux/ptrace.h            |   7 --
 include/linux/sched/jobctl.h      |   2 +
 include/linux/sched/signal.h      |   3 +-
 include/linux/signal.h            |   3 +-
 kernel/exit.c                     |   4 +
 kernel/fork.c                     |  12 +--
 kernel/ptrace.c                   |  61 ++++++-------
 kernel/signal.c                   | 187 ++++++++++++++------------------------
 kernel/time/posix-cpu-timers.c    |   6 +-
 17 files changed, 131 insertions(+), 184 deletions(-)

[1] https://lkml.kernel.org/r/20220421150248.667412396@infradead.org

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 1/9] signal: Rename send_signal send_signal_locked
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Rename send_signal to send_signal_locked and make it usable
outside of signal.c.
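
As context, the new name follows the kernel's "_locked" suffix convention:
send_signal_locked() must be called with sighand->siglock already held, while
entry points such as do_send_sig_info() take the lock themselves and delegate.
A minimal userspace sketch of that convention, with purely illustrative names
and a plain flag standing in for the siglock:

```c
#include <assert.h>

/* Illustrative stand-ins only: a flag plays the role of
 * sighand->siglock, an int the role of the queued signal. */
static int siglock_held;
static int pending_sig;

/* Analogue of send_signal_locked(): caller must hold the lock. */
static int send_signal_locked_demo(int sig)
{
	assert(siglock_held);	/* lockdep-style check of the contract */
	pending_sig = sig;	/* "queue" the signal under the lock */
	return 0;
}

/* Analogue of do_send_sig_info(): acquires the lock, then delegates. */
static int do_send_sig_demo(int sig)
{
	int ret;

	siglock_held = 1;	/* lock */
	ret = send_signal_locked_demo(sig);
	siglock_held = 0;	/* unlock */
	return ret;
}
```

The assert in send_signal_locked_demo() documents the calling contract much as
the "_locked" suffix itself does; the kernel relies on the name plus lockdep
rather than a runtime flag.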

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH 2/9] signal: Replace __group_send_sig_info with send_signal_locked
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

The function send_signal_locked does everything __group_send_sig_info does
and more, so replace __group_send_sig_info with send_signal_locked and
remove it.
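
The removed wrapper only hard-coded PIDTYPE_TGID.  A rough userspace sketch
(illustrative names only) of why such a wrapper can be dropped in favor of the
general function:

```c
#include <assert.h>

/* Illustrative analogue of enum pid_type. */
enum pid_type_demo { PIDTYPE_PID_DEMO, PIDTYPE_TGID_DEMO };

static enum pid_type_demo last_type;

/* General function (analogue of send_signal_locked()). */
static int send_signal_demo(int sig, enum pid_type_demo type)
{
	last_type = type;
	return sig > 0 ? 0 : -1;
}

/* The removed wrapper amounted to this one-liner (as
 * __group_send_sig_info amounted to send_signal_locked(...,
 * PIDTYPE_TGID)); after the patch each former caller writes the
 * right-hand side itself. */
static int group_send_demo(int sig)
{
	return send_signal_demo(sig, PIDTYPE_TGID_DEMO);
}
```

Inlining the one-line body at every call site costs nothing and makes the pid
type visible where the signal is sent.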

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH 3/9] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

User mode Linux is the last user of the PT_DTRACE flag.  Using the flag to
indicate single stepping is a little confusing and, worse, changing
tsk->ptrace without locking could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE, as uml is the last user.
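
A rough userspace sketch of the thread-info flag scheme the patch switches to
(illustrative names; the real helpers such as set_tsk_thread_flag() use atomic
bitops on the task's thread_info flags word):

```c
#include <assert.h>

/* TIF_* values are bit indexes into a per-thread flags word. */
#define TIF_SINGLESTEP	10	/* single stepping userspace */

struct task_demo {
	unsigned long flags;	/* stand-in for thread_info->flags */
};

/* Analogue of set_tsk_thread_flag(). */
static void set_task_flag(struct task_demo *t, int flag)
{
	t->flags |= 1UL << flag;
}

/* Analogue of clear_tsk_thread_flag(). */
static void clear_task_flag(struct task_demo *t, int flag)
{
	t->flags &= ~(1UL << flag);
}

/* Analogue of test_thread_flag()/test_tsk_thread_flag(). */
static int test_task_flag(const struct task_demo *t, int flag)
{
	return !!(t->flags & (1UL << flag));
}
```

Because each flag is a private bit in a per-thread word manipulated (in the
kernel) with atomic bitops, no siglock is needed, unlike updates to the
ptrace-managed tsk->ptrace word.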

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/um/include/asm/thread_info.h | 2 ++
 arch/um/kernel/exec.c             | 2 +-
 arch/um/kernel/process.c          | 2 +-
 arch/um/kernel/ptrace.c           | 8 ++++----
 arch/um/kernel/signal.c           | 4 ++--
 include/linux/ptrace.h            | 1 -
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	7
 #define TIF_NOTIFY_RESUME	8
 #define TIF_SECCOMP		9	/* secure computing */
+#define TIF_SINGLESTEP		10	/* single stepping userspace */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 
 #endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
 {
 	PT_REGS_IP(regs) = eip;
 	PT_REGS_SP(regs) = esp;
-	current->ptrace &= ~PT_DTRACE;
+	clear_thread_flag(TIF_SINGLESTEP);
 #ifdef SUBARCH_EXECVE1
 	SUBARCH_EXECVE1(regs->regs);
 #endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
 {
 	struct task_struct *task = t ? t : current;
 
-	if (!(task->ptrace & PT_DTRACE))
+	if (!test_thread_flag(TIF_SINGLESTEP))
 		return 0;
 
 	if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_DTRACE;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_DTRACE;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
 }
 
 /*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
  * PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
  */
 int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	audit_syscall_exit(regs);
 
 	/* Fake a debug trap */
-	if (ptraced & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		send_sigtrap(&regs->regs, 0);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	unsigned long sp;
 	int err;
 
-	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+	if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
 		singlestep = 1;
 
 	/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
 	 * on the host.  The tracing thread will check this flag and
 	 * PTRACE_SYSCALL if necessary.
 	 */
-	if (current->ptrace & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		current->thread.singlestep_syscall =
 			is_syscall(PT_REGS_IP(&current->thread.regs));
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
-#define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 
 #define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH 4/9] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

xtensa is the last user of the PT_SINGLESTEP flag.  Changing tsk->ptrace in
user_enable_single_step and user_disable_single_step without locking could
potentially cause problems.

So use a thread info flag instead of a flag in tsk->ptrace.  Use TIF_SINGLESTEP,
which xtensa already had defined but left unused.

Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no
remaining users.
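For illustration, here is a minimal userspace sketch (an assumption, not
kernel code: plain C11 atomics stand in for the kernel's
set_tsk_thread_flag()/clear_tsk_thread_flag()/test_thread_flag() helpers,
and the bit number is arbitrary) of why a per-bit atomic flag word is safe
without a lock, unlike the plain |= / &= read-modify-write on tsk->ptrace
being removed here:

```c
#include <stdatomic.h>

/* Userspace model (assumption): the thread-info flags as one atomic word.
 * The bit number is illustrative, not xtensa's real TIF_SINGLESTEP value. */
#define TIF_SINGLESTEP 3

static atomic_ulong thread_flags;

/* Analogue of set_tsk_thread_flag(): a single atomic RMW, so a concurrent
 * update to another bit of the same word cannot be lost. */
static void set_flag(int bit)
{
	atomic_fetch_or(&thread_flags, 1UL << bit);
}

/* Analogue of clear_tsk_thread_flag(). */
static void clear_flag(int bit)
{
	atomic_fetch_and(&thread_flags, ~(1UL << bit));
}

/* Analogue of test_thread_flag(). */
static int test_flag(int bit)
{
	return (int)((atomic_load(&thread_flags) >> bit) & 1);
}
```

A non-atomic `flags |= PT_SINGLESTEP` is a load, modify, store sequence;
two threads touching different bits of the same word can each lose the
other's update, which is exactly the hazard the commit message describes.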

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/xtensa/kernel/ptrace.c | 4 ++--
 arch/xtensa/kernel/signal.c | 4 ++--
 include/linux/ptrace.h      | 6 ------
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/xtensa/kernel/ptrace.c b/arch/xtensa/kernel/ptrace.c
index 323c678a691f..b952e67cc0cc 100644
--- a/arch/xtensa/kernel/ptrace.c
+++ b/arch/xtensa/kernel/ptrace.c
@@ -225,12 +225,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_SINGLESTEP;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_SINGLESTEP;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 /*
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index 6f68649e86ba..ac50ec46c8f1 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -473,7 +473,7 @@ static void do_signal(struct pt_regs *regs)
 		/* Set up the stack frame */
 		ret = setup_frame(&ksig, sigmask_to_save(), regs);
 		signal_setup_done(ret, &ksig, 0);
-		if (current->ptrace & PT_SINGLESTEP)
+		if (test_thread_flag(TIF_SINGLESTEP))
 			task_pt_regs(current)->icountlevel = 1;
 
 		return;
@@ -499,7 +499,7 @@ static void do_signal(struct pt_regs *regs)
 	/* If there's no signal to deliver, we just restore the saved mask.  */
 	restore_saved_sigmask();
 
-	if (current->ptrace & PT_SINGLESTEP)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		task_pt_regs(current)->icountlevel = 1;
 	return;
 }
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4c06f9f8ef3f..c952c5ba8fab 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
 
-/* single stepping state bits (used on ARM and PA-RISC) */
-#define PT_SINGLESTEP_BIT	31
-#define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
-#define PT_BLOCKSTEP_BIT	30
-#define PT_BLOCKSTEP		(1<<PT_BLOCKSTEP_BIT)
-
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH 5/9] signal: Protect parent child relationships by childs siglock
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

The functions ptrace_stop and do_signal_stop have to drop siglock
and grab tasklist_lock because the parent/child relationship
is guarded by tasklist_lock and not siglock.

Simplify things by guarding the parent/child relationship
with siglock.  For the most part this just requires a little bit
of code motion.  In a couple of places more locking was needed.

After this change tsk->parent, tsk->real_parent, tsk->ptrace, and
tsk->ptracer_cred are all protected by tsk->siglock.

The fields tsk->sibling and tsk->ptrace_entry are mostly protected by
tsk->siglock.  The field tsk->ptrace_entry is not protected by siglock
when tsk->ptrace_entry is reused as the dead task list.  The field
tsk->sibling is not protected by siglock when children are reparented
because their original parent dies.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/exit.c   |  4 ++++
 kernel/fork.c   | 12 ++++++------
 kernel/ptrace.c | 13 +++++++++----
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..b07af19eca13 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,11 +643,15 @@ static void forget_original_parent(struct task_struct *father,
 
 	reaper = find_new_reaper(father, reaper);
 	list_for_each_entry(p, &father->children, sibling) {
+		spin_lock(&p->sighand->siglock);
 		for_each_thread(p, t) {
 			RCU_INIT_POINTER(t->real_parent, reaper);
 			BUG_ON((!t->ptrace) != (rcu_access_pointer(t->parent) == father));
 			if (likely(!t->ptrace))
 				t->parent = t->real_parent;
+		}
+		spin_unlock(&p->sighand->siglock);
+		for_each_thread(p, t) {
 			if (t->pdeath_signal)
 				group_send_sig_info(t->pdeath_signal,
 						    SEND_SIG_NOINFO, t,
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..841021da69f3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2367,6 +2367,12 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	write_lock_irq(&tasklist_lock);
 
+	klp_copy_process(p);
+
+	sched_core_fork(p);
+
+	spin_lock(&current->sighand->siglock);
+
 	/* CLONE_PARENT re-uses the old parent */
 	if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) {
 		p->real_parent = current->real_parent;
@@ -2381,12 +2387,6 @@ static __latent_entropy struct task_struct *copy_process(
 		p->exit_signal = args->exit_signal;
 	}
 
-	klp_copy_process(p);
-
-	sched_core_fork(p);
-
-	spin_lock(&current->sighand->siglock);
-
 	/*
 	 * Copy seccomp details explicitly here, in case they were changed
 	 * before holding sighand lock.
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..16d1a84a2cae 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -123,13 +123,12 @@ void __ptrace_unlink(struct task_struct *child)
 	clear_task_syscall_work(child, SYSCALL_EMU);
 #endif
 
+	spin_lock(&child->sighand->siglock);
 	child->parent = child->real_parent;
 	list_del_init(&child->ptrace_entry);
 	old_cred = child->ptracer_cred;
 	child->ptracer_cred = NULL;
 	put_cred(old_cred);
-
-	spin_lock(&child->sighand->siglock);
 	child->ptrace = 0;
 	/*
 	 * Clear all pending traps and TRAPPING.  TRAPPING should be
@@ -447,15 +446,15 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (task->ptrace)
 		goto unlock_tasklist;
 
+	spin_lock(&task->sighand->siglock);
 	task->ptrace = flags;
 
 	ptrace_link(task, current);
 
 	/* SEIZE doesn't trap tracee on attach */
 	if (!seize)
-		send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
+		send_signal_locked(SIGSTOP, SEND_SIG_PRIV, task, PIDTYPE_PID);
 
-	spin_lock(&task->sighand->siglock);
 
 	/*
 	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
@@ -521,8 +520,10 @@ static int ptrace_traceme(void)
 		 * pretend ->real_parent untraces us right after return.
 		 */
 		if (!ret && !(current->real_parent->flags & PF_EXITING)) {
+			spin_lock(&current->sighand->siglock);
 			current->ptrace = PT_PTRACED;
 			ptrace_link(current, current->real_parent);
+			spin_unlock(&current->sighand->siglock);
 		}
 	}
 	write_unlock_irq(&tasklist_lock);
@@ -689,10 +690,14 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 		return ret;
 
 	/* Avoid intermediate state when all opts are cleared */
+	write_lock_irq(&tasklist_lock);
+	spin_lock(&child->sighand->siglock);
 	flags = child->ptrace;
 	flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
 	flags |= (data << PT_OPT_FLAG_SHIFT);
 	child->ptrace = flags;
+	spin_unlock(&child->sighand->siglock);
+	write_unlock_irq(&tasklist_lock);
 
 	return 0;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Now that siglock keeps tsk->parent and tsk->real_parent constant,
require that do_notify_parent_cldstop is called with tsk->siglock held
instead of tasklist_lock.

As all of the callers of do_notify_parent_cldstop had to drop the
siglock and take tasklist_lock, this simplifies all of its callers.
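The conditional nested-lock pattern the patch introduces (take the
parent's siglock only when it is a different lock from the one already
held) can be sketched in userspace as follows; this is an illustrative
model under stated assumptions (pthread mutexes in place of spinlocks, a
toy sighand/task layout, and a counter in place of send_signal_locked()),
not the kernel implementation:

```c
#include <pthread.h>

/* Userspace model (assumption): tasks of one thread group share a
 * sighand, and the sighand holds the siglock plus a notification count. */
struct sighand { pthread_mutex_t siglock; int notified; };
struct task { struct sighand *sighand; struct task *parent; };

/* Sketch of the new do_notify_parent_cldstop() rule: the caller already
 * holds tsk's siglock, so only take the parent's siglock when it is a
 * different lock; an unconditional lock would self-deadlock whenever
 * parent and child share a sighand (non-recursive lock). */
static void notify_parent_cldstop(struct task *tsk)
{
	struct sighand *psig = tsk->parent->sighand;
	int lock = (tsk->sighand != psig);

	if (lock)
		pthread_mutex_lock(&psig->siglock);	/* nested inner lock */
	psig->notified++;				/* deliver notification */
	if (lock)
		pthread_mutex_unlock(&psig->siglock);
}
```

This mirrors the `lock = tsk->sighand != sighand` test and the
spin_lock_nested(..., SINGLE_DEPTH_NESTING) call in the diff below.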

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 156 +++++++++++++++++-------------------------------
 1 file changed, 55 insertions(+), 101 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..584d67deb3cb 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2121,11 +2121,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 				     bool for_ptracer, int why)
 {
 	struct kernel_siginfo info;
-	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
+	bool lock;
 	u64 utime, stime;
 
+	assert_spin_locked(&tsk->sighand->siglock);
+
 	if (for_ptracer) {
 		parent = tsk->parent;
 	} else {
@@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  	}
 
 	sighand = parent->sighand;
-	spin_lock_irqsave(&sighand->siglock, flags);
+	lock = tsk->sighand != sighand;
+	if (lock)
+		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
 		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
@@ -2172,7 +2176,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
 	__wake_up_parent(tsk, parent);
-	spin_unlock_irqrestore(&sighand->siglock, flags);
+	if (lock)
+		spin_unlock(&sighand->siglock);
 }
 
 /*
@@ -2193,7 +2198,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
+	/* Don't stop if current is not ptraced */
+	if (unlikely(!current->ptrace))
+		return (clear_code) ? 0 : exit_code;
+
+	/*
+	 * If @why is CLD_STOPPED, we're trapping to participate in a group
+	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
+	 * across siglock relocks since INTERRUPT was scheduled, PENDING
+	 * could be clear now.  We act as if SIGCONT is received after
+	 * TASK_TRACED is entered - ignore it.
+	 */
+	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
+		gstop_done = task_participate_group_stop(current);
+
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	do_notify_parent_cldstop(current, true, why);
+	if (gstop_done && ptrace_reparented(current))
+		do_notify_parent_cldstop(current, false, why);
+
 	/*
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
@@ -2239,15 +2271,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	current->last_siginfo = info;
 	current->exit_code = exit_code;
 
-	/*
-	 * If @why is CLD_STOPPED, we're trapping to participate in a group
-	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
-	 * across siglock relocks since INTERRUPT was scheduled, PENDING
-	 * could be clear now.  We act as if SIGCONT is received after
-	 * TASK_TRACED is entered - ignore it.
-	 */
-	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
-		gstop_done = task_participate_group_stop(current);
 
 	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
 	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
@@ -2257,56 +2280,19 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	/* entering a trap, clear TRAPPING */
 	task_clear_jobctl_trapping(current);
 
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement spin_unlock_no_resched().
+	 */
+	preempt_disable();
 	spin_unlock_irq(&current->sighand->siglock);
-	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
-		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
-
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
-		read_unlock(&tasklist_lock);
-	}
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
@@ -2314,8 +2300,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
-		exit_code = current->exit_code;
+	exit_code = current->exit_code;
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
@@ -2444,34 +2429,17 @@ static bool do_signal_stop(int signr)
 	}
 
 	if (likely(!current->ptrace)) {
-		int notify = 0;
-
 		/*
 		 * If there are no other threads in the group, or if there
 		 * is a group stop in progress and we are the last to stop,
-		 * report to the parent.
+		 * report to the real_parent.
 		 */
 		if (task_participate_group_stop(current))
-			notify = CLD_STOPPED;
+			do_notify_parent_cldstop(current, false, CLD_STOPPED);
 
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-		/*
-		 * Notify the parent of the group stop completion.  Because
-		 * we're not holding either the siglock or tasklist_lock
-		 * here, ptracer may attach inbetween; however, this is for
-		 * group stop and should always be delivered to the real
-		 * parent of the group leader.  The new ptracer will get
-		 * its notification when this task transitions into
-		 * TASK_TRACED.
-		 */
-		if (notify) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, false, notify);
-			read_unlock(&tasklist_lock);
-		}
-
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
@@ -2665,8 +2633,6 @@ bool get_signal(struct ksignal *ksig)
 
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		spin_unlock_irq(&sighand->siglock);
-
 		/*
 		 * Notify the parent that we're continuing.  This event is
 		 * always per-process and doesn't make whole lot of sense
@@ -2675,15 +2641,10 @@ bool get_signal(struct ksignal *ksig)
 		 * the ptracer of the group leader too unless it's gonna be
 		 * a duplicate.
 		 */
-		read_lock(&tasklist_lock);
 		do_notify_parent_cldstop(current, false, why);
-
 		if (ptrace_reparented(current->group_leader))
 			do_notify_parent_cldstop(current->group_leader,
 						true, why);
-		read_unlock(&tasklist_lock);
-
-		goto relock;
 	}
 
 	for (;;) {
@@ -2940,7 +2901,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 
 void exit_signals(struct task_struct *tsk)
 {
-	int group_stop = 0;
 	sigset_t unblocked;
 
 	/*
@@ -2971,21 +2931,15 @@ void exit_signals(struct task_struct *tsk)
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
 
-	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
-	    task_participate_group_stop(tsk))
-		group_stop = CLD_STOPPED;
-out:
-	spin_unlock_irq(&tsk->sighand->siglock);
-
 	/*
 	 * If group stop has completed, deliver the notification.  This
 	 * should always go to the real parent of the group leader.
 	 */
-	if (unlikely(group_stop)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, false, group_stop);
-		read_unlock(&tasklist_lock);
-	}
+	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
+	    task_participate_group_stop(tsk))
+		do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+out:
+	spin_unlock_irq(&tsk->sighand->siglock);
 }
 
 /*
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
@@ -2444,34 +2429,17 @@ static bool do_signal_stop(int signr)
 	}
 
 	if (likely(!current->ptrace)) {
-		int notify = 0;
-
 		/*
 		 * If there are no other threads in the group, or if there
 		 * is a group stop in progress and we are the last to stop,
-		 * report to the parent.
+		 * report to the real_parent.
 		 */
 		if (task_participate_group_stop(current))
-			notify = CLD_STOPPED;
+			do_notify_parent_cldstop(current, false, CLD_STOPPED);
 
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-		/*
-		 * Notify the parent of the group stop completion.  Because
-		 * we're not holding either the siglock or tasklist_lock
-		 * here, ptracer may attach inbetween; however, this is for
-		 * group stop and should always be delivered to the real
-		 * parent of the group leader.  The new ptracer will get
-		 * its notification when this task transitions into
-		 * TASK_TRACED.
-		 */
-		if (notify) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, false, notify);
-			read_unlock(&tasklist_lock);
-		}
-
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
@@ -2665,8 +2633,6 @@ bool get_signal(struct ksignal *ksig)
 
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		spin_unlock_irq(&sighand->siglock);
-
 		/*
 		 * Notify the parent that we're continuing.  This event is
 		 * always per-process and doesn't make whole lot of sense
@@ -2675,15 +2641,10 @@ bool get_signal(struct ksignal *ksig)
 		 * the ptracer of the group leader too unless it's gonna be
 		 * a duplicate.
 		 */
-		read_lock(&tasklist_lock);
 		do_notify_parent_cldstop(current, false, why);
-
 		if (ptrace_reparented(current->group_leader))
 			do_notify_parent_cldstop(current->group_leader,
 						true, why);
-		read_unlock(&tasklist_lock);
-
-		goto relock;
 	}
 
 	for (;;) {
@@ -2940,7 +2901,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 
 void exit_signals(struct task_struct *tsk)
 {
-	int group_stop = 0;
 	sigset_t unblocked;
 
 	/*
@@ -2971,21 +2931,15 @@ void exit_signals(struct task_struct *tsk)
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
 
-	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
-	    task_participate_group_stop(tsk))
-		group_stop = CLD_STOPPED;
-out:
-	spin_unlock_irq(&tsk->sighand->siglock);
-
 	/*
 	 * If group stop has completed, deliver the notification.  This
 	 * should always go to the real parent of the group leader.
 	 */
-	if (unlikely(group_stop)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, false, group_stop);
-		read_unlock(&tasklist_lock);
-	}
+	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
+	    task_participate_group_stop(tsk))
+		do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+out:
+	spin_unlock_irq(&tsk->sighand->siglock);
 }
 
 /*
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um



* [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Asking wait_task_inactive() to verify that tsk->__state == __TASK_TRACED
was needed to detect when ptrace_stop() would decide not to stop
after calling set_special_state(TASK_TRACED).  With the recent
cleanups ptrace_stop() will always stop after calling set_special_state().

Take advantage of this by no longer asking wait_task_inactive() to
verify the state.  If a bug is hit and wait_task_inactive() does not
succeed, warn and return -ESRCH.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 16d1a84a2cae..0634da7ac685 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
+		ret = -ESRCH;
 
 	return ret;
 }
-- 
2.35.3


* [PATCH 8/9] ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Now that the siglock protects tsk->parent and tsk->ptrace, there is no
need to grab tasklist_lock in ptrace_check_attach().  The siglock can
handle all of the locking needs of ptrace_check_attach().

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0634da7ac685..842511ee9a9f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -189,17 +189,14 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
 
-	/* Lockless, nobody but us can set this flag */
 	if (task->jobctl & JOBCTL_LISTENING)
 		return ret;
 
-	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
 		WRITE_ONCE(task->__state, __TASK_TRACED);
 		ret = true;
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 
 	return ret;
 }
@@ -237,33 +234,35 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
  * state.
  *
  * CONTEXT:
- * Grabs and releases tasklist_lock and @child->sighand->siglock.
+ * Grabs and releases @child->sighand->siglock.
  *
  * RETURNS:
  * 0 on success, -ESRCH if %child is not ready.
  */
 static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 {
+	unsigned long flags;
 	int ret = -ESRCH;
 
 	/*
-	 * We take the read lock around doing both checks to close a
+	 * We take the siglock around doing both checks to close a
 	 * possible race where someone else was tracing our child and
 	 * detached between these two checks.  After this locked check,
 	 * we are sure that this is our traced child and that can only
 	 * be changed by us so it's not changing right after this.
 	 */
-	read_lock(&tasklist_lock);
-	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-		/*
-		 * child->sighand can't be NULL, release_task()
-		 * does ptrace_unlink() before __exit_signal().
-		 */
-		if (ignore_state || ptrace_freeze_traced(child))
-			ret = 0;
+	if (lock_task_sighand(child, &flags)) {
+		if (child->ptrace && child->parent == current) {
+			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
+			/*
+			 * child->sighand can't be NULL, release_task()
+			 * does ptrace_unlink() before __exit_signal().
+			 */
+			if (ignore_state || ptrace_freeze_traced(child))
+				ret = 0;
+		}
+		unlock_task_sighand(child, &flags);
 	}
-	read_unlock(&tasklist_lock);
 
 	if (!ret && !ignore_state &&
 	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
-- 
2.35.3


* [PATCH 9/9] ptrace: Don't change __state
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead implement a new jobctl flag JOBCTL_DELAY_WAKEKILL.  This new
flag is set in ptrace_freeze_traced() and cleared when ptrace_stop()
is awoken or in ptrace_unfreeze_traced() (when ptrace_stop() remains
asleep).

In signal_wake_up() drop TASK_WAKEKILL from the wakeup state while
JOBCTL_DELAY_WAKEKILL is set.  This has the same effect as changing
TASK_TRACED to __TASK_TRACED, as all of the wake-ups that use
TASK_KILLABLE go through signal_wake_up() except the wake_up in
ptrace_unfreeze_traced().

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up, or back to TASK_TRACED when the task was
left asleep in ptrace_stop().  Now, when woken up, ptrace_stop() clears
JOBCTL_DELAY_WAKEKILL, and when left sleeping, ptrace_unfreeze_traced()
clears JOBCTL_DELAY_WAKEKILL.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  3 ++-
 kernel/ptrace.c              | 11 +++++------
 kernel/signal.c              |  1 +
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..4e154ad8205f 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_DELAY_WAKEKILL_BIT	24	/* delay killable wakeups */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_DELAY_WAKEKILL	(1UL << JOBCTL_DELAY_WAKEKILL_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..1947c85aa9d9 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
+	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 842511ee9a9f..0bea74539320 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -194,7 +194,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_DELAY_WAKEKILL;
 		ret = true;
 	}
 
@@ -203,7 +203,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
+	if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
 		return;
 
 	WARN_ON(!task->ptrace || task->parent != current);
@@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 * Recheck state under the lock to close this race.
 	 */
 	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (task->jobctl & JOBCTL_DELAY_WAKEKILL) {
+		task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
 	}
 	spin_unlock_irq(&task->sighand->siglock);
 }
@@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	if (lock_task_sighand(child, &flags)) {
 		if (child->ptrace && child->parent == current) {
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
+			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
 			/*
 			 * child->sighand can't be NULL, release_task()
 			 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/signal.c b/kernel/signal.c
index 584d67deb3cb..2b332f89cbad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3


* Re: [PATCH 4/9] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-26 23:33           ` Max Filippov
  -1 siblings, 0 replies; 572+ messages in thread
From: Max Filippov @ 2022-04-26 23:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: LKML, rjw, Oleg Nesterov, Ingo Molnar, Vincent Guittot,
	dietmar.eggemann, Steven Rostedt, mgorman,
	Sebastian Andrzej Siewior, Will Deacon, Tejun Heo, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, inux-xtensa, Kees Cook, Jann Horn

On Tue, Apr 26, 2022 at 3:52 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> xtensa is the last user of the PT_SINGLESTEP flag.  Changing tsk->ptrace in
> user_enable_single_step and user_disable_single_step without locking could
> potentially cause problems.
>
> So use a thread info flag instead of a flag in tsk->ptrace.  Use TIF_SINGLESTEP
> that xtensa already had defined but unused.
>
> Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/xtensa/kernel/ptrace.c | 4 ++--
>  arch/xtensa/kernel/signal.c | 4 ++--
>  include/linux/ptrace.h      | 6 ------
>  3 files changed, 4 insertions(+), 10 deletions(-)

Acked-by: Max Filippov <jcmvbkbc@gmail.com>

-- 
Thanks.
-- Max


* Re: [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-21 15:02 ` [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Peter Zijlstra
@ 2022-04-26 23:34   ` Eric W. Biederman
  2022-04-28 10:00     ` Peter Zijlstra
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 23:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> Currently ptrace_stop() / do_signal_stop() rely on the special states
> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> state exists only in task->__state and nowhere else.
>
> There's two spots of bother with this:
>
>  - PREEMPT_RT has task->saved_state which complicates matters,
>    meaning task_is_{traced,stopped}() needs to check an additional
>    variable.
>
>  - An alternative freezer implementation that itself relies on a
>    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
>    result in misbehaviour.
>
> As such, add additional state to task->jobctl to track this state
> outside of task->__state.
>
> NOTE: this doesn't actually fix anything yet, just adds extra state.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -770,7 +773,9 @@ void signal_wake_up_state(struct task_st
>  	 * By using wake_up_state, we ensure the process will wake up and
>  	 * handle its death signal.
>  	 */
> -	if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
> +	if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
> +		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
> +	else
>  		kick_process(t);
>  }

This hunk is subtle and I don't think it is actually what we want if the
code is going to be robust against tsk->__state becoming TASK_FROZEN.

I think we want the clearing of JOBCTL_STOPPED and JOBCTL_TRACED
to be independent of what tsk->__state and tsk->saved_state are.

Something like:

static inline void signal_wake_up(struct task_struct *t, bool resume)
{
	unsigned int state = 0;
	if (resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL)) {
		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
		state = TASK_WAKEKILL;
	}
	signal_wake_up_state(t, state);
}

static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
{
	unsigned int state = 0;
	if (resume) {
		t->jobctl &= ~JOBCTL_TRACED;
		state = __TASK_TRACED;
	}
	signal_wake_up_state(t, state);
}

That would allow __set_task_special in the final patch to look like:

/*
 * The special task states (TASK_STOPPED, TASK_TRACED) keep their canonical
 * state in p->jobctl. If either of them got a wakeup that was missed because
 * TASK_FROZEN, then their canonical state reflects that and the below will
 * refuse to restore the special state and instead issue the wakeup.
 */
static int __set_task_special(struct task_struct *p, void *arg)
{
        unsigned int state = 0;

	if (p->jobctl & JOBCTL_TRACED)
        	state = TASK_TRACED;

	else if (p->jobctl & JOBCTL_STOPPED)
		state = TASK_STOPPED;

	if (state)
		WRITE_ONCE(p->__state, state);

	return state;
}


With no need to figure out if a wake_up was dropped and reverse engineer
what the wakeup was.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-25 17:47   ` Oleg Nesterov
@ 2022-04-27  0:24     ` Eric W. Biederman
  2022-04-28 20:29       ` Peter Zijlstra
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27  0:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/21, Peter Zijlstra wrote:
>>
>> @@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
>>  	 * schedule() will not sleep if there is a pending signal that
>>  	 * can awaken the task.
>>  	 */
>> -	current->jobctl |= JOBCTL_TRACED;
>> +	current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
>>  	set_special_state(TASK_TRACED);
>
> OK, this looks wrong. I actually mean the previous patch which sets
> JOBCTL_TRACED.
>
> The problem is that the tracee can be already killed, so that
> fatal_signal_pending(current) is true. In this case we can't rely on
> signal_wake_up_state() which should clear JOBCTL_TRACED, or the
> callers of ptrace_signal_wake_up/etc which clear this flag by hand.
>
> In this case schedule() won't block and ptrace_stop() will leak
> JOBCTL_TRACED. Unless I missed something.
>
> We could check fatal_signal_pending() and damn! this is what I think
> ptrace_stop() should have done from the very beginning. But for now
> I'd suggest to simply clear this flag before return, along with
> DELAY_WAKEKILL and LISTENING.

Oh.  That is an interesting case for JOBCTL_TRACED.  The
scheduler refuses to stop if signal_pending_state(TASK_TRACED, p)
returns true.

The ptrace_stop code used to handle this explicitly, and in commit
7d613f9f72ec ("signal: Remove the bogus sigkill_pending in ptrace_stop")
I actually removed the test, as it was somewhat wrong, redundant, and in
slightly the wrong location.

But doing:

	/* Don't stop if the task is dying */
	if (unlikely(__fatal_signal_pending(current)))
		return exit_code;

Should work.

>
>>  	current->jobctl &= ~JOBCTL_LISTENING;
>> +	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
>
> 	current->jobctl &=
> 		~(~JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);


I presume you meant:

	current->jobctl &=
 		~(JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);

I don't think we want to do that.  For the case you are worried about it
is a valid fix.

In general this is the wrong approach as we want the waker to clear
JOBCTL_TRACED.  If the waker does not it is possible that
ptrace_freeze_traced might attempt to freeze a process whose state
is not appropriate for attach, because the code is past the call
to schedule().

In fact I think clearing JOBCTL_TRACED at the end of ptrace_stop
will allow ptrace_freeze_traced to come in while siglock is dropped,
expect the process to stop, and have the process not stop.  Of
course, with wait_task_inactive coming first, that might not be a problem.



This is a minor problem with the patchset I just posted.  I thought the
only reason wait_task_inactive could fail was if ptrace_stop() hit the
!current->ptrace case.  Thinking about it, any SIGKILL coming in
before the tracee stops in schedule() will trigger this, so it is not as
safe as I thought to not pass a state into wait_task_inactive.

It is time for me to shut down today.  I will sleep on that and
see what I can see tomorrow.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 5/9] signal: Protect parent child relationships by childs siglock
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27  6:40           ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-04-27  6:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 2022-04-26 17:52:07 [-0500], Eric W. Biederman wrote:
> The functions ptrace_stop and do_signal_stop have to drop siglock
> and grab tasklist_lock because the parent/child relation ship
> is guarded by siglock and not siglock.

 "is guarded by tasklist_lock and not siglock." ?

> Simplify things by guarding the parent/child relationship
> with siglock.  For the most part this just requires a little bit
> of code motion.  In a couple of places more locking was needed.
> 
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 3/9] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27  7:10           ` Johannes Berg
  -1 siblings, 0 replies; 572+ messages in thread
From: Johannes Berg @ 2022-04-27  7:10 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, linux-um,
	Chris Zankel, Max Filippov, inux-xtensa, Kees Cook, Jann Horn

On Tue, 2022-04-26 at 17:52 -0500, Eric W. Biederman wrote:
> User mode linux is the last user of the PT_DTRACE flag.  Using the flag to indicate
> single stepping is a little confusing and worse, changing tsk->ptrace
> without locking could potentially cause problems.
> 
> So use a thread info flag with a better name instead of a flag in tsk->ptrace.
> 
> Remove the definition PT_DTRACE as uml is the last user.


Looks fine to me.

Acked-by: Johannes Berg <johannes@sipsolutions.net>

Looking at pending patches, I don't see any conflicts from this. I'm
guessing anyway you'll want/need to take these through some tree all
together.

johannes



^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 5/9] signal: Protect parent child relationships by childs siglock
  2022-04-27  6:40           ` Sebastian Andrzej Siewior
@ 2022-04-27 13:35             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 13:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-04-26 17:52:07 [-0500], Eric W. Biederman wrote:
>> The functions ptrace_stop and do_signal_stop have to drop siglock
>> and grab tasklist_lock because the parent/child relation ship
>> is guarded by siglock and not siglock.
>
>  "is guarded by tasklist_lock and not siglock." ?

Yes.   Thank you.  I will fix that.

>> Simplify things by guarding the parent/child relationship
>> with siglock.  For the most part this just requires a little bit
>> of code motion.  In a couple of places more locking was needed.
>> 
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Sebastian

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 13:42           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> was needed to detect when ptrace_stop would decide not to stop
> after calling "set_special_state(TASK_TRACED)".  With the recent
> cleanups ptrace_stop will always stop after calling set_special_state.
>
> Take advantage of this by no longer asking wait_task_inactive to
> verify the state.  If a bug is hit and wait_task_inactive does not
> succeed warn and return -ESRCH.

As Oleg noticed upthread there are more reasons than simply
!current->ptrace for wait_task_inactive to fail.  In particular a fatal
signal can be received any time before JOBCTL_DELAY_WAKEKILL is set.

So this change is not safe.  I will respin this one.

Eric


> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/ptrace.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 16d1a84a2cae..0634da7ac685 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	}
>  	read_unlock(&tasklist_lock);
>  
> -	if (!ret && !ignore_state) {
> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
> -			/*
> -			 * This can only happen if may_ptrace_stop() fails and
> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
> -			 * so we should not worry about leaking __TASK_TRACED.
> -			 */
> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> -			ret = -ESRCH;
> -		}
> -	}
> +	if (!ret && !ignore_state &&
> +	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
> +		ret = -ESRCH;
>  
>  	return ret;
>  }

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 3/9] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-04-27  7:10           ` Johannes Berg
@ 2022-04-27 13:50             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 13:50 UTC (permalink / raw)
  To: Johannes Berg
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Johannes Berg <johannes@sipsolutions.net> writes:

> On Tue, 2022-04-26 at 17:52 -0500, Eric W. Biederman wrote:
>> User mode linux is the last user of the PT_DTRACE flag.  Using the flag to indicate
>> single stepping is a little confusing and worse, changing tsk->ptrace
>> without locking could potentially cause problems.
>> 
>> So use a thread info flag with a better name instead of a flag in tsk->ptrace.
>> 
>> Remove the definition PT_DTRACE as uml is the last user.
>
>
> Looks fine to me.
>
> Acked-by: Johannes Berg <johannes@sipsolutions.net>

Thanks.

> Looking at pending patches, I don't see any conflicts from this. I'm
> guessing anyway you'll want/need to take these through some tree all
> together.

Taking them all through a single tree looks like it will be easiest.
So I am planning on taking them through my signal tree.

Now that I think of it, the lack of locking also means I want to
Cc stable.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 14:10           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 14:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>   	}
>
>  	sighand = parent->sighand;
> -	spin_lock_irqsave(&sighand->siglock, flags);
> +	lock = tsk->sighand != sighand;
> +	if (lock)
> +		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);

But why is it safe?

Suppose we have two tasks, they both trace each other, both call
ptrace_stop() at the same time. Of course this is ugly, they both
will block.

But with this patch in this case we have the trivial ABBA deadlock,
no?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:10           ` Oleg Nesterov
@ 2022-04-27 14:20             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 14:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>>   	}
>>
>>  	sighand = parent->sighand;
>> -	spin_lock_irqsave(&sighand->siglock, flags);
>> +	lock = tsk->sighand != sighand;
>> +	if (lock)
>> +		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
>
> But why is it safe?
>
> Suppose we have two tasks, they both trace each other, both call
> ptrace_stop() at the same time. Of course this is ugly, they both
> will block.
>
> But with this patch in this case we have the trivial ABBA deadlock,
> no?

I was thinking in terms of the process tree (which is fine).

The ptrace parental relationship definitely has the potential to be a
graph with cycles.  Which as you point out is not fine.


The result is very nice and I don't want to give it up.  I suspect
ptrace cycles are always a problem and can simply be
forbidden.  That is going to take some analysis and some additional
code in ptrace_attach.

I will go look at that.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-27 13:42           ` Eric W. Biederman
@ 2022-04-27 14:27             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 14:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
>> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
>> was needed to detect when ptrace_stop would decide not to stop
>> after calling "set_special_state(TASK_TRACED)".  With the recent
>> cleanups ptrace_stop will always stop after calling set_special_state.
>>
>> Take advantage of this by no longer asking wait_task_inactive to
>> verify the state.  If a bug is hit and wait_task_inactive does not
>> succeed warn and return -ESRCH.
>
> As Oleg noticed upthread there are more reasons than simply
> !current->ptrace for wait_task_inactive to fail.  In particular a fatal
> signal can be received any time before JOBCTL_DELAY_WAKEKILL is set.
>
> So this change is not safe.  I will respin this one.

Bah.  I definitely need to update the description so there is going to
be a v2.

I confused myself.  This change is safe because ptrace_freeze_traced
fails if there is a pending fatal signal, and arranges that no new fatal
signals will wake up the task.

Eric

>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  kernel/ptrace.c | 14 +++-----------
>>  1 file changed, 3 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
>> index 16d1a84a2cae..0634da7ac685 100644
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>>  	}
>>  	read_unlock(&tasklist_lock);
>>  
>> -	if (!ret && !ignore_state) {
>> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
>> -			/*
>> -			 * This can only happen if may_ptrace_stop() fails and
>> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
>> -			 * so we should not worry about leaking __TASK_TRACED.
>> -			 */
>> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> -			ret = -ESRCH;
>> -		}
>> -	}
>> +	if (!ret && !ignore_state &&
>> +	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
>> +		ret = -ESRCH;
>>  
>>  	return ret;
>>  }
>
> Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:20             ` Eric W. Biederman
@ 2022-04-27 14:43               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 14:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Eric W. Biederman wrote:
>
> The ptrace parental relationship definitely has the potential to be a
> graph with cycles.  Which as you point out is not fine.
>
> The result is very nice and I don't want to give it up.  I suspect
> such ptrace cycles are always a problem and can simply be
> forbidden.

OK, please consider another case.

We have a parent P and its child C. C traces P.

This is not that unusual; I don't think we can forbid this case.

P reports an event and calls do_notify_parent_cldstop().

C receives SIGSTOP and calls do_notify_parent_cldstop() too.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:20             ` Eric W. Biederman
@ 2022-04-27 14:47               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 14:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Oleg Nesterov <oleg@redhat.com> writes:
>
>> On 04/26, Eric W. Biederman wrote:
>>>
>>> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>>>   	}
>>>
>>>  	sighand = parent->sighand;
>>> -	spin_lock_irqsave(&sighand->siglock, flags);
>>> +	lock = tsk->sighand != sighand;
>>> +	if (lock)
>>> +		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
>>
>> But why is it safe?
>>
>> Suppose we have two tasks, they both trace each other, both call
>> ptrace_stop() at the same time. Of course this is ugly, they both
>> will block.
>>
>> But with this patch in this case we have the trivial ABBA deadlock,
>> no?
>
> I was thinking in terms of the process tree (which is fine).
>
> The ptrace parental relationship definitely has the potential to be a
> graph with cycles.  Which as you point out is not fine.
>
>
> The result is very nice and I don't want to give it up.  I suspect
> such ptrace cycles are always a problem and can simply be
> forbidden.  That is going to take some analysis and some additional
> code in ptrace_attach.
>
> I will go look at that.


Hmm.  If we have the following process tree.

    A
     \
      B
       \
        C

Processes A, B, and C are all in the same process group.
Processes A and B are set up to receive SIGCHLD when
their process stops.

Process C traces process A.

When a sigstop is delivered to the group we can have:

Process B takes siglock(B) siglock(A) to notify the real_parent
Process C takes siglock(C) siglock(B) to notify the real_parent
Process A takes siglock(A) siglock(C) to notify the tracer

If they all take their local lock at the same time there is
a deadlock.

I don't think the restriction that you can never ptrace anyone
up the process tree is going to fly.  So it looks like I am back to the
drawing board for this one.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 14:56           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 14:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>  		spin_lock_irq(&current->sighand->siglock);
>  	}
>  
> +	/* Don't stop if current is not ptraced */
> +	if (unlikely(!current->ptrace))
> +		return (clear_code) ? 0 : exit_code;
> +
> +	/*
> +	 * If @why is CLD_STOPPED, we're trapping to participate in a group
> +	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
> +	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delivered
> +	 * could be clear now.  We act as if SIGCONT is received after
> +	 * TASK_TRACED is entered - ignore it.
> +	 */
> +	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
> +		gstop_done = task_participate_group_stop(current);
> +
> +	/*
> +	 * Notify parents of the stop.
> +	 *
> +	 * While ptraced, there are two parents - the ptracer and
> +	 * the real_parent of the group_leader.  The ptracer should
> +	 * know about every stop while the real parent is only
> +	 * interested in the completion of group stop.  The states
> +	 * for the two don't interact with each other.  Notify
> +	 * separately unless they're gonna be duplicates.
> +	 */
> +	do_notify_parent_cldstop(current, true, why);
> +	if (gstop_done && ptrace_reparented(current))
> +		do_notify_parent_cldstop(current, false, why);

This doesn't look right either. The parent should be notified only after
we set __state = TASK_TRACED and ->exit_code.

Suppose that debugger sleeps in do_wait(). do_notify_parent_cldstop()
wakes it up, debugger calls wait_task_stopped() and then it will sleep
again, task_stopped_code() returns 0.

This can probably be fixed if you remove the lockless (fast path)
task_stopped_code() check in wait_task_stopped(), but this is not
nice performance-wise...

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:56           ` Oleg Nesterov
@ 2022-04-27 15:00             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Oleg Nesterov wrote:
>
> On 04/26, Eric W. Biederman wrote:
> >
> > @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
> >  		spin_lock_irq(&current->sighand->siglock);
> >  	}
> >
> > +	/* Don't stop if current is not ptraced */
> > +	if (unlikely(!current->ptrace))
> > +		return (clear_code) ? 0 : exit_code;
> > +
> > +	/*
> > +	 * If @why is CLD_STOPPED, we're trapping to participate in a group
> > +	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delivered
> > +	 * across siglock relocks since INTERRUPT was scheduled, PENDING
> > +	 * could be clear now.  We act as if SIGCONT is received after
> > +	 * TASK_TRACED is entered - ignore it.
> > +	 */
> > +	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
> > +		gstop_done = task_participate_group_stop(current);
> > +
> > +	/*
> > +	 * Notify parents of the stop.
> > +	 *
> > +	 * While ptraced, there are two parents - the ptracer and
> > +	 * the real_parent of the group_leader.  The ptracer should
> > +	 * know about every stop while the real parent is only
> > +	 * interested in the completion of group stop.  The states
> > +	 * for the two don't interact with each other.  Notify
> > +	 * separately unless they're gonna be duplicates.
> > +	 */
> > +	do_notify_parent_cldstop(current, true, why);
> > +	if (gstop_done && ptrace_reparented(current))
> > +		do_notify_parent_cldstop(current, false, why);
>
> This doesn't look right either. The parent should be notified only after
> we set __state = TASK_TRACED and ->exit_code.
>
> Suppose that debugger sleeps in do_wait(). do_notify_parent_cldstop()
> wakes it up, debugger calls wait_task_stopped() and then it will sleep
> again, task_stopped_code() returns 0.
>
> This can probably be fixed if you remove the lockless (fast path)
> task_stopped_code() check in wait_task_stopped(), but this is not
> nice performance-wise...

On the other hand, I don't understand why you moved the callsite
of do_notify_parent_cldstop() up... just don't do this?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 15:14           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> was needed to detect when ptrace_stop would decide not to stop
> after calling "set_special_state(TASK_TRACED)".  With the recent
> cleanups ptrace_stop will always stop after calling set_special_state.
>
> Take advantage of this by no longer asking wait_task_inactive to
> verify the state.  If a bug is hit and wait_task_inactive does not
> succeed warn and return -ESRCH.

ACK, but I think that the changelog is wrong.

We could do this right after may_ptrace_stop() has gone. This doesn't
depend on the previous changes in this series.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 8/9] ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 15:20           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> +	if (lock_task_sighand(child, &flags)) {
> +		if (child->ptrace && child->parent == current) {
> +			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> +			/*
> +			 * child->sighand can't be NULL, release_task()
> +			 * does ptrace_unlink() before __exit_signal().
> +			 */
> +			if (ignore_state || ptrace_freeze_traced(child))
> +				ret = 0;

The comment above is no longer relevant, it should be removed.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 15:41           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
>  		return;
>
>  	WARN_ON(!task->ptrace || task->parent != current);
> @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
>  	 * Recheck state under the lock to close this race.
>  	 */
>  	spin_lock_irq(&task->sighand->siglock);

Now that we do not check __state == __TASK_TRACED, we need lock_task_sighand().
The tracee can already have been woken up by ptrace_resume(), but it is
possible that it has not cleared DELAY_WAKEKILL yet.

Now, before we take ->siglock, the tracee can exit and another thread can do
wait() and reap this task.

Also, I think the comment above should be updated. I agree, it makes sense to
re-check JOBCTL_DELAY_WAKEKILL under siglock just for clarity, but we no longer
need to do this to close the race; jobctl &= ~JOBCTL_DELAY_WAKEKILL and
wake_up_state() are safe even if JOBCTL_DELAY_WAKEKILL was already cleared.

> @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>
>  	/* LISTENING can be set only during STOP traps, clear it */
>  	current->jobctl &= ~JOBCTL_LISTENING;
> +	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;

minor, but

	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL);

looks better.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
                     ` (3 preceding siblings ...)
  2022-04-25 17:47   ` Oleg Nesterov
@ 2022-04-27 15:53   ` Oleg Nesterov
  2022-04-27 21:57     ` Eric W. Biederman
  4 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On 04/21, Peter Zijlstra wrote:
>
> @@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
>  		goto out_put_task_struct;
>  
>  	ret = arch_ptrace(child, request, addr, data);
> -	if (ret || request != PTRACE_DETACH)
> -		ptrace_unfreeze_traced(child);
> +	ptrace_unfreeze_traced(child);

Forgot to mention... whatever we do this doesn't look right.

ptrace_unfreeze_traced() must not be called if the tracee was untraced;
another debugger can come after that. I agree, the current code looks
a bit confusing; perhaps it makes sense to rewrite it:

	if (request == PTRACE_DETACH && ret == 0)
		; /* nothing to do, no longer traced by us */
	else
		ptrace_unfreeze_traced(child);

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-26 22:52         ` Eric W. Biederman
  (?)
  (?)
@ 2022-04-27 16:09         ` Oleg Nesterov
  2022-04-27 16:33           ` Eric W. Biederman
  -1 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 16:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	 */
>  	if (lock_task_sighand(child, &flags)) {
>  		if (child->ptrace && child->parent == current) {
> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);

This WARN_ON() doesn't look right.

It is possible that this child was traced by another task and PTRACE_DETACH'ed,
but it didn't clear DELAY_WAKEKILL.

If the new debugger attaches and calls ptrace() before the child takes siglock,
ptrace_freeze_traced() will fail, but we can hit this WARN_ON().

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 16:09         ` Oleg Nesterov
@ 2022-04-27 16:33           ` Eric W. Biederman
  2022-04-27 17:18               ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 16:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>>  	 */
>>  	if (lock_task_sighand(child, &flags)) {
>>  		if (child->ptrace && child->parent == current) {
>> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
>
> This WARN_ON() doesn't look right.
>
> It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> but it didn't clear DELAY_WAKEKILL.

That would be a bug.  That would mean that PTRACE_DETACHED process can
not be SIGKILL'd.

> If the new debugger attaches and calls ptrace() before the child takes siglock
> ptrace_freeze_traced() will fail, but we can hit this WARN_ON().

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 16:33           ` Eric W. Biederman
@ 2022-04-27 17:18               ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 17:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 04/26, Eric W. Biederman wrote:
> >>
> >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> >>  	 */
> >>  	if (lock_task_sighand(child, &flags)) {
> >>  		if (child->ptrace && child->parent == current) {
> >> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> >> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
> >
> > This WARN_ON() doesn't look right.
> >
> > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> > but it didn't clear DELAY_WAKEKILL.
>
> That would be a bug.  That would mean that PTRACE_DETACHED process can
> not be SIGKILL'd.

Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
SIGKILL after that.

Oleg.

> > If the new debugger attaches and calls ptrace() before the child takes siglock
> > ptrace_freeze_traced() will fail, but we can hit this WARN_ON().
> 
> Eric
> 


^ permalink raw reply	[flat|nested] 572+ messages in thread


* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 17:18               ` Oleg Nesterov
@ 2022-04-27 17:21                 ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 17:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Oleg Nesterov wrote:
>
> On 04/27, Eric W. Biederman wrote:
> >
> > Oleg Nesterov <oleg@redhat.com> writes:
> >
> > > On 04/26, Eric W. Biederman wrote:
> > >>
> > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> > >>  	 */
> > >>  	if (lock_task_sighand(child, &flags)) {
> > >>  		if (child->ptrace && child->parent == current) {
> > >> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> > >> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
> > >
> > > This WARN_ON() doesn't look right.
> > >
> > > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> > > but it didn't clear DELAY_WAKEKILL.
> >
> > That would be a bug.  That would mean that PTRACE_DETACHED process can
> > not be SIGKILL'd.
>
> Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
> SIGKILL after that.

Not to mention that the tracee is TASK_RUNNING after PTRACE_DETACH wakes it
up, so the pending JOBCTL_DELAY_WAKEKILL simply has no effect.

Oleg.

> > > If the new debugger attaches and calls ptrace() before the child takes siglock
> > > ptrace_freeze_traced() will fail, but we can hit this WARN_ON().
> > 
> > Eric
> > 




* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 17:21                 ` Oleg Nesterov
@ 2022-04-27 17:31                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 17:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/27, Oleg Nesterov wrote:
>>
>> On 04/27, Eric W. Biederman wrote:
>> >
>> > Oleg Nesterov <oleg@redhat.com> writes:
>> >
>> > > On 04/26, Eric W. Biederman wrote:
>> > >>
>> > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>> > >>  	 */
>> > >>  	if (lock_task_sighand(child, &flags)) {
>> > >>  		if (child->ptrace && child->parent == current) {
>> > >> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> > >> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
>> > >
>> > > This WARN_ON() doesn't look right.
>> > >
>> > > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
>> > > but it didn't clear DELAY_WAKEKILL.
>> >
>> > That would be a bug.  That would mean that PTRACE_DETACHED process can
>> > not be SIGKILL'd.
>>
>> Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
>> SIGKILL after that.
>
> Not to mention that the tracee is TASK_RUNNING after PTRACE_DETACH wakes it
> up, so the pending JOBCTL_DELAY_WAKEKILL simply has no effect.

Oh.  You are talking about the window between clearing the traced
state and when the tracee resumes executing and clears
JOBCTL_DELAY_WAKEKILL.

I thought you were thinking about JOBCTL_DELAY_WAKEKILL being leaked.

That requires both ptrace_attach and ptrace_check_attach for the new
tracer to happen before the tracee is scheduled to run.

I agree.  I think the WARN_ON could reasonably be moved a bit later,
but I don't know that the WARN_ON is important. I simply kept it because
it seemed to make sense.

Eric



* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 15:00             ` Oleg Nesterov
@ 2022-04-27 21:52               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 21:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/27, Oleg Nesterov wrote:
>>
>> On 04/26, Eric W. Biederman wrote:
>> >
>> > @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>> >  		spin_lock_irq(&current->sighand->siglock);
>> >  	}
>> >
>> > +	/* Don't stop if current is not ptraced */
>> > +	if (unlikely(!current->ptrace))
>> > +		return (clear_code) ? 0 : exit_code;
>> > +
>> > +	/*
>> > +	 * If @why is CLD_STOPPED, we're trapping to participate in a group
>> > +	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delivered
>> > +	 * across siglock relocks since INTERRUPT was scheduled, PENDING
>> > +	 * could be clear now.  We act as if SIGCONT is received after
>> > +	 * TASK_TRACED is entered - ignore it.
>> > +	 */
>> > +	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
>> > +		gstop_done = task_participate_group_stop(current);
>> > +
>> > +	/*
>> > +	 * Notify parents of the stop.
>> > +	 *
>> > +	 * While ptraced, there are two parents - the ptracer and
>> > +	 * the real_parent of the group_leader.  The ptracer should
>> > +	 * know about every stop while the real parent is only
>> > +	 * interested in the completion of group stop.  The states
>> > +	 * for the two don't interact with each other.  Notify
>> > +	 * separately unless they're gonna be duplicates.
>> > +	 */
>> > +	do_notify_parent_cldstop(current, true, why);
>> > +	if (gstop_done && ptrace_reparented(current))
>> > +		do_notify_parent_cldstop(current, false, why);
>>
>> This doesn't look right too. The parent should be notified only after
>> we set __state = TASK_TRACED and ->exit code.
>>
>> Suppose that debugger sleeps in do_wait(). do_notify_parent_cldstop()
>> wakes it up, debugger calls wait_task_stopped() and then it will sleep
>> again, task_stopped_code() returns 0.
>>
>> This can be probably fixed if you remove the lockless (fast path)
>> task_stopped_code() check in wait_task_stopped(), but this is not
>> nice performance-wise...

Another detail I have overlooked.  Thank you.

Or we can change task_stopped_code to look something like:

static int *task_stopped_code(struct task_struct *p, bool ptrace)
{
	if (ptrace) {
-		if (task_is_traced(p) && !(p->jobctl & JOBCTL_LISTENING))
+		if (p->ptrace && !(p->jobctl & JOBCTL_LISTENING))
			return &p->exit_code;
	} else {
		if (p->signal->flags & SIGNAL_STOP_STOPPED)
			return &p->signal->group_exit_code;
	}
	return NULL;
}

I probably need to do a little bit more to ensure that it isn't an
actual process exit_code in p->exit_code.  But then we don't have to
limit ourselves to being precisely in the task_is_traced stopped place
for the fast path.


> On the other hand, I don't understand why did you move the callsite
> of do_notify_parent_cldstop() up... just don't do this?

My goal, and I still think it makes sense (if not my implementation),
is to move set_special_state as close as possible to schedule().

That way we can avoid sleeping spin_locks clobbering it and making
our life difficult.

My hope is we can just clean up ptrace_stop instead of making it more
complicated and harder to follow.  Not that I am fundamentally opposed
to the quiesce bit but the code is already very hard to follow because
of all its nuance and complexity, and I would really like to reduce
that complexity if we can possibly figure out how.

Eric





* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-27 15:53   ` Oleg Nesterov
@ 2022-04-27 21:57     ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 21:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/21, Peter Zijlstra wrote:
>>
>> @@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
>>  		goto out_put_task_struct;
>>  
>>  	ret = arch_ptrace(child, request, addr, data);
>> -	if (ret || request != PTRACE_DETACH)
>> -		ptrace_unfreeze_traced(child);
>> +	ptrace_unfreeze_traced(child);
>
> Forgot to mention... whatever we do this doesn't look right.
>
> ptrace_unfreeze_traced() must not be called if the tracee was untraced,
> anothet debugger can come after that. I agree, the current code looks
> a bit confusing, perhaps it makes sense to re-write it:
>
> 	if (request == PTRACE_DETACH && ret == 0)
> 		; /* nothing to do, no longer traced by us */
> 	else
> 		ptrace_unfreeze_traced(child);

This was a bug in my original JOBCTL_DELAY_WAITKILL patch and it was
just cut and pasted here.  I thought it made sense when I was throwing
things together but when I looked more closely I realized that it is
not safe.

Eric



* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 15:41           ` Oleg Nesterov
@ 2022-04-27 22:35             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 22:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>>  static void ptrace_unfreeze_traced(struct task_struct *task)
>>  {
>> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
>> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
>>  		return;
>>
>>  	WARN_ON(!task->ptrace || task->parent != current);
>> @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
>>  	 * Recheck state under the lock to close this race.
>>  	 */
>>  	spin_lock_irq(&task->sighand->siglock);
>
> Now that we do not check __state = __TASK_TRACED, we need lock_task_sighand().
> The tracee can be already woken up by ptrace_resume(), but it is possible that
> it didn't clear DELAY_WAKEKILL yet.

Yes.  The subtle differences in when __TASK_TRACED and
JOBCTL_DELAY_WAKEKILL are cleared are causing me some minor issues.

This "WARN_ON(!task->ptrace || task->parent != current);" also now
needs to be inside siglock, because the __TASK_TRACED check is insufficient.


> Now, before we take ->siglock, the tracee can exit and another thread can do
> wait() and reap this task.
>
> Also, I think the comment above should be updated. I agree, it makes sense to
> re-check JOBCTL_DELAY_WAKEKILL under siglock just for clarity, but we no longer
> need to do this to close the race; jobctl &= ~JOBCTL_DELAY_WAKEKILL and
> wake_up_state() are safe even if JOBCTL_DELAY_WAKEKILL was already
> cleared.

I think you are right about it being safe, but I am having a hard time
convincing myself that is true.  I want to be very careful sending
__TASK_TRACED wake_ups as ptrace_stop fundamentally can't handle
spurious wake_ups.

So I think we should add task_is_traced to the test to verify the task
is still frozen.

static void ptrace_unfreeze_traced(struct task_struct *task)
{
	unsigned long flags;

	/*
	 * Verify the task is still frozen before unfreezing it,
	 * ptrace_resume could have unfrozen us.
	 */
	if (lock_task_sighand(task, &flags)) {
		if ((task->jobctl & JOBCTL_DELAY_WAKEKILL) &&
		    task_is_traced(task)) {
			task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
			if (__fatal_signal_pending(task))
				wake_up_state(task, __TASK_TRACED);
		}
		unlock_task_sighand(task, &flags);
	}
}

>> @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>>
>>  	/* LISTENING can be set only during STOP traps, clear it */
>>  	current->jobctl &= ~JOBCTL_LISTENING;
>> +	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
>
> minor, but
>
> 	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL);
>
> looks better.

Yes.


Eric



* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 23:05           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 3c8b34876744..1947c85aa9d9 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
>  
>  static inline void signal_wake_up(struct task_struct *t, bool resume)
>  {
> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> +	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> +	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
>  }
>  static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
>  {

Grrr.  While looking through everything today I have realized that there
is a bug.

Suppose we have 3 processes: TRACER, TRACEE, KILLER.

Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
been dropped.

The TRACER process has performed ptrace_attach on TRACEE and is in the
middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.

Then comes in the KILLER process and sends the TRACEE a SIGKILL.
The TRACEE __state remains TASK_TRACED, as designed.

The bug appears when the TRACEE makes it to schedule().  Inside
schedule there is a call to signal_pending_state() which notices
a SIGKILL is pending and refuses to sleep.

I could avoid setting TIF_SIGPENDING in signal_wake_up but that
is insufficient as another signal may be pending.

I could avoid marking the task as __fatal_signal_pending but then
where would the information that the task needs to become
__fatal_signal_pending go?

Hmm.

This looks like I need my other pending cleanup which introduces a
helper to get this idea to work.

Eric




* Re: [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-26 23:34   ` Eric W. Biederman
@ 2022-04-28 10:00     ` Peter Zijlstra
  0 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Tue, Apr 26, 2022 at 06:34:09PM -0500, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > Currently ptrace_stop() / do_signal_stop() rely on the special states
> > TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> > state exists only in task->__state and nowhere else.
> >
> > There's two spots of bother with this:
> >
> >  - PREEMPT_RT has task->saved_state which complicates matters,
> >    meaning task_is_{traced,stopped}() needs to check an additional
> >    variable.
> >
> >  - An alternative freezer implementation that itself relies on a
> >    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
> >    result in misbehaviour.
> >
> > As such, add additional state to task->jobctl to track this state
> > outside of task->__state.
> >
> > NOTE: this doesn't actually fix anything yet, just adds extra state.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -770,7 +773,9 @@ void signal_wake_up_state(struct task_st
> >  	 * By using wake_up_state, we ensure the process will wake up and
> >  	 * handle its death signal.
> >  	 */
> > -	if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
> > +	if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
> > +		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
> > +	else
> >  		kick_process(t);
> >  }
> 
> This hunk is subtle and I don't think it is actually what we want if the
> code is going to be robust against tsk->__state becoming TASK_FROZEN.

Oooh, indeed. Yes, let me go back to that resume based thing as you
suggest.

But first, let me go read all your patches :-)

* Re: [PATCH 0/9] ptrace: cleaning up ptrace_stop
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-28 10:07         ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook

On Tue, Apr 26, 2022 at 05:50:21PM -0500, Eric W. Biederman wrote:
> ....  Peter Zijlstra has
> been rewriting the classic freezer and in earlier parts of this
> discussion so I presume it is also a problem for PREEMPT_RT.

Ah, the freezer thing is in fact a sched/arm64 issue, the common issue
between these two issues is ptrace though.

Specifically, on recent arm64 chips only a subset of CPUs can execute
arm32 code and 32bit processes are restricted to that subset. If by some
mishap you try and execute a 32bit task on a non-capable CPU it gets
terminated without prejudice.

Now, the current freezer has this problem that tasks can spuriously thaw
too soon (where too soon is before SMP is restored) which leads to these
32bit tasks being killed dead.

That, and it was a good excuse to fix up the current freezer :-)

* Re: [PATCH 1/9] signal: Rename send_signal send_signal_locked
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-28 10:27           ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On Tue, Apr 26, 2022 at 05:52:03PM -0500, Eric W. Biederman wrote:
> Rename send_signal to send_signal_locked to make
> it usable outside of signal.c.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  include/linux/signal.h |  2 ++
>  kernel/signal.c        | 24 ++++++++++++------------
>  2 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index a6db6f2ae113..55605bdf5ce9 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
>  extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
>  			       struct task_struct *p, enum pid_type type);
>  extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> +extern int send_signal_locked(int sig, struct kernel_siginfo *info,
> +			      struct task_struct *p, enum pid_type type);
>  extern int sigprocmask(int, sigset_t *, sigset_t *);
>  extern void set_current_blocked(sigset_t *);
>  extern void __set_current_blocked(const sigset_t *);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 30cd1ca43bcd..b0403197b0ad 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
>  	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
>  }
>  
> -static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
> -			enum pid_type type, bool force)
> +static int __send_signal_locked(int sig, struct kernel_siginfo *info,
> +				struct task_struct *t, enum pid_type type, bool force)
>  {
>  	struct sigpending *pending;
>  	struct sigqueue *q;

While there, could you please replace that assert_spin_locked() with
lockdep_assert_held(&t->sighand->siglock) ?

The distinction being that assert_spin_locked() checks if the lock is
held by *anyone*, whereas lockdep_assert_held() asserts that the current
context holds the lock.  Also, the check goes away if you build without
lockdep.

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-28 10:38           ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On Tue, Apr 26, 2022 at 05:52:08PM -0500, Eric W. Biederman wrote:
> Now that siglock keeps tsk->parent and tsk->real_parent constant,
> require that do_notify_parent_cldstop is called with tsk->siglock held
> instead of the tasklist_lock.
> 
> As all of the callers of do_notify_parent_cldstop had to drop the
> siglock and take tasklist_lock, this simplifies all of its callers.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/signal.c | 156 +++++++++++++++++-------------------------------
>  1 file changed, 55 insertions(+), 101 deletions(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 72d96614effc..584d67deb3cb 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2121,11 +2121,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>  				     bool for_ptracer, int why)
>  {
>  	struct kernel_siginfo info;
> -	unsigned long flags;
>  	struct task_struct *parent;
>  	struct sighand_struct *sighand;
> +	bool lock;
>  	u64 utime, stime;
>  
> +	assert_spin_locked(&tsk->sighand->siglock);

lockdep_assert_held() please...

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-27 15:14           ` Oleg Nesterov
@ 2022-04-28 10:42             ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On Wed, Apr 27, 2022 at 05:14:57PM +0200, Oleg Nesterov wrote:
> On 04/26, Eric W. Biederman wrote:
> >
> > Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> > was needed to detect when ptrace_stop would decide not to stop
> > after calling "set_special_state(TASK_TRACED)".  With the recent
> > cleanups ptrace_stop will always stop after calling set_special_state.
> >
> > Take advantage of this by no longer asking wait_task_inactive to
> > verify the state.  If a bug is hit and wait_task_inactive does not
> > succeed warn and return -ESRCH.
> 
> ACK, but I think that the changelog is wrong.
> 
> We could do this right after may_ptrace_stop() has gone. This doesn't
> depend on the previous changes in this series.

It very much does rely on there not being any blocking between
set_special_state() and schedule() tho. So all those PREEMPT_RT
spinlock->rt_mutex things need to be gone.

That is also the reason I couldn't do wait_task_inactive(task, 0) in the
other patch, I had to really match 'TASK_TRACED or TASK_FROZEN'; any
other state must fail (specifically TASK_RTLOCK_WAIT must not match).

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 10:42             ` Peter Zijlstra
@ 2022-04-28 11:19               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 11:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Peter Zijlstra wrote:
>
> On Wed, Apr 27, 2022 at 05:14:57PM +0200, Oleg Nesterov wrote:
> > On 04/26, Eric W. Biederman wrote:
> > >
> > > Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> > > was needed to detect when ptrace_stop would decide not to stop
> > > after calling "set_special_state(TASK_TRACED)".  With the recent
> > > cleanups ptrace_stop will always stop after calling set_special_state.
> > >
> > > Take advantage of this by no longer asking wait_task_inactive to
> > > verify the state.  If a bug is hit and wait_task_inactive does not
> > > succeed warn and return -ESRCH.
> >
> > ACK, but I think that the changelog is wrong.
> >
> > We could do this right after may_ptrace_stop() has gone. This doesn't
> > depend on the previous changes in this series.
>
> It very much does rely on there not being any blocking between
> set_special_state() and schedule() tho. So all those PREEMPT_RT
> spinlock->rt_mutex things need to be gone.

Yes sure. But this patch doesn't add the new problems, imo.

Yes we can hit the WARN_ON_ONCE(!wait_task_inactive()), but this is
correct in that it should not fail, and this is what we need to fix.

> That is also the reason I couldn't do wait_task_inactive(task, 0)

Ah, I didn't notice this patch uses wait_task_inactive(child, 0),
I think it should do wait_task_inactive(child, __TASK_TRACED).

Oleg.


* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 11:19               ` Oleg Nesterov
@ 2022-04-28 13:54                 ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 13:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On Thu, Apr 28, 2022 at 01:19:11PM +0200, Oleg Nesterov wrote:
> > That is also the reason I couldn't do wait_task_inactive(task, 0)
> 
> Ah, I din't notice this patch uses wait_task_inactive(child, 0),
> I think it should do wait_task_inactive(child, __TASK_TRACED).

Shouldn't we then switch wait_task_inactive() to have & matching instead
of the current ==?

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 13:54                 ` Peter Zijlstra
@ 2022-04-28 14:57                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 01:19:11PM +0200, Oleg Nesterov wrote:
> > > That is also the reason I couldn't do wait_task_inactive(task, 0)
> >
> > Ah, I didn't notice this patch uses wait_task_inactive(child, 0),
> > I think it should do wait_task_inactive(child, __TASK_TRACED).
>
> Shouldn't we then switch wait_task_inactive() to have & matching instead
> of the current ==.

Sorry, I don't understand the context...

As long as ptrace_freeze_traced() sets __state == __TASK_TRACED (as it
currently does) wait_task_inactive(__TASK_TRACED) is what we need?

After we change it to use JOBCTL_DELAY_WAKEKILL and not abuse __state,
ptrace_attach() should use wait_task_inactive(TASK_TRACED), but this
depends on what exactly we are going to do...

Oleg.


* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 23:05           ` Eric W. Biederman
@ 2022-04-28 15:11             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 15:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Eric W. Biederman wrote:
>
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> > index 3c8b34876744..1947c85aa9d9 100644
> > --- a/include/linux/sched/signal.h
> > +++ b/include/linux/sched/signal.h
> > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
> >
> >  static inline void signal_wake_up(struct task_struct *t, bool resume)
> >  {
> > -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> > +	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> > +	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
> >  }
> >  static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
> >  {
>
> Grrr.  While looking through everything today I have realized that there
> is a bug.
>
> Suppose we have 3 processes: TRACER, TRACEE, KILLER.
>
> Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
> been dropped.
>
> The TRACER process has performed ptrace_attach on TRACEE and is in the
> middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.
>
> Then comes in the KILLER process and sends the TRACEE a SIGKILL.
> The TRACEE __state remains TASK_TRACED, as designed.
>
> The bug appears when the TRACEE makes it to schedule().  Inside
> schedule there is a call to signal_pending_state() which notices
> a SIGKILL is pending and refuses to sleep.

And I think this is fine. This doesn't really differ from the case
when the tracee was killed before it takes siglock.

The only problem (afaics) is that, once we introduce JOBCTL_TRACED,
ptrace_stop() can leak this flag. That is why I suggested to clear
it along with LISTENING/DELAY_WAKEKILL before return, exactly because
schedule() won't block if fatal_signal_pending() is true.

But maybe I misunderstood your concern?

Oleg.


* Re: [PATCH 9/9] ptrace: Don't change __state
@ 2022-04-28 15:11             ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 15:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Eric W. Biederman wrote:
>
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> > index 3c8b34876744..1947c85aa9d9 100644
> > --- a/include/linux/sched/signal.h
> > +++ b/include/linux/sched/signal.h
> > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
> >
> >  static inline void signal_wake_up(struct task_struct *t, bool resume)
> >  {
> > -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> > +	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> > +	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
> >  }
> >  static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
> >  {
>
> Grrr.  While looking through everything today I have realized that there
> is a bug.
>
> Suppose we have 3 processes: TRACER, TRACEE, KILLER.
>
> Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
> been dropped.
>
> The TRACER process has performed ptrace_attach on TRACEE and is in the
> middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.
>
> Then comes in the KILLER process and sends the TRACEE a SIGKILL.
> The TRACEE __state remains TASK_TRACED, as designed.
>
> The bug appears when the TRACEE makes it to schedule().  Inside
> schedule there is a call to signal_pending_state() which notices
> a SIGKILL is pending and refuses to sleep.

And I think this is fine. This doesn't really differ from the case
when the tracee was killed before it takes siglock.

The only problem (afaics) is that, once we introduce JOBCTL_TRACED,
ptrace_stop() can leak this flag. That is why I suggested to clear
it along with LISTENING/DELAY_WAKEKILL before return, exactly because
schedule() won't block if fatal_signal_pending() is true.

But maybe I misunderstood your concern?

Oleg.


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 14:57                   ` Oleg Nesterov
@ 2022-04-28 16:09                     ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 16:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On Thu, Apr 28, 2022 at 04:57:50PM +0200, Oleg Nesterov wrote:

> > Shouldn't we then switch wait_task_inactive() to have & matching instead
> > of the current ==.
> 
> Sorry, I don't understand the context...

This.. I've always found it strange to have wti use a different matching
scheme from ttwu.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f259621f4c93..c039aef4c8fe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3304,7 +3304,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && unlikely(!(READ_ONCE(p->__state) & match_state)))
 				return 0;
 			cpu_relax();
 		}
@@ -3319,7 +3319,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+		if (!match_state || (READ_ONCE(p->__state) & match_state))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
 

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 16:09                     ` Peter Zijlstra
@ 2022-04-28 16:19                       ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 16:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 04:57:50PM +0200, Oleg Nesterov wrote:
>
> > > Shouldn't we then switch wait_task_inactive() to have & matching instead
> > > of the current ==.
> >
> > Sorry, I don't understand the context...
>
> This.. I've always found it strange to have wti use a different matching
> scheme from ttwu.

Ah. This is what I understood (and I too thought about this); I just meant that
this patch from Eric (assuming wait_task_inactive() still uses __TASK_TRACED) is
fine without your change below.

Oleg.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f259621f4c93..c039aef4c8fe 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3304,7 +3304,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
>  		 * is actually now running somewhere else!
>  		 */
>  		while (task_running(rq, p)) {
> -			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> +			if (match_state && unlikely(!(READ_ONCE(p->__state) & match_state)))
>  				return 0;
>  			cpu_relax();
>  		}
> @@ -3319,7 +3319,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
>  		running = task_running(rq, p);
>  		queued = task_on_rq_queued(p);
>  		ncsw = 0;
> -		if (!match_state || READ_ONCE(p->__state) == match_state)
> +		if (!match_state || (READ_ONCE(p->__state) & match_state))
>  			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
>  		task_rq_unlock(rq, p, &rf);


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-28 15:11             ` Oleg Nesterov
@ 2022-04-28 16:50               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-28 16:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/27, Eric W. Biederman wrote:
>>
>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>
>> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> > index 3c8b34876744..1947c85aa9d9 100644
>> > --- a/include/linux/sched/signal.h
>> > +++ b/include/linux/sched/signal.h
>> > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
>> >
>> >  static inline void signal_wake_up(struct task_struct *t, bool resume)
>> >  {
>> > -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> > +	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
>> > +	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
>> >  }
>> >  static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
>> >  {
>>
>> Grrr.  While looking through everything today I have realized that there
>> is a bug.
>>
>> Suppose we have 3 processes: TRACER, TRACEE, KILLER.
>>
>> Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
>> been dropped.
>>
>> The TRACER process has performed ptrace_attach on TRACEE and is in the
>> middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.
>>
>> Then comes in the KILLER process and sends the TRACEE a SIGKILL.
>> The TRACEE __state remains TASK_TRACED, as designed.
>>
>> The bug appears when the TRACEE makes it to schedule().  Inside
>> schedule there is a call to signal_pending_state() which notices
>> a SIGKILL is pending and refuses to sleep.
>
> And I think this is fine. This doesn't really differ from the case
> when the tracee was killed before it takes siglock.

Hmm.  Maybe.

> The only problem (afaics) is that, once we introduce JOBCTL_TRACED,
> ptrace_stop() can leak this flag. That is why I suggested to clear
> it along with LISTENING/DELAY_WAKEKILL before return, exactly because
> schedule() won't block if fatal_signal_pending() is true.
>
> But maybe I misunderstood your concern?

Prior to JOBCTL_DELAY_WAKEKILL once __state was set to __TASK_TRACED
we were guaranteed that schedule() would stop if a SIGKILL was
received after that point.  As well as being immune from wake-ups
from SIGKILL.

I guess we are immune from wake-ups with JOBCTL_DELAY_WAKEKILL as I have
implemented it.

The practical concern then seems to be that we are not guaranteed
wait_task_inactive will succeed.  Which means that it must continue
to include the TASK_TRACED bit.

Previously we were actually guaranteed in ptrace_check_attach that
ptrace_freeze_traced would succeed, as any pending fatal signal would
cause ptrace_freeze_traced to fail.  Any incoming fatal signal would not
stop schedule from sleeping.  The ptraced task would continue to be
ptraced, as all other ptrace operations are blocked by virtue of ptrace
being single threaded.

I think in my tired mind yesterday I thought it would mess things
up after schedule decided to sleep.  Still I would like to be able to
let wait_task_inactive not care about the state of the process it is
going to sleep for.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:47               ` Eric W. Biederman
@ 2022-04-28 17:44                 ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 17:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:

> Hmm.  If we have the following process tree.
> 
>     A
>      \
>       B
>        \
>         C
> 
> Process A, B, and C are all in the same process group.
> Process A and B are set up to receive SIGCHLD when
> their process stops.
> 
> Process C traces process A.
> 
> When a sigstop is delivered to the group we can have:
> 
> Process B takes siglock(B) siglock(A) to notify the real_parent
> Process C takes siglock(C) siglock(B) to notify the real_parent
> Process A takes siglock(A) siglock(C) to notify the tracer
> 
> If they all take their local lock at the same time there is
> a deadlock.
> 
> I don't think the restriction that you can never ptrace anyone
> up the process tree is going to fly.  So it looks like I am back to the
> drawing board for this one.

I've not had time to fully appreciate the nested locking here, but if it
is possible to rework things to always take both locks at the same time,
then it would be possible to impose an arbitrary lock order on things
and break the cycle that way.

That is, simply order the locks by their heap address or something:

static void double_siglock_irq(struct sighand *sh1, struct sighand *sh2)
{
	if (sh1 > sh2)
		swap(sh1, sh2);

	spin_lock_irq(&sh1->siglock);
	spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
}


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-28 17:44                 ` Peter Zijlstra
@ 2022-04-28 18:22                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 18:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Peter Zijlstra wrote:
>
> I've not had time to fully appreciate the nested locking here, but if it
> is possible to rework things to always take both locks at the same time,
> then it would be possible to impose an arbitrary lock order on things
> and break the cycle that way.

This is clear, but this is not that simple.

For example (with this series at least), ptrace_stop() already holds
current->sighand->siglock which (in particular) we need to protect
current->parent, but then we need current->parent->sighand->siglock
in do_notify_parent_cldstop().

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-28 17:44                 ` Peter Zijlstra
@ 2022-04-28 18:37                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-28 18:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:
>
>> Hmm.  If we have the following process tree.
>> 
>>     A
>>      \
>>       B
>>        \
>>         C
>> 
>> Process A, B, and C are all in the same process group.
>> Process A and B are setup to receive SIGCHILD when
>> their process stops.
>> 
>> Process C traces process A.
>> 
>> When a sigstop is delivered to the group we can have:
>> 
>> Process B takes siglock(B) siglock(A) to notify the real_parent
>> Process C takes siglock(C) siglock(B) to notify the real_parent
>> Process A takes siglock(A) siglock(C) to notify the tracer
>> 
>> If they all take their local lock at the same time there is
>> a deadlock.
>> 
>> I don't think the restriction that you can never ptrace anyone
>> up the process tree is going to fly.  So it looks like I am back to the
>> drawing board for this one.
>
> I've not had time to fully appreciate the nested locking here, but if it
> is possible to rework things to always take both locks at the same time,
> then it would be possible to impose an arbitrary lock order on things
> and break the cycle that way.
>
> That is, simply order the locks by their heap address or something:
>
> static void double_siglock_irq(struct sighand *sh1, struct sighand *sh2)
> {
> 	if (sh1 > sh2)
> 		swap(sh1, sh2);
>
> 	spin_lock_irq(&sh1->siglock);
> 	spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
> }

You know it might be.  Especially given that the existing code is
already dropping siglock and grabbing tasklist_lock.

It would take a potentially triple lock function to lock
the task, its real_parent and its tracer (aka parent).

What makes this possible to consider is that notifying the ``parents''
is a fundamental part of the operation, so we know we are going to
need the lock and can move it up.

Throw in a pinch of lock_task_sighand and the triple lock function
gets quite interesting.

It is certainly worth trying, and I will.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-28 16:50               ` Eric W. Biederman
@ 2022-04-28 18:53                 ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 18:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> >> The bug appears when the TRACEE makes it to schedule().  Inside
> >> schedule there is a call to signal_pending_state() which notices
> >> a SIGKILL is pending and refuses to sleep.
> >
> > And I think this is fine. This doesn't really differ from the case
> > when the tracee was killed before it takes siglock.
>
> Hmm.  Maybe.

I hope ;)

> Previously we were actually guaranteed in ptrace_check_attach that
> ptrace_freeze_traced would succeed, as any pending fatal signal would
> cause ptrace_freeze_traced to fail.  Any incoming fatal signal would not
> stop schedule from sleeping.

Yes.

So let me repeat, 7/9 "ptrace: Simplify the wait_task_inactive call in
ptrace_check_attach" looks good to me (except it should use
wait_task_inactive(__TASK_TRACED)), but it should come before other
meaningful changes and the changelog should be updated.

And then we will probably need to reconsider this wait_task_inactive()
and WARN_ON() around it, but that depends on what we finally do.

> I think in my tired mind yesterday

I got lost too ;)

> Still I would like to be able to
> let wait_task_inactive not care about the state of the process it is
> going to sleep for.

Not sure... but to be honest I didn't really pay attention to the
wait_task_inactive(match_state => 0) part...

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-27  0:24     ` Eric W. Biederman
@ 2022-04-28 20:29       ` Peter Zijlstra
  2022-04-28 20:59         ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 20:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On Tue, Apr 26, 2022 at 07:24:03PM -0500, Eric W. Biederman wrote:
> But doing:
> 
> 	/* Don't stop if the task is dying */
> 	if (unlikely(__fatal_signal_pending(current)))
> 		return exit_code;
> 
> Should work.

Something like so then...

---
Subject: signal,ptrace: Don't stop dying tasks
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Apr 28 22:17:56 CEST 2022

Oleg pointed out that the tracee can already be killed such that
fatal_signal_pending() is true. In that case signal_wake_up_state()
cannot be relied upon to be responsible for the wakeup -- something
we're going to want to rely on.

As such, explicitly handle this case.

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/signal.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2226,6 +2226,10 @@ static int ptrace_stop(int exit_code, in
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
+	/* Don't stop if the task is dying. */
+	if (unlikely(__fatal_signal_pending(current)))
+		return exit_code;
+
 	/*
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-28 18:37                   ` Eric W. Biederman
@ 2022-04-28 20:49                     ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-28 20:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:
>>
>>> Hmm.  If we have the following process tree.
>>> 
>>>     A
>>>      \
>>>       B
>>>        \
>>>         C
>>> 
>>> Process A, B, and C are all in the same process group.
>>> Process A and B are setup to receive SIGCHILD when
>>> their process stops.
>>> 
>>> Process C traces process A.
>>> 
>>> When a sigstop is delivered to the group we can have:
>>> 
>>> Process B takes siglock(B) siglock(A) to notify the real_parent
>>> Process C takes siglock(C) siglock(B) to notify the real_parent
>>> Process A takes siglock(A) siglock(C) to notify the tracer
>>> 
>>> If they all take their local lock at the same time there is
>>> a deadlock.
>>> 
>>> I don't think the restriction that you can never ptrace anyone
>>> up the process tree is going to fly.  So it looks like I am back to the
>>> drawing board for this one.
>>
>> I've not had time to fully appreciate the nested locking here, but if it
>> is possible to rework things to always take both locks at the same time,
>> then it would be possible to impose an arbitrary lock order on things
>> and break the cycle that way.
>>
>> That is, simply order the locks by their heap address or something:
>>
>> static void double_siglock_irq(struct sighand *sh1, struct sighand *sh2)
>> {
>> 	if (sh1 > sh2)
>> 		swap(sh1, sh2);
>>
>> 	spin_lock_irq(&sh1->siglock);
>> 	spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
>> }
>
> You know it might be.  Especially given that the existing code is
> already dropping siglock and grabbing tasklist_lock.
>
> It would take a potentially triple lock function to lock
> the task, its real_parent, and its tracer (aka parent).
>
> What makes this possible to consider is that notifying the ``parents''
> is a fundamental part of the operation, so we know we are going to
> need the lock and can move it up.
>
> Throw in a pinch of lock_task_sighand and the triple lock function
> gets quite interesting.
>
> It is certainly worth trying, and I will.

To my surprise it doesn't look too bad.  The locking simplifications and
not using a lock as big as tasklist_lock probably make it even worth
doing.

I need to sleep on it and look at everything again.  In the
meantime here is my function that comes in with siglock held,
possibly drops it, and grabs the other two locks all in
order.

static void lock_parents_siglocks(bool lock_tracer)
	__releases(&current->sighand->siglock)
	__acquires(&current->sighand->siglock)
	__acquires(&current->real_parent->sighand->siglock)
	__acquires(&current->parent->sighand->siglock)
{
	struct task_struct *me = current;
	struct sighand_struct *m_sighand = me->sighand;

	lockdep_assert_held(&m_sighand->siglock);

	rcu_read_lock();
	for (;;) {
		struct task_struct *parent, *tracer;
		struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;

		parent = me->real_parent;
		tracer = lock_tracer? me->parent : parent;

		p_sighand = rcu_dereference(parent->sighand);
		t_sighand = rcu_dereference(tracer->sighand);

		/* Sort the sighands so that s1 <= s2 <= s3 */
		s1 = m_sighand;
		s2 = p_sighand;
		s3 = t_sighand;
		if (s1 > s2)
			swap(s1, s2);
		if (s1 > s3)
			swap(s1, s3);
		if (s2 > s3)
			swap(s2, s3);

		if (s1 != m_sighand) {
			spin_unlock(&m_sighand->siglock);
			spin_lock(&s1->siglock);
		}

		if (s1 != s2)
			spin_lock_nested(&s2->siglock, SIGLOCK_LOCK_SECOND);
		if (s2 != s3)
			spin_lock_nested(&s3->siglock, SIGLOCK_LOCK_THIRD);

		if (likely((me->real_parent == parent) &&
			   (me->parent == tracer) &&
			   (parent->sighand == p_sighand) &&
			   (tracer->sighand == t_sighand))) {
			break;
		}
		spin_unlock(&p_sighand->siglock);
                if (t_sighand != p_sighand)
			spin_unlock(&t_sighand->siglock);
		continue;
	}
	rcu_read_unlock();
}
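The three-compare sorting network above can be checked standalone. The following editorial sketch (plain C; addresses modeled as `uintptr_t`, and `sort3()`/`swap_u()` are invented names) shows the network leaves the three values in ascending order, lowest address first, which is what gives the three siglocks a stable global lock order:

```c
#include <assert.h>
#include <stdint.h>

/* Same three-compare network as lock_parents_siglocks() uses. */
#define swap_u(a, b) do { uintptr_t t = (a); (a) = (b); (b) = t; } while (0)

static void sort3(uintptr_t *s1, uintptr_t *s2, uintptr_t *s3)
{
	if (*s1 > *s2)
		swap_u(*s1, *s2);
	if (*s1 > *s3)
		swap_u(*s1, *s3);
	if (*s2 > *s3)
		swap_u(*s2, *s3);
	/* Post-condition: *s1 <= *s2 <= *s3 (lowest address first). */
}
```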

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread


* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-28 20:29       ` Peter Zijlstra
@ 2022-04-28 20:59         ` Oleg Nesterov
  2022-04-28 22:21           ` Peter Zijlstra
  0 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 20:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On 04/28, Peter Zijlstra wrote:
>
> Oleg pointed out that the tracee can already be killed such that
> fatal_signal_pending() is true. In that case signal_wake_up_state()
> cannot be relied upon to be responsible for the wakeup -- something
> we're going to want to rely on.

Peter, I am all confused...

If this patch is against the current tree, we don't need it.

If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
then it can't help - SIGKILL can come right after the tracee drops siglock
and calls schedule().

Perhaps I missed something, but let me repeat for the 3rd time: I'd suggest
simply clearing JOBCTL_TRACED along with LISTENING/DELAY_WAKEKILL before
return to close this race.

Oleg.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2226,6 +2226,10 @@ static int ptrace_stop(int exit_code, in
>  		spin_lock_irq(&current->sighand->siglock);
>  	}
>  
> +	/* Don't stop if the task is dying. */
> +	if (unlikely(__fatal_signal_pending(current)))
> +		return exit_code;
> +
>  	/*
>  	 * schedule() will not sleep if there is a pending signal that
>  	 * can awaken the task.
> 


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-28 20:49                     ` Eric W. Biederman
@ 2022-04-28 22:19                       ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 22:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On Thu, Apr 28, 2022 at 03:49:11PM -0500, Eric W. Biederman wrote:

> static void lock_parents_siglocks(bool lock_tracer)
> 	__releases(&current->sighand->siglock)
> 	__acquires(&current->sighand->siglock)
> 	__acquires(&current->real_parent->sighand->siglock)
> 	__acquires(&current->parent->sighand->siglock)
> {
> 	struct task_struct *me = current;
> 	struct sighand_struct *m_sighand = me->sighand;
> 
> 	lockdep_assert_held(&m_sighand->siglock);
> 
> 	rcu_read_lock();
> 	for (;;) {
> 		struct task_struct *parent, *tracer;
> 		struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
> 
> 		parent = me->real_parent;
> 		tracer = lock_tracer? me->parent : parent;
> 
> 		p_sighand = rcu_dereference(parent->sighand);
> 		t_sighand = rcu_dereference(tracer->sighand);
> 
> 		/* Sort the sighands so that s1 <= s2 <= s3 */
> 		s1 = m_sighand;
> 		s2 = p_sighand;
> 		s3 = t_sighand;
> 		if (s1 > s2)
> 			swap(s1, s2);
> 		if (s1 > s3)
> 			swap(s1, s3);
> 		if (s2 > s3)
> 			swap(s2, s3);
> 
> 		if (s1 != m_sighand) {
> 			spin_unlock(&m_sighand->siglock);
> 			spin_lock(&s1->siglock);
> 		}
> 
> 		if (s1 != s2)
> 			spin_lock_nested(&s2->siglock, SIGLOCK_LOCK_SECOND);
> 		if (s2 != s3)
> 			spin_lock_nested(&s3->siglock, SIGLOCK_LOCK_THIRD);
> 

Might as well just use 1 and 2 for subclass at this point, or use
SIGLOCK_LOCK_FIRST below.

> 		if (likely((me->real_parent == parent) &&
> 			   (me->parent == tracer) &&
> 			   (parent->sighand == p_sighand) &&
> 			   (tracer->sighand == t_sighand))) {
> 			break;
> 		}
> 		spin_unlock(&p_sighand->siglock);
>                 if (t_sighand != p_sighand)
> 			spin_unlock(&t_sighand->siglock);

Indent fail above ^, also you likely need this:

		/*
		 * Since [pt]_sighand will likely change if we go
		 * around, and m_sighand is the only one held, make sure
		 * it is subclass-0, since the above 's1 != m_sighand'
		 * clause very much relies on that.
		 */
		lock_set_subclass(&m_sighand->siglock, 0, _RET_IP_);

> 		continue;
> 	}
> 	rcu_read_unlock();
> }
> 
> Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread


* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-28 20:59         ` Oleg Nesterov
@ 2022-04-28 22:21           ` Peter Zijlstra
  2022-04-28 22:50             ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 22:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On Thu, Apr 28, 2022 at 10:59:57PM +0200, Oleg Nesterov wrote:
> On 04/28, Peter Zijlstra wrote:
> >
> > Oleg pointed out that the tracee can already be killed such that
> > fatal_signal_pending() is true. In that case signal_wake_up_state()
> > cannot be relied upon to be responsible for the wakeup -- something
> > we're going to want to rely on.
> 
> Peter, I am all confused...
> 
> If this patch is against the current tree, we don't need it.
> 
> If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
> then it can't help - SIGKILL can come right after the tracee drops siglock
> and calls schedule().

But by that time it will already have set TRACED and signal_wake_up()
will clear it, no?

> Perhaps I missed something, but let me repeat the 3rd time: I'd suggest
> to simply clear JOBCTL_TRACED along with LISTENING/DELAY_WAKEKILL before
> return to close this race.

I think Eric convinced me there was a problem with that, but I'll go
over it all again in the morning, perhaps I'll reach a different
conclusion :-)

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-28 22:21           ` Peter Zijlstra
@ 2022-04-28 22:50             ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 22:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Peter, you know, it is very difficult for me to discuss the changes
in the 2 unfinished series and not lose the context ;) Plus I am
already sleeping. But I'll try to reply anyway.

On 04/29, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 10:59:57PM +0200, Oleg Nesterov wrote:
> > If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
> > then it can't help - SIGKILL can come right after the tracee drops siglock
> > and calls schedule().
>
> But by that time it will already have set TRACED and signal_wake_up()
> wil clear it, no?

No. JOBCTL_DELAY_WAKEKILL is already set, this means that signal_wake_up()
will remove TASK_WAKEKILL from the "state" passed to signal_wake_up_state(),
and this is fine and correct: it means that ttwu() won't change ->__state.

But this also means that wake_up_state() will return false, and in this case

	signal_wake_up_state:

		if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
			t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE);

won't clear these flags. And this is nice too.

But. fatal_signal_pending() is true! And once we change freeze_traced()
to not abuse p->__state, schedule() won't block because it will check
signal_pending_state(TASK_TRACED == TASK_WAKEKILL | __TASK_TRACED) and
__fatal_signal_pending() == T.

In this case ptrace_stop() will leak JOBCTL_TRACED, so we simply need
to clear it before return along with LISTENING | DELAY_WAKEKILL.
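The flag leak described above can be modeled in a few lines of plain C. This is an editorial sketch under heavy simplification: the flag values are arbitrary, the helper names mirror the kernel's but the wakeup logic is reduced to a bitmask check. With JOBCTL_DELAY_WAKEKILL set, the kill wakeup is masked out, the wakeup "fails", and JOBCTL_TRACED is left set:

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary illustrative flag values, not the kernel's. */
#define TASK_WAKEKILL          0x1
#define __TASK_TRACED          0x2
#define JOBCTL_TRACED          0x10
#define JOBCTL_DELAY_WAKEKILL  0x20

struct task {
	int state;	/* state the task sleeps in */
	int jobctl;
};

/* A wakeup succeeds only if it targets a state the task sleeps in. */
static bool wake_up_state(struct task *t, int state)
{
	return (t->state & state) != 0;
}

static void signal_wake_up(struct task *t, bool fatal)
{
	int state = fatal ? TASK_WAKEKILL : 0;

	if (t->jobctl & JOBCTL_DELAY_WAKEKILL)
		state &= ~TASK_WAKEKILL;	/* kill wakeup is delayed */

	if (wake_up_state(t, state))
		t->jobctl &= ~JOBCTL_TRACED;	/* cleared only on success */
	/* else: JOBCTL_TRACED leaks -- ptrace_stop() must clear it. */
}
```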

> I'll go
> over it all again in the morning, perhaps I'll reach a different
> conclusion :-)

Same here ;)

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 0/12] ptrace: cleaning up ptrace_stop
  2022-04-26 22:50       ` Eric W. Biederman
  (?)
@ 2022-04-29 21:46         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Jann Horn, Kees Cook,
	linux-ia64


The states TASK_STOPPED and TASK_TRACED are special in that they cannot
handle spurious wake-ups.  This, plus actively depending upon and
changing the value of tsk->__state, causes problems for PREEMPT_RT and
Peter's freezer rewrite.

There are a lot of details we have to get right to sort out the
technical challenges, and this is my pared-back version of the changes
that contains just those problems for which I see good solutions that
I believe are ready.

In particular I don't have a solution that is ready for the challenges
presented by wait_task_inactive.

I hope we can review these changes and then have a firm foundation
for the rest of the challenges.

There are cleanups to the ptrace support for xtensa, um, and
ia64.

I have pulled in the first patch of Peter's freezer change as, with
minor modifications, I believe it is ready to go.

Eric W. Biederman (12):
      signal: Rename send_signal send_signal_locked
      signal: Replace __group_send_sig_info with send_signal_locked
      ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
      ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
      signal: Use lockdep_assert_held instead of assert_spin_locked
      ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
      ptrace: Don't change __state
      ptrace: Remove arch_ptrace_attach
      ptrace: Always take siglock in ptrace_resume
      ptrace: Only return signr from ptrace_stop if it was provided
      ptrace: Always call schedule in ptrace_stop
      sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

 arch/ia64/include/asm/ptrace.h    |   4 --
 arch/ia64/kernel/ptrace.c         |  57 ----------------
 arch/um/include/asm/thread_info.h |   2 +
 arch/um/kernel/exec.c             |   2 +-
 arch/um/kernel/process.c          |   2 +-
 arch/um/kernel/ptrace.c           |   8 +--
 arch/um/kernel/signal.c           |   4 +-
 arch/xtensa/kernel/ptrace.c       |   4 +-
 arch/xtensa/kernel/signal.c       |   4 +-
 drivers/tty/tty_jobctrl.c         |   4 +-
 include/linux/ptrace.h            |   7 --
 include/linux/sched.h             |  10 ++-
 include/linux/sched/jobctl.h      |  10 +++
 include/linux/sched/signal.h      |  23 ++++++-
 include/linux/signal.h            |   3 +-
 kernel/ptrace.c                   |  88 +++++++++----------------
 kernel/signal.c                   | 135 +++++++++++++++++---------------------
 kernel/time/posix-cpu-timers.c    |   6 +-
 18 files changed, 145 insertions(+), 228 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread


* [PATCH v2 01/12] signal: Rename send_signal send_signal_locked
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal to send_signal_locked and make it usable outside
of signal.c.

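For readers less familiar with the kernel's naming conventions: the `_locked` suffix advertises that the caller must already hold the relevant lock (here, siglock), while an unlocked convenience wrapper takes the lock itself and delegates. A minimal userspace sketch of that pattern, using hypothetical names (`account`, `deposit`) rather than kernel APIs, with a plain flag standing in for the spinlock:

```c
#include <assert.h>

struct account {
	int locked;	/* stand-in for a spinlock such as siglock */
	int balance;
};

/* The _locked suffix advertises: caller must already hold acct->locked. */
static int deposit_locked(struct account *acct, int amount)
{
	assert(acct->locked);	/* mirrors a lockdep_assert_held() check */
	acct->balance += amount;
	return acct->balance;
}

/* Unlocked convenience wrapper: take the lock, call the _locked variant. */
static int deposit(struct account *acct, int amount)
{
	int ret;

	acct->locked = 1;
	ret = deposit_locked(acct, amount);
	acct->locked = 0;
	return ret;
}
```

Exporting the `_locked` variant, as this patch does, lets other files that already hold siglock call it directly instead of going through a wrapper that would try to take the lock again.
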
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 01/12] signal: Rename send_signal send_signal_locked
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal to send_signal_locked and make it usable outside
of signal.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 01/12] signal: Rename send_signal send_signal_locked
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal to send_signal_locked and make it usable outside
of signal.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 02/12] signal: Replace __group_send_sig_info with send_signal_locked
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The function send_signal_locked does everything __group_send_sig_info
does and more, so replace the latter with it.

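The removed helper was a one-line wrapper that hard-coded the pid type to PIDTYPE_TGID; callers now state the type explicitly at each call site. A simplified sketch of the relationship, using stand-in types and bodies rather than the kernel's:

```c
#include <assert.h>

enum pid_type { PIDTYPE_PID, PIDTYPE_TGID };

/* Stand-in for the real sender: returns which pid type it was asked for. */
static enum pid_type send_signal_locked(int sig, enum pid_type type)
{
	(void)sig;
	return type;
}

/*
 * The old wrapper added nothing but a hard-coded PIDTYPE_TGID argument,
 * so each __group_send_sig_info(sig, info, p) call site becomes
 * send_signal_locked(sig, info, p, PIDTYPE_TGID) and the wrapper goes away.
 */
static enum pid_type group_send_sig_info_legacy(int sig)
{
	return send_signal_locked(sig, PIDTYPE_TGID);
}
```
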
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 02/12] signal: Replace __group_send_sig_info with send_signal_locked
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The function send_signal_locked does everything __group_send_sig_info
does and more, so replace the latter with it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 02/12] signal: Replace __group_send_sig_info with send_signal_locked
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The function send_signal_locked does everything __group_send_sig_info
does and more, so replace the latter with it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 03/12] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

User Mode Linux is the last user of the PT_DTRACE flag.  Using the flag
to indicate single stepping is a little confusing, and worse, changing
tsk->ptrace without locking could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE, as uml is its last user.

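The patch moves the single-step state out of a bit in task->ptrace (which was being changed without locking) and into the per-thread TIF_* bitmask, whose helpers test and update bits atomically. A userspace sketch of how such flag helpers behave; the flag values mirror the patch, but the helpers are simplified stand-ins for the kernel's:

```c
#include <assert.h>
#include <stdatomic.h>

/* Flag numbers as in arch/um's thread_info.h after this patch. */
#define TIF_SECCOMP	9	/* secure computing */
#define TIF_SINGLESTEP	10	/* single stepping userspace */

static _Atomic unsigned long thread_flags;

/* Atomically set one flag bit; safe against concurrent updaters. */
static void set_thread_flag(int flag)
{
	atomic_fetch_or(&thread_flags, 1UL << flag);
}

/* Atomically clear one flag bit. */
static void clear_thread_flag(int flag)
{
	atomic_fetch_and(&thread_flags, ~(1UL << flag));
}

/* Test a flag bit without modifying the mask. */
static int test_thread_flag(int flag)
{
	return (atomic_load(&thread_flags) >> flag) & 1;
}
```

Because each update is a single atomic read-modify-write on the flag word, no external lock is needed, unlike the old PT_DTRACE bit in tsk->ptrace.
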
Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/um/include/asm/thread_info.h | 2 ++
 arch/um/kernel/exec.c             | 2 +-
 arch/um/kernel/process.c          | 2 +-
 arch/um/kernel/ptrace.c           | 8 ++++----
 arch/um/kernel/signal.c           | 4 ++--
 include/linux/ptrace.h            | 1 -
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	7
 #define TIF_NOTIFY_RESUME	8
 #define TIF_SECCOMP		9	/* secure computing */
+#define TIF_SINGLESTEP		10	/* single stepping userspace */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 
 #endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
 {
 	PT_REGS_IP(regs) = eip;
 	PT_REGS_SP(regs) = esp;
-	current->ptrace &= ~PT_DTRACE;
+	clear_thread_flag(TIF_SINGLESTEP);
 #ifdef SUBARCH_EXECVE1
 	SUBARCH_EXECVE1(regs->regs);
 #endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
 {
 	struct task_struct *task = t ? t : current;
 
-	if (!(task->ptrace & PT_DTRACE))
+	if (!test_thread_flag(TIF_SINGLESTEP))
 		return 0;
 
 	if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_DTRACE;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_DTRACE;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
 }
 
 /*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
  * PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
  */
 int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	audit_syscall_exit(regs);
 
 	/* Fake a debug trap */
-	if (ptraced & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		send_sigtrap(&regs->regs, 0);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	unsigned long sp;
 	int err;
 
-	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+	if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
 		singlestep = 1;
 
 	/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
 	 * on the host.  The tracing thread will check this flag and
 	 * PTRACE_SYSCALL if necessary.
 	 */
-	if (current->ptrace & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		current->thread.singlestep_syscall =
 			is_syscall(PT_REGS_IP(&current->thread.regs));
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
-#define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 
 #define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 04/12] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

xtensa is the last user of the PT_SINGLESTEP flag.  Changing tsk->ptrace in
user_enable_single_step and user_disable_single_step without locking could
potentially cause problems.

So use a thread info flag instead of a flag in tsk->ptrace.  Use
TIF_SINGLESTEP, which xtensa had already defined but left unused.

Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no
remaining users.

Cc: stable@vger.kernel.org
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/xtensa/kernel/ptrace.c | 4 ++--
 arch/xtensa/kernel/signal.c | 4 ++--
 include/linux/ptrace.h      | 6 ------
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/xtensa/kernel/ptrace.c b/arch/xtensa/kernel/ptrace.c
index 323c678a691f..b952e67cc0cc 100644
--- a/arch/xtensa/kernel/ptrace.c
+++ b/arch/xtensa/kernel/ptrace.c
@@ -225,12 +225,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_SINGLESTEP;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_SINGLESTEP;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 /*
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index 6f68649e86ba..ac50ec46c8f1 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -473,7 +473,7 @@ static void do_signal(struct pt_regs *regs)
 		/* Set up the stack frame */
 		ret = setup_frame(&ksig, sigmask_to_save(), regs);
 		signal_setup_done(ret, &ksig, 0);
-		if (current->ptrace & PT_SINGLESTEP)
+		if (test_thread_flag(TIF_SINGLESTEP))
 			task_pt_regs(current)->icountlevel = 1;
 
 		return;
@@ -499,7 +499,7 @@ static void do_signal(struct pt_regs *regs)
 	/* If there's no signal to deliver, we just restore the saved mask.  */
 	restore_saved_sigmask();
 
-	if (current->ptrace & PT_SINGLESTEP)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		task_pt_regs(current)->icountlevel = 1;
 	return;
 }
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4c06f9f8ef3f..c952c5ba8fab 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
 
-/* single stepping state bits (used on ARM and PA-RISC) */
-#define PT_SINGLESTEP_BIT	31
-#define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
-#define PT_BLOCKSTEP_BIT	30
-#define PT_BLOCKSTEP		(1<<PT_BLOCKSTEP_BIT)
-
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 05/12] signal: Use lockdep_assert_held instead of assert_spin_locked
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The distinction is that assert_spin_locked() checks whether the lock is
held by *anyone*, whereas lockdep_assert_held() asserts that the current
context holds the lock.  Also, the check compiles away entirely in builds
without lockdep.
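
The distinction can be sketched in userspace with a tracked mutex (the
helper names below are hypothetical, not a real API): held_by_anyone()
plays the role of assert_spin_locked(), which passes whenever *some*
context holds the lock, while held_by_me() plays the role of
lockdep_assert_held(), which passes only when the *calling* context does:

```c
#include <assert.h>
#include <pthread.h>

/*
 * Userspace analogue of assert_spin_locked() vs lockdep_assert_held().
 * All names are illustrative stand-ins.
 */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t demo_owner;
static int demo_owner_valid;

static void demo_lock_acquire(void)
{
	pthread_mutex_lock(&demo_lock);
	demo_owner = pthread_self();	/* record the owning thread */
	demo_owner_valid = 1;
}

static void demo_lock_release(void)
{
	demo_owner_valid = 0;
	pthread_mutex_unlock(&demo_lock);
}

static int held_by_anyone(void)		/* cf. assert_spin_locked() */
{
	return demo_owner_valid;
}

static int held_by_me(void)		/* cf. lockdep_assert_held() */
{
	return demo_owner_valid && pthread_equal(demo_owner, pthread_self());
}
```

A context that merely observes the lock held by another thread would pass
the first check but fail the second — the stronger guarantee these
patches want.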

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..3fd2ce133387 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -884,7 +884,7 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -1079,7 +1079,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	int override_rlimit;
 	int ret = 0, result;
 
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	result = TRACE_SIGNAL_IGNORED;
 	if (!prepare_signal(sig, t, force))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable, Al Viro

Call send_sig_info in PTRACE_KILL instead of ptrace_resume.
ptrace_resume is not safe to call if the task has not been stopped
with ptrace_freeze_traced.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..43da5764b6f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
 	case PTRACE_KILL:
 		if (child->exit_state)	/* already dead */
 			return 0;
-		return ptrace_resume(child, request, SIGKILL);
+		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
 
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	case PTRACE_GETREGSET:
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead, remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN.  This new flag is
set in jobctl_freeze_task and cleared when ptrace_stop is awoken or
in jobctl_unfreeze_task (when ptrace_stop remains asleep).

In signal_wake_up, add __TASK_TRACED to state along with TASK_WAKEKILL
when it is indicated a fatal signal is pending.  Skip adding
__TASK_TRACED when JOBCTL_PTRACE_FROZEN is set.  This has the same
effect as changing TASK_TRACED to __TASK_TRACED, as all of the
wake_ups that use TASK_KILLABLE go through signal_wake_up.

Don't set TASK_TRACED if fatal_signal_pending, so that the code
continues not to sleep if there was a pending fatal signal before
ptrace_stop is called.  With TASK_WAKEKILL no longer present in
TASK_TRACED, signal_pending_state will no longer prevent ptrace_stop
from sleeping if there is a pending fatal signal.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up, or back to TASK_TRACED when the code was
left in ptrace_stop.  Now ptrace_stop clears JOBCTL_PTRACE_FROZEN when
woken up, and ptrace_unfreeze_traced clears JOBCTL_PTRACE_FROZEN when
the task is left sleeping.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched.h        |  2 +-
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  8 +++++++-
 kernel/ptrace.c              | 21 ++++++++-------------
 kernel/signal.c              |  9 +++------
 5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED			__TASK_TRACED
 
 #define TASK_IDLE			(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
 
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..35af34eeee9e 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -437,7 +437,13 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	unsigned int state = 0;
+	if (resume) {
+		state = TASK_WAKEKILL;
+		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+			state |= __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43da5764b6f3..644eb7439d01 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
-		return;
-
-	WARN_ON(!task->ptrace || task->parent != current);
+	unsigned long flags;
 
 	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
+	 * The child may be awake and may have cleared
+	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
+	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
 	 */
-	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (lock_task_sighand(task, &flags)) {
+		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+		unlock_task_sighand(task, &flags);
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 }
 
 /**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..5cf268982a7e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
-	/*
-	 * schedule() will not sleep if there is a pending signal that
-	 * can awaken the task.
-	 */
-	set_special_state(TASK_TRACED);
+	if (!__fatal_signal_pending(current))
+		set_special_state(TASK_TRACED);
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2321,7 +2318,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH v2 08/12] ptrace: Remove arch_ptrace_attach
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Reading the comments and examining the code, ptrace_attach_sync_user_rbs
has the sole purpose of saving registers to the stack when ptrace_attach
changes TASK_STOPPED to TASK_TRACED.  In all other cases arch_ptrace_stop
takes care of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary.  So just
remove it.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/ia64/include/asm/ptrace.h |  4 ---
 arch/ia64/kernel/ptrace.c      | 57 ----------------------------------
 kernel/ptrace.c                | 18 -----------
 3 files changed, 79 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index a10a498eede1..402874489890 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -139,10 +139,6 @@ static inline long regs_return_value(struct pt_regs *regs)
   #define arch_ptrace_stop_needed() \
 	(!test_thread_flag(TIF_RESTORE_RSE))
 
-  extern void ptrace_attach_sync_user_rbs (struct task_struct *);
-  #define arch_ptrace_attach(child) \
-	ptrace_attach_sync_user_rbs(child)
-
   #define arch_has_single_step()  (1)
   #define arch_has_block_step()   (1)
 
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index a19acd9f5e1f..a45f529046c3 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -617,63 +617,6 @@ void ia64_sync_krbs(void)
 	unw_init_running(do_sync_rbs, ia64_sync_kernel_rbs);
 }
 
-/*
- * After PTRACE_ATTACH, a thread's register backing store area in user
- * space is assumed to contain correct data whenever the thread is
- * stopped.  arch_ptrace_stop takes care of this on tracing stops.
- * But if the child was already stopped for job control when we attach
- * to it, then it might not ever get into ptrace_stop by the time we
- * want to examine the user memory containing the RBS.
- */
-void
-ptrace_attach_sync_user_rbs (struct task_struct *child)
-{
-	int stopped = 0;
-	struct unw_frame_info info;
-
-	/*
-	 * If the child is in TASK_STOPPED, we need to change that to
-	 * TASK_TRACED momentarily while we operate on it.  This ensures
-	 * that the child won't be woken up and return to user mode while
-	 * we are doing the sync.  (It can only be woken up for SIGKILL.)
-	 */
-
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_STOPPED &&
-		    !test_and_set_tsk_thread_flag(child, TIF_RESTORE_RSE)) {
-			set_notify_resume(child);
-
-			WRITE_ONCE(child->__state, TASK_TRACED);
-			stopped = 1;
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-
-	if (!stopped)
-		return;
-
-	unw_init_from_blocked_task(&info, child);
-	do_sync_rbs(&info, ia64_sync_user_rbs);
-
-	/*
-	 * Now move the child back into TASK_STOPPED if it should be in a
-	 * job control stop, so that SIGCONT can be used to wake it up.
-	 */
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_TRACED &&
-		    (child->signal->flags & SIGNAL_STOP_STOPPED)) {
-			WRITE_ONCE(child->__state, TASK_STOPPED);
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-}
-
 /*
  * Write f32-f127 back to task->thread.fph if it has been modified.
  */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 644eb7439d01..22041531adf6 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1280,10 +1280,6 @@ int ptrace_request(struct task_struct *child, long request,
 	return ret;
 }
 
-#ifndef arch_ptrace_attach
-#define arch_ptrace_attach(child)	do { } while (0)
-#endif
-
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -1292,8 +1288,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_TRACEME) {
 		ret = ptrace_traceme();
-		if (!ret)
-			arch_ptrace_attach(current);
 		goto out;
 	}
 
@@ -1305,12 +1299,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
@@ -1450,12 +1438,6 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 08/12] ptrace: Remove arch_ptrace_attach
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Reading the comments and examining the code shows that
ptrace_attach_sync_user_rbs exists solely to save registers to the
stack when ptrace_attach changes TASK_STOPPED to TASK_TRACED.  In all
other cases arch_ptrace_stop takes care of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary.  So just
remove it.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/ia64/include/asm/ptrace.h |  4 ---
 arch/ia64/kernel/ptrace.c      | 57 ----------------------------------
 kernel/ptrace.c                | 18 -----------
 3 files changed, 79 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index a10a498eede1..402874489890 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -139,10 +139,6 @@ static inline long regs_return_value(struct pt_regs *regs)
   #define arch_ptrace_stop_needed() \
 	(!test_thread_flag(TIF_RESTORE_RSE))
 
-  extern void ptrace_attach_sync_user_rbs (struct task_struct *);
-  #define arch_ptrace_attach(child) \
-	ptrace_attach_sync_user_rbs(child)
-
   #define arch_has_single_step()  (1)
   #define arch_has_block_step()   (1)
 
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index a19acd9f5e1f..a45f529046c3 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -617,63 +617,6 @@ void ia64_sync_krbs(void)
 	unw_init_running(do_sync_rbs, ia64_sync_kernel_rbs);
 }
 
-/*
- * After PTRACE_ATTACH, a thread's register backing store area in user
- * space is assumed to contain correct data whenever the thread is
- * stopped.  arch_ptrace_stop takes care of this on tracing stops.
- * But if the child was already stopped for job control when we attach
- * to it, then it might not ever get into ptrace_stop by the time we
- * want to examine the user memory containing the RBS.
- */
-void
-ptrace_attach_sync_user_rbs (struct task_struct *child)
-{
-	int stopped = 0;
-	struct unw_frame_info info;
-
-	/*
-	 * If the child is in TASK_STOPPED, we need to change that to
-	 * TASK_TRACED momentarily while we operate on it.  This ensures
-	 * that the child won't be woken up and return to user mode while
-	 * we are doing the sync.  (It can only be woken up for SIGKILL.)
-	 */
-
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_STOPPED &&
-		    !test_and_set_tsk_thread_flag(child, TIF_RESTORE_RSE)) {
-			set_notify_resume(child);
-
-			WRITE_ONCE(child->__state, TASK_TRACED);
-			stopped = 1;
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-
-	if (!stopped)
-		return;
-
-	unw_init_from_blocked_task(&info, child);
-	do_sync_rbs(&info, ia64_sync_user_rbs);
-
-	/*
-	 * Now move the child back into TASK_STOPPED if it should be in a
-	 * job control stop, so that SIGCONT can be used to wake it up.
-	 */
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_TRACED &&
-		    (child->signal->flags & SIGNAL_STOP_STOPPED)) {
-			WRITE_ONCE(child->__state, TASK_STOPPED);
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-}
-
 /*
  * Write f32-f127 back to task->thread.fph if it has been modified.
  */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 644eb7439d01..22041531adf6 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1280,10 +1280,6 @@ int ptrace_request(struct task_struct *child, long request,
 	return ret;
 }
 
-#ifndef arch_ptrace_attach
-#define arch_ptrace_attach(child)	do { } while (0)
-#endif
-
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -1292,8 +1288,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_TRACEME) {
 		ret = ptrace_traceme();
-		if (!ret)
-			arch_ptrace_attach(current);
 		goto out;
 	}
 
@@ -1305,12 +1299,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
@@ -1450,12 +1438,6 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH v2 09/12] ptrace: Always take siglock in ptrace_resume
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Make code analysis simpler and future changes easier by
always taking siglock in ptrace_resume.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 22041531adf6..c1c99e8be147 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -845,8 +845,6 @@ static long ptrace_get_rseq_configuration(struct task_struct *task,
 static int ptrace_resume(struct task_struct *child, long request,
 			 unsigned long data)
 {
-	bool need_siglock;
-
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -882,18 +880,11 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 * Note that we need siglock even if ->exit_code == data and/or this
 	 * status was not reported yet, the new status must not be cleared by
 	 * wait_task_stopped() after resume.
-	 *
-	 * If data == 0 we do not care if wait_task_stopped() reports the old
-	 * status and clears the code too; this can't race with the tracee, it
-	 * takes siglock after resume.
 	 */
-	need_siglock = data && !thread_group_empty(current);
-	if (need_siglock)
-		spin_lock_irq(&child->sighand->siglock);
+	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
 	wake_up_state(child, __TASK_TRACED);
-	if (need_siglock)
-		spin_unlock_irq(&child->sighand->siglock);
+	spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH v2 10/12] ptrace: Only return signr from ptrace_stop if it was provided
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

In ptrace_stop, a ptrace_unlink or SIGKILL can occur either after
siglock is dropped or after tasklist_lock is dropped.  At either point
the result can be that ptrace will continue and not stop at schedule.

This means that there are cases where the current logic fails to handle
the fact that ptrace_stop did not actually stop, and can potentially
cause ptrace_report_syscall to attempt to deliver a signal.

Instead of attempting to detect in ptrace_stop when it fails to stop,
update ptrace_resume and ptrace_detach to set a flag indicating that
the signal to continue with has been set.  Use that new flag to decide
how to set the return signal.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/jobctl.h |  2 ++
 kernel/ptrace.c              |  5 +++++
 kernel/signal.c              | 12 ++++++------
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..2ff1bcd63cf4 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -20,6 +20,7 @@ struct task_struct;
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
+#define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -30,6 +31,7 @@ struct task_struct;
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
+#define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1c99e8be147..d80222251f60 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -596,7 +596,11 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
 	 * tasklist_lock avoids the race with wait_task_stopped(), see
 	 * the comment in ptrace_resume().
 	 */
+	spin_lock(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	spin_unlock(&child->sighand->siglock);
+
 	__ptrace_detach(current, child);
 	write_unlock_irq(&tasklist_lock);
 
@@ -883,6 +887,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl |= JOBCTL_PTRACE_SIGNR;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 5cf268982a7e..7cb27a27290a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2193,7 +2193,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2299,9 +2298,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 		/* tasklist protects us from ptrace_freeze_traced() */
 		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
 		read_unlock(&tasklist_lock);
 	}
 
@@ -2311,14 +2307,18 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
+	/* Did userspace perhaps provide a signal to resume with? */
+	if (current->jobctl & JOBCTL_PTRACE_SIGNR)
 		exit_code = current->exit_code;
+	else if (clear_code)
+		exit_code = 0;
+
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH v2 11/12] ptrace: Always call schedule in ptrace_stop
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop testing for !current->ptrace and setting __state to TASK_RUNNING.
The code in __ptrace_unlink wakes up the child with
ptrace_signal_wake_up which will set __state to TASK_RUNNING.  This
leaves sending the signals as the only thing ptrace_stop needs to do.

Make the signal sending conditional upon current->ptrace so that
the correct signals are sent to the parent.

After that call schedule and let the fact that __state == TASK_RUNNING
keep the code from sleeping in schedule.

Now that it is easy to see that ptrace_stop always sleeps after
ptrace_freeze_traced succeeds, modify ptrace_check_attach to warn if
wait_task_inactive fails.
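The "schedule with __state == TASK_RUNNING falls straight through" reasoning can be modelled in userspace. The state values echo the kernel's constants, but the toy task and the scheduler check are purely illustrative:

```c
#include <assert.h>

/* State values echo include/linux/sched.h; the helpers below are an
 * illustrative model, not kernel code. */
#define TASK_RUNNING	0x00000000
#define __TASK_TRACED	0x00000008

struct toy_task {
	unsigned int __state;
};

/* Model of ptrace_signal_wake_up(..., true): a matching wakeup
 * switches __state back to TASK_RUNNING. */
static void toy_wake_up_state(struct toy_task *t, unsigned int mask)
{
	if (t->__state & mask)
		t->__state = TASK_RUNNING;
}

/* schedule() only deschedules a task whose __state is not RUNNING, so
 * a task woken before it reaches schedule() does not sleep there. */
static int toy_schedule_sleeps(const struct toy_task *t)
{
	return t->__state != TASK_RUNNING;
}
```

This is why ptrace_stop can call schedule unconditionally: if the tracer went away and woke the child first, the RUNNING state makes the call a no-op rather than a lost wakeup.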

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 +++-------
 kernel/signal.c | 68 ++++++++++++++++++-------------------------------
 2 files changed, 28 insertions(+), 54 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d80222251f60..c1afebd2e8f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -261,17 +261,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+		ret = -ESRCH;
 
 	return ret;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 7cb27a27290a..4cae3f47f664 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2255,51 +2255,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
-
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread


* [PATCH v2 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED respectively to keep unique state. That
is, this state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
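A minimal userspace sketch of the bookkeeping this patch introduces; the bit and state values mirror the hunks below, while the toy task and helpers are hypothetical:

```c
#include <assert.h>

/* Values mirror the kernel headers changed below; everything else is
 * an illustrative model, not kernel API. */
#define TASK_RUNNING		0x00000000
#define __TASK_TRACED		0x00000008
#define JOBCTL_TRACED_BIT	27
#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)

struct toy_task {
	unsigned int __state;	/* may be rewritten under PREEMPT_RT */
	unsigned long jobctl;	/* stable mirror for task_is_traced() */
};

/* ptrace_stop(): set the jobctl mirror alongside the special state */
static void toy_enter_traced(struct toy_task *t)
{
	t->jobctl |= JOBCTL_TRACED;
	t->__state = __TASK_TRACED;
}

/* task_is_traced() now consults jobctl, not __state */
static int toy_task_is_traced(const struct toy_task *t)
{
	return (t->jobctl & JOBCTL_TRACED) != 0;
}

/* wakeup paths clear the mirror before setting the task running */
static void toy_wake_traced(struct toy_task *t)
{
	t->jobctl &= ~JOBCTL_TRACED;
	t->__state = TASK_RUNNING;
}
```

Because the mirror bit lives in jobctl (under siglock), task_is_traced() stays correct even when PREEMPT_RT temporarily parks the real state in task->saved_state.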

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 17 ++++++++++++++---
 kernel/ptrace.c              | 17 +++++++++++++----
 kernel/signal.c              | 16 +++++++++++++---
 5 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index 2ff1bcd63cf4..9c0b917de2f9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -22,6 +22,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 #define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -33,6 +36,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 #define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 35af34eeee9e..4dcce2bbf1fb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -439,15 +441,24 @@ static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
 	unsigned int state = 0;
 	if (resume) {
+		unsigned long jmask = JOBCTL_STOPPED;
 		state = TASK_WAKEKILL;
-		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+			jmask |= JOBCTL_TRACED;
 			state |= __TASK_TRACED;
+		}
+		t->jobctl &= ~jmask;
 	}
 	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1afebd2e8f3..38913801717f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress,
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -879,7 +888,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
-	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	child->jobctl = (child->jobctl | JOBCTL_PTRACE_SIGNR) & ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 4cae3f47f664..d6573abbc169 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2208,8 +2212,10 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
-	if (!__fatal_signal_pending(current))
+	if (!__fatal_signal_pending(current)) {
 		set_special_state(TASK_TRACED);
+		current->jobctl |= JOBCTL_TRACED;
+	}
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2301,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
+	WARN_ON_ONCE(current->jobctl & JOBCTL_TRACED);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
@@ -2433,6 +2440,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
@@ -2454,6 +2462,8 @@ static bool do_signal_stop(int signr)
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
+
+		WARN_ON_ONCE(READ_ONCE(current->jobctl) & JOBCTL_STOPPED);
 		return true;
 	} else {
 		/*
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 12/12] sched, signal, ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There's two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would loose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add a unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 17 ++++++++++++++---
 kernel/ptrace.c              | 17 +++++++++++++----
 kernel/signal.c              | 16 +++++++++++++---
 5 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index 2ff1bcd63cf4..9c0b917de2f9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -22,6 +22,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 #define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -33,6 +36,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 #define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 35af34eeee9e..4dcce2bbf1fb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -439,15 +441,24 @@ static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
 	unsigned int state = 0;
 	if (resume) {
+		unsigned long jmask = JOBCTL_STOPPED;
 		state = TASK_WAKEKILL;
-		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+			jmask |= JOBCTL_TRACED;
 			state |= __TASK_TRACED;
+		}
+		t->jobctl &= ~jmask;
 	}
 	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1afebd2e8f3..38913801717f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -879,7 +888,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
-	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	child->jobctl = (child->jobctl | JOBCTL_PTRACE_SIGNR) & ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 4cae3f47f664..d6573abbc169 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2208,8 +2212,10 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
-	if (!__fatal_signal_pending(current))
+	if (!__fatal_signal_pending(current)) {
 		set_special_state(TASK_TRACED);
+		current->jobctl |= JOBCTL_TRACED;
+	}
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2301,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
+	WARN_ON_ONCE(current->jobctl & JOBCTL_TRACED);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
@@ -2433,6 +2440,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
@@ -2454,6 +2462,8 @@ static bool do_signal_stop(int signr)
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
+
+		WARN_ON_ONCE(READ_ONCE(current->jobctl) & JOBCTL_STOPPED);
 		return true;
 	} else {
 		/*
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 17 ++++++++++++++---
 kernel/ptrace.c              | 17 +++++++++++++----
 kernel/signal.c              | 16 +++++++++++++---
 5 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index 2ff1bcd63cf4..9c0b917de2f9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -22,6 +22,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 #define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -33,6 +36,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 #define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 35af34eeee9e..4dcce2bbf1fb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -439,15 +441,24 @@ static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
 	unsigned int state = 0;
 	if (resume) {
+		unsigned long jmask = JOBCTL_STOPPED;
 		state = TASK_WAKEKILL;
-		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+			jmask |= JOBCTL_TRACED;
 			state |= __TASK_TRACED;
+		}
+		t->jobctl &= ~jmask;
 	}
 	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1afebd2e8f3..38913801717f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -879,7 +888,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
-	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	child->jobctl = (child->jobctl | JOBCTL_PTRACE_SIGNR) & ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 4cae3f47f664..d6573abbc169 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2208,8 +2212,10 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
-	if (!__fatal_signal_pending(current))
+	if (!__fatal_signal_pending(current)) {
 		set_special_state(TASK_TRACED);
+		current->jobctl |= JOBCTL_TRACED;
+	}
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2301,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
+	WARN_ON_ONCE(current->jobctl & JOBCTL_TRACED);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
@@ -2433,6 +2440,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
@@ -2454,6 +2462,8 @@ static bool do_signal_stop(int signr)
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
+
+		WARN_ON_ONCE(READ_ONCE(current->jobctl) & JOBCTL_STOPPED);
 		return true;
 	} else {
 		/*
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-04-29 22:27             ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-29 22:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On Fri, Apr 29, 2022 at 04:48:32PM -0500, Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
> 
> Instead TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new This new
> flag is set in jobctl_freeze_task and cleared when ptrace_stop is
> awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
> 
> In singal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
> when it is indicated a fatal signal is pending.  Skip adding
> __TASK_TRACED when TASK_PTRACE_FROZEN is not set.  This has the same
> effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
> that use TASK_KILLABLE go through signal_wake_up.
> 
> Don't set TASK_TRACED if fatal_signal_pending so that the code
> continues not to sleep if there was a pending fatal signal before
> ptrace_stop is called.  With TASK_WAKEKILL no longer present in
> TASK_TRACED signal_pending_state will no longer prevent ptrace_stop
> from sleeping if there is a pending fatal signal.
> 
> Previously the __state value of __TASK_TRACED was changed to
> TASK_RUNNING when woken up or back to TASK_TRACED when the code was
> left in ptrace_stop.  Now when woken up ptrace_stop now clears
> JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced
> clears JOBCTL_PTRACE_FROZEN.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  include/linux/sched.h        |  2 +-
>  include/linux/sched/jobctl.h |  2 ++
>  include/linux/sched/signal.h |  8 +++++++-
>  kernel/ptrace.c              | 21 ++++++++-------------
>  kernel/signal.c              |  9 +++------
>  5 files changed, 21 insertions(+), 21 deletions(-)

Please fold this hunk:

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6310,10 +6310,7 @@ static void __sched notrace __schedule(u
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 01/12] signal: Rename send_signal send_signal_locked
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02  7:50             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02  7:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:26 [-0500], Eric W. Biederman wrote:
> Rename send_signal send_signal_locked and make to make

s@to make@@

> it usable outside of signal.c.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 02/12] signal: Replace __group_send_sig_info with send_signal_locked
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02  7:58             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02  7:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:27 [-0500], Eric W. Biederman wrote:
> The function send_signal_locked does more than __group_send_sig_info so
> replace it.

This might be easier to understand:
   __group_send_sig_info() is just a wrapper around send_signal_locked()
   with a special pid_type. 
   
   Replace __group_send_sig_info() with send_signal_locked(,,,
   PIDTYPE_TGID).

However, keep it as is if you feel otherwise ;)

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02  8:59             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02  8:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:32 [-0500], Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
> 
> Instead TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new This new

Instead adding TASK_WAKEKILL to the definition of TASK_TRACED, implement
a new jobctl flag TASK_PTRACE_FROZEN for this. This new

> flag is set in jobctl_freeze_task and cleared when ptrace_stop is
> awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
> 
> In singal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
     signal_wake_up

> when it is indicated a fatal signal is pending.  Skip adding
                      +that ?

> __TASK_TRACED when TASK_PTRACE_FROZEN is not set.  This has the same
> effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
                                                                        ,
> that use TASK_KILLABLE go through signal_wake_up.
                        ,

> Don't set TASK_TRACED if fatal_signal_pending so that the code
> continues not to sleep if there was a pending fatal signal before
> ptrace_stop is called.  With TASK_WAKEKILL no longer present in
> TASK_TRACED signal_pending_state will no longer prevent ptrace_stop
> from sleeping if there is a pending fatal signal.
> 
> Previously the __state value of __TASK_TRACED was changed to
> TASK_RUNNING when woken up or back to TASK_TRACED when the code was
> left in ptrace_stop.  Now when woken up ptrace_stop now clears
> JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced
> clears JOBCTL_PTRACE_FROZEN.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
@ 2022-05-02  8:59             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02  8:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:32 [-0500], Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
> 
> Instead TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new This new

Instead adding TASK_WAKEKILL to the definition of TASK_TRACED, implement
a new jobctl flag TASK_PTRACE_FROZEN for this. This new

> flag is set in jobctl_freeze_task and cleared when ptrace_stop is
> awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
> 
> In singal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
     signal_wake_up

> when it is indicated a fatal signal is pending.  Skip adding
                      +that ?

> __TASK_TRACED when TASK_PTRACE_FROZEN is not set.  This has the same
> effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
                                                                        ,
> that use TASK_KILLABLE go through signal_wake_up.
                        ,

> Don't set TASK_TRACED if fatal_signal_pending so that the code
> continues not to sleep if there was a pending fatal signal before
> ptrace_stop is called.  With TASK_WAKEKILL no longer present in
> TASK_TRACED signal_pending_state will no longer prevent ptrace_stop
> from sleeping if there is a pending fatal signal.
> 
> Previously the __state value of __TASK_TRACED was changed to
> TASK_RUNNING when woken up or back to TASK_TRACED when the code was
> left in ptrace_stop.  Now when woken up ptrace_stop now clears
> JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced
> clears JOBCTL_PTRACE_FROZEN.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 10/12] ptrace: Only return signr from ptrace_stop if it was provided
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02 10:08             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02 10:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:35 [-0500], Eric W. Biederman wrote:
> In ptrace_stop a ptrace_unlink or SIGKILL can occur either after
> siglock is dropped or after tasklist_lock is dropped.  At either point
> the result can be that ptrace will continue and not stop at schedule.
> 
> This means that there are cases where the current logic fails to handle
> the fact that ptrace_stop did not actually stop, and can potentially
> cause ptrace_report_syscall to attempt to deliver a signal.
> 
> Instead of attempting to detect in ptrace_stop when it fails to
> stop update ptrace_resume and ptrace_detach to set a flag to indicate
      ,
> that the signal to continue with has be set.   Use that
                                       been
> new flag to decided how to set return signal.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-29 21:48           ` [PATCH v2 12/12] sched, signal, ptrace: " Eric W. Biederman
  (?)
@ 2022-05-02 10:18             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02 10:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:37 [-0500], Eric W. Biederman wrote:

Needs
 From: Peter Zijlstra (Intel) <peterz@infradead.org>

at the top.

> Currently ptrace_stop() / do_signal_stop() rely on the special states
> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> state exists only in task->__state and nowhere else.
> 
> There's two spots of bother with this:
> 
>  - PREEMPT_RT has task->saved_state which complicates matters,
>    meaning task_is_{traced,stopped}() needs to check an additional
>    variable.
> 
>  - An alternative freezer implementation that itself relies on a
>    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
>    result in misbehaviour.
> 
> As such, add additional state to task->jobctl to track this state
> outside of task->__state.
> 
> NOTE: this doesn't actually fix anything yet, just adds extra state.
> 
> --EWB
>   * didn't add an unnecessary newline in signal.h
>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
>     instead of in signal_wake_up_state.  This prevents the clearing
>     of TASK_STOPPED and TASK_TRACED from getting lost.
>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/12] ptrace: cleaning up ptrace_stop
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-05-02 13:38           ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02 13:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, tglx

On 2022-04-29 16:46:59 [-0500], Eric W. Biederman wrote:
> 
> The states TASK_STOPPED and TASK_TRACED are special in that they cannot
> handle spurious wake-ups.  This plus actively depending upon and
> changing the value of tsk->__state causes problems for PREEMPT_RT and
> Peter's freezer rewrite.

PREEMPT_RT-wise, I had to duct-tape wait_task_inactive() and remove the
preempt-disable section in ptrace_stop() (like previously). This reduces
the amount of __state + saved_state checks and looks otherwise stable in
light testing.

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02 14:37             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-02 14:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, stable, Al Viro

On 04/29, Eric W. Biederman wrote:
>
> Call send_sig_info in PTRACE_KILL instead of ptrace_resume.
> ptrace_resume is not safe to call if the task has not been stopped
> with ptrace_freeze_traced.

Oh, I was never, never able to understand why we have PTRACE_KILL
or what it should actually do.

I have suggested many times that we simply remove it, but OK, we
probably can't do that.

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
>  	case PTRACE_KILL:
>  		if (child->exit_state)	/* already dead */
>  			return 0;
> -		return ptrace_resume(child, request, SIGKILL);
> +		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);

Note that currently ptrace(PTRACE_KILL) can never fail (yes, yes, it
is unsafe), but send_sig_info() can. If we do not remove PTRACE_KILL,
then I'd suggest

	case PTRACE_KILL:
		if (!child->exit_state)
			send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
		return 0;

to make this change a bit more compatible.

Also, please remove the note about PTRACE_KILL in set_task_blockstep().

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02 15:39             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-02 15:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 04/29, Eric W. Biederman wrote:
>
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.

Eric, I'll read this patch and the rest of this series tomorrow.
Somehow I failed to force myself to read yet another version after the
weekend ;)

plus I don't really understand this one...

>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
> +#define TASK_TRACED			__TASK_TRACED
...
>  static inline void signal_wake_up(struct task_struct *t, bool resume)
>  {
> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> +	unsigned int state = 0;
> +	if (resume) {
> +		state = TASK_WAKEKILL;
> +		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
> +			state |= __TASK_TRACED;
> +	}
> +	signal_wake_up_state(t, state);

I can't understand why this is better than the previous version, which
removed TASK_WAKEKILL if resume... It looks a bit strange to me. But again,
I didn't look at the next patches yet.

> @@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>  		spin_lock_irq(&current->sighand->siglock);
>  	}
>
> -	/*
> -	 * schedule() will not sleep if there is a pending signal that
> -	 * can awaken the task.
> -	 */
> -	set_special_state(TASK_TRACED);
> +	if (!__fatal_signal_pending(current))
> +		set_special_state(TASK_TRACED);

This is where I got stuck. This probably makes sense, but what does it buy
for this particular patch?

And if we check __fatal_signal_pending(), why can't ptrace_stop() simply
return?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02 15:47             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-02 15:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 04/29, Eric W. Biederman wrote:
>
>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> -		return;
> -
> -	WARN_ON(!task->ptrace || task->parent != current);
> +	unsigned long flags;
>
>  	/*
> -	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> -	 * Recheck state under the lock to close this race.
> +	 * The child may be awake and may have cleared
> +	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
> +	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
>  	 */
> -	spin_lock_irq(&task->sighand->siglock);
> -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
> +	if (lock_task_sighand(task, &flags)) {
> +		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;

Well, I think that the fast-path

	if (!(task->jobctl & JOBCTL_PTRACE_FROZEN))
		return;

at the start makes sense; it lets us avoid lock_task_sighand() if the
tracee was already resumed.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-02 15:39             ` Oleg Nesterov
  (?)
@ 2022-05-02 16:35               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-02 16:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/29, Eric W. Biederman wrote:
>>
>> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
>> command is executing.
>
> Eric, I'll read this patch and the rest of this series tomorrow.
> Somehow I failed to force myself to read yet another version after
> weekend ;)

That is quite alright.

> plus I don't really understand this one...
>
>>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
>> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
>> +#define TASK_TRACED			__TASK_TRACED
> ...
>>  static inline void signal_wake_up(struct task_struct *t, bool resume)
>>  {
>> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> +	unsigned int state = 0;
>> +	if (resume) {
>> +		state = TASK_WAKEKILL;
>> +		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
>> +			state |= __TASK_TRACED;
>> +	}
>> +	signal_wake_up_state(t, state);
>
> Can't understand why this is better than the previous version which removed
> TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
> look at the next patches yet.

The goal is to replace the existing mechanism with an equivalent one,
so that we don't have to be clever and deal with it being slightly
different in one case.

The difference is in how signal_pending_state() affects whether schedule()
will sleep in ptrace_stop().

As the patch is currently constructed (and as the existing code works),
schedule() will always sleep if ptrace_freeze_traced() completes
successfully.

When TASK_WAKEKILL was included in TASK_TRACED, schedule() might refuse
to sleep even though ptrace_freeze_traced() had completed successfully.  As
you pointed out, wait_task_inactive() would then fail, keeping
ptrace_check_attach() from succeeding.

Other than complicating the analysis by adding extra states we need to
consider when reviewing the patch, the practical difference matters for
Peter's plans to fix PREEMPT_RT or the freezer: wait_task_inactive() needs
to cope with the final state being changed by something else (TASK_FROZEN in
the freezer case).  I can only see that happening by removing the
dependency on the final state in wait_task_inactive(), which we can't do
if we depend on wait_task_inactive() failing when the process is in the
wrong state.
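The signal_wake_up() change quoted above can be modeled in user space to show what state mask a resuming wake-up targets. This is a sketch with illustrative constant values (not the kernel's), modeling only the state-selection logic: TASK_WAKEKILL is always included on resume, while __TASK_TRACED is included only when the tracee is not frozen by ptrace.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values, not the kernel's actual flag bits. */
#define TASK_WAKEKILL		0x0100
#define __TASK_TRACED		0x0008
#define JOBCTL_PTRACE_FROZEN	(1UL << 24)

struct task {
	unsigned long jobctl;
};

/*
 * Which states a wake-up is allowed to break: on resume, TASK_WAKEKILL
 * always, plus __TASK_TRACED unless the tracee is frozen by ptrace.
 */
static unsigned int wake_state(const struct task *t, bool resume)
{
	unsigned int state = 0;

	if (resume) {
		state = TASK_WAKEKILL;
		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
			state |= __TASK_TRACED;
	}
	return state;
}
```

A frozen tracee can thus still be woken for a kill (TASK_WAKEKILL), but an ordinary traced sleep is left undisturbed while the ptrace command runs.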


>> @@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>>  		spin_lock_irq(&current->sighand->siglock);
>>  	}
>>
>> -	/*
>> -	 * schedule() will not sleep if there is a pending signal that
>> -	 * can awaken the task.
>> -	 */
>> -	set_special_state(TASK_TRACED);
>> +	if (!__fatal_signal_pending(current))
>> +		set_special_state(TASK_TRACED);
>
> This is where I stuck. This probably makes sense, but what does it buy
> for this particular patch?
>
> And if we check __fatal_signal_pending(), why can't ptrace_stop() simply
> return ?

Again, this is about preserving existing behavior as much as possible to
simplify analysis of the patch.

The current code depends upon schedule() not sleeping if a fatal
signal was received before ptrace_stop() is called.  With TASK_WAKEKILL
removed from TASK_TRACED that no longer happens.  Simply not setting
TASK_TRACED when __fatal_signal_pending() is true has the same effect.


At a practical level I think it also has an impact on patch:
"10/12 ptrace: Only return signr from ptrace_stop if it was provided".

At a minimum the code would need to do something like:
	if (__fatal_signal_pending(current)) {
		return clear_code ? 0 : exit_code;
        }

With a little care needed to ensure that every time the logic changes,
the early return changes too.  I think that just complicates things
unnecessarily.

Eric




^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-02 16:35               ` Eric W. Biederman
  (?)
@ 2022-05-03 13:41                 ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-03 13:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/02, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> >>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
> >>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
> >> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
> >> +#define TASK_TRACED			__TASK_TRACED
> > ...
> >>  static inline void signal_wake_up(struct task_struct *t, bool resume)
> >>  {
> >> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> >> +	unsigned int state = 0;
> >> +	if (resume) {
> >> +		state = TASK_WAKEKILL;
> >> +		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
> >> +			state |= __TASK_TRACED;
> >> +	}
> >> +	signal_wake_up_state(t, state);
> >
> > Can't understand why is this better than the previous version which removed
> > TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
> > look at the next patches yet.
>
> The goal is to replace the existing mechanism with an equivalent one,
> so that we don't have to be clever and deal with it being slightly
> different in one case.
>
> The difference is how does signal_pending_state affect how schedule will
> sleep in ptrace_stop.

But why is it bad if the tracee doesn't sleep in schedule() when it races
with SIGKILL?  I still can't understand this.

Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
in 11/12.

Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
*signal_wake_up() better?

And even if we need to ensure the tracee will always block after
ptrace_freeze_traced(), we can change signal_pending_state() to
return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
looks unnecessary to me.
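The alternative floated here, having signal_pending_state() report no pending signal while JOBCTL_PTRACE_FROZEN is set, can be sketched in user space. This is a model only: the constants are illustrative, the task fields stand in for the kernel's signal bookkeeping, and the FROZEN check is the proposed addition, not existing kernel behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values, not the kernel's actual flag bits. */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_WAKEKILL		0x0100
#define JOBCTL_PTRACE_FROZEN	(1UL << 24)

struct task {
	unsigned long jobctl;
	bool signal_pending;		/* models signal_pending(p) */
	bool fatal_signal_pending;	/* models __fatal_signal_pending(p) */
};

/*
 * Model of signal_pending_state() with the proposed extra check:
 * while the tracee is frozen by ptrace, pretend no signal is pending
 * so schedule() always sleeps.
 */
static bool signal_pending_state(unsigned int state, const struct task *t)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (t->jobctl & JOBCTL_PTRACE_FROZEN)	/* proposed addition */
		return false;
	if (!t->signal_pending)
		return false;
	return (state & TASK_INTERRUPTIBLE) || t->fatal_signal_pending;
}
```

Under this model a frozen tracee with a pending SIGKILL still sleeps in schedule(), which is the property the series wants, while TASK_WAKEKILL could stay part of TASK_TRACED.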



> Peter's plans to fix PREEMPT_RT or the freezer wait_task_inactive needs
> to cope with the final being changed by something else. (TASK_FROZEN in
> the freezer case).  I can only see that happening by removing the
> dependency on the final state in wait_task_inactive.  Which we can't do
> if we depend on wait_task_inactive failing if the process is in the
> wrong state.

OK, I guess this is what I do not understand. Could you spell it out, please?

And speaking of RT, wait_task_inactive() still can fail because
cgroup_enter_frozen() takes css_set_lock? And it is called under
preempt_disable() ? I don't understand the plan :/

> At a practical level I think it also has an impact on patch:
> "10/12 ptrace: Only return signr from ptrace_stop if it was provided".

I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me;
I am not sure it is worth the trouble.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
  2022-05-02 14:37             ` Oleg Nesterov
  (?)
@ 2022-05-03 19:36               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-03 19:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, stable, Al Viro

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/29, Eric W. Biederman wrote:
>>
>> Call send_sig_info in PTRACE_KILL instead of ptrace_resume.  It is
>> not safe to call ptrace_resume if the task has not been stopped
>> with ptrace_freeze_traced.
>
> Oh, I was never, never able to understand why we have PTRACE_KILL
> and what it should actually do.
>
> I suggested many times to simply remove it but OK, we probably can't
> do this.

I thought I remembered you suggesting fixing it in some other way.

I took a quick look on codesearch.debian.net and PTRACE_KILL is
definitely in use.  I found uses in gcc-10, firefox-esr_91.8,
llvm_toolchain, and qtwebengine, at which point I stopped looking.


>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
>>  	case PTRACE_KILL:
>>  		if (child->exit_state)	/* already dead */
>>  			return 0;
>> -		return ptrace_resume(child, request, SIGKILL);
>> +		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
>
> Note that currently ptrace(PTRACE_KILL) can never fail (yes, yes, it
> is unsafe), but send_sig_info() can. If we do not remove PTRACE_KILL,
> then I'd suggest
>
> 	case PTRACE_KILL:
> 		if (!child->exit_state)
> 			send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
> 		return 0;
>
> to make this change a bit more compatible.


Quite.  The only failure I can find from send_sig_info() is when
lock_task_sighand() fails, and PTRACE_KILL deliberately ignores errors
when the target task has exited.

 	case PTRACE_KILL:
 		send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
 		return 0;

I think that should suffice.
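The compatibility point being settled here — PTRACE_KILL historically never fails, so the send_sig_info() result must be dropped — can be modeled in user space. This is a sketch: the struct and the stubbed send_sig_info() (which fails once the task has exited, standing in for lock_task_sighand() failing) are mocked, not kernel code.

```c
#include <assert.h>
#include <errno.h>

struct task {
	int exit_state;		/* nonzero once the task has exited */
	int got_sigkill;	/* records delivery for the test */
};

/* Stub: delivery fails after exit, as lock_task_sighand() would. */
static int send_sig_info(struct task *t)
{
	if (t->exit_state)
		return -ESRCH;
	t->got_sigkill = 1;
	return 0;
}

/*
 * Model of the suggested PTRACE_KILL handling: attempt delivery,
 * deliberately ignore any error, and always report success so the
 * historical "cannot fail" behavior is preserved.
 */
static long ptrace_kill(struct task *child)
{
	send_sig_info(child);	/* error intentionally ignored */
	return 0;
}
```

Whether the child is alive or already dead, the caller sees 0, matching the old interface contract.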


> Also, please remove the note about PTRACE_KILL in
> set_task_blockstep().

Good catch, thank you.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
@ 2022-05-03 19:36               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-03 19:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, stable, Al Viro

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/29, Eric W. Biederman wrote:
>>
>> Call send_sig_info in PTRACE_KILL instead of ptrace_resume.  Calling
>> ptrace_resume is not safe to call if the task has not been stopped
>> with ptrace_freeze_traced.
>
> Oh, I was never, never able to understand why do we have PTRACE_KILL
> and what should it actually do.
>
> I suggested many times to simply remove it but OK, we probably can't
> do this.

I thought I remembered you suggesting fixing it in some other way.

I took at quick look in codesearch.debian.net and PTRACE_KILL is
definitely in use. I find uses in gcc-10, firefox-esr_91.8,
llvm_toolchain, qtwebengine.  At which point I stopped looking.


>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
>>  	case PTRACE_KILL:
>>  		if (child->exit_state)	/* already dead */
>>  			return 0;
>> -		return ptrace_resume(child, request, SIGKILL);
>> +		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
>
> Note that currently ptrace(PTRACE_KILL) can never fail (yes, yes, it
> is unsafe), but send_sig_info() can. If we do not remove PTRACE_KILL,
> then I'd suggest
>
> 	case PTRACE_KILL:
> 		if (!child->exit_state)
> 			send_sig_info(SIGKILL);
> 		return 0;
>
> to make this change a bit more compatible.


Quite.  The only failure I can find from send_sig_info is if
lock_task_sighand fails and PTRACE_KILL is deliberately ignoring errors
when the target task has exited.

 	case PTRACE_KILL:
 		send_sig_info(SIGKILL);
 		return 0;

I think that should suffice.


> Also, please remove the note about PTRACE_KILL in
> set_task_blockstep().

Good catch, thank you.

Eric

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
@ 2022-05-03 19:36               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-03 19:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, stable, Al Viro

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/29, Eric W. Biederman wrote:
>>
>> Call send_sig_info in PTRACE_KILL instead of ptrace_resume.  Calling
>> ptrace_resume is not safe to call if the task has not been stopped
>> with ptrace_freeze_traced.
>
> Oh, I was never, never able to understand why do we have PTRACE_KILL
> and what should it actually do.
>
> I suggested many times to simply remove it but OK, we probably can't
> do this.

I thought I remembered you suggesting fixing it in some other way.

I took a quick look on codesearch.debian.net and PTRACE_KILL is
definitely in use.  I found uses in gcc-10, firefox-esr_91.8,
llvm_toolchain, and qtwebengine, at which point I stopped looking.


>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
>>  	case PTRACE_KILL:
>>  		if (child->exit_state)	/* already dead */
>>  			return 0;
>> -		return ptrace_resume(child, request, SIGKILL);
>> +		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
>
> Note that currently ptrace(PTRACE_KILL) can never fail (yes, yes, it
> is unsafe), but send_sig_info() can. If we do not remove PTRACE_KILL,
> then I'd suggest
>
> 	case PTRACE_KILL:
> 		if (!child->exit_state)
> 			send_sig_info(SIGKILL);
> 		return 0;
>
> to make this change a bit more compatible.


Quite.  The only failure I can find from send_sig_info is when
lock_task_sighand fails, and PTRACE_KILL deliberately ignores errors
when the target task has exited.

 	case PTRACE_KILL:
 		send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
 		return 0;

I think that should suffice.


> Also, please remove the note about PTRACE_KILL in
> set_task_blockstep().

Good catch, thank you.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-03 13:41                 ` Oleg Nesterov
  (?)
@ 2022-05-03 20:45                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-03 20:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/02, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> >>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>> >>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
>> >> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
>> >> +#define TASK_TRACED			__TASK_TRACED
>> > ...
>> >>  static inline void signal_wake_up(struct task_struct *t, bool resume)
>> >>  {
>> >> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> >> +	unsigned int state = 0;
>> >> +	if (resume) {
>> >> +		state = TASK_WAKEKILL;
>> >> +		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
>> >> +			state |= __TASK_TRACED;
>> >> +	}
>> >> +	signal_wake_up_state(t, state);
>> >
>> > Can't understand why is this better than the previous version which removed
>> > TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
>> > look at the next patches yet.
>>
>> The goal is to replace the existing mechanism with an equivalent one,
>> so that we don't have to be clever and deal with it being slightly
>> different in one case.
>>
>> The difference is how does signal_pending_state affect how schedule will
>> sleep in ptrace_stop.
>
> But why is it bad if the tracee doesn't sleep in schedule ? If it races
> with SIGKILL. I still can't understand this.
>
> Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
> in 11/12.


>
> Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
> *signal_wake_up() better?

Not changing __state is better because it removes special cases
from the scheduler that only apply to ptrace.


> And even if we need to ensure the tracee will always block after
> ptrace_freeze_traced(), we can change signal_pending_state() to
> return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
> looks unnecessary to me.

We still need to change signal_wake_up in that case, and possibly
signal_wake_up_state.  The choice for fatal signals is whether
TASK_WAKEKILL is suppressed or TASK_TRACED is added.

With TASK_WAKEKILL removed, the resulting code behaves in an obvious,
minimally special-cased way.  Yes, there is a special case in
signal_wake_up, but that is the entirety of the special case, and it is
easy to read and see what it does.

>> Peter's plans to fix PREEMPT_RT or the freezer wait_task_inactive needs
>> to cope with the final being changed by something else. (TASK_FROZEN in
>> the freezer case).  I can only see that happening by removing the
>> dependency on the final state in wait_task_inactive.  Which we can't do
>> if we depend on wait_task_inactive failing if the process is in the
>> wrong state.
>
> OK, I guess this is what I do not understand. Could you spell please?
>
> And speaking of RT, wait_task_inactive() still can fail because
> cgroup_enter_frozen() takes css_set_lock? And it is called under
> preempt_disable() ? I don't understand the plan :/

Let me describe his freezer change, as that is a much easier way to get
to the final result.  RT has more problems, as it turns all spin locks
into sleeping locks.  When a task is frozen, the freezer turns its
sleeping state into TASK_FROZEN; that is, TASK_STOPPED and TASK_TRACED
become TASK_FROZEN.  If this races with ptrace_check_attach,
wait_task_inactive fails because the process state has changed.  This
makes the freezer visible to userspace.

For ordinary tasks the freezer thaws them just by giving them a spurious
wake-up, after which they check their conditions and go back to sleep
on their own.  For TASK_STOPPED and TASK_TRACED (which can't handle
spurious wake-ups) the __state value is recovered from task->jobctl.

For RT, cgroup_enter_frozen needs fixes that no one has proposed yet.
The problem is that the "preempt_disable()" before
"read_unlock(&tasklist_lock)" is not something that can reasonably be
removed; removing it would cause a performance regression.

So my plan is to get things working as far as Peter's freezer change.
That cleans up the code and brings ptrace much closer to working on
PREEMPT_RT, which makes the problems left for the PREEMPT_RT folks much
smaller.


>> At a practical level I think it also has an impact on patch:
>> "10/12 ptrace: Only return signr from ptrace_stop if it was provided".
>
> I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
> I mean, I am not sure it worth the trouble.

The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
- stopping in ptrace_report_syscall.
- Not having PT_TRACESYSGOOD set.
- The tracee being killed with a fatal signal.
- The tracee sending SIGTRAP to itself.

The larger problem solved by the JOBCTL_PTRACE_SIGNR patch is that
it removes the need for current->ptrace test from ptrace_stop.  Which
in turn is part of what is needed for wait_task_inactive to be
guaranteed a stop in ptrace_stop.


Thinking about it, I think a reasonable case can be made that it is
weird, if not dangerous, to play with the task fields (ptrace_message,
last_siginfo, and exit_code) without task_is_traced being true.  So I
will adjust my patch to check that.  The difference in behavior is
explicit enough that we can think about it easily.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-03 20:45                   ` Eric W. Biederman
  (?)
@ 2022-05-04 14:02                     ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-04 14:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
> > with SIGKILL. I still can't understand this.
> >
> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
> > in 11/12.
>
> >
> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
> > *signal_wake_up() better?
>
> Not changing __state is better because it removes special cases
> from the scheduler that only apply to ptrace.

Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.

I meant, I do not think that removing KILLABLE from TASK_TRACED (not
from __state) and complicating *signal_wake_up() (compared to your
previous version) is a good idea.

And, at least in the context of this series, it is fine if the
JOBCTL_TASK_FROZEN tracee does not block in schedule(); you just need
to remove the WARN_ON_ONCE() around wait_task_inactive().

> > And even if we need to ensure the tracee will always block after
> > ptrace_freeze_traced(), we can change signal_pending_state() to
> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
> > looks unnecessary to me.
>
> We still need to change signal_wake_up in that case.  Possibly
> signal_wake_up_state.

Of course. See above.

> >> if we depend on wait_task_inactive failing if the process is in the
> >> wrong state.
> >
> > OK, I guess this is what I do not understand. Could you spell please?
> >
> > And speaking of RT, wait_task_inactive() still can fail because
> > cgroup_enter_frozen() takes css_set_lock? And it is called under
> > preempt_disable() ? I don't understand the plan :/
>
> Let me describe his freezer change as that is much easier to get to the
> final result.  RT has more problems as it turns all spin locks into
> sleeping locks.  When a task is frozen

[...snip...]

Oh, thanks Eric, but I understand this part. But I still can't understand
why it is that critical to block in schedule... OK, I need to think about
it. Let's assume this is really necessary.

Anyway. I'd suggest to not change TASK_TRACED in this series and not
complicate signal_wake_up() more than you did in your previous version:

	static inline void signal_wake_up(struct task_struct *t, bool resume)
	{
		bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
		signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
	}

JOBCTL_PTRACE_FROZEN is fine.

ptrace_check_attach() can do

	if (!ret && !ignore_state &&
	    /*
	     * This can only fail if the frozen tracee races with
	     * SIGKILL and enters schedule() with fatal_signal_pending
	     */
	    !wait_task_inactive(child, __TASK_TRACED))
		ret = -ESRCH;

	return ret;


Now. If/when we really need to ensure that the frozen tracee always
blocks and wait_task_inactive() never fails, we can just do

	- add the fatal_signal_pending() check into ptrace_stop()
	  (like this patch does)

	- say, change signal_pending_state:

	static inline int signal_pending_state(unsigned int state, struct task_struct *p)
	{
		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
			return 0;
		if (!signal_pending(p))
			return 0;
		if (p->jobctl & JOBCTL_TASK_FROZEN)
			return 0;
		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
	}

in a separate patch which should carefully document the need for this
change.

> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
> > I mean, I am not sure it worth the trouble.
>
> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
> - stopping in ptrace_report_syscall.
> - Not having PT_TRACESYSGOOD set.
> - The tracee being killed with a fatal signal
        ^^^^^^
        tracer ?
> - The tracee sending SIGTRAP to itself.

Oh, but this is clear. But do we really care? If the tracer exits
unexpectedly, the tracee can have a lot more problems; I don't think
this particular one is that important.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
@ 2022-05-04 14:02                     ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-04 14:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
> > with SIGKILL. I still can't understand this.
> >
> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
> > in 11/12.
>
> >
> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
> > *signal_wake_up() better?
>
> Not changing __state is better because it removes special cases
> from the scheduler that only apply to ptrace.

Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.

I meant, I do not think that removing KILLABLE from TASK_TRACED (not
from __state) and complicating *signal_wake_up() (I mean, compared
to your previous version) is a good idea.

And. At least in context of this series it is fine if the JOBCTL_TASK_FROZEN
tracee do not block in schedule(), just you need to remove WARN_ON_ONCE()
around wait_task_inactive().

> > And even if we need to ensure the tracee will always block after
> > ptrace_freeze_traced(), we can change signal_pending_state() to
> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
> > looks unnecessary to me.
>
> We still need to change signal_wake_up in that case.  Possibly
> signal_wake_up_state.

Of course. See above.

> >> if we depend on wait_task_inactive failing if the process is in the
> >> wrong state.
> >
> > OK, I guess this is what I do not understand. Could you spell please?
> >
> > And speaking of RT, wait_task_inactive() still can fail because
> > cgroup_enter_frozen() takes css_set_lock? And it is called under
> > preempt_disable() ? I don't understand the plan :/
>
> Let me describe his freezer change as that is much easier to get to the
> final result.  RT has more problems as it turns all spin locks into
> sleeping locks.  When a task is frozen

[...snip...]

Oh, thanks Eric, but I understand this part. But I still can't understand
why is it that critical to block in schedule... OK, I need to think about
it. Lets assume this is really necessary.

Anyway. I'd suggest to not change TASK_TRACED in this series and not
complicate signal_wake_up() more than you did in your previous version:

	static inline void signal_wake_up(struct task_struct *t, bool resume)
	{
		bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
		signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
	}

JOBCTL_PTRACE_FROZEN is fine.

ptrace_check_attach() can do

	if (!ret && !ignore_state &&
	    /*
	     * This can only fail if the frozen tracee races with
	     * SIGKILL and enters schedule() with fatal_signal_pending
	     */
	    !wait_task_inactive(child, __TASK_TRACED))
		ret = -ESRCH;

	return ret;


Now. If/when we really need to ensure that the frozen tracee always
blocks and wait_task_inactive() never fails, we can just do

	- add the fatal_signal_pending() check into ptrace_stop()
	  (like this patch does)

	- say, change signal_pending_state:

	static inline int signal_pending_state(unsigned int state, struct task_struct *p)
	{
		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
			return 0;
		if (!signal_pending(p))
			return 0;
		if (p->jobctl & JOBCTL_TASK_FROZEN)
			return 0;
		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
	}

in a separate patch which should carefully document the need for this
change.

> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
> > I mean, I am not sure it worth the trouble.
>
> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
> - stopping in ptrace_report_syscall.
> - Not having PT_TRACESYSGOOD set.
> - The tracee being killed with a fatal signal
        ^^^^^^
        tracer ?
> - The tracee sending SIGTRAP to itself.

Oh, but this is clear. But do we really care? If the tracer exits
unexpectedly, the tracee can have a lot more problems, I don't think
that this particular one is that important.

Oleg.


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
@ 2022-05-04 14:02                     ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-04 14:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
> > with SIGKILL. I still can't understand this.
> >
> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
> > in 11/12.
>
> >
> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
> > *signal_wake_up() better?
>
> Not changing __state is better because it removes special cases
> from the scheduler that only apply to ptrace.

Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.

I meant, I do not think that removing KILLABLE from TASK_TRACED (not
from __state) and complicating *signal_wake_up() (I mean, compared
to your previous version) is a good idea.

And. At least in context of this series it is fine if the JOBCTL_TASK_FROZEN
tracee do not block in schedule(), just you need to remove WARN_ON_ONCE()
around wait_task_inactive().

> > And even if we need to ensure the tracee will always block after
> > ptrace_freeze_traced(), we can change signal_pending_state() to
> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
> > looks unnecessary to me.
>
> We still need to change signal_wake_up in that case.  Possibly
> signal_wake_up_state.

Of course. See above.

> >> if we depend on wait_task_inactive failing if the process is in the
> >> wrong state.
> >
> > OK, I guess this is what I do not understand. Could you spell please?
> >
> > And speaking of RT, wait_task_inactive() still can fail because
> > cgroup_enter_frozen() takes css_set_lock? And it is called under
> > preempt_disable() ? I don't understand the plan :/
>
> Let me describe his freezer change as that is much easier to get to the
> final result.  RT has more problems as it turns all spin locks into
> sleeping locks.  When a task is frozen

[...snip...]

Oh, thanks Eric, but I understand this part. But I still can't understand
why is it that critical to block in schedule... OK, I need to think about
it. Lets assume this is really necessary.

Anyway. I'd suggest to not change TASK_TRACED in this series and not
complicate signal_wake_up() more than you did in your previous version:

	static inline void signal_wake_up(struct task_struct *t, bool resume)
	{
		bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
		signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
	}

JOBCTL_PTRACE_FROZEN is fine.

ptrace_check_attach() can do

	if (!ret && !ignore_state &&
	    /*
	     * This can only fail if the frozen tracee races with
	     * SIGKILL and enters schedule() with fatal_signal_pending
	     */
	    !wait_task_inactive(child, __TASK_TRACED))
		ret = -ESRCH;

	return ret;


Now. If/when we really need to ensure that the frozen tracee always
blocks and wait_task_inactive() never fails, we can just do

	- add the fatal_signal_pending() check into ptrace_stop()
	  (like this patch does)

	- say, change signal_pending_state:

	static inline int signal_pending_state(unsigned int state, struct task_struct *p)
	{
		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
			return 0;
		if (!signal_pending(p))
			return 0;
		if (p->jobctl & JOBCTL_TASK_FROZEN)
			return 0;
		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
	}

in a separate patch which should carefully document the need for this
change.

> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
> > I mean, I am not sure it worth the trouble.
>
> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
> - stopping in ptrace_report_syscall.
> - Not having PT_TRACESYSGOOD set.
> - The tracee being killed with a fatal signal
        ^^^^^^
        tracer ?
> - The tracee sending SIGTRAP to itself.

Oh, but this is clear. But do we really care? If the tracer exits
unexpectedly, the tracee can have a lot more problems; I don't think
that this particular one is that important.

Oleg.

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-04 14:02                     ` Oleg Nesterov
  (?)
@ 2022-05-04 17:37                       ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 17:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/03, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
>> > with SIGKILL. I still can't understand this.
>> >
>> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
>> > in 11/12.
>>
>> >
>> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
>> > *signal_wake_up() better?
>>
>> Not changing __state is better because it removes special cases
>> from the scheduler that only apply to ptrace.
>
> Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.
>
> I meant, I do not think that removing KILLABLE from TASK_TRACED (not
> from __state) and complicating *signal_wake_up() (I mean, compared
> to your previous version) is a good idea.
>
> And. At least in the context of this series it is fine if the JOBCTL_TASK_FROZEN
> tracee does not block in schedule(); you just need to remove the WARN_ON_ONCE()
> around wait_task_inactive().
>
>> > And even if we need to ensure the tracee will always block after
>> > ptrace_freeze_traced(), we can change signal_pending_state() to
>> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
>> > looks unnecessary to me.
>>
>> We still need to change signal_wake_up in that case.  Possibly
>> signal_wake_up_state.
>
> Of course. See above.
>
>> >> if we depend on wait_task_inactive failing if the process is in the
>> >> wrong state.
>> >
>> > OK, I guess this is what I do not understand. Could you spell please?
>> >
>> > And speaking of RT, wait_task_inactive() still can fail because
>> > cgroup_enter_frozen() takes css_set_lock? And it is called under
>> > preempt_disable() ? I don't understand the plan :/
>>
>> Let me describe his freezer change as that is much easier to get to the
>> final result.  RT has more problems as it turns all spin locks into
>> sleeping locks.  When a task is frozen
>
> [...snip...]
>
> Oh, thanks Eric, but I understand this part. But I still can't understand
> why it is that critical to block in schedule... OK, I need to think about
> it. Let's assume this is really necessary.
>
> Anyway. I'd suggest not changing TASK_TRACED in this series and not
> complicating signal_wake_up() more than you did in your previous version:
>
> 	static inline void signal_wake_up(struct task_struct *t, bool resume)
> 	{
> 		bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> 		signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
> 	}

If your concern is signal_wake_up there is no reason it can't be:

	static inline void signal_wake_up(struct task_struct *t, bool fatal)
	{
		fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
		signal_wake_up_state(t, fatal ? TASK_WAKEKILL | TASK_TRACED : 0);
	}

I guess I was more targeted in this version, which led to more if
statements, but as there is only one place in the code that can be both
JOBCTL_PTRACE_FROZEN and TASK_TRACED, there is no point in setting
TASK_WAKEKILL without also setting TASK_TRACED in the wake-up.

So yes. I can make the code as simple as my earlier version of
signal_wake_up.
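
Sketched in userspace (again with mock constants and a mock one-field
task_struct; the values are arbitrary), the simplified version masks
the whole fatal wake-up while the tracee is frozen:

```c
#include <assert.h>

/* Mock stand-ins for the kernel definitions; the values are arbitrary. */
#define TASK_TRACED		0x0008u
#define TASK_WAKEKILL		0x0100u
#define JOBCTL_PTRACE_FROZEN	(1UL << 24)

struct task_struct {
	unsigned long jobctl;
};

/*
 * A fatal wake-up targets TASK_WAKEKILL | TASK_TRACED; while the
 * tracee is frozen it degrades to a plain kick (state 0).
 */
static unsigned int wake_state(const struct task_struct *t, int fatal)
{
	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN))
		return TASK_WAKEKILL | TASK_TRACED;
	return 0;
}
```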

> JOBCTL_PTRACE_FROZEN is fine.
>
> ptrace_check_attach() can do
>
> 	if (!ret && !ignore_state &&
> 	    /*
> 	     * This can only fail if the frozen tracee races with
> 	     * SIGKILL and enters schedule() with fatal_signal_pending
> 	     */
> 	    !wait_task_inactive(child, __TASK_TRACED))
> 		ret = -ESRCH;
>
> 	return ret;
>
>
> Now. If/when we really need to ensure that the frozen tracee always
> blocks and wait_task_inactive() never fails, we can just do
>
> 	- add the fatal_signal_pending() check into ptrace_stop()
> 	  (like this patch does)
>
> 	- say, change signal_pending_state:
>
> 	static inline int signal_pending_state(unsigned int state, struct task_struct *p)
> 	{
> 		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
> 			return 0;
> 		if (!signal_pending(p))
> 			return 0;
> 		if (p->jobctl & JOBCTL_TASK_FROZEN)
> 			return 0;
> 		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
> 	}
>
> in a separate patch which should carefully document the need for this
> change.
>
>> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
>> > I mean, I am not sure it is worth the trouble.
>>
>> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
>> - stopping in ptrace_report_syscall.
>> - Not having PT_TRACESYSGOOD set.
>> - The tracee being killed with a fatal signal
>         ^^^^^^
>         tracer ?

Both actually.

>> - The tracee sending SIGTRAP to itself.
>
> Oh, but this is clear. But do we really care? If the tracer exits
> unexpectedly, the tracee can have a lot more problems, I don't think
> that this particular one is that important.

I don't know of complaints, and if you haven't heard them either
then that is a good indication that in practice we don't care.

At a practical level I just don't want that silly case that sets
TASK_TRACED to TASK_RUNNING without stopping at all in ptrace_stop to
remain.  It just seems to make everything more complicated for no real
reason anymore.  The deadlocks may_ptrace_stop was guarding against are
gone.

Plus the test is so racy the case can happen after we drop siglock and
before we schedule, or shortly after we have stopped, so we really
don't reliably catch the condition the code is trying to catch.

I think the case I care most about is ptrace_signal, which pretty much
requires the tracer to wait and clear exit_code before being terminated
to cause problems.  We don't handle that at all today.

So yeah.  I think the code handles so little at this point we can just
remove it and simplify things; if we actually care we can come back
and implement JOBCTL_PTRACE_SIGNR or the like.

I will chew on that a bit and see if I can find any reasons for keeping
the code in ptrace_stop at all.



As an added data point we can probably remove handling of the signal
from ptrace_report_syscall entirely (not in this patchset!).

From a quick skim, sending a signal in ptrace_report_syscall appears to
be a feature introduced with ptrace support in Linux v1.0, and the
comment in ptrace_report_syscall suggests that the code has always
been dead.


I made it through 13 of 133 pages of Debian code search results for
PTRACE_SYSCALL, and the only use I could find of setting the continue
signal was when the signal reported from wait was not SIGTRAP.  Exactly
the same as in the comment in ptrace_report_syscall.

If that pattern holds for all of the uses of ptrace then the code
in ptrace_report_syscall is dead.



Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread
