* [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite
@ 2022-04-21 15:02 Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Find here a new posting of the ptrace and freezer patches :-)

The majority of the changes are in patch 2, which has changed a lot
following extensive feedback from Oleg and Eric.

I'm hoping we're converging on something agreeable.

---
 drivers/acpi/x86/s2idle.c         |  12 +-
 drivers/android/binder.c          |   4 +-
 drivers/media/pci/pt3/pt3.c       |   4 +-
 drivers/scsi/scsi_transport_spi.c |   7 +-
 fs/cifs/inode.c                   |   4 +-
 fs/cifs/transport.c               |   5 +-
 fs/coredump.c                     |   5 +-
 fs/nfs/file.c                     |   3 +-
 fs/nfs/inode.c                    |  12 +-
 fs/nfs/nfs3proc.c                 |   3 +-
 fs/nfs/nfs4proc.c                 |  14 +--
 fs/nfs/nfs4state.c                |   3 +-
 fs/nfs/pnfs.c                     |   4 +-
 fs/xfs/xfs_trans_ail.c            |   8 +-
 include/linux/completion.h        |   1 +
 include/linux/freezer.h           | 244 ++------------------------------------
 include/linux/sched.h             |  49 ++++----
 include/linux/sched/jobctl.h      |  10 ++
 include/linux/sched/signal.h      |   5 +-
 include/linux/sunrpc/sched.h      |   7 +-
 include/linux/suspend.h           |   8 +-
 include/linux/umh.h               |   9 +-
 include/linux/wait.h              |  40 ++++++-
 init/do_mounts_initrd.c           |  10 +-
 kernel/cgroup/legacy_freezer.c    |  23 ++--
 kernel/exit.c                     |   4 +-
 kernel/fork.c                     |   5 +-
 kernel/freezer.c                  | 137 +++++++++++++++------
 kernel/futex/waitwake.c           |   8 +-
 kernel/hung_task.c                |   4 +-
 kernel/power/hibernate.c          |  35 ++++--
 kernel/power/main.c               |  18 +--
 kernel/power/process.c            |  10 +-
 kernel/power/suspend.c            |  12 +-
 kernel/power/user.c               |  24 ++--
 kernel/ptrace.c                   | 114 ++++++++++--------
 kernel/sched/completion.c         |   9 ++
 kernel/sched/core.c               |  24 ++--
 kernel/signal.c                   |  62 +++++++---
 kernel/time/hrtimer.c             |   4 +-
 kernel/umh.c                      |  18 ++-
 mm/khugepaged.c                   |   4 +-
 net/sunrpc/sched.c                |  12 +-
 net/unix/af_unix.c                |   8 +-
 44 files changed, 478 insertions(+), 528 deletions(-)



* [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-26 23:34   ` Eric W. Biederman
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two problems with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.
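
As a rough illustration of the idea (a userspace toy model, not kernel
code; every model_* / MODEL_* name is hypothetical and only the bit
positions are borrowed from the jobctl.h hunk below), keeping the
stopped/traced information in a separate bitmask means it survives
something else overwriting the state word:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for JOBCTL_STOPPED / JOBCTL_TRACED. */
#define MODEL_JOBCTL_STOPPED	(1UL << 24)
#define MODEL_JOBCTL_TRACED	(1UL << 25)

/* Simplified stand-in for the values task->__state can take. */
enum model_state { MODEL_RUNNING, MODEL_STOPPED, MODEL_TRACED, MODEL_FROZEN };

struct model_task {
	enum model_state state;	/* models task->__state */
	unsigned long jobctl;	/* models task->jobctl */
};

/* Models ptrace_stop(): record the fact in jobctl, then change state. */
void model_ptrace_stop(struct model_task *t)
{
	t->jobctl |= MODEL_JOBCTL_TRACED;
	t->state = MODEL_TRACED;
}

/* Models do_signal_stop(): same pattern for the stopped case. */
void model_signal_stop(struct model_task *t)
{
	t->jobctl |= MODEL_JOBCTL_STOPPED;
	t->state = MODEL_STOPPED;
}

/* Models a freezer that uses its own special state and clobbers ->state. */
void model_freeze(struct model_task *t)
{
	t->state = MODEL_FROZEN;
}

/* Models a wakeup: clear both bits and make the task runnable again. */
void model_resume(struct model_task *t)
{
	t->jobctl &= ~(MODEL_JOBCTL_STOPPED | MODEL_JOBCTL_TRACED);
	t->state = MODEL_RUNNING;
}

/* The predicates look only at jobctl, never at state. */
bool model_task_is_traced(const struct model_task *t)
{
	return (t->jobctl & MODEL_JOBCTL_TRACED) != 0;
}

bool model_task_is_stopped(const struct model_task *t)
{
	return (t->jobctl & MODEL_JOBCTL_STOPPED) != 0;
}
```

In this model, calling model_ptrace_stop() and then model_freeze()
leaves model_task_is_traced() true even though the state word no
longer says "traced". The real code additionally needs siglock to
serialise the jobctl updates, which the sketch ignores.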

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched.h        |    8 +++-----
 include/linux/sched/jobctl.h |    6 ++++++
 include/linux/sched/signal.h |    5 ++++-
 kernel/ptrace.c              |   26 +++++++++++++++-----------
 kernel/signal.c              |   16 ++++++++++++----
 5 files changed, 40 insertions(+), 21 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -20,6 +20,9 @@ struct task_struct;
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 
+#define JOBCTL_STOPPED_BIT	24	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	25	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -29,6 +32,9 @@ struct task_struct;
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -439,6 +441,7 @@ static inline void signal_wake_up(struct
 {
 	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
 }
+
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
 	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(st
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -218,9 +223,10 @@ static void ptrace_unfreeze_traced(struc
 	 */
 	spin_lock_irq(&task->sighand->siglock);
 	if (READ_ONCE(task->__state) == __TASK_TRACED) {
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
-		else
+		} else
 			WRITE_ONCE(task->__state, TASK_TRACED);
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -475,8 +481,10 @@ static int ptrace_attach(struct task_str
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -850,8 +858,6 @@ static long ptrace_get_rseq_configuratio
 static int ptrace_resume(struct task_struct *child, long request,
 			 unsigned long data)
 {
-	bool need_siglock;
-
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -892,13 +898,11 @@ static int ptrace_resume(struct task_str
 	 * status and clears the code too; this can't race with the tracee, it
 	 * takes siglock after resume.
 	 */
-	need_siglock = data && !thread_group_empty(current);
-	if (need_siglock)
-		spin_lock_irq(&child->sighand->siglock);
+	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
-	if (need_siglock)
-		spin_unlock_irq(&child->sighand->siglock);
+	spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(ke
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -770,7 +773,9 @@ void signal_wake_up_state(struct task_st
 	 * By using wake_up_state, we ensure the process will wake up and
 	 * handle its death signal.
 	 */
-	if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
+	if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+	else
 		kick_process(t);
 }
 
@@ -884,7 +889,7 @@ static int check_kill_permission(int sig
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -930,9 +935,10 @@ static bool prepare_signal(int sig, stru
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2219,6 +2225,7 @@ static int ptrace_stop(int exit_code, in
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
 	 */
+	current->jobctl |= JOBCTL_TRACED;
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2460,6 +2467,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 




* [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-21 18:23   ` Oleg Nesterov
                     ` (4 more replies)
  2022-04-21 15:02 ` [PATCH v2 3/5] freezer: Have {,un}lock_system_sleep() save/restore flags Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 5 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
task->__state as much.

Due to how PREEMPT_RT is changing the rules vs task->__state with the
introduction of task->saved_state while TASK_RTLOCK_WAIT (the whole
blocking spinlock thing), the way ptrace freeze tries to do things no
longer works.

Specifically there are two problems:

 - due to ->saved_state, the ->__state modification removing
   TASK_WAKEKILL no longer works reliably.

 - due to ->saved_state, wait_task_inactive() also no longer works
   reliably.

The first problem is solved by a suggestion from Eric that instead
of changing __state, TASK_WAKEKILL be delayed.
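
A minimal userspace sketch of that delayed-wakekill idea follows. It is
a toy model under stated assumptions: the model_* / MODEL_* names are
hypothetical, the state values are only illustrative, and there is no
locking. The point it tries to show is that the tracee keeps its full
TASK_TRACED-like state, and the waker strips the killable bit from the
wake mask while the delay flag is set:

```c
/* Illustrative state bits; hypothetical stand-ins for the kernel's. */
#define MODEL_TASK_INTERRUPTIBLE	0x001u
#define MODEL___TASK_TRACED		0x008u
#define MODEL_TASK_WAKEKILL		0x100u
#define MODEL_TASK_TRACED	(MODEL_TASK_WAKEKILL | MODEL___TASK_TRACED)

/* Hypothetical stand-in for JOBCTL_DELAY_WAKEKILL. */
#define MODEL_JOBCTL_DELAY_WAKEKILL	(1UL << 24)

struct model_wk_task {
	unsigned int state;	/* models task->__state, 0 == running */
	unsigned long jobctl;	/* models task->jobctl */
};

/* Models wake_up_state(): wake only if the task's state matches the mask. */
int model_wake_up_state(struct model_wk_task *t, unsigned int mask)
{
	if (!(t->state & mask))
		return 0;
	t->state = 0;
	return 1;
}

/*
 * Models signal_wake_up_state() after this patch: while the delay flag
 * is set, a killable wakeup is downgraded so it cannot wake the tracee.
 */
int model_signal_wake_up_state(struct model_wk_task *t, unsigned int state)
{
	if (t->jobctl & MODEL_JOBCTL_DELAY_WAKEKILL)
		state &= ~MODEL_TASK_WAKEKILL;

	return model_wake_up_state(t, state | MODEL_TASK_INTERRUPTIBLE);
}
```

With the flag set, model_signal_wake_up_state(t, MODEL_TASK_WAKEKILL)
leaves a traced task asleep; once the flag is cleared the same call
wakes it. In the patch itself, ptrace_unfreeze_traced() clears the flag
and issues the postponed wakeup if a fatal signal is pending.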

The second problem is solved by a suggestion from Oleg; add
JOBCTL_TRACED_QUIESCE to cover the chunk of code between
set_special_state(TASK_TRACED) and schedule(), such that
ptrace_check_attach() can first wait for JOBCTL_TRACED_QUIESCE to get
cleared, and then use wait_task_inactive().

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched/jobctl.h |    8 ++-
 kernel/ptrace.c              |   90 ++++++++++++++++++++++---------------------
 kernel/sched/core.c          |    5 --
 kernel/signal.c              |   36 ++++++++++++++---
 4 files changed, 86 insertions(+), 53 deletions(-)

--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,9 +19,11 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_DELAY_WAKEKILL_BIT 24	/* delay killable wakeups */
 
-#define JOBCTL_STOPPED_BIT	24	/* do_signal_stop() */
-#define JOBCTL_TRACED_BIT	25	/* ptrace_stop() */
+#define JOBCTL_STOPPED_BIT	25	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	26	/* ptrace_stop() */
+#define JOBCTL_TRACED_QUIESCE_BIT 27
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -31,9 +33,11 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_DELAY_WAKEKILL	(1UL << JOBCTL_DELAY_WAKEKILL_BIT)
 
 #define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
 #define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+#define JOBCTL_TRACED_QUIESCE	(1UL << JOBCTL_TRACED_QUIESCE_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -193,41 +193,44 @@ static bool looks_like_a_spurious_pid(st
  */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
+	unsigned long flags;
 	bool ret = false;
 
 	/* Lockless, nobody but us can set this flag */
 	if (task->jobctl & JOBCTL_LISTENING)
 		return ret;
 
-	spin_lock_irq(&task->sighand->siglock);
-	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
+	if (!lock_task_sighand(task, &flags))
+		return ret;
+
+	if (task_is_traced(task) &&
+	    !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		WARN_ON_ONCE(READ_ONCE(task->__state) != TASK_TRACED);
+		WARN_ON_ONCE(task->jobctl & JOBCTL_DELAY_WAKEKILL);
+		task->jobctl |= JOBCTL_DELAY_WAKEKILL;
 		ret = true;
 	}
-	spin_unlock_irq(&task->sighand->siglock);
+	unlock_task_sighand(task, &flags);
 
 	return ret;
 }
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
+	if (!task_is_traced(task))
 		return;
 
 	WARN_ON(!task->ptrace || task->parent != current);
 
-	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
-	 */
 	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (task_is_traced(task)) {
+//		WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));
+		task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
 		if (__fatal_signal_pending(task)) {
 			task->jobctl &= ~JOBCTL_TRACED;
-			wake_up_state(task, __TASK_TRACED);
-		} else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+			wake_up_state(task, TASK_WAKEKILL);
+		}
 	}
 	spin_unlock_irq(&task->sighand->siglock);
 }
@@ -251,40 +254,45 @@ static void ptrace_unfreeze_traced(struc
  */
 static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 {
-	int ret = -ESRCH;
+	int traced;
 
 	/*
 	 * We take the read lock around doing both checks to close a
-	 * possible race where someone else was tracing our child and
-	 * detached between these two checks.  After this locked check,
-	 * we are sure that this is our traced child and that can only
-	 * be changed by us so it's not changing right after this.
+	 * possible race where someone else attaches or detaches our
+	 * natural child.
 	 */
 	read_lock(&tasklist_lock);
-	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-		/*
-		 * child->sighand can't be NULL, release_task()
-		 * does ptrace_unlink() before __exit_signal().
-		 */
-		if (ignore_state || ptrace_freeze_traced(child))
-			ret = 0;
-	}
+	traced = child->ptrace && child->parent == current;
 	read_unlock(&tasklist_lock);
+	if (!traced)
+		return -ESRCH;
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
+	if (ignore_state)
+		return 0;
+
+	if (!task_is_traced(child))
+		return -ESRCH;
+
+	WARN_ON_ONCE(READ_ONCE(child->jobctl) & JOBCTL_DELAY_WAKEKILL);
+
+	/* Wait for JOBCTL_TRACED_QUIESCE to go away, see ptrace_stop(). */
+	for (;;) {
+		if (fatal_signal_pending(current))
+			return -EINTR;
+
+		set_current_state(TASK_KILLABLE);
+		if (!(READ_ONCE(child->jobctl) & JOBCTL_TRACED_QUIESCE))
+			break;
+
+		schedule();
 	}
+	__set_current_state(TASK_RUNNING);
 
-	return ret;
+	if (!wait_task_inactive(child, TASK_TRACED) ||
+	    !ptrace_freeze_traced(child))
+		return -ESRCH;
+
+	return 0;
 }
 
 static bool ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
@@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
 		goto out_put_task_struct;
 
 	ret = arch_ptrace(child, request, addr, data);
-	if (ret || request != PTRACE_DETACH)
-		ptrace_unfreeze_traced(child);
+	ptrace_unfreeze_traced(child);
 
  out_put_task_struct:
 	put_task_struct(child);
@@ -1472,8 +1479,7 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_lo
 				  request == PTRACE_INTERRUPT);
 	if (!ret) {
 		ret = compat_arch_ptrace(child, request, addr, data);
-		if (ret || request != PTRACE_DETACH)
-			ptrace_unfreeze_traced(child);
+		ptrace_unfreeze_traced(child);
 	}
 
  out_put_task_struct:
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6310,10 +6310,7 @@ static void __sched notrace __schedule(u
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -764,6 +764,10 @@ void signal_wake_up_state(struct task_st
 {
 	lockdep_assert_held(&t->sighand->siglock);
 
+	/* Suppress wakekill? */
+	if (t->jobctl & JOBCTL_DELAY_WAKEKILL)
+		state &= ~TASK_WAKEKILL;
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
 
 	/*
@@ -774,7 +778,7 @@ void signal_wake_up_state(struct task_st
 	 * handle its death signal.
 	 */
 	if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
-		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE);
 	else
 		kick_process(t);
 }
@@ -2187,6 +2191,15 @@ static void do_notify_parent_cldstop(str
 	spin_unlock_irqrestore(&sighand->siglock, flags);
 }
 
+static void clear_traced_quiesce(void)
+{
+	spin_lock_irq(&current->sighand->siglock);
+	WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
+	current->jobctl &= ~JOBCTL_TRACED_QUIESCE;
+	wake_up_state(current->parent, TASK_KILLABLE);
+	spin_unlock_irq(&current->sighand->siglock);
+}
+
 /*
  * This must be called with current->sighand->siglock held.
  *
@@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
 	 */
-	current->jobctl |= JOBCTL_TRACED;
+	current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
 		/*
 		 * Don't want to allow preemption here, because
 		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
 		 */
 		preempt_disable();
 		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
+		cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
+
+		/*
+		 * JOBCTL_TRACED_QUIESCE bridges the gap between
+		 * set_special_state(TASK_TRACED) above and schedule() below.
+		 * There must not be any blocking (specifically anything that
+		 * touched ->saved_state on PREEMPT_RT) between here and
+		 * schedule().
+		 *
+		 * ptrace_check_attach() relies on this with its
+		 * wait_task_inactive() usage.
+		 */
+		clear_traced_quiesce();
+
 		preempt_enable_no_resched();
 		freezable_schedule();
+
 		cgroup_leave_frozen(true);
 	} else {
 		/*
@@ -2335,6 +2360,7 @@ static int ptrace_stop(int exit_code, in
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.




* [PATCH v2 3/5] freezer: Have {,un}lock_system_sleep() save/restore flags
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 4/5] freezer,umh: Clean up freezer/initrd interaction Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Rafael explained that the reason for having both PF_NOFREEZE and
PF_FREEZER_SKIP is that {,un}lock_system_sleep() is callable from
kthread context that has previously called set_freezable().

In preparation for merging the flags, have {,un}lock_system_sleep() save
and restore current->flags.
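
The save/restore pattern can be sketched in userspace roughly as
follows. This is a toy model, not the kernel implementation: the
model_* names and the flag value are hypothetical, a plain global
stands in for current->flags, and the system_transition_mutex
lock/unlock is reduced to comments:

```c
/* Hypothetical stand-in for PF_FREEZER_SKIP; the value is illustrative. */
#define MODEL_PF_FREEZER_SKIP	0x40000000u

/* Models current->flags for a single task. */
unsigned int model_current_flags;

/* Models lock_system_sleep(): hand the previous flags back to the caller. */
unsigned int model_lock_system_sleep(void)
{
	unsigned int flags = model_current_flags;

	model_current_flags |= MODEL_PF_FREEZER_SKIP;
	/* the real function takes system_transition_mutex here */
	return flags;
}

/*
 * Models unlock_system_sleep(): clear the flag only if it was not
 * already set before the matching lock call.
 */
void model_unlock_system_sleep(unsigned int flags)
{
	if (!(flags & MODEL_PF_FREEZER_SKIP))
		model_current_flags &= ~MODEL_PF_FREEZER_SKIP;
	/* the real function drops system_transition_mutex here */
}
```

A caller that entered with the flag already set therefore leaves with
it still set, while an ordinary caller gets it cleared again, which is
the property the converted call sites below rely on.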

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/acpi/x86/s2idle.c         |   12 ++++++++----
 drivers/scsi/scsi_transport_spi.c |    7 ++++---
 include/linux/suspend.h           |    8 ++++----
 kernel/power/hibernate.c          |   35 ++++++++++++++++++++++-------------
 kernel/power/main.c               |   16 ++++++++++------
 kernel/power/suspend.c            |   12 ++++++++----
 kernel/power/user.c               |   24 ++++++++++++++----------
 7 files changed, 70 insertions(+), 44 deletions(-)

--- a/drivers/acpi/x86/s2idle.c
+++ b/drivers/acpi/x86/s2idle.c
@@ -538,12 +538,14 @@ void acpi_s2idle_setup(void)
 
 int acpi_register_lps0_dev(struct acpi_s2idle_dev_ops *arg)
 {
+	unsigned int sleep_flags;
+
 	if (!lps0_device_handle || sleep_no_lps0)
 		return -ENODEV;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	list_add(&arg->list_node, &lps0_s2idle_devops_head);
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return 0;
 }
@@ -551,12 +553,14 @@ EXPORT_SYMBOL_GPL(acpi_register_lps0_dev
 
 void acpi_unregister_lps0_dev(struct acpi_s2idle_dev_ops *arg)
 {
+	unsigned int sleep_flags;
+
 	if (!lps0_device_handle || sleep_no_lps0)
 		return;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	list_del(&arg->list_node);
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 EXPORT_SYMBOL_GPL(acpi_unregister_lps0_dev);
 
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -998,8 +998,9 @@ void
 spi_dv_device(struct scsi_device *sdev)
 {
 	struct scsi_target *starget = sdev->sdev_target;
-	u8 *buffer;
 	const int len = SPI_MAX_ECHO_BUFFER_SIZE*2;
+	unsigned int sleep_flags;
+	u8 *buffer;
 
 	/*
 	 * Because this function and the power management code both call
@@ -1007,7 +1008,7 @@ spi_dv_device(struct scsi_device *sdev)
 	 * while suspend or resume is in progress. Hence the
 	 * lock/unlock_system_sleep() calls.
 	 */
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	if (scsi_autopm_get_device(sdev))
 		goto unlock_system_sleep;
@@ -1058,7 +1059,7 @@ spi_dv_device(struct scsi_device *sdev)
 	scsi_autopm_put_device(sdev);
 
 unlock_system_sleep:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 EXPORT_SYMBOL(spi_dv_device);
 
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -510,8 +510,8 @@ extern bool pm_save_wakeup_count(unsigne
 extern void pm_wakep_autosleep_enabled(bool set);
 extern void pm_print_active_wakeup_sources(void);
 
-extern void lock_system_sleep(void);
-extern void unlock_system_sleep(void);
+extern unsigned int lock_system_sleep(void);
+extern void unlock_system_sleep(unsigned int);
 
 #else /* !CONFIG_PM_SLEEP */
 
@@ -534,8 +534,8 @@ static inline void pm_system_wakeup(void
 static inline void pm_wakeup_clear(bool reset) {}
 static inline void pm_system_irq_wakeup(unsigned int irq_number) {}
 
-static inline void lock_system_sleep(void) {}
-static inline void unlock_system_sleep(void) {}
+static inline unsigned int lock_system_sleep(void) { return 0; }
+static inline void unlock_system_sleep(unsigned int flags) {}
 
 #endif /* !CONFIG_PM_SLEEP */
 
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -92,20 +92,24 @@ bool hibernation_available(void)
  */
 void hibernation_set_ops(const struct platform_hibernation_ops *ops)
 {
+	unsigned int sleep_flags;
+
 	if (ops && !(ops->begin && ops->end &&  ops->pre_snapshot
 	    && ops->prepare && ops->finish && ops->enter && ops->pre_restore
 	    && ops->restore_cleanup && ops->leave)) {
 		WARN_ON(1);
 		return;
 	}
-	lock_system_sleep();
+
+	sleep_flags = lock_system_sleep();
+
 	hibernation_ops = ops;
 	if (ops)
 		hibernation_mode = HIBERNATION_PLATFORM;
 	else if (hibernation_mode == HIBERNATION_PLATFORM)
 		hibernation_mode = HIBERNATION_SHUTDOWN;
 
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 EXPORT_SYMBOL_GPL(hibernation_set_ops);
 
@@ -713,6 +717,7 @@ static int load_image_and_restore(void)
 int hibernate(void)
 {
 	bool snapshot_test = false;
+	unsigned int sleep_flags;
 	int error;
 
 	if (!hibernation_available()) {
@@ -720,7 +725,7 @@ int hibernate(void)
 		return -EPERM;
 	}
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	/* The snapshot device should not be opened while we're running */
 	if (!hibernate_acquire()) {
 		error = -EBUSY;
@@ -794,7 +799,7 @@ int hibernate(void)
 	pm_restore_console();
 	hibernate_release();
  Unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 	pr_info("hibernation exit\n");
 
 	return error;
@@ -809,9 +814,10 @@ int hibernate(void)
  */
 int hibernate_quiet_exec(int (*func)(void *data), void *data)
 {
+	unsigned int sleep_flags;
 	int error;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	if (!hibernate_acquire()) {
 		error = -EBUSY;
@@ -891,7 +897,7 @@ int hibernate_quiet_exec(int (*func)(voi
 	hibernate_release();
 
 unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return error;
 }
@@ -1100,11 +1106,12 @@ static ssize_t disk_show(struct kobject
 static ssize_t disk_store(struct kobject *kobj, struct kobj_attribute *attr,
 			  const char *buf, size_t n)
 {
+	int mode = HIBERNATION_INVALID;
+	unsigned int sleep_flags;
 	int error = 0;
-	int i;
 	int len;
 	char *p;
-	int mode = HIBERNATION_INVALID;
+	int i;
 
 	if (!hibernation_available())
 		return -EPERM;
@@ -1112,7 +1119,7 @@ static ssize_t disk_store(struct kobject
 	p = memchr(buf, '\n', n);
 	len = p ? p - buf : n;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	for (i = HIBERNATION_FIRST; i <= HIBERNATION_MAX; i++) {
 		if (len == strlen(hibernation_modes[i])
 		    && !strncmp(buf, hibernation_modes[i], len)) {
@@ -1142,7 +1149,7 @@ static ssize_t disk_store(struct kobject
 	if (!error)
 		pm_pr_dbg("Hibernation mode set to '%s'\n",
 			       hibernation_modes[mode]);
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 	return error ? error : n;
 }
 
@@ -1158,9 +1165,10 @@ static ssize_t resume_show(struct kobjec
 static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
 			    const char *buf, size_t n)
 {
-	dev_t res;
+	unsigned int sleep_flags;
 	int len = n;
 	char *name;
+	dev_t res;
 
 	if (len && buf[len-1] == '\n')
 		len--;
@@ -1173,9 +1181,10 @@ static ssize_t resume_store(struct kobje
 	if (!res)
 		return -EINVAL;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 	swsusp_resume_device = res;
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
+
 	pm_pr_dbg("Configured hibernation resume from disk to %u\n",
 		  swsusp_resume_device);
 	noresume = 0;
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -21,14 +21,16 @@
 
 #ifdef CONFIG_PM_SLEEP
 
-void lock_system_sleep(void)
+unsigned int lock_system_sleep(void)
 {
+	unsigned int flags = current->flags;
 	current->flags |= PF_FREEZER_SKIP;
 	mutex_lock(&system_transition_mutex);
+	return flags;
 }
 EXPORT_SYMBOL_GPL(lock_system_sleep);
 
-void unlock_system_sleep(void)
+void unlock_system_sleep(unsigned int flags)
 {
 	/*
 	 * Don't use freezer_count() because we don't want the call to
@@ -46,7 +48,8 @@ void unlock_system_sleep(void)
 	 * Which means, if we use try_to_freeze() here, it would make them
 	 * enter the refrigerator, thus causing hibernation to lockup.
 	 */
-	current->flags &= ~PF_FREEZER_SKIP;
+	if (!(flags & PF_FREEZER_SKIP))
+		current->flags &= ~PF_FREEZER_SKIP;
 	mutex_unlock(&system_transition_mutex);
 }
 EXPORT_SYMBOL_GPL(unlock_system_sleep);
@@ -260,16 +263,17 @@ static ssize_t pm_test_show(struct kobje
 static ssize_t pm_test_store(struct kobject *kobj, struct kobj_attribute *attr,
 				const char *buf, size_t n)
 {
+	unsigned int sleep_flags;
 	const char * const *s;
+	int error = -EINVAL;
 	int level;
 	char *p;
 	int len;
-	int error = -EINVAL;
 
 	p = memchr(buf, '\n', n);
 	len = p ? p - buf : n;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	level = TEST_FIRST;
 	for (s = &pm_tests[level]; level <= TEST_MAX; s++, level++)
@@ -279,7 +283,7 @@ static ssize_t pm_test_store(struct kobj
 			break;
 		}
 
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return error ? error : n;
 }
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -75,9 +75,11 @@ EXPORT_SYMBOL_GPL(pm_suspend_default_s2i
 
 void s2idle_set_ops(const struct platform_s2idle_ops *ops)
 {
-	lock_system_sleep();
+	unsigned int sleep_flags;
+
+	sleep_flags = lock_system_sleep();
 	s2idle_ops = ops;
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 
 static void s2idle_begin(void)
@@ -200,7 +202,9 @@ __setup("mem_sleep_default=", mem_sleep_
  */
 void suspend_set_ops(const struct platform_suspend_ops *ops)
 {
-	lock_system_sleep();
+	unsigned int sleep_flags;
+
+	sleep_flags = lock_system_sleep();
 
 	suspend_ops = ops;
 
@@ -216,7 +220,7 @@ void suspend_set_ops(const struct platfo
 			mem_sleep_current = PM_SUSPEND_MEM;
 	}
 
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 }
 EXPORT_SYMBOL_GPL(suspend_set_ops);
 
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -46,12 +46,13 @@ int is_hibernate_resume_dev(dev_t dev)
 static int snapshot_open(struct inode *inode, struct file *filp)
 {
 	struct snapshot_data *data;
+	unsigned int sleep_flags;
 	int error;
 
 	if (!hibernation_available())
 		return -EPERM;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	if (!hibernate_acquire()) {
 		error = -EBUSY;
@@ -97,7 +98,7 @@ static int snapshot_open(struct inode *i
 	data->dev = 0;
 
  Unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return error;
 }
@@ -105,8 +106,9 @@ static int snapshot_open(struct inode *i
 static int snapshot_release(struct inode *inode, struct file *filp)
 {
 	struct snapshot_data *data;
+	unsigned int sleep_flags;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	swsusp_free();
 	data = filp->private_data;
@@ -123,7 +125,7 @@ static int snapshot_release(struct inode
 			PM_POST_HIBERNATION : PM_POST_RESTORE);
 	hibernate_release();
 
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return 0;
 }
@@ -131,11 +133,12 @@ static int snapshot_release(struct inode
 static ssize_t snapshot_read(struct file *filp, char __user *buf,
                              size_t count, loff_t *offp)
 {
+	loff_t pg_offp = *offp & ~PAGE_MASK;
 	struct snapshot_data *data;
+	unsigned int sleep_flags;
 	ssize_t res;
-	loff_t pg_offp = *offp & ~PAGE_MASK;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	data = filp->private_data;
 	if (!data->ready) {
@@ -156,7 +159,7 @@ static ssize_t snapshot_read(struct file
 		*offp += res;
 
  Unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return res;
 }
@@ -164,11 +167,12 @@ static ssize_t snapshot_read(struct file
 static ssize_t snapshot_write(struct file *filp, const char __user *buf,
                               size_t count, loff_t *offp)
 {
+	loff_t pg_offp = *offp & ~PAGE_MASK;
 	struct snapshot_data *data;
+	unsigned int sleep_flags;
 	ssize_t res;
-	loff_t pg_offp = *offp & ~PAGE_MASK;
 
-	lock_system_sleep();
+	sleep_flags = lock_system_sleep();
 
 	data = filp->private_data;
 
@@ -190,7 +194,7 @@ static ssize_t snapshot_write(struct fil
 	if (res > 0)
 		*offp += res;
 unlock:
-	unlock_system_sleep();
+	unlock_system_sleep(sleep_flags);
 
 	return res;
 }
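
Note (illustration only, not part of the patch): the save/restore
behaviour introduced above can be modelled in userspace roughly as
below. This is a sketch under stated assumptions: model_task_flags is a
hypothetical stand-in for current->flags, the model_* helpers are
invented for this note, and system_transition_mutex is left out
entirely. The point is only that a PF_FREEZER_SKIP bit that was already
set before lock_system_sleep() is still set after
unlock_system_sleep().

```c
#include <assert.h>

/* Same value as PF_FREEZER_SKIP in include/linux/sched.h at this point. */
#define MODEL_PF_FREEZER_SKIP	0x40000000u

/* Hypothetical stand-in for current->flags. */
static unsigned int model_task_flags;

/* Mirrors the new lock_system_sleep(): report the flags seen on entry. */
static unsigned int model_lock_system_sleep(void)
{
	unsigned int flags = model_task_flags;

	model_task_flags |= MODEL_PF_FREEZER_SKIP;
	return flags;
}

/* Mirrors the new unlock_system_sleep(): clear the bit only if we set it. */
static void model_unlock_system_sleep(unsigned int flags)
{
	if (!(flags & MODEL_PF_FREEZER_SKIP))
		model_task_flags &= ~MODEL_PF_FREEZER_SKIP;
}

/* Runs one lock/unlock cycle from a given starting value, returns the result. */
static unsigned int model_lock_unlock_cycle(unsigned int initial)
{
	unsigned int saved;

	model_task_flags = initial;
	saved = model_lock_system_sleep();
	/* Inside the critical section the bit is always set. */
	assert(model_task_flags & MODEL_PF_FREEZER_SKIP);
	model_unlock_system_sleep(saved);
	return model_task_flags;
}
```

Starting from 0 the cycle ends at 0; starting with the bit already set,
the bit survives, which the old unconditional clear did not guarantee.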




* [PATCH v2 4/5] freezer,umh: Clean up freezer/initrd interaction
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
                   ` (2 preceding siblings ...)
  2022-04-21 15:02 ` [PATCH v2 3/5] freezer: Have {,un}lock_system_sleep() save/restore flags Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-21 15:02 ` [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic Peter Zijlstra
  2022-04-22 17:43 ` [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Sebastian Andrzej Siewior
  5 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

handle_initrd() marks itself as PF_FREEZER_SKIP in order to ensure
that the UMH, which is going to freeze the system, doesn't
indefinitely wait for its caller.

Rework things by adding UMH_FREEZABLE to indicate the completion is
freezable.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
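
Note (illustration only, not part of the patch): a minimal userspace
sketch of how the wait mask is interpreted once UMH_FREEZABLE exists.
The flag values are copied from the umh.h hunk below; struct
umh_wait_model and umh_decode_wait() are hypothetical names invented
for this note, and the assert() encodes an assumption of the model
(modifier bits are only used together with WAIT_EXEC or WAIT_PROC),
not a check the kernel performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the include/linux/umh.h hunk in this patch. */
#define UMH_NO_WAIT	0x00
#define UMH_WAIT_EXEC	0x01
#define UMH_WAIT_PROC	0x02
#define UMH_KILLABLE	0x04
#define UMH_FREEZABLE	0x08

/* Hypothetical summary of what a wait mask asks for. */
struct umh_wait_model {
	bool waits;	/* caller blocks on the completion */
	bool killable;	/* a fatal signal may end the wait */
	bool freezable;	/* the freezer need not wait for the caller */
};

static struct umh_wait_model umh_decode_wait(int wait)
{
	struct umh_wait_model m = { false, false, false };

	/* UMH_NO_WAIT is the all-zero value, so it is tested by equality. */
	if (wait == UMH_NO_WAIT)
		return m;

	/* Model assumption: modifiers accompany WAIT_EXEC or WAIT_PROC. */
	assert(wait & (UMH_WAIT_EXEC | UMH_WAIT_PROC));

	m.waits = true;
	m.killable = (wait & UMH_KILLABLE) != 0;
	m.freezable = (wait & UMH_FREEZABLE) != 0;
	return m;
}
```

With this, the handle_initrd() call site corresponds to
umh_decode_wait(UMH_WAIT_PROC | UMH_FREEZABLE): a blocking,
non-killable wait that the freezer may skip.
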
 include/linux/umh.h     |    9 +++++----
 init/do_mounts_initrd.c |   10 +---------
 kernel/umh.c            |    8 ++++++++
 3 files changed, 14 insertions(+), 13 deletions(-)

--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -11,10 +11,11 @@
 struct cred;
 struct file;
 
-#define UMH_NO_WAIT	0	/* don't wait at all */
-#define UMH_WAIT_EXEC	1	/* wait for the exec, but not the process */
-#define UMH_WAIT_PROC	2	/* wait for the process to complete */
-#define UMH_KILLABLE	4	/* wait for EXEC/PROC killable */
+#define UMH_NO_WAIT	0x00	/* don't wait at all */
+#define UMH_WAIT_EXEC	0x01	/* wait for the exec, but not the process */
+#define UMH_WAIT_PROC	0x02	/* wait for the process to complete */
+#define UMH_KILLABLE	0x04	/* wait for EXEC/PROC killable */
+#define UMH_FREEZABLE	0x08	/* wait for EXEC/PROC freezable */
 
 struct subprocess_info {
 	struct work_struct work;
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -79,19 +79,11 @@ static void __init handle_initrd(void)
 	init_mkdir("/old", 0700);
 	init_chdir("/old");
 
-	/*
-	 * In case that a resume from disk is carried out by linuxrc or one of
-	 * its children, we need to tell the freezer not to wait for us.
-	 */
-	current->flags |= PF_FREEZER_SKIP;
-
 	info = call_usermodehelper_setup("/linuxrc", argv, envp_init,
 					 GFP_KERNEL, init_linuxrc, NULL, NULL);
 	if (!info)
 		return;
-	call_usermodehelper_exec(info, UMH_WAIT_PROC);
-
-	current->flags &= ~PF_FREEZER_SKIP;
+	call_usermodehelper_exec(info, UMH_WAIT_PROC|UMH_FREEZABLE);
 
 	/* move initrd to rootfs' /old */
 	init_mount("..", ".", NULL, MS_MOVE, NULL);
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -28,6 +28,7 @@
 #include <linux/async.h>
 #include <linux/uaccess.h>
 #include <linux/initrd.h>
+#include <linux/freezer.h>
 
 #include <trace/events/module.h>
 
@@ -436,6 +437,9 @@ int call_usermodehelper_exec(struct subp
 	if (wait == UMH_NO_WAIT)	/* task has freed sub_info */
 		goto unlock;
 
+	if (wait & UMH_FREEZABLE)
+		freezer_do_not_count();
+
 	if (wait & UMH_KILLABLE) {
 		retval = wait_for_completion_killable(&done);
 		if (!retval)
@@ -448,6 +452,10 @@ int call_usermodehelper_exec(struct subp
 	}
 
 	wait_for_completion(&done);
+
+	if (wait & UMH_FREEZABLE)
+		freezer_count();
+
 wait_done:
 	retval = sub_info->retval;
 out:




* [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
                   ` (3 preceding siblings ...)
  2022-04-21 15:02 ` [PATCH v2 4/5] freezer,umh: Clean up freezer/initrd interaction Peter Zijlstra
@ 2022-04-21 15:02 ` Peter Zijlstra
  2022-04-21 17:26   ` Eric W. Biederman
  2022-04-22 17:43 ` [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Sebastian Andrzej Siewior
  5 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 15:02 UTC (permalink / raw)
  To: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, bigeasy, Will Deacon
  Cc: linux-kernel, peterz, tj, linux-pm

Rewrite the core freezer to behave better wrt thawing and be simpler
in general.

By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.

As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay).

Specifically, the current scheme works a little like:

	freezer_do_not_count();
	schedule();
	freezer_count();

And either the task is blocked, or it lands in try_to_freeze()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.

However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.

That is, thawing tries to thaw things in explicit order: kernel
threads and workqueues first, then bringing SMP back, then userspace,
etc. However, due to the above-mentioned races, it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.

This can be a fatal problem on asymmetric-ISA architectures (e.g. ARMv9)
where the userspace task requires a special CPU to run.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
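
Note (illustration only, not part of the patch): a single-threaded
userspace model of why a dedicated TASK_FROZEN block state keeps frozen
tasks asleep. The bit values are copied from the sched.h hunk below and
TASK_NORMAL is the usual wake_up() mask; struct model_task and the
model_* helpers are hypothetical and ignore locking, stopped/traced
tasks and everything else the real kernel/freezer.c has to handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values copied from the include/linux/sched.h hunk in this patch. */
#define TASK_RUNNING		0x000000
#define TASK_INTERRUPTIBLE	0x000001
#define TASK_UNINTERRUPTIBLE	0x000002
#define TASK_FREEZABLE		0x001000
#define TASK_FROZEN		0x004000
/* Mask used by ordinary wake_up() callers. */
#define TASK_NORMAL		(TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for a task's state word. */
struct model_task {
	unsigned int state;
};

/* Freezer side: only a task sleeping with TASK_FREEZABLE is frozen in place. */
static bool model_freeze(struct model_task *p)
{
	if (!(p->state & TASK_FREEZABLE))
		return false;
	p->state = TASK_FROZEN;
	return true;
}

/* Wakeup side: a wakeup succeeds only if the state matches the mask. */
static bool model_wake_up_state(struct model_task *p, unsigned int mask)
{
	if (!(p->state & mask))
		return false;
	p->state = TASK_RUNNING;
	return true;
}

/* Thaw is a wakeup that names TASK_FROZEN explicitly. */
static bool model_thaw(struct model_task *p)
{
	return model_wake_up_state(p, TASK_FROZEN);
}

/* Walks one task through sleep -> freeze -> stray wakeup -> thaw. */
static bool model_frozen_task_ignores_stray_wakeup(void)
{
	struct model_task t = { TASK_INTERRUPTIBLE | TASK_FREEZABLE };
	bool woke_early, thawed;

	if (!model_freeze(&t))
		return false;
	assert(t.state == TASK_FROZEN);

	woke_early = model_wake_up_state(&t, TASK_NORMAL);
	thawed = model_thaw(&t);
	return !woke_early && thawed;
}
```

Because TASK_FROZEN shares no bits with TASK_NORMAL, an ordinary wakeup
cannot match a frozen task in this model; only the explicit thaw does.
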
 drivers/android/binder.c       |    4 
 drivers/media/pci/pt3/pt3.c    |    4 
 fs/cifs/inode.c                |    4 
 fs/cifs/transport.c            |    5 
 fs/coredump.c                  |    5 
 fs/nfs/file.c                  |    3 
 fs/nfs/inode.c                 |   12 --
 fs/nfs/nfs3proc.c              |    3 
 fs/nfs/nfs4proc.c              |   14 +-
 fs/nfs/nfs4state.c             |    3 
 fs/nfs/pnfs.c                  |    4 
 fs/xfs/xfs_trans_ail.c         |    8 -
 include/linux/completion.h     |    1 
 include/linux/freezer.h        |  244 +----------------------------------------
 include/linux/sched.h          |   41 +++---
 include/linux/sunrpc/sched.h   |    7 -
 include/linux/wait.h           |   40 +++++-
 kernel/cgroup/legacy_freezer.c |   23 +--
 kernel/exit.c                  |    4 
 kernel/fork.c                  |    5 
 kernel/freezer.c               |  137 ++++++++++++++++-------
 kernel/futex/waitwake.c        |    8 -
 kernel/hung_task.c             |    4 
 kernel/power/main.c            |    6 -
 kernel/power/process.c         |   10 -
 kernel/ptrace.c                |    2 
 kernel/sched/completion.c      |    9 +
 kernel/sched/core.c            |   19 ++-
 kernel/signal.c                |   14 +-
 kernel/time/hrtimer.c          |    4 
 kernel/umh.c                   |   20 +--
 mm/khugepaged.c                |    4 
 net/sunrpc/sched.c             |   12 --
 net/unix/af_unix.c             |    8 -
 34 files changed, 281 insertions(+), 410 deletions(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4034,10 +4034,9 @@ static int binder_wait_for_work(struct b
 	struct binder_proc *proc = thread->proc;
 	int ret = 0;
 
-	freezer_do_not_count();
 	binder_inner_proc_lock(proc);
 	for (;;) {
-		prepare_to_wait(&thread->wait, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(&thread->wait, &wait, TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 		if (binder_has_work_ilocked(thread, do_proc_work))
 			break;
 		if (do_proc_work)
@@ -4054,7 +4053,6 @@ static int binder_wait_for_work(struct b
 	}
 	finish_wait(&thread->wait, &wait);
 	binder_inner_proc_unlock(proc);
-	freezer_count();
 
 	return ret;
 }
--- a/drivers/media/pci/pt3/pt3.c
+++ b/drivers/media/pci/pt3/pt3.c
@@ -445,8 +445,8 @@ static int pt3_fetch_thread(void *data)
 		pt3_proc_dma(adap);
 
 		delay = ktime_set(0, PT3_FETCH_DELAY * NSEC_PER_MSEC);
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		freezable_schedule_hrtimeout_range(&delay,
+		set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
+		schedule_hrtimeout_range(&delay,
 					PT3_FETCH_DELAY_DELTA * NSEC_PER_MSEC,
 					HRTIMER_MODE_REL);
 	}
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -2286,7 +2286,7 @@ cifs_invalidate_mapping(struct inode *in
 static int
 cifs_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
@@ -2304,7 +2304,7 @@ cifs_revalidate_mapping(struct inode *in
 		return 0;
 
 	rc = wait_on_bit_lock_action(flags, CIFS_INO_LOCK, cifs_wait_bit_killable,
-				     TASK_KILLABLE);
+				     TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 	if (rc)
 		return rc;
 
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -760,8 +760,9 @@ wait_for_response(struct TCP_Server_Info
 {
 	int error;
 
-	error = wait_event_freezekillable_unsafe(server->response_q,
-				    midQ->mid_state != MID_REQUEST_SUBMITTED);
+	error = wait_event_state(server->response_q,
+				 midQ->mid_state != MID_REQUEST_SUBMITTED,
+				 (TASK_KILLABLE|TASK_FREEZABLE_UNSAFE));
 	if (error < 0)
 		return -ERESTARTSYS;
 
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -402,9 +402,8 @@ static int coredump_wait(int exit_code,
 	if (core_waiters > 0) {
 		struct core_thread *ptr;
 
-		freezer_do_not_count();
-		wait_for_completion(&core_state->startup);
-		freezer_count();
+		wait_for_completion_state(&core_state->startup,
+					  TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
 		/*
 		 * Wait for all the threads to become inactive, so that
 		 * all the thread context (extended register state, like
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -565,7 +565,8 @@ static vm_fault_t nfs_vm_page_mkwrite(st
 	}
 
 	wait_on_bit_action(&NFS_I(inode)->flags, NFS_INO_INVALIDATING,
-			nfs_wait_bit_killable, TASK_KILLABLE);
+			   nfs_wait_bit_killable,
+			   TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 
 	lock_page(page);
 	mapping = page_file_mapping(page);
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -72,18 +72,13 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fat
 	return nfs_fileid_to_ino_t(fattr->fileid);
 }
 
-static int nfs_wait_killable(int mode)
+int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
 }
-
-int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
-{
-	return nfs_wait_killable(mode);
-}
 EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
 
 /**
@@ -1331,7 +1326,8 @@ int nfs_clear_invalid_mapping(struct add
 	 */
 	for (;;) {
 		ret = wait_on_bit_action(bitlock, NFS_INO_INVALIDATING,
-					 nfs_wait_bit_killable, TASK_KILLABLE);
+					 nfs_wait_bit_killable,
+					 TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 		if (ret)
 			goto out;
 		spin_lock(&inode->i_lock);
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -36,7 +36,8 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt,
 		res = rpc_call_sync(clnt, msg, flags);
 		if (res != -EJUKEBOX)
 			break;
-		freezable_schedule_timeout_killable_unsafe(NFS_JUKEBOX_RETRY_TIME);
+		__set_current_state(TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
+		schedule_timeout(NFS_JUKEBOX_RETRY_TIME);
 		res = -ERESTARTSYS;
 	} while (!fatal_signal_pending(current));
 	return res;
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -408,8 +408,8 @@ static int nfs4_delay_killable(long *tim
 {
 	might_sleep();
 
-	freezable_schedule_timeout_killable_unsafe(
-		nfs4_update_delay(timeout));
+	__set_current_state(TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
+	schedule_timeout(nfs4_update_delay(timeout));
 	if (!__fatal_signal_pending(current))
 		return 0;
 	return -EINTR;
@@ -419,7 +419,8 @@ static int nfs4_delay_interruptible(long
 {
 	might_sleep();
 
-	freezable_schedule_timeout_interruptible_unsafe(nfs4_update_delay(timeout));
+	__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE_UNSAFE);
+	schedule_timeout(nfs4_update_delay(timeout));
 	if (!signal_pending(current))
 		return 0;
 	return __fatal_signal_pending(current) ? -EINTR :-ERESTARTSYS;
@@ -7363,7 +7364,8 @@ nfs4_retry_setlk_simple(struct nfs4_stat
 		status = nfs4_proc_setlk(state, cmd, request);
 		if ((status != -EAGAIN) || IS_SETLK(cmd))
 			break;
-		freezable_schedule_timeout_interruptible(timeout);
+		__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
+		schedule_timeout(timeout);
 		timeout *= 2;
 		timeout = min_t(unsigned long, NFS4_LOCK_MAXTIMEOUT, timeout);
 		status = -ERESTARTSYS;
@@ -7431,10 +7433,8 @@ nfs4_retry_setlk(struct nfs4_state *stat
 			break;
 
 		status = -ERESTARTSYS;
-		freezer_do_not_count();
-		wait_woken(&waiter.wait, TASK_INTERRUPTIBLE,
+		wait_woken(&waiter.wait, TASK_INTERRUPTIBLE|TASK_FREEZABLE,
 			   NFS4_LOCK_MAXTIMEOUT);
-		freezer_count();
 	} while (!signalled());
 
 	remove_wait_queue(q, &waiter.wait);
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1314,7 +1314,8 @@ int nfs4_wait_clnt_recover(struct nfs_cl
 
 	refcount_inc(&clp->cl_count);
 	res = wait_on_bit_action(&clp->cl_state, NFS4CLNT_MANAGER_RUNNING,
-				 nfs_wait_bit_killable, TASK_KILLABLE);
+				 nfs_wait_bit_killable,
+				 TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 	if (res)
 		goto out;
 	if (clp->cl_cons_state < 0)
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1907,7 +1907,7 @@ static int pnfs_prepare_to_retry_layoutg
 	pnfs_layoutcommit_inode(lo->plh_inode, false);
 	return wait_on_bit_action(&lo->plh_flags, NFS_LAYOUT_RETURN,
 				   nfs_wait_bit_killable,
-				   TASK_KILLABLE);
+				   TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 }
 
 static void nfs_layoutget_begin(struct pnfs_layout_hdr *lo)
@@ -3182,7 +3182,7 @@ pnfs_layoutcommit_inode(struct inode *in
 		status = wait_on_bit_lock_action(&nfsi->flags,
 				NFS_INO_LAYOUTCOMMITTING,
 				nfs_wait_bit_killable,
-				TASK_KILLABLE);
+				TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 		if (status)
 			goto out;
 	}
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -602,9 +602,9 @@ xfsaild(
 
 	while (1) {
 		if (tout && tout <= 20)
-			set_current_state(TASK_KILLABLE);
+			set_current_state(TASK_KILLABLE|TASK_FREEZABLE);
 		else
-			set_current_state(TASK_INTERRUPTIBLE);
+			set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 
 		/*
 		 * Check kthread_should_stop() after we set the task state to
@@ -653,14 +653,14 @@ xfsaild(
 		    ailp->ail_target == ailp->ail_target_prev &&
 		    list_empty(&ailp->ail_buf_list)) {
 			spin_unlock(&ailp->ail_lock);
-			freezable_schedule();
+			schedule();
 			tout = 0;
 			continue;
 		}
 		spin_unlock(&ailp->ail_lock);
 
 		if (tout)
-			freezable_schedule_timeout(msecs_to_jiffies(tout));
+			schedule_timeout(msecs_to_jiffies(tout));
 
 		__set_current_state(TASK_RUNNING);
 
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -103,6 +103,7 @@ extern void wait_for_completion(struct c
 extern void wait_for_completion_io(struct completion *);
 extern int wait_for_completion_interruptible(struct completion *x);
 extern int wait_for_completion_killable(struct completion *x);
+extern int wait_for_completion_state(struct completion *x, unsigned int state);
 extern unsigned long wait_for_completion_timeout(struct completion *x,
 						   unsigned long timeout);
 extern unsigned long wait_for_completion_io_timeout(struct completion *x,
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -8,9 +8,11 @@
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
+#include <linux/jump_label.h>
 
 #ifdef CONFIG_FREEZER
-extern atomic_t system_freezing_cnt;	/* nr of freezing conds in effect */
+DECLARE_STATIC_KEY_FALSE(freezer_active);
+
 extern bool pm_freezing;		/* PM freezing in effect */
 extern bool pm_nosig_freezing;		/* PM nosig freezing in effect */
 
@@ -22,10 +24,7 @@ extern unsigned int freeze_timeout_msecs
 /*
  * Check if a process has been frozen
  */
-static inline bool frozen(struct task_struct *p)
-{
-	return p->flags & PF_FROZEN;
-}
+extern bool frozen(struct task_struct *p);
 
 extern bool freezing_slow_path(struct task_struct *p);
 
@@ -34,9 +33,10 @@ extern bool freezing_slow_path(struct ta
  */
 static inline bool freezing(struct task_struct *p)
 {
-	if (likely(!atomic_read(&system_freezing_cnt)))
-		return false;
-	return freezing_slow_path(p);
+	if (static_branch_unlikely(&freezer_active))
+		return freezing_slow_path(p);
+
+	return false;
 }
 
 /* Takes and releases task alloc lock using task_lock() */
@@ -48,23 +48,14 @@ extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
 extern void thaw_kernel_threads(void);
 
-/*
- * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
- * If try_to_freeze causes a lockdep warning it means the caller may deadlock
- */
-static inline bool try_to_freeze_unsafe(void)
+static inline bool try_to_freeze(void)
 {
 	might_sleep();
 	if (likely(!freezing(current)))
 		return false;
-	return __refrigerator(false);
-}
-
-static inline bool try_to_freeze(void)
-{
 	if (!(current->flags & PF_NOFREEZE))
 		debug_check_no_locks_held();
-	return try_to_freeze_unsafe();
+	return __refrigerator(false);
 }
 
 extern bool freeze_task(struct task_struct *p);
@@ -79,195 +70,6 @@ static inline bool cgroup_freezing(struc
 }
 #endif /* !CONFIG_CGROUP_FREEZER */
 
-/*
- * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
- * calls wait_for_completion(&vfork) and reset right after it returns from this
- * function.  Next, the parent should call try_to_freeze() to freeze itself
- * appropriately in case the child has exited before the freezing of tasks is
- * complete.  However, we don't want kernel threads to be frozen in unexpected
- * places, so we allow them to block freeze_processes() instead or to set
- * PF_NOFREEZE if needed. Fortunately, in the ____call_usermodehelper() case the
- * parent won't really block freeze_processes(), since ____call_usermodehelper()
- * (the child) does a little before exec/exit and it can't be frozen before
- * waking up the parent.
- */
-
-
-/**
- * freezer_do_not_count - tell freezer to ignore %current
- *
- * Tell freezers to ignore the current task when determining whether the
- * target frozen state is reached.  IOW, the current task will be
- * considered frozen enough by freezers.
- *
- * The caller shouldn't do anything which isn't allowed for a frozen task
- * until freezer_cont() is called.  Usually, freezer[_do_not]_count() pair
- * wrap a scheduling operation and nothing much else.
- */
-static inline void freezer_do_not_count(void)
-{
-	current->flags |= PF_FREEZER_SKIP;
-}
-
-/**
- * freezer_count - tell freezer to stop ignoring %current
- *
- * Undo freezer_do_not_count().  It tells freezers that %current should be
- * considered again and tries to freeze if freezing condition is already in
- * effect.
- */
-static inline void freezer_count(void)
-{
-	current->flags &= ~PF_FREEZER_SKIP;
-	/*
-	 * If freezing is in progress, the following paired with smp_mb()
-	 * in freezer_should_skip() ensures that either we see %true
-	 * freezing() or freezer_should_skip() sees !PF_FREEZER_SKIP.
-	 */
-	smp_mb();
-	try_to_freeze();
-}
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-static inline void freezer_count_unsafe(void)
-{
-	current->flags &= ~PF_FREEZER_SKIP;
-	smp_mb();
-	try_to_freeze_unsafe();
-}
-
-/**
- * freezer_should_skip - whether to skip a task when determining frozen
- *			 state is reached
- * @p: task in quesion
- *
- * This function is used by freezers after establishing %true freezing() to
- * test whether a task should be skipped when determining the target frozen
- * state is reached.  IOW, if this function returns %true, @p is considered
- * frozen enough.
- */
-static inline bool freezer_should_skip(struct task_struct *p)
-{
-	/*
-	 * The following smp_mb() paired with the one in freezer_count()
-	 * ensures that either freezer_count() sees %true freezing() or we
-	 * see cleared %PF_FREEZER_SKIP and return %false.  This makes it
-	 * impossible for a task to slip frozen state testing after
-	 * clearing %PF_FREEZER_SKIP.
-	 */
-	smp_mb();
-	return p->flags & PF_FREEZER_SKIP;
-}
-
-/*
- * These functions are intended to be used whenever you want allow a sleeping
- * task to be frozen. Note that neither return any clear indication of
- * whether a freeze event happened while in this function.
- */
-
-/* Like schedule(), but should not block the freezer. */
-static inline void freezable_schedule(void)
-{
-	freezer_do_not_count();
-	schedule();
-	freezer_count();
-}
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-static inline void freezable_schedule_unsafe(void)
-{
-	freezer_do_not_count();
-	schedule();
-	freezer_count_unsafe();
-}
-
-/*
- * Like schedule_timeout(), but should not block the freezer.  Do not
- * call this with locks held.
- */
-static inline long freezable_schedule_timeout(long timeout)
-{
-	long __retval;
-	freezer_do_not_count();
-	__retval = schedule_timeout(timeout);
-	freezer_count();
-	return __retval;
-}
-
-/*
- * Like schedule_timeout_interruptible(), but should not block the freezer.  Do not
- * call this with locks held.
- */
-static inline long freezable_schedule_timeout_interruptible(long timeout)
-{
-	long __retval;
-	freezer_do_not_count();
-	__retval = schedule_timeout_interruptible(timeout);
-	freezer_count();
-	return __retval;
-}
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-static inline long freezable_schedule_timeout_interruptible_unsafe(long timeout)
-{
-	long __retval;
-
-	freezer_do_not_count();
-	__retval = schedule_timeout_interruptible(timeout);
-	freezer_count_unsafe();
-	return __retval;
-}
-
-/* Like schedule_timeout_killable(), but should not block the freezer. */
-static inline long freezable_schedule_timeout_killable(long timeout)
-{
-	long __retval;
-	freezer_do_not_count();
-	__retval = schedule_timeout_killable(timeout);
-	freezer_count();
-	return __retval;
-}
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-static inline long freezable_schedule_timeout_killable_unsafe(long timeout)
-{
-	long __retval;
-	freezer_do_not_count();
-	__retval = schedule_timeout_killable(timeout);
-	freezer_count_unsafe();
-	return __retval;
-}
-
-/*
- * Like schedule_hrtimeout_range(), but should not block the freezer.  Do not
- * call this with locks held.
- */
-static inline int freezable_schedule_hrtimeout_range(ktime_t *expires,
-		u64 delta, const enum hrtimer_mode mode)
-{
-	int __retval;
-	freezer_do_not_count();
-	__retval = schedule_hrtimeout_range(expires, delta, mode);
-	freezer_count();
-	return __retval;
-}
-
-/*
- * Freezer-friendly wrappers around wait_event_interruptible(),
- * wait_event_killable() and wait_event_interruptible_timeout(), originally
- * defined in <linux/wait.h>
- */
-
-/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
-#define wait_event_freezekillable_unsafe(wq, condition)			\
-({									\
-	int __retval;							\
-	freezer_do_not_count();						\
-	__retval = wait_event_killable(wq, (condition));		\
-	freezer_count_unsafe();						\
-	__retval;							\
-})
-
 #else /* !CONFIG_FREEZER */
 static inline bool frozen(struct task_struct *p) { return false; }
 static inline bool freezing(struct task_struct *p) { return false; }
@@ -281,35 +83,9 @@ static inline void thaw_kernel_threads(v
 
 static inline bool try_to_freeze(void) { return false; }
 
-static inline void freezer_do_not_count(void) {}
 static inline void freezer_count(void) {}
-static inline int freezer_should_skip(struct task_struct *p) { return 0; }
 static inline void set_freezable(void) {}
 
-#define freezable_schedule()  schedule()
-
-#define freezable_schedule_unsafe()  schedule()
-
-#define freezable_schedule_timeout(timeout)  schedule_timeout(timeout)
-
-#define freezable_schedule_timeout_interruptible(timeout)		\
-	schedule_timeout_interruptible(timeout)
-
-#define freezable_schedule_timeout_interruptible_unsafe(timeout)	\
-	schedule_timeout_interruptible(timeout)
-
-#define freezable_schedule_timeout_killable(timeout)			\
-	schedule_timeout_killable(timeout)
-
-#define freezable_schedule_timeout_killable_unsafe(timeout)		\
-	schedule_timeout_killable(timeout)
-
-#define freezable_schedule_hrtimeout_range(expires, delta, mode)	\
-	schedule_hrtimeout_range(expires, delta, mode)
-
-#define wait_event_freezekillable_unsafe(wq, condition)			\
-		wait_event_killable(wq, condition)
-
 #endif /* !CONFIG_FREEZER */
 
 #endif	/* FREEZER_H_INCLUDED */
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -80,25 +80,32 @@ struct task_group;
  */
 
 /* Used in tsk->state: */
-#define TASK_RUNNING			0x0000
-#define TASK_INTERRUPTIBLE		0x0001
-#define TASK_UNINTERRUPTIBLE		0x0002
-#define __TASK_STOPPED			0x0004
-#define __TASK_TRACED			0x0008
+#define TASK_RUNNING			0x000000
+#define TASK_INTERRUPTIBLE		0x000001
+#define TASK_UNINTERRUPTIBLE		0x000002
+#define __TASK_STOPPED			0x000004
+#define __TASK_TRACED			0x000008
 /* Used in tsk->exit_state: */
-#define EXIT_DEAD			0x0010
-#define EXIT_ZOMBIE			0x0020
+#define EXIT_DEAD			0x000010
+#define EXIT_ZOMBIE			0x000020
 #define EXIT_TRACE			(EXIT_ZOMBIE | EXIT_DEAD)
 /* Used in tsk->state again: */
-#define TASK_PARKED			0x0040
-#define TASK_DEAD			0x0080
-#define TASK_WAKEKILL			0x0100
-#define TASK_WAKING			0x0200
-#define TASK_NOLOAD			0x0400
-#define TASK_NEW			0x0800
-/* RT specific auxilliary flag to mark RT lock waiters */
-#define TASK_RTLOCK_WAIT		0x1000
-#define TASK_STATE_MAX			0x2000
+#define TASK_PARKED			0x000040
+#define TASK_DEAD			0x000080
+#define TASK_WAKEKILL			0x000100
+#define TASK_WAKING			0x000200
+#define TASK_NOLOAD			0x000400
+#define TASK_NEW			0x000800
+#define TASK_FREEZABLE			0x001000
+#define __TASK_FREEZABLE_UNSAFE	       (0x002000 * IS_ENABLED(CONFIG_LOCKDEP))
+#define TASK_FROZEN			0x004000
+#define TASK_RTLOCK_WAIT		0x008000
+#define TASK_STATE_MAX			0x010000
+
+/*
+ * DO NOT ADD ANY NEW USERS !
+ */
+#define TASK_FREEZABLE_UNSAFE		(TASK_FREEZABLE | __TASK_FREEZABLE_UNSAFE)
 
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
@@ -1698,7 +1705,6 @@ extern struct pid *cad_pid;
 #define PF_NPROC_EXCEEDED	0x00001000	/* set_user() noticed that RLIMIT_NPROC was exceeded */
 #define PF_USED_MATH		0x00002000	/* If unset the fpu must be initialized before use */
 #define PF_NOFREEZE		0x00008000	/* This thread should not be frozen */
-#define PF_FROZEN		0x00010000	/* Frozen for system suspend */
 #define PF_KSWAPD		0x00020000	/* I am kswapd */
 #define PF_MEMALLOC_NOFS	0x00040000	/* All allocation requests will inherit GFP_NOFS */
 #define PF_MEMALLOC_NOIO	0x00080000	/* All allocation requests will inherit GFP_NOIO */
@@ -1709,7 +1715,6 @@ extern struct pid *cad_pid;
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
 #define PF_MEMALLOC_PIN		0x10000000	/* Allocation context constrained to zones which allow long term pinning. */
-#define PF_FREEZER_SKIP		0x40000000	/* Freezer should not count it as freezable */
 #define PF_SUSPEND_TASK		0x80000000      /* This thread called freeze_processes() and should not be frozen */
 
 /*
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -252,7 +252,7 @@ int		rpc_malloc(struct rpc_task *);
 void		rpc_free(struct rpc_task *);
 int		rpciod_up(void);
 void		rpciod_down(void);
-int		__rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *);
+int		rpc_wait_for_completion_task(struct rpc_task *task);
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 struct net;
 void		rpc_show_tasks(struct net *);
@@ -264,11 +264,6 @@ extern struct workqueue_struct *xprtiod_
 void		rpc_prepare_task(struct rpc_task *task);
 gfp_t		rpc_task_gfp_mask(void);
 
-static inline int rpc_wait_for_completion_task(struct rpc_task *task)
-{
-	return __rpc_wait_for_completion_task(task, NULL);
-}
-
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) || IS_ENABLED(CONFIG_TRACEPOINTS)
 static inline const char * rpc_qname(const struct rpc_wait_queue *q)
 {
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -361,8 +361,8 @@ do {										\
 } while (0)
 
 #define __wait_event_freezable(wq_head, condition)				\
-	___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0,		\
-			    freezable_schedule())
+	___wait_event(wq_head, condition, (TASK_INTERRUPTIBLE|TASK_FREEZABLE),	\
+			0, 0, schedule())
 
 /**
  * wait_event_freezable - sleep (or freeze) until a condition gets true
@@ -420,8 +420,8 @@ do {										\
 
 #define __wait_event_freezable_timeout(wq_head, condition, timeout)		\
 	___wait_event(wq_head, ___wait_cond_timeout(condition),			\
-		      TASK_INTERRUPTIBLE, 0, timeout,				\
-		      __ret = freezable_schedule_timeout(__ret))
+		      (TASK_INTERRUPTIBLE|TASK_FREEZABLE), 0, timeout,		\
+		      __ret = schedule_timeout(__ret))
 
 /*
  * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid
@@ -641,8 +641,8 @@ do {										\
 
 
 #define __wait_event_freezable_exclusive(wq, condition)				\
-	___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0,			\
-			freezable_schedule())
+	___wait_event(wq, condition, (TASK_INTERRUPTIBLE|TASK_FREEZABLE), 1, 0,\
+			schedule())
 
 #define wait_event_freezable_exclusive(wq, condition)				\
 ({										\
@@ -931,6 +931,34 @@ extern int do_wait_intr_irq(wait_queue_h
 	__ret;									\
 })
 
+#define __wait_event_state(wq, condition, state)				\
+	___wait_event(wq, condition, state, 0, 0, schedule())
+
+/**
+ * wait_event_state - sleep until a condition gets true
+ * @wq_head: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ * @state: state to sleep in
+ *
+ * The process is put to sleep (@state) until the @condition evaluates to true
+ * or a signal is received.  The @condition is checked each time the waitqueue
+ * @wq_head is woken up.
+ *
+ * wake_up() has to be called after changing any variable that could
+ * change the result of the wait condition.
+ *
+ * The function will return -ERESTARTSYS if it was interrupted by a
+ * signal and 0 if @condition evaluated to true.
+ */
+#define wait_event_state(wq_head, condition, state)				\
+({										\
+	int __ret = 0;								\
+	might_sleep();								\
+	if (!(condition))							\
+		__ret = __wait_event_state(wq_head, condition, state);		\
+	__ret;									\
+})
+
 #define __wait_event_killable_timeout(wq_head, condition, timeout)		\
 	___wait_event(wq_head, ___wait_cond_timeout(condition),			\
 		      TASK_KILLABLE, 0, timeout,				\
--- a/kernel/cgroup/legacy_freezer.c
+++ b/kernel/cgroup/legacy_freezer.c
@@ -113,7 +113,7 @@ static int freezer_css_online(struct cgr
 
 	if (parent && (parent->state & CGROUP_FREEZING)) {
 		freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
-		atomic_inc(&system_freezing_cnt);
+		static_branch_inc(&freezer_active);
 	}
 
 	mutex_unlock(&freezer_mutex);
@@ -134,7 +134,7 @@ static void freezer_css_offline(struct c
 	mutex_lock(&freezer_mutex);
 
 	if (freezer->state & CGROUP_FREEZING)
-		atomic_dec(&system_freezing_cnt);
+		static_branch_dec(&freezer_active);
 
 	freezer->state = 0;
 
@@ -179,6 +179,7 @@ static void freezer_attach(struct cgroup
 			__thaw_task(task);
 		} else {
 			freeze_task(task);
+
 			/* clear FROZEN and propagate upwards */
 			while (freezer && (freezer->state & CGROUP_FROZEN)) {
 				freezer->state &= ~CGROUP_FROZEN;
@@ -271,16 +272,8 @@ static void update_if_frozen(struct cgro
 	css_task_iter_start(css, 0, &it);
 
 	while ((task = css_task_iter_next(&it))) {
-		if (freezing(task)) {
-			/*
-			 * freezer_should_skip() indicates that the task
-			 * should be skipped when determining freezing
-			 * completion.  Consider it frozen in addition to
-			 * the usual frozen condition.
-			 */
-			if (!frozen(task) && !freezer_should_skip(task))
-				goto out_iter_end;
-		}
+		if (freezing(task) && !frozen(task))
+			goto out_iter_end;
 	}
 
 	freezer->state |= CGROUP_FROZEN;
@@ -357,7 +350,7 @@ static void freezer_apply_state(struct f
 
 	if (freeze) {
 		if (!(freezer->state & CGROUP_FREEZING))
-			atomic_inc(&system_freezing_cnt);
+			static_branch_inc(&freezer_active);
 		freezer->state |= state;
 		freeze_cgroup(freezer);
 	} else {
@@ -366,9 +359,9 @@ static void freezer_apply_state(struct f
 		freezer->state &= ~state;
 
 		if (!(freezer->state & CGROUP_FREEZING)) {
-			if (was_freezing)
-				atomic_dec(&system_freezing_cnt);
 			freezer->state &= ~CGROUP_FROZEN;
+			if (was_freezing)
+				static_branch_dec(&freezer_active);
 			unfreeze_cgroup(freezer);
 		}
 	}
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -374,10 +374,10 @@ static void coredump_task_exit(struct ta
 			complete(&core_state->startup);
 
 		for (;;) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
+			set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
 			if (!self.task) /* see coredump_finish() */
 				break;
-			freezable_schedule();
+			schedule();
 		}
 		__set_current_state(TASK_RUNNING);
 	}
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1417,13 +1417,12 @@ static void complete_vfork_done(struct t
 static int wait_for_vfork_done(struct task_struct *child,
 				struct completion *vfork)
 {
+	unsigned int state = TASK_UNINTERRUPTIBLE|TASK_KILLABLE|TASK_FREEZABLE;
 	int killed;
 
-	freezer_do_not_count();
 	cgroup_enter_frozen();
-	killed = wait_for_completion_killable(vfork);
+	killed = wait_for_completion_state(vfork, state);
 	cgroup_leave_frozen(false);
-	freezer_count();
 
 	if (killed) {
 		task_lock(child);
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -13,10 +13,11 @@
 #include <linux/kthread.h>
 
 /* total number of freezing conditions in effect */
-atomic_t system_freezing_cnt = ATOMIC_INIT(0);
-EXPORT_SYMBOL(system_freezing_cnt);
+DEFINE_STATIC_KEY_FALSE(freezer_active);
+EXPORT_SYMBOL(freezer_active);
 
-/* indicate whether PM freezing is in effect, protected by
+/*
+ * indicate whether PM freezing is in effect, protected by
  * system_transition_mutex
  */
 bool pm_freezing;
@@ -29,7 +30,7 @@ static DEFINE_SPINLOCK(freezer_lock);
  * freezing_slow_path - slow path for testing whether a task needs to be frozen
  * @p: task to be tested
  *
- * This function is called by freezing() if system_freezing_cnt isn't zero
+ * This function is called by freezing() if freezer_active isn't zero
  * and tests whether @p needs to enter and stay in frozen state.  Can be
  * called under any context.  The freezers are responsible for ensuring the
  * target tasks see the updated state.
@@ -52,41 +53,40 @@ bool freezing_slow_path(struct task_stru
 }
 EXPORT_SYMBOL(freezing_slow_path);
 
+bool frozen(struct task_struct *p)
+{
+	return READ_ONCE(p->__state) & TASK_FROZEN;
+}
+
 /* Refrigerator is place where frozen processes are stored :-). */
 bool __refrigerator(bool check_kthr_stop)
 {
-	/* Hmm, should we be allowed to suspend when there are realtime
-	   processes around? */
+	unsigned int state = get_current_state();
 	bool was_frozen = false;
-	unsigned int save = get_current_state();
 
 	pr_debug("%s entered refrigerator\n", current->comm);
 
+	WARN_ON_ONCE(state && !(state & TASK_NORMAL));
+
 	for (;;) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		bool freeze;
+
+		set_current_state(TASK_FROZEN);
 
 		spin_lock_irq(&freezer_lock);
-		current->flags |= PF_FROZEN;
-		if (!freezing(current) ||
-		    (check_kthr_stop && kthread_should_stop()))
-			current->flags &= ~PF_FROZEN;
+		freeze = freezing(current) && !(check_kthr_stop && kthread_should_stop());
 		spin_unlock_irq(&freezer_lock);
 
-		if (!(current->flags & PF_FROZEN))
+		if (!freeze)
 			break;
+
 		was_frozen = true;
 		schedule();
 	}
+	__set_current_state(TASK_RUNNING);
 
 	pr_debug("%s left refrigerator\n", current->comm);
 
-	/*
-	 * Restore saved task state before returning.  The mb'd version
-	 * needs to be used; otherwise, it might silently break
-	 * synchronization which depends on ordered task state change.
-	 */
-	set_current_state(save);
-
 	return was_frozen;
 }
 EXPORT_SYMBOL(__refrigerator);
@@ -101,6 +101,44 @@ static void fake_signal_wake_up(struct t
 	}
 }
 
+static int __set_task_frozen(struct task_struct *p, void *arg)
+{
+	unsigned int state = READ_ONCE(p->__state);
+
+	if (p->on_rq)
+		return 0;
+
+	if (p != current && task_curr(p))
+		return 0;
+
+	if (!(state & (TASK_FREEZABLE | __TASK_STOPPED | __TASK_TRACED)))
+		return 0;
+
+	/*
+	 * Only TASK_NORMAL can be augmented with TASK_FREEZABLE, since they
+	 * can suffer spurious wakeups.
+	 */
+	if (state & TASK_FREEZABLE)
+		WARN_ON_ONCE(!(state & TASK_NORMAL));
+
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * It's dangerous to freeze with locks held; there be dragons there.
+	 */
+	if (!(state & __TASK_FREEZABLE_UNSAFE))
+		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
+#endif
+
+	WRITE_ONCE(p->__state, TASK_FROZEN);
+	return TASK_FROZEN;
+}
+
+static bool __freeze_task(struct task_struct *p)
+{
+	/* TASK_FREEZABLE|TASK_STOPPED|TASK_TRACED -> TASK_FROZEN */
+	return task_call_func(p, __set_task_frozen, NULL);
+}
+
 /**
  * freeze_task - send a freeze request to given task
  * @p: task to send the request to
@@ -116,20 +154,8 @@ bool freeze_task(struct task_struct *p)
 {
 	unsigned long flags;
 
-	/*
-	 * This check can race with freezer_do_not_count, but worst case that
-	 * will result in an extra wakeup being sent to the task.  It does not
-	 * race with freezer_count(), the barriers in freezer_count() and
-	 * freezer_should_skip() ensure that either freezer_count() sees
-	 * freezing == true in try_to_freeze() and freezes, or
-	 * freezer_should_skip() sees !PF_FREEZE_SKIP and freezes the task
-	 * normally.
-	 */
-	if (freezer_should_skip(p))
-		return false;
-
 	spin_lock_irqsave(&freezer_lock, flags);
-	if (!freezing(p) || frozen(p)) {
+	if (!freezing(p) || frozen(p) || __freeze_task(p)) {
 		spin_unlock_irqrestore(&freezer_lock, flags);
 		return false;
 	}
@@ -137,19 +163,56 @@ bool freeze_task(struct task_struct *p)
 	if (!(p->flags & PF_KTHREAD))
 		fake_signal_wake_up(p);
 	else
-		wake_up_state(p, TASK_INTERRUPTIBLE);
+		wake_up_state(p, TASK_NORMAL);
 
 	spin_unlock_irqrestore(&freezer_lock, flags);
 	return true;
 }
 
+/*
+ * The special task states (TASK_STOPPED, TASK_TRACED) keep their canonical
+ * state in p->jobctl. If either of them got a wakeup that was missed because
+ * TASK_FROZEN, then their canonical state reflects that and the below will
+ * refuse to restore the special state and instead issue the wakeup.
+ */
+static int __set_task_special(struct task_struct *p, void *arg)
+{
+	unsigned int state = 0;
+
+	if (p->jobctl & JOBCTL_TRACED)
+		state = TASK_TRACED;
+
+	else if (p->jobctl & JOBCTL_STOPPED)
+		state = TASK_STOPPED;
+
+	if (__fatal_signal_pending(p) &&
+	    !(p->jobctl & JOBCTL_DELAY_WAKEKILL))
+		state = 0;
+
+	if (state)
+		WRITE_ONCE(p->__state, state);
+
+	return state;
+}
+
 void __thaw_task(struct task_struct *p)
 {
-	unsigned long flags;
+	unsigned long flags, flags2;
 
 	spin_lock_irqsave(&freezer_lock, flags);
-	if (frozen(p))
-		wake_up_process(p);
+	if (WARN_ON_ONCE(freezing(p)))
+		goto unlock;
+
+	if (lock_task_sighand(p, &flags2)) {
+		/* TASK_FROZEN -> TASK_{STOPPED,TRACED} */
+		bool ret = task_call_func(p, __set_task_special, NULL);
+		unlock_task_sighand(p, &flags2);
+		if (ret)
+			goto unlock;
+	}
+
+	wake_up_state(p, TASK_FROZEN);
+unlock:
 	spin_unlock_irqrestore(&freezer_lock, flags);
 }
 
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -334,7 +334,7 @@ void futex_wait_queue(struct futex_hash_
 	 * futex_queue() calls spin_unlock() upon completion, both serializing
 	 * access to the hash list and forcing another memory barrier.
 	 */
-	set_current_state(TASK_INTERRUPTIBLE);
+	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 	futex_queue(q, hb);
 
 	/* Arm the timer */
@@ -352,7 +352,7 @@ void futex_wait_queue(struct futex_hash_
 		 * is no timeout, or if it has yet to expire.
 		 */
 		if (!timeout || timeout->task)
-			freezable_schedule();
+			schedule();
 	}
 	__set_current_state(TASK_RUNNING);
 }
@@ -430,7 +430,7 @@ static int futex_wait_multiple_setup(str
 			return ret;
 	}
 
-	set_current_state(TASK_INTERRUPTIBLE);
+	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 
 	for (i = 0; i < count; i++) {
 		u32 __user *uaddr = (u32 __user *)(unsigned long)vs[i].w.uaddr;
@@ -504,7 +504,7 @@ static void futex_sleep_multiple(struct
 			return;
 	}
 
-	freezable_schedule();
+	schedule();
 }
 
 /**
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -95,8 +95,8 @@ static void check_hung_task(struct task_
 	 * Ensure the task is not frozen.
 	 * Also, skip vfork and any other user process that freezer should skip.
 	 */
-	if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
-	    return;
+	if (unlikely(READ_ONCE(t->__state) & (TASK_FREEZABLE | TASK_FROZEN)))
+		return;
 
 	/*
 	 * When a freshly created task is scheduled once, changes its state to
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -24,7 +24,7 @@
 unsigned int lock_system_sleep(void)
 {
 	unsigned int flags = current->flags;
-	current->flags |= PF_FREEZER_SKIP;
+	current->flags |= PF_NOFREEZE;
 	mutex_lock(&system_transition_mutex);
 	return flags;
 }
@@ -48,8 +48,8 @@ void unlock_system_sleep(unsigned int fl
 	 * Which means, if we use try_to_freeze() here, it would make them
 	 * enter the refrigerator, thus causing hibernation to lockup.
 	 */
-	if (!(flags & PF_FREEZER_SKIP))
-		current->flags &= ~PF_FREEZER_SKIP;
+	if (!(flags & PF_NOFREEZE))
+		current->flags &= ~PF_NOFREEZE;
 	mutex_unlock(&system_transition_mutex);
 }
 EXPORT_SYMBOL_GPL(unlock_system_sleep);
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -53,8 +53,7 @@ static int try_to_freeze_tasks(bool user
 			if (p == current || !freeze_task(p))
 				continue;
 
-			if (!freezer_should_skip(p))
-				todo++;
+			todo++;
 		}
 		read_unlock(&tasklist_lock);
 
@@ -99,8 +98,7 @@ static int try_to_freeze_tasks(bool user
 		if (!wakeup || pm_debug_messages_on) {
 			read_lock(&tasklist_lock);
 			for_each_process_thread(g, p) {
-				if (p != current && !freezer_should_skip(p)
-				    && freezing(p) && !frozen(p))
+				if (p != current && freezing(p) && !frozen(p))
 					sched_show_task(p);
 			}
 			read_unlock(&tasklist_lock);
@@ -132,7 +130,7 @@ int freeze_processes(void)
 	current->flags |= PF_SUSPEND_TASK;
 
 	if (!pm_freezing)
-		atomic_inc(&system_freezing_cnt);
+		static_branch_inc(&freezer_active);
 
 	pm_wakeup_clear(0);
 	pr_info("Freezing user space processes ... ");
@@ -193,7 +191,7 @@ void thaw_processes(void)
 
 	trace_suspend_resume(TPS("thaw_processes"), 0, true);
 	if (pm_freezing)
-		atomic_dec(&system_freezing_cnt);
+		static_branch_dec(&freezer_active);
 	pm_freezing = false;
 	pm_nosig_freezing = false;
 
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
 	}
 	__set_current_state(TASK_RUNNING);
 
-	if (!wait_task_inactive(child, TASK_TRACED) ||
+	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
 	    !ptrace_freeze_traced(child))
 		return -ESRCH;
 
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -247,6 +247,15 @@ int __sched wait_for_completion_killable
 }
 EXPORT_SYMBOL(wait_for_completion_killable);
 
+int __sched wait_for_completion_state(struct completion *x, unsigned int state)
+{
+	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, state);
+	if (t == -ERESTARTSYS)
+		return t;
+	return 0;
+}
+EXPORT_SYMBOL(wait_for_completion_state);
+
 /**
  * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
  * @x:  holds the state of this particular completion
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3260,6 +3260,19 @@ int migrate_swap(struct task_struct *cur
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
+static inline bool __wti_match(struct task_struct *p, unsigned int match_state)
+{
+	unsigned int state = READ_ONCE(p->__state);
+
+	if ((match_state & TASK_FREEZABLE) && state == TASK_FROZEN)
+		return true;
+
+	if (state == (match_state & ~TASK_FREEZABLE))
+		return true;
+
+	return false;
+}
+
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
@@ -3304,7 +3317,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && !__wti_match(p, match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3319,7 +3332,7 @@ unsigned long wait_task_inactive(struct
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+		if (!match_state || __wti_match(p, match_state))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
 
@@ -6320,7 +6333,7 @@ static void __sched notrace __schedule(u
 			prev->sched_contributes_to_load =
 				(prev_state & TASK_UNINTERRUPTIBLE) &&
 				!(prev_state & TASK_NOLOAD) &&
-				!(prev->flags & PF_FROZEN);
+				!(prev_state & TASK_FROZEN);
 
 			if (prev->sched_contributes_to_load)
 				rq->nr_uninterruptible++;
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2321,7 +2321,7 @@ static int ptrace_stop(int exit_code, in
 		clear_traced_quiesce();
 
 		preempt_enable_no_resched();
-		freezable_schedule();
+		schedule();
 
 		cgroup_leave_frozen(true);
 	} else {
@@ -2514,7 +2514,7 @@ static bool do_signal_stop(int signr)
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
-		freezable_schedule();
+		schedule();
 		return true;
 	} else {
 		/*
@@ -2589,11 +2589,11 @@ static void do_freezer_trap(void)
 	 * immediately (if there is a non-fatal signal pending), and
 	 * put the task into sleep.
 	 */
-	__set_current_state(TASK_INTERRUPTIBLE);
+	__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 	clear_thread_flag(TIF_SIGPENDING);
 	spin_unlock_irq(&current->sighand->siglock);
 	cgroup_enter_frozen();
-	freezable_schedule();
+	schedule();
 }
 
 static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
@@ -3639,9 +3639,9 @@ static int do_sigtimedwait(const sigset_
 		recalc_sigpending();
 		spin_unlock_irq(&tsk->sighand->siglock);
 
-		__set_current_state(TASK_INTERRUPTIBLE);
-		ret = freezable_schedule_hrtimeout_range(to, tsk->timer_slack_ns,
-							 HRTIMER_MODE_REL);
+		__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
+		ret = schedule_hrtimeout_range(to, tsk->timer_slack_ns,
+					       HRTIMER_MODE_REL);
 		spin_lock_irq(&tsk->sighand->siglock);
 		__set_task_blocked(tsk, &tsk->real_blocked);
 		sigemptyset(&tsk->real_blocked);
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2037,11 +2037,11 @@ static int __sched do_nanosleep(struct h
 	struct restart_block *restart;
 
 	do {
-		set_current_state(TASK_INTERRUPTIBLE);
+		set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
 		hrtimer_sleeper_start_expires(t, mode);
 
 		if (likely(t->task))
-			freezable_schedule();
+			schedule();
 
 		hrtimer_cancel(&t->timer);
 		mode = HRTIMER_MODE_ABS;
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -404,6 +404,7 @@ EXPORT_SYMBOL(call_usermodehelper_setup)
  */
 int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 {
+	unsigned int state = TASK_UNINTERRUPTIBLE;
 	DECLARE_COMPLETION_ONSTACK(done);
 	int retval = 0;
 
@@ -437,25 +438,22 @@ int call_usermodehelper_exec(struct subp
 	if (wait == UMH_NO_WAIT)	/* task has freed sub_info */
 		goto unlock;
 
+	if (wait & UMH_KILLABLE)
+		state |= TASK_KILLABLE;
+
 	if (wait & UMH_FREEZABLE)
-		freezer_do_not_count();
+		state |= TASK_FREEZABLE;
 
-	if (wait & UMH_KILLABLE) {
-		retval = wait_for_completion_killable(&done);
-		if (!retval)
-			goto wait_done;
+	retval = wait_for_completion_state(&done, state);
+	if (!retval)
+		goto wait_done;
 
+	if (wait & UMH_KILLABLE) {
 		/* umh_complete() will see NULL and free sub_info */
 		if (xchg(&sub_info->complete, NULL))
 			goto unlock;
-		/* fallthrough, umh_complete() was already called */
 	}
 
-	wait_for_completion(&done);
-
-	if (wait & UMH_FREEZABLE)
-		freezer_count();
-
 wait_done:
 	retval = sub_info->retval;
 out:
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -780,8 +780,8 @@ static void khugepaged_alloc_sleep(void)
 	DEFINE_WAIT(wait);
 
 	add_wait_queue(&khugepaged_wait, &wait);
-	freezable_schedule_timeout_interruptible(
-		msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
+	__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
+	schedule_timeout(msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
 	remove_wait_queue(&khugepaged_wait, &wait);
 }
 
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -268,7 +268,7 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue
 
 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
@@ -332,14 +332,12 @@ static int rpc_complete_task(struct rpc_
  * to enforce taking of the wq->lock and hence avoid races with
  * rpc_complete_task().
  */
-int __rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *action)
+int rpc_wait_for_completion_task(struct rpc_task *task)
 {
-	if (action == NULL)
-		action = rpc_wait_bit_killable;
 	return out_of_line_wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE,
-			action, TASK_KILLABLE);
+			rpc_wait_bit_killable, TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
 }
-EXPORT_SYMBOL_GPL(__rpc_wait_for_completion_task);
+EXPORT_SYMBOL_GPL(rpc_wait_for_completion_task);
 
 /*
  * Make an RPC task runnable.
@@ -963,7 +961,7 @@ static void __rpc_execute(struct rpc_tas
 		trace_rpc_task_sync_sleep(task, task->tk_action);
 		status = out_of_line_wait_on_bit(&task->tk_runstate,
 				RPC_TASK_QUEUED, rpc_wait_bit_killable,
-				TASK_KILLABLE);
+				TASK_KILLABLE|TASK_FREEZABLE);
 		if (status < 0) {
 			/*
 			 * When a sync task receives a signal, it exits with
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2530,13 +2530,14 @@ static long unix_stream_data_wait(struct
 				  struct sk_buff *last, unsigned int last_len,
 				  bool freezable)
 {
+	unsigned int state = TASK_INTERRUPTIBLE | freezable * TASK_FREEZABLE;
 	struct sk_buff *tail;
 	DEFINE_WAIT(wait);
 
 	unix_state_lock(sk);
 
 	for (;;) {
-		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, state);
 
 		tail = skb_peek_tail(&sk->sk_receive_queue);
 		if (tail != last ||
@@ -2549,10 +2550,7 @@ static long unix_stream_data_wait(struct
 
 		sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 		unix_state_unlock(sk);
-		if (freezable)
-			timeo = freezable_schedule_timeout(timeo);
-		else
-			timeo = schedule_timeout(timeo);
+		timeo = schedule_timeout(timeo);
 		unix_state_lock(sk);
 
 		if (sock_flag(sk, SOCK_DEAD))




* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 15:02 ` [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic Peter Zijlstra
@ 2022-04-21 17:26   ` Eric W. Biederman
  2022-04-21 17:57     ` Oleg Nesterov
  2022-04-21 19:55     ` Peter Zijlstra
  0 siblings, 2 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-21 17:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
>  	}
>  	__set_current_state(TASK_RUNNING);
>  
> -	if (!wait_task_inactive(child, TASK_TRACED) ||
> +	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
>  	    !ptrace_freeze_traced(child))
>  		return -ESRCH;

Do we mind that this is going to fail if the child is frozen
during ptrace_check_attach?

I think to avoid that we need to safely get this to
wait_task_inactive(child, 0), like the coredump code uses.

I would like to say that we can do without the wait_task_inactive,
but it looks like it is necessary to ensure that all of the userspace
registers are saved where the tracer can get at them.

Eric


* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 17:26   ` Eric W. Biederman
@ 2022-04-21 17:57     ` Oleg Nesterov
  2022-04-21 19:55     ` Peter Zijlstra
  1 sibling, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-21 17:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On 04/21, Eric W. Biederman wrote:
>
> I would like to say that we can do without the wait_task_inactive,
> but it looks like it is necessary to ensure that all of the userspace
> registers are saved where the tracer can get at them.

Yes, for example, fpu regs.

But there are more problems. For example, if the debugger changes
TIF_BLOCKSTEP we need to ensure the child is already inactive and that it
will do another switch_to() after that.

Oleg.



* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
@ 2022-04-21 18:23   ` Oleg Nesterov
  2022-04-21 19:58     ` Peter Zijlstra
  2022-04-21 18:40   ` Eric W. Biederman
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-21 18:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On 04/21, Peter Zijlstra wrote:
>
> Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> task->__state as much.

Looks good after a quick glance... but to be honest I got lost and
I need to apply these patches and read the code carefully.

However, I am not able to do this until Monday, sorry.

Just one nit for now,

>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> +	if (!task_is_traced(task))
>  		return;
>  
>  	WARN_ON(!task->ptrace || task->parent != current);
>  
> -	/*
> -	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> -	 * Recheck state under the lock to close this race.
> -	 */
>  	spin_lock_irq(&task->sighand->siglock);
> -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
> +	if (task_is_traced(task)) {

I think ptrace_unfreeze_traced() should not use task_is_traced() at all.
I think a single lockless

	if (task->jobctl & JOBCTL_DELAY_WAKEKILL)
		return;

at the start should be enough?

Nobody else can set this flag. It can be cleared by the tracee if it was
woken up, so perhaps we can check it again but afaics this is not strictly
needed.

> +//		WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));

Did you really want to add the commented WARN_ON_ONCE?

Oleg.



* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
  2022-04-21 18:23   ` Oleg Nesterov
@ 2022-04-21 18:40   ` Eric W. Biederman
  2022-04-26 22:50       ` Eric W. Biederman
  2022-04-25 14:35   ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Oleg Nesterov
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-21 18:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> task->__state as much.
>
> Due to how PREEMPT_RT is changing the rules vs task->__state with the
> introduction of task->saved_state while TASK_RTLOCK_WAIT (the whole
> blocking spinlock thing), the way ptrace freeze tries to do things no
> longer works.


The problem with ptrace_stop and do_signal_stop that requires dropping
siglock and grabbing tasklist_lock is that do_notify_parent_cldstop
needs tasklist_lock to keep parent and real_parent stable.

With just some very modest code changes it looks like we can use
a process's own siglock to keep parent and real_parent stable.  The
siglock is already acquired in all of those places; it is just not held
across the changes to parent and real_parent.

Then make a rule that a child's siglock must be grabbed before a parent's
siglock, and do_notify_parent_cldstop can always be called under the
child's siglock.

This means ptrace_stop can be significantly simplified, and the
notifications can be moved far enough up that set_special_state
can be called after do_notify_parent_cldstop.  With the result
that there is simply no PREEMPT_RT issue to worry about and
wait_task_inactive can be used as is.

I remember Oleg suggesting a change something like this a long
time ago.


I need to handle the case where the parent and the child share
the same sighand but that is just remembering to handle it in
do_notify_parent_cldstop, as the handling is simply not taking
the lock twice.
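
The ordering rule above (child's siglock first, then the parent's, and no
second acquisition when both share one sighand) can be sketched in plain C.
This is only a userspace toy under stated assumptions: `struct sk_sighand`,
`sk_lock()`, `sk_unlock()` and `sk_notify_parent()` are hypothetical names
invented for illustration, the "lock" is a flag rather than a spinlock, and
none of this is the kernel's actual do_notify_parent_cldstop code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a sighand's siglock; a flag, not a real spinlock. */
struct sk_sighand {
	bool locked;
};

static void sk_lock(struct sk_sighand *s)
{
	assert(!s->locked);	/* a second acquisition would deadlock */
	s->locked = true;
}

static void sk_unlock(struct sk_sighand *s)
{
	assert(s->locked);
	s->locked = false;
}

/*
 * Hypothetical notify step.  The caller already holds the child's lock.
 * The parent's lock is taken second, unless parent and child share the
 * same sighand, in which case the lock is already held.
 * Returns true if a second lock was taken.
 */
static bool sk_notify_parent(struct sk_sighand *child,
			     struct sk_sighand *parent)
{
	bool took_parent = false;

	assert(child->locked);

	if (parent != child) {
		sk_lock(parent);
		took_parent = true;
	}

	/* ... the stop notification would be delivered here ... */

	if (took_parent)
		sk_unlock(parent);

	return took_parent;
}
```

The point of the sketch is only the shape of the rule: one fixed order
(child, then parent) and a pointer comparison for the shared-sighand case.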

I am going to play with that and see if there are any gotchas
I missed when looking through the code.

Eric


* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 17:26   ` Eric W. Biederman
  2022-04-21 17:57     ` Oleg Nesterov
@ 2022-04-21 19:55     ` Peter Zijlstra
  2022-04-21 20:07       ` Peter Zijlstra
  1 sibling, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 19:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Thu, Apr 21, 2022 at 12:26:44PM -0500, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > --- a/kernel/ptrace.c
> > +++ b/kernel/ptrace.c
> > @@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
> >  	}
> >  	__set_current_state(TASK_RUNNING);
> >  
> > -	if (!wait_task_inactive(child, TASK_TRACED) ||
> > +	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
> >  	    !ptrace_freeze_traced(child))
> >  		return -ESRCH;
> 
> Do we mind that this is going to fail if the child is frozen
> during ptrace_check_attach?

Why should this fail? wait_task_inactive() will in fact succeed if it is
frozen due to the added TASK_FREEZABLE and some wait_task_inactive()
changes elsewhere in this patch.

And I don't see why ptrace_freeze_traced() should fail. It'll warn
though, I should extend/remove that WARN_ON_ONCE() looking at __state,
but it should work.
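
To make the wait_task_inactive() point concrete, here is a standalone
sketch of the matching rule that the __wti_match() helper in this patch
implements, operating on a plain state value instead of a task_struct.
The `SK_` bit values are hypothetical and chosen only for this sketch;
the real definitions live in include/linux/sched.h and may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define SK_TASK_TRACED		0x0008u
#define SK_TASK_FREEZABLE	0x2000u
#define SK_TASK_FROZEN		0x8000u

/*
 * Same logic as __wti_match() in the patch, but taking the task state
 * directly: a frozen task matches if the caller passed TASK_FREEZABLE,
 * otherwise the state must equal match_state with TASK_FREEZABLE masked
 * out.
 */
static bool sk_wti_match(unsigned int state, unsigned int match_state)
{
	if ((match_state & SK_TASK_FREEZABLE) && state == SK_TASK_FROZEN)
		return true;

	if (state == (match_state & ~SK_TASK_FREEZABLE))
		return true;

	return false;
}

/* Usage: the ptrace_check_attach() case discussed above. */
static void sk_wti_demo(void)
{
	unsigned int match = SK_TASK_TRACED | SK_TASK_FREEZABLE;

	assert(sk_wti_match(SK_TASK_TRACED, match));	/* plain traced */
	assert(sk_wti_match(SK_TASK_FROZEN, match));	/* frozen tracee */
	assert(!sk_wti_match(SK_TASK_FROZEN, SK_TASK_TRACED));
}
```

Without TASK_FREEZABLE in match_state a frozen task does not match, which
is the old behaviour; with it, both the traced and the frozen state are
accepted.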


* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 18:23   ` Oleg Nesterov
@ 2022-04-21 19:58     ` Peter Zijlstra
  0 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 19:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Thu, Apr 21, 2022 at 08:23:26PM +0200, Oleg Nesterov wrote:
> On 04/21, Peter Zijlstra wrote:
> >
> > Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> > task->__state as much.
> 
> Looks good after a quick glance... but to be honest I got lost and
> I need to apply these patches and read the code carefully.
> 
> However, I am not able to do this until Monday, sorry.

Sure, no worries. Take your time.

> Just one nit for now,
> 
> >  static void ptrace_unfreeze_traced(struct task_struct *task)
> >  {
> > -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> > +	if (!task_is_traced(task))
> >  		return;
> >  
> >  	WARN_ON(!task->ptrace || task->parent != current);
> >  
> > -	/*
> > -	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> > -	 * Recheck state under the lock to close this race.
> > -	 */
> >  	spin_lock_irq(&task->sighand->siglock);
> > -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
> > +	if (task_is_traced(task)) {
> 
> I think ptrace_unfreeze_traced() should not use task_is_traced() at all.
> I think a single lockless
> 
> 	if (task->jobctl & JOBCTL_DELAY_WAKEKILL)
> 		return;
> 
> at the start should be enough?

I think so. That is indeed cleaner. I'll make the change if I don't see
anything wrong with it in the morning when the brain has woken up again
;-)

> 
> Nobody else can set this flag. It can be cleared by the tracee if it was
> woken up, so perhaps we can check it again but afaics this is not strictly
> needed.
> 
> > +//		WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));
> 
> Did you really want to add the commented WARN_ON_ONCE?

I did that because:

@@ -1472,8 +1479,7 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_lo
                                  request == PTRACE_INTERRUPT);
        if (!ret) {
                ret = compat_arch_ptrace(child, request, addr, data);
-               if (ret || request != PTRACE_DETACH)
-                       ptrace_unfreeze_traced(child);
+               ptrace_unfreeze_traced(child);
        }

Can now call unfreeze too often. I left the comment in because I need to
think more about why Eric did that and see if it really is needed.

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 19:55     ` Peter Zijlstra
@ 2022-04-21 20:07       ` Peter Zijlstra
  2022-04-22 15:52         ` Eric W. Biederman
  0 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-21 20:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Thu, Apr 21, 2022 at 09:55:51PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 21, 2022 at 12:26:44PM -0500, Eric W. Biederman wrote:
> > Peter Zijlstra <peterz@infradead.org> writes:
> > 
> > > --- a/kernel/ptrace.c
> > > +++ b/kernel/ptrace.c
> > > @@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
> > >  	}
> > >  	__set_current_state(TASK_RUNNING);
> > >  
> > > -	if (!wait_task_inactive(child, TASK_TRACED) ||
> > > +	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
> > >  	    !ptrace_freeze_traced(child))
> > >  		return -ESRCH;
> > 
> > Do we mind that this is going to fail if the child is frozen
> > during ptrace_check_attach?
> 
> Why should this fail? wait_task_inactive() will in fact succeed if it is
> frozen due to the added TASK_FREEZABLE and some wait_task_inactive()
> changes elsewhere in this patch.

These:

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3260,6 +3260,19 @@ int migrate_swap(struct task_struct *cur
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
+static inline bool __wti_match(struct task_struct *p, unsigned int match_state)
+{
+	unsigned int state = READ_ONCE(p->__state);
+
+	if ((match_state & TASK_FREEZABLE) && state == TASK_FROZEN)
+		return true;
+
+	if (state == (match_state & ~TASK_FREEZABLE))
+		return true;
+
+	return false;
+}
+
 /*
  * wait_task_inactive - wait for a thread to unschedule.
  *
@@ -3304,7 +3317,7 @@ unsigned long wait_task_inactive(struct
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && !__wti_match(p, match_state))
 				return 0;
 			cpu_relax();
 		}
@@ -3319,7 +3332,7 @@ unsigned long wait_task_inactive(struct
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+		if (!match_state || __wti_match(p, match_state))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
 


> And I don't see why ptrace_freeze_traced() should fail. It'll warn
> though, I should extend/remove that WARN_ON_ONCE() looking at __state,
> but it should work.

And that looks like (after removal of the one WARN):

static bool ptrace_freeze_traced(struct task_struct *task)
{
	unsigned long flags;
	bool ret = false;

	/* Lockless, nobody but us can set this flag */
	if (task->jobctl & JOBCTL_LISTENING)
		return ret;

	if (!lock_task_sighand(task, &flags))
		return ret;

	if (task_is_traced(task) &&
	    !looks_like_a_spurious_pid(task) &&
	    !__fatal_signal_pending(task)) {
		WARN_ON_ONCE(task->jobctl & JOBCTL_DELAY_WAKEKILL);
		task->jobctl |= JOBCTL_DELAY_WAKEKILL;
		ret = true;
	}
	unlock_task_sighand(task, &flags);

	return ret;
}

And nothing there cares about ->__state.
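
The matching rule of the __wti_match() helper in the diff earlier in this
message can be tried out in userspace. What follows is only a sketch: the
state bit values are made up for illustration and are not the kernel's, and
the state argument stands in for READ_ONCE(p->__state).

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch; the kernel's differ. */
#define TASK_TRACED	0x0008u
#define TASK_FREEZABLE	0x2000u
#define TASK_FROZEN	0x8000u

/*
 * Userspace model of __wti_match(): 'state' plays the role of
 * READ_ONCE(p->__state), 'match_state' is what the caller passed to
 * wait_task_inactive().
 */
static bool wti_match(unsigned int state, unsigned int match_state)
{
	/* A frozen task matches, but only if the caller allowed freezing. */
	if ((match_state & TASK_FREEZABLE) && state == TASK_FROZEN)
		return true;

	/* Otherwise compare against the requested state minus the freezable bit. */
	if (state == (match_state & ~TASK_FREEZABLE))
		return true;

	return false;
}
```

Under these assumptions, wti_match(TASK_FROZEN, TASK_TRACED | TASK_FREEZABLE)
is true while wti_match(TASK_FROZEN, TASK_TRACED) is false, which is the
behaviour ptrace_check_attach() relies on in the hunk quoted above.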

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic
  2022-04-21 20:07       ` Peter Zijlstra
@ 2022-04-22 15:52         ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-22 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Apr 21, 2022 at 09:55:51PM +0200, Peter Zijlstra wrote:
>> On Thu, Apr 21, 2022 at 12:26:44PM -0500, Eric W. Biederman wrote:
>> > Peter Zijlstra <peterz@infradead.org> writes:
>> > 
>> > > --- a/kernel/ptrace.c
>> > > +++ b/kernel/ptrace.c
>> > > @@ -288,7 +288,7 @@ static int ptrace_check_attach(struct ta
>> > >  	}
>> > >  	__set_current_state(TASK_RUNNING);
>> > >  
>> > > -	if (!wait_task_inactive(child, TASK_TRACED) ||
>> > > +	if (!wait_task_inactive(child, TASK_TRACED|TASK_FREEZABLE) ||
>> > >  	    !ptrace_freeze_traced(child))
>> > >  		return -ESRCH;
>> > 
>> > Do we mind that this is going to fail if the child is frozen
>> > during ptrace_check_attach?
>> 
>> Why should this fail? wait_task_inactive() will in fact succeed if it is
>> frozen due to the added TASK_FREEZABLE and some wait_task_inactive()
>> changes elsewhere in this patch.
>
> These:

I had missed that change to wait_task_inactive.

Still that change to wait_task_inactive fundamentally depends upon the
fact that we don't care about the state we are passing into
wait_task_inactive.  So I think it would be better to simply have a
precursor patch that changes wait_task_inactive(child, TASK_TRACED) to
wait_task_inactive(child, 0) and say so explicitly.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite
  2022-04-21 15:02 [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Peter Zijlstra
                   ` (4 preceding siblings ...)
  2022-04-21 15:02 ` [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic Peter Zijlstra
@ 2022-04-22 17:43 ` Sebastian Andrzej Siewior
  2022-04-22 19:15   ` Eric W. Biederman
  5 siblings, 1 reply; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-04-22 17:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, ebiederm, Will Deacon, linux-kernel, tj, linux-pm

On 2022-04-21 17:02:48 [+0200], Peter Zijlstra wrote:
> Find here a new posting of the ptrace and freezer patches :-)
> 
> The majority of the changes are in patch 2, which with much feedback from Oleg
> and Eric has changed lots.
> 
> I'm hoping we're converging on something agreeable.

I tested this under RT (had to remove the preempt-disable section in
ptrace_stop()) with ssdd [0]. It forks a few tasks and then
PTRACE_SINGLESTEPs them for a few iterations.

The following failures were reported by that tool:
| forktest#27/3790: EXITING, ERROR: wait on PTRACE_ATTACH saw a SIGCHLD count of 0, should be 1
| forktest#225/40029: EXITING, ERROR: wait on PTRACE_SINGLESTEP #22241: no SIGCHLD seen (signal count == 0), signo 5

very rarely. Then I managed to figure out that the latter error triggers
if I compile something large with a RT priority. Sadly it also happens
with my old ptrace hack (but I just noticed it). It didn't happen
without RT (just the 5 patches applied).

I also managed to trigger this backtrace with RT:
|WARNING: CPU: 1 PID: 3748 at kernel/signal.c:2237 ptrace_stop+0x356/0x370
|Modules linked in:
|CPU: 1 PID: 3748 Comm: ssdd Not tainted 5.18.0-rc3-rt1+ #1
|Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
|RIP: 0010:ptrace_stop+0x356/0x370
|RSP: 0000:ffffc9000d277d98 EFLAGS: 00010246
|RAX: ffff888116d1e100 RBX: ffff888116d1e100 RCX: 0000000000000001
|RDX: 0000000000000001 RSI: 000000000000002e RDI: ffffffff822bdcc3
|RBP: ffff888116d1e100 R08: ffff88811ca99870 R09: 0000000000000001
|R10: ffff88811ca99910 R11: ffff88852ade2680 R12: ffffc9000d277e90
|R13: 0000000000000004 R14: ffff888116d1ed48 R15: 0000000000000000
|FS:  00007f0afdad4580(0000) GS:ffff88852aa40000(0000) knlGS:0000000000000000
|CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
|CR2: 00007f0afdad4508 CR3: 0000000558198006 CR4: 00000000000606e0
|Call Trace:
| <TASK>
| get_signal+0x553/0x870
| arch_do_signal_or_restart+0x31/0x7b0
| exit_to_user_mode_prepare+0xe4/0x110
| irqentry_exit_to_user_mode+0x5/0x20
| noist_exc_debug+0xe0/0x120
| asm_exc_debug+0x2b/0x30
|RSP: 002b:00007fffae964b70 EFLAGS: 00000346
|RAX: 0000000000000000 RBX: 00000000000000fc RCX: 00007f0afd9c0d35
|RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
|RBP: 00007fffae964e38 R08: 0000000000000000 R09: 00007fffae962a82
|R10: 00007f0afdad4850 R11: 0000000000000246 R12: 0000000000000000
|R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
| </TASK>

which is the WARN_ON_ONCE() in clear_traced_quiesce().

[0] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/tree/src/ssdd/ssdd.c

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite
  2022-04-22 17:43 ` [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Sebastian Andrzej Siewior
@ 2022-04-22 19:15   ` Eric W. Biederman
  2022-04-22 21:13     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-22 19:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, linux-kernel,
	tj, linux-pm

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-04-21 17:02:48 [+0200], Peter Zijlstra wrote:
>> Find here a new posting of the ptrace and freezer patches :-)
>> 
>> The majority of the changes are in patch 2, which with much feedback from Oleg
>> and Eric has changed lots.
>> 
>> I'm hoping we're converging on something agreeable.
>
> I tested this under RT (had to remove the preempt-disable section in
> ptrace_stop()) with ssdd [0]. It forks a few tasks and then
> PTRACE_SINGLESTEPs them for a few iterations.

Out of curiosity why did you need to remove the preempt_disable section
on PREEMPT_RT?  It should have lasted for just a moment until schedule
was called.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite
  2022-04-22 19:15   ` Eric W. Biederman
@ 2022-04-22 21:13     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-04-22 21:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, linux-kernel,
	tj, linux-pm

On 2022-04-22 14:15:35 [-0500], Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > On 2022-04-21 17:02:48 [+0200], Peter Zijlstra wrote:
> >> Find here a new posting of the ptrace and freezer patches :-)
> >> 
> >> The majority of the changes are in patch 2, which with much feedback from Oleg
> >> and Eric has changed lots.
> >> 
> >> I'm hoping we're converging on something agreeable.
> >
> > I tested this under RT (had to remove the preempt-disable section in
> > ptrace_stop()) with ssdd [0]. It forks a few tasks and then
> > PTRACE_SINGLESTEPs them for a few iterations.
> 
> Out of curiosity why did you need to remove the preempt_disable section
> on PREEMPT_RT?  It should have lasted for just a moment until schedule
> was called.

Within that section spinlock_t locks are acquired. These locks are
sleeping locks on PREEMPT_RT and must not be acquired within a
preempt-disable section. (A spinlock_t lock does not disable preemption
on PREEMPT_RT.)

> Eric

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
  2022-04-21 18:23   ` Oleg Nesterov
  2022-04-21 18:40   ` Eric W. Biederman
@ 2022-04-25 14:35   ` Oleg Nesterov
  2022-04-25 18:33     ` Peter Zijlstra
  2022-04-25 17:47   ` Oleg Nesterov
  2022-04-27 15:53   ` Oleg Nesterov
  4 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-25 14:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On 04/21, Peter Zijlstra wrote:
>
> +static void clear_traced_quiesce(void)
> +{
> +	spin_lock_irq(&current->sighand->siglock);
> +	WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));

This WARN_ON_ONCE() doesn't look right, the task can be killed right
after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
drops siglock.

> @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
>  		/*
>  		 * Don't want to allow preemption here, because
>  		 * sys_ptrace() needs this task to be inactive.
> -		 *
> -		 * XXX: implement read_unlock_no_resched().
>  		 */
>  		preempt_disable();
>  		read_unlock(&tasklist_lock);
> -		cgroup_enter_frozen();
> +		cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
> +
> +		/*
> +		 * JOBCTL_TRACE_QUIESCE bridges the gap between
> +		 * set_current_state(TASK_TRACED) above and schedule() below.
> +		 * There must not be any blocking (specifically anything that
> +		 * touched ->saved_state on PREEMPT_RT) between here and
> +		 * schedule().
> +		 *
> +		 * ptrace_check_attach() relies on this with its
> +		 * wait_task_inactive() usage.
> +		 */
> +		clear_traced_quiesce();

Well, I think it should be called earlier under tasklist_lock,
before preempt_disable() above.

We need tasklist_lock to protect ->parent, debugger can be killed
and go away right after read_unlock(&tasklist_lock).

Still trying to convince myself everything is right with
JOBCTL_STOPPED/TRACED ...

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
                     ` (2 preceding siblings ...)
  2022-04-25 14:35   ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Oleg Nesterov
@ 2022-04-25 17:47   ` Oleg Nesterov
  2022-04-27  0:24     ` Eric W. Biederman
  2022-04-27 15:53   ` Oleg Nesterov
  4 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-25 17:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On 04/21, Peter Zijlstra wrote:
>
> @@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
>  	 * schedule() will not sleep if there is a pending signal that
>  	 * can awaken the task.
>  	 */
> -	current->jobctl |= JOBCTL_TRACED;
> +	current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
>  	set_special_state(TASK_TRACED);

OK, this looks wrong. I actually mean the previous patch which sets
JOBCTL_TRACED.

The problem is that the tracee can be already killed, so that
fatal_signal_pending(current) is true. In this case we can't rely on
signal_wake_up_state() which should clear JOBCTL_TRACED, or the
callers of ptrace_signal_wake_up/etc which clear this flag by hand.

In this case schedule() won't block and ptrace_stop() will leak
JOBCTL_TRACED. Unless I missed something.

We could check fatal_signal_pending() and damn! this is what I think
ptrace_stop() should have done from the very beginning. But for now
I'd suggest to simply clear this flag before return, along with
DELAY_WAKEKILL and LISTENING.

>  	current->jobctl &= ~JOBCTL_LISTENING;
> +	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;

	current->jobctl &=
		~(JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);
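
Clearing all three flags before return with a single mask can be checked in
userspace. This is only a sketch: the bit positions are invented for the
example, and JOBCTL_OTHER is a hypothetical stand-in for any unrelated jobctl
bit that must survive.

```c
/* Hypothetical bit positions, for this sketch only. */
#define JOBCTL_LISTENING	(1UL << 0)
#define JOBCTL_DELAY_WAKEKILL	(1UL << 1)
#define JOBCTL_TRACED		(1UL << 2)
#define JOBCTL_OTHER		(1UL << 3)	/* any unrelated flag */

/*
 * Return 'jobctl' with the three stop-related flags cleared.
 * The complement is applied once, to the OR of all three bits,
 * so every other bit is preserved.
 */
static unsigned long clear_stop_flags(unsigned long jobctl)
{
	return jobctl & ~(JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);
}
```

With these made-up values, passing in all four bits leaves only JOBCTL_OTHER
set.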

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-25 14:35   ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Oleg Nesterov
@ 2022-04-25 18:33     ` Peter Zijlstra
  2022-04-26  0:38       ` Eric W. Biederman
  0 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-25 18:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Mon, Apr 25, 2022 at 04:35:37PM +0200, Oleg Nesterov wrote:
> On 04/21, Peter Zijlstra wrote:
> >
> > +static void clear_traced_quiesce(void)
> > +{
> > +	spin_lock_irq(&current->sighand->siglock);
> > +	WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
> 
> This WARN_ON_ONCE() doesn't look right, the task can be killed right
> after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
> drops siglock.

OK, will look at that.

> > @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
> >  		/*
> >  		 * Don't want to allow preemption here, because
> >  		 * sys_ptrace() needs this task to be inactive.
> > -		 *
> > -		 * XXX: implement read_unlock_no_resched().
> >  		 */
> >  		preempt_disable();
> >  		read_unlock(&tasklist_lock);
> > -		cgroup_enter_frozen();
> > +		cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
> > +
> > +		/*
> > +		 * JOBCTL_TRACE_QUIESCE bridges the gap between
> > +		 * set_current_state(TASK_TRACED) above and schedule() below.
> > +		 * There must not be any blocking (specifically anything that
> > +		 * touched ->saved_state on PREEMPT_RT) between here and
> > +		 * schedule().
> > +		 *
> > +		 * ptrace_check_attach() relies on this with its
> > +		 * wait_task_inactive() usage.
> > +		 */
> > +		clear_traced_quiesce();
> 
> Well, I think it should be called earlier under tasklist_lock,
> before preempt_disable() above.
> 
> We need tasklist_lock to protect ->parent, debugger can be killed
> and go away right after read_unlock(&tasklist_lock).
> 
> Still trying to convince myself everything is right with
> JOBCTL_STOPPED/TRACED ...

Can't do it earlier, since cgroup_enter_frozen() can do spinlock (eg.
use ->saved_state).

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-25 18:33     ` Peter Zijlstra
@ 2022-04-26  0:38       ` Eric W. Biederman
  2022-04-26  5:51         ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26  0:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Apr 25, 2022 at 04:35:37PM +0200, Oleg Nesterov wrote:
>> On 04/21, Peter Zijlstra wrote:
>> >
>> > +static void clear_traced_quiesce(void)
>> > +{
>> > +	spin_lock_irq(&current->sighand->siglock);
>> > +	WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
>> 
>> This WARN_ON_ONCE() doesn't look right, the task can be killed right
>> after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
>> drops siglock.
>
> OK, will look at that.
>
>> > @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
>> >  		/*
>> >  		 * Don't want to allow preemption here, because
>> >  		 * sys_ptrace() needs this task to be inactive.
>> > -		 *
>> > -		 * XXX: implement read_unlock_no_resched().
>> >  		 */
>> >  		preempt_disable();
>> >  		read_unlock(&tasklist_lock);
>> > -		cgroup_enter_frozen();
>> > +		cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
>> > +
>> > +		/*
>> > +		 * JOBCTL_TRACE_QUIESCE bridges the gap between
>> > +		 * set_current_state(TASK_TRACED) above and schedule() below.
>> > +		 * There must not be any blocking (specifically anything that
>> > +		 * touched ->saved_state on PREEMPT_RT) between here and
>> > +		 * schedule().
>> > +		 *
>> > +		 * ptrace_check_attach() relies on this with its
>> > +		 * wait_task_inactive() usage.
>> > +		 */
>> > +		clear_traced_quiesce();
>> 
>> Well, I think it should be called earlier under tasklist_lock,
>> before preempt_disable() above.
>> 
>> We need tasklist_lock to protect ->parent, debugger can be killed
>> and go away right after read_unlock(&tasklist_lock).
>> 
>> Still trying to convince myself everything is right with
>> JOBCTL_STOPPED/TRACED ...
>
> Can't do it earlier, since cgroup_enter_frozen() can do spinlock (eg.
> use ->saved_state).

There are some other issues in this part of ptrace_stop().


I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".


Currently in ptrace_check_attach a parameter of __TASK_TRACED is passed
so that wait_task_inactive can fail if the "!current->ptrace" branch
of ptrace_stop is taken and ptrace_stop does not stop.  With the
TASK_FROZEN state it appears that the "!current->ptrace" branch can
continue and freeze somewhere else and wait_task_inactive could decide
it was fine.


I have to run, but hopefully tomorrow I will post the patches that
remove the "!current->ptrace" case altogether and basically
remove the need for quiesce and wait_task_inactive detecting
which branch is taken.

The spinlock in cgroup_enter_frozen remains an issue for PREEMPT_RT.
But the rest of the issues are cleared up by using siglock instead
of tasklist_lock.  Plus the code is just easier to read and understand.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-26  0:38       ` Eric W. Biederman
@ 2022-04-26  5:51         ` Oleg Nesterov
  2022-04-26 17:19           ` Eric W. Biederman
  0 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-26  5:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On 04/25, Eric W. Biederman wrote:
>
> I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".

As Peter explained, in this case we can rely on __ptrace_unlink() which
should clear this flag.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-26  5:51         ` Oleg Nesterov
@ 2022-04-26 17:19           ` Eric W. Biederman
  2022-04-26 18:11             ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 17:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/25, Eric W. Biederman wrote:
>>
>> I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".
>
> As Peter explained, in this case we can rely on __ptrace_unlink() which
> should clear this flag.

I had missed that signal_wake_up_state was clearing
JOBCTL_TRACED_QUIESCE.

Relying on __ptrace_unlink assumes the __ptrace_unlink happens after
siglock is taken before calling ptrace_stop.  Especially with the
ptrace_notify in signal_delivered that does not look guaranteed.

The __ptrace_unlink could also happen during arch_ptrace_stop.

Relying on siglock is sufficient because __ptrace_unlink holds siglock
over clearing task->ptrace.  Which means that the simple fix for this is
to just test task->ptrace before we set JOBCTL_TRACED_QUIESCE.
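
The locking argument can be modelled in userspace. The sketch below is not
kernel code: struct task_model, set_quiesce_if_traced() and
ptrace_unlink_model() are hypothetical names, a pthread mutex stands in for
siglock, and the flag's bit value is made up. It also assumes, as discussed
earlier in this thread, that the unlink path clears the flag.

```c
#include <pthread.h>
#include <stdbool.h>

#define JOBCTL_TRACED_QUIESCE	(1UL << 0)	/* hypothetical bit value */

/* Hypothetical stand-in for the few task_struct fields involved. */
struct task_model {
	pthread_mutex_t siglock;	/* models sighand->siglock */
	unsigned int ptrace;		/* non-zero while traced */
	unsigned long jobctl;
};

/* Set the quiesce flag only if the task is still traced, checked under the lock. */
static bool set_quiesce_if_traced(struct task_model *t)
{
	bool traced;

	pthread_mutex_lock(&t->siglock);
	traced = t->ptrace != 0;
	if (traced)
		t->jobctl |= JOBCTL_TRACED_QUIESCE;
	pthread_mutex_unlock(&t->siglock);

	return traced;
}

/* Models the unlink side: ptrace and the flag are cleared under the same lock. */
static void ptrace_unlink_model(struct task_model *t)
{
	pthread_mutex_lock(&t->siglock);
	t->ptrace = 0;
	t->jobctl &= ~JOBCTL_TRACED_QUIESCE;
	pthread_mutex_unlock(&t->siglock);
}
```

Because both sides take the same lock, either the unlink runs first and the
flag is never set, or the flag is set first and the unlink clears it. In this
model the flag cannot be left set on a task that is no longer traced.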

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-26 17:19           ` Eric W. Biederman
@ 2022-04-26 18:11             ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-26 18:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On 04/26, Eric W. Biederman wrote:
>
> Relying on __ptrace_unlink assumes the __ptrace_unlink happens after
> siglock is taken before calling ptrace_stop.  Especially with the
> ptrace_notify in signal_delivered that does not look guaranteed.
>
> The __ptrace_unlink could also happen during arch_ptrace_stop.
>
> Relying on siglock is sufficient because __ptrace_unlink holds siglock
> over clearing task->ptrace.  Which means that the simple fix for this is
> to just test task->ptrace before we set JOBCTL_TRACED_QUIESCE.

Or simply clear _QUIESCE along with _TRACED/DELAY_WAKEKILL before return?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 0/9] ptrace: cleaning up ptrace_stop
  2022-04-21 18:40   ` Eric W. Biederman
@ 2022-04-26 22:50       ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Jann Horn, Kees Cook


While looking at how ptrace is broken on PREEMPT_RT I realized
that ptrace_stop would be much simpler and more maintainable
if tsk->ptrace, tsk->parent, and tsk->real_parent were protected
by siglock.  Most of the changes are general cleanups in support
of this locking change.

While making the necessary changes to protect tsk->ptrace with
siglock I discovered we have two architectures xtensa and um
that were using tsk->ptrace for what most other architectures
use TIF_SIGPENDING for and not protecting tsk->ptrace with any lock.

By the end of this series ptrace should work on PREEMPT_RT with
CONFIG_FREEZER and CONFIG_CGROUPS disabled, by the simple fact that the
ptrace_stop code becomes less special.  The function cgroup_enter_frozen
definitely remains a problem, because with preemption disabled it takes
a lock which is a sleeping lock on PREEMPT_RT.  Peter Zijlstra has been
rewriting the classic freezer in earlier parts of this discussion, so I
presume it is also a problem for PREEMPT_RT.

Peter's series rewriting the freezer[1] should work on top of this
series with minimal changes and patch 2/5 removed.

Eric W. Biederman (9):
      signal: Rename send_signal send_signal_locked
      signal: Replace __group_send_sig_info with send_signal_locked
      ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
      ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
      signal: Protect parent child relationships by childs siglock
      signal: Always call do_notify_parent_cldstop with siglock held
      ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
      ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
      ptrace: Don't change __state

 arch/um/include/asm/thread_info.h |   2 +
 arch/um/kernel/exec.c             |   2 +-
 arch/um/kernel/process.c          |   2 +-
 arch/um/kernel/ptrace.c           |   8 +-
 arch/um/kernel/signal.c           |   4 +-
 arch/xtensa/kernel/ptrace.c       |   4 +-
 arch/xtensa/kernel/signal.c       |   4 +-
 drivers/tty/tty_jobctrl.c         |   4 +-
 include/linux/ptrace.h            |   7 --
 include/linux/sched/jobctl.h      |   2 +
 include/linux/sched/signal.h      |   3 +-
 include/linux/signal.h            |   3 +-
 kernel/exit.c                     |   4 +
 kernel/fork.c                     |  12 +--
 kernel/ptrace.c                   |  61 ++++++-------
 kernel/signal.c                   | 187 ++++++++++++++------------------------
 kernel/time/posix-cpu-timers.c    |   6 +-
 17 files changed, 131 insertions(+), 184 deletions(-)

[1] https://lkml.kernel.org/r/20220421150248.667412396@infradead.org

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 1/9] signal: Rename send_signal send_signal_locked
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Rename send_signal to send_signal_locked, and make it usable outside of
signal.c by dropping static and declaring it in include/linux/signal.h.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3
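
Editorial sketch, not part of the patch: the _locked suffix follows the
usual kernel convention that the caller already holds the relevant lock,
here siglock, which the call sites touched by the diff above appear to do.
The user-space analogue below assumes a toy spinlock built on C11
atomic_flag. struct toy_task, toy_spin_lock() and every other toy_ name are
hypothetical; they only mirror the shape of do_send_sig_info() taking the
lock and then calling send_signal_locked().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-task signal state; not the kernel's layout. */
struct toy_task {
	atomic_flag siglock;	/* plays the role of sighand->siglock */
	bool siglock_held;	/* debug aid standing in for lockdep */
	uint64_t pending;	/* one bit per pending signal number */
};

static void toy_task_init(struct toy_task *t)
{
	atomic_flag_clear(&t->siglock);
	t->siglock_held = false;
	t->pending = 0;
}

static void toy_spin_lock(struct toy_task *t)
{
	while (atomic_flag_test_and_set_explicit(&t->siglock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
	t->siglock_held = true;
}

static void toy_spin_unlock(struct toy_task *t)
{
	t->siglock_held = false;
	atomic_flag_clear_explicit(&t->siglock, memory_order_release);
}

/* Caller must hold t->siglock, as the _locked suffix advertises. */
static int toy_send_signal_locked(struct toy_task *t, int sig)
{
	assert(t->siglock_held);
	if (sig < 1 || sig > 63)
		return -1;
	t->pending |= UINT64_C(1) << sig;
	return 0;
}

/* Unlocked entry point: take the lock, then defer to the _locked helper. */
static int toy_do_send_sig_info(struct toy_task *t, int sig)
{
	int ret;

	toy_spin_lock(t);
	ret = toy_send_signal_locked(t, sig);
	toy_spin_unlock(t);
	return ret;
}
```

A caller that already holds the lock would call toy_send_signal_locked()
directly; any other caller goes through toy_do_send_sig_info(). Exporting
the _locked variant lets code outside the defining file take the first
route.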


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 2/9] signal: Replace __group_send_sig_info with send_signal_locked
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

__group_send_sig_info is only a wrapper that calls send_signal_locked with
PIDTYPE_TGID.  Call send_signal_locked directly at every call site and
remove the wrapper.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3
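
Editorial sketch, not part of the patch: the conversion above is mechanical
because the removed wrapper only pinned the type argument to PIDTYPE_TGID.
The toy model below assumes that the type argument selects between a
per-thread pending set and a set shared by the whole thread group. That
selection logic is not shown in this patch, so treat it as an assumption
about kernel/signal.c; all toy_ and TOY_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_pid_type { TOY_PIDTYPE_PID, TOY_PIDTYPE_TGID };

/* State shared by every thread in a toy thread group. */
struct toy_signal_struct {
	uint64_t shared_pending;
};

/* Per-thread state plus a pointer to the group-wide state. */
struct toy_thread {
	uint64_t pending;
	struct toy_signal_struct *signal;
};

/* Queue sig on the per-thread or the group-wide set, depending on type. */
static int toy_send_signal_locked(int sig, struct toy_thread *t,
				  enum toy_pid_type type)
{
	uint64_t *pending;

	assert(t != NULL && t->signal != NULL);
	if (sig < 1 || sig > 63)
		return -1;
	pending = (type != TOY_PIDTYPE_PID) ? &t->signal->shared_pending
					     : &t->pending;
	*pending |= UINT64_C(1) << sig;
	return 0;
}

/* Shape of the wrapper the patch removes: it only pins type to TGID. */
static int toy_group_send_sig_info(int sig, struct toy_thread *t)
{
	return toy_send_signal_locked(sig, t, TOY_PIDTYPE_TGID);
}
```

In this model, calling toy_group_send_sig_info(sig, t) and calling
toy_send_signal_locked(sig, t, TOY_PIDTYPE_TGID) leave identical state
behind, which is why each call site can be rewritten without changing
behavior.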


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 3/9] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

User mode Linux is the last user of the PT_DTRACE flag.  Using the flag to indicate
single stepping is a little confusing and, worse, changing tsk->ptrace without locking
could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE as UML is the last user.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/um/include/asm/thread_info.h | 2 ++
 arch/um/kernel/exec.c             | 2 +-
 arch/um/kernel/process.c          | 2 +-
 arch/um/kernel/ptrace.c           | 8 ++++----
 arch/um/kernel/signal.c           | 4 ++--
 include/linux/ptrace.h            | 1 -
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	7
 #define TIF_NOTIFY_RESUME	8
 #define TIF_SECCOMP		9	/* secure computing */
+#define TIF_SINGLESTEP		10	/* single stepping userspace */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 
 #endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
 {
 	PT_REGS_IP(regs) = eip;
 	PT_REGS_SP(regs) = esp;
-	current->ptrace &= ~PT_DTRACE;
+	clear_thread_flag(TIF_SINGLESTEP);
 #ifdef SUBARCH_EXECVE1
 	SUBARCH_EXECVE1(regs->regs);
 #endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
 {
 	struct task_struct *task = t ? t : current;
 
-	if (!(task->ptrace & PT_DTRACE))
+	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
 		return 0;
 
 	if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_DTRACE;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_DTRACE;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
 }
 
 /*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
  * PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
  */
 int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	audit_syscall_exit(regs);
 
 	/* Fake a debug trap */
-	if (ptraced & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		send_sigtrap(&regs->regs, 0);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	unsigned long sp;
 	int err;
 
-	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+	if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
 		singlestep = 1;
 
 	/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
 	 * on the host.  The tracing thread will check this flag and
 	 * PTRACE_SYSCALL if necessary.
 	 */
-	if (current->ptrace & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		current->thread.singlestep_syscall =
 			is_syscall(PT_REGS_IP(&current->thread.regs));
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
-#define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 
 #define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-- 
2.35.3
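
Editorial sketch, not part of the patch: the concern in the commit message
is that a plain |= or &= ~ on tsk->ptrace is a non-atomic read-modify-write,
so two unsynchronized writers can lose each other's update. Thread info
flags are instead changed with atomic bit operations. The user-space
analogue below uses C11 atomics to show the idea; struct toy_thread_info
and the toy_ helpers are hypothetical stand-ins, not the kernel's
set_tsk_thread_flag() family.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_TIF_SINGLESTEP 10	/* bit number; mirrors the UML value in the patch */

/* Hypothetical stand-in for the flags word in struct thread_info. */
struct toy_thread_info {
	atomic_ulong flags;
};

/* Atomically set one bit; concurrent updates to other bits are preserved. */
static void toy_set_thread_flag(struct toy_thread_info *ti, int flag)
{
	assert(flag >= 0 && flag < 32);
	atomic_fetch_or(&ti->flags, 1UL << flag);
}

/* Atomically clear one bit, leaving every other bit untouched. */
static void toy_clear_thread_flag(struct toy_thread_info *ti, int flag)
{
	assert(flag >= 0 && flag < 32);
	atomic_fetch_and(&ti->flags, ~(1UL << flag));
}

/* Read the current value of one bit. */
static bool toy_test_thread_flag(struct toy_thread_info *ti, int flag)
{
	assert(flag >= 0 && flag < 32);
	return (atomic_load(&ti->flags) >> flag) & 1UL;
}
```

Because each helper is a single atomic operation on the flags word, no
caller needs to hold a lock to flip the single-step bit, which is the
property the patch relies on when it moves the state out of tsk->ptrace.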


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 4/9] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

xtensa is the last user of the PT_SINGLESTEP flag.  Changing tsk->ptrace in
user_enable_single_step and user_disable_single_step without locking could
potentially cause problems.

So use a thread info flag instead of a flag in tsk->ptrace.  Use TIF_SINGLESTEP
that xtensa already had defined but unused.

Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
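A userspace sketch of the idea, not kernel code.  It assumes C11
<stdatomic.h> as a stand-in for the kernel's atomic thread-flag helpers;
struct model_task, MODEL_TIF_SINGLESTEP and the model_* functions are
hypothetical names made up for illustration.  The point is only that a flag
kept in an atomically updated word can be set and cleared without an
external lock, while a plain "word |= bit" cannot.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number; the real TIF_SINGLESTEP value is per-arch. */
#define MODEL_TIF_SINGLESTEP 3

struct model_task {
	unsigned long ptrace;       /* plain word: "|=" is load, or, store */
	atomic_ulong thread_flags;  /* word updated with atomic bit operations */
};

/* Build the mask for one flag bit, rejecting out-of-range bit numbers. */
unsigned long model_mask(int bit)
{
	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * CHAR_BIT));
	return 1UL << bit;
}

/* Atomically set one bit; other bits in the word are left untouched. */
void model_set_flag(struct model_task *t, int bit)
{
	atomic_fetch_or(&t->thread_flags, model_mask(bit));
}

/* Atomically clear one bit. */
void model_clear_flag(struct model_task *t, int bit)
{
	atomic_fetch_and(&t->thread_flags, ~model_mask(bit));
}

/* Read one bit. */
bool model_test_flag(struct model_task *t, int bit)
{
	return (atomic_load(&t->thread_flags) & model_mask(bit)) != 0;
}
```

In this model, model_set_flag() and model_clear_flag() play the role that
set_tsk_thread_flag() and clear_tsk_thread_flag() play in the diff below.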
 arch/xtensa/kernel/ptrace.c | 4 ++--
 arch/xtensa/kernel/signal.c | 4 ++--
 include/linux/ptrace.h      | 6 ------
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/xtensa/kernel/ptrace.c b/arch/xtensa/kernel/ptrace.c
index 323c678a691f..b952e67cc0cc 100644
--- a/arch/xtensa/kernel/ptrace.c
+++ b/arch/xtensa/kernel/ptrace.c
@@ -225,12 +225,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_SINGLESTEP;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_SINGLESTEP;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 /*
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index 6f68649e86ba..ac50ec46c8f1 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -473,7 +473,7 @@ static void do_signal(struct pt_regs *regs)
 		/* Set up the stack frame */
 		ret = setup_frame(&ksig, sigmask_to_save(), regs);
 		signal_setup_done(ret, &ksig, 0);
-		if (current->ptrace & PT_SINGLESTEP)
+		if (test_thread_flag(TIF_SINGLESTEP))
 			task_pt_regs(current)->icountlevel = 1;
 
 		return;
@@ -499,7 +499,7 @@ static void do_signal(struct pt_regs *regs)
 	/* If there's no signal to deliver, we just restore the saved mask.  */
 	restore_saved_sigmask();
 
-	if (current->ptrace & PT_SINGLESTEP)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		task_pt_regs(current)->icountlevel = 1;
 	return;
 }
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4c06f9f8ef3f..c952c5ba8fab 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
 
-/* single stepping state bits (used on ARM and PA-RISC) */
-#define PT_SINGLESTEP_BIT	31
-#define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
-#define PT_BLOCKSTEP_BIT	30
-#define PT_BLOCKSTEP		(1<<PT_BLOCKSTEP_BIT)
-
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 5/9] signal: Protect parent child relationships by childs siglock
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

The functions ptrace_stop and do_signal_stop have to drop siglock
and grab tasklist_lock because the parent/child relationship
is guarded by tasklist_lock and not siglock.

Simplify things by guarding the parent/child relationship
with siglock.  For the most part this just requires a little bit
of code motion.  In a couple of places more locking was needed.

After this change tsk->parent, tsk->real_parent, tsk->ptrace, and
tsk->ptracer_cred are all protected by tsk->siglock.

The fields tsk->sibling and tsk->ptrace_entry are mostly protected by
tsk->siglock.  The field tsk->ptrace_entry is not protected by siglock
when tsk->ptrace_entry is reused as the dead task list.  The field
tsk->sibling is not protected by siglock when children are reparented
because their original parent dies.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
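A userspace sketch of the locking rule described above, not kernel code.
It assumes a pthread mutex as a stand-in for siglock and, as a
simplification, gives every task its own lock (in the kernel the siglock
lives in the sighand_struct shared by a thread group).  struct model_task
and the model_* functions are hypothetical names.  The writer mirrors the
loop body in forget_original_parent; the reader shows that the child's own
lock is enough to see a stable parent pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_task {
	pthread_mutex_t siglock;        /* stands in for the child's siglock */
	struct model_task *parent;      /* tracer while traced, else real_parent */
	struct model_task *real_parent;
	unsigned int ptrace;            /* non-zero while the task is traced */
};

void model_task_init(struct model_task *t, struct model_task *parent)
{
	pthread_mutex_init(&t->siglock, NULL);
	t->parent = parent;
	t->real_parent = parent;
	t->ptrace = 0;
}

/* Writer: change the relationship only while holding the child's lock. */
void model_reparent(struct model_task *child, struct model_task *reaper)
{
	assert(reaper != NULL);
	pthread_mutex_lock(&child->siglock);
	child->real_parent = reaper;
	if (!child->ptrace)             /* a traced task keeps its tracer */
		child->parent = child->real_parent;
	pthread_mutex_unlock(&child->siglock);
}

/* Reader: the child's lock alone yields a consistent parent pointer. */
struct model_task *model_get_parent(struct model_task *child)
{
	struct model_task *p;

	pthread_mutex_lock(&child->siglock);
	p = child->parent;
	pthread_mutex_unlock(&child->siglock);
	return p;
}
```

With this rule a reader that already holds the child's lock does not need
to drop it and take a global lock just to look at the parent.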
 kernel/exit.c   |  4 ++++
 kernel/fork.c   | 12 ++++++------
 kernel/ptrace.c | 13 +++++++++----
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..b07af19eca13 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,11 +643,15 @@ static void forget_original_parent(struct task_struct *father,
 
 	reaper = find_new_reaper(father, reaper);
 	list_for_each_entry(p, &father->children, sibling) {
+		spin_lock(&p->sighand->siglock);
 		for_each_thread(p, t) {
 			RCU_INIT_POINTER(t->real_parent, reaper);
 			BUG_ON((!t->ptrace) != (rcu_access_pointer(t->parent) == father));
 			if (likely(!t->ptrace))
 				t->parent = t->real_parent;
+		}
+		spin_unlock(&p->sighand->siglock);
+		for_each_thread(p, t) {
 			if (t->pdeath_signal)
 				group_send_sig_info(t->pdeath_signal,
 						    SEND_SIG_NOINFO, t,
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..841021da69f3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2367,6 +2367,12 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	write_lock_irq(&tasklist_lock);
 
+	klp_copy_process(p);
+
+	sched_core_fork(p);
+
+	spin_lock(&current->sighand->siglock);
+
 	/* CLONE_PARENT re-uses the old parent */
 	if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) {
 		p->real_parent = current->real_parent;
@@ -2381,12 +2387,6 @@ static __latent_entropy struct task_struct *copy_process(
 		p->exit_signal = args->exit_signal;
 	}
 
-	klp_copy_process(p);
-
-	sched_core_fork(p);
-
-	spin_lock(&current->sighand->siglock);
-
 	/*
 	 * Copy seccomp details explicitly here, in case they were changed
 	 * before holding sighand lock.
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..16d1a84a2cae 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -123,13 +123,12 @@ void __ptrace_unlink(struct task_struct *child)
 	clear_task_syscall_work(child, SYSCALL_EMU);
 #endif
 
+	spin_lock(&child->sighand->siglock);
 	child->parent = child->real_parent;
 	list_del_init(&child->ptrace_entry);
 	old_cred = child->ptracer_cred;
 	child->ptracer_cred = NULL;
 	put_cred(old_cred);
-
-	spin_lock(&child->sighand->siglock);
 	child->ptrace = 0;
 	/*
 	 * Clear all pending traps and TRAPPING.  TRAPPING should be
@@ -447,15 +446,15 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (task->ptrace)
 		goto unlock_tasklist;
 
+	spin_lock(&task->sighand->siglock);
 	task->ptrace = flags;
 
 	ptrace_link(task, current);
 
 	/* SEIZE doesn't trap tracee on attach */
 	if (!seize)
-		send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
+		send_signal_locked(SIGSTOP, SEND_SIG_PRIV, task, PIDTYPE_PID);
 
-	spin_lock(&task->sighand->siglock);
 
 	/*
 	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
@@ -521,8 +520,10 @@ static int ptrace_traceme(void)
 		 * pretend ->real_parent untraces us right after return.
 		 */
 		if (!ret && !(current->real_parent->flags & PF_EXITING)) {
+			spin_lock(&current->sighand->siglock);
 			current->ptrace = PT_PTRACED;
 			ptrace_link(current, current->real_parent);
+			spin_unlock(&current->sighand->siglock);
 		}
 	}
 	write_unlock_irq(&tasklist_lock);
@@ -689,10 +690,14 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 		return ret;
 
 	/* Avoid intermediate state when all opts are cleared */
+	write_lock_irq(&tasklist_lock);
+	spin_lock(&child->sighand->siglock);
 	flags = child->ptrace;
 	flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
 	flags |= (data << PT_OPT_FLAG_SHIFT);
 	child->ptrace = flags;
+	spin_unlock(&child->sighand->siglock);
+	write_unlock_irq(&tasklist_lock);
 
 	return 0;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Now that siglock keeps tsk->parent and tsk->real_parent constant,
require that do_notify_parent_cldstop is called with tsk->siglock held
instead of the tasklist_lock.

As all of the callers of do_notify_parent_cldstop had to drop the
siglock and take tasklist_lock, this simplifies all of its callers.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
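A userspace sketch of the nested-lock check this patch adds to
do_notify_parent_cldstop, not kernel code.  It assumes pthread mutexes as a
stand-in for siglock; struct model_sighand, struct model_task and
model_notify_parent() are hypothetical names, and the pending_sigchld
counter merely stands in for queueing SIGCHLD.  The caller already holds
the child's lock, so the parent's lock is taken only when it is a different
lock; taking the same lock twice would self-deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sighand {
	pthread_mutex_t siglock;
	int pending_sigchld;            /* stands in for a queued SIGCHLD */
};

struct model_task {
	struct model_sighand *sighand;  /* may be shared between tasks */
	struct model_task *parent;
};

/*
 * Caller must hold tsk->sighand->siglock.  Returns true if the parent's
 * lock had to be taken as a second, nested lock.
 */
bool model_notify_parent(struct model_task *tsk)
{
	struct model_sighand *sighand;
	bool lock;

	assert(tsk->parent != NULL);
	sighand = tsk->parent->sighand;

	/* Same sighand means the caller already holds this very lock. */
	lock = tsk->sighand != sighand;
	if (lock)
		pthread_mutex_lock(&sighand->siglock);
	sighand->pending_sigchld++;
	if (lock)
		pthread_mutex_unlock(&sighand->siglock);
	return lock;
}
```

The model leaves out interrupt state and lockdep annotations; in the patch
the nested acquisition is spelled spin_lock_nested() with
SINGLE_DEPTH_NESTING.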
 kernel/signal.c | 156 +++++++++++++++++-------------------------------
 1 file changed, 55 insertions(+), 101 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..584d67deb3cb 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2121,11 +2121,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 				     bool for_ptracer, int why)
 {
 	struct kernel_siginfo info;
-	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
+	bool lock;
 	u64 utime, stime;
 
+	assert_spin_locked(&tsk->sighand->siglock);
+
 	if (for_ptracer) {
 		parent = tsk->parent;
 	} else {
@@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  	}
 
 	sighand = parent->sighand;
-	spin_lock_irqsave(&sighand->siglock, flags);
+	lock = tsk->sighand != sighand;
+	if (lock)
+		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
 		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
@@ -2172,7 +2176,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
 	__wake_up_parent(tsk, parent);
-	spin_unlock_irqrestore(&sighand->siglock, flags);
+	if (lock)
+		spin_unlock(&sighand->siglock);
 }
 
 /*
@@ -2193,7 +2198,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
+	/* Don't stop if current is not ptraced */
+	if (unlikely(!current->ptrace))
+		return (clear_code) ? 0 : exit_code;
+
+	/*
+	 * If @why is CLD_STOPPED, we're trapping to participate in a group
+	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
+	 * across siglock relocks since INTERRUPT was scheduled, PENDING
+	 * could be clear now.  We act as if SIGCONT is received after
+	 * TASK_TRACED is entered - ignore it.
+	 */
+	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
+		gstop_done = task_participate_group_stop(current);
+
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	do_notify_parent_cldstop(current, true, why);
+	if (gstop_done && ptrace_reparented(current))
+		do_notify_parent_cldstop(current, false, why);
+
 	/*
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
@@ -2239,15 +2271,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	current->last_siginfo = info;
 	current->exit_code = exit_code;
 
-	/*
-	 * If @why is CLD_STOPPED, we're trapping to participate in a group
-	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
-	 * across siglock relocks since INTERRUPT was scheduled, PENDING
-	 * could be clear now.  We act as if SIGCONT is received after
-	 * TASK_TRACED is entered - ignore it.
-	 */
-	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
-		gstop_done = task_participate_group_stop(current);
 
 	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
 	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
@@ -2257,56 +2280,19 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	/* entering a trap, clear TRAPPING */
 	task_clear_jobctl_trapping(current);
 
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement spin_unlock_no_resched().
+	 */
+	preempt_disable();
 	spin_unlock_irq(&current->sighand->siglock);
-	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
-		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
-
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
-		read_unlock(&tasklist_lock);
-	}
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
@@ -2314,8 +2300,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
-		exit_code = current->exit_code;
+	exit_code = current->exit_code;
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
@@ -2444,34 +2429,17 @@ static bool do_signal_stop(int signr)
 	}
 
 	if (likely(!current->ptrace)) {
-		int notify = 0;
-
 		/*
 		 * If there are no other threads in the group, or if there
 		 * is a group stop in progress and we are the last to stop,
-		 * report to the parent.
+		 * report to the real_parent.
 		 */
 		if (task_participate_group_stop(current))
-			notify = CLD_STOPPED;
+			do_notify_parent_cldstop(current, false, CLD_STOPPED);
 
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-		/*
-		 * Notify the parent of the group stop completion.  Because
-		 * we're not holding either the siglock or tasklist_lock
-		 * here, ptracer may attach inbetween; however, this is for
-		 * group stop and should always be delivered to the real
-		 * parent of the group leader.  The new ptracer will get
-		 * its notification when this task transitions into
-		 * TASK_TRACED.
-		 */
-		if (notify) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, false, notify);
-			read_unlock(&tasklist_lock);
-		}
-
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
@@ -2665,8 +2633,6 @@ bool get_signal(struct ksignal *ksig)
 
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		spin_unlock_irq(&sighand->siglock);
-
 		/*
 		 * Notify the parent that we're continuing.  This event is
 		 * always per-process and doesn't make whole lot of sense
@@ -2675,15 +2641,10 @@ bool get_signal(struct ksignal *ksig)
 		 * the ptracer of the group leader too unless it's gonna be
 		 * a duplicate.
 		 */
-		read_lock(&tasklist_lock);
 		do_notify_parent_cldstop(current, false, why);
-
 		if (ptrace_reparented(current->group_leader))
 			do_notify_parent_cldstop(current->group_leader,
 						true, why);
-		read_unlock(&tasklist_lock);
-
-		goto relock;
 	}
 
 	for (;;) {
@@ -2940,7 +2901,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 
 void exit_signals(struct task_struct *tsk)
 {
-	int group_stop = 0;
 	sigset_t unblocked;
 
 	/*
@@ -2971,21 +2931,15 @@ void exit_signals(struct task_struct *tsk)
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
 
-	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
-	    task_participate_group_stop(tsk))
-		group_stop = CLD_STOPPED;
-out:
-	spin_unlock_irq(&tsk->sighand->siglock);
-
 	/*
 	 * If group stop has completed, deliver the notification.  This
 	 * should always go to the real parent of the group leader.
 	 */
-	if (unlikely(group_stop)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, false, group_stop);
-		read_unlock(&tasklist_lock);
-	}
+	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
+	    task_participate_group_stop(tsk))
+		do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+out:
+	spin_unlock_irq(&tsk->sighand->siglock);
 }
 
 /*
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
@ 2022-04-26 22:52         ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Now that siglock keeps tsk->parent and tsk->real_parent constant,
require that do_notify_parent_cldstop is called with tsk->siglock held
instead of the tasklist_lock.

As all of the callers of do_notify_parent_cldstop had to drop the
siglock and take tasklist_lock, this simplifies all of its callers.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 156 +++++++++++++++++-------------------------------
 1 file changed, 55 insertions(+), 101 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..584d67deb3cb 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2121,11 +2121,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 				     bool for_ptracer, int why)
 {
 	struct kernel_siginfo info;
-	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
+	bool lock;
 	u64 utime, stime;
 
+	assert_spin_locked(&tsk->sighand->siglock);
+
 	if (for_ptracer) {
 		parent = tsk->parent;
 	} else {
@@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  	}
 
 	sighand = parent->sighand;
-	spin_lock_irqsave(&sighand->siglock, flags);
+	lock = tsk->sighand != sighand;
+	if (lock)
+		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
 		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
@@ -2172,7 +2176,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
 	__wake_up_parent(tsk, parent);
-	spin_unlock_irqrestore(&sighand->siglock, flags);
+	if (lock)
+		spin_unlock(&sighand->siglock);
 }
 
 /*
@@ -2193,7 +2198,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
+	/* Don't stop if current is not ptraced */
+	if (unlikely(!current->ptrace))
+		return (clear_code) ? 0 : exit_code;
+
+	/*
+	 * If @why is CLD_STOPPED, we're trapping to participate in a group
+	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
+	 * across siglock relocks since INTERRUPT was scheduled, PENDING
+	 * could be clear now.  We act as if SIGCONT is received after
+	 * TASK_TRACED is entered - ignore it.
+	 */
+	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
+		gstop_done = task_participate_group_stop(current);
+
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	do_notify_parent_cldstop(current, true, why);
+	if (gstop_done && ptrace_reparented(current))
+		do_notify_parent_cldstop(current, false, why);
+
 	/*
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
@@ -2239,15 +2271,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	current->last_siginfo = info;
 	current->exit_code = exit_code;
 
-	/*
-	 * If @why is CLD_STOPPED, we're trapping to participate in a group
-	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
-	 * across siglock relocks since INTERRUPT was scheduled, PENDING
-	 * could be clear now.  We act as if SIGCONT is received after
-	 * TASK_TRACED is entered - ignore it.
-	 */
-	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
-		gstop_done = task_participate_group_stop(current);
 
 	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
 	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
@@ -2257,56 +2280,19 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	/* entering a trap, clear TRAPPING */
 	task_clear_jobctl_trapping(current);
 
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement spin_unlock_no_resched().
+	 */
+	preempt_disable();
 	spin_unlock_irq(&current->sighand->siglock);
-	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
-		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
-
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
-		read_unlock(&tasklist_lock);
-	}
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
@@ -2314,8 +2300,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
-		exit_code = current->exit_code;
+	exit_code = current->exit_code;
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
@@ -2444,34 +2429,17 @@ static bool do_signal_stop(int signr)
 	}
 
 	if (likely(!current->ptrace)) {
-		int notify = 0;
-
 		/*
 		 * If there are no other threads in the group, or if there
 		 * is a group stop in progress and we are the last to stop,
-		 * report to the parent.
+		 * report to the real_parent.
 		 */
 		if (task_participate_group_stop(current))
-			notify = CLD_STOPPED;
+			do_notify_parent_cldstop(current, false, CLD_STOPPED);
 
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-		/*
-		 * Notify the parent of the group stop completion.  Because
-		 * we're not holding either the siglock or tasklist_lock
-		 * here, ptracer may attach inbetween; however, this is for
-		 * group stop and should always be delivered to the real
-		 * parent of the group leader.  The new ptracer will get
-		 * its notification when this task transitions into
-		 * TASK_TRACED.
-		 */
-		if (notify) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, false, notify);
-			read_unlock(&tasklist_lock);
-		}
-
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
@@ -2665,8 +2633,6 @@ bool get_signal(struct ksignal *ksig)
 
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		spin_unlock_irq(&sighand->siglock);
-
 		/*
 		 * Notify the parent that we're continuing.  This event is
 		 * always per-process and doesn't make whole lot of sense
@@ -2675,15 +2641,10 @@ bool get_signal(struct ksignal *ksig)
 		 * the ptracer of the group leader too unless it's gonna be
 		 * a duplicate.
 		 */
-		read_lock(&tasklist_lock);
 		do_notify_parent_cldstop(current, false, why);
-
 		if (ptrace_reparented(current->group_leader))
 			do_notify_parent_cldstop(current->group_leader,
 						true, why);
-		read_unlock(&tasklist_lock);
-
-		goto relock;
 	}
 
 	for (;;) {
@@ -2940,7 +2901,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 
 void exit_signals(struct task_struct *tsk)
 {
-	int group_stop = 0;
 	sigset_t unblocked;
 
 	/*
@@ -2971,21 +2931,15 @@ void exit_signals(struct task_struct *tsk)
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
 
-	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
-	    task_participate_group_stop(tsk))
-		group_stop = CLD_STOPPED;
-out:
-	spin_unlock_irq(&tsk->sighand->siglock);
-
 	/*
 	 * If group stop has completed, deliver the notification.  This
 	 * should always go to the real parent of the group leader.
 	 */
-	if (unlikely(group_stop)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, false, group_stop);
-		read_unlock(&tasklist_lock);
-	}
+	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
+	    task_participate_group_stop(tsk))
+		do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+out:
+	spin_unlock_irq(&tsk->sighand->siglock);
 }
 
 /*
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
was needed to detect the case where ptrace_stop would decide not to
stop after calling "set_special_state(TASK_TRACED)".  With the recent
cleanups ptrace_stop will always stop after calling set_special_state.

Take advantage of this by no longer asking wait_task_inactive to
verify the state.  If a bug is hit and wait_task_inactive does not
succeed, warn and return -ESRCH.
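
As a rough illustration of the resulting control flow, here is a
userspace sketch (not kernel code).  The names warn_on_once and
check_attach_tail are made up for this example, and the
child_went_inactive parameter is an assumption standing in for the
result of wait_task_inactive(child, 0):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON_ONCE(): print on the first hit only. */
static int warn_hits;

static bool warn_on_once(bool cond)
{
	if (cond && warn_hits++ == 0)
		fprintf(stderr, "WARNING: child did not become inactive\n");
	return cond;
}

/*
 * Models the tail of ptrace_check_attach() after this patch.
 * child_went_inactive stands in for wait_task_inactive(child, 0); in the
 * kernel that call is only made when ret == 0 and !ignore_state.
 */
int check_attach_tail(int ret, bool ignore_state, bool child_went_inactive)
{
	if (!ret && !ignore_state && warn_on_once(!child_went_inactive))
		ret = -ESRCH;

	return ret;
}
```

In this model a failed wait is the only way a successful locked check
turns into -ESRCH, and it is reported as a bug rather than treated as
an expected race.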

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 16d1a84a2cae..0634da7ac685 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
+		ret = -ESRCH;
 
 	return ret;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 8/9] ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Now that siglock protects tsk->parent and tsk->ptrace there is no need
to grab tasklist_lock in ptrace_check_attach.  The siglock can handle
all of the locking needs of ptrace_check_attach.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0634da7ac685..842511ee9a9f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -189,17 +189,14 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
 
-	/* Lockless, nobody but us can set this flag */
 	if (task->jobctl & JOBCTL_LISTENING)
 		return ret;
 
-	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
 		WRITE_ONCE(task->__state, __TASK_TRACED);
 		ret = true;
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 
 	return ret;
 }
@@ -237,33 +234,35 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
  * state.
  *
  * CONTEXT:
- * Grabs and releases tasklist_lock and @child->sighand->siglock.
+ * Grabs and releases @child->sighand->siglock.
  *
  * RETURNS:
  * 0 on success, -ESRCH if %child is not ready.
  */
 static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 {
+	unsigned long flags;
 	int ret = -ESRCH;
 
 	/*
-	 * We take the read lock around doing both checks to close a
+	 * We take the siglock around doing both checks to close a
 	 * possible race where someone else was tracing our child and
 	 * detached between these two checks.  After this locked check,
 	 * we are sure that this is our traced child and that can only
 	 * be changed by us so it's not changing right after this.
 	 */
-	read_lock(&tasklist_lock);
-	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-		/*
-		 * child->sighand can't be NULL, release_task()
-		 * does ptrace_unlink() before __exit_signal().
-		 */
-		if (ignore_state || ptrace_freeze_traced(child))
-			ret = 0;
+	if (lock_task_sighand(child, &flags)) {
+		if (child->ptrace && child->parent == current) {
+			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
+			/*
+			 * child->sighand can't be NULL, release_task()
+			 * does ptrace_unlink() before __exit_signal().
+			 */
+			if (ignore_state || ptrace_freeze_traced(child))
+				ret = 0;
+		}
+		unlock_task_sighand(child, &flags);
 	}
-	read_unlock(&tasklist_lock);
 
 	if (!ret && !ignore_state &&
 	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 9/9] ptrace: Don't change __state
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-26 22:52         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead implement a new jobctl flag JOBCTL_DELAY_WAKEKILL.  This new
flag is set in ptrace_freeze_traced and cleared when ptrace_stop is
awoken or in ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up drop TASK_WAKEKILL from state if TASK_WAKEKILL
would be used while JOBCTL_DELAY_WAKEKILL is set.  This has the same
effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
that use TASK_KILLABLE go through signal_wake_up except the wake_up in
ptrace_unfreeze_traced.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop.  Now ptrace_stop clears JOBCTL_DELAY_WAKEKILL
when woken up, and ptrace_unfreeze_traced clears it when the task is
left sleeping.
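
The effect on the wake-up mask can be sketched in userspace as
follows.  This is only a model: struct task_model, wake_mask,
freeze_traced and unfreeze_traced are hypothetical names, the TASK_*
values are illustrative, and only the JOBCTL bit number (24) is taken
from the patch below:

```c
#include <stdbool.h>

/* Illustrative values; only the JOBCTL bit number comes from the patch. */
#define TASK_INTERRUPTIBLE	0x0001u
#define TASK_WAKEKILL		0x0100u
#define JOBCTL_DELAY_WAKEKILL	(1UL << 24)

/* Hypothetical stand-in for the one task_struct field used here. */
struct task_model {
	unsigned long jobctl;
};

/* Models ptrace_freeze_traced(): start delaying killable wakeups. */
void freeze_traced(struct task_model *t)
{
	t->jobctl |= JOBCTL_DELAY_WAKEKILL;
}

/* Models ptrace_unfreeze_traced(): allow killable wakeups again. */
void unfreeze_traced(struct task_model *t)
{
	t->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
}

/*
 * Mask that signal_wake_up() would end up passing to wake_up_state():
 * TASK_WAKEKILL is dropped while JOBCTL_DELAY_WAKEKILL is set.
 */
unsigned int wake_mask(const struct task_model *t, bool resume)
{
	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);

	return (wakekill ? TASK_WAKEKILL : 0) | TASK_INTERRUPTIBLE;
}
```

While the flag is set, a fatal signal no longer matches a task sleeping
in a killable state, which is the behaviour the __TASK_TRACED trick
used to provide.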

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  3 ++-
 kernel/ptrace.c              | 11 +++++------
 kernel/signal.c              |  1 +
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..4e154ad8205f 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_DELAY_WAKEKILL_BIT	24	/* delay killable wakeups */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_DELAY_WAKEKILL	(1UL << JOBCTL_DELAY_WAKEKILL_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..1947c85aa9d9 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
+	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 842511ee9a9f..0bea74539320 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -194,7 +194,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_DELAY_WAKEKILL;
 		ret = true;
 	}
 
@@ -203,7 +203,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
+	if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
 		return;
 
 	WARN_ON(!task->ptrace || task->parent != current);
@@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 * Recheck state under the lock to close this race.
 	 */
 	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (task->jobctl & JOBCTL_DELAY_WAKEKILL) {
+		task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
 	}
 	spin_unlock_irq(&task->sighand->siglock);
 }
@@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	if (lock_task_sighand(child, &flags)) {
 		if (child->ptrace && child->parent == current) {
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
+			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
 			/*
 			 * child->sighand can't be NULL, release_task()
 			 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/signal.c b/kernel/signal.c
index 584d67deb3cb..2b332f89cbad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH 4/9] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-26 23:33           ` Max Filippov
  -1 siblings, 0 replies; 572+ messages in thread
From: Max Filippov @ 2022-04-26 23:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: LKML, rjw, Oleg Nesterov, Ingo Molnar, Vincent Guittot,
	dietmar.eggemann, Steven Rostedt, mgorman,
	Sebastian Andrzej Siewior, Will Deacon, Tejun Heo, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, inux-xtensa, Kees Cook, Jann Horn

On Tue, Apr 26, 2022 at 3:52 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> xtensa is the last user of the PT_SINGLESTEP flag.  Changing tsk->ptrace in
> user_enable_single_step and user_disable_single_step without locking could
> potentially cause problems.
>
> So use a thread info flag instead of a flag in tsk->ptrace.  Use TIF_SINGLESTEP
> that xtensa already had defined but unused.
>
> Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/xtensa/kernel/ptrace.c | 4 ++--
>  arch/xtensa/kernel/signal.c | 4 ++--
>  include/linux/ptrace.h      | 6 ------
>  3 files changed, 4 insertions(+), 10 deletions(-)

Acked-by: Max Filippov <jcmvbkbc@gmail.com>

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-21 15:02 ` [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Peter Zijlstra
@ 2022-04-26 23:34   ` Eric W. Biederman
  2022-04-28 10:00     ` Peter Zijlstra
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-26 23:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

Peter Zijlstra <peterz@infradead.org> writes:

> Currently ptrace_stop() / do_signal_stop() rely on the special states
> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> state exists only in task->__state and nowhere else.
>
> There are two spots of bother with this:
>
>  - PREEMPT_RT has task->saved_state which complicates matters,
>    meaning task_is_{traced,stopped}() needs to check an additional
>    variable.
>
>  - An alternative freezer implementation that itself relies on a
>    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
>    result in misbehaviour.
>
> As such, add additional state to task->jobctl to track this state
> outside of task->__state.
>
> NOTE: this doesn't actually fix anything yet, just adds extra state.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -770,7 +773,9 @@ void signal_wake_up_state(struct task_st
>  	 * By using wake_up_state, we ensure the process will wake up and
>  	 * handle its death signal.
>  	 */
> -	if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
> +	if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
> +		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
> +	else
>  		kick_process(t);
>  }

This hunk is subtle and I don't think it is actually what we want if the
code is going to be robust against tsk->__state becoming TASK_FROZEN.

I think we want the clearing of JOBCTL_STOPPED and JOBCTL_TRACED
to be independent of what tsk->__state and tsk->saved_state are.

Something like:

static inline void signal_wake_up(struct task_struct *t, bool resume)
{
	unsigned int state = 0;
	if (resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL)) {
		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
		state = TASK_WAKEKILL;
	}
	signal_wake_up_state(t, state);
}

static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
{
	unsigned int state = 0;
	if (resume) {
		t->jobctl &= ~JOBCTL_TRACED;
		state = __TASK_TRACED;
	}
	signal_wake_up_state(t, state);
}

That would allow __set_task_special in the final patch to look like:

/*
 * The special task states (TASK_STOPPED, TASK_TRACED) keep their canonical
 * state in p->jobctl. If either of them got a wakeup that was missed because
 * TASK_FROZEN, then their canonical state reflects that and the below will
 * refuse to restore the special state and instead issue the wakeup.
 */
static int __set_task_special(struct task_struct *p, void *arg)
{
	unsigned int state = 0;

	if (p->jobctl & JOBCTL_TRACED)
		state = TASK_TRACED;
	else if (p->jobctl & JOBCTL_STOPPED)
		state = TASK_STOPPED;

	if (state)
		WRITE_ONCE(p->__state, state);

	return state;
}


With no need to figure out if a wake_up was dropped and reverse engineer
what the wakeup was.
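
To illustrate the idea outside the kernel, here is a minimal user-space
sketch.  Everything in it is a stand-in: struct model_task, the enum
values and the model_* helpers are hypothetical and only mimic the shape
of the code above.  The waker clears the canonical bits in jobctl before
it knows whether the wakeup will land, so the thaw path only has to look
at jobctl.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's jobctl bits. */
enum {
	JOBCTL_STOPPED = 1 << 0,
	JOBCTL_TRACED  = 1 << 1,
};

/* Hypothetical stand-ins for the task states; the values are arbitrary. */
enum {
	TASK_RUNNING = 0,
	TASK_STOPPED = 1,
	TASK_TRACED  = 2,
	TASK_FROZEN  = 3,
};

struct model_task {
	unsigned int jobctl;	/* canonical stopped/traced record */
	unsigned int state;	/* stand-in for task->__state */
};

/* Waker side: drop the canonical bits first, then attempt the wakeup. */
static bool model_signal_wake_up(struct model_task *t)
{
	t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);

	/* A frozen task does not match the wake mask: the wakeup is lost. */
	if (t->state != TASK_STOPPED && t->state != TASK_TRACED)
		return false;

	t->state = TASK_RUNNING;
	return true;
}

/* Thaw side: restore a special state only if jobctl still records one. */
static unsigned int model_thaw(struct model_task *t)
{
	unsigned int state = TASK_RUNNING;

	if (t->jobctl & JOBCTL_TRACED)
		state = TASK_TRACED;
	else if (t->jobctl & JOBCTL_STOPPED)
		state = TASK_STOPPED;

	t->state = state;
	return state;
}
```

In this model a traced task that is frozen and then receives a wakeup
misses it, but because jobctl was already cleared the thaw puts it into
TASK_RUNNING rather than back into TASK_TRACED.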

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-25 17:47   ` Oleg Nesterov
@ 2022-04-27  0:24     ` Eric W. Biederman
  2022-04-28 20:29       ` Peter Zijlstra
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27  0:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/21, Peter Zijlstra wrote:
>>
>> @@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
>>  	 * schedule() will not sleep if there is a pending signal that
>>  	 * can awaken the task.
>>  	 */
>> -	current->jobctl |= JOBCTL_TRACED;
>> +	current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
>>  	set_special_state(TASK_TRACED);
>
> OK, this looks wrong. I actually mean the previous patch which sets
> JOBCTL_TRACED.
>
> The problem is that the tracee can be already killed, so that
> fatal_signal_pending(current) is true. In this case we can't rely on
> signal_wake_up_state() which should clear JOBCTL_TRACED, or the
> callers of ptrace_signal_wake_up/etc which clear this flag by hand.
>
> In this case schedule() won't block and ptrace_stop() will leak
> JOBCTL_TRACED. Unless I missed something.
>
> We could check fatal_signal_pending() and damn! this is what I think
> ptrace_stop() should have done from the very beginning. But for now
> I'd suggest to simply clear this flag before return, along with
> DELAY_WAKEKILL and LISTENING.

Oh.  That is an interesting case for JOBCTL_TRACED.  The
scheduler refuses to stop if signal_pending_state(TASK_TRACED, p)
returns true.

The ptrace_stop code used to handle this explicitly, and in commit
7d613f9f72ec ("signal: Remove the bogus sigkill_pending in ptrace_stop")
I actually removed the test, as it was somewhat wrong and redundant,
and in slightly the wrong location.

But doing:

	/* Don't stop if the task is dying */
	if (unlikely(__fatal_signal_pending(current)))
		return exit_code;

Should work.
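
A small user-space model of why the early return matters, as a sketch
only.  struct model_task, its fatal_signal_pending field and
model_ptrace_stop() are hypothetical stand-ins, not kernel code.  The
model assumes that a sleeping task always has its flag cleared by a
waker, and that a task with a pending fatal signal never sleeps.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's jobctl bit. */
enum { JOBCTL_TRACED = 1 << 0 };

struct model_task {
	unsigned int jobctl;
	bool fatal_signal_pending;	/* stand-in for __fatal_signal_pending() */
};

/*
 * Model of the stop path.  Returns true if the task would have slept.
 * With check_fatal false, JOBCTL_TRACED is set even though the scheduler
 * would refuse to block, and no waker comes along to clear it.
 */
static bool model_ptrace_stop(struct model_task *t, bool check_fatal)
{
	if (check_fatal && t->fatal_signal_pending)
		return false;	/* bail out before touching jobctl */

	t->jobctl |= JOBCTL_TRACED;

	if (t->fatal_signal_pending)
		return false;	/* schedule() would not block: flag leaks */

	/* Normal case: the task sleeps and the waker clears the flag. */
	t->jobctl &= ~JOBCTL_TRACED;
	return true;
}
```

Calling model_ptrace_stop() on a dying task with check_fatal false
leaves JOBCTL_TRACED set on return; with check_fatal true, jobctl is
never touched.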

>
>>  	current->jobctl &= ~JOBCTL_LISTENING;
>> +	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
>
> 	current->jobctl &=
> 		~(~JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);


I presume you meant:

	current->jobctl &=
		~(JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);

I don't think we want to do that.  For the case you are worried about it
is a valid fix.

In general this is the wrong approach as we want the waker to clear
JOBCTL_TRACED.  If the waker does not it is possible that
ptrace_freeze_traced might attempt to freeze a process whose state
is not appropriate for attach, because the code is past the call
to schedule().

In fact I think clearing JOBCTL_TRACED at the end of ptrace_stop
will allow ptrace_freeze_traced to come in while siglock is dropped,
expect the process to stop, and have the process not stop.  Of course,
with wait_task_inactive coming first, that might not be a problem.



This is a minor problem with the patchset I just posted.  I thought the
only reason wait_task_inactive could fail was if ptrace_stop() hit the
!current->ptrace case.  Thinking about it, any SIGKILL coming in
before the tracee stops in schedule will trigger this, so it is not as
safe as I thought to not pass a state into wait_task_inactive.

It is time for me to shut down today.  I will sleep on that and
see what I can see tomorrow.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 5/9] signal: Protect parent child relationships by childs siglock
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27  6:40           ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-04-27  6:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 2022-04-26 17:52:07 [-0500], Eric W. Biederman wrote:
> The functions ptrace_stop and do_signal_stop have to drop siglock
> and grab tasklist_lock because the parent/child relationship
> is guarded by siglock and not siglock.

 "is guarded by tasklist_lock and not siglock." ?

> Simplify things by guarding the parent/child relationship
> with siglock.  For the most part this just requires a little bit
> of code motion.  In a couple of places more locking was needed.
> 
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 3/9] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27  7:10           ` Johannes Berg
  -1 siblings, 0 replies; 572+ messages in thread
From: Johannes Berg @ 2022-04-27  7:10 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, linux-um,
	Chris Zankel, Max Filippov, inux-xtensa, Kees Cook, Jann Horn

On Tue, 2022-04-26 at 17:52 -0500, Eric W. Biederman wrote:
> User mode linux is the last user of the PT_DTRACE flag.  Using the flag to indicate
> single stepping is a little confusing and, worse, changing tsk->ptrace without locking
> could potentially cause problems.
> 
> So use a thread info flag with a better name instead of a flag in tsk->ptrace.
> 
> Remove the definition of PT_DTRACE as uml is the last user.


Looks fine to me.

Acked-by: Johannes Berg <johannes@sipsolutions.net>

Looking at pending patches, I don't see any conflicts from this. I'm
guessing anyway you'll want/need to take these through some tree all
together.

johannes



^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 5/9] signal: Protect parent child relationships by childs siglock
  2022-04-27  6:40           ` Sebastian Andrzej Siewior
@ 2022-04-27 13:35             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 13:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-04-26 17:52:07 [-0500], Eric W. Biederman wrote:
>> The functions ptrace_stop and do_signal_stop have to drop siglock
>> and grab tasklist_lock because the parent/child relationship
>> is guarded by siglock and not siglock.
>
>  "is guarded by tasklist_lock and not siglock." ?

Yes.   Thank you.  I will fix that.

>> Simplify things by guarding the parent/child relationship
>> with siglock.  For the most part this just requires a little bit
>> of code motion.  In a couple of places more locking was needed.
>> 
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Sebastian

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 13:42           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> was needed to detect when ptrace_stop would decide not to stop
> after calling "set_special_state(TASK_TRACED)".  With the recent
> cleanups ptrace_stop will always stop after calling set_special_state.
>
> Take advantage of this by no longer asking wait_task_inactive to
> verify the state.  If a bug is hit and wait_task_inactive does not
> succeed, warn and return -ESRCH.

As Oleg noticed upthread there are more reasons than simply
!current->ptrace for wait_task_inactive to fail.  In particular a fatal
signal can be received any time before JOBCTL_DELAY_SIGKILL.

So this change is not safe.  I will respin this one.

Eric


> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/ptrace.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 16d1a84a2cae..0634da7ac685 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	}
>  	read_unlock(&tasklist_lock);
>  
> -	if (!ret && !ignore_state) {
> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
> -			/*
> -			 * This can only happen if may_ptrace_stop() fails and
> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
> -			 * so we should not worry about leaking __TASK_TRACED.
> -			 */
> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> -			ret = -ESRCH;
> -		}
> -	}
> +	if (!ret && !ignore_state &&
> +	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
> +		ret = -ESRCH;
>  
>  	return ret;
>  }

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 3/9] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-04-27  7:10           ` Johannes Berg
@ 2022-04-27 13:50             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 13:50 UTC (permalink / raw)
  To: Johannes Berg
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Johannes Berg <johannes@sipsolutions.net> writes:

> On Tue, 2022-04-26 at 17:52 -0500, Eric W. Biederman wrote:
>> User mode linux is the last user of the PT_DTRACE flag.  Using the flag to indicate
>> single stepping is a little confusing and, worse, changing tsk->ptrace without locking
>> could potentially cause problems.
>> 
>> So use a thread info flag with a better name instead of a flag in tsk->ptrace.
>> 
>> Remove the definition of PT_DTRACE as uml is the last user.
>
>
> Looks fine to me.
>
> Acked-by: Johannes Berg <johannes@sipsolutions.net>

Thanks.

> Looking at pending patches, I don't see any conflicts from this. I'm
> guessing anyway you'll want/need to take these through some tree all
> together.

Taking them all through a single tree looks like it will be easiest.
So I am planning on taking them through my signal tree.

Now that I think of it, the lack of locking also means I want to
Cc stable.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 14:10           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 14:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>   	}
>
>  	sighand = parent->sighand;
> -	spin_lock_irqsave(&sighand->siglock, flags);
> +	lock = tsk->sighand != sighand;
> +	if (lock)
> +		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);

But why is it safe?

Suppose we have two tasks, they both trace each other, both call
ptrace_stop() at the same time. Of course this is ugly, they both
will block.

But with this patch in this case we have the trivial ABBA deadlock,
no?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:10           ` Oleg Nesterov
@ 2022-04-27 14:20             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 14:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>>   	}
>>
>>  	sighand = parent->sighand;
>> -	spin_lock_irqsave(&sighand->siglock, flags);
>> +	lock = tsk->sighand != sighand;
>> +	if (lock)
>> +		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
>
> But why is it safe?
>
> Suppose we have two tasks, they both trace each other, both call
> ptrace_stop() at the same time. Of course this is ugly, they both
> will block.
>
> But with this patch in this case we have the trivial ABBA deadlock,
> no?

I was thinking in terms of the process tree (which is fine).

The ptrace parental relationship definitely has the potential to be a
graph with cycles.  Which as you point out is not fine.


The result is very nice and I don't want to give it up.  I suspect
ptrace cycles are always a problem and can simply be forbidden.  That
is going to take some analysis and some additional code in
ptrace_attach.

I will go look at that.
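
As a rough sketch of what such a check might look like, here is a
user-space model.  Nothing below is existing kernel code: struct
model_task, its tracer field and model_attach_would_cycle() are
hypothetical, and only ptrace links are modelled; the real-parent
relationship is not considered at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature task: only the tracer link matters here. */
struct model_task {
	struct model_task *tracer;	/* NULL when the task is not traced */
};

/*
 * Would letting @tracer attach to @tracee close a cycle?  Walk the chain
 * of tracers starting at @tracer; if it reaches @tracee, then @tracee
 * already (transitively) traces @tracer.  Assumes the existing links are
 * acyclic, which holds if every attach goes through this check.
 */
static bool model_attach_would_cycle(const struct model_task *tracer,
				     const struct model_task *tracee)
{
	const struct model_task *t;

	for (t = tracer; t; t = t->tracer)
		if (t == tracee)
			return true;
	return false;
}
```

The walk is linear in the length of the tracer chain.  A task attaching
to itself is reported as a cycle as well, since the walk starts at the
tracer.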

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-27 13:42           ` Eric W. Biederman
@ 2022-04-27 14:27             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 14:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
>> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
>> was needed to detect when ptrace_stop would decide not to stop
>> after calling "set_special_state(TASK_TRACED)".  With the recent
>> cleanups ptrace_stop will always stop after calling set_special_state.
>>
>> Take advantage of this by no longer asking wait_task_inactive to
>> verify the state.  If a bug is hit and wait_task_inactive does not
>> succeed, warn and return -ESRCH.
>
> As Oleg noticed upthread there are more reasons than simply
> !current->ptrace for wait_task_inactive to fail.  In particular a fatal
> signal can be received any time before JOBCTL_DELAY_SIGKILL.
>
> So this change is not safe.  I will respin this one.

Bah.  I definitely need to update the description so there is going to
be a v2.

I confused myself.  This change is safe because ptrace_freeze_traced
fails if there is a pending fatal signal, and arranges that no new fatal
signals will wake up the task.

Eric

>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  kernel/ptrace.c | 14 +++-----------
>>  1 file changed, 3 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
>> index 16d1a84a2cae..0634da7ac685 100644
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -265,17 +265,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>>  	}
>>  	read_unlock(&tasklist_lock);
>>  
>> -	if (!ret && !ignore_state) {
>> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
>> -			/*
>> -			 * This can only happen if may_ptrace_stop() fails and
>> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
>> -			 * so we should not worry about leaking __TASK_TRACED.
>> -			 */
>> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> -			ret = -ESRCH;
>> -		}
>> -	}
>> +	if (!ret && !ignore_state &&
>> +	    WARN_ON_ONCE(!wait_task_inactive(child, 0)))
>> +		ret = -ESRCH;
>>  
>>  	return ret;
>>  }
>
> Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:20             ` Eric W. Biederman
@ 2022-04-27 14:43               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 14:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Eric W. Biederman wrote:
>
> The ptrace parental relationship definitely has the potential to be a
> graph with cycles.  Which as you point out is not fine.
>
> The result is very nice and I don't want to give it up.  I suspect
> that ptrace cycles are always a problem and can simply be
> forbidden.

OK, please consider another case.

We have a parent P and its child C. C traces P.

This is not that unusual, I don't think we can forbid this case.

P reports an event and calls do_notify_parent_cldstop().

C receives SIGSTOP and calls do_notify_parent_cldstop() too.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:20             ` Eric W. Biederman
@ 2022-04-27 14:47               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 14:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Oleg Nesterov <oleg@redhat.com> writes:
>
>> On 04/26, Eric W. Biederman wrote:
>>>
>>> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>>>   	}
>>>
>>>  	sighand = parent->sighand;
>>> -	spin_lock_irqsave(&sighand->siglock, flags);
>>> +	lock = tsk->sighand != sighand;
>>> +	if (lock)
>>> +		spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
>>
>> But why is it safe?
>>
>> Suppose we have two tasks, they both trace each other, both call
>> ptrace_stop() at the same time. Of course this is ugly, they both
>> will block.
>>
>> But with this patch in this case we have the trivial ABBA deadlock,
>> no?
>
> I was thinking in terms of the process tree (which is fine).
>
> The ptrace parental relationship definitely has the potential to be a
> graph with cycles.  Which as you point out is not fine.
>
>
> The result is very nice and I don't want to give it up.  I suspect
> that ptrace cycles are always a problem and can simply be
> forbidden.  That is going to take some analysis and some additional
> code in ptrace_attach.
>
> I will go look at that.


Hmm.  If we have the following process tree.

    A
     \
      B
       \
        C

Processes A, B, and C are all in the same process group.
Processes A and B are set up to receive SIGCHLD when
their child stops.

Process C traces process A.

When a SIGSTOP is delivered to the group we can have:

Process B takes siglock(B) siglock(A) to notify the real_parent
Process C takes siglock(C) siglock(B) to notify the real_parent
Process A takes siglock(A) siglock(C) to notify the tracer

If they all take their local lock at the same time there is
a deadlock.

I don't think the restriction that you can never ptrace anyone
up the process tree is going to fly.  So it looks like I am back to the
drawing board for this one.
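
As a rough illustration of the ordering problem above, the nested siglock
acquisitions can be modelled as a directed graph, with an edge from the lock
a task already holds to the lock it takes next.  A cycle in that graph means
a deadlock is possible.  This is a userspace sketch only, not kernel code;
struct lock_graph, record_nested() and the SIGLOCK_* labels are names made up
for the example.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 3

/* Hypothetical labels for the three siglocks in the example. */
enum { SIGLOCK_A, SIGLOCK_B, SIGLOCK_C };

/* edge[held][taken] is true when some task takes 'taken' while holding 'held'. */
struct lock_graph {
	bool edge[MAX_LOCKS][MAX_LOCKS];
};

static void record_nested(struct lock_graph *g, int held, int taken)
{
	g->edge[held][taken] = true;
}

/* Depth-first search; color: 0 = unvisited, 1 = on current path, 2 = done. */
static bool on_cycle(const struct lock_graph *g, int node, int *color)
{
	color[node] = 1;
	for (int next = 0; next < MAX_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (color[next] == 1)
			return true;	/* back edge: lock order cycle */
		if (color[next] == 0 && on_cycle(g, next, color))
			return true;
	}
	color[node] = 2;
	return false;
}

static bool lock_graph_has_cycle(const struct lock_graph *g)
{
	int color[MAX_LOCKS];

	memset(color, 0, sizeof(color));
	for (int node = 0; node < MAX_LOCKS; node++)
		if (color[node] == 0 && on_cycle(g, node, color))
			return true;
	return false;
}

/* Process tree only: each child nests its real_parent's siglock. */
static bool tree_only_has_cycle(void)
{
	struct lock_graph g;

	memset(&g, 0, sizeof(g));
	record_nested(&g, SIGLOCK_B, SIGLOCK_A);	/* B notifies A */
	record_nested(&g, SIGLOCK_C, SIGLOCK_B);	/* C notifies B */
	return lock_graph_has_cycle(&g);
}

/* Same tree, plus C tracing A: A nests its tracer's siglock. */
static bool tree_plus_tracer_has_cycle(void)
{
	struct lock_graph g;

	memset(&g, 0, sizeof(g));
	record_nested(&g, SIGLOCK_B, SIGLOCK_A);	/* B notifies A */
	record_nested(&g, SIGLOCK_C, SIGLOCK_B);	/* C notifies B */
	record_nested(&g, SIGLOCK_A, SIGLOCK_C);	/* A notifies tracer C */
	return lock_graph_has_cycle(&g);
}
```

In this model tree_only_has_cycle() finds no cycle, while adding the single
tracer edge in tree_plus_tracer_has_cycle() closes the loop A -> C -> B -> A.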

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 14:56           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 14:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>  		spin_lock_irq(&current->sighand->siglock);
>  	}
>  
> +	/* Don't stop if current is not ptraced */
> +	if (unlikely(!current->ptrace))
> +		return (clear_code) ? 0 : exit_code;
> +
> +	/*
> +	 * If @why is CLD_STOPPED, we're trapping to participate in a group
> +	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
> +	 * across siglock relocks since INTERRUPT was scheduled, PENDING
> +	 * could be clear now.  We act as if SIGCONT is received after
> +	 * TASK_TRACED is entered - ignore it.
> +	 */
> +	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
> +		gstop_done = task_participate_group_stop(current);
> +
> +	/*
> +	 * Notify parents of the stop.
> +	 *
> +	 * While ptraced, there are two parents - the ptracer and
> +	 * the real_parent of the group_leader.  The ptracer should
> +	 * know about every stop while the real parent is only
> +	 * interested in the completion of group stop.  The states
> +	 * for the two don't interact with each other.  Notify
> +	 * separately unless they're gonna be duplicates.
> +	 */
> +	do_notify_parent_cldstop(current, true, why);
> +	if (gstop_done && ptrace_reparented(current))
> +		do_notify_parent_cldstop(current, false, why);

This doesn't look right either. The parent should be notified only after
we set __state = TASK_TRACED and ->exit_code.

Suppose that the debugger sleeps in do_wait(). do_notify_parent_cldstop()
wakes it up, the debugger calls wait_task_stopped(), and then it goes back
to sleep because task_stopped_code() returns 0.

This can probably be fixed if you remove the lockless (fast path)
task_stopped_code() check in wait_task_stopped(), but this is not
nice performance-wise...
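
The ordering concern above can be sketched with a small single-threaded
model.  It assumes the worst-case interleaving, where the woken debugger does
its lockless check immediately after the notification.  struct model_task,
model_stopped_code() and the two stop_*() helpers are hypothetical names for
this illustration; the real task_stopped_code() checks more than this.

```c
/* Simplified stand-ins for the tracee's scheduler state. */
enum model_state { MODEL_RUNNING, MODEL_TRACED };

struct model_task {
	enum model_state state;
	int exit_code;
};

/*
 * Models the lockless fast-path check: a stop code is only visible
 * once the tracee has entered the traced state.
 */
static int model_stopped_code(const struct model_task *t)
{
	return t->state == MODEL_TRACED ? t->exit_code : 0;
}

/* What the woken debugger sees; 0 means "nothing to report, sleep again". */
static int model_notify(const struct model_task *t)
{
	return model_stopped_code(t);
}

/* Notification issued before the state and exit code are published. */
static int stop_notify_first(int exit_code)
{
	struct model_task t = { MODEL_RUNNING, 0 };
	int seen = model_notify(&t);	/* wakeup arrives too early */

	t.exit_code = exit_code;
	t.state = MODEL_TRACED;
	return seen;
}

/* State and exit code published first, then the notification. */
static int stop_publish_first(int exit_code)
{
	struct model_task t = { MODEL_RUNNING, 0 };

	t.exit_code = exit_code;
	t.state = MODEL_TRACED;
	return model_notify(&t);
}
```

With any nonzero exit code, stop_notify_first() reports 0 to the debugger
(the lost wakeup), whereas stop_publish_first() reports the code.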

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:56           ` Oleg Nesterov
@ 2022-04-27 15:00             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Oleg Nesterov wrote:
>
> On 04/26, Eric W. Biederman wrote:
> >
> > @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
> >  		spin_lock_irq(&current->sighand->siglock);
> >  	}
> >
> > +	/* Don't stop if current is not ptraced */
> > +	if (unlikely(!current->ptrace))
> > +		return (clear_code) ? 0 : exit_code;
> > +
> > +	/*
> > +	 * If @why is CLD_STOPPED, we're trapping to participate in a group
> > +	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
> > +	 * across siglock relocks since INTERRUPT was scheduled, PENDING
> > +	 * could be clear now.  We act as if SIGCONT is received after
> > +	 * TASK_TRACED is entered - ignore it.
> > +	 */
> > +	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
> > +		gstop_done = task_participate_group_stop(current);
> > +
> > +	/*
> > +	 * Notify parents of the stop.
> > +	 *
> > +	 * While ptraced, there are two parents - the ptracer and
> > +	 * the real_parent of the group_leader.  The ptracer should
> > +	 * know about every stop while the real parent is only
> > +	 * interested in the completion of group stop.  The states
> > +	 * for the two don't interact with each other.  Notify
> > +	 * separately unless they're gonna be duplicates.
> > +	 */
> > +	do_notify_parent_cldstop(current, true, why);
> > +	if (gstop_done && ptrace_reparented(current))
> > +		do_notify_parent_cldstop(current, false, why);
>
> This doesn't look right either. The parent should be notified only after
> we set __state = TASK_TRACED and ->exit_code.
>
> Suppose that the debugger sleeps in do_wait(). do_notify_parent_cldstop()
> wakes it up, the debugger calls wait_task_stopped(), and then it goes back
> to sleep because task_stopped_code() returns 0.
>
> This can probably be fixed if you remove the lockless (fast path)
> task_stopped_code() check in wait_task_stopped(), but this is not
> nice performance-wise...

On the other hand, I don't understand why you moved the call site
of do_notify_parent_cldstop() up... just don't do this?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 15:14           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> was needed to detect when ptrace_stop would decide not to stop
> after calling "set_special_state(TASK_TRACED)".  With the recent
> cleanups ptrace_stop will always stop after calling set_special_state.
>
> Take advantage of this by no longer asking wait_task_inactive to
> verify the state.  If a bug is hit and wait_task_inactive does not
> succeed warn and return -ESRCH.

ACK, but I think that the changelog is wrong.

We could do this right after may_ptrace_stop() has gone. This doesn't
depend on the previous changes in this series.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 8/9] ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 15:20           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> +	if (lock_task_sighand(child, &flags)) {
> +		if (child->ptrace && child->parent == current) {
> +			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> +			/*
> +			 * child->sighand can't be NULL, release_task()
> +			 * does ptrace_unlink() before __exit_signal().
> +			 */
> +			if (ignore_state || ptrace_freeze_traced(child))
> +				ret = 0;

The comment above is no longer relevant, it should be removed.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 15:41           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
>  		return;
>
>  	WARN_ON(!task->ptrace || task->parent != current);
> @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
>  	 * Recheck state under the lock to close this race.
>  	 */
>  	spin_lock_irq(&task->sighand->siglock);

Now that we do not check __state == __TASK_TRACED, we need lock_task_sighand().
The tracee may already have been woken up by ptrace_resume(), but it is possible
that it hasn't cleared DELAY_WAKEKILL yet.

Now, before we take ->siglock, the tracee can exit and another thread can do
wait() and reap this task.

Also, I think the comment above should be updated. I agree, it makes sense to
re-check JOBCTL_DELAY_WAKEKILL under siglock just for clarity, but we no longer
need to do this to close the race; jobctl &= ~JOBCTL_DELAY_WAKEKILL and
wake_up_state() are safe even if JOBCTL_DELAY_WAKEKILL was already cleared.

> @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>
>  	/* LISTENING can be set only during STOP traps, clear it */
>  	current->jobctl &= ~JOBCTL_LISTENING;
> +	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;

minor, but

	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL);

looks better.
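
For what it's worth, the two forms can be checked against each other in a
small userspace sketch.  The bit positions below are placeholders chosen for
the example, not the values from include/linux/sched/jobctl.h, and the
MODEL_* names and helper functions are hypothetical.  The sketch also checks
that clearing an already-cleared bit is a no-op.

```c
#include <stdbool.h>

/* Placeholder bit positions, assumed for illustration only. */
#define MODEL_JOBCTL_LISTENING		(1UL << 22)
#define MODEL_JOBCTL_DELAY_WAKEKILL	(1UL << 24)

/* The form in the patch: two separate read-modify-write steps. */
static unsigned long clear_separately(unsigned long jobctl)
{
	jobctl &= ~MODEL_JOBCTL_LISTENING;
	jobctl &= ~MODEL_JOBCTL_DELAY_WAKEKILL;
	return jobctl;
}

/* The suggested form: one combined mask. */
static unsigned long clear_combined(unsigned long jobctl)
{
	jobctl &= ~(MODEL_JOBCTL_LISTENING | MODEL_JOBCTL_DELAY_WAKEKILL);
	return jobctl;
}

/*
 * Try every combination of the two flags plus one unrelated bit and
 * confirm that both forms agree and that clearing twice changes nothing.
 */
static bool clears_agree(void)
{
	const unsigned long other = 1UL << 3;

	for (unsigned int combo = 0; combo < 8; combo++) {
		unsigned long jobctl = 0;

		if (combo & 1)
			jobctl |= MODEL_JOBCTL_LISTENING;
		if (combo & 2)
			jobctl |= MODEL_JOBCTL_DELAY_WAKEKILL;
		if (combo & 4)
			jobctl |= other;

		if (clear_separately(jobctl) != clear_combined(jobctl))
			return false;
		if (clear_combined(clear_combined(jobctl)) != clear_combined(jobctl))
			return false;
	}
	return true;
}
```

Unrelated bits are left alone by both forms, so the combined mask is purely a
readability change.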

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-21 15:02 ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Peter Zijlstra
                     ` (3 preceding siblings ...)
  2022-04-25 17:47   ` Oleg Nesterov
@ 2022-04-27 15:53   ` Oleg Nesterov
  2022-04-27 21:57     ` Eric W. Biederman
  4 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 15:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, mingo, vincent.guittot, dietmar.eggemann, rostedt, mgorman,
	ebiederm, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On 04/21, Peter Zijlstra wrote:
>
> @@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
>  		goto out_put_task_struct;
>  
>  	ret = arch_ptrace(child, request, addr, data);
> -	if (ret || request != PTRACE_DETACH)
> -		ptrace_unfreeze_traced(child);
> +	ptrace_unfreeze_traced(child);

Forgot to mention... whatever we do, this doesn't look right.

ptrace_unfreeze_traced() must not be called if the tracee was untraced;
another debugger can come after that. I agree, the current code looks
a bit confusing, perhaps it makes sense to re-write it:

	if (request == PTRACE_DETACH && ret == 0)
		; /* nothing to do, no longer traced by us */
	else
		ptrace_unfreeze_traced(child);
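
The rewrite above is the same predicate as the existing
"ret || request != PTRACE_DETACH" check, by De Morgan's law.  A minimal
userspace sketch of that equivalence follows.  MODEL_PTRACE_DETACH,
MODEL_PTRACE_OTHER and the helper names are stand-ins invented for the
example, and each function returns whether ptrace_unfreeze_traced() would be
called.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in request numbers, assumed for illustration only. */
#define MODEL_PTRACE_DETACH	17
#define MODEL_PTRACE_OTHER	1

/* The condition as it exists before the patch. */
static bool unfreeze_original(long request, int ret)
{
	return ret || request != MODEL_PTRACE_DETACH;
}

/* The more explicit form suggested above. */
static bool unfreeze_rewritten(long request, int ret)
{
	if (request == MODEL_PTRACE_DETACH && ret == 0)
		return false;	/* nothing to do, no longer traced by us */
	return true;
}

/* Compare both forms over detach/non-detach and success/failure. */
static bool forms_agree(void)
{
	const long requests[] = { MODEL_PTRACE_DETACH, MODEL_PTRACE_OTHER };
	const int rets[] = { 0, -3 };	/* success and an example error */

	for (size_t i = 0; i < 2; i++)
		for (size_t j = 0; j < 2; j++)
			if (unfreeze_original(requests[i], rets[j]) !=
			    unfreeze_rewritten(requests[i], rets[j]))
				return false;
	return true;
}
```

The only case where neither form unfreezes is a successful detach, which is
exactly the case where the tracee is no longer ours to touch.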

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-26 22:52         ` Eric W. Biederman
  (?)
  (?)
@ 2022-04-27 16:09         ` Oleg Nesterov
  2022-04-27 16:33           ` Eric W. Biederman
  -1 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 16:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/26, Eric W. Biederman wrote:
>
> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	 */
>  	if (lock_task_sighand(child, &flags)) {
>  		if (child->ptrace && child->parent == current) {
> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);

This WARN_ON() doesn't look right.

It is possible that this child was traced by another task and PTRACE_DETACH'ed,
but it didn't clear DELAY_WAKEKILL.

If the new debugger attaches and calls ptrace() before the child takes siglock,
ptrace_freeze_traced() will fail, but we can hit this WARN_ON().

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 16:09         ` Oleg Nesterov
@ 2022-04-27 16:33           ` Eric W. Biederman
  2022-04-27 17:18               ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 16:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>>  	 */
>>  	if (lock_task_sighand(child, &flags)) {
>>  		if (child->ptrace && child->parent == current) {
>> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
>
> This WARN_ON() doesn't look right.
>
> It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> but it didn't clear DELAY_WAKEKILL.

That would be a bug.  That would mean that a PTRACE_DETACH'ed process
cannot be SIGKILL'd.

> If the new debugger attaches and calls ptrace() before the child takes siglock
> ptrace_freeze_traced() will fail, but we can hit this WARN_ON().

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 16:33           ` Eric W. Biederman
@ 2022-04-27 17:18               ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 17:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 04/26, Eric W. Biederman wrote:
> >>
> >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> >>  	 */
> >>  	if (lock_task_sighand(child, &flags)) {
> >>  		if (child->ptrace && child->parent == current) {
> >> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> >> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
> >
> > This WARN_ON() doesn't look right.
> >
> > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> > but it didn't clear DELAY_WAKEKILL.
>
> That would be a bug.  That would mean that PTRACE_DETACHED process can
> not be SIGKILL'd.

Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
SIGKILL after that.

Oleg.

> > If the new debugger attaches and calls ptrace() before the child takes siglock
> > ptrace_freeze_traced() will fail, but we can hit this WARN_ON().
> 
> Eric
> 


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 17:18               ` Oleg Nesterov
@ 2022-04-27 17:21                 ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-27 17:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Oleg Nesterov wrote:
>
> On 04/27, Eric W. Biederman wrote:
> >
> > Oleg Nesterov <oleg@redhat.com> writes:
> >
> > > On 04/26, Eric W. Biederman wrote:
> > >>
> > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> > >>  	 */
> > >>  	if (lock_task_sighand(child, &flags)) {
> > >>  		if (child->ptrace && child->parent == current) {
> > >> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> > >> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
> > >
> > > This WARN_ON() doesn't look right.
> > >
> > > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> > > but it didn't clear DELAY_WAKEKILL.
> >
> > That would be a bug.  That would mean that PTRACE_DETACHED process can
> > not be SIGKILL'd.
>
> Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
> SIGKILL after that.

Not to mention that the tracee is TASK_RUNNING after PTRACE_DETACH wakes it
up, so the pending JOBCTL_DELAY_WAKEKILL simply has no effect.

Oleg.

> > > If the new debugger attaches and calls ptrace() before the child takes siglock
> > > ptrace_freeze_traced() will fail, but we can hit this WARN_ON().
> > 
> > Eric
> > 


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 17:21                 ` Oleg Nesterov
@ 2022-04-27 17:31                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 17:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/27, Oleg Nesterov wrote:
>>
>> On 04/27, Eric W. Biederman wrote:
>> >
>> > Oleg Nesterov <oleg@redhat.com> writes:
>> >
>> > > On 04/26, Eric W. Biederman wrote:
>> > >>
>> > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>> > >>  	 */
>> > >>  	if (lock_task_sighand(child, &flags)) {
>> > >>  		if (child->ptrace && child->parent == current) {
>> > >> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> > >> +			WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
>> > >
>> > > This WARN_ON() doesn't look right.
>> > >
>> > > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
>> > > but it didn't clear DELAY_WAKEKILL.
>> >
>> > That would be a bug.  That would mean that PTRACE_DETACHED process can
>> > not be SIGKILL'd.
>>
>> Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
>> SIGKILL after that.
>
> Not to mention that the tracee is TASK_RUNNING after PTRACE_DETACH wakes it
> up, so the pending JOBCTL_DELAY_WAKEKILL simply has no effect.

Oh.  You are talking about the window between clearing the
traced state and the point where the tracee resumes executing and clears
JOBCTL_DELAY_WAKEKILL.

I thought you were thinking about JOBCTL_DELAY_WAKEKILL being leaked.

That requires both ptrace_attach and ptrace_check_attach for the new
tracer to happen before the tracee is scheduled to run.

I agree.  I think the WARN_ON could reasonably be moved a bit later,
but I don't know that the WARN_ON is important. I simply kept it because
it seemed to make sense.
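
For illustration only, the window can be stepped through in a tiny
userspace model.  This is a sketch, not kernel code: struct model_tracee,
MODEL_DELAY_WAKEKILL and the model_*() helpers are invented names, and it
assumes (as the discussion above does) that the flag is cleared only by
the tracee itself once it runs again.

```c
#include <stdbool.h>

#define MODEL_DELAY_WAKEKILL	(1UL << 0)	/* arbitrary bit for this model */

struct model_tracee {
	unsigned long jobctl;
	bool running;		/* true once the detach has woken it */
	int tracer;		/* id of the current tracer, 0 if untraced */
};

/* Old tracer: a ptrace request freezes the tracee ... */
static void model_freeze(struct model_tracee *t)
{
	t->jobctl |= MODEL_DELAY_WAKEKILL;
}

/* ... and the detach wakes it without clearing the flag. */
static void model_detach(struct model_tracee *t)
{
	t->tracer = 0;
	t->running = true;
}

/* New tracer attaches before the tracee has been scheduled. */
static void model_attach(struct model_tracee *t, int tracer)
{
	t->tracer = tracer;
}

/* Tracee: once scheduled, it takes siglock and clears the flag itself. */
static void model_tracee_runs(struct model_tracee *t)
{
	t->jobctl &= ~MODEL_DELAY_WAKEKILL;
}

/* The condition the WARN_ON() in ptrace_check_attach() would test. */
static bool model_warn_condition(const struct model_tracee *t, int self)
{
	return t->tracer == self && (t->jobctl & MODEL_DELAY_WAKEKILL) != 0;
}
```

In this model the warning condition holds only if the new tracer's attach
and check both land between model_detach() and model_tracee_runs(), which
is the ordering described above.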

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 15:00             ` Oleg Nesterov
@ 2022-04-27 21:52               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 21:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/27, Oleg Nesterov wrote:
>>
>> On 04/26, Eric W. Biederman wrote:
>> >
>> > @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>> >  		spin_lock_irq(&current->sighand->siglock);
>> >  	}
>> >
>> > +	/* Don't stop if current is not ptraced */
>> > +	if (unlikely(!current->ptrace))
>> > +		return (clear_code) ? 0 : exit_code;
>> > +
>> > +	/*
>> > +	 * If @why is CLD_STOPPED, we're trapping to participate in a group
>> > +	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
>> > +	 * across siglock relocks since INTERRUPT was scheduled, PENDING
>> > +	 * could be clear now.  We act as if SIGCONT is received after
>> > +	 * TASK_TRACED is entered - ignore it.
>> > +	 */
>> > +	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
>> > +		gstop_done = task_participate_group_stop(current);
>> > +
>> > +	/*
>> > +	 * Notify parents of the stop.
>> > +	 *
>> > +	 * While ptraced, there are two parents - the ptracer and
>> > +	 * the real_parent of the group_leader.  The ptracer should
>> > +	 * know about every stop while the real parent is only
>> > +	 * interested in the completion of group stop.  The states
>> > +	 * for the two don't interact with each other.  Notify
>> > +	 * separately unless they're gonna be duplicates.
>> > +	 */
>> > +	do_notify_parent_cldstop(current, true, why);
>> > +	if (gstop_done && ptrace_reparented(current))
>> > +		do_notify_parent_cldstop(current, false, why);
>>
>> This doesn't look right too. The parent should be notified only after
>> we set __state = TASK_TRACED and ->exit code.
>>
>> Suppose that debugger sleeps in do_wait(). do_notify_parent_cldstop()
>> wakes it up, debugger calls wait_task_stopped() and then it will sleep
>> again, task_stopped_code() returns 0.
>>
>> This can be probably fixed if you remove the lockless (fast path)
>> task_stopped_code() check in wait_task_stopped(), but this is not
>> nice performance-wise...

Another detail I have overlooked.  Thank you.

Or we can change task_stopped_code to look something like:

static int *task_stopped_code(struct task_struct *p, bool ptrace)
{
	if (ptrace) {
-		if (task_is_traced(p) && !(p->jobctl & JOBCTL_LISTENING))
+		if (p->ptrace && !(p->jobctl & JOBCTL_LISTENING))
			return &p->exit_code;
	} else {
		if (p->signal->flags & SIGNAL_STOP_STOPPED)
			return &p->signal->group_exit_code;
	}
	return NULL;
}

I probably need to do a little bit more to ensure that it isn't an
actual process exit_code in p->exit_code.  But then we don't have to
limit ourselves to being precisely in the task_is_traced stopped place
for the fast path.
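
A rough userspace sketch of the difference between the two checks, under
the assumption that the notification arrives before the tracee has set
TASK_TRACED.  struct model_task, MODEL_JOBCTL_LISTENING and both helper
names are made up for this example; only the ptrace branch is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_JOBCTL_LISTENING	(1UL << 0)	/* bit position is arbitrary here */

struct model_task {
	bool ptrace;		/* stands in for a non-zero ->ptrace */
	bool is_traced;		/* stands in for task_is_traced() */
	unsigned long jobctl;
	int exit_code;
};

/* Existing check: only a task already in TASK_TRACED reports a code. */
static int *stopped_code_by_state(struct model_task *p)
{
	if (p->is_traced && !(p->jobctl & MODEL_JOBCTL_LISTENING))
		return &p->exit_code;
	return NULL;
}

/* Check sketched above: any ptraced task that is not LISTENING reports one. */
static int *stopped_code_by_ptrace(struct model_task *p)
{
	if (p->ptrace && !(p->jobctl & MODEL_JOBCTL_LISTENING))
		return &p->exit_code;
	return NULL;
}
```

With ptrace set and is_traced still false, the first helper returns NULL
(the debugger would go back to sleep) while the second returns a pointer.
The model does nothing about the caveat that ->exit_code might hold a real
exit code at that point.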


> On the other hand, I don't understand why did you move the callsite
> of do_notify_parent_cldstop() up... just don't do this?

My goal and I still think it makes sense (if not my implementation)
is to move set_special_state as close as possible to schedule().

That way we can avoid sleeping spin_locks clobbering it and making
our life difficult.

My hope is we can just clean up ptrace_stop instead of making it more
complicated and harder to follow.  Not that I am fundamentally opposed
to the quiesce bit but the code is already very hard to follow because
of all its nuance and complexity, and I would really like to reduce
that complexity if we can possibly figure out how.

Eric



^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-27 15:53   ` Oleg Nesterov
@ 2022-04-27 21:57     ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 21:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/21, Peter Zijlstra wrote:
>>
>> @@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
>>  		goto out_put_task_struct;
>>  
>>  	ret = arch_ptrace(child, request, addr, data);
>> -	if (ret || request != PTRACE_DETACH)
>> -		ptrace_unfreeze_traced(child);
>> +	ptrace_unfreeze_traced(child);
>
> Forgot to mention... whatever we do this doesn't look right.
>
> ptrace_unfreeze_traced() must not be called if the tracee was untraced,
> another debugger can come after that. I agree, the current code looks
> a bit confusing, perhaps it makes sense to re-write it:
>
> 	if (request == PTRACE_DETACH && ret == 0)
> 		; /* nothing to do, no longer traced by us */
> 	else
> 		ptrace_unfreeze_traced(child);

This was a bug in my original JOBCTL_DELAY_WAKEKILL patch and it was
just cut and pasted here.  I thought it made sense when I was throwing
things together but when I looked more closely I realized that it is
not safe.
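
As a sanity check of the two spellings, here is a small userspace sketch.
REQ_DETACH, REQ_OTHER and the must_unfreeze_*() helpers are hypothetical
stand-ins, not kernel symbols.  It only shows that the rewritten if/else
selects the same cases as the original condition; the unconditional call
is the one that differs.

```c
#include <stdbool.h>

/* Hypothetical request codes; the values are arbitrary in this sketch. */
enum { REQ_OTHER = 1, REQ_DETACH = 17 };

/* Original condition: unfreeze unless a detach succeeded. */
static bool must_unfreeze_orig(long request, int ret)
{
	return ret || request != REQ_DETACH;
}

/* The suggested rewrite of the same condition. */
static bool must_unfreeze_rewritten(long request, int ret)
{
	if (request == REQ_DETACH && ret == 0)
		return false;	/* nothing to do, no longer traced by us */
	return true;
}
```

The only case where neither helper asks for an unfreeze is a successful
detach, which is the case where the tracee may already belong to another
debugger.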

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 15:41           ` Oleg Nesterov
@ 2022-04-27 22:35             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 22:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>>  static void ptrace_unfreeze_traced(struct task_struct *task)
>>  {
>> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
>> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
>>  		return;
>>
>>  	WARN_ON(!task->ptrace || task->parent != current);
>> @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
>>  	 * Recheck state under the lock to close this race.
>>  	 */
>>  	spin_lock_irq(&task->sighand->siglock);
>
> Now that we do not check __state = __TASK_TRACED, we need lock_task_sighand().
> The tracee can be already woken up by ptrace_resume(), but it is possible that
> it didn't clear DELAY_WAKEKILL yet.

Yes.  The subtle differences in when __TASK_TRACED and
JOBCTL_DELAY_WAKEKILL are cleared are causing me some minor issues.

This "WARN_ON(!task->ptrace || task->parent != current);" also now
needs to be inside siglock, because the __TASK_TRACED is insufficient.


> Now, before we take ->siglock, the tracee can exit and another thread can do
> wait() and reap this task.
>
> Also, I think the comment above should be updated. I agree, it makes sense to
> re-check JOBCTL_DELAY_WAKEKILL under siglock just for clarity, but we no longer
> need to do this to close the race; jobctl &= ~JOBCTL_DELAY_WAKEKILL and
> wake_up_state() are safe even if JOBCTL_DELAY_WAKEKILL was already
> cleared.

I think you are right about it being safe, but I am having a hard time
convincing myself that is true.  I want to be very careful sending
__TASK_TRACED wake_ups as ptrace_stop fundamentally can't handle
spurious wake_ups.

So I am thinking of adding task_is_traced to the test to verify the task
is still frozen.

static void ptrace_unfreeze_traced(struct task_struct *task)
{
	unsigned long flags;

	/*
	 * Verify the task is still frozen before unfreezing it,
	 * ptrace_resume could have unfrozen us.
	 */
	if (lock_task_sighand(task, &flags)) {
		if ((task->jobctl & JOBCTL_DELAY_WAKEKILL) &&
		    task_is_traced(task)) {
			task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
			if (__fatal_signal_pending(task))
				wake_up_state(task, __TASK_TRACED);
		}
		unlock_task_sighand(task, &flags);
	}
}

>> @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>>
>>  	/* LISTENING can be set only during STOP traps, clear it */
>>  	current->jobctl &= ~JOBCTL_LISTENING;
>> +	current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
>
> minor, but
>
> 	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL);
>
> looks better.

Yes.


Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-27 23:05           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-27 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 3c8b34876744..1947c85aa9d9 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
>  
>  static inline void signal_wake_up(struct task_struct *t, bool resume)
>  {
> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> +	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> +	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
>  }
>  static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
>  {

Grrr.  While looking through everything today I have realized that there
is a bug.

Suppose we have 3 processes: TRACER, TRACEE, KILLER.

Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
been dropped.

The TRACER process has performed ptrace_attach on TRACEE and is in the
middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.

Then comes in the KILLER process and sends the TRACEE a SIGKILL.
The TRACEE __state remains TASK_TRACED, as designed.

The bug appears when the TRACEE makes it to schedule().  Inside
schedule there is a call to signal_pending_state() which notices
a SIGKILL is pending and refuses to sleep.

I could avoid setting TIF_SIGPENDING in signal_wake_up but that
is insufficient as another signal may be pending.

I could avoid marking the task as __fatal_signal_pending but then
where would the information that the task needs to become
__fatal_signal_pending go?

Hmm.

This looks like I need my other pending cleanup which introduces a
helper to get this idea to work.
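
To make the failure concrete, here is a userspace model of the check,
written from memory of signal_pending_state().  The state bit values and
struct model_task are assumptions for this sketch, and the two booleans
stand in for TIF_SIGPENDING and for SIGKILL being in the pending set.

```c
#include <stdbool.h>

/* Model values; treat them as assumptions rather than the kernel's own. */
#define TASK_INTERRUPTIBLE	0x0001
#define __TASK_TRACED		0x0008
#define TASK_WAKEKILL		0x0100
#define TASK_TRACED		(TASK_WAKEKILL | __TASK_TRACED)

struct model_task {
	bool tif_sigpending;	/* stands in for TIF_SIGPENDING */
	bool sigkill_pending;	/* stands in for __fatal_signal_pending() */
};

/* Returns true when schedule() would refuse to put the task to sleep. */
static bool model_signal_pending_state(unsigned int state,
				       const struct model_task *p)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (!p->tif_sigpending)
		return false;
	return (state & TASK_INTERRUPTIBLE) || p->sigkill_pending;
}
```

With both flags set the model returns true for TASK_TRACED, so the tracee
does not sleep even though no wakeup was sent.  It also shows why leaving
TIF_SIGPENDING alone in signal_wake_up is not enough: if some other signal
has already set it, a pending SIGKILL gives the same result.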

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 1/5] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-26 23:34   ` Eric W. Biederman
@ 2022-04-28 10:00     ` Peter Zijlstra
  0 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, linux-kernel, tj, linux-pm

On Tue, Apr 26, 2022 at 06:34:09PM -0500, Eric W. Biederman wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > Currently ptrace_stop() / do_signal_stop() rely on the special states
> > TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> > state exists only in task->__state and nowhere else.
> >
> > There's two spots of bother with this:
> >
> >  - PREEMPT_RT has task->saved_state which complicates matters,
> >    meaning task_is_{traced,stopped}() needs to check an additional
> >    variable.
> >
> >  - An alternative freezer implementation that itself relies on a
> >    special TASK state would loose TASK_TRACED/TASK_STOPPED and will
> >    result in misbehaviour.
> >
> > As such, add additional state to task->jobctl to track this state
> > outside of task->__state.
> >
> > NOTE: this doesn't actually fix anything yet, just adds extra state.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -770,7 +773,9 @@ void signal_wake_up_state(struct task_st
> >  	 * By using wake_up_state, we ensure the process will wake up and
> >  	 * handle its death signal.
> >  	 */
> > -	if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
> > +	if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
> > +		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
> > +	else
> >  		kick_process(t);
> >  }
> 
> This hunk is subtle and I don't think it is actually what we want if the
> code is going to be robust against tsk->__state becoming TASK_FROZEN.

Oooh, indeed. Yes, let me go back to that resume based thing as you
suggest.

But first, let me go read all your patches :-)

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/9] ptrace: cleaning up ptrace_stop
  2022-04-26 22:50       ` Eric W. Biederman
@ 2022-04-28 10:07         ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook

On Tue, Apr 26, 2022 at 05:50:21PM -0500, Eric W. Biederman wrote:
> ....  Peter Zijlstra has
> been rewriting the classic freezer and in earlier parts of this
> discussion so I presume it is also a problem for PREEMPT_RT.

Ah, the freezer thing is in fact a sched/arm64 issue; what the two
issues have in common is ptrace, though.

Specifically, on recent arm64 chips only a subset of CPUs can execute
arm32 code and 32bit processes are restricted to that subset. If by some
mishap you try and execute a 32bit task on a non-capable CPU it gets
terminated without prejudice.

Now, the current freezer has this problem that tasks can spuriously thaw
too soon (where too soon is before SMP is restored) which leads to these
32bit tasks being killed dead.

That, and it was a good excuse to fix up the current freezer :-)

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 1/9] signal: Rename send_signal send_signal_locked
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-28 10:27           ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On Tue, Apr 26, 2022 at 05:52:03PM -0500, Eric W. Biederman wrote:
> Rename send_signal send_signal_locked and make to make
> it usable outside of signal.c.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  include/linux/signal.h |  2 ++
>  kernel/signal.c        | 24 ++++++++++++------------
>  2 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index a6db6f2ae113..55605bdf5ce9 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
>  extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
>  			       struct task_struct *p, enum pid_type type);
>  extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
> +extern int send_signal_locked(int sig, struct kernel_siginfo *info,
> +			      struct task_struct *p, enum pid_type type);
>  extern int sigprocmask(int, sigset_t *, sigset_t *);
>  extern void set_current_blocked(sigset_t *);
>  extern void __set_current_blocked(const sigset_t *);
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 30cd1ca43bcd..b0403197b0ad 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
>  	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
>  }
>  
> -static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
> -			enum pid_type type, bool force)
> +static int __send_signal_locked(int sig, struct kernel_siginfo *info,
> +				struct task_struct *t, enum pid_type type, bool force)
>  {
>  	struct sigpending *pending;
>  	struct sigqueue *q;

While there, could you please replace that assert_spin_locked() with
lockdep_assert_held(&t->sighand->siglock) ?

The distinction being that assert_spin_locked() checks if the lock is
held *by*anyone* whereas lockdep_assert_held() asserts the current
context holds the lock.  Also, the check goes away if you build without
lockdep.
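
To make the distinction concrete, here is a small userspace sketch. It is
only a toy model: struct toy_lock and the toy_* helpers are hypothetical
names invented for illustration, and none of this is the kernel's actual
spinlock or lockdep code.

```c
#include <stdbool.h>

/* Toy lock for illustration only; NOT the kernel's spinlock_t. */
struct toy_lock {
	bool locked;	/* is the lock held by anyone at all? */
	int owner;	/* id of the holding context, -1 when free */
};

static void toy_lock_acquire(struct toy_lock *l, int ctx)
{
	l->locked = true;
	l->owner = ctx;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->locked = false;
	l->owner = -1;
}

/* Models the assert_spin_locked() style check: true if *anyone* holds it. */
static bool toy_held_by_anyone(const struct toy_lock *l)
{
	return l->locked;
}

/* Models the lockdep_assert_held() style check: true only if ctx holds it. */
static bool toy_held_by(const struct toy_lock *l, int ctx)
{
	return l->locked && l->owner == ctx;
}
```

If context 1 takes the lock, toy_held_by_anyone() is also true when asked
on behalf of context 2, which is exactly the false comfort the weaker
check gives; toy_held_by(&l, 2) correctly reports that context 2 does not
hold it.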

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-26 22:52         ` Eric W. Biederman
@ 2022-04-28 10:38           ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On Tue, Apr 26, 2022 at 05:52:08PM -0500, Eric W. Biederman wrote:
> Now that siglock keeps tsk->parent and tsk->real_parent constant
> require that do_notify_parent_cldstop is called with tsk->siglock held
> instead of the tasklist_lock.
> 
> As all of the callers of do_notify_parent_cldstop had to drop the
> siglock and take tasklist_lock this simplifies all of it's callers.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/signal.c | 156 +++++++++++++++++-------------------------------
>  1 file changed, 55 insertions(+), 101 deletions(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 72d96614effc..584d67deb3cb 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2121,11 +2121,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>  				     bool for_ptracer, int why)
>  {
>  	struct kernel_siginfo info;
> -	unsigned long flags;
>  	struct task_struct *parent;
>  	struct sighand_struct *sighand;
> +	bool lock;
>  	u64 utime, stime;
>  
> +	assert_spin_locked(&tsk->sighand->siglock);

lockdep_assert_held() please...

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-27 15:14           ` Oleg Nesterov
@ 2022-04-28 10:42             ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 10:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On Wed, Apr 27, 2022 at 05:14:57PM +0200, Oleg Nesterov wrote:
> On 04/26, Eric W. Biederman wrote:
> >
> > Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> > was needed to detect the when ptrace_stop would decide not to stop
> > after calling "set_special_state(TASK_TRACED)".  With the recent
> > cleanups ptrace_stop will always stop after calling set_special_state.
> >
> > Take advatnage of this by no longer asking wait_task_inactive to
> > verify the state.  If a bug is hit and wait_task_inactive does not
> > succeed warn and return -ESRCH.
> 
> ACK, but I think that the changelog is wrong.
> 
> We could do this right after may_ptrace_stop() has gone. This doesn't
> depend on the previous changes in this series.

It very much does rely on there not being any blocking between
set_special_state() and schedule() tho. So all those PREEMPT_RT
spinlock->rt_mutex things need to be gone.

That is also the reason I couldn't do wait_task_inactive(task, 0) in the
other patch; I really had to match 'TASK_TRACED or TASK_FROZEN', and any
other state must fail (specifically, TASK_RTLOCK_WAIT must not match).

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 10:42             ` Peter Zijlstra
@ 2022-04-28 11:19               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 11:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Peter Zijlstra wrote:
>
> On Wed, Apr 27, 2022 at 05:14:57PM +0200, Oleg Nesterov wrote:
> > On 04/26, Eric W. Biederman wrote:
> > >
> > > Asking wait_task_inactive to verify that tsk->__state == __TASK_TRACED
> > > was needed to detect the when ptrace_stop would decide not to stop
> > > after calling "set_special_state(TASK_TRACED)".  With the recent
> > > cleanups ptrace_stop will always stop after calling set_special_state.
> > >
> > > Take advatnage of this by no longer asking wait_task_inactive to
> > > verify the state.  If a bug is hit and wait_task_inactive does not
> > > succeed warn and return -ESRCH.
> >
> > ACK, but I think that the changelog is wrong.
> >
> > We could do this right after may_ptrace_stop() has gone. This doesn't
> > depend on the previous changes in this series.
>
> It very much does rely on there not being any blocking between
> set_special_state() and schedule() tho. So all those PREEMPT_RT
> spinlock->rt_mutex things need to be gone.

Yes sure. But this patch doesn't add the new problems, imo.

Yes we can hit the WARN_ON_ONCE(!wait_task_inactive()), but this is
correct in that it should not fail, and this is what we need to fix.

> That is also the reason I couldn't do wait_task_inactive(task, 0)

Ah, I didn't notice this patch uses wait_task_inactive(child, 0),
I think it should do wait_task_inactive(child, __TASK_TRACED).

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 11:19               ` Oleg Nesterov
@ 2022-04-28 13:54                 ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 13:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On Thu, Apr 28, 2022 at 01:19:11PM +0200, Oleg Nesterov wrote:
> > That is also the reason I couldn't do wait_task_inactive(task, 0)
> 
> Ah, I din't notice this patch uses wait_task_inactive(child, 0),
> I think it should do wait_task_inactive(child, __TASK_TRACED).

Shouldn't we then switch wait_task_inactive() to use & matching instead
of the current == ?
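
A rough userspace sketch of the two matching schemes, for comparison. The
TOY_* bit values and the toy_match_* helpers are made up for illustration
and are not taken from the kernel headers; the only assumption carried
over is that a match_state of 0 means "accept any state" under equality
matching.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values may differ. */
#define TOY_TRACED_BIT	0x0008u
#define TOY_WAKEKILL	0x0100u
#define TOY_TRACED	(TOY_WAKEKILL | TOY_TRACED_BIT)
#define TOY_RTLOCK_WAIT	0x1000u
#define TOY_FROZEN	0x8000u

/* Equality matching: match_state == 0 acts as a wildcard. */
static bool toy_match_eq(unsigned int state, unsigned int match_state)
{
	return !match_state || state == match_state;
}

/* Mask matching, in the style of a wake-up: any shared bit is a match. */
static bool toy_match_mask(unsigned int state, unsigned int mask)
{
	return (state & mask) != 0;
}
```

With equality there is no way to say "traced or frozen" in one call, and
the wildcard 0 would also accept TOY_RTLOCK_WAIT. With a mask of
(TOY_TRACED_BIT | TOY_FROZEN) both wanted states match and
TOY_RTLOCK_WAIT does not.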

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 13:54                 ` Peter Zijlstra
@ 2022-04-28 14:57                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 01:19:11PM +0200, Oleg Nesterov wrote:
> > > That is also the reason I couldn't do wait_task_inactive(task, 0)
> >
> > Ah, I din't notice this patch uses wait_task_inactive(child, 0),
> > I think it should do wait_task_inactive(child, __TASK_TRACED).
>
> Shouldn't we then switch wait_task_inactive() so have & matching instead
> of the current ==.

Sorry, I don't understand the context...

As long as ptrace_freeze_traced() sets __state == __TASK_TRACED (as it
currently does) wait_task_inactive(__TASK_TRACED) is what we need ?

After we change it to use JOBCTL_DELAY_WAKEKILL and not abuse __state,
ptrace_attach() should use wait_task_inactive(TASK_TRACED), but this
depends on what exactly we are going to do...

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-27 23:05           ` Eric W. Biederman
@ 2022-04-28 15:11             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 15:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/27, Eric W. Biederman wrote:
>
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> > index 3c8b34876744..1947c85aa9d9 100644
> > --- a/include/linux/sched/signal.h
> > +++ b/include/linux/sched/signal.h
> > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
> >
> >  static inline void signal_wake_up(struct task_struct *t, bool resume)
> >  {
> > -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> > +	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> > +	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
> >  }
> >  static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
> >  {
>
> Grrr.  While looking through everything today I have realized that there
> is a bug.
>
> Suppose we have 3 processes: TRACER, TRACEE, KILLER.
>
> Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
> been dropped.
>
> The TRACER process has performed ptrace_attach on TRACEE and is in the
> middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.
>
> Then comes in the KILLER process and sends the TRACEE a SIGKILL.
> The TRACEE __state remains TASK_TRACED, as designed.
>
> The bug appears when the TRACEE makes it to schedule().  Inside
> schedule there is a call to signal_pending_state() which notices
> a SIGKILL is pending and refuses to sleep.

And I think this is fine. This doesn't really differ from the case
when the tracee was killed before it takes siglock.

The only problem (afaics) is that, once we introduce JOBCTL_TRACED,
ptrace_stop() can leak this flag. That is why I suggested to clear
it along with LISTENING/DELAY_WAKEKILL before return, exactly because
schedule() won't block if fatal_signal_pending() is true.
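
For reference, the check in question can be modelled in userspace roughly
as follows. This is a simplified sketch: struct toy_task, its fields and
the TOY_* values are hypothetical stand-ins (sigpending for
TIF_SIGPENDING, fatal_pending for __fatal_signal_pending()), not the
kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values may differ. */
#define TOY_INTERRUPTIBLE	0x0001u
#define TOY_TRACED_BIT		0x0008u
#define TOY_WAKEKILL		0x0100u
#define TOY_TRACED		(TOY_WAKEKILL | TOY_TRACED_BIT)

struct toy_task {
	unsigned int state;	/* sleep state the task is about to enter */
	bool sigpending;	/* stands in for TIF_SIGPENDING */
	bool fatal_pending;	/* stands in for __fatal_signal_pending() */
};

/* Returns true when the task must not go to sleep. */
static bool toy_signal_pending_state(const struct toy_task *t)
{
	/* States that are neither interruptible nor killable always sleep. */
	if (!(t->state & (TOY_INTERRUPTIBLE | TOY_WAKEKILL)))
		return false;
	if (!t->sigpending)
		return false;
	/* Interruptible: any signal wins. Killable only: fatal ones do. */
	return (t->state & TOY_INTERRUPTIBLE) || t->fatal_pending;
}
```

In this model a task in TOY_TRACED still sleeps with an ordinary signal
pending, but refuses to sleep as soon as a fatal one is pending, whatever
JOBCTL_DELAY_WAKEKILL says, because the check only looks at the state
bits and the pending signals.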

But maybe I misunderstood your concern?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 14:57                   ` Oleg Nesterov
@ 2022-04-28 16:09                     ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 16:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On Thu, Apr 28, 2022 at 04:57:50PM +0200, Oleg Nesterov wrote:

> > Shouldn't we then switch wait_task_inactive() to have & matching instead
> > of the current ==?
> 
> Sorry, I don't understand the context...

This.. I've always found it strange to have wti use a different matching
scheme from ttwu.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f259621f4c93..c039aef4c8fe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3304,7 +3304,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && unlikely(!(READ_ONCE(p->__state) & match_state)))
 				return 0;
 			cpu_relax();
 		}
@@ -3319,7 +3319,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
 		running = task_running(rq, p);
 		queued = task_on_rq_queued(p);
 		ncsw = 0;
-		if (!match_state || READ_ONCE(p->__state) == match_state)
+		if (!match_state || (READ_ONCE(p->__state) & match_state))
 			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
 		task_rq_unlock(rq, p, &rf);
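
The difference between the two matching schemes can be illustrated with a small user-space sketch. The helper names match_exact() and match_mask() are hypothetical, and the bit values are assumed for illustration only.

```c
#include <stdbool.h>

/* Illustrative state bits; the numeric values are assumptions. */
#define __TASK_TRACED	0x0008u
#define TASK_WAKEKILL	0x0100u
#define TASK_TRACED	(TASK_WAKEKILL | __TASK_TRACED)

/* Current scheme: the task state must equal match_state exactly. */
static bool match_exact(unsigned int state, unsigned int match_state)
{
	return !match_state || state == match_state;
}

/* Proposed scheme: any shared bit matches, as in try_to_wake_up(). */
static bool match_mask(unsigned int state, unsigned int match_state)
{
	return !match_state || (state & match_state) != 0;
}
```

With mask matching, a caller passing __TASK_TRACED would accept a task whose state is either __TASK_TRACED or TASK_TRACED; with exact matching only the former is accepted.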
 

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH 7/9] ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
  2022-04-28 16:09                     ` Peter Zijlstra
@ 2022-04-28 16:19                       ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 16:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 04:57:50PM +0200, Oleg Nesterov wrote:
>
> > > > Shouldn't we then switch wait_task_inactive() to have & matching instead
> > > > of the current ==?
> >
> > Sorry, I don't understand the context...
>
> This.. I've always found it strange to have wti use a different matching
> scheme from ttwu.

Ah. This is what I understood (and I thought about this too); I just meant that
this patch from Eric (assuming wait_task_inactive() still uses __TASK_TRACED) is
fine without your change below.

Oleg.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f259621f4c93..c039aef4c8fe 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3304,7 +3304,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
>  		 * is actually now running somewhere else!
>  		 */
>  		while (task_running(rq, p)) {
> -			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> +			if (match_state && unlikely(!(READ_ONCE(p->__state) & match_state)))
>  				return 0;
>  			cpu_relax();
>  		}
> @@ -3319,7 +3319,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
>  		running = task_running(rq, p);
>  		queued = task_on_rq_queued(p);
>  		ncsw = 0;
> -		if (!match_state || READ_ONCE(p->__state) == match_state)
> +		if (!match_state || (READ_ONCE(p->__state) & match_state))
>  			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
>  		task_rq_unlock(rq, p, &rf);


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-28 15:11             ` Oleg Nesterov
@ 2022-04-28 16:50               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-28 16:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/27, Eric W. Biederman wrote:
>>
>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>
>> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> > index 3c8b34876744..1947c85aa9d9 100644
>> > --- a/include/linux/sched/signal.h
>> > +++ b/include/linux/sched/signal.h
>> > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
>> >
>> >  static inline void signal_wake_up(struct task_struct *t, bool resume)
>> >  {
>> > -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> > +	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
>> > +	signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
>> >  }
>> >  static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
>> >  {
>>
>> Grrr.  While looking through everything today I have realized that there
>> is a bug.
>>
>> Suppose we have 3 processes: TRACER, TRACEE, KILLER.
>>
>> Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
>> been dropped.
>>
>> The TRACER process has performed ptrace_attach on TRACEE and is in the
>> middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.
>>
>> Then comes in the KILLER process and sends the TRACEE a SIGKILL.
>> The TRACEE __state remains TASK_TRACED, as designed.
>>
>> The bug appears when the TRACEE makes it to schedule().  Inside
>> schedule there is a call to signal_pending_state() which notices
>> a SIGKILL is pending and refuses to sleep.
>
> And I think this is fine. This doesn't really differ from the case
> when the tracee was killed before it takes siglock.

Hmm.  Maybe.

> The only problem (afaics) is that, once we introduce JOBCTL_TRACED,
> ptrace_stop() can leak this flag. That is why I suggested to clear
> it along with LISTENING/DELAY_WAKEKILL before return, exactly because
> schedule() won't block if fatal_signal_pending() is true.
>
> But maybe I misunderstood your concern?

Prior to JOBCTL_DELAY_WAKEKILL, once __state was set to __TASK_TRACED
we were guaranteed that schedule() would stop even if a SIGKILL was
received after that point, as well as being immune from wake-ups
from SIGKILL.

I guess we are immune from wake-ups with JOBCTL_DELAY_WAKEKILL as I have
implemented it.
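
A minimal model of that wake-up decision, following the signal_wake_up() change quoted above. The function name model_wake_mask() is hypothetical, and the value of TASK_WAKEKILL and the bit position of JOBCTL_DELAY_WAKEKILL are assumptions made for the sketch.

```c
#include <stdbool.h>

#define TASK_WAKEKILL		0x0100u		/* illustrative value */
#define JOBCTL_DELAY_WAKEKILL	(1UL << 24)	/* bit position is an assumption */

/* Returns the state mask that signal_wake_up() would pass on. */
static unsigned int model_wake_mask(unsigned long jobctl, bool resume)
{
	/* A fatal-signal wake-up is suppressed while the delay flag is set. */
	bool wakekill = resume && !(jobctl & JOBCTL_DELAY_WAKEKILL);

	return wakekill ? TASK_WAKEKILL : 0;
}
```

While the flag is set, a SIGKILL still becomes pending, but the sender does not ask for a TASK_WAKEKILL wake-up of the sleeping tracee.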

The practical concern then seems to be that we are not guaranteed
wait_task_inactive will succeed, which means that it must continue
to include the TASK_TRACED bit.

Previously we were actually guaranteed in ptrace_check_attach that
wait_task_inactive would succeed after ptrace_freeze_traced, as any
pending fatal signal would cause ptrace_freeze_traced to fail.  Any
incoming fatal signal would not stop schedule from sleeping.  The ptraced
task would continue to be ptraced, as all other ptrace operations are
blocked by virtue of ptrace being single threaded.

I think in my tired mind yesterday I thought it would be messing things
up after schedule decided to sleep.  Still I would like to be able to
let wait_task_inactive not care about the state of the process it is
going to sleep for.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-27 14:47               ` Eric W. Biederman
@ 2022-04-28 17:44                 ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 17:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:

> Hmm.  If we have the following process tree.
> 
>     A
>      \
>       B
>        \
>         C
> 
> Process A, B, and C are all in the same process group.
> Process A and B are set up to receive SIGCHLD when
> their process stops.
> 
> Process C traces process A.
> 
> When a sigstop is delivered to the group we can have:
> 
> Process B takes siglock(B) siglock(A) to notify the real_parent
> Process C takes siglock(C) siglock(B) to notify the real_parent
> Process A takes siglock(A) siglock(C) to notify the tracer
> 
> If they all take their local lock at the same time there is
> a deadlock.
> 
> I don't think the restriction that you can never ptrace anyone
> up the process tree is going to fly.  So it looks like I am back to the
> drawing board for this one.

I've not had time to fully appreciate the nested locking here, but if it
is possible to rework things to always take both locks at the same time,
then it would be possible to impose an arbitrary lock order on things
and break the cycle that way.

That is, simply order the locks by their heap address or something:

static void double_siglock_irq(struct sighand_struct *sh1, struct sighand_struct *sh2)
{
	/* always take the lower-addressed lock first */
	if (sh1 > sh2)
		swap(sh1, sh2);

	spin_lock_irq(&sh1->siglock);
	spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
}
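
The address-ordering idea can be sketched in user space as follows. The types model_sighand and lock_order and the function order_siglocks() are hypothetical stand-ins, and the handling of two tasks sharing one sighand is an assumption added for the sketch rather than something the snippet above covers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sighand_struct. */
struct model_sighand {
	int siglock;	/* placeholder for the real spinlock */
};

struct lock_order {
	struct model_sighand *first;	/* take this lock first */
	struct model_sighand *second;	/* then this one; NULL if same as first */
};

/* Decide the acquisition order for two locks purely from their addresses. */
static struct lock_order order_siglocks(struct model_sighand *a,
					struct model_sighand *b)
{
	struct lock_order o = { a, b };

	if (a == b) {
		/* Same lock twice: taking it a second time would self-deadlock. */
		o.second = NULL;
	} else if ((uintptr_t)a > (uintptr_t)b) {
		o.first = b;
		o.second = a;
	}
	return o;
}
```

Because every caller derives the same order for the same pair, no two callers can each hold one lock of the pair while waiting for the other.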


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-28 17:44                 ` Peter Zijlstra
@ 2022-04-28 18:22                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 18:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Peter Zijlstra wrote:
>
> I've not had time to fully appreciate the nested locking here, but if it
> is possible to rework things to always take both locks at the same time,
> then it would be possible to impose an arbitrary lock order on things
> and break the cycle that way.

This is clear, but this is not that simple.

For example (with this series at least), ptrace_stop() already holds
current->sighand->siglock which (in particular) we need to protect
current->parent, but then we need current->parent->sighand->siglock
in do_notify_parent_cldstop().

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-28 17:44                 ` Peter Zijlstra
@ 2022-04-28 18:37                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-28 18:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, inux-xtensa, Kees Cook,
	Jann Horn

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:
>
>> Hmm.  If we have the following process tree.
>> 
>>     A
>>      \
>>       B
>>        \
>>         C
>> 
>> Process A, B, and C are all in the same process group.
>> Process A and B are set up to receive SIGCHLD when
>> their process stops.
>> 
>> Process C traces process A.
>> 
>> When a sigstop is delivered to the group we can have:
>> 
>> Process B takes siglock(B) siglock(A) to notify the real_parent
>> Process C takes siglock(C) siglock(B) to notify the real_parent
>> Process A takes siglock(A) siglock(C) to notify the tracer
>> 
>> If they all take their local lock at the same time there is
>> a deadlock.
>> 
>> I don't think the restriction that you can never ptrace anyone
>> up the process tree is going to fly.  So it looks like I am back to the
>> drawing board for this one.
>
> I've not had time to fully appreciate the nested locking here, but if it
> is possible to rework things to always take both locks at the same time,
> then it would be possible to impose an arbitrary lock order on things
> and break the cycle that way.
>
> That is, simply order the locks by their heap address or something:
>
> static void double_siglock_irq(struct sighand_struct *sh1, struct sighand_struct *sh2)
> {
> 	if (sh1 > sh2)
> 		swap(sh1, sh2);
>
> 	spin_lock_irq(&sh1->siglock);
> 	spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
> }

You know it might be.  Especially given that the existing code is
already dropping siglock and grabbing tasklist_lock.

It would potentially take a triple lock function to lock
the task, its real_parent, and its tracer (aka parent).

What makes this possible to consider is that notifying the ``parents''
is a fundamental part of the operation, so we know we are going to
need the lock and can move it up.

Throw in a pinch of lock_task_sighand and the triple lock function
gets quite interesting.

It is certainly worth trying, and I will.
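
One way the ordering step of such a triple lock function could look, as a user-space sketch only. The names model_sighand and order_three() are hypothetical, and the sketch ignores the lock_task_sighand() revalidation mentioned above; it only shows sorting by address and dropping duplicates (for example when real_parent and tracer are the same process).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sighand_struct. */
struct model_sighand {
	int siglock;	/* placeholder for the real spinlock */
};

/*
 * Sort up to three sighand pointers by address, skipping NULLs and
 * duplicates. Returns the number of distinct locks; out[0..n-1] holds
 * them in the order they should be taken.
 */
static size_t order_three(struct model_sighand *task,
			  struct model_sighand *real_parent,
			  struct model_sighand *tracer,
			  struct model_sighand *out[3])
{
	struct model_sighand *in[3] = { task, real_parent, tracer };
	size_t n = 0;

	for (size_t i = 0; i < 3; i++) {
		size_t j, k;

		if (!in[i])
			continue;
		/* Find the insertion point in the sorted prefix. */
		for (j = 0; j < n; j++) {
			if ((uintptr_t)out[j] >= (uintptr_t)in[i])
				break;
		}
		if (j < n && out[j] == in[i])
			continue;	/* already present */
		/* Shift the tail up by one and insert. */
		for (k = n; k > j; k--)
			out[k] = out[k - 1];
		out[j] = in[i];
		n++;
	}
	return n;
}
```

A caller would then take out[0], out[1], ... in that order, so that every path locking any subset of the three agrees on a single global order.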

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 9/9] ptrace: Don't change __state
  2022-04-28 16:50               ` Eric W. Biederman
@ 2022-04-28 18:53                 ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 18:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On 04/28, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> >> The bug appears when the TRACEE makes it to schedule().  Inside
> >> schedule there is a call to signal_pending_state() which notices
> >> a SIGKILL is pending and refuses to sleep.
> >
> > And I think this is fine. This doesn't really differ from the case
> > when the tracee was killed before it takes siglock.
>
> Hmm.  Maybe.

I hope ;)

> Previously we were actually guaranteed in ptrace_check_attach that
> wait_task_inactive would succeed after ptrace_freeze_traced, as any
> pending fatal signal would cause ptrace_freeze_traced to fail.  Any
> incoming fatal signal would not stop schedule from sleeping.

Yes.

So let me repeat, 7/9 "ptrace: Simplify the wait_task_inactive call in
ptrace_check_attach" looks good to me (except it should use
wait_task_inactive(__TASK_TRACED)), but it should come before other
meaningful changes and the changelog should be updated.

And then we will probably need to reconsider this wait_task_inactive()
and the WARN_ON() around it, but that depends on what we finally do.

> I think in my tired mind yesterday

I got lost too ;)

> Still I would like to be able to
> let wait_task_inactive not care about the state of the process it is
> going to sleep for.

Not sure... but to be honest I didn't really pay attention to the
wait_task_inactive(match_state => 0) part...

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-27  0:24     ` Eric W. Biederman
@ 2022-04-28 20:29       ` Peter Zijlstra
  2022-04-28 20:59         ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 20:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On Tue, Apr 26, 2022 at 07:24:03PM -0500, Eric W. Biederman wrote:
> But doing:
> 
> 	/* Don't stop if the task is dying */
> 	if (unlikely(__fatal_signal_pending(current)))
> 		return exit_code;
> 
> Should work.

Something like so then...

---
Subject: signal,ptrace: Don't stop dying tasks
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Apr 28 22:17:56 CEST 2022

Oleg pointed out that the tracee can already be killed such that
fatal_signal_pending() is true. In that case signal_wake_up_state()
cannot be relied upon to be responsible for the wakeup -- something
we're going to want to rely on.

As such, explicitly handle this case.

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/signal.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2226,6 +2226,10 @@ static int ptrace_stop(int exit_code, in
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
+	/* Don't stop if the task is dying. */
+	if (unlikely(__fatal_signal_pending(current)))
+		return exit_code;
+
 	/*
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-28 18:37                   ` Eric W. Biederman
@ 2022-04-28 20:49                     ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-28 20:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:
>>
>>> Hmm.  If we have the following process tree.
>>> 
>>>     A
>>>      \
>>>       B
>>>        \
>>>         C
>>> 
>>> Process A, B, and C are all in the same process group.
>>> Process A and B are set up to receive SIGCHLD when
>>> their process stops.
>>> 
>>> Process C traces process A.
>>> 
>>> When a sigstop is delivered to the group we can have:
>>> 
>>> Process B takes siglock(B) siglock(A) to notify the real_parent
>>> Process C takes siglock(C) siglock(B) to notify the real_parent
>>> Process A takes siglock(A) siglock(C) to notify the tracer
>>> 
>>> If they all take their local lock at the same time there is
>>> a deadlock.
>>> 
>>> I don't think the restriction that you can never ptrace anyone
>>> up the process tree is going to fly.  So it looks like I am back to the
>>> drawing board for this one.
>>
>> I've not had time to fully appreciate the nested locking here, but if it
>> is possible to rework things to always take both locks at the same time,
>> then it would be possible to impose an arbitrary lock order on things
>> and break the cycle that way.
>>
>> That is, simply order the locks by their heap address or something:
>>
>> static void double_siglock_irq(struct sighand_struct *sh1,
>> 			       struct sighand_struct *sh2)
>> {
>> 	if (sh1 > sh2)
>> 		swap(sh1, sh2);
>>
>> 	spin_lock_irq(&sh1->siglock);
>> 	spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
>> }
>
> You know it might be.  Especially given that the existing code is
> already dropping siglock and grabbing tasklist_lock.
>
> It would take a potentially triple lock function to lock
> the task, its real_parent, and its tracer (aka parent).
>
> What makes this possible to consider is that notifying the ``parents''
> is a fundamental part of the operation so we know we are going to
> need the lock so we can move it up.
>
> Throw in a pinch of lock_task_sighand and the triple lock function
> gets quite interesting.
>
> It is certainly worth trying, and I will.

To my surprise it doesn't look too bad.  The locking simplifications and
not using a lock as big as tasklist_lock probably make it even worth
doing.
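
A minimal user-space sketch of the address-ordering idea, assuming a toy
spinlock built on C11 atomic_flag in place of siglock.  The names
model_sighand, model_lock, model_trylock, double_lock and double_unlock
are hypothetical and exist only for this illustration; none of this is
kernel API, and it leaves out the irq disabling and the lockdep subclass
annotation that a kernel version would need.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sighand_struct: nothing but a spinlock. */
struct model_sighand {
	atomic_flag siglock;
};

static void model_lock(struct model_sighand *sh)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&sh->siglock,
						 memory_order_acquire))
		;
}

static void model_unlock(struct model_sighand *sh)
{
	atomic_flag_clear_explicit(&sh->siglock, memory_order_release);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct model_sighand *sh)
{
	return !atomic_flag_test_and_set_explicit(&sh->siglock,
						  memory_order_acquire);
}

/*
 * Take both locks, lower address first, whatever the argument order.
 * Returns the lock that was taken first so the order can be observed.
 */
static struct model_sighand *double_lock(struct model_sighand *a,
					 struct model_sighand *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct model_sighand *tmp = a;

		a = b;
		b = tmp;
	}
	model_lock(a);
	if (a != b)		/* same object: take it only once */
		model_lock(b);
	return a;
}

static void double_unlock(struct model_sighand *a, struct model_sighand *b)
{
	model_unlock(a);
	if (a != b)
		model_unlock(b);
}
```

Since every caller takes the lower address first, two threads that pass
the same pair in opposite argument order cannot end up waiting on each
other in a cycle.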

I need to sleep on it and look at everything again.  In the
meantime here is my function that comes in with siglock held,
possibly drops it, and grabs the other two locks all in
order.

static void lock_parents_siglocks(bool lock_tracer)
	__releases(&current->sighand->siglock)
	__acquires(&current->sighand->siglock)
	__acquires(&current->real_parent->sighand->siglock)
	__acquires(&current->parent->sighand->siglock)
{
	struct task_struct *me = current;
	struct sighand_struct *m_sighand = me->sighand;

	lockdep_assert_held(&m_sighand->siglock);

	rcu_read_lock();
	for (;;) {
		struct task_struct *parent, *tracer;
		struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;

		parent = me->real_parent;
		tracer = lock_tracer? me->parent : parent;

		p_sighand = rcu_dereference(parent->sighand);
		t_sighand = rcu_dereference(tracer->sighand);

		/* Sort the sighands so that s1 <= s2 <= s3 */
		s1 = m_sighand;
		s2 = p_sighand;
		s3 = t_sighand;
		if (s1 > s2)
			swap(s1, s2);
		if (s1 > s3)
			swap(s1, s3);
		if (s2 > s3)
			swap(s2, s3);

		if (s1 != m_sighand) {
			spin_unlock(&m_sighand->siglock);
			spin_lock(&s1->siglock);
		}

		if (s1 != s2)
			spin_lock_nested(&s2->siglock, SIGLOCK_LOCK_SECOND);
		if (s2 != s3)
			spin_lock_nested(&s3->siglock, SIGLOCK_LOCK_THIRD);

		if (likely((me->real_parent == parent) &&
			   (me->parent == tracer) &&
			   (parent->sighand == p_sighand) &&
			   (tracer->sighand == t_sighand))) {
			break;
		}
		spin_unlock(&p_sighand->siglock);
                if (t_sighand != p_sighand)
			spin_unlock(&t_sighand->siglock);
		continue;
	}
	rcu_read_unlock();
}
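
The three compare-and-swap steps in the function above form a small
sorting network.  A stand-alone sketch, assuming plain pointers instead
of sighand_struct (sort3_by_address and distinct_in_sorted3 are
hypothetical helper names used only here), suggests that the result is
ascending by address and that duplicates, for example when the tracer
is the real parent, end up adjacent, so that the `s1 != s2` and
`s2 != s3` checks skip them:

```c
#include <stdint.h>

/* Swap two pointer slots. */
static void swap_ptr(const void **a, const void **b)
{
	const void *tmp = *a;

	*a = *b;
	*b = tmp;
}

/*
 * Same three comparisons as in the draft function.  Afterwards
 * p[0] <= p[1] <= p[2] when compared as addresses.
 */
static void sort3_by_address(const void *p[3])
{
	if ((uintptr_t)p[0] > (uintptr_t)p[1])
		swap_ptr(&p[0], &p[1]);
	if ((uintptr_t)p[0] > (uintptr_t)p[2])
		swap_ptr(&p[0], &p[2]);
	if ((uintptr_t)p[1] > (uintptr_t)p[2])
		swap_ptr(&p[1], &p[2]);
}

/*
 * For an already sorted triple, count how many distinct objects it
 * names, i.e. how many locks would actually have to be taken.
 */
static int distinct_in_sorted3(const void *p[3])
{
	return 1 + (p[0] != p[1]) + (p[1] != p[2]);
}
```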

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-28 20:29       ` Peter Zijlstra
@ 2022-04-28 20:59         ` Oleg Nesterov
  2022-04-28 22:21           ` Peter Zijlstra
  0 siblings, 1 reply; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 20:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On 04/28, Peter Zijlstra wrote:
>
> Oleg pointed out that the tracee can already be killed such that
> fatal_signal_pending() is true. In that case signal_wake_up_state()
> cannot be relied upon to be responsible for the wakeup -- something
> we're going to want to rely on.

Peter, I am all confused...

If this patch is against the current tree, we don't need it.

If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
then it can't help - SIGKILL can come right after the tracee drops siglock
and calls schedule().

Perhaps I missed something, but let me repeat the 3rd time: I'd suggest
to simply clear JOBCTL_TRACED along with LISTENING/DELAY_WAKEKILL before
return to close this race.

Oleg.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2226,6 +2226,10 @@ static int ptrace_stop(int exit_code, in
>  		spin_lock_irq(&current->sighand->siglock);
>  	}
>  
> +	/* Don't stop if the task is dying. */
> +	if (unlikely(__fatal_signal_pending(current)))
> +		return exit_code;
> +
>  	/*
>  	 * schedule() will not sleep if there is a pending signal that
>  	 * can awaken the task.
> 


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held
  2022-04-28 20:49                     ` Eric W. Biederman
@ 2022-04-28 22:19                       ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 22:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn

On Thu, Apr 28, 2022 at 03:49:11PM -0500, Eric W. Biederman wrote:

> static void lock_parents_siglocks(bool lock_tracer)
> 	__releases(&current->sighand->siglock)
> 	__acquires(&current->sighand->siglock)
> 	__acquires(&current->real_parent->sighand->siglock)
> 	__acquires(&current->parent->sighand->siglock)
> {
> 	struct task_struct *me = current;
> 	struct sighand_struct *m_sighand = me->sighand;
> 
> 	lockdep_assert_held(&m_sighand->siglock);
> 
> 	rcu_read_lock();
> 	for (;;) {
> 		struct task_struct *parent, *tracer;
> 		struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
> 
> 		parent = me->real_parent;
> 		tracer = lock_tracer? me->parent : parent;
> 
> 		p_sighand = rcu_dereference(parent->sighand);
> 		t_sighand = rcu_dereference(tracer->sighand);
> 
> 		/* Sort the sighands so that s1 <= s2 <= s3 */
> 		s1 = m_sighand;
> 		s2 = p_sighand;
> 		s3 = t_sighand;
> 		if (s1 > s2)
> 			swap(s1, s2);
> 		if (s1 > s3)
> 			swap(s1, s3);
> 		if (s2 > s3)
> 			swap(s2, s3);
> 
> 		if (s1 != m_sighand) {
> 			spin_unlock(&m_sighand->siglock);
> 			spin_lock(&s1->siglock);
> 		}
> 
> 		if (s1 != s2)
> 			spin_lock_nested(&s2->siglock, SIGLOCK_LOCK_SECOND);
> 		if (s2 != s3)
> 			spin_lock_nested(&s3->siglock, SIGLOCK_LOCK_THIRD);
> 

Might as well just use 1 and 2 for subclass at this point, or use
SIGLOCK_LOCK_FIRST below.

> 		if (likely((me->real_parent == parent) &&
> 			   (me->parent == tracer) &&
> 			   (parent->sighand == p_sighand) &&
> 			   (tracer->sighand == t_sighand))) {
> 			break;
> 		}
> 		spin_unlock(&p_sighand->siglock);
>                 if (t_sighand != p_sighand)
> 			spin_unlock(&t_sighand->siglock);

Indent fail above ^, also you likely need this:

		/*
		 * Since [pt]_sighand will likely change if we go
		 * around, and m_sighand is the only one held, make sure
		 * it is subclass-0, since the above 's1 != m_sighand'
		 * clause very much relies on that.
		 */
		lock_set_subclass(&m_sighand->siglock, 0, _RET_IP_);

> 		continue;
> 	}
> 	rcu_read_unlock();
> }
> 
> Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-28 20:59         ` Oleg Nesterov
@ 2022-04-28 22:21           ` Peter Zijlstra
  2022-04-28 22:50             ` Oleg Nesterov
  0 siblings, 1 reply; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-28 22:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Eric W. Biederman, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

On Thu, Apr 28, 2022 at 10:59:57PM +0200, Oleg Nesterov wrote:
> On 04/28, Peter Zijlstra wrote:
> >
> > Oleg pointed out that the tracee can already be killed such that
> > fatal_signal_pending() is true. In that case signal_wake_up_state()
> > cannot be relied upon to be responsible for the wakeup -- something
> > we're going to want to rely on.
> 
> Peter, I am all confused...
> 
> If this patch is against the current tree, we don't need it.
> 
> If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
> then it can't help - SIGKILL can come right after the tracee drops siglock
> and calls schedule().

But by that time it will already have set TRACED and signal_wake_up()
will clear it, no?

> Perhaps I missed something, but let me repeat the 3rd time: I'd suggest
> to simply clear JOBCTL_TRACED along with LISTENING/DELAY_WAKEKILL before
> return to close this race.

I think Eric convinced me there was a problem with that, but I'll go
over it all again in the morning, perhaps I'll reach a different
conclusion :-)

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT
  2022-04-28 22:21           ` Peter Zijlstra
@ 2022-04-28 22:50             ` Oleg Nesterov
  0 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-04-28 22:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, linux-kernel, tj,
	linux-pm

Peter, you know, it is very difficult for me to discuss the changes
in the 2 unfinished series and not lose the context ;) Plus I am
already sleeping. But I'll try to reply anyway.

On 04/29, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 10:59:57PM +0200, Oleg Nesterov wrote:
> > If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
> > then it can't help - SIGKILL can come right after the tracee drops siglock
> > and calls schedule().
>
> But by that time it will already have set TRACED and signal_wake_up()
> will clear it, no?

No. JOBCTL_DELAY_WAKEKILL is already set, this means that signal_wake_up()
will remove TASK_WAKEKILL from the "state" passed to signal_wake_up_state()
and this is fine and correct, this means that ttwu() won't change ->__state.

But this also means that wake_up_state() will return false, and in this case

	signal_wake_up_state:

		if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
			t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE);

won't clear these flags. And this is nice too.

But. fatal_signal_pending() is true! And once we change freeze_traced()
to not abuse p->__state, schedule() won't block because it will check
signal_pending_state(TASK_TRACED == TASK_WAKEKILL | __TASK_TRACED) and
__fatal_signal_pending() == T.

In this case ptrace_stop() will leak JOBCTL_TRACED, so we simply need
to clear it before return along with LISTENING | DELAY_WAKEKILL.
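
The sequence described above can be walked through with a small
user-space model.  This is only a sketch of that description: the M_*
bit values, struct model_task and every model_* function are
hypothetical stand-ins rather than kernel definitions, and only the
path where schedule() does not sleep is modelled.

```c
#include <stdbool.h>

/* Hypothetical bit values; only their relationships matter here. */
#define M_TASK_RUNNING		0x0u
#define M_TASK_INTERRUPTIBLE	0x1u
#define M_TASK_WAKEKILL		0x2u
#define M_TASK_TRACED_BIT	0x4u
#define M_TASK_TRACED		(M_TASK_WAKEKILL | M_TASK_TRACED_BIT)

#define M_JOBCTL_TRACED		0x1u
#define M_JOBCTL_LISTENING	0x2u
#define M_JOBCTL_DELAY_WAKEKILL	0x4u

struct model_task {
	unsigned int state;
	unsigned int jobctl;
	bool fatal_pending;
};

/* Wake the task only if its current state intersects the mask. */
static bool model_wake_up_state(struct model_task *t, unsigned int mask)
{
	if (!(t->state & mask))
		return false;
	t->state = M_TASK_RUNNING;
	return true;
}

/* SIGKILL delivery: DELAY_WAKEKILL strips TASK_WAKEKILL from the mask. */
static void model_send_sigkill(struct model_task *t)
{
	unsigned int mask = M_TASK_WAKEKILL;

	t->fatal_pending = true;
	if (t->jobctl & M_JOBCTL_DELAY_WAKEKILL)
		mask &= ~M_TASK_WAKEKILL;
	if (model_wake_up_state(t, mask | M_TASK_INTERRUPTIBLE))
		t->jobctl &= ~M_JOBCTL_TRACED;
}

/* schedule() refuses to sleep if a fatal signal can wake this state. */
static bool model_schedule_sleeps(const struct model_task *t)
{
	if (t->state == M_TASK_RUNNING)
		return false;
	if ((t->state & (M_TASK_INTERRUPTIBLE | M_TASK_WAKEKILL)) &&
	    t->fatal_pending)
		return false;
	return true;
}

/*
 * Tracee is in TASK_TRACED with JOBCTL_TRACED set and has dropped
 * siglock.  Optionally the tracer has frozen it (DELAY_WAKEKILL), then
 * SIGKILL arrives, then the tracee reaches schedule().  Returns the
 * jobctl bits left over when the modelled ptrace_stop() returns.
 */
static unsigned int model_race(bool frozen, bool clear_traced)
{
	struct model_task t = {
		.state = M_TASK_TRACED,
		.jobctl = M_JOBCTL_TRACED,
		.fatal_pending = false,
	};

	if (frozen)
		t.jobctl |= M_JOBCTL_DELAY_WAKEKILL;
	model_send_sigkill(&t);
	if (!model_schedule_sleeps(&t))
		t.state = M_TASK_RUNNING;

	t.jobctl &= ~(M_JOBCTL_LISTENING | M_JOBCTL_DELAY_WAKEKILL);
	if (clear_traced)
		t.jobctl &= ~M_JOBCTL_TRACED;
	return t.jobctl;
}
```

In this model, model_race(true, false) returns with M_JOBCTL_TRACED
still set, which corresponds to the leak described above, while
model_race(true, true) returns 0.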

> I'll go
> over it all again in the morning, perhaps I'll reach a different
> conclusion :-)

Same here ;)

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 0/12] ptrace: cleaning up ptrace_stop
  2022-04-26 22:50       ` Eric W. Biederman
  (?)
@ 2022-04-29 21:46         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Jann Horn, Kees Cook,
	linux-ia64


The states TASK_STOPPED and TASK_TRACED are special in that they cannot
handle spurious wake-ups.  This, plus actively depending upon and
changing the value of tsk->__state, causes problems for PREEMPT_RT and
Peter's freezer rewrite.
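
As a rough illustration of what "cannot handle spurious wake-ups"
means, here is a toy model, assuming a sleeper that sees a fixed
sequence of wakeups.  struct model_wakeup and both sleeper_* functions
are hypothetical names invented for this sketch; it contrasts a sleeper
that rechecks its condition in a loop with one that sleeps exactly once
and resumes on whatever wakes it.

```c
#include <stdbool.h>
#include <stddef.h>

/* One simulated wakeup: was the awaited condition true when it came? */
struct model_wakeup {
	bool condition_true;
};

/*
 * Loop-style sleeper: goes back to sleep until the condition holds.
 * Returns the index of the wakeup it resumes on, or n if it never does.
 */
static size_t sleeper_with_recheck(const struct model_wakeup *w, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (w[i].condition_true)
			return i;
	return n;
}

/*
 * Single-shot sleeper: resumes on the very first wakeup, whether or
 * not the condition it was waiting for is true yet.
 */
static size_t sleeper_without_recheck(const struct model_wakeup *w, size_t n)
{
	(void)w;
	return n ? 0 : n;
}
```

With two spurious wakeups followed by a real one, the first sleeper
resumes on the third wakeup, while the second resumes on the first,
before its condition is true.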

There are a lot of details we have to get right to sort out the
technical challenges, and this is my pared-back version of the changes
that contains just those problems I see good solutions to that I believe
are ready.

In particular I don't have a solution that is ready for the challenges
presented by wait_task_inactive.

I hope we can review these changes and then have a firm foundation
for the rest of the challenges.

There are cleanups to the ptrace support for xtensa, um, and
ia64.

I have sucked in the first patch of Peter's freezer change as
with minor modifications I believe it is ready to go.

Eric W. Biederman (12):
      signal: Rename send_signal send_signal_locked
      signal: Replace __group_send_sig_info with send_signal_locked
      ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
      ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
      signal: Use lockdep_assert_held instead of assert_spin_locked
      ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
      ptrace: Don't change __state
      ptrace: Remove arch_ptrace_attach
      ptrace: Always take siglock in ptrace_resume
      ptrace: Only return signr from ptrace_stop if it was provided
      ptrace: Always call schedule in ptrace_stop
      sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

 arch/ia64/include/asm/ptrace.h    |   4 --
 arch/ia64/kernel/ptrace.c         |  57 ----------------
 arch/um/include/asm/thread_info.h |   2 +
 arch/um/kernel/exec.c             |   2 +-
 arch/um/kernel/process.c          |   2 +-
 arch/um/kernel/ptrace.c           |   8 +--
 arch/um/kernel/signal.c           |   4 +-
 arch/xtensa/kernel/ptrace.c       |   4 +-
 arch/xtensa/kernel/signal.c       |   4 +-
 drivers/tty/tty_jobctrl.c         |   4 +-
 include/linux/ptrace.h            |   7 --
 include/linux/sched.h             |  10 ++-
 include/linux/sched/jobctl.h      |  10 +++
 include/linux/sched/signal.h      |  23 ++++++-
 include/linux/signal.h            |   3 +-
 kernel/ptrace.c                   |  88 +++++++++----------------
 kernel/signal.c                   | 135 +++++++++++++++++---------------------
 kernel/time/posix-cpu-timers.c    |   6 +-
 18 files changed, 145 insertions(+), 228 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH v2 01/12] signal: Rename send_signal send_signal_locked
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal to send_signal_locked and make
it usable outside of signal.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
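Reader's note (not part of the commit message): a minimal userspace
sketch of the naming convention this rename follows, where a _locked
suffix means the caller already holds the lock.  Everything named demo_*
is hypothetical and only stands in for kernel types; the hand-tracked
"held" flag is an assumption that plays the role lockdep plays in the
kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sighand: a lock plus a set of pending signals. */
struct demo_sighand {
	atomic_flag siglock;		/* toy spinlock */
	bool held;			/* tracked by hand for the assertion below */
	unsigned long long pending;	/* bit (sig - 1) set when sig is queued */
};

static void demo_lock(struct demo_sighand *sh)
{
	while (atomic_flag_test_and_set_explicit(&sh->siglock,
						 memory_order_acquire))
		;			/* spin until the lock is free */
	sh->held = true;
}

static void demo_unlock(struct demo_sighand *sh)
{
	sh->held = false;
	atomic_flag_clear_explicit(&sh->siglock, memory_order_release);
}

/* "_locked" suffix: the caller must already hold sh->siglock. */
static int demo_send_signal_locked(struct demo_sighand *sh, int sig)
{
	assert(sh->held);		/* the contract the suffix advertises */
	if (sig < 1 || sig > 64)
		return -1;
	sh->pending |= 1ULL << (sig - 1);
	return 0;
}

/* Unsuffixed variant: takes the lock, then defers to the _locked helper. */
static int demo_send_signal(struct demo_sighand *sh, int sig)
{
	int ret;

	demo_lock(sh);
	ret = demo_send_signal_locked(sh, sig);
	demo_unlock(sh);
	return ret;
}
```

A caller that already holds the lock would call
demo_send_signal_locked() directly; everyone else goes through
demo_send_signal(), much as do_send_sig_info() in this patch wraps
send_signal_locked() between lock_task_sighand() and
unlock_task_sighand().
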
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 02/12] signal: Replace __group_send_sig_info with send_signal_locked
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The function send_signal_locked does everything __group_send_sig_info
does and more, so replace __group_send_sig_info with it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
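Reader's note (not part of the commit message): a small userspace sketch
of why the wrapper can go away.  It only pins the last argument to
PIDTYPE_TGID, so each caller can pass that argument itself.  All demo_*
names are hypothetical, and the two-bitmask task is a simplifying
assumption about how the pid type selects between per-thread and shared
pending signals.

```c
#include <assert.h>

enum demo_pid_type { DEMO_PIDTYPE_PID, DEMO_PIDTYPE_TGID };

/* Hypothetical task: one pending set per thread, one for the thread group. */
struct demo_task {
	unsigned long long private_pending;
	unsigned long long shared_pending;
};

/* General helper: the caller picks the target with the type argument. */
static int demo_send_signal_locked(int sig, struct demo_task *t,
				   enum demo_pid_type type)
{
	unsigned long long *pending;

	if (sig < 1 || sig > 64)
		return -1;
	pending = (type == DEMO_PIDTYPE_PID) ? &t->private_pending
					     : &t->shared_pending;
	*pending |= 1ULL << (sig - 1);
	return 0;
}

/* The kind of wrapper being removed: it adds nothing but a fixed argument. */
static int demo_group_send_sig_info(int sig, struct demo_task *t)
{
	return demo_send_signal_locked(sig, t, DEMO_PIDTYPE_TGID);
}
```

Calling demo_group_send_sig_info(sig, t) and calling
demo_send_signal_locked(sig, t, DEMO_PIDTYPE_TGID) leave the task in the
same state, which is the substitution the hunks in this patch make at
each call site.
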
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 03/12] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

User Mode Linux is the last user of the PT_DTRACE flag.  Using the flag to
indicate single stepping is a little confusing and, worse, changing
tsk->ptrace without locking could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE as uml is the last user.

Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
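Reader's note (not part of the commit message): a userspace sketch of
the difference the commit message points at.  A plain "flags |= bit" on
a shared word is an unlocked read-modify-write, while a thread info flag
is updated with an atomic bit operation.  The demo_* names are
hypothetical and C11 atomics stand in for the kernel's bit operations;
only the bit number 10 is taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_TIF_SINGLESTEP 10	/* bit number, matching the patch */

/* Hypothetical stand-in for thread_info: one word of atomic flag bits. */
struct demo_thread_info {
	atomic_ulong flags;
};

/* Atomically set one bit; concurrent updates to other bits are not lost. */
static void demo_set_flag(struct demo_thread_info *ti, int bit)
{
	atomic_fetch_or(&ti->flags, 1UL << bit);
}

/* Atomically clear one bit, leaving every other bit untouched. */
static void demo_clear_flag(struct demo_thread_info *ti, int bit)
{
	atomic_fetch_and(&ti->flags, ~(1UL << bit));
}

/* Read the current value of one bit. */
static bool demo_test_flag(struct demo_thread_info *ti, int bit)
{
	return (atomic_load(&ti->flags) >> bit) & 1UL;
}
```

With helpers of this shape, enabling and disabling single stepping
touches only its own bit, so it no longer depends on whatever locking
protects the rest of tsk->ptrace.
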
 arch/um/include/asm/thread_info.h | 2 ++
 arch/um/kernel/exec.c             | 2 +-
 arch/um/kernel/process.c          | 2 +-
 arch/um/kernel/ptrace.c           | 8 ++++----
 arch/um/kernel/signal.c           | 4 ++--
 include/linux/ptrace.h            | 1 -
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	7
 #define TIF_NOTIFY_RESUME	8
 #define TIF_SECCOMP		9	/* secure computing */
+#define TIF_SINGLESTEP		10	/* single stepping userspace */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 
 #endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
 {
 	PT_REGS_IP(regs) = eip;
 	PT_REGS_SP(regs) = esp;
-	current->ptrace &= ~PT_DTRACE;
+	clear_thread_flag(TIF_SINGLESTEP);
 #ifdef SUBARCH_EXECVE1
 	SUBARCH_EXECVE1(regs->regs);
 #endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
 {
 	struct task_struct *task = t ? t : current;
 
-	if (!(task->ptrace & PT_DTRACE))
+	if (!test_thread_flag(TIF_SINGLESTEP))
 		return 0;
 
 	if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_DTRACE;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_DTRACE;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
 }
 
 /*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
  * PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
  */
 int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	audit_syscall_exit(regs);
 
 	/* Fake a debug trap */
-	if (ptraced & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		send_sigtrap(&regs->regs, 0);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	unsigned long sp;
 	int err;
 
-	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+	if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
 		singlestep = 1;
 
 	/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
 	 * on the host.  The tracing thread will check this flag and
 	 * PTRACE_SYSCALL if necessary.
 	 */
-	if (current->ptrace & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		current->thread.singlestep_syscall =
 			is_syscall(PT_REGS_IP(&current->thread.regs));
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
-#define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 
 #define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 04/12] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

xtensa is the last user of the PT_SINGLESTEP flag.  Changing tsk->ptrace in
user_enable_single_step and user_disable_single_step without locking could
potentially cause problems.

So use a thread info flag instead of a flag in tsk->ptrace.  Use TIF_SINGLESTEP,
which xtensa already defined but did not use.

Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users.

Cc: stable@vger.kernel.org
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/xtensa/kernel/ptrace.c | 4 ++--
 arch/xtensa/kernel/signal.c | 4 ++--
 include/linux/ptrace.h      | 6 ------
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/xtensa/kernel/ptrace.c b/arch/xtensa/kernel/ptrace.c
index 323c678a691f..b952e67cc0cc 100644
--- a/arch/xtensa/kernel/ptrace.c
+++ b/arch/xtensa/kernel/ptrace.c
@@ -225,12 +225,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_SINGLESTEP;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_SINGLESTEP;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 /*
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index 6f68649e86ba..ac50ec46c8f1 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -473,7 +473,7 @@ static void do_signal(struct pt_regs *regs)
 		/* Set up the stack frame */
 		ret = setup_frame(&ksig, sigmask_to_save(), regs);
 		signal_setup_done(ret, &ksig, 0);
-		if (current->ptrace & PT_SINGLESTEP)
+		if (test_thread_flag(TIF_SINGLESTEP))
 			task_pt_regs(current)->icountlevel = 1;
 
 		return;
@@ -499,7 +499,7 @@ static void do_signal(struct pt_regs *regs)
 	/* If there's no signal to deliver, we just restore the saved mask.  */
 	restore_saved_sigmask();
 
-	if (current->ptrace & PT_SINGLESTEP)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		task_pt_regs(current)->icountlevel = 1;
 	return;
 }
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4c06f9f8ef3f..c952c5ba8fab 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
 
-/* single stepping state bits (used on ARM and PA-RISC) */
-#define PT_SINGLESTEP_BIT	31
-#define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
-#define PT_BLOCKSTEP_BIT	30
-#define PT_BLOCKSTEP		(1<<PT_BLOCKSTEP_BIT)
-
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 05/12] signal: Use lockdep_assert_held instead of assert_spin_locked
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The distinction is that assert_spin_locked() checks whether the lock is
held *by anyone*, whereas lockdep_assert_held() asserts that the current
context holds the lock.  Also, the check goes away if you build
without lockdep.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..3fd2ce133387 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -884,7 +884,7 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -1079,7 +1079,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	int override_rlimit;
 	int ret = 0, result;
 
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	result = TRACE_SIGNAL_IGNORED;
 	if (!prepare_signal(sig, t, force))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 05/12] signal: Use lockdep_assert_held instead of assert_spin_locked
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The distinction is that assert_spin_locked() checks if the lock is
held *by*anyone* whereas lockdep_assert_held() asserts the current
context holds the lock.  Also, the check goes away if you build
without lockdep.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..3fd2ce133387 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -884,7 +884,7 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -1079,7 +1079,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	int override_rlimit;
 	int ret = 0, result;
 
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	result = TRACE_SIGNAL_IGNORED;
 	if (!prepare_signal(sig, t, force))
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 05/12] signal: Use lockdep_assert_held instead of assert_spin_locked
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The distinction is that assert_spin_locked() checks if the lock is
held *by*anyone* whereas lockdep_assert_held() asserts the current
context holds the lock.  Also, the check goes away if you build
without lockdep.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..3fd2ce133387 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -884,7 +884,7 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -1079,7 +1079,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	int override_rlimit;
 	int ret = 0, result;
 
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	result = TRACE_SIGNAL_IGNORED;
 	if (!prepare_signal(sig, t, force))
-- 
2.35.3
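
The distinction drawn in the commit message can be illustrated with a
small user-space model.  This is only a sketch, not kernel code:
struct toy_lock, the integer context ids and all helper names are
invented for the example, and the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: records whether it is taken and which context took it. */
struct toy_lock {
	bool locked;
	int owner;	/* context id of the holder; meaningful only if locked */
};

static void toy_lock_acquire(struct toy_lock *l, int ctx)
{
	assert(!l->locked);	/* single-threaded model: no contention */
	l->locked = true;
	l->owner = ctx;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->locked = false;
}

/* Analogue of assert_spin_locked(): true if anyone holds the lock. */
static bool toy_held_by_anyone(const struct toy_lock *l)
{
	return l->locked;
}

/* Analogue of lockdep_assert_held(): true only if ctx itself holds it. */
static bool toy_held_by_ctx(const struct toy_lock *l, int ctx)
{
	return l->locked && l->owner == ctx;
}
```

If context 1 takes the lock, toy_held_by_anyone() is true for every
caller, while toy_held_by_ctx() is true only when asked about context 1.
That gap is what the weaker assertion fails to catch.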

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable, Al Viro

Call send_sig_info in PTRACE_KILL instead of ptrace_resume.  Calling
ptrace_resume is not safe if the task has not been stopped with
ptrace_freeze_traced.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..43da5764b6f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
 	case PTRACE_KILL:
 		if (child->exit_state)	/* already dead */
 			return 0;
-		return ptrace_resume(child, request, SIGKILL);
+		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
 
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	case PTRACE_GETREGSET:
-- 
2.35.3
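
Seen from user space, this change makes PTRACE_KILL behave much like
sending SIGKILL directly, which ptrace(2) already recommends over
PTRACE_KILL.  The sketch below shows that direct route under stated
assumptions: kill_child_and_reap() is a hypothetical helper written for
this example, and it uses plain fork()/kill()/waitpid() rather than
ptrace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that sleeps in pause(), send it SIGKILL, and reap it.
 * Returns the signal that terminated the child, or -1 on error.
 */
static int kill_child_and_reap(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		for (;;)
			pause();	/* child: wait to be killed */
	}
	if (kill(pid, SIGKILL) < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A caller can check the outcome with
assert(kill_child_and_reap() == SIGKILL).  The child does not need to be
in any particular stop state for this to work, which is what the
patched PTRACE_KILL relies on.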


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN.  This new flag is
set in ptrace_freeze_traced and cleared when ptrace_stop is awoken or
in ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the caller indicates a fatal signal is pending.  Skip adding
__TASK_TRACED when JOBCTL_PTRACE_FROZEN is set.  This has the same
effect as changing TASK_TRACED to __TASK_TRACED, as all of the wake_ups
that use TASK_KILLABLE go through signal_wake_up.

Don't set TASK_TRACED if fatal_signal_pending so that the code
continues not to sleep if there was a pending fatal signal before
ptrace_stop is called.  With TASK_WAKEKILL no longer present in
TASK_TRACED, signal_pending_state will no longer prevent ptrace_stop
from sleeping if there is a pending fatal signal.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop.  Now, when woken up, ptrace_stop clears
JOBCTL_PTRACE_FROZEN, and when left sleeping ptrace_unfreeze_traced
clears JOBCTL_PTRACE_FROZEN.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched.h        |  2 +-
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  8 +++++++-
 kernel/ptrace.c              | 21 ++++++++-------------
 kernel/signal.c              |  9 +++------
 5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED			__TASK_TRACED
 
 #define TASK_IDLE			(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
 
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..35af34eeee9e 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -437,7 +437,13 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	unsigned int state = 0;
+	if (resume) {
+		state = TASK_WAKEKILL;
+		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+			state |= __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43da5764b6f3..644eb7439d01 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
-		return;
-
-	WARN_ON(!task->ptrace || task->parent != current);
+	unsigned long flags;
 
 	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
+	 * The child may be awake and may have cleared
+	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
+	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
 	 */
-	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (lock_task_sighand(task, &flags)) {
+		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+		unlock_task_sighand(task, &flags);
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 }
 
 /**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..5cf268982a7e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
-	/*
-	 * schedule() will not sleep if there is a pending signal that
-	 * can awaken the task.
-	 */
-	set_special_state(TASK_TRACED);
+	if (!__fatal_signal_pending(current))
+		set_special_state(TASK_TRACED);
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2321,7 +2318,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3
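
The two state-mask checks this patch touches can be modelled in user
space.  The following is a sketch only.  The TOY_ constants use
illustrative bit values (the authoritative ones are in
include/linux/sched.h), toy_signal_wake_mask() mirrors the new
signal_wake_up() hunk above, and toy_would_wake() and
toy_signal_pending_state() are simplified stand-ins that assume a
wake-up only takes effect when the sleeper's state intersects the mask.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TASK_INTERRUPTIBLE		0x0001u
#define TOY_TASK_TRACED_BIT		0x0008u	/* stands in for __TASK_TRACED */
#define TOY_TASK_WAKEKILL		0x0100u
#define TOY_JOBCTL_PTRACE_FROZEN	(1UL << 24)

/* State mask that signal_wake_up() passes on after this patch. */
static unsigned int toy_signal_wake_mask(bool resume, unsigned long jobctl)
{
	unsigned int state = 0;

	if (resume) {
		state = TOY_TASK_WAKEKILL;
		if (!(jobctl & TOY_JOBCTL_PTRACE_FROZEN))
			state |= TOY_TASK_TRACED_BIT;
	}
	return state;
}

/* Assumed wake rule: the sleeper's state must intersect the mask. */
static bool toy_would_wake(unsigned int task_state, unsigned int mask)
{
	return (task_state & mask) != 0;
}

/* Simplified signal_pending_state(): true means "do not go to sleep". */
static bool toy_signal_pending_state(unsigned int state, bool sig_pending,
				     bool fatal_pending)
{
	if (!(state & (TOY_TASK_INTERRUPTIBLE | TOY_TASK_WAKEKILL)))
		return false;
	if (!sig_pending)
		return false;
	return (state & TOY_TASK_INTERRUPTIBLE) || fatal_pending;
}
```

In this model a task sleeping with only the traced bit set is woken by
a fatal signal unless JOBCTL_PTRACE_FROZEN is set.  The same task no
longer trips toy_signal_pending_state() once TASK_WAKEKILL is gone from
its state, which is why ptrace_stop now tests __fatal_signal_pending()
explicitly before setting TASK_TRACED.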


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 08/12] ptrace: Remove arch_ptrace_attach
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Reading the comments and examining the code, ptrace_attach_sync_user_rbs
has the sole purpose of saving registers to the stack when ptrace_attach
changes TASK_STOPPED to TASK_TRACED.  In all other cases arch_ptrace_stop
takes care of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary.  So just
remove it.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/ia64/include/asm/ptrace.h |  4 ---
 arch/ia64/kernel/ptrace.c      | 57 ----------------------------------
 kernel/ptrace.c                | 18 -----------
 3 files changed, 79 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index a10a498eede1..402874489890 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -139,10 +139,6 @@ static inline long regs_return_value(struct pt_regs *regs)
   #define arch_ptrace_stop_needed() \
 	(!test_thread_flag(TIF_RESTORE_RSE))
 
-  extern void ptrace_attach_sync_user_rbs (struct task_struct *);
-  #define arch_ptrace_attach(child) \
-	ptrace_attach_sync_user_rbs(child)
-
   #define arch_has_single_step()  (1)
   #define arch_has_block_step()   (1)
 
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index a19acd9f5e1f..a45f529046c3 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -617,63 +617,6 @@ void ia64_sync_krbs(void)
 	unw_init_running(do_sync_rbs, ia64_sync_kernel_rbs);
 }
 
-/*
- * After PTRACE_ATTACH, a thread's register backing store area in user
- * space is assumed to contain correct data whenever the thread is
- * stopped.  arch_ptrace_stop takes care of this on tracing stops.
- * But if the child was already stopped for job control when we attach
- * to it, then it might not ever get into ptrace_stop by the time we
- * want to examine the user memory containing the RBS.
- */
-void
-ptrace_attach_sync_user_rbs (struct task_struct *child)
-{
-	int stopped = 0;
-	struct unw_frame_info info;
-
-	/*
-	 * If the child is in TASK_STOPPED, we need to change that to
-	 * TASK_TRACED momentarily while we operate on it.  This ensures
-	 * that the child won't be woken up and return to user mode while
-	 * we are doing the sync.  (It can only be woken up for SIGKILL.)
-	 */
-
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_STOPPED &&
-		    !test_and_set_tsk_thread_flag(child, TIF_RESTORE_RSE)) {
-			set_notify_resume(child);
-
-			WRITE_ONCE(child->__state, TASK_TRACED);
-			stopped = 1;
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-
-	if (!stopped)
-		return;
-
-	unw_init_from_blocked_task(&info, child);
-	do_sync_rbs(&info, ia64_sync_user_rbs);
-
-	/*
-	 * Now move the child back into TASK_STOPPED if it should be in a
-	 * job control stop, so that SIGCONT can be used to wake it up.
-	 */
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_TRACED &&
-		    (child->signal->flags & SIGNAL_STOP_STOPPED)) {
-			WRITE_ONCE(child->__state, TASK_STOPPED);
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-}
-
 /*
  * Write f32-f127 back to task->thread.fph if it has been modified.
  */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 644eb7439d01..22041531adf6 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1280,10 +1280,6 @@ int ptrace_request(struct task_struct *child, long request,
 	return ret;
 }
 
-#ifndef arch_ptrace_attach
-#define arch_ptrace_attach(child)	do { } while (0)
-#endif
-
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -1292,8 +1288,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_TRACEME) {
 		ret = ptrace_traceme();
-		if (!ret)
-			arch_ptrace_attach(current);
 		goto out;
 	}
 
@@ -1305,12 +1299,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
@@ -1450,12 +1438,6 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 09/12] ptrace: Always take siglock in ptrace_resume
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Make code analysis simpler and future changes easier by
always taking siglock in ptrace_resume.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 22041531adf6..c1c99e8be147 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -845,8 +845,6 @@ static long ptrace_get_rseq_configuration(struct task_struct *task,
 static int ptrace_resume(struct task_struct *child, long request,
 			 unsigned long data)
 {
-	bool need_siglock;
-
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -882,18 +880,11 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 * Note that we need siglock even if ->exit_code == data and/or this
 	 * status was not reported yet, the new status must not be cleared by
 	 * wait_task_stopped() after resume.
-	 *
-	 * If data == 0 we do not care if wait_task_stopped() reports the old
-	 * status and clears the code too; this can't race with the tracee, it
-	 * takes siglock after resume.
 	 */
-	need_siglock = data && !thread_group_empty(current);
-	if (need_siglock)
-		spin_lock_irq(&child->sighand->siglock);
+	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
 	wake_up_state(child, __TASK_TRACED);
-	if (need_siglock)
-		spin_unlock_irq(&child->sighand->siglock);
+	spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
-- 
2.35.3
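
An illustrative aside, not part of the patch: the change above replaces a
conditional lock with an unconditional one around the exit_code write and
the wakeup.  The userspace sketch below models that shape under stated
assumptions.  struct tracee_model, model_lock(), model_resume() and
model_wait_report() are hypothetical names, and a C11 atomic_flag spinlock
stands in for siglock; none of this is kernel code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the tracee state that siglock protects. */
struct tracee_model {
	atomic_flag siglock;	/* models child->sighand->siglock */
	int exit_code;		/* models child->exit_code */
	int woken;		/* models the effect of wake_up_state() */
};

static void model_lock(struct tracee_model *t)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&t->siglock,
						 memory_order_acquire))
		;
}

static void model_unlock(struct tracee_model *t)
{
	atomic_flag_clear_explicit(&t->siglock, memory_order_release);
}

/* Resume path after the patch: the lock is taken on every call. */
static int model_resume(struct tracee_model *t, int data)
{
	model_lock(t);
	t->exit_code = data;
	t->woken = 1;
	model_unlock(t);
	return 0;
}

/* A waiter that reports and clears the status under the same lock. */
static int model_wait_report(struct tracee_model *t)
{
	int code;

	model_lock(t);
	code = t->exit_code;
	t->exit_code = 0;
	model_unlock(t);
	return code;
}
```

Because both paths always take the same lock, a reader never has to work
out which callers may skip it, which is the simplification the commit
message asks for.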


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 10/12] ptrace: Only return signr from ptrace_stop if it was provided
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

In ptrace_stop a ptrace_unlink or SIGKILL can occur either after
siglock is dropped or after tasklist_lock is dropped.  At either point
the result can be that ptrace will continue and not stop at schedule.

This means that there are cases where the current logic fails to handle
the fact that ptrace_stop did not actually stop, and can potentially
cause ptrace_report_syscall to attempt to deliver a signal.

Instead of attempting to detect in ptrace_stop when it fails to
stop, update ptrace_resume and ptrace_detach to set a flag to indicate
that the signal to continue with has been set.  Use that
new flag to decide how to set the return signal.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/jobctl.h |  2 ++
 kernel/ptrace.c              |  5 +++++
 kernel/signal.c              | 12 ++++++------
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..2ff1bcd63cf4 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -20,6 +20,7 @@ struct task_struct;
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
+#define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -30,6 +31,7 @@ struct task_struct;
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
+#define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1c99e8be147..d80222251f60 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -596,7 +596,11 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
 	 * tasklist_lock avoids the race with wait_task_stopped(), see
 	 * the comment in ptrace_resume().
 	 */
+	spin_lock(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	spin_unlock(&child->sighand->siglock);
+
 	__ptrace_detach(current, child);
 	write_unlock_irq(&tasklist_lock);
 
@@ -883,6 +887,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl |= JOBCTL_PTRACE_SIGNR;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 5cf268982a7e..7cb27a27290a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2193,7 +2193,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2299,9 +2298,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 		/* tasklist protects us from ptrace_freeze_traced() */
 		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
 		read_unlock(&tasklist_lock);
 	}
 
@@ -2311,14 +2307,18 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
+	/* Did userspace perhaps provide a signal to resume with? */
+	if (current->jobctl & JOBCTL_PTRACE_SIGNR)
 		exit_code = current->exit_code;
+	else if (clear_code)
+		exit_code = 0;
+
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3
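
An illustrative aside, not part of the patch: the flag handoff described
in the commit message can be modelled in plain C.  This is a minimal
sketch under stated assumptions.  struct task_model, model_resume() and
model_stop_result() are hypothetical names, MODEL_PTRACE_SIGNR only
mirrors the bit position of JOBCTL_PTRACE_SIGNR from the diff, and the
siglock that the kernel holds around these accesses is left out.

```c
#include <assert.h>

/* Mirrors bit 25 used for JOBCTL_PTRACE_SIGNR in the patch. */
#define MODEL_PTRACE_SIGNR	(1UL << 25)

/* Hypothetical stand-in for the two task_struct fields involved. */
struct task_model {
	unsigned long jobctl;
	int exit_code;
};

/* Tracer side: store the signal and mark it as explicitly provided. */
static void model_resume(struct task_model *t, int data)
{
	t->exit_code = data;
	t->jobctl |= MODEL_PTRACE_SIGNR;
}

/*
 * Tracee side, modelling the tail of ptrace_stop(): use the tracer's
 * value only if the flag says one was provided; otherwise keep or
 * clear the original code depending on clear_code.
 */
static int model_stop_result(struct task_model *t, int exit_code,
			     int clear_code)
{
	if (t->jobctl & MODEL_PTRACE_SIGNR)
		exit_code = t->exit_code;
	else if (clear_code)
		exit_code = 0;

	/* Reset the state so the next stop starts clean. */
	t->exit_code = 0;
	t->jobctl &= ~MODEL_PTRACE_SIGNR;
	return exit_code;
}
```

The point of the flag is that the result no longer depends on whether the
task actually slept: a value of 0 supplied by the tracer and "no value
supplied" are told apart by the bit, not by the stop path taken.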


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 10/12] ptrace: Only return signr from ptrace_stop if it was provided
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

In ptrace_stop a ptrace_unlink or SIGKILL can occur either after
siglock is dropped or after tasklist_lock is dropped.  At either point
the result can be that ptrace will continue and not stop at schedule.

This means that there are cases where the current logic fails to handle
the fact that ptrace_stop did not actually stop, and can potentially
cause ptrace_report_syscall to attempt to deliver a signal.

Instead of attempting to detect in ptrace_stop when it fails to
stop update ptrace_resume and ptrace_detach to set a flag to indicate
that the signal to continue with has be set.   Use that
new flag to decided how to set return signal.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/jobctl.h |  2 ++
 kernel/ptrace.c              |  5 +++++
 kernel/signal.c              | 12 ++++++------
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..2ff1bcd63cf4 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -20,6 +20,7 @@ struct task_struct;
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
+#define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -30,6 +31,7 @@ struct task_struct;
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
+#define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1c99e8be147..d80222251f60 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -596,7 +596,11 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
 	 * tasklist_lock avoids the race with wait_task_stopped(), see
 	 * the comment in ptrace_resume().
 	 */
+	spin_lock(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	spin_unlock(&child->sighand->siglock);
+
 	__ptrace_detach(current, child);
 	write_unlock_irq(&tasklist_lock);
 
@@ -883,6 +887,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl |= JOBCTL_PTRACE_SIGNR;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 5cf268982a7e..7cb27a27290a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2193,7 +2193,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2299,9 +2298,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 		/* tasklist protects us from ptrace_freeze_traced() */
 		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
 		read_unlock(&tasklist_lock);
 	}
 
@@ -2311,14 +2307,18 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
+	/* Did userspace perhaps provide a signal to resume with? */
+	if (current->jobctl & JOBCTL_PTRACE_SIGNR)
 		exit_code = current->exit_code;
+	else if (clear_code)
+		exit_code = 0;
+
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3
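
For readers following along, here is a minimal userspace sketch of the
selection logic at the tail of ptrace_stop() after this patch.  It is a
model only, not kernel code: struct model_task, the MODEL_ prefixed
constant and the model_* helpers are hypothetical names, and no locking
is modelled.  Only the bit position (25) and the if/else shape come from
the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for JOBCTL_PTRACE_SIGNR; only the bit position is from the patch. */
#define MODEL_JOBCTL_PTRACE_SIGNR (1UL << 25)

/* Just the two task fields this patch touches. */
struct model_task {
	unsigned long jobctl;
	int exit_code;
};

/* What ptrace_resume() and ptrace_detach() do: store the signal and mark it valid. */
static void model_set_resume_signal(struct model_task *t, int data)
{
	assert(t);
	t->exit_code = data;
	t->jobctl |= MODEL_JOBCTL_PTRACE_SIGNR;
}

/* Tail of ptrace_stop(): choose the return value, then reset the per-stop state. */
static int model_stop_result(struct model_task *t, int exit_code, bool clear_code)
{
	assert(t);
	/* Did the tracer provide a signal to resume with? */
	if (t->jobctl & MODEL_JOBCTL_PTRACE_SIGNR)
		exit_code = t->exit_code;
	else if (clear_code)
		exit_code = 0;

	t->exit_code = 0;
	t->jobctl &= ~MODEL_JOBCTL_PTRACE_SIGNR;
	return exit_code;
}
```

The point of the flag is that a tracer-supplied value of 0 is
distinguishable from "no tracer ever wrote exit_code", which the old
read_code local could only infer from the control flow.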

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 11/12] ptrace: Always call schedule in ptrace_stop
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop testing for !current->ptrace and setting __state to TASK_RUNNING.
The code in __ptrace_unlink wakes up the child with
ptrace_signal_wake_up, which sets __state to TASK_RUNNING.  That
leaves sending the signals as the only thing ptrace_stop needs to do.

Make the signal sending conditional upon current->ptrace so that
the correct signals are sent to the parent.

After that call schedule and let the fact that __state == TASK_RUNNING
keep the code from sleeping in schedule.

Now that it is easy to see that ptrace_stop always sleeps in
schedule after ptrace_freeze_traced succeeds, modify
ptrace_check_attach to warn if wait_task_inactive fails.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 +++-------
 kernel/signal.c | 68 ++++++++++++++++++-------------------------------
 2 files changed, 28 insertions(+), 54 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d80222251f60..c1afebd2e8f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -261,17 +261,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+		ret = -ESRCH;
 
 	return ret;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 7cb27a27290a..4cae3f47f664 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2255,51 +2255,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
-
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
-- 
2.35.3
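
As a sanity check on the signal.c hunk, the following userspace sketch
compares the old two-branch notification logic with the new flattened
form.  It is a model under stated assumptions: the NOTIFY_* bits and the
notify_old()/notify_new() helpers are hypothetical names standing in for
the do_notify_parent_cldstop() calls, and the three booleans stand in
for current->ptrace, gstop_done and ptrace_reparented(current).

```c
#include <assert.h>
#include <stdbool.h>

/* Who gets notified; hypothetical bits for this model only. */
enum { NOTIFY_TRACER = 1, NOTIFY_REAL_PARENT = 2 };

/* Old shape: separate branches for "still traced" and "tracer went away". */
static int notify_old(bool ptraced, bool gstop_done, bool reparented)
{
	int sent = 0;

	if (ptraced) {
		sent |= NOTIFY_TRACER;
		if (gstop_done && reparented)
			sent |= NOTIFY_REAL_PARENT;
	} else {
		if (gstop_done)
			sent |= NOTIFY_REAL_PARENT;
	}
	return sent;
}

/* New shape: two independent conditions, no else branch. */
static int notify_new(bool ptraced, bool gstop_done, bool reparented)
{
	int sent = 0;

	if (ptraced)
		sent |= NOTIFY_TRACER;
	if (gstop_done && (!ptraced || reparented))
		sent |= NOTIFY_REAL_PARENT;
	return sent;
}

/* Walk all eight input combinations and assert that both shapes agree. */
static void check_notify_shapes_agree(void)
{
	for (int i = 0; i < 8; i++) {
		bool p = i & 1, g = i & 2, r = i & 4;

		assert(notify_old(p, g, r) == notify_new(p, g, r));
	}
}
```

This only covers who is notified.  The other half of the change, that
schedule() returns immediately when __state is already TASK_RUNNING, is
scheduler behaviour and is not modelled here.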


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 11/12] ptrace: Always call schedule in ptrace_stop
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop testing for !current->ptrace and setting __state to TASK_RUNNING.
The code in __ptrace_unlink wakes up the child with
ptrace_signal_wake_up, which sets __state to TASK_RUNNING.  That
leaves sending the signals as the only thing ptrace_stop needs to do.

Make the signal sending conditional upon current->ptrace so that
the correct signals are sent to the parent.

After that call schedule and let the fact that __state == TASK_RUNNING
keep the code from sleeping in schedule.

Now that it is easy to see that ptrace_stop always sleeps in
schedule after ptrace_freeze_traced succeeds, modify
ptrace_check_attach to warn if wait_task_inactive fails.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 +++-------
 kernel/signal.c | 68 ++++++++++++++++++-------------------------------
 2 files changed, 28 insertions(+), 54 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d80222251f60..c1afebd2e8f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -261,17 +261,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+		ret = -ESRCH;
 
 	return ret;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 7cb27a27290a..4cae3f47f664 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2255,51 +2255,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
-
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 11/12] ptrace: Always call schedule in ptrace_stop
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop testing for !current->ptrace and setting __state to TASK_RUNNING.
The code in __ptrace_unlink wakes up the child with
ptrace_signal_wake_up, which sets __state to TASK_RUNNING.  That
leaves sending the signals as the only thing ptrace_stop needs to do.

Make the signal sending conditional upon current->ptrace so that
the correct signals are sent to the parent.

After that call schedule and let the fact that __state == TASK_RUNNING
keep the code from sleeping in schedule.

Now that it is easy to see that ptrace_stop always sleeps in
schedule after ptrace_freeze_traced succeeds, modify
ptrace_check_attach to warn if wait_task_inactive fails.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 +++-------
 kernel/signal.c | 68 ++++++++++++++++++-------------------------------
 2 files changed, 28 insertions(+), 54 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d80222251f60..c1afebd2e8f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -261,17 +261,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+		ret = -ESRCH;
 
 	return ret;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 7cb27a27290a..4cae3f47f664 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2255,51 +2255,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
-
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-04-29 21:48           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 17 ++++++++++++++---
 kernel/ptrace.c              | 17 +++++++++++++----
 kernel/signal.c              | 16 +++++++++++++---
 5 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index 2ff1bcd63cf4..9c0b917de2f9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -22,6 +22,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 #define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -33,6 +36,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 #define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 35af34eeee9e..4dcce2bbf1fb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -439,15 +441,24 @@ static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
 	unsigned int state = 0;
 	if (resume) {
+		unsigned long jmask = JOBCTL_STOPPED;
 		state = TASK_WAKEKILL;
-		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+			jmask |= JOBCTL_TRACED;
 			state |= __TASK_TRACED;
+		}
+		t->jobctl &= ~jmask;
 	}
 	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1afebd2e8f3..38913801717f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -879,7 +888,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
-	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	child->jobctl = (child->jobctl | JOBCTL_PTRACE_SIGNR) & ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 4cae3f47f664..d6573abbc169 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2208,8 +2212,10 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
-	if (!__fatal_signal_pending(current))
+	if (!__fatal_signal_pending(current)) {
 		set_special_state(TASK_TRACED);
+		current->jobctl |= JOBCTL_TRACED;
+	}
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2301,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
+	WARN_ON_ONCE(current->jobctl & JOBCTL_TRACED);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
@@ -2433,6 +2440,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
@@ -2454,6 +2462,8 @@ static bool do_signal_stop(int signr)
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
+
+		WARN_ON_ONCE(READ_ONCE(current->jobctl) & JOBCTL_STOPPED);
 		return true;
 	} else {
 		/*
-- 
2.35.3
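
To illustrate the bookkeeping this patch adds, here is a small userspace
sketch of the jobctl half of signal_wake_up() together with the new
task_is_traced()/task_is_stopped() style checks.  It is a model, not
kernel code: struct model_task, the MODEL_ prefixed constants and the
model_* helpers are hypothetical names.  The bit positions (24, 26, 27)
and the mask logic follow the diff above.  The real code runs under
siglock and also calls signal_wake_up_state(), neither of which is
modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the jobctl flags; bit positions follow the patch. */
#define MODEL_JOBCTL_PTRACE_FROZEN (1UL << 24)
#define MODEL_JOBCTL_STOPPED       (1UL << 26)
#define MODEL_JOBCTL_TRACED        (1UL << 27)

struct model_task {
	unsigned long jobctl;
};

/* After the patch, the traced/stopped tests look at jobctl, not __state. */
static bool model_is_traced(const struct model_task *t)
{
	return (t->jobctl & MODEL_JOBCTL_TRACED) != 0;
}

static bool model_is_stopped(const struct model_task *t)
{
	return (t->jobctl & MODEL_JOBCTL_STOPPED) != 0;
}

/*
 * jobctl half of signal_wake_up(): a resume always clears STOPPED, but
 * TRACED is left alone while the tracee is frozen for a ptrace operation.
 */
static void model_signal_wake_up(struct model_task *t, bool resume)
{
	assert(t);
	if (resume) {
		unsigned long jmask = MODEL_JOBCTL_STOPPED;

		if (!(t->jobctl & MODEL_JOBCTL_PTRACE_FROZEN))
			jmask |= MODEL_JOBCTL_TRACED;
		t->jobctl &= ~jmask;
	}
}
```

Keeping the clear next to the wakeup is what the "--EWB" note refers to:
every path that wakes a stopped or traced task also drops the matching
jobctl bit, so the two views of the state should not drift apart.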


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 12/12] sched, signal, ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 17 ++++++++++++++---
 kernel/ptrace.c              | 17 +++++++++++++----
 kernel/signal.c              | 16 +++++++++++++---
 5 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index 2ff1bcd63cf4..9c0b917de2f9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -22,6 +22,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 #define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -33,6 +36,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 #define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 35af34eeee9e..4dcce2bbf1fb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -439,15 +441,24 @@ static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
 	unsigned int state = 0;
 	if (resume) {
+		unsigned long jmask = JOBCTL_STOPPED;
 		state = TASK_WAKEKILL;
-		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+			jmask |= JOBCTL_TRACED;
 			state |= __TASK_TRACED;
+		}
+		t->jobctl &= ~jmask;
 	}
 	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1afebd2e8f3..38913801717f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -879,7 +888,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
-	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	child->jobctl = (child->jobctl | JOBCTL_PTRACE_SIGNR) & ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 4cae3f47f664..d6573abbc169 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2208,8 +2212,10 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
-	if (!__fatal_signal_pending(current))
+	if (!__fatal_signal_pending(current)) {
 		set_special_state(TASK_TRACED);
+		current->jobctl |= JOBCTL_TRACED;
+	}
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2301,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
+	WARN_ON_ONCE(current->jobctl & JOBCTL_TRACED);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
@@ -2433,6 +2440,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
@@ -2454,6 +2462,8 @@ static bool do_signal_stop(int signr)
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
+
+		WARN_ON_ONCE(READ_ONCE(current->jobctl) & JOBCTL_STOPPED);
 		return true;
 	} else {
 		/*
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v2 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-04-29 21:48           ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-04-29 21:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 17 ++++++++++++++---
 kernel/ptrace.c              | 17 +++++++++++++----
 kernel/signal.c              | 16 +++++++++++++---
 5 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index 2ff1bcd63cf4..9c0b917de2f9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -22,6 +22,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 #define JOBCTL_PTRACE_SIGNR_BIT	25	/* ptrace signal number */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -33,6 +36,9 @@ struct task_struct;
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 #define JOBCTL_PTRACE_SIGNR	(1UL << JOBCTL_PTRACE_SIGNR_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 35af34eeee9e..4dcce2bbf1fb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -439,15 +441,24 @@ static inline void signal_wake_up(struct task_struct *t, bool resume)
 {
 	unsigned int state = 0;
 	if (resume) {
+		unsigned long jmask = JOBCTL_STOPPED;
 		state = TASK_WAKEKILL;
-		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+			jmask |= JOBCTL_TRACED;
 			state |= __TASK_TRACED;
+		}
+		t->jobctl &= ~jmask;
 	}
 	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1afebd2e8f3..38913801717f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -879,7 +888,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
-	child->jobctl |= JOBCTL_PTRACE_SIGNR;
+	child->jobctl = (child->jobctl | JOBCTL_PTRACE_SIGNR) & ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 4cae3f47f664..d6573abbc169 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2208,8 +2212,10 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
-	if (!__fatal_signal_pending(current))
+	if (!__fatal_signal_pending(current)) {
 		set_special_state(TASK_TRACED);
+		current->jobctl |= JOBCTL_TRACED;
+	}
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2301,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	/* LISTENING can be set only during STOP traps, clear it */
 	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
+	WARN_ON_ONCE(current->jobctl & JOBCTL_TRACED);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
@@ -2433,6 +2440,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
@@ -2454,6 +2462,8 @@ static bool do_signal_stop(int signr)
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
+
+		WARN_ON_ONCE(READ_ONCE(current->jobctl) & JOBCTL_STOPPED);
 		return true;
 	} else {
 		/*
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread
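
The motivation in the changelog above (a second special state overwriting
TASK_STOPPED/TASK_TRACED in __state) can be illustrated with a small
user-space model. This is only a sketch: struct model_task, the MODEL_*
state values and the model_*() helpers are invented for illustration and
are not kernel code; only the JOBCTL_STOPPED/JOBCTL_TRACED bit positions
are copied from the jobctl.h hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions copied from the jobctl.h hunk in the patch. */
#define JOBCTL_STOPPED_BIT	26
#define JOBCTL_TRACED_BIT	27
#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)

/* Hypothetical scheduler-state values, invented for this model. */
#define MODEL_RUNNING	0x0000u
#define MODEL_STOPPED	0x0004u
#define MODEL_FROZEN	0x8000u	/* stands in for "another special state" */

struct model_task {
	unsigned int state;	/* plays the role of task->__state */
	unsigned long jobctl;	/* plays the role of task->jobctl */
};

/* Old-style predicate: looks only at the scheduler state. */
static bool old_is_stopped(const struct model_task *t)
{
	return (t->state & MODEL_STOPPED) != 0;
}

/* New-style predicate: looks only at the jobctl bit. */
static bool new_is_stopped(const struct model_task *t)
{
	return (t->jobctl & JOBCTL_STOPPED) != 0;
}

/* Entering the stop records it in both places. */
static void model_stop(struct model_task *t)
{
	t->jobctl |= JOBCTL_STOPPED;
	t->state = MODEL_STOPPED;
}

/* A different special state overwrites the scheduler state only. */
static void model_freeze(struct model_task *t)
{
	t->state = MODEL_FROZEN;
}

/* Waking from the stop clears the jobctl bit along with the state. */
static void model_cont(struct model_task *t)
{
	t->jobctl &= ~JOBCTL_STOPPED;
	t->state = MODEL_RUNNING;
}

static void model_demo(void)
{
	struct model_task t = { MODEL_RUNNING, 0 };

	model_stop(&t);
	assert(old_is_stopped(&t) && new_is_stopped(&t));

	model_freeze(&t);		/* the state word is overwritten... */
	assert(!old_is_stopped(&t));	/* ...so the old test loses the stop, */
	assert(new_is_stopped(&t));	/* while jobctl still records it. */

	model_cont(&t);
	assert(!new_is_stopped(&t));
}
```

In the real patch the set and clear sites are spread over ptrace_stop(),
do_signal_stop() and the wake-up helpers, all under siglock; the model
ignores locking entirely.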

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-04-29 22:27             ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-04-29 22:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On Fri, Apr 29, 2022 at 04:48:32PM -0500, Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
> 
> Instead TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new This new
> flag is set in jobctl_freeze_task and cleared when ptrace_stop is
> awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
> 
> In singal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
> when it is indicated a fatal signal is pending.  Skip adding
> __TASK_TRACED when TASK_PTRACE_FROZEN is not set.  This has the same
> effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
> that use TASK_KILLABLE go through signal_wake_up.
> 
> Don't set TASK_TRACED if fatal_signal_pending so that the code
> continues not to sleep if there was a pending fatal signal before
> ptrace_stop is called.  With TASK_WAKEKILL no longer present in
> TASK_TRACED signal_pending_state will no longer prevent ptrace_stop
> from sleeping if there is a pending fatal signal.
> 
> Previously the __state value of __TASK_TRACED was changed to
> TASK_RUNNING when woken up or back to TASK_TRACED when the code was
> left in ptrace_stop.  Now when woken up ptrace_stop now clears
> JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced
> clears JOBCTL_PTRACE_FROZEN.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  include/linux/sched.h        |  2 +-
>  include/linux/sched/jobctl.h |  2 ++
>  include/linux/sched/signal.h |  8 +++++++-
>  kernel/ptrace.c              | 21 ++++++++-------------
>  kernel/signal.c              |  9 +++------
>  5 files changed, 21 insertions(+), 21 deletions(-)

Please fold this hunk:

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6310,10 +6310,7 @@ static void __sched notrace __schedule(u
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {


^ permalink raw reply	[flat|nested] 572+ messages in thread
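
The pattern the comment in the hunk above describes, loading the previous
task's state exactly once and branching on that snapshot, can be sketched
in user space roughly as follows. This is an assumption-laden model, not
the scheduler: struct model_task and should_deactivate() are invented
names, a relaxed C11 atomic load stands in for READ_ONCE(), and the
ordering guarantee of the kernel's control dependency is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of task_struct that matters here. */
struct model_task {
	_Atomic unsigned int state;	/* plays the role of prev->__state */
};

/*
 * Decide whether the previous task should be taken off the runqueue.
 * The state is loaded once; every later decision uses the local copy,
 * so a concurrent writer cannot make two checks disagree.
 */
static bool should_deactivate(struct model_task *prev, bool preempt)
{
	unsigned int prev_state =
		atomic_load_explicit(&prev->state, memory_order_relaxed);

	/* Mirrors: if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) */
	return !preempt && prev_state != 0;
}

static void model_demo(void)
{
	struct model_task t;

	atomic_init(&t.state, 0);		/* running: never deactivated */
	assert(!should_deactivate(&t, false));

	atomic_store(&t.state, 0x0004u);	/* some non-running state */
	assert(should_deactivate(&t, false));
	assert(!should_deactivate(&t, true));	/* preemption keeps it queued */
}
```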

* Re: [PATCH v2 01/12] signal: Rename send_signal send_signal_locked
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02  7:50             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02  7:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:26 [-0500], Eric W. Biederman wrote:
> Rename send_signal send_signal_locked and make to make

s@to make@@

> it usable outside of signal.c.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 02/12] signal: Replace __group_send_sig_info with send_signal_locked
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02  7:58             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02  7:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:27 [-0500], Eric W. Biederman wrote:
> The function send_signal_locked does more than __group_send_sig_info so
> replace it.

This might be easier to understand:
   __group_send_sig_info() is just a wrapper around send_signal_locked()
   with a special pid_type. 
   
   Replace __group_send_sig_info() with send_signal_locked(,,,
   PIDTYPE_TGID).

However, keep it as is if you feel otherwise ;)

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread
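
The relationship described in the suggested wording, a wrapper that only
supplies a fixed pid type, can be sketched as below. All names here are
model stand-ins invented for illustration (the model_ prefix marks them);
the real functions take a kernel_siginfo argument and run under siglock,
neither of which is modelled.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's enum pid_type. */
enum model_pid_type { MODEL_PIDTYPE_PID, MODEL_PIDTYPE_TGID };

/* Records what was last sent, so the two call paths can be compared. */
struct model_task {
	int last_sig;
	enum model_pid_type last_type;
};

/* Stand-in for send_signal_locked(): the general entry point. */
static int model_send_signal_locked(int sig, struct model_task *t,
				    enum model_pid_type type)
{
	t->last_sig = sig;
	t->last_type = type;
	return 0;
}

/* Stand-in for __group_send_sig_info(): only fixes the pid type. */
static int model_group_send_sig_info(int sig, struct model_task *t)
{
	return model_send_signal_locked(sig, t, MODEL_PIDTYPE_TGID);
}

static void model_demo(void)
{
	struct model_task a = { 0, MODEL_PIDTYPE_PID };
	struct model_task b = { 0, MODEL_PIDTYPE_PID };

	/* Calling the wrapper and calling the general function with the
	 * thread-group pid type leave the task in the same state. */
	assert(model_group_send_sig_info(9, &a) == 0);
	assert(model_send_signal_locked(9, &b, MODEL_PIDTYPE_TGID) == 0);
	assert(a.last_sig == b.last_sig && a.last_type == b.last_type);
}
```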

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02  8:59             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02  8:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:32 [-0500], Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
> 
> Instead TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new This new

Instead of adding TASK_WAKEKILL to the definition of TASK_TRACED, implement
a new jobctl flag TASK_PTRACE_FROZEN for this. This new

> flag is set in jobctl_freeze_task and cleared when ptrace_stop is
> awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
> 
> In singal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
     signal_wake_up

> when it is indicated a fatal signal is pending.  Skip adding
                      +that ?

> __TASK_TRACED when TASK_PTRACE_FROZEN is not set.  This has the same
> effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
                                                                        ,
> that use TASK_KILLABLE go through signal_wake_up.
                        ,

> Don't set TASK_TRACED if fatal_signal_pending so that the code
> continues not to sleep if there was a pending fatal signal before
> ptrace_stop is called.  With TASK_WAKEKILL no longer present in
> TASK_TRACED signal_pending_state will no longer prevent ptrace_stop
> from sleeping if there is a pending fatal signal.
> 
> Previously the __state value of __TASK_TRACED was changed to
> TASK_RUNNING when woken up or back to TASK_TRACED when the code was
> left in ptrace_stop.  Now when woken up ptrace_stop now clears
> JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced
> clears JOBCTL_PTRACE_FROZEN.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread
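
The wake-up mask computation discussed in the quoted changelog can be
modelled in a few lines. This sketch follows the signal_wake_up() context
lines in the 12/12 diff earlier in the thread, where __TASK_TRACED is left
out of the mask while JOBCTL_PTRACE_FROZEN is set. The MODEL_* constants
and wake_state() are invented for illustration; the numeric state values
are assumptions, and only the frozen bit position (24) is taken from the
jobctl.h hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, chosen for the model only. */
#define MODEL_TASK_TRACED		0x0008u
#define MODEL_TASK_WAKEKILL		0x0100u
#define MODEL_JOBCTL_PTRACE_FROZEN	(1UL << 24)

/*
 * Compute which sleeping states a signal wake-up may interrupt.
 * resume == true corresponds to a fatal signal being delivered.
 */
static unsigned int wake_state(unsigned long jobctl, bool resume)
{
	unsigned int state = 0;

	if (resume) {
		state = MODEL_TASK_WAKEKILL;
		/* A frozen tracee must stay asleep until it is unfrozen. */
		if (!(jobctl & MODEL_JOBCTL_PTRACE_FROZEN))
			state |= MODEL_TASK_TRACED;
	}
	return state;
}

static void model_demo(void)
{
	/* Non-fatal wake-up: no special states are interrupted. */
	assert(wake_state(0, false) == 0);

	/* Fatal signal, tracee not frozen: killable and traced sleeps wake. */
	assert(wake_state(0, true) ==
	       (MODEL_TASK_WAKEKILL | MODEL_TASK_TRACED));

	/* Fatal signal while frozen: the traced sleep is left alone. */
	assert(wake_state(MODEL_JOBCTL_PTRACE_FROZEN, true) ==
	       MODEL_TASK_WAKEKILL);
}
```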

* Re: [PATCH v2 07/12] ptrace: Don't change __state
@ 2022-05-02  8:59             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02  8:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:32 [-0500], Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
> 
> Instead TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new This new

Instead adding TASK_WAKEKILL to the definition of TASK_TRACED, implement
a new jobctl flag TASK_PTRACE_FROZEN for this. This new

> flag is set in jobctl_freeze_task and cleared when ptrace_stop is
> awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
> 
> In singal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
     signal_wake_up

> when it is indicated a fatal signal is pending.  Skip adding
                      +that ?

> __TASK_TRACED when TASK_PTRACE_FROZEN is not set.  This has the same
> effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
                                                                        ,
> that use TASK_KILLABLE go through signal_wake_up.
                        ,

> Don't set TASK_TRACED if fatal_signal_pending so that the code
> continues not to sleep if there was a pending fatal signal before
> ptrace_stop is called.  With TASK_WAKEKILL no longer present in
> TASK_TRACED signal_pending_state will no longer prevent ptrace_stop
> from sleeping if there is a pending fatal signal.
> 
> Previously the __state value of __TASK_TRACED was changed to
> TASK_RUNNING when woken up or back to TASK_TRACED when the code was
> left in ptrace_stop.  Now when woken up ptrace_stop now clears
> JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced
> clears JOBCTL_PTRACE_FROZEN.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 10/12] ptrace: Only return signr from ptrace_stop if it was provided
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02 10:08             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02 10:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:35 [-0500], Eric W. Biederman wrote:
> In ptrace_stop a ptrace_unlink or SIGKILL can occur either after
> siglock is dropped or after tasklist_lock is dropped.  At either point
> the result can be that ptrace will continue and not stop at schedule.
> 
> This means that there are cases where the current logic fails to handle
> the fact that ptrace_stop did not actually stop, and can potentially
> cause ptrace_report_syscall to attempt to deliver a signal.
> 
> Instead of attempting to detect in ptrace_stop when it fails to
> stop update ptrace_resume and ptrace_detach to set a flag to indicate
      ,
> that the signal to continue with has be set.   Use that
                                       been
> new flag to decided how to set return signal.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-04-29 21:48           ` [PATCH v2 12/12] sched, signal, ptrace: " Eric W. Biederman
  (?)
@ 2022-05-02 10:18             ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02 10:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-04-29 16:48:37 [-0500], Eric W. Biederman wrote:

Needs
 From: Peter Zijlstra (Intel) <peterz@infradead.org>

at the top.

> Currently ptrace_stop() / do_signal_stop() rely on the special states
> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> state exists only in task->__state and nowhere else.
> 
> There's two spots of bother with this:
> 
>  - PREEMPT_RT has task->saved_state which complicates matters,
>    meaning task_is_{traced,stopped}() needs to check an additional
>    variable.
> 
>  - An alternative freezer implementation that itself relies on a
>    special TASK state would loose TASK_TRACED/TASK_STOPPED and will
>    result in misbehaviour.
> 
> As such, add additional state to task->jobctl to track this state
> outside of task->__state.
> 
> NOTE: this doesn't actually fix anything yet, just adds extra state.
> 
> --EWB
>   * didn't add a unnecessary newline in signal.h
>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
>     instead of in signal_wake_up_state.  This prevents the clearing
>     of TASK_STOPPED and TASK_TRACED from getting lost.
>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/12] ptrace: cleaning up ptrace_stop
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-05-02 13:38           ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-02 13:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, tglx

On 2022-04-29 16:46:59 [-0500], Eric W. Biederman wrote:
> 
> The states TASK_STOPPED and TASK_TRACE are special in they can not
> handle spurious wake-ups.  This plus actively depending upon and
> changing the value of tsk->__state causes problems for PREEMPT_RT and
> Peter's freezer rewrite.

PREEMPT_RT-wise, I had to duct tape wait_task_inactive() and remove the
preempt-disable section in ptrace_stop() (like previously). This reduces
the number of __state + saved_state checks and looks otherwise stable in
light testing.

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02 14:37             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-02 14:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, stable, Al Viro

On 04/29, Eric W. Biederman wrote:
>
> Call send_sig_info in PTRACE_KILL instead of ptrace_resume.  Calling
> ptrace_resume is not safe to call if the task has not been stopped
> with ptrace_freeze_traced.

Oh, I was never, never able to understand why we have PTRACE_KILL
and what it should actually do.

I suggested many times to simply remove it but OK, we probably can't
do this.

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
>  	case PTRACE_KILL:
>  		if (child->exit_state)	/* already dead */
>  			return 0;
> -		return ptrace_resume(child, request, SIGKILL);
> +		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);

Note that currently ptrace(PTRACE_KILL) can never fail (yes, yes, it
is unsafe), but send_sig_info() can. If we do not remove PTRACE_KILL,
then I'd suggest

	case PTRACE_KILL:
		if (!child->exit_state)
			send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
		return 0;

to make this change a bit more compatible.
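
The compatibility point can be sketched in user space. This is only a
model, not kernel code: struct mock_task and mock_send_sigkill() are
hypothetical stand-ins for task_struct and send_sig_info(), and the
-ESRCH failure is an assumed example of an error the real call could
return.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for task_struct; only what the sketch needs. */
struct mock_task {
	int exit_state;		/* non-zero once the task is dead */
	int deliver_fails;	/* force the mock signal delivery to fail */
	int signals_sent;	/* number of signals the mock delivered */
};

/* Mock of send_sig_info(): returns 0 on success or a negative errno. */
static int mock_send_sigkill(struct mock_task *child)
{
	assert(child);
	if (child->deliver_fails)
		return -ESRCH;	/* assumed failure mode, for illustration */
	child->signals_sent++;
	return 0;
}

/* Variant from the patch: the delivery result becomes the return value. */
static int ptrace_kill_patch(struct mock_task *child)
{
	if (child->exit_state)	/* already dead */
		return 0;
	return mock_send_sigkill(child);
}

/* Variant suggested in the reply: PTRACE_KILL always reports success. */
static int ptrace_kill_compat(struct mock_task *child)
{
	if (!child->exit_state)
		mock_send_sigkill(child);
	return 0;
}
```

With a failing delivery the first variant surfaces the error to the
caller, while the second keeps the old "never fails" behaviour.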

Also, please remove the note about PTRACE_KILL in set_task_blockstep().

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02 15:39             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-02 15:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 04/29, Eric W. Biederman wrote:
>
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.

Eric, I'll read this patch and the rest of this series tomorrow.
Somehow I failed to force myself to read yet another version after
the weekend ;)

Plus I don't really understand this one...

>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
> +#define TASK_TRACED			__TASK_TRACED
...
>  static inline void signal_wake_up(struct task_struct *t, bool resume)
>  {
> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> +	unsigned int state = 0;
> +	if (resume) {
> +		state = TASK_WAKEKILL;
> +		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
> +			state |= __TASK_TRACED;
> +	}
> +	signal_wake_up_state(t, state);

I can't understand why this is better than the previous version, which removed
TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
look at the next patches yet.
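
For readers following along, the effect of the quoted hunk can be
modelled in user space. This is a sketch under assumptions: the bit
values are illustrative (including the position of
JOBCTL_PTRACE_FROZEN), struct mock_task is hypothetical, and
would_wake() is a simplified model of the wakeup state match, not the
scheduler's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the real ones live in the kernel headers. */
#define TASK_INTERRUPTIBLE	0x0001u
#define __TASK_TRACED		0x0008u
#define TASK_WAKEKILL		0x0100u
#define TASK_TRACED		__TASK_TRACED	/* after the patch: no TASK_WAKEKILL */
#define JOBCTL_PTRACE_FROZEN	(1ul << 22)	/* assumed bit position */

/* Hypothetical stand-in for task_struct. */
struct mock_task {
	unsigned int state;
	unsigned long jobctl;
};

/* Mask computed by the patched signal_wake_up(), as in the quoted hunk. */
static unsigned int signal_wake_mask(const struct mock_task *t, bool resume)
{
	unsigned int state = 0;

	assert(t);
	if (resume) {
		state = TASK_WAKEKILL;
		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
			state |= __TASK_TRACED;
	}
	return state;
}

/*
 * Simplified wakeup test: a sleeper is woken only if its state shares a
 * bit with the mask.  TASK_INTERRUPTIBLE is always part of the mask.
 */
static bool would_wake(const struct mock_task *t, bool resume)
{
	unsigned int mask = signal_wake_mask(t, resume) | TASK_INTERRUPTIBLE;

	return (t->state & mask) != 0;
}
```

In this model a task sleeping in TASK_TRACED is woken by a fatal signal
only while JOBCTL_PTRACE_FROZEN is clear, which is the behaviour the
commit message describes.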

> @@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>  		spin_lock_irq(&current->sighand->siglock);
>  	}
>
> -	/*
> -	 * schedule() will not sleep if there is a pending signal that
> -	 * can awaken the task.
> -	 */
> -	set_special_state(TASK_TRACED);
> +	if (!__fatal_signal_pending(current))
> +		set_special_state(TASK_TRACED);

This is where I got stuck. This probably makes sense, but what does it buy
for this particular patch?

And if we check __fatal_signal_pending(), why can't ptrace_stop() simply
return?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-04-29 21:48           ` Eric W. Biederman
  (?)
@ 2022-05-02 15:47             ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-02 15:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 04/29, Eric W. Biederman wrote:
>
>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> -		return;
> -
> -	WARN_ON(!task->ptrace || task->parent != current);
> +	unsigned long flags;
>
>  	/*
> -	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> -	 * Recheck state under the lock to close this race.
> +	 * The child may be awake and may have cleared
> +	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
> +	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
>  	 */
> -	spin_lock_irq(&task->sighand->siglock);
> -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
> +	if (lock_task_sighand(task, &flags)) {
> +		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;

Well, I think that the fast-path

	if (!(task->jobctl & JOBCTL_PTRACE_FROZEN))
		return;

at the start makes sense: we can avoid lock_task_sighand() if the tracee
was already resumed.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-02 15:39             ` Oleg Nesterov
  (?)
@ 2022-05-02 16:35               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-02 16:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/29, Eric W. Biederman wrote:
>>
>> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
>> command is executing.
>
> Eric, I'll read this patch and the rest of this series tomorrow.
> Somehow I failed to force myself to read yet another version after
> weekend ;)

That is quite alright.

> plus I don't really understand this one...
>
>>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
>> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
>> +#define TASK_TRACED			__TASK_TRACED
> ...
>>  static inline void signal_wake_up(struct task_struct *t, bool resume)
>>  {
>> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> +	unsigned int state = 0;
>> +	if (resume) {
>> +		state = TASK_WAKEKILL;
>> +		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
>> +			state |= __TASK_TRACED;
>> +	}
>> +	signal_wake_up_state(t, state);
>
> Can't understand why is this better than the previous version which removed
> TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
> look at the next patches yet.

The goal is to replace the existing mechanism with an equivalent one,
so that we don't have to be clever and deal with it being slightly
different in one case.

The difference is in how signal_pending_state affects whether schedule
will sleep in ptrace_stop.

As the patch is currently constructed (and as the existing code works),
signal_pending_state will always let schedule sleep if
ptrace_freeze_traced completes successfully.

When TASK_WAKEKILL was included in TASK_TRACED, schedule might refuse
to sleep even though ptrace_freeze_traced completed successfully.  As
you pointed out, wait_task_inactive would then fail, keeping
ptrace_check_attach from succeeding.

Other than complicating the analysis by adding extra states we need to
consider when reviewing the patch, the practical difference is that for
Peter's plans to fix PREEMPT_RT or the freezer, wait_task_inactive needs
to cope with the final state being changed by something else
(TASK_FROZEN in the freezer case).  I can only see that happening by
removing the dependency on the final state in wait_task_inactive, which
we can't do if we depend on wait_task_inactive failing when the process
is in the wrong state.
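
To make the signal_pending_state point concrete, here is a small
userspace model of that check.  It is only a sketch: the flag values
are illustrative assumptions modelled on include/linux/sched.h, and
struct task_model and signal_pending_state_model are hypothetical
stand-ins, not kernel code.

```c
#include <stdbool.h>

/* Illustrative flag values (assumptions, modelled on include/linux/sched.h). */
#define TASK_INTERRUPTIBLE	0x0001
#define __TASK_TRACED		0x0008
#define TASK_WAKEKILL		0x0100

/* Hypothetical stand-in for the per-task facts the check consults. */
struct task_model {
	bool sigpending;	/* models signal_pending(p) */
	bool fatal_pending;	/* models __fatal_signal_pending(p) */
};

/* Returns true when schedule() would refuse to sleep in this state. */
bool signal_pending_state_model(unsigned int state, const struct task_model *p)
{
	/* States without either bit never look at pending signals. */
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (!p->sigpending)
		return false;
	/* A TASK_WAKEKILL-only state is interrupted by fatal signals only. */
	return (state & TASK_INTERRUPTIBLE) || p->fatal_pending;
}
```

With a fatal signal pending, the model reports "do not sleep" for
TASK_WAKEKILL | __TASK_TRACED but not for plain __TASK_TRACED, which
is the behavioral difference discussed above.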


>> @@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>>  		spin_lock_irq(&current->sighand->siglock);
>>  	}
>>
>> -	/*
>> -	 * schedule() will not sleep if there is a pending signal that
>> -	 * can awaken the task.
>> -	 */
>> -	set_special_state(TASK_TRACED);
>> +	if (!__fatal_signal_pending(current))
>> +		set_special_state(TASK_TRACED);
>
> This is where I stuck. This probably makes sense, but what does it buy
> for this particular patch?
>
> And if we check __fatal_signal_pending(), why can't ptrace_stop() simply
> return ?

Again, this is about preserving existing behavior as much as possible to
simplify analysis of the patch.

The current code depends upon schedule not sleeping if there was a fatal
signal received before ptrace_stop is called.  With TASK_WAKEKILL
removed from TASK_TRACED that no longer happens.  Just not setting
TASK_TRACED when __fatal_signal_pending is true has the same effect.
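
As a rough check of that equivalence, the following userspace sketch
compares the old and new sleep decisions in ptrace_stop.  Everything
here is a model: the flag values are illustrative, the function names
are hypothetical, and it assumes that a pending fatal signal always
implies signal_pending() is true.

```c
#include <stdbool.h>

/* Illustrative flag values (assumptions, not copied from a kernel tree). */
#define TASK_RUNNING		0x0000
#define TASK_INTERRUPTIBLE	0x0001
#define __TASK_TRACED		0x0008
#define TASK_WAKEKILL		0x0100

/* Model of signal_pending_state(): true means schedule() will not sleep. */
bool pending_blocks_sleep(unsigned int state, bool sigpending, bool fatal)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (!sigpending)
		return false;
	return (state & TASK_INTERRUPTIBLE) || fatal;
}

/* Old scheme: TASK_TRACED includes TASK_WAKEKILL and is always set. */
bool old_stop_sleeps(bool sigpending, bool fatal)
{
	unsigned int state = TASK_WAKEKILL | __TASK_TRACED;

	return !pending_blocks_sleep(state, sigpending, fatal);
}

/* New scheme: plain __TASK_TRACED, set only if no fatal signal is pending. */
bool new_stop_sleeps(bool sigpending, bool fatal)
{
	unsigned int state = fatal ? TASK_RUNNING : __TASK_TRACED;

	if (state == TASK_RUNNING)
		return false;	/* a running task does not block in schedule() */
	return !pending_blocks_sleep(state, sigpending, fatal);
}
```

Under the stated assumption the two functions agree for every
reachable combination: the task sleeps unless a fatal signal was
already pending when ptrace_stop set its state.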


At a practical level I think it also has an impact on patch:
"10/12 ptrace: Only return signr from ptrace_stop if it was provided".

At a minimum the code would need to do something like:
	if (__fatal_signal_pending(current)) {
		return clear_code ? 0 : exit_code;
	}

That takes a little bit of care to ensure that every time the logic
changes, that early return changes too.  I think that just complicates
things unnecessarily.

Eric




^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-02 16:35               ` Eric W. Biederman
  (?)
@ 2022-05-03 13:41                 ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-03 13:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/02, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> >>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
> >>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
> >> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
> >> +#define TASK_TRACED			__TASK_TRACED
> > ...
> >>  static inline void signal_wake_up(struct task_struct *t, bool resume)
> >>  {
> >> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> >> +	unsigned int state = 0;
> >> +	if (resume) {
> >> +		state = TASK_WAKEKILL;
> >> +		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
> >> +			state |= __TASK_TRACED;
> >> +	}
> >> +	signal_wake_up_state(t, state);
> >
> > Can't understand why is this better than the previous version which removed
> > TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
> > look at the next patches yet.
>
> The goal is to replace the existing mechanism with an equivalent one,
> so that we don't have to be clever and deal with it being slightly
> different in one case.
>
> The difference is how does signal_pending_state affect how schedule will
> sleep in ptrace_stop.

But why is it bad if the tracee doesn't sleep in schedule() when it
races with SIGKILL? I still can't understand this.

Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
in 11/12.

Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
*signal_wake_up() better?

And even if we need to ensure the tracee will always block after
ptrace_freeze_traced(), we can change signal_pending_state() to
return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
looks unnecessary to me.



> Peter's plans to fix PREEMPT_RT or the freezer wait_task_inactive needs
> to cope with the final being changed by something else. (TASK_FROZEN in
> the freezer case).  I can only see that happening by removing the
> dependency on the final state in wait_task_inactive.  Which we can't do
> if we depend on wait_task_inactive failing if the process is in the
> wrong state.

OK, I guess this is what I do not understand. Could you spell it out, please?

And speaking of RT, wait_task_inactive() still can fail because
cgroup_enter_frozen() takes css_set_lock? And it is called under
preempt_disable() ? I don't understand the plan :/

> At a practical level I think it also has an impact on patch:
> "10/12 ptrace: Only return signr from ptrace_stop if it was provided".

I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me;
I mean, I am not sure it is worth the trouble.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
  2022-05-02 14:37             ` Oleg Nesterov
  (?)
@ 2022-05-03 19:36               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-03 19:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, stable, Al Viro

Oleg Nesterov <oleg@redhat.com> writes:

> On 04/29, Eric W. Biederman wrote:
>>
>> Call send_sig_info in PTRACE_KILL instead of ptrace_resume.  Calling
>> ptrace_resume is not safe to call if the task has not been stopped
>> with ptrace_freeze_traced.
>
> Oh, I was never, never able to understand why do we have PTRACE_KILL
> and what should it actually do.
>
> I suggested many times to simply remove it but OK, we probably can't
> do this.

I thought I remembered you suggesting fixing it in some other way.

I took a quick look at codesearch.debian.net and PTRACE_KILL is
definitely in use.  I found uses in gcc-10, firefox-esr_91.8,
llvm_toolchain, and qtwebengine, at which point I stopped looking.


>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
>>  	case PTRACE_KILL:
>>  		if (child->exit_state)	/* already dead */
>>  			return 0;
>> -		return ptrace_resume(child, request, SIGKILL);
>> +		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
>
> Note that currently ptrace(PTRACE_KILL) can never fail (yes, yes, it
> is unsafe), but send_sig_info() can. If we do not remove PTRACE_KILL,
> then I'd suggest
>
> 	case PTRACE_KILL:
> 		if (!child->exit_state)
> 			send_sig_info(SIGKILL);
> 		return 0;
>
> to make this change a bit more compatible.


Quite.  The only failure I can find from send_sig_info is if
lock_task_sighand fails and PTRACE_KILL is deliberately ignoring errors
when the target task has exited.

 	case PTRACE_KILL:
 		send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
 		return 0;

I think that should suffice.
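
For illustration, here is a userspace sketch of the compatibility
difference between returning the send_sig_info result and ignoring
it.  It is a model only: struct child_model and the functions below
are hypothetical names, and the -ESRCH return stands in for the case
where lock_task_sighand fails.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the tracee as seen by PTRACE_KILL. */
struct child_model {
	bool sighand_gone;	/* models lock_task_sighand() failing */
	bool killed;		/* set once SIGKILL has been queued */
};

/* Models send_sig_info(SIGKILL, ...): fails once the sighand is gone. */
int send_sigkill_model(struct child_model *child)
{
	if (child->sighand_gone)
		return -ESRCH;
	child->killed = true;
	return 0;
}

/* Variant that propagates the error to the PTRACE_KILL caller. */
int ptrace_kill_returning_error(struct child_model *child)
{
	return send_sigkill_model(child);
}

/* Variant that ignores the error, so PTRACE_KILL still cannot fail. */
int ptrace_kill_ignoring_error(struct child_model *child)
{
	send_sigkill_model(child);
	return 0;
}
```

In this model only the second variant keeps the property that
PTRACE_KILL never reports a failure to its caller, even when the
target has already gone away.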


> Also, please remove the note about PTRACE_KILL in
> set_task_blockstep().

Good catch, thank you.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-03 13:41                 ` Oleg Nesterov
  (?)
@ 2022-05-03 20:45                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-03 20:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/02, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> >>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>> >>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
>> >> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
>> >> +#define TASK_TRACED			__TASK_TRACED
>> > ...
>> >>  static inline void signal_wake_up(struct task_struct *t, bool resume)
>> >>  {
>> >> -	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> >> +	unsigned int state = 0;
>> >> +	if (resume) {
>> >> +		state = TASK_WAKEKILL;
>> >> +		if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
>> >> +			state |= __TASK_TRACED;
>> >> +	}
>> >> +	signal_wake_up_state(t, state);
>> >
>> > Can't understand why is this better than the previous version which removed
>> > TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
>> > look at the next patches yet.
>>
>> The goal is to replace the existing mechanism with an equivalent one,
>> so that we don't have to be clever and deal with it being slightly
>> different in one case.
>>
>> The difference is how does signal_pending_state affect how schedule will
>> sleep in ptrace_stop.
>
> But why is it bad if the tracee doesn't sleep in schedule ? If it races
> with SIGKILL. I still can't understand this.
>
> Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
> in 11/12.


>
> Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
> *signal_wake_up() better?

Not changing __state is better because it removes special cases
from the scheduler that only apply to ptrace.


> And even if we need to ensure the tracee will always block after
> ptrace_freeze_traced(), we can change signal_pending_state() to
> return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
> looks unnecessary to me.

We still need to change signal_wake_up in that case.  Possibly
signal_wake_up_state.  The choice, for fatal signals, is whether
TASK_WAKEKILL is suppressed or TASK_TRACED is added.

With TASK_WAKEKILL removed from TASK_TRACED, the resulting code behaves
in a very obvious, minimally special-cased way.  Yes, there is a special
case in signal_wake_up, but that is the entirety of the special case and
it is easy to read and see what it does.
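
As a rough illustration of that special case, the following userspace
sketch models which sleepers the quoted signal_wake_up() would wake.
The bit values are illustrative assumptions (the real definitions live
in the kernel headers and may differ), and wake_mask() / would_wake()
are hypothetical helper names, not kernel functions.  The sketch relies
on signal_wake_up_state() always adding TASK_INTERRUPTIBLE to the mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; see the kernel headers for the real ones. */
#define TASK_INTERRUPTIBLE	0x0001u
#define TASK_UNINTERRUPTIBLE	0x0002u
#define __TASK_TRACED		0x0008u
#define TASK_WAKEKILL		0x0100u
#define JOBCTL_PTRACE_FROZEN	(1ul << 24)	/* assumed bit position */

static_assert((TASK_WAKEKILL & __TASK_TRACED) == 0,
	      "the two state bits must be distinct");

/* Extra mask the quoted signal_wake_up() would pass on. */
unsigned int wake_mask(unsigned long jobctl, bool resume)
{
	unsigned int state = 0;

	if (resume) {
		state = TASK_WAKEKILL;
		if (!(jobctl & JOBCTL_PTRACE_FROZEN))
			state |= __TASK_TRACED;
	}
	return state;
}

/* Would a task sleeping in sleep_state be woken with this mask? */
bool would_wake(unsigned int sleep_state, unsigned int mask)
{
	return (sleep_state & (mask | TASK_INTERRUPTIBLE)) != 0;
}
```

With TASK_TRACED defined as plain __TASK_TRACED, a fatal signal wakes
an unfrozen tracee (the mask contains __TASK_TRACED) but leaves a
frozen one asleep, while TASK_KILLABLE sleepers are woken either way.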

>> Peter's plans to fix PREEMPT_RT or the freezer wait_task_inactive needs
>> to cope with the final being changed by something else. (TASK_FROZEN in
>> the freezer case).  I can only see that happening by removing the
>> dependency on the final state in wait_task_inactive.  Which we can't do
>> if we depend on wait_task_inactive failing if the process is in the
>> wrong state.
>
> OK, I guess this is what I do not understand. Could you spell please?
>
> And speaking of RT, wait_task_inactive() still can fail because
> cgroup_enter_frozen() takes css_set_lock? And it is called under
> preempt_disable() ? I don't understand the plan :/

Let me describe his freezer change, as that makes it much easier to get
to the final result.  RT has more problems, as it turns all spin locks
into sleeping locks.  When a task is frozen, the freezer turns its
sleeping state into TASK_FROZEN.  That is, TASK_STOPPED and TASK_TRACED
become TASK_FROZEN.  If this races with ptrace_check_attach, the
wait_task_inactive fails because the process state has changed.  This
makes the freezer visible to userspace.

For ordinary tasks the freezer thaws them just by giving them a spurious
wake-up, after which they check their conditions and go back to sleep
on their own.  For TASK_STOPPED and TASK_TRACED (which can't handle
spurious wake-ups) the __state value is recovered from task->jobctl.
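
The thaw rule described above can be sketched in userspace as follows.
This is only a model under stated assumptions: the MODEL_* constants,
struct model_task, model_freeze() and model_thaw() are all hypothetical
names with made-up values, and the actual freezer patch may record and
restore the state differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; the MODEL_* names and bits are assumptions. */
#define TASK_RUNNING		0x0000u
#define TASK_INTERRUPTIBLE	0x0001u
#define __TASK_STOPPED		0x0004u
#define __TASK_TRACED		0x0008u
#define MODEL_TASK_FROZEN	0x8000u
#define MODEL_JOBCTL_STOPPED	(1ul << 26)
#define MODEL_JOBCTL_TRACED	(1ul << 27)

static_assert((MODEL_TASK_FROZEN & (__TASK_STOPPED | __TASK_TRACED)) == 0,
	      "the frozen bit must not alias a stop state");

struct model_task {
	unsigned int state;	/* stands in for task->__state */
	unsigned long jobctl;	/* stands in for task->jobctl */
};

/* Freezing overwrites whatever sleeping state the task had. */
void model_freeze(struct model_task *t)
{
	t->state = MODEL_TASK_FROZEN;
}

/*
 * Thawing: stopped and traced tasks cannot take a spurious wake-up, so
 * their state is rebuilt from jobctl.  Everything else is simply woken
 * and rechecks its own sleep condition.  Returns true if woken.
 */
bool model_thaw(struct model_task *t)
{
	if (t->state != MODEL_TASK_FROZEN)
		return false;
	if (t->jobctl & MODEL_JOBCTL_TRACED) {
		t->state = __TASK_TRACED;
		return false;
	}
	if (t->jobctl & MODEL_JOBCTL_STOPPED) {
		t->state = __TASK_STOPPED;
		return false;
	}
	t->state = TASK_RUNNING;
	return true;
}
```

The point of the model is that the pre-freeze state of a stopped or
traced task must be recorded somewhere other than __state, since
freezing overwrites __state.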

For RT, cgroup_enter_frozen needs fixes that no one has proposed yet.
The problem is that the "preempt_disable()" before
"read_unlock(&tasklist_lock)" is not something that can reasonably be
removed.  Removing it would cause a performance regression.

So my plan is to get things as far as getting Peter's freezer change
working.  That cleans up the code and brings ptrace much closer to
working with PREEMPT_RT.  That makes the problems left for
the PREEMPT_RT folks much smaller.


>> At a practical level I think it also has an impact on patch:
>> "10/12 ptrace: Only return signr from ptrace_stop if it was provided".
>
> I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
> I mean, I am not sure it worth the trouble.

The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
- stopping in ptrace_report_syscall.
- Not having PT_TRACESYSGOOD set.
- The tracee being killed with a fatal signal
- The tracee sending SIGTRAP to itself.

The larger problem solved by the JOBCTL_PTRACE_SIGNR patch is that
it removes the need for the current->ptrace test in ptrace_stop, which
in turn is part of what is needed for wait_task_inactive to be
guaranteed a stop in ptrace_stop.


Thinking about it.  I think a reasonable case can be made that it
is weird if not dangerous to play with the task fields (ptrace_message,
last_siginfo, and exit_code) without task_is_traced being true.
So I will adjust my patch to check that.  The difference in behavior
is explicit enough we can think about it easily.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-03 20:45                   ` Eric W. Biederman
  (?)
@ 2022-05-04 14:02                     ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-04 14:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
> > with SIGKILL. I still can't understand this.
> >
> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
> > in 11/12.
>
> >
> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
> > *signal_wake_up() better?
>
> Not changing __state is better because it removes special cases
> from the scheduler that only apply to ptrace.

Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.

I meant, I do not think that removing KILLABLE from TASK_TRACED (not
from __state) and complicating *signal_wake_up() (I mean, compared
to your previous version) is a good idea.

And. At least in the context of this series it is fine if the
JOBCTL_TASK_FROZEN tracee does not block in schedule(); you just need to
remove the WARN_ON_ONCE() around wait_task_inactive().

> > And even if we need to ensure the tracee will always block after
> > ptrace_freeze_traced(), we can change signal_pending_state() to
> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
> > looks unnecessary to me.
>
> We still need to change signal_wake_up in that case.  Possibly
> signal_wake_up_state.

Of course. See above.

> >> if we depend on wait_task_inactive failing if the process is in the
> >> wrong state.
> >
> > OK, I guess this is what I do not understand. Could you spell please?
> >
> > And speaking of RT, wait_task_inactive() still can fail because
> > cgroup_enter_frozen() takes css_set_lock? And it is called under
> > preempt_disable() ? I don't understand the plan :/
>
> Let me describe his freezer change as that is much easier to get to the
> final result.  RT has more problems as it turns all spin locks into
> sleeping locks.  When a task is frozen

[...snip...]

Oh, thanks Eric, but I understand this part. But I still can't understand
why it is that critical to block in schedule... OK, I need to think about
it. Let's assume this is really necessary.

Anyway. I'd suggest not changing TASK_TRACED in this series and not
complicating signal_wake_up() more than you did in your previous version:

	static inline void signal_wake_up(struct task_struct *t, bool resume)
	{
		bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
		signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
	}

JOBCTL_PTRACE_FROZEN is fine.

ptrace_check_attach() can do

	if (!ret && !ignore_state &&
	    /*
	     * This can only fail if the frozen tracee races with
	     * SIGKILL and enters schedule() with fatal_signal_pending
	     */
	    !wait_task_inactive(child, __TASK_TRACED))
		ret = -ESRCH;

	return ret;


Now. If/when we really need to ensure that the frozen tracee always
blocks and wait_task_inactive() never fails, we can just do

	- add the fatal_signal_pending() check into ptrace_stop()
	  (like this patch does)

	- say, change signal_pending_state:

	static inline int signal_pending_state(unsigned int state, struct task_struct *p)
	{
		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
			return 0;
		if (!signal_pending(p))
			return 0;
		if (p->jobctl & JOBCTL_TASK_FROZEN)
			return 0;
		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
	}

in a separate patch which should carefully document the need for this
change.
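
For readers who want to see the effect of that extra jobctl check, here
is a small userspace model of the proposed signal_pending_state().  It
is a sketch, not kernel code: struct model_task, its two boolean fields
and MODEL_JOBCTL_FROZEN are hypothetical stand-ins, and the bit values
are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; MODEL_JOBCTL_FROZEN is an assumed stand-in. */
#define TASK_INTERRUPTIBLE	0x0001u
#define __TASK_TRACED		0x0008u
#define TASK_WAKEKILL		0x0100u
#define MODEL_JOBCTL_FROZEN	(1ul << 24)

static_assert((TASK_INTERRUPTIBLE & TASK_WAKEKILL) == 0,
	      "the two state bits must be distinct");

struct model_task {
	unsigned long jobctl;
	bool signal_pending;		/* models signal_pending(p) */
	bool fatal_signal_pending;	/* models __fatal_signal_pending(p) */
};

/* Nonzero means schedule() must not let the task sleep in this state. */
int model_signal_pending_state(unsigned int state,
			       const struct model_task *p)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return 0;
	if (!p->signal_pending)
		return 0;
	if (p->jobctl & MODEL_JOBCTL_FROZEN)
		return 0;	/* a frozen tracee always blocks */
	return (state & TASK_INTERRUPTIBLE) || p->fatal_signal_pending;
}
```

In this model a tracee sleeping in (TASK_WAKEKILL | __TASK_TRACED) with
a fatal signal pending is kept out of schedule() unless the frozen bit
is set, in which case it blocks and wait_task_inactive() can succeed.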

> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
> > I mean, I am not sure it worth the trouble.
>
> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
> - stopping in ptrace_report_syscall.
> - Not having PT_TRACESYSGOOD set.
> - The tracee being killed with a fatal signal
        ^^^^^^
        tracer ?
> - The tracee sending SIGTRAP to itself.

Oh, but this is clear. But do we really care? If the tracer exits
unexpectedly, the tracee can have a lot more problems, I don't think
that this particular one is that important.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-04 14:02                     ` Oleg Nesterov
  (?)
@ 2022-05-04 17:37                       ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 17:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/03, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
>> > with SIGKILL. I still can't understand this.
>> >
>> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
>> > in 11/12.
>>
>> >
>> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
>> > *signal_wake_up() better?
>>
>> Not changing __state is better because it removes special cases
>> from the scheduler that only apply to ptrace.
>
> Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.
>
> I meant, I do not think that removing KILLABLE from TASK_TRACED (not
> from __state) and complicating *signal_wake_up() (I mean, compared
> to your previous version) is a good idea.
>
> And. At least in context of this series it is fine if the JOBCTL_TASK_FROZEN
> tracee do not block in schedule(), just you need to remove WARN_ON_ONCE()
> around wait_task_inactive().
>
>> > And even if we need to ensure the tracee will always block after
>> > ptrace_freeze_traced(), we can change signal_pending_state() to
>> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
>> > looks unnecessary to me.
>>
>> We still need to change signal_wake_up in that case.  Possibly
>> signal_wake_up_state.
>
> Of course. See above.
>
>> >> if we depend on wait_task_inactive failing if the process is in the
>> >> wrong state.
>> >
>> > OK, I guess this is what I do not understand. Could you spell please?
>> >
>> > And speaking of RT, wait_task_inactive() still can fail because
>> > cgroup_enter_frozen() takes css_set_lock? And it is called under
>> > preempt_disable() ? I don't understand the plan :/
>>
>> Let me describe his freezer change as that is much easier to get to the
>> final result.  RT has more problems as it turns all spin locks into
>> sleeping locks.  When a task is frozen
>
> [...snip...]
>
> Oh, thanks Eric, but I understand this part. But I still can't understand
> why it is that critical to block in schedule... OK, I need to think about
> it. Let's assume this is really necessary.
>
> Anyway. I'd suggest to not change TASK_TRACED in this series and not
> complicate signal_wake_up() more than you did in your previous version:
>
> 	static inline void signal_wake_up(struct task_struct *t, bool resume)
> 	{
> 		bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> 		signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
> 	}

If your concern is signal_wake_up there is no reason it can't be:

	static inline void signal_wake_up(struct task_struct *t, bool fatal)
	{
		fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
		signal_wake_up_state(t, fatal ? TASK_WAKEKILL | TASK_TRACED : 0);
	}

I guess I was more targeted in this version, which led to more if
statements, but as there is only one place in the code that can be both
JOBCTL_PTRACE_FROZEN and TASK_TRACED, there is no point in setting
TASK_WAKEKILL without also setting TASK_TRACED in the wake-up.

So yes. I can make the code as simple as my earlier version of
signal_wake_up.
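
As a rough illustration of what that simpler signal_wake_up() decides,
here is a user-space sketch that only computes the mask which would be
handed to signal_wake_up_state(). The struct, the function name and the
bit values are invented for the example; only the boolean logic follows
the snippet above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; not copied from any kernel tree. */
#define TASK_WAKEKILL		0x0100u
#define MODEL_TASK_TRACED	0x0008u		/* stand-in for the traced state bit */
#define JOBCTL_PTRACE_FROZEN	(1ul << 1)	/* hypothetical bit position */

struct wake_task {
	unsigned long jobctl;
};

/* Returns the extra state mask a fatal or non-fatal wake-up would use. */
unsigned int wake_mask_model(const struct wake_task *t, bool fatal)
{
	assert(t);
	/* A frozen tracee must not be woken, not even by a fatal signal. */
	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
	return fatal ? (TASK_WAKEKILL | MODEL_TASK_TRACED) : 0;
}
```

With the frozen bit set the mask is always 0, so the only wake-up that
can reach a traced task is a fatal one sent while it is not frozen.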

> JOBCTL_PTRACE_FROZEN is fine.
>
> ptrace_check_attach() can do
>
> 	if (!ret && !ignore_state &&
> 	    /*
> 	     * This can only fail if the frozen tracee races with
> 	     * SIGKILL and enters schedule() with fatal_signal_pending
> 	     */
> 	    !wait_task_inactive(child, __TASK_TRACED))
> 		ret = -ESRCH;
>
> 	return ret;
>
>
> Now. If/when we really need to ensure that the frozen tracee always
> blocks and wait_task_inactive() never fails, we can just do
>
> 	- add the fatal_signal_pending() check into ptrace_stop()
> 	  (like this patch does)
>
> 	- say, change signal_pending_state:
>
> 	static inline int signal_pending_state(unsigned int state, struct task_struct *p)
> 	{
> 		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
> 			return 0;
> 		if (!signal_pending(p))
> 			return 0;
> 		if (p->jobctl & JOBCTL_TASK_FROZEN)
> 			return 0;
> 		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
> 	}
>
> in a separate patch which should carefully document the need for this
> change.
>
>> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
>> > I mean, I am not sure it worth the trouble.
>>
>> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
>> - stopping in ptrace_report_syscall.
>> - Not having PT_TRACESYSGOOD set.
>> - The tracee being killed with a fatal signal
>         ^^^^^^
>         tracer ?

Both actually.

>> - The tracee sending SIGTRAP to itself.
>
> Oh, but this is clear. But do we really care? If the tracer exits
> unexpectedly, the tracee can have a lot more problems, I don't think
> that this particular one is that important.

I don't know of complaints, and if you haven't heard them either
then that is a good indication that in practice we don't care.

At a practical level I just don't want that silly case that sets
TASK_TRACED to TASK_RUNNING without stopping at all in ptrace_stop to
remain.  It just seems to make everything more complicated for no real
reason anymore.  The deadlocks may_ptrace_stop was guarding against are
gone.

Plus the test is so racy: the case can happen after we drop siglock
but before we schedule, or shortly after we have stopped, so we really
don't reliably catch the condition the code is trying to catch.

I think the case I care most about is ptrace_signal, which pretty much
requires the tracer to wait and clear exit_code before being terminated
to cause problems.  We don't handle that at all today.

So yeah.  I think the code handles so little at this point that we can
just remove the code and simplify things.  If we actually care we can
come back and implement JOBCTL_PTRACE_SIGNR or the like.

I will chew on that a bit and see if I can find any reasons for keeping
the code in ptrace_stop at all.



As an added data point we can probably remove handling of the signal
from ptrace_report_syscall entirely (not in this patchset!).

I took a quick skim and it appears that sending a signal in
ptrace_report_syscall is a feature introduced with ptrace support in
Linux v1.0, and the comment in ptrace_report_syscall appears to document
the fact that the code has always been dead.


I made it through 13 of 133 pages of Debian code search results for
PTRACE_SYSCALL, and the only use I could find of setting the continue
signal was when the signal reported from wait was not SIGTRAP.  That is
exactly the case described in the comment in ptrace_report_syscall.

If that pattern holds for all of the uses of ptrace then the code
in ptrace_report_syscall is dead.
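
The tracer-side pattern described above (pass a signal on resume only
when wait reported something other than SIGTRAP) might look roughly like
the following. This is a sketch, not code taken from any of the surveyed
packages: continue_signal() is a made-up helper name, and the
SIGTRAP | 0x80 case assumes the tracer may have set
PTRACE_O_TRACESYSGOOD.

```c
#include <assert.h>	/* for callers that assert on the result */
#include <signal.h>
#include <sys/wait.h>

/*
 * Given a wait status for a tracee that was resumed with PTRACE_SYSCALL,
 * pick the signal number to pass as the data argument of the next
 * ptrace(PTRACE_SYSCALL, pid, 0, sig) call.
 */
int continue_signal(int status)
{
	int sig;

	if (!WIFSTOPPED(status))
		return 0;	/* exited or killed: nothing to resume */

	sig = WSTOPSIG(status);
	if (sig == SIGTRAP || sig == (SIGTRAP | 0x80))
		return 0;	/* syscall stop: do not inject anything */

	return sig;		/* a real signal: hand it on to the tracee */
}
```

A tracer following this pattern never asks for a signal to be delivered
at a syscall stop, which is the situation in which the signal handling in
ptrace_report_syscall would never be exercised.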



Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v2 07/12] ptrace: Don't change __state
  2022-05-04 17:37                       ` Eric W. Biederman
  (?)
@ 2022-05-04 18:28                         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 18:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Oleg Nesterov <oleg@redhat.com> writes:
>
>> On 05/03, Eric W. Biederman wrote:
>>>
>>> Oleg Nesterov <oleg@redhat.com> writes:
>>>
>>> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
>>> > with SIGKILL. I still can't understand this.
>>> >
>>> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
>>> > in 11/12.
>>>
>>> >
>>> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
>>> > *signal_wake_up() better?
>>>
>>> Not changing __state is better because it removes special cases
>>> from the scheduler that only apply to ptrace.
>>
>> Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.
>>
>> I meant, I do not think that removing KILLABLE from TASK_TRACED (not
>> from __state) and complicating *signal_wake_up() (I mean, compared
>> to your previous version) is a good idea.
>>
>> And. At least in context of this series it is fine if the JOBCTL_TASK_FROZEN
>> tracee do not block in schedule(), just you need to remove WARN_ON_ONCE()
>> around wait_task_inactive().
>>
>>> > And even if we need to ensure the tracee will always block after
>>> > ptrace_freeze_traced(), we can change signal_pending_state() to
>>> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
>>> > looks unnecessary to me.
>>>
>>> We still need to change signal_wake_up in that case.  Possibly
>>> signal_wake_up_state.
>>
>> Of course. See above.
>>
>>> >> if we depend on wait_task_inactive failing if the process is in the
>>> >> wrong state.
>>> >
>>> > OK, I guess this is what I do not understand. Could you spell please?
>>> >
>>> > And speaking of RT, wait_task_inactive() still can fail because
>>> > cgroup_enter_frozen() takes css_set_lock? And it is called under
>>> > preempt_disable() ? I don't understand the plan :/
>>>
>>> Let me describe his freezer change as that is much easier to get to the
>>> final result.  RT has more problems as it turns all spin locks into
>>> sleeping locks.  When a task is frozen
>>
>> [...snip...]
>>
>> Oh, thanks Eric, but I understand this part. But I still can't understand
>> why it is that critical to block in schedule... OK, I need to think about
>> it. Let's assume this is really necessary.
>>
>> Anyway. I'd suggest to not change TASK_TRACED in this series and not
>> complicate signal_wake_up() more than you did in your previous version:
>>
>> 	static inline void signal_wake_up(struct task_struct *t, bool resume)
>> 	{
>> 		bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
>> 		signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
>> 	}
>
> If your concern is signal_wake_up there is no reason it can't be:
>
> 	static inline void signal_wake_up(struct task_struct *t, bool fatal)
> 	{
> 		fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
> 		signal_wake_up_state(t, fatal ? TASK_WAKEKILL | TASK_TRACED : 0);
> 	}
>
> I guess I was more targeted in this version, which led to more if
> statements, but as there is only one place in the code that can be both
> JOBCTL_PTRACE_FROZEN and TASK_TRACED, there is no point in setting
> TASK_WAKEKILL without also setting TASK_TRACED in the wake-up.
>
> So yes. I can make the code as simple as my earlier version of
> signal_wake_up.
>
>> JOBCTL_PTRACE_FROZEN is fine.
>>
>> ptrace_check_attach() can do
>>
>> 	if (!ret && !ignore_state &&
>> 	    /*
>> 	     * This can only fail if the frozen tracee races with
>> 	     * SIGKILL and enters schedule() with fatal_signal_pending
>> 	     */
>> 	    !wait_task_inactive(child, __TASK_TRACED))
>> 		ret = -ESRCH;
>>
>> 	return ret;
>>
>>
>> Now. If/when we really need to ensure that the frozen tracee always
>> blocks and wait_task_inactive() never fails, we can just do
>>
>> 	- add the fatal_signal_pending() check into ptrace_stop()
>> 	  (like this patch does)
>>
>> 	- say, change signal_pending_state:
>>
>> 	static inline int signal_pending_state(unsigned int state, struct task_struct *p)
>> 	{
>> 		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
>> 			return 0;
>> 		if (!signal_pending(p))
>> 			return 0;
>> 		if (p->jobctl & JOBCTL_TASK_FROZEN)
>> 			return 0;
>> 		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
>> 	}
>>
>> in a separate patch which should carefully document the need for this
>> change.
>>
>>> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
>>> > I mean, I am not sure it worth the trouble.
>>>
>>> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
>>> - stopping in ptrace_report_syscall.
>>> - Not having PT_TRACESYSGOOD set.
>>> - The tracee being killed with a fatal signal
>>         ^^^^^^
>>         tracer ?
>
> Both actually.
>
>>> - The tracee sending SIGTRAP to itself.
>>
>> Oh, but this is clear. But do we really care? If the tracer exits
>> unexpectedly, the tracee can have a lot more problems, I don't think
>> that this particular one is that important.
>
> I don't know of complaints, and if you haven't heard them either
> then that is a good indication that in practice we don't care.
>
> At a practical level I just don't want that silly case that sets
> TASK_TRACED to TASK_RUNNING without stopping at all in ptrace_stop to
> remain.  It just seems to make everything more complicated for no real
> reason anymore.  The deadlocks may_ptrace_stop was guarding against are
> gone.
>
> Plus the test is so racy: the case can happen after we drop siglock
> but before we schedule, or shortly after we have stopped, so we really
> don't reliably catch the condition the code is trying to catch.
>
> I think the case I care most about is ptrace_signal, which pretty much
> requires the tracer to wait and clear exit_code before being terminated
> to cause problems.  We don't handle that at all today.
>
> So yeah.  I think the code handles so little at this point that we can
> just remove the code and simplify things.  If we actually care we can
> come back and implement JOBCTL_PTRACE_SIGNR or the like.

The original explanation for handling this is:

commit 66519f549ae516e7ff2f24a8a5134713411a4a58
Author: Roland McGrath <roland@redhat.com>
Date:   Tue Jan 4 05:38:15 2005 -0800

    [PATCH] fix ptracer death race yielding bogus BUG_ON
    
    There is a BUG_ON in ptrace_stop that hits if the thread is not ptraced.
    However, there is no synchronization between a thread deciding to do a
    ptrace stop and so going here, and its ptracer dying and so detaching from
    it and clearing its ->ptrace field.
    
    The RHEL3 2.4-based kernel has a backport of a slightly older version of
    the 2.6 signals code, which has a different but equivalent BUG_ON.  This
    actually bit users in practice (when the debugger dies), but was
    exceedingly difficult to reproduce in contrived circumstances.  We moved
    forward in RHEL3 just by removing the BUG_ON, and that fixed the real user
    problems even though I was never able to reproduce the scenario myself.
    So, to my knowledge this scenario has never actually been seen in practice
    under 2.6.  But it's plain to see from the code that it is indeed possible.
    
    This patch removes that BUG_ON, but also goes further and tries to handle
    this case more gracefully than simply avoiding the crash.  By removing the
    BUG_ON alone, it becomes possible for the real parent of a process to see
    spurious SIGCHLD notifications intended for the debugger that has just
    died, and have its child wind up stopped unexpectedly.  This patch avoids
    that possibility by detecting the case when we are about to do the ptrace
    stop but our ptracer has gone away, and simply eliding that ptrace stop
    altogether as if we hadn't been ptraced when we hit the interesting event
    (signal or ptrace_notify call for syscall tracing or something like that).
    
    Signed-off-by: Roland McGrath <roland@redhat.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

And it was all about
	BUG_ON(!(current->ptrace & PT_PTRACED));
At the beginning of ptrace_stop.

Which seems like a bit of buggy overkill.

>
> I will chew on that a bit and see if I can find any reasons for keeping
> the code in ptrace_stop at all.

Still chewing.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH v3 0/11] ptrace: cleaning up ptrace_stop
  2022-04-29 21:46         ` Eric W. Biederman
  (?)
@ 2022-05-04 22:39           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Jann Horn, Kees Cook,
	linux-ia64


The states TASK_STOPPED and TASK_TRACED are special in that they cannot
handle spurious wake-ups.  This, plus actively depending upon and
changing the value of tsk->__state, causes problems for PREEMPT_RT and
Peter's freezer rewrite.
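
As a rough illustration of what a spurious wake-up means here, the
following userspace sketch (not kernel code; every name in it is
hypothetical) contrasts a sleeper that re-checks its condition after
each wake-up with one that does not.  Going by the description above,
TASK_STOPPED and TASK_TRACED sleepers are closer to the second kind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Each array entry models one wake-up delivered to a sleeping task.
 * true  = the awaited condition really holds at that wake-up
 * false = spurious wake-up (nothing the sleeper waits for has happened)
 */

/* Sleeper that re-checks its condition: spurious wake-ups are harmless. */
static int resume_point_rechecking(const bool *wakeups, size_t n)
{
	assert(wakeups || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (wakeups[i])
			return (int)i;	/* condition holds, safe to resume */
		/* spurious wake-up: go back to sleep */
	}
	return -1;			/* never legitimately woken */
}

/* Sleeper that trusts every wake-up: resumes on the first one it gets. */
static int resume_point_no_recheck(const bool *wakeups, size_t n)
{
	(void)wakeups;			/* the condition is never consulted */
	return n > 0 ? 0 : -1;
}
```

With the sequence { false, false, true } the first function resumes at
index 2 and the second at index 0, which is before the condition holds.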

There are a lot of details we have to get right to sort out the
technical challenges, and this is my pared-back version of the changes:
it addresses just those problems that I see good solutions to and that
I believe are ready.

A couple of issues have been pointed out, but I think this pared-back
set of changes is still on the right track.  The biggest change in v3 is
that instead of trying to prevent sending a spurious SIGTRAP when the
tracer dies with the tracee in ptrace_report_syscall, I have modified the
code to just stop trying.  While I have still taken TASK_WAKEKILL out of
TASK_TRACED, I have implemented simpler logic in signal_wake_up.  Further,
I have followed Oleg's advice and now exit early from ptrace_stop if a
fatal signal is pending.
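
The simpler signal_wake_up logic can be sketched as a userspace model.
This is only a sketch based on the form discussed in the v2 review:
the model_ names and the constant values below are made up for
illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the kernel's real constants may differ. */
#define MODEL_TASK_TRACED		0x0008u
#define MODEL_TASK_WAKEKILL		0x0100u
#define MODEL_JOBCTL_PTRACE_FROZEN	(1ul << 24)

struct model_task {
	unsigned long jobctl;	/* job-control flags of the task */
};

/*
 * Returns the extra sleep states a signal is allowed to wake the task
 * from.  A fatal signal normally wakes a traced task, but not while the
 * tracer has it frozen.
 */
static unsigned int model_signal_wake_mask(const struct model_task *t,
					   bool fatal)
{
	assert(t);
	fatal = fatal && !(t->jobctl & MODEL_JOBCTL_PTRACE_FROZEN);
	return fatal ? (MODEL_TASK_WAKEKILL | MODEL_TASK_TRACED) : 0;
}
```

In this model a non-fatal signal never adds wake-up states, and a
fatal one adds them only when the frozen flag is clear.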

This set of changes should support Peter's freezer rewrite, and with the
addition of changing wait_task_inactive(TASK_TRACED) to be
wait_task_inactive(0) in ptrace_check_attach I don't think there are any
races or issues to be concerned about from the ptrace side.
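
To show why the match-state argument matters, here is a small userspace
model.  It assumes that wait_task_inactive returns 0 when a nonzero
match state is given and the task is not in that state, and a nonzero
value otherwise.  It deliberately leaves out the waiting for the task to
get off the CPU, and all model_ names and values are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative values only; the kernel's real constants may differ. */
#define MODEL_TASK_RUNNING	0x0000u
#define MODEL_TASK_TRACED	0x0008u

struct model_tracee {
	unsigned int state;	/* stands in for tsk->__state */
	unsigned long nvcsw;	/* voluntary context switch count */
};

/*
 * Model of the return value only: 0 means "state did not match",
 * anything else means "task is inactive".  The top bit is set so that
 * a context switch count of zero is still reported as success.
 */
static unsigned long model_wait_task_inactive(const struct model_tracee *p,
					      unsigned int match_state)
{
	assert(p);
	if (match_state && p->state != match_state)
		return 0;
	return p->nvcsw | (1ul << (sizeof(unsigned long) * CHAR_BIT - 1));
}
```

With a match state of 0 the state comparison is skipped entirely, so in
this model the call cannot fail just because __state is no longer
TASK_TRACED.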

More work is needed to support PREEMPT_RT, but these changes get things
closer.

I believe this set of changes will provide a firm foundation for solving
the PREEMPT_RT and freezer challenges.

With fewer lines added and more lines removed this set of changes looks
like it is moving in a good direction.

Eric W. Biederman (10):
      signal: Rename send_signal send_signal_locked
      signal: Replace __group_send_sig_info with send_signal_locked
      ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
      ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
      ptrace: Remove arch_ptrace_attach
      signal: Use lockdep_assert_held instead of assert_spin_locked
      ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
      ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
      ptrace: Don't change __state
      ptrace: Always take siglock in ptrace_resume

Peter Zijlstra (1):
      sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

 arch/ia64/include/asm/ptrace.h    |   4 --
 arch/ia64/kernel/ptrace.c         |  57 ----------------
 arch/um/include/asm/thread_info.h |   2 +
 arch/um/kernel/exec.c             |   2 +-
 arch/um/kernel/process.c          |   2 +-
 arch/um/kernel/ptrace.c           |   8 +--
 arch/um/kernel/signal.c           |   4 +-
 arch/x86/kernel/step.c            |   3 +-
 arch/xtensa/kernel/ptrace.c       |   4 +-
 arch/xtensa/kernel/signal.c       |   4 +-
 drivers/tty/tty_jobctrl.c         |   4 +-
 include/linux/ptrace.h            |   7 --
 include/linux/sched.h             |  10 ++-
 include/linux/sched/jobctl.h      |   8 +++
 include/linux/sched/signal.h      |  20 ++++--
 include/linux/signal.h            |   3 +-
 kernel/ptrace.c                   |  87 ++++++++----------------
 kernel/sched/core.c               |   5 +-
 kernel/signal.c                   | 135 +++++++++++++++++---------------------
 kernel/time/posix-cpu-timers.c    |   6 +-
 20 files changed, 138 insertions(+), 237 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH v3 01/11] signal: Rename send_signal send_signal_locked
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal and __send_signal to send_signal_locked and
__send_signal_locked to make send_signal usable outside of
signal.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 01/11] signal: Rename send_signal send_signal_locked
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal and __send_signal to send_signal_locked and
__send_signal_locked to make send_signal usable outside of
signal.c.
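
As an illustration of the naming convention only (this is not part of
the patch), the "_locked" suffix marks a function whose caller must
already hold the relevant lock.  The userspace sketch below assumes a
single-threaded toy lock; toy_lock, toy_sigstate and the toy_* helpers
are hypothetical names, and the assertion inside the _locked function
is only loosely in the spirit of a lockdep check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for a spinlock such as siglock. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* no recursive locking in this toy model */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_sigstate {
	struct toy_lock lock;
	int pending;	/* number of queued "signals" */
};

/* "_locked" suffix: the caller must already hold s->lock. */
static int toy_send_signal_locked(struct toy_sigstate *s, int sig)
{
	assert(s->lock.held);	/* check the locking contract */
	if (sig <= 0)
		return -1;
	s->pending++;
	return 0;
}

/* Convenience entry point for callers that do not hold the lock. */
static int toy_send_signal(struct toy_sigstate *s, int sig)
{
	int ret;

	toy_lock_acquire(&s->lock);
	ret = toy_send_signal_locked(s, sig);
	toy_lock_release(&s->lock);
	return ret;
}
```

With the suffix in the name, a caller outside the defining file can
tell from the call site alone that it is responsible for the locking.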

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 01/11] signal: Rename send_signal send_signal_locked
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal and __send_signal to send_signal_locked and
__send_signal_locked to make send_signal usable outside of
signal.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 02/11] signal: Replace __group_send_sig_info with send_signal_locked
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The function __group_send_sig_info is just a light wrapper around
send_signal_locked with one parameter fixed to a constant value.  As
the wrapper adds no real value, update the code to directly call the
wrapped function.
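
As an illustration only (not part of the patch), the userspace model
below shows why such a wrapper can be dropped: it merely pins the last
argument, so calling the wrapped function with that constant gives the
same result.  The model_* names and struct model_task are hypothetical
stand-ins, and the two counters are a simplification of the per-thread
and shared pending queues.

```c
#include <assert.h>

/* Hypothetical stand-in; the real enum lives in the kernel headers. */
enum pid_type { PIDTYPE_PID, PIDTYPE_TGID };

struct model_task {
	int thread_pending;	/* signals queued for the single thread */
	int group_pending;	/* signals queued for the whole thread group */
};

/* Toy model of the wrapped function: pick a queue based on the type. */
static int model_send_signal_locked(int sig, struct model_task *t,
				    enum pid_type type)
{
	if (sig <= 0)
		return -1;
	if (type == PIDTYPE_TGID)
		t->group_pending++;
	else
		t->thread_pending++;
	return 0;
}

/* The kind of wrapper being removed: it only fixes the last argument. */
static int model_group_send_sig_info(int sig, struct model_task *t)
{
	return model_send_signal_locked(sig, t, PIDTYPE_TGID);
}
```

Spelling out PIDTYPE_TGID at each call site also makes it visible that
the signal targets the whole thread group.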

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 02/11] signal: Replace __group_send_sig_info with send_signal_locked
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The function __group_send_sig_info is just a light wrapper around
send_signal_locked with one parameter fixed to a constant value.  As
the wrapper adds no real value, update the code to directly call the
wrapped function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 02/11] signal: Replace __group_send_sig_info with send_signal_locked
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The function __group_send_sig_info is just a light wrapper around
send_signal_locked with one parameter fixed to a constant value.  As
the wrapper adds no real value, update the code to directly call the
wrapped function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 03/11] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

User mode Linux is the last user of the PT_DTRACE flag.  Using the flag to indicate
single stepping is a little confusing and, worse, changing tsk->ptrace without locking
could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE, as UML is the last user.
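
As an illustration only (not part of the patch), the userspace sketch
below models why a thread info flag is the safer home for this bit:
the thread flag helpers update a single bit atomically, whereas
"ptrace |= FLAG" is a plain read-modify-write that can lose a
concurrent update to another bit.  The toy_* names are hypothetical,
and C11 atomics stand in for the kernel's bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit number, mirroring the value the patch assigns to TIF_SINGLESTEP. */
#define TOY_TIF_SINGLESTEP 10

/* Hypothetical stand-in for the flags word in struct thread_info. */
struct toy_thread_info {
	atomic_ulong flags;
};

/* Atomically set one bit without disturbing the others. */
static void toy_set_flag(struct toy_thread_info *ti, int bit)
{
	atomic_fetch_or(&ti->flags, 1UL << bit);
}

/* Atomically clear one bit without disturbing the others. */
static void toy_clear_flag(struct toy_thread_info *ti, int bit)
{
	atomic_fetch_and(&ti->flags, ~(1UL << bit));
}

/* Read the current value of one bit. */
static bool toy_test_flag(struct toy_thread_info *ti, int bit)
{
	return (atomic_load(&ti->flags) >> bit) & 1UL;
}
```

Because each helper is a single atomic operation on the flags word,
no extra lock is needed to keep unrelated bits intact.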

Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/um/include/asm/thread_info.h | 2 ++
 arch/um/kernel/exec.c             | 2 +-
 arch/um/kernel/process.c          | 2 +-
 arch/um/kernel/ptrace.c           | 8 ++++----
 arch/um/kernel/signal.c           | 4 ++--
 include/linux/ptrace.h            | 1 -
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	7
 #define TIF_NOTIFY_RESUME	8
 #define TIF_SECCOMP		9	/* secure computing */
+#define TIF_SINGLESTEP		10	/* single stepping userspace */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 
 #endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
 {
 	PT_REGS_IP(regs) = eip;
 	PT_REGS_SP(regs) = esp;
-	current->ptrace &= ~PT_DTRACE;
+	clear_thread_flag(TIF_SINGLESTEP);
 #ifdef SUBARCH_EXECVE1
 	SUBARCH_EXECVE1(regs->regs);
 #endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
 {
 	struct task_struct *task = t ? t : current;
 
-	if (!(task->ptrace & PT_DTRACE))
+	if (!test_thread_flag(TIF_SINGLESTEP))
 		return 0;
 
 	if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_DTRACE;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_DTRACE;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
 }
 
 /*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
  * PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
  */
 int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	audit_syscall_exit(regs);
 
 	/* Fake a debug trap */
-	if (ptraced & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		send_sigtrap(&regs->regs, 0);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	unsigned long sp;
 	int err;
 
-	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+	if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
 		singlestep = 1;
 
 	/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
 	 * on the host.  The tracing thread will check this flag and
 	 * PTRACE_SYSCALL if necessary.
 	 */
-	if (current->ptrace & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		current->thread.singlestep_syscall =
 			is_syscall(PT_REGS_IP(&current->thread.regs));
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
-#define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 
 #define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 03/11] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

User mode linux is the last user of the PT_DTRACE flag.  Using the flag to indicate
single stepping is a little confusing and, worse, changing tsk->ptrace without locking
could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE as uml is the last user.

Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
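For readers wondering why an unlocked "tsk->ptrace |= ..." is fragile, here is
a userspace analogue; it is a sketch, not kernel code.  It replays one
lost-update interleaving by hand on a plain word, then keeps the same kind of
flag in an atomically updated word, which is roughly the guarantee the
thread-flag helpers used in this patch give.  All demo_* names and DEMO_*
values are made up for illustration, and C11 <stdatomic.h> is only assumed to
be a fair stand-in for the kernel's atomic bitops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical values for this demo only; not the kernel's definitions. */
#define DEMO_PT_PTRACED		0x1u
#define DEMO_PT_DTRACE		0x2u
#define DEMO_TIF_SINGLESTEP	10

/*
 * Replay one unlucky interleaving of two unlocked read-modify-write
 * updates to the same plain word.  Writer A wants to set DEMO_PT_DTRACE,
 * writer B wants to set DEMO_PT_PTRACED.
 */
static unsigned int demo_lost_update(void)
{
	unsigned int word = 0;
	unsigned int stale;

	stale = word;			/* A: load */
	word |= DEMO_PT_PTRACED;	/* B: complete update in between */
	word = stale | DEMO_PT_DTRACE;	/* A: store, overwriting B's bit */

	return word;
}

/* Atomic single-bit helpers, standing in for the kernel's flag helpers. */
static void demo_set_flag(atomic_uint *flags, unsigned int bit)
{
	atomic_fetch_or(flags, 1u << bit);
}

static void demo_clear_flag(atomic_uint *flags, unsigned int bit)
{
	atomic_fetch_and(flags, ~(1u << bit));
}

static bool demo_test_flag(atomic_uint *flags, unsigned int bit)
{
	return (atomic_load(flags) & (1u << bit)) != 0;
}

/* Small self-check tying the two halves together. */
static void demo_selfcheck(void)
{
	atomic_uint flags;

	/* The plain word ends up with only A's bit; B's update is lost. */
	assert(demo_lost_update() == DEMO_PT_DTRACE);

	atomic_init(&flags, 0);
	demo_set_flag(&flags, 0);
	demo_set_flag(&flags, DEMO_TIF_SINGLESTEP);
	demo_clear_flag(&flags, DEMO_TIF_SINGLESTEP);

	/* Clearing one bit atomically leaves the other bit alone. */
	assert(demo_test_flag(&flags, 0));
	assert(!demo_test_flag(&flags, DEMO_TIF_SINGLESTEP));
}
```

Calling demo_selfcheck() from a small main() exercises both halves.  The
replay is sequential on purpose so the outcome is deterministic; in real code
the same interleaving would only happen under a race.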
 arch/um/include/asm/thread_info.h | 2 ++
 arch/um/kernel/exec.c             | 2 +-
 arch/um/kernel/process.c          | 2 +-
 arch/um/kernel/ptrace.c           | 8 ++++----
 arch/um/kernel/signal.c           | 4 ++--
 include/linux/ptrace.h            | 1 -
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	7
 #define TIF_NOTIFY_RESUME	8
 #define TIF_SECCOMP		9	/* secure computing */
+#define TIF_SINGLESTEP		10	/* single stepping userspace */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 
 #endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
 {
 	PT_REGS_IP(regs) = eip;
 	PT_REGS_SP(regs) = esp;
-	current->ptrace &= ~PT_DTRACE;
+	clear_thread_flag(TIF_SINGLESTEP);
 #ifdef SUBARCH_EXECVE1
 	SUBARCH_EXECVE1(regs->regs);
 #endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
 {
 	struct task_struct *task = t ? t : current;
 
-	if (!(task->ptrace & PT_DTRACE))
+	if (!test_thread_flag(TIF_SINGLESTEP))
 		return 0;
 
 	if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_DTRACE;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_DTRACE;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
 }
 
 /*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
  * PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
  */
 int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	audit_syscall_exit(regs);
 
 	/* Fake a debug trap */
-	if (ptraced & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		send_sigtrap(&regs->regs, 0);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	unsigned long sp;
 	int err;
 
-	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+	if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
 		singlestep = 1;
 
 	/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
 	 * on the host.  The tracing thread will check this flag and
 	 * PTRACE_SYSCALL if necessary.
 	 */
-	if (current->ptrace & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		current->thread.singlestep_syscall =
 			is_syscall(PT_REGS_IP(&current->thread.regs));
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
-#define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 
 #define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 04/11] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

xtensa is the last user of the PT_SINGLESTEP flag.  Changing tsk->ptrace in
user_enable_single_step and user_disable_single_step without locking could
potentially cause problems.

So use a thread info flag instead of a flag in tsk->ptrace.  Use TIF_SINGLESTEP,
which xtensa already defined but never used.

Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users.

Cc: stable@vger.kernel.org
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/xtensa/kernel/ptrace.c | 4 ++--
 arch/xtensa/kernel/signal.c | 4 ++--
 include/linux/ptrace.h      | 6 ------
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/xtensa/kernel/ptrace.c b/arch/xtensa/kernel/ptrace.c
index 323c678a691f..b952e67cc0cc 100644
--- a/arch/xtensa/kernel/ptrace.c
+++ b/arch/xtensa/kernel/ptrace.c
@@ -225,12 +225,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_SINGLESTEP;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_SINGLESTEP;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 /*
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index 6f68649e86ba..ac50ec46c8f1 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -473,7 +473,7 @@ static void do_signal(struct pt_regs *regs)
 		/* Set up the stack frame */
 		ret = setup_frame(&ksig, sigmask_to_save(), regs);
 		signal_setup_done(ret, &ksig, 0);
-		if (current->ptrace & PT_SINGLESTEP)
+		if (test_thread_flag(TIF_SINGLESTEP))
 			task_pt_regs(current)->icountlevel = 1;
 
 		return;
@@ -499,7 +499,7 @@ static void do_signal(struct pt_regs *regs)
 	/* If there's no signal to deliver, we just restore the saved mask.  */
 	restore_saved_sigmask();
 
-	if (current->ptrace & PT_SINGLESTEP)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		task_pt_regs(current)->icountlevel = 1;
 	return;
 }
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4c06f9f8ef3f..c952c5ba8fab 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
 
-/* single stepping state bits (used on ARM and PA-RISC) */
-#define PT_SINGLESTEP_BIT	31
-#define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
-#define PT_BLOCKSTEP_BIT	30
-#define PT_BLOCKSTEP		(1<<PT_BLOCKSTEP_BIT)
-
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 05/11] ptrace: Remove arch_ptrace_attach
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Judging by the comments and the code, ptrace_attach_sync_user_rbs has the
sole purpose of saving registers to the stack when ptrace_attach changes
TASK_STOPPED to TASK_TRACED.  In all other cases arch_ptrace_stop takes care
of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary.  So just
remove it.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
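The argument above can be pictured with a toy state machine.  This is only a
sketch of the reasoning in the changelog, not of the real ptrace code: the
demo_* names, the three states and the regs_synced field are invented for
illustration.  The model also ignores what happens when attaching to a running
task, where the real kernel arranges for a later stop.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_state { DEMO_RUNNING, DEMO_STOPPED, DEMO_TRACED };

struct demo_task {
	enum demo_state state;
	bool ptraced;
	bool regs_synced;	/* stands in for "RBS flushed to user memory" */
};

/* Stand-in for the arch hook that saves register state on a tracing stop. */
static void demo_arch_ptrace_stop(struct demo_task *t)
{
	t->regs_synced = true;
}

/* Every tracing stop runs the arch hook before parking the task. */
static void demo_ptrace_stop(struct demo_task *t)
{
	demo_arch_ptrace_stop(t);
	t->state = DEMO_TRACED;
}

/*
 * Attach as the changelog describes it: a task that is already in a job
 * control stop is woken and re-enters the stop path as a tracee, instead
 * of having its state flipped behind its back.
 */
static void demo_ptrace_attach(struct demo_task *t)
{
	t->ptraced = true;
	if (t->state == DEMO_STOPPED) {
		t->state = DEMO_RUNNING;	/* wake up */
		demo_ptrace_stop(t);		/* normal tracing stop */
	}
}

/* Attaching to a stopped task leaves it traced with registers saved. */
static void demo_selfcheck(void)
{
	struct demo_task t = { DEMO_STOPPED, false, false };

	demo_ptrace_attach(&t);
	assert(t.state == DEMO_TRACED);
	assert(t.regs_synced);
}
```

In this model the stopped-at-attach case already passes through the same hook
as every other tracing stop, so a separate attach-time hook has nothing left
to do.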
 arch/ia64/include/asm/ptrace.h |  4 ---
 arch/ia64/kernel/ptrace.c      | 57 ----------------------------------
 kernel/ptrace.c                | 18 -----------
 3 files changed, 79 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index a10a498eede1..402874489890 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -139,10 +139,6 @@ static inline long regs_return_value(struct pt_regs *regs)
   #define arch_ptrace_stop_needed() \
 	(!test_thread_flag(TIF_RESTORE_RSE))
 
-  extern void ptrace_attach_sync_user_rbs (struct task_struct *);
-  #define arch_ptrace_attach(child) \
-	ptrace_attach_sync_user_rbs(child)
-
   #define arch_has_single_step()  (1)
   #define arch_has_block_step()   (1)
 
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index a19acd9f5e1f..a45f529046c3 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -617,63 +617,6 @@ void ia64_sync_krbs(void)
 	unw_init_running(do_sync_rbs, ia64_sync_kernel_rbs);
 }
 
-/*
- * After PTRACE_ATTACH, a thread's register backing store area in user
- * space is assumed to contain correct data whenever the thread is
- * stopped.  arch_ptrace_stop takes care of this on tracing stops.
- * But if the child was already stopped for job control when we attach
- * to it, then it might not ever get into ptrace_stop by the time we
- * want to examine the user memory containing the RBS.
- */
-void
-ptrace_attach_sync_user_rbs (struct task_struct *child)
-{
-	int stopped = 0;
-	struct unw_frame_info info;
-
-	/*
-	 * If the child is in TASK_STOPPED, we need to change that to
-	 * TASK_TRACED momentarily while we operate on it.  This ensures
-	 * that the child won't be woken up and return to user mode while
-	 * we are doing the sync.  (It can only be woken up for SIGKILL.)
-	 */
-
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_STOPPED &&
-		    !test_and_set_tsk_thread_flag(child, TIF_RESTORE_RSE)) {
-			set_notify_resume(child);
-
-			WRITE_ONCE(child->__state, TASK_TRACED);
-			stopped = 1;
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-
-	if (!stopped)
-		return;
-
-	unw_init_from_blocked_task(&info, child);
-	do_sync_rbs(&info, ia64_sync_user_rbs);
-
-	/*
-	 * Now move the child back into TASK_STOPPED if it should be in a
-	 * job control stop, so that SIGCONT can be used to wake it up.
-	 */
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_TRACED &&
-		    (child->signal->flags & SIGNAL_STOP_STOPPED)) {
-			WRITE_ONCE(child->__state, TASK_STOPPED);
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-}
-
 /*
  * Write f32-f127 back to task->thread.fph if it has been modified.
  */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..da30dcd477a0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1285,10 +1285,6 @@ int ptrace_request(struct task_struct *child, long request,
 	return ret;
 }
 
-#ifndef arch_ptrace_attach
-#define arch_ptrace_attach(child)	do { } while (0)
-#endif
-
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -1297,8 +1293,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_TRACEME) {
 		ret = ptrace_traceme();
-		if (!ret)
-			arch_ptrace_attach(current);
 		goto out;
 	}
 
@@ -1310,12 +1304,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
@@ -1455,12 +1443,6 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 05/11] ptrace: Remove arch_ptrace_attach
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Judging by the comments and the code, ptrace_attach_sync_user_rbs has the
sole purpose of saving registers to the stack when ptrace_attach changes
TASK_STOPPED to TASK_TRACED.  In all other cases arch_ptrace_stop takes care
of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary.  So just
remove it.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/ia64/include/asm/ptrace.h |  4 ---
 arch/ia64/kernel/ptrace.c      | 57 ----------------------------------
 kernel/ptrace.c                | 18 -----------
 3 files changed, 79 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index a10a498eede1..402874489890 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -139,10 +139,6 @@ static inline long regs_return_value(struct pt_regs *regs)
   #define arch_ptrace_stop_needed() \
 	(!test_thread_flag(TIF_RESTORE_RSE))
 
-  extern void ptrace_attach_sync_user_rbs (struct task_struct *);
-  #define arch_ptrace_attach(child) \
-	ptrace_attach_sync_user_rbs(child)
-
   #define arch_has_single_step()  (1)
   #define arch_has_block_step()   (1)
 
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index a19acd9f5e1f..a45f529046c3 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -617,63 +617,6 @@ void ia64_sync_krbs(void)
 	unw_init_running(do_sync_rbs, ia64_sync_kernel_rbs);
 }
 
-/*
- * After PTRACE_ATTACH, a thread's register backing store area in user
- * space is assumed to contain correct data whenever the thread is
- * stopped.  arch_ptrace_stop takes care of this on tracing stops.
- * But if the child was already stopped for job control when we attach
- * to it, then it might not ever get into ptrace_stop by the time we
- * want to examine the user memory containing the RBS.
- */
-void
-ptrace_attach_sync_user_rbs (struct task_struct *child)
-{
-	int stopped = 0;
-	struct unw_frame_info info;
-
-	/*
-	 * If the child is in TASK_STOPPED, we need to change that to
-	 * TASK_TRACED momentarily while we operate on it.  This ensures
-	 * that the child won't be woken up and return to user mode while
-	 * we are doing the sync.  (It can only be woken up for SIGKILL.)
-	 */
-
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_STOPPED &&
-		    !test_and_set_tsk_thread_flag(child, TIF_RESTORE_RSE)) {
-			set_notify_resume(child);
-
-			WRITE_ONCE(child->__state, TASK_TRACED);
-			stopped = 1;
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-
-	if (!stopped)
-		return;
-
-	unw_init_from_blocked_task(&info, child);
-	do_sync_rbs(&info, ia64_sync_user_rbs);
-
-	/*
-	 * Now move the child back into TASK_STOPPED if it should be in a
-	 * job control stop, so that SIGCONT can be used to wake it up.
-	 */
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_TRACED &&
-		    (child->signal->flags & SIGNAL_STOP_STOPPED)) {
-			WRITE_ONCE(child->__state, TASK_STOPPED);
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-}
-
 /*
  * Write f32-f127 back to task->thread.fph if it has been modified.
  */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..da30dcd477a0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1285,10 +1285,6 @@ int ptrace_request(struct task_struct *child, long request,
 	return ret;
 }
 
-#ifndef arch_ptrace_attach
-#define arch_ptrace_attach(child)	do { } while (0)
-#endif
-
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -1297,8 +1293,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_TRACEME) {
 		ret = ptrace_traceme();
-		if (!ret)
-			arch_ptrace_attach(current);
 		goto out;
 	}
 
@@ -1310,12 +1304,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
@@ -1455,12 +1443,6 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 05/11] ptrace: Remove arch_ptrace_attach
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Reading the comments and examining the code, ptrace_attach_sync_user_rbs
has the sole purpose of saving registers to the stack when ptrace_attach
changes TASK_STOPPED to TASK_TRACED.  In all other cases arch_ptrace_stop
takes care of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary.  So just
remove it.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/ia64/include/asm/ptrace.h |  4 ---
 arch/ia64/kernel/ptrace.c      | 57 ----------------------------------
 kernel/ptrace.c                | 18 -----------
 3 files changed, 79 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index a10a498eede1..402874489890 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -139,10 +139,6 @@ static inline long regs_return_value(struct pt_regs *regs)
   #define arch_ptrace_stop_needed() \
 	(!test_thread_flag(TIF_RESTORE_RSE))
 
-  extern void ptrace_attach_sync_user_rbs (struct task_struct *);
-  #define arch_ptrace_attach(child) \
-	ptrace_attach_sync_user_rbs(child)
-
   #define arch_has_single_step()  (1)
   #define arch_has_block_step()   (1)
 
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index a19acd9f5e1f..a45f529046c3 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -617,63 +617,6 @@ void ia64_sync_krbs(void)
 	unw_init_running(do_sync_rbs, ia64_sync_kernel_rbs);
 }
 
-/*
- * After PTRACE_ATTACH, a thread's register backing store area in user
- * space is assumed to contain correct data whenever the thread is
- * stopped.  arch_ptrace_stop takes care of this on tracing stops.
- * But if the child was already stopped for job control when we attach
- * to it, then it might not ever get into ptrace_stop by the time we
- * want to examine the user memory containing the RBS.
- */
-void
-ptrace_attach_sync_user_rbs (struct task_struct *child)
-{
-	int stopped = 0;
-	struct unw_frame_info info;
-
-	/*
-	 * If the child is in TASK_STOPPED, we need to change that to
-	 * TASK_TRACED momentarily while we operate on it.  This ensures
-	 * that the child won't be woken up and return to user mode while
-	 * we are doing the sync.  (It can only be woken up for SIGKILL.)
-	 */
-
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_STOPPED &&
-		    !test_and_set_tsk_thread_flag(child, TIF_RESTORE_RSE)) {
-			set_notify_resume(child);
-
-			WRITE_ONCE(child->__state, TASK_TRACED);
-			stopped = 1;
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-
-	if (!stopped)
-		return;
-
-	unw_init_from_blocked_task(&info, child);
-	do_sync_rbs(&info, ia64_sync_user_rbs);
-
-	/*
-	 * Now move the child back into TASK_STOPPED if it should be in a
-	 * job control stop, so that SIGCONT can be used to wake it up.
-	 */
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_TRACED &&
-		    (child->signal->flags & SIGNAL_STOP_STOPPED)) {
-			WRITE_ONCE(child->__state, TASK_STOPPED);
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-}
-
 /*
  * Write f32-f127 back to task->thread.fph if it has been modified.
  */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..da30dcd477a0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1285,10 +1285,6 @@ int ptrace_request(struct task_struct *child, long request,
 	return ret;
 }
 
-#ifndef arch_ptrace_attach
-#define arch_ptrace_attach(child)	do { } while (0)
-#endif
-
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -1297,8 +1293,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_TRACEME) {
 		ret = ptrace_traceme();
-		if (!ret)
-			arch_ptrace_attach(current);
 		goto out;
 	}
 
@@ -1310,12 +1304,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
@@ -1455,12 +1443,6 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
-- 
2.35.3
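
As a rough illustration of the reasoning in the commit message above,
the following userspace C sketch models the two attach behaviours it
describes.  This is not kernel code: toy_task, rbs_synced,
attach_direct and attach_via_ptrace_stop are hypothetical names, and
only the state names are borrowed from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy task states; the names mirror the kernel's, the values do not. */
enum toy_state { TOY_RUNNING, TOY_STOPPED, TOY_TRACED };

struct toy_task {
	enum toy_state state;
	bool rbs_synced;	/* stands in for "registers saved to the stack" */
};

/* Stand-in for arch_ptrace_stop(): register saving on a tracing stop. */
void toy_arch_ptrace_stop(struct toy_task *t)
{
	t->rbs_synced = true;
}

/* Stand-in for ptrace_stop(): runs the arch hook, then stops as traced. */
void toy_ptrace_stop(struct toy_task *t)
{
	toy_arch_ptrace_stop(t);
	t->state = TOY_TRACED;
}

/*
 * Attach as the commit message describes it before d79fdd6d96f4: a task
 * in a job control stop has its state flipped directly, so the arch hook
 * never runs and a separate sync step would be needed.
 */
void attach_direct(struct toy_task *t)
{
	if (t->state == TOY_STOPPED)
		t->state = TOY_TRACED;
}

/*
 * Attach as described after d79fdd6d96f4: the stopped task is woken and
 * goes through the normal stop path, which already does the saving.
 */
void attach_via_ptrace_stop(struct toy_task *t)
{
	if (t->state == TOY_STOPPED) {
		t->state = TOY_RUNNING;
		toy_ptrace_stop(t);
	}
}
```

In this model only attach_direct leaves a traced task with rbs_synced
still false, which is the one case the removed ia64 hook covered.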

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 06/11] signal: Use lockdep_assert_held instead of assert_spin_locked
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The distinction is that assert_spin_locked() checks if the lock is
held *by*anyone* whereas lockdep_assert_held() asserts the current
context holds the lock.  Also, the check goes away if you build
without lockdep.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..3fd2ce133387 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -884,7 +884,7 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -1079,7 +1079,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	int override_rlimit;
 	int ret = 0, result;
 
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	result = TRACE_SIGNAL_IGNORED;
 	if (!prepare_signal(sig, t, force))
-- 
2.35.3
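
The distinction drawn in the commit message above can be sketched with
a toy lock in userspace C.  This is only a model under stated
assumptions: toy_lock and its helpers are hypothetical, contexts are
plain integer ids, and the fact that the lockdep check disappears in
builds without lockdep is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that records its owner; context ids are ints, 0 = unlocked. */
struct toy_lock {
	int owner;
};

void toy_lock_acquire(struct toy_lock *l, int ctx)
{
	assert(l->owner == 0);	/* the toy never contends */
	l->owner = ctx;
}

void toy_lock_release(struct toy_lock *l, int ctx)
{
	assert(l->owner == ctx);
	l->owner = 0;
}

/* Analogue of assert_spin_locked(): true if anyone holds the lock. */
bool toy_is_locked(const struct toy_lock *l)
{
	return l->owner != 0;
}

/* Analogue of lockdep_assert_held(): true only if ctx itself holds it. */
bool toy_is_held_by(const struct toy_lock *l, int ctx)
{
	return l->owner == ctx;
}
```

If context 1 holds the lock, a buggy caller running as context 2 still
satisfies toy_is_locked() but fails toy_is_held_by(), which is the
class of bug the stricter assertion is meant to catch.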


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 07/11] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable, Al Viro

The current implementation of PTRACE_KILL is buggy and has been for
many years as it assumes its target has stopped in ptrace_stop.  At a
quick skim it looks like this assumption has existed since ptrace
support was added in linux v1.0.

While PTRACE_KILL has been deprecated, we cannot remove it as
a quick search with google code search reveals many existing
programs calling it.

When the ptracee is not stopped at ptrace_stop some fields would be
set that are ignored except in ptrace_stop, making the userspace
visible behavior of PTRACE_KILL a no-op in those cases.

As the usual rules are not obeyed it is not clear what the
consequences are of calling PTRACE_KILL on a running process.
Presumably userspace does not do this as it achieves nothing.

Replace the implementation of PTRACE_KILL with a simple
send_sig_info(SIGKILL) followed by a return 0.  This changes the
observable user space behavior only in that PTRACE_KILL on a process
not stopped in ptrace_stop will also kill it.  As that has always
been the intent of the code this seems like a reasonable change.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/kernel/step.c | 3 +--
 kernel/ptrace.c        | 5 ++---
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 0f3c307b37b3..8e2b2552b5ee 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -180,8 +180,7 @@ void set_task_blockstep(struct task_struct *task, bool on)
 	 *
 	 * NOTE: this means that set/clear TIF_BLOCKSTEP is only safe if
 	 * task is current or it can't be running, otherwise we can race
-	 * with __switch_to_xtra(). We rely on ptrace_freeze_traced() but
-	 * PTRACE_KILL is not safe.
+	 * with __switch_to_xtra(). We rely on ptrace_freeze_traced().
 	 */
 	local_irq_disable();
 	debugctl = get_debugctlmsr();
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index da30dcd477a0..7105821595bc 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1236,9 +1236,8 @@ int ptrace_request(struct task_struct *child, long request,
 		return ptrace_resume(child, request, data);
 
 	case PTRACE_KILL:
-		if (child->exit_state)	/* already dead */
-			return 0;
-		return ptrace_resume(child, request, SIGKILL);
+		send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
+		return 0;
 
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	case PTRACE_GETREGSET:
-- 
2.35.3
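
The behavioural change described in the commit message above can be
sketched as a small userspace model.  This is an illustration only, not
the kernel implementation: toy_tracee, its fields and both functions
are hypothetical, and the old early return for an already dead child is
left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SIGKILL 9	/* illustrative constant, not taken from a header */

struct toy_tracee {
	bool in_ptrace_stop;	/* sleeping in the toy ptrace_stop */
	int resume_sig;		/* only read when leaving ptrace_stop */
	bool killed;
};

/*
 * Old behaviour as described: store the signal to resume with and wake
 * the tracee.  The field is only consumed on the way out of ptrace_stop,
 * so for a tracee that is anywhere else nothing visible happens.
 */
int old_ptrace_kill(struct toy_tracee *t)
{
	t->resume_sig = TOY_SIGKILL;
	if (t->in_ptrace_stop) {
		t->in_ptrace_stop = false;
		t->killed = (t->resume_sig == TOY_SIGKILL);
	}
	return 0;
}

/* New behaviour: SIGKILL is delivered wherever the tracee happens to be. */
int new_ptrace_kill(struct toy_tracee *t)
{
	t->in_ptrace_stop = false;
	t->killed = true;
	return 0;
}
```

Both functions return 0 in every case; the only difference the model
shows is that new_ptrace_kill() also kills a tracee that is not stopped
in ptrace_stop.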


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spurious SIGTRAPs
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Long ago and far away there was a BUG_ON at the start of ptrace_stop
that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1].  The BUG_ON
had never triggered but examination of the code showed that the BUG_ON
could actually trigger.  To complement removing the BUG_ON an attempt
to better handle the race was added.

The code detected the tracer had gone away and did not call
do_notify_parent_cldstop.  The code also attempted to prevent
ptrace_report_syscall from sending spurious SIGTRAPs when the tracer
went away.

The code to detect when the tracer had gone away before sending a
signal to the tracer was a legitimate fix and continues to work to this
day.

The code to prevent sending spurious SIGTRAPs is a failure.  At the
time and until today the code only catches it when the tracer goes
away after siglock is dropped and before read_lock is acquired.  If
the tracer goes away after read_lock is dropped a spurious SIGTRAP can
still be sent to the tracee.  The tracer going away after read_lock
is dropped is the far likelier case as it is the bigger window.

Given that the attempt to prevent the generation of a SIGTRAP was a
failure and continues to be a failure, remove the code that attempts to
do that.  This simplifies the code in ptrace_stop and makes
ptrace_stop much easier to reason about.

To successfully deal with the tracer going away, all of the tracer's
instrumentation of the child would need to be removed, and reliably
detecting when the tracer has set a signal to continue with would need
to be implemented.

With the removal of the incomplete detection of the tracer going away
in ptrace_stop, ptrace_stop always sleeps in schedule after
ptrace_freeze_traced succeeds.  Modify ptrace_check_attach to
warn if wait_task_inactive fails.

[1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 ++-------
 kernel/signal.c | 81 ++++++++++++++++++-------------------------------
 2 files changed, 33 insertions(+), 62 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7105821595bc..05953ac9f7bd 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+		ret = -ESRCH;
 
 	return ret;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..16828fde5424 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2187,8 +2187,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  * with.  If the code did not stop because the tracer is gone,
  * the stop signal remains unchanged unless clear_code.
  */
-static int ptrace_stop(int exit_code, int why, int clear_code,
-			unsigned long message, kernel_siginfo_t *info)
+static int ptrace_stop(int exit_code, int why, unsigned long message,
+		       kernel_siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
@@ -2259,54 +2259,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
-
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
@@ -2343,7 +2322,7 @@ static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long mes
 	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
-	return ptrace_stop(exit_code, why, 1, message, &info);
+	return ptrace_stop(exit_code, why, message, &info);
 }
 
 int ptrace_notify(int exit_code, unsigned long message)
@@ -2515,7 +2494,7 @@ static void do_jobctl_trap(void)
 				 CLD_STOPPED, 0);
 	} else {
 		WARN_ON_ONCE(!signr);
-		ptrace_stop(signr, CLD_STOPPED, 0, 0, NULL);
+		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
 	}
 }
 
@@ -2568,7 +2547,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	 * comment in dequeue_signal().
 	 */
 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
-	signr = ptrace_stop(signr, CLD_TRAPPED, 0, 0, info);
+	signr = ptrace_stop(signr, CLD_TRAPPED, 0, info);
 
 	/* We're back.  Did the debugger cancel the sig?  */
 	if (signr == 0)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spurious SIGTRAPs
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Long ago and far away there was a BUG_ON at the start of ptrace_stop
that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1].  The BUG_ON
had never triggered, but examination of the code showed that it
could actually trigger.  Along with removing the BUG_ON, an attempt
to better handle the race was added.

The code detected the tracer had gone away and did not call
do_notify_parent_cldstop.  The code also attempted to prevent
ptrace_report_syscall from sending spurious SIGTRAPs when the tracer
went away.

The code to detect when the tracer had gone away before sending a
signal to the tracer was a legitimate fix and continues to work to
this day.

The code to prevent sending spurious SIGTRAPs is a failure.  From the
time it was added until today, it only catches the case where the
tracer goes away after siglock is dropped and before tasklist_lock is
read-locked.  If the tracer goes away after the read lock is dropped,
a spurious SIGTRAP can still be sent to the tracee.  The tracer going
away after the read lock is dropped is by far the likelier case, as it
is the bigger window.

Given that the attempt to prevent the generation of a SIGTRAP was a
failure and continues to be a failure, remove the code that attempts
to do that.  This simplifies the code in ptrace_stop and makes
ptrace_stop much easier to reason about.
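
As a rough illustration only (it is not part of the patch), the
notification half of this simplification can be checked in a small
user-space model.  The names old_notify(), new_notify() and
check_notify_equivalence() are hypothetical; the two booleans in
struct notify stand for the do_notify_parent_cldstop() calls made with
"true" and "false" as the second argument.  The model covers only who
gets notified, not the removed TASK_RUNNING / exit_code handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Which do_notify_parent_cldstop() calls would be made in this model. */
struct notify {
	bool tracer;		/* the call with second argument "true" */
	bool real_parent;	/* the call with second argument "false" */
};

/* Branch structure before the patch. */
struct notify old_notify(bool ptraced, bool gstop_done, bool reparented)
{
	struct notify n = { false, false };

	if (ptraced) {
		n.tracer = true;
		if (gstop_done && reparented)
			n.real_parent = true;
	} else if (gstop_done) {
		n.real_parent = true;
	}
	return n;
}

/* Flattened conditions after the patch. */
struct notify new_notify(bool ptraced, bool gstop_done, bool reparented)
{
	struct notify n = { false, false };

	if (ptraced)
		n.tracer = true;
	if (gstop_done && (!ptraced || reparented))
		n.real_parent = true;
	return n;
}

/* Walk all eight input combinations and compare both versions. */
void check_notify_equivalence(void)
{
	for (int bits = 0; bits < 8; bits++) {
		bool ptraced = bits & 1;
		bool gstop_done = bits & 2;
		bool reparented = bits & 4;
		struct notify a = old_notify(ptraced, gstop_done, reparented);
		struct notify b = new_notify(ptraced, gstop_done, reparented);

		assert(a.tracer == b.tracer);
		assert(a.real_parent == b.real_parent);
	}
}
```

Calling check_notify_equivalence() from a test harness exercises every
combination; under this model the two versions notify the same parties,
so the behavioural change is confined to always sleeping.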

To successfully deal with the tracer going away, all of the tracer's
instrumentation of the child would need to be removed, and reliably
detecting when the tracer has set a signal to continue with would need
to be implemented.

With the removal of the incomplete detection of the tracer going away
in ptrace_stop, ptrace_stop always sleeps in schedule after
ptrace_freeze_traced succeeds.  Modify ptrace_check_attach to
warn if wait_task_inactive fails.

[1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 14 ++-------
 kernel/signal.c | 81 ++++++++++++++++++-------------------------------
 2 files changed, 33 insertions(+), 62 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7105821595bc..05953ac9f7bd 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+		ret = -ESRCH;
 
 	return ret;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..16828fde5424 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2187,8 +2187,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  * with.  If the code did not stop because the tracer is gone,
  * the stop signal remains unchanged unless clear_code.
  */
-static int ptrace_stop(int exit_code, int why, int clear_code,
-			unsigned long message, kernel_siginfo_t *info)
+static int ptrace_stop(int exit_code, int why, unsigned long message,
+		       kernel_siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
@@ -2259,54 +2259,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
-
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
@@ -2343,7 +2322,7 @@ static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long mes
 	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
-	return ptrace_stop(exit_code, why, 1, message, &info);
+	return ptrace_stop(exit_code, why, message, &info);
 }
 
 int ptrace_notify(int exit_code, unsigned long message)
@@ -2515,7 +2494,7 @@ static void do_jobctl_trap(void)
 				 CLD_STOPPED, 0);
 	} else {
 		WARN_ON_ONCE(!signr);
-		ptrace_stop(signr, CLD_STOPPED, 0, 0, NULL);
+		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
 	}
 }
 
@@ -2568,7 +2547,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	 * comment in dequeue_signal().
 	 */
 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
-	signr = ptrace_stop(signr, CLD_TRAPPED, 0, 0, info);
+	signr = ptrace_stop(signr, CLD_TRAPPED, 0, info);
 
 	/* We're back.  Did the debugger cancel the sig?  */
 	if (signr == 0)
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 09/11] ptrace: Don't change __state
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN.  This new flag is
set in ptrace_freeze_traced and cleared when ptrace_stop is awoken or
in ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wake up is for a fatal signal.  Skip adding them when
JOBCTL_PTRACE_FROZEN is set.  This has the same effect as
changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use
TASK_KILLABLE go through signal_wake_up.
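
The mask computation described above can be sketched in user space as
follows.  This is a model under assumptions, not kernel code: the M_*
constants are meant to mirror the values in include/linux/sched.h and
jobctl.h, struct wake_model_task and both functions are hypothetical,
and the unconditional TASK_INTERRUPTIBLE bit reflects what
signal_wake_up_state is understood to add.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's task state and jobctl bits. */
#define M_TASK_INTERRUPTIBLE	0x0001u
#define M_TASK_UNINTERRUPTIBLE	0x0002u
#define M_TASK_TRACED		0x0008u	/* stands in for __TASK_TRACED */
#define M_TASK_WAKEKILL		0x0100u
#define M_JOBCTL_PTRACE_FROZEN	(1UL << 24)

/* Hypothetical stand-in for the two task_struct fields involved. */
struct wake_model_task {
	unsigned int state;
	unsigned long jobctl;
};

/* States that a signal wake up would be allowed to wake. */
unsigned int model_wake_mask(const struct wake_model_task *t, bool fatal)
{
	assert(t);
	/* A frozen tracee is not woken, even by a fatal signal. */
	fatal = fatal && !(t->jobctl & M_JOBCTL_PTRACE_FROZEN);
	return M_TASK_INTERRUPTIBLE |
	       (fatal ? (M_TASK_WAKEKILL | M_TASK_TRACED) : 0u);
}

/* True when the task's current state intersects the wake mask. */
bool model_would_wake(const struct wake_model_task *t, bool fatal)
{
	return (t->state & model_wake_mask(t, fatal)) != 0;
}
```

In this model a task in the new TASK_TRACED state is woken by a fatal
signal only while JOBCTL_PTRACE_FROZEN is clear, and a TASK_KILLABLE
sleeper is still woken through the TASK_WAKEKILL bit.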

Handle ptrace_stop being called with a pending fatal signal.
Previously it would have been handled by schedule simply failing to
sleep.  As TASK_WAKEKILL is no longer part of TASK_TRACED, schedule
will sleep with a fatal_signal_pending.  The code in signal_wake_up
guarantees that the task will be awakened by any fatal signal that
comes after TASK_TRACED is set.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop.  Now when woken up ptrace_stop clears
JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreeze_traced
clears JOBCTL_PTRACE_FROZEN.
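
The flag's lifecycle described above can be sketched as a user-space
model.  Everything here is hypothetical naming (struct
freeze_model_task, model_freeze(), model_unfreeze(),
model_stop_wakeup()); locking and the looks_like_a_spurious_pid check
are deliberately left out, and the M_* values are assumed to mirror the
kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

#define M_TASK_RUNNING		0x0000u
#define M_TASK_TRACED		0x0008u	/* new TASK_TRACED, no WAKEKILL */
#define M_JOBCTL_PTRACE_FROZEN	(1UL << 24)

/* Hypothetical stand-in for the task_struct fields involved. */
struct freeze_model_task {
	unsigned int state;
	unsigned long jobctl;
	bool fatal_pending;
};

/* Tracer side: freeze a traced tracee that is not already dying. */
bool model_freeze(struct freeze_model_task *t)
{
	assert(t);
	if (t->state != M_TASK_TRACED || t->fatal_pending)
		return false;
	t->jobctl |= M_JOBCTL_PTRACE_FROZEN;	/* state is left untouched */
	return true;
}

/* Tracer side: unfreeze, delivering a fatal wake up that was held off. */
void model_unfreeze(struct freeze_model_task *t)
{
	assert(t);
	t->jobctl &= ~M_JOBCTL_PTRACE_FROZEN;
	if (t->fatal_pending && t->state == M_TASK_TRACED)
		t->state = M_TASK_RUNNING;
}

/* Tracee side: what the stop path does once it has been woken. */
void model_stop_wakeup(struct freeze_model_task *t)
{
	assert(t);
	t->state = M_TASK_RUNNING;
	t->jobctl &= ~M_JOBCTL_PTRACE_FROZEN;
}
```

The point of the model is that freezing and unfreezing only touch the
jobctl bit; the state field changes solely through a wake up.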

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched.h        |  2 +-
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  5 +++--
 kernel/ptrace.c              | 21 ++++++++-------------
 kernel/sched/core.c          |  5 +----
 kernel/signal.c              | 10 +++++++---
 6 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED			__TASK_TRACED
 
 #define TASK_IDLE			(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
 
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..e66948abbee4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -435,9 +435,10 @@ extern void calculate_sigpending(void);
 
 extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
-static inline void signal_wake_up(struct task_struct *t, bool resume)
+static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
+	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 05953ac9f7bd..83ed28262708 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
-		return;
-
-	WARN_ON(!task->ptrace || task->parent != current);
+	unsigned long flags;
 
 	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
+	 * The child may be awake and may have cleared
+	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
+	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
 	 */
-	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (lock_task_sighand(task, &flags)) {
+		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+		unlock_task_sighand(task, &flags);
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 }
 
 /**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d575b4914925..3c351707e830 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6304,10 +6304,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
diff --git a/kernel/signal.c b/kernel/signal.c
index 16828fde5424..e0b416b21ad3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2210,9 +2210,13 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	}
 
 	/*
-	 * schedule() will not sleep if there is a pending signal that
-	 * can awaken the task.
+	 * After this point signal_wake_up will clear TASK_TRACED
+	 * if a fatal signal comes in.  Handle previous fatal signals
+	 * here to prevent ptrace_stop sleeping in schedule.
 	 */
+	if (__fatal_signal_pending(current))
+		return exit_code;
+
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2300,7 +2304,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 09/11] ptrace: Don't change __state
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN.  This new flag is
set in ptrace_freeze_traced and cleared when ptrace_stop is awoken or in
ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wake up is for a fatal signal.  Skip adding __TASK_TRACED
when JOBCTL_PTRACE_FROZEN is set.  This has the same effect as
changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use
TASK_KILLABLE go through signal_wake_up.
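
A minimal userspace sketch of the mask selection described above, not the
kernel code itself.  The MODEL_* names and struct model_task are
hypothetical stand-ins, and the bit values are illustrative assumptions:

```c
#include <stdbool.h>

/* Illustrative values only; the real definitions live in
 * include/linux/sched.h and include/linux/sched/jobctl.h. */
#define MODEL_TASK_TRACED		0x0008u
#define MODEL_TASK_WAKEKILL		0x0100u
#define MODEL_JOBCTL_PTRACE_FROZEN	(1UL << 24)

/* Hypothetical stand-in for the one task_struct field this sketch needs. */
struct model_task {
	unsigned long jobctl;
};

/*
 * Compute the extra wake-up mask that signal_wake_up() would pass on.
 * A fatal signal only adds the WAKEKILL and TRACED bits when the task
 * is not frozen for a ptrace operation.
 */
static inline unsigned int model_wake_mask(const struct model_task *t, bool fatal)
{
	fatal = fatal && !(t->jobctl & MODEL_JOBCTL_PTRACE_FROZEN);
	return fatal ? (MODEL_TASK_WAKEKILL | MODEL_TASK_TRACED) : 0;
}
```

With the flag clear, a fatal wake-up asks for both bits; with the flag set
the mask is 0, so a frozen tracee is left asleep until it is unfrozen.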

Handle a ptrace_stop being called with a pending fatal signal.
Previously it would have been handled by schedule simply failing to
sleep.  As TASK_WAKEKILL is no longer part of TASK_TRACED schedule
will sleep with a fatal_signal_pending.  The code in signal_wake_up
guarantees that the code will be awakened by any fatal signal that
comes after TASK_TRACED is set.
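
The early return can be pictured with a small userspace model.  This is
only a sketch under assumptions: struct model_stop_task, its fields and
the MODEL_STOP_* values are hypothetical, and a boolean stands in for
__fatal_signal_pending():

```c
#include <stdbool.h>

#define MODEL_STOP_RUNNING	0x0000u
#define MODEL_STOP_TRACED	0x0008u	/* illustrative value */

/* Hypothetical stand-in for the task fields this sketch needs. */
struct model_stop_task {
	unsigned int state;
	bool fatal_pending;
};

/*
 * Returns true if the task would go on to sleep in schedule().
 * A fatal signal that is already pending is handled before the state
 * is changed, because nothing else would wake the task afterwards.
 */
static inline bool model_ptrace_stop(struct model_stop_task *t)
{
	if (t->fatal_pending)
		return false;		/* bail out, stay runnable */

	t->state = MODEL_STOP_TRACED;	/* later fatal signals wake us */
	return true;
}
```

A task with a fatal signal already queued never leaves the running
state in this model; any other task commits to the traced state.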

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop.  Now when woken up ptrace_stop clears
JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreeze_traced
clears JOBCTL_PTRACE_FROZEN.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched.h        |  2 +-
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  5 +++--
 kernel/ptrace.c              | 21 ++++++++-------------
 kernel/sched/core.c          |  5 +----
 kernel/signal.c              | 10 +++++++---
 6 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED			__TASK_TRACED
 
 #define TASK_IDLE			(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
 
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..e66948abbee4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -435,9 +435,10 @@ extern void calculate_sigpending(void);
 
 extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
-static inline void signal_wake_up(struct task_struct *t, bool resume)
+static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
+	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 05953ac9f7bd..83ed28262708 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
-		return;
-
-	WARN_ON(!task->ptrace || task->parent != current);
+	unsigned long flags;
 
 	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
+	 * The child may be awake and may have cleared
+	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
+	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
 	 */
-	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (lock_task_sighand(task, &flags)) {
+		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+		unlock_task_sighand(task, &flags);
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 }
 
 /**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d575b4914925..3c351707e830 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6304,10 +6304,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
diff --git a/kernel/signal.c b/kernel/signal.c
index 16828fde5424..e0b416b21ad3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2210,9 +2210,13 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	}
 
 	/*
-	 * schedule() will not sleep if there is a pending signal that
-	 * can awaken the task.
+	 * After this point signal_wake_up will clear TASK_TRACED
+	 * if a fatal signal comes in.  Handle previous fatal signals
+	 * here to prevent ptrace_stop sleeping in schedule.
 	 */
+	if (__fatal_signal_pending(current))
+		return exit_code;
+
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2300,7 +2304,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 10/11] ptrace: Always take siglock in ptrace_resume
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Make code analysis simpler and future changes easier by
always taking siglock in ptrace_resume.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 83ed28262708..36a5b7a00d2f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -837,8 +837,6 @@ static long ptrace_get_rseq_configuration(struct task_struct *task,
 static int ptrace_resume(struct task_struct *child, long request,
 			 unsigned long data)
 {
-	bool need_siglock;
-
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -874,18 +872,11 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 * Note that we need siglock even if ->exit_code == data and/or this
 	 * status was not reported yet, the new status must not be cleared by
 	 * wait_task_stopped() after resume.
-	 *
-	 * If data == 0 we do not care if wait_task_stopped() reports the old
-	 * status and clears the code too; this can't race with the tracee, it
-	 * takes siglock after resume.
 	 */
-	need_siglock = data && !thread_group_empty(current);
-	if (need_siglock)
-		spin_lock_irq(&child->sighand->siglock);
+	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
 	wake_up_state(child, __TASK_TRACED);
-	if (need_siglock)
-		spin_unlock_irq(&child->sighand->siglock);
+	spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 11/11] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-04 22:40             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W . Biederman

From: Peter Zijlstra <peterz@infradead.org>

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.
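
A small userspace model of why the extra jobctl bits help.  It is a
sketch under assumptions, not kernel code: struct model_jc_task, the
MODEL_JC_* names and the "other" state value are hypothetical, and only
the bit number 27 is taken from the diff below:

```c
#include <stdbool.h>

#define MODEL_JC_TASK_TRACED	0x0008u		/* illustrative __state value */
#define MODEL_JC_TASK_OTHER	0x8000u		/* hypothetical state a freezer might store */
#define MODEL_JC_JOBCTL_TRACED	(1UL << 27)	/* bit number as in this patch */

/* Hypothetical stand-in for the two task_struct fields involved. */
struct model_jc_task {
	unsigned int state;
	unsigned long jobctl;
};

/* Entering the trap records the fact in both places. */
static inline void model_jc_ptrace_stop(struct model_jc_task *t)
{
	t->state = MODEL_JC_TASK_TRACED;
	t->jobctl |= MODEL_JC_JOBCTL_TRACED;
}

/* Something else (for example a freezer) replaces the sleep state. */
static inline void model_jc_overwrite_state(struct model_jc_task *t)
{
	t->state = MODEL_JC_TASK_OTHER;
}

/* Old-style check: looks only at the sleep state. */
static inline bool model_jc_traced_by_state(const struct model_jc_task *t)
{
	return (t->state & MODEL_JC_TASK_TRACED) != 0;
}

/* New-style check: looks only at jobctl. */
static inline bool model_jc_traced_by_jobctl(const struct model_jc_task *t)
{
	return (t->jobctl & MODEL_JC_JOBCTL_TRACED) != 0;
}

/* Resuming clears the jobctl bit and makes the task runnable again. */
static inline void model_jc_ptrace_resume(struct model_jc_task *t)
{
	t->jobctl &= ~MODEL_JC_JOBCTL_TRACED;
	t->state = 0;
}
```

Once the sleep state is overwritten, the state-based check forgets
that the task was traced while the jobctl-based check still reports it.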

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 19 +++++++++++++++----
 kernel/ptrace.c              | 16 +++++++++++++---
 kernel/signal.c              | 10 ++++++++--
 5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..68876d0a7ef9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -21,6 +21,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -31,6 +34,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e66948abbee4..07ba3404fcde 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -437,12 +439,21 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
-	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+		state = TASK_WAKEKILL | __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 36a5b7a00d2f..328a34a99124 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -875,6 +884,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index e0b416b21ad3..80108017783d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2218,6 +2222,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
+	current->jobctl |= JOBCTL_TRACED;
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2437,6 +2442,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 11/11] sched, signal, ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W . Biederman

From: Peter Zijlstra <peterz@infradead.org>

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There's two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would loose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add a unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 19 +++++++++++++++----
 kernel/ptrace.c              | 16 +++++++++++++---
 kernel/signal.c              | 10 ++++++++--
 5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..68876d0a7ef9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -21,6 +21,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -31,6 +34,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e66948abbee4..07ba3404fcde 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -437,12 +439,21 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
-	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+		state = TASK_WAKEKILL | __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 36a5b7a00d2f..328a34a99124 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~TASK_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -875,6 +884,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index e0b416b21ad3..80108017783d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2218,6 +2222,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
+	current->jobctl |= JOBCTL_TRACED;
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2437,6 +2442,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v3 11/11] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-05-04 22:40             ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-04 22:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W . Biederman

From: Peter Zijlstra <peterz@infradead.org>

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and would
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 19 +++++++++++++++----
 kernel/ptrace.c              | 16 +++++++++++++---
 kernel/signal.c              | 10 ++++++++--
 5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..68876d0a7ef9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -21,6 +21,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -31,6 +34,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e66948abbee4..07ba3404fcde 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -437,12 +439,21 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
-	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+		state = TASK_WAKEKILL | __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 36a5b7a00d2f..328a34a99124 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -875,6 +884,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index e0b416b21ad3..80108017783d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2218,6 +2222,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
+	current->jobctl |= JOBCTL_TRACED;
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2437,6 +2442,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread
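
The idea in the changelog above (mirror the stopped/traced condition in
task->jobctl so that it survives a rewrite of task->__state) can be sketched
as a small userspace model. This is only an illustration under stated
assumptions: struct model_task, the MODEL_* state values and the model_*()
helpers are hypothetical and do not exist in the kernel; only the two bit
numbers are copied from the jobctl.h hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit numbers copied from the jobctl.h hunk in the patch above. */
#define JOBCTL_STOPPED_BIT	26
#define JOBCTL_TRACED_BIT	27
#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)

/* Illustrative state values; NOT the kernel's real TASK_* encodings. */
enum { MODEL_RUNNING = 0, MODEL_STOPPED = 1, MODEL_FROZEN = 4 };

/* Hypothetical stand-in for the two task_struct fields involved. */
struct model_task {
	unsigned int	state;	/* models task->__state */
	unsigned long	jobctl;	/* models task->jobctl  */
};

/* Models the reworked task_is_stopped(): looks at jobctl only. */
static inline bool model_is_stopped(const struct model_task *t)
{
	assert(t != NULL);
	return (t->jobctl & JOBCTL_STOPPED) != 0;
}

/* Models do_signal_stop(): record the stop in jobctl, then change state. */
static inline void model_signal_stop(struct model_task *t)
{
	t->jobctl |= JOBCTL_STOPPED;
	t->state = MODEL_STOPPED;
}

/* Models a wakeup path: clear the jobctl bit, then make the task runnable. */
static inline void model_wake_stopped(struct model_task *t)
{
	t->jobctl &= ~JOBCTL_STOPPED;
	t->state = MODEL_RUNNING;
}

/* Models a freezer that overwrites the state with its own special value. */
static inline void model_freeze(struct model_task *t)
{
	t->state = MODEL_FROZEN;
}
```

In this model, calling model_freeze() on a stopped task replaces the state
value, yet model_is_stopped() keeps reporting the stop until
model_wake_stopped() clears the bit, which is the property the changelog is
after.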

* Re: [PATCH v3 09/11] ptrace: Don't change __state
  2022-05-04 22:40             ` Eric W. Biederman
  (?)
@ 2022-05-05 12:50               ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-05 12:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-05-04 17:40:56 [-0500], Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
> 
> Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new flag is
implement ?

> set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in
> jobctl_unfreeze_task (when ptrace_stop remains asleep).

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread
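
The quoted description, read together with the signal_wake_up() hunk earlier
in the thread, amounts to gating the fatal-signal wakeup on a jobctl flag
instead of editing the task state. A minimal userspace sketch, assuming the
bit numbers from the jobctl.h hunk; struct model_task, the MODEL_* wake masks
and model_signal_wake_mask() are hypothetical names used only here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit numbers as they appear in the jobctl.h hunk earlier in the thread. */
#define JOBCTL_PTRACE_FROZEN	(1UL << 24)
#define JOBCTL_STOPPED		(1UL << 26)
#define JOBCTL_TRACED		(1UL << 27)

/* Placeholder wake masks for the model; not taken from kernel headers. */
#define MODEL_WAKEKILL	0x1u
#define MODEL_TRACED	0x2u

/* Hypothetical stand-in for the one task_struct field involved. */
struct model_task {
	unsigned long jobctl;	/* models task->jobctl */
};

/*
 * Models the decision made in signal_wake_up(): a fatal signal may only
 * wake the task (and clear the stop/trace bookkeeping) while the task is
 * not frozen for a ptrace operation. Returns the wake mask to be used.
 */
static inline unsigned int model_signal_wake_mask(struct model_task *t,
						  bool fatal)
{
	unsigned int state = 0;

	assert(t != NULL);
	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
		state = MODEL_WAKEKILL | MODEL_TRACED;
	}
	return state;
}
```

While the frozen bit is set the function returns an empty mask and leaves
jobctl untouched; once the bit is cleared, the same fatal signal yields the
full mask.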

* Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
  2022-05-04 22:40             ` Eric W. Biederman
  (?)
@ 2022-05-05 14:57               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-05 14:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/04, Eric W. Biederman wrote:
>
> -static int ptrace_stop(int exit_code, int why, int clear_code,
> -			unsigned long message, kernel_siginfo_t *info)
> +static int ptrace_stop(int exit_code, int why, unsigned long message,
> +		       kernel_siginfo_t *info)
>  	__releases(&current->sighand->siglock)
>  	__acquires(&current->sighand->siglock)
>  {
> @@ -2259,54 +2259,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>  
>  	spin_unlock_irq(&current->sighand->siglock);
>  	read_lock(&tasklist_lock);
> -	if (likely(current->ptrace)) {
> -		/*
> -		 * Notify parents of the stop.
> -		 *
> -		 * While ptraced, there are two parents - the ptracer and
> -		 * the real_parent of the group_leader.  The ptracer should
> -		 * know about every stop while the real parent is only
> -		 * interested in the completion of group stop.  The states
> -		 * for the two don't interact with each other.  Notify
> -		 * separately unless they're gonna be duplicates.
> -		 */
> +	/*
> +	 * Notify parents of the stop.
> +	 *
> +	 * While ptraced, there are two parents - the ptracer and
> +	 * the real_parent of the group_leader.  The ptracer should
> +	 * know about every stop while the real parent is only
> +	 * interested in the completion of group stop.  The states
> +	 * for the two don't interact with each other.  Notify
> +	 * separately unless they're gonna be duplicates.
> +	 */
> +	if (current->ptrace)
>  		do_notify_parent_cldstop(current, true, why);
> -		if (gstop_done && ptrace_reparented(current))
> -			do_notify_parent_cldstop(current, false, why);
> +	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
> +		do_notify_parent_cldstop(current, false, why);
>  
> -		/*
> -		 * Don't want to allow preemption here, because
> -		 * sys_ptrace() needs this task to be inactive.
> -		 *
> -		 * XXX: implement read_unlock_no_resched().
> -		 */
> -		preempt_disable();
> -		read_unlock(&tasklist_lock);
> -		cgroup_enter_frozen();
> -		preempt_enable_no_resched();
> -		freezable_schedule();
> -		cgroup_leave_frozen(true);
> -	} else {
> -		/*
> -		 * By the time we got the lock, our tracer went away.
> -		 * Don't drop the lock yet, another tracer may come.
> -		 *
> -		 * If @gstop_done, the ptracer went away between group stop
> -		 * completion and here.  During detach, it would have set
> -		 * JOBCTL_STOP_PENDING on us and we'll re-enter
> -		 * TASK_STOPPED in do_signal_stop() on return, so notifying
> -		 * the real parent of the group stop completion is enough.
> -		 */
> -		if (gstop_done)
> -			do_notify_parent_cldstop(current, false, why);
> -
> -		/* tasklist protects us from ptrace_freeze_traced() */
> -		__set_current_state(TASK_RUNNING);
> -		read_code = false;
> -		if (clear_code)
> -			exit_code = 0;
> -		read_unlock(&tasklist_lock);
> -	}
> +	/*
> +	 * Don't want to allow preemption here, because
> +	 * sys_ptrace() needs this task to be inactive.
> +	 *
> +	 * XXX: implement read_unlock_no_resched().
> +	 */
> +	preempt_disable();
> +	read_unlock(&tasklist_lock);
> +	cgroup_enter_frozen();
> +	preempt_enable_no_resched();
> +	freezable_schedule();

I must have missed something.

So the tracee calls ptrace_notify() but the debugger goes away before
ptrace_notify() takes siglock. After that the no-longer-traced task
will sleep in TASK_TRACED?

Looks like ptrace_stop() needs to check current->ptrace before it does
set_special_state(TASK_TRACED) with siglock held? Then we can rely on
ptrace_unlink() which will wake the tracee up even if debugger exits.

No?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread
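
The ordering Oleg suggests above can be sketched with a single-threaded
model: if the tracee tests its ptrace flag under the same lock that the
detach path takes, then either the detach runs first and the tracee never
stops, or the tracee stops first and the detach wakes it. Everything here is
hypothetical (struct model_task, the model_*() helpers, the boolean standing
in for siglock); it is not the kernel's ptrace_stop() or ptrace_unlink().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_RUNNING, MODEL_TRACED };

/* Hypothetical stand-in for the fields that matter to the race. */
struct model_task {
	bool			ptrace;		/* models current->ptrace != 0 */
	bool			siglock_held;	/* models siglock              */
	enum model_state	state;		/* models task->__state        */
};

static inline void model_lock(struct model_task *t)
{
	assert(t != NULL && !t->siglock_held);
	t->siglock_held = true;
}

static inline void model_unlock(struct model_task *t)
{
	assert(t->siglock_held);
	t->siglock_held = false;
}

/* Tracee side: only commit to the traced state while still traced. */
static inline bool model_ptrace_stop(struct model_task *t)
{
	bool stopped = false;

	model_lock(t);
	if (t->ptrace) {
		t->state = MODEL_TRACED;
		stopped = true;
	}
	model_unlock(t);
	return stopped;
}

/* Tracer side: drop the trace relation and wake a traced tracee. */
static inline void model_ptrace_unlink(struct model_task *t)
{
	model_lock(t);
	t->ptrace = false;
	if (t->state == MODEL_TRACED)
		t->state = MODEL_RUNNING;
	model_unlock(t);
}
```

Running the two helpers in either order leaves the task in MODEL_RUNNING once
the tracer is gone, which is the outcome the question above is asking for.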

* Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
  2022-05-04 22:40             ` Eric W. Biederman
  (?)
@ 2022-05-05 15:01               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-05 15:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/04, Eric W. Biederman wrote:
>
> With the removal of the incomplete detection of the tracer going away
> in ptrace_stop, ptrace_stop always sleeps in schedule after
> ptrace_freeze_traced succeeds.  Modify ptrace_check_attach to
> warn if wait_task_inactive fails.

Oh. Again, I don't understand the changelog. If we forget about RT,
ptrace_stop() will always sleep if ptrace_freeze_traced() succeeds.
may_ptrace_stop() is gone.

IOW, let's forget about RT.

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	}
>  	read_unlock(&tasklist_lock);
>
> -	if (!ret && !ignore_state) {
> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
> -			/*
> -			 * This can only happen if may_ptrace_stop() fails and
> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
> -			 * so we should not worry about leaking __TASK_TRACED.
> -			 */
> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> -			ret = -ESRCH;
> -		}
> -	}
> +	if (!ret && !ignore_state &&
> +	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
> +		ret = -ESRCH;
>
>  	return ret;
>  }

Why do you think this change would be wrong without any other changes?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread
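
The control flow of the simplified ptrace_check_attach() tail quoted above
can be modelled in userspace. This is a sketch only: model_wait_task_inactive()
returns a canned result instead of waiting, and model_warn_on_once() uses one
global flag, whereas the kernel's WARN_ON_ONCE() keeps a flag per call site.
Both helpers and the counter are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int model_wait_calls;	/* how often the wait stand-in was called */
static bool model_warned;	/* single global warn-once flag           */

/* Stand-in for wait_task_inactive(): counts calls, returns a canned result. */
static inline bool model_wait_task_inactive(bool result)
{
	model_wait_calls++;
	return result;
}

/* Warn at most once, but hand the condition back to the caller every time. */
static inline bool model_warn_on_once(bool cond)
{
	if (cond && !model_warned) {
		model_warned = true;
		fprintf(stderr, "model: wait for inactive task failed\n");
	}
	return cond;
}

/* Mirrors the shape of the new condition in the quoted hunk. */
static inline int model_check_attach(int ret, bool ignore_state, bool inactive)
{
	assert(ret <= 0);	/* 0 or a negative errno, as in the kernel */
	if (!ret && !ignore_state &&
	    model_warn_on_once(!model_wait_task_inactive(inactive)))
		ret = -ESRCH;

	return ret;
}
```

Because of the short-circuit evaluation, the wait stand-in is not called at
all when ret is already an error or when ignore_state is set, and a failed
wait both warns and turns the result into -ESRCH.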

* Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
  2022-05-04 22:40             ` Eric W. Biederman
  (?)
@ 2022-05-05 15:28               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-05 15:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/04, Eric W. Biederman wrote:
>
> -static int ptrace_stop(int exit_code, int why, int clear_code,
> -			unsigned long message, kernel_siginfo_t *info)
> +static int ptrace_stop(int exit_code, int why, unsigned long message,
> +		       kernel_siginfo_t *info)

Forgot to mention... but in general I like this change.

In particular, I like the fact it kills the ugly "int clear_code" arg
which looks as if it solves the problems with the exiting tracer, but
actually it doesn't. And we do not really care, imo.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v3 09/11] ptrace: Don't change __state
  2022-05-05 12:50               ` Sebastian Andrzej Siewior
  (?)
@ 2022-05-05 16:48                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 16:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-04 17:40:56 [-0500], Eric W. Biederman wrote:
>> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
>> command is executing.
>> 
>> Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
>> implemention a new jobctl flag TASK_PTRACE_FROZEN.  This new flag is
> implement ?

Yes.  Thank you.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
  2022-05-05 14:57               ` Oleg Nesterov
  (?)
@ 2022-05-05 16:59                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 16:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> -static int ptrace_stop(int exit_code, int why, int clear_code,
>> -			unsigned long message, kernel_siginfo_t *info)
>> +static int ptrace_stop(int exit_code, int why, unsigned long message,
>> +		       kernel_siginfo_t *info)
>>  	__releases(&current->sighand->siglock)
>>  	__acquires(&current->sighand->siglock)
>>  {
>> @@ -2259,54 +2259,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>>  
>>  	spin_unlock_irq(&current->sighand->siglock);
>>  	read_lock(&tasklist_lock);
>> -	if (likely(current->ptrace)) {
>> -		/*
>> -		 * Notify parents of the stop.
>> -		 *
>> -		 * While ptraced, there are two parents - the ptracer and
>> -		 * the real_parent of the group_leader.  The ptracer should
>> -		 * know about every stop while the real parent is only
>> -		 * interested in the completion of group stop.  The states
>> -		 * for the two don't interact with each other.  Notify
>> -		 * separately unless they're gonna be duplicates.
>> -		 */
>> +	/*
>> +	 * Notify parents of the stop.
>> +	 *
>> +	 * While ptraced, there are two parents - the ptracer and
>> +	 * the real_parent of the group_leader.  The ptracer should
>> +	 * know about every stop while the real parent is only
>> +	 * interested in the completion of group stop.  The states
>> +	 * for the two don't interact with each other.  Notify
>> +	 * separately unless they're gonna be duplicates.
>> +	 */
>> +	if (current->ptrace)
>>  		do_notify_parent_cldstop(current, true, why);
>> -		if (gstop_done && ptrace_reparented(current))
>> -			do_notify_parent_cldstop(current, false, why);
>> +	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
>> +		do_notify_parent_cldstop(current, false, why);
>>  
>> -		/*
>> -		 * Don't want to allow preemption here, because
>> -		 * sys_ptrace() needs this task to be inactive.
>> -		 *
>> -		 * XXX: implement read_unlock_no_resched().
>> -		 */
>> -		preempt_disable();
>> -		read_unlock(&tasklist_lock);
>> -		cgroup_enter_frozen();
>> -		preempt_enable_no_resched();
>> -		freezable_schedule();
>> -		cgroup_leave_frozen(true);
>> -	} else {
>> -		/*
>> -		 * By the time we got the lock, our tracer went away.
>> -		 * Don't drop the lock yet, another tracer may come.
>> -		 *
>> -		 * If @gstop_done, the ptracer went away between group stop
>> -		 * completion and here.  During detach, it would have set
>> -		 * JOBCTL_STOP_PENDING on us and we'll re-enter
>> -		 * TASK_STOPPED in do_signal_stop() on return, so notifying
>> -		 * the real parent of the group stop completion is enough.
>> -		 */
>> -		if (gstop_done)
>> -			do_notify_parent_cldstop(current, false, why);
>> -
>> -		/* tasklist protects us from ptrace_freeze_traced() */
>> -		__set_current_state(TASK_RUNNING);
>> -		read_code = false;
>> -		if (clear_code)
>> -			exit_code = 0;
>> -		read_unlock(&tasklist_lock);
>> -	}
>> +	/*
>> +	 * Don't want to allow preemption here, because
>> +	 * sys_ptrace() needs this task to be inactive.
>> +	 *
>> +	 * XXX: implement read_unlock_no_resched().
>> +	 */
>> +	preempt_disable();
>> +	read_unlock(&tasklist_lock);
>> +	cgroup_enter_frozen();
>> +	preempt_enable_no_resched();
>> +	freezable_schedule();
>
> I must have missed something.
>
> So the tracee calls ptrace_notify() but debugger goes away before the
> ptrace_notify() takes siglock. After that the no longer traced task
> will sleep in TASK_TRACED ?
>
> Looks like ptrace_stop() needs to check current->ptrace before it does
> set_special_state(TASK_TRACED) with siglock held? Then we can rely on
> ptrace_unlink() which will wake the tracee up even if debugger exits.
>
> No?

Hmm.  If the debugger goes away when siglock is dropped and reacquired at
the top of ptrace_stop, that would appear to set the debugged process up
to sleep indefinitely.
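
The fix suggested in the quoted text (check current->ptrace under siglock
before committing to TASK_TRACED, and rely on the unlink path to wake the
tracee) can be modelled in userspace. This is only a rough sketch: every
name below is hypothetical, a pthread mutex stands in for siglock, and an
enum stands in for the task state. It is not the kernel code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_state { MODEL_RUNNING, MODEL_TRACED };

struct model_task {
	pthread_mutex_t siglock;  /* stands in for sighand->siglock */
	bool ptrace;              /* stands in for current->ptrace */
	enum model_state state;   /* stands in for the task state */
};

static void model_task_init(struct model_task *t)
{
	pthread_mutex_init(&t->siglock, NULL);
	t->ptrace = true;
	t->state = MODEL_RUNNING;
}

/* Tracee side: returns true only if it committed to stopping. */
static bool model_ptrace_stop(struct model_task *t)
{
	bool stopped = false;

	pthread_mutex_lock(&t->siglock);
	if (t->ptrace) {          /* tracer still attached: safe to stop */
		t->state = MODEL_TRACED;
		stopped = true;
	}
	pthread_mutex_unlock(&t->siglock);
	return stopped;
}

/* Tracer side: detach and wake the tracee if it has already stopped. */
static void model_ptrace_unlink(struct model_task *t)
{
	pthread_mutex_lock(&t->siglock);
	t->ptrace = false;
	if (t->state == MODEL_TRACED)
		t->state = MODEL_RUNNING;
	pthread_mutex_unlock(&t->siglock);
}
```

Because both sides take the same lock, either the tracee sees ptrace
cleared and never stops, or the unlink path sees the stopped state and
undoes it. In this model there is no ordering that leaves an untraced
task stopped.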

I was thinking of the SIGKILL case which is handled.

Thank you for catching that.

Eric



^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
  2022-05-05 15:01               ` Oleg Nesterov
  (?)
@ 2022-05-05 17:21                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 17:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> With the removal of the incomplete detection of the tracer going away
>> in ptrace_stop, ptrace_stop always sleeps in schedule after
>> ptrace_freeze_traced succeeds.  Modify ptrace_check_attach to
>> warn if wait_task_inactive fails.
>
> Oh. Again, I don't understand the changelog. If we forget about RT,
> ptrace_stop() will always sleep if ptrace_freeze_traced() succeeds.
> may_ptrace_stop() has gone.
>
> IOW. Lets forget about RT
>
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>>  	}
>>  	read_unlock(&tasklist_lock);
>>
>> -	if (!ret && !ignore_state) {
>> -		if (!wait_task_inactive(child, __TASK_TRACED)) {
>> -			/*
>> -			 * This can only happen if may_ptrace_stop() fails and
>> -			 * ptrace_stop() changes ->state back to TASK_RUNNING,
>> -			 * so we should not worry about leaking __TASK_TRACED.
>> -			 */
>> -			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> -			ret = -ESRCH;
>> -		}
>> -	}
>> +	if (!ret && !ignore_state &&
>> +	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
>> +		ret = -ESRCH;
>>
>>  	return ret;
>>  }
>
> Why do you think this change would be wrong without any other changes?

For purposes of this analysis ptrace_detach and ptrace_exit (when the
tracer exits) can't happen.  So the bug you spotted in ptrace_stop does
not apply.

I was thinking that the test against !current->ptrace that replaced
the old may_ptrace_stop could trigger a failure here.  If the
ptrace_freeze_traced happens before that test, that branch clearly
cannot happen.

*Looks twice* ptrace_check_attach and ptrace_stop both take only a
read_lock on tasklist_lock, which does not protect them against each
other, but the write_lock on tasklist_lock in ptrace_attach does
protect against a ptrace_attach coming in after the test and before
__set_current_state(TASK_RUNNING).

So yes. I should really split that part out into its own patch.
And yes that WARN_ON_ONCE can trigger on PREEMPT_RT but that is just
because PREEMPT_RT is currently broken with respect to ptrace.  Which
makes a WARN_ON_ONCE appropriate.
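
The new ptrace_check_attach code relies on WARN_ON_ONCE() evaluating to
its condition, so it can sit inside an if and still report only once. A
rough userspace sketch of that pattern follows; model_warn_on_once() and
model_check_attach() are hypothetical stand-ins, not the kernel macro or
function, and the "once" flag here is per function rather than per call
site.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int warn_reports;  /* number of warnings actually printed */

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): prints a warning the first
 * time cond is true, and always returns cond so callers can branch on it.
 */
static bool model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_reports++;
		fprintf(stderr, "WARNING: tracee did not become inactive\n");
	}
	return cond;
}

/* Mirrors the shape of the new check: warn once, fail every time. */
static int model_check_attach(bool task_went_inactive)
{
	int ret = 0;

	if (model_warn_on_once(!task_went_inactive))
		ret = -ESRCH;
	return ret;
}
```

Every failing call still returns -ESRCH, but only the first one adds a
warning to the log.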

I will see how much of this analysis I can put in the changelog.

Thank you,
Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
  2022-05-05 17:21                 ` Eric W. Biederman
  (?)
@ 2022-05-05 17:27                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-05 17:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/05, Eric W. Biederman wrote:
>
> And yes that WARN_ON_ONCE can trigger on PREEMPT_RT but that is just
> because PREEMPT_RT is currently broken with respect to ptrace.  Which
> makes a WARN_ON_ONCE appropriate.

Yes, agreed. In this case WARN_ON_ONCE() can help a user understand
that a failure was caused by a kernel problem which we need to fix
anyway.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
  2022-05-05 15:28               ` Oleg Nesterov
  (?)
@ 2022-05-05 17:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 17:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> -static int ptrace_stop(int exit_code, int why, int clear_code,
>> -			unsigned long message, kernel_siginfo_t *info)
>> +static int ptrace_stop(int exit_code, int why, unsigned long message,
>> +		       kernel_siginfo_t *info)
>
> Forgot to mention... but in general I like this change.
>
> In particular, I like the fact it kills the ugly "int clear_code" arg
> which looks as if it solves the problems with the exiting tracer, but
> actually it doesn't. And we do not really care, imo.

Further, either this change is necessary or we need to take siglock in
the !current->ptrace path in "ptrace: Don't change __state" so that
JOBCTL_TRACED can be cleared.

So I vote for deleting code, and making ptrace_stop easier to reason
about.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
  2022-05-05 17:53                 ` Eric W. Biederman
  (?)
@ 2022-05-05 18:10                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-05 18:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/05, Eric W. Biederman wrote:
>
> So I vote for deleting code, and making ptrace_stop easier to reason
> about.

Yes, yes, agreed.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-04 22:39           ` Eric W. Biederman
  (?)
@ 2022-05-05 18:25             ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Jann Horn, Kees Cook,
	linux-ia64


The states TASK_STOPPED and TASK_TRACED are special in that they cannot
handle spurious wake-ups.  This, plus actively depending upon and
changing the value of tsk->__state, causes problems for PREEMPT_RT and
Peter's freezer rewrite.
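
As a rough illustration of what "handling a spurious wake-up" means, here
is a single-threaded toy model in plain C.  It is not kernel code: the
struct and both functions are hypothetical names made up for this sketch,
and the wake-ups are scripted rather than delivered by a scheduler.  A
tolerant sleeper re-checks its condition after every wake-up, while a
fragile one assumes that any wake-up means it may resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scripted wake-up: true if the waker cleared the stop condition. */
struct toy_wakeup {
	bool condition_cleared;
};

/*
 * Tolerant sleeper: every wake-up is only a hint, so the condition is
 * re-checked before resuming.  Returns the index of the wake-up at which
 * the sleeper resumes, or -1 if it is still stopped after all of them.
 */
static int resume_index_tolerant(const struct toy_wakeup *ev, size_t n)
{
	assert(ev != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ev[i].condition_cleared)
			return (int)i;
	}
	return -1;
}

/*
 * Fragile sleeper: treats any wake-up as permission to resume, so a
 * spurious wake-up lets it run while the stop condition still holds.
 */
static int resume_index_fragile(const struct toy_wakeup *ev, size_t n)
{
	assert(ev != NULL || n == 0);
	return n > 0 ? 0 : -1;
}
```

Given two spurious wake-ups followed by a real one, the tolerant sleeper
resumes only at the third wake-up, while the fragile one resumes at the
first, which is the kind of early resume a stopped or traced task must
not be allowed to make.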

There are a lot of details we have to get right to sort out the
technical challenges, and this is my pared-back version of the changes:
it addresses just those problems for which I see good solutions that I
believe are ready.

A couple of issues have been pointed out, but I think this pared-back set
of changes is still on the right track.  The biggest change in v4 is the
split of "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs" into
two patches, because the dependency I thought existed between two
different changes did not exist.  The rest of the changes are minor
tweaks to "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs":
removing an always-true branch, and adding an early test to see if the
ptracer had gone, before TASK_TRAPPING was set.

This set of changes should support Peter's freezer rewrite, and with the
addition of changing wait_task_inactive(TASK_TRACED) to be
wait_task_inactive(0) in ptrace_check_attach I don't think there are any
races or issues to be concerned about from the ptrace side.

More work is needed to support PREEMPT_RT, but these changes get things
closer.

This set of changes continues to look like it will provide a firm
foundation for solving the PREEMPT_RT and freezer challenges.

Eric W. Biederman (11):
      signal: Rename send_signal send_signal_locked
      signal: Replace __group_send_sig_info with send_signal_locked
      ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
      ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
      ptrace: Remove arch_ptrace_attach
      signal: Use lockdep_assert_held instead of assert_spin_locked
      ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
      ptrace: Document that wait_task_inactive can't fail
      ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
      ptrace: Don't change __state
      ptrace: Always take siglock in ptrace_resume

Peter Zijlstra (1):
      sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

 arch/ia64/include/asm/ptrace.h    |   4 --
 arch/ia64/kernel/ptrace.c         |  57 ----------------
 arch/um/include/asm/thread_info.h |   2 +
 arch/um/kernel/exec.c             |   2 +-
 arch/um/kernel/process.c          |   2 +-
 arch/um/kernel/ptrace.c           |   8 +--
 arch/um/kernel/signal.c           |   4 +-
 arch/x86/kernel/step.c            |   3 +-
 arch/xtensa/kernel/ptrace.c       |   4 +-
 arch/xtensa/kernel/signal.c       |   4 +-
 drivers/tty/tty_jobctrl.c         |   4 +-
 include/linux/ptrace.h            |   7 --
 include/linux/sched.h             |  10 ++-
 include/linux/sched/jobctl.h      |   8 +++
 include/linux/sched/signal.h      |  20 ++++--
 include/linux/signal.h            |   3 +-
 kernel/ptrace.c                   |  87 ++++++++---------------
 kernel/sched/core.c               |   5 +-
 kernel/signal.c                   | 140 +++++++++++++++++---------------------
 kernel/time/posix-cpu-timers.c    |   6 +-
 20 files changed, 140 insertions(+), 240 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH v4 01/12] signal: Rename send_signal send_signal_locked
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal and __send_signal to send_signal_locked and
__send_signal_locked, and make send_signal_locked non-static with a
declaration in include/linux/signal.h so that it is usable outside of
signal.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)
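
For readers unfamiliar with the "_locked" naming convention, the following
is a userspace sketch of the calling pattern, assuming a pthread mutex in
place of siglock.  Every name here (toy_task, toy_lock, toy_send_signal,
toy_send_signal_locked) is hypothetical and exists only for illustration;
none of it is kernel API.  The "_locked" function expects its caller to
hold the lock already, and a thin wrapper takes the lock for callers that
do not, much as do_send_sig_info() wraps send_signal_locked() in the diff
below.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task's signal state; not the kernel's types. */
struct toy_task {
	pthread_mutex_t siglock;
	bool            siglock_held;  /* tracked by hand so the callee can check it */
	unsigned long   pending;       /* bitmask of pending signal numbers */
};

static void toy_lock(struct toy_task *t)
{
	pthread_mutex_lock(&t->siglock);
	t->siglock_held = true;
}

static void toy_unlock(struct toy_task *t)
{
	t->siglock_held = false;
	pthread_mutex_unlock(&t->siglock);
}

/* "_locked" suffix: the caller must already hold t->siglock. */
static int toy_send_signal_locked(struct toy_task *t, int sig)
{
	assert(t->siglock_held);
	if (sig <= 0 || sig >= (int)(8 * sizeof(t->pending)))
		return -1;
	t->pending |= 1UL << sig;
	return 0;
}

/* Wrapper for callers that do not hold the lock yet. */
static int toy_send_signal(struct toy_task *t, int sig)
{
	int ret;

	toy_lock(t);
	ret = toy_send_signal_locked(t, sig);
	toy_unlock(t);
	return ret;
}
```

A caller that already holds the lock for other reasons calls
toy_send_signal_locked() directly; everyone else goes through
toy_send_signal().  Putting the requirement in the name makes a missing
lock visible at the call site.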

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 01/12] signal: Rename send_signal send_signal_locked
@ 2022-05-05 18:26               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Rename send_signal and __send_signal to send_signal_locked and
__send_signal_locked to make send_signal usable outside of
signal.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/signal.h |  2 ++
 kernel/signal.c        | 24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
 extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
-			enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+		       struct task_struct *t, enum pid_type type)
 {
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
 			force = true;
 		}
 	}
-	return __send_signal(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
 int
 __group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
 {
-	return send_signal(sig, info, p, PIDTYPE_TGID);
+	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
 }
 
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
 	int ret = -ESRCH;
 
 	if (lock_task_sighand(p, &flags)) {
-		ret = send_signal(sig, info, p, type);
+		ret = send_signal_locked(sig, info, p, type);
 		unlock_task_sighand(p, &flags);
 	}
 
@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal(sig, info, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 
 	return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
 
 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (valid_signal(sig) && sig)
-		__send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-		send_signal(signr, info, current, type);
+		send_signal_locked(signr, info, current, type);
 		signr = 0;
 	}
 
@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
 			   "the deadlock.\n");
 		return;
 	}
-	ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+	ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
 	spin_unlock(&t->sighand->siglock);
 	if (ret)
 		kdb_printf("Fail to deliver Signal %d to process %d.\n",
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 02/12] signal: Replace __group_send_sig_info with send_signal_locked
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The function __group_send_sig_info is just a light wrapper around
send_signal_locked with one parameter fixed to a constant value.  As
the wrapper adds no real value, update the code to call the wrapped
function directly.
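
The shape of the change can be sketched in plain C.  Everything named
toy_* below is hypothetical and only mirrors the pattern: a wrapper that
pins one argument, and a direct call that passes the same constant
explicitly.

```c
#include <assert.h>

/* Hypothetical mirror of the pid type argument; not the kernel enum. */
enum toy_pid_type { TOY_PIDTYPE_PID, TOY_PIDTYPE_TGID };

/* Records what was "sent" so the two call styles can be compared. */
struct toy_target {
	int last_sig;
	enum toy_pid_type last_type;
};

/* Stand-in for the wrapped function: takes the type explicitly. */
static int toy_send_signal_locked(int sig, struct toy_target *p,
				  enum toy_pid_type type)
{
	p->last_sig = sig;
	p->last_type = type;
	return 0;
}

/* Stand-in for the removed wrapper: it only pins the last argument. */
static int toy_group_send_sig_info(int sig, struct toy_target *p)
{
	return toy_send_signal_locked(sig, p, TOY_PIDTYPE_TGID);
}
```

Calling toy_group_send_sig_info(sig, p) and calling
toy_send_signal_locked(sig, p, TOY_PIDTYPE_TGID) leave the target in the
same state, which is why each call site can be converted mechanically.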

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/tty/tty_jobctrl.c      | 4 ++--
 include/linux/signal.h         | 1 -
 kernel/signal.c                | 8 +-------
 kernel/time/posix-cpu-timers.c | 6 +++---
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
 				spin_unlock_irq(&p->sighand->siglock);
 				continue;
 			}
-			__group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
-			__group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+			send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+			send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
 			put_pid(p->signal->tty_old_pgrp);  /* A noop */
 			spin_lock(&tty->ctrl.lock);
 			tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
 extern int send_signal_locked(int sig, struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)
 
 __setup("print-fatal-signals=", setup_print_fatal_signals);
 
-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
-	return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
 int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
 			enum pid_type type)
 {
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-		__group_send_sig_info(SIGCHLD, &info, parent);
+		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
 	/*
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
 {
 	if (tsk->dl.dl_overrun) {
 		tsk->dl.dl_overrun = 0;
-		__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+		send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 }
 
@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
 			rt ? "RT" : "CPU", hard ? "hard" : "soft",
 			current->comm, task_pid_nr(current));
 	}
-	__group_send_sig_info(signo, SEND_SIG_PRIV, current);
+	send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
 	return true;
 }
 
@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
 		trace_itimer_expire(signo == SIGPROF ?
 				    ITIMER_PROF : ITIMER_VIRTUAL,
 				    task_tgid(tsk), cur_time);
-		__group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+		send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
 	}
 
 	if (it->expires && it->expires < *expires)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 03/12] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

User mode linux is the last user of the PT_DTRACE flag.  Using the flag to
indicate single stepping is a little confusing and, worse, changing
tsk->ptrace without locking could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE as uml is the last user.
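
For illustration only, here is a userspace sketch of why a dedicated
flag bit updated with atomic operations avoids the unlocked
read-modify-write on a shared word.  It uses C11 atomics in place of the
kernel's bit operations; the toy_* names are hypothetical, and only the
bit number 10 is taken from this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_TIF_SINGLESTEP 10	/* same bit number this patch picks for UML */

/* Hypothetical stand-in for the per-thread flags word. */
struct toy_thread_info {
	atomic_ulong flags;
};

/* Atomically set one bit; concurrent updates to other bits are not lost. */
static void toy_set_flag(struct toy_thread_info *ti, int bit)
{
	atomic_fetch_or(&ti->flags, 1UL << bit);
}

/* Atomically clear one bit. */
static void toy_clear_flag(struct toy_thread_info *ti, int bit)
{
	atomic_fetch_and(&ti->flags, ~(1UL << bit));
}

/* Read the current value of one bit. */
static bool toy_test_flag(struct toy_thread_info *ti, int bit)
{
	return (atomic_load(&ti->flags) >> bit) & 1UL;
}
```

A plain "word |= bit" or "word &= ~bit" is a separate load and store, so
two unsynchronized writers can overwrite each other's update; the atomic
helpers above do not have that problem.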

Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/um/include/asm/thread_info.h | 2 ++
 arch/um/kernel/exec.c             | 2 +-
 arch/um/kernel/process.c          | 2 +-
 arch/um/kernel/ptrace.c           | 8 ++++----
 arch/um/kernel/signal.c           | 4 ++--
 include/linux/ptrace.h            | 1 -
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	7
 #define TIF_NOTIFY_RESUME	8
 #define TIF_SECCOMP		9	/* secure computing */
+#define TIF_SINGLESTEP		10	/* single stepping userspace */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 
 #endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
 {
 	PT_REGS_IP(regs) = eip;
 	PT_REGS_SP(regs) = esp;
-	current->ptrace &= ~PT_DTRACE;
+	clear_thread_flag(TIF_SINGLESTEP);
 #ifdef SUBARCH_EXECVE1
 	SUBARCH_EXECVE1(regs->regs);
 #endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
 {
 	struct task_struct *task = t ? t : current;
 
-	if (!(task->ptrace & PT_DTRACE))
+	if (!test_thread_flag(TIF_SINGLESTEP))
 		return 0;
 
 	if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_DTRACE;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_DTRACE;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
 }
 
 /*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
  * PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
  */
 int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	audit_syscall_exit(regs);
 
 	/* Fake a debug trap */
-	if (ptraced & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		send_sigtrap(&regs->regs, 0);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	unsigned long sp;
 	int err;
 
-	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+	if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
 		singlestep = 1;
 
 	/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
 	 * on the host.  The tracing thread will check this flag and
 	 * PTRACE_SYSCALL if necessary.
 	 */
-	if (current->ptrace & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		current->thread.singlestep_syscall =
 			is_syscall(PT_REGS_IP(&current->thread.regs));
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
-#define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 
 #define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 03/12] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
@ 2022-05-05 18:26               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

User mode linux is the last user of the PT_DTRACE flag.  Using the flag to
indicate single stepping is a little confusing and, worse, changing
tsk->ptrace without locking could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE as uml is the last user.

Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/um/include/asm/thread_info.h | 2 ++
 arch/um/kernel/exec.c             | 2 +-
 arch/um/kernel/process.c          | 2 +-
 arch/um/kernel/ptrace.c           | 8 ++++----
 arch/um/kernel/signal.c           | 4 ++--
 include/linux/ptrace.h            | 1 -
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_RESTORE_SIGMASK	7
 #define TIF_NOTIFY_RESUME	8
 #define TIF_SECCOMP		9	/* secure computing */
+#define TIF_SINGLESTEP		10	/* single stepping userspace */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 
 #endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
 {
 	PT_REGS_IP(regs) = eip;
 	PT_REGS_SP(regs) = esp;
-	current->ptrace &= ~PT_DTRACE;
+	clear_thread_flag(TIF_SINGLESTEP);
 #ifdef SUBARCH_EXECVE1
 	SUBARCH_EXECVE1(regs->regs);
 #endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
 {
 	struct task_struct *task = t ? t : current;
 
-	if (!(task->ptrace & PT_DTRACE))
+	if (!test_thread_flag(TIF_SINGLESTEP))
 		return 0;
 
 	if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_DTRACE;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_DTRACE;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 	child->thread.singlestep_syscall = 0;
 
 #ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
 }
 
 /*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
  * PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
  */
 int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
 	audit_syscall_exit(regs);
 
 	/* Fake a debug trap */
-	if (ptraced & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		send_sigtrap(&regs->regs, 0);
 
 	if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 	unsigned long sp;
 	int err;
 
-	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+	if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
 		singlestep = 1;
 
 	/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
 	 * on the host.  The tracing thread will check this flag and
 	 * PTRACE_SYSCALL if necessary.
 	 */
-	if (current->ptrace & PT_DTRACE)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		current->thread.singlestep_syscall =
 			is_syscall(PT_REGS_IP(&current->thread.regs));
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
-#define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 
 #define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 04/12] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable

xtensa is the last user of the PT_SINGLESTEP flag.  Changing tsk->ptrace in
user_enable_single_step and user_disable_single_step without locking could
potentially cause problems.

So use a thread info flag instead of a flag in tsk->ptrace.  Use TIF_SINGLESTEP,
which xtensa already defined but did not use.

Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users.

Cc: stable@vger.kernel.org
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
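A note for readers, not part of the commit: the commit message only says that
unlocked updates "could potentially cause problems". The sketch below replays,
by hand and in a single thread, the general kind of lost update that an
unlocked read-modify-write on a shared word allows. Whether this exact
interleaving can occur on xtensa is not established here. The MODEL_ constants
and function names are hypothetical.

```c
#include <assert.h>

#define MODEL_PT_PTRACED	0x00000001UL
#define MODEL_PT_SINGLESTEP	(1UL << 31)

/*
 * Replay one bad interleaving deterministically: writer A wants to set
 * SINGLESTEP, writer B wants to clear PTRACED, and both load the old
 * word before either one stores.
 */
static inline unsigned long model_lost_update(unsigned long word)
{
	unsigned long a = word;		/* A: load */
	unsigned long b = word;		/* B: load */

	a |= MODEL_PT_SINGLESTEP;	/* A: modify */
	b &= ~MODEL_PT_PTRACED;		/* B: modify */

	word = a;			/* A: store */
	word = b;			/* B: store, overwrites A's bit */
	return word;
}

/* The same two updates applied one after the other. */
static inline unsigned long model_serialized(unsigned long word)
{
	word |= MODEL_PT_SINGLESTEP;
	word &= ~MODEL_PT_PTRACED;
	return word;
}
```

Starting from a word with only MODEL_PT_PTRACED set, the serialized version
ends with the single-step bit set, while the interleaved version loses it.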
 arch/xtensa/kernel/ptrace.c | 4 ++--
 arch/xtensa/kernel/signal.c | 4 ++--
 include/linux/ptrace.h      | 6 ------
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/xtensa/kernel/ptrace.c b/arch/xtensa/kernel/ptrace.c
index 323c678a691f..b952e67cc0cc 100644
--- a/arch/xtensa/kernel/ptrace.c
+++ b/arch/xtensa/kernel/ptrace.c
@@ -225,12 +225,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 
 void user_enable_single_step(struct task_struct *child)
 {
-	child->ptrace |= PT_SINGLESTEP;
+	set_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 void user_disable_single_step(struct task_struct *child)
 {
-	child->ptrace &= ~PT_SINGLESTEP;
+	clear_tsk_thread_flag(child, TIF_SINGLESTEP);
 }
 
 /*
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index 6f68649e86ba..ac50ec46c8f1 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -473,7 +473,7 @@ static void do_signal(struct pt_regs *regs)
 		/* Set up the stack frame */
 		ret = setup_frame(&ksig, sigmask_to_save(), regs);
 		signal_setup_done(ret, &ksig, 0);
-		if (current->ptrace & PT_SINGLESTEP)
+		if (test_thread_flag(TIF_SINGLESTEP))
 			task_pt_regs(current)->icountlevel = 1;
 
 		return;
@@ -499,7 +499,7 @@ static void do_signal(struct pt_regs *regs)
 	/* If there's no signal to deliver, we just restore the saved mask.  */
 	restore_saved_sigmask();
 
-	if (current->ptrace & PT_SINGLESTEP)
+	if (test_thread_flag(TIF_SINGLESTEP))
 		task_pt_regs(current)->icountlevel = 1;
 	return;
 }
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4c06f9f8ef3f..c952c5ba8fab 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
 
-/* single stepping state bits (used on ARM and PA-RISC) */
-#define PT_SINGLESTEP_BIT	31
-#define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
-#define PT_BLOCKSTEP_BIT	30
-#define PT_BLOCKSTEP		(1<<PT_BLOCKSTEP_BIT)
-
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 05/12] ptrace: Remove arch_ptrace_attach
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Judging by the comments and the code, ptrace_attach_sync_user_rbs
has the sole purpose of saving registers to the stack when ptrace_attach
changes TASK_STOPPED to TASK_TRACED.  In all other cases arch_ptrace_stop
takes care of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary.  So just
remove it.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
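A note for readers, not part of the commit: the hook removed here follows the
common override pattern in which an architecture header may define a macro and
generic code supplies a no-op fallback. The sketch below models that pattern in
userspace. All model_ names and the MODEL_ARCH_HAS_ATTACH_HOOK switch are
hypothetical, and the fallback casts its argument to void only to keep
compilers quiet (the kernel fallback was an empty do { } while (0)).

```c
#include <assert.h>

struct model_task {
	int arch_hook_calls;	/* counts how often the arch hook ran */
};

/*
 * An architecture that needs extra work on attach would define the macro
 * in its own header. Build with -DMODEL_ARCH_HAS_ATTACH_HOOK to model that.
 */
#ifdef MODEL_ARCH_HAS_ATTACH_HOOK
#define model_arch_ptrace_attach(child) \
	do { (child)->arch_hook_calls++; } while (0)
#endif

/* Generic code: no-op fallback when no architecture provided the hook. */
#ifndef model_arch_ptrace_attach
#define model_arch_ptrace_attach(child)	do { (void)(child); } while (0)
#endif

/* Hypothetical attach path: run the hook only when attach succeeded. */
static inline int model_attach(struct model_task *child, int attach_ret)
{
	if (!attach_ret)
		model_arch_ptrace_attach(child);
	return attach_ret;
}
```

Once the last architecture stops defining the macro, every call site expands
to the no-op, which is why the fallback and all three call sites can be
deleted together.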
 arch/ia64/include/asm/ptrace.h |  4 ---
 arch/ia64/kernel/ptrace.c      | 57 ----------------------------------
 kernel/ptrace.c                | 18 -----------
 3 files changed, 79 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index a10a498eede1..402874489890 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -139,10 +139,6 @@ static inline long regs_return_value(struct pt_regs *regs)
   #define arch_ptrace_stop_needed() \
 	(!test_thread_flag(TIF_RESTORE_RSE))
 
-  extern void ptrace_attach_sync_user_rbs (struct task_struct *);
-  #define arch_ptrace_attach(child) \
-	ptrace_attach_sync_user_rbs(child)
-
   #define arch_has_single_step()  (1)
   #define arch_has_block_step()   (1)
 
diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index a19acd9f5e1f..a45f529046c3 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -617,63 +617,6 @@ void ia64_sync_krbs(void)
 	unw_init_running(do_sync_rbs, ia64_sync_kernel_rbs);
 }
 
-/*
- * After PTRACE_ATTACH, a thread's register backing store area in user
- * space is assumed to contain correct data whenever the thread is
- * stopped.  arch_ptrace_stop takes care of this on tracing stops.
- * But if the child was already stopped for job control when we attach
- * to it, then it might not ever get into ptrace_stop by the time we
- * want to examine the user memory containing the RBS.
- */
-void
-ptrace_attach_sync_user_rbs (struct task_struct *child)
-{
-	int stopped = 0;
-	struct unw_frame_info info;
-
-	/*
-	 * If the child is in TASK_STOPPED, we need to change that to
-	 * TASK_TRACED momentarily while we operate on it.  This ensures
-	 * that the child won't be woken up and return to user mode while
-	 * we are doing the sync.  (It can only be woken up for SIGKILL.)
-	 */
-
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_STOPPED &&
-		    !test_and_set_tsk_thread_flag(child, TIF_RESTORE_RSE)) {
-			set_notify_resume(child);
-
-			WRITE_ONCE(child->__state, TASK_TRACED);
-			stopped = 1;
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-
-	if (!stopped)
-		return;
-
-	unw_init_from_blocked_task(&info, child);
-	do_sync_rbs(&info, ia64_sync_user_rbs);
-
-	/*
-	 * Now move the child back into TASK_STOPPED if it should be in a
-	 * job control stop, so that SIGCONT can be used to wake it up.
-	 */
-	read_lock(&tasklist_lock);
-	if (child->sighand) {
-		spin_lock_irq(&child->sighand->siglock);
-		if (READ_ONCE(child->__state) == TASK_TRACED &&
-		    (child->signal->flags & SIGNAL_STOP_STOPPED)) {
-			WRITE_ONCE(child->__state, TASK_STOPPED);
-		}
-		spin_unlock_irq(&child->sighand->siglock);
-	}
-	read_unlock(&tasklist_lock);
-}
-
 /*
  * Write f32-f127 back to task->thread.fph if it has been modified.
  */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..da30dcd477a0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1285,10 +1285,6 @@ int ptrace_request(struct task_struct *child, long request,
 	return ret;
 }
 
-#ifndef arch_ptrace_attach
-#define arch_ptrace_attach(child)	do { } while (0)
-#endif
-
 SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 		unsigned long, data)
 {
@@ -1297,8 +1293,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_TRACEME) {
 		ret = ptrace_traceme();
-		if (!ret)
-			arch_ptrace_attach(current);
 		goto out;
 	}
 
@@ -1310,12 +1304,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
@@ -1455,12 +1443,6 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,
 
 	if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
 		ret = ptrace_attach(child, request, addr, data);
-		/*
-		 * Some architectures need to do book-keeping after
-		 * a ptrace attach.
-		 */
-		if (!ret)
-			arch_ptrace_attach(child);
 		goto out_put_task_struct;
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 06/12] signal: Use lockdep_assert_held instead of assert_spin_locked
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

The distinction is that assert_spin_locked() checks if the lock is
held *by*anyone* whereas lockdep_assert_held() asserts the current
context holds the lock.  Also, the check goes away if you build
without lockdep.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
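A rough userspace illustration of the distinction described above; this
is not kernel code, and the toy_* names are hypothetical.  The toy lock
records which context took it, so "locked by anyone" and "held by this
context" can be checked separately:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: owner == 0 means unlocked, otherwise the id of the holder. */
struct toy_lock {
	int owner;
};

static void toy_lock_acquire(struct toy_lock *l, int ctx)
{
	/* Context id 0 is reserved for "nobody"; the lock must be free. */
	assert(ctx != 0 && l->owner == 0);
	l->owner = ctx;
}

static void toy_lock_release(struct toy_lock *l, int ctx)
{
	assert(l->owner == ctx);
	l->owner = 0;
}

/* Analogue of assert_spin_locked(): true if anyone holds the lock. */
static bool toy_locked_by_anyone(const struct toy_lock *l)
{
	return l->owner != 0;
}

/* Analogue of lockdep_assert_held(): true only if ctx itself holds it. */
static bool toy_held_by(const struct toy_lock *l, int ctx)
{
	return ctx != 0 && l->owner == ctx;
}
```

With context 1 holding the lock, toy_locked_by_anyone() is satisfied no
matter who asks, while toy_held_by() is satisfied only for context 1.
That is the stricter property the patch asserts.
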
 kernel/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..3fd2ce133387 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -884,7 +884,7 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
 static void ptrace_trap_notify(struct task_struct *t)
 {
 	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
 	ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -1079,7 +1079,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	int override_rlimit;
 	int ret = 0, result;
 
-	assert_spin_locked(&t->sighand->siglock);
+	lockdep_assert_held(&t->sighand->siglock);
 
 	result = TRACE_SIGNAL_IGNORED;
 	if (!prepare_signal(sig, t, force))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 07/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman, stable, Al Viro

The current implementation of PTRACE_KILL is buggy and has been for
many years as it assumes its target has stopped in ptrace_stop.  At a
quick skim it looks like this assumption has existed since ptrace
support was added in Linux v1.0.

While PTRACE_KILL has been deprecated we can not remove it as
a quick search with google code search reveals many existing
programs calling it.

When the ptracee is not stopped at ptrace_stop some fields would be
set that are ignored except in ptrace_stop, making the userspace
visible behavior of PTRACE_KILL a no-op in those cases.

As the usual rules are not obeyed it is not clear what the
consequences are of calling PTRACE_KILL on a running process.
Presumably userspace does not do this as it achieves nothing.

Replace the implementation of PTRACE_KILL with a simple
send_sig_info(SIGKILL) followed by a return 0.  This changes the
observable user space behavior only in that PTRACE_KILL on a process
not stopped in ptrace_stop will also kill it.  As that has always
been the intent of the code this seems like a reasonable change.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
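Seen from userspace, the new behavior is roughly that of sending SIGKILL
directly.  A minimal sketch, assuming a plain forked child rather than a
ptraced one; spawn_idle_child() and kill_and_reap() are hypothetical
helpers, not part of this patch:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that only sleeps in pause().
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
static pid_t spawn_idle_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/*
 * Hypothetical helper: send SIGKILL to the child and reap it.
 * Returns the terminating signal number, or -1 on error.
 */
static int kill_and_reap(pid_t pid)
{
	int status;

	/* pid <= 0 would address a process group or every process. */
	assert(pid > 0);

	if (kill(pid, SIGKILL) < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child dies whether or not it is stopped when the signal arrives,
which is the property PTRACE_KILL gains with this change.
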
 arch/x86/kernel/step.c | 3 +--
 kernel/ptrace.c        | 5 ++---
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 0f3c307b37b3..8e2b2552b5ee 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -180,8 +180,7 @@ void set_task_blockstep(struct task_struct *task, bool on)
 	 *
 	 * NOTE: this means that set/clear TIF_BLOCKSTEP is only safe if
 	 * task is current or it can't be running, otherwise we can race
-	 * with __switch_to_xtra(). We rely on ptrace_freeze_traced() but
-	 * PTRACE_KILL is not safe.
+	 * with __switch_to_xtra(). We rely on ptrace_freeze_traced().
 	 */
 	local_irq_disable();
 	debugctl = get_debugctlmsr();
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index da30dcd477a0..7105821595bc 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1236,9 +1236,8 @@ int ptrace_request(struct task_struct *child, long request,
 		return ptrace_resume(child, request, data);
 
 	case PTRACE_KILL:
-		if (child->exit_state)	/* already dead */
-			return 0;
-		return ptrace_resume(child, request, SIGKILL);
+		send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
+		return 0;
 
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	case PTRACE_GETREGSET:
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 08/12] ptrace: Document that wait_task_inactive can't fail
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

After ptrace_freeze_traced succeeds it is known that the tracee
has a __state value of __TASK_TRACED and that no __ptrace_unlink will
happen because the tracer is waiting for the tracee, and the tracee is
in ptrace_stop.

The function ptrace_freeze_traced can succeed at any point after
ptrace_stop has set TASK_TRACED and dropped siglock.  The read_lock on
tasklist_lock only excludes ptrace_attach.

This means that the !current->ptrace branch, which executes under a
read_lock of tasklist_lock, will never see a ptrace_freeze_traced, as
the tracer must have gone away before the tasklist_lock was taken and
ptrace_attach can not occur until the read_lock is dropped.  As
ptrace_freeze_traced depends upon ptrace_attach running before it can
run, that excludes ptrace_freeze_traced until __state is set to
TASK_RUNNING.  This means that task_is_traced will fail in
ptrace_freeze_traced and ptrace_freeze_traced will fail.

On the current->ptrace branch of ptrace_stop, which will be reached any
time after ptrace_freeze_traced has succeeded, it is known that __state
is __TASK_TRACED and schedule() will be called with that state.

Use a WARN_ON_ONCE to document that wait_task_inactive(TASK_TRACED)
should never fail.  Remove the stale comment about may_ptrace_stop.

Strictly speaking this is not true because if PREEMPT_RT is enabled
wait_task_inactive can fail because __state can be changed.  I don't
see this as a problem as the ptrace code is currently broken on
PREEMPT_RT, and this is one of the issues.  Failing and warning when
the assumptions of the code are broken is good.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
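The shape of the new check can be modelled in userspace.  This is only a
sketch: warn_on_once() is a stand-in written for this example (the real
WARN_ON_ONCE is a kernel macro), and check_attach_tail() is a
hypothetical analogue of the tail of ptrace_check_attach() in which
'inactive' plays the role of wait_task_inactive()'s result:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int warn_count;	/* number of warnings actually printed */

/*
 * Stand-in for WARN_ON_ONCE(): always returns the condition, so it can
 * sit inside an if (), but prints at most once per 'warned' flag.
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Fail with -ESRCH, warning once, if the tracee did not go inactive. */
static int check_attach_tail(int ret, bool ignore_state, bool inactive)
{
	static bool warned;

	if (!ret && !ignore_state &&
	    warn_on_once(!inactive, &warned, "tracee not inactive"))
		ret = -ESRCH;
	return ret;
}
```

The error is still returned on every failure; only the warning is
rate-limited to the first occurrence.
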
 kernel/ptrace.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7105821595bc..05953ac9f7bd 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state) {
-		if (!wait_task_inactive(child, __TASK_TRACED)) {
-			/*
-			 * This can only happen if may_ptrace_stop() fails and
-			 * ptrace_stop() changes ->state back to TASK_RUNNING,
-			 * so we should not worry about leaking __TASK_TRACED.
-			 */
-			WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
-			ret = -ESRCH;
-		}
-	}
+	if (!ret && !ignore_state &&
+	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+		ret = -ESRCH;
 
 	return ret;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 09/12] ptrace: Admit ptrace_stop can generate spurious SIGTRAPs
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Long ago and far away there was a BUG_ON at the start of ptrace_stop
that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1].  The BUG_ON
had never triggered but examination of the code showed that the BUG_ON
could actually trigger.  To complement removing the BUG_ON an attempt
to better handle the race was added.

The code detected the tracer had gone away and did not call
do_notify_parent_cldstop.  The code also attempted to prevent
ptrace_report_syscall from sending spurious SIGTRAPs when the tracer
went away.

The code to detect when the tracer had gone away before sending a
signal to the tracer was a legitimate fix and continues to work to this
date.

The code to prevent sending spurious SIGTRAPs is a failure.  At the
time and until today the code only catches it when the tracer goes
away after siglock is dropped and before read_lock is acquired.  If
the tracer goes away after read_lock is dropped a spurious SIGTRAP can
still be sent to the tracee.  The tracer going away after read_lock
is dropped is the far likelier case as it is the bigger window.

Given that the attempt to prevent the generation of a SIGTRAP was a
failure and continues to be a failure remove the code that attempts to
do that.  This simplifies the code in ptrace_stop and makes
ptrace_stop much easier to reason about.

To successfully deal with the tracer going away, all of the tracer's
instrumentation of the child would need to be removed, and reliably
detecting when the tracer has set a signal to continue with would need
to be implemented.

[1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
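The notification rule that remains in ptrace_stop after this change can
be written as a small pure function.  This is a sketch for illustration
only: stop_notifications() and the NOTIFY_* flags are hypothetical names,
and the three booleans stand for current->ptrace, gstop_done and
ptrace_reparented(current):

```c
#include <assert.h>
#include <stdbool.h>

#define NOTIFY_TRACER		0x1u	/* do_notify_parent_cldstop(current, true, why) */
#define NOTIFY_REAL_PARENT	0x2u	/* do_notify_parent_cldstop(current, false, why) */

/* Returns the set of parents that would be told about the stop. */
static unsigned int stop_notifications(bool traced, bool gstop_done,
				       bool reparented)
{
	unsigned int mask = 0;

	/* The tracer, if still attached, hears about every stop. */
	if (traced)
		mask |= NOTIFY_TRACER;

	/*
	 * The real parent only cares about group stop completion, and is
	 * skipped when the notification would duplicate the tracer's.
	 */
	if (gstop_done && (!traced || reparented))
		mask |= NOTIFY_REAL_PARENT;

	return mask;
}
```

For example, a task whose tracer has gone away but which completed a
group stop yields NOTIFY_REAL_PARENT alone, matching what the removed
else branch used to do.
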
 kernel/signal.c | 92 ++++++++++++++++++++-----------------------------
 1 file changed, 38 insertions(+), 54 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..d2d0c753156c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2187,13 +2187,12 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  * with.  If the code did not stop because the tracer is gone,
  * the stop signal remains unchanged unless clear_code.
  */
-static int ptrace_stop(int exit_code, int why, int clear_code,
-			unsigned long message, kernel_siginfo_t *info)
+static int ptrace_stop(int exit_code, int why, unsigned long message,
+		       kernel_siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2212,7 +2211,14 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	/*
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
+	 *
+	 * After this point ptrace_signal_wake_up will clear TASK_TRACED
+	 * if ptrace_unlink happens.  Handle previous ptrace_unlinks
+	 * here to prevent ptrace_stop sleeping in schedule.
 	 */
+	if (!current->ptrace)
+		return exit_code;
+
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2259,54 +2265,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
-
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
@@ -2314,8 +2299,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
-		exit_code = current->exit_code;
+	exit_code = current->exit_code;
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
@@ -2343,7 +2327,7 @@ static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long mes
 	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
-	return ptrace_stop(exit_code, why, 1, message, &info);
+	return ptrace_stop(exit_code, why, message, &info);
 }
 
 int ptrace_notify(int exit_code, unsigned long message)
@@ -2515,7 +2499,7 @@ static void do_jobctl_trap(void)
 				 CLD_STOPPED, 0);
 	} else {
 		WARN_ON_ONCE(!signr);
-		ptrace_stop(signr, CLD_STOPPED, 0, 0, NULL);
+		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
 	}
 }
 
@@ -2568,7 +2552,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	 * comment in dequeue_signal().
 	 */
 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
-	signr = ptrace_stop(signr, CLD_TRAPPED, 0, 0, info);
+	signr = ptrace_stop(signr, CLD_TRAPPED, 0, info);
 
 	/* We're back.  Did the debugger cancel the sig?  */
 	if (signr == 0)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 09/12] ptrace: Admit ptrace_stop can generate spurious SIGTRAPs
@ 2022-05-05 18:26               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Long ago and far away there was a BUG_ON at the start of ptrace_stop
that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1].  The BUG_ON
had never triggered but examination of the code showed that the BUG_ON
could actually trigger.  To complement removing the BUG_ON an attempt
to better handle the race was added.

The code detected the tracer had gone away and did not call
do_notify_parent_cldstop.  The code also attempted to prevent
ptrace_report_syscall from sending spurious SIGTRAPs when the tracer
went away.

The code to detect when the tracer had gone away before sending a
signal to the tracer was a legitimate fix and continues to work to
this day.

The code to prevent sending spurious SIGTRAPs is a failure.  Then and
now, it only catches the case where the tracer goes away after siglock
is dropped and before read_lock is acquired.  If the tracer goes away
after read_lock is dropped a spurious SIGTRAP can still be sent to the
tracee.  The tracer going away after read_lock is dropped is the far
likelier case as it is the bigger window.

Given that the attempt to prevent the generation of a SIGTRAP was a
failure and continues to be a failure, remove the code that attempts
to do that.  This simplifies the code in ptrace_stop and makes
ptrace_stop much easier to reason about.

To successfully deal with the tracer going away, all of the tracer's
instrumentation of the child would need to be removed, and reliably
detecting when the tracer has set a signal to continue with would need
to be implemented.
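
As a rough userspace sketch (not kernel code) of the notification
decision that ptrace_stop is left with after this change: the struct,
enum and function names below are hypothetical, and the two booleans
merely stand in for current->ptrace and ptrace_reparented(current).

```c
#include <stdbool.h>

/* Hypothetical bit flags naming who gets told about the stop. */
enum {
	NOTIFY_TRACER      = 1 << 0,
	NOTIFY_REAL_PARENT = 1 << 1,
};

/* Hypothetical stand-in for the bits of task_struct that matter here. */
struct model_task {
	bool ptraced;     /* models current->ptrace != 0 */
	bool reparented;  /* models ptrace_reparented(current) */
};

/*
 * Models the two "if" statements under tasklist_lock in the patched
 * ptrace_stop: the tracer hears about every stop, the real parent only
 * about a completed group stop that the tracer would not already cover.
 */
static int model_stop_notifications(const struct model_task *t,
				    bool gstop_done)
{
	int who = 0;

	if (t->ptraced)
		who |= NOTIFY_TRACER;
	if (gstop_done && (!t->ptraced || t->reparented))
		who |= NOTIFY_REAL_PARENT;
	return who;
}
```

With the tracer gone (ptraced false) and a completed group stop only
the real parent is notified, which matches what the removed else
branch used to do by hand.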

[1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 92 ++++++++++++++++++++-----------------------------
 1 file changed, 38 insertions(+), 54 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..d2d0c753156c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2187,13 +2187,12 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  * with.  If the code did not stop because the tracer is gone,
  * the stop signal remains unchanged unless clear_code.
  */
-static int ptrace_stop(int exit_code, int why, int clear_code,
-			unsigned long message, kernel_siginfo_t *info)
+static int ptrace_stop(int exit_code, int why, unsigned long message,
+		       kernel_siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2212,7 +2211,14 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	/*
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
+	 *
+	 * After this point ptrace_signal_wake_up will clear TASK_TRACED
+	 * if ptrace_unlink happens.  Handle previous ptrace_unlinks
+	 * here to prevent ptrace_stop sleeping in schedule.
 	 */
+	if (!current->ptrace)
+		return exit_code;
+
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2259,54 +2265,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
-
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
@@ -2314,8 +2299,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
-		exit_code = current->exit_code;
+	exit_code = current->exit_code;
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
@@ -2343,7 +2327,7 @@ static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long mes
 	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
-	return ptrace_stop(exit_code, why, 1, message, &info);
+	return ptrace_stop(exit_code, why, message, &info);
 }
 
 int ptrace_notify(int exit_code, unsigned long message)
@@ -2515,7 +2499,7 @@ static void do_jobctl_trap(void)
 				 CLD_STOPPED, 0);
 	} else {
 		WARN_ON_ONCE(!signr);
-		ptrace_stop(signr, CLD_STOPPED, 0, 0, NULL);
+		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
 	}
 }
 
@@ -2568,7 +2552,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	 * comment in dequeue_signal().
 	 */
 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
-	signr = ptrace_stop(signr, CLD_TRAPPED, 0, 0, info);
+	signr = ptrace_stop(signr, CLD_TRAPPED, 0, info);
 
 	/* We're back.  Did the debugger cancel the sig?  */
 	if (signr == 0)
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 09/12] ptrace: Admit ptrace_stop can generate spurious SIGTRAPs
@ 2022-05-05 18:26               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Long ago and far away there was a BUG_ON at the start of ptrace_stop
that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1].  The BUG_ON
had never triggered but examination of the code showed that the BUG_ON
could actually trigger.  To complement removing the BUG_ON an attempt
to better handle the race was added.

The code detected the tracer had gone away and did not call
do_notify_parent_cldstop.  The code also attempted to prevent
ptrace_report_syscall from sending spurious SIGTRAPs when the tracer
went away.

The code to detect when the tracer had gone away before sending a
signal to the tracer was a legitimate fix and continues to work to
this day.

The code to prevent sending spurious SIGTRAPs is a failure.  Then and
now, it only catches the case where the tracer goes away after siglock
is dropped and before read_lock is acquired.  If the tracer goes away
after read_lock is dropped a spurious SIGTRAP can still be sent to the
tracee.  The tracer going away after read_lock is dropped is the far
likelier case as it is the bigger window.

Given that the attempt to prevent the generation of a SIGTRAP was a
failure and continues to be a failure, remove the code that attempts
to do that.  This simplifies the code in ptrace_stop and makes
ptrace_stop much easier to reason about.

To successfully deal with the tracer going away, all of the tracer's
instrumentation of the child would need to be removed, and reliably
detecting when the tracer has set a signal to continue with would need
to be implemented.

[1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 92 ++++++++++++++++++++-----------------------------
 1 file changed, 38 insertions(+), 54 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..d2d0c753156c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2187,13 +2187,12 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  * with.  If the code did not stop because the tracer is gone,
  * the stop signal remains unchanged unless clear_code.
  */
-static int ptrace_stop(int exit_code, int why, int clear_code,
-			unsigned long message, kernel_siginfo_t *info)
+static int ptrace_stop(int exit_code, int why, unsigned long message,
+		       kernel_siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
 	bool gstop_done = false;
-	bool read_code = true;
 
 	if (arch_ptrace_stop_needed()) {
 		/*
@@ -2212,7 +2211,14 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	/*
 	 * schedule() will not sleep if there is a pending signal that
 	 * can awaken the task.
+	 *
+	 * After this point ptrace_signal_wake_up will clear TASK_TRACED
+	 * if ptrace_unlink happens.  Handle previous ptrace_unlinks
+	 * here to prevent ptrace_stop sleeping in schedule.
 	 */
+	if (!current->ptrace)
+		return exit_code;
+
 	set_special_state(TASK_TRACED);
 
 	/*
@@ -2259,54 +2265,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 
 	spin_unlock_irq(&current->sighand->siglock);
 	read_lock(&tasklist_lock);
-	if (likely(current->ptrace)) {
-		/*
-		 * Notify parents of the stop.
-		 *
-		 * While ptraced, there are two parents - the ptracer and
-		 * the real_parent of the group_leader.  The ptracer should
-		 * know about every stop while the real parent is only
-		 * interested in the completion of group stop.  The states
-		 * for the two don't interact with each other.  Notify
-		 * separately unless they're gonna be duplicates.
-		 */
+	/*
+	 * Notify parents of the stop.
+	 *
+	 * While ptraced, there are two parents - the ptracer and
+	 * the real_parent of the group_leader.  The ptracer should
+	 * know about every stop while the real parent is only
+	 * interested in the completion of group stop.  The states
+	 * for the two don't interact with each other.  Notify
+	 * separately unless they're gonna be duplicates.
+	 */
+	if (current->ptrace)
 		do_notify_parent_cldstop(current, true, why);
-		if (gstop_done && ptrace_reparented(current))
-			do_notify_parent_cldstop(current, false, why);
+	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+		do_notify_parent_cldstop(current, false, why);
 
-		/*
-		 * Don't want to allow preemption here, because
-		 * sys_ptrace() needs this task to be inactive.
-		 *
-		 * XXX: implement read_unlock_no_resched().
-		 */
-		preempt_disable();
-		read_unlock(&tasklist_lock);
-		cgroup_enter_frozen();
-		preempt_enable_no_resched();
-		freezable_schedule();
-		cgroup_leave_frozen(true);
-	} else {
-		/*
-		 * By the time we got the lock, our tracer went away.
-		 * Don't drop the lock yet, another tracer may come.
-		 *
-		 * If @gstop_done, the ptracer went away between group stop
-		 * completion and here.  During detach, it would have set
-		 * JOBCTL_STOP_PENDING on us and we'll re-enter
-		 * TASK_STOPPED in do_signal_stop() on return, so notifying
-		 * the real parent of the group stop completion is enough.
-		 */
-		if (gstop_done)
-			do_notify_parent_cldstop(current, false, why);
-
-		/* tasklist protects us from ptrace_freeze_traced() */
-		__set_current_state(TASK_RUNNING);
-		read_code = false;
-		if (clear_code)
-			exit_code = 0;
-		read_unlock(&tasklist_lock);
-	}
+	/*
+	 * Don't want to allow preemption here, because
+	 * sys_ptrace() needs this task to be inactive.
+	 *
+	 * XXX: implement read_unlock_no_resched().
+	 */
+	preempt_disable();
+	read_unlock(&tasklist_lock);
+	cgroup_enter_frozen();
+	preempt_enable_no_resched();
+	freezable_schedule();
+	cgroup_leave_frozen(true);
 
 	/*
 	 * We are back.  Now reacquire the siglock before touching
@@ -2314,8 +2299,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	if (read_code)
-		exit_code = current->exit_code;
+	exit_code = current->exit_code;
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
@@ -2343,7 +2327,7 @@ static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long mes
 	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
-	return ptrace_stop(exit_code, why, 1, message, &info);
+	return ptrace_stop(exit_code, why, message, &info);
 }
 
 int ptrace_notify(int exit_code, unsigned long message)
@@ -2515,7 +2499,7 @@ static void do_jobctl_trap(void)
 				 CLD_STOPPED, 0);
 	} else {
 		WARN_ON_ONCE(!signr);
-		ptrace_stop(signr, CLD_STOPPED, 0, 0, NULL);
+		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
 	}
 }
 
@@ -2568,7 +2552,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	 * comment in dequeue_signal().
 	 */
 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
-	signr = ptrace_stop(signr, CLD_TRAPPED, 0, 0, info);
+	signr = ptrace_stop(signr, CLD_TRAPPED, 0, info);
 
 	/* We're back.  Did the debugger cancel the sig?  */
 	if (signr == 0)
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 10/12] ptrace: Don't change __state
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN.  This new flag is
set in ptrace_freeze_traced and cleared when ptrace_stop is awoken or
in ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wake up is for a fatal signal.  Skip adding them when
JOBCTL_PTRACE_FROZEN is set.  This has the same effect as changing
TASK_TRACED to __TASK_TRACED as all of the wake_ups that use
TASK_KILLABLE go through signal_wake_up.
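
A small userspace sketch of the mask that signal_wake_up computes
after this change may help.  It is only a model: the MODEL_* names and
the function are hypothetical, and the numeric values of the two state
bits are placeholders chosen for the sketch.  Only the bit number of
the frozen flag is taken from the jobctl.h hunk below.

```c
#include <stdbool.h>

/* Placeholder state bits for this sketch, not taken from sched.h. */
#define MODEL_TASK_TRACED		0x0008u
#define MODEL_TASK_WAKEKILL		0x0100u

/* Bit 24, as in the JOBCTL_PTRACE_FROZEN_BIT definition in the patch. */
#define MODEL_JOBCTL_PTRACE_FROZEN	(1UL << 24)

/*
 * Models the body of signal_wake_up(): a fatal wake up may also wake a
 * traced task, unless the tracer has frozen it for a ptrace command.
 */
static unsigned int model_wake_mask(unsigned long jobctl, bool fatal)
{
	fatal = fatal && !(jobctl & MODEL_JOBCTL_PTRACE_FROZEN);
	return fatal ? (MODEL_TASK_WAKEKILL | MODEL_TASK_TRACED) : 0;
}
```

A frozen tracee therefore gets an empty extra mask even for a fatal
signal, which is how the flag replaces the old trick of stripping
TASK_WAKEKILL out of __state.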

Handle a ptrace_stop being called with a pending fatal signal.
Previously it would have been handled by schedule simply failing to
sleep.  As TASK_WAKEKILL is no longer part of TASK_TRACED schedule
will sleep with a fatal_signal_pending.  The code in signal_wake_up
guarantees that the code will be awakened by any fatal signal that
comes after TASK_TRACED is set.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop.  Now when woken up ptrace_stop clears
JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreeze_traced
clears JOBCTL_PTRACE_FROZEN.
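
The freeze/unfreeze pairing can be modeled in userspace roughly as
follows.  This is a sketch under simplifying assumptions, not the
kernel implementation: the struct and function names are hypothetical,
locking is left out, and the looks_like_a_spurious_pid check made by
the real ptrace_freeze_traced is not modeled.

```c
#include <stdbool.h>

/* Bit 24, as in the JOBCTL_PTRACE_FROZEN_BIT definition in the patch. */
#define MODEL_JOBCTL_PTRACE_FROZEN	(1UL << 24)

/* Hypothetical stand-in for the tracee state used by the two helpers. */
struct model_task {
	unsigned long jobctl;
	bool traced;         /* models the task sleeping in __TASK_TRACED */
	bool fatal_pending;  /* models __fatal_signal_pending(task) */
};

/* Models ptrace_freeze_traced(): only a flag is set, __state is untouched. */
static bool model_freeze(struct model_task *t)
{
	if (t->traced && !t->fatal_pending) {
		t->jobctl |= MODEL_JOBCTL_PTRACE_FROZEN;
		return true;
	}
	return false;
}

/*
 * Models ptrace_unfreeze_traced(): clear the flag, and if a fatal
 * signal arrived while frozen, wake the tracee now.  Returns true
 * when a wake up was issued.
 */
static bool model_unfreeze(struct model_task *t)
{
	t->jobctl &= ~MODEL_JOBCTL_PTRACE_FROZEN;
	if (t->fatal_pending && t->traced) {
		t->traced = false;
		return true;
	}
	return false;
}
```

The point of the model is that neither helper writes the sleep state
in order to freeze or thaw; the only state transition left is the
wake up for a fatal signal that was held back while frozen.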

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched.h        |  2 +-
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  5 +++--
 kernel/ptrace.c              | 21 ++++++++-------------
 kernel/sched/core.c          |  5 +----
 kernel/signal.c              | 14 ++++++--------
 6 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED			__TASK_TRACED
 
 #define TASK_IDLE			(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
 
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..e66948abbee4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -435,9 +435,10 @@ extern void calculate_sigpending(void);
 
 extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
-static inline void signal_wake_up(struct task_struct *t, bool resume)
+static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
+	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 05953ac9f7bd..83ed28262708 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
-		return;
-
-	WARN_ON(!task->ptrace || task->parent != current);
+	unsigned long flags;
 
 	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
+	 * The child may be awake and may have cleared
+	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
+	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
 	 */
-	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (lock_task_sighand(task, &flags)) {
+		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+		unlock_task_sighand(task, &flags);
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 }
 
 /**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d575b4914925..3c351707e830 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6304,10 +6304,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
diff --git a/kernel/signal.c b/kernel/signal.c
index d2d0c753156c..a58b68a2d3c6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2209,14 +2209,12 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	}
 
 	/*
-	 * schedule() will not sleep if there is a pending signal that
-	 * can awaken the task.
-	 *
-	 * After this point ptrace_signal_wake_up will clear TASK_TRACED
-	 * if ptrace_unlink happens.  Handle previous ptrace_unlinks
-	 * here to prevent ptrace_stop sleeping in schedule.
+	 * After this point ptrace_signal_wake_up or signal_wake_up
+	 * will clear TASK_TRACED if ptrace_unlink happens or a fatal
+	 * signal comes in.  Handle previous ptrace_unlinks and fatal
+	 * signals here to prevent ptrace_stop sleeping in schedule.
 	 */
-	if (!current->ptrace)
+	if (!current->ptrace || __fatal_signal_pending(current))
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
@@ -2305,7 +2303,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 10/12] ptrace: Don't change __state
@ 2022-05-05 18:26               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN.  This new flag is
set in ptrace_freeze_traced and cleared when ptrace_stop is awoken or
in ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wake up is for a fatal signal.  Skip adding them when
JOBCTL_PTRACE_FROZEN is set.  This has the same effect as changing
TASK_TRACED to __TASK_TRACED as all of the wake_ups that use
TASK_KILLABLE go through signal_wake_up.

Handle a ptrace_stop being called with a pending fatal signal.
Previously it would have been handled by schedule simply failing to
sleep.  As TASK_WAKEKILL is no longer part of TASK_TRACED schedule
will sleep with a fatal_signal_pending.  The code in signal_wake_up
guarantees that the code will be awakened by any fatal signal that
comes after TASK_TRACED is set.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop.  Now when woken up ptrace_stop clears
JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreeze_traced
clears JOBCTL_PTRACE_FROZEN.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched.h        |  2 +-
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  5 +++--
 kernel/ptrace.c              | 21 ++++++++-------------
 kernel/sched/core.c          |  5 +----
 kernel/signal.c              | 14 ++++++--------
 6 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED			__TASK_TRACED
 
 #define TASK_IDLE			(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
 
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..e66948abbee4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -435,9 +435,10 @@ extern void calculate_sigpending(void);
 
 extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
-static inline void signal_wake_up(struct task_struct *t, bool resume)
+static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
+	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 05953ac9f7bd..83ed28262708 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
-		return;
-
-	WARN_ON(!task->ptrace || task->parent != current);
+	unsigned long flags;
 
 	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
+	 * The child may be awake and may have cleared
+	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
+	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
 	 */
-	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (lock_task_sighand(task, &flags)) {
+		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+		unlock_task_sighand(task, &flags);
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 }
 
 /**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d575b4914925..3c351707e830 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6304,10 +6304,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
diff --git a/kernel/signal.c b/kernel/signal.c
index d2d0c753156c..a58b68a2d3c6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2209,14 +2209,12 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	}
 
 	/*
-	 * schedule() will not sleep if there is a pending signal that
-	 * can awaken the task.
-	 *
-	 * After this point ptrace_signal_wake_up will clear TASK_TRACED
-	 * if ptrace_unlink happens.  Handle previous ptrace_unlinks
-	 * here to prevent ptrace_stop sleeping in schedule.
+	 * After this point ptrace_signal_wake_up or signal_wake_up
+	 * will clear TASK_TRACED if ptrace_unlink happens or a fatal
+	 * signal comes in.  Handle previous ptrace_unlinks and fatal
+	 * signals here to prevent ptrace_stop sleeping in schedule.
 	 */
-	if (!current->ptrace)
+	if (!current->ptrace || __fatal_signal_pending(current))
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
@@ -2305,7 +2303,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 10/12] ptrace: Don't change __state
@ 2022-05-05 18:26               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN.  This new flag is set
in ptrace_freeze_traced and cleared when ptrace_stop is awoken or in
ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wake up is for a fatal signal.  Skip adding both when
JOBCTL_PTRACE_FROZEN is set, so that a frozen tracee is not woken.  This
has the same effect as changing TASK_TRACED to __TASK_TRACED as all of
the wake_ups that use TASK_KILLABLE go through signal_wake_up.
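
As a rough userspace illustration of the rule above (not kernel code), the
mask that signal_wake_up hands on can be modelled as a pure function.  The
MODEL_* constants stand in for the kernel's task state bits and their
values are assumptions; wake_mask is a hypothetical helper, and only
JOBCTL_PTRACE_FROZEN (bit 24) is taken from this patch.

```c
#include <stdbool.h>

/* Assumed stand-ins for __TASK_TRACED and TASK_WAKEKILL (illustrative values). */
#define MODEL_TASK_TRACED     0x0008u
#define MODEL_TASK_WAKEKILL   0x0100u
/* Bit number taken from the jobctl.h hunk in this patch. */
#define JOBCTL_PTRACE_FROZEN  (1UL << 24)

/*
 * Hypothetical helper: compute the state mask that signal_wake_up()
 * would pass to signal_wake_up_state() for a task with this jobctl.
 */
static unsigned int wake_mask(unsigned long jobctl, bool fatal)
{
	/* A frozen tracee must not be woken, even by a fatal signal. */
	fatal = fatal && !(jobctl & JOBCTL_PTRACE_FROZEN);

	/* Fatal wake-ups now also target tasks sleeping in __TASK_TRACED. */
	return fatal ? (MODEL_TASK_WAKEKILL | MODEL_TASK_TRACED) : 0;
}
```

With the flag clear, a fatal wake-up yields both bits; with the flag set, or
for a non-fatal signal, the mask is 0 and only TIF_SIGPENDING would be set.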

Handle a ptrace_stop being called with a pending fatal signal.
Previously it would have been handled by schedule simply failing to
sleep.  As TASK_WAKEKILL is no longer part of TASK_TRACED schedule
will sleep with a fatal_signal_pending.  The code in signal_wake_up
guarantees that the code will be awakened by any fatal signal that
comes after TASK_TRACED is set.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop.  Now ptrace_stop clears JOBCTL_PTRACE_FROZEN
when woken up, and ptrace_unfreeze_traced clears
JOBCTL_PTRACE_FROZEN when the task is left sleeping.
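
The freeze/unfreeze life cycle described above can be sketched in userspace
roughly as follows.  This is only a model under stated assumptions: struct
model_task, the MODEL_* state values, and the model_* helpers are all
hypothetical, locking is omitted, and the assignment to MODEL_TASK_RUNNING
merely stands in for wake_up_state().

```c
#include <stdbool.h>

/* Assumed stand-ins for TASK_RUNNING and __TASK_TRACED (illustrative values). */
#define MODEL_TASK_RUNNING    0x0000u
#define MODEL_TASK_TRACED     0x0008u
/* Bit number taken from the jobctl.h hunk in this patch. */
#define JOBCTL_PTRACE_FROZEN  (1UL << 24)

/* Hypothetical, heavily reduced view of a task. */
struct model_task {
	unsigned int  state;          /* models task->__state */
	unsigned long jobctl;         /* models task->jobctl */
	bool          fatal_pending;  /* models __fatal_signal_pending() */
};

/* Freeze: only a flag is set; the state word is left alone. */
static bool model_freeze(struct model_task *t)
{
	if ((t->state & MODEL_TASK_TRACED) && !t->fatal_pending) {
		t->jobctl |= JOBCTL_PTRACE_FROZEN;
		return true;
	}
	return false;
}

/* Unfreeze: clear the flag, then deliver a wake-up that was held back. */
static void model_unfreeze(struct model_task *t)
{
	t->jobctl &= ~JOBCTL_PTRACE_FROZEN;
	if (t->fatal_pending && (t->state & MODEL_TASK_TRACED))
		t->state = MODEL_TASK_RUNNING;  /* stands in for wake_up_state() */
}
```

The point of the sketch is that freezing never writes the state word, so
nothing else has to cope with __state changing underneath it.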

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched.h        |  2 +-
 include/linux/sched/jobctl.h |  2 ++
 include/linux/sched/signal.h |  5 +++--
 kernel/ptrace.c              | 21 ++++++++-------------
 kernel/sched/core.c          |  5 +----
 kernel/signal.c              | 14 ++++++--------
 6 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
 #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED			__TASK_TRACED
 
 #define TASK_IDLE			(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
 
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..e66948abbee4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -435,9 +435,10 @@ extern void calculate_sigpending(void);
 
 extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
-static inline void signal_wake_up(struct task_struct *t, bool resume)
+static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
+	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 05953ac9f7bd..83ed28262708 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
-		WRITE_ONCE(task->__state, __TASK_TRACED);
+		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
 	spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 
 static void ptrace_unfreeze_traced(struct task_struct *task)
 {
-	if (READ_ONCE(task->__state) != __TASK_TRACED)
-		return;
-
-	WARN_ON(!task->ptrace || task->parent != current);
+	unsigned long flags;
 
 	/*
-	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
-	 * Recheck state under the lock to close this race.
+	 * The child may be awake and may have cleared
+	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
+	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
 	 */
-	spin_lock_irq(&task->sighand->siglock);
-	if (READ_ONCE(task->__state) == __TASK_TRACED) {
+	if (lock_task_sighand(task, &flags)) {
+		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
 		if (__fatal_signal_pending(task))
 			wake_up_state(task, __TASK_TRACED);
-		else
-			WRITE_ONCE(task->__state, TASK_TRACED);
+		unlock_task_sighand(task, &flags);
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 }
 
 /**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	 */
 	read_lock(&tasklist_lock);
 	if (child->ptrace && child->parent == current) {
-		WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
 		/*
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d575b4914925..3c351707e830 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6304,10 +6304,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 
 	/*
 	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that:
-	 *
-	 *  - we form a control dependency vs deactivate_task() below.
-	 *  - ptrace_{,un}freeze_traced() can change ->state underneath us.
+	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
diff --git a/kernel/signal.c b/kernel/signal.c
index d2d0c753156c..a58b68a2d3c6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2209,14 +2209,12 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	}
 
 	/*
-	 * schedule() will not sleep if there is a pending signal that
-	 * can awaken the task.
-	 *
-	 * After this point ptrace_signal_wake_up will clear TASK_TRACED
-	 * if ptrace_unlink happens.  Handle previous ptrace_unlinks
-	 * here to prevent ptrace_stop sleeping in schedule.
+	 * After this point ptrace_signal_wake_up or signal_wake_up
+	 * will clear TASK_TRACED if ptrace_unlink happens or a fatal
+	 * signal comes in.  Handle previous ptrace_unlinks and fatal
+	 * signals here to prevent ptrace_stop sleeping in schedule.
 	 */
-	if (!current->ptrace)
+	if (!current->ptrace || __fatal_signal_pending(current))
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
@@ -2305,7 +2303,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	current->exit_code = 0;
 
 	/* LISTENING can be set only during STOP traps, clear it */
-	current->jobctl &= ~JOBCTL_LISTENING;
+	current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
 
 	/*
 	 * Queued signals ignored us while we were stopped for tracing.
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 11/12] ptrace: Always take siglock in ptrace_resume
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W. Biederman

Make code analysis simpler and future changes easier by
always taking siglock in ptrace_resume.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 83ed28262708..36a5b7a00d2f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -837,8 +837,6 @@ static long ptrace_get_rseq_configuration(struct task_struct *task,
 static int ptrace_resume(struct task_struct *child, long request,
 			 unsigned long data)
 {
-	bool need_siglock;
-
 	if (!valid_signal(data))
 		return -EIO;
 
@@ -874,18 +872,11 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 * Note that we need siglock even if ->exit_code == data and/or this
 	 * status was not reported yet, the new status must not be cleared by
 	 * wait_task_stopped() after resume.
-	 *
-	 * If data == 0 we do not care if wait_task_stopped() reports the old
-	 * status and clears the code too; this can't race with the tracee, it
-	 * takes siglock after resume.
 	 */
-	need_siglock = data && !thread_group_empty(current);
-	if (need_siglock)
-		spin_lock_irq(&child->sighand->siglock);
+	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
 	wake_up_state(child, __TASK_TRACED);
-	if (need_siglock)
-		spin_unlock_irq(&child->sighand->siglock);
+	spin_unlock_irq(&child->sighand->siglock);
 
 	return 0;
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-05 18:26               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W . Biederman

From: Peter Zijlstra <peterz@infradead.org>

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.
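
A small userspace sketch of why the extra jobctl bits help: once the
predicates read jobctl, they keep working even if something else
overwrites the state word.  struct model_task, the model_* helpers, and the
MODEL_* values (including the made-up MODEL_TASK_FROZEN) are assumptions
for illustration; only the JOBCTL_STOPPED/JOBCTL_TRACED bit numbers come
from this patch.

```c
#include <stdbool.h>

/* Assumed stand-in for __TASK_TRACED, plus a made-up freezer state. */
#define MODEL_TASK_TRACED  0x0008u
#define MODEL_TASK_FROZEN  0x8000u
/* Bit numbers taken from the jobctl.h hunk in this patch. */
#define JOBCTL_STOPPED     (1UL << 26)
#define JOBCTL_TRACED      (1UL << 27)

/* Hypothetical, heavily reduced view of a task. */
struct model_task {
	unsigned int  state;   /* models task->__state */
	unsigned long jobctl;  /* models task->jobctl */
};

/* The predicates consult jobctl only, never the state word. */
static bool model_is_traced(const struct model_task *t)
{
	return (t->jobctl & JOBCTL_TRACED) != 0;
}

static bool model_is_stopped(const struct model_task *t)
{
	return (t->jobctl & JOBCTL_STOPPED) != 0;
}

static bool model_is_stopped_or_traced(const struct model_task *t)
{
	return (t->jobctl & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0;
}
```

If a traced task's state is replaced by MODEL_TASK_FROZEN, model_is_traced()
still reports true because the JOBCTL_TRACED bit is untouched.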

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * Didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
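
The second note can be pictured with the following userspace model, in
which the two wake helpers clear the jobctl bits themselves before
returning the mask they would pass to signal_wake_up_state.  The model_*
functions and MODEL_* state values are hypothetical; the JOBCTL_* bit
numbers are the ones used in this series.

```c
#include <stdbool.h>

/* Assumed stand-ins for __TASK_TRACED and TASK_WAKEKILL (illustrative values). */
#define MODEL_TASK_TRACED     0x0008u
#define MODEL_TASK_WAKEKILL   0x0100u
/* Bit numbers as used in this series. */
#define JOBCTL_PTRACE_FROZEN  (1UL << 24)
#define JOBCTL_STOPPED        (1UL << 26)
#define JOBCTL_TRACED         (1UL << 27)

/* Models signal_wake_up(): a fatal wake-up ends both stop states. */
static unsigned int model_signal_wake_up(unsigned long *jobctl, bool fatal)
{
	unsigned int state = 0;

	if (fatal && !(*jobctl & JOBCTL_PTRACE_FROZEN)) {
		*jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
		state = MODEL_TASK_WAKEKILL | MODEL_TASK_TRACED;
	}
	return state;
}

/* Models ptrace_signal_wake_up(): a resume ends only the traced state. */
static unsigned int model_ptrace_signal_wake_up(unsigned long *jobctl, bool resume)
{
	unsigned int state = 0;

	if (resume) {
		*jobctl &= ~JOBCTL_TRACED;
		state = MODEL_TASK_TRACED;
	}
	return state;
}
```

Because the bits are cleared in the same place that decides the wake mask,
a wake-up that is skipped (mask 0) leaves jobctl untouched as well.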

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 19 +++++++++++++++----
 kernel/ptrace.c              | 16 +++++++++++++---
 kernel/signal.c              | 10 ++++++++--
 5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..68876d0a7ef9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -21,6 +21,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -31,6 +34,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e66948abbee4..07ba3404fcde 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -437,12 +439,21 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
-	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+		state = TASK_WAKEKILL | __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 36a5b7a00d2f..328a34a99124 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -875,6 +884,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index a58b68a2d3c6..e782c2611b64 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2218,6 +2222,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
+	current->jobctl |= JOBCTL_TRACED;
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2436,6 +2441,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 12/12] sched, signal, ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-05-05 18:26               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W . Biederman

From: Peter Zijlstra <peterz@infradead.org>

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There's two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would loose TASK_TRACED/TASK_STOPPED and will
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * didn't add a unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of TASK_STOPPED and TASK_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 19 +++++++++++++++----
 kernel/ptrace.c              | 16 +++++++++++++---
 kernel/signal.c              | 10 ++++++++--
 5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..68876d0a7ef9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -21,6 +21,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -31,6 +34,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e66948abbee4..07ba3404fcde 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -437,12 +439,21 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
-	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+		state = TASK_WAKEKILL | __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 36a5b7a00d2f..328a34a99124 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~TASK_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -875,6 +884,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index a58b68a2d3c6..e782c2611b64 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2218,6 +2222,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
+	current->jobctl |= JOBCTL_TRACED;
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2436,6 +2441,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-05-05 18:26               ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-05 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Eric W . Biederman

From: Peter Zijlstra <peterz@infradead.org>

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There are two spots of bother with this:

 - PREEMPT_RT has task->saved_state which complicates matters,
   meaning task_is_{traced,stopped}() needs to check an additional
   variable.

 - An alternative freezer implementation that itself relies on a
   special TASK state would lose TASK_TRACED/TASK_STOPPED and would
   result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
  * Didn't add an unnecessary newline in signal.h
  * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
    instead of in signal_wake_up_state.  This prevents the clearing
    of JOBCTL_STOPPED and JOBCTL_TRACED from getting lost.
  * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h        |  8 +++-----
 include/linux/sched/jobctl.h |  6 ++++++
 include/linux/sched/signal.h | 19 +++++++++++++++----
 kernel/ptrace.c              | 16 +++++++++++++---
 kernel/signal.c              | 10 ++++++++--
 5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;
 
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
-#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task)		((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task)	((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)
 
 /*
  * Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..68876d0a7ef9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -21,6 +21,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 #define JOBCTL_PTRACE_FROZEN_BIT	24	/* frozen for ptrace */
 
+#define JOBCTL_STOPPED_BIT	26	/* do_signal_stop() */
+#define JOBCTL_TRACED_BIT	27	/* ptrace_stop() */
+
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
 #define JOBCTL_STOP_CONSUME	(1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -31,6 +34,9 @@ struct task_struct;
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 #define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
 
+#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)
+
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e66948abbee4..07ba3404fcde 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
 static inline void kernel_signal_stop(void)
 {
 	spin_lock_irq(&current->sighand->siglock);
-	if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+	if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	schedule();
@@ -437,12 +439,21 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
 
 static inline void signal_wake_up(struct task_struct *t, bool fatal)
 {
-	fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
-	signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+		state = TASK_WAKEKILL | __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
 {
-	signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+	unsigned int state = 0;
+	if (resume) {
+		t->jobctl &= ~JOBCTL_TRACED;
+		state = __TASK_TRACED;
+	}
+	signal_wake_up_state(t, state);
 }
 
 void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 36a5b7a00d2f..328a34a99124 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
 	return true;
 }
 
-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
 static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
 	 */
 	if (lock_task_sighand(task, &flags)) {
 		task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
-		if (__fatal_signal_pending(task))
+		if (__fatal_signal_pending(task)) {
+			task->jobctl &= ~JOBCTL_TRACED;
 			wake_up_state(task, __TASK_TRACED);
+		}
 		unlock_task_sighand(task, &flags);
 	}
 }
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * in and out of STOPPED are protected by siglock.
 	 */
 	if (task_is_stopped(task) &&
-	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+	    task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+		task->jobctl &= ~JOBCTL_STOPPED;
 		signal_wake_up_state(task, __TASK_STOPPED);
+	}
 
 	spin_unlock(&task->sighand->siglock);
 
@@ -875,6 +884,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 */
 	spin_lock_irq(&child->sighand->siglock);
 	child->exit_code = data;
+	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index a58b68a2d3c6..e782c2611b64 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
  */
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
+	lockdep_assert_held(&t->sighand->siglock);
+
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		for_each_thread(p, t) {
 			flush_sigqueue_mask(&flush, &t->pending);
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_SEIZED))) {
+				t->jobctl &= ~JOBCTL_STOPPED;
 				wake_up_state(t, __TASK_STOPPED);
-			else
+			} else
 				ptrace_trap_notify(t);
 		}
 
@@ -2218,6 +2222,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
+	current->jobctl |= JOBCTL_TRACED;
 
 	/*
 	 * We're committing to trapping.  TRACED should be visible before
@@ -2436,6 +2441,7 @@ static bool do_signal_stop(int signr)
 		if (task_participate_group_stop(current))
 			notify = CLD_STOPPED;
 
+		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 08/12] ptrace: Document that wait_task_inactive can't fail
  2022-05-05 18:26               ` Eric W. Biederman
  (?)
@ 2022-05-06  6:55                 ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-06  6:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 2022-05-05 13:26:41 [-0500], Eric W. Biederman wrote:
> After ptrace_freeze_traced succeeds it is known that the the tracee
                                                       the

> has a __state value of __TASK_TRACED and that no __ptrace_unlink will
> happen because the tracer is waiting for the tracee, and the tracee is
> in ptrace_stop.

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-06 14:14               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-06 14:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64

On 05/05, Eric W. Biederman wrote:
>
> Eric W. Biederman (11):
>       signal: Rename send_signal send_signal_locked
>       signal: Replace __group_send_sig_info with send_signal_locked
>       ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
>       ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
>       ptrace: Remove arch_ptrace_attach
>       signal: Use lockdep_assert_held instead of assert_spin_locked
>       ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
>       ptrace: Document that wait_task_inactive can't fail
>       ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
>       ptrace: Don't change __state
>       ptrace: Always take siglock in ptrace_resume
>
> Peter Zijlstra (1):
>       sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

I can't comment on 5/12; to be honest, I didn't even try to look into
arch/ia64/.

But other than that I see no problems in this version. However, I'd
like to actually apply the whole series and read the changed code
carefully, but sorry, I don't think I can do this before Monday.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-06 14:14               ` Oleg Nesterov
  (?)
@ 2022-05-06 14:38                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-06 14:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/05, Eric W. Biederman wrote:
>>
>> Eric W. Biederman (11):
>>       signal: Rename send_signal send_signal_locked
>>       signal: Replace __group_send_sig_info with send_signal_locked
>>       ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
>>       ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
>>       ptrace: Remove arch_ptrace_attach
>>       signal: Use lockdep_assert_held instead of assert_spin_locked
>>       ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
>>       ptrace: Document that wait_task_inactive can't fail
>>       ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
>>       ptrace: Don't change __state
>>       ptrace: Always take siglock in ptrace_resume
>>
>> Peter Zijlstra (1):
>>       sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
>
> I can't comment 5/12. to be honest I didn't even try to look into
> arch/ia64/.

I just looked at arch_ptrace_attach again and spotted a fairly easy
analysis, based mostly on arch-generic code, that shows this is dead
code on ia64.

On ia64 arch_ptrace_attach is ptrace_attach_sync_user_rbs, and does
nothing if __state is not TASK_STOPPED.

When arch_ptrace_attach is called after ptrace_traceme, __state is
TASK_RUNNING pretty much by definition, as we are running in the
child.  Therefore ptrace_attach_sync_user_rbs does nothing in that case.

When arch_ptrace_attach is called after ptrace_attach, there are two
possibilities for __state.  If the tracee was already in TASK_STOPPED
before the ptrace_attach, the tracee will be in TASK_TRACED.
Otherwise the tracee will be in TASK_TRACED or on its way to stopping
in TASK_TRACED.

Unless I totally misread ptrace_attach, there is no way for the tracee
to be in TASK_STOPPED after a successful ptrace_attach.
This makes ptrace_attach_sync_user_rbs a big noop, AKA dead code.
So it can be removed.

> But other than that I see no problems in this version. However, I'd
> like to actually apply the whole series and read the changed code
> carefully, but sorry, I don't think I can do this before Monday.

No rush.  I don't expect the merge window will open for a while yet.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 10/12] ptrace: Don't change __state
  2022-05-05 18:26               ` Eric W. Biederman
  (?)
@ 2022-05-06 15:09                 ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-06 15:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/05, Eric W. Biederman wrote:
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -103,7 +103,7 @@ struct task_group;
>  /* Convenience macros for the sake of set_current_state: */
>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
> +#define TASK_TRACED			__TASK_TRACED

however I personally still dislike this change. But let me read the
code with this series applied, perhaps I will change my mind. If not,
I will argue ;)

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 10/12] ptrace: Don't change __state
  2022-05-06 15:09                 ` Oleg Nesterov
  (?)
@ 2022-05-06 19:42                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-06 19:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/05, Eric W. Biederman wrote:
>>
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -103,7 +103,7 @@ struct task_group;
>>  /* Convenience macros for the sake of set_current_state: */
>>  #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>>  #define TASK_STOPPED			(TASK_WAKEKILL | __TASK_STOPPED)
>> -#define TASK_TRACED			(TASK_WAKEKILL | __TASK_TRACED)
>> +#define TASK_TRACED			__TASK_TRACED
>
> however I personally still dislike this change. But let me read the
> code with this series applied, perhaps I will change my mind. If not,
> I will argue ;)

That is fair.  It kind of grew on me after I implemented it and wrapped
my head around what was going on, as it is simple and there are no
implicit cases.

Eric
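
For readers following the TASK_TRACED discussion above, here is a minimal
sketch of what dropping TASK_WAKEKILL from the macro means at the bitmask
level.  The bit values are what I believe include/linux/sched.h used around
this time and should be treated as illustrative; FATAL_WAKE_MASK and
wake_matches() are hypothetical names for this sketch, and the state check
is a simplified model of the one in try_to_wake_up(), not kernel code.

```c
#include <stdbool.h>

/* Illustrative bit values; they may differ between kernel versions. */
#define TASK_INTERRUPTIBLE	0x0001
#define __TASK_TRACED		0x0008
#define TASK_WAKEKILL		0x0100

/* TASK_TRACED before and after the quoted hunk. */
#define OLD_TASK_TRACED		(TASK_WAKEKILL | __TASK_TRACED)
#define NEW_TASK_TRACED		__TASK_TRACED

/* Hypothetical name: the mask a fatal-signal wake-up is assumed to use. */
#define FATAL_WAKE_MASK		(TASK_WAKEKILL | TASK_INTERRUPTIBLE)

/*
 * Simplified model: a wake-up applies only if the sleeper's state
 * shares at least one bit with the waker's mask.
 */
static bool wake_matches(unsigned int task_state, unsigned int wake_mask)
{
	return (task_state & wake_mask) != 0;
}
```

Under this model, wake_matches(OLD_TASK_TRACED, FATAL_WAKE_MASK) is true
while wake_matches(NEW_TASK_TRACED, FATAL_WAKE_MASK) is false, so a waker
that wants to reach a traced task has to name __TASK_TRACED in its mask.
That seems to be the sense in which there are "no implicit cases".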

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-05 18:25             ` Eric W. Biederman
@ 2022-05-06 21:26               ` Kees Cook
  -1 siblings, 0 replies; 572+ messages in thread
From: Kees Cook @ 2022-05-06 21:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Jann Horn, linux-ia64, Robert O'Callahan,
	Kyle Huey

On Thu, May 05, 2022 at 01:25:57PM -0500, Eric W. Biederman wrote:
> The states TASK_STOPPED and TASK_TRACED are special in that they cannot
> handle spurious wake-ups.  This, plus actively depending upon and
> changing the value of tsk->__state, causes problems for PREEMPT_RT and
> Peter's freezer rewrite.
> 
> There are a lot of details we have to get right to sort out the
> technical challenges and this is my pared-back version of the changes
> that contains just those problems I see good solutions to that I believe
> are ready.
> 
> A couple of issues have been pointed out, but I think this pared-back set
> of changes is still on the right track.  The biggest change in v4 is the
> split of "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs" into
> two patches because the dependency I thought existed between two
> different changes did not exist.  The rest of the changes are minor
> tweaks to "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs":
> removing an always-true branch, and adding an early test to see if the
> ptracer had gone, before TASK_TRAPPING was set.
> 
> This set of changes should support Peter's freezer rewrite, and with the
> addition of changing wait_task_inactive(TASK_TRACED) to be
> wait_task_inactive(0) in ptrace_check_attach I don't think there are any
> races or issues to be concerned about from the ptrace side.
> 
> More work is needed to support PREEMPT_RT, but these changes get things
> closer.
> 
> This set of changes continues to look like it will provide a firm
> foundation for solving the PREEMPT_RT and freezer challenges.

One of the more sensitive projects to changes around ptrace is rr
(Robert and Kyle added to CC). I ran rr's selftests before/after this
series and saw no changes. My failures remained the same; I assume
they're due to missing CPU features (pkeys) or build configs (bpf), etc:

99% tests passed, 19 tests failed out of 2777

Total Test time (real) = 773.40 sec

The following tests FAILED:
         42 - bpf_map (Failed)
         43 - bpf_map-no-syscallbuf (Failed)
        414 - netfilter (Failed)
        415 - netfilter-no-syscallbuf (Failed)
        454 - x86/pkeys (Failed)
        455 - x86/pkeys-no-syscallbuf (Failed)
        1152 - ttyname (Failed)
        1153 - ttyname-no-syscallbuf (Failed)
        1430 - bpf_map-32 (Failed)
        1431 - bpf_map-32-no-syscallbuf (Failed)
        1502 - detach_sigkill-32 (Failed)
        1802 - netfilter-32 (Failed)
        1803 - netfilter-32-no-syscallbuf (Failed)
        1842 - x86/pkeys-32 (Failed)
        1843 - x86/pkeys-32-no-syscallbuf (Failed)
        2316 - crash_in_function-32 (Failed)
        2317 - crash_in_function-32-no-syscallbuf (Failed)
        2540 - ttyname-32 (Failed)
        2541 - ttyname-32-no-syscallbuf (Failed)

So, I guess:

Tested-by: Kees Cook <keescook@chromium.org>

:)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-06 21:26               ` Kees Cook
  (?)
@ 2022-05-06 21:59                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-06 21:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Jann Horn, linux-ia64, Robert O'Callahan,
	Kyle Huey

Kees Cook <keescook@chromium.org> writes:

> On Thu, May 05, 2022 at 01:25:57PM -0500, Eric W. Biederman wrote:
>> The states TASK_STOPPED and TASK_TRACED are special in that they cannot
>> handle spurious wake-ups.  This, plus actively depending upon and
>> changing the value of tsk->__state, causes problems for PREEMPT_RT and
>> Peter's freezer rewrite.
>> 
>> There are a lot of details we have to get right to sort out the
>> technical challenges and this is my pared-back version of the changes
>> that contains just those problems I see good solutions to that I believe
>> are ready.
>> 
>> A couple of issues have been pointed out, but I think this pared-back set
>> of changes is still on the right track.  The biggest change in v4 is the
>> split of "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs" into
>> two patches because the dependency I thought existed between two
>> different changes did not exist.  The rest of the changes are minor
>> tweaks to "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs":
>> removing an always-true branch, and adding an early test to see if the
>> ptracer had gone, before TASK_TRAPPING was set.
>> 
>> This set of changes should support Peter's freezer rewrite, and with the
>> addition of changing wait_task_inactive(TASK_TRACED) to be
>> wait_task_inactive(0) in ptrace_check_attach I don't think there are any
>> races or issues to be concerned about from the ptrace side.
>> 
>> More work is needed to support PREEMPT_RT, but these changes get things
>> closer.
>> 
>> This set of changes continues to look like it will provide a firm
>> foundation for solving the PREEMPT_RT and freezer challenges.
>
> One of the more sensitive projects to changes around ptrace is rr
> (Robert and Kyle added to CC). I ran rr's selftests before/after this
> series and saw no changes. My failures remained the same; I assume
> they're due to missing CPU features (pkeys) or build configs (bpf), etc:
>
> 99% tests passed, 19 tests failed out of 2777
>
> Total Test time (real) = 773.40 sec
>
> The following tests FAILED:
>          42 - bpf_map (Failed)
>          43 - bpf_map-no-syscallbuf (Failed)
>         414 - netfilter (Failed)
>         415 - netfilter-no-syscallbuf (Failed)
>         454 - x86/pkeys (Failed)
>         455 - x86/pkeys-no-syscallbuf (Failed)
>         1152 - ttyname (Failed)
>         1153 - ttyname-no-syscallbuf (Failed)
>         1430 - bpf_map-32 (Failed)
>         1431 - bpf_map-32-no-syscallbuf (Failed)
>         1502 - detach_sigkill-32 (Failed)
>         1802 - netfilter-32 (Failed)
>         1803 - netfilter-32-no-syscallbuf (Failed)
>         1842 - x86/pkeys-32 (Failed)
>         1843 - x86/pkeys-32-no-syscallbuf (Failed)
>         2316 - crash_in_function-32 (Failed)
>         2317 - crash_in_function-32-no-syscallbuf (Failed)
>         2540 - ttyname-32 (Failed)
>         2541 - ttyname-32-no-syscallbuf (Failed)
>
> So, I guess:
>
> Tested-by: Kees Cook <keescook@chromium.org>
>
> :)

Thank you.  I was thinking it would be good to add the rr folks to the
discussion.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-10 14:11               ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-10 14:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64

On 05/05, Eric W. Biederman wrote:
>
> Eric W. Biederman (11):
>       signal: Rename send_signal send_signal_locked
>       signal: Replace __group_send_sig_info with send_signal_locked
>       ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
>       ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
>       ptrace: Remove arch_ptrace_attach
>       signal: Use lockdep_assert_held instead of assert_spin_locked
>       ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
>       ptrace: Document that wait_task_inactive can't fail
>       ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
>       ptrace: Don't change __state
>       ptrace: Always take siglock in ptrace_resume
>
> Peter Zijlstra (1):
>       sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

OK, lgtm.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


I still dislike that you removed TASK_WAKEKILL from TASK_TRACED, but I
can't find a good argument against it ;) and yes, this is subjective.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 10/12] ptrace: Don't change __state
  2022-05-05 18:26               ` Eric W. Biederman
  (?)
@ 2022-05-10 14:23                 ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-10 14:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/05, Eric W. Biederman wrote:
>
>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> -		return;
> -
> -	WARN_ON(!task->ptrace || task->parent != current);
> +	unsigned long flags;
>  
>  	/*
> -	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> -	 * Recheck state under the lock to close this race.
> +	 * The child may be awake and may have cleared
> +	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
> +	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
>  	 */
> -	spin_lock_irq(&task->sighand->siglock);
> -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
> +	if (lock_task_sighand(task, &flags)) {

But I still think that a lockless

	if (!(task->jobctl & JOBCTL_PTRACE_FROZEN))
		return;

check at the start of ptrace_unfreeze_traced() makes sense to avoid
lock_task_sighand() if possible.

And ptrace_resume() can probably clear JOBCTL_PTRACE_FROZEN along with
JOBCTL_TRACED to make this optimization work better. The same for
ptrace_signal_wake_up().

Oleg.
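
The lockless check suggested above follows a common pattern: test the flag
without the lock, return early if it is clear, and only otherwise take the
lock and recheck.  Below is a minimal userspace sketch of that pattern using
C11 atomics.  It is not kernel code: struct model_task, MODEL_PTRACE_FROZEN,
the spin-flag lock and the lock_taken counter are all hypothetical stand-ins
for task->jobctl, JOBCTL_PTRACE_FROZEN and lock_task_sighand().  The early
return relies on the property stated in the quoted comment, that the flag is
not set anew once it has been cleared.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit; not the real JOBCTL_PTRACE_FROZEN value. */
#define MODEL_PTRACE_FROZEN	(1UL << 0)

struct model_task {
	atomic_ulong  jobctl;      /* stands in for task->jobctl */
	atomic_flag   siglock;     /* stands in for sighand->siglock */
	unsigned long lock_taken;  /* counts slow-path entries, for illustration */
};

static void model_task_init(struct model_task *t)
{
	atomic_init(&t->jobctl, 0);
	atomic_flag_clear(&t->siglock);
	t->lock_taken = 0;
}

static void model_lock(struct model_task *t)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&t->siglock,
						 memory_order_acquire))
		;
	t->lock_taken++;
}

static void model_unlock(struct model_task *t)
{
	atomic_flag_clear_explicit(&t->siglock, memory_order_release);
}

/* Returns true if this call cleared the frozen flag. */
static bool model_unfreeze(struct model_task *t)
{
	bool cleared = false;

	/* Lockless fast path: nothing to undo if the flag is already clear. */
	if (!(atomic_load_explicit(&t->jobctl, memory_order_relaxed) &
	      MODEL_PTRACE_FROZEN))
		return false;

	model_lock(t);
	/* Recheck under the lock; the flag may have been cleared meanwhile. */
	if (atomic_load_explicit(&t->jobctl, memory_order_relaxed) &
	    MODEL_PTRACE_FROZEN) {
		atomic_fetch_and_explicit(&t->jobctl, ~MODEL_PTRACE_FROZEN,
					  memory_order_relaxed);
		cleared = true;
	}
	model_unlock(t);
	return cleared;
}
```

In this model, calling model_unfreeze() on a task whose flag is already
clear never touches the lock, which is the saving the suggestion is after.
Clearing the flag earlier on the resume path, as proposed, would make that
fast path hit more often.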


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-10 14:11               ` Oleg Nesterov
  (?)
@ 2022-05-10 14:26                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-10 14:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/05, Eric W. Biederman wrote:
>>
>> Eric W. Biederman (11):
>>       signal: Rename send_signal send_signal_locked
>>       signal: Replace __group_send_sig_info with send_signal_locked
>>       ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
>>       ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
>>       ptrace: Remove arch_ptrace_attach
>>       signal: Use lockdep_assert_held instead of assert_spin_locked
>>       ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
>>       ptrace: Document that wait_task_inactive can't fail
>>       ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
>>       ptrace: Don't change __state
>>       ptrace: Always take siglock in ptrace_resume
>>
>> Peter Zijlstra (1):
>>       sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
>
> OK, lgtm.
>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>
>
> I still dislike you removed TASK_WAKEKILL from TASK_TRACED, but I can't
> find a good argument against it ;) and yes, this is subjective.

Does anyone else have any comments on this patchset?

If not I am going to apply this to a branch and get it into linux-next.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-10 14:26                 ` Eric W. Biederman
  (?)
@ 2022-05-10 14:45                   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-10 14:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64

On 2022-05-10 09:26:36 [-0500], Eric W. Biederman wrote:
> Does anyone else have any comments on this patchset?
> 
> If not I am going to apply this to a branch and get it into linux-next.

Looks good I guess.
Be aware that there will be a clash due to
   https://lore.kernel.org/all/1649240981-11024-3-git-send-email-yangtiezhu@loongson.cn/

which sits currently in -akpm.

> Eric

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 10/12] ptrace: Don't change __state
  2022-05-10 14:23                 ` Oleg Nesterov
  (?)
@ 2022-05-10 15:17                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-10 15:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/05, Eric W. Biederman wrote:
>>
>>  static void ptrace_unfreeze_traced(struct task_struct *task)
>>  {
>> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
>> -		return;
>> -
>> -	WARN_ON(!task->ptrace || task->parent != current);
>> +	unsigned long flags;
>>  
>>  	/*
>> -	 * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
>> -	 * Recheck state under the lock to close this race.
>> +	 * The child may be awake and may have cleared
>> +	 * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
>> +	 * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
>>  	 */
>> -	spin_lock_irq(&task->sighand->siglock);
>> -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
>> +	if (lock_task_sighand(task, &flags)) {
>
> But I still think that a lockless
>
> 	if (!(task->jobctl & JOBCTL_PTRACE_FROZEN))
> 		return;
>
> check at the start of ptrace_unfreeze_traced() makes sense to avoid
> lock_task_sighand() if possible.
>
> And ptrace_resume() can probably clear JOBCTL_PTRACE_FROZEN along with
> JOBCTL_TRACED to make this optimization work better. The same for
> ptrace_signal_wake_up().

What do you have that suggests that taking siglock there is a problem?

What you propose will definitely work as an incremental change, and
in an incremental change we can explain why doing the stupid simple
thing is not good enough.

I am not really opposed on any grounds except that simplicity is good,
and hard to get wrong.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-10 14:45                   ` Sebastian Andrzej Siewior
  (?)
@ 2022-05-10 15:18                     ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-10 15:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Oleg Nesterov, linux-kernel, rjw, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-10 09:26:36 [-0500], Eric W. Biederman wrote:
>> Does anyone else have any comments on this patchset?
>> 
>> If not I am going to apply this to a branch and get it into linux-next.
>
> Looks good I guess.
> Be aware that there will be clash due to
>    https://lore.kernel.org/all/1649240981-11024-3-git-send-email-yangtiezhu@loongson.cn/
>
> which sits currently in -akpm.

Thanks for the heads up.  That looks like the best kind of conflict.
One where code just disappears.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 10/12] ptrace: Don't change __state
  2022-05-10 15:17                   ` Eric W. Biederman
  (?)
@ 2022-05-10 15:34                     ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-10 15:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64

On 05/10, Eric W. Biederman wrote:
>
> > But I still think that a lockless
> >
> > 	if (!(task->jobctl & JOBCTL_PTRACE_FROZEN))
> > 		return;
> >
> > check at the start of ptrace_unfreeze_traced() makes sense to avoid
> > lock_task_sighand() if possible.
> >
> > And ptrace_resume() can probably clear JOBCTL_PTRACE_FROZEN along with
> > JOBCTL_TRACED to make this optimization work better. The same for
> > ptrace_signal_wake_up().
>
> What do you have that suggests that taking siglock there is a problem?

Not necessarily a problem, but this optimization is free. If the tracee
was resumed, it can compete for siglock with debugger.

> What you propose will definitely work as an incremental change, and
> in an incremental change we can explain why doing the stupid simple
> thing is not good enough.

OK.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop
  2022-05-10 14:26                 ` Eric W. Biederman
  (?)
@ 2022-05-11 20:00                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-11 20:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Oleg Nesterov <oleg@redhat.com> writes:
>
>> On 05/05, Eric W. Biederman wrote:
>>>
>>> Eric W. Biederman (11):
>>>       signal: Rename send_signal send_signal_locked
>>>       signal: Replace __group_send_sig_info with send_signal_locked
>>>       ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
>>>       ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
>>>       ptrace: Remove arch_ptrace_attach
>>>       signal: Use lockdep_assert_held instead of assert_spin_locked
>>>       ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
>>>       ptrace: Document that wait_task_inactive can't fail
>>>       ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
>>>       ptrace: Don't change __state
>>>       ptrace: Always take siglock in ptrace_resume
>>>
>>> Peter Zijlstra (1):
>>>       sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
>>
>> OK, lgtm.
>>
>> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>>
>>
>> I still dislike you removed TASK_WAKEKILL from TASK_TRACED, but I can't
>> find a good argument against it ;) and yes, this is subjective.
>
> Does anyone else have any comments on this patchset?
>
> If not I am going to apply this to a branch and get it into linux-next.

Thank you all.

I have pushed this to my ptrace_stop-cleanup-for-v5.19 branch
and placed the branch in linux-next.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-05-05 18:25             ` Eric W. Biederman
  (?)
@ 2022-05-18 22:49               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, oleg, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	mgorman, bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Jann Horn, Kees Cook,
	linux-ia64, Robert O'Callahan, Kyle Huey, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Jason Wessel, Daniel Thompson,
	Douglas Anderson, Douglas Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras


For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
ptrace_freeze_traced has completed successfully.  Which fundamentally
means the lock dance of dropping siglock and grabbing tasklist_lock does
not work on PREEMPT_RT.  So I have worked through what is necessary so
that tasklist_lock does not need to be grabbed in ptrace_stop after
siglock is dropped.
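
As a rough illustration of the difference, here is a single-threaded
userspace model, not kernel code.  struct stop_model and both
*_stop_model() functions are hypothetical; pthread mutexes stand in for
siglock and tasklist_lock.  It only counts how many locks are acquired
after the stop state has been published, which is the property the
paragraph above cares about.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel symbols. */
struct stop_model {
	pthread_mutex_t siglock;        /* models the per-task siglock */
	pthread_mutex_t tasklist_lock;  /* models the global tasklist_lock */
	bool stopped;                   /* set once the stop state is published */
	int locks_after_stop;           /* locks acquired after 'stopped' was set */
	int notifications;              /* parent notifications delivered */
};

static void stop_model_init(struct stop_model *m)
{
	assert(m != NULL);
	pthread_mutex_init(&m->siglock, NULL);
	pthread_mutex_init(&m->tasklist_lock, NULL);
	m->stopped = false;
	m->locks_after_stop = 0;
	m->notifications = 0;
}

/* Acquire a lock and record whether it was taken after the stop state. */
static void model_lock(struct stop_model *m, pthread_mutex_t *lock)
{
	pthread_mutex_lock(lock);
	if (m->stopped)
		m->locks_after_stop++;
}

/* The "lock dance": drop siglock, then grab tasklist_lock to notify. */
static void dance_stop_model(struct stop_model *m)
{
	model_lock(m, &m->siglock);
	m->stopped = true;
	pthread_mutex_unlock(&m->siglock);

	model_lock(m, &m->tasklist_lock);
	m->notifications++;
	pthread_mutex_unlock(&m->tasklist_lock);
}

/* Notify while still holding siglock; no further lock is acquired. */
static void siglock_only_stop_model(struct stop_model *m)
{
	model_lock(m, &m->siglock);
	m->stopped = true;
	m->notifications++;
	pthread_mutex_unlock(&m->siglock);
}
```

Both variants deliver one notification, but only the second finishes
without acquiring a lock after the stop state is set.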

I have explored several alternate ways of getting there and along the
way I found a lot of small bug fixes/cleanups that don't necessarily
contribute to the final result but that are worthwhile on their own.  So
I have included those changes in this set of changes just so they don't
get lost.

In addition I had a conversation with Thomas Gleixner recently that
emphasized for me the need to reduce the hold times of tasklist_lock,
and that made me realize that in principle it is possible.
https://lkml.kernel.org/r/87mtfmhap2.fsf@email.froward.int.ebiederm.org

Which is a long way of saying that not taking tasklist_lock in
ptrace_stop is good not just for PREEMPT_RT but also for improving the
scalability of the kernel in general.

After this set of changes only cgroup_enter_frozen should remain a
stumbling block for PREEMPT_RT in the ptrace_stop path.

Eric W. Biederman (16):
      signal/alpha: Remove unused definition of TASK_REAL_PARENT
      signal/ia64: Remove unused definition of IA64_TASK_REAL_PARENT_OFFSET
      kdb: Use real_parent when displaying a list of processes
      powerpc/xmon:  Use real_parent when displaying a list of processes
      ptrace: Remove dead code from __ptrace_detach
      ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
      signal: Wake up the designated parent
      ptrace: Only populate last_siginfo from ptrace
      ptrace: In ptrace_setsiginfo deal with invalid si_signo
      ptrace: In ptrace_signal look at what the debugger did with siginfo
      ptrace: Use si_sino as the signal number to resume with
      ptrace: Stop protecting ptrace_set_signr with tasklist_lock
      ptrace: Document why ptrace_setoptions does not need a lock
      signal: Protect parent child relationships by childs siglock
      ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
      signal: Always call do_notify_parent_cldstop with siglock held

 arch/alpha/kernel/asm-offsets.c |   1 -
 arch/ia64/kernel/asm-offsets.c  |   1 -
 arch/powerpc/xmon/xmon.c        |   2 +-
 kernel/debug/kdb/kdb_main.c     |   2 +-
 kernel/exit.c                   |  23 +++-
 kernel/fork.c                   |  12 +-
 kernel/ptrace.c                 | 132 ++++++++----------
 kernel/signal.c                 | 296 ++++++++++++++++++++++++++--------------
 8 files changed, 279 insertions(+), 190 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 01/16] signal/alpha: Remove unused definition of TASK_REAL_PARENT
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman, linux-alpha

Rather than update this definition when I move tsk->real_parent into
signal_struct, remove it now.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
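
A small userspace sketch of why the definition would need updating: an
offsetof() on a member is only valid while that member lives in the
named structure.  The structures below are simplified stand-ins invented
for this note (model_task_before, model_signal), not the kernel's
task_struct or signal_struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout after the move: the pointer lives here. */
struct model_signal {
	void *real_parent;
};

/* Hypothetical layout before the move: the pointer lives in the task. */
struct model_task_before {
	int pid;
	void *real_parent;
	struct model_signal *signal;
};

/* Valid only while real_parent is a member of the task structure. */
static size_t offset_before_move(void)
{
	return offsetof(struct model_task_before, real_parent);
}

/* After the move the offset is relative to the signal structure. */
static size_t offset_after_move(void)
{
	size_t off = offsetof(struct model_signal, real_parent);

	assert(off < sizeof(struct model_signal));
	return off;
}
```

An unused DEFINE() of the old offset would simply stop compiling once
the member is gone, so dropping it first keeps the later change smaller.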
 arch/alpha/kernel/asm-offsets.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/alpha/kernel/asm-offsets.c b/arch/alpha/kernel/asm-offsets.c
index 2e125e5c1508..0fca99dc5757 100644
--- a/arch/alpha/kernel/asm-offsets.c
+++ b/arch/alpha/kernel/asm-offsets.c
@@ -21,7 +21,6 @@ void foo(void)
 
         DEFINE(TASK_BLOCKED, offsetof(struct task_struct, blocked));
         DEFINE(TASK_CRED, offsetof(struct task_struct, cred));
-        DEFINE(TASK_REAL_PARENT, offsetof(struct task_struct, real_parent));
         DEFINE(TASK_GROUP_LEADER, offsetof(struct task_struct, group_leader));
         DEFINE(TASK_TGID, offsetof(struct task_struct, tgid));
         BLANK();
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 02/16] signal/ia64: Remove unused definition of IA64_TASK_REAL_PARENT_OFFSET
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Rather than update the unused definition of IA64_TASK_REAL_PARENT_OFFSET
when I move tsk->real_parent into signal_struct, remove it now.

Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/ia64/kernel/asm-offsets.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/ia64/kernel/asm-offsets.c b/arch/ia64/kernel/asm-offsets.c
index be3b90fef2e9..245c4333ea30 100644
--- a/arch/ia64/kernel/asm-offsets.c
+++ b/arch/ia64/kernel/asm-offsets.c
@@ -55,7 +55,6 @@ void foo(void)
 	DEFINE(IA64_PID_UPID_OFFSET, offsetof (struct pid, numbers[0]));
 	DEFINE(IA64_TASK_PENDING_OFFSET,offsetof (struct task_struct, pending));
 	DEFINE(IA64_TASK_PID_OFFSET, offsetof (struct task_struct, pid));
-	DEFINE(IA64_TASK_REAL_PARENT_OFFSET, offsetof (struct task_struct, real_parent));
 	DEFINE(IA64_TASK_SIGNAL_OFFSET,offsetof (struct task_struct, signal));
 	DEFINE(IA64_TASK_TGID_OFFSET, offsetof (struct task_struct, tgid));
 	DEFINE(IA64_TASK_THREAD_KSP_OFFSET, offsetof (struct task_struct, thread.ksp));
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 03/16] kdb: Use real_parent when displaying a list of processes
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

kdb has a bug: when the ps command is used to display a list of
processes, a process that is being debugged is shown with the debugger
as its parent process.

This is silly, and I expect it never comes up in practice, as there
is very little point in using gdb and kdb simultaneously.  Update the
code to use real_parent so that it is clear kdb does not want to
display a debugger as the parent of a process.

Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Douglas Anderson <dianders@chromium.org>
Fixes: 5d5314d6795f ("kdb: core for kgdb back end (1 of 2)")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
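
To make the difference between the two pointers concrete, here is a
userspace model, not kernel code.  It assumes the usual ptrace
behaviour that attaching a tracer redirects ->parent while ->real_parent
keeps pointing at the process that created the task.  model_task,
model_attach, listed_ppid and format_row are names invented for this
sketch, and the row format is only a subset of what kdb prints.

```c
#include <assert.h>
#include <stdio.h>

struct model_task {
	int pid;
	struct model_task *real_parent; /* process that created this task */
	struct model_task *parent;      /* the tracer while being traced */
};

/* Models a debugger attaching: only ->parent is redirected. */
static void model_attach(struct model_task *child, struct model_task *tracer)
{
	child->parent = tracer;
}

/* Parent pid as a ps-style listing should report it. */
static int listed_ppid(const struct model_task *p)
{
	assert(p->real_parent != NULL);
	return p->real_parent->pid;
}

/* Formats "pid ppid" into buf and returns snprintf()'s result. */
static int format_row(char *buf, size_t len, const struct model_task *p)
{
	return snprintf(buf, len, "%8d %8d", p->pid, listed_ppid(p));
}
```

With a child of pid 300 created by pid 100 and traced by pid 200,
->parent->pid is 200 after model_attach() while listed_ppid() still
returns 100, which is the value a process listing wants to show.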
 kernel/debug/kdb/kdb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 0852a537dad4..db49f1026eaa 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2306,7 +2306,7 @@ void kdb_ps1(const struct task_struct *p)
 
 	cpu = kdb_process_cpu(p);
 	kdb_printf("0x%px %8d %8d  %d %4d   %c  0x%px %c%s\n",
-		   (void *)p, p->pid, p->parent->pid,
+		   (void *)p, p->pid, p->real_parent->pid,
 		   kdb_task_has_cpu(p), kdb_process_cpu(p),
 		   kdb_task_state_char(p),
 		   (void *)(&p->thread),
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 04/16] powerpc/xmon:  Use real_parent when displaying a list of processes
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

xmon has a bug (copied from kdb): when showing a list of processes,
the debugger is listed as the parent if a process is being debugged.

This is silly, and I expect it is rare enough that no one has noticed in
practice.  Update the code to use real_parent so that it is clear xmon
does not want to display a debugger as the parent of a process.

Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Fixes: 6dfb54049f9a ("powerpc/xmon: Add xmon command to dump process/task similar to ps(1)")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/powerpc/xmon/xmon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fd72753e8ad5..b308ef9ce604 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3282,7 +3282,7 @@ static void show_task(struct task_struct *volatile tsk)
 
 	printf("%16px %16lx %16px %6d %6d %c %2d %s\n", tsk,
 		tsk->thread.ksp, tsk->thread.regs,
-		tsk->pid, rcu_dereference(tsk->parent)->pid,
+		tsk->pid, rcu_dereference(tsk->real_parent)->pid,
 		state, task_cpu(tsk),
 		tsk->comm);
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 04/16] powerpc/xmon: Use real_parent when displaying a list of processes
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

xmon has a bug (copied from kdb) that when showing a list of processes
the debugger is listed as the parent, if a processes is being debugged.

This is silly, and I expect it is rare enough no has noticed in
practice.  Update the code to use real_parent so that it is clear xmon
does not want to display a debugger as the parent of a process.

Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Fixes: 6dfb54049f9a ("powerpc/xmon: Add xmon command to dump process/task similar to ps(1)")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/powerpc/xmon/xmon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fd72753e8ad5..b308ef9ce604 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3282,7 +3282,7 @@ static void show_task(struct task_struct *volatile tsk)
 
 	printf("%16px %16lx %16px %6d %6d %c %2d %s\n", tsk,
 		tsk->thread.ksp, tsk->thread.regs,
-		tsk->pid, rcu_dereference(tsk->parent)->pid,
+		tsk->pid, rcu_dereference(tsk->real_parent)->pid,
 		state, task_cpu(tsk),
 		tsk->comm);
 }
-- 
2.35.3
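
The distinction the patch above relies on can be sketched in userspace.
This is only an illustrative model, not kernel code: struct task,
model_attach() and listed_ppid() are hypothetical names, and the two
pointers merely mimic the parent and real_parent fields of task_struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two parent pointers of a task. */
struct task {
	int pid;
	struct task *real_parent;	/* the process that created the task */
	struct task *parent;		/* who gets wait/SIGCHLD reports; the tracer while ptraced */
};

/* Model of a ptrace attach: only ->parent is redirected to the tracer. */
static void model_attach(struct task *tracer, struct task *child)
{
	assert(tracer && child);
	child->parent = tracer;
}

/* PPID as a process listing would want to show it: always via ->real_parent. */
static int listed_ppid(const struct task *tsk)
{
	assert(tsk && tsk->real_parent);
	return tsk->real_parent->pid;
}
```

With a model like this, a listing that reads ->parent would show the
debugger's pid for a traced task, while listed_ppid() keeps reporting
the creating process.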


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 04/16] powerpc/xmon: Use real_parent when displaying a list of processes
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

xmon has a bug (copied from kdb): when showing a list of processes,
the debugger is listed as the parent if a process is being debugged.

This is silly, and I expect it is rare enough that no one has noticed in
practice.  Update the code to use real_parent so that it is clear xmon
does not want to display a debugger as the parent of a process.

Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Fixes: 6dfb54049f9a ("powerpc/xmon: Add xmon command to dump process/task similar to ps(1)")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/powerpc/xmon/xmon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fd72753e8ad5..b308ef9ce604 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3282,7 +3282,7 @@ static void show_task(struct task_struct *volatile tsk)
 
 	printf("%16px %16lx %16px %6d %6d %c %2d %s\n", tsk,
 		tsk->thread.ksp, tsk->thread.regs,
-		tsk->pid, rcu_dereference(tsk->parent)->pid,
+		tsk->pid, rcu_dereference(tsk->real_parent)->pid,
 		state, task_cpu(tsk),
 		tsk->comm);
 }
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
been impossible to attach to another thread in the same thread group.

Remove the code from __ptrace_detach that was trying to support
detaching from a thread in the same thread group.  The code is
dead and I cannot make sense of what it is trying to do.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 328a34a99124..ca0e47691229 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -526,19 +526,6 @@ static int ptrace_traceme(void)
 	return ret;
 }
 
-/*
- * Called with irqs disabled, returns true if childs should reap themselves.
- */
-static int ignoring_children(struct sighand_struct *sigh)
-{
-	int ret;
-	spin_lock(&sigh->siglock);
-	ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
-	      (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
-	spin_unlock(&sigh->siglock);
-	return ret;
-}
-
 /*
  * Called with tasklist_lock held for writing.
  * Unlink a traced task, and clean it up if it was a traced zombie.
@@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
 
 	dead = !thread_group_leader(p);
 
-	if (!dead && thread_group_empty(p)) {
-		if (!same_thread_group(p->real_parent, tracer))
-			dead = do_notify_parent(p, p->exit_signal);
-		else if (ignoring_children(tracer->sighand)) {
-			__wake_up_parent(p, tracer);
-			dead = true;
-		}
-	}
+	if (!dead && thread_group_empty(p))
+		dead = do_notify_parent(p, p->exit_signal);
+
 	/* Mark it as in the process of being reaped. */
 	if (dead)
 		p->exit_state = EXIT_DEAD;
-- 
2.35.3
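
As a rough illustration of what is left after the removal, the decision
made for a traced zombie can be modelled in userspace as below.  This
is a sketch under assumptions, not the kernel function: struct
detach_state and zombie_dead_after_detach() are hypothetical, and the
three booleans stand in for thread_group_leader(p),
thread_group_empty(p) and the return value of do_notify_parent().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs replacing the kernel helpers used by the hunk above. */
struct detach_state {
	bool is_group_leader;	/* stands in for thread_group_leader(p) */
	bool group_empty;	/* stands in for thread_group_empty(p) */
	bool parent_autoreaps;	/* stands in for do_notify_parent()'s result */
};

/* Returns true if the zombie would be marked EXIT_DEAD right away. */
static bool zombie_dead_after_detach(const struct detach_state *s)
{
	bool dead;

	assert(s);
	dead = !s->is_group_leader;
	if (!dead && s->group_empty)
		dead = s->parent_autoreaps;
	return dead;
}
```

In this model there is no longer any branch that depends on whether the
tracer shares a thread group with the real parent, which is the case
the commit message describes as unreachable.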


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
been impossible to attach to another thread in the same thread group.

Remove the code from __ptrace_detach that was trying to support
detaching from a thread in the same thread group.  The code is
dead and I cannot make sense of what it is trying to do.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 328a34a99124..ca0e47691229 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -526,19 +526,6 @@ static int ptrace_traceme(void)
 	return ret;
 }
 
-/*
- * Called with irqs disabled, returns true if childs should reap themselves.
- */
-static int ignoring_children(struct sighand_struct *sigh)
-{
-	int ret;
-	spin_lock(&sigh->siglock);
-	ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
-	      (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
-	spin_unlock(&sigh->siglock);
-	return ret;
-}
-
 /*
  * Called with tasklist_lock held for writing.
  * Unlink a traced task, and clean it up if it was a traced zombie.
@@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
 
 	dead = !thread_group_leader(p);
 
-	if (!dead && thread_group_empty(p)) {
-		if (!same_thread_group(p->real_parent, tracer))
-			dead = do_notify_parent(p, p->exit_signal);
-		else if (ignoring_children(tracer->sighand)) {
-			__wake_up_parent(p, tracer);
-			dead = true;
-		}
-	}
+	if (!dead && thread_group_empty(p))
+		dead = do_notify_parent(p, p->exit_signal);
+
 	/* Mark it as in the process of being reaped. */
 	if (dead)
 		p->exit_state = EXIT_DEAD;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
been impossible to attach to another thread in the same thread group.

Remove the code from __ptrace_detach that was trying to support
detaching from a thread in the same thread group.  The code is
dead and I cannot make sense of what it is trying to do.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 328a34a99124..ca0e47691229 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -526,19 +526,6 @@ static int ptrace_traceme(void)
 	return ret;
 }
 
-/*
- * Called with irqs disabled, returns true if childs should reap themselves.
- */
-static int ignoring_children(struct sighand_struct *sigh)
-{
-	int ret;
-	spin_lock(&sigh->siglock);
-	ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
-	      (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
-	spin_unlock(&sigh->siglock);
-	return ret;
-}
-
 /*
  * Called with tasklist_lock held for writing.
  * Unlink a traced task, and clean it up if it was a traced zombie.
@@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
 
 	dead = !thread_group_leader(p);
 
-	if (!dead && thread_group_empty(p)) {
-		if (!same_thread_group(p->real_parent, tracer))
-			dead = do_notify_parent(p, p->exit_signal);
-		else if (ignoring_children(tracer->sighand)) {
-			__wake_up_parent(p, tracer);
-			dead = true;
-		}
-	}
+	if (!dead && thread_group_empty(p))
+		dead = do_notify_parent(p, p->exit_signal);
+
 	/* Mark it as in the process of being reaped. */
 	if (dead)
 		p->exit_state = EXIT_DEAD;
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 06/16] ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Since commit 9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request
can never race with SIGKILL") it has been unnecessary for
ptrace_getsiginfo and ptrace_setsiginfo to use lock_task_sighand.

Having the code take an unnecessary lock is confusing
as it suggests that other parts of the code need to take
the unnecessary lock as well.

So remove the unnecessary lock to make the code more
efficient, simpler, and less confusing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ca0e47691229..15e93eafa6f0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -677,34 +677,20 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 
 static int ptrace_getsiginfo(struct task_struct *child, kernel_siginfo_t *info)
 {
-	unsigned long flags;
-	int error = -ESRCH;
+	if (unlikely(!child->last_siginfo))
+		return -EINVAL;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			copy_siginfo(info, child->last_siginfo);
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
-	return error;
+	copy_siginfo(info, child->last_siginfo);
+	return 0;
 }
 
 static int ptrace_setsiginfo(struct task_struct *child, const kernel_siginfo_t *info)
 {
-	unsigned long flags;
-	int error = -ESRCH;
+	if (unlikely(!child->last_siginfo))
+		return -EINVAL;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			copy_siginfo(child->last_siginfo, info);
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
-	return error;
+	copy_siginfo(child->last_siginfo, info);
+	return 0;
 }
 
 static int ptrace_peek_siginfo(struct task_struct *child,
-- 
2.35.3
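
The control flow that remains once the lock is gone can be sketched in
userspace as follows.  This is a simplified model and not the kernel
code: struct model_siginfo, struct model_task and the model_* functions
are hypothetical, and a plain struct assignment stands in for
copy_siginfo().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for kernel_siginfo_t and task_struct. */
struct model_siginfo {
	int signo;
	int err;
	int code;
};

struct model_task {
	struct model_siginfo *last_siginfo;	/* NULL unless stopped with siginfo */
};

/* Read side: fail with -EINVAL when there is nothing to report. */
static int model_getsiginfo(const struct model_task *child,
			    struct model_siginfo *info)
{
	assert(child && info);
	if (!child->last_siginfo)
		return -EINVAL;

	*info = *child->last_siginfo;	/* stands in for copy_siginfo() */
	return 0;
}

/* Write side: same check, copy in the opposite direction. */
static int model_setsiginfo(struct model_task *child,
			    const struct model_siginfo *info)
{
	assert(child && info);
	if (!child->last_siginfo)
		return -EINVAL;

	*child->last_siginfo = *info;
	return 0;
}
```

Note that the -ESRCH return that was tied to lock_task_sighand()
failing has no counterpart in this model, matching the diff above.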


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 06/16] ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Since commit 9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request
can never race with SIGKILL") it has been unnecessary for
ptrace_getsiginfo and ptrace_setsiginfo to use lock_task_sighand.

Having the code take an unnecessary lock is confusing
as it suggests that other parts of the code need to take
the unnecessary lock as well.

So remove the unnecessary lock to make the code more
efficient, simpler, and less confusing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ca0e47691229..15e93eafa6f0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -677,34 +677,20 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 
 static int ptrace_getsiginfo(struct task_struct *child, kernel_siginfo_t *info)
 {
-	unsigned long flags;
-	int error = -ESRCH;
+	if (unlikely(!child->last_siginfo))
+		return -EINVAL;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			copy_siginfo(info, child->last_siginfo);
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
-	return error;
+	copy_siginfo(info, child->last_siginfo);
+	return 0;
 }
 
 static int ptrace_setsiginfo(struct task_struct *child, const kernel_siginfo_t *info)
 {
-	unsigned long flags;
-	int error = -ESRCH;
+	if (unlikely(!child->last_siginfo))
+		return -EINVAL;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			copy_siginfo(child->last_siginfo, info);
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
-	return error;
+	copy_siginfo(child->last_siginfo, info);
+	return 0;
 }
 
 static int ptrace_peek_siginfo(struct task_struct *child,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 06/16] ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Since commit 9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request
can never race with SIGKILL") it has been unnecessary for
ptrace_getsiginfo and ptrace_setsiginfo to use lock_task_sighand.

Having the code take an unnecessary lock is confusing
as it suggests that other parts of the code need to take
the unnecessary lock as well.

So remove the unnecessary lock to make the code more
efficient, simpler, and less confusing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ca0e47691229..15e93eafa6f0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -677,34 +677,20 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 
 static int ptrace_getsiginfo(struct task_struct *child, kernel_siginfo_t *info)
 {
-	unsigned long flags;
-	int error = -ESRCH;
+	if (unlikely(!child->last_siginfo))
+		return -EINVAL;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			copy_siginfo(info, child->last_siginfo);
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
-	return error;
+	copy_siginfo(info, child->last_siginfo);
+	return 0;
 }
 
 static int ptrace_setsiginfo(struct task_struct *child, const kernel_siginfo_t *info)
 {
-	unsigned long flags;
-	int error = -ESRCH;
+	if (unlikely(!child->last_siginfo))
+		return -EINVAL;
 
-	if (lock_task_sighand(child, &flags)) {
-		error = -EINVAL;
-		if (likely(child->last_siginfo != NULL)) {
-			copy_siginfo(child->last_siginfo, info);
-			error = 0;
-		}
-		unlock_task_sighand(child, &flags);
-	}
-	return error;
+	copy_siginfo(child->last_siginfo, info);
+	return 0;
 }
 
 static int ptrace_peek_siginfo(struct task_struct *child,
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 07/16] signal: Wake up the designated parent
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Today, if a process is ptraced and its parent is waiting with
__WNOTHREAD, only the ptracer will ever be woken up in wait.  Update
the code so that the real_parent can also be woken up with __WNOTHREAD
even when the process is ptraced.

Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/exit.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..0e26f73c49ac 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1421,26 +1421,35 @@ static int ptrace_do_wait(struct wait_opts *wo, struct task_struct *tsk)
 	return 0;
 }
 
+struct child_wait_info {
+	struct task_struct *p;
+	struct task_struct *parent;
+};
+
 static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
 				int sync, void *key)
 {
 	struct wait_opts *wo = container_of(wait, struct wait_opts,
 						child_wait);
-	struct task_struct *p = key;
+	struct child_wait_info *info = key;
 
-	if (!eligible_pid(wo, p))
+	if (!eligible_pid(wo, info->p))
 		return 0;
 
-	if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
-		return 0;
+	if ((wo->wo_flags & __WNOTHREAD) && (wait->private != info->parent))
+			return 0;
 
 	return default_wake_function(wait, mode, sync, key);
 }
 
 void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
 {
+	struct child_wait_info info = {
+		.p = p,
+		.parent = parent,
+	};
 	__wake_up_sync_key(&parent->signal->wait_chldexit,
-			   TASK_INTERRUPTIBLE, p);
+			   TASK_INTERRUPTIBLE, &info);
 }
 
 static bool is_effectively_child(struct wait_opts *wo, bool ptrace,
-- 
2.35.3
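
The effect of passing the designated parent in the wake-up key can be
sketched with a small userspace model.  Everything here is
illustrative: MODEL_WNOTHREAD, struct waiter and the two *_should_wake()
functions are hypothetical names, waiter.self stands in for
wait->private, and the eligible_pid() check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WNOTHREAD 0x1	/* stand-in for __WNOTHREAD; the value is arbitrary */

struct task {
	int pid;
	struct task *parent;		/* the tracer while ptraced */
	struct task *real_parent;
};

struct waiter {
	unsigned int flags;
	const struct task *self;	/* models wait->private */
};

struct child_wait_info {
	const struct task *p;
	const struct task *parent;	/* the parent this wake-up is meant for */
};

/* Old filter: the key is the child, so only p->parent can match. */
static bool old_should_wake(const struct waiter *w, const struct task *p)
{
	assert(w && p);
	if ((w->flags & MODEL_WNOTHREAD) && w->self != p->parent)
		return false;
	return true;
}

/* New filter: the key names the parent being woken explicitly. */
static bool new_should_wake(const struct waiter *w,
			    const struct child_wait_info *info)
{
	assert(w && info);
	if ((w->flags & MODEL_WNOTHREAD) && w->self != info->parent)
		return false;
	return true;
}
```

In this model a real parent waiting with the flag set is skipped by the
old filter whenever the child is traced, and is woken by the new one
when the wake-up was issued on its behalf.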


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 07/16] signal: Wake up the designated parent
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Today, if a process is ptraced and its parent is waiting with
__WNOTHREAD, only the ptracer will ever be woken up in wait.  Update
the code so that the real_parent can also be woken up with __WNOTHREAD
even when the process is ptraced.

Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/exit.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..0e26f73c49ac 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1421,26 +1421,35 @@ static int ptrace_do_wait(struct wait_opts *wo, struct task_struct *tsk)
 	return 0;
 }
 
+struct child_wait_info {
+	struct task_struct *p;
+	struct task_struct *parent;
+};
+
 static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
 				int sync, void *key)
 {
 	struct wait_opts *wo = container_of(wait, struct wait_opts,
 						child_wait);
-	struct task_struct *p = key;
+	struct child_wait_info *info = key;
 
-	if (!eligible_pid(wo, p))
+	if (!eligible_pid(wo, info->p))
 		return 0;
 
-	if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
-		return 0;
+	if ((wo->wo_flags & __WNOTHREAD) && (wait->private != info->parent))
+			return 0;
 
 	return default_wake_function(wait, mode, sync, key);
 }
 
 void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
 {
+	struct child_wait_info info = {
+		.p = p,
+		.parent = parent,
+	};
 	__wake_up_sync_key(&parent->signal->wait_chldexit,
-			   TASK_INTERRUPTIBLE, p);
+			   TASK_INTERRUPTIBLE, &info);
 }
 
 static bool is_effectively_child(struct wait_opts *wo, bool ptrace,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 07/16] signal: Wake up the designated parent
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Today, if a process is ptraced and its parent is waiting with
__WNOTHREAD, only the ptracer will ever be woken up in wait.  Update
the code so that the real_parent can also be woken up with __WNOTHREAD
even when the process is ptraced.

Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/exit.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..0e26f73c49ac 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1421,26 +1421,35 @@ static int ptrace_do_wait(struct wait_opts *wo, struct task_struct *tsk)
 	return 0;
 }
 
+struct child_wait_info {
+	struct task_struct *p;
+	struct task_struct *parent;
+};
+
 static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
 				int sync, void *key)
 {
 	struct wait_opts *wo = container_of(wait, struct wait_opts,
 						child_wait);
-	struct task_struct *p = key;
+	struct child_wait_info *info = key;
 
-	if (!eligible_pid(wo, p))
+	if (!eligible_pid(wo, info->p))
 		return 0;
 
-	if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
-		return 0;
+	if ((wo->wo_flags & __WNOTHREAD) && (wait->private != info->parent))
+			return 0;
 
 	return default_wake_function(wait, mode, sync, key);
 }
 
 void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
 {
+	struct child_wait_info info = {
+		.p = p,
+		.parent = parent,
+	};
 	__wake_up_sync_key(&parent->signal->wait_chldexit,
-			   TASK_INTERRUPTIBLE, p);
+			   TASK_INTERRUPTIBLE, &info);
 }
 
 static bool is_effectively_child(struct wait_opts *wo, bool ptrace,
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

The code in ptrace_signal to populate siginfo if the signal number
changed is buggy.  If the tracer continued the tracee using
ptrace_detach it is guaranteed to use the real_parent (or possibly a
new tracer) but definitely not the original tracer to populate si_pid
and si_uid.

Fix this bug by only updating siginfo from the tracer so that the
tracer's pid and the tracer's uid are always used.

If it happens that ptrace_resume or ptrace_detach don't have
a signal to continue with, clear siginfo.

This is a very old bug that has been fixable since commit 1669ce53e2ff
("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO") when last_siginfo was
introduced and the tracer could change siginfo.

Fixes: v2.1.68
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 31 +++++++++++++++++++++++++++++--
 kernel/signal.c | 18 ------------------
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 15e93eafa6f0..a24eed725cec 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -526,6 +526,33 @@ static int ptrace_traceme(void)
 	return ret;
 }
 
+static void ptrace_set_signr(struct task_struct *child, unsigned int signr)
+{
+	struct kernel_siginfo *info = child->last_siginfo;
+
+	child->exit_code = signr;
+	/*
+	 * Update the siginfo structure if the signal has
+	 * changed.  If the debugger wanted something
+	 * specific in the siginfo structure then it should
+	 * have updated *info via PTRACE_SETSIGINFO.
+	 */
+	if (info && (info->si_signo != signr)) {
+		clear_siginfo(info);
+
+		if (signr != 0) {
+			info->si_signo = signr;
+			info->si_errno = 0;
+			info->si_code = SI_USER;
+			rcu_read_lock();
+			info->si_pid = task_pid_nr_ns(current, task_active_pid_ns(child));
+			info->si_uid = from_kuid_munged(task_cred_xxx(child, user_ns),
+						current_uid());
+			rcu_read_unlock();
+		}
+	}
+}
+
 /*
  * Called with tasklist_lock held for writing.
  * Unlink a traced task, and clean it up if it was a traced zombie.
@@ -579,7 +606,7 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
 	 * tasklist_lock avoids the race with wait_task_stopped(), see
 	 * the comment in ptrace_resume().
 	 */
-	child->exit_code = data;
+	ptrace_set_signr(child, data);
 	__ptrace_detach(current, child);
 	write_unlock_irq(&tasklist_lock);
 
@@ -851,7 +878,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 * wait_task_stopped() after resume.
 	 */
 	spin_lock_irq(&child->sighand->siglock);
-	child->exit_code = data;
+	ptrace_set_signr(child, data);
 	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
diff --git a/kernel/signal.c b/kernel/signal.c
index e782c2611b64..ff4a52352390 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2562,24 +2562,6 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	if (signr == 0)
 		return signr;
 
-	/*
-	 * Update the siginfo structure if the signal has
-	 * changed.  If the debugger wanted something
-	 * specific in the siginfo structure then it should
-	 * have updated *info via PTRACE_SETSIGINFO.
-	 */
-	if (signr != info->si_signo) {
-		clear_siginfo(info);
-		info->si_signo = signr;
-		info->si_errno = 0;
-		info->si_code = SI_USER;
-		rcu_read_lock();
-		info->si_pid = task_pid_vnr(current->parent);
-		info->si_uid = from_kuid_munged(current_user_ns(),
-						task_uid(current->parent));
-		rcu_read_unlock();
-	}
-
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-- 
2.35.3
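
The rule implemented by ptrace_set_signr() above can be modelled in
userspace roughly as follows.  This is a sketch under assumptions, not
the kernel function: the struct and function names are hypothetical,
the tracer is passed explicitly where the kernel uses current, and the
pid-namespace and user-namespace translation is omitted.

```c
#include <assert.h>
#include <string.h>

#define MODEL_SI_USER 0		/* stand-in for SI_USER */

/* Hypothetical, reduced stand-ins for kernel_siginfo and task_struct. */
struct model_siginfo {
	int signo;
	int err;
	int code;
	int pid;
	unsigned int uid;
};

struct model_task {
	int pid;
	unsigned int uid;
	int exit_code;
	struct model_siginfo *last_siginfo;
};

/* Record the signal to resume with; rewrite siginfo only if it changed. */
static void model_set_signr(const struct model_task *tracer,
			    struct model_task *child, int signr)
{
	struct model_siginfo *info;

	assert(tracer && child);
	info = child->last_siginfo;
	child->exit_code = signr;

	if (info && info->signo != signr) {
		memset(info, 0, sizeof(*info));	/* stands in for clear_siginfo() */
		if (signr != 0) {
			info->signo = signr;
			info->code = MODEL_SI_USER;
			info->pid = tracer->pid;	/* always the acting tracer */
			info->uid = tracer->uid;
		}
	}
}
```

Because the sender fields are filled in at the moment the tracer acts,
they cannot end up describing whoever happens to be the parent later,
which is the failure the commit message describes.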


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

The code in ptrace_signal to populate siginfo if the signal number
changed is buggy.  If the tracer continued the tracee using
ptrace_detach it is guaranteed to use the real_parent (or possibly a
new tracer) but definitely not the original tracer to populate si_pid
and si_uid.

Fix this bug by only updating siginfo from the tracer so that the
tracer's pid and the tracer's uid are always used.

If it happens that ptrace_resume or ptrace_detach don't have
a signal to continue with, clear siginfo.

This is a very old bug that has been fixable since commit 1669ce53e2ff
("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO") when last_siginfo was
introduced and the tracer could change siginfo.

Fixes: v2.1.68
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 31 +++++++++++++++++++++++++++++--
 kernel/signal.c | 18 ------------------
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 15e93eafa6f0..a24eed725cec 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -526,6 +526,33 @@ static int ptrace_traceme(void)
 	return ret;
 }
 
+static void ptrace_set_signr(struct task_struct *child, unsigned int signr)
+{
+	struct kernel_siginfo *info = child->last_siginfo;
+
+	child->exit_code = signr;
+	/*
+	 * Update the siginfo structure if the signal has
+	 * changed.  If the debugger wanted something
+	 * specific in the siginfo structure then it should
+	 * have updated *info via PTRACE_SETSIGINFO.
+	 */
+	if (info && (info->si_signo != signr)) {
+		clear_siginfo(info);
+
+		if (signr != 0) {
+			info->si_signo = signr;
+			info->si_errno = 0;
+			info->si_code = SI_USER;
+			rcu_read_lock();
+			info->si_pid = task_pid_nr_ns(current, task_active_pid_ns(child));
+			info->si_uid = from_kuid_munged(task_cred_xxx(child, user_ns),
+						current_uid());
+			rcu_read_unlock();
+		}
+	}
+}
+
 /*
  * Called with tasklist_lock held for writing.
  * Unlink a traced task, and clean it up if it was a traced zombie.
@@ -579,7 +606,7 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
 	 * tasklist_lock avoids the race with wait_task_stopped(), see
 	 * the comment in ptrace_resume().
 	 */
-	child->exit_code = data;
+	ptrace_set_signr(child, data);
 	__ptrace_detach(current, child);
 	write_unlock_irq(&tasklist_lock);
 
@@ -851,7 +878,7 @@ static int ptrace_resume(struct task_struct *child, long request,
 	 * wait_task_stopped() after resume.
 	 */
 	spin_lock_irq(&child->sighand->siglock);
-	child->exit_code = data;
+	ptrace_set_signr(child, data);
 	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
diff --git a/kernel/signal.c b/kernel/signal.c
index e782c2611b64..ff4a52352390 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2562,24 +2562,6 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	if (signr == 0)
 		return signr;
 
-	/*
-	 * Update the siginfo structure if the signal has
-	 * changed.  If the debugger wanted something
-	 * specific in the siginfo structure then it should
-	 * have updated *info via PTRACE_SETSIGINFO.
-	 */
-	if (signr != info->si_signo) {
-		clear_siginfo(info);
-		info->si_signo = signr;
-		info->si_errno = 0;
-		info->si_code = SI_USER;
-		rcu_read_lock();
-		info->si_pid = task_pid_vnr(current->parent);
-		info->si_uid = from_kuid_munged(current_user_ns(),
-						task_uid(current->parent));
-		rcu_read_unlock();
-	}
-
 	/* If the (new) signal is now blocked, requeue it.  */
 	if (sigismember(&current->blocked, signr) ||
 	    fatal_signal_pending(current)) {
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 09/16] ptrace: In ptrace_setsiginfo deal with invalid si_signo
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

If the tracer calls PTRACE_SETSIGINFO it only has an effect if the
tracee is stopped in ptrace_signal.

When one of PTRACE_DETACH, PTRACE_SINGLESTEP, PTRACE_SINGLEBLOCK,
PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP, PTRACE_SYSCALL, or
PTRACE_CONT passes in a signal number to continue with, the kernel
validates that signal number and ptrace_signal verifies that the signal
number matches si_signo before the siginfo is used.

As the signal number to continue with is verified to be a valid signal
number, the signal number in si_signo must also be a valid signal number.

Make this obvious and avoid needing checks later by immediately
clearing siginfo if si_signo is not valid.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a24eed725cec..a0a07d140751 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -716,7 +716,9 @@ static int ptrace_setsiginfo(struct task_struct *child, const kernel_siginfo_t *
 	if (unlikely(!child->last_siginfo))
 		return -EINVAL;
 
-	copy_siginfo(child->last_siginfo, info);
+	clear_siginfo(child->last_siginfo);
+	if (valid_signal(info->si_signo))
+		copy_siginfo(child->last_siginfo, info);
 	return 0;
 }
 
-- 
2.35.3
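
The clear-then-copy behaviour in the hunk above can be tried out in a
small userspace model.  This is a sketch, not the kernel code:
struct model_siginfo, MODEL_NSIG, model_valid_signal() and
model_setsiginfo() are hypothetical names, and the limit of 64 is an
assumption chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NSIG 64	/* assumed signal-number limit for this model */

/* Hypothetical, trimmed-down stand-in for kernel_siginfo_t. */
struct model_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
};

/* In this model every value from 0 up to MODEL_NSIG counts as valid. */
static int model_valid_signal(unsigned long sig)
{
	return sig <= MODEL_NSIG;
}

/*
 * Same shape as the patched ptrace_setsiginfo(): always clear the
 * stored siginfo first, then copy the new one only if its signal
 * number is acceptable.  An invalid si_signo leaves it cleared.
 */
static int model_setsiginfo(struct model_siginfo *last,
			    const struct model_siginfo *info)
{
	assert(info != NULL);

	if (!last)
		return -EINVAL;	/* tracee is not in a ptrace stop */

	memset(last, 0, sizeof(*last));
	/* A negative si_signo becomes a huge unsigned value and is rejected. */
	if (model_valid_signal((unsigned long)info->si_signo))
		*last = *info;
	return 0;
}
```

Note that the call still returns 0 for an invalid si_signo.  The request
is not rejected; the stored siginfo simply ends up cleared, which later
code reads as "no signal".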


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 10/16] ptrace: In ptrace_signal look at what the debugger did with siginfo
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Now that siginfo is only modified by the tracer and that siginfo is
cleared when the signal is canceled, have ptrace_signal directly examine
siginfo.

This makes the code a little simpler and handles the case when
the tracer exits without calling ptrace_detach.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ff4a52352390..3d955c23b13d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2556,9 +2556,10 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
 	 * comment in dequeue_signal().
 	 */
 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
-	signr = ptrace_stop(signr, CLD_TRAPPED, 0, info);
+	ptrace_stop(signr, CLD_TRAPPED, 0, info);
 
 	/* We're back.  Did the debugger cancel the sig?  */
+	signr = info->si_signo;
 	if (signr == 0)
 		return signr;
 
-- 
2.35.3
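
A rough userspace model of the control flow after this change, to show
why reading si_signo also covers a tracer that goes away without
resuming the tracee.  Everything here is hypothetical scaffolding
(model_siginfo, model_tracer_fn, the tracer_* callbacks); in particular a
NULL callback stands in for a tracer that exited without touching the
siginfo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for kernel_siginfo_t. */
struct model_siginfo {
	int si_signo;
	int si_code;
};

/* What the tracer does to the siginfo while the tracee is stopped. */
typedef void (*model_tracer_fn)(struct model_siginfo *info);

/* The stop itself returns nothing; any result travels through *info. */
static void model_ptrace_stop(struct model_siginfo *info,
			      model_tracer_fn tracer)
{
	if (tracer)
		tracer(info);
}

/* Returns the signal to deliver after the stop, or 0 if it was canceled. */
static int model_ptrace_signal(int signr, model_tracer_fn tracer)
{
	struct model_siginfo info = { .si_signo = signr, .si_code = 0 };

	assert(signr > 0);	/* only a real dequeued signal gets here */
	model_ptrace_stop(&info, tracer);

	/* We're back.  Did the debugger cancel the sig? */
	signr = info.si_signo;
	if (signr == 0)
		return 0;
	return signr;
}

/* Tracer resumes with signal 0: the siginfo is cleared. */
static void tracer_cancel(struct model_siginfo *info)
{
	memset(info, 0, sizeof(*info));
}

/* Tracer resumes with a different signal (10 is an arbitrary choice). */
static void tracer_swap_to_10(struct model_siginfo *info)
{
	memset(info, 0, sizeof(*info));
	info->si_signo = 10;
}
```

In this model an untouched siginfo simply yields the original signal
number, so no separate channel is needed for the "tracer vanished" case.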


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 11/16] ptrace: Use si_signo as the signal number to resume with
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

The signal number to resume with is already in si_signo.  So instead
of placing an extra copy in tsk->exit_code and later reading the extra
copy from tsk->exit_code, just read si_signo.

Read si_signo in ptrace_do_notify, where it is easy because the siginfo
is a local variable.  Only ptrace_report_syscall cares about the signal
to resume with from ptrace_stop, and it calls ptrace_notify, which calls
ptrace_do_notify, so moving the actual work into ptrace_do_notify, where
it is easier, is not a problem.

With ptrace_stop no longer involved in returning the signal the tracer
asked the tracee to resume with, remove the comment and the return
code from ptrace_stop.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c |  1 -
 kernel/signal.c | 13 ++++---------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a0a07d140751..e0ecb1536dfc 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -530,7 +530,6 @@ static void ptrace_set_signr(struct task_struct *child, unsigned int signr)
 {
 	struct kernel_siginfo *info = child->last_siginfo;
 
-	child->exit_code = signr;
 	/*
 	 * Update the siginfo structure if the signal has
 	 * changed.  If the debugger wanted something
diff --git a/kernel/signal.c b/kernel/signal.c
index 3d955c23b13d..2cc45e8448e2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2186,12 +2186,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  * We always set current->last_siginfo while stopped here.
  * That makes it a way to test a stopped process for
  * being ptrace-stopped vs being job-control-stopped.
- *
- * Returns the signal the ptracer requested the code resume
- * with.  If the code did not stop because the tracer is gone,
- * the stop signal remains unchanged unless clear_code.
  */
-static int ptrace_stop(int exit_code, int why, unsigned long message,
+static void ptrace_stop(int exit_code, int why, unsigned long message,
 		       kernel_siginfo_t *info)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
@@ -2219,7 +2215,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	 * signals here to prevent ptrace_stop sleeping in schedule.
 	 */
 	if (!current->ptrace || __fatal_signal_pending(current))
-		return exit_code;
+		return;
 
 	set_special_state(TASK_TRACED);
 	current->jobctl |= JOBCTL_TRACED;
@@ -2302,7 +2298,6 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	 * any signal-sending on another CPU that wants to examine it.
 	 */
 	spin_lock_irq(&current->sighand->siglock);
-	exit_code = current->exit_code;
 	current->last_siginfo = NULL;
 	current->ptrace_message = 0;
 	current->exit_code = 0;
@@ -2316,7 +2311,6 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	 * This sets TIF_SIGPENDING, but never clears it.
 	 */
 	recalc_sigpending_tsk(current);
-	return exit_code;
 }
 
 static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long message)
@@ -2330,7 +2324,8 @@ static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long mes
 	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
-	return ptrace_stop(exit_code, why, message, &info);
+	ptrace_stop(exit_code, why, message, &info);
+	return info.si_signo;
 }
 
 int ptrace_notify(int exit_code, unsigned long message)
-- 
2.35.3
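
The resulting division of labour between ptrace_stop and
ptrace_do_notify can be modelled in userspace as below.  This is a
sketch under stated assumptions: model_task, model_siginfo,
model_tracer_fn and tracer_resume_without_signal() are hypothetical
names, the "traced" flag stands in for current->ptrace, and scheduling
and locking are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for kernel_siginfo_t. */
struct model_siginfo {
	int si_signo;
	int si_code;
};

struct model_task {
	int traced;				/* nonzero while a tracer is attached */
	struct model_siginfo *last_siginfo;	/* valid only during a stop */
};

/* What the tracer does while the tracee is stopped. */
typedef void (*model_tracer_fn)(struct model_task *tsk);

/* Returns nothing: the only result is whatever ends up in *info. */
static void model_ptrace_stop(struct model_task *tsk,
			      struct model_siginfo *info,
			      model_tracer_fn tracer)
{
	if (!tsk->traced)
		return;		/* no stop happens, info stays untouched */

	tsk->last_siginfo = info;
	if (tracer)
		tracer(tsk);	/* the tracer may rewrite *tsk->last_siginfo */
	tsk->last_siginfo = NULL;
}

/* The siginfo is a local here, so reading si_signo back is trivial. */
static int model_ptrace_do_notify(struct model_task *tsk, int signr,
				  model_tracer_fn tracer)
{
	struct model_siginfo info = { .si_signo = signr, .si_code = 0 };

	assert(tsk != NULL);
	model_ptrace_stop(tsk, &info, tracer);
	return info.si_signo;
}

/* Tracer resumes the tracee with signal 0. */
static void tracer_resume_without_signal(struct model_task *tsk)
{
	tsk->last_siginfo->si_signo = 0;
	tsk->last_siginfo->si_code = 0;
}
```

With the result carried in the siginfo, the stop function in this model
needs no return value, and the early return for an untraced task leaves
the caller's signal number as it was.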


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 12/16] ptrace: Stop protecting ptrace_set_signr with tasklist_lock
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Now that ptrace_set_signr no longer sets task->exit_code the race
documented in commit b72c186999e6 ("ptrace: fix race between
ptrace_resume() and wait_task_stopped()") is no longer possible, as
task->exit_code is only updated by wait during a ptrace_stop.

As there is no possibility of a race and ptrace_freeze_traced is
all of the protection ptrace_set_signr needs to operate without
contention, move ptrace_set_signr outside of tasklist_lock
and remove the documentation about the race that is no more.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index e0ecb1536dfc..d0527b6e2b29 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -595,17 +595,14 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
 	/* Architecture-specific hardware disable .. */
 	ptrace_disable(child);
 
+	ptrace_set_signr(child, data);
+
 	write_lock_irq(&tasklist_lock);
 	/*
 	 * We rely on ptrace_freeze_traced(). It can't be killed and
 	 * untraced by another thread, it can't be a zombie.
 	 */
 	WARN_ON(!child->ptrace || child->exit_state);
-	/*
-	 * tasklist_lock avoids the race with wait_task_stopped(), see
-	 * the comment in ptrace_resume().
-	 */
-	ptrace_set_signr(child, data);
 	__ptrace_detach(current, child);
 	write_unlock_irq(&tasklist_lock);
 
@@ -869,17 +866,9 @@ static int ptrace_resume(struct task_struct *child, long request,
 		user_disable_single_step(child);
 	}
 
-	/*
-	 * Change ->exit_code and ->state under siglock to avoid the race
-	 * with wait_task_stopped() in between; a non-zero ->exit_code will
-	 * wrongly look like another report from tracee.
-	 *
-	 * Note that we need siglock even if ->exit_code == data and/or this
-	 * status was not reported yet, the new status must not be cleared by
-	 * wait_task_stopped() after resume.
-	 */
-	spin_lock_irq(&child->sighand->siglock);
 	ptrace_set_signr(child, data);
+
+	spin_lock_irq(&child->sighand->siglock);
 	child->jobctl &= ~JOBCTL_TRACED;
 	wake_up_state(child, __TASK_TRACED);
 	spin_unlock_irq(&child->sighand->siglock);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 13/16] ptrace: Document why ptrace_setoptions does not need a lock
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

The functions that change ->ptrace are: ptrace_attach, ptrace_traceme,
ptrace_init_task, __ptrace_unlink, ptrace_setoptions.

Except for ptrace_setoptions, all of the places where ->ptrace is
modified hold tasklist_lock for write, and either the tracee or the
tracer modifies ->ptrace.

When ptrace_setoptions is called the tracee has been frozen with
ptrace_freeze_traced, and must be explicitly unfrozen by the tracer
before it can do anything.  As ptrace_setoptions is run in the tracer
there can be no contention by the simple fact that the tracee can't
run.
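
A small userspace sketch of the update pattern described above.  The
names MODEL_OPT_SHIFT, MODEL_OPT_MASK and model_setoptions are
hypothetical, and the numeric values are illustrative rather than the
kernel's definitions.  The new value is built in a local variable and
handed back for a single store, so no reader can observe the
intermediate state in which all options are cleared.

```c
#include <assert.h>

/* Illustrative values only; the kernel defines its own shift and mask. */
#define MODEL_OPT_SHIFT 3
#define MODEL_OPT_MASK  0xffUL

/*
 * Compute the new ->ptrace value from the old one and the requested
 * options.  The caller is expected to store the result in one write.
 */
static unsigned long model_setoptions(unsigned long ptrace, unsigned long data)
{
	unsigned long flags = ptrace;

	/* Models the validity check that precedes the update. */
	assert((data & ~MODEL_OPT_MASK) == 0);

	flags &= ~(MODEL_OPT_MASK << MODEL_OPT_SHIFT);	/* drop old options */
	flags |= data << MODEL_OPT_SHIFT;		/* install new ones */
	return flags;
}
```

Bits below the shift, which model the non-option part of ->ptrace, pass
through unchanged.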

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/ptrace.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d0527b6e2b29..fbadd2f21f09 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -689,7 +689,10 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 	if (ret)
 		return ret;
 
-	/* Avoid intermediate state when all opts are cleared */
+	/*
+	 * With a frozen tracee, only the tracer modifies ->ptrace.
+	 * Avoid intermediate state when all opts are cleared.
+	 */
 	flags = child->ptrace;
 	flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
 	flags |= (data << PT_OPT_FLAG_SHIFT);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 14/16] signal: Protect parent child relationships by childs siglock
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

The functions ptrace_stop and do_signal_stop have to drop siglock
and grab tasklist_lock because the parent/child relationship
is guarded by tasklist_lock and not siglock.

Simplify things by additionally guarding the parent/child relationship
with siglock.  This just requires a little bit of code motion.

After this change tsk->parent, tsk->real_parent, tsk->ptracer_cred
are all protected by tsk->siglock.

The fields tsk->sibling and tsk->ptrace_entry are mostly protected by
tsk->siglock.  The field tsk->ptrace_entry is not protected by siglock
when tsk->ptrace_entry is reused as the dead task list.  The field
tsk->sibling is not protected by siglock when children are reparented
because their original parent dies.

The field tsk->ptrace is protected by siglock except for the options
which may change without siglock being held.
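
The rule for tsk->parent can be modelled in userspace as below.  This is
a single-threaded sketch with boolean "lock held" flags, not real
locking, and every name in it (model_task, model_set_parent,
model_parent_is_stable and the flag variables) is hypothetical.  It
assumes writers hold both tasklist_lock for write and the child's
siglock, which is what lets a reader rely on either lock alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task {
	bool siglock_held;		/* models tsk->sighand->siglock */
	struct model_task *parent;
};

/* Model of the global tasklist_lock, split by how it is held. */
static bool model_tasklist_read_held;
static bool model_tasklist_write_held;

/* A writer must hold tasklist_lock for write and the child's siglock. */
static bool model_set_parent(struct model_task *t, struct model_task *parent)
{
	assert(t != NULL);
	if (!model_tasklist_write_held || !t->siglock_held)
		return false;
	t->parent = parent;
	return true;
}

/* A reader sees a stable ->parent while holding either lock. */
static bool model_parent_is_stable(const struct model_task *t)
{
	assert(t != NULL);
	return model_tasklist_read_held || model_tasklist_write_held ||
	       t->siglock_held;
}
```

The exceptions listed above (tsk->sibling during reparenting,
tsk->ptrace_entry on the dead task list, and the ptrace options) are
deliberately left out of this model.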

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/exit.c   |  4 ++++
 kernel/fork.c   | 12 ++++++------
 kernel/ptrace.c |  9 +++++----
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 0e26f73c49ac..bad434b23c48 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,11 +643,15 @@ static void forget_original_parent(struct task_struct *father,
 
 	reaper = find_new_reaper(father, reaper);
 	list_for_each_entry(p, &father->children, sibling) {
+		spin_lock(&p->sighand->siglock);
 		for_each_thread(p, t) {
 			RCU_INIT_POINTER(t->real_parent, reaper);
 			BUG_ON((!t->ptrace) != (rcu_access_pointer(t->parent) == father));
 			if (likely(!t->ptrace))
 				t->parent = t->real_parent;
+		}
+		spin_unlock(&p->sighand->siglock);
+		for_each_thread(p, t) {
 			if (t->pdeath_signal)
 				group_send_sig_info(t->pdeath_signal,
 						    SEND_SIG_NOINFO, t,
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..841021da69f3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2367,6 +2367,12 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	write_lock_irq(&tasklist_lock);
 
+	klp_copy_process(p);
+
+	sched_core_fork(p);
+
+	spin_lock(&current->sighand->siglock);
+
 	/* CLONE_PARENT re-uses the old parent */
 	if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) {
 		p->real_parent = current->real_parent;
@@ -2381,12 +2387,6 @@ static __latent_entropy struct task_struct *copy_process(
 		p->exit_signal = args->exit_signal;
 	}
 
-	klp_copy_process(p);
-
-	sched_core_fork(p);
-
-	spin_lock(&current->sighand->siglock);
-
 	/*
 	 * Copy seccomp details explicitly here, in case they were changed
 	 * before holding sighand lock.
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index fbadd2f21f09..77dfdb3d1ced 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -123,13 +123,12 @@ void __ptrace_unlink(struct task_struct *child)
 	clear_task_syscall_work(child, SYSCALL_EMU);
 #endif
 
+	spin_lock(&child->sighand->siglock);
 	child->parent = child->real_parent;
 	list_del_init(&child->ptrace_entry);
 	old_cred = child->ptracer_cred;
 	child->ptracer_cred = NULL;
 	put_cred(old_cred);
-
-	spin_lock(&child->sighand->siglock);
 	child->ptrace = 0;
 	/*
 	 * Clear all pending traps and TRAPPING.  TRAPPING should be
@@ -441,15 +440,15 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (task->ptrace)
 		goto unlock_tasklist;
 
+	spin_lock(&task->sighand->siglock);
 	task->ptrace = flags;
 
 	ptrace_link(task, current);
 
 	/* SEIZE doesn't trap tracee on attach */
 	if (!seize)
-		send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
+		send_signal_locked(SIGSTOP, SEND_SIG_PRIV, task, PIDTYPE_PID);
 
-	spin_lock(&task->sighand->siglock);
 
 	/*
 	 * If the task is already STOPPED, set JOBCTL_TRAP_STOP and
@@ -517,8 +516,10 @@ static int ptrace_traceme(void)
 		 * pretend ->real_parent untraces us right after return.
 		 */
 		if (!ret && !(current->real_parent->flags & PF_EXITING)) {
+			spin_lock(&current->sighand->siglock);
 			current->ptrace = PT_PTRACED;
 			ptrace_link(current, current->real_parent);
+			spin_unlock(&current->sighand->siglock);
 		}
 	}
 	write_unlock_irq(&tasklist_lock);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 15/16] ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Now that siglock protects tsk->parent and tsk->ptrace, there is no need
to take tasklist_lock in ptrace_check_attach.  The siglock covers all
of the locking needs of ptrace_check_attach.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
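
The change described above can be modeled in user space.  The sketch
below is only an illustration, not kernel code: fake_task, fake_sighand
and the fake_* helpers are hypothetical single-threaded stand-ins, and
the freeze step (ptrace_freeze_traced) is left out.  It shows the two
points the patch relies on: the lock helper can fail when the sighand
is already gone, and both the ->ptrace and ->parent checks happen
inside one critical section.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's types. */
struct fake_sighand {
	int locked;			/* models siglock being held */
};

struct fake_task {
	struct fake_sighand *sighand;	/* NULL once the task is released */
	struct fake_task *parent;	/* tracer, or the real parent */
	unsigned int ptrace;		/* nonzero while traced */
};

/* Models lock_task_sighand(): returns NULL if there is no sighand. */
static struct fake_sighand *fake_lock_task_sighand(struct fake_task *t)
{
	struct fake_sighand *s = t->sighand;

	if (!s)
		return NULL;
	assert(!s->locked);		/* a real spinlock would spin here */
	s->locked = 1;
	return s;
}

static void fake_unlock_task_sighand(struct fake_sighand *s)
{
	assert(s->locked);
	s->locked = 0;
}

/* Both checks are made under the one lock, as in the patch. */
static int fake_check_attach(struct fake_task *child, struct fake_task *me)
{
	int ret = -ESRCH;
	struct fake_sighand *s = fake_lock_task_sighand(child);

	if (s) {
		if (child->ptrace && child->parent == me)
			ret = 0;
		fake_unlock_task_sighand(s);
	}
	return ret;
}
```

In this model a released child (sighand == NULL) simply yields -ESRCH,
which is why the old "child->sighand can't be NULL" comment is no
longer needed once the lock helper performs that check itself.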
 kernel/ptrace.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 77dfdb3d1ced..fa65841bbdbe 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -194,17 +194,14 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 {
 	bool ret = false;
 
-	/* Lockless, nobody but us can set this flag */
 	if (task->jobctl & JOBCTL_LISTENING)
 		return ret;
 
-	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
 		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
-	spin_unlock_irq(&task->sighand->siglock);
 
 	return ret;
 }
@@ -240,32 +237,30 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
  * state.
  *
  * CONTEXT:
- * Grabs and releases tasklist_lock and @child->sighand->siglock.
+ * Grabs and releases @child->sighand->siglock.
  *
  * RETURNS:
  * 0 on success, -ESRCH if %child is not ready.
  */
 static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 {
+	unsigned long flags;
 	int ret = -ESRCH;
 
 	/*
-	 * We take the read lock around doing both checks to close a
+	 * We take the siglock around doing both checks to close a
 	 * possible race where someone else was tracing our child and
 	 * detached between these two checks.  After this locked check,
 	 * we are sure that this is our traced child and that can only
 	 * be changed by us so it's not changing right after this.
 	 */
-	read_lock(&tasklist_lock);
-	if (child->ptrace && child->parent == current) {
-		/*
-		 * child->sighand can't be NULL, release_task()
-		 * does ptrace_unlink() before __exit_signal().
-		 */
-		if (ignore_state || ptrace_freeze_traced(child))
-			ret = 0;
+	if (lock_task_sighand(child, &flags)) {
+		if (child->ptrace && child->parent == current) {
+			if (ignore_state || ptrace_freeze_traced(child))
+				ret = 0;
+		}
+		unlock_task_sighand(child, &flags);
 	}
-	read_unlock(&tasklist_lock);
 
 	if (!ret && !ignore_state &&
 	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-18 22:53                 ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Now that siglock keeps tsk->parent and tsk->real_parent constant,
require that do_notify_parent_cldstop is called with
tsk->sighand->siglock held instead of the tasklist_lock.

As all of the callers of do_notify_parent_cldstop had to drop the
siglock and take tasklist_lock, this simplifies all of its callers.

This removes one reason for taking tasklist_lock.

This makes ptrace_stop work reliably with PREEMPT_RT enabled and
CONFIG_CGROUPS disabled.  The remaining challenge is that
cgroup_enter_frozen takes a spin_lock after __state has been set to
TASK_TRACED, which on PREEMPT_RT means the code can sleep and change
__state.  Not only that, but it means that wait_task_inactive could
potentially detect the code scheduling away at that point and fail,
causing ptrace_check_attach to fail.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
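
The lock ordering used by lock_parents_siglocks can be tried out in
user space.  The following is a minimal single-threaded sketch under
stated assumptions, not the kernel implementation: fake_sighand,
fake_lock, lock_three_sorted and unlock_three are hypothetical names,
and the retry loop that revalidates parent and real_parent is omitted.
It only demonstrates that sorting three possibly aliased pointers by
address places duplicates next to each other, so comparing neighbours
is enough to take each distinct lock exactly once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a sighand that embeds a siglock. */
struct fake_sighand {
	int locked;
};

static void fake_lock(struct fake_sighand *s)
{
	assert(!s->locked);	/* taking it twice would deadlock */
	s->locked = 1;
}

static void fake_unlock(struct fake_sighand *s)
{
	assert(s->locked);
	s->locked = 0;
}

static void swap_ptr(struct fake_sighand **a, struct fake_sighand **b)
{
	struct fake_sighand *tmp = *a;

	*a = *b;
	*b = tmp;
}

/*
 * Sort the three pointers into ascending address order, then lock
 * each distinct one once.  Returns the number of locks taken.
 */
static int lock_three_sorted(struct fake_sighand *a, struct fake_sighand *b,
			     struct fake_sighand *c)
{
	struct fake_sighand *s1 = a, *s2 = b, *s3 = c;
	int taken = 1;

	if ((uintptr_t)s1 > (uintptr_t)s2)
		swap_ptr(&s1, &s2);
	if ((uintptr_t)s1 > (uintptr_t)s3)
		swap_ptr(&s1, &s3);
	if ((uintptr_t)s2 > (uintptr_t)s3)
		swap_ptr(&s2, &s3);

	fake_lock(s1);
	if (s1 != s2) {
		fake_lock(s2);
		taken++;
	}
	if (s2 != s3) {
		fake_lock(s3);
		taken++;
	}
	return taken;
}

/* Release each distinct lock once; unlock order does not matter. */
static void unlock_three(struct fake_sighand *a, struct fake_sighand *b,
			 struct fake_sighand *c)
{
	fake_unlock(a);
	if (b != a)
		fake_unlock(b);
	if (c != a && c != b)
		fake_unlock(c);
}
```

Because every caller sorts by the same key (the address), two tasks
that need an overlapping set of locks always acquire them in the same
order, which is what rules out an ABBA deadlock.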
 kernel/signal.c | 262 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 189 insertions(+), 73 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 2cc45e8448e2..d4956be51939 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1994,6 +1994,129 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 	return ret;
 }
 
+/**
+ * lock_parents_siglocks - Take current, real_parent, and parent's siglock
+ * @lock_tracer: The tracer's siglock is needed.
+ *
+ * There is no natural ordering to these locks so they must be sorted
+ * before being taken.
+ *
+ * There are two complicating factors here:
+ * - The locks live in sighand and sighand can be arbitrarily shared
+ * - parent and real_parent can change when current's siglock is unlocked.
+ *
+ * To deal with this, first all of the sighand pointers are
+ * gathered under current's siglock, and the sighand pointers are
+ * sorted.  As siglock lives inside of sighand this also sorts the
+ * siglocks by address.
+ *
+ * Then the siglocks are taken in order dropping current's siglock if
+ * necessary.
+ *
+ * Finally, if parent and real_parent have not changed, return.
+ * If either parent has changed, drop their locks and try again.
+ *
+ * Changing sighand is an infrequent and somewhat expensive operation
+ * (unshare or exec) and so even in the worst case this loop
+ * should not loop too many times before all of the proper locks are
+ * taken in order.
+ *
+ * CONTEXT:
+ * Must be called with @current->sighand->siglock held
+ *
+ * RETURNS:
+ * current's, real_parent's, and parent's siglock held.
+ */
+static void lock_parents_siglocks(bool lock_tracer)
+	__releases(&current->sighand->siglock)
+	__acquires(&current->sighand->siglock)
+	__acquires(&current->real_parent->sighand->siglock)
+	__acquires(&current->parent->sighand->siglock)
+{
+	struct task_struct *me = current;
+	struct sighand_struct *m_sighand = me->sighand;
+
+	lockdep_assert_held(&m_sighand->siglock);
+
+	rcu_read_lock();
+	for (;;) {
+		struct task_struct *parent, *tracer;
+		struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
+
+		parent = me->real_parent;
+		tracer = ptrace_parent(me);
+		if (!tracer || !lock_tracer)
+			tracer = parent;
+
+		p_sighand = rcu_dereference(parent->sighand);
+		t_sighand = rcu_dereference(tracer->sighand);
+
+		/* Sort the sighands so that s1 <= s2 <= s3 */
+		s1 = m_sighand;
+		s2 = p_sighand;
+		s3 = t_sighand;
+		if (s1 > s2)
+			swap(s1, s2);
+		if (s1 > s3)
+			swap(s1, s3);
+		if (s2 > s3)
+			swap(s2, s3);
+
+		/* Take the locks in order */
+		if (s1 != m_sighand) {
+			spin_unlock(&m_sighand->siglock);
+			spin_lock(&s1->siglock);
+		}
+		if (s1 != s2)
+			spin_lock_nested(&s2->siglock, 1);
+		if (s2 != s3)
+			spin_lock_nested(&s3->siglock, 2);
+
+		/* Verify the proper locks are held */
+		if (likely((s1 == m_sighand) ||
+			   ((me->real_parent == parent) &&
+			    (me->parent == tracer) &&
+			    (parent->sighand == p_sighand) &&
+			    (tracer->sighand == t_sighand)))) {
+			break;
+		}
+
+		/* Drop all but current's siglock */
+		if (p_sighand != m_sighand)
+			spin_unlock(&p_sighand->siglock);
+		if (t_sighand != p_sighand)
+			spin_unlock(&t_sighand->siglock);
+
+		/*
+		 * Since [pt]_sighand will likely change if we go
+		 * around, and m_sighand is the only one held, make sure
+		 * it is subclass-0, since the above 's1 != m_sighand'
+		 * clause very much relies on that.
+		 */
+		lock_set_subclass(&m_sighand->siglock.dep_map, 0, _RET_IP_);
+	}
+	rcu_read_unlock();
+}
+
+static void unlock_parents_siglocks(bool unlock_tracer)
+	__releases(&current->real_parent->sighand->siglock)
+	__releases(&current->parent->sighand->siglock)
+{
+	struct task_struct *me = current;
+	struct task_struct *parent = me->real_parent;
+	struct task_struct *tracer = ptrace_parent(me);
+	struct sighand_struct *m_sighand = me->sighand;
+	struct sighand_struct *p_sighand = parent->sighand;
+
+	if (p_sighand != m_sighand)
+		spin_unlock(&p_sighand->siglock);
+	if (tracer && unlock_tracer) {
+		struct sighand_struct *t_sighand = tracer->sighand;
+		if (t_sighand != p_sighand)
+			spin_unlock(&t_sighand->siglock);
+	}
+}
+
 static void do_notify_pidfd(struct task_struct *task)
 {
 	struct pid *pid;
@@ -2125,11 +2248,12 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 				     bool for_ptracer, int why)
 {
 	struct kernel_siginfo info;
-	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
 	u64 utime, stime;
 
+	lockdep_assert_held(&tsk->sighand->siglock);
+
 	if (for_ptracer) {
 		parent = tsk->parent;
 	} else {
@@ -2137,6 +2261,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 		parent = tsk->real_parent;
 	}
 
+	lockdep_assert_held(&parent->sighand->siglock);
+
 	clear_siginfo(&info);
 	info.si_signo = SIGCHLD;
 	info.si_errno = 0;
@@ -2168,7 +2294,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  	}
 
 	sighand = parent->sighand;
-	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
 		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
@@ -2176,7 +2301,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
 	__wake_up_parent(tsk, parent);
-	spin_unlock_irqrestore(&sighand->siglock, flags);
 }
 
 /*
@@ -2208,14 +2332,18 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
+	lock_parents_siglocks(true);
 	/*
 	 * After this point ptrace_signal_wake_up or signal_wake_up
 	 * will clear TASK_TRACED if ptrace_unlink happens or a fatal
 	 * signal comes in.  Handle previous ptrace_unlinks and fatal
 	 * signals here to prevent ptrace_stop sleeping in schedule.
 	 */
-	if (!current->ptrace || __fatal_signal_pending(current))
+
+	if (!current->ptrace || __fatal_signal_pending(current)) {
+		unlock_parents_siglocks(true);
 		return;
+	}
 
 	set_special_state(TASK_TRACED);
 	current->jobctl |= JOBCTL_TRACED;
@@ -2254,16 +2382,6 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
 		gstop_done = task_participate_group_stop(current);
 
-	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
-	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
-	if (info && info->si_code >> 8 == PTRACE_EVENT_STOP)
-		task_clear_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
-
-	/* entering a trap, clear TRAPPING */
-	task_clear_jobctl_trapping(current);
-
-	spin_unlock_irq(&current->sighand->siglock);
-	read_lock(&tasklist_lock);
 	/*
 	 * Notify parents of the stop.
 	 *
@@ -2279,14 +2397,25 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
 		do_notify_parent_cldstop(current, false, why);
 
+	unlock_parents_siglocks(true);
+
+	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
+	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
+	if (info && info->si_code >> 8 == PTRACE_EVENT_STOP)
+		task_clear_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
+
+	/* entering a trap, clear TRAPPING */
+	task_clear_jobctl_trapping(current);
+
 	/*
 	 * Don't want to allow preemption here, because
 	 * sys_ptrace() needs this task to be inactive.
 	 *
-	 * XXX: implement read_unlock_no_resched().
+	 * XXX: implement spin_unlock_no_resched().
 	 */
 	preempt_disable();
-	read_unlock(&tasklist_lock);
+	spin_unlock_irq(&current->sighand->siglock);
+
 	cgroup_enter_frozen();
 	preempt_enable_no_resched();
 	freezable_schedule();
@@ -2361,8 +2490,8 @@ int ptrace_notify(int exit_code, unsigned long message)
  * on %true return.
  *
  * RETURNS:
- * %false if group stop is already cancelled or ptrace trap is scheduled.
- * %true if participated in group stop.
+ * %false if group stop is already cancelled.
+ * %true otherwise (as lock_parents_siglocks may have dropped siglock).
  */
 static bool do_signal_stop(int signr)
 	__releases(&current->sighand->siglock)
@@ -2425,36 +2554,24 @@ static bool do_signal_stop(int signr)
 		}
 	}
 
+	lock_parents_siglocks(false);
+	/* Recheck JOBCTL_STOP_PENDING after unlock+lock of siglock */
+	if (unlikely(!(current->jobctl & JOBCTL_STOP_PENDING)))
+		goto out;
 	if (likely(!current->ptrace)) {
-		int notify = 0;
-
 		/*
 		 * If there are no other threads in the group, or if there
 		 * is a group stop in progress and we are the last to stop,
-		 * report to the parent.
+		 * report to the real_parent.
 		 */
 		if (task_participate_group_stop(current))
-			notify = CLD_STOPPED;
+			do_notify_parent_cldstop(current, false, CLD_STOPPED);
+		unlock_parents_siglocks(false);
 
 		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-		/*
-		 * Notify the parent of the group stop completion.  Because
-		 * we're not holding either the siglock or tasklist_lock
-		 * here, ptracer may attach inbetween; however, this is for
-		 * group stop and should always be delivered to the real
-		 * parent of the group leader.  The new ptracer will get
-		 * its notification when this task transitions into
-		 * TASK_TRACED.
-		 */
-		if (notify) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, false, notify);
-			read_unlock(&tasklist_lock);
-		}
-
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
@@ -2465,8 +2582,11 @@ static bool do_signal_stop(int signr)
 		 * Schedule it and let the caller deal with it.
 		 */
 		task_set_jobctl_pending(current, JOBCTL_TRAP_STOP);
-		return false;
 	}
+out:
+	unlock_parents_siglocks(false);
+	spin_unlock_irq(&current->sighand->siglock);
+	return true;
 }
 
 /**
@@ -2624,32 +2744,30 @@ bool get_signal(struct ksignal *ksig)
 	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
 		int why;
 
-		if (signal->flags & SIGNAL_CLD_CONTINUED)
-			why = CLD_CONTINUED;
-		else
-			why = CLD_STOPPED;
+		lock_parents_siglocks(true);
+		/* Recheck signal->flags after unlock+lock of siglock */
+		if (likely(signal->flags & SIGNAL_CLD_MASK)) {
+			if (signal->flags & SIGNAL_CLD_CONTINUED)
+				why = CLD_CONTINUED;
+			else
+				why = CLD_STOPPED;
 
-		signal->flags &= ~SIGNAL_CLD_MASK;
+			signal->flags &= ~SIGNAL_CLD_MASK;
 
-		spin_unlock_irq(&sighand->siglock);
-
-		/*
-		 * Notify the parent that we're continuing.  This event is
-		 * always per-process and doesn't make whole lot of sense
-		 * for ptracers, who shouldn't consume the state via
-		 * wait(2) either, but, for backward compatibility, notify
-		 * the ptracer of the group leader too unless it's gonna be
-		 * a duplicate.
-		 */
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current, false, why);
-
-		if (ptrace_reparented(current->group_leader))
-			do_notify_parent_cldstop(current->group_leader,
-						true, why);
-		read_unlock(&tasklist_lock);
-
-		goto relock;
+			/*
+			 * Notify the parent that we're continuing.  This event is
+			 * always per-process and doesn't make whole lot of sense
+			 * for ptracers, who shouldn't consume the state via
+			 * wait(2) either, but, for backward compatibility, notify
+			 * the ptracer of the group leader too unless it's gonna be
+			 * a duplicate.
+			 */
+			do_notify_parent_cldstop(current, false, why);
+			if (ptrace_reparented(current->group_leader))
+				do_notify_parent_cldstop(current->group_leader,
+							 true, why);
+		}
+		unlock_parents_siglocks(true);
 	}
 
 	for (;;) {
@@ -2906,7 +3024,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 
 void exit_signals(struct task_struct *tsk)
 {
-	int group_stop = 0;
 	sigset_t unblocked;
 
 	/*
@@ -2937,21 +3054,20 @@ void exit_signals(struct task_struct *tsk)
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
 
-	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
-	    task_participate_group_stop(tsk))
-		group_stop = CLD_STOPPED;
-out:
-	spin_unlock_irq(&tsk->sighand->siglock);
-
 	/*
 	 * If group stop has completed, deliver the notification.  This
 	 * should always go to the real parent of the group leader.
 	 */
-	if (unlikely(group_stop)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, false, group_stop);
-		read_unlock(&tasklist_lock);
+	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING)) {
+		lock_parents_siglocks(false);
+		/* Recheck JOBCTL_STOP_PENDING after unlock+lock of siglock */
+		if ((tsk->jobctl & JOBCTL_STOP_PENDING) &&
+		    task_participate_group_stop(tsk))
+			do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+		unlock_parents_siglocks(false);
 	}
+out:
+	spin_unlock_irq(&tsk->sighand->siglock);
 }
 
 /*
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Now that siglock keeps tsk->parent and tsk->real_parent constant,
require that do_notify_parent_cldstop is called with
tsk->sighand->siglock held instead of the tasklist_lock.

As all of the callers of do_notify_parent_cldstop had to drop the
siglock and take tasklist_lock, this simplifies all of its callers.

This removes one reason for taking tasklist_lock.

This makes ptrace_stop work reliably with PREEMPT_RT enabled and
CONFIG_CGROUPS disabled.  The remaining challenge is that
cgroup_enter_frozen takes a spin_lock after __state has been set to
TASK_TRACED, which on PREEMPT_RT means the code can sleep and change
__state.  Not only that, but it means that wait_task_inactive could
potentially detect the code scheduling away at that point and fail,
causing ptrace_check_attach to fail.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 262 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 189 insertions(+), 73 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 2cc45e8448e2..d4956be51939 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1994,6 +1994,129 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 	return ret;
 }
 
+/**
+ * lock_parents_siglocks - Take current, real_parent, and parent's siglock
+ * @lock_tracer: The tracer's siglock is needed.
+ *
+ * There is no natural ordering to these locks so they must be sorted
+ * before being taken.
+ *
+ * There are two complicating factors here:
+ * - The locks live in sighand and sighand can be arbitrarily shared
+ * - parent and real_parent can change when current's siglock is unlocked.
+ *
+ * To deal with this, first all of the sighand pointers are
+ * gathered under current's siglock, and the sighand pointers are
+ * sorted.  As siglock lives inside of sighand this also sorts the
+ * siglocks by address.
+ *
+ * Then the siglocks are taken in order dropping current's siglock if
+ * necessary.
+ *
+ * Finally, if parent and real_parent have not changed, return.
+ * If either parent has changed, drop their locks and try again.
+ *
+ * Changing sighand is an infrequent and somewhat expensive operation
+ * (unshare or exec) and so even in the worst case this loop
+ * should not loop too many times before all of the proper locks are
+ * taken in order.
+ *
+ * CONTEXT:
+ * Must be called with @current->sighand->siglock held
+ *
+ * RETURNS:
+ * current's, real_parent's, and parent's siglock held.
+ */
+static void lock_parents_siglocks(bool lock_tracer)
+	__releases(&current->sighand->siglock)
+	__acquires(&current->sighand->siglock)
+	__acquires(&current->real_parent->sighand->siglock)
+	__acquires(&current->parent->sighand->siglock)
+{
+	struct task_struct *me = current;
+	struct sighand_struct *m_sighand = me->sighand;
+
+	lockdep_assert_held(&m_sighand->siglock);
+
+	rcu_read_lock();
+	for (;;) {
+		struct task_struct *parent, *tracer;
+		struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
+
+		parent = me->real_parent;
+		tracer = ptrace_parent(me);
+		if (!tracer || !lock_tracer)
+			tracer = parent;
+
+		p_sighand = rcu_dereference(parent->sighand);
+		t_sighand = rcu_dereference(tracer->sighand);
+
+		/* Sort the sighands so that s1 <= s2 <= s3 */
+		s1 = m_sighand;
+		s2 = p_sighand;
+		s3 = t_sighand;
+		if (s1 > s2)
+			swap(s1, s2);
+		if (s1 > s3)
+			swap(s1, s3);
+		if (s2 > s3)
+			swap(s2, s3);
+
+		/* Take the locks in order */
+		if (s1 != m_sighand) {
+			spin_unlock(&m_sighand->siglock);
+			spin_lock(&s1->siglock);
+		}
+		if (s1 != s2)
+			spin_lock_nested(&s2->siglock, 1);
+		if (s2 != s3)
+			spin_lock_nested(&s3->siglock, 2);
+
+		/* Verify the proper locks are held */
+		if (likely((s1 == m_sighand) ||
+			   ((me->real_parent == parent) &&
+			    (me->parent == tracer) &&
+			    (parent->sighand == p_sighand) &&
+			    (tracer->sighand == t_sighand)))) {
+			break;
+		}
+
+		/* Drop all but current's siglock */
+		if (p_sighand != m_sighand)
+			spin_unlock(&p_sighand->siglock);
+		if (t_sighand != p_sighand)
+			spin_unlock(&t_sighand->siglock);
+
+		/*
+		 * Since [pt]_sighand will likely change if we go
+		 * around, and m_sighand is the only one held, make sure
+		 * it is subclass-0, since the above 's1 != m_sighand'
+		 * clause very much relies on that.
+		 */
+		lock_set_subclass(&m_sighand->siglock.dep_map, 0, _RET_IP_);
+	}
+	rcu_read_unlock();
+}
+
+static void unlock_parents_siglocks(bool unlock_tracer)
+	__releases(&current->real_parent->sighand->siglock)
+	__releases(&current->parent->sighand->siglock)
+{
+	struct task_struct *me = current;
+	struct task_struct *parent = me->real_parent;
+	struct task_struct *tracer = ptrace_parent(me);
+	struct sighand_struct *m_sighand = me->sighand;
+	struct sighand_struct *p_sighand = parent->sighand;
+
+	if (p_sighand != m_sighand)
+		spin_unlock(&p_sighand->siglock);
+	if (tracer && unlock_tracer) {
+		struct sighand_struct *t_sighand = tracer->sighand;
+		if (t_sighand != p_sighand)
+			spin_unlock(&t_sighand->siglock);
+	}
+}
+
 static void do_notify_pidfd(struct task_struct *task)
 {
 	struct pid *pid;
@@ -2125,11 +2248,12 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 				     bool for_ptracer, int why)
 {
 	struct kernel_siginfo info;
-	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
 	u64 utime, stime;
 
+	lockdep_assert_held(&tsk->sighand->siglock);
+
 	if (for_ptracer) {
 		parent = tsk->parent;
 	} else {
@@ -2137,6 +2261,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 		parent = tsk->real_parent;
 	}
 
+	lockdep_assert_held(&parent->sighand->siglock);
+
 	clear_siginfo(&info);
 	info.si_signo = SIGCHLD;
 	info.si_errno = 0;
@@ -2168,7 +2294,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  	}
 
 	sighand = parent->sighand;
-	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
 		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
@@ -2176,7 +2301,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
 	__wake_up_parent(tsk, parent);
-	spin_unlock_irqrestore(&sighand->siglock, flags);
 }
 
 /*
@@ -2208,14 +2332,18 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
+	lock_parents_siglocks(true);
 	/*
 	 * After this point ptrace_signal_wake_up or signal_wake_up
 	 * will clear TASK_TRACED if ptrace_unlink happens or a fatal
 	 * signal comes in.  Handle previous ptrace_unlinks and fatal
 	 * signals here to prevent ptrace_stop sleeping in schedule.
 	 */
-	if (!current->ptrace || __fatal_signal_pending(current))
+
+	if (!current->ptrace || __fatal_signal_pending(current)) {
+		unlock_parents_siglocks(true);
 		return;
+	}
 
 	set_special_state(TASK_TRACED);
 	current->jobctl |= JOBCTL_TRACED;
@@ -2254,16 +2382,6 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
 		gstop_done = task_participate_group_stop(current);
 
-	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
-	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
-	if (info && info->si_code >> 8 == PTRACE_EVENT_STOP)
-		task_clear_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
-
-	/* entering a trap, clear TRAPPING */
-	task_clear_jobctl_trapping(current);
-
-	spin_unlock_irq(&current->sighand->siglock);
-	read_lock(&tasklist_lock);
 	/*
 	 * Notify parents of the stop.
 	 *
@@ -2279,14 +2397,25 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
 		do_notify_parent_cldstop(current, false, why);
 
+	unlock_parents_siglocks(true);
+
+	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
+	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
+	if (info && info->si_code >> 8 == PTRACE_EVENT_STOP)
+		task_clear_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
+
+	/* entering a trap, clear TRAPPING */
+	task_clear_jobctl_trapping(current);
+
 	/*
 	 * Don't want to allow preemption here, because
 	 * sys_ptrace() needs this task to be inactive.
 	 *
-	 * XXX: implement read_unlock_no_resched().
+	 * XXX: implement spin_unlock_no_resched().
 	 */
 	preempt_disable();
-	read_unlock(&tasklist_lock);
+	spin_unlock_irq(&current->sighand->siglock);
+
 	cgroup_enter_frozen();
 	preempt_enable_no_resched();
 	freezable_schedule();
@@ -2361,8 +2490,8 @@ int ptrace_notify(int exit_code, unsigned long message)
  * on %true return.
  *
  * RETURNS:
- * %false if group stop is already cancelled or ptrace trap is scheduled.
- * %true if participated in group stop.
+ * %false if group stop is already cancelled.
+ * %true otherwise (as lock_parents_siglocks may have dropped siglock).
  */
 static bool do_signal_stop(int signr)
 	__releases(&current->sighand->siglock)
@@ -2425,36 +2554,24 @@ static bool do_signal_stop(int signr)
 		}
 	}
 
+	lock_parents_siglocks(false);
+	/* Recheck JOBCTL_STOP_PENDING after unlock+lock of siglock */
+	if (unlikely(!(current->jobctl & JOBCTL_STOP_PENDING)))
+		goto out;
 	if (likely(!current->ptrace)) {
-		int notify = 0;
-
 		/*
 		 * If there are no other threads in the group, or if there
 		 * is a group stop in progress and we are the last to stop,
-		 * report to the parent.
+		 * report to the real_parent.
 		 */
 		if (task_participate_group_stop(current))
-			notify = CLD_STOPPED;
+			do_notify_parent_cldstop(current, false, CLD_STOPPED);
+		unlock_parents_siglocks(false);
 
 		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-		/*
-		 * Notify the parent of the group stop completion.  Because
-		 * we're not holding either the siglock or tasklist_lock
-		 * here, ptracer may attach inbetween; however, this is for
-		 * group stop and should always be delivered to the real
-		 * parent of the group leader.  The new ptracer will get
-		 * its notification when this task transitions into
-		 * TASK_TRACED.
-		 */
-		if (notify) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, false, notify);
-			read_unlock(&tasklist_lock);
-		}
-
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
@@ -2465,8 +2582,11 @@ static bool do_signal_stop(int signr)
 		 * Schedule it and let the caller deal with it.
 		 */
 		task_set_jobctl_pending(current, JOBCTL_TRAP_STOP);
-		return false;
 	}
+out:
+	unlock_parents_siglocks(false);
+	spin_unlock_irq(&current->sighand->siglock);
+	return true;
 }
 
 /**
@@ -2624,32 +2744,30 @@ bool get_signal(struct ksignal *ksig)
 	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
 		int why;
 
-		if (signal->flags & SIGNAL_CLD_CONTINUED)
-			why = CLD_CONTINUED;
-		else
-			why = CLD_STOPPED;
+		lock_parents_siglocks(true);
+		/* Recheck signal->flags after unlock+lock of siglock */
+		if (likely(signal->flags & SIGNAL_CLD_MASK)) {
+			if (signal->flags & SIGNAL_CLD_CONTINUED)
+				why = CLD_CONTINUED;
+			else
+				why = CLD_STOPPED;
 
-		signal->flags &= ~SIGNAL_CLD_MASK;
+			signal->flags &= ~SIGNAL_CLD_MASK;
 
-		spin_unlock_irq(&sighand->siglock);
-
-		/*
-		 * Notify the parent that we're continuing.  This event is
-		 * always per-process and doesn't make whole lot of sense
-		 * for ptracers, who shouldn't consume the state via
-		 * wait(2) either, but, for backward compatibility, notify
-		 * the ptracer of the group leader too unless it's gonna be
-		 * a duplicate.
-		 */
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current, false, why);
-
-		if (ptrace_reparented(current->group_leader))
-			do_notify_parent_cldstop(current->group_leader,
-						true, why);
-		read_unlock(&tasklist_lock);
-
-		goto relock;
+			/*
+			 * Notify the parent that we're continuing.  This event is
+			 * always per-process and doesn't make whole lot of sense
+			 * for ptracers, who shouldn't consume the state via
+			 * wait(2) either, but, for backward compatibility, notify
+			 * the ptracer of the group leader too unless it's gonna be
+			 * a duplicate.
+			 */
+			do_notify_parent_cldstop(current, false, why);
+			if (ptrace_reparented(current->group_leader))
+				do_notify_parent_cldstop(current->group_leader,
+							 true, why);
+		}
+		unlock_parents_siglocks(true);
 	}
 
 	for (;;) {
@@ -2906,7 +3024,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 
 void exit_signals(struct task_struct *tsk)
 {
-	int group_stop = 0;
 	sigset_t unblocked;
 
 	/*
@@ -2937,21 +3054,20 @@ void exit_signals(struct task_struct *tsk)
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
 
-	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
-	    task_participate_group_stop(tsk))
-		group_stop = CLD_STOPPED;
-out:
-	spin_unlock_irq(&tsk->sighand->siglock);
-
 	/*
 	 * If group stop has completed, deliver the notification.  This
 	 * should always go to the real parent of the group leader.
 	 */
-	if (unlikely(group_stop)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, false, group_stop);
-		read_unlock(&tasklist_lock);
+	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING)) {
+		lock_parents_siglocks(false);
+		/* Recheck JOBCTL_STOP_PENDING after unlock+lock of siglock */
+		if ((tsk->jobctl & JOBCTL_STOP_PENDING) &&
+		    task_participate_group_stop(tsk))
+			do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+		unlock_parents_siglocks(false);
 	}
+out:
+	spin_unlock_irq(&tsk->sighand->siglock);
 }
 
 /*
-- 
2.35.3


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held
@ 2022-05-18 22:53                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-18 22:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: rjw, Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Eric W. Biederman

Now that siglock keeps tsk->parent and tsk->real_parent constant,
require that do_notify_parent_cldstop is called with
tsk->sighand->siglock held instead of the tasklist_lock.

As all of the callers of do_notify_parent_cldstop had to drop the
siglock and take tasklist_lock, this simplifies all of its callers.

This removes one reason for taking tasklist_lock.

This makes ptrace_stop work reliably with PREEMPT_RT enabled and
CONFIG_CGROUPS disabled.  The remaining challenge is that
cgroup_enter_frozen takes a spin_lock after __state has been set to
TASK_TRACED, which on PREEMPT_RT means the code can sleep and change
__state.  Not only that, but wait_task_inactive could potentially
detect the code scheduling away at that point and fail, causing
ptrace_check_attach to fail.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
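Illustration only, not part of the patch: the sort-then-lock idea that
lock_parents_siglocks() describes can be sketched in userspace with pthread
mutexes.  This is a minimal sketch under stated assumptions:
lock_three_sorted(), unlock_three() and swap_locks() are hypothetical names,
no lock is held on entry (the kernel function starts with current's siglock
held and may have to drop it), and there is no retry loop for a parent
changing underneath.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: exchange two lock pointers. */
static void swap_locks(pthread_mutex_t **a, pthread_mutex_t **b)
{
	pthread_mutex_t *tmp = *a;

	*a = *b;
	*b = tmp;
}

/*
 * Take up to three mutexes that may alias one another.  The pointers
 * are sorted by address (ascending) so every caller acquires them in
 * the same global order, and a lock that appears more than once is
 * taken only once.  Returns the number of distinct locks taken.
 */
static int lock_three_sorted(pthread_mutex_t *a, pthread_mutex_t *b,
			     pthread_mutex_t *c)
{
	pthread_mutex_t *s1 = a, *s2 = b, *s3 = c;
	int taken = 1;

	/* Three compare-and-swap steps leave s1 <= s2 <= s3. */
	if ((uintptr_t)s1 > (uintptr_t)s2)
		swap_locks(&s1, &s2);
	if ((uintptr_t)s1 > (uintptr_t)s3)
		swap_locks(&s1, &s3);
	if ((uintptr_t)s2 > (uintptr_t)s3)
		swap_locks(&s2, &s3);

	pthread_mutex_lock(s1);
	/* After sorting, duplicates are adjacent: compare with the predecessor. */
	if (s2 != s1) {
		pthread_mutex_lock(s2);
		taken++;
	}
	if (s3 != s2) {
		pthread_mutex_lock(s3);
		taken++;
	}
	return taken;
}

/* Release whatever lock_three_sorted() took; unlock order does not matter. */
static void unlock_three(pthread_mutex_t *a, pthread_mutex_t *b,
			 pthread_mutex_t *c)
{
	pthread_mutex_unlock(a);
	if (b != a)
		pthread_mutex_unlock(b);
	if (c != a && c != b)
		pthread_mutex_unlock(c);
}
```

Sorting by address gives every caller the same acquisition order, which is
what rules out an ABBA deadlock between two tasks locking each other's
sighand.
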
 kernel/signal.c | 262 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 189 insertions(+), 73 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 2cc45e8448e2..d4956be51939 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1994,6 +1994,129 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 	return ret;
 }
 
+/**
+ * lock_parents_siglocks - Take current, real_parent, and parent's siglock
+ * @lock_tracer: The tracer's siglock is needed.
+ *
+ * There is no natural ordering to these locks so they must be sorted
+ * before being taken.
+ *
+ * There are two complicating factors here:
+ * - The locks live in sighand and sighand can be arbitrarily shared
+ * - parent and real_parent can change when current's siglock is unlocked.
+ *
+ * To deal with this, first all of the sighand pointers are
+ * gathered under current's siglock, and the sighand pointers are
+ * sorted.  As siglock lives inside of sighand this also sorts the
+ * siglocks by address.
+ *
+ * Then the siglocks are taken in order dropping current's siglock if
+ * necessary.
+ *
+ * Finally if parent and real_parent have not changed return.
+ * If either parent has changed, drop their locks and try again.
+ *
+ * Changing sighand is an infrequent and somewhat expensive operation
+ * (unshare or exec) and so even in the worst case this loop
+ * should not loop too many times before all of the proper locks are
+ * taken in order.
+ *
+ * CONTEXT:
+ * Must be called with @current->sighand->siglock held
+ *
+ * RETURNS:
+ * current's, real_parent's, and parent's siglock held.
+ */
+static void lock_parents_siglocks(bool lock_tracer)
+	__releases(&current->sighand->siglock)
+	__acquires(&current->sighand->siglock)
+	__acquires(&current->real_parent->sighand->siglock)
+	__acquires(&current->parent->sighand->siglock)
+{
+	struct task_struct *me = current;
+	struct sighand_struct *m_sighand = me->sighand;
+
+	lockdep_assert_held(&m_sighand->siglock);
+
+	rcu_read_lock();
+	for (;;) {
+		struct task_struct *parent, *tracer;
+		struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
+
+		parent = me->real_parent;
+		tracer = ptrace_parent(me);
+		if (!tracer || !lock_tracer)
+			tracer = parent;
+
+		p_sighand = rcu_dereference(parent->sighand);
+		t_sighand = rcu_dereference(tracer->sighand);
+
+		/* Sort the sighands so that s1 <= s2 <= s3 */
+		s1 = m_sighand;
+		s2 = p_sighand;
+		s3 = t_sighand;
+		if (s1 > s2)
+			swap(s1, s2);
+		if (s1 > s3)
+			swap(s1, s3);
+		if (s2 > s3)
+			swap(s2, s3);
+
+		/* Take the locks in order */
+		if (s1 != m_sighand) {
+			spin_unlock(&m_sighand->siglock);
+			spin_lock(&s1->siglock);
+		}
+		if (s1 != s2)
+			spin_lock_nested(&s2->siglock, 1);
+		if (s2 != s3)
+			spin_lock_nested(&s3->siglock, 2);
+
+		/* Verify the proper locks are held */
+		if (likely((s1 == m_sighand) ||
+			   ((me->real_parent == parent) &&
+			    (me->parent == tracer) &&
+			    (parent->sighand == p_sighand) &&
+			    (tracer->sighand == t_sighand)))) {
+			break;
+		}
+
+		/* Drop all but current's siglock */
+		if (p_sighand != m_sighand)
+			spin_unlock(&p_sighand->siglock);
+		if (t_sighand != p_sighand)
+			spin_unlock(&t_sighand->siglock);
+
+		/*
+		 * Since [pt]_sighand will likely change if we go
+		 * around, and m_sighand is the only one held, make sure
+		 * it is subclass-0, since the above 's1 != m_sighand'
+		 * clause very much relies on that.
+		 */
+		lock_set_subclass(&m_sighand->siglock.dep_map, 0, _RET_IP_);
+	}
+	rcu_read_unlock();
+}
+
+static void unlock_parents_siglocks(bool unlock_tracer)
+	__releases(&current->real_parent->sighand->siglock)
+	__releases(&current->parent->sighand->siglock)
+{
+	struct task_struct *me = current;
+	struct task_struct *parent = me->real_parent;
+	struct task_struct *tracer = ptrace_parent(me);
+	struct sighand_struct *m_sighand = me->sighand;
+	struct sighand_struct *p_sighand = parent->sighand;
+
+	if (p_sighand != m_sighand)
+		spin_unlock(&p_sighand->siglock);
+	if (tracer && unlock_tracer) {
+		struct sighand_struct *t_sighand = tracer->sighand;
+		if (t_sighand != p_sighand)
+			spin_unlock(&t_sighand->siglock);
+	}
+}
+
 static void do_notify_pidfd(struct task_struct *task)
 {
 	struct pid *pid;
@@ -2125,11 +2248,12 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 				     bool for_ptracer, int why)
 {
 	struct kernel_siginfo info;
-	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
 	u64 utime, stime;
 
+	lockdep_assert_held(&tsk->sighand->siglock);
+
 	if (for_ptracer) {
 		parent = tsk->parent;
 	} else {
@@ -2137,6 +2261,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 		parent = tsk->real_parent;
 	}
 
+	lockdep_assert_held(&parent->sighand->siglock);
+
 	clear_siginfo(&info);
 	info.si_signo = SIGCHLD;
 	info.si_errno = 0;
@@ -2168,7 +2294,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
  	}
 
 	sighand = parent->sighand;
-	spin_lock_irqsave(&sighand->siglock, flags);
 	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
 	    !(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
 		send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
@@ -2176,7 +2301,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
 	 */
 	__wake_up_parent(tsk, parent);
-	spin_unlock_irqrestore(&sighand->siglock, flags);
 }
 
 /*
@@ -2208,14 +2332,18 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 		spin_lock_irq(&current->sighand->siglock);
 	}
 
+	lock_parents_siglocks(true);
 	/*
 	 * After this point ptrace_signal_wake_up or signal_wake_up
 	 * will clear TASK_TRACED if ptrace_unlink happens or a fatal
 	 * signal comes in.  Handle previous ptrace_unlinks and fatal
 	 * signals here to prevent ptrace_stop sleeping in schedule.
 	 */
-	if (!current->ptrace || __fatal_signal_pending(current))
+
+	if (!current->ptrace || __fatal_signal_pending(current)) {
+		unlock_parents_siglocks(true);
 		return;
+	}
 
 	set_special_state(TASK_TRACED);
 	current->jobctl |= JOBCTL_TRACED;
@@ -2254,16 +2382,6 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 	if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
 		gstop_done = task_participate_group_stop(current);
 
-	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
-	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
-	if (info && info->si_code >> 8 == PTRACE_EVENT_STOP)
-		task_clear_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
-
-	/* entering a trap, clear TRAPPING */
-	task_clear_jobctl_trapping(current);
-
-	spin_unlock_irq(&current->sighand->siglock);
-	read_lock(&tasklist_lock);
 	/*
 	 * Notify parents of the stop.
 	 *
@@ -2279,14 +2397,25 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
 	if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
 		do_notify_parent_cldstop(current, false, why);
 
+	unlock_parents_siglocks(true);
+
+	/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
+	task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
+	if (info && info->si_code >> 8 == PTRACE_EVENT_STOP)
+		task_clear_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
+
+	/* entering a trap, clear TRAPPING */
+	task_clear_jobctl_trapping(current);
+
 	/*
 	 * Don't want to allow preemption here, because
 	 * sys_ptrace() needs this task to be inactive.
 	 *
-	 * XXX: implement read_unlock_no_resched().
+	 * XXX: implement spin_unlock_no_resched().
 	 */
 	preempt_disable();
-	read_unlock(&tasklist_lock);
+	spin_unlock_irq(&current->sighand->siglock);
+
 	cgroup_enter_frozen();
 	preempt_enable_no_resched();
 	freezable_schedule();
@@ -2361,8 +2490,8 @@ int ptrace_notify(int exit_code, unsigned long message)
  * on %true return.
  *
  * RETURNS:
- * %false if group stop is already cancelled or ptrace trap is scheduled.
- * %true if participated in group stop.
+ * %false if group stop is already cancelled.
+ * %true otherwise (as lock_parents_siglocks may have dropped siglock).
  */
 static bool do_signal_stop(int signr)
 	__releases(&current->sighand->siglock)
@@ -2425,36 +2554,24 @@ static bool do_signal_stop(int signr)
 		}
 	}
 
+	lock_parents_siglocks(false);
+	/* Recheck JOBCTL_STOP_PENDING after unlock+lock of siglock */
+	if (unlikely(!(current->jobctl & JOBCTL_STOP_PENDING)))
+		goto out;
 	if (likely(!current->ptrace)) {
-		int notify = 0;
-
 		/*
 		 * If there are no other threads in the group, or if there
 		 * is a group stop in progress and we are the last to stop,
-		 * report to the parent.
+		 * report to the real_parent.
 		 */
 		if (task_participate_group_stop(current))
-			notify = CLD_STOPPED;
+			do_notify_parent_cldstop(current, false, CLD_STOPPED);
+		unlock_parents_siglocks(false);
 
 		current->jobctl |= JOBCTL_STOPPED;
 		set_special_state(TASK_STOPPED);
 		spin_unlock_irq(&current->sighand->siglock);
 
-		/*
-		 * Notify the parent of the group stop completion.  Because
-		 * we're not holding either the siglock or tasklist_lock
-		 * here, ptracer may attach inbetween; however, this is for
-		 * group stop and should always be delivered to the real
-		 * parent of the group leader.  The new ptracer will get
-		 * its notification when this task transitions into
-		 * TASK_TRACED.
-		 */
-		if (notify) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current, false, notify);
-			read_unlock(&tasklist_lock);
-		}
-
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
 		cgroup_enter_frozen();
 		freezable_schedule();
@@ -2465,8 +2582,11 @@ static bool do_signal_stop(int signr)
 		 * Schedule it and let the caller deal with it.
 		 */
 		task_set_jobctl_pending(current, JOBCTL_TRAP_STOP);
-		return false;
 	}
+out:
+	unlock_parents_siglocks(false);
+	spin_unlock_irq(&current->sighand->siglock);
+	return true;
 }
 
 /**
@@ -2624,32 +2744,30 @@ bool get_signal(struct ksignal *ksig)
 	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
 		int why;
 
-		if (signal->flags & SIGNAL_CLD_CONTINUED)
-			why = CLD_CONTINUED;
-		else
-			why = CLD_STOPPED;
+		lock_parents_siglocks(true);
+		/* Recheck signal->flags after unlock+lock of siglock */
+		if (likely(signal->flags & SIGNAL_CLD_MASK)) {
+			if (signal->flags & SIGNAL_CLD_CONTINUED)
+				why = CLD_CONTINUED;
+			else
+				why = CLD_STOPPED;
 
-		signal->flags &= ~SIGNAL_CLD_MASK;
+			signal->flags &= ~SIGNAL_CLD_MASK;
 
-		spin_unlock_irq(&sighand->siglock);
-
-		/*
-		 * Notify the parent that we're continuing.  This event is
-		 * always per-process and doesn't make whole lot of sense
-		 * for ptracers, who shouldn't consume the state via
-		 * wait(2) either, but, for backward compatibility, notify
-		 * the ptracer of the group leader too unless it's gonna be
-		 * a duplicate.
-		 */
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current, false, why);
-
-		if (ptrace_reparented(current->group_leader))
-			do_notify_parent_cldstop(current->group_leader,
-						true, why);
-		read_unlock(&tasklist_lock);
-
-		goto relock;
+			/*
+			 * Notify the parent that we're continuing.  This event is
+			 * always per-process and doesn't make whole lot of sense
+			 * for ptracers, who shouldn't consume the state via
+			 * wait(2) either, but, for backward compatibility, notify
+			 * the ptracer of the group leader too unless it's gonna be
+			 * a duplicate.
+			 */
+			do_notify_parent_cldstop(current, false, why);
+			if (ptrace_reparented(current->group_leader))
+				do_notify_parent_cldstop(current->group_leader,
+							 true, why);
+		}
+		unlock_parents_siglocks(true);
 	}
 
 	for (;;) {
@@ -2906,7 +3024,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 
 void exit_signals(struct task_struct *tsk)
 {
-	int group_stop = 0;
 	sigset_t unblocked;
 
 	/*
@@ -2937,21 +3054,20 @@ void exit_signals(struct task_struct *tsk)
 	signotset(&unblocked);
 	retarget_shared_pending(tsk, &unblocked);
 
-	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
-	    task_participate_group_stop(tsk))
-		group_stop = CLD_STOPPED;
-out:
-	spin_unlock_irq(&tsk->sighand->siglock);
-
 	/*
 	 * If group stop has completed, deliver the notification.  This
 	 * should always go to the real parent of the group leader.
 	 */
-	if (unlikely(group_stop)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, false, group_stop);
-		read_unlock(&tasklist_lock);
+	if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING)) {
+		lock_parents_siglocks(false);
+		/* Recheck JOBCTL_STOP_PENDING after unlock+lock of siglock */
+		if ((tsk->jobctl & JOBCTL_STOP_PENDING) &&
+		    task_participate_group_stop(tsk))
+			do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+		unlock_parents_siglocks(false);
 	}
+out:
+	spin_unlock_irq(&tsk->sighand->siglock);
 }
 
 /*
-- 
2.35.3
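
Illustration only, not part of the patch: several callers in the diff above
recheck a flag right after lock_parents_siglocks(), because that helper may
drop and retake current's siglock.  A minimal userspace sketch of that
recheck-after-relock pattern follows.  All names here (struct job,
relock_with_window(), cancel_stop(), notify_if_still_pending()) are
hypothetical, and the "window" callback merely simulates another thread
running while the lock is released.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct job {
	pthread_mutex_t lock;
	bool stop_pending;	/* protected by lock */
	int notifications;	/* protected by lock */
};

/*
 * Hypothetical stand-in for a helper that may release and retake
 * j->lock.  The optional callback runs while the lock is dropped.
 */
static void relock_with_window(struct job *j, void (*window)(struct job *))
{
	pthread_mutex_unlock(&j->lock);
	if (window)
		window(j);
	pthread_mutex_lock(&j->lock);
}

/* Simulated concurrent actor: cancels the pending stop. */
static void cancel_stop(struct job *j)
{
	pthread_mutex_lock(&j->lock);
	j->stop_pending = false;
	pthread_mutex_unlock(&j->lock);
}

/*
 * Called with j->lock held and returns with it held.
 * Returns true if a notification was recorded.
 */
static bool notify_if_still_pending(struct job *j,
				    void (*window)(struct job *))
{
	if (!j->stop_pending)
		return false;

	relock_with_window(j, window);

	/* Recheck: the flag may have been cleared while the lock was dropped. */
	if (!j->stop_pending)
		return false;

	j->notifications++;
	return true;
}
```

Without the second test of stop_pending, the notification would be recorded
for a stop that had already been cancelled in the unlocked window.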

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
       [not found]               ` <CALWUPBdFDLuT7JaNGSJ_UXbHf8y9uKdC-SkAqzd=FQC0MX4nNQ@mail.gmail.com>
  2022-05-19  6:19                   ` Sebastian Andrzej Siewior
@ 2022-05-19  6:19                   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-19  6:19 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Eric W. Biederman, LKML, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, Robert O'Callahan, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Jason Wessel, Daniel Thompson,
	Douglas Anderson, Douglas Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras

On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> Is there a git branch somewhere I can pull to test this? It doesn't apply
> cleanly to Linus's tip.

https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

> - Kyle

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes
  2022-05-18 22:53                 ` Eric W. Biederman
  (?)
@ 2022-05-19  7:56                   ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-05-19  7:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On Wed, May 18, 2022 at 05:53:42PM -0500, Eric W. Biederman wrote:
> kdb has a bug that when using the ps command to display a list of
> processes, if a process is being debugged the debugger as the parent
> process.
> 
> This is silly, and I expect it never comes up in ptractice.  As there
                                                   ^^^^^^^^^

Lol, love the new word :-)

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-05-19  6:19                   ` Sebastian Andrzej Siewior
  (?)
@ 2022-05-19 18:05                     ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-19 18:05 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Kyle Huey, LKML, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, Robert O'Callahan, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Jason Wessel, Daniel Thompson,
	Douglas Anderson, Douglas Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> cleanly to Linus's tip.
>
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

Yes that is the branch this all applies to.

This is my second round of cleanups this cycle for this code.
I just keep finding little things that deserve to be changed,
when I am working on the more substantial issues.

Eric




^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
@ 2022-05-19 18:05                     ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-19 18:05 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Kyle Huey, LKML, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, Robert O'Callahan, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Jason Wessel, Daniel Thompson,
	Douglas Anderson, Douglas Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> cleanly to Linus's tip.
>
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

Yes that is the branch this all applies to.

This is my second round of cleanups this cycle for this code.
I just keep finding little things that deserve to be changed,
when I am working on the more substantial issues.

Eric



^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes
  2022-05-19  7:56                   ` Peter Zijlstra
  (?)
@ 2022-05-19 18:06                     ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-19 18:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, May 18, 2022 at 05:53:42PM -0500, Eric W. Biederman wrote:
>> kdb has a bug that when using the ps command to display a list of
>> processes, if a process is being debugged the debugger as the parent
>> process.
>> 
>> This is silly, and I expect it never comes up in ptractice.  As there
>                                                    ^^^^^^^^^
>
> Lol, love the new word :-)

It wasn't intentional but now I just might have to keep it.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes
  2022-05-18 22:53                 ` Eric W. Biederman
  (?)
@ 2022-05-19 20:52                   ` Doug Anderson
  -1 siblings, 0 replies; 572+ messages in thread
From: Doug Anderson @ 2022-05-19 20:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: LKML, Rafael J. Wysocki, Oleg Nesterov, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Mel Gorman,
	Sebastian Andrzej Siewior, Will Deacon, Tejun Heo, Linux PM,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras

Hi,

On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> kdb has a bug that when using the ps command to display a list of
> processes, if a process is being debugged the debugger as the parent
> process.
>
> This is silly, and I expect it never comes up in ptractice.  As there
> is very little point in using gdb and kdb simultaneously.  Update the
> code to use real_parent so that it is clear kdb does not want to
> display a debugger as the parent of a process.

So I would tend to defer to Daniel, but I'm not convinced that the
behavior you describe for kdb today _is_ actually silly.

If I was in kdb and I was listing processes, I might actually want to
see that a process's parent was set to gdb. Presumably that would tell
me extra information that might be relevant to my debug session.

Personally, I'd rather add an extra piece of information into the list
showing the real parent if it's not the same as the parent. Then
you're not throwing away information.
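
The idea in the paragraph above can be sketched in plain userspace C.
This is only an illustrative model, not kdb code: `struct ps_task`,
`format_ps_line` and the output layout are hypothetical names made up
for this sketch; only the parent / real_parent distinction comes from
the discussion.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two parent links on a task. */
struct ps_task {
	int pid;
	const char *comm;
	const struct ps_task *parent;      /* the tracer while traced, else the real parent */
	const struct ps_task *real_parent; /* the process that created this task */
};

/*
 * Format one ps-style line into buf.  The "(real N)" suffix is added only
 * when the two links differ, so untraced tasks look as before and traced
 * tasks show both the tracer and the real parent.
 * Returns whatever snprintf() returns.
 */
static int format_ps_line(char *buf, size_t len, const struct ps_task *t)
{
	if (t->parent != t->real_parent)
		return snprintf(buf, len, "%d %s parent=%d (real %d)",
				t->pid, t->comm,
				t->parent->pid, t->real_parent->pid);

	return snprintf(buf, len, "%d %s parent=%d",
			t->pid, t->comm, t->parent->pid);
}
```

With a task whose `parent` points at a debugger and whose `real_parent`
points at the shell that started it, the line carries both PIDs; for
every other task the two links are equal and the suffix is omitted.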

-Doug

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes
  2022-05-19 20:52                   ` Doug Anderson
  (?)
@ 2022-05-19 23:48                     ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-19 23:48 UTC (permalink / raw)
  To: Doug Anderson
  Cc: LKML, Rafael J. Wysocki, Oleg Nesterov, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Mel Gorman,
	Sebastian Andrzej Siewior, Will Deacon, Tejun Heo, Linux PM,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras

Doug Anderson <dianders@chromium.org> writes:

> Hi,
>
> On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> kdb has a bug that when using the ps command to display a list of
>> processes, if a process is being debugged the debugger as the parent
>> process.
>>
>> This is silly, and I expect it never comes up in ptractice.  As there
>> is very little point in using gdb and kdb simultaneously.  Update the
>> code to use real_parent so that it is clear kdb does not want to
>> display a debugger as the parent of a process.
>
> So I would tend to defer to Daniel, but I'm not convinced that the
> behavior you describe for kdb today _is_ actually silly.
>
> If I was in kdb and I was listing processes, I might actually want to
> see that a process's parent was set to gdb. Presumably that would tell
> me extra information that might be relevant to my debug session.
>
> Personally, I'd rather add an extra piece of information into the list
> showing the real parent if it's not the same as the parent. Then
> you're not throwing away information.

The name of the field is confusing for anyone who isn't intimate with
the implementation details.  The function getppid returns
tsk->real_parent->tgid.

If kdb wants information about what the tracer is, that is fine, but I
recommend putting that information in another field.

Given that the original description says to give the information that ps
gives, my sense is that kdb is currently wrong.  Especially as it does
not give you the actual parentage anywhere.

I can certainly be convinced, but I do want some clarity.  It looks very
attractive to rename task->parent to task->ptracer and leave the field
NULL when there is no tracer.
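
To make the naming question concrete, here is a small userspace model of
that layout.  It is a sketch under stated assumptions, not kernel code:
`struct traced_task`, `model_getppid` and `model_tracer_pid` are
hypothetical names, and `ptracer` is only the proposed field name from
the paragraph above.

```c
#include <stddef.h>

/* Hypothetical model: parentage and tracing kept in separate fields. */
struct traced_task {
	int tgid;
	struct traced_task *real_parent; /* unaffected by a ptrace attach */
	struct traced_task *ptracer;     /* NULL when there is no tracer */
};

/* Follows the getppid() rule quoted above: report the real parent's tgid. */
static int model_getppid(const struct traced_task *tsk)
{
	return tsk->real_parent->tgid;
}

/* Returns the tracer's tgid, or 0 when the task is not being traced. */
static int model_tracer_pid(const struct traced_task *tsk)
{
	return tsk->ptracer ? tsk->ptracer->tgid : 0;
}
```

In this model attaching a tracer only sets `ptracer`, so the value
reported as the parent PID stays the same before and after the attach,
and the tracer is reported through a separate accessor.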

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-05-19 18:05                     ` Eric W. Biederman
  (?)
@ 2022-05-20  5:24                       ` Kyle Huey
  -1 siblings, 0 replies; 572+ messages in thread
From: Kyle Huey @ 2022-05-20  5:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Sebastian Andrzej Siewior, LKML, rjw, oleg, mingo,
	vincent.guittot, dietmar.eggemann, rostedt, mgorman, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Jann Horn, Kees Cook, linux-ia64,
	Robert O'Callahan, Richard Henderson, Ivan Kokshaysky,
	Matt Turner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Douglas Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras

On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>
> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> cleanly to Linus's tip.
> >
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>
> Yes that is the branch this all applies to.
>
> This is my second round of cleanups this cycle for this code.
> I just keep finding little things that deserve to be changed,
> when I am working on the more substantial issues.
>
> Eric

When running the rr test suite, I see hangs like this:

[  812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s! [condvar_stress-:12152]
[  812.151529] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rfcomm cmac algif_hash algif_skcipher af_alg bnep dm_crypt intel_rapl_msr mei_hdcp snd_hda_codec_hdmi intel_rapl_common snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_intel_sdw_acpi nls_iso8859_1 intel_powerclamp snd_hda_codec coretemp snd_hda_core snd_hwdep snd_pcm rtl8723be btcoexist snd_seq_midi snd_seq_midi_event rtl8723_common kvm_intel rtl_pci snd_rawmidi rtlwifi btusb btrtl btbcm snd_seq kvm mac80211 btintel btmtk snd_seq_device rapl bluetooth snd_timer intel_cstate hp_wmi cfg80211 serio_raw snd platform_profile ecdh_generic mei_me sparse_keymap efi_pstore wmi_bmof ee1004 joydev input_leds ecc libarc4 soundcore mei acpi_pad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap vhci_hcd usbip_core parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress
[  812.151570]  libcrc32c hid_generic usbhid hid i915 drm_buddy i2c_algo_bit ttm drm_dp_helper cec rc_core crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel crypto_simd cryptd r8169 psmouse drm i2c_i801 realtek ahci i2c_smbus xhci_pci libahci xhci_pci_renesas wmi video
[  812.151584] CPU: 3 PID: 12152 Comm: condvar_stress- Tainted: G     I  L    5.18.0-rc1+ #2
[  812.151586] Hardware name: HP 750-280st/2B4B, BIOS A0.11 02/24/2016
[  812.151587] RIP: 0010:_raw_spin_unlock_irq+0x15/0x40
[  812.151591] Code: df e8 3f 1f 4a ff 90 5b 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 41 95 46 ff 65 8b 05 9a c1 9a 5f 85 c0 74 02 5d
[  812.151593] RSP: 0018:ffffa863c246bd70 EFLAGS: 00000246
[  812.151594] RAX: ffff8bc0913f6400 RBX: ffff8bc0913f6400 RCX: 0000000000000000
[  812.151595] RDX: 0000000000000002 RSI: 00000000000a0013 RDI: ffff8bc089b63180
[  812.151596] RBP: ffffa863c246bd70 R08: ffff8bc0811d6b40 R09: ffff8bc089b63180
[  812.151597] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8bc0913f6400
[  812.151597] R13: ffff8bc089b63180 R14: ffff8bc0913f6400 R15: ffffa863c246be68
[  812.151598] FS:  00007f612dda5700(0000) GS:ffff8bc7e24c0000(0000) knlGS:0000000000000000
[  812.151599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  812.151600] CR2: 000055e70715692e CR3: 000000010b4e8005 CR4: 00000000003706e4
[  812.151601] Call Trace:
[  812.151602]  <TASK>
[  812.151604]  do_signal_stop+0x228/0x260
[  812.151606]  get_signal+0x43a/0x8e0
[  812.151608]  arch_do_signal_or_restart+0x37/0x7d0
[  812.151610]  ? __this_cpu_preempt_check+0x13/0x20
[  812.151612]  ? __perf_event_task_sched_in+0x81/0x230
[  812.151616]  ? __this_cpu_preempt_check+0x13/0x20
[  812.151617]  exit_to_user_mode_prepare+0x130/0x1a0
[  812.151620]  syscall_exit_to_user_mode+0x26/0x40
[  812.151621]  ret_from_fork+0x15/0x30
[  812.151623] RIP: 0033:0x7f612dfcd125
[  812.151625] Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff d0 48 89 c7 b8 3c 00 00 00
[  812.151626] RSP: 002b:00007f612dda4fb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  812.151628] RAX: 0000000000000000 RBX: 00007f612dda5700 RCX: ffffffffffffffff
[  812.151628] RDX: 00007f612dda59d0 RSI: 00007f612dda4fb0 RDI: 00000000003d0f00
[  812.151629] RBP: 00007ffd59ad20b0 R08: 00007f612dda5700 R09: 00007f612dda5700
[  812.151630] R10: 00007f612dda59d0 R11: 0000000000000246 R12: 00007ffd59ad20ae
[  812.151631] R13: 00007ffd59ad20af R14: 00007ffd59ad20b0 R15: 00007f612dda4fc0
[  812.151632]  </TASK>

- Kyle

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-20  7:33                 ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-20  7:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, Robert O'Callahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> 
> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> ptrace_freeze_traced has completed successfully.  Which fundamentally
> means the lock dance of dropping siglock and grabbing tasklist_lock does
> not work on PREEMPT_RT.  So I have worked through what is necessary so
> that tasklist_lock does not need to be grabbed in ptrace_stop after
> siglock is dropped.
…
It took me a while to realise that this is a follow-up; I somehow assumed
that you had added a few patches on top. Might have been yesterday's
heat. b4 also refused to download this series because the v4 in this
thread looked newer… Anyway. With both series applied:

| =============================
| WARNING: suspicious RCU usage
| 5.18.0-rc7+ #16 Not tainted
| -----------------------------
| include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
|
| other info that might help us debug this:
|
| rcu_scheduler_active = 2, debug_locks = 1
| 2 locks held by ssdd/1734:
|  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
|  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
|
| stack backtrace:
| CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| Call Trace:
|  <TASK>
|  dump_stack_lvl+0x45/0x5a
|  unlock_parents_siglocks+0xb6/0xc0
|  ptrace_stop+0xb9/0x390
|  get_signal+0x51c/0x8d0
|  arch_do_signal_or_restart+0x31/0x750
|  exit_to_user_mode_prepare+0x157/0x220
|  irqentry_exit_to_user_mode+0x5/0x50
|  asm_sysvec_apic_timer_interrupt+0x12/0x20

That is ptrace_parent() in unlock_parents_siglocks().

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-05-18 22:49               ` Eric W. Biederman
  (?)
@ 2022-05-20  9:19                 ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 572+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-20  9:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, Robert O'Callahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> After this set of changes only cgroup_enter_frozen should remain a
> stumbling block for PREEMPT_RT in the ptrace_stop path.

Yes, I can confirm that. I have no systemd-less system at hand which
means I can't boot a kernel without CGROUP support. But after removing
cgroup_{enter|leave}_frozen() in ptrace_stop() I don't see the problems
I saw earlier.

Sebastian

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held
  2022-05-18 22:53                 ` Eric W. Biederman
  (?)
@ 2022-05-20 16:19                   ` kernel test robot
  -1 siblings, 0 replies; 572+ messages in thread
From: kernel test robot @ 2022-05-20 16:19 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: kbuild-all, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64, Robert OCallahan,
	Kyle Huey, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Jason Wessel, Daniel Thompson

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220518]
[cannot apply to linux/master powerpc/next wireless-next/main wireless/main linus/master v5.18-rc7 v5.18-rc6 v5.18-rc5 v5.18-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Eric-W-Biederman/signal-alpha-Remove-unused-definition-of-TASK_REAL_PARENT/20220519-065947
base:    736ee37e2e8eed7fe48d0a37ee5a709514d478b3
config: parisc-randconfig-s032-20220519 (https://download.01.org/0day-ci/archive/20220521/202205210010.E4Hyn2kD-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/4b66a617bf6d095d33fe43e9dbcfdf2e0de9fb29
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Eric-W-Biederman/signal-alpha-Remove-unused-definition-of-TASK_REAL_PARENT/20220519-065947
        git checkout 4b66a617bf6d095d33fe43e9dbcfdf2e0de9fb29
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   kernel/signal.c: note: in included file (through arch/parisc/include/uapi/asm/signal.h, arch/parisc/include/asm/signal.h, include/uapi/linux/signal.h, ...):
   include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
   kernel/signal.c:195:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:195:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:195:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:198:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:198:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:198:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:480:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:480:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:480:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:484:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:484:34: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:484:34: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:542:53: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct k_sigaction *ka @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:542:53: sparse:     expected struct k_sigaction *ka
   kernel/signal.c:542:53: sparse:     got struct k_sigaction [noderef] __rcu *
   include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
   kernel/signal.c:1261:9: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1328:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1328:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1328:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:1329:16: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct k_sigaction *action @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:1329:16: sparse:     expected struct k_sigaction *action
   kernel/signal.c:1329:16: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/signal.c:1349:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1349:34: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1349:34: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:1938:36: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1938:36: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1938:36: sparse:     got struct spinlock [noderef] __rcu *
>> kernel/signal.c:2048:46: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *m_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2048:46: sparse:     expected struct sighand_struct *m_sighand
   kernel/signal.c:2048:46: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2057:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2057:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2057:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2087:21: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> kernel/signal.c:2087:21: sparse:    struct task_struct [noderef] __rcu *
>> kernel/signal.c:2087:21: sparse:    struct task_struct *
>> kernel/signal.c:2117:40: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2117:40: sparse:     expected struct task_struct *parent
   kernel/signal.c:2117:40: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2119:46: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *m_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2119:46: sparse:     expected struct sighand_struct *m_sighand
   kernel/signal.c:2119:46: sparse:     got struct sighand_struct [noderef] __rcu *sighand
>> kernel/signal.c:2120:50: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *p_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2120:50: sparse:     expected struct sighand_struct *p_sighand
   kernel/signal.c:2120:50: sparse:     got struct sighand_struct [noderef] __rcu *sighand
>> kernel/signal.c:2125:58: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *t_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2125:58: sparse:     expected struct sighand_struct *t_sighand
   kernel/signal.c:2125:58: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2171:44: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2190:65: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *tsk @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2190:65: sparse:     expected struct task_struct *tsk
   kernel/signal.c:2190:65: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2191:40: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2209:14: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct sighand_struct *psig @@     got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand @@
   kernel/signal.c:2209:14: sparse:     expected struct sighand_struct *psig
   kernel/signal.c:2209:14: sparse:     got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand
   kernel/signal.c:2238:53: sparse: sparse: incorrect type in argument 3 (different address spaces) @@     expected struct task_struct *t @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2238:53: sparse:     expected struct task_struct *t
   kernel/signal.c:2238:53: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2239:34: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2239:34: sparse:     expected struct task_struct *parent
   kernel/signal.c:2239:34: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2269:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2269:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2269:24: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2272:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2272:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2272:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2307:17: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2307:17: sparse:     expected struct sighand_struct *sighand
   kernel/signal.c:2307:17: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2341:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2341:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2341:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2343:39: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2343:39: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2343:39: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2428:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2428:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2428:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2440:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2440:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2440:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2479:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2479:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2479:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2481:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2481:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2481:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2584:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2584:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2584:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2599:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2599:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2599:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2656:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2656:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2656:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2668:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2668:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2668:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2726:49: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2726:49: sparse:     expected struct sighand_struct *sighand
   kernel/signal.c:2726:49: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:3052:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3052:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3052:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3081:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3081:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3081:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3138:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3138:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3138:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3140:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3140:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3140:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3291:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3291:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3291:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3294:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3294:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3294:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3683:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3683:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3683:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3695:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3695:37: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3695:37: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3700:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3700:35: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3700:35: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3705:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3705:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3705:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4159:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4159:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4159:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4171:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4171:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4171:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4189:11: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct k_sigaction *k @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:4189:11: sparse:     expected struct k_sigaction *k
   kernel/signal.c:4189:11: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/signal.c:4191:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4191:25: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4191:25: sparse:     got struct spinlock [noderef] __rcu *

vim +2048 kernel/signal.c

  1934	
  1935	void sigqueue_free(struct sigqueue *q)
  1936	{
  1937		unsigned long flags;
> 1938		spinlock_t *lock = &current->sighand->siglock;
  1939	
  1940		BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
  1941		/*
  1942		 * We must hold ->siglock while testing q->list
  1943		 * to serialize with collect_signal() or with
  1944		 * __exit_signal()->flush_sigqueue().
  1945		 */
  1946		spin_lock_irqsave(lock, flags);
  1947		q->flags &= ~SIGQUEUE_PREALLOC;
  1948		/*
  1949		 * If it is queued it will be freed when dequeued,
  1950		 * like the "regular" sigqueue.
  1951		 */
  1952		if (!list_empty(&q->list))
  1953			q = NULL;
  1954		spin_unlock_irqrestore(lock, flags);
  1955	
  1956		if (q)
  1957			__sigqueue_free(q);
  1958	}
  1959	
  1960	int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
  1961	{
  1962		int sig = q->info.si_signo;
  1963		struct sigpending *pending;
  1964		struct task_struct *t;
  1965		unsigned long flags;
  1966		int ret, result;
  1967	
  1968		BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
  1969	
  1970		ret = -1;
  1971		rcu_read_lock();
  1972		t = pid_task(pid, type);
  1973		if (!t || !likely(lock_task_sighand(t, &flags)))
  1974			goto ret;
  1975	
  1976		ret = 1; /* the signal is ignored */
  1977		result = TRACE_SIGNAL_IGNORED;
  1978		if (!prepare_signal(sig, t, false))
  1979			goto out;
  1980	
  1981		ret = 0;
  1982		if (unlikely(!list_empty(&q->list))) {
  1983			/*
  1984			 * If an SI_TIMER entry is already queued, just increment
  1985			 * the overrun count.
  1986			 */
  1987			BUG_ON(q->info.si_code != SI_TIMER);
  1988			q->info.si_overrun++;
  1989			result = TRACE_SIGNAL_ALREADY_PENDING;
  1990			goto out;
  1991		}
  1992		q->info.si_overrun = 0;
  1993	
  1994		signalfd_notify(t, sig);
  1995		pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
  1996		list_add_tail(&q->list, &pending->list);
  1997		sigaddset(&pending->signal, sig);
  1998		complete_signal(sig, t, type);
  1999		result = TRACE_SIGNAL_DELIVERED;
  2000	out:
  2001		trace_signal_generate(sig, &q->info, t, type != PIDTYPE_PID, result);
  2002		unlock_task_sighand(t, &flags);
  2003	ret:
  2004		rcu_read_unlock();
  2005		return ret;
  2006	}
  2007	
  2008	/**
  2009	 * lock_parents_siglocks - Take current, real_parent, and parent's siglock
  2010	 * @lock_tracer: The tracer's siglock is needed.
  2011	 *
  2012	 * There is no natural ordering to these locks so they must be sorted
  2013	 * before being taken.
  2014	 *
  2015	 * There are two complicating factors here:
  2016	 * - The locks live in sighand and sighand can be arbitrarily shared
  2017	 * - parent and real_parent can change when current's siglock is unlocked.
  2018	 *
  2019	 * To deal with this, first all of the sighand pointers are
  2020	 * gathered under current's siglock, and the sighand pointers are
  2021	 * sorted.  As siglock lives inside of sighand this also sorts the
  2022	 * siglocks by address.
  2023	 *
  2024	 * Then the siglocks are taken in order dropping current's siglock if
  2025	 * necessary.
  2026	 *
  2027	 * Finally, if parent and real_parent have not changed, return.
  2028	 * If either parent has changed, drop their locks and try again.
  2029	 *
  2030	 * Changing sighand is an infrequent and somewhat expensive operation
  2031	 * (unshare or exec) and so even in the worst case this loop
  2032	 * should not loop too many times before all of the proper locks are
  2033	 * taken in order.
  2034	 *
  2035	 * CONTEXT:
  2036	 * Must be called with @current->sighand->siglock held
  2037	 *
  2038	 * RETURNS:
  2039	 * current's, real_parent's, and parent's siglock held.
  2040	 */
  2041	static void lock_parents_siglocks(bool lock_tracer)
  2042		__releases(&current->sighand->siglock)
  2043		__acquires(&current->sighand->siglock)
  2044		__acquires(&current->real_parent->sighand->siglock)
  2045		__acquires(&current->parent->sighand->siglock)
  2046	{
  2047		struct task_struct *me = current;
> 2048		struct sighand_struct *m_sighand = me->sighand;
  2049	
  2050		lockdep_assert_held(&m_sighand->siglock);
  2051	
  2052		rcu_read_lock();
  2053		for (;;) {
  2054			struct task_struct *parent, *tracer;
  2055			struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
  2056	
  2057			parent = me->real_parent;
  2058			tracer = ptrace_parent(me);
  2059			if (!tracer || !lock_tracer)
  2060				tracer = parent;
  2061	
  2062			p_sighand = rcu_dereference(parent->sighand);
  2063			t_sighand = rcu_dereference(tracer->sighand);
  2064	
  2065			/* Sort the sighands so that s1 <= s2 <= s3 */
  2066			s1 = m_sighand;
  2067			s2 = p_sighand;
  2068			s3 = t_sighand;
  2069			if (s1 > s2)
  2070				swap(s1, s2);
  2071			if (s1 > s3)
  2072				swap(s1, s3);
  2073			if (s2 > s3)
  2074				swap(s2, s3);
  2075	
  2076			/* Take the locks in order */
  2077			if (s1 != m_sighand) {
  2078				spin_unlock(&m_sighand->siglock);
  2079				spin_lock(&s1->siglock);
  2080			}
  2081			if (s1 != s2)
  2082				spin_lock_nested(&s2->siglock, 1);
  2083			if (s2 != s3)
  2084				spin_lock_nested(&s3->siglock, 2);
  2085	
  2086			/* Verify the proper locks are held */
> 2087			if (likely((s1 == m_sighand) ||
  2088				   ((me->real_parent == parent) &&
  2089				    (me->parent == tracer) &&
  2090				    (parent->sighand == p_sighand) &&
  2091				    (tracer->sighand == t_sighand)))) {
  2092				break;
  2093			}
  2094	
  2095			/* Drop all but current's siglock */
  2096			if (p_sighand != m_sighand)
  2097				spin_unlock(&p_sighand->siglock);
  2098			if (t_sighand != p_sighand)
  2099				spin_unlock(&t_sighand->siglock);
  2100	
  2101			/*
  2102			 * Since [pt]_sighand will likely change if we go
  2103			 * around, and m_sighand is the only one held, make sure
  2104			 * it is subclass-0, since the above 's1 != m_sighand'
  2105			 * clause very much relies on that.
  2106			 */
  2107			lock_set_subclass(&m_sighand->siglock.dep_map, 0, _RET_IP_);
  2108		}
  2109		rcu_read_unlock();
  2110	}
  2111	
  2112	static void unlock_parents_siglocks(bool unlock_tracer)
  2113		__releases(&current->real_parent->sighand->siglock)
  2114		__releases(&current->parent->sighand->siglock)
  2115	{
  2116		struct task_struct *me = current;
> 2117		struct task_struct *parent = me->real_parent;
  2118		struct task_struct *tracer = ptrace_parent(me);
  2119		struct sighand_struct *m_sighand = me->sighand;
> 2120		struct sighand_struct *p_sighand = parent->sighand;
  2121	
  2122		if (p_sighand != m_sighand)
  2123			spin_unlock(&p_sighand->siglock);
  2124		if (tracer && unlock_tracer) {
> 2125			struct sighand_struct *t_sighand = tracer->sighand;
  2126			if (t_sighand != p_sighand)
  2127				spin_unlock(&t_sighand->siglock);
  2128		}
  2129	}
  2130	
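
The address-ordering scheme described in the lock_parents_siglocks()
comment above can be sketched in userspace. This is a minimal
illustration only, assuming pthread mutexes in place of siglocks;
sort3(), lock3(), and unlock3() are hypothetical names that do not exist
in the kernel. The sketch deliberately leaves out everything specific to
the quoted function: entering with current's siglock already held, RCU,
lockdep subclasses, and the re-check-and-retry loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: order three lock pointers so s[0] <= s[1] <= s[2]. */
void sort3(pthread_mutex_t *s[3])
{
	pthread_mutex_t *tmp;

	if ((uintptr_t)s[0] > (uintptr_t)s[1]) {
		tmp = s[0]; s[0] = s[1]; s[1] = tmp;
	}
	if ((uintptr_t)s[0] > (uintptr_t)s[2]) {
		tmp = s[0]; s[0] = s[2]; s[2] = tmp;
	}
	if ((uintptr_t)s[1] > (uintptr_t)s[2]) {
		tmp = s[1]; s[1] = s[2]; s[2] = tmp;
	}
	assert((uintptr_t)s[0] <= (uintptr_t)s[1] &&
	       (uintptr_t)s[1] <= (uintptr_t)s[2]);
}

/*
 * Hypothetical helper: take up to three mutexes in ascending address
 * order.  The arguments may alias each other; after sorting, aliases
 * are adjacent, so comparing neighbours is enough to skip duplicates.
 * Returns the number of distinct mutexes that were locked.
 */
int lock3(pthread_mutex_t *a, pthread_mutex_t *b, pthread_mutex_t *c)
{
	pthread_mutex_t *s[3] = { a, b, c };
	int taken = 1;

	sort3(s);
	pthread_mutex_lock(s[0]);
	if (s[1] != s[0]) {
		pthread_mutex_lock(s[1]);
		taken++;
	}
	if (s[2] != s[1]) {
		pthread_mutex_lock(s[2]);
		taken++;
	}
	return taken;
}

/* Hypothetical helper: release what lock3() took, in reverse order. */
void unlock3(pthread_mutex_t *a, pthread_mutex_t *b, pthread_mutex_t *c)
{
	pthread_mutex_t *s[3] = { a, b, c };

	sort3(s);
	if (s[2] != s[1])
		pthread_mutex_unlock(s[2]);
	if (s[1] != s[0])
		pthread_mutex_unlock(s[1]);
	pthread_mutex_unlock(s[0]);
}
```

Because every caller sorts by address first, two threads that pass the
same mutexes to lock3() in different argument orders still acquire them
in the same order, which is what rules out an ABBA deadlock.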

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held
@ 2022-05-20 16:19                   ` kernel test robot
From: kernel test robot @ 2022-05-20 16:19 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: kbuild-all, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64, Robert OCallahan,
	Kyle Huey, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Jason Wessel, Daniel Thompson

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220518]
[cannot apply to linux/master powerpc/next wireless-next/main wireless/main linus/master v5.18-rc7 v5.18-rc6 v5.18-rc5 v5.18-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Eric-W-Biederman/signal-alpha-Remove-unused-definition-of-TASK_REAL_PARENT/20220519-065947
base:    736ee37e2e8eed7fe48d0a37ee5a709514d478b3
config: parisc-randconfig-s032-20220519 (https://download.01.org/0day-ci/archive/20220521/202205210010.E4Hyn2kD-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/4b66a617bf6d095d33fe43e9dbcfdf2e0de9fb29
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Eric-W-Biederman/signal-alpha-Remove-unused-definition-of-TASK_REAL_PARENT/20220519-065947
        git checkout 4b66a617bf6d095d33fe43e9dbcfdf2e0de9fb29
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   kernel/signal.c: note: in included file (through arch/parisc/include/uapi/asm/signal.h, arch/parisc/include/asm/signal.h, include/uapi/linux/signal.h, ...):
   include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
   kernel/signal.c:195:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:195:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:195:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:198:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:198:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:198:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:480:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:480:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:480:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:484:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:484:34: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:484:34: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:542:53: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct k_sigaction *ka @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:542:53: sparse:     expected struct k_sigaction *ka
   kernel/signal.c:542:53: sparse:     got struct k_sigaction [noderef] __rcu *
   include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
   kernel/signal.c:1261:9: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1328:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1328:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1328:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:1329:16: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct k_sigaction *action @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:1329:16: sparse:     expected struct k_sigaction *action
   kernel/signal.c:1329:16: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/signal.c:1349:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1349:34: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1349:34: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:1938:36: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1938:36: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1938:36: sparse:     got struct spinlock [noderef] __rcu *
>> kernel/signal.c:2048:46: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *m_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2048:46: sparse:     expected struct sighand_struct *m_sighand
   kernel/signal.c:2048:46: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2057:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2057:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2057:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2087:21: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> kernel/signal.c:2087:21: sparse:    struct task_struct [noderef] __rcu *
>> kernel/signal.c:2087:21: sparse:    struct task_struct *
>> kernel/signal.c:2117:40: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2117:40: sparse:     expected struct task_struct *parent
   kernel/signal.c:2117:40: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2119:46: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *m_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2119:46: sparse:     expected struct sighand_struct *m_sighand
   kernel/signal.c:2119:46: sparse:     got struct sighand_struct [noderef] __rcu *sighand
>> kernel/signal.c:2120:50: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *p_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2120:50: sparse:     expected struct sighand_struct *p_sighand
   kernel/signal.c:2120:50: sparse:     got struct sighand_struct [noderef] __rcu *sighand
>> kernel/signal.c:2125:58: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *t_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2125:58: sparse:     expected struct sighand_struct *t_sighand
   kernel/signal.c:2125:58: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2171:44: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2190:65: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *tsk @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2190:65: sparse:     expected struct task_struct *tsk
   kernel/signal.c:2190:65: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2191:40: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2209:14: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct sighand_struct *psig @@     got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand @@
   kernel/signal.c:2209:14: sparse:     expected struct sighand_struct *psig
   kernel/signal.c:2209:14: sparse:     got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand
   kernel/signal.c:2238:53: sparse: sparse: incorrect type in argument 3 (different address spaces) @@     expected struct task_struct *t @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2238:53: sparse:     expected struct task_struct *t
   kernel/signal.c:2238:53: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2239:34: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2239:34: sparse:     expected struct task_struct *parent
   kernel/signal.c:2239:34: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2269:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2269:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2269:24: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2272:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2272:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2272:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2307:17: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2307:17: sparse:     expected struct sighand_struct *sighand
   kernel/signal.c:2307:17: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2341:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2341:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2341:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2343:39: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2343:39: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2343:39: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2428:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2428:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2428:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2440:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2440:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2440:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2479:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2479:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2479:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2481:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2481:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2481:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2584:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2584:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2584:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2599:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2599:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2599:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2656:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2656:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2656:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2668:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2668:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2668:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2726:49: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2726:49: sparse:     expected struct sighand_struct *sighand
   kernel/signal.c:2726:49: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:3052:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3052:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3052:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3081:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3081:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3081:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3138:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3138:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3138:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3140:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3140:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3140:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3291:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3291:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3291:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3294:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3294:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3294:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3683:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3683:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3683:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3695:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3695:37: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3695:37: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3700:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3700:35: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3700:35: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3705:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3705:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3705:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4159:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4159:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4159:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4171:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4171:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4171:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4189:11: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct k_sigaction *k @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:4189:11: sparse:     expected struct k_sigaction *k
   kernel/signal.c:4189:11: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/signal.c:4191:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4191:25: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4191:25: sparse:     got struct spinlock [noderef] __rcu *

vim +2048 kernel/signal.c

  1934	
  1935	void sigqueue_free(struct sigqueue *q)
  1936	{
  1937		unsigned long flags;
> 1938		spinlock_t *lock = &current->sighand->siglock;
  1939	
  1940		BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
  1941		/*
  1942		 * We must hold ->siglock while testing q->list
  1943		 * to serialize with collect_signal() or with
  1944		 * __exit_signal()->flush_sigqueue().
  1945		 */
  1946		spin_lock_irqsave(lock, flags);
  1947		q->flags &= ~SIGQUEUE_PREALLOC;
  1948		/*
  1949		 * If it is queued it will be freed when dequeued,
  1950		 * like the "regular" sigqueue.
  1951		 */
  1952		if (!list_empty(&q->list))
  1953			q = NULL;
  1954		spin_unlock_irqrestore(lock, flags);
  1955	
  1956		if (q)
  1957			__sigqueue_free(q);
  1958	}
  1959	
  1960	int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
  1961	{
  1962		int sig = q->info.si_signo;
  1963		struct sigpending *pending;
  1964		struct task_struct *t;
  1965		unsigned long flags;
  1966		int ret, result;
  1967	
  1968		BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
  1969	
  1970		ret = -1;
  1971		rcu_read_lock();
  1972		t = pid_task(pid, type);
  1973		if (!t || !likely(lock_task_sighand(t, &flags)))
  1974			goto ret;
  1975	
  1976		ret = 1; /* the signal is ignored */
  1977		result = TRACE_SIGNAL_IGNORED;
  1978		if (!prepare_signal(sig, t, false))
  1979			goto out;
  1980	
  1981		ret = 0;
  1982		if (unlikely(!list_empty(&q->list))) {
  1983			/*
  1984			 * If an SI_TIMER entry is already queued, just increment
  1985			 * the overrun count.
  1986			 */
  1987			BUG_ON(q->info.si_code != SI_TIMER);
  1988			q->info.si_overrun++;
  1989			result = TRACE_SIGNAL_ALREADY_PENDING;
  1990			goto out;
  1991		}
  1992		q->info.si_overrun = 0;
  1993	
  1994		signalfd_notify(t, sig);
  1995		pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
  1996		list_add_tail(&q->list, &pending->list);
  1997		sigaddset(&pending->signal, sig);
  1998		complete_signal(sig, t, type);
  1999		result = TRACE_SIGNAL_DELIVERED;
  2000	out:
  2001		trace_signal_generate(sig, &q->info, t, type != PIDTYPE_PID, result);
  2002		unlock_task_sighand(t, &flags);
  2003	ret:
  2004		rcu_read_unlock();
  2005		return ret;
  2006	}
  2007	
  2008	/**
  2009	 * lock_parents_siglocks - Take current, real_parent, and parent's siglock
  2010	 * @lock_tracer: The tracer's siglock is needed.
  2011	 *
  2012	 * There is no natural ordering to these locks so they must be sorted
  2013	 * before being taken.
  2014	 *
  2015	 * There are two complicating factors here:
  2016	 * - The locks live in sighand and sighand can be arbitrarily shared
  2017	 * - parent and real_parent can change when current's siglock is unlocked.
  2018	 *
  2019	 * To deal with this, first all of the sighand pointers are
  2020	 * gathered under current's siglock, and the sighand pointers are
  2021	 * sorted.  As siglock lives inside of sighand this also sorts the
  2022	 * siglock's by address.
  2023	 *
  2024	 * Then the siglocks are taken in order dropping current's siglock if
  2025	 * necessary.
  2026	 *
  2027	 * Finally if parent and real_parent have not changed return.
  2028	 * If either parent has changed, drop their locks and try again.
  2029	 *
  2030	 * Changing sighand is an infrequent and somewhat expensive operation
  2031	 * (unshare or exec) and so even in the worst case this loop
  2032	 * should not loop too many times before all of the proper locks are
  2033	 * taken in order.
  2034	 *
  2035	 * CONTEXT:
  2036	 * Must be called with @current->sighand->siglock held
  2037	 *
  2038	 * RETURNS:
  2039	 * current's, real_parent's, and parent's siglock held.
  2040	 */
  2041	static void lock_parents_siglocks(bool lock_tracer)
  2042		__releases(&current->sighand->siglock)
  2043		__acquires(&current->sighand->siglock)
  2044		__acquires(&current->real_parent->sighand->siglock)
  2045		__acquires(&current->parent->sighand->siglock)
  2046	{
  2047		struct task_struct *me = current;
> 2048		struct sighand_struct *m_sighand = me->sighand;
  2049	
  2050		lockdep_assert_held(&m_sighand->siglock);
  2051	
  2052		rcu_read_lock();
  2053		for (;;) {
  2054			struct task_struct *parent, *tracer;
  2055			struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
  2056	
  2057			parent = me->real_parent;
  2058			tracer = ptrace_parent(me);
  2059			if (!tracer || !lock_tracer)
  2060				tracer = parent;
  2061	
  2062			p_sighand = rcu_dereference(parent->sighand);
  2063			t_sighand = rcu_dereference(tracer->sighand);
  2064	
  2065			/* Sort the sighands so that s1 <= s2 <= s3 */
  2066			s1 = m_sighand;
  2067			s2 = p_sighand;
  2068			s3 = t_sighand;
  2069			if (s1 > s2)
  2070				swap(s1, s2);
  2071			if (s1 > s3)
  2072				swap(s1, s3);
  2073			if (s2 > s3)
  2074				swap(s2, s3);
  2075	
  2076			/* Take the locks in order */
  2077			if (s1 != m_sighand) {
  2078				spin_unlock(&m_sighand->siglock);
  2079				spin_lock(&s1->siglock);
  2080			}
  2081			if (s1 != s2)
  2082				spin_lock_nested(&s2->siglock, 1);
  2083			if (s2 != s3)
  2084				spin_lock_nested(&s3->siglock, 2);
  2085	
  2086			/* Verify the proper locks are held */
> 2087			if (likely((s1 == m_sighand) ||
  2088				   ((me->real_parent == parent) &&
  2089				    (me->parent == tracer) &&
  2090				    (parent->sighand == p_sighand) &&
  2091				    (tracer->sighand == t_sighand)))) {
  2092				break;
  2093			}
  2094	
  2095			/* Drop all but current's siglock */
  2096			if (p_sighand != m_sighand)
  2097				spin_unlock(&p_sighand->siglock);
  2098			if (t_sighand != p_sighand)
  2099				spin_unlock(&t_sighand->siglock);
  2100	
  2101			/*
  2102			 * Since [pt]_sighand will likely change if we go
  2103			 * around, and m_sighand is the only one held, make sure
  2104			 * it is subclass-0, since the above 's1 != m_sighand'
  2105			 * clause very much relies on that.
  2106			 */
  2107			lock_set_subclass(&m_sighand->siglock.dep_map, 0, _RET_IP_);
  2108		}
  2109		rcu_read_unlock();
  2110	}
  2111	
  2112	static void unlock_parents_siglocks(bool unlock_tracer)
  2113		__releases(&current->real_parent->sighand->siglock)
  2114		__releases(&current->parent->sighand->siglock)
  2115	{
  2116		struct task_struct *me = current;
> 2117		struct task_struct *parent = me->real_parent;
  2118		struct task_struct *tracer = ptrace_parent(me);
  2119		struct sighand_struct *m_sighand = me->sighand;
> 2120		struct sighand_struct *p_sighand = parent->sighand;
  2121	
  2122		if (p_sighand != m_sighand)
  2123			spin_unlock(&p_sighand->siglock);
  2124		if (tracer && unlock_tracer) {
> 2125			struct sighand_struct *t_sighand = tracer->sighand;
  2126			if (t_sighand != p_sighand)
  2127				spin_unlock(&t_sighand->siglock);
  2128		}
  2129	}
  2130	
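
The comment above lock_parents_siglocks() describes sorting the sighand
pointers and taking the siglocks in address order. As an illustration only,
here is a minimal userspace sketch of that ordering idea using pthread
mutexes. The helper names (sort3, lock3_ordered, unlock3) are hypothetical
and not part of the patch, and the sketch deliberately omits the kernel's
"verify and retry" loop, RCU, and lockdep subclasses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: order three lock pointers so that *a <= *b <= *c. */
static void sort3(pthread_mutex_t **a, pthread_mutex_t **b, pthread_mutex_t **c)
{
	pthread_mutex_t *t;

	if ((uintptr_t)*a > (uintptr_t)*b) { t = *a; *a = *b; *b = t; }
	if ((uintptr_t)*a > (uintptr_t)*c) { t = *a; *a = *c; *c = t; }
	if ((uintptr_t)*b > (uintptr_t)*c) { t = *b; *b = *c; *c = t; }
}

/*
 * Take up to three mutexes in ascending address order.  Any of the
 * pointers may alias each other; after sorting, aliases are adjacent,
 * so comparing neighbours is enough to avoid locking one mutex twice.
 * Returns the number of distinct mutexes taken.
 */
static int lock3_ordered(pthread_mutex_t *x, pthread_mutex_t *y, pthread_mutex_t *z)
{
	int taken = 1;

	sort3(&x, &y, &z);
	assert((uintptr_t)x <= (uintptr_t)y && (uintptr_t)y <= (uintptr_t)z);

	pthread_mutex_lock(x);
	if (y != x) {
		pthread_mutex_lock(y);
		taken++;
	}
	if (z != y) {
		pthread_mutex_lock(z);
		taken++;
	}
	return taken;
}

/* Release the same set of mutexes, highest address first. */
static void unlock3(pthread_mutex_t *x, pthread_mutex_t *y, pthread_mutex_t *z)
{
	sort3(&x, &y, &z);

	pthread_mutex_unlock(z);
	if (y != z)
		pthread_mutex_unlock(y);
	if (x != y)
		pthread_mutex_unlock(x);
}
```

Because every caller takes the locks in the same global (address) order, two
threads locking overlapping sets cannot deadlock against each other. The
kernel function has the extra complication that it already holds one of the
locks on entry and that the parent and tracer can change while that lock is
dropped, which is why it re-checks and loops.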

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um



* Re: [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held
@ 2022-05-20 16:19                   ` kernel test robot
  0 siblings, 0 replies; 572+ messages in thread
From: kernel test robot @ 2022-05-20 16:19 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: kbuild-all, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64, Robert OCallahan,
	Kyle Huey, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Jason Wessel, Daniel Thompson

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220518]
[cannot apply to linux/master powerpc/next wireless-next/main wireless/main linus/master v5.18-rc7 v5.18-rc6 v5.18-rc5 v5.18-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Eric-W-Biederman/signal-alpha-Remove-unused-definition-of-TASK_REAL_PARENT/20220519-065947
base:    736ee37e2e8eed7fe48d0a37ee5a709514d478b3
config: parisc-randconfig-s032-20220519 (https://download.01.org/0day-ci/archive/20220521/202205210010.E4Hyn2kD-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/4b66a617bf6d095d33fe43e9dbcfdf2e0de9fb29
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Eric-W-Biederman/signal-alpha-Remove-unused-definition-of-TASK_REAL_PARENT/20220519-065947
        git checkout 4b66a617bf6d095d33fe43e9dbcfdf2e0de9fb29
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   kernel/signal.c: note: in included file (through arch/parisc/include/uapi/asm/signal.h, arch/parisc/include/asm/signal.h, include/uapi/linux/signal.h, ...):
   include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
   kernel/signal.c:195:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:195:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:195:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:198:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:198:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:198:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:480:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:480:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:480:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:484:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:484:34: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:484:34: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:542:53: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct k_sigaction *ka @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:542:53: sparse:     expected struct k_sigaction *ka
   kernel/signal.c:542:53: sparse:     got struct k_sigaction [noderef] __rcu *
   include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
   kernel/signal.c:1261:9: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
   kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
   kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
   kernel/signal.c:1328:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1328:9: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1328:9: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:1329:16: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct k_sigaction *action @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:1329:16: sparse:     expected struct k_sigaction *action
   kernel/signal.c:1329:16: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/signal.c:1349:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1349:34: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1349:34: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:1938:36: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:1938:36: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:1938:36: sparse:     got struct spinlock [noderef] __rcu *
>> kernel/signal.c:2048:46: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *m_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2048:46: sparse:     expected struct sighand_struct *m_sighand
   kernel/signal.c:2048:46: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2057:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2057:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2057:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2087:21: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> kernel/signal.c:2087:21: sparse:    struct task_struct [noderef] __rcu *
>> kernel/signal.c:2087:21: sparse:    struct task_struct *
>> kernel/signal.c:2117:40: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2117:40: sparse:     expected struct task_struct *parent
   kernel/signal.c:2117:40: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2119:46: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *m_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2119:46: sparse:     expected struct sighand_struct *m_sighand
   kernel/signal.c:2119:46: sparse:     got struct sighand_struct [noderef] __rcu *sighand
>> kernel/signal.c:2120:50: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *p_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2120:50: sparse:     expected struct sighand_struct *p_sighand
   kernel/signal.c:2120:50: sparse:     got struct sighand_struct [noderef] __rcu *sighand
>> kernel/signal.c:2125:58: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *t_sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2125:58: sparse:     expected struct sighand_struct *t_sighand
   kernel/signal.c:2125:58: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2171:44: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2190:65: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *tsk @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2190:65: sparse:     expected struct task_struct *tsk
   kernel/signal.c:2190:65: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2191:40: sparse: sparse: cast removes address space '__rcu' of expression
   kernel/signal.c:2209:14: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct sighand_struct *psig @@     got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand @@
   kernel/signal.c:2209:14: sparse:     expected struct sighand_struct *psig
   kernel/signal.c:2209:14: sparse:     got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand
   kernel/signal.c:2238:53: sparse: sparse: incorrect type in argument 3 (different address spaces) @@     expected struct task_struct *t @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2238:53: sparse:     expected struct task_struct *t
   kernel/signal.c:2238:53: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2239:34: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2239:34: sparse:     expected struct task_struct *parent
   kernel/signal.c:2239:34: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2269:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *parent @@
   kernel/signal.c:2269:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2269:24: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/signal.c:2272:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/signal.c:2272:24: sparse:     expected struct task_struct *parent
   kernel/signal.c:2272:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/signal.c:2307:17: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2307:17: sparse:     expected struct sighand_struct *sighand
   kernel/signal.c:2307:17: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:2341:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2341:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2341:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2343:39: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2343:39: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2343:39: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2428:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2428:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2428:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2440:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2440:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2440:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2479:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2479:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2479:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2481:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2481:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2481:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2584:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2584:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2584:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2599:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2599:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2599:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2656:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2656:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2656:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2668:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:2668:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:2668:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:2726:49: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/signal.c:2726:49: sparse:     expected struct sighand_struct *sighand
   kernel/signal.c:2726:49: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/signal.c:3052:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3052:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3052:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3081:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3081:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3081:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3138:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3138:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3138:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3140:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3140:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3140:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3291:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3291:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3291:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3294:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3294:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3294:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3683:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3683:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3683:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3695:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3695:37: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3695:37: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3700:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3700:35: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3700:35: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:3705:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:3705:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:3705:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4159:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4159:31: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4159:31: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4171:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4171:33: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4171:33: sparse:     got struct spinlock [noderef] __rcu *
   kernel/signal.c:4189:11: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct k_sigaction *k @@     got struct k_sigaction [noderef] __rcu * @@
   kernel/signal.c:4189:11: sparse:     expected struct k_sigaction *k
   kernel/signal.c:4189:11: sparse:     got struct k_sigaction [noderef] __rcu *
   kernel/signal.c:4191:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/signal.c:4191:25: sparse:     expected struct spinlock [usertype] *lock
   kernel/signal.c:4191:25: sparse:     got struct spinlock [noderef] __rcu *

vim +2048 kernel/signal.c

  1934	
  1935	void sigqueue_free(struct sigqueue *q)
  1936	{
  1937		unsigned long flags;
> 1938		spinlock_t *lock = &current->sighand->siglock;
  1939	
  1940		BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
  1941		/*
  1942		 * We must hold ->siglock while testing q->list
  1943		 * to serialize with collect_signal() or with
  1944		 * __exit_signal()->flush_sigqueue().
  1945		 */
  1946		spin_lock_irqsave(lock, flags);
  1947		q->flags &= ~SIGQUEUE_PREALLOC;
  1948		/*
  1949		 * If it is queued it will be freed when dequeued,
  1950		 * like the "regular" sigqueue.
  1951		 */
  1952		if (!list_empty(&q->list))
  1953			q = NULL;
  1954		spin_unlock_irqrestore(lock, flags);
  1955	
  1956		if (q)
  1957			__sigqueue_free(q);
  1958	}
  1959	
  1960	int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
  1961	{
  1962		int sig = q->info.si_signo;
  1963		struct sigpending *pending;
  1964		struct task_struct *t;
  1965		unsigned long flags;
  1966		int ret, result;
  1967	
  1968		BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
  1969	
  1970		ret = -1;
  1971		rcu_read_lock();
  1972		t = pid_task(pid, type);
  1973		if (!t || !likely(lock_task_sighand(t, &flags)))
  1974			goto ret;
  1975	
  1976		ret = 1; /* the signal is ignored */
  1977		result = TRACE_SIGNAL_IGNORED;
  1978		if (!prepare_signal(sig, t, false))
  1979			goto out;
  1980	
  1981		ret = 0;
  1982		if (unlikely(!list_empty(&q->list))) {
  1983			/*
  1984			 * If an SI_TIMER entry is already queued, just increment
  1985			 * the overrun count.
  1986			 */
  1987			BUG_ON(q->info.si_code != SI_TIMER);
  1988			q->info.si_overrun++;
  1989			result = TRACE_SIGNAL_ALREADY_PENDING;
  1990			goto out;
  1991		}
  1992		q->info.si_overrun = 0;
  1993	
  1994		signalfd_notify(t, sig);
  1995		pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
  1996		list_add_tail(&q->list, &pending->list);
  1997		sigaddset(&pending->signal, sig);
  1998		complete_signal(sig, t, type);
  1999		result = TRACE_SIGNAL_DELIVERED;
  2000	out:
  2001		trace_signal_generate(sig, &q->info, t, type != PIDTYPE_PID, result);
  2002		unlock_task_sighand(t, &flags);
  2003	ret:
  2004		rcu_read_unlock();
  2005		return ret;
  2006	}
  2007	
  2008	/**
  2009	 * lock_parents_siglocks - Take current, real_parent, and parent's siglock
  2010	 * @lock_tracer: The tracer's siglock is needed.
  2011	 *
  2012	 * There is no natural ordering to these locks so they must be sorted
  2013	 * before being taken.
  2014	 *
  2015	 * There are two complicating factors here:
  2016	 * - The locks live in sighand and sighand can be arbitrarily shared
  2017	 * - parent and real_parent can change when current's siglock is unlocked.
  2018	 *
  2019	 * To deal with this, first all of the sighand pointers are
  2020	 * gathered under current's siglock, and the sighand pointers are
  2021	 * sorted.  As siglock lives inside of sighand this also sorts the
  2022	 * siglock's by address.
  2023	 *
  2024	 * Then the siglocks are taken in order dropping current's siglock if
  2025	 * necessary.
  2026	 *
  2027	 * Finally if parent and real_parent have not changed return.
  2028	 * If either parent has changed, drop their locks and try again.
  2029	 *
  2030	 * Changing sighand is an infrequent and somewhat expensive operation
  2031	 * (unshare or exec) and so even in the worst case this loop
  2032	 * should not loop too many times before all of the proper locks are
  2033	 * taken in order.
  2034	 *
  2035	 * CONTEXT:
  2036	 * Must be called with @current->sighand->siglock held
  2037	 *
  2038	 * RETURNS:
  2039	 * current's, real_parent's, and parent's siglock held.
  2040	 */
  2041	static void lock_parents_siglocks(bool lock_tracer)
  2042		__releases(&current->sighand->siglock)
  2043		__acquires(&current->sighand->siglock)
  2044		__acquires(&current->real_parent->sighand->siglock)
  2045		__acquires(&current->parent->sighand->siglock)
  2046	{
  2047		struct task_struct *me = current;
> 2048		struct sighand_struct *m_sighand = me->sighand;
  2049	
  2050		lockdep_assert_held(&m_sighand->siglock);
  2051	
  2052		rcu_read_lock();
  2053		for (;;) {
  2054			struct task_struct *parent, *tracer;
  2055			struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
  2056	
  2057			parent = me->real_parent;
  2058			tracer = ptrace_parent(me);
  2059			if (!tracer || !lock_tracer)
  2060				tracer = parent;
  2061	
  2062			p_sighand = rcu_dereference(parent->sighand);
  2063			t_sighand = rcu_dereference(tracer->sighand);
  2064	
  2065			/* Sort the sighands so that s1 <= s2 <= s3 */
  2066			s1 = m_sighand;
  2067			s2 = p_sighand;
  2068			s3 = t_sighand;
  2069			if (s1 > s2)
  2070				swap(s1, s2);
  2071			if (s1 > s3)
  2072				swap(s1, s3);
  2073			if (s2 > s3)
  2074				swap(s2, s3);
  2075	
  2076			/* Take the locks in order */
  2077			if (s1 != m_sighand) {
  2078				spin_unlock(&m_sighand->siglock);
  2079				spin_lock(&s1->siglock);
  2080			}
  2081			if (s1 != s2)
  2082				spin_lock_nested(&s2->siglock, 1);
  2083			if (s2 != s3)
  2084				spin_lock_nested(&s3->siglock, 2);
  2085	
  2086			/* Verify the proper locks are held */
> 2087			if (likely((s1 == m_sighand) ||
  2088				   ((me->real_parent == parent) &&
  2089				    (me->parent == tracer) &&
  2090				    (parent->sighand == p_sighand) &&
  2091				    (tracer->sighand == t_sighand)))) {
  2092				break;
  2093			}
  2094	
  2095			/* Drop all but current's siglock */
  2096			if (p_sighand != m_sighand)
  2097				spin_unlock(&p_sighand->siglock);
  2098			if (t_sighand != p_sighand)
  2099				spin_unlock(&t_sighand->siglock);
  2100	
  2101			/*
  2102			 * Since [pt]_sighand will likely change if we go
  2103			 * around, and m_sighand is the only one held, make sure
  2104			 * it is subclass-0, since the above 's1 != m_sighand'
  2105			 * clause very much relies on that.
  2106			 */
  2107			lock_set_subclass(&m_sighand->siglock.dep_map, 0, _RET_IP_);
  2108		}
  2109		rcu_read_unlock();
  2110	}
  2111	
  2112	static void unlock_parents_siglocks(bool unlock_tracer)
  2113		__releases(&current->real_parent->sighand->siglock)
  2114		__releases(&current->parent->sighand->siglock)
  2115	{
  2116		struct task_struct *me = current;
> 2117		struct task_struct *parent = me->real_parent;
  2118		struct task_struct *tracer = ptrace_parent(me);
  2119		struct sighand_struct *m_sighand = me->sighand;
> 2120		struct sighand_struct *p_sighand = parent->sighand;
  2121	
  2122		if (p_sighand != m_sighand)
  2123			spin_unlock(&p_sighand->siglock);
  2124		if (tracer && unlock_tracer) {
> 2125			struct sighand_struct *t_sighand = tracer->sighand;
  2126			if (t_sighand != p_sighand)
  2127				spin_unlock(&t_sighand->siglock);
  2128		}
  2129	}
  2130	
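
The address-ordered locking described in the lock_parents_siglocks()
comment above can be sketched in userspace. This is only an illustration
under stated assumptions, not the kernel code: `struct sighand`,
`sort3()`, `lock_three()` and `unlock_three()` are hypothetical names,
pthread mutexes stand in for spinlocks, and the sketch models neither
the "current's siglock is already held" entry condition nor the retry
loop that re-checks parent and real_parent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's sighand_struct. */
struct sighand {
	pthread_mutex_t siglock;
};

static void swap_ptr(struct sighand **a, struct sighand **b)
{
	struct sighand *tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Sort three pointers by address so that s[0] <= s[1] <= s[2]. */
static void sort3(struct sighand *s[3])
{
	if ((uintptr_t)s[0] > (uintptr_t)s[1])
		swap_ptr(&s[0], &s[1]);
	if ((uintptr_t)s[0] > (uintptr_t)s[2])
		swap_ptr(&s[0], &s[2]);
	if ((uintptr_t)s[1] > (uintptr_t)s[2])
		swap_ptr(&s[1], &s[2]);
}

/*
 * Lock up to three objects, any of which may alias, in ascending
 * address order.  After sorting, aliases are adjacent, so comparing
 * neighbours is enough to avoid taking the same lock twice.
 * Returns the number of distinct locks taken.
 */
static int lock_three(struct sighand *a, struct sighand *b, struct sighand *c)
{
	struct sighand *s[3] = { a, b, c };
	int taken = 1;

	sort3(s);
	pthread_mutex_lock(&s[0]->siglock);
	if (s[1] != s[0]) {
		pthread_mutex_lock(&s[1]->siglock);
		taken++;
	}
	if (s[2] != s[1]) {
		pthread_mutex_lock(&s[2]->siglock);
		taken++;
	}
	return taken;
}

/* Release the same set of locks in reverse (descending) order. */
static void unlock_three(struct sighand *a, struct sighand *b, struct sighand *c)
{
	struct sighand *s[3] = { a, b, c };

	sort3(s);
	if (s[2] != s[1])
		pthread_mutex_unlock(&s[2]->siglock);
	if (s[1] != s[0])
		pthread_mutex_unlock(&s[1]->siglock);
	pthread_mutex_unlock(&s[0]->siglock);
}
```

Because every caller takes the locks in the same global order, two
threads locking overlapping sets cannot each hold a lock the other is
waiting for. The kernel function has the extra wrinkle that the caller
already holds one of the three locks, which is why it may have to drop
it and then re-verify the parent pointers.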

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-05-20  7:33                 ` Sebastian Andrzej Siewior
  (?)
@ 2022-05-20 19:32                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-05-20 19:32 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, rjw, oleg, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, Robert O'Callahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
>> 
>> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
>> ptrace_freeze_traced has completed successfully.  Which fundamentally
>> means the lock dance of dropping siglock and grabbing tasklist_lock does
>> not work on PREEMPT_RT.  So I have worked through what is necessary so
>> that tasklist_lock does not need to be grabbed in ptrace_stop after
>> siglock is dropped.
> …
> It took me a while to realise that this is a follow-up; I somehow assumed
> that you added a few patches on top. Might have been yesterday's
> heat. b4 also refused to download this series because the v4 in this
> thread looked newer… Anyway. Both series applied:
>
> | =============================
> | WARNING: suspicious RCU usage
> | 5.18.0-rc7+ #16 Not tainted
> | -----------------------------
> | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> |
> | other info that might help us debug this:
> |
> | rcu_scheduler_active = 2, debug_locks = 1
> | 2 locks held by ssdd/1734:
> |  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> |  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> |
> | stack backtrace:
> | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> | Call Trace:
> |  <TASK>
> |  dump_stack_lvl+0x45/0x5a
> |  unlock_parents_siglocks+0xb6/0xc0
> |  ptrace_stop+0xb9/0x390
> |  get_signal+0x51c/0x8d0
> |  arch_do_signal_or_restart+0x31/0x750
> |  exit_to_user_mode_prepare+0x157/0x220
> |  irqentry_exit_to_user_mode+0x5/0x50
> |  asm_sysvec_apic_timer_interrupt+0x12/0x20
>
> That is ptrace_parent() in unlock_parents_siglocks().

How odd.  I thought I had the appropriate lockdep config options enabled
in my test build to catch things like this.  I guess not.

Now I am trying to think how to tell it that holding the appropriate
siglock makes this ok.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-05-20 19:32                   ` Eric W. Biederman
  (?)
@ 2022-05-20 19:58                     ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-05-20 19:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Sebastian Andrzej Siewior, linux-kernel, rjw, oleg, mingo,
	vincent.guittot, dietmar.eggemann, rostedt, mgorman, Will Deacon,
	tj, linux-pm, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Jann Horn,
	Kees Cook, linux-ia64, Robert O'Callahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On Fri, May 20, 2022 at 02:32:24PM -0500, Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> >> 
> >> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> >> ptrace_freeze_traced has completed successfully.  Which fundamentally
> >> means the lock dance of dropping siglock and grabbing tasklist_lock does
> >> not work on PREEMPT_RT.  So I have worked through what is necessary so
> >> that tasklist_lock does not need to be grabbed in ptrace_stop after
> >> siglock is dropped.
> > …
> > It took me a while to realise that this is a follow-up; I somehow assumed
> > that you added a few patches on top. Might have been yesterday's
> > heat. b4 also refused to download this series because the v4 in this
> > thread looked newer… Anyway. Both series applied:
> >
> > | =============================
> > | WARNING: suspicious RCU usage
> > | 5.18.0-rc7+ #16 Not tainted
> > | -----------------------------
> > | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> > |
> > | other info that might help us debug this:
> > |
> > | rcu_scheduler_active = 2, debug_locks = 1
> > | 2 locks held by ssdd/1734:
> > |  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> > |  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> > |
> > | stack backtrace:
> > | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> > | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > | Call Trace:
> > |  <TASK>
> > |  dump_stack_lvl+0x45/0x5a
> > |  unlock_parents_siglocks+0xb6/0xc0
> > |  ptrace_stop+0xb9/0x390
> > |  get_signal+0x51c/0x8d0
> > |  arch_do_signal_or_restart+0x31/0x750
> > |  exit_to_user_mode_prepare+0x157/0x220
> > |  irqentry_exit_to_user_mode+0x5/0x50
> > |  asm_sysvec_apic_timer_interrupt+0x12/0x20
> >
> > That is ptrace_parent() in unlock_parents_siglocks().
> 
> How odd.  I thought I had the appropriate lockdep config options enabled
> in my test build to catch things like this.  I guess not.
> 
> Now I am trying to think how to tell it that holding the appropriate
> siglock makes this ok.

The typical annotation is something like:

	rcu_dereference_protected(foo, lockdep_is_held(&bar))

Except in this case I think the problem is that bar depends on foo in
non-trivial ways. That is, foo is 'task->parent' and bar is
'task->parent->sighand->siglock' or something.

The other option is to use rcu_dereference_raw() in this one instance
and have a comment that explains the situation.

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes
  2022-05-19 23:48                     ` Eric W. Biederman
  (?)
@ 2022-05-20 23:01                       ` Doug Anderson
  -1 siblings, 0 replies; 572+ messages in thread
From: Doug Anderson @ 2022-05-20 23:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: LKML, Rafael J. Wysocki, Oleg Nesterov, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Mel Gorman,
	Sebastian Andrzej Siewior, Will Deacon, Tejun Heo, Linux PM,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Miller, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras

Hi,

On Thu, May 19, 2022 at 4:49 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Doug Anderson <dianders@chromium.org> writes:
>
> > Hi,
> >
> > On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>
> >> kdb has a bug that when using the ps command to display a list of
> >> processes, if a process is being debugged the debugger is shown as the
> >> parent process.
> >>
> >> This is silly, and I expect it never comes up in practice, as there
> >> is very little point in using gdb and kdb simultaneously.  Update the
> >> code to use real_parent so that it is clear kdb does not want to
> >> display a debugger as the parent of a process.
> >
> > So I would tend to defer to Daniel, but I'm not convinced that the
> > behavior you describe for kdb today _is_ actually silly.
> >
> > If I was in kdb and I was listing processes, I might actually want to
> > see that a process's parent was set to gdb. Presumably that would tell
> > me extra information that might be relevant to my debug session.
> >
> > Personally, I'd rather add an extra piece of information into the list
> > showing the real parent if it's not the same as the parent. Then
> > you're not throwing away information.
>
> The name of the field is confusing for anyone who isn't intimate with
> the implementation details.  The function getppid returns
> tsk->real_parent->tgid.
>
> If kdb wants information of what the tracer is that is fine, but I
> recommend putting that information in another field.
>
> Given that the original description says to give the information that ps
> gives, my sense is that kdb is currently wrong.  Especially as it does
> not give you the actual parentage anywhere.
>
> I can certainly be convinced, but I do want some clarity.  It looks very
> attractive to rename task->parent to task->ptracer and leave the field
> NULL when there is no tracer.

Fair enough. You can consider my objection rescinded.

Presumably, though, you're hoping for an Ack for your patch and you
plan to take it with the rest of the series. That's going to need to
come from Daniel anyway as he is the actual maintainer. I'm just the
peanut gallery. ;-)

-Doug

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach
  2022-05-18 22:53                 ` Eric W. Biederman
  (?)
@ 2022-05-24 11:42                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-24 11:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

Sorry for the delay.

On 05/18, Eric W. Biederman wrote:
>
> Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
> been impossible to attach another thread in the same thread group.
>
> Remove the code from __ptrace_detach that was trying to support
> detaching from a thread in the same thread group.

Maybe I am totally confused, but I think you misunderstood this code,
and thus this patch is very wrong.

The same_thread_group() check does NOT try to check whether the debugger
and the tracee are in the same thread group; that is indeed impossible.

We need this check to know if the tracee was ptrace_reparented() before
__ptrace_unlink() or not.


> -static int ignoring_children(struct sighand_struct *sigh)
> -{
> -	int ret;
> -	spin_lock(&sigh->siglock);
> -	ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
> -	      (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
> -	spin_unlock(&sigh->siglock);
> -	return ret;
> -}

...

> @@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
>
>  	dead = !thread_group_leader(p);
>
> -	if (!dead && thread_group_empty(p)) {
> -		if (!same_thread_group(p->real_parent, tracer))
> -			dead = do_notify_parent(p, p->exit_signal);
> -		else if (ignoring_children(tracer->sighand)) {
> -			__wake_up_parent(p, tracer);
> -			dead = true;
> -		}
> -	}

So the code above does:

	- if !same_thread_group(p->real_parent, tracer), then the tracee was
	  ptrace_reparented(), and now we need to notify its natural parent
	  to let it know it has a zombie child.

	- otherwise, the tracee is our natural child, and it is actually dead.
	  However, since we are going to reap this task, we need to wake up our
	  sub-threads possibly sleeping on ->wait_chldexit wait_queue_head_t.

See?
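
The two cases above can be modeled in a few lines of userspace C. This
is a hedged sketch rather than the kernel function: every kernel
predicate is replaced by a boolean field, all names are hypothetical,
and it assumes the tracee is already a zombie, as the description of
the first case implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs, one per kernel predicate in the removed hunk. */
struct detach_state {
	bool group_leader;        /* thread_group_leader(p) */
	bool group_empty;         /* thread_group_empty(p) */
	bool reparented;          /* !same_thread_group(p->real_parent, tracer) */
	bool parent_autoreaps;    /* assumed return value of do_notify_parent() */
	bool tracer_ignores_chld; /* ignoring_children(tracer->sighand) */
};

enum detach_action {
	ACT_NONE,
	ACT_NOTIFY_REAL_PARENT,   /* do_notify_parent(p, p->exit_signal) */
	ACT_WAKE_TRACER_THREADS,  /* __wake_up_parent(p, tracer) */
};

struct detach_result {
	bool dead;
	enum detach_action action;
};

/* Mirrors the branch structure of the pre-patch code quoted above. */
static struct detach_result model_ptrace_detach(const struct detach_state *s)
{
	struct detach_result r;

	assert(s != NULL);
	r.dead = !s->group_leader;
	r.action = ACT_NONE;

	if (!r.dead && s->group_empty) {
		if (s->reparented) {
			/* Tracee was ptrace_reparented(): tell its natural parent. */
			r.action = ACT_NOTIFY_REAL_PARENT;
			r.dead = s->parent_autoreaps;
		} else if (s->tracer_ignores_chld) {
			/* Natural child that will be reaped: wake sleeping sub-threads. */
			r.action = ACT_WAKE_TRACER_THREADS;
			r.dead = true;
		}
	}
	return r;
}
```

In this model the patched code corresponds to always taking the first
branch, so the ACT_WAKE_TRACER_THREADS outcome can no longer occur.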

> +	if (!dead && thread_group_empty(p))
> +		dead = do_notify_parent(p, p->exit_signal);

No, this looks wrong. Or I missed something?

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 06/16] ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
  2022-05-18 22:53                 ` Eric W. Biederman
  (?)
@ 2022-05-24 13:25                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-24 13:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 05/18, Eric W. Biederman wrote:
>
> Since commit 9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request
> can never race with SIGKILL") it has been unnecessary for
> ptrace_getsiginfo and ptrace_setsiginfo to use lock_task_sighand.

ACK


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 07/16] signal: Wake up the designated parent
  2022-05-18 22:53                 ` Eric W. Biederman
  (?)
@ 2022-05-24 13:25                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-24 13:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

I fail to understand this patch...

On 05/18, Eric W. Biederman wrote:
>
> Today if a process is ptraced only the ptracer will ever be woken up in
> wait

and why is this wrong?

> Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")

how does this change fix 75b95953a569?

>  static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
>  				int sync, void *key)
>  {
>  	struct wait_opts *wo = container_of(wait, struct wait_opts,
>  						child_wait);
> -	struct task_struct *p = key;
> +	struct child_wait_info *info = key;
>
> -	if (!eligible_pid(wo, p))
> +	if (!eligible_pid(wo, info->p))
>  		return 0;
>
> -	if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
> -		return 0;
> +	if ((wo->wo_flags & __WNOTHREAD) && (wait->private != info->parent))
> +			return 0;

So, wait->private is the task T which is sleeping on wait_chldexit.

Before the patch the logic is clear. T called do_wait(__WNOTHREAD) and
we do not need to wake it up if it is not the "actual" parent of p.

After the patch we check whether T is equal to the "parent" arg passed to
__wake_up_parent(). Why??? This arg is only used to find the
->signal->wait_chldexit wait_queue_head, and this is fine.

As I said, I don't understand this patch. But at least this change is
wrong in the case when __wake_up_parent() is called by __ptrace_detach().
(you removed it in 5/16 but this looks wrong too). Sure, we can change
ptrace_detach() to use __wake_up_parent(p, p->parent), but for what?

I must have missed something.

Oleg.
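
The __WNOTHREAD filter before and after the patch can be compared with
a small userspace model. This is a sketch under stated assumptions:
struct wait_task and both helpers are hypothetical, the eligible_pid()
check is left out, and the scenario in the usage note is only one
reading of the objection above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: only the field the filter looks at. */
struct wait_task {
	int pid;
	const struct wait_task *parent;
};

/* Pre-patch: a __WNOTHREAD waiter is woken only if it is p->parent. */
static bool wake_before_patch(bool wnothread, const struct wait_task *waiter,
			      const struct wait_task *p)
{
	assert(waiter != NULL && p != NULL);
	return !(wnothread && waiter != p->parent);
}

/* Post-patch: the waiter is compared with the "parent" argument instead. */
static bool wake_after_patch(bool wnothread, const struct wait_task *waiter,
			     const struct wait_task *designated_parent)
{
	assert(waiter != NULL);
	return !(wnothread && waiter != designated_parent);
}
```

Take threads A and B in one thread group, where B forked p and A traced
it. After __ptrace_unlink() p->parent is B again, but __ptrace_detach()
passes the tracer A to __wake_up_parent(). In this model B, sleeping
with __WNOTHREAD, is woken by the old check and skipped by the new one.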


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace
  2022-05-18 22:53                 ` Eric W. Biederman
  (?)
@ 2022-05-24 15:27                   ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-24 15:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 05/18, Eric W. Biederman wrote:
>
> The code in ptrace_signal to populate siginfo if the signal number
> changed is buggy.  If the tracer continued the tracee using
> ptrace_detach it is guaranteed to use the real_parent (or possibly a
> new tracer) but definitely not the original tracer to populate si_pid
> and si_uid.

I guess nobody cares. As the comment says

	 If the debugger wanted something
	 specific in the siginfo structure then it should
	 have updated *info via PTRACE_SETSIGINFO.

otherwise I don't think si_pid/si_uid have any value.

However the patch looks fine to me, just the word "buggy" looks a bit
too strong imo.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 07/16] signal: Wake up the designated parent
  2022-05-24 13:25                   ` Oleg Nesterov
  (?)
@ 2022-05-24 16:28                     ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-24 16:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 05/24, Oleg Nesterov wrote:
>
> I fail to understand this patch...
>
> On 05/18, Eric W. Biederman wrote:
> >
> > Today if a process is ptraced only the ptracer will ever be woken up in
> > wait
>
> and why is this wrong?
>
> > Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
>
> how does this change fix 75b95953a569?

OK, I guess you mean the 2nd do_notify_parent_cldstop() in ptrace_stop(),
the problematic case is current->ptrace == T. Right?

I dislike this patch anyway, but let me think more about it.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 07/16] signal: Wake up the designated parent
  2022-05-24 16:28                     ` Oleg Nesterov
  (?)
@ 2022-05-25 14:28                       ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-25 14:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 05/24, Oleg Nesterov wrote:
>
> On 05/24, Oleg Nesterov wrote:
> >
> > I fail to understand this patch...
> >
> > On 05/18, Eric W. Biederman wrote:
> > >
> > > Today if a process is ptraced only the ptracer will ever be woken up in
> > > wait
> >
> > and why is this wrong?
> >
> > > Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
> >
> > how does this change fix 75b95953a569?
>
> OK, I guess you mean the 2nd do_notify_parent_cldstop() in ptrace_stop(),
> the problematic case is current->ptrace == T. Right?
>
> I dislike this patch anyway, but let me think more about it.

OK, now that I understand the problem, the patch doesn't look bad to me,
although I'd ask to make the changelog more clear.

After this change __wake_up_parent() can't accept any "parent" from
p->parent thread group, but all callers look fine except ptrace_detach().

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach
  2022-05-24 11:42                   ` Oleg Nesterov
  (?)
@ 2022-05-25 14:33                     ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-05-25 14:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 05/24, Oleg Nesterov wrote:
>
> Sorry for delay.
>
> On 05/18, Eric W. Biederman wrote:
> >
> > Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
> > been impossible to attach another thread in the same thread group.
> >
> > Remove the code from __ptrace_detach that was trying to support
> > detaching from a thread in the same thread group.
>
> may be I am totally confused, but I think you misunderstood this code
> and thus this patch is very wrong.
>
> The same_thread_group() check does NOT try to check if debugger and
> tracee is in the same thread group, this is indeed impossible.
>
> We need this check to know if the tracee was ptrace_reparented() before
> __ptrace_unlink() or not.
>
>
> > -static int ignoring_children(struct sighand_struct *sigh)
> > -{
> > -	int ret;
> > -	spin_lock(&sigh->siglock);
> > -	ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
> > -	      (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
> > -	spin_unlock(&sigh->siglock);
> > -	return ret;
> > -}
>
> ...
>
> > @@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
> >
> >  	dead = !thread_group_leader(p);
> >
> > -	if (!dead && thread_group_empty(p)) {
> > -		if (!same_thread_group(p->real_parent, tracer))
> > -			dead = do_notify_parent(p, p->exit_signal);
> > -		else if (ignoring_children(tracer->sighand)) {
> > -			__wake_up_parent(p, tracer);
> > -			dead = true;
> > -		}
> > -	}
>
> So the code above does:
>
> 	- if !same_thread_group(p->real_parent, tracer), then the tracee was
> 	  ptrace_reparented(), and now we need to notify its natural parent
> 	  to let it know it has a zombie child.
>
> 	- otherwise, the tracee is our natural child, and it is actually dead.
> 	  however, since we are going to reap this task, we need to wake up our
> 	  sub-threads possibly sleeping on ->wait_chldexit wait_queue_head_t.
>
> See?
>
> > +	if (!dead && thread_group_empty(p))
> > +		dead = do_notify_parent(p, p->exit_signal);
>
> No, this looks wrong. Or I missed something?

Yes, but...

That said, it seems that we do not need __wake_up_parent() if it was our
natural child?
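
For readers who want to poke at the two conditions that the removed
ignoring_children() helper tests, here is a minimal userspace sketch.
It assumes POSIX sigaction(); the names sigchld_autoreaps() and
set_sigchld() are made up for this illustration and are not kernel or
libc functions. It only inspects the calling process's own SIGCHLD
disposition, which is roughly the userspace-visible side of what the
kernel reads from sighand->action[SIGCHLD-1].

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Userspace analogue of the kernel's ignoring_children() test
 * (illustrative only, not kernel code).
 *
 * Returns 1 if SIGCHLD is set to SIG_IGN or SA_NOCLDWAIT is set,
 * 0 if neither is the case, and -1 if sigaction() fails.
 */
static int sigchld_autoreaps(void)
{
	struct sigaction sa;

	/* A NULL "act" pointer only queries the current disposition. */
	if (sigaction(SIGCHLD, NULL, &sa) != 0)
		return -1;

	return sa.sa_handler == SIG_IGN || (sa.sa_flags & SA_NOCLDWAIT) != 0;
}

/* Hypothetical helper: install a SIGCHLD disposition for experiments. */
static int set_sigchld(void (*handler)(int), int flags)
{
	struct sigaction sa;

	sa.sa_handler = handler;
	sa.sa_flags = flags;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGCHLD, &sa, NULL);
}
```

With SIGCHLD left at SIG_DFL and no SA_NOCLDWAIT the query returns 0;
setting either SIG_IGN or SA_NOCLDWAIT makes it return 1, matching the
two halves of the kernel test.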

I'll recheck. Eric, I'll continue to read this series tomorrow, can't
concentrate on ptrace today.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach
  2022-05-25 14:33                     ` Oleg Nesterov
  (?)
@ 2022-06-06 16:06                       ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-06 16:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/24, Oleg Nesterov wrote:
>>
>> Sorry for delay.
>>
>> On 05/18, Eric W. Biederman wrote:
>> >
>> > Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
>> > been impossible to attach another thread in the same thread group.
>> >
>> > Remove the code from __ptrace_detach that was trying to support
>> > detaching from a thread in the same thread group.
>>
>> may be I am totally confused, but I think you misunderstood this code
>> and thus this patch is very wrong.
>>
>> The same_thread_group() check does NOT try to check if debugger and
>> tracee is in the same thread group, this is indeed impossible.
>>
>> We need this check to know if the tracee was ptrace_reparented() before
>> __ptrace_unlink() or not.
>>
>>
>> > -static int ignoring_children(struct sighand_struct *sigh)
>> > -{
>> > -	int ret;
>> > -	spin_lock(&sigh->siglock);
>> > -	ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
>> > -	      (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
>> > -	spin_unlock(&sigh->siglock);
>> > -	return ret;
>> > -}
>>
>> ...
>>
>> > @@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
>> >
>> >  	dead = !thread_group_leader(p);
>> >
>> > -	if (!dead && thread_group_empty(p)) {
>> > -		if (!same_thread_group(p->real_parent, tracer))
>> > -			dead = do_notify_parent(p, p->exit_signal);
>> > -		else if (ignoring_children(tracer->sighand)) {
>> > -			__wake_up_parent(p, tracer);
>> > -			dead = true;
>> > -		}
>> > -	}
>>
>> So the code above does:
>>
>> 	- if !same_thread_group(p->real_parent, tracer), then the tracee was
>> 	  ptrace_reparented(), and now we need to notify its natural parent
>> 	  to let it know it has a zombie child.
>>
>> 	- otherwise, the tracee is our natural child, and it is actually dead.
>> 	  however, since we are going to reap this task, we need to wake up our
>> 	  sub-threads possibly sleeping on ->wait_chldexit wait_queue_head_t.
>>
>> See?
>>
>> > +	if (!dead && thread_group_empty(p))
>> > +		dead = do_notify_parent(p, p->exit_signal);
>>
>> No, this looks wrong. Or I missed something?
>
> Yes, but...
>
> That said, it seems that we do not need __wake_up_parent() if it was our
> natural child?

Agreed on both counts.

Hmm.  I see where the logic comes from.  The ignoring_children test and
the __wake_up_parent are what do_notify_parent does when the parent
ignores children.  Hmm.  I even see all of this documented in the comment
above __ptrace_detach.

So I am just going to drop this change.
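
The two cases Oleg describes in the quoted hunk can be written down as a
small standalone model. This is only a sketch under stated assumptions:
the struct and enum names are invented for illustration, the booleans
stand in for the kernel predicates named in the comments, and
parent_autoreaps stands in for the return value of do_notify_parent().
It models just the quoted branch for a tracee that has already exited,
not the rest of __ptrace_detach.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs; none of these are kernel types. */
struct detach_model {
	bool group_leader;        /* thread_group_leader(p) */
	bool group_empty;         /* thread_group_empty(p) */
	bool reparented;          /* !same_thread_group(p->real_parent, tracer) */
	bool tracer_ignores_chld; /* ignoring_children(tracer->sighand) */
	bool parent_autoreaps;    /* assumed result of do_notify_parent() */
};

enum detach_action {
	DETACH_NONE,          /* nobody is notified or woken */
	DETACH_NOTIFY_PARENT, /* do_notify_parent(p, p->exit_signal) */
	DETACH_WAKE_TRACER,   /* __wake_up_parent(p, tracer) */
};

struct detach_result {
	bool dead;
	enum detach_action action;
};

/* Mirrors the control flow of the pre-patch hunk quoted above. */
static struct detach_result model_ptrace_detach(const struct detach_model *m)
{
	struct detach_result r = {
		.dead = !m->group_leader,
		.action = DETACH_NONE,
	};

	if (!r.dead && m->group_empty) {
		if (m->reparented) {
			/* Tracee was ptrace_reparented(): tell its natural parent. */
			r.action = DETACH_NOTIFY_PARENT;
			r.dead = m->parent_autoreaps;
		} else if (m->tracer_ignores_chld) {
			/* Natural child of a tracer that ignores SIGCHLD. */
			r.action = DETACH_WAKE_TRACER;
			r.dead = true;
		}
	}
	return r;
}
```

In this model the proposed replacement would correspond to always taking
the DETACH_NOTIFY_PARENT arm, which is the difference Oleg objected to
for the natural-child case.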

> I'll recheck. Eric, I'll continue to read this series tomorrow, can't
> concentrate on ptrace today.

No worries.  This was entirely too close to the merge window so I
dropped it all until today.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-05-20  5:24                       ` Kyle Huey
@ 2022-06-06 16:12                         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-06 16:12 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Sebastian Andrzej Siewior, LKML, rjw, oleg, mingo,
	vincent.guittot, dietmar.eggemann, rostedt, mgorman, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Jann Horn, Kees Cook, linux-ia64,
	Robert O'Callahan, Richard Henderson, Ivan Kokshaysky,
	Matt Turner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Douglas Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras

Kyle Huey <khuey@pernos.co> writes:

> On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>>
>> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> >> cleanly to Linus's tip.
>> >
>> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>>
>> Yes that is the branch this all applies to.
>>
>> This is my second round of cleanups this cycle for this code.
>> I just keep finding little things that deserve to be changed,
>> when I am working on the more substantial issues.
>>
>> Eric
>
> When running the rr test suite, I see hangs like this

Thanks.  I will dig into this.

Is there an easy way I can run the rr test suite to see if I can
reproduce this myself?

Thanks,
Eric

>
> [  812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s!
> [condvar_stress-:12152]
> [  812.151529] Modules linked in: snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio rfcomm cmac algif_hash
> algif_skcipher af_alg bnep dm_crypt intel_rapl_msr mei_hdcp
> snd_hda_codec_hdmi intel_rapl_common snd_hda_intel x86_pkg_temp_thermal
> snd_intel_dspcfg snd_intel_sdw_acpi nls_iso8859_1 intel_powerclamp
> snd_hda_codec coretemp snd_hda_core snd_hwdep snd_pcm rtl8723be
> btcoexist snd_seq_midi snd_seq_midi_event rtl8723_common kvm_intel
> rtl_pci snd_rawmidi rtlwifi btusb btrtl btbcm snd_seq kvm mac80211
> btintel btmtk snd_seq_device rapl bluetooth snd_timer intel_cstate
> hp_wmi cfg80211 serio_raw snd platform_profile
> ecdh_generic mei_me sparse_keymap efi_pstore wmi_bmof ee1004 joydev
> input_leds ecc libarc4 soundcore mei acpi_pad mac_hid sch_fq_codel
> ipmi_devintf ipmi_msghandler msr vhost_vsock
> vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb
> tap vhci_hcd usbip_core parport_pc ppdev lp parport ip_tables x_tables
> autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress
> [  812.151570]  libcrc32c hid_generic usbhid hid i915 drm_buddy
> i2c_algo_bit ttm drm_dp_helper cec rc_core crct10dif_pclmul
> drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect
> sysimgblt fb_sys_fops aesni_intel crypto_simd cryptd r8169
> psmouse drm i2c_i801 realtek ahci i2c_smbus xhci_pci libahci
> xhci_pci_renesas wmi video
> [  812.151584] CPU: 3 PID: 12152 Comm: condvar_stress- Tainted: G
>     I  L    5.18.0-rc1+ #2
> [  812.151586] Hardware name: HP 750-280st/2B4B, BIOS A0.11 02/24/2016
> [  812.151587] RIP: 0010:_raw_spin_unlock_irq+0x15/0x40
> [  812.151591] Code: df e8 3f 1f 4a ff 90 5b 5d c3 66 66 2e 0f 1f 84
> 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f
> 44 00 00 <bf> 01 00 00 00 e8 41 95 46 ff 65 8b 05 9a c1 9a 5f 85 c0
> 74 02 5d
> [  812.151593] RSP: 0018:ffffa863c246bd70 EFLAGS: 00000246
> [  812.151594] RAX: ffff8bc0913f6400 RBX: ffff8bc0913f6400 RCX: 0000000000000000
> [  812.151595] RDX: 0000000000000002 RSI: 00000000000a0013 RDI: ffff8bc089b63180
> [  812.151596] RBP: ffffa863c246bd70 R08: ffff8bc0811d6b40 R09: ffff8bc089b63180
> [  812.151597] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8bc0913f6400
> [  812.151597] R13: ffff8bc089b63180 R14: ffff8bc0913f6400 R15: ffffa863c246be68
> [  812.151598] FS:  00007f612dda5700(0000) GS:ffff8bc7e24c0000(0000)
> knlGS:0000000000000000
> [  812.151599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  812.151600] CR2: 000055e70715692e CR3: 000000010b4e8005 CR4: 00000000003706e4
> [  812.151601] Call Trace:
> [  812.151602]  <TASK>
> [  812.151604]  do_signal_stop+0x228/0x260
> [  812.151606]  get_signal+0x43a/0x8e0
> [  812.151608]  arch_do_signal_or_restart+0x37/0x7d0
> [  812.151610]  ? __this_cpu_preempt_check+0x13/0x20
> [  812.151612]  ? __perf_event_task_sched_in+0x81/0x230
> [  812.151616]  ? __this_cpu_preempt_check+0x13/0x20
> [  812.151617]  exit_to_user_mode_prepare+0x130/0x1a0
> [  812.151620]  syscall_exit_to_user_mode+0x26/0x40
> [  812.151621]  ret_from_fork+0x15/0x30
> [  812.151623] RIP: 0033:0x7f612dfcd125
> [  812.151625] Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89
> 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00
> 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff d0 48 89 c7 b8 3c
> 00 00 00
> [  812.151626] RSP: 002b:00007f612dda4fb0 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000038
> [  812.151628] RAX: 0000000000000000 RBX: 00007f612dda5700 RCX: ffffffffffffffff
> [  812.151628] RDX: 00007f612dda59d0 RSI: 00007f612dda4fb0 RDI: 00000000003d0f00
> [  812.151629] RBP: 00007ffd59ad20b0 R08: 00007f612dda5700 R09: 00007f612dda5700
> [  812.151630] R10: 00007f612dda59d0 R11: 0000000000000246 R12: 00007ffd59ad20ae
> [  812.151631] R13: 00007ffd59ad20af R14: 00007ffd59ad20b0 R15: 00007f612dda4fc0
> [  812.151632]  </TASK>
>
> - Kyle

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 07/16] signal: Wake up the designated parent
  2022-05-25 14:28                       ` Oleg Nesterov
  (?)
@ 2022-06-06 22:10                         ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-06 22:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/24, Oleg Nesterov wrote:
>>
>> On 05/24, Oleg Nesterov wrote:
>> >
>> > I fail to understand this patch...
>> >
>> > On 05/18, Eric W. Biederman wrote:
>> > >
>> > > Today if a process is ptraced only the ptracer will ever be woken up in
>> > > wait
>> >
>> > and why is this wrong?
>> >
>> > > Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
>> >
>> > how does this change fix 75b95953a569?
>>
>> OK, I guess you mean the 2nd do_notify_parent_cldstop() in ptrace_stop(),
>> the problematic case is current->ptrace == T. Right?
>>
>> I dislike this patch anyway, but let me think more about it.
>
> OK, now that I understand the problem, the patch doesn't look bad to me,
> although I'd ask to make the changelog more clear.

I will see what I can do.

> After this change __wake_up_parent() can't accept any "parent" from
> p->parent thread group, but all callers look fine except
> ptrace_detach().

Having looked at it a little more I think the change was too
restrictive.  For the !ptrace_reparented case there are possibly
two threads of the parent process that wait_consider_task will
allow to wait even with __WNOTHREAD specified.  It is desirable
to wake them both up.

Which if I have had enough sleep reduces this patch to just:

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..c8156366b722 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1431,8 +1431,10 @@ static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
        if (!eligible_pid(wo, p))
                return 0;
 
-       if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
-               return 0;
+       if ((wo->wo_flags & __WNOTHREAD) &&
+           (wait->private != p->parent) &&
+           (wait->private != p->real_parent))
+                       return 0;
 
        return default_wake_function(wait, mode, sync, key);
 }


I think that solves the issue: no wake-ups are missed, and no
unnecessary ones are added.
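
For illustration only (not part of the patch): a minimal userspace sketch
of the check before and after the change above. The struct, the function
names and the SKETCH_WNOTHREAD constant are hypothetical stand-ins, not
kernel definitions; only the shape of the condition is taken from the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's __WNOTHREAD wait flag (assumed value). */
#define SKETCH_WNOTHREAD 0x20000000u

/* Hypothetical, stripped-down task: only the two parent links matter here. */
struct sketch_task {
	const struct sketch_task *parent;      /* the tracer while ptraced */
	const struct sketch_task *real_parent; /* the natural parent */
};

/* Check before the change: a __WNOTHREAD waiter is woken only if it is p->parent. */
static bool wake_old(unsigned int wo_flags, const struct sketch_task *waiter,
		     const struct sketch_task *p)
{
	if ((wo_flags & SKETCH_WNOTHREAD) && waiter != p->parent)
		return false;
	return true;
}

/* Proposed check: a __WNOTHREAD waiter that is p->real_parent is woken as well. */
static bool wake_new(unsigned int wo_flags, const struct sketch_task *waiter,
		     const struct sketch_task *p)
{
	if ((wo_flags & SKETCH_WNOTHREAD) &&
	    waiter != p->parent &&
	    waiter != p->real_parent)
		return false;
	return true;
}
```

With a traced child whose parent is the tracer thread and whose
real_parent is a different thread, wake_old() lets only the tracer through
when the flag is set, while wake_new() lets both through. Any third thread
waiting with the flag is still skipped by both versions.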

For the same set of reasons it looks like the __wake_up_parent in
__ptrace_detach is simply dead code.  I don't think there is a case
where, when !ptrace_reparented, the thread that is the real_parent can
sleep in do_wait when the thread that was calling ptrace could not.

That needs a very close look to confirm. 

Eric


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace
  2022-05-24 15:27                   ` Oleg Nesterov
  (?)
@ 2022-06-06 22:16                     ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-06 22:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

Oleg Nesterov <oleg@redhat.com> writes:

> On 05/18, Eric W. Biederman wrote:
>>
>> The code in ptrace_signal to populate siginfo if the signal number
>> changed is buggy.  If the tracer continued the tracee using
>> ptrace_detach it is guaranteed to use the real_parent (or possibly a
>> new tracer) but definitely not the original tracer to populate si_pid
>> and si_uid.
>
> I guess nobody cares. As the comment says
>
> 	 If the debugger wanted something
> 	 specific in the siginfo structure then it should
> 	 have updated *info via PTRACE_SETSIGINFO.
>
> otherwise I don't think si_pid/si_uid have any value.

No one has complained, so clearly no one cares.  So it is
definitely not a regression.  Or even anything that needs to be
backported.

However, with SI_USER, si_pid and si_uid are defined to identify
whoever sent the signal.  So I would argue that by definition
those values are wrong.

> However the patch looks fine to me, just the word "buggy" looks a bit
> too strong imo.

I guess I am in general agreement.  Perhaps I can just say the values
are wrong by definition?

Eric



^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 07/16] signal: Wake up the designated parent
  2022-06-06 22:10                         ` Eric W. Biederman
  (?)
@ 2022-06-07 15:26                           ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-06-07 15:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 06/06, Eric W. Biederman wrote:
>
> Which if I have had enough sleep reduces this patch to just:
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f072959fcab7..c8156366b722 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1431,8 +1431,10 @@ static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
>         if (!eligible_pid(wo, p))
>                 return 0;
>
> -       if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
> -               return 0;
> +       if ((wo->wo_flags & __WNOTHREAD) &&
> +           (wait->private != p->parent) &&
> +           (wait->private != p->real_parent))
> +                       return 0;
>
>         return default_wake_function(wait, mode, sync, key);
>  }
>
>
> I think that solves the issue without missing wake-ups without adding
> any more.

Agreed, and looks much simpler.

> For the same set of reasons it looks like the __wake_up_parent in
> __ptrace_detach is just simply dead code.  I don't think there is a case
> where when !ptrace_reparented the thread that is the real_parent can
> sleep in do_wait when the thread that was calling ptrace could not.

Yes... this doesn't really differ from the case when one thread reaps
a natural child and another thread sleeps in do_wait().

> That needs a very close look to confirm.

Yes.

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace
  2022-06-06 22:16                     ` Eric W. Biederman
  (?)
@ 2022-06-07 15:29                       ` Oleg Nesterov
  -1 siblings, 0 replies; 572+ messages in thread
From: Oleg Nesterov @ 2022-06-07 15:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, mingo, vincent.guittot, dietmar.eggemann,
	rostedt, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Peter Zijlstra, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um, Chris Zankel, Max Filippov, linux-xtensa, Kees Cook,
	Jann Horn, linux-ia64, Robert OCallahan, Kyle Huey,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Jason Wessel,
	Daniel Thompson, Douglas Anderson, Douglas Miller,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras

On 06/06, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > However the patch looks fine to me, just the word "buggy" looks a bit
> > too strong imo.
>
> I guess I am in general agreement.  Perhaps I can just say the values
> are wrong by definition?

Up to you. I won't really argue with "buggy".

Oleg.


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
  2022-06-06 16:12                         ` Eric W. Biederman
  (?)
@ 2022-06-09 19:59                           ` Kyle Huey
  -1 siblings, 0 replies; 572+ messages in thread
From: Kyle Huey @ 2022-06-09 19:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Sebastian Andrzej Siewior, LKML, rjw, oleg, mingo,
	vincent.guittot, dietmar.eggemann, rostedt, mgorman, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Jann Horn, Kees Cook, linux-ia64,
	Robert O'Callahan, Richard Henderson, Ivan Kokshaysky,
	Matt Turner, Jason Wessel, Daniel Thompson, Douglas Anderson,
	Douglas Miller, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras

On Mon, Jun 6, 2022 at 9:12 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Kyle Huey <khuey@pernos.co> writes:
>
> > On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >>
> >> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> >>
> >> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> >> cleanly to Linus's tip.
> >> >
> >> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
> >>
> >> Yes that is the branch this all applies to.
> >>
> >> This is my second round of cleanups this cycle for this code.
> >> I just keep finding little things that deserve to be changed,
> >> when I am working on the more substantial issues.
> >>
> >> Eric
> >
> > When running the rr test suite, I see hangs like this
>
> Thanks.  I will dig into this.
>
> Is there an easy way I can run the rr test suite to see if I can
> reproduce this myself?

It should be straightforward:
1. git clone https://github.com/rr-debugger/rr.git
2. mkdir obj-rr && cd obj-rr
3. cmake ../rr
4. make -jN
5. make check

If you have trouble with it feel free to email me off list.

- Kyle

> Thanks,
> Eric
>
> >
> > [  812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s! [condvar_stress-:12152]
> > [  812.151529] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rfcomm cmac algif_hash algif_skcipher af_alg bnep dm_crypt intel_rapl_msr mei_hdcp snd_hda_codec_hdmi intel_rapl_common snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_intel_sdw_acpi nls_iso8859_1 intel_powerclamp snd_hda_codec coretemp snd_hda_core snd_hwdep snd_pcm rtl8723be btcoexist snd_seq_midi snd_seq_midi_event rtl8723_common kvm_intel rtl_pci snd_rawmidi rtlwifi btusb btrtl btbcm snd_seq kvm mac80211 btintel btmtk snd_seq_device rapl bluetooth snd_timer intel_cstate hp_wmi cfg80211 serio_raw snd platform_profile ecdh_generic mei_me sparse_keymap efi_pstore wmi_bmof ee1004 joydev input_leds ecc libarc4 soundcore mei acpi_pad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap vhci_hcd usbip_core parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress
> > [  812.151570]  libcrc32c hid_generic usbhid hid i915 drm_buddy i2c_algo_bit ttm drm_dp_helper cec rc_core crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel crypto_simd cryptd r8169 psmouse drm i2c_i801 realtek ahci i2c_smbus xhci_pci libahci xhci_pci_renesas wmi video
> > [  812.151584] CPU: 3 PID: 12152 Comm: condvar_stress- Tainted: G     I  L    5.18.0-rc1+ #2
> > [  812.151586] Hardware name: HP 750-280st/2B4B, BIOS A0.11 02/24/2016
> > [  812.151587] RIP: 0010:_raw_spin_unlock_irq+0x15/0x40
> > [  812.151591] Code: df e8 3f 1f 4a ff 90 5b 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 41 95 46 ff 65 8b 05 9a c1 9a 5f 85 c0 74 02 5d
> > [  812.151593] RSP: 0018:ffffa863c246bd70 EFLAGS: 00000246
> > [  812.151594] RAX: ffff8bc0913f6400 RBX: ffff8bc0913f6400 RCX: 0000000000000000
> > [  812.151595] RDX: 0000000000000002 RSI: 00000000000a0013 RDI: ffff8bc089b63180
> > [  812.151596] RBP: ffffa863c246bd70 R08: ffff8bc0811d6b40 R09: ffff8bc089b63180
> > [  812.151597] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8bc0913f6400
> > [  812.151597] R13: ffff8bc089b63180 R14: ffff8bc0913f6400 R15: ffffa863c246be68
> > [  812.151598] FS:  00007f612dda5700(0000) GS:ffff8bc7e24c0000(0000) knlGS:0000000000000000
> > [  812.151599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  812.151600] CR2: 000055e70715692e CR3: 000000010b4e8005 CR4: 00000000003706e4
> > [  812.151601] Call Trace:
> > [  812.151602]  <TASK>
> > [  812.151604]  do_signal_stop+0x228/0x260
> > [  812.151606]  get_signal+0x43a/0x8e0
> > [  812.151608]  arch_do_signal_or_restart+0x37/0x7d0
> > [  812.151610]  ? __this_cpu_preempt_check+0x13/0x20
> > [  812.151612]  ? __perf_event_task_sched_in+0x81/0x230
> > [  812.151616]  ? __this_cpu_preempt_check+0x13/0x20
> > [  812.151617]  exit_to_user_mode_prepare+0x130/0x1a0
> > [  812.151620]  syscall_exit_to_user_mode+0x26/0x40
> > [  812.151621]  ret_from_fork+0x15/0x30
> > [  812.151623] RIP: 0033:0x7f612dfcd125
> > [  812.151625] Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff d0 48 89 c7 b8 3c 00 00 00
> > [  812.151626] RSP: 002b:00007f612dda4fb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
> > [  812.151628] RAX: 0000000000000000 RBX: 00007f612dda5700 RCX: ffffffffffffffff
> > [  812.151628] RDX: 00007f612dda59d0 RSI: 00007f612dda4fb0 RDI: 00000000003d0f00
> > [  812.151629] RBP: 00007ffd59ad20b0 R08: 00007f612dda5700 R09: 00007f612dda5700
> > [  812.151630] R10: 00007f612dda59d0 R11: 0000000000000246 R12: 00007ffd59ad20ae
> > [  812.151631] R13: 00007ffd59ad20af R14: 00007ffd59ad20b0 R15: 00007f612dda4fc0
> > [  812.151632]  </TASK>
> >
> > - Kyle

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-05-05 18:26               ` [PATCH v4 12/12] sched, signal, ptrace: " Eric W. Biederman
  (?)
@ 2022-06-21 13:00                 ` Alexander Gordeev
  -1 siblings, 0 replies; 572+ messages in thread
From: Alexander Gordeev @ 2022-06-21 13:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 1874 bytes --]

On Thu, May 05, 2022 at 01:26:45PM -0500, Eric W. Biederman wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> Currently ptrace_stop() / do_signal_stop() rely on the special states
> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> state exists only in task->__state and nowhere else.
> 
> There's two spots of bother with this:
> 
>  - PREEMPT_RT has task->saved_state which complicates matters,
>    meaning task_is_{traced,stopped}() needs to check an additional
>    variable.
> 
>  - An alternative freezer implementation that itself relies on a
>    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
>    result in misbehaviour.
> 
> As such, add additional state to task->jobctl to track this state
> outside of task->__state.
> 
> NOTE: this doesn't actually fix anything yet, just adds extra state.
> 
> --EWB
>   * didn't add an unnecessary newline in signal.h
>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
>     instead of in signal_wake_up_state.  This prevents the clearing
>     of TASK_STOPPED and TASK_TRACED from getting lost.
>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
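
To make the quoted changelog concrete, here is a minimal userspace sketch of
why a second copy of the stop state helps. Everything in it is an assumption
for illustration: the MODEL_* values, struct model_task and the model_*
helpers are hypothetical names and are not the kernel's definitions of
task->__state or task->jobctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for illustration only; not the kernel's constants. */
#define MODEL_TASK_RUNNING  0x0u
#define MODEL_TASK_TRACED   0x1u
#define MODEL_TASK_FROZEN   0x2u
#define MODEL_JOBCTL_TRACED 0x1u

/* Stand-in for the two task_struct fields the changelog talks about. */
struct model_task {
	unsigned int state;	/* models task->__state */
	unsigned int jobctl;	/* models task->jobctl */
};

/* Entering a ptrace stop records the fact in both places. */
static void model_ptrace_stop(struct model_task *t)
{
	t->jobctl |= MODEL_JOBCTL_TRACED;
	t->state = MODEL_TASK_TRACED;
}

/* A freezer that uses its own special state overwrites ->state. */
static void model_freeze(struct model_task *t)
{
	t->state = MODEL_TASK_FROZEN;
}

/* Waking the tracee clears the jobctl bit and makes it runnable. */
static void model_ptrace_wake(struct model_task *t)
{
	t->jobctl &= ~MODEL_JOBCTL_TRACED;
	t->state = MODEL_TASK_RUNNING;
}

static bool model_traced_by_state(const struct model_task *t)
{
	return t->state == MODEL_TASK_TRACED;
}

static bool model_traced_by_jobctl(const struct model_task *t)
{
	return (t->jobctl & MODEL_JOBCTL_TRACED) != 0;
}

/* Walks through stop -> freeze -> wake; returns 0 if the model holds. */
static int model_demo(void)
{
	struct model_task t = { MODEL_TASK_RUNNING, 0 };

	model_ptrace_stop(&t);
	assert(model_traced_by_state(&t) && model_traced_by_jobctl(&t));

	/* The freezer clobbers ->state, so the state-only check goes blind. */
	model_freeze(&t);
	assert(!model_traced_by_state(&t));
	assert(model_traced_by_jobctl(&t));

	model_ptrace_wake(&t);
	assert(!model_traced_by_jobctl(&t));
	return 0;
}
```

In this model the state-only check stops reporting the tracee as traced once
the freezer has overwritten the state word, while the jobctl-style bit
survives until the explicit wake-up clears it.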

Hi Eric, Peter,

On s390 this patch triggers a warning at kernel/ptrace.c:272 when the
kill_child test case from the strace tool is run repeatedly (the source
is attached for reference):

while :; do
	strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child
done

It normally takes a few minutes to cause the warning in -rc3, but FWIW
it hits almost immediately for the ptrace_stop-cleanup-for-v5.19 tag of
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.

Commit 7b0fe1367ef2 ("ptrace: Document that wait_task_inactive can't
fail") suggests this WARN_ON_ONCE() is not really expected, yet we
observe a child in __TASK_TRACED state. Could you please comment here?

Thanks!

[-- Attachment #2: kill_child.c --]
[-- Type: text/plain, Size: 1305 bytes --]

/*
 * Check for the corner case that previously led to a segfault
 * due to an attempt to access uninitialised tcp->s_ent.
 *
 * 13994 ????( <unfinished ...>
 * ...
 * 13994 <... ???? resumed>) = ?
 *
 * Copyright (c) 2019 The strace developers.
 * All rights reserved.
 *
 * SPDX-License-Identifier: GPL-2.0-or-later
 */

#include "tests.h"

#include <sched.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define ITERS    10000
#define SC_ITERS 10000

int
main(void)
{
	volatile sig_atomic_t *const mem =
		mmap(NULL, get_page_size(), PROT_READ | PROT_WRITE,
		     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		perror_msg_and_fail("mmap");

	for (unsigned int i = 0; i < ITERS; ++i) {
		mem[0] = mem[1] = 0;

		const pid_t pid = fork();
		if (pid < 0)
			perror_msg_and_fail("fork");

		if (!pid) {
			/* wait for the parent */
			while (!mem[0])
				;
			/* let the parent know we are running */
			mem[1] = 1;

			for (unsigned int j = 0; j < SC_ITERS; j++)
				sched_yield();

			pause();
			return 0;
		}

		/* let the child know we are running */
		mem[0] = 1;
		/* wait for the child */
		while (!mem[1])
			;

		if (kill(pid, SIGKILL))
			perror_msg_and_fail("kill");
		if (wait(NULL) != pid)
			perror_msg_and_fail("wait");
	}

	return 0;
}
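
The attachment above depends on strace's tests.h for get_page_size() and
perror_msg_and_fail(). For trying the reproducer outside the strace tree, the
following is a rough standalone sketch of the same fork/kill loop. It is an
assumption, not the strace source: run_kill_child() is a hypothetical name,
sysconf(_SC_PAGESIZE) stands in for get_page_size(), and perror() plus an
error return stands in for perror_msg_and_fail().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical standalone variant of the attached test.
 * Forks `iters` children; each one handshakes with the parent through a
 * shared page, calls sched_yield() `sc_iters` times, then pauses until the
 * parent kills it with SIGKILL.  Returns 0 on success, -1 on any failure.
 */
static int run_kill_child(unsigned int iters, unsigned int sc_iters)
{
	assert(iters > 0);

	const long page = sysconf(_SC_PAGESIZE);
	if (page <= 0) {
		perror("sysconf");
		return -1;
	}

	volatile sig_atomic_t *const mem =
		mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	int ret = 0;
	for (unsigned int i = 0; i < iters && ret == 0; ++i) {
		mem[0] = mem[1] = 0;

		const pid_t pid = fork();
		if (pid < 0) {
			perror("fork");
			ret = -1;
			break;
		}

		if (!pid) {
			/* child: wait for the parent, then announce ourselves */
			while (!mem[0])
				;
			mem[1] = 1;

			for (unsigned int j = 0; j < sc_iters; j++)
				sched_yield();

			pause();
			_exit(0);
		}

		/* parent: release the child and wait until it is running */
		mem[0] = 1;
		while (!mem[1])
			;

		if (kill(pid, SIGKILL)) {
			perror("kill");
			ret = -1;
		} else if (wait(NULL) != pid) {
			perror("wait");
			ret = -1;
		}
	}

	munmap((void *)mem, (size_t)page);
	return ret;
}
```

To mirror the attached test, call run_kill_child(10000, 10000) from main() and
run the resulting binary under the strace loop shown earlier in this message.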

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-21 13:00                 ` Alexander Gordeev
@ 2022-06-21 14:02                   ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-21 14:02 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

Alexander Gordeev <agordeev@linux.ibm.com> writes:

> On Thu, May 05, 2022 at 01:26:45PM -0500, Eric W. Biederman wrote:
>> From: Peter Zijlstra <peterz@infradead.org>
>> 
>> Currently ptrace_stop() / do_signal_stop() rely on the special states
>> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
>> state exists only in task->__state and nowhere else.
>> 
>> There's two spots of bother with this:
>> 
>>  - PREEMPT_RT has task->saved_state which complicates matters,
>>    meaning task_is_{traced,stopped}() needs to check an additional
>>    variable.
>> 
>>  - An alternative freezer implementation that itself relies on a
>>    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
>>    result in misbehaviour.
>> 
>> As such, add additional state to task->jobctl to track this state
>> outside of task->__state.
>> 
>> NOTE: this doesn't actually fix anything yet, just adds extra state.
>> 
>> --EWB
>>   * didn't add an unnecessary newline in signal.h
>>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
>>     instead of in signal_wake_up_state.  This prevents the clearing
>>     of TASK_STOPPED and TASK_TRACED from getting lost.
>>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
>
> Hi Eric, Peter,
>
> On s390 this patch triggers warning at kernel/ptrace.c:272 when
> kill_child testcase from strace tool is repeatedly used (the source
> is attached for reference):
>
> while :; do
> 	strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child
> done
>
> It normally takes a few minutes to cause the warning in -rc3, but FWIW
> it hits almost immediately for ptrace_stop-cleanup-for-v5.19 tag of
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.
>
> Commit 7b0fe1367ef2 ("ptrace: Document that wait_task_inactive can't
> fail") suggests this WARN_ON_ONCE() is not really expected, yet we
> observe a child in __TASK_TRACED state. Could you please comment here?
>

For clarity, the warning fires when the child is not in the __TASK_TRACED
state.

The code is waiting for the child to stop in the scheduler in the
__TASK_TRACED state so that it can safely read and change the
process's state.  Some of that state is not even saved until the
process is scheduled out, so we have to wait until the process
is stopped in the scheduler.

At least on s390 it looks like there is a race between SIGKILL and
ptrace_check_attach.  That isn't good.

Reading the code below, something is missing: I don't see anything
making ptrace calls, and ptrace_check_attach (which contains the
warning) is only reached from the ptrace syscall.

Eric



> Thanks!
>
> /*
>  * Check for the corner case that previously led to a segfault
>  * due to an attempt to access uninitialised tcp->s_ent.
>  *
>  * 13994 ????( <unfinished ...>
>  * ...
>  * 13994 <... ???? resumed>) = ?
>  *
>  * Copyright (c) 2019 The strace developers.
>  * All rights reserved.
>  *
>  * SPDX-License-Identifier: GPL-2.0-or-later
>  */
>
> #include "tests.h"
>
> #include <sched.h>
> #include <signal.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
>
> #define ITERS    10000
> #define SC_ITERS 10000
>
> int
> main(void)
> {
> 	volatile sig_atomic_t *const mem =
> 		mmap(NULL, get_page_size(), PROT_READ | PROT_WRITE,
> 		     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> 	if (mem == MAP_FAILED)
> 		perror_msg_and_fail("mmap");
>
> 	for (unsigned int i = 0; i < ITERS; ++i) {
> 		mem[0] = mem[1] = 0;
>
> 		const pid_t pid = fork();
> 		if (pid < 0)
> 			perror_msg_and_fail("fork");
>
> 		if (!pid) {
> 			/* wait for the parent */
> 			while (!mem[0])
> 				;
> 			/* let the parent know we are running */
> 			mem[1] = 1;
>
> 			for (unsigned int j = 0; j < SC_ITERS; j++)
> 				sched_yield();
>
> 			pause();
> 			return 0;
> 		}
>
> 		/* let the child know we are running */
> 		mem[0] = 1;
> 		/* wait for the child */
> 		while (!mem[1])
> 			;
>
> 		if (kill(pid, SIGKILL))
> 			perror_msg_and_fail("kill");
> 		if (wait(NULL) != pid)
> 			perror_msg_and_fail("wait");
> 	}
>
> 	return 0;
> }

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-21 14:02                   ` Eric W. Biederman
  (?)
@ 2022-06-21 15:15                     ` Alexander Gordeev
  -1 siblings, 0 replies; 572+ messages in thread
From: Alexander Gordeev @ 2022-06-21 15:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

On Tue, Jun 21, 2022 at 09:02:05AM -0500, Eric W. Biederman wrote:
> Alexander Gordeev <agordeev@linux.ibm.com> writes:
> 
> > On Thu, May 05, 2022 at 01:26:45PM -0500, Eric W. Biederman wrote:
> >> From: Peter Zijlstra <peterz@infradead.org>
> >> 
> >> Currently ptrace_stop() / do_signal_stop() rely on the special states
> >> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> >> state exists only in task->__state and nowhere else.
> >> 
> >> There's two spots of bother with this:
> >> 
> >>  - PREEMPT_RT has task->saved_state which complicates matters,
> >>    meaning task_is_{traced,stopped}() needs to check an additional
> >>    variable.
> >> 
> >>  - An alternative freezer implementation that itself relies on a
> >>    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
> >>    result in misbehaviour.
> >> 
> >> As such, add additional state to task->jobctl to track this state
> >> outside of task->__state.
> >> 
> >> NOTE: this doesn't actually fix anything yet, just adds extra state.
> >> 
> >> --EWB
> >>   * didn't add an unnecessary newline in signal.h
> >>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
> >>     instead of in signal_wake_up_state.  This prevents the clearing
> >>     of TASK_STOPPED and TASK_TRACED from getting lost.
> >>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
> >
> > Hi Eric, Peter,
> >
> > On s390 this patch triggers warning at kernel/ptrace.c:272 when
> > kill_child testcase from strace tool is repeatedly used (the source
> > is attached for reference):
> >
> > while :; do
> > 	strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child
> > done
> >
> > It normally takes a few minutes to cause the warning in -rc3, but FWIW
> > it hits almost immediately for ptrace_stop-cleanup-for-v5.19 tag of
> > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.
> >
> > Commit 7b0fe1367ef2 ("ptrace: Document that wait_task_inactive can't
> > fail") suggests this WARN_ON_ONCE() is not really expected, yet we
> > observe a child in __TASK_TRACED state. Could you please comment here?
> >
> 
> For clarity the warning is that the child is not in __TASK_TRACED state.
> 
> The code is waiting for the code to stop in the scheduler in the
> __TASK_TRACED state so that it can safely read and change the
> processes state.  Some of that state is not even saved until the
> process is scheduled out so we have to wait until the process
> is stopped in the scheduler.

So I assume (and have actually checked) that the "return 0" below, in
wait_task_inactive() from kernel/sched/core.c, is where it bails out:

	while (task_running(rq, p)) {
		if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
			return 0;
		cpu_relax();
	}
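
As a rough user-space model of the loop above (an illustration only, not
kernel code): wait_inactive_model() and its arguments are made up for this
sketch, and the real function does more work after the loop. What the model
shows is that a single mismatching read of the state, taken while the task
is still on a CPU, is enough to end the wait with 0, whatever state the
task reaches afterwards.

```c
#include <stddef.h>

/*
 * Hypothetical model of the polling loop: polled_states[] holds the values
 * that successive reads of the task state would return while the task is
 * still running on a CPU. Returns 0 on the first mismatch (the "bail out"),
 * 1 if every read matched. A match_state of 0 disables the check, as in the
 * quoted condition.
 */
int wait_inactive_model(const unsigned int *polled_states, size_t n_polls,
			unsigned int match_state)
{
	for (size_t i = 0; i < n_polls; i++) {
		if (match_state && polled_states[i] != match_state)
			return 0;
	}
	return 1;
}
```

With reads of {0, 8} and a match_state of 8 the model returns 0 even though
the last value read is 8, so a snapshot taken later would only show the
final state.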

Yet, the child task is always found in __TASK_TRACED state (as seen
in crash dumps):

> 101447  11342  13      ce3a8100      RU   0.0   10040   4412  strace
  101450  101447   0      bb04b200      TR   0.0    2272   1136  kill_child
  108261  101447   2      d0b10100      TR   0.0    2272    532  kill_child
crash> task bb04b200 __state
PID: 101450  TASK: bb04b200          CPU: 0   COMMAND: "kill_child"
  __state = 8,

crash> task d0b10100 __state
PID: 108261  TASK: d0b10100          CPU: 2   COMMAND: "kill_child"
  __state = 8,
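
For reference, the raw value 8 can be decoded with a small helper. The bit
values below are believed to match include/linux/sched.h in kernels of this
era, but they are an assumption and should be checked against the tree
being debugged; decode_task_state() is a made-up name for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

struct state_bit {
	unsigned int mask;
	const char *name;
};

/* Assumed task state bits (verify against include/linux/sched.h). */
static const struct state_bit state_bits[] = {
	{ 0x0001, "TASK_INTERRUPTIBLE" },
	{ 0x0002, "TASK_UNINTERRUPTIBLE" },
	{ 0x0004, "__TASK_STOPPED" },
	{ 0x0008, "__TASK_TRACED" },
	{ 0x0100, "TASK_WAKEKILL" },
};

/*
 * Write a '|'-separated list of the known bits set in state into buf.
 * Bits not listed in state_bits[] are ignored. Returns buf.
 */
const char *decode_task_state(unsigned int state, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	if (state == 0) {
		snprintf(buf, len, "TASK_RUNNING");
		return buf;
	}
	for (size_t i = 0; i < sizeof(state_bits) / sizeof(state_bits[0]); i++) {
		if (!(state & state_bits[i].mask))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", state_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output would not fit; stop here */
		used += (size_t)n;
	}
	return buf;
}
```

Under these assumed values, 8 decodes to __TASK_TRACED alone, whereas a
value with TASK_WAKEKILL also set would be 0x108.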

> At least on s390 it looks like there is a race between SIGKILL and
> ptrace_check_attach.  That isn't good.
>
> Reading the code below there is something missing because I don't see
> anything making ptrace calls, and ptrace_check_attach (which contains
> the warning) only happens in the ptrace syscall.

I believe it is strace that makes the ptrace calls when the test is run as:

 	strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child
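
To illustrate where such ptrace calls come from, here is a minimal tracer
sketch. It is not how strace is implemented (strace is considerably more
elaborate); count_syscall_stops() is a hypothetical helper that uses only
the documented PTRACE_TRACEME and PTRACE_SYSCALL requests. Each
PTRACE_SYSCALL request issued against the stopped child is a ptrace
syscall of the kind discussed above, and the final SIGKILL loosely mirrors
what the kill_child test does to its child.

```c
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that spins in sched_yield(), trace it for up to max_stops
 * ptrace stops, then SIGKILL it. Returns the number of stops observed
 * after the initial SIGSTOP, or -1 if the child could not be set up.
 */
int count_syscall_stops(int max_stops)
{
	int status;
	int stops = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: ask to be traced, then stop so the parent takes over. */
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
			_exit(1);
		raise(SIGSTOP);
		for (;;)
			sched_yield();
	}

	/* Wait for the child's initial stop on SIGSTOP. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;	/* child exited before it could be traced */

	while (stops < max_stops) {
		/* Resume the tracee until its next syscall entry or exit. */
		if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0)
			break;
		if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
			break;
		stops++;
	}

	/* Kill the tracee while it is traced, then reap it. */
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return stops;
}
```

Calling count_syscall_stops(4) in a loop gives a much smaller variant of
the strace reproducer; whether it exercises the same race is untested.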

> Eric
> 
> 
> 
> > Thanks!
> >
> > /*
> > >  * Check for the corner case that previously led to a segfault
> > >  * due to an attempt to access uninitialised tcp->s_ent.
> >  *
> >  * 13994 ????( <unfinished ...>
> >  * ...
> >  * 13994 <... ???? resumed>) = ?
> >  *
> >  * Copyright (c) 2019 The strace developers.
> >  * All rights reserved.
> >  *
> >  * SPDX-License-Identifier: GPL-2.0-or-later
> >  */
> >
> > #include "tests.h"
> >
> > #include <sched.h>
> > #include <signal.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> > #include <sys/wait.h>
> >
> > #define ITERS    10000
> > #define SC_ITERS 10000
> >
> > int
> > main(void)
> > {
> > 	volatile sig_atomic_t *const mem =
> > 		mmap(NULL, get_page_size(), PROT_READ | PROT_WRITE,
> > 		     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> > 	if (mem == MAP_FAILED)
> > 		perror_msg_and_fail("mmap");
> >
> > 	for (unsigned int i = 0; i < ITERS; ++i) {
> > 		mem[0] = mem[1] = 0;
> >
> > 		const pid_t pid = fork();
> > 		if (pid < 0)
> > 			perror_msg_and_fail("fork");
> >
> > 		if (!pid) {
> > 			/* wait for the parent */
> > 			while (!mem[0])
> > 				;
> > 			/* let the parent know we are running */
> > 			mem[1] = 1;
> >
> > 			for (unsigned int j = 0; j < SC_ITERS; j++)
> > 				sched_yield();
> >
> > 			pause();
> > 			return 0;
> > 		}
> >
> > 		/* let the child know we are running */
> > 		mem[0] = 1;
> > 		/* wait for the child */
> > 		while (!mem[1])
> > 			;
> >
> > 		if (kill(pid, SIGKILL))
> > 			perror_msg_and_fail("kill");
> > 		if (wait(NULL) != pid)
> > 			perror_msg_and_fail("wait");
> > 	}
> >
> > 	return 0;
> > }

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-21 15:15                     ` Alexander Gordeev
  (?)
@ 2022-06-21 17:47                       ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-21 17:47 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

Alexander Gordeev <agordeev@linux.ibm.com> writes:

> On Tue, Jun 21, 2022 at 09:02:05AM -0500, Eric W. Biederman wrote:
>> Alexander Gordeev <agordeev@linux.ibm.com> writes:
>> 
>> > On Thu, May 05, 2022 at 01:26:45PM -0500, Eric W. Biederman wrote:
>> >> From: Peter Zijlstra <peterz@infradead.org>
>> >> 
>> >> Currently ptrace_stop() / do_signal_stop() rely on the special states
>> >> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
>> >> state exists only in task->__state and nowhere else.
>> >> 
>> >> There's two spots of bother with this:
>> >> 
>> >>  - PREEMPT_RT has task->saved_state which complicates matters,
>> >>    meaning task_is_{traced,stopped}() needs to check an additional
>> >>    variable.
>> >> 
>> >>  - An alternative freezer implementation that itself relies on a
>> >>    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
>> >>    result in misbehaviour.
>> >> 
>> >> As such, add additional state to task->jobctl to track this state
>> >> outside of task->__state.
>> >> 
>> >> NOTE: this doesn't actually fix anything yet, just adds extra state.
>> >> 
>> >> --EWB
>> >>   * didn't add an unnecessary newline in signal.h
>> >>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
>> >>     instead of in signal_wake_up_state.  This prevents the clearing
>> >>     of TASK_STOPPED and TASK_TRACED from getting lost.
>> >>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
>> >
>> > Hi Eric, Peter,
>> >
>> > On s390 this patch triggers warning at kernel/ptrace.c:272 when
>> > kill_child testcase from strace tool is repeatedly used (the source
>> > is attached for reference):
>> >
>> > while :; do
>> > 	strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child
>> > done
>> >
>> > It normally takes a few minutes to cause the warning in -rc3, but FWIW
>> > it hits almost immediately for ptrace_stop-cleanup-for-v5.19 tag of
>> > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.
>> >
>> > Commit 7b0fe1367ef2 ("ptrace: Document that wait_task_inactive can't
>> > fail") suggests this WARN_ON_ONCE() is not really expected, yet we
>> > observe a child in __TASK_TRACED state. Could you please comment here?
>> >
>> 
>> For clarity the warning is that the child is not in __TASK_TRACED state.
>> 
>> The code is waiting for the code to stop in the scheduler in the
>> __TASK_TRACED state so that it can safely read and change the
>> processes state.  Some of that state is not even saved until the
>> process is scheduled out so we have to wait until the process
>> is stopped in the scheduler.
>
> So I assume (checked actually) the return 0 below from kernel/sched/core.c:
> wait_task_inactive() is where it bails out:
>
> 	while (task_running(rq, p)) {
> 		if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> 			return 0;
> 		cpu_relax();
> 	}
>
> Yet, the child task is always found in __TASK_TRACED state (as seen
> in crash dumps):
>
>> 101447  11342  13      ce3a8100      RU   0.0   10040   4412  strace
>   101450  101447   0      bb04b200      TR   0.0    2272   1136  kill_child
>   108261  101447   2      d0b10100      TR   0.0    2272    532  kill_child
> crash> task bb04b200 __state
> PID: 101450  TASK: bb04b200          CPU: 0   COMMAND: "kill_child"
>   __state = 8,
>
> crash> task d0b10100 __state
> PID: 108261  TASK: d0b10100          CPU: 2   COMMAND: "kill_child"
>   __state = 8,

That is weird.

>> At least on s390 it looks like there is a race between SIGKILL and
>> ptrace_check_attach.  That isn't good.
>>
>> Reading the code below there is something missing because I don't see
>> anything making ptrace calls, and ptrace_check_attach (which contains
>> the warning) only happens in the ptrace syscall.
>
> That is what I believe strace does when calling that code:
>
>  	strace -f -qq -e signal=none -e trace=sched_yield,/kill	./kill_child

Thank you.  That was my braino.

I will have to see if it reproduces for me on x86 (I don't have an
s390).  Perhaps if I can reproduce it I can guess what is going wrong.

So far it appears WARN_ON_ONCE has nothing to warn about yet it is
warning.
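
To make the puzzle concrete, here is a minimal userspace sketch of the
bail-out test in the quoted wait_task_inactive() loop, checked against
the values from the crash dump.  The state values mirror the __TASK_*
bits in include/linux/sched.h; every model_ name is made up for
illustration and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Values of the kernel's task state bits (include/linux/sched.h). */
enum {
	MODEL_TASK_RUNNING	= 0x0000,	/* TASK_RUNNING */
	MODEL_TASK_STOPPED	= 0x0004,	/* __TASK_STOPPED */
	MODEL_TASK_TRACED	= 0x0008,	/* __TASK_TRACED */
};

/*
 * Hypothetical model of the test inside the quoted loop.  A true
 * result corresponds to wait_task_inactive() returning 0, which is
 * what makes the WARN_ON_ONCE() in ptrace_check_attach() fire.
 */
static bool model_state_mismatch(unsigned int match_state, unsigned int state)
{
	return match_state && state != match_state;
}

/* Check the values seen in the crash dump against the model. */
static void model_check_dump(void)
{
	/* Both kill_child tasks were dumped with __state = 8. */
	assert(!model_state_mismatch(MODEL_TASK_TRACED, 8));
	/* A task observed in any other state, e.g. running, does not match. */
	assert(model_state_mismatch(MODEL_TASK_TRACED, MODEL_TASK_RUNNING));
}
```

With the dumped value the test does not trip, so the mismatch must have
been transient: the state seen inside the loop differed from the one
captured later in the dump.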

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
  2022-05-05 18:25             ` Eric W. Biederman
                               ` (17 preceding siblings ...)
  (?)
@ 2022-06-22 16:43             ` Eric W. Biederman
  2022-06-22 16:45               ` [PATCH 1/3] signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit Eric W. Biederman
                                 ` (4 more replies)
  -1 siblings, 5 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-22 16:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, mingo, bigeasy, Peter Zijlstra, Jann Horn,
	Kees Cook, Alexander Gordeev, Robert O'Callahan, Kyle Huey,
	Keno Fischer


Recently I had a conversation where it was pointed out to me that
SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
difficult for a tracer to handle.

Keeping SIGKILL working for anything after the process has been killed
is also a real pain from an implementation point of view.

So I am attempting to remove this wart in the userspace API and see
if anyone cares.

Eric W. Biederman (3):
      signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
      signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
      signal: Drop signals received after a fatal signal has been processed

 fs/coredump.c                |  2 +-
 include/linux/sched/signal.h |  1 +
 kernel/exit.c                | 20 +++++++++++++++++++-
 kernel/fork.c                |  2 ++
 kernel/signal.c              |  3 ++-
 5 files changed, 25 insertions(+), 3 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* [PATCH 1/3] signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
  2022-06-22 16:43             ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Eric W. Biederman
@ 2022-06-22 16:45               ` Eric W. Biederman
  2022-06-22 16:46               ` [PATCH 2/3] signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit Eric W. Biederman
                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-22 16:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, mingo, bigeasy, Peter Zijlstra, Jann Horn,
	Kees Cook, Alexander Gordeev, Robert O'Callahan, Kyle Huey,
	Keno Fischer


The function do_group_exit has an optimization that avoids taking
siglock and doing the work to find other threads in the signal group
and shut them down when the exiting thread is the only thread in the
group.

It is very desirable for SIGNAL_GROUP_EXIT to always be set whenever
it is decided for the process to exit.  That ensures only a single
place needs to be tested, and a single bit of state needs to be looked
at.  This makes the optimization in do_group_exit counterproductive.

Make the code and maintenance simpler by removing this unnecessary
optimization.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
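The effect of dropping the fast path can be sketched in userspace as
follows.  This is a simplified model under stated assumptions, not the
kernel function: struct model_signal, MODEL_SIGNAL_GROUP_EXIT and
model_do_group_exit() are hypothetical names, and the group_exec_task
case, siglock and zap_other_threads() are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SIGNAL_GROUP_EXIT 0x4u	/* stand-in for SIGNAL_GROUP_EXIT */

struct model_signal {
	unsigned int flags;
	int group_exit_code;
	int nr_threads;		/* threads in the group, including the caller */
};

/*
 * Model of the branch touched by this patch.  With skip_when_alone set
 * (the old behaviour) a single-threaded caller returns without
 * recording that the group is exiting.
 */
static int model_do_group_exit(struct model_signal *sig, int exit_code,
			       bool skip_when_alone)
{
	assert(sig->nr_threads >= 1);

	if (sig->flags & MODEL_SIGNAL_GROUP_EXIT)
		return sig->group_exit_code;	/* exit already in progress */
	if (skip_when_alone && sig->nr_threads == 1)
		return exit_code;		/* old fast path: flag stays clear */

	/* The kernel holds siglock around these two stores. */
	sig->group_exit_code = exit_code;
	sig->flags = MODEL_SIGNAL_GROUP_EXIT;
	return exit_code;
}
```

In the model, a lone thread calling model_do_group_exit(&sig, 3, true)
leaves sig.flags at 0, while the same call with false sets the flag, so
later code only has to test that one bit.
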
 kernel/exit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..96e4b12edea8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -905,7 +905,7 @@ do_group_exit(int exit_code)
 		exit_code = sig->group_exit_code;
 	else if (sig->group_exec_task)
 		exit_code = 0;
-	else if (!thread_group_empty(current)) {
+	else {
 		struct sighand_struct *const sighand = current->sighand;
 
 		spin_lock_irq(&sighand->siglock);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 2/3] signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
  2022-06-22 16:43             ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Eric W. Biederman
  2022-06-22 16:45               ` [PATCH 1/3] signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit Eric W. Biederman
@ 2022-06-22 16:46               ` Eric W. Biederman
  2022-06-23  7:49                 ` kernel test robot
  2022-06-22 16:47               ` [PATCH 3/3] signal: Drop signals received after a fatal signal has been processed Eric W. Biederman
                                 ` (2 subsequent siblings)
  4 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-22 16:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, mingo, bigeasy, Peter Zijlstra, Jann Horn,
	Kees Cook, Alexander Gordeev, Robert O'Callahan, Kyle Huey,
	Keno Fischer


Track how many threads have not started exiting and when the last
thread starts exiting set SIGNAL_GROUP_EXIT.

This guarantees that SIGNAL_GROUP_EXIT will get set when a process
exits.  In practice this achieves nothing as glibc's implementation of
_exit calls sys_exit_group then sys_exit, while glibc's implementation
of pthread_exit calls exit (which cleans up and calls _exit) if it is
the last thread and sys_exit if it is not the last thread.

This means the only way the kernel might observe a process that does
not call exit_group is if the language runtime does not use glibc.

With more cleanups I hope to move the decrement of quick_threads
earlier.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
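The counting scheme described above can be sketched as a small
userspace model.  All model_ names are hypothetical, siglock is elided,
and group_stop_count is not modelled; the point is only to show that
the flag is set exactly when the last thread starts exiting.

```c
#include <assert.h>

#define MODEL_SIGNAL_GROUP_EXIT 0x4u	/* stand-in for SIGNAL_GROUP_EXIT */

struct model_signal {
	unsigned int flags;
	int group_exit_code;
	int quick_threads;	/* threads that have not started exiting */
};

/* Model of copy_signal(): a new process starts with one such thread. */
static void model_init(struct model_signal *sig)
{
	sig->flags = 0;
	sig->group_exit_code = 0;
	sig->quick_threads = 1;
}

/* Model of the thread (non-leader) branch of copy_process(). */
static void model_add_thread(struct model_signal *sig)
{
	sig->quick_threads++;
}

/* Model of synchronize_group_exit(), called once per exiting thread. */
static void model_thread_exit(struct model_signal *sig, int code)
{
	assert(sig->quick_threads > 0);

	sig->quick_threads--;
	if (sig->quick_threads == 0 &&
	    !(sig->flags & MODEL_SIGNAL_GROUP_EXIT)) {
		sig->flags = MODEL_SIGNAL_GROUP_EXIT;
		sig->group_exit_code = code;
	}
}
```

With two threads, the first model_thread_exit() leaves the flag clear
and the second one sets it and records its own exit code.
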
 include/linux/sched/signal.h |  1 +
 kernel/exit.c                | 18 ++++++++++++++++++
 kernel/fork.c                |  2 ++
 3 files changed, 21 insertions(+)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index cafbe03eed01..20099268fa25 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -94,6 +94,7 @@ struct signal_struct {
 	refcount_t		sigcnt;
 	atomic_t		live;
 	int			nr_threads;
+	int			quick_threads;
 	struct list_head	thread_head;
 
 	wait_queue_head_t	wait_chldexit;	/* for wait4() */
diff --git a/kernel/exit.c b/kernel/exit.c
index 96e4b12edea8..beaedb867bd3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -733,11 +733,29 @@ static void check_stack_usage(void)
 static inline void check_stack_usage(void) {}
 #endif
 
+static void synchronize_group_exit(struct task_struct *tsk, long code)
+{
+	struct sighand_struct *sighand = tsk->sighand;
+	struct signal_struct *signal = tsk->signal;
+
+	spin_lock_irq(&sighand->siglock);
+	signal->quick_threads--;
+	if ((signal->quick_threads == 0) &&
+	    !(signal->flags & SIGNAL_GROUP_EXIT)) {
+		signal->flags = SIGNAL_GROUP_EXIT;
+		signal->group_exit_code = code;
+		signal->group_stop_count = 0;
+	}
+	spin_unlock_irq(&sighand->siglock);
+}
+
 void __noreturn do_exit(long code)
 {
 	struct task_struct *tsk = current;
 	int group_dead;
 
+	synchronize_group_exit(tsk, code);
+
 	WARN_ON(tsk->plug);
 
 	kcov_task_exit(tsk);
diff --git a/kernel/fork.c b/kernel/fork.c
index 9d44f2d46c69..67813b25a567 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1692,6 +1692,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 		return -ENOMEM;
 
 	sig->nr_threads = 1;
+	sig->quick_threads = 1;
 	atomic_set(&sig->live, 1);
 	refcount_set(&sig->sigcnt, 1);
 
@@ -2444,6 +2445,7 @@ static __latent_entropy struct task_struct *copy_process(
 			__this_cpu_inc(process_counts);
 		} else {
 			current->signal->nr_threads++;
+			current->signal->quick_threads++;
 			atomic_inc(&current->signal->live);
 			refcount_inc(&current->signal->sigcnt);
 			task_join_group_stop(p);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* [PATCH 3/3] signal: Drop signals received after a fatal signal has been processed
  2022-06-22 16:43             ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Eric W. Biederman
  2022-06-22 16:45               ` [PATCH 1/3] signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit Eric W. Biederman
  2022-06-22 16:46               ` [PATCH 2/3] signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit Eric W. Biederman
@ 2022-06-22 16:47               ` Eric W. Biederman
  2022-06-23 15:12               ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Alexander Gordeev
  2022-07-08 22:25               ` Eric W. Biederman
  4 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-22 16:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, mingo, bigeasy, Peter Zijlstra, Jann Horn,
	Kees Cook, Alexander Gordeev, Robert O'Callahan, Kyle Huey,
	Keno Fischer


In 403bad72b67d ("coredump: only SIGKILL should interrupt the
coredumping task") Oleg modified the kernel to drop all signals that
come in during a coredump except SIGKILL, and suggested that it might
be a good idea to generalize that to other cases after the process has
received a fatal signal.

Semantically it does not make sense to perform any signal delivery
after the process has already been killed.

When a signal is sent while a process is dying today, the signal is
placed in the signal queue by __send_signal and a single task of the
process is woken up with signal_wake_up, if there are any tasks that
have not set PF_EXITING.

Take things one step further and have prepare_signal report that all
signals that come after a process has been killed should be ignored,
while retaining the historical exception of allowing SIGKILL to
interrupt coredumps.

Update the comment in fs/coredump.c to make it clear coredumps are
special in being able to receive SIGKILL.

This changes things so that a process stopped in PTRACE_EVENT_EXIT can
not be made to escape its ptracer and finish exiting by sending it
SIGKILL.  That a process can be made to leave PTRACE_EVENT_EXIT and
escape its tracer by sending the process a SIGKILL has been
complicating tracers for no apparent advantage.  If the process needs
to be made to leave PTRACE_EVENT_EXIT all that needs to happen is to
kill the process's tracer.  This differs from the coredump code where
there is no other mechanism besides honoring SIGKILL to expedite the
end of coredumping.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
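The decision described above can be sketched as a pure function.  This
is a userspace model with hypothetical model_ names, assuming the
enclosing test in prepare_signal() is on SIGNAL_GROUP_EXIT (the diff
context below does not show it).  The stop/continue handling and the
ignored-signal check that follow in the real function are omitted.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

#define MODEL_SIGNAL_GROUP_EXIT 0x4u	/* stand-in for SIGNAL_GROUP_EXIT */

/*
 * Returns true if the signal would still be queued, false if it is
 * dropped.  drop_when_dying selects the behaviour after this patch.
 */
static bool model_prepare_signal(unsigned int flags, bool core_dumping,
				 int sig, bool drop_when_dying)
{
	if (flags & MODEL_SIGNAL_GROUP_EXIT) {
		if (core_dumping)
			return sig == SIGKILL;	/* only SIGKILL interrupts a coredump */
		if (drop_when_dying)
			return false;		/* process is dying: drop the signal */
	}
	return true;
}

/* A dying tracee that is not dumping core, e.g. in PTRACE_EVENT_EXIT. */
static void model_check_event_exit(void)
{
	/* Before the patch SIGKILL was still queued ... */
	assert(model_prepare_signal(MODEL_SIGNAL_GROUP_EXIT, false, SIGKILL, false));
	/* ... after it the signal is dropped. */
	assert(!model_prepare_signal(MODEL_SIGNAL_GROUP_EXIT, false, SIGKILL, true));
	/* The coredump exception is unchanged. */
	assert(model_prepare_signal(MODEL_SIGNAL_GROUP_EXIT, true, SIGKILL, true));
}
```
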
 fs/coredump.c   | 2 +-
 kernel/signal.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index ebc43f960b64..b836948c9543 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -354,7 +354,7 @@ static int zap_process(struct task_struct *start, int exit_code)
 	struct task_struct *t;
 	int nr = 0;
 
-	/* ignore all signals except SIGKILL, see prepare_signal() */
+	/* Allow SIGKILL, see prepare_signal() */
 	start->signal->flags = SIGNAL_GROUP_EXIT;
 	start->signal->group_exit_code = exit_code;
 	start->signal->group_stop_count = 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index edb1dc9b00dc..369d65b06025 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -913,8 +913,9 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 		if (signal->core_state)
 			return sig == SIGKILL;
 		/*
-		 * The process is in the middle of dying, nothing to do.
+		 * The process is in the middle of dying, drop the signal.
 		 */
+		return false;
 	} else if (sig_kernel_stop(sig)) {
 		/*
 		 * This is a stop signal.  Remove SIGCONT from all queues.
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH 2/3] signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
  2022-06-22 16:46               ` [PATCH 2/3] signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit Eric W. Biederman
@ 2022-06-23  7:49                 ` kernel test robot
  0 siblings, 0 replies; 572+ messages in thread
From: kernel test robot @ 2022-06-23  7:49 UTC (permalink / raw)
  To: Eric W. Biederman, Oleg Nesterov
  Cc: kbuild-all, linux-kernel, mingo, bigeasy, Peter Zijlstra,
	Jann Horn, Kees Cook, Alexander Gordeev, Robert O'Callahan,
	Kyle Huey, Keno Fischer

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.19-rc3 next-20220622]
[cannot apply to kees/for-next/pstore]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Eric-W-Biederman/signal-Ensure-SIGNAL_GROUP_EXIT-gets-set-in-do_group_exit/20220623-014543
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 3abc3ae553c7ed73365b385b9a4cffc5176aae45
config: nios2-randconfig-s032-20220622 (https://download.01.org/0day-ci/archive/20220623/202206231547.B89O2N6f-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 11.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-31-g4880bd19-dirty
        # https://github.com/intel-lab-lkp/linux/commit/0e1cfa4e4efe9c701f3ef5665c22310ac7aa7e7e
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Eric-W-Biederman/signal-Ensure-SIGNAL_GROUP_EXIT-gets-set-in-do_group_exit/20220623-014543
        git checkout 0e1cfa4e4efe9c701f3ef5665c22310ac7aa7e7e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=nios2 SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   kernel/exit.c:281:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *tsk @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/exit.c:281:37: sparse:     expected struct task_struct *tsk
   kernel/exit.c:281:37: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/exit.c:284:32: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *task @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/exit.c:284:32: sparse:     expected struct task_struct *task
   kernel/exit.c:284:32: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/exit.c:285:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *task @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/exit.c:285:35: sparse:     expected struct task_struct *task
   kernel/exit.c:285:35: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/exit.c:330:24: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/exit.c:330:24: sparse:     expected struct task_struct *parent
   kernel/exit.c:330:24: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/exit.c:357:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:357:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:357:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:360:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:360:29: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:360:29: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:583:29: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *reaper @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/exit.c:583:29: sparse:     expected struct task_struct *reaper
   kernel/exit.c:583:29: sparse:     got struct task_struct [noderef] __rcu *real_parent
   kernel/exit.c:585:29: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct task_struct *reaper @@     got struct task_struct [noderef] __rcu *real_parent @@
   kernel/exit.c:585:29: sparse:     expected struct task_struct *reaper
   kernel/exit.c:585:29: sparse:     got struct task_struct [noderef] __rcu *real_parent
>> kernel/exit.c:738:45: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/exit.c:738:45: sparse:     expected struct sighand_struct *sighand
   kernel/exit.c:738:45: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/exit.c:927:63: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct sighand_struct *const sighand @@     got struct sighand_struct [noderef] __rcu *sighand @@
   kernel/exit.c:927:63: sparse:     expected struct sighand_struct *const sighand
   kernel/exit.c:927:63: sparse:     got struct sighand_struct [noderef] __rcu *sighand
   kernel/exit.c:1082:39: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:1082:39: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:1082:39: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:1107:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:1107:41: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:1107:41: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:1196:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:1196:25: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:1196:25: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:1211:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:1211:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:1211:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:1262:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:1262:25: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:1262:25: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:1265:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:1265:35: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:1265:35: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:1271:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct spinlock [usertype] *lock @@     got struct spinlock [noderef] __rcu * @@
   kernel/exit.c:1271:27: sparse:     expected struct spinlock [usertype] *lock
   kernel/exit.c:1271:27: sparse:     got struct spinlock [noderef] __rcu *
   kernel/exit.c:1452:59: sparse: sparse: incompatible types in comparison expression (different base types):
   kernel/exit.c:1452:59: sparse:    void *
   kernel/exit.c:1452:59: sparse:    struct task_struct [noderef] __rcu *
   kernel/exit.c:1468:25: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected struct task_struct *parent @@     got struct task_struct [noderef] __rcu * @@
   kernel/exit.c:1468:25: sparse:     expected struct task_struct *parent
   kernel/exit.c:1468:25: sparse:     got struct task_struct [noderef] __rcu *
   kernel/exit.c: note: in included file (through arch/nios2/include/uapi/asm/elf.h, arch/nios2/include/asm/elf.h, include/linux/elf.h, ...):
   include/linux/ptrace.h:92:40: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *p1 @@     got struct task_struct [noderef] __rcu *real_parent @@
   include/linux/ptrace.h:92:40: sparse:     expected struct task_struct *p1
   include/linux/ptrace.h:92:40: sparse:     got struct task_struct [noderef] __rcu *real_parent
   include/linux/ptrace.h:92:60: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct task_struct *p2 @@     got struct task_struct [noderef] __rcu *parent @@
   include/linux/ptrace.h:92:60: sparse:     expected struct task_struct *p2
   include/linux/ptrace.h:92:60: sparse:     got struct task_struct [noderef] __rcu *parent
   include/linux/ptrace.h:92:40: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *p1 @@     got struct task_struct [noderef] __rcu *real_parent @@
   include/linux/ptrace.h:92:40: sparse:     expected struct task_struct *p1
   include/linux/ptrace.h:92:40: sparse:     got struct task_struct [noderef] __rcu *real_parent
   include/linux/ptrace.h:92:60: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct task_struct *p2 @@     got struct task_struct [noderef] __rcu *parent @@
   include/linux/ptrace.h:92:60: sparse:     expected struct task_struct *p2
   include/linux/ptrace.h:92:60: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/exit.c: note: in included file (through include/linux/sched/signal.h, include/linux/rcuwait.h, include/linux/percpu-rwsem.h, ...):
   include/linux/sched/task.h:110:21: sparse: sparse: context imbalance in 'wait_task_zombie' - unexpected unlock
   include/linux/sched/task.h:110:21: sparse: sparse: context imbalance in 'wait_task_stopped' - unexpected unlock
   include/linux/sched/task.h:110:21: sparse: sparse: context imbalance in 'wait_task_continued' - unexpected unlock
   kernel/exit.c: note: in included file (through arch/nios2/include/uapi/asm/elf.h, arch/nios2/include/asm/elf.h, include/linux/elf.h, ...):
   include/linux/ptrace.h:92:40: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected struct task_struct *p1 @@     got struct task_struct [noderef] __rcu *real_parent @@
   include/linux/ptrace.h:92:40: sparse:     expected struct task_struct *p1
   include/linux/ptrace.h:92:40: sparse:     got struct task_struct [noderef] __rcu *real_parent
   include/linux/ptrace.h:92:60: sparse: sparse: incorrect type in argument 2 (different address spaces) @@     expected struct task_struct *p2 @@     got struct task_struct [noderef] __rcu *parent @@
   include/linux/ptrace.h:92:60: sparse:     expected struct task_struct *p2
   include/linux/ptrace.h:92:60: sparse:     got struct task_struct [noderef] __rcu *parent
   kernel/exit.c: note: in included file (through include/linux/thread_info.h, include/asm-generic/preempt.h, arch/nios2/include/generated/asm/preempt.h, ...):
   arch/nios2/include/asm/thread_info.h:62:9: sparse: sparse: context imbalance in 'do_wait' - wrong count at exit

vim +738 kernel/exit.c

   735	
   736	static void synchronize_group_exit(struct task_struct *tsk, long code)
   737	{
 > 738		struct sighand_struct *sighand = tsk->sighand;
   739		struct signal_struct *signal = tsk->signal;
   740	
   741		spin_lock_irq(&sighand->siglock);
   742		signal->quick_threads--;
   743		if ((signal->quick_threads == 0) &&
   744		    !(signal->flags & SIGNAL_GROUP_EXIT)) {
   745			signal->flags = SIGNAL_GROUP_EXIT;
   746			signal->group_exit_code = code;
   747			signal->group_stop_count = 0;
   748		}
   749		spin_unlock_irq(&sighand->siglock);
   750	}
   751	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
  2022-06-22 16:43             ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Eric W. Biederman
                                 ` (2 preceding siblings ...)
  2022-06-22 16:47               ` [PATCH 3/3] signal: Drop signals received after a fatal signal has been processed Eric W. Biederman
@ 2022-06-23 15:12               ` Alexander Gordeev
  2022-06-23 21:55                 ` Eric W. Biederman
  2022-07-08 22:25               ` Eric W. Biederman
  4 siblings, 1 reply; 572+ messages in thread
From: Alexander Gordeev @ 2022-06-23 15:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, linux-kernel, mingo, bigeasy, Peter Zijlstra,
	Jann Horn, Kees Cook, Robert O'Callahan, Kyle Huey,
	Keno Fischer

On Wed, Jun 22, 2022 at 11:43:37AM -0500, Eric W. Biederman wrote:
> Recently I had a conversation where it was pointed out to me that
> SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
> difficult for a tracer to handle.
> 
> Keeping SIGKILL working for anything after the process has been killed
> is also a real pain from an implementation point of view.
> 
> So I am attempting to remove this wart in the userspace API and see
> if anyone cares.

Hi Eric,

With this series s390 hits the warning in exactly the same way. Is that expected?

Thanks!

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
  2022-06-23 15:12               ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Alexander Gordeev
@ 2022-06-23 21:55                 ` Eric W. Biederman
  0 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-23 21:55 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Oleg Nesterov, linux-kernel, mingo, bigeasy, Peter Zijlstra,
	Jann Horn, Kees Cook, Robert O'Callahan, Kyle Huey,
	Keno Fischer

Alexander Gordeev <agordeev@linux.ibm.com> writes:

> On Wed, Jun 22, 2022 at 11:43:37AM -0500, Eric W. Biederman wrote:
>> Recently I had a conversation where it was pointed out to me that
>> SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
>> difficult for a tracer to handle.
>> 
>> Keeping SIGKILL working for anything after the process has been killed
>> is also a real pain from an implementation point of view.
>> 
>> So I am attempting to remove this wart in the userspace API and see
>> if anyone cares.
>
> Hi Eric,
>
> With this series s390 hits the warning in exactly the same way. Is that expected?

Yes.  I was working on this before I got your mysterious bug report.  I
included you because I am including everyone I know who deals with the
userspace side of this since I am very deliberately changing the user
visible behavior of PTRACE_EVENT_EXIT.

I am going to start seeing if I can find any possible explanation for
your regression report.  Since I don't have much to go on I expect I
will have to revert the last change in my ptrace_stop series that
apparently triggers the WARN_ON you reported.  I really would have
expected the WARN_ON to be triggered in the patch in which it was
introduced, not the final patch in the series.


To the best of my knowledge changing PTRACE_EVENT_EXIT is both desirable
from a userspace semantics standpoint and from a kernel implementation
standpoint.  If someone knows any differently and depends upon sending
SIGKILL to processes in PTRACE_EVENT_EXIT to steal the process away from
the tracer I would love to hear about that case.

Eric


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-21 15:15                     ` Alexander Gordeev
  (?)
@ 2022-06-25 16:34                       ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-25 16:34 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

Alexander Gordeev <agordeev@linux.ibm.com> writes:

> On Tue, Jun 21, 2022 at 09:02:05AM -0500, Eric W. Biederman wrote:
>> Alexander Gordeev <agordeev@linux.ibm.com> writes:
>> 
>> > On Thu, May 05, 2022 at 01:26:45PM -0500, Eric W. Biederman wrote:
>> >> From: Peter Zijlstra <peterz@infradead.org>
>> >> 
>> >> Currently ptrace_stop() / do_signal_stop() rely on the special states
>> >> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
>> >> state exists only in task->__state and nowhere else.
>> >> 
>> >> There's two spots of bother with this:
>> >> 
>> >>  - PREEMPT_RT has task->saved_state which complicates matters,
>> >>    meaning task_is_{traced,stopped}() needs to check an additional
>> >>    variable.
>> >> 
>> >>  - An alternative freezer implementation that itself relies on a
>> >>    special TASK state would lose TASK_TRACED/TASK_STOPPED and will
>> >>    result in misbehaviour.
>> >> 
>> >> As such, add additional state to task->jobctl to track this state
>> >> outside of task->__state.
>> >> 
>> >> NOTE: this doesn't actually fix anything yet, just adds extra state.
>> >> 
>> >> --EWB
>> >>   * didn't add an unnecessary newline in signal.h
>> >>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
>> >>     instead of in signal_wake_up_state.  This prevents the clearing
>> >>     of TASK_STOPPED and TASK_TRACED from getting lost.
>> >>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
>> >
>> > Hi Eric, Peter,
>> >
>> > On s390 this patch triggers warning at kernel/ptrace.c:272 when
>> > kill_child testcase from strace tool is repeatedly used (the source
>> > is attached for reference):
>> >
>> > while :; do
>> > 	strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child
>> > done
>> >
>> > It normally takes a few minutes to cause the warning in -rc3, but FWIW
>> > it hits almost immediately for ptrace_stop-cleanup-for-v5.19 tag of
>> > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.
>> >
>> > Commit 7b0fe1367ef2 ("ptrace: Document that wait_task_inactive can't
>> > fail") suggests this WARN_ON_ONCE() is not really expected, yet we
>> > observe a child in __TASK_TRACED state. Could you please comment here?
>> >
>> 
>> For clarity the warning is that the child is not in __TASK_TRACED state.
>> 
>> The code is waiting for the child to stop in the scheduler in the
>> __TASK_TRACED state so that it can safely read and change the
>> process's state.  Some of that state is not even saved until the
>> process is scheduled out so we have to wait until the process
>> is stopped in the scheduler.
>
> So I assume (checked actually) the return 0 below from kernel/sched/core.c:
> wait_task_inactive() is where it bails out:
>
> 3303                 while (task_running(rq, p)) {
> 3304                         if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> 3305                                 return 0;
> 3306                         cpu_relax();
> 3307                 }
>
> Yet, the child task is always found in __TASK_TRACED state (as seen
> in crash dumps):
>
> > 101447  11342  13      ce3a8100      RU   0.0   10040   4412  strace
>   101450  101447   0      bb04b200      TR   0.0    2272   1136  kill_child
>   108261  101447   2      d0b10100      TR   0.0    2272    532  kill_child
> crash> task bb04b200 __state
> PID: 101450  TASK: bb04b200          CPU: 0   COMMAND: "kill_child"
>   __state = 8,
>
> crash> task d0b10100 __state
> PID: 108261  TASK: d0b10100          CPU: 2   COMMAND: "kill_child"
>   __state = 8,
>

I haven't gotten as far as reproducing this but I have started giving
this issue some thought.

This entire thing smells like a memory barrier is missing somewhere.
However by definition the lock implementations in Linux provide all the
needed memory barriers, and in the ptrace_stop and ptrace_check_attach
path I don't see cases where these values are sampled outside of a lock
except in wait_task_inactive.  Does doing that perhaps require a
barrier?

The two things I can think of that could shed light on what is going on
are enabling lockdep, to enable the debug check in signal_wake_up_state,
and verifying that the bits of state that should be constant while the task
is frozen for ptrace are indeed constant when the task is frozen for ptrace.
Something like my patch below.

If you could test that when you have a chance that would help narrow
down what is going on.

Thank you,
Eric

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 156a99283b11..6467a2b1c3bc 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -268,9 +268,13 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state &&
-	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+	if (!ret && !ignore_state) {
+		WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN));
+		WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED));
+		WARN_ON_ONCE(READ_ONCE(child->__state) != __TASK_TRACED);
+		WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED));
 		ret = -ESRCH;
+	}
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-25 16:34                       ` Eric W. Biederman
  (?)
@ 2022-06-28 18:36                         ` Alexander Gordeev
  -1 siblings, 0 replies; 572+ messages in thread
From: Alexander Gordeev @ 2022-06-28 18:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

On Sat, Jun 25, 2022 at 11:34:46AM -0500, Eric W. Biederman wrote:
> I haven't gotten as far as reproducing this but I have started giving
> this issue some thought.
> 
> This entire thing smells like a memory barrier is missing somewhere.
> However by definition the lock implementations in Linux provide all the
> needed memory barriers, and in the ptrace_stop and ptrace_check_attach
> path I don't see cases where these values are sampled outside of a lock
> except in wait_task_inactive.  Does doing that perhaps require a
> barrier?
> 
> The two things I can think of that could shed light on what is going on
> are enabling lockdep, to enable the debug check in signal_wake_up_state,
> and verifying that the bits of state that should be constant while the task
> is frozen for ptrace are indeed constant when the task is frozen for ptrace.
> Something like my patch below.
> 
> If you could test that when you have a chance that would help narrow
> down what is going on.
> 
> Thank you,
> Eric
> 
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 156a99283b11..6467a2b1c3bc 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -268,9 +268,13 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	}
>  	read_unlock(&tasklist_lock);
>  
> -	if (!ret && !ignore_state &&
> -	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
> +	if (!ret && !ignore_state) {
> +		WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN));
> +		WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED));
> +		WARN_ON_ONCE(READ_ONCE(child->__state) != __TASK_TRACED);
> +		WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED));
>  		ret = -ESRCH;
> +	}
>  
>  	return ret;
>  }

I modified your chunk a bit - hope that is what you had in mind:

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 156a99283b11..f0e9a9a4d63c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -268,9 +268,19 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	}
 	read_unlock(&tasklist_lock);
 
-	if (!ret && !ignore_state &&
-	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
-		ret = -ESRCH;
+	if (!ret && !ignore_state) {
+		unsigned int __state;
+
+		WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN));
+		WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED));
+		__state = READ_ONCE(child->__state);
+		if (__state != __TASK_TRACED) {
+			pr_err("%s(%d) __state %x\n", __func__, __LINE__, __state);
+			WARN_ON_ONCE(1);
+		}
+		if (WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+			ret = -ESRCH;
+	}
 
 	return ret;
 }


When WARN_ON_ONCE(1) hits the child __state is always zero/TASK_RUNNING,
as reported by the preceding pr_err(). Yet, in the resulting core dump
it is always __TASK_TRACED.
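
For anyone following along with the raw numbers (0 from pr_err(), 8 in the
crash dumps), a small userspace helper can decode them. This is only a
sketch: the bit values are copied from include/linux/sched.h as of roughly
v5.19 and should be checked against the tree being debugged, and
decode_task_state() is a made-up name, not a kernel function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Bit values copied from include/linux/sched.h (v5.19 era). This is an
 * assumption of the sketch; verify against the kernel being debugged.
 */
#define TASK_RUNNING		0x0000
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define __TASK_STOPPED		0x0004
#define __TASK_TRACED		0x0008

/* Hypothetical helper: render a raw task->__state value as flag names. */
static const char *decode_task_state(unsigned int state, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} flags[] = {
		{ TASK_INTERRUPTIBLE,	"TASK_INTERRUPTIBLE" },
		{ TASK_UNINTERRUPTIBLE,	"TASK_UNINTERRUPTIBLE" },
		{ __TASK_STOPPED,	"__TASK_STOPPED" },
		{ __TASK_TRACED,	"__TASK_TRACED" },
	};
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	/* TASK_RUNNING is the absence of any bit, so handle it first. */
	if (state == TASK_RUNNING) {
		snprintf(buf, len, "TASK_RUNNING");
		return buf;
	}

	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(state & flags[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, flags[i].name, len - strlen(buf) - 1);
		state &= ~flags[i].bit;
	}

	/* Anything this sketch does not know about is printed as hex. */
	if (state) {
		size_t used = strlen(buf);

		snprintf(buf + used, len - used, "%s0x%x",
			 used ? "|" : "", state);
	}
	return buf;
}
```

Called with a 64-byte buffer, decode_task_state(8, buf, sizeof(buf)) gives
the name matching the "TR" column in the crash output, and 0 gives
TASK_RUNNING.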

Removing WARN_ON_ONCE(1) while looping until (__state != __TASK_TRACED)
confirms the unexpected __state is always TASK_RUNNING. I never observed
more than one iteration, and the message gets printed once in 30-60 mins.

So probably when the condition is entered __state is TASK_RUNNING more
often, but gets overwritten with __TASK_TRACED pretty quickly. Which is kind
of consistent with my previous observation that kernel/sched/core.c:3305
is where return 0 makes wait_task_inactive() fail.
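
The bail-out can be illustrated with a toy, single-threaded model of the
loop quoted from wait_task_inactive(). Everything here is an assumption
made for illustration: struct sample and model_wait_task_inactive() are
invented names, each sample stands for what one loop iteration would see
from task_running() and READ_ONCE(p->__state), and the real function does
more work after this loop. The model says nothing about why TASK_RUNNING
is visible at that point, only that a single such sample is enough to fail.

```c
#include <stddef.h>

#define TASK_RUNNING	0x0000u
#define __TASK_TRACED	0x0008u

/* One observation of the child: is it still on a CPU, and what is __state? */
struct sample {
	int on_cpu;
	unsigned int state;
};

/*
 * Toy model of the early part of wait_task_inactive():
 *   returns  0 if a sample taken while the task is still on a CPU has a
 *              state different from match_state (the "return 0" path),
 *   returns  1 if the task is seen off-CPU without such a mismatch,
 *   returns -1 if the trace ends while the task is still on a CPU.
 */
static int model_wait_task_inactive(const struct sample *trace, size_t n,
				    unsigned int match_state)
{
	size_t i = 0;

	while (i < n && trace[i].on_cpu) {
		if (match_state && trace[i].state != match_state)
			return 0;
		i++;	/* stands in for cpu_relax() and re-sampling */
	}
	return i < n ? 1 : -1;
}
```

A trace of { on_cpu, __TASK_TRACED } samples ending off-CPU returns 1, while
the same trace with one leading { on_cpu, TASK_RUNNING } sample returns 0,
even though every later sample, like the core dump, shows __TASK_TRACED.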

No other WARN_ON_ONCE() ever hit.

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-28 18:36                         ` Alexander Gordeev
@ 2022-06-28 22:42                           ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-28 22:42 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, rostedt, mgorman, bigeasy, Will Deacon, tj,
	linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

Alexander Gordeev <agordeev@linux.ibm.com> writes:

> On Sat, Jun 25, 2022 at 11:34:46AM -0500, Eric W. Biederman wrote:
>> I haven't gotten as far as reproducing this but I have started giving
>> this issue some thought.
>> 
>> This entire thing smells like a memory barrier is missing somewhere.
>> However by definition the lock implementations in linux provide all the
>> needed memory barriers, and in the ptrace_stop and ptrace_check_attach
>> path I don't see cases where these values are sampled outside of a lock
>> except in wait_task_inactive.  Does doing that perhaps require a
>> barrier? 
>> 
>> The two things I can think of that could shed light on what is going on
>> is enabling lockdep, to enable the debug check in signal_wake_up_state
>> and verifying bits of state that should be constant while the task
>> is frozen for ptrace are indeed constant when task is frozen for ptrace.
>> Something like my patch below.
>> 
>> If you could test that when you have a chance that would help narrow
>> down what is going on.
>> 
>> Thank you,
>> Eric
>> 
>> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
>> index 156a99283b11..6467a2b1c3bc 100644
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -268,9 +268,13 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>>  	}
>>  	read_unlock(&tasklist_lock);
>>  
>> -	if (!ret && !ignore_state &&
>> -	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
>> +	if (!ret && !ignore_state) {
>> +		WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN));
>> +		WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED));
>> +		WARN_ON_ONCE(READ_ONCE(child->__state) != __TASK_TRACED);
>> +		WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED));
>>  		ret = -ESRCH;
>> +	}
>>  
>>  	return ret;
>>  }
>
> I modified your chunk a bit - hope that is what you had in mind:

Yes.

> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 156a99283b11..f0e9a9a4d63c 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -268,9 +268,19 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>  	}
>  	read_unlock(&tasklist_lock);
>  
> -	if (!ret && !ignore_state &&
> -	    WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
> -		ret = -ESRCH;
> +	if (!ret && !ignore_state) {
> +		unsigned int __state;
> +
> +		WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN));
> +		WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED));
> +		__state = READ_ONCE(child->__state);
> +		if (__state != __TASK_TRACED) {
> +			pr_err("%s(%d) __state %x", __FUNCTION__, __LINE__, __state);
> +			WARN_ON_ONCE(1);
> +		}
> +		if (WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
> +			ret = -ESRCH;
> +	}
>  
>  	return ret;
>  }
>
>
> When WARN_ON_ONCE(1) hits, the child __state is always zero/TASK_RUNNING,
> as reported by the preceding pr_err(). Yet, in the resulting core dump
> it is always __TASK_TRACED.

Did you enable CONFIG_LOCKDEP?  I just want to make sure
that every caller of signal_wake_up_state was holding siglock.

> Removing WARN_ON_ONCE(1) while looping until (__state != __TASK_TRACED)
> confirms the unexpected __state is always TASK_RUNNING. I never observed
> more than one iteration, and the message gets printed about once every
> 30-60 minutes.

Hmm.  This does smell like a missing barrier.

> So when the condition is entered, __state is probably TASK_RUNNING more
> often, but gets overwritten with __TASK_TRACED pretty quickly. That is
> kind of consistent with my previous observation that the return 0 at
> kernel/sched/core.c:3305 is what makes wait_task_inactive() fail.
>
> No other WARN_ON_ONCE() ever hit.

Yes.  This smells like something is missing.

I am completely rusty at rolling barriers by hand, but does something
like the patch below clear up those mysterious warnings?

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 156a99283b11..cb85bcf84640 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -202,6 +202,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
+		smp_rmb();
 		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
diff --git a/kernel/signal.c b/kernel/signal.c
index edb1dc9b00dc..bcd576e9de66 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2233,6 +2233,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 		return exit_code;
 
 	set_special_state(TASK_TRACED);
+	smp_wmb();
 	current->jobctl |= JOBCTL_TRACED;
 
 	/*
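
For reference, this is the generic write-barrier/read-barrier pairing
that smp_wmb() and smp_rmb() are meant for, as a minimal userspace sketch
in C11. It is only an analogy: the demo_* names are made up,
atomic_thread_fence(memory_order_release) and
atomic_thread_fence(memory_order_acquire) are assumed here as rough
(somewhat stronger) stand-ins for smp_wmb() and smp_rmb(), and the sketch
says nothing about whether the two barriers in the patch above are placed
where they would help.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_RUNNING        0x0u	/* stand-in for TASK_RUNNING */
#define DEMO_TRACED         0x8u	/* stand-in for __TASK_TRACED */
#define DEMO_JOBCTL_TRACED  0x1u	/* made-up bit standing in for JOBCTL_TRACED */

static atomic_uint demo_state;
static atomic_uint demo_jobctl;

/* Writer: publish the state first, then the flag that announces it. */
static void *demo_writer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&demo_state, DEMO_TRACED, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* write-side barrier */
	atomic_store_explicit(&demo_jobctl, DEMO_JOBCTL_TRACED,
			      memory_order_relaxed);
	return NULL;
}

/* Reader: wait for the flag, then read the state it announces. */
static void *demo_reader(void *arg)
{
	int *violation = arg;

	while (!(atomic_load_explicit(&demo_jobctl, memory_order_relaxed) &
		 DEMO_JOBCTL_TRACED))
		;	/* spin until the flag is visible */
	atomic_thread_fence(memory_order_acquire);	/* read-side barrier */
	if (atomic_load_explicit(&demo_state, memory_order_relaxed) !=
	    DEMO_TRACED)
		*violation = 1;	/* flag seen, but stale state read */
	return NULL;
}

/* Returns the number of stale reads, or -1 if a thread could not start. */
static int run_pairing_demo(int iterations)
{
	int violations = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t w, r;
		int violation = 0;

		atomic_store(&demo_state, DEMO_RUNNING);
		atomic_store(&demo_jobctl, 0);
		if (pthread_create(&r, NULL, demo_reader, &violation))
			return -1;
		if (pthread_create(&w, NULL, demo_writer, NULL)) {
			/* Release the spinning reader before giving up. */
			atomic_store(&demo_jobctl, DEMO_JOBCTL_TRACED);
			pthread_join(r, NULL);
			return -1;
		}
		pthread_join(w, NULL);
		pthread_join(r, NULL);
		violations += violation;
	}
	return violations;
}
```

run_pairing_demo(200) is expected to return 0: once the reader sees the
flag, the fence pairing guarantees it also sees the state store that
preceded the flag. Dropping either fence removes that guarantee on weakly
ordered hardware.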

Eric

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-28 22:42                           ` Eric W. Biederman
@ 2022-06-28 22:48                             ` Steven Rostedt
  -1 siblings, 0 replies; 572+ messages in thread
From: Steven Rostedt @ 2022-06-28 22:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Gordeev, linux-kernel, rjw, Oleg Nesterov, mingo,
	vincent.guittot, dietmar.eggemann, mgorman, bigeasy, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

On Tue, 28 Jun 2022 17:42:22 -0500
"Eric W. Biederman" <ebiederm@xmission.com> wrote:

> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 156a99283b11..cb85bcf84640 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -202,6 +202,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
>  	spin_lock_irq(&task->sighand->siglock);
>  	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
>  	    !__fatal_signal_pending(task)) {
> +		smp_rmb();
>  		task->jobctl |= JOBCTL_PTRACE_FROZEN;
>  		ret = true;
>  	}
> diff --git a/kernel/signal.c b/kernel/signal.c
> index edb1dc9b00dc..bcd576e9de66 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2233,6 +2233,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
>  		return exit_code;
>  
>  	set_special_state(TASK_TRACED);
> +	smp_wmb();
>  	current->jobctl |= JOBCTL_TRACED;
>  

Are not these both done under the sighand->siglock spinlock?

That is, the two paths should already be synchronized, and the memory
barriers will not help anything inside the locks. The locking should (and
must) handle all that.
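
A minimal userspace sketch of that argument, with a pthread mutex
standing in for sighand->siglock: when both the writer and the reader
touch the two fields only while holding the same lock, the reader can
never see the flag set together with a stale state, and no explicit
barrier is involved. All demo_* names and the flag value are made up for
illustration, and nothing here models the unlocked READ_ONCE() in
wait_task_inactive().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_RUNNING        0x0u	/* stand-in for TASK_RUNNING */
#define DEMO_TRACED         0x8u	/* stand-in for __TASK_TRACED */
#define DEMO_JOBCTL_TRACED  0x1u	/* made-up bit standing in for JOBCTL_TRACED */
#define DEMO_ROUNDS         20000

struct demo_task {
	pthread_mutex_t siglock;	/* plays the role of sighand->siglock */
	unsigned int state;		/* plain fields, only touched under siglock */
	unsigned int jobctl;
	long mismatches;		/* written by the checker thread only */
};

/* Repeatedly enter and leave the "traced" condition under the lock. */
static void *demo_stopper(void *arg)
{
	struct demo_task *t = arg;

	for (int i = 0; i < DEMO_ROUNDS; i++) {
		pthread_mutex_lock(&t->siglock);
		t->state = DEMO_TRACED;
		t->jobctl |= DEMO_JOBCTL_TRACED;
		pthread_mutex_unlock(&t->siglock);

		pthread_mutex_lock(&t->siglock);
		t->jobctl &= ~DEMO_JOBCTL_TRACED;
		t->state = DEMO_RUNNING;
		pthread_mutex_unlock(&t->siglock);
	}
	return NULL;
}

/* Under the same lock, flag set must imply state == DEMO_TRACED. */
static void *demo_checker(void *arg)
{
	struct demo_task *t = arg;

	for (int i = 0; i < DEMO_ROUNDS; i++) {
		pthread_mutex_lock(&t->siglock);
		if ((t->jobctl & DEMO_JOBCTL_TRACED) && t->state != DEMO_TRACED)
			t->mismatches++;
		pthread_mutex_unlock(&t->siglock);
	}
	return NULL;
}

/* Returns the number of inconsistent observations, or -1 on thread failure. */
static long run_lock_demo(void)
{
	struct demo_task t = { .state = DEMO_RUNNING };
	pthread_t stopper, checker;

	pthread_mutex_init(&t.siglock, NULL);
	if (pthread_create(&stopper, NULL, demo_stopper, &t))
		return -1;
	if (pthread_create(&checker, NULL, demo_checker, &t)) {
		pthread_join(stopper, NULL);
		return -1;
	}
	pthread_join(stopper, NULL);
	pthread_join(checker, NULL);
	pthread_mutex_destroy(&t.siglock);
	return t.mismatches;
}
```

run_lock_demo() is expected to return 0. A reader that samples the fields
without taking the lock gets no such guarantee.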

-- Steve


^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-21 15:15                     ` Alexander Gordeev
  (?)
@ 2022-06-28 23:15                       ` Steven Rostedt
  -1 siblings, 0 replies; 572+ messages in thread
From: Steven Rostedt @ 2022-06-28 23:15 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Eric W. Biederman, linux-kernel, rjw, Oleg Nesterov, mingo,
	vincent.guittot, dietmar.eggemann, mgorman, bigeasy, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

On Tue, 21 Jun 2022 17:15:47 +0200
Alexander Gordeev <agordeev@linux.ibm.com> wrote:

> So I assume (checked actually) the return 0 below from kernel/sched/core.c:
> wait_task_inactive() is where it bails out:
> 
> 3303                 while (task_running(rq, p)) {
> 3304                         if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> 3305                                 return 0;
> 3306                         cpu_relax();
> 3307                 }
> 
> Yet, the child task is always found in __TASK_TRACED state (as seen
> in crash dumps):
> 
> > 101447  11342  13      ce3a8100      RU   0.0   10040   4412  strace  
>   101450  101447   0      bb04b200      TR   0.0    2272   1136  kill_child
>   108261  101447   2      d0b10100      TR   0.0    2272    532  kill_child
> crash> task bb04b200 __state  
> PID: 101450  TASK: bb04b200          CPU: 0   COMMAND: "kill_child"
>   __state = 8,
> 
> crash> task d0b10100 __state  
> PID: 108261  TASK: d0b10100          CPU: 2   COMMAND: "kill_child"
>   __state = 8,
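
For readers decoding the dump above, a small sketch that maps a raw
__state value to a name. The 0 and 8 values match what this thread
reports for TASK_RUNNING and __TASK_TRACED; the remaining values are
quoted from memory of include/linux/sched.h of that era and should be
treated as assumptions. task_state_name() is a made-up helper, not a
kernel or crash function.

```c
#include <assert.h>
#include <string.h>

/* Map a single-valued __state to its symbolic name (illustrative only). */
static const char *task_state_name(unsigned int state)
{
	switch (state) {
	case 0x0000: return "TASK_RUNNING";
	case 0x0001: return "TASK_INTERRUPTIBLE";	/* assumed value */
	case 0x0002: return "TASK_UNINTERRUPTIBLE";	/* assumed value */
	case 0x0004: return "__TASK_STOPPED";		/* assumed value */
	case 0x0008: return "__TASK_TRACED";
	default:     return "other";	/* combined or unlisted bits */
	}
}
```

task_state_name(8) yields "__TASK_TRACED", which agrees with the TR
column in the ps output, and task_state_name(0) yields "TASK_RUNNING",
the value the pr_err() reported.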

If you are using crash, can you enable all trace events?

Then you should be able to extract the ftrace ring buffer from crash using
the trace.so extension (https://github.com/fujitsu/crash-trace)

I guess it should still work with s390.

Then you can see the events that lead up to the crash.

-- Steve

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-28 22:48                             ` Steven Rostedt
  (?)
@ 2022-06-29  3:39                               ` Eric W. Biederman
  -1 siblings, 0 replies; 572+ messages in thread
From: Eric W. Biederman @ 2022-06-29  3:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexander Gordeev, linux-kernel, rjw, Oleg Nesterov, mingo,
	vincent.guittot, dietmar.eggemann, mgorman, bigeasy, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

Steven Rostedt <rostedt@goodmis.org> writes:

> On Tue, 28 Jun 2022 17:42:22 -0500
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>
>> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
>> index 156a99283b11..cb85bcf84640 100644
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -202,6 +202,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
>>  	spin_lock_irq(&task->sighand->siglock);
>>  	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
>>  	    !__fatal_signal_pending(task)) {
>> +		smp_rmb();
>>  		task->jobctl |= JOBCTL_PTRACE_FROZEN;
>>  		ret = true;
>>  	}
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index edb1dc9b00dc..bcd576e9de66 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2233,6 +2233,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
>>  		return exit_code;
>>  
>>  	set_special_state(TASK_TRACED);
>> +	smp_wmb();
>>  	current->jobctl |= JOBCTL_TRACED;
>>  
>
> Are not these both done under the sighand->siglock spinlock?
>
> That is, the two paths should already be synchronized, and the memory
> barriers will not help anything inside the locks. The locking should (and
> must) handle all that.

I would presume so too.  However, the READ_ONCE that is going astray
does not look like it is honoring that.

So perhaps there is a bug in the s390 spin_lock barriers?  Perhaps there
is a subtle detail in the barriers that spin locks provide that we are
overlooking?

I just know the observed behavior is:

- reading tsk->jobctl and seeing JOBCTL_TRACED set.
- reading tsk->__state and seeing TASK_RUNNING.

So unless PREEMPT_RT is enabled on s390, it looks like there is a
barrier problem.

Alexander, do you have PREEMPT_RT enabled on s390?  I have been assuming
you don't, but I figure I should ask and make certain, as PREEMPT_RT can
cause this kind of failure.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-29  3:39                               ` Eric W. Biederman
  (?)
@ 2022-06-29 20:25                                 ` Alexander Gordeev
  -1 siblings, 0 replies; 572+ messages in thread
From: Alexander Gordeev @ 2022-06-29 20:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Steven Rostedt, linux-kernel, rjw, Oleg Nesterov, mingo,
	vincent.guittot, dietmar.eggemann, mgorman, bigeasy, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 2686 bytes --]

On Tue, Jun 28, 2022 at 10:39:59PM -0500, Eric W. Biederman wrote:
> Steven Rostedt <rostedt@goodmis.org> writes:
> 
> > On Tue, 28 Jun 2022 17:42:22 -0500
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >
> >> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> >> index 156a99283b11..cb85bcf84640 100644
> >> --- a/kernel/ptrace.c
> >> +++ b/kernel/ptrace.c
> >> @@ -202,6 +202,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
> >>  	spin_lock_irq(&task->sighand->siglock);
> >>  	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
> >>  	    !__fatal_signal_pending(task)) {
> >> +		smp_rmb();
> >>  		task->jobctl |= JOBCTL_PTRACE_FROZEN;
> >>  		ret = true;
> >>  	}
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index edb1dc9b00dc..bcd576e9de66 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -2233,6 +2233,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> >>  		return exit_code;
> >>  
> >>  	set_special_state(TASK_TRACED);
> >> +	smp_wmb();
> >>  	current->jobctl |= JOBCTL_TRACED;
> >>  
> >
> > Are not these both done under the sighand->siglock spinlock?
> >
> > That is, the two paths should already be synchronized, and the memory
> > barriers will not help anything inside the locks. The locking should (and
> > must) handle all that.
> 
> I would presume so too.  However, the READ_ONCE that is going astray
> does not look like it is honoring that.
> 
> So perhaps there is a bug in the s390 spin_lock barriers?  Perhaps there
> is a subtle detail in the barriers that spin locks provide that we are
> overlooking?
> 
> I just know the observed behavior is:
> 
> - reading tsk->jobctl and seeing  JOBCTL_TRACED set.
> - reading tsk->__state and seeing TASK_RUNNING.
> 
> So unless PREEMPT_RT is enabled on s390, it looks like there is a
> barrier problem.
> 
> Alexander, do you have PREEMPT_RT enabled on s390?  I have been assuming
> you don't, but I figure I should ask and make certain, as PREEMPT_RT can
> cause this kind of failure.

There is no change with the barriers added.

CONFIG_PREEMPT_RT is disabled and CONFIG_LOCKDEP is enabled (see the
attached config).  FWIW, I also added a full barrier:

@@ -271,6 +272,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
        if (!ret && !ignore_state) {
                unsigned int __state;
 
+               smp_mb();
                WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN));
                WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED));
                __state = READ_ONCE(child->__state);

I have not been able to extract the ftrace ring buffer yet; I am going to do that next.

> Eric

Thanks!
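
For checking an option such as CONFIG_PREEMPT_RT against an attached
config, a small lookup over the config text can be sketched as below.
config_symbol_state() is a hypothetical helper, not a kernel or kconfig
API.  It assumes the usual .config line forms ("NAME=y", "NAME=m",
"# NAME is not set") and treats a symbol that does not appear at all as
not set.

```c
#include <assert.h>
#include <string.h>

/*
 * Look up a Kconfig symbol in the text of a .config file.
 *
 * Returns 'y' or 'm' for "NAME=y" / "NAME=m", 'n' for
 * "# NAME is not set" or when NAME does not appear at all,
 * and '?' for any other value (numbers, strings, empty).
 */
static char config_symbol_state(const char *config_text, const char *name)
{
	static const char not_set[] = " is not set";
	const size_t not_set_len = sizeof(not_set) - 1;
	size_t name_len;
	const char *line = config_text;

	assert(config_text && name);
	name_len = strlen(name);

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* "NAME=value": the '=' must follow the full name. */
		if (len > name_len &&
		    strncmp(line, name, name_len) == 0 &&
		    line[name_len] == '=') {
			if (len == name_len + 2 &&
			    (line[name_len + 1] == 'y' ||
			     line[name_len + 1] == 'm'))
				return line[name_len + 1];
			return '?';
		}

		/* "# NAME is not set": must match the whole line. */
		if (len == 2 + name_len + not_set_len &&
		    line[0] == '#' && line[1] == ' ' &&
		    strncmp(line + 2, name, name_len) == 0 &&
		    strncmp(line + 2 + name_len, not_set, not_set_len) == 0)
			return 'n';

		if (!eol)
			break;
		line = eol + 1;
	}

	return 'n';	/* symbol absent: treated as not set */
}
```

Matching the '=' directly after the name keeps CONFIG_PREEMPT from
matching a line such as CONFIG_PREEMPT_NONE=y.  Treating an absent symbol
as not set is a choice made for this sketch; a caller that needs to tell
"absent" from "explicitly not set" would want a separate return value.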

[-- Attachment #2: config-5.19.0-rc4-08751-g2cf560748ed6 --]
[-- Type: text/plain, Size: 87568 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/s390 5.19.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="s390x-12.1.0-gcc (GCC) 12.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=120100
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23800
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=121
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_HAVE_KERNEL_UNCOMPRESSED=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
# CONFIG_KERNEL_UNCOMPRESSED is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_WATCH_QUEUE=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
# CONFIG_TIME_KUNIT_TEST is not set

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
# CONFIG_BPF_JIT is not set
# end of BPF subsystem

CONFIG_PREEMPT_NONE_BUILD=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_SCHED_CORE=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_CC_NO_ARRAY_BOUNDS=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_MISC=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
CONFIG_BOOT_CONFIG=y
# CONFIG_BOOT_CONFIG_EMBED is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_KCMP=y
CONFIG_RSEQ=y
# CONFIG_DEBUG_RSEQ is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_MMU=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_PGSTE=y
CONFIG_AUDIT_ARCH=y
CONFIG_NO_IOPORT_MAP=y
# CONFIG_PCI_QUIRKS is not set
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_S390=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y

#
# Processor type and features
#
CONFIG_HAVE_MARCH_Z10_FEATURES=y
CONFIG_HAVE_MARCH_Z196_FEATURES=y
CONFIG_HAVE_MARCH_ZEC12_FEATURES=y
# CONFIG_MARCH_Z10 is not set
# CONFIG_MARCH_Z196 is not set
CONFIG_MARCH_ZEC12=y
# CONFIG_MARCH_Z13 is not set
# CONFIG_MARCH_Z14 is not set
# CONFIG_MARCH_Z15 is not set
CONFIG_MARCH_ZEC12_TUNE=y
# CONFIG_TUNE_DEFAULT is not set
# CONFIG_TUNE_Z10 is not set
# CONFIG_TUNE_Z196 is not set
CONFIG_TUNE_ZEC12=y
# CONFIG_TUNE_Z13 is not set
# CONFIG_TUNE_Z14 is not set
# CONFIG_TUNE_Z15 is not set
# CONFIG_TUNE_Z16 is not set
CONFIG_64BIT=y
CONFIG_COMMAND_LINE_SIZE=4096
CONFIG_COMPAT=y
CONFIG_SMP=y
CONFIG_NR_CPUS=512
CONFIG_HOTPLUG_CPU=y
CONFIG_NUMA=y
CONFIG_NODES_SHIFT=1
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_BOOK=y
CONFIG_SCHED_DRAWER=y
CONFIG_SCHED_TOPOLOGY=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_SIG=y
CONFIG_ARCH_RANDOM=y
# CONFIG_KERNEL_NOBP is not set
CONFIG_EXPOLINE=y
# CONFIG_EXPOLINE_EXTERN is not set
# CONFIG_EXPOLINE_OFF is not set
CONFIG_EXPOLINE_AUTO=y
# CONFIG_EXPOLINE_FULL is not set
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
# end of Processor type and features

#
# Memory setup
#
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_MAX_PHYSMEM_BITS=46
# end of Memory setup

#
# I/O subsystem
#
CONFIG_QDIO=y
CONFIG_PCI_NR_FUNCTIONS=512
CONFIG_HAS_IOMEM=y
CONFIG_CHSC_SCH=y
CONFIG_SCM_BUS=y
CONFIG_EADM_SCH=m
CONFIG_VFIO_CCW=m
CONFIG_VFIO_AP=m
# end of I/O subsystem

#
# Dump support
#
CONFIG_CRASH_DUMP=y
# end of Dump support

CONFIG_CCW=y
CONFIG_HAVE_PNETID=y

#
# Virtualization
#
CONFIG_PROTECTED_VIRTUALIZATION_GUEST=y
CONFIG_PFAULT=y
CONFIG_CMM=m
CONFIG_CMM_IUCV=y
CONFIG_APPLDATA_BASE=y
CONFIG_APPLDATA_MEM=m
CONFIG_APPLDATA_OS=m
CONFIG_APPLDATA_NET_SUM=m
CONFIG_S390_HYPFS_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_KVM_ASYNC_PF_SYNC=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_HAVE_KVM_INVALID_WAKEUPS=y
CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
# CONFIG_KVM_S390_UCONTROL is not set
CONFIG_S390_GUEST=y
# end of Virtualization

CONFIG_S390_MODULES_SANITY_TEST_HELPERS=y

#
# Selftests
#
CONFIG_S390_UNWIND_SELFTEST=m
CONFIG_S390_KPROBES_SANITY_TEST=m
CONFIG_S390_MODULES_SANITY_TEST=m
# end of Selftests

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_ARCH_32BIT_USTAT_F_TINODE=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_MMU_GATHER_NO_GATHER=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_LTO_NONE=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE=y
CONFIG_ARCH_HAS_SCALED_CPUTIME=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ALTERNATE_USER_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_ARCH_HAS_VDSO_DATA=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# CONFIG_GCC_PLUGINS is not set
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODULE_UNLOAD_TAINT_TRACKING is not set
CONFIG_MODVERSIONS=y
CONFIG_ASM_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
CONFIG_BLK_RQ_ALLOC_TIME=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG_COMMON=y
CONFIG_BLK_ICQ=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=y
# CONFIG_BLK_DEV_ZONED is not set
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
CONFIG_BLK_WBT=y
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_CGROUP_IOLATENCY=y
CONFIG_BLK_CGROUP_IOCOST=y
CONFIG_BLK_CGROUP_IOPRIO=y
CONFIG_BLK_DEBUG_FS=y
# CONFIG_BLK_SED_OPAL is not set
CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_IBM_PARTITION=y
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_MQ_RDMA=y
CONFIG_BLOCK_HOLDER_DEPRECATED=y
CONFIG_BLK_MQ_STACKING=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK=y
CONFIG_ARCH_INLINE_SPIN_LOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_READ_TRYLOCK=y
CONFIG_ARCH_INLINE_READ_LOCK=y
CONFIG_ARCH_INLINE_READ_LOCK_BH=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_READ_UNLOCK=y
CONFIG_ARCH_INLINE_READ_UNLOCK_BH=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_WRITE_TRYLOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ARCH_BINFMT_ELF_STATE=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_ZPOOL=y
CONFIG_SWAP=y
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_DEFAULT_ON is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
CONFIG_ZBUD=y
# CONFIG_Z3FOLD is not set
CONFIG_ZSMALLOC=y
CONFIG_ZSMALLOC_STAT=y

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
CONFIG_SLUB_CPU_PARTIAL=y
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK_PHYS_MAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_FRONTSWAP=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_SYSFS=y
CONFIG_CMA_AREAS=7
CONFIG_MEM_SOFT_DIRTY=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_PAGE_IDLE_FLAG=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ZONE_DMA=y
CONFIG_HMM_MIRROR=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PERCPU_STATS=y
# CONFIG_GUP_TEST is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_ANON_VMA_NAME=y
CONFIG_USERFAULTFD=y

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_NET_REDIRECT=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_AF_UNIX_OOB=y
CONFIG_UNIX_DIAG=m
# CONFIG_TLS is not set
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_INTERFACE is not set
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_XFRM_ESPINTCP=y
CONFIG_IUCV=y
CONFIG_AFIUCV=m
CONFIG_SMC=m
CONFIG_SMC_DIAG=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
# CONFIG_INET_ESP_OFFLOAD is not set
CONFIG_INET_ESPINTCP=y
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
# CONFIG_INET_RAW_DIAG is not set
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
# CONFIG_TCP_CONG_NV is not set
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_TCP_CONG_CDG is not set
# CONFIG_TCP_CONG_BBR is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_IPV6_ROUTE_INFO is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
# CONFIG_INET6_ESP_OFFLOAD is not set
CONFIG_INET6_ESPINTCP=y
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
# CONFIG_IPV6_ILA is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
# CONFIG_IPV6_MROUTE is not set
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
CONFIG_IPV6_RPL_LWTUNNEL=y
# CONFIG_IPV6_IOAM6_LWTUNNEL is not set
# CONFIG_NETLABEL is not set
CONFIG_MPTCP=y
CONFIG_INET_MPTCP_DIAG=m
CONFIG_MPTCP_IPV6=y
# CONFIG_MPTCP_KUNIT_TEST is not set
CONFIG_NETWORK_SECMARK=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_EGRESS=y
CONFIG_NETFILTER_SKIP_EGRESS=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
CONFIG_NETFILTER_NETLINK_HOOK=m
CONFIG_NETFILTER_NETLINK_ACCT=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NETFILTER_NETLINK_OSF=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_SYSLOG=m
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
# CONFIG_NF_CONNTRACK_ZONES is not set
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
# CONFIG_NETFILTER_NETLINK_GLUE_CT is not set
CONFIG_NF_NAT=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
# CONFIG_NF_TABLES_NETDEV is not set
# CONFIG_NFT_NUMGEN is not set
CONFIG_NFT_CT=m
# CONFIG_NFT_CONNLIMIT is not set
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
# CONFIG_NFT_MASQ is not set
# CONFIG_NFT_REDIR is not set
CONFIG_NFT_NAT=m
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_OBJREF=m
# CONFIG_NFT_QUEUE is not set
# CONFIG_NFT_QUOTA is not set
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_INET=m
# CONFIG_NFT_XFRM is not set
# CONFIG_NFT_SOCKET is not set
# CONFIG_NFT_OSF is not set
# CONFIG_NFT_TPROXY is not set
# CONFIG_NFT_SYNPROXY is not set
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XTABLES_COMPAT=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
# CONFIG_NETFILTER_XT_TARGET_NETMAP is not set
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_NOTRACK is not set
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
# CONFIG_NETFILTER_XT_TARGET_REDIRECT is not set
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
# CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_NETFILTER_XT_MATCH_L2TP=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_NFACCT=m
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
# CONFIG_NETFILTER_XT_MATCH_SOCKET is not set
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_TIME=m
CONFIG_NETFILTER_XT_MATCH_U32=m
# end of Core Netfilter Configuration

CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
# CONFIG_IP_SET_HASH_IPMARK is not set
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
# CONFIG_IP_SET_HASH_IPMAC is not set
# CONFIG_IP_SET_HASH_MAC is not set
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
# CONFIG_IP_VS_IPV6 is not set
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
# CONFIG_IP_VS_PROTO_SCTP is not set

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
# CONFIG_IP_VS_FO is not set
# CONFIG_IP_VS_OVF is not set
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
# CONFIG_IP_VS_MH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
CONFIG_IP_VS_TWOS=m

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
# CONFIG_NF_SOCKET_IPV4 is not set
CONFIG_NF_TPROXY_IPV4=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_REJECT_IPV4=m
# CONFIG_NFT_DUP_IPV4 is not set
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
# CONFIG_NF_LOG_ARP is not set
CONFIG_NF_LOG_IPV4=m
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
# CONFIG_IP_NF_TARGET_SYNPROXY is not set
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
# CONFIG_IP_NF_TARGET_NETMAP is not set
# CONFIG_IP_NF_TARGET_REDIRECT is not set
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
# CONFIG_NF_SOCKET_IPV6 is not set
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
# CONFIG_NFT_DUP_IPV6 is not set
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
# CONFIG_IP6_NF_MATCH_SRH is not set
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
# CONFIG_IP6_NF_TARGET_SYNPROXY is not set
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
# CONFIG_IP6_NF_TARGET_NPT is not set
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
# CONFIG_NFT_BRIDGE_META is not set
# CONFIG_NFT_BRIDGE_REJECT is not set
# CONFIG_NF_CONNTRACK_BRIDGE is not set
# CONFIG_BRIDGE_NF_EBTABLES is not set
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_COOKIE_HMAC_SHA1 is not set
CONFIG_INET_SCTP_DIAG=m
CONFIG_RDS=m
CONFIG_RDS_RDMA=m
CONFIG_RDS_TCP=m
# CONFIG_RDS_DEBUG is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=y
CONFIG_GARP=m
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_VLAN_FILTERING is not set
CONFIG_BRIDGE_MRP=y
# CONFIG_BRIDGE_CFM is not set
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
# CONFIG_VLAN_8021Q_MVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_6LOWPAN is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
# CONFIG_NET_SCH_CAKE is not set
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
# CONFIG_NET_SCH_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
CONFIG_NET_SCH_ETS=m
# CONFIG_NET_SCH_DEFAULT is not set

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_CLS_FLOWER is not set
# CONFIG_NET_CLS_MATCHALL is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
# CONFIG_NET_ACT_SAMPLE is not set
CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_MPLS is not set
# CONFIG_NET_ACT_VLAN is not set
# CONFIG_NET_ACT_BPF is not set
# CONFIG_NET_ACT_CONNMARK is not set
# CONFIG_NET_ACT_CTINFO is not set
# CONFIG_NET_ACT_SKBMOD is not set
# CONFIG_NET_ACT_IFE is not set
# CONFIG_NET_ACT_TUNNEL_KEY is not set
CONFIG_NET_ACT_GATE=m
# CONFIG_NET_TC_SKB_EXT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=m
CONFIG_OPENVSWITCH_VXLAN=m
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=m
# CONFIG_MPLS_ROUTING is not set
CONFIG_NET_NSH=m
# CONFIG_HSR is not set
CONFIG_NET_SWITCHDEV=y
# CONFIG_NET_L3_MASTER_DEV is not set
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
# CONFIG_NET_DROP_MONITOR is not set
# end of Network testing
# end of Networking options

# CONFIG_CAN is not set
# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
CONFIG_STREAM_PARSER=y
# CONFIG_MCTP is not set
CONFIG_FIB_RULES=y
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=m
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set
# CONFIG_NFC is not set
# CONFIG_PSAMPLE is not set
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_NET_SOCK_MSG=y
CONFIG_NET_DEVLINK=y
CONFIG_PAGE_POOL=y
# CONFIG_PAGE_POOL_STATS is not set
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y
# CONFIG_NETDEV_ADDR_LIST_TEST is not set

#
# Device Drivers
#
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCIEASPM is not set
# CONFIG_PCIE_PTM is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_MSI_ARCH_FALLBACKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
# CONFIG_PCI_STUB is not set
# CONFIG_PCI_PF_STUB is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set
# CONFIG_PCIE_BUS_TUNE_OFF is not set
CONFIG_PCIE_BUS_DEFAULT=y
# CONFIG_PCIE_BUS_SAFE is not set
# CONFIG_PCIE_BUS_PERFORMANCE is not set
# CONFIG_PCIE_BUS_PEER2PEER is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
CONFIG_HOTPLUG_PCI_S390=y

#
# PCI controller drivers
#

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
# CONFIG_PCCARD is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
CONFIG_AUXILIARY_BUS=y
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_DEVTMPFS_SAFE=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_FW_LOADER_COMPRESS is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_SYS_HYPERVISOR=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_FIRMWARE_MEMMAP is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
# CONFIG_PARPORT is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
CONFIG_CDROM=m
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
CONFIG_ZRAM=y
CONFIG_ZRAM_DEF_COMP_LZORLE=y
# CONFIG_ZRAM_DEF_COMP_ZSTD is not set
# CONFIG_ZRAM_DEF_COMP_LZ4 is not set
# CONFIG_ZRAM_DEF_COMP_LZO is not set
# CONFIG_ZRAM_DEF_COMP_LZ4HC is not set
# CONFIG_ZRAM_DEF_COMP_842 is not set
CONFIG_ZRAM_DEF_COMP="lzo-rle"
# CONFIG_ZRAM_WRITEBACK is not set
# CONFIG_ZRAM_MEMORY_TRACKING is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLK_DEV_DRBD=m
# CONFIG_DRBD_FAULT_INJECTION is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=32768
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set

#
# S/390 block device drivers
#
CONFIG_DCSSBLK=m
CONFIG_DASD=y
CONFIG_DASD_PROFILE=y
CONFIG_DASD_ECKD=y
CONFIG_DASD_FBA=y
CONFIG_DASD_DIAG=y
CONFIG_DASD_EER=y
CONFIG_SCM_BLOCK=m
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_RBD=m

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
# CONFIG_NVME_MULTIPATH is not set
# CONFIG_NVME_VERBOSE_ERRORS is not set
# CONFIG_NVME_RDMA is not set
# CONFIG_NVME_FC is not set
# CONFIG_NVME_TCP is not set
# CONFIG_NVME_TARGET is not set
# end of NVME Support

#
# Misc devices
#
# CONFIG_DUMMY_IRQ is not set
# CONFIG_PHANTOM is not set
# CONFIG_TIFM_CORE is not set
CONFIG_ENCLOSURE_SERVICES=m
# CONFIG_HP_ILO is not set
# CONFIG_SRAM is not set
# CONFIG_DW_XDATA_PCIE is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# end of EEPROM support

# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

#
# Altera FPGA firmware download module (requires I2C)
#
CONFIG_GENWQE=m
CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=0
# CONFIG_ECHO is not set
# CONFIG_BCM_VK is not set
# CONFIG_MISC_ALCOR_PCI is not set
# CONFIG_MISC_RTSX_PCI is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI_COMMON=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=y
CONFIG_BLK_DEV_BSG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT3SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_MPI3MR is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
CONFIG_ZFCP=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_VIRTIO=m
# CONFIG_SCSI_CHELSIO_FCOE is not set
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=m
CONFIG_SCSI_DH_HP_SW=m
CONFIG_SCSI_DH_EMC=m
CONFIG_SCSI_DH_ALUA=m
# end of SCSI device support

# CONFIG_ATA is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
CONFIG_BCACHE=m
# CONFIG_BCACHE_DEBUG is not set
# CONFIG_BCACHE_CLOSURES_DEBUG is not set
# CONFIG_BCACHE_ASYNC_REGISTRATION is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_DEBUG is not set
CONFIG_DM_BUFIO=m
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
CONFIG_DM_UNSTRIPED=m
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
# CONFIG_DM_CACHE is not set
CONFIG_DM_WRITECACHE=m
# CONFIG_DM_EBS is not set
# CONFIG_DM_ERA is not set
CONFIG_DM_CLONE=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_MULTIPATH_HST=m
CONFIG_DM_MULTIPATH_IOA=m
CONFIG_DM_DELAY=m
# CONFIG_DM_DUST is not set
CONFIG_DM_INIT=y
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
# CONFIG_DM_VERITY_FEC is not set
CONFIG_DM_SWITCH=m
# CONFIG_DM_LOG_WRITES is not set
CONFIG_DM_INTEGRITY=m
CONFIG_DM_AUDIT=y
# CONFIG_TARGET_CORE is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_BONDING=m
CONFIG_DUMMY=m
# CONFIG_WIREGUARD is not set
CONFIG_EQUALIZER=m
# CONFIG_NET_FC is not set
CONFIG_IFB=m
# CONFIG_NET_TEAM is not set
CONFIG_MACVLAN=m
CONFIG_MACVTAP=m
# CONFIG_IPVLAN is not set
CONFIG_VXLAN=m
# CONFIG_GENEVE is not set
CONFIG_BAREUDP=m
# CONFIG_GTP is not set
CONFIG_AMT=m
# CONFIG_MACSEC is not set
# CONFIG_NETCONSOLE is not set
CONFIG_TUN=m
CONFIG_TAP=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
CONFIG_NLMON=m
# CONFIG_VSOCKMON is not set
# CONFIG_ARCNET is not set
CONFIG_ETHERNET=y
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_VENDOR_ADAPTEC is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALACRITECH is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_ALTERA_TSE is not set
# CONFIG_NET_VENDOR_AMAZON is not set
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ASIX is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_CADENCE is not set
# CONFIG_NET_VENDOR_CAVIUM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_DAVICOM is not set
# CONFIG_DNET is not set
# CONFIG_NET_VENDOR_DEC is not set
# CONFIG_NET_VENDOR_DLINK is not set
# CONFIG_NET_VENDOR_EMULEX is not set
# CONFIG_NET_VENDOR_ENGLEDER is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_FUNGIBLE is not set
# CONFIG_NET_VENDOR_GOOGLE is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_JME is not set
# CONFIG_NET_VENDOR_LITEX is not set
# CONFIG_NET_VENDOR_MARVELL is not set
CONFIG_NET_VENDOR_MELLANOX=y
CONFIG_MLX4_EN=m
CONFIG_MLX4_CORE=m
CONFIG_MLX4_DEBUG=y
CONFIG_MLX4_CORE_GEN2=y
CONFIG_MLX5_CORE=m
# CONFIG_MLX5_FPGA is not set
CONFIG_MLX5_CORE_EN=y
CONFIG_MLX5_EN_ARFS=y
CONFIG_MLX5_EN_RXNFC=y
CONFIG_MLX5_MPFS=y
CONFIG_MLX5_ESWITCH=y
CONFIG_MLX5_BRIDGE=y
CONFIG_MLX5_CLS_ACT=y
CONFIG_MLX5_TC_SAMPLE=y
# CONFIG_MLX5_CORE_IPOIB is not set
CONFIG_MLX5_SW_STEERING=y
# CONFIG_MLX5_SF is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MICROCHIP is not set
# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_MICROSOFT is not set
# CONFIG_NET_VENDOR_MYRI is not set
# CONFIG_FEALNX is not set
# CONFIG_NET_VENDOR_NI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETERION is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_ETHOC is not set
# CONFIG_NET_VENDOR_PACKET_ENGINES is not set
# CONFIG_NET_VENDOR_PENSANDO is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_RDC is not set
# CONFIG_NET_VENDOR_REALTEK is not set
# CONFIG_NET_VENDOR_RENESAS is not set
# CONFIG_NET_VENDOR_ROCKER is not set
# CONFIG_NET_VENDOR_SAMSUNG is not set
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SILAN is not set
# CONFIG_NET_VENDOR_SIS is not set
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VERTEXCOM is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
# CONFIG_NET_VENDOR_XILINX is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PHYLIB is not set
# CONFIG_MDIO_DEVICE is not set

#
# PCS device drivers
#
# end of PCS device drivers

CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_FILTER=y
CONFIG_PPP_MPPE=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPPOE=m
CONFIG_PPTP=m
CONFIG_PPPOL2TP=m
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
# CONFIG_SLIP is not set
CONFIG_SLHC=m

#
# S/390 network device drivers
#
CONFIG_LCS=m
CONFIG_CTCM=m
CONFIG_NETIUCV=m
CONFIG_SMSGIUCV=m
CONFIG_SMSGIUCV_EVENT=m
CONFIG_QETH=y
CONFIG_QETH_L2=y
CONFIG_QETH_L3=y
CONFIG_QETH_OSX=y
CONFIG_CCWGROUP=y
CONFIG_ISM=m
# end of S/390 network device drivers

#
# Host-side USB support is needed for USB Network Adapter support
#
# CONFIG_WAN is not set

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

# CONFIG_VMXNET3 is not set
# CONFIG_NETDEVSIM is not set
CONFIG_NET_FAILOVER=m

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
# CONFIG_RMI4_CORE is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=0
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# end of Serial drivers

# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_N_GSM is not set
# CONFIG_NOZOMI is not set
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
# CONFIG_SERIAL_DEV_BUS is not set
# CONFIG_TTY_PRINTK is not set
CONFIG_VIRTIO_CONSOLE=m
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_HW_RANDOM_S390=m
# CONFIG_HW_RANDOM_XIPHERA is not set
# CONFIG_APPLICOM is not set
CONFIG_DEVMEM=y
CONFIG_DEVPORT=y
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=y
# CONFIG_TCG_VTPM_PROXY is not set

#
# S/390 character device drivers
#
CONFIG_TN3270=y
CONFIG_TN3270_TTY=y
CONFIG_TN3270_FS=y
CONFIG_TN3270_CONSOLE=y
CONFIG_TN3215=y
CONFIG_TN3215_CONSOLE=y
CONFIG_CCW_CONSOLE=y
CONFIG_SCLP_TTY=y
CONFIG_SCLP_CONSOLE=y
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
CONFIG_HMC_DRV=m
# CONFIG_SCLP_OFB is not set
CONFIG_S390_UV_UAPI=m
CONFIG_S390_TAPE=m

#
# S/390 tape hardware support
#
CONFIG_S390_TAPE_34XX=m
CONFIG_S390_TAPE_3590=m
CONFIG_VMLOGRDR=m
CONFIG_VMCP=y
CONFIG_VMCP_CMA_SIZE=4
CONFIG_MONREADER=m
CONFIG_MONWRITER=m
CONFIG_S390_VMUR=m
# CONFIG_XILLYBUS is not set
CONFIG_RANDOM_TRUST_CPU=y
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
# CONFIG_I2C is not set
# end of I2C support

# CONFIG_I3C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
# CONFIG_PPS is not set

#
# PTP clock support
#
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

# CONFIG_PINCTRL is not set
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_WATCHDOG_NOWAYOUT=y
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
# CONFIG_WATCHDOG_SYSFS is not set
# CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT is not set

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_I6300ESB_WDT is not set
CONFIG_DIAG288_WATCHDOG=m

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_MADERA is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TQMX86 is not set
# CONFIG_MFD_VX855 is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
# CONFIG_RC_CORE is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_DRM is not set
# CONFIG_DRM_DEBUG_MODESET_LOCK is not set

#
# ARM devices
#
# end of ARM devices

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_MODE_HELPERS is not set
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
# CONFIG_LCD_CLASS_DEVICE is not set
# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
# end of Backlight & LCD device support

#
# Console display driver support
#
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION is not set
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

# CONFIG_LOGO is not set
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_MMC is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
# CONFIG_INFINIBAND_USER_MAD is not set
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
CONFIG_INFINIBAND_VIRT_DMA=y
# CONFIG_INFINIBAND_MTHCA is not set
CONFIG_MLX4_INFINIBAND=m
CONFIG_MLX5_INFINIBAND=m
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_RDMA_RXE is not set
# CONFIG_RDMA_SIW is not set
# CONFIG_INFINIBAND_IPOIB is not set
# CONFIG_INFINIBAND_SRP is not set
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_INFINIBAND_RTRS_CLIENT is not set
# CONFIG_INFINIBAND_RTRS_SERVER is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_DEBUG is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# CONFIG_DMABUF_SYSFS_STATS is not set
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VFIO=m
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI_CORE=m
CONFIG_VFIO_PCI=m
CONFIG_MLX5_VFIO_PCI=m
CONFIG_VFIO_MDEV=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI_LIB=m
CONFIG_VIRTIO_PCI_LIB_LEGACY=m
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_PCI_LEGACY=y
CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_INPUT=y
# CONFIG_VIRTIO_MMIO is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=m
CONFIG_VHOST_VSOCK=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
# CONFIG_GOLDFISH is not set
# CONFIG_COMMON_CLK is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
# end of Clock Source drivers

# CONFIG_MAILBOX is not set
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
CONFIG_IOMMU_DEFAULT_DMA_STRICT=y
# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_S390_IOMMU=y
CONFIG_S390_CCW_IOMMU=y
CONFIG_S390_AP_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

# CONFIG_RAS is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

# CONFIG_LIBNVDIMM is not set
CONFIG_DAX=y
# CONFIG_DEV_DAX is not set
# CONFIG_NVMEM is not set

#
# HW tracing support
#
# CONFIG_STM is not set
# CONFIG_INTEL_TH is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
# CONFIG_EXT4_KUNIT_TESTS is not set
CONFIG_JBD2=y
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
CONFIG_JFS_STATISTICS=y
CONFIG_XFS_FS=y
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
# CONFIG_XFS_ONLINE_SCRUB is not set
# CONFIG_XFS_WARN is not set
# CONFIG_XFS_DEBUG is not set
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
CONFIG_NILFS2_FS=m
# CONFIG_F2FS_FS is not set
CONFIG_FS_DAX=y
CONFIG_FS_DAX_LIMITED=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_ENCRYPTION_INLINE_CRYPT is not set
CONFIG_FS_VERITY=y
# CONFIG_FS_VERITY_DEBUG is not set
CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V1=m
CONFIG_QFMT_V2=m
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=m
CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=y
CONFIG_CUSE=m
CONFIG_VIRTIO_FS=m
CONFIG_FUSE_DAX=y
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_NETFS_SUPPORT=m
CONFIG_NETFS_STATS=y
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_DEBUG is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_ERROR_INJECTION is not set
# CONFIG_CACHEFILES_ONDEMAND is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_FAT_KUNIT_TEST is not set
CONFIG_EXFAT_FS=m
CONFIG_EXFAT_DEFAULT_IOCHARSET="utf8"
CONFIG_NTFS_FS=m
# CONFIG_NTFS_DEBUG is not set
CONFIG_NTFS_RW=y
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
# CONFIG_PROC_VMCORE_DEVICE_DUMP is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_TMPFS_INODE64=y
CONFIG_ARCH_SUPPORTS_HUGETLBFS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=m
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
CONFIG_ECRYPT_FS=m
# CONFIG_ECRYPT_FS_MESSAGING is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
CONFIG_CRAMFS_BLOCKDEV=y
CONFIG_SQUASHFS=m
CONFIG_SQUASHFS_FILE_CACHE=y
# CONFIG_SQUASHFS_FILE_DIRECT is not set
CONFIG_SQUASHFS_DECOMP_SINGLE=y
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
# CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is not set
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_SQUASHFS_LZ4=y
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
CONFIG_SQUASHFS_ZSTD=y
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
CONFIG_ROMFS_FS=m
CONFIG_ROMFS_BACKED_BY_BLOCK=y
CONFIG_ROMFS_ON_BLOCK=y
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V2=m
CONFIG_NFS_V3=m
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
CONFIG_NFS_SWAP=y
# CONFIG_NFS_V4_1 is not set
# CONFIG_NFS_FSCACHE is not set
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
# CONFIG_NFSD_SCSILAYOUT is not set
# CONFIG_NFSD_FLEXFILELAYOUT is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_GRACE_PERIOD=m
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_SWAP=y
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES is not set
# CONFIG_SUNRPC_DEBUG is not set
CONFIG_SUNRPC_XPRT_RDMA=m
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=m
CONFIG_CIFS_STATS2=y
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG is not set
CONFIG_CIFS_DFS_UPCALL=y
CONFIG_CIFS_SWN_UPCALL=y
# CONFIG_CIFS_SMB_DIRECT is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_SMB_SERVER is not set
CONFIG_SMBFS_COMMON=m
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=m
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=m
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_MAC_ROMAN is not set
# CONFIG_NLS_MAC_CELTIC is not set
# CONFIG_NLS_MAC_CENTEURO is not set
# CONFIG_NLS_MAC_CROATIAN is not set
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
# CONFIG_NLS_MAC_GREEK is not set
# CONFIG_NLS_MAC_ICELAND is not set
# CONFIG_NLS_MAC_INUIT is not set
# CONFIG_NLS_MAC_ROMANIAN is not set
# CONFIG_NLS_MAC_TURKISH is not set
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set
CONFIG_UNICODE=y
# CONFIG_UNICODE_NORMALIZATION_SELFTEST is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=m
# CONFIG_USER_DECRYPTED_DATA is not set
# CONFIG_KEY_DH_OPERATIONS is not set
CONFIG_KEY_NOTIFICATIONS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_WRITABLE_HOOKS=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
# CONFIG_SECURITY_INFINIBAND is not set
# CONFIG_SECURITY_NETWORK_XFRM is not set
CONFIG_SECURITY_PATH=y
CONFIG_LSM_MMAP_MIN_ADDR=65536
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
# CONFIG_FORTIFY_SOURCE is not set
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=0
CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS=9
CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE=256
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_LOADPIN is not set
# CONFIG_SECURITY_YAMA is not set
# CONFIG_SECURITY_SAFESETID is not set
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_LOCK_DOWN_KERNEL_FORCE_NONE=y
# CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY is not set
# CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY is not set
CONFIG_SECURITY_LANDLOCK=y
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_LSM_RULES=y
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
# CONFIG_IMA_DEFAULT_HASH_SHA1 is not set
CONFIG_IMA_DEFAULT_HASH_SHA256=y
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
CONFIG_IMA_DEFAULT_HASH="sha256"
CONFIG_IMA_WRITE_POLICY=y
CONFIG_IMA_READ_POLICY=y
CONFIG_IMA_APPRAISE=y
# CONFIG_IMA_ARCH_POLICY is not set
# CONFIG_IMA_APPRAISE_BUILD_POLICY is not set
CONFIG_IMA_APPRAISE_BOOTPARAM=y
# CONFIG_IMA_APPRAISE_MODSIG is not set
CONFIG_IMA_TRUSTED_KEYRING=y
# CONFIG_IMA_BLACKLIST_KEYRING is not set
# CONFIG_IMA_LOAD_X509 is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
# CONFIG_IMA_DISABLE_HTABLE is not set
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
# CONFIG_INIT_STACK_NONE is not set
# CONFIG_INIT_STACK_ALL_PATTERN is not set
CONFIG_INIT_STACK_ALL_ZERO=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
# CONFIG_CRYPTO_MANAGER_EXTRA_TESTS is not set
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_ENGINE=m

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=m
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
CONFIG_CRYPTO_ECC=m
CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_ECDSA=m
CONFIG_CRYPTO_ECRDSA=m
CONFIG_CRYPTO_SM2=m
CONFIG_CRYPTO_CURVE25519=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_OFB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_KEYWRAP=m
CONFIG_CRYPTO_NHPOLY1305=m
CONFIG_CRYPTO_ADIANTUM=m
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_XXHASH=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_BLAKE2S=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRC64_ROCKSOFT=y
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=m
CONFIG_CRYPTO_SM3=m
# CONFIG_CRYPTO_SM3_GENERIC is not set
CONFIG_CRYPTO_STREEBOG=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_CHACHA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
# CONFIG_CRYPTO_SM4_GENERIC is not set
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
CONFIG_CRYPTO_ZSTD=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_USER_API=m
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
CONFIG_CRYPTO_USER_API_RNG=m
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
CONFIG_CRYPTO_USER_API_AEAD=m
CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE=y
CONFIG_CRYPTO_STATS=y
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
CONFIG_ZCRYPT=m
# CONFIG_ZCRYPT_DEBUG is not set
CONFIG_ZCRYPT_MULTIDEVNODES=y
CONFIG_PKEY=m
CONFIG_CRYPTO_PAES_S390=m
CONFIG_CRYPTO_SHA1_S390=m
CONFIG_CRYPTO_SHA256_S390=m
CONFIG_CRYPTO_SHA512_S390=m
CONFIG_CRYPTO_SHA3_256_S390=m
CONFIG_CRYPTO_SHA3_512_S390=m
CONFIG_CRYPTO_DES_S390=m
CONFIG_CRYPTO_AES_S390=m
CONFIG_CRYPTO_CHACHA_S390=m
CONFIG_S390_PRNG=m
CONFIG_CRYPTO_GHASH_S390=m
CONFIG_CRYPTO_CRC32_S390=y
# CONFIG_CRYPTO_DEV_NITROX_CNN55XX is not set
CONFIG_CRYPTO_DEV_VIRTIO=m
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
# CONFIG_SIGNED_PE_FILE_VERIFICATION is not set
# CONFIG_FIPS_SIGNATURE_SELFTEST is not set

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_MODULE_SIG_KEY_TYPE_RSA=y
# CONFIG_MODULE_SIG_KEY_TYPE_ECDSA is not set
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
# CONFIG_SYSTEM_BLACKLIST_KEYRING is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_RAID6_PQ_BENCHMARK=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_CORDIC=m
CONFIG_PRIME_NUMBERS=m
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=m
CONFIG_CRYPTO_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CURVE25519_GENERIC=m
CONFIG_CRYPTO_LIB_CURVE25519=m
CONFIG_CRYPTO_LIB_DES=m
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=m
CONFIG_CRYPTO_LIB_POLY1305=m
CONFIG_CRYPTO_LIB_CHACHA20POLY1305=m
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_LIB_MEMNEQ=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC64_ROCKSOFT=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC64=y
CONFIG_CRC4=m
CONFIG_CRC7=m
CONFIG_LIBCRC32C=y
CONFIG_CRC8=m
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_842_COMPRESS=m
CONFIG_842_DECOMPRESS=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_ZLIB_DFLTCC=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=y
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_MICROLZMA=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y
CONFIG_SWIOTLB=y
CONFIG_DMA_CMA=y
# CONFIG_DMA_PERNUMA_CMA is not set

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=0
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_IOMMU_HELPER=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_LRU_CACHE=m
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_SIGNATURE=y
CONFIG_DIMLIB=y
CONFIG_OID_REGISTRY=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_STACK_HASH_ORDER=20
CONFIG_SBITMAP=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
# CONFIG_PRINTK_CALLER is not set
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DYNAMIC_DEBUG_CORE=y
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_NONE is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_WARN=2048
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_VMLINUX_MAP is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# CONFIG_KCSAN is not set
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
# CONFIG_DEBUG_NET is not set
# end of Networking Debugging

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
CONFIG_DEBUG_WX=y
CONFIG_GENERIC_PTDUMP=y
CONFIG_PTDUMP_CORE=y
CONFIG_PTDUMP_DEBUGFS=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_WQ_WATCHDOG is not set
CONFIG_TEST_LOCKUP=m
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_TORTURE_TEST=m
# CONFIG_RCU_SCALE_TEST is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_REF_SCALE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
CONFIG_RCU_TRACE=y
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_NOP_MCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_BOOTTIME_TRACING=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_STACK_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
# CONFIG_HWLAT_TRACER is not set
# CONFIG_OSNOISE_TRACER is not set
# CONFIG_TIMERLAT_TRACER is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
CONFIG_TRACING_MAP=y
CONFIG_SYNTH_EVENTS=y
CONFIG_HIST_TRIGGERS=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_RECORD_RECURSION is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_HIST_TRIGGERS_DEBUG is not set
CONFIG_SAMPLES=y
# CONFIG_SAMPLE_AUXDISPLAY is not set
# CONFIG_SAMPLE_TRACE_EVENTS is not set
# CONFIG_SAMPLE_TRACE_CUSTOM_EVENTS is not set
CONFIG_SAMPLE_TRACE_PRINTK=m
CONFIG_SAMPLE_FTRACE_DIRECT=m
CONFIG_SAMPLE_FTRACE_DIRECT_MULTI=m
# CONFIG_SAMPLE_TRACE_ARRAY is not set
# CONFIG_SAMPLE_KOBJECT is not set
# CONFIG_SAMPLE_KPROBES is not set
# CONFIG_SAMPLE_KFIFO is not set
# CONFIG_SAMPLE_LIVEPATCH is not set
# CONFIG_SAMPLE_CONFIGFS is not set
# CONFIG_SAMPLE_VFIO_MDEV_MTTY is not set
# CONFIG_SAMPLE_VFIO_MDEV_MDPY is not set
# CONFIG_SAMPLE_VFIO_MDEV_MDPY_FB is not set
# CONFIG_SAMPLE_VFIO_MDEV_MBOCHS is not set
# CONFIG_SAMPLE_WATCHDOG is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set

#
# s390 Debugging
#
CONFIG_EARLY_PRINTK=y
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_CIO_INJECT is not set
# end of s390 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=m
CONFIG_KUNIT_DEBUGFS=y
# CONFIG_KUNIT_TEST is not set
# CONFIG_KUNIT_EXAMPLE_TEST is not set
# CONFIG_KUNIT_ALL_TESTS is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
# CONFIG_FAULT_INJECTION is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
CONFIG_LKDTM=m
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_SORT is not set
# CONFIG_TEST_DIV64 is not set
CONFIG_KPROBES_SANITY_TEST=m
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_TEST_REF_TRACKER is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_STRING_SELFTEST is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_SCANF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_SIPHASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
CONFIG_TEST_BPF=m
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
# CONFIG_BITFIELD_KUNIT is not set
# CONFIG_HASH_KUNIT_TEST is not set
# CONFIG_RESOURCE_KUNIT_TEST is not set
# CONFIG_SYSCTL_KUNIT_TEST is not set
# CONFIG_LIST_KUNIT_TEST is not set
# CONFIG_LINEAR_RANGES_TEST is not set
# CONFIG_CMDLINE_KUNIT_TEST is not set
# CONFIG_BITS_TEST is not set
# CONFIG_SLUB_KUNIT_TEST is not set
# CONFIG_MEMCPY_KUNIT_TEST is not set
# CONFIG_OVERFLOW_KUNIT_TEST is not set
# CONFIG_STACKINIT_KUNIT_TEST is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_MEMCAT_P is not set
CONFIG_TEST_LIVEPATCH=m
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_FREE_PAGES is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking


* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-06-29 20:25                                 ` Alexander Gordeev
  0 siblings, 0 replies; 572+ messages in thread
From: Alexander Gordeev @ 2022-06-29 20:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Steven Rostedt, linux-kernel, rjw, Oleg Nesterov, mingo,
	vincent.guittot, dietmar.eggemann, mgorman, bigeasy, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 2686 bytes --]

On Tue, Jun 28, 2022 at 10:39:59PM -0500, Eric W. Biederman wrote:
> Steven Rostedt <rostedt@goodmis.org> writes:
> 
> > On Tue, 28 Jun 2022 17:42:22 -0500
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >
> >> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> >> index 156a99283b11..cb85bcf84640 100644
> >> --- a/kernel/ptrace.c
> >> +++ b/kernel/ptrace.c
> >> @@ -202,6 +202,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
> >>  	spin_lock_irq(&task->sighand->siglock);
> >>  	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
> >>  	    !__fatal_signal_pending(task)) {
> >> +		smp_rmb();
> >>  		task->jobctl |= JOBCTL_PTRACE_FROZEN;
> >>  		ret = true;
> >>  	}
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index edb1dc9b00dc..bcd576e9de66 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -2233,6 +2233,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> >>  		return exit_code;
> >>  
> >>  	set_special_state(TASK_TRACED);
> >> +	smp_wmb();
> >>  	current->jobctl |= JOBCTL_TRACED;
> >>  
> >
> > Are not these both done under the sighand->siglock spinlock?
> >
> > That is, the two paths should already be synchronized, and the memory
> > barriers will not help anything inside the locks. The locking should (and
> > must) handle all that.
> 
> I would presume so too.  However, the READ_ONCE that is going astray
> does not look like it is honoring that.
> 
> So perhaps there is a bug in the s390 spin_lock barriers?  Perhaps there
> is a subtle detail in the barriers that spin locks provide that we are
> overlooking?
> 
> I just know the observed behavior is:
> 
> - reading tsk->jobctl and seeing JOBCTL_TRACED set.
> - reading tsk->__state and seeing TASK_RUNNING.
> 
> So unless PREEMPT_RT is enabled on s390, it looks like there is a
> barrier problem.
> 
> Alexander, do you have PREEMPT_RT enabled on s390?  I have been assuming
> you don't, but I figure I should ask and make certain, as PREEMPT_RT can
> cause this kind of failure.

There is no change with the barriers added.

CONFIG_PREEMPT_RT is disabled and CONFIG_LOCKDEP is enabled (see the
attached config). FWIW, I also added a full barrier:

@@ -271,6 +272,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
        if (!ret && !ignore_state) {
                unsigned int __state;
 
+               smp_mb();
                WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN));
                WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED));
                __state = READ_ONCE(child->__state);

I have not been able to extract the ftrace ring buffer yet; I will do that next.
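
For reference, the inconsistency being chased can be written down as a
small userspace model. This is only a sketch under stated assumptions: the
MODEL_* constants, struct model_task, and the helper names are hypothetical
stand-ins and not the kernel's definitions, and the single-threaded code
does not reproduce the race itself. It only spells out which combination of
jobctl and __state the checks in ptrace_check_attach() treat as impossible.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only (assumption); the real ones are in kernel headers. */
#define MODEL_TASK_RUNNING          0x0000u
#define MODEL_TASK_TRACED           0x0008u
#define MODEL_JOBCTL_TRACED         (1u << 0)

/* Hypothetical stand-in for the two task_struct fields under discussion. */
struct model_task {
	unsigned int jobctl;
	unsigned int state;
};

/*
 * Tracee side: same store order as the quoted ptrace_stop() hunk,
 * state first, then the JOBCTL_TRACED flag.
 */
static void model_ptrace_stop(struct model_task *t)
{
	t->state = MODEL_TASK_TRACED;
	t->jobctl |= MODEL_JOBCTL_TRACED;
	/* After both stores, the flag must imply the traced state. */
	assert(t->state == MODEL_TASK_TRACED);
}

/*
 * Tracer side: true when a snapshot matches the reported observation,
 * i.e. JOBCTL_TRACED is set while the state still reads TASK_RUNNING.
 */
static bool model_snapshot_is_inconsistent(const struct model_task *t)
{
	return (t->jobctl & MODEL_JOBCTL_TRACED) &&
	       t->state == MODEL_TASK_RUNNING;
}
```

With both stores done under one lock, a reader holding that lock should
never see model_snapshot_is_inconsistent() return true, which is why the
observed behavior is surprising.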

> Eric

Thanks!

[-- Attachment #2: config-5.19.0-rc4-08751-g2cf560748ed6 --]
[-- Type: text/plain, Size: 87568 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/s390 5.19.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="s390x-12.1.0-gcc (GCC) 12.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=120100
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23800
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=121
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_HAVE_KERNEL_UNCOMPRESSED=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
# CONFIG_KERNEL_UNCOMPRESSED is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_WATCH_QUEUE=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
# CONFIG_TIME_KUNIT_TEST is not set

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
# CONFIG_BPF_JIT is not set
# end of BPF subsystem

CONFIG_PREEMPT_NONE_BUILD=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_SCHED_CORE=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_CC_NO_ARRAY_BOUNDS=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_MISC=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
CONFIG_BOOT_CONFIG=y
# CONFIG_BOOT_CONFIG_EMBED is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_KCMP=y
CONFIG_RSEQ=y
# CONFIG_DEBUG_RSEQ is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_MMU=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_PGSTE=y
CONFIG_AUDIT_ARCH=y
CONFIG_NO_IOPORT_MAP=y
# CONFIG_PCI_QUIRKS is not set
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_S390=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y

#
# Processor type and features
#
CONFIG_HAVE_MARCH_Z10_FEATURES=y
CONFIG_HAVE_MARCH_Z196_FEATURES=y
CONFIG_HAVE_MARCH_ZEC12_FEATURES=y
# CONFIG_MARCH_Z10 is not set
# CONFIG_MARCH_Z196 is not set
CONFIG_MARCH_ZEC12=y
# CONFIG_MARCH_Z13 is not set
# CONFIG_MARCH_Z14 is not set
# CONFIG_MARCH_Z15 is not set
CONFIG_MARCH_ZEC12_TUNE=y
# CONFIG_TUNE_DEFAULT is not set
# CONFIG_TUNE_Z10 is not set
# CONFIG_TUNE_Z196 is not set
CONFIG_TUNE_ZEC12=y
# CONFIG_TUNE_Z13 is not set
# CONFIG_TUNE_Z14 is not set
# CONFIG_TUNE_Z15 is not set
# CONFIG_TUNE_Z16 is not set
CONFIG_64BIT=y
CONFIG_COMMAND_LINE_SIZE=4096
CONFIG_COMPAT=y
CONFIG_SMP=y
CONFIG_NR_CPUS=512
CONFIG_HOTPLUG_CPU=y
CONFIG_NUMA=y
CONFIG_NODES_SHIFT=1
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_BOOK=y
CONFIG_SCHED_DRAWER=y
CONFIG_SCHED_TOPOLOGY=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_SIG=y
CONFIG_ARCH_RANDOM=y
# CONFIG_KERNEL_NOBP is not set
CONFIG_EXPOLINE=y
# CONFIG_EXPOLINE_EXTERN is not set
# CONFIG_EXPOLINE_OFF is not set
CONFIG_EXPOLINE_AUTO=y
# CONFIG_EXPOLINE_FULL is not set
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
# end of Processor type and features

#
# Memory setup
#
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_MAX_PHYSMEM_BITS=46
# end of Memory setup

#
# I/O subsystem
#
CONFIG_QDIO=y
CONFIG_PCI_NR_FUNCTIONS=512
CONFIG_HAS_IOMEM=y
CONFIG_CHSC_SCH=y
CONFIG_SCM_BUS=y
CONFIG_EADM_SCH=m
CONFIG_VFIO_CCW=m
CONFIG_VFIO_AP=m
# end of I/O subsystem

#
# Dump support
#
CONFIG_CRASH_DUMP=y
# end of Dump support

CONFIG_CCW=y
CONFIG_HAVE_PNETID=y

#
# Virtualization
#
CONFIG_PROTECTED_VIRTUALIZATION_GUEST=y
CONFIG_PFAULT=y
CONFIG_CMM=m
CONFIG_CMM_IUCV=y
CONFIG_APPLDATA_BASE=y
CONFIG_APPLDATA_MEM=m
CONFIG_APPLDATA_OS=m
CONFIG_APPLDATA_NET_SUM=m
CONFIG_S390_HYPFS_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_KVM_ASYNC_PF_SYNC=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_HAVE_KVM_INVALID_WAKEUPS=y
CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
# CONFIG_KVM_S390_UCONTROL is not set
CONFIG_S390_GUEST=y
# end of Virtualization

CONFIG_S390_MODULES_SANITY_TEST_HELPERS=y

#
# Selftests
#
CONFIG_S390_UNWIND_SELFTEST=m
CONFIG_S390_KPROBES_SANITY_TEST=m
CONFIG_S390_MODULES_SANITY_TEST=m
# end of Selftests

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_ARCH_32BIT_USTAT_F_TINODE=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_MMU_GATHER_NO_GATHER=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_LTO_NONE=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE=y
CONFIG_ARCH_HAS_SCALED_CPUTIME=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ALTERNATE_USER_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_ARCH_HAS_VDSO_DATA=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# CONFIG_GCC_PLUGINS is not set
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODULE_UNLOAD_TAINT_TRACKING is not set
CONFIG_MODVERSIONS=y
CONFIG_ASM_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
CONFIG_BLK_RQ_ALLOC_TIME=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG_COMMON=y
CONFIG_BLK_ICQ=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=y
# CONFIG_BLK_DEV_ZONED is not set
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
CONFIG_BLK_WBT=y
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_CGROUP_IOLATENCY=y
CONFIG_BLK_CGROUP_IOCOST=y
CONFIG_BLK_CGROUP_IOPRIO=y
CONFIG_BLK_DEBUG_FS=y
# CONFIG_BLK_SED_OPAL is not set
CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_IBM_PARTITION=y
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_MQ_RDMA=y
CONFIG_BLOCK_HOLDER_DEPRECATED=y
CONFIG_BLK_MQ_STACKING=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK=y
CONFIG_ARCH_INLINE_SPIN_LOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_READ_TRYLOCK=y
CONFIG_ARCH_INLINE_READ_LOCK=y
CONFIG_ARCH_INLINE_READ_LOCK_BH=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_READ_UNLOCK=y
CONFIG_ARCH_INLINE_READ_UNLOCK_BH=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_WRITE_TRYLOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ARCH_BINFMT_ELF_STATE=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_ZPOOL=y
CONFIG_SWAP=y
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_DEFAULT_ON is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
CONFIG_ZBUD=y
# CONFIG_Z3FOLD is not set
CONFIG_ZSMALLOC=y
CONFIG_ZSMALLOC_STAT=y

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
CONFIG_SLUB_CPU_PARTIAL=y
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK_PHYS_MAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_FRONTSWAP=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_SYSFS=y
CONFIG_CMA_AREAS=7
CONFIG_MEM_SOFT_DIRTY=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_PAGE_IDLE_FLAG=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ZONE_DMA=y
CONFIG_HMM_MIRROR=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PERCPU_STATS=y
# CONFIG_GUP_TEST is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_ANON_VMA_NAME=y
CONFIG_USERFAULTFD=y

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_NET_REDIRECT=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_AF_UNIX_OOB=y
CONFIG_UNIX_DIAG=m
# CONFIG_TLS is not set
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_INTERFACE is not set
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_XFRM_ESPINTCP=y
CONFIG_IUCV=y
CONFIG_AFIUCV=m
CONFIG_SMC=m
CONFIG_SMC_DIAG=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
# CONFIG_INET_ESP_OFFLOAD is not set
CONFIG_INET_ESPINTCP=y
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
# CONFIG_INET_RAW_DIAG is not set
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
# CONFIG_TCP_CONG_NV is not set
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_TCP_CONG_CDG is not set
# CONFIG_TCP_CONG_BBR is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_IPV6_ROUTE_INFO is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
# CONFIG_INET6_ESP_OFFLOAD is not set
CONFIG_INET6_ESPINTCP=y
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
# CONFIG_IPV6_ILA is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
# CONFIG_IPV6_MROUTE is not set
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
CONFIG_IPV6_RPL_LWTUNNEL=y
# CONFIG_IPV6_IOAM6_LWTUNNEL is not set
# CONFIG_NETLABEL is not set
CONFIG_MPTCP=y
CONFIG_INET_MPTCP_DIAG=m
CONFIG_MPTCP_IPV6=y
# CONFIG_MPTCP_KUNIT_TEST is not set
CONFIG_NETWORK_SECMARK=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_EGRESS=y
CONFIG_NETFILTER_SKIP_EGRESS=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
CONFIG_NETFILTER_NETLINK_HOOK=m
CONFIG_NETFILTER_NETLINK_ACCT=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NETFILTER_NETLINK_OSF=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_SYSLOG=m
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
# CONFIG_NF_CONNTRACK_ZONES is not set
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
# CONFIG_NETFILTER_NETLINK_GLUE_CT is not set
CONFIG_NF_NAT=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
# CONFIG_NF_TABLES_NETDEV is not set
# CONFIG_NFT_NUMGEN is not set
CONFIG_NFT_CT=m
# CONFIG_NFT_CONNLIMIT is not set
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
# CONFIG_NFT_MASQ is not set
# CONFIG_NFT_REDIR is not set
CONFIG_NFT_NAT=m
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_OBJREF=m
# CONFIG_NFT_QUEUE is not set
# CONFIG_NFT_QUOTA is not set
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_INET=m
# CONFIG_NFT_XFRM is not set
# CONFIG_NFT_SOCKET is not set
# CONFIG_NFT_OSF is not set
# CONFIG_NFT_TPROXY is not set
# CONFIG_NFT_SYNPROXY is not set
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XTABLES_COMPAT=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
# CONFIG_NETFILTER_XT_TARGET_NETMAP is not set
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_NOTRACK is not set
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
# CONFIG_NETFILTER_XT_TARGET_REDIRECT is not set
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
# CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_NETFILTER_XT_MATCH_L2TP=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_NFACCT=m
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
# CONFIG_NETFILTER_XT_MATCH_SOCKET is not set
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_TIME=m
CONFIG_NETFILTER_XT_MATCH_U32=m
# end of Core Netfilter Configuration

CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
# CONFIG_IP_SET_HASH_IPMARK is not set
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
# CONFIG_IP_SET_HASH_IPMAC is not set
# CONFIG_IP_SET_HASH_MAC is not set
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
# CONFIG_IP_VS_IPV6 is not set
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
# CONFIG_IP_VS_PROTO_SCTP is not set

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
# CONFIG_IP_VS_FO is not set
# CONFIG_IP_VS_OVF is not set
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
# CONFIG_IP_VS_MH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
CONFIG_IP_VS_TWOS=m

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
# CONFIG_NF_SOCKET_IPV4 is not set
CONFIG_NF_TPROXY_IPV4=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_REJECT_IPV4=m
# CONFIG_NFT_DUP_IPV4 is not set
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
# CONFIG_NF_LOG_ARP is not set
CONFIG_NF_LOG_IPV4=m
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
# CONFIG_IP_NF_TARGET_SYNPROXY is not set
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
# CONFIG_IP_NF_TARGET_NETMAP is not set
# CONFIG_IP_NF_TARGET_REDIRECT is not set
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
# CONFIG_NF_SOCKET_IPV6 is not set
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
# CONFIG_NFT_DUP_IPV6 is not set
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
# CONFIG_IP6_NF_MATCH_SRH is not set
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
# CONFIG_IP6_NF_TARGET_SYNPROXY is not set
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
# CONFIG_IP6_NF_TARGET_NPT is not set
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
# CONFIG_NFT_BRIDGE_META is not set
# CONFIG_NFT_BRIDGE_REJECT is not set
# CONFIG_NF_CONNTRACK_BRIDGE is not set
# CONFIG_BRIDGE_NF_EBTABLES is not set
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_COOKIE_HMAC_SHA1 is not set
CONFIG_INET_SCTP_DIAG=m
CONFIG_RDS=m
CONFIG_RDS_RDMA=m
CONFIG_RDS_TCP=m
# CONFIG_RDS_DEBUG is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=y
CONFIG_GARP=m
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_VLAN_FILTERING is not set
CONFIG_BRIDGE_MRP=y
# CONFIG_BRIDGE_CFM is not set
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
# CONFIG_VLAN_8021Q_MVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_6LOWPAN is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
# CONFIG_NET_SCH_CAKE is not set
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
# CONFIG_NET_SCH_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
CONFIG_NET_SCH_ETS=m
# CONFIG_NET_SCH_DEFAULT is not set

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_CLS_FLOWER is not set
# CONFIG_NET_CLS_MATCHALL is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
# CONFIG_NET_ACT_SAMPLE is not set
CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_MPLS is not set
# CONFIG_NET_ACT_VLAN is not set
# CONFIG_NET_ACT_BPF is not set
# CONFIG_NET_ACT_CONNMARK is not set
# CONFIG_NET_ACT_CTINFO is not set
# CONFIG_NET_ACT_SKBMOD is not set
# CONFIG_NET_ACT_IFE is not set
# CONFIG_NET_ACT_TUNNEL_KEY is not set
CONFIG_NET_ACT_GATE=m
# CONFIG_NET_TC_SKB_EXT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=m
CONFIG_OPENVSWITCH_VXLAN=m
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=m
# CONFIG_MPLS_ROUTING is not set
CONFIG_NET_NSH=m
# CONFIG_HSR is not set
CONFIG_NET_SWITCHDEV=y
# CONFIG_NET_L3_MASTER_DEV is not set
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
# CONFIG_NET_DROP_MONITOR is not set
# end of Network testing
# end of Networking options

# CONFIG_CAN is not set
# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
CONFIG_STREAM_PARSER=y
# CONFIG_MCTP is not set
CONFIG_FIB_RULES=y
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=m
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set
# CONFIG_NFC is not set
# CONFIG_PSAMPLE is not set
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_NET_SOCK_MSG=y
CONFIG_NET_DEVLINK=y
CONFIG_PAGE_POOL=y
# CONFIG_PAGE_POOL_STATS is not set
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y
# CONFIG_NETDEV_ADDR_LIST_TEST is not set

#
# Device Drivers
#
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCIEASPM is not set
# CONFIG_PCIE_PTM is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_MSI_ARCH_FALLBACKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
# CONFIG_PCI_STUB is not set
# CONFIG_PCI_PF_STUB is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set
# CONFIG_PCIE_BUS_TUNE_OFF is not set
CONFIG_PCIE_BUS_DEFAULT=y
# CONFIG_PCIE_BUS_SAFE is not set
# CONFIG_PCIE_BUS_PERFORMANCE is not set
# CONFIG_PCIE_BUS_PEER2PEER is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
CONFIG_HOTPLUG_PCI_S390=y

#
# PCI controller drivers
#

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
# CONFIG_PCCARD is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
CONFIG_AUXILIARY_BUS=y
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_DEVTMPFS_SAFE=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_FW_LOADER_COMPRESS is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_SYS_HYPERVISOR=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_FIRMWARE_MEMMAP is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
# CONFIG_PARPORT is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
CONFIG_CDROM=m
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
CONFIG_ZRAM=y
CONFIG_ZRAM_DEF_COMP_LZORLE=y
# CONFIG_ZRAM_DEF_COMP_ZSTD is not set
# CONFIG_ZRAM_DEF_COMP_LZ4 is not set
# CONFIG_ZRAM_DEF_COMP_LZO is not set
# CONFIG_ZRAM_DEF_COMP_LZ4HC is not set
# CONFIG_ZRAM_DEF_COMP_842 is not set
CONFIG_ZRAM_DEF_COMP="lzo-rle"
# CONFIG_ZRAM_WRITEBACK is not set
# CONFIG_ZRAM_MEMORY_TRACKING is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLK_DEV_DRBD=m
# CONFIG_DRBD_FAULT_INJECTION is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=32768
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set

#
# S/390 block device drivers
#
CONFIG_DCSSBLK=m
CONFIG_DASD=y
CONFIG_DASD_PROFILE=y
CONFIG_DASD_ECKD=y
CONFIG_DASD_FBA=y
CONFIG_DASD_DIAG=y
CONFIG_DASD_EER=y
CONFIG_SCM_BLOCK=m
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_RBD=m

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
# CONFIG_NVME_MULTIPATH is not set
# CONFIG_NVME_VERBOSE_ERRORS is not set
# CONFIG_NVME_RDMA is not set
# CONFIG_NVME_FC is not set
# CONFIG_NVME_TCP is not set
# CONFIG_NVME_TARGET is not set
# end of NVME Support

#
# Misc devices
#
# CONFIG_DUMMY_IRQ is not set
# CONFIG_PHANTOM is not set
# CONFIG_TIFM_CORE is not set
CONFIG_ENCLOSURE_SERVICES=m
# CONFIG_HP_ILO is not set
# CONFIG_SRAM is not set
# CONFIG_DW_XDATA_PCIE is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# end of EEPROM support

# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

#
# Altera FPGA firmware download module (requires I2C)
#
CONFIG_GENWQE=m
CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=0
# CONFIG_ECHO is not set
# CONFIG_BCM_VK is not set
# CONFIG_MISC_ALCOR_PCI is not set
# CONFIG_MISC_RTSX_PCI is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI_COMMON=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=y
CONFIG_BLK_DEV_BSG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT3SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_MPI3MR is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
CONFIG_ZFCP=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_VIRTIO=m
# CONFIG_SCSI_CHELSIO_FCOE is not set
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=m
CONFIG_SCSI_DH_HP_SW=m
CONFIG_SCSI_DH_EMC=m
CONFIG_SCSI_DH_ALUA=m
# end of SCSI device support

# CONFIG_ATA is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
CONFIG_BCACHE=m
# CONFIG_BCACHE_DEBUG is not set
# CONFIG_BCACHE_CLOSURES_DEBUG is not set
# CONFIG_BCACHE_ASYNC_REGISTRATION is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_DEBUG is not set
CONFIG_DM_BUFIO=m
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
CONFIG_DM_UNSTRIPED=m
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
# CONFIG_DM_CACHE is not set
CONFIG_DM_WRITECACHE=m
# CONFIG_DM_EBS is not set
# CONFIG_DM_ERA is not set
CONFIG_DM_CLONE=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_MULTIPATH_HST=m
CONFIG_DM_MULTIPATH_IOA=m
CONFIG_DM_DELAY=m
# CONFIG_DM_DUST is not set
CONFIG_DM_INIT=y
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
# CONFIG_DM_VERITY_FEC is not set
CONFIG_DM_SWITCH=m
# CONFIG_DM_LOG_WRITES is not set
CONFIG_DM_INTEGRITY=m
CONFIG_DM_AUDIT=y
# CONFIG_TARGET_CORE is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_BONDING=m
CONFIG_DUMMY=m
# CONFIG_WIREGUARD is not set
CONFIG_EQUALIZER=m
# CONFIG_NET_FC is not set
CONFIG_IFB=m
# CONFIG_NET_TEAM is not set
CONFIG_MACVLAN=m
CONFIG_MACVTAP=m
# CONFIG_IPVLAN is not set
CONFIG_VXLAN=m
# CONFIG_GENEVE is not set
CONFIG_BAREUDP=m
# CONFIG_GTP is not set
CONFIG_AMT=m
# CONFIG_MACSEC is not set
# CONFIG_NETCONSOLE is not set
CONFIG_TUN=m
CONFIG_TAP=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
CONFIG_NLMON=m
# CONFIG_VSOCKMON is not set
# CONFIG_ARCNET is not set
CONFIG_ETHERNET=y
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_VENDOR_ADAPTEC is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALACRITECH is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_ALTERA_TSE is not set
# CONFIG_NET_VENDOR_AMAZON is not set
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ASIX is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_CADENCE is not set
# CONFIG_NET_VENDOR_CAVIUM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_DAVICOM is not set
# CONFIG_DNET is not set
# CONFIG_NET_VENDOR_DEC is not set
# CONFIG_NET_VENDOR_DLINK is not set
# CONFIG_NET_VENDOR_EMULEX is not set
# CONFIG_NET_VENDOR_ENGLEDER is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_FUNGIBLE is not set
# CONFIG_NET_VENDOR_GOOGLE is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_JME is not set
# CONFIG_NET_VENDOR_LITEX is not set
# CONFIG_NET_VENDOR_MARVELL is not set
CONFIG_NET_VENDOR_MELLANOX=y
CONFIG_MLX4_EN=m
CONFIG_MLX4_CORE=m
CONFIG_MLX4_DEBUG=y
CONFIG_MLX4_CORE_GEN2=y
CONFIG_MLX5_CORE=m
# CONFIG_MLX5_FPGA is not set
CONFIG_MLX5_CORE_EN=y
CONFIG_MLX5_EN_ARFS=y
CONFIG_MLX5_EN_RXNFC=y
CONFIG_MLX5_MPFS=y
CONFIG_MLX5_ESWITCH=y
CONFIG_MLX5_BRIDGE=y
CONFIG_MLX5_CLS_ACT=y
CONFIG_MLX5_TC_SAMPLE=y
# CONFIG_MLX5_CORE_IPOIB is not set
CONFIG_MLX5_SW_STEERING=y
# CONFIG_MLX5_SF is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MICROCHIP is not set
# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_MICROSOFT is not set
# CONFIG_NET_VENDOR_MYRI is not set
# CONFIG_FEALNX is not set
# CONFIG_NET_VENDOR_NI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETERION is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_ETHOC is not set
# CONFIG_NET_VENDOR_PACKET_ENGINES is not set
# CONFIG_NET_VENDOR_PENSANDO is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_RDC is not set
# CONFIG_NET_VENDOR_REALTEK is not set
# CONFIG_NET_VENDOR_RENESAS is not set
# CONFIG_NET_VENDOR_ROCKER is not set
# CONFIG_NET_VENDOR_SAMSUNG is not set
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SILAN is not set
# CONFIG_NET_VENDOR_SIS is not set
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VERTEXCOM is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
# CONFIG_NET_VENDOR_XILINX is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PHYLIB is not set
# CONFIG_MDIO_DEVICE is not set

#
# PCS device drivers
#
# end of PCS device drivers

CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_FILTER=y
CONFIG_PPP_MPPE=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPPOE=m
CONFIG_PPTP=m
CONFIG_PPPOL2TP=m
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
# CONFIG_SLIP is not set
CONFIG_SLHC=m

#
# S/390 network device drivers
#
CONFIG_LCS=m
CONFIG_CTCM=m
CONFIG_NETIUCV=m
CONFIG_SMSGIUCV=m
CONFIG_SMSGIUCV_EVENT=m
CONFIG_QETH=y
CONFIG_QETH_L2=y
CONFIG_QETH_L3=y
CONFIG_QETH_OSX=y
CONFIG_CCWGROUP=y
CONFIG_ISM=m
# end of S/390 network device drivers

#
# Host-side USB support is needed for USB Network Adapter support
#
# CONFIG_WAN is not set

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

# CONFIG_VMXNET3 is not set
# CONFIG_NETDEVSIM is not set
CONFIG_NET_FAILOVER=m

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
# CONFIG_RMI4_CORE is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=0
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# end of Serial drivers

# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_N_GSM is not set
# CONFIG_NOZOMI is not set
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
# CONFIG_SERIAL_DEV_BUS is not set
# CONFIG_TTY_PRINTK is not set
CONFIG_VIRTIO_CONSOLE=m
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_HW_RANDOM_S390=m
# CONFIG_HW_RANDOM_XIPHERA is not set
# CONFIG_APPLICOM is not set
CONFIG_DEVMEM=y
CONFIG_DEVPORT=y
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=y
# CONFIG_TCG_VTPM_PROXY is not set

#
# S/390 character device drivers
#
CONFIG_TN3270=y
CONFIG_TN3270_TTY=y
CONFIG_TN3270_FS=y
CONFIG_TN3270_CONSOLE=y
CONFIG_TN3215=y
CONFIG_TN3215_CONSOLE=y
CONFIG_CCW_CONSOLE=y
CONFIG_SCLP_TTY=y
CONFIG_SCLP_CONSOLE=y
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
CONFIG_HMC_DRV=m
# CONFIG_SCLP_OFB is not set
CONFIG_S390_UV_UAPI=m
CONFIG_S390_TAPE=m

#
# S/390 tape hardware support
#
CONFIG_S390_TAPE_34XX=m
CONFIG_S390_TAPE_3590=m
CONFIG_VMLOGRDR=m
CONFIG_VMCP=y
CONFIG_VMCP_CMA_SIZE=4
CONFIG_MONREADER=m
CONFIG_MONWRITER=m
CONFIG_S390_VMUR=m
# CONFIG_XILLYBUS is not set
CONFIG_RANDOM_TRUST_CPU=y
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
# CONFIG_I2C is not set
# end of I2C support

# CONFIG_I3C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
# CONFIG_PPS is not set

#
# PTP clock support
#
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

# CONFIG_PINCTRL is not set
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_WATCHDOG_NOWAYOUT=y
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
# CONFIG_WATCHDOG_SYSFS is not set
# CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT is not set

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_I6300ESB_WDT is not set
CONFIG_DIAG288_WATCHDOG=m

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_MADERA is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TQMX86 is not set
# CONFIG_MFD_VX855 is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
# CONFIG_RC_CORE is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_DRM is not set
# CONFIG_DRM_DEBUG_MODESET_LOCK is not set

#
# ARM devices
#
# end of ARM devices

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_MODE_HELPERS is not set
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
# CONFIG_LCD_CLASS_DEVICE is not set
# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
# end of Backlight & LCD device support

#
# Console display driver support
#
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION is not set
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

# CONFIG_LOGO is not set
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_MMC is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
# CONFIG_INFINIBAND_USER_MAD is not set
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
CONFIG_INFINIBAND_VIRT_DMA=y
# CONFIG_INFINIBAND_MTHCA is not set
CONFIG_MLX4_INFINIBAND=m
CONFIG_MLX5_INFINIBAND=m
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_RDMA_RXE is not set
# CONFIG_RDMA_SIW is not set
# CONFIG_INFINIBAND_IPOIB is not set
# CONFIG_INFINIBAND_SRP is not set
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_INFINIBAND_RTRS_CLIENT is not set
# CONFIG_INFINIBAND_RTRS_SERVER is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_DEBUG is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# CONFIG_DMABUF_SYSFS_STATS is not set
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VFIO=m
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI_CORE=m
CONFIG_VFIO_PCI=m
CONFIG_MLX5_VFIO_PCI=m
CONFIG_VFIO_MDEV=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI_LIB=m
CONFIG_VIRTIO_PCI_LIB_LEGACY=m
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_PCI_LEGACY=y
CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_INPUT=y
# CONFIG_VIRTIO_MMIO is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=m
CONFIG_VHOST_VSOCK=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
# CONFIG_GOLDFISH is not set
# CONFIG_COMMON_CLK is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
# end of Clock Source drivers

# CONFIG_MAILBOX is not set
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
CONFIG_IOMMU_DEFAULT_DMA_STRICT=y
# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_S390_IOMMU=y
CONFIG_S390_CCW_IOMMU=y
CONFIG_S390_AP_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

# CONFIG_RAS is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

# CONFIG_LIBNVDIMM is not set
CONFIG_DAX=y
# CONFIG_DEV_DAX is not set
# CONFIG_NVMEM is not set

#
# HW tracing support
#
# CONFIG_STM is not set
# CONFIG_INTEL_TH is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
# CONFIG_EXT4_KUNIT_TESTS is not set
CONFIG_JBD2=y
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
CONFIG_JFS_STATISTICS=y
CONFIG_XFS_FS=y
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
# CONFIG_XFS_ONLINE_SCRUB is not set
# CONFIG_XFS_WARN is not set
# CONFIG_XFS_DEBUG is not set
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
CONFIG_NILFS2_FS=m
# CONFIG_F2FS_FS is not set
CONFIG_FS_DAX=y
CONFIG_FS_DAX_LIMITED=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_ENCRYPTION_INLINE_CRYPT is not set
CONFIG_FS_VERITY=y
# CONFIG_FS_VERITY_DEBUG is not set
CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V1=m
CONFIG_QFMT_V2=m
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=m
CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=y
CONFIG_CUSE=m
CONFIG_VIRTIO_FS=m
CONFIG_FUSE_DAX=y
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_NETFS_SUPPORT=m
CONFIG_NETFS_STATS=y
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_DEBUG is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_ERROR_INJECTION is not set
# CONFIG_CACHEFILES_ONDEMAND is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_FAT_KUNIT_TEST is not set
CONFIG_EXFAT_FS=m
CONFIG_EXFAT_DEFAULT_IOCHARSET="utf8"
CONFIG_NTFS_FS=m
# CONFIG_NTFS_DEBUG is not set
CONFIG_NTFS_RW=y
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
# CONFIG_PROC_VMCORE_DEVICE_DUMP is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_TMPFS_INODE64=y
CONFIG_ARCH_SUPPORTS_HUGETLBFS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=m
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
CONFIG_ECRYPT_FS=m
# CONFIG_ECRYPT_FS_MESSAGING is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
CONFIG_CRAMFS_BLOCKDEV=y
CONFIG_SQUASHFS=m
CONFIG_SQUASHFS_FILE_CACHE=y
# CONFIG_SQUASHFS_FILE_DIRECT is not set
CONFIG_SQUASHFS_DECOMP_SINGLE=y
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
# CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is not set
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_SQUASHFS_LZ4=y
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
CONFIG_SQUASHFS_ZSTD=y
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
CONFIG_ROMFS_FS=m
CONFIG_ROMFS_BACKED_BY_BLOCK=y
CONFIG_ROMFS_ON_BLOCK=y
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V2=m
CONFIG_NFS_V3=m
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
CONFIG_NFS_SWAP=y
# CONFIG_NFS_V4_1 is not set
# CONFIG_NFS_FSCACHE is not set
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
# CONFIG_NFSD_SCSILAYOUT is not set
# CONFIG_NFSD_FLEXFILELAYOUT is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_GRACE_PERIOD=m
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_SWAP=y
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES is not set
# CONFIG_SUNRPC_DEBUG is not set
CONFIG_SUNRPC_XPRT_RDMA=m
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=m
CONFIG_CIFS_STATS2=y
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG is not set
CONFIG_CIFS_DFS_UPCALL=y
CONFIG_CIFS_SWN_UPCALL=y
# CONFIG_CIFS_SMB_DIRECT is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_SMB_SERVER is not set
CONFIG_SMBFS_COMMON=m
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=m
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=m
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_MAC_ROMAN is not set
# CONFIG_NLS_MAC_CELTIC is not set
# CONFIG_NLS_MAC_CENTEURO is not set
# CONFIG_NLS_MAC_CROATIAN is not set
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
# CONFIG_NLS_MAC_GREEK is not set
# CONFIG_NLS_MAC_ICELAND is not set
# CONFIG_NLS_MAC_INUIT is not set
# CONFIG_NLS_MAC_ROMANIAN is not set
# CONFIG_NLS_MAC_TURKISH is not set
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set
CONFIG_UNICODE=y
# CONFIG_UNICODE_NORMALIZATION_SELFTEST is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=m
# CONFIG_USER_DECRYPTED_DATA is not set
# CONFIG_KEY_DH_OPERATIONS is not set
CONFIG_KEY_NOTIFICATIONS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_WRITABLE_HOOKS=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
# CONFIG_SECURITY_INFINIBAND is not set
# CONFIG_SECURITY_NETWORK_XFRM is not set
CONFIG_SECURITY_PATH=y
CONFIG_LSM_MMAP_MIN_ADDR=65536
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
# CONFIG_FORTIFY_SOURCE is not set
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=0
CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS=9
CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE=256
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_LOADPIN is not set
# CONFIG_SECURITY_YAMA is not set
# CONFIG_SECURITY_SAFESETID is not set
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_LOCK_DOWN_KERNEL_FORCE_NONE=y
# CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY is not set
# CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY is not set
CONFIG_SECURITY_LANDLOCK=y
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_LSM_RULES=y
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
# CONFIG_IMA_DEFAULT_HASH_SHA1 is not set
CONFIG_IMA_DEFAULT_HASH_SHA256=y
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
CONFIG_IMA_DEFAULT_HASH="sha256"
CONFIG_IMA_WRITE_POLICY=y
CONFIG_IMA_READ_POLICY=y
CONFIG_IMA_APPRAISE=y
# CONFIG_IMA_ARCH_POLICY is not set
# CONFIG_IMA_APPRAISE_BUILD_POLICY is not set
CONFIG_IMA_APPRAISE_BOOTPARAM=y
# CONFIG_IMA_APPRAISE_MODSIG is not set
CONFIG_IMA_TRUSTED_KEYRING=y
# CONFIG_IMA_BLACKLIST_KEYRING is not set
# CONFIG_IMA_LOAD_X509 is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
# CONFIG_IMA_DISABLE_HTABLE is not set
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
# CONFIG_INIT_STACK_NONE is not set
# CONFIG_INIT_STACK_ALL_PATTERN is not set
CONFIG_INIT_STACK_ALL_ZERO=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
# CONFIG_CRYPTO_MANAGER_EXTRA_TESTS is not set
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_ENGINE=m

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=m
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
CONFIG_CRYPTO_ECC=m
CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_ECDSA=m
CONFIG_CRYPTO_ECRDSA=m
CONFIG_CRYPTO_SM2=m
CONFIG_CRYPTO_CURVE25519=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_OFB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_KEYWRAP=m
CONFIG_CRYPTO_NHPOLY1305=m
CONFIG_CRYPTO_ADIANTUM=m
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_XXHASH=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_BLAKE2S=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRC64_ROCKSOFT=y
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=m
CONFIG_CRYPTO_SM3=m
# CONFIG_CRYPTO_SM3_GENERIC is not set
CONFIG_CRYPTO_STREEBOG=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_CHACHA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
# CONFIG_CRYPTO_SM4_GENERIC is not set
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
CONFIG_CRYPTO_ZSTD=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_USER_API=m
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
CONFIG_CRYPTO_USER_API_RNG=m
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
CONFIG_CRYPTO_USER_API_AEAD=m
CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE=y
CONFIG_CRYPTO_STATS=y
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
CONFIG_ZCRYPT=m
# CONFIG_ZCRYPT_DEBUG is not set
CONFIG_ZCRYPT_MULTIDEVNODES=y
CONFIG_PKEY=m
CONFIG_CRYPTO_PAES_S390=m
CONFIG_CRYPTO_SHA1_S390=m
CONFIG_CRYPTO_SHA256_S390=m
CONFIG_CRYPTO_SHA512_S390=m
CONFIG_CRYPTO_SHA3_256_S390=m
CONFIG_CRYPTO_SHA3_512_S390=m
CONFIG_CRYPTO_DES_S390=m
CONFIG_CRYPTO_AES_S390=m
CONFIG_CRYPTO_CHACHA_S390=m
CONFIG_S390_PRNG=m
CONFIG_CRYPTO_GHASH_S390=m
CONFIG_CRYPTO_CRC32_S390=y
# CONFIG_CRYPTO_DEV_NITROX_CNN55XX is not set
CONFIG_CRYPTO_DEV_VIRTIO=m
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
# CONFIG_SIGNED_PE_FILE_VERIFICATION is not set
# CONFIG_FIPS_SIGNATURE_SELFTEST is not set

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_MODULE_SIG_KEY_TYPE_RSA=y
# CONFIG_MODULE_SIG_KEY_TYPE_ECDSA is not set
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
# CONFIG_SYSTEM_BLACKLIST_KEYRING is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_RAID6_PQ_BENCHMARK=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_CORDIC=m
CONFIG_PRIME_NUMBERS=m
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=m
CONFIG_CRYPTO_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CURVE25519_GENERIC=m
CONFIG_CRYPTO_LIB_CURVE25519=m
CONFIG_CRYPTO_LIB_DES=m
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=m
CONFIG_CRYPTO_LIB_POLY1305=m
CONFIG_CRYPTO_LIB_CHACHA20POLY1305=m
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_LIB_MEMNEQ=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC64_ROCKSOFT=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC64=y
CONFIG_CRC4=m
CONFIG_CRC7=m
CONFIG_LIBCRC32C=y
CONFIG_CRC8=m
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_842_COMPRESS=m
CONFIG_842_DECOMPRESS=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_ZLIB_DFLTCC=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=y
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_MICROLZMA=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y
CONFIG_SWIOTLB=y
CONFIG_DMA_CMA=y
# CONFIG_DMA_PERNUMA_CMA is not set

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=0
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_IOMMU_HELPER=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_LRU_CACHE=m
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_SIGNATURE=y
CONFIG_DIMLIB=y
CONFIG_OID_REGISTRY=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_STACK_HASH_ORDER=20
CONFIG_SBITMAP=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
# CONFIG_PRINTK_CALLER is not set
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DYNAMIC_DEBUG_CORE=y
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_NONE is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_WARN=2048
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_VMLINUX_MAP is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# CONFIG_KCSAN is not set
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
# CONFIG_DEBUG_NET is not set
# end of Networking Debugging

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
CONFIG_DEBUG_WX=y
CONFIG_GENERIC_PTDUMP=y
CONFIG_PTDUMP_CORE=y
CONFIG_PTDUMP_DEBUGFS=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_WQ_WATCHDOG is not set
CONFIG_TEST_LOCKUP=m
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_TORTURE_TEST=m
# CONFIG_RCU_SCALE_TEST is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_REF_SCALE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
CONFIG_RCU_TRACE=y
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_NOP_MCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_BOOTTIME_TRACING=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_STACK_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
# CONFIG_HWLAT_TRACER is not set
# CONFIG_OSNOISE_TRACER is not set
# CONFIG_TIMERLAT_TRACER is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
CONFIG_TRACING_MAP=y
CONFIG_SYNTH_EVENTS=y
CONFIG_HIST_TRIGGERS=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_RECORD_RECURSION is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_HIST_TRIGGERS_DEBUG is not set
CONFIG_SAMPLES=y
# CONFIG_SAMPLE_AUXDISPLAY is not set
# CONFIG_SAMPLE_TRACE_EVENTS is not set
# CONFIG_SAMPLE_TRACE_CUSTOM_EVENTS is not set
CONFIG_SAMPLE_TRACE_PRINTK=m
CONFIG_SAMPLE_FTRACE_DIRECT=m
CONFIG_SAMPLE_FTRACE_DIRECT_MULTI=m
# CONFIG_SAMPLE_TRACE_ARRAY is not set
# CONFIG_SAMPLE_KOBJECT is not set
# CONFIG_SAMPLE_KPROBES is not set
# CONFIG_SAMPLE_KFIFO is not set
# CONFIG_SAMPLE_LIVEPATCH is not set
# CONFIG_SAMPLE_CONFIGFS is not set
# CONFIG_SAMPLE_VFIO_MDEV_MTTY is not set
# CONFIG_SAMPLE_VFIO_MDEV_MDPY is not set
# CONFIG_SAMPLE_VFIO_MDEV_MDPY_FB is not set
# CONFIG_SAMPLE_VFIO_MDEV_MBOCHS is not set
# CONFIG_SAMPLE_WATCHDOG is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set

#
# s390 Debugging
#
CONFIG_EARLY_PRINTK=y
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_CIO_INJECT is not set
# end of s390 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=m
CONFIG_KUNIT_DEBUGFS=y
# CONFIG_KUNIT_TEST is not set
# CONFIG_KUNIT_EXAMPLE_TEST is not set
# CONFIG_KUNIT_ALL_TESTS is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
# CONFIG_FAULT_INJECTION is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
CONFIG_LKDTM=m
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_SORT is not set
# CONFIG_TEST_DIV64 is not set
CONFIG_KPROBES_SANITY_TEST=m
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_TEST_REF_TRACKER is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_STRING_SELFTEST is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_SCANF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_SIPHASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
CONFIG_TEST_BPF=m
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
# CONFIG_BITFIELD_KUNIT is not set
# CONFIG_HASH_KUNIT_TEST is not set
# CONFIG_RESOURCE_KUNIT_TEST is not set
# CONFIG_SYSCTL_KUNIT_TEST is not set
# CONFIG_LIST_KUNIT_TEST is not set
# CONFIG_LINEAR_RANGES_TEST is not set
# CONFIG_CMDLINE_KUNIT_TEST is not set
# CONFIG_BITS_TEST is not set
# CONFIG_SLUB_KUNIT_TEST is not set
# CONFIG_MEMCPY_KUNIT_TEST is not set
# CONFIG_OVERFLOW_KUNIT_TEST is not set
# CONFIG_STACKINIT_KUNIT_TEST is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_MEMCAT_P is not set
CONFIG_TEST_LIVEPATCH=m
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_FREE_PAGES is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
@ 2022-06-29 20:25                                 ` Alexander Gordeev
  0 siblings, 0 replies; 572+ messages in thread
From: Alexander Gordeev @ 2022-06-29 20:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Steven Rostedt, linux-kernel, rjw, Oleg Nesterov, mingo,
	vincent.guittot, dietmar.eggemann, mgorman, bigeasy, Will Deacon,
	tj, linux-pm, Peter Zijlstra, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um, Chris Zankel, Max Filippov,
	linux-xtensa, Kees Cook, Jann Horn, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 2686 bytes --]

On Tue, Jun 28, 2022 at 10:39:59PM -0500, Eric W. Biederman wrote:
> Steven Rostedt <rostedt@goodmis.org> writes:
> 
> > On Tue, 28 Jun 2022 17:42:22 -0500
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >
> >> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> >> index 156a99283b11..cb85bcf84640 100644
> >> --- a/kernel/ptrace.c
> >> +++ b/kernel/ptrace.c
> >> @@ -202,6 +202,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
> >>  	spin_lock_irq(&task->sighand->siglock);
> >>  	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
> >>  	    !__fatal_signal_pending(task)) {
> >> +		smp_rmb();
> >>  		task->jobctl |= JOBCTL_PTRACE_FROZEN;
> >>  		ret = true;
> >>  	}
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index edb1dc9b00dc..bcd576e9de66 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -2233,6 +2233,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> >>  		return exit_code;
> >>  
> >>  	set_special_state(TASK_TRACED);
> >> +	smp_wmb();
> >>  	current->jobctl |= JOBCTL_TRACED;
> >>  
> >
> > Are not these both done under the sighand->siglock spinlock?
> >
> > That is, the two paths should already be synchronized, and the memory
> > barriers will not help anything inside the locks. The locking should (and
> > must) handle all that.
> 
> I would presume so too.  However, the READ_ONCE that is going astray
> does not look like it is honoring that.
> 
> So perhaps there is a bug in the s390 spin_lock barriers?  Perhaps there
> is a subtle detail in the barriers that spin locks provide that we are
> overlooking?
> 
> I just know the observed behavior is:
> 
> - reading tsk->jobctl and seeing JOBCTL_TRACED set.
> - reading tsk->__state and seeing TASK_RUNNING.
> 
> So unless PREEMPT_RT is enabled on s390, it looks like there is a
> barrier problem.
> 
> Alexander do you have PREEMPT_RT enabled on s390?  I have been assuming
> you don't but I figure I should ask and make certain as PREEMPT_RT can
> cause this kind of failure.

There is no change with the barriers added.

CONFIG_PREEMPT_RT is disabled and CONFIG_LOCKDEP is enabled (see the attached config).
FWIW, I also added a full barrier:

@@ -271,6 +272,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
        if (!ret && !ignore_state) {
                unsigned int __state;
 
+               smp_mb();
                WARN_ON_ONCE(!(child->jobctl & JOBCTL_PTRACE_FROZEN));
                WARN_ON_ONCE(!(child->jobctl & JOBCTL_TRACED));
                __state = READ_ONCE(child->__state);
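
For reference, the ordering that the smp_wmb()/smp_rmb() pair tested above
was meant to enforce can be sketched in user-space C11. This is only an
analogy under stated assumptions: it is not kernel code, it does not
reproduce the s390 report, and it ignores the siglock that both kernel paths
hold. The variable names mirror the thread; the numeric values, the helper
names (tracee_stop, tracer_read_state, run_once) and the mapping of the
kernel barriers onto C11 release/acquire fences are illustrative assumptions.

```c
/* User-space analogy only; values and helper names are illustrative. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define TASK_RUNNING  0x0u
#define TASK_TRACED   0x8u	/* illustrative value */
#define JOBCTL_TRACED 0x1u	/* illustrative value */

static atomic_uint task_state;	/* stands in for tsk->__state */
static atomic_uint task_jobctl;	/* stands in for tsk->jobctl */

/* Writer side: store the state first, then publish the flag. */
static void *tracee_stop(void *unused)
{
	(void)unused;
	atomic_store_explicit(&task_state, TASK_TRACED, memory_order_relaxed);
	/* Assumed stand-in for smp_wmb(): orders the store above before the one below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&task_jobctl, JOBCTL_TRACED, memory_order_relaxed);
	return NULL;
}

/* Reader side: wait until the flag is visible, then read the state. */
static unsigned int tracer_read_state(void)
{
	while (!(atomic_load_explicit(&task_jobctl, memory_order_relaxed) &
		 JOBCTL_TRACED))
		;	/* spin until the writer has set the flag */
	/* Assumed stand-in for smp_rmb(): orders the load above before the one below. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&task_state, memory_order_relaxed);
}

/* Run one writer/reader round and return the state the reader observed. */
unsigned int run_once(void)
{
	pthread_t writer;
	unsigned int seen;
	int rc;

	/* Reset both variables before each round. */
	atomic_store(&task_state, TASK_RUNNING);
	atomic_store(&task_jobctl, 0);

	rc = pthread_create(&writer, NULL, tracee_stop, NULL);
	assert(rc == 0);
	seen = tracer_read_state();
	rc = pthread_join(writer, NULL);
	assert(rc == 0);
	return seen;
}
```

With the fences paired this way, a reader that has seen the flag set must
also see the TASK_TRACED store, so run_once() should always return
TASK_TRACED. The combination reported in this thread (JOBCTL_TRACED set while
__state reads TASK_RUNNING) is exactly what such a pairing rules out, which
is why the unchanged behaviour with the barriers added points away from a
plain ordering problem between these two stores.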

I have not been able to extract the ftrace ring buffer yet; I am going to do that next.

> Eric

Thanks!

[-- Attachment #2: config-5.19.0-rc4-08751-g2cf560748ed6 --]
[-- Type: text/plain, Size: 87568 bytes --]

#
# Automatically generated file; DO NOT EDIT.
# Linux/s390 5.19.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="s390x-12.1.0-gcc (GCC) 12.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=120100
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23800
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=121
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_HAVE_KERNEL_UNCOMPRESSED=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
# CONFIG_KERNEL_UNCOMPRESSED is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_WATCH_QUEUE=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
# CONFIG_TIME_KUNIT_TEST is not set

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
# CONFIG_BPF_JIT is not set
# end of BPF subsystem

CONFIG_PREEMPT_NONE_BUILD=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_SCHED_CORE=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
# CONFIG_PRINTK_INDEX is not set

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CC_HAS_INT128=y
CONFIG_CC_IMPLICIT_FALLTHROUGH="-Wimplicit-fallthrough=5"
CONFIG_GCC12_NO_ARRAY_BOUNDS=y
CONFIG_CC_NO_ARRAY_BOUNDS=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_MISC=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_TIME_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
CONFIG_RD_ZSTD=y
CONFIG_BOOT_CONFIG=y
# CONFIG_BOOT_CONFIG_EMBED is not set
CONFIG_INITRAMFS_PRESERVE_MTIME=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_IO_URING=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_KCMP=y
CONFIG_RSEQ=y
# CONFIG_DEBUG_RSEQ is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# end of Kernel Performance Events And Counters

CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# end of General setup

CONFIG_MMU=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_PGSTE=y
CONFIG_AUDIT_ARCH=y
CONFIG_NO_IOPORT_MAP=y
# CONFIG_PCI_QUIRKS is not set
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_S390=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PGTABLE_LEVELS=5
CONFIG_HAVE_LIVEPATCH=y
CONFIG_LIVEPATCH=y

#
# Processor type and features
#
CONFIG_HAVE_MARCH_Z10_FEATURES=y
CONFIG_HAVE_MARCH_Z196_FEATURES=y
CONFIG_HAVE_MARCH_ZEC12_FEATURES=y
# CONFIG_MARCH_Z10 is not set
# CONFIG_MARCH_Z196 is not set
CONFIG_MARCH_ZEC12=y
# CONFIG_MARCH_Z13 is not set
# CONFIG_MARCH_Z14 is not set
# CONFIG_MARCH_Z15 is not set
CONFIG_MARCH_ZEC12_TUNE=y
# CONFIG_TUNE_DEFAULT is not set
# CONFIG_TUNE_Z10 is not set
# CONFIG_TUNE_Z196 is not set
CONFIG_TUNE_ZEC12=y
# CONFIG_TUNE_Z13 is not set
# CONFIG_TUNE_Z14 is not set
# CONFIG_TUNE_Z15 is not set
# CONFIG_TUNE_Z16 is not set
CONFIG_64BIT=y
CONFIG_COMMAND_LINE_SIZE=4096
CONFIG_COMPAT=y
CONFIG_SMP=y
CONFIG_NR_CPUS=512
CONFIG_HOTPLUG_CPU=y
CONFIG_NUMA=y
CONFIG_NODES_SHIFT=1
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_BOOK=y
CONFIG_SCHED_DRAWER=y
CONFIG_SCHED_TOPOLOGY=y
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_SIG=y
CONFIG_ARCH_RANDOM=y
# CONFIG_KERNEL_NOBP is not set
CONFIG_EXPOLINE=y
# CONFIG_EXPOLINE_EXTERN is not set
# CONFIG_EXPOLINE_OFF is not set
CONFIG_EXPOLINE_AUTO=y
# CONFIG_EXPOLINE_FULL is not set
CONFIG_RELOCATABLE=y
CONFIG_RANDOMIZE_BASE=y
# end of Processor type and features

#
# Memory setup
#
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_MAX_PHYSMEM_BITS=46
# end of Memory setup

#
# I/O subsystem
#
CONFIG_QDIO=y
CONFIG_PCI_NR_FUNCTIONS=512
CONFIG_HAS_IOMEM=y
CONFIG_CHSC_SCH=y
CONFIG_SCM_BUS=y
CONFIG_EADM_SCH=m
CONFIG_VFIO_CCW=m
CONFIG_VFIO_AP=m
# end of I/O subsystem

#
# Dump support
#
CONFIG_CRASH_DUMP=y
# end of Dump support

CONFIG_CCW=y
CONFIG_HAVE_PNETID=y

#
# Virtualization
#
CONFIG_PROTECTED_VIRTUALIZATION_GUEST=y
CONFIG_PFAULT=y
CONFIG_CMM=m
CONFIG_CMM_IUCV=y
CONFIG_APPLDATA_BASE=y
CONFIG_APPLDATA_MEM=m
CONFIG_APPLDATA_OS=m
CONFIG_APPLDATA_NET_SUM=m
CONFIG_S390_HYPFS_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQFD=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_KVM_ASYNC_PF_SYNC=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM_VFIO=y
CONFIG_HAVE_KVM_INVALID_WAKEUPS=y
CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL=y
CONFIG_HAVE_KVM_NO_POLL=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
# CONFIG_KVM_S390_UCONTROL is not set
CONFIG_S390_GUEST=y
# end of Virtualization

CONFIG_S390_MODULES_SANITY_TEST_HELPERS=y

#
# Selftests
#
CONFIG_S390_UNWIND_SELFTEST=m
CONFIG_S390_KPROBES_SANITY_TEST=m
CONFIG_S390_MODULES_SANITY_TEST=m
# end of Selftests

#
# General architecture-dependent options
#
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_GENERIC_ENTRY=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
# CONFIG_STATIC_KEYS_SELFTEST is not set
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE=y
CONFIG_HAVE_FUNCTION_ERROR_INJECTION=y
CONFIG_HAVE_NMI=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_ARCH_HAS_FORTIFY_SOURCE=y
CONFIG_ARCH_HAS_SET_MEMORY=y
CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT=y
CONFIG_ARCH_WANTS_NO_INSTR=y
CONFIG_ARCH_32BIT_USTAT_F_TINODE=y
CONFIG_HAVE_ASM_MODVERSIONS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_RSEQ=y
CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_HAVE_ARCH_JUMP_LABEL_RELATIVE=y
CONFIG_MMU_GATHER_TABLE_FREE=y
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
CONFIG_MMU_GATHER_NO_GATHER=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
# CONFIG_SECCOMP_CACHE_DEBUG is not set
CONFIG_LTO_NONE=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE=y
CONFIG_ARCH_HAS_SCALED_CPUTIME=y
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_SOFT_DIRTY=y
CONFIG_HAVE_MOD_ARCH_SPECIFIC=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK=y
CONFIG_ALTERNATE_USER_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_ELF_RANDOMIZE=y
CONFIG_PAGE_SIZE_LESS_THAN_64KB=y
CONFIG_PAGE_SIZE_LESS_THAN_256KB=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_CLONE_BACKWARDS2=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_OLD_SIGACTION=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_HAVE_ARCH_VMAP_STACK=y
CONFIG_VMAP_STACK=y
CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET=y
CONFIG_RANDOMIZE_KSTACK_OFFSET=y
# CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT is not set
CONFIG_ARCH_HAS_STRICT_KERNEL_RWX=y
CONFIG_STRICT_KERNEL_RWX=y
CONFIG_ARCH_HAS_STRICT_MODULE_RWX=y
CONFIG_STRICT_MODULE_RWX=y
# CONFIG_LOCK_EVENT_COUNTS is not set
CONFIG_ARCH_HAS_MEM_ENCRYPT=y
CONFIG_ARCH_HAS_VDSO_DATA=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling

CONFIG_HAVE_GCC_PLUGINS=y
# CONFIG_GCC_PLUGINS is not set
# end of General architecture-dependent options

CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODULE_UNLOAD_TAINT_TRACKING is not set
CONFIG_MODVERSIONS=y
CONFIG_ASM_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
CONFIG_MODULE_SIG_ALL=y
# CONFIG_MODULE_SIG_SHA1 is not set
# CONFIG_MODULE_SIG_SHA224 is not set
CONFIG_MODULE_SIG_SHA256=y
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha256"
CONFIG_MODULE_COMPRESS_NONE=y
# CONFIG_MODULE_COMPRESS_GZIP is not set
# CONFIG_MODULE_COMPRESS_XZ is not set
# CONFIG_MODULE_COMPRESS_ZSTD is not set
# CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
CONFIG_MODPROBE_PATH="/sbin/modprobe"
# CONFIG_TRIM_UNUSED_KSYMS is not set
CONFIG_MODULES_TREE_LOOKUP=y
CONFIG_BLOCK=y
CONFIG_BLOCK_LEGACY_AUTOLOAD=y
CONFIG_BLK_RQ_ALLOC_TIME=y
CONFIG_BLK_CGROUP_RWSTAT=y
CONFIG_BLK_DEV_BSG_COMMON=y
CONFIG_BLK_ICQ=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_INTEGRITY_T10=y
# CONFIG_BLK_DEV_ZONED is not set
CONFIG_BLK_DEV_THROTTLING=y
# CONFIG_BLK_DEV_THROTTLING_LOW is not set
CONFIG_BLK_WBT=y
CONFIG_BLK_WBT_MQ=y
CONFIG_BLK_CGROUP_IOLATENCY=y
CONFIG_BLK_CGROUP_IOCOST=y
CONFIG_BLK_CGROUP_IOPRIO=y
CONFIG_BLK_DEBUG_FS=y
# CONFIG_BLK_SED_OPAL is not set
CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_IBM_PARTITION=y
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
# end of Partition Types

CONFIG_BLOCK_COMPAT=y
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_BLK_MQ_RDMA=y
CONFIG_BLOCK_HOLDER_DEPRECATED=y
CONFIG_BLK_MQ_STACKING=y

#
# IO Schedulers
#
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_MQ_IOSCHED_KYBER=y
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
# CONFIG_BFQ_CGROUP_DEBUG is not set
# end of IO Schedulers

CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PADATA=y
CONFIG_ASN1=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK=y
CONFIG_ARCH_INLINE_SPIN_TRYLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK=y
CONFIG_ARCH_INLINE_SPIN_LOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_BH=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_READ_TRYLOCK=y
CONFIG_ARCH_INLINE_READ_LOCK=y
CONFIG_ARCH_INLINE_READ_LOCK_BH=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_READ_UNLOCK=y
CONFIG_ARCH_INLINE_READ_UNLOCK_BH=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_READ_UNLOCK_IRQRESTORE=y
CONFIG_ARCH_INLINE_WRITE_TRYLOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK=y
CONFIG_ARCH_INLINE_WRITE_LOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_LOCK_IRQSAVE=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_BH=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_RWSEM_SPIN_ON_OWNER=y
CONFIG_LOCK_SPIN_ON_OWNER=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ARCH_BINFMT_ELF_STATE=y
CONFIG_ELFCORE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=m
CONFIG_COREDUMP=y
# end of Executable file formats

#
# Memory Management options
#
CONFIG_ZPOOL=y
CONFIG_SWAP=y
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_DEFAULT_ON is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
CONFIG_ZBUD=y
# CONFIG_Z3FOLD is not set
CONFIG_ZSMALLOC=y
CONFIG_ZSMALLOC_STAT=y

#
# SLAB allocator options
#
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
CONFIG_SLUB_CPU_PARTIAL=y
# end of SLAB allocator options

# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK_PHYS_MAP=y
CONFIG_HAVE_FAST_GUP=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y
CONFIG_MEMORY_BALLOON=y
CONFIG_BALLOON_COMPACTION=y
CONFIG_COMPACTION=y
CONFIG_PAGE_REPORTING=y
CONFIG_MIGRATION=y
CONFIG_CONTIG_ALLOC=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
# CONFIG_READ_ONLY_THP_FOR_FS is not set
CONFIG_FRONTSWAP=y
CONFIG_CMA=y
# CONFIG_CMA_DEBUG is not set
# CONFIG_CMA_DEBUGFS is not set
CONFIG_CMA_SYSFS=y
CONFIG_CMA_AREAS=7
CONFIG_MEM_SOFT_DIRTY=y
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
CONFIG_PAGE_IDLE_FLAG=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_ARCH_HAS_CURRENT_STACK_POINTER=y
CONFIG_ZONE_DMA=y
CONFIG_HMM_MIRROR=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PERCPU_STATS=y
# CONFIG_GUP_TEST is not set
CONFIG_ARCH_HAS_PTE_SPECIAL=y
CONFIG_ANON_VMA_NAME=y
CONFIG_USERFAULTFD=y

#
# Data Access Monitoring
#
# CONFIG_DAMON is not set
# end of Data Access Monitoring
# end of Memory Management options

CONFIG_NET=y
CONFIG_NET_INGRESS=y
CONFIG_NET_EGRESS=y
CONFIG_NET_REDIRECT=y
CONFIG_SKB_EXTENSIONS=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_SCM=y
CONFIG_AF_UNIX_OOB=y
CONFIG_UNIX_DIAG=m
# CONFIG_TLS is not set
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=m
CONFIG_XFRM_USER=m
# CONFIG_XFRM_INTERFACE is not set
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_AH=m
CONFIG_XFRM_ESP=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_XFRM_ESPINTCP=y
CONFIG_IUCV=y
CONFIG_AFIUCV=m
CONFIG_SMC=m
CONFIG_SMC_DIAG=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_IP_ROUTE_CLASSID=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IP_TUNNEL=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE_COMMON=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_NET_IPVTI=m
CONFIG_NET_UDP_TUNNEL=m
# CONFIG_NET_FOU is not set
# CONFIG_NET_FOU_IP_TUNNELS is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
# CONFIG_INET_ESP_OFFLOAD is not set
CONFIG_INET_ESPINTCP=y
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
# CONFIG_INET_RAW_DIAG is not set
# CONFIG_INET_DIAG_DESTROY is not set
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
# CONFIG_TCP_CONG_NV is not set
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_TCP_CONG_CDG is not set
# CONFIG_TCP_CONG_BBR is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_IPV6_ROUTE_INFO is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
# CONFIG_INET6_ESP_OFFLOAD is not set
CONFIG_INET6_ESPINTCP=y
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=m
# CONFIG_IPV6_ILA is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
# CONFIG_IPV6_SIT_6RD is not set
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_GRE=m
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SUBTREES=y
# CONFIG_IPV6_MROUTE is not set
# CONFIG_IPV6_SEG6_LWTUNNEL is not set
# CONFIG_IPV6_SEG6_HMAC is not set
CONFIG_IPV6_RPL_LWTUNNEL=y
# CONFIG_IPV6_IOAM6_LWTUNNEL is not set
# CONFIG_NETLABEL is not set
CONFIG_MPTCP=y
CONFIG_INET_MPTCP_DIAG=m
CONFIG_MPTCP_IPV6=y
# CONFIG_MPTCP_KUNIT_TEST is not set
CONFIG_NETWORK_SECMARK=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=m

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_INGRESS=y
CONFIG_NETFILTER_EGRESS=y
CONFIG_NETFILTER_SKIP_EGRESS=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_FAMILY_BRIDGE=y
CONFIG_NETFILTER_FAMILY_ARP=y
CONFIG_NETFILTER_NETLINK_HOOK=m
CONFIG_NETFILTER_NETLINK_ACCT=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NETFILTER_NETLINK_OSF=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_LOG_SYSLOG=m
CONFIG_NETFILTER_CONNCOUNT=m
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
# CONFIG_NF_CONNTRACK_ZONES is not set
CONFIG_NF_CONNTRACK_PROCFS=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMEOUT=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_LABELS=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_GRE=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_BROADCAST=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_CT_NETLINK_TIMEOUT=m
# CONFIG_NETFILTER_NETLINK_GLUE_CT is not set
CONFIG_NF_NAT=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_SIP=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_MASQUERADE=y
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
# CONFIG_NF_TABLES_NETDEV is not set
# CONFIG_NFT_NUMGEN is not set
CONFIG_NFT_CT=m
# CONFIG_NFT_CONNLIMIT is not set
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
# CONFIG_NFT_MASQ is not set
# CONFIG_NFT_REDIR is not set
CONFIG_NFT_NAT=m
# CONFIG_NFT_TUNNEL is not set
CONFIG_NFT_OBJREF=m
# CONFIG_NFT_QUEUE is not set
# CONFIG_NFT_QUOTA is not set
CONFIG_NFT_REJECT=m
CONFIG_NFT_REJECT_INET=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB=m
CONFIG_NFT_FIB_INET=m
# CONFIG_NFT_XFRM is not set
# CONFIG_NFT_SOCKET is not set
# CONFIG_NFT_OSF is not set
# CONFIG_NFT_TPROXY is not set
# CONFIG_NFT_SYNPROXY is not set
# CONFIG_NF_FLOW_TABLE is not set
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XTABLES_COMPAT=y

#
# Xtables combined modules
#
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_SET=m

#
# Xtables targets
#
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_CT=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_NAT=m
# CONFIG_NETFILTER_XT_TARGET_NETMAP is not set
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_NOTRACK is not set
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
# CONFIG_NETFILTER_XT_TARGET_REDIRECT is not set
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m

#
# Xtables matches
#
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
# CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ECN=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPCOMP is not set
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_NETFILTER_XT_MATCH_L2TP=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_NFACCT=m
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
# CONFIG_NETFILTER_XT_MATCH_SOCKET is not set
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_TIME=m
CONFIG_NETFILTER_XT_MATCH_U32=m
# end of Core Netfilter Configuration

CONFIG_IP_SET=m
CONFIG_IP_SET_MAX=256
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
# CONFIG_IP_SET_HASH_IPMARK is not set
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
# CONFIG_IP_SET_HASH_IPMAC is not set
# CONFIG_IP_SET_HASH_MAC is not set
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
# CONFIG_IP_VS_IPV6 is not set
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
# CONFIG_IP_VS_PROTO_SCTP is not set

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
# CONFIG_IP_VS_FO is not set
# CONFIG_IP_VS_OVF is not set
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
# CONFIG_IP_VS_MH is not set
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
CONFIG_IP_VS_TWOS=m

#
# IPVS SH scheduler
#
CONFIG_IP_VS_SH_TAB_BITS=8

#
# IPVS MH scheduler
#
CONFIG_IP_VS_MH_TAB_INDEX=12

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_NFCT=y
CONFIG_IP_VS_PE_SIP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
# CONFIG_NF_SOCKET_IPV4 is not set
CONFIG_NF_TPROXY_IPV4=m
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_REJECT_IPV4=m
# CONFIG_NFT_DUP_IPV4 is not set
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_DUP_IPV4=m
# CONFIG_NF_LOG_ARP is not set
CONFIG_NF_LOG_IPV4=m
CONFIG_NF_REJECT_IPV4=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
# CONFIG_IP_NF_TARGET_SYNPROXY is not set
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
# CONFIG_IP_NF_TARGET_NETMAP is not set
# CONFIG_IP_NF_TARGET_REDIRECT is not set
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
# end of IP: Netfilter Configuration

#
# IPv6: Netfilter Configuration
#
# CONFIG_NF_SOCKET_IPV6 is not set
CONFIG_NF_TPROXY_IPV6=m
CONFIG_NF_TABLES_IPV6=y
CONFIG_NFT_REJECT_IPV6=m
# CONFIG_NFT_DUP_IPV6 is not set
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_DUP_IPV6=m
CONFIG_NF_REJECT_IPV6=m
CONFIG_NF_LOG_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
# CONFIG_IP6_NF_MATCH_SRH is not set
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
# CONFIG_IP6_NF_TARGET_SYNPROXY is not set
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
# CONFIG_IP6_NF_TARGET_NPT is not set
# end of IPv6: Netfilter Configuration

CONFIG_NF_DEFRAG_IPV6=m
CONFIG_NF_TABLES_BRIDGE=m
# CONFIG_NFT_BRIDGE_META is not set
# CONFIG_NFT_BRIDGE_REJECT is not set
# CONFIG_NF_CONNTRACK_BRIDGE is not set
# CONFIG_BRIDGE_NF_EBTABLES is not set
# CONFIG_BPFILTER is not set
# CONFIG_IP_DCCP is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_OBJCNT is not set
CONFIG_SCTP_DEFAULT_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_SHA1 is not set
# CONFIG_SCTP_DEFAULT_COOKIE_HMAC_NONE is not set
CONFIG_SCTP_COOKIE_HMAC_MD5=y
# CONFIG_SCTP_COOKIE_HMAC_SHA1 is not set
CONFIG_INET_SCTP_DIAG=m
CONFIG_RDS=m
CONFIG_RDS_RDMA=m
CONFIG_RDS_TCP=m
# CONFIG_RDS_DEBUG is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_STP=y
CONFIG_GARP=m
CONFIG_BRIDGE=y
CONFIG_BRIDGE_IGMP_SNOOPING=y
# CONFIG_BRIDGE_VLAN_FILTERING is not set
CONFIG_BRIDGE_MRP=y
# CONFIG_BRIDGE_CFM is not set
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
# CONFIG_VLAN_8021Q_MVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_PHONET is not set
# CONFIG_6LOWPAN is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_CBS is not set
# CONFIG_NET_SCH_ETF is not set
# CONFIG_NET_SCH_TAPRIO is not set
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
# CONFIG_NET_SCH_SKBPRIO is not set
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
# CONFIG_NET_SCH_CAKE is not set
# CONFIG_NET_SCH_FQ is not set
# CONFIG_NET_SCH_HHF is not set
# CONFIG_NET_SCH_PIE is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_SCH_PLUG=m
CONFIG_NET_SCH_ETS=m
# CONFIG_NET_SCH_DEFAULT is not set

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_CLS_FLOWER is not set
# CONFIG_NET_CLS_MATCHALL is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
# CONFIG_NET_ACT_SAMPLE is not set
CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
CONFIG_NET_ACT_SKBEDIT=m
CONFIG_NET_ACT_CSUM=m
# CONFIG_NET_ACT_MPLS is not set
# CONFIG_NET_ACT_VLAN is not set
# CONFIG_NET_ACT_BPF is not set
# CONFIG_NET_ACT_CONNMARK is not set
# CONFIG_NET_ACT_CTINFO is not set
# CONFIG_NET_ACT_SKBMOD is not set
# CONFIG_NET_ACT_IFE is not set
# CONFIG_NET_ACT_TUNNEL_KEY is not set
CONFIG_NET_ACT_GATE=m
# CONFIG_NET_TC_SKB_EXT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_DNS_RESOLVER=y
# CONFIG_BATMAN_ADV is not set
CONFIG_OPENVSWITCH=m
CONFIG_OPENVSWITCH_GRE=m
CONFIG_OPENVSWITCH_VXLAN=m
CONFIG_VSOCKETS=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_VSOCKETS_LOOPBACK=m
CONFIG_VIRTIO_VSOCKETS=m
CONFIG_VIRTIO_VSOCKETS_COMMON=m
CONFIG_NETLINK_DIAG=m
CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=m
# CONFIG_MPLS_ROUTING is not set
CONFIG_NET_NSH=m
# CONFIG_HSR is not set
CONFIG_NET_SWITCHDEV=y
# CONFIG_NET_L3_MASTER_DEV is not set
# CONFIG_QRTR is not set
# CONFIG_NET_NCSI is not set
CONFIG_PCPU_DEV_REFCNT=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_SOCK_RX_QUEUE_MAPPING=y
CONFIG_XPS=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_CGROUP_NET_CLASSID=y
CONFIG_NET_RX_BUSY_POLL=y
CONFIG_BQL=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
# CONFIG_NET_DROP_MONITOR is not set
# end of Network testing
# end of Networking options

# CONFIG_CAN is not set
# CONFIG_AF_RXRPC is not set
# CONFIG_AF_KCM is not set
CONFIG_STREAM_PARSER=y
# CONFIG_MCTP is not set
CONFIG_FIB_RULES=y
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
CONFIG_CEPH_LIB=m
# CONFIG_CEPH_LIB_PRETTYDEBUG is not set
# CONFIG_CEPH_LIB_USE_DNS_RESOLVER is not set
# CONFIG_NFC is not set
# CONFIG_PSAMPLE is not set
# CONFIG_NET_IFE is not set
CONFIG_LWTUNNEL=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_DST_CACHE=y
CONFIG_GRO_CELLS=y
CONFIG_NET_SOCK_MSG=y
CONFIG_NET_DEVLINK=y
CONFIG_PAGE_POOL=y
# CONFIG_PAGE_POOL_STATS is not set
CONFIG_FAILOVER=m
CONFIG_ETHTOOL_NETLINK=y
# CONFIG_NETDEV_ADDR_LIST_TEST is not set

#
# Device Drivers
#
CONFIG_HAVE_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCIEASPM is not set
# CONFIG_PCIE_PTM is not set
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
CONFIG_PCI_MSI_ARCH_FALLBACKS=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
# CONFIG_PCI_STUB is not set
# CONFIG_PCI_PF_STUB is not set
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
# CONFIG_PCI_PRI is not set
# CONFIG_PCI_PASID is not set
# CONFIG_PCIE_BUS_TUNE_OFF is not set
CONFIG_PCIE_BUS_DEFAULT=y
# CONFIG_PCIE_BUS_SAFE is not set
# CONFIG_PCIE_BUS_PERFORMANCE is not set
# CONFIG_PCIE_BUS_PEER2PEER is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
CONFIG_HOTPLUG_PCI_S390=y

#
# PCI controller drivers
#

#
# DesignWare PCI Core Support
#
# CONFIG_PCIE_DW_PLAT_HOST is not set
# CONFIG_PCI_MESON is not set
# end of DesignWare PCI Core Support

#
# Mobiveil PCIe Core Support
#
# end of Mobiveil PCIe Core Support

#
# Cadence PCIe controllers support
#
# end of Cadence PCIe controllers support
# end of PCI controller drivers

#
# PCI Endpoint
#
# CONFIG_PCI_ENDPOINT is not set
# end of PCI Endpoint

#
# PCI switch controller drivers
#
# CONFIG_PCI_SW_SWITCHTEC is not set
# end of PCI switch controller drivers

# CONFIG_CXL_BUS is not set
# CONFIG_PCCARD is not set
# CONFIG_RAPIDIO is not set

#
# Generic Driver Options
#
CONFIG_AUXILIARY_BUS=y
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
CONFIG_DEVTMPFS_SAFE=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y

#
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_FW_LOADER_USER_HELPER is not set
# CONFIG_FW_LOADER_COMPRESS is not set
# CONFIG_FW_UPLOAD is not set
# end of Firmware loader

CONFIG_ALLOW_DEV_COREDUMP=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_SYS_HYPERVISOR=y
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_GENERIC_CPU_VULNERABILITIES=y
CONFIG_DMA_SHARED_BUFFER=y
# CONFIG_DMA_FENCE_TRACE is not set
# end of Generic Driver Options

#
# Bus devices
#
# CONFIG_MHI_BUS is not set
# CONFIG_MHI_BUS_EP is not set
# end of Bus devices

CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y

#
# Firmware Drivers
#

#
# ARM System Control and Management Interface Protocol
#
# end of ARM System Control and Management Interface Protocol

# CONFIG_FIRMWARE_MEMMAP is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# Tegra firmware driver
#
# end of Tegra firmware driver
# end of Firmware Drivers

# CONFIG_GNSS is not set
# CONFIG_MTD is not set
# CONFIG_OF is not set
# CONFIG_PARPORT is not set
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_NULL_BLK is not set
CONFIG_CDROM=m
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
CONFIG_ZRAM=y
CONFIG_ZRAM_DEF_COMP_LZORLE=y
# CONFIG_ZRAM_DEF_COMP_ZSTD is not set
# CONFIG_ZRAM_DEF_COMP_LZ4 is not set
# CONFIG_ZRAM_DEF_COMP_LZO is not set
# CONFIG_ZRAM_DEF_COMP_LZ4HC is not set
# CONFIG_ZRAM_DEF_COMP_842 is not set
CONFIG_ZRAM_DEF_COMP="lzo-rle"
# CONFIG_ZRAM_WRITEBACK is not set
# CONFIG_ZRAM_MEMORY_TRACKING is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLK_DEV_DRBD=m
# CONFIG_DRBD_FAULT_INJECTION is not set
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=32768
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set

#
# S/390 block device drivers
#
CONFIG_DCSSBLK=m
CONFIG_DASD=y
CONFIG_DASD_PROFILE=y
CONFIG_DASD_ECKD=y
CONFIG_DASD_FBA=y
CONFIG_DASD_DIAG=y
CONFIG_DASD_EER=y
CONFIG_SCM_BLOCK=m
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_RBD=m

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
# CONFIG_NVME_MULTIPATH is not set
# CONFIG_NVME_VERBOSE_ERRORS is not set
# CONFIG_NVME_RDMA is not set
# CONFIG_NVME_FC is not set
# CONFIG_NVME_TCP is not set
# CONFIG_NVME_TARGET is not set
# end of NVME Support

#
# Misc devices
#
# CONFIG_DUMMY_IRQ is not set
# CONFIG_PHANTOM is not set
# CONFIG_TIFM_CORE is not set
CONFIG_ENCLOSURE_SERVICES=m
# CONFIG_HP_ILO is not set
# CONFIG_SRAM is not set
# CONFIG_DW_XDATA_PCIE is not set
# CONFIG_PCI_ENDPOINT_TEST is not set
# CONFIG_XILINX_SDFEC is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
# end of EEPROM support

# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# end of Texas Instruments shared transport line discipline

#
# Altera FPGA firmware download module (requires I2C)
#
CONFIG_GENWQE=m
CONFIG_GENWQE_PLATFORM_ERROR_RECOVERY=0
# CONFIG_ECHO is not set
# CONFIG_BCM_VK is not set
# CONFIG_MISC_ALCOR_PCI is not set
# CONFIG_MISC_RTSX_PCI is not set
# CONFIG_HABANA_AI is not set
# CONFIG_UACCE is not set
# CONFIG_PVPANIC is not set
# end of Misc devices

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
CONFIG_RAID_ATTRS=m
CONFIG_SCSI_COMMON=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=y
CONFIG_BLK_DEV_BSG=y
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_HOST_SMP=y
CONFIG_SCSI_SRP_ATTRS=m
# end of SCSI Transports

CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_ESAS2R is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT3SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_MPI3MR is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_MYRB is not set
# CONFIG_LIBFC is not set
# CONFIG_SCSI_SNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FDOMAIN_PCI is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_WD719X is not set
CONFIG_SCSI_DEBUG=m
CONFIG_ZFCP=m
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_VIRTIO=m
# CONFIG_SCSI_CHELSIO_FCOE is not set
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=m
CONFIG_SCSI_DH_HP_SW=m
CONFIG_SCSI_DH_EMC=m
CONFIG_SCSI_DH_ALUA=m
# end of SCSI device support

# CONFIG_ATA is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
CONFIG_BCACHE=m
# CONFIG_BCACHE_DEBUG is not set
# CONFIG_BCACHE_CLOSURES_DEBUG is not set
# CONFIG_BCACHE_ASYNC_REGISTRATION is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_DEBUG is not set
CONFIG_DM_BUFIO=m
# CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING is not set
CONFIG_DM_BIO_PRISON=m
CONFIG_DM_PERSISTENT_DATA=m
CONFIG_DM_UNSTRIPED=m
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
# CONFIG_DM_CACHE is not set
CONFIG_DM_WRITECACHE=m
# CONFIG_DM_EBS is not set
# CONFIG_DM_ERA is not set
CONFIG_DM_CLONE=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_MULTIPATH_HST=m
CONFIG_DM_MULTIPATH_IOA=m
CONFIG_DM_DELAY=m
# CONFIG_DM_DUST is not set
CONFIG_DM_INIT=y
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
# CONFIG_DM_VERITY_FEC is not set
CONFIG_DM_SWITCH=m
# CONFIG_DM_LOG_WRITES is not set
CONFIG_DM_INTEGRITY=m
CONFIG_DM_AUDIT=y
# CONFIG_TARGET_CORE is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# end of IEEE 1394 (FireWire) support

CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_BONDING=m
CONFIG_DUMMY=m
# CONFIG_WIREGUARD is not set
CONFIG_EQUALIZER=m
# CONFIG_NET_FC is not set
CONFIG_IFB=m
# CONFIG_NET_TEAM is not set
CONFIG_MACVLAN=m
CONFIG_MACVTAP=m
# CONFIG_IPVLAN is not set
CONFIG_VXLAN=m
# CONFIG_GENEVE is not set
CONFIG_BAREUDP=m
# CONFIG_GTP is not set
CONFIG_AMT=m
# CONFIG_MACSEC is not set
# CONFIG_NETCONSOLE is not set
CONFIG_TUN=m
CONFIG_TAP=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=m
CONFIG_NLMON=m
# CONFIG_VSOCKMON is not set
# CONFIG_ARCNET is not set
CONFIG_ETHERNET=y
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_VENDOR_ADAPTEC is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALACRITECH is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_ALTERA_TSE is not set
# CONFIG_NET_VENDOR_AMAZON is not set
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ASIX is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
# CONFIG_NET_VENDOR_BROADCOM is not set
# CONFIG_NET_VENDOR_CADENCE is not set
# CONFIG_NET_VENDOR_CAVIUM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_DAVICOM is not set
# CONFIG_DNET is not set
# CONFIG_NET_VENDOR_DEC is not set
# CONFIG_NET_VENDOR_DLINK is not set
# CONFIG_NET_VENDOR_EMULEX is not set
# CONFIG_NET_VENDOR_ENGLEDER is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_FUNGIBLE is not set
# CONFIG_NET_VENDOR_GOOGLE is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_JME is not set
# CONFIG_NET_VENDOR_LITEX is not set
# CONFIG_NET_VENDOR_MARVELL is not set
CONFIG_NET_VENDOR_MELLANOX=y
CONFIG_MLX4_EN=m
CONFIG_MLX4_CORE=m
CONFIG_MLX4_DEBUG=y
CONFIG_MLX4_CORE_GEN2=y
CONFIG_MLX5_CORE=m
# CONFIG_MLX5_FPGA is not set
CONFIG_MLX5_CORE_EN=y
CONFIG_MLX5_EN_ARFS=y
CONFIG_MLX5_EN_RXNFC=y
CONFIG_MLX5_MPFS=y
CONFIG_MLX5_ESWITCH=y
CONFIG_MLX5_BRIDGE=y
CONFIG_MLX5_CLS_ACT=y
CONFIG_MLX5_TC_SAMPLE=y
# CONFIG_MLX5_CORE_IPOIB is not set
CONFIG_MLX5_SW_STEERING=y
# CONFIG_MLX5_SF is not set
# CONFIG_MLXSW_CORE is not set
# CONFIG_MLXFW is not set
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MICROCHIP is not set
# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_MICROSOFT is not set
# CONFIG_NET_VENDOR_MYRI is not set
# CONFIG_FEALNX is not set
# CONFIG_NET_VENDOR_NI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETERION is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_ETHOC is not set
# CONFIG_NET_VENDOR_PACKET_ENGINES is not set
# CONFIG_NET_VENDOR_PENSANDO is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_RDC is not set
# CONFIG_NET_VENDOR_REALTEK is not set
# CONFIG_NET_VENDOR_RENESAS is not set
# CONFIG_NET_VENDOR_ROCKER is not set
# CONFIG_NET_VENDOR_SAMSUNG is not set
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SILAN is not set
# CONFIG_NET_VENDOR_SIS is not set
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VERTEXCOM is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
# CONFIG_NET_VENDOR_XILINX is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PHYLIB is not set
# CONFIG_MDIO_DEVICE is not set

#
# PCS device drivers
#
# end of PCS device drivers

CONFIG_PPP=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_FILTER=y
CONFIG_PPP_MPPE=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPPOE=m
CONFIG_PPTP=m
CONFIG_PPPOL2TP=m
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
# CONFIG_SLIP is not set
CONFIG_SLHC=m

#
# S/390 network device drivers
#
CONFIG_LCS=m
CONFIG_CTCM=m
CONFIG_NETIUCV=m
CONFIG_SMSGIUCV=m
CONFIG_SMSGIUCV_EVENT=m
CONFIG_QETH=y
CONFIG_QETH_L2=y
CONFIG_QETH_L3=y
CONFIG_QETH_OSX=y
CONFIG_CCWGROUP=y
CONFIG_ISM=m
# end of S/390 network device drivers

#
# Host-side USB support is needed for USB Network Adapter support
#
# CONFIG_WAN is not set

#
# Wireless WAN
#
# CONFIG_WWAN is not set
# end of Wireless WAN

# CONFIG_VMXNET3 is not set
# CONFIG_NETDEVSIM is not set
CONFIG_NET_FAILOVER=m

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set
# CONFIG_RMI4_CORE is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set
# end of Hardware I/O ports
# end of Input device support

#
# Character devices
#
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=0
CONFIG_LDISC_AUTOLOAD=y

#
# Serial drivers
#

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set
# CONFIG_SERIAL_FSL_LINFLEXUART is not set
# end of Serial drivers

# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_N_GSM is not set
# CONFIG_NOZOMI is not set
# CONFIG_NULL_TTY is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IUCV=y
# CONFIG_SERIAL_DEV_BUS is not set
# CONFIG_TTY_PRINTK is not set
CONFIG_VIRTIO_CONSOLE=m
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
# CONFIG_HW_RANDOM_BA431 is not set
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_HW_RANDOM_S390=m
# CONFIG_HW_RANDOM_XIPHERA is not set
# CONFIG_APPLICOM is not set
CONFIG_DEVMEM=y
CONFIG_DEVPORT=y
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=y
# CONFIG_TCG_VTPM_PROXY is not set

#
# S/390 character device drivers
#
CONFIG_TN3270=y
CONFIG_TN3270_TTY=y
CONFIG_TN3270_FS=y
CONFIG_TN3270_CONSOLE=y
CONFIG_TN3215=y
CONFIG_TN3215_CONSOLE=y
CONFIG_CCW_CONSOLE=y
CONFIG_SCLP_TTY=y
CONFIG_SCLP_CONSOLE=y
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
CONFIG_HMC_DRV=m
# CONFIG_SCLP_OFB is not set
CONFIG_S390_UV_UAPI=m
CONFIG_S390_TAPE=m

#
# S/390 tape hardware support
#
CONFIG_S390_TAPE_34XX=m
CONFIG_S390_TAPE_3590=m
CONFIG_VMLOGRDR=m
CONFIG_VMCP=y
CONFIG_VMCP_CMA_SIZE=4
CONFIG_MONREADER=m
CONFIG_MONWRITER=m
CONFIG_S390_VMUR=m
# CONFIG_XILLYBUS is not set
CONFIG_RANDOM_TRUST_CPU=y
CONFIG_RANDOM_TRUST_BOOTLOADER=y
# end of Character devices

#
# I2C support
#
# CONFIG_I2C is not set
# end of I2C support

# CONFIG_I3C is not set
# CONFIG_SPI is not set
# CONFIG_SPMI is not set
# CONFIG_HSI is not set
# CONFIG_PPS is not set

#
# PTP clock support
#
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_PTP_1588_CLOCK_OPTIONAL=y

#
# Enable PHYLIB and NETWORK_PHY_TIMESTAMPING to see the additional clocks.
#
# end of PTP clock support

# CONFIG_PINCTRL is not set
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
# CONFIG_POWER_RESET is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_WATCHDOG_NOWAYOUT=y
CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED=y
CONFIG_WATCHDOG_OPEN_TIMEOUT=0
# CONFIG_WATCHDOG_SYSFS is not set
# CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT is not set

#
# Watchdog Pretimeout Governors
#
# CONFIG_WATCHDOG_PRETIMEOUT_GOV is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
# CONFIG_XILINX_WATCHDOG is not set
# CONFIG_CADENCE_WATCHDOG is not set
# CONFIG_DW_WATCHDOG is not set
# CONFIG_MAX63XX_WATCHDOG is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_I6300ESB_WDT is not set
CONFIG_DIAG288_WATCHDOG=m

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_MADERA is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_LPC_ICH is not set
# CONFIG_LPC_SCH is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_KEMPLD is not set
# CONFIG_MFD_MT6397 is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_MFD_SYSCON is not set
# CONFIG_MFD_TI_AM335X_TSCADC is not set
# CONFIG_MFD_TQMX86 is not set
# CONFIG_MFD_VX855 is not set
# end of Multifunction device drivers

# CONFIG_REGULATOR is not set
# CONFIG_RC_CORE is not set

#
# CEC support
#
# CONFIG_MEDIA_CEC_SUPPORT is not set
# end of CEC support

# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_DRM is not set
# CONFIG_DRM_DEBUG_MODESET_LOCK is not set

#
# ARM devices
#
# end of ARM devices

#
# Frame buffer Devices
#
CONFIG_FB_CMDLINE=y
CONFIG_FB_NOTIFY=y
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_MODE_HELPERS is not set
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_OPENCORES is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_IBM_GXT4500 is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_SIMPLE is not set
# CONFIG_FB_SM712 is not set
# end of Frame buffer Devices

#
# Backlight & LCD device support
#
# CONFIG_LCD_CLASS_DEVICE is not set
# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
# end of Backlight & LCD device support

#
# Console display driver support
#
CONFIG_DUMMY_CONSOLE=y
CONFIG_DUMMY_CONSOLE_COLUMNS=80
CONFIG_DUMMY_CONSOLE_ROWS=25
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION is not set
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
# CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER is not set
# end of Console display driver support

# CONFIG_LOGO is not set
# end of Graphics support

# CONFIG_SOUND is not set

#
# HID support
#
# CONFIG_HID is not set
# end of HID support

CONFIG_USB_OHCI_LITTLE_ENDIAN=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_MMC is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
# CONFIG_INFINIBAND_USER_MAD is not set
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS=y
CONFIG_INFINIBAND_VIRT_DMA=y
# CONFIG_INFINIBAND_MTHCA is not set
CONFIG_MLX4_INFINIBAND=m
CONFIG_MLX5_INFINIBAND=m
# CONFIG_INFINIBAND_OCRDMA is not set
# CONFIG_RDMA_RXE is not set
# CONFIG_RDMA_SIW is not set
# CONFIG_INFINIBAND_IPOIB is not set
# CONFIG_INFINIBAND_SRP is not set
# CONFIG_INFINIBAND_ISER is not set
# CONFIG_INFINIBAND_RTRS_CLIENT is not set
# CONFIG_INFINIBAND_RTRS_SERVER is not set
# CONFIG_DMADEVICES is not set

#
# DMABUF options
#
CONFIG_SYNC_FILE=y
# CONFIG_SW_SYNC is not set
# CONFIG_UDMABUF is not set
# CONFIG_DMABUF_MOVE_NOTIFY is not set
# CONFIG_DMABUF_DEBUG is not set
# CONFIG_DMABUF_SELFTESTS is not set
# CONFIG_DMABUF_HEAPS is not set
# CONFIG_DMABUF_SYSFS_STATS is not set
# end of DMABUF options

# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
CONFIG_VFIO=m
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_VFIO_VIRQFD=m
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI_CORE=m
CONFIG_VFIO_PCI=m
CONFIG_MLX5_VFIO_PCI=m
CONFIG_VFIO_MDEV=m
CONFIG_IRQ_BYPASS_MANAGER=m
# CONFIG_VIRT_DRIVERS is not set
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI_LIB=m
CONFIG_VIRTIO_PCI_LIB_LEGACY=m
CONFIG_VIRTIO_MENU=y
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_PCI_LEGACY=y
CONFIG_VIRTIO_BALLOON=m
CONFIG_VIRTIO_INPUT=y
# CONFIG_VIRTIO_MMIO is not set
# CONFIG_VDPA is not set
CONFIG_VHOST_IOTLB=m
CONFIG_VHOST=m
CONFIG_VHOST_MENU=y
CONFIG_VHOST_NET=m
CONFIG_VHOST_VSOCK=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

#
# Microsoft Hyper-V guest support
#
# end of Microsoft Hyper-V guest support

# CONFIG_GREYBUS is not set
# CONFIG_COMEDI is not set
# CONFIG_STAGING is not set
# CONFIG_GOLDFISH is not set
# CONFIG_COMMON_CLK is not set
# CONFIG_HWSPINLOCK is not set

#
# Clock Source drivers
#
# end of Clock Source drivers

# CONFIG_MAILBOX is not set
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y

#
# Generic IOMMU Pagetable Support
#
# end of Generic IOMMU Pagetable Support

# CONFIG_IOMMU_DEBUGFS is not set
CONFIG_IOMMU_DEFAULT_DMA_STRICT=y
# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
CONFIG_S390_IOMMU=y
CONFIG_S390_CCW_IOMMU=y
CONFIG_S390_AP_IOMMU=y

#
# Remoteproc drivers
#
# CONFIG_REMOTEPROC is not set
# end of Remoteproc drivers

#
# Rpmsg drivers
#
# CONFIG_RPMSG_VIRTIO is not set
# end of Rpmsg drivers

#
# SOC (System On Chip) specific Drivers
#

#
# Amlogic SoC drivers
#
# end of Amlogic SoC drivers

#
# Broadcom SoC drivers
#
# end of Broadcom SoC drivers

#
# NXP/Freescale QorIQ SoC drivers
#
# end of NXP/Freescale QorIQ SoC drivers

#
# i.MX SoC drivers
#
# end of i.MX SoC drivers

#
# Enable LiteX SoC Builder specific drivers
#
# end of Enable LiteX SoC Builder specific drivers

#
# Qualcomm SoC drivers
#
# end of Qualcomm SoC drivers

# CONFIG_SOC_TI is not set

#
# Xilinx SoC drivers
#
# end of Xilinx SoC drivers
# end of SOC (System On Chip) specific Drivers

# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_NTB is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set

#
# IRQ chip support
#
# end of IRQ chip support

# CONFIG_IPACK_BUS is not set
# CONFIG_RESET_CONTROLLER is not set

#
# PHY Subsystem
#
# CONFIG_GENERIC_PHY is not set
# CONFIG_PHY_CAN_TRANSCEIVER is not set

#
# PHY drivers for Broadcom platforms
#
# CONFIG_BCM_KONA_USB2_PHY is not set
# end of PHY drivers for Broadcom platforms

# CONFIG_PHY_PXA_28NM_HSIC is not set
# CONFIG_PHY_PXA_28NM_USB2 is not set
# end of PHY Subsystem

# CONFIG_POWERCAP is not set
# CONFIG_MCB is not set

#
# Performance monitor support
#
# end of Performance monitor support

# CONFIG_RAS is not set
# CONFIG_USB4 is not set

#
# Android
#
# CONFIG_ANDROID is not set
# end of Android

# CONFIG_LIBNVDIMM is not set
CONFIG_DAX=y
# CONFIG_DEV_DAX is not set
# CONFIG_NVMEM is not set

#
# HW tracing support
#
# CONFIG_STM is not set
# CONFIG_INTEL_TH is not set
# end of HW tracing support

# CONFIG_FPGA is not set
# CONFIG_SIOX is not set
# CONFIG_SLIMBUS is not set
# CONFIG_INTERCONNECT is not set
# CONFIG_COUNTER is not set
# CONFIG_MOST is not set
# CONFIG_PECI is not set
# CONFIG_HTE is not set
# end of Device Drivers

#
# File systems
#
# CONFIG_VALIDATE_FS_PARSER is not set
CONFIG_FS_IOMAP=y
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
# CONFIG_EXT4_KUNIT_TESTS is not set
CONFIG_JBD2=y
CONFIG_JBD2_DEBUG=y
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
CONFIG_JFS_STATISTICS=y
CONFIG_XFS_FS=y
CONFIG_XFS_SUPPORT_V4=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
# CONFIG_XFS_ONLINE_SCRUB is not set
# CONFIG_XFS_WARN is not set
# CONFIG_XFS_DEBUG is not set
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=y
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
CONFIG_OCFS2_DEBUG_MASKLOG=y
# CONFIG_OCFS2_DEBUG_FS is not set
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
# CONFIG_BTRFS_FS_REF_VERIFY is not set
CONFIG_NILFS2_FS=m
# CONFIG_F2FS_FS is not set
CONFIG_FS_DAX=y
CONFIG_FS_DAX_LIMITED=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_EXPORTFS_BLOCK_OPS=y
CONFIG_FILE_LOCKING=y
CONFIG_FS_ENCRYPTION=y
CONFIG_FS_ENCRYPTION_ALGS=y
# CONFIG_FS_ENCRYPTION_INLINE_CRYPT is not set
CONFIG_FS_VERITY=y
# CONFIG_FS_VERITY_DEBUG is not set
CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V1=m
CONFIG_QFMT_V2=m
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=m
CONFIG_AUTOFS_FS=m
CONFIG_FUSE_FS=y
CONFIG_CUSE=m
CONFIG_VIRTIO_FS=m
CONFIG_FUSE_DAX=y
CONFIG_OVERLAY_FS=m
# CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
# CONFIG_OVERLAY_FS_INDEX is not set
# CONFIG_OVERLAY_FS_XINO_AUTO is not set
# CONFIG_OVERLAY_FS_METACOPY is not set

#
# Caches
#
CONFIG_NETFS_SUPPORT=m
CONFIG_NETFS_STATS=y
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_DEBUG is not set
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_ERROR_INJECTION is not set
# CONFIG_CACHEFILES_ONDEMAND is not set
# end of Caches

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
# end of CD-ROM/DVD Filesystems

#
# DOS/FAT/EXFAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_FAT_DEFAULT_UTF8 is not set
# CONFIG_FAT_KUNIT_TEST is not set
CONFIG_EXFAT_FS=m
CONFIG_EXFAT_DEFAULT_IOCHARSET="utf8"
CONFIG_NTFS_FS=m
# CONFIG_NTFS_DEBUG is not set
CONFIG_NTFS_RW=y
# CONFIG_NTFS3_FS is not set
# end of DOS/FAT/EXFAT/NT Filesystems

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
# CONFIG_PROC_VMCORE_DEVICE_DUMP is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_CHILDREN=y
CONFIG_KERNFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_TMPFS_INODE64=y
CONFIG_ARCH_SUPPORTS_HUGETLBFS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_MEMFD_CREATE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
CONFIG_CONFIGFS_FS=m
# end of Pseudo filesystems

CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ORANGEFS_FS is not set
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
CONFIG_ECRYPT_FS=m
# CONFIG_ECRYPT_FS_MESSAGING is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
CONFIG_CRAMFS_BLOCKDEV=y
CONFIG_SQUASHFS=m
CONFIG_SQUASHFS_FILE_CACHE=y
# CONFIG_SQUASHFS_FILE_DIRECT is not set
CONFIG_SQUASHFS_DECOMP_SINGLE=y
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
# CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is not set
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_SQUASHFS_LZ4=y
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
CONFIG_SQUASHFS_ZSTD=y
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
CONFIG_ROMFS_FS=m
CONFIG_ROMFS_BACKED_BY_BLOCK=y
CONFIG_ROMFS_ON_BLOCK=y
# CONFIG_PSTORE is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_EROFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V2=m
CONFIG_NFS_V3=m
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
CONFIG_NFS_SWAP=y
# CONFIG_NFS_V4_1 is not set
# CONFIG_NFS_FSCACHE is not set
# CONFIG_NFS_USE_LEGACY_DNS is not set
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFS_DISABLE_UDP_SUPPORT=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
# CONFIG_NFSD_BLOCKLAYOUT is not set
# CONFIG_NFSD_SCSILAYOUT is not set
# CONFIG_NFSD_FLEXFILELAYOUT is not set
CONFIG_NFSD_V4_SECURITY_LABEL=y
CONFIG_GRACE_PERIOD=m
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_SWAP=y
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES is not set
# CONFIG_SUNRPC_DEBUG is not set
CONFIG_SUNRPC_XPRT_RDMA=m
# CONFIG_CEPH_FS is not set
CONFIG_CIFS=m
CONFIG_CIFS_STATS2=y
CONFIG_CIFS_ALLOW_INSECURE_LEGACY=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG is not set
CONFIG_CIFS_DFS_UPCALL=y
CONFIG_CIFS_SWN_UPCALL=y
# CONFIG_CIFS_SMB_DIRECT is not set
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_SMB_SERVER is not set
CONFIG_SMBFS_COMMON=m
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=m
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=m
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_MAC_ROMAN is not set
# CONFIG_NLS_MAC_CELTIC is not set
# CONFIG_NLS_MAC_CENTEURO is not set
# CONFIG_NLS_MAC_CROATIAN is not set
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
# CONFIG_NLS_MAC_GREEK is not set
# CONFIG_NLS_MAC_ICELAND is not set
# CONFIG_NLS_MAC_INUIT is not set
# CONFIG_NLS_MAC_ROMANIAN is not set
# CONFIG_NLS_MAC_TURKISH is not set
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
# CONFIG_DLM_DEBUG is not set
CONFIG_UNICODE=y
# CONFIG_UNICODE_NORMALIZATION_SELFTEST is not set
CONFIG_IO_WQ=y
# end of File systems

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_REQUEST_CACHE is not set
CONFIG_PERSISTENT_KEYRINGS=y
# CONFIG_TRUSTED_KEYS is not set
CONFIG_ENCRYPTED_KEYS=m
# CONFIG_USER_DECRYPTED_DATA is not set
# CONFIG_KEY_DH_OPERATIONS is not set
CONFIG_KEY_NOTIFICATIONS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITY_WRITABLE_HOOKS=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
# CONFIG_SECURITY_INFINIBAND is not set
# CONFIG_SECURITY_NETWORK_XFRM is not set
CONFIG_SECURITY_PATH=y
CONFIG_LSM_MMAP_MIN_ADDR=65536
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
# CONFIG_FORTIFY_SOURCE is not set
# CONFIG_STATIC_USERMODEHELPER is not set
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=0
CONFIG_SECURITY_SELINUX_SIDTAB_HASH_BITS=9
CONFIG_SECURITY_SELINUX_SID2STR_CACHE_SIZE=256
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_LOADPIN is not set
# CONFIG_SECURITY_YAMA is not set
# CONFIG_SECURITY_SAFESETID is not set
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_LOCK_DOWN_KERNEL_FORCE_NONE=y
# CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY is not set
# CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY is not set
CONFIG_SECURITY_LANDLOCK=y
CONFIG_INTEGRITY=y
CONFIG_INTEGRITY_SIGNATURE=y
CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
CONFIG_INTEGRITY_TRUSTED_KEYRING=y
CONFIG_INTEGRITY_AUDIT=y
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_LSM_RULES=y
CONFIG_IMA_NG_TEMPLATE=y
# CONFIG_IMA_SIG_TEMPLATE is not set
CONFIG_IMA_DEFAULT_TEMPLATE="ima-ng"
# CONFIG_IMA_DEFAULT_HASH_SHA1 is not set
CONFIG_IMA_DEFAULT_HASH_SHA256=y
# CONFIG_IMA_DEFAULT_HASH_SHA512 is not set
CONFIG_IMA_DEFAULT_HASH="sha256"
CONFIG_IMA_WRITE_POLICY=y
CONFIG_IMA_READ_POLICY=y
CONFIG_IMA_APPRAISE=y
# CONFIG_IMA_ARCH_POLICY is not set
# CONFIG_IMA_APPRAISE_BUILD_POLICY is not set
CONFIG_IMA_APPRAISE_BOOTPARAM=y
# CONFIG_IMA_APPRAISE_MODSIG is not set
CONFIG_IMA_TRUSTED_KEYRING=y
# CONFIG_IMA_BLACKLIST_KEYRING is not set
# CONFIG_IMA_LOAD_X509 is not set
CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS=y
CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS=y
# CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT is not set
# CONFIG_IMA_DISABLE_HTABLE is not set
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
# CONFIG_INIT_STACK_NONE is not set
# CONFIG_INIT_STACK_ALL_PATTERN is not set
CONFIG_INIT_STACK_ALL_ZERO=y
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
CONFIG_CC_HAS_ZERO_CALL_USED_REGS=y
# CONFIG_ZERO_CALL_USED_REGS is not set
# end of Memory initialization

CONFIG_RANDSTRUCT_NONE=y
# end of Kernel hardening options
# end of Security options

CONFIG_XOR_BLOCKS=y
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_SKCIPHER=y
CONFIG_CRYPTO_SKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_RNG_DEFAULT=y
CONFIG_CRYPTO_AKCIPHER2=y
CONFIG_CRYPTO_AKCIPHER=y
CONFIG_CRYPTO_KPP2=y
CONFIG_CRYPTO_KPP=m
CONFIG_CRYPTO_ACOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
# CONFIG_CRYPTO_MANAGER_EXTRA_TESTS is not set
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_NULL=y
CONFIG_CRYPTO_NULL2=y
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_CRYPTD=m
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_ENGINE=m

#
# Public-key cryptography
#
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_DH=m
# CONFIG_CRYPTO_DH_RFC7919_GROUPS is not set
CONFIG_CRYPTO_ECC=m
CONFIG_CRYPTO_ECDH=m
CONFIG_CRYPTO_ECDSA=m
CONFIG_CRYPTO_ECRDSA=m
CONFIG_CRYPTO_SM2=m
CONFIG_CRYPTO_CURVE25519=m

#
# Authenticated Encryption with Associated Data
#
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_ECHAINIV=m

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_CTR=y
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_OFB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_KEYWRAP=m
CONFIG_CRYPTO_NHPOLY1305=m
CONFIG_CRYPTO_ADIANTUM=m
CONFIG_CRYPTO_ESSIV=m

#
# Hash modes
#
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_XXHASH=y
CONFIG_CRYPTO_BLAKE2B=y
CONFIG_CRYPTO_BLAKE2S=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRC64_ROCKSOFT=y
CONFIG_CRYPTO_GHASH=y
CONFIG_CRYPTO_POLY1305=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_SHA512=y
CONFIG_CRYPTO_SHA3=m
CONFIG_CRYPTO_SM3=m
# CONFIG_CRYPTO_SM3_GENERIC is not set
CONFIG_CRYPTO_STREEBOG=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_COMMON=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST_COMMON=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_CHACHA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
# CONFIG_CRYPTO_SM4_GENERIC is not set
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_842=m
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
CONFIG_CRYPTO_ZSTD=m

#
# Random Number Generation
#
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_MENU=y
CONFIG_CRYPTO_DRBG_HMAC=y
# CONFIG_CRYPTO_DRBG_HASH is not set
# CONFIG_CRYPTO_DRBG_CTR is not set
CONFIG_CRYPTO_DRBG=y
CONFIG_CRYPTO_JITTERENTROPY=y
CONFIG_CRYPTO_USER_API=m
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
CONFIG_CRYPTO_USER_API_RNG=m
# CONFIG_CRYPTO_USER_API_RNG_CAVP is not set
CONFIG_CRYPTO_USER_API_AEAD=m
CONFIG_CRYPTO_USER_API_ENABLE_OBSOLETE=y
CONFIG_CRYPTO_STATS=y
CONFIG_CRYPTO_HASH_INFO=y
CONFIG_CRYPTO_HW=y
CONFIG_ZCRYPT=m
# CONFIG_ZCRYPT_DEBUG is not set
CONFIG_ZCRYPT_MULTIDEVNODES=y
CONFIG_PKEY=m
CONFIG_CRYPTO_PAES_S390=m
CONFIG_CRYPTO_SHA1_S390=m
CONFIG_CRYPTO_SHA256_S390=m
CONFIG_CRYPTO_SHA512_S390=m
CONFIG_CRYPTO_SHA3_256_S390=m
CONFIG_CRYPTO_SHA3_512_S390=m
CONFIG_CRYPTO_DES_S390=m
CONFIG_CRYPTO_AES_S390=m
CONFIG_CRYPTO_CHACHA_S390=m
CONFIG_S390_PRNG=m
CONFIG_CRYPTO_GHASH_S390=m
CONFIG_CRYPTO_CRC32_S390=y
# CONFIG_CRYPTO_DEV_NITROX_CNN55XX is not set
CONFIG_CRYPTO_DEV_VIRTIO=m
# CONFIG_CRYPTO_DEV_SAFEXCEL is not set
# CONFIG_CRYPTO_DEV_AMLOGIC_GXL is not set
CONFIG_ASYMMETRIC_KEY_TYPE=y
CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
CONFIG_X509_CERTIFICATE_PARSER=y
# CONFIG_PKCS8_PRIVATE_KEY_PARSER is not set
CONFIG_PKCS7_MESSAGE_PARSER=y
# CONFIG_PKCS7_TEST_KEY is not set
# CONFIG_SIGNED_PE_FILE_VERIFICATION is not set
# CONFIG_FIPS_SIGNATURE_SELFTEST is not set

#
# Certificates for signature checking
#
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
CONFIG_MODULE_SIG_KEY_TYPE_RSA=y
# CONFIG_MODULE_SIG_KEY_TYPE_ECDSA is not set
CONFIG_SYSTEM_TRUSTED_KEYRING=y
CONFIG_SYSTEM_TRUSTED_KEYS=""
# CONFIG_SYSTEM_EXTRA_CERTIFICATE is not set
# CONFIG_SECONDARY_TRUSTED_KEYRING is not set
# CONFIG_SYSTEM_BLACKLIST_KEYRING is not set
# end of Certificates for signature checking

CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_RAID6_PQ=y
CONFIG_RAID6_PQ_BENCHMARK=y
# CONFIG_PACKING is not set
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_NET_UTILS=y
CONFIG_CORDIC=m
CONFIG_PRIME_NUMBERS=m
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y

#
# Crypto library routines
#
CONFIG_CRYPTO_LIB_AES=y
CONFIG_CRYPTO_LIB_ARC4=m
CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y
CONFIG_CRYPTO_ARCH_HAVE_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CHACHA_GENERIC=m
CONFIG_CRYPTO_LIB_CHACHA=m
CONFIG_CRYPTO_LIB_CURVE25519_GENERIC=m
CONFIG_CRYPTO_LIB_CURVE25519=m
CONFIG_CRYPTO_LIB_DES=m
CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
CONFIG_CRYPTO_LIB_POLY1305_GENERIC=m
CONFIG_CRYPTO_LIB_POLY1305=m
CONFIG_CRYPTO_LIB_CHACHA20POLY1305=m
CONFIG_CRYPTO_LIB_SHA256=y
# end of Crypto library routines

CONFIG_LIB_MEMNEQ=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=y
CONFIG_CRC64_ROCKSOFT=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
CONFIG_CRC64=y
CONFIG_CRC4=m
CONFIG_CRC7=m
CONFIG_LIBCRC32C=y
CONFIG_CRC8=m
CONFIG_XXHASH=y
# CONFIG_RANDOM32_SELFTEST is not set
CONFIG_842_COMPRESS=m
CONFIG_842_DECOMPRESS=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_ZLIB_DFLTCC=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_LZ4_COMPRESS=m
CONFIG_LZ4HC_COMPRESS=m
CONFIG_LZ4_DECOMPRESS=y
CONFIG_ZSTD_COMPRESS=y
CONFIG_ZSTD_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_MICROLZMA=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_DECOMPRESS_LZ4=y
CONFIG_DECOMPRESS_ZSTD=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_INTERVAL_TREE=y
CONFIG_XARRAY_MULTI=y
CONFIG_ASSOCIATIVE_ARRAY=y
CONFIG_HAS_DMA=y
CONFIG_DMA_OPS=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y
CONFIG_SWIOTLB=y
CONFIG_DMA_CMA=y
# CONFIG_DMA_PERNUMA_CMA is not set

#
# Default contiguous memory area size:
#
CONFIG_CMA_SIZE_MBYTES=0
CONFIG_CMA_SIZE_SEL_MBYTES=y
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
# CONFIG_CMA_SIZE_SEL_MIN is not set
# CONFIG_CMA_SIZE_SEL_MAX is not set
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_DMA_MAP_BENCHMARK is not set
CONFIG_SGL_ALLOC=y
CONFIG_IOMMU_HELPER=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_GLOB=y
# CONFIG_GLOB_SELFTEST is not set
CONFIG_NLATTR=y
CONFIG_LRU_CACHE=m
CONFIG_CLZ_TAB=y
CONFIG_IRQ_POLL=y
CONFIG_MPILIB=y
CONFIG_SIGNATURE=y
CONFIG_DIMLIB=y
CONFIG_OID_REGISTRY=y
CONFIG_HAVE_GENERIC_VDSO=y
CONFIG_GENERIC_GETTIMEOFDAY=y
CONFIG_GENERIC_VDSO_TIME_NS=y
CONFIG_FONT_SUPPORT=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SG_POOL=y
CONFIG_ARCH_STACKWALK=y
CONFIG_STACKDEPOT=y
CONFIG_STACK_HASH_ORDER=20
CONFIG_SBITMAP=y
# end of Library routines

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
# CONFIG_PRINTK_CALLER is not set
# CONFIG_STACKTRACE_BUILD_ID is not set
CONFIG_CONSOLE_LOGLEVEL_DEFAULT=7
CONFIG_CONSOLE_LOGLEVEL_QUIET=4
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DYNAMIC_DEBUG_CORE=y
CONFIG_SYMBOLIC_ERRNAME=y
CONFIG_DEBUG_BUGVERBOSE=y
# end of printk and dmesg options

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_NONE is not set
# CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT is not set
CONFIG_DEBUG_INFO_DWARF4=y
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_COMPRESSED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_WARN=2048
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
# CONFIG_HEADERS_INSTALL is not set
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_VMLINUX_MAP is not set
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
# end of Compile-time checks and compiler options

#
# Generic Kernel Debugging Instruments
#
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_MAGIC_SYSRQ_SERIAL=y
CONFIG_MAGIC_SYSRQ_SERIAL_SEQUENCE=""
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set
CONFIG_HAVE_ARCH_KCSAN=y
CONFIG_HAVE_KCSAN_COMPILER=y
# CONFIG_KCSAN is not set
# end of Generic Kernel Debugging Instruments

#
# Networking Debugging
#
# CONFIG_NET_DEV_REFCNT_TRACKER is not set
# CONFIG_NET_NS_REFCNT_TRACKER is not set
# CONFIG_DEBUG_NET is not set
# end of Networking Debugging

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_ARCH_HAS_DEBUG_WX=y
CONFIG_DEBUG_WX=y
CONFIG_GENERIC_PTDUMP=y
CONFIG_PTDUMP_CORE=y
CONFIG_PTDUMP_DEBUGFS=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_PER_CPU_MAPS is not set
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
# CONFIG_KASAN is not set
CONFIG_HAVE_ARCH_KFENCE=y
# CONFIG_KFENCE is not set
# end of Memory Debugging

# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Oops, Lockups and Hangs
#
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_ON_OOPS_VALUE=1
CONFIG_PANIC_TIMEOUT=0
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_WQ_WATCHDOG is not set
CONFIG_TEST_LOCKUP=m
# end of Debug Oops, Lockups and Hangs

#
# Scheduler Debugging
#
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging

# CONFIG_DEBUG_TIMEKEEPING is not set

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)

CONFIG_TRACE_IRQFLAGS=y
# CONFIG_DEBUG_IRQFLAGS is not set
CONFIG_STACKTRACE=y
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
# CONFIG_DEBUG_KOBJECT is not set

#
# Debug kernel data structures
#
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_PLIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_BUG_ON_DATA_CORRUPTION=y
# end of Debug kernel data structures

# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_TORTURE_TEST=m
# CONFIG_RCU_SCALE_TEST is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_REF_SCALE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
CONFIG_RCU_TRACE=y
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging

# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
CONFIG_LATENCYTOP=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_NOP_MCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_BOOTTIME_TRACING=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_STACK_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
# CONFIG_HWLAT_TRACER is not set
# CONFIG_OSNOISE_TRACER is not set
# CONFIG_TIMERLAT_TRACER is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_UPROBE_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
CONFIG_TRACING_MAP=y
CONFIG_SYNTH_EVENTS=y
CONFIG_HIST_TRIGGERS=y
# CONFIG_TRACE_EVENT_INJECT is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_TRACE_EVAL_MAP_FILE is not set
# CONFIG_FTRACE_RECORD_RECURSION is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
# CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
# CONFIG_SYNTH_EVENT_GEN_TEST is not set
# CONFIG_KPROBE_EVENT_GEN_TEST is not set
# CONFIG_HIST_TRIGGERS_DEBUG is not set
CONFIG_SAMPLES=y
# CONFIG_SAMPLE_AUXDISPLAY is not set
# CONFIG_SAMPLE_TRACE_EVENTS is not set
# CONFIG_SAMPLE_TRACE_CUSTOM_EVENTS is not set
CONFIG_SAMPLE_TRACE_PRINTK=m
CONFIG_SAMPLE_FTRACE_DIRECT=m
CONFIG_SAMPLE_FTRACE_DIRECT_MULTI=m
# CONFIG_SAMPLE_TRACE_ARRAY is not set
# CONFIG_SAMPLE_KOBJECT is not set
# CONFIG_SAMPLE_KPROBES is not set
# CONFIG_SAMPLE_KFIFO is not set
# CONFIG_SAMPLE_LIVEPATCH is not set
# CONFIG_SAMPLE_CONFIGFS is not set
# CONFIG_SAMPLE_VFIO_MDEV_MTTY is not set
# CONFIG_SAMPLE_VFIO_MDEV_MDPY is not set
# CONFIG_SAMPLE_VFIO_MDEV_MDPY_FB is not set
# CONFIG_SAMPLE_VFIO_MDEV_MBOCHS is not set
# CONFIG_SAMPLE_WATCHDOG is not set
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
# CONFIG_STRICT_DEVMEM is not set

#
# s390 Debugging
#
CONFIG_EARLY_PRINTK=y
# CONFIG_DEBUG_ENTRY is not set
# CONFIG_CIO_INJECT is not set
# end of s390 Debugging

#
# Kernel Testing and Coverage
#
CONFIG_KUNIT=m
CONFIG_KUNIT_DEBUGFS=y
# CONFIG_KUNIT_TEST is not set
# CONFIG_KUNIT_EXAMPLE_TEST is not set
# CONFIG_KUNIT_ALL_TESTS is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
CONFIG_FUNCTION_ERROR_INJECTION=y
# CONFIG_FAULT_INJECTION is not set
CONFIG_ARCH_HAS_KCOV=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
# CONFIG_KCOV is not set
CONFIG_RUNTIME_TESTING_MENU=y
CONFIG_LKDTM=m
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_TEST_MIN_HEAP is not set
# CONFIG_TEST_SORT is not set
# CONFIG_TEST_DIV64 is not set
CONFIG_KPROBES_SANITY_TEST=m
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_TEST_REF_TRACKER is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_REED_SOLOMON_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
# CONFIG_ASYNC_RAID6_TEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_STRING_SELFTEST is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_STRSCPY is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_SCANF is not set
# CONFIG_TEST_BITMAP is not set
# CONFIG_TEST_UUID is not set
# CONFIG_TEST_XARRAY is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_TEST_SIPHASH is not set
# CONFIG_TEST_IDA is not set
# CONFIG_TEST_LKM is not set
# CONFIG_TEST_BITOPS is not set
# CONFIG_TEST_VMALLOC is not set
# CONFIG_TEST_USER_COPY is not set
CONFIG_TEST_BPF=m
# CONFIG_TEST_BLACKHOLE_DEV is not set
# CONFIG_FIND_BIT_BENCHMARK is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_SYSCTL is not set
# CONFIG_BITFIELD_KUNIT is not set
# CONFIG_HASH_KUNIT_TEST is not set
# CONFIG_RESOURCE_KUNIT_TEST is not set
# CONFIG_SYSCTL_KUNIT_TEST is not set
# CONFIG_LIST_KUNIT_TEST is not set
# CONFIG_LINEAR_RANGES_TEST is not set
# CONFIG_CMDLINE_KUNIT_TEST is not set
# CONFIG_BITS_TEST is not set
# CONFIG_SLUB_KUNIT_TEST is not set
# CONFIG_MEMCPY_KUNIT_TEST is not set
# CONFIG_OVERFLOW_KUNIT_TEST is not set
# CONFIG_STACKINIT_KUNIT_TEST is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_TEST_STATIC_KEYS is not set
# CONFIG_TEST_KMOD is not set
# CONFIG_TEST_MEMCAT_P is not set
CONFIG_TEST_LIVEPATCH=m
# CONFIG_TEST_MEMINIT is not set
# CONFIG_TEST_FREE_PAGES is not set
# end of Kernel Testing and Coverage
# end of Kernel hacking

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-28 23:15                       ` Steven Rostedt
  (?)
@ 2022-07-05 13:47                         ` Sven Schnelle
  -1 siblings, 0 replies; 572+ messages in thread
From: Sven Schnelle @ 2022-07-05 13:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexander Gordeev, Eric W. Biederman, linux-kernel, rjw,
	Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann, mgorman,
	bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Kees Cook, Jann Horn,
	linux-ia64

[-- Attachment #1: Type: text/plain, Size: 3889 bytes --]

Hi,

Steven Rostedt <rostedt@goodmis.org> writes:

> On Tue, 21 Jun 2022 17:15:47 +0200
> Alexander Gordeev <agordeev@linux.ibm.com> wrote:
>
>> So I assume (checked actually) the return 0 below from kernel/sched/core.c:
>> wait_task_inactive() is where it bails out:
>> 
>> 3303                 while (task_running(rq, p)) {
>> 3304                         if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
>> 3305                                 return 0;
>> 3306                         cpu_relax();
>> 3307                 }
>> 
>> Yet, the child task is always found in __TASK_TRACED state (as seen
>> in crash dumps):
>> 
>> > 101447  11342  13      ce3a8100      RU   0.0   10040   4412  strace  
>>   101450  101447   0      bb04b200      TR   0.0    2272   1136  kill_child
>>   108261  101447   2      d0b10100      TR   0.0    2272    532  kill_child
>> crash> task bb04b200 __state  
>> PID: 101450  TASK: bb04b200          CPU: 0   COMMAND: "kill_child"
>>   __state = 8,
>> 
>> crash> task d0b10100 __state  
>> PID: 108261  TASK: d0b10100          CPU: 2   COMMAND: "kill_child"
>>   __state = 8,
>
> If you are using crash, can you enable all trace events?
>
> Then you should be able to extract the ftrace ring buffer from crash using
> the trace.so extend (https://github.com/fujitsu/crash-trace)
>
> I guess it should still work with s390.
>
> Then you can see the events that lead up to the crash.

Alexander is busy with other stuff, so I took over. I enabled the
sys, signal, sched and task tracepoints and ftrace_dump_on_oops. The last
lines from the trace buffer are:

[  281.043459]   strace-1177215   0d.... 269457070us : sched_waking: comm=kill_child pid=1178157 prio=120 target_cpu=003
[  281.043463] kill_chi-1177218   1d.... 269457070us : signal_generate: sig=17 errno=0 code=4 comm=strace pid=1177215 grp=1 res=1
[  281.043467] kill_chi-1177218   1d.... 269457070us : sched_stat_runtime: comm=kill_child pid=1177218 runtime=5299 [ns] vruntime=1830714210855 [ns]
[  281.043471] kill_chi-1177218   1d.... 269457071us : sched_switch: prev_comm=kill_child prev_pid=1177218 prev_prio=120 prev_state=t ==> next_comm=swapper/1 next_pid=0 next_prio=120
[  281.043475]   strace-1177215   0..... 269457071us : sys_ptrace -> 0x50
[  281.043478]   strace-1177215   0..... 269457071us : sys_write(fd: 2, buf: 2aa15db3ad0, count: 12)
[  281.043482]   strace-1177215   0..... 269457072us : sys_write -> 0x12
[  281.043485]   <idle>-0         3dNh.. 269457072us : sched_wakeup: comm=kill_child pid=1178157 prio=120 target_cpu=003
[  281.043489]   <idle>-0         3d.... 269457073us : sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kill_child next_pid=1178157 next_prio=120
[  281.043493]   strace-1177215   0..... 269457073us : sys_write(fd: 2, buf: 2aa15db3ad0, count: 1a)
[  281.043496]   strace-1177215   0..... 269457073us : sys_write -> 0x1a
[  281.043500] kill_chi-1178157   3..... 269457073us : sys_sched_yield -> 0xffffffffffffffda
[  281.043504]   strace-1177215   0..... 269457073us : sys_ptrace(request: 18, pid: 11fa2d, addr: 0, data: 0)
[  281.043508] kill_chi-1178157   3d.... 269457073us : signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0
[  281.043511] kill_chi-1178157   3d.... 269457074us : signal_generate: sig=17 errno=0 code=4 comm=strace pid=1177215 grp=1 res=1
[  281.043515] kill_chi-1178157   3d.... 269457074us : sched_stat_runtime: comm=kill_child pid=1178157 runtime=2408 [ns] vruntime=1983050055579 [ns]
[  281.043519] kill_chi-1178157   3d.... 269457075us : sched_switch: prev_comm=kill_child prev_pid=1178157 prev_prio=120 prev_state=t ==> next_comm=swapper/3 next_pid=0 next_prio=120

I attached the full output to this mail. I haven't yet tried to
understand the problem; I just wanted to send you the requested
information in the hope that it will help you.

Regards
Sven


[-- Attachment #2: dmesg.xz --]
[-- Type: application/x-xz, Size: 29004 bytes --]

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-06-29  3:39                               ` Eric W. Biederman
  (?)
@ 2022-07-05 15:44                                 ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-07-05 15:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Steven Rostedt, Alexander Gordeev, linux-kernel, rjw,
	Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann, mgorman,
	bigeasy, Will Deacon, tj, linux-pm, Richard Weinberger,
	Anton Ivanov, Johannes Berg, linux-um, Chris Zankel,
	Max Filippov, linux-xtensa, Kees Cook, Jann Horn, linux-ia64,
	svens

On Tue, Jun 28, 2022 at 10:39:59PM -0500, Eric W. Biederman wrote:

> > That is, the two paths should already be synchronized, and the memory
> > barriers will not help anything inside the locks. The locking should (and
> > must) handle all that.
> 
> I would presume so to.  However the READ_ONCE that is going astray
> does not look like it is honoring that.
> 
> So perhaps there is a bug in the s390 spin_lock barriers?  Perhaps there
> is a subtle detail in the barriers that spin locks provide that we are
> overlooking?

So the thing is, s390 is, like x86, a TSO architecture with SC atomics.
Or at least it used to be; I'm not entirely solid on the Z196 features.

I've been looking at this and I can't find anything obviously wrong.
arch_spin_trylock_once() has what seems a spurious barrier() but that's
not going to cause this.

Specifically, s390 is using a simple test-and-set spinlock based on
their Compare-and-Swap (CS) instruction (so no Z196 funnies around).

Except perhaps arch_spin_unlock(), I can't grok the magic there. It does
something weird before the presumably regular TSO store of 0 into the
lock word.
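
For readers following along, a test-and-set lock built on compare-and-swap
can be sketched in portable C11 as below. This is only an illustration of
the general technique, not the s390 arch_spin_*() code: the names
(tas_lock, tas_trylock, tas_lock_acquire, tas_unlock) are made up, and
C11 atomics stand in for the CS instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace sketch; not the s390 implementation. */
struct tas_lock {
	atomic_int owner;	/* 0 = free, non-zero = id of the holder */
};

/* One compare-and-swap attempt: succeeds only if the lock word is 0. */
bool tas_trylock(struct tas_lock *l, int id)
{
	int expected = 0;

	assert(id != 0);	/* 0 is reserved for "unlocked" */
	return atomic_compare_exchange_strong_explicit(&l->owner, &expected, id,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Spin until the compare-and-swap wins. */
void tas_lock_acquire(struct tas_lock *l, int id)
{
	while (!tas_trylock(l, id))
		;	/* a real lock would relax the CPU here */
}

/* Unlock is a store of 0 with release ordering. */
void tas_unlock(struct tas_lock *l)
{
	atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

With a zero-initialised lock, a first tas_trylock() succeeds, a second one
from a different id fails until tas_unlock() has stored 0 again. Nothing in
this sketch provides fairness between competing waiters.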

Ooohh.. /me finds arch_spin_lock_queued().. *urfh* because obviously a
copy of queued spinlocks is what we need.

rwlock_t OTOH is using __atomic_add_*() and that's all Z196 magic.

Sven, does all this still reproduce if you take out
CONFIG_HAVE_MARCH_Z196_FEATURES ? Also, could you please explain the
Z196 bits or point me to the relevant section in the PoO. Additionally,
what's that _niai[48] stuff?

And I'm assuming s390 has hardware fairness on competing CS ?

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-07-05 13:47                         ` Sven Schnelle
  (?)
@ 2022-07-05 17:28                           ` Sven Schnelle
  -1 siblings, 0 replies; 572+ messages in thread
From: Sven Schnelle @ 2022-07-05 17:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexander Gordeev, Eric W. Biederman, linux-kernel, rjw,
	Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann, mgorman,
	bigeasy, Will Deacon, tj, linux-pm, Peter Zijlstra,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Kees Cook, Jann Horn,
	linux-ia64

[-- Attachment #1: Type: text/plain, Size: 4886 bytes --]

Sven Schnelle <svens@linux.ibm.com> writes:

> Steven Rostedt <rostedt@goodmis.org> writes:
>
>> On Tue, 21 Jun 2022 17:15:47 +0200
>> Alexander Gordeev <agordeev@linux.ibm.com> wrote:
>>
>>> So I assume (checked actually) the return 0 below from kernel/sched/core.c:
>>> wait_task_inactive() is where it bails out:
>>> 
>>> 3303                 while (task_running(rq, p)) {
>>> 3304                         if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
>>> 3305                                 return 0;
>>> 3306                         cpu_relax();
>>> 3307                 }
>>> 
>>> Yet, the child task is always found in __TASK_TRACED state (as seen
>>> in crash dumps):
>>> 
>>> > 101447  11342  13      ce3a8100      RU   0.0   10040   4412  strace  
>>>   101450  101447   0      bb04b200      TR   0.0    2272   1136  kill_child
>>>   108261  101447   2      d0b10100      TR   0.0    2272    532  kill_child
>>> crash> task bb04b200 __state  
>>> PID: 101450  TASK: bb04b200          CPU: 0   COMMAND: "kill_child"
>>>   __state = 8,
>>> 
>>> crash> task d0b10100 __state  
>>> PID: 108261  TASK: d0b10100          CPU: 2   COMMAND: "kill_child"
>>>   __state = 8,
>>
>> If you are using crash, can you enable all trace events?
>>
>> Then you should be able to extract the ftrace ring buffer from crash using
>> the trace.so extend (https://github.com/fujitsu/crash-trace)
>>
>> I guess it should still work with s390.
>>
>> Then you can see the events that lead up to the crash.

I think there's a race in ptrace_check_attach(). It first calls
ptrace_freeze_traced(), which checks whether JOBCTL_TRACED is set.
If it is (and a few other conditions match) it will set ret = 0.

Later, outside of siglock and tasklist_lock, it will call
wait_task_inactive(), assuming the target is in TASK_TRACED, but it isn't.

ptrace_stop(), which runs on another CPU, does:

set_special_state(TASK_TRACED);
current->jobctl |= JOBCTL_TRACED;

which looks ok at first sight, but in this case JOBCTL_TRACED is already
set, so the reading CPU will immediately move on to wait_task_inactive(),
before __state is set to TASK_TRACED. I don't know whether this is a valid
combination. I never looked into JOBCTL_* semantics, but I guess now
is a good time to do so. I added some debugging statements, and that
gives:

[   86.218488] kill_chi-300545    2d.... 79990135us : ptrace_stop: state 8
[   86.218492] kill_chi-300545    2d.... 79990136us : signal_generate: sig=17 errno=0 code=4 comm=strace pid=300542 grp=1 res=1
[   86.218496] kill_chi-300545    2d.... 79990136us : sched_stat_runtime: comm=kill_child pid=300545 runtime=3058 [ns] vruntime=606165713178 [ns]
[   86.218500] kill_chi-300545    2d.... 79990136us : sched_switch: prev_comm=kill_child prev_pid=300545 prev_prio=120 prev_state=t ==> next_comm=swapper/2 next_pid=0 next_prio=120
[   86.218504]   strace-300542    7..... 79990139us : sys_ptrace -> 0x50
[   86.218508]   strace-300542    7..... 79990139us : sys_write(fd: 2, buf: 2aa198f7ad0, count: 12)
[   86.218512]   strace-300542    7..... 79990140us : sys_write -> 0x12
[   86.218515]   <idle>-0         6dNh.. 79990140us : sched_wakeup: comm=kill_child pid=343805 prio=120 target_cpu=006
[   86.218519]   <idle>-0         6d.... 79990140us : sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kill_child next_pid=343805 next_prio=120
[   86.218524]   strace-300542    7..... 79990140us : sys_write(fd: 2, buf: 2aa198f7ad0, count: 19)
[   86.218527]   strace-300542    7..... 79990141us : sys_write -> 0x19
[   86.218531] kill_chi-343805    6..... 79990141us : sys_sched_yield -> 0xffffffffffffffda
[   86.218535]   strace-300542    7..... 79990141us : sys_ptrace(request: 18, pid: 53efd, addr: 0, data: 0)
[   86.218539] kill_chi-343805    6d.... 79990141us : signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0
[   86.218543]   strace-300542    7d.... 79990141us : ptrace_check_attach: task_is_traced: 1, fatal signal pending: 0
[   86.218547]   strace-300542    7..... 79990141us : ptrace_check_attach: child->pid = 343805, child->__flags=0
[   86.218551] kill_chi-343805    6d.... 79990141us : ptrace_stop: JOBCTL_TRACED already set, state=0 <------ valid combination of flags?
[   86.218554] kill_chi-343805    6d.... 79990141us : ptrace_stop: state 8
[   86.218558] kill_chi-343805    6d.... 79990142us : signal_generate: sig=17 errno=0 code=4 comm=strace pid=300542 grp=1 res=1
[   86.218562] kill_chi-343805    6d.... 79990142us : sched_stat_runtime: comm=kill_child pid=343805 runtime=2135 [ns] vruntime=556109013931 [ns]
[   86.218566]   strace-300542    7..... 79990142us : wait_task_inactive: NO MATCH: state 0, match_state 8, pid 343805
[   86.218570] kill_chi-343805    6d.... 79990142us : sched_switch: prev_comm=kill_child prev_pid=343805 prev_prio=120 prev_state=t ==>next_comm=swapper/6 next_pid=0 next_prio=120
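
The interleaving in the trace above can be replayed in a few lines of
userspace C. This is a single-threaded sketch under loud assumptions: all
sk_* names are hypothetical, the JOBCTL bit position is made up, locking is
left out entirely, and only the value 8 for the traced state is taken from
the crash output quoted earlier. It shows nothing about the kernel beyond
the ordering argument itself: a tracer that trusts a stale flag reaches the
state comparison before the child has stored the traced state.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; only "traced state == 8" comes from the thread. */
#define SK_TASK_RUNNING		0u
#define SK_TASK_TRACED		8u
#define SK_JOBCTL_TRACED	(1u << 0)	/* made-up bit position */

struct sk_task {
	unsigned int state;
	unsigned int jobctl;
};

enum sk_result { SK_NOT_STOPPED, SK_STATE_MISMATCH, SK_ATTACHED };

/* Tracer, step 1: decide from the flag alone, as described above. */
bool sk_freeze_traced(const struct sk_task *t)
{
	return (t->jobctl & SK_JOBCTL_TRACED) != 0;
}

/* Tracer, step 2: the bail-out check from the quoted loop. */
unsigned long sk_wait_task_inactive(const struct sk_task *t,
				    unsigned int match_state)
{
	if (match_state && t->state != match_state)
		return 0;	/* "NO MATCH" in the trace */
	return 1;
}

/* Tracee: the two stores, in the order shown in the mail. */
void sk_ptrace_stop(struct sk_task *t)
{
	t->state = SK_TASK_TRACED;
	t->jobctl |= SK_JOBCTL_TRACED;
}

/*
 * Replay one interleaving. stale_flag models JOBCTL_TRACED being left
 * set from an earlier stop; child_stops_first chooses whether the child
 * reaches its stop before the tracer looks at it.
 */
enum sk_result sk_replay(bool stale_flag, bool child_stops_first)
{
	struct sk_task child = {
		.state = SK_TASK_RUNNING,
		.jobctl = stale_flag ? SK_JOBCTL_TRACED : 0,
	};

	if (child_stops_first)
		sk_ptrace_stop(&child);

	if (!sk_freeze_traced(&child))
		return SK_NOT_STOPPED;
	if (!sk_wait_task_inactive(&child, SK_TASK_TRACED))
		return SK_STATE_MISMATCH;
	return SK_ATTACHED;
}
```

Only sk_replay(true, false) ends in SK_STATE_MISMATCH, which corresponds to
the "NO MATCH: state 0, match_state 8" line. Without the stale flag the
tracer is refused at the first check instead of getting as far as the state
comparison.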


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ptrace-debug.patch --]
[-- Type: text/x-diff, Size: 2758 bytes --]

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c46f3a63b758..c2ddc47271b8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -154,8 +154,8 @@ struct task_group;
 	} while (0)
 
 #else
-# define debug_normal_state_change(cond)	do { } while (0)
-# define debug_special_state_change(cond)	do { } while (0)
+# define debug_normal_state_change(cond)	do { trace_printk("state %d\n", cond); } while (0)
+# define debug_special_state_change(cond)	do { trace_printk("state %d\n", cond); } while (0)
 # define debug_rtlock_wait_set_state()		do { } while (0)
 # define debug_rtlock_wait_restore_state()	do { } while (0)
 #endif
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 156a99283b11..2cb2ae8acf23 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -202,6 +202,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
 	spin_lock_irq(&task->sighand->siglock);
 	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
 	    !__fatal_signal_pending(task)) {
+		trace_printk("task_is_traced: %d, fatal signal pending: %d\n", task_is_traced(task), __fatal_signal_pending(task));
 		task->jobctl |= JOBCTL_PTRACE_FROZEN;
 		ret = true;
 	}
@@ -263,8 +264,10 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 		 * child->sighand can't be NULL, release_task()
 		 * does ptrace_unlink() before __exit_signal().
 		 */
-		if (ignore_state || ptrace_freeze_traced(child))
+		if (ignore_state || ptrace_freeze_traced(child)) {
+			trace_printk("child->pid = %d, child->__flags=%d\n", child->pid, child->__state);
 			ret = 0;
+		}
 	}
 	read_unlock(&tasklist_lock);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da0bf6fe9ecd..73bb4c7882d0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3301,8 +3301,10 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state
 		 * is actually now running somewhere else!
 		 */
 		while (task_running(rq, p)) {
-			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
+			if (match_state && unlikely(READ_ONCE(p->__state) != match_state)) {
+				trace_printk("NO MATCH: state %d, match_state %d, pid %d\n", p->__state, match_state, p->pid);
 				return 0;
+			}
 			cpu_relax();
 		}
 
diff --git a/kernel/signal.c b/kernel/signal.c
index edb1dc9b00dc..0ea8e6b6a641 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2232,6 +2232,8 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
 	if (!current->ptrace || __fatal_signal_pending(current))
 		return exit_code;
 
+	if (current->jobctl & JOBCTL_TRACED)
+		trace_printk("JOBCTL_TRACED already set, state=%d\n", current->__state);
 	set_special_state(TASK_TRACED);
 	current->jobctl |= JOBCTL_TRACED;
 

[-- Attachment #3: Type: text/plain, Size: 65 bytes --]


I'll continue debugging tomorrow, but maybe this helps already.

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-07-05 17:28                           ` Sven Schnelle
  (?)
@ 2022-07-05 19:25                             ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-07-05 19:25 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Steven Rostedt, Alexander Gordeev, Eric W. Biederman,
	linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Kees Cook, Jann Horn,
	linux-ia64

On Tue, Jul 05, 2022 at 07:28:49PM +0200, Sven Schnelle wrote:
> Sven Schnelle <svens@linux.ibm.com> writes:

> I think there's a race in ptrace_check_attach(). It first calls
> ptrace_freeze_traced(), which checks whether JOBCTL_TRACED is set.
> If it is (and a few other conditions match) it will set ret = 0.
> 
> Later, outside of siglock and tasklist_lock, it will call
> wait_task_inactive(), assuming the target is in TASK_TRACED, but it isn't.
> 
> ptrace_stop(), which runs on another CPU, does:
> 
> set_special_state(TASK_TRACED);
> current->jobctl |= JOBCTL_TRACED;
> 
> which looks ok at first sight, but in this case JOBCTL_TRACED is already
> set, so the reading CPU will immediately move on to wait_task_inactive()
> before the TASK_TRACED state is set. I don't know whether this is a valid
> combination. I never looked into JOBCTL_* semantics, but I guess now
> is a good time to do so. I added some debugging statements, and that
> gives:
> 
> [   86.218488] kill_chi-300545    2d.... 79990135us : ptrace_stop: state 8
> [   86.218492] kill_chi-300545    2d.... 79990136us : signal_generate: sig=17 errno=0 code=4 comm=strace pid=300542 grp=1 res=1
> [   86.218496] kill_chi-300545    2d.... 79990136us : sched_stat_runtime: comm=kill_child pid=300545 runtime=3058 [ns] vruntime=606165713178 [ns]
> [   86.218500] kill_chi-300545    2d.... 79990136us : sched_switch: prev_comm=kill_child prev_pid=300545 prev_prio=120 prev_state=t ==> next_comm=swapper/2 next_pid=0 next_prio=120
> [   86.218504]   strace-300542    7..... 79990139us : sys_ptrace -> 0x50
> [   86.218508]   strace-300542    7..... 79990139us : sys_write(fd: 2, buf: 2aa198f7ad0, count: 12)
> [   86.218512]   strace-300542    7..... 79990140us : sys_write -> 0x12
> [   86.218515]   <idle>-0         6dNh.. 79990140us : sched_wakeup: comm=kill_child pid=343805 prio=120 target_cpu=006
> [   86.218519]   <idle>-0         6d.... 79990140us : sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kill_child next_pid=343805 next_prio=120
> [   86.218524]   strace-300542    7..... 79990140us : sys_write(fd: 2, buf: 2aa198f7ad0, count: 19)
> [   86.218527]   strace-300542    7..... 79990141us : sys_write -> 0x19
> [   86.218531] kill_chi-343805    6..... 79990141us : sys_sched_yield -> 0xffffffffffffffda
> [   86.218535]   strace-300542    7..... 79990141us : sys_ptrace(request: 18, pid: 53efd, addr: 0, data: 0)
> [   86.218539] kill_chi-343805    6d.... 79990141us : signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0
> [   86.218543]   strace-300542    7d.... 79990141us : ptrace_check_attach: task_is_traced: 1, fatal signal pending: 0
> [   86.218547]   strace-300542    7..... 79990141us : ptrace_check_attach: child->pid = 343805, child->__flags=0
> [   86.218551] kill_chi-343805    6d.... 79990141us : ptrace_stop: JOBCTL_TRACED already set, state=0 <------ valid combination of flags?

Yeah, that's not supposed to be so. JOBCTL_TRACED is supposed to follow
__TASK_TRACED for now. Set when __TASK_TRACED, cleared when
TASK_RUNNING.

Specifically {ptrace_,}signal_wake_up() in signal.h clear JOBCTL_TRACED
when they would wake a __TASK_TRACED task.

> [   86.218554] kill_chi-343805    6d.... 79990141us : ptrace_stop: state 8
> [   86.218558] kill_chi-343805    6d.... 79990142us : signal_generate: sig=17 errno=0 code=4 comm=strace pid=300542 grp=1 res=1
> [   86.218562] kill_chi-343805    6d.... 79990142us : sched_stat_runtime: comm=kill_child pid=343805 runtime=2135 [ns] vruntime=556109013931 [ns]
> [   86.218566]   strace-300542    7..... 79990142us : wait_task_inactive: NO MATCH: state 0, match_state 8, pid 343805
> [   86.218570] kill_chi-343805    6d.... 79990142us : sched_switch: prev_comm=kill_child prev_pid=343805 prev_prio=120 prev_state=t ==>next_comm=swapper/6 next_pid=0 next_prio=120
> 
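
The rule stated above (JOBCTL_TRACED set when __TASK_TRACED, cleared when
TASK_RUNNING) can be sketched as a small userspace model. This is only an
illustration, not kernel code: every model_* name is hypothetical, the value 8
is simply the traced state printed in the trace, and the bit used for the flag
is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values: 8 is the traced state seen in the trace output;
 * the bit chosen for MODEL_JOBCTL_TRACED is arbitrary. */
#define MODEL_TASK_RUNNING   0x0u
#define MODEL_TASK_TRACED    0x8u
#define MODEL_JOBCTL_TRACED  (1ul << 0)

struct model_task {
	unsigned int state;    /* stands in for task_struct::__state */
	unsigned long jobctl;  /* stands in for task_struct::jobctl */
};

/* The rule: the flag is set exactly when the state is traced. */
static bool model_in_sync(const struct model_task *t)
{
	bool traced = (t->state & MODEL_TASK_TRACED) != 0;
	bool flagged = (t->jobctl & MODEL_JOBCTL_TRACED) != 0;

	return traced == flagged;
}

/* Tracee side: enter the traced stop, setting state and flag together. */
static void model_ptrace_stop(struct model_task *t)
{
	t->state = MODEL_TASK_TRACED;
	t->jobctl |= MODEL_JOBCTL_TRACED;
	assert(model_in_sync(t));	/* holds after every stop */
}

/* A wakeup that keeps the pair in sync: clear the flag, then run. */
static void model_wake_clearing_flag(struct model_task *t)
{
	if (t->state & MODEL_TASK_TRACED) {
		t->jobctl &= ~MODEL_JOBCTL_TRACED;
		t->state = MODEL_TASK_RUNNING;
	}
}

/* A wakeup that forgets the flag: the model's version of
 * "JOBCTL_TRACED already set, state=0". */
static void model_wake_leaving_flag(struct model_task *t)
{
	if (t->state & MODEL_TASK_TRACED)
		t->state = MODEL_TASK_RUNNING;
}
```

In this model, a stop followed by model_wake_clearing_flag() leaves the task
in sync, while a stop followed by model_wake_leaving_flag() leaves the flag
set with state 0, which is the combination flagged in the trace.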

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-07-05 15:44                                 ` Peter Zijlstra
  (?)
@ 2022-07-06  6:56                                   ` Alexander Gordeev
  -1 siblings, 0 replies; 572+ messages in thread
From: Alexander Gordeev @ 2022-07-06  6:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eric W. Biederman, Steven Rostedt, linux-kernel, rjw,
	Oleg Nesterov, mingo, vincent.guittot, dietmar.eggemann, mgorman,
	bigeasy, Will Deacon, tj, linux-pm, Richard Weinberger,
	Anton Ivanov, Johannes Berg, linux-um, Chris Zankel,
	Max Filippov, linux-xtensa, Kees Cook, Jann Horn, linux-ia64,
	svens

On Tue, Jul 05, 2022 at 05:44:06PM +0200, Peter Zijlstra wrote:

Hi Peter,

> Sven, does all this still reproduce if you take out
> CONFIG_HAVE_MARCH_Z196_FEATURES ?

Yes, it hits.

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-07-05 19:25                             ` Peter Zijlstra
  (?)
@ 2022-07-06  7:58                               ` Sven Schnelle
  -1 siblings, 0 replies; 572+ messages in thread
From: Sven Schnelle @ 2022-07-06  7:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Alexander Gordeev, Eric W. Biederman,
	linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Kees Cook, Jann Horn,
	linux-ia64

Hi Peter,

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Jul 05, 2022 at 07:28:49PM +0200, Sven Schnelle wrote:
>> Sven Schnelle <svens@linux.ibm.com> writes:
>
>> I think there's a race in ptrace_check_attach(). It first calls
>> ptrace_freeze_traced(), which checks whether JOBCTL_TRACED is set.
>> If it is (and a few other conditions match) it will set ret = 0.
>> 
>> Later, outside of siglock and tasklist_lock, it will call
>> wait_task_inactive(), assuming the target is in TASK_TRACED, but it isn't.
>> 
>> ptrace_stop(), which runs on another CPU, does:
>> 
>> set_special_state(TASK_TRACED);
>> current->jobctl |= JOBCTL_TRACED;
>> 
>> which looks ok at first sight, but in this case JOBCTL_TRACED is already
>> set, so the reading CPU will immediately move on to wait_task_inactive()
>> before the TASK_TRACED state is set. I don't know whether this is a valid
>> combination. I never looked into JOBCTL_* semantics, but I guess now
>> is a good time to do so. I added some debugging statements, and that
>> gives:
>> 
>> [   86.218488] kill_chi-300545    2d.... 79990135us : ptrace_stop: state 8
>> [   86.218492] kill_chi-300545    2d.... 79990136us : signal_generate: sig=17 errno=0 code=4 comm=strace pid=300542 grp=1 res=1
>> [   86.218496] kill_chi-300545    2d.... 79990136us : sched_stat_runtime: comm=kill_child pid=300545 runtime=3058 [ns] vruntime=606165713178 [ns]
>> [   86.218500] kill_chi-300545    2d.... 79990136us : sched_switch: prev_comm=kill_child prev_pid=300545 prev_prio=120 prev_state=t ==> next_comm=swapper/2 next_pid=0 next_prio=120
>> [   86.218504]   strace-300542    7..... 79990139us : sys_ptrace -> 0x50
>> [   86.218508]   strace-300542    7..... 79990139us : sys_write(fd: 2, buf: 2aa198f7ad0, count: 12)
>> [   86.218512]   strace-300542    7..... 79990140us : sys_write -> 0x12
>> [   86.218515]   <idle>-0         6dNh.. 79990140us : sched_wakeup: comm=kill_child pid=343805 prio=120 target_cpu=006
>> [   86.218519]   <idle>-0         6d.... 79990140us : sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kill_child next_pid=343805 next_prio=120
>> [   86.218524]   strace-300542    7..... 79990140us : sys_write(fd: 2, buf: 2aa198f7ad0, count: 19)
>> [   86.218527]   strace-300542    7..... 79990141us : sys_write -> 0x19
>> [   86.218531] kill_chi-343805    6..... 79990141us : sys_sched_yield -> 0xffffffffffffffda
>> [   86.218535]   strace-300542    7..... 79990141us : sys_ptrace(request: 18, pid: 53efd, addr: 0, data: 0)
>> [   86.218539] kill_chi-343805    6d.... 79990141us : signal_deliver: sig=9 errno=0 code=0 sa_handler=0 sa_flags=0
>> [   86.218543]   strace-300542    7d.... 79990141us : ptrace_check_attach: task_is_traced: 1, fatal signal pending: 0
>> [   86.218547]   strace-300542    7..... 79990141us : ptrace_check_attach: child->pid = 343805, child->__flags=0
>> [   86.218551] kill_chi-343805    6d.... 79990141us : ptrace_stop: JOBCTL_TRACED already set, state=0 <------ valid combination of flags?
>
> Yeah, that's not supposed to be so. JOBCTL_TRACED is supposed to follow
> __TASK_TRACED for now. Set when __TASK_TRACED, cleared when
> TASK_RUNNING.
>
> Specifically {ptrace_,}signal_wake_up() in signal.h clear JOBCTL_TRACED
> when they would wake a __TASK_TRACED task.

try_to_wake_up() clears TASK_TRACED in this case because a signal
(SIGKILL) has to be delivered. As a test I put the following change
on top, and it "fixes" the problem:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da0bf6fe9ecd..f2e0f5e70e77 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4141,6 +4149,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
         * TASK_WAKING such that we can unlock p->pi_lock before doing the
         * enqueue, such as ttwu_queue_wakelist().
         */
+       if (p->__state & TASK_TRACED)
+               trace_printk("clearing TASK_TRACED 2\n");
+       p->jobctl &= ~JOBCTL_TRACED;
        WRITE_ONCE(p->__state, TASK_WAKING);

        /*

There are several places where the state is changed from TASK_TRACED to
something else without clearing JOBCTL_TRACED.
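
To make the effect of the test change above concrete, here is a rough
userspace model of the tracer side. It is a sketch under simplifying
assumptions, not the kernel logic: all model_* names and return codes are
hypothetical, the freeze check is reduced to the JOBCTL_TRACED test, the
wait step is reduced to the state comparison, and the flag bit is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_TASK_RUNNING   0x0u
#define MODEL_TASK_TRACED    0x8u        /* "state 8" in the trace */
#define MODEL_JOBCTL_TRACED  (1ul << 0)  /* arbitrary bit for the model */

struct model_task {
	unsigned int state;    /* stands in for task_struct::__state */
	unsigned long jobctl;  /* stands in for task_struct::jobctl */
};

/* Tracer step 1: the freeze check looks only at the jobctl flag. */
static bool model_freeze_traced(const struct model_task *t)
{
	return (t->jobctl & MODEL_JOBCTL_TRACED) != 0;
}

/* Tracer step 2: later, outside the locks, the state itself must match. */
static bool model_wait_inactive(const struct model_task *t,
				unsigned int match_state)
{
	return t->state == match_state;
}

/* Wakeup model; clear_flag = true mimics the test change. */
static void model_wake(struct model_task *t, bool clear_flag)
{
	assert(t != NULL);
	if (clear_flag)
		t->jobctl &= ~MODEL_JOBCTL_TRACED;
	t->state = MODEL_TASK_RUNNING;
}

/* 0: attach proceeds; -1: freeze check refuses; -2: the "NO MATCH" case. */
static int model_check_attach(const struct model_task *t)
{
	if (!model_freeze_traced(t))
		return -1;
	if (!model_wait_inactive(t, MODEL_TASK_TRACED))
		return -2;
	return 0;
}
```

With a stale flag (clear_flag = false) the model passes the freeze check and
then fails the state comparison, as in the "NO MATCH" line; with the flag
cleared on wakeup the freeze check refuses up front instead.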

^ permalink raw reply related	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-07-06  7:58                               ` Sven Schnelle
  (?)
@ 2022-07-06  8:59                                 ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-07-06  8:59 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Steven Rostedt, Alexander Gordeev, Eric W. Biederman,
	linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Kees Cook, Jann Horn,
	linux-ia64

On Wed, Jul 06, 2022 at 09:58:55AM +0200, Sven Schnelle wrote:

> >> [   86.218551] kill_chi-343805    6d.... 79990141us : ptrace_stop: JOBCTL_TRACED already set, state=0 <------ valid combination of flags?
> >
> > Yeah, that's not supposed to be so. JOBCTL_TRACED is supposed to follow
> > __TASK_TRACED for now. Set when __TASK_TRACED, cleared when
> > TASK_RUNNING.
> >
> > Specifically {ptrace_,}signal_wake_up() in signal.h clear JOBCTL_TRACED
> > when they would wake a __TASK_TRACED task.
> 
> try_to_wake_up() clears TASK_TRACED in this case because a signal
> (SIGKILL) has to be delivered. As a test I put the following change
> on top, and it "fixes" the problem:
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da0bf6fe9ecd..f2e0f5e70e77 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4141,6 +4149,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>          * TASK_WAKING such that we can unlock p->pi_lock before doing the
>          * enqueue, such as ttwu_queue_wakelist().
>          */
> +       if (p->__state & TASK_TRACED)
> +               trace_printk("clearing TASK_TRACED 2\n");
> +       p->jobctl &= ~JOBCTL_TRACED;
>         WRITE_ONCE(p->__state, TASK_WAKING);
> 
>         /*
> 
> There are several places where the state is changed from TASK_TRACED to
> something else without clearing JOBCTL_TRACED.

I'm having difficulty spotting them; I find:

TASK_WAKEKILL: signal_wake_up()
__TASK_TRACED: ptrace_signal_wake_up(), ptrace_unfreeze_traced(), ptrace_resume()

And all those sites dutifully clear JOBCTL_TRACED.

I'd be most interested in the call stack for the 'clearing TASK_TRACED 2'
events to see where we miss a spot.

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-07-06  8:59                                 ` Peter Zijlstra
  (?)
@ 2022-07-06  9:27                                   ` Sven Schnelle
  -1 siblings, 0 replies; 572+ messages in thread
From: Sven Schnelle @ 2022-07-06  9:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Alexander Gordeev, Eric W. Biederman,
	linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Kees Cook, Jann Horn,
	linux-ia64

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Jul 06, 2022 at 09:58:55AM +0200, Sven Schnelle wrote:
>
>> >> [   86.218551] kill_chi-343805    6d.... 79990141us : ptrace_stop: JOBCTL_TRACED already set, state=0 <------ valid combination of flags?
>> >
>> > Yeah, that's not supposed to be so. JOBCTL_TRACED is supposed to follow
>> > __TASK_TRACED for now. Set when __TASK_TRACED, cleared when
>> > TASK_RUNNING.
>> >
>> > Specifically {ptrace_,}signal_wake_up() in signal.h clear JOBCTL_TRACED
>> > when they would wake a __TASK_TRACED task.
>> 
>> try_to_wake_up() clears TASK_TRACED in this case because a signal
>> (SIGKILL) has to be delivered. As a test I put the following change
>> on top, and it "fixes" the problem:
>> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index da0bf6fe9ecd..f2e0f5e70e77 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4141,6 +4149,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>>          * TASK_WAKING such that we can unlock p->pi_lock before doing the
>>          * enqueue, such as ttwu_queue_wakelist().
>>          */
>> +       if (p->__state & TASK_TRACED)
>> +               trace_printk("clearing TASK_TRACED 2\n");
>> +       p->jobctl &= ~JOBCTL_TRACED;
>>         WRITE_ONCE(p->__state, TASK_WAKING);
>> 
>>         /*
>> 
>> There are several places where the state is changed from TASK_TRACED to
>> something else without clearing JOBCTL_TRACED.
>
> I'm having difficulty spotting them; I find:
>
> TASK_WAKEKILL: signal_wake_up()
> __TASK_TRACED: ptrace_signal_wake_up(), ptrace_unfreeze_traced(), ptrace_resume()
>
> And all those sites dutifully clear JOBCTL_TRACED.
>
> I'd be most interested in the call stack for the 'clearing TASK_TRACED 2'
> events to see where we miss a spot.

The calltrace is:
[    9.863613] Call Trace:
[    9.863616]  [<00000000d3105f0e>] try_to_wake_up+0xae/0x620
[    9.863620] ([<00000000d3106164>] try_to_wake_up+0x304/0x620)
[    9.863623]  [<00000000d30d1e46>] ptrace_unfreeze_traced+0x9e/0xa8
[    9.863629]  [<00000000d30d2ef0>] __s390x_sys_ptrace+0xc0/0x160
[    9.863633]  [<00000000d3c5d8f4>] __do_syscall+0x1d4/0x200
[    9.863678]  [<00000000d3c6c332>] system_call+0x82/0xb0
[    9.863685] Last Breaking-Event-Address:
[    9.863686]  [<00000000d3106176>] try_to_wake_up+0x316/0x620
[    9.863688] ---[ end trace 0000000000000000 ]---

ptrace_unfreeze_traced() is:

static void ptrace_unfreeze_traced(struct task_struct *task)
{
        unsigned long flags;

        /*
         * The child may be awake and may have cleared
         * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
         * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
         */
        if (lock_task_sighand(task, &flags)) {
                task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
                if (__fatal_signal_pending(task)) {
                        task->jobctl &= ~TASK_TRACED;

Looking at this, shouldn't the line above read task->jobctl &= ~JOBCTL_TRACED?

                        wake_up_state(task, __TASK_TRACED);
                }
                unlock_task_sighand(task, &flags);
        }
}
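
To make the effect of that one line concrete, here is a minimal userspace
sketch. It assumes TASK_TRACED is (TASK_WAKEKILL | __TASK_TRACED) = 0x108 and
that JOBCTL_PTRACE_FROZEN and JOBCTL_TRACED sit at bits 24 and 27; those
values are assumptions for illustration and should be checked against the
headers of the tree under test. The two helper functions are hypothetical
stand-ins for the body of ptrace_unfreeze_traced(), not kernel code.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, for illustration only. */
#define TASK_TRACED          (0x0100UL | 0x0008UL) /* TASK_WAKEKILL | __TASK_TRACED */
#define JOBCTL_PTRACE_FROZEN (1UL << 24)
#define JOBCTL_TRACED        (1UL << 27)

/* Sanity check for the sketch: the state mask and the jobctl flag share no bits. */
void check_masks_disjoint(void)
{
	assert((TASK_TRACED & JOBCTL_TRACED) == 0);
}

/* Mirrors the posted code: a task-state mask is cleared out of jobctl. */
unsigned long unfreeze_as_posted(unsigned long jobctl)
{
	jobctl &= ~JOBCTL_PTRACE_FROZEN;
	jobctl &= ~TASK_TRACED;		/* wrong namespace: JOBCTL_TRACED survives */
	return jobctl;
}

/* Mirrors the suggested fix: the jobctl flag itself is cleared. */
unsigned long unfreeze_as_suggested(unsigned long jobctl)
{
	jobctl &= ~JOBCTL_PTRACE_FROZEN;
	jobctl &= ~JOBCTL_TRACED;
	return jobctl;
}

/* Print both results for a task that is frozen and traced. */
void show_difference(void)
{
	unsigned long start = JOBCTL_TRACED | JOBCTL_PTRACE_FROZEN;

	check_masks_disjoint();
	printf("posted:    %#lx\n", unfreeze_as_posted(start));
	printf("suggested: %#lx\n", unfreeze_as_suggested(start));
}
```

Under these assumptions the posted variant leaves JOBCTL_TRACED set after the
wakeup, which would match the "JOBCTL_TRACED already set, state=0" trace
line; it also clears bits 3 and 8 of jobctl, which belong to something else.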

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
  2022-07-06  9:27                                   ` Sven Schnelle
@ 2022-07-06 10:11                                     ` Peter Zijlstra
  -1 siblings, 0 replies; 572+ messages in thread
From: Peter Zijlstra @ 2022-07-06 10:11 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Steven Rostedt, Alexander Gordeev, Eric W. Biederman,
	linux-kernel, rjw, Oleg Nesterov, mingo, vincent.guittot,
	dietmar.eggemann, mgorman, bigeasy, Will Deacon, tj, linux-pm,
	Richard Weinberger, Anton Ivanov, Johannes Berg, linux-um,
	Chris Zankel, Max Filippov, linux-xtensa, Kees Cook, Jann Horn,
	linux-ia64

On Wed, Jul 06, 2022 at 11:27:05AM +0200, Sven Schnelle wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Wed, Jul 06, 2022 at 09:58:55AM +0200, Sven Schnelle wrote:
> >
> >> >> [   86.218551] kill_chi-343805    6d.... 79990141us : ptrace_stop: JOBCTL_TRACED already set, state=0 <------ valid combination of flags?
> >> >
> >> > Yeah, that's not supposed to be so. JOBCTL_TRACED is supposed to follow
> >> > __TASK_TRACED for now. Set when __TASK_TRACED, cleared when
> >> > TASK_RUNNING.
> >> >
> >> > Specifically {ptrace_,}signal_wake_up() in signal.h clear JOBCTL_TRACED
> >> > when they would wake a __TASK_TRACED task.
> >> 
> >> try_to_wake_up() clears TASK_TRACED in this case because a signal
> >> (SIGKILL) has to be delivered. As a test I put the following change
> >> on top, and it "fixes" the problem:
> >> 
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index da0bf6fe9ecd..f2e0f5e70e77 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -4141,6 +4149,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >>          * TASK_WAKING such that we can unlock p->pi_lock before doing the
> >>          * enqueue, such as ttwu_queue_wakelist().
> >>          */
> >> +       if (p->__state & TASK_TRACED)
> >> +               trace_printk("clearing TASK_TRACED 2\n");
> >> +       p->jobctl &= ~JOBCTL_TRACED;
> >>         WRITE_ONCE(p->__state, TASK_WAKING);
> >> 
> >>         /*
> >> 
> >> There are several places where the state is changed from TASK_TRACED to
> >> something else without clearing JOBCTL_TRACED.
> >
> > I'm having difficulty spotting them; I find:
> >
> > TASK_WAKEKILL: signal_wake_up()
> > __TASK_TRACED: ptrace_signal_wake_up(), ptrace_unfreeze_traced(), ptrace_resume()
> >
> > And all those sites dutifully clear JOBCTL_TRACED.
> >
> > I'd be most interested in the call stack for the 'clearing TASK_TRACED 2'
> > events to see where we miss a spot.
> 
> The calltrace is:
> [    9.863613] Call Trace:
> [    9.863616]  [<00000000d3105f0e>] try_to_wake_up+0xae/0x620
> [    9.863620] ([<00000000d3106164>] try_to_wake_up+0x304/0x620)
> [    9.863623]  [<00000000d30d1e46>] ptrace_unfreeze_traced+0x9e/0xa8
> [    9.863629]  [<00000000d30d2ef0>] __s390x_sys_ptrace+0xc0/0x160
> [    9.863633]  [<00000000d3c5d8f4>] __do_syscall+0x1d4/0x200
> [    9.863678]  [<00000000d3c6c332>] system_call+0x82/0xb0
> [    9.863685] Last Breaking-Event-Address:
> [    9.863686]  [<00000000d3106176>] try_to_wake_up+0x316/0x620
> [    9.863688] ---[ end trace 0000000000000000 ]---
> 
> ptrace_unfreeze_traced() is:
> 
> static void ptrace_unfreeze_traced(struct task_struct *task)
> {
>         unsigned long flags;
> 
>         /*
>          * The child may be awake and may have cleared
>          * JOBCTL_PTRACE_FROZEN (see ptrace_resume).  The child will
>          * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
>          */
>         if (lock_task_sighand(task, &flags)) {
>                 task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
>                 if (__fatal_signal_pending(task)) {
>                         task->jobctl &= ~TASK_TRACED;
> 
> Looking at this, shouldn't the line above read task->jobctl &= ~JOBCTL_TRACED?

YES! Absolutely.

>                         wake_up_state(task, __TASK_TRACED);
>                 }
>                 unlock_task_sighand(task, &flags);
>         }
> }

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
  2022-06-22 16:43             ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Eric W. Biederman
                                 ` (3 preceding siblings ...)
  2022-06-23 15:12               ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Alexander Gordeev
@ 2022-07-08 22:25               ` Eric W. Biederman
  2022-07-08 23:22                 ` Keno Fischer
  4 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-07-08 22:25 UTC (permalink / raw)
  To: Keno Fischer
  Cc: linux-kernel, mingo, bigeasy, Peter Zijlstra, Jann Horn,
	Kees Cook, Alexander Gordeev, Robert O'Callahan, Kyle Huey,
	Oleg Nesterov

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Recently I had a conversation where it was pointed out to me that
> SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
> difficult for a tracer to handle.
>
> Keeping SIGKILL working for anything after the process has been killed
> is also a real pain from an implementation point of view.
>
> So I am attempting to remove this wart in the userspace API and see
> if anyone cares.
>
> Eric W. Biederman (3):
>       signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
>       signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
>       signal: Drop signals received after a fatal signal has been processed
>
>  fs/coredump.c                |  2 +-
>  include/linux/sched/signal.h |  1 +
>  kernel/exit.c                | 20 +++++++++++++++++++-
>  kernel/fork.c                |  2 ++
>  kernel/signal.c              |  3 ++-
>  5 files changed, 25 insertions(+), 3 deletions(-)

RR folks any comments?

Did I properly understand what Keno Fischer was asking for when we
talked in person?

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
  2022-07-08 22:25               ` Eric W. Biederman
@ 2022-07-08 23:22                 ` Keno Fischer
  2022-07-12 20:03                   ` Eric W. Biederman
  0 siblings, 1 reply; 572+ messages in thread
From: Keno Fischer @ 2022-07-08 23:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, mingo, Sebastian Andrzej Siewior,
	Peter Zijlstra, Jann Horn, Kees Cook, Alexander Gordeev,
	Robert O'Callahan, Kyle Huey, Oleg Nesterov

Hi Eric,

On Fri, Jul 8, 2022 at 6:25 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Recently I had a conversation where it was pointed out to me that
> > SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
> > difficult for a tracer to handle.
> >
>
> RR folks any comments?
>
> Did I properly understand what Keno Fischer was asking for when we
> talked in person?

Yes, this is indeed what I had in mind. I have not yet had the opportunity
to try out your patch series (sorry), but from visual inspection, it does indeed
do what I wanted, which is to make sure that a tracee stays in
PTRACE_EVENT_EXIT for the tracer to inspect, even if there is another
SIGKILL incoming simultaneously (since otherwise it may be impossible
for the tracer to observe the PTRACE_EVENT_EXIT if two SIGKILLs
come in rapid succession). I will try to take this series for a proper spin
shortly.

Keno
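
For readers less familiar with the stop being discussed, the following is a
minimal sketch of how a tracer normally observes PTRACE_EVENT_EXIT, assuming
Linux with glibc and no concurrent SIGKILL. The function name
observe_exit_event() is hypothetical. It does not reproduce the double-SIGKILL
race; it only shows the stop that a tracer would want to keep inspecting.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with `code`, trace it with PTRACE_O_TRACEEXIT and
 * return the value PTRACE_GETEVENTMSG reports at the PTRACE_EVENT_EXIT stop
 * (a wait-style status, exit code in bits 8..15). Returns -1 on any failure.
 */
long observe_exit_event(int code)
{
	unsigned long msg = 0;
	int status;
	pid_t pid;

	assert(code >= 0 && code <= 255);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
			_exit(127);
		raise(SIGSTOP);		/* let the parent set its options */
		_exit(code);
	}

	/* First stop: the SIGSTOP the child raised on itself. */
	if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
		return -1;
	if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
		   (void *)(long)PTRACE_O_TRACEEXIT) == -1 ||
	    ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
		goto fail;

	/* Second stop: the exit event, reported before the child is gone. */
	if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
		return -1;
	if ((status >> 8) != (SIGTRAP | (PTRACE_EVENT_EXIT << 8)))
		goto fail;
	if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, &msg) == -1)
		goto fail;

	/* The tracee could be inspected here; then let it finish exiting. */
	ptrace(PTRACE_CONT, pid, NULL, NULL);
	waitpid(pid, &status, 0);
	return (long)msg;

fail:
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return -1;
}
```

The window between PTRACE_GETEVENTMSG and the final PTRACE_CONT is where a
tracer such as rr would inspect the tracee, and it is that window a second
SIGKILL can currently cut short.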

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
  2022-07-08 23:22                 ` Keno Fischer
@ 2022-07-12 20:03                   ` Eric W. Biederman
  2022-07-16 21:29                     ` Eric W. Biederman
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-07-12 20:03 UTC (permalink / raw)
  To: Keno Fischer
  Cc: Linux Kernel Mailing List, mingo, Sebastian Andrzej Siewior,
	Peter Zijlstra, Jann Horn, Kees Cook, Alexander Gordeev,
	Robert O'Callahan, Kyle Huey, Oleg Nesterov

Keno Fischer <keno@juliacomputing.com> writes:

> Hi Eric,
>
> On Fri, Jul 8, 2022 at 6:25 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> > Recently I had a conversation where it was pointed out to me that
>> > SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
>> > difficult for a tracer to handle.
>> >
>>
>> RR folks any comments?
>>
>> Did I properly understand what Keno Fischer was asking for when we
>> talked in person?
>
> Yes, this is indeed what I had in mind. I have not yet had the opportunity
> to try out your patch series (sorry), but from visual inspection, it does indeed
> do what I wanted, which is to make sure that a tracee stays in
> PTRACE_EVENT_EXIT for the tracer to inspect, even if there is another
> SIGKILL incoming simultaneously (since otherwise it may be impossible
> for the tracer to observe the PTRACE_EVENT_EXIT if two SIGKILLs
> come in rapid succession). I will try to take this series for a proper spin
> shortly.

Thanks,

I haven't yet figured out how to get the rr test suite to run
successfully.  Something about my test machine and lack of perf counters
seems to be causing problems.  So if you can perform the testing on your
side that would be fantastic.

Eric

^ permalink raw reply	[flat|nested] 572+ messages in thread

* Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
  2022-07-12 20:03                   ` Eric W. Biederman
@ 2022-07-16 21:29                     ` Eric W. Biederman
  2022-07-16 23:21                       ` Kyle Huey
  0 siblings, 1 reply; 572+ messages in thread
From: Eric W. Biederman @ 2022-07-16 21:29 UTC (permalink / raw)
  To: Keno Fischer
  Cc: Linux Kernel Mailing List, mingo, Sebastian Andrzej Siewior,
	Peter Zijlstra, Jann Horn, Kees Cook, Alexander Gordeev,
	Robert O'Callahan, Kyle Huey, Oleg Nesterov

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Keno Fischer <keno@juliacomputing.com> writes:
>
>> Hi Eric,
>>
>> On Fri, Jul 8, 2022 at 6:25 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> > Recently I had a conversation where it was pointed out to me that
>>> > SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
>>> > difficult for a tracer to handle.
>>> >
>>>
>>> RR folks any comments?
>>>
>>> Did I properly understand what Keno Fischer was asking for when we
>>> talked in person?
>>
>> Yes, this is indeed what I had in mind. I have not yet had the opportunity
>> to try out your patch series (sorry), but from visual inspection, it does indeed
>> do what I wanted, which is to make sure that a tracee stays in
>> PTRACE_EVENT_EXIT for the tracer to inspect, even if there is another
>> SIGKILL incoming simultaneously (since otherwise it may be impossible
>> for the tracer to observe the PTRACE_EVENT_EXIT if two SIGKILLs
>> come in rapid succession). I will try to take this series for a proper spin
>> shortly.
>
> Thanks,
>
> I haven't yet figured out how to get the rr test suite to run
> successfully.  Something about my test machine and lack of perf counters
> seems to be causing problems.  So if you can perform the testing on your
> side that would be fantastic.

Ok.  I finally found a machine where I can run rr and the rr test suite.

It looks like a couple of the rr 5.5.0 tests fail on Linus's latest
kernel simply because of changes in kernel behavior, in particular
clone_cleartid_coredump and fcntl_rw_hints.  The
clone_cleartid_coredump test appears to fail because SIGSEGV no longer
kills all processes that share an mm, which was a deliberate change.

With the latest development version of rr, only detach_sigkill appears
to be failing on Linus's latest.  That failure appears to be independent
of the patches in question as well.  When run manually, the
detach_sigkill test succeeds, so I am not quite certain what is going on.
Any thoughts?

As for my patchset it looks like it does not cause any new test failures
for rr so I will plan on getting it into linux-next shortly.

Eric


* Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT
  2022-07-16 21:29                     ` Eric W. Biederman
@ 2022-07-16 23:21                       ` Kyle Huey
  0 siblings, 0 replies; 572+ messages in thread
From: Kyle Huey @ 2022-07-16 23:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Keno Fischer, Linux Kernel Mailing List, mingo,
	Sebastian Andrzej Siewior, Peter Zijlstra, Jann Horn, Kees Cook,
	Alexander Gordeev, Robert O'Callahan, Oleg Nesterov

On Sat, Jul 16, 2022 at 2:31 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
> > Keno Fischer <keno@juliacomputing.com> writes:
> >
> >> Hi Eric,
> >>
> >> On Fri, Jul 8, 2022 at 6:25 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>> > Recently I had a conversation where it was pointed out to me that
> >>> > SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
> >>> > difficult for a tracer to handle.
> >>> >
> >>>
> >>> RR folks any comments?
> >>>
> >>> Did I properly understand what Keno Fischer was asking for when we
> >>> talked in person?
> >>
> >> Yes, this is indeed what I had in mind. I have not yet had the opportunity
> >> to try out your patch series (sorry), but from visual inspection, it does indeed
> >> do what I wanted, which is to make sure that a tracee stays in
> >> PTRACE_EVENT_EXIT for the tracer to inspect, even if there is another
> >> SIGKILL incoming simultaneously (since otherwise it may be impossible
> >> for the tracer to observe the PTRACE_EVENT_EXIT if two SIGKILLs
> >> come in rapid succession). I will try to take this series for a proper spin
> >> shortly.
> >
> > Thanks,
> >
> > I haven't yet figured out how to get the rr test suite to run
> > successfully.  Something about my test machine and lack of perf counters
> > seems to be causing problems.  So if you can perform the testing on your
> > side that would be fantastic.
>
> Ok.  I finally found a machine where I can run rr and the rr test suite.
>
> It looks like a couple of the rr 5.5.0 tests fail on Linus's latest
> kernel simply because of changes in kernel behavior, in particular
> clone_cleartid_coredump and fcntl_rw_hints.  The
> clone_cleartid_coredump test appears to fail because SIGSEGV no longer
> kills all processes that share an mm, which was a deliberate change.

Yeah, we changed rr to handle this in
https://github.com/rr-debugger/rr/commit/04bbacdbaba1cc496e92060014442bd1fd26b41d
and https://github.com/rr-debugger/rr/commit/1a3b389c2956e1844c0d07bf4297398bb6c561ea.

> With the latest development version of rr, only detach_sigkill appears
> to be failing on Linus's latest.  That failure appears to be independent
> of the patches in question as well.  When run manually, the
> detach_sigkill test succeeds, so I am not quite certain what is going on.
> Any thoughts?

If it fails before your changes, I wouldn't worry about it too much;
there have been some other failures in that test lately.

- Kyle

> As for my patchset it looks like it does not cause any new test failures
> for rr so I will plan on getting it into linux-next shortly.
>
> Eric
>

2022-05-18 22:53               ` [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-24 15:27                 ` Oleg Nesterov
2022-05-24 15:27                   ` Oleg Nesterov
2022-05-24 15:27                   ` Oleg Nesterov
2022-06-06 22:16                   ` Eric W. Biederman
2022-06-06 22:16                     ` Eric W. Biederman
2022-06-06 22:16                     ` Eric W. Biederman
2022-06-07 15:29                     ` Oleg Nesterov
2022-06-07 15:29                       ` Oleg Nesterov
2022-06-07 15:29                       ` Oleg Nesterov
2022-05-18 22:53               ` [PATCH 09/16] ptrace: In ptrace_setsiginfo deal with invalid si_signo Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53               ` [PATCH 10/16] ptrace: In ptrace_signal look at what the debugger did with siginfo Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53               ` [PATCH 11/16] ptrace: Use si_sino as the signal number to resume with Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53               ` [PATCH 12/16] ptrace: Stop protecting ptrace_set_signr with tasklist_lock Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53               ` [PATCH 13/16] ptrace: Document why ptrace_setoptions does not need a lock Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53               ` [PATCH 14/16] signal: Protect parent child relationships by childs siglock Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53               ` [PATCH 15/16] ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53               ` [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-18 22:53                 ` Eric W. Biederman
2022-05-20 16:19                 ` kernel test robot
2022-05-20 16:19                   ` kernel test robot
2022-05-20 16:19                   ` kernel test robot
     [not found]               ` <CALWUPBdFDLuT7JaNGSJ_UXbHf8y9uKdC-SkAqzd=FQC0MX4nNQ@mail.gmail.com>
2022-05-19  6:19                 ` [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock Sebastian Andrzej Siewior
2022-05-19  6:19                   ` Sebastian Andrzej Siewior
2022-05-19  6:19                   ` Sebastian Andrzej Siewior
2022-05-19 18:05                   ` Eric W. Biederman
2022-05-19 18:05                     ` Eric W. Biederman
2022-05-19 18:05                     ` Eric W. Biederman
2022-05-20  5:24                     ` Kyle Huey
2022-05-20  5:24                       ` Kyle Huey
2022-05-20  5:24                       ` Kyle Huey
2022-06-06 16:12                       ` Eric W. Biederman
2022-06-06 16:12                         ` Eric W. Biederman
2022-06-09 19:59                         ` Kyle Huey
2022-06-09 19:59                           ` Kyle Huey
2022-06-09 19:59                           ` Kyle Huey
2022-05-20  7:33               ` Sebastian Andrzej Siewior
2022-05-20  7:33                 ` Sebastian Andrzej Siewior
2022-05-20  7:33                 ` Sebastian Andrzej Siewior
2022-05-20 19:32                 ` Eric W. Biederman
2022-05-20 19:32                   ` Eric W. Biederman
2022-05-20 19:32                   ` Eric W. Biederman
2022-05-20 19:58                   ` Peter Zijlstra
2022-05-20 19:58                     ` Peter Zijlstra
2022-05-20 19:58                     ` Peter Zijlstra
2022-05-20  9:19               ` Sebastian Andrzej Siewior
2022-05-20  9:19                 ` Sebastian Andrzej Siewior
2022-05-20  9:19                 ` Sebastian Andrzej Siewior
2022-06-22 16:43             ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Eric W. Biederman
2022-06-22 16:45               ` [PATCH 1/3] signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit Eric W. Biederman
2022-06-22 16:46               ` [PATCH 2/3] signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit Eric W. Biederman
2022-06-23  7:49                 ` kernel test robot
2022-06-22 16:47               ` [PATCH 3/3] signal: Drop signals received after a fatal signal has been processed Eric W. Biederman
2022-06-23 15:12               ` [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT Alexander Gordeev
2022-06-23 21:55                 ` Eric W. Biederman
2022-07-08 22:25               ` Eric W. Biederman
2022-07-08 23:22                 ` Keno Fischer
2022-07-12 20:03                   ` Eric W. Biederman
2022-07-16 21:29                     ` Eric W. Biederman
2022-07-16 23:21                       ` Kyle Huey
2022-04-25 14:35   ` [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT Oleg Nesterov
2022-04-25 18:33     ` Peter Zijlstra
2022-04-26  0:38       ` Eric W. Biederman
2022-04-26  5:51         ` Oleg Nesterov
2022-04-26 17:19           ` Eric W. Biederman
2022-04-26 18:11             ` Oleg Nesterov
2022-04-25 17:47   ` Oleg Nesterov
2022-04-27  0:24     ` Eric W. Biederman
2022-04-28 20:29       ` Peter Zijlstra
2022-04-28 20:59         ` Oleg Nesterov
2022-04-28 22:21           ` Peter Zijlstra
2022-04-28 22:50             ` Oleg Nesterov
2022-04-27 15:53   ` Oleg Nesterov
2022-04-27 21:57     ` Eric W. Biederman
2022-04-21 15:02 ` [PATCH v2 3/5] freezer: Have {,un}lock_system_sleep() save/restore flags Peter Zijlstra
2022-04-21 15:02 ` [PATCH v2 4/5] freezer,umh: Clean up freezer/initrd interaction Peter Zijlstra
2022-04-21 15:02 ` [PATCH v2 5/5] freezer,sched: Rewrite core freezer logic Peter Zijlstra
2022-04-21 17:26   ` Eric W. Biederman
2022-04-21 17:57     ` Oleg Nesterov
2022-04-21 19:55     ` Peter Zijlstra
2022-04-21 20:07       ` Peter Zijlstra
2022-04-22 15:52         ` Eric W. Biederman
2022-04-22 17:43 ` [PATCH v2 0/5] ptrace-vs-PREEMPT_RT and freezer rewrite Sebastian Andrzej Siewior
2022-04-22 19:15   ` Eric W. Biederman
2022-04-22 21:13     ` Sebastian Andrzej Siewior
