linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] signal: don't always leave task frozen after ptrace_stop()
@ 2019-05-13 19:55 Roman Gushchin
  2019-05-14 16:01 ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Roman Gushchin @ 2019-05-13 19:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Oleg Nesterov, Alex Xu, kernel-team, cgroups, linux-kernel,
	Roman Gushchin

The ptrace_stop() function contains the cgroup_enter_frozen() call,
but no cgroup_leave_frozen(). When ptrace_stop() is called from the
do_jobctl_trap() path, it's correct, because corresponding
cgroup_leave_frozen() calls in get_signal() will guarantee that
the task won't leave the signal handler loop frozen.

However, if ptrace_stop() is called from ptrace_signal() or
ptrace_notify(), there is no such guarantee, and the task may leave
with the frozen bit set.

It leads to the regression, reported by Alex Xu. Write system call
gets mistakenly interrupted by fake TIF_SIGPENDING, which is set
by recalc_sigpending_tsk() because of the set frozen bit.

The regression can be reproduced by stracing the following simple
program:

  #include <unistd.h>

  int main() {
      write(1, "a", 1);
      return 0;
  }

An attempt to run strace ./a.out leads to the infinite loop:
  [ pre-main omitted ]
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  [ repeats forever ]

With this patch applied, it works as expected:
  [ pre-main omitted ]
  write(1, "a", 1)                        = 1
  exit_group(0)                           = ?
  +++ exited with 0 +++

Reported-by: Alex Xu <alex_y_xu@yahoo.ca>
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/signal.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 8607b11ff936..f12abbda4c4b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2016,7 +2016,8 @@ static bool sigkill_pending(struct task_struct *tsk)
  * If we actually decide not to stop at all because the tracer
  * is gone, we keep current->exit_code unless clear_code.
  */
-static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t *info)
+static void ptrace_stop(int exit_code, int why, int clear_code,
+			kernel_siginfo_t *info, bool may_remain_frozen)
 	__releases(&current->sighand->siglock)
 	__acquires(&current->sighand->siglock)
 {
@@ -2112,6 +2113,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
 		preempt_enable_no_resched();
 		cgroup_enter_frozen();
 		freezable_schedule();
+		if (!may_remain_frozen)
+			cgroup_leave_frozen(true);
 	} else {
 		/*
 		 * By the time we got the lock, our tracer went away.
@@ -2152,7 +2155,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
 	recalc_sigpending_tsk(current);
 }
 
-static void ptrace_do_notify(int signr, int exit_code, int why)
+static void ptrace_do_notify(int signr, int exit_code, int why,
+			     bool may_remain_frozen)
 {
 	kernel_siginfo_t info;
 
@@ -2163,7 +2167,7 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
 	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
-	ptrace_stop(exit_code, why, 1, &info);
+	ptrace_stop(exit_code, why, 1, &info, may_remain_frozen);
 }
 
 void ptrace_notify(int exit_code)
@@ -2173,7 +2177,7 @@ void ptrace_notify(int exit_code)
 		task_work_run();
 
 	spin_lock_irq(&current->sighand->siglock);
-	ptrace_do_notify(SIGTRAP, exit_code, CLD_TRAPPED);
+	ptrace_do_notify(SIGTRAP, exit_code, CLD_TRAPPED, false);
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
@@ -2328,10 +2332,10 @@ static void do_jobctl_trap(void)
 			signr = SIGTRAP;
 		WARN_ON_ONCE(!signr);
 		ptrace_do_notify(signr, signr | (PTRACE_EVENT_STOP << 8),
-				 CLD_STOPPED);
+				 CLD_STOPPED, true);
 	} else {
 		WARN_ON_ONCE(!signr);
-		ptrace_stop(signr, CLD_STOPPED, 0, NULL);
+		ptrace_stop(signr, CLD_STOPPED, 0, NULL, true);
 		current->exit_code = 0;
 	}
 }
@@ -2385,7 +2389,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
 	 * comment in dequeue_signal().
 	 */
 	current->jobctl |= JOBCTL_STOP_DEQUEUED;
-	ptrace_stop(signr, CLD_TRAPPED, 0, info);
+	ptrace_stop(signr, CLD_TRAPPED, 0, info, false);
 
 	/* We're back.  Did the debugger cancel the sig?  */
 	signr = current->exit_code;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] signal: don't always leave task frozen after ptrace_stop()
  2019-05-13 19:55 [PATCH] signal: don't always leave task frozen after ptrace_stop() Roman Gushchin
@ 2019-05-14 16:01 ` Oleg Nesterov
  2019-05-14 17:46   ` Roman Gushchin
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2019-05-14 16:01 UTC (permalink / raw)
  To: Roman Gushchin; +Cc: Tejun Heo, Alex Xu, kernel-team, cgroups, linux-kernel

Roman,

Sorry, I can't agree with this patch. And even the changelog doesn't
look right.

On 05/13, Roman Gushchin wrote:
>
> The ptrace_stop() function contains the cgroup_enter_frozen() call,
> but no cgroup_leave_frozen(). When ptrace_stop() is called from the
> do_jobctl_trap() path, it's correct, because corresponding
> cgroup_leave_frozen() calls in get_signal() will guarantee that
> the task won't leave the signal handler loop frozen.
>
> However, if ptrace_stop() is called from ptrace_signal() or
> ptrace_notify(), there is no such guarantee, and the task may leave
> with the frozen bit set.

ptrace_signal() looks fine in that the task can't return to user-mode,
get_signal() will be called again exactly because ->frozen == 1 means
TIF_SIGPENDING. So I an not surre I understand why ptrace_signal() does
ptrace_stop(false) with your patch. But this is minor.

> It leads to the regression, reported by Alex Xu. Write system call
> gets mistakenly interrupted by fake TIF_SIGPENDING, which is set
> by recalc_sigpending_tsk() because of the set frozen bit.

IMHO, the real problem is not that syscall was interrupted. The problem
is that a frozen task must never start the syscall.


---------------------------------------------------------------------------

Can't we add the unconditional leave_frozen() into ptrace_stop() for now ?

Sure, this is not what we want. Debugger can disturb CGRP_FROZEN.

But. The "may_remain_frozen" argument uglifies this code too much (imo) and
at the same time it doesn't solve the problem above: CGRP_FROZEN can be cleared
"for no reason".

Say, why ptrace_event_pid() should do leave_frozen(true) ? And if there is any
reason, then why wait_for_vfork_done() can do leave_frozen(false) ?

Or syscall-exit path. It can't miss get_signal(), it doesn't need leave_frozen().


In short, I believe that compared to the unconditional leave_frozen() in ptrace_stop()
this patch buys almost nothing, but makes the code and the whole logic much uglier.

Oleg.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] signal: don't always leave task frozen after ptrace_stop()
  2019-05-14 16:01 ` Oleg Nesterov
@ 2019-05-14 17:46   ` Roman Gushchin
  2019-05-15 14:43     ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Roman Gushchin @ 2019-05-14 17:46 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, Alex Xu, Kernel Team, cgroups, linux-kernel

On Tue, May 14, 2019 at 06:01:59PM +0200, Oleg Nesterov wrote:
> Roman,
> 
> Sorry, I can't agree with this patch. And even the changelog doesn't
> look right.
> 
> On 05/13, Roman Gushchin wrote:
> >
> > The ptrace_stop() function contains the cgroup_enter_frozen() call,
> > but no cgroup_leave_frozen(). When ptrace_stop() is called from the
> > do_jobctl_trap() path, it's correct, because corresponding
> > cgroup_leave_frozen() calls in get_signal() will guarantee that
> > the task won't leave the signal handler loop frozen.
> >
> > However, if ptrace_stop() is called from ptrace_signal() or
> > ptrace_notify(), there is no such guarantee, and the task may leave
> > with the frozen bit set.
> 
> ptrace_signal() looks fine in that the task can't return to user-mode,
> get_signal() will be called again exactly because ->frozen == 1 means
> TIF_SIGPENDING. So I an not surre I understand why ptrace_signal() does
> ptrace_stop(false) with your patch. But this is minor.
> 
> > It leads to the regression, reported by Alex Xu. Write system call
> > gets mistakenly interrupted by fake TIF_SIGPENDING, which is set
> > by recalc_sigpending_tsk() because of the set frozen bit.
> 
> IMHO, the real problem is not that syscall was interrupted. The problem
> is that a frozen task must never start the syscall.
> 
> 
> ---------------------------------------------------------------------------
> 
> Can't we add the unconditional leave_frozen() into ptrace_stop() for now ?
> 
> Sure, this is not what we want. Debugger can disturb CGRP_FROZEN.
> 
> But. The "may_remain_frozen" argument uglifies this code too much (imo) and
> at the same time it doesn't solve the problem above: CGRP_FROZEN can be cleared
> "for no reason".
> 
> Say, why ptrace_event_pid() should do leave_frozen(true) ? And if there is any
> reason, then why wait_for_vfork_done() can do leave_frozen(false) ?
> 
> Or syscall-exit path. It can't miss get_signal(), it doesn't need leave_frozen().
> 
> 
> In short, I believe that compared to the unconditional leave_frozen() in ptrace_stop()
> this patch buys almost nothing, but makes the code and the whole logic much uglier.
> 
> Oleg.
> 

Hi Oleg!

I agree that "may_remain_frozen" adds a lot of ugliness, so let's fix
the regression with the unconditional leave_frozen(true). The patch below.
Please, let me know if it's not what you meant.

The problem is that it makes the ptrace freezer kselftest to flap,
so it's good only as a temporarily solution. But it looks like we agree here.

Thank you!


--

From 2602261b066a06f6884057d2cd7369951768b9ed Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Tue, 14 May 2019 10:13:19 -0700
Subject: [PATCH] signal: unconditionally leave the frozen state in
 ptrace_stop()

Alex Xu reported a regression in strace, caused by the introduction of
the cgroup v2 freezer. The regression can be reproduced by stracing
the following simple program:

  #include <unistd.h>

  int main() {
      write(1, "a", 1);
      return 0;
  }

An attempt to run strace ./a.out leads to the infinite loop:
  [ pre-main omitted ]
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
  [ repeats forever ]

The problem occurs because the traced task leaves ptrace_stop()
(and the signal handling loop) with the frozen bit set. So let's
call cgroup_leave_frozen(true) unconditionally after sleeping
in ptrace_stop().

With this patch applied, strace works as expected:
  [ pre-main omitted ]
  write(1, "a", 1)                        = 1
  exit_group(0)                           = ?
  +++ exited with 0 +++

Reported-by: Alex Xu <alex_y_xu@yahoo.ca>
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/signal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index 8607b11ff936..565ba14d89d5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2112,6 +2112,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
 		preempt_enable_no_resched();
 		cgroup_enter_frozen();
 		freezable_schedule();
+		cgroup_leave_frozen(true);
 	} else {
 		/*
 		 * By the time we got the lock, our tracer went away.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] signal: don't always leave task frozen after ptrace_stop()
  2019-05-14 17:46   ` Roman Gushchin
@ 2019-05-15 14:43     ` Oleg Nesterov
  2019-05-15 17:03       ` Roman Gushchin
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2019-05-15 14:43 UTC (permalink / raw)
  To: Roman Gushchin; +Cc: Tejun Heo, Alex Xu, Kernel Team, cgroups, linux-kernel

On 05/14, Roman Gushchin wrote:
>
> I agree that "may_remain_frozen" adds a lot of ugliness, so let's fix
> the regression with the unconditional leave_frozen(true). The patch below.
> Please, let me know if it's not what you meant.

Yes, thanks, this is what I meant. Feel free to add my ACK.

> so it's good only as a temporarily solution. But it looks like we agree here.

Yes, yes.

Oleg.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] signal: don't always leave task frozen after ptrace_stop()
  2019-05-15 14:43     ` Oleg Nesterov
@ 2019-05-15 17:03       ` Roman Gushchin
  0 siblings, 0 replies; 5+ messages in thread
From: Roman Gushchin @ 2019-05-15 17:03 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo
  Cc: Tejun Heo, Alex Xu, Kernel Team, cgroups, linux-kernel

On Wed, May 15, 2019 at 04:43:35PM +0200, Oleg Nesterov wrote:
> On 05/14, Roman Gushchin wrote:
> >
> > I agree that "may_remain_frozen" adds a lot of ugliness, so let's fix
> > the regression with the unconditional leave_frozen(true). The patch below.
> > Please, let me know if it's not what you meant.
> 
> Yes, thanks, this is what I meant. Feel free to add my ACK.

Thanks!

Tejun, can you, please, pull the patch?

Thank you!

Roman

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-05-15 17:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-13 19:55 [PATCH] signal: don't always leave task frozen after ptrace_stop() Roman Gushchin
2019-05-14 16:01 ` Oleg Nesterov
2019-05-14 17:46   ` Roman Gushchin
2019-05-15 14:43     ` Oleg Nesterov
2019-05-15 17:03       ` Roman Gushchin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).