linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)
@ 2019-05-13  1:20 Alex Xu (Hello71)
  2019-05-13  1:57 ` Valdis Klētnieks
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Alex Xu (Hello71) @ 2019-05-13  1:20 UTC (permalink / raw)
  To: linux-kernel, tj, guro; +Cc: oleg, kernel-team

Hi,

I was trying to use strace recently and found that it exhibited some 
strange behavior. I produced this minimal test case:

#include <unistd.h>

int main() {
    write(1, "a", 1);
    return 0;
}

which, when run using "gcc test.c && strace ./a.out" produces this 
strace output:

[ pre-main omitted ]
write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[ repeats forever ]

The correct result is of course:

[ pre-main omitted ]
write(1, "a", 1)                        = 1
exit_group(0)                           = ?
+++ exited with 0 +++

Strangely, this only occurs when outputting to a tty-like output. 
Running "strace ./a.out" from a native Linux x86 console or a terminal 
emulator causes the abnormal behavior. However, the following commands 
work correctly:

- strace ./a.out >/dev/null
- strace ./a.out >/tmp/a # /tmp is a standard tmpfs
- strace ./a.out >&- # causes -1 EBADF (Bad file descriptor)

"strace -o /tmp/a ./a.out" hangs and produces the above (infinite) 
output to /tmp/a.

I bisected this to 76f969e, "cgroup: cgroup v2 freezer". I reverted the 
entire patchset (reverting only that one caused a conflict), which 
resolved the issue. I skimmed the patch and came up with this 
workaround, which also resolves the issue. I am not at all clear on the 
technical workings of the patchset, but it seems to me like a process's 
frozen status is supposed to be "suspended" when a frozen process is 
ptraced, and "unsuspended" when ptracing ends. Therefore, it seems 
suspicious to always "enter frozen" whether or not the cgroup is 
actually frozen. It seems like the code should instead check if the 
cgroup is actually frozen, and if so, restore the frozen status.

I am using systemd but not any other cgroup features. I tried in an 
initramfs environment (no systemd, /init -> shell script) and reproduced 
the failing test case.

Please CC me on replies.

Thanks,
Alex.

---
 kernel/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 62f9aea4a15a..47145d9d89ca 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2110,7 +2110,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
                preempt_disable();
                read_unlock(&tasklist_lock);
                preempt_enable_no_resched();
-               cgroup_enter_frozen();
+               //cgroup_enter_frozen();
                freezable_schedule();
        } else {
                /*
@@ -2289,7 +2289,7 @@ static bool do_signal_stop(int signr)
                }
 
                /* Now we don't run again until woken by SIGCONT or SIGKILL */
-               cgroup_enter_frozen();
+               //cgroup_enter_frozen();
                freezable_schedule();
                return true;
        } else {
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)
  2019-05-13  1:20 [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) Alex Xu (Hello71)
@ 2019-05-13  1:57 ` Valdis Klētnieks
  2019-05-13 12:17 ` Oleg Nesterov
  2019-05-13 17:03 ` Roman Gushchin
  2 siblings, 0 replies; 6+ messages in thread
From: Valdis Klētnieks @ 2019-05-13  1:57 UTC (permalink / raw)
  To: Alex Xu (Hello71); +Cc: linux-kernel, tj, guro, oleg, kernel-team

[-- Attachment #1: Type: text/plain, Size: 1336 bytes --]

On Sun, 12 May 2019 21:20:12 -0400, "Alex Xu (Hello71)" said:

> I bisected this to 76f969e, "cgroup: cgroup v2 freezer". I reverted the
> entire patchset (reverting only that one caused a conflict), which
> resolved the issue. I skimmed the patch and came up with this
> workaround, which also resolves the issue. I am not at all clear on the
> technical workings of the patchset, but it seems to me like a process's
> frozen status is supposed to be "suspended" when a frozen process is
> ptraced, and "unsuspended" when ptracing ends. Therefore, it seems
> suspicious to always "enter frozen" whether or not the cgroup is
> actually frozen. It seems like the code should instead check if the
> cgroup is actually frozen, and if so, restore the frozen status.

Your analysis seems to be more or less correct, especially since your
one-line workaround patch makes things start working again.

> -               cgroup_enter_frozen();
> +               //cgroup_enter_frozen();

However, I'm sure this isn't a proper fix - it needs an if statement guarding
it (or a check inside the function).  If the kernel is ever in a state where it
*should* be frozen, and it doesn't freeze, things will misbehave.

One has to wonder how many things would be different if the ptrace()
system call wasn't such a steaming pile of dingo's kidneys....

[-- Attachment #2: Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)
  2019-05-13  1:20 [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) Alex Xu (Hello71)
  2019-05-13  1:57 ` Valdis Klētnieks
@ 2019-05-13 12:17 ` Oleg Nesterov
  2019-05-13 16:38   ` Oleg Nesterov
  2019-05-13 17:03 ` Roman Gushchin
  2 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2019-05-13 12:17 UTC (permalink / raw)
  To: Alex Xu (Hello71); +Cc: linux-kernel, tj, guro, kernel-team

On 05/12, Alex Xu (Hello71) wrote:
>
> Hi,
>
> I was trying to use strace recently and found that it exhibited some
> strange behavior. I produced this minimal test case:
>
> #include <unistd.h>
>
> int main() {
>     write(1, "a", 1);
>     return 0;
> }
>
> which, when run using "gcc test.c && strace ./a.out" produces this
> strace output:
>
> [ pre-main omitted ]
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> [ repeats forever ]

Yes, cgroup_enter_frozen() alone is wrong, we have already discussed this
a bit... see https://lore.kernel.org/lkml/20190508152536.GA17058@redhat.com/

Probably we add leave_frozen(true) after freezable_schedule() for now, then
think try to make something better...

But I am not sure I 100% understand whats going on in this case, could you
try the patch below? (Just in case, of course it is wrong).

Oleg.

--- x/kernel/signal.c
+++ x/kernel/signal.c
@@ -149,8 +149,7 @@
 {
 	if ((t->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) ||
 	    PENDING(&t->pending, &t->blocked) ||
-	    PENDING(&t->signal->shared_pending, &t->blocked) ||
-	    cgroup_task_frozen(t)) {
+	    PENDING(&t->signal->shared_pending, &t->blocked) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
 		return true;
 	}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)
  2019-05-13 12:17 ` Oleg Nesterov
@ 2019-05-13 16:38   ` Oleg Nesterov
  2019-05-13 16:54     ` Roman Gushchin
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2019-05-13 16:38 UTC (permalink / raw)
  To: Alex Xu (Hello71); +Cc: linux-kernel, tj, guro, kernel-team

On 05/13, Oleg Nesterov wrote:
>
> Probably we add leave_frozen(true) after freezable_schedule() for now, then
> think try to make something better...

And again, this is what I thought ptrace_stop() does, somehow I didn't notice
that the last version doesn't have leave_frozen() in ptrace_stop().

Perhaps we can do a bit better, change only tracehook_report_syscall_entry() and
PTRACE_EVENT_EXIT/SECCOMP paths to do leave_frozen() ?

At first glance other callers look fine in that they can do nothing "interesting"
befor get_signal(), but we need to re-check...

Oleg.

> But I am not sure I 100% understand whats going on in this case, could you
> try the patch below? (Just in case, of course it is wrong).
> 
> Oleg.
> 
> --- x/kernel/signal.c
> +++ x/kernel/signal.c
> @@ -149,8 +149,7 @@
>  {
>  	if ((t->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) ||
>  	    PENDING(&t->pending, &t->blocked) ||
> -	    PENDING(&t->signal->shared_pending, &t->blocked) ||
> -	    cgroup_task_frozen(t)) {
> +	    PENDING(&t->signal->shared_pending, &t->blocked) {
>  		set_tsk_thread_flag(t, TIF_SIGPENDING);
>  		return true;
>  	}
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)
  2019-05-13 16:38   ` Oleg Nesterov
@ 2019-05-13 16:54     ` Roman Gushchin
  0 siblings, 0 replies; 6+ messages in thread
From: Roman Gushchin @ 2019-05-13 16:54 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Alex Xu (Hello71), linux-kernel, tj, Kernel Team

On Mon, May 13, 2019 at 06:38:14PM +0200, Oleg Nesterov wrote:
> On 05/13, Oleg Nesterov wrote:
> >
> > Probably we add leave_frozen(true) after freezable_schedule() for now, then
> > think try to make something better...
> 
> And again, this is what I thought ptrace_stop() does, somehow I didn't notice
> that the last version doesn't have leave_frozen() in ptrace_stop().
> 
> Perhaps we can do a bit better, change only tracehook_report_syscall_entry() and
> PTRACE_EVENT_EXIT/SECCOMP paths to do leave_frozen() ?
> 
> At first glance other callers look fine in that they can do nothing "interesting"
> befor get_signal(), but we need to re-check...

Hi Oleg!

Thank you for looking into it!

I've just check the following patch (see below). It solves the regression
and overall seems correct. But I need some more time to check.

Thanks!

--

diff --git a/kernel/signal.c b/kernel/signal.c
index 8607b11ff936..088b377ad439 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2112,6 +2112,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
                preempt_enable_no_resched();
                cgroup_enter_frozen();
                freezable_schedule();
+               if (info)
+                       cgroup_leave_frozen(true);
        } else {
                /*
                 * By the time we got the lock, our tracer went away.

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e)
  2019-05-13  1:20 [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) Alex Xu (Hello71)
  2019-05-13  1:57 ` Valdis Klētnieks
  2019-05-13 12:17 ` Oleg Nesterov
@ 2019-05-13 17:03 ` Roman Gushchin
  2 siblings, 0 replies; 6+ messages in thread
From: Roman Gushchin @ 2019-05-13 17:03 UTC (permalink / raw)
  To: Alex Xu (Hello71); +Cc: linux-kernel, tj, oleg, Kernel Team

Hi Alex!

Thank you for the report!
It's super clear, and contains all the details, so it took me 30s
to reproduce the issue. Really appreciate your effort!



On Sun, May 12, 2019 at 09:20:12PM -0400, Alex Xu (Hello71) wrote:
> Hi,
> 
> I was trying to use strace recently and found that it exhibited some 
> strange behavior. I produced this minimal test case:
> 
> #include <unistd.h>
> 
> int main() {
>     write(1, "a", 1);
>     return 0;
> }
> 
> which, when run using "gcc test.c && strace ./a.out" produces this 
> strace output:
> 
> [ pre-main omitted ]
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> write(1, "a", 1)                        = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> [ repeats forever ]
> 
> The correct result is of course:
> 
> [ pre-main omitted ]
> write(1, "a", 1)                        = 1
> exit_group(0)                           = ?
> +++ exited with 0 +++
> 
> Strangely, this only occurs when outputting to a tty-like output. 
> Running "strace ./a.out" from a native Linux x86 console or a terminal 
> emulator causes the abnormal behavior. However, the following commands 
> work correctly:
> 
> - strace ./a.out >/dev/null
> - strace ./a.out >/tmp/a # /tmp is a standard tmpfs
> - strace ./a.out >&- # causes -1 EBADF (Bad file descriptor)
> 
> "strace -o /tmp/a ./a.out" hangs and produces the above (infinite) 
> output to /tmp/a.
> 
> I bisected this to 76f969e, "cgroup: cgroup v2 freezer". I reverted the 
> entire patchset (reverting only that one caused a conflict), which 
> resolved the issue. I skimmed the patch and came up with this 
> workaround, which also resolves the issue. I am not at all clear on the 
> technical workings of the patchset, but it seems to me like a process's 
> frozen status is supposed to be "suspended" when a frozen process is 
> ptraced, and "unsuspended" when ptracing ends. Therefore, it seems 
> suspicious to always "enter frozen" whether or not the cgroup is 
> actually frozen. It seems like the code should instead check if the 
> cgroup is actually frozen, and if so, restore the frozen status.

So, the thing is that when the freezer tries to freeze all tasks
in the cgroup, some tasks may sleep (e.g. being SIGSTOPPed),
and the freezer can't get them out of this state and put them back correctly
after unfreezing. So instead it leaves such tasks in the original state
and treats them as frozen. This is why we need this unconditional
cgroup_enter_frozen(). It's not the problem.

Anyway, I'm sure that with great help from Oleg we'll be able
to fix the issue very soon (I already posted a preliminary patch).

Once again, thank you for the report!

Roman

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-05-13 17:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-13  1:20 [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) Alex Xu (Hello71)
2019-05-13  1:57 ` Valdis Klētnieks
2019-05-13 12:17 ` Oleg Nesterov
2019-05-13 16:38   ` Oleg Nesterov
2019-05-13 16:54     ` Roman Gushchin
2019-05-13 17:03 ` Roman Gushchin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).