From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kees Cook <keescook@chromium.org>, Pavel Machek <pavel@ucw.cz>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Linux PM <linux-pm@vger.kernel.org>
Subject: Re: [RFC][PATCH] exec: Conceal the other threads from wakeups during exec
Date: Fri, 31 Jul 2020 15:07:07 -0500	[thread overview]
Message-ID: <87tuxnwis4.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <CAHk-=wh_5Lu_3OACT4pSqrf1eJ3=PR_fUjL1vLSbBZM2_OAC5w@mail.gmail.com> (Linus Torvalds's message of "Fri, 31 Jul 2020 10:41:48 -0700")

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Jul 31, 2020 at 10:19 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Even limited to opt-in locations I think the trick of being able to
>> transform the wait-state may solve that composition problem.
>
> So the part I found intriguing was the "catch things in the signal
> handling path".
>
> Catching things there - and *only* there - would avoid a lot of the
> problems we had with the freezer. When you're about to return to user
> mode, there are no lock inversions etc.
>
> And it kind of makes conceptual sense to do, since what you're trying
> to capture is the signal group - so using the signal state to do so
> seems like a natural thing to do. No touching of any runqueues or
> scheduler data structures, do everything _purely_ with the signal
> handling pathways.
>
> So that "feels" ok to me.
>
> That said, I do wonder if there are nasty nasty latency issues with
> odd users. Normally, you'd expect that execve() with other threads in
> the group shouldn't be a performance issue, because people simply
> shouldn't do that. So it might be ok.
>
> And if you capture them all in the signal handling pathway, that ends
> up being a very convenient place to zap them all too, so maybe my
> latency worry is misguided.
>
> IOW, I think that you could try to do your "freeze other threads" not
> at all like the freezer, but more like a "collect all threads in their
> signal handler parts as the first phase of zapping them".
>
> So maybe this approach is salvageable. I see where something like the
> above could work well. But I say that with a lot of handwaving, and
> maybe if I see the patch I'd go "Christ, I was a complete idiot for
> ever even suggesting that".

Yes.

The tricky bit is that there are a handful of stops that must
be handled, or it is impossible to stop everything without causing
disruption if the exec fails.  The big ones are TASK_STOPPED and
TASK_TRACED.  There is another in wait_for_vfork_done.

At which point I am looking at writing a wrapper around schedule that
changes the task state to something like TASK_WAKEKILL when asked, and
then restores the state when released.  Something that is independent of
which freezer the code is using.

It could also be done in the scheduler, with a special bit in the task
state that says opt-in.  But if we have to opt in it is probably much
less error prone to write the code as a wrapper around schedule, and
only modify the core scheduling code if necessary.
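For what it's worth, the wrapper idea can be sketched in user space.  This is
only an illustration of the shape of it: the struct layout, the `conceal` flag,
`schedule_stub`, and `schedule_concealable` are all made-up stand-ins, not
kernel code, and the constants are just the familiar TASK_* values.

```c
#include <assert.h>

/* Illustrative sketch only.  The idea: a wrapper around schedule() that,
 * when exec has asked the other threads to get out of the way, sleeps in
 * a killable-only state instead of the caller's intended state, and puts
 * the intended state back afterwards so an aborted exec leaves no trace. */

#define TASK_STOPPED   0x0004u
#define TASK_WAKEKILL  0x0100u

struct task {
	unsigned int state;	/* the state the task is sleeping in */
	int conceal;		/* set while exec wants this thread hidden */
};

/* Records the state the task actually slept in; stands in for schedule(). */
static unsigned int last_sleep_state;

static void schedule_stub(struct task *t)
{
	last_sleep_state = t->state;
}

/* Sleep in `state`, unless asked to conceal, in which case sleep in
 * TASK_WAKEKILL and restore the caller's intended state on release. */
static void schedule_concealable(struct task *t, unsigned int state)
{
	unsigned int intended = state;

	if (t->conceal)
		state = TASK_WAKEKILL;
	t->state = state;
	schedule_stub(t);
	t->state = intended;
}
```

The point of restoring the intended state on release is exactly the "if the
exec fails" case above: the sleeper goes back to looking stopped or traced as
if nothing happened.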

If I can make TASK_STOPPED and TASK_TRACED handle spurious wake-ups,
I think I can build something that is independent of the rest of the
freezers, so the code doesn't have to go 3 deep on wrappers of different
freezers at those locations.  It is already 2 layers deep.
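Tolerating spurious wake-ups is the usual check-sleep-recheck loop; the toy
below simulates it with made-up names (`stop_pending`, `fake_schedule`) rather
than the real stop machinery.

```c
#include <assert.h>

/* Illustrative sketch only: a TASK_STOPPED-style sleep made robust against
 * spurious wake-ups by re-checking the stop condition and going back to
 * sleep until the stop is genuinely lifted. */

static int wakeups;		/* how many times the sleeper was woken */
static int stop_pending = 1;	/* the condition the sleeper waits on */

/* Stands in for schedule(): the first two wake-ups are spurious, and the
 * third is the real one that lifts the stop. */
static void fake_schedule(void)
{
	wakeups++;
	if (wakeups >= 3)
		stop_pending = 0;
}

/* A stop that survives spurious wake-ups: check, sleep, re-check. */
static void do_stop(void)
{
	while (stop_pending)
		fake_schedule();
}
```

Once the sleep sites are written this way, waking them for any reason (such as
an exec that transforms and later restores their wait-state) is harmless.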

But I really don't intend to work on that again for a while.


Right now I am in the final stages of resurrecting:
https://lore.kernel.org/linux-fsdevel/87a7ohs5ow.fsf@xmission.com/

The hard part looks like cleaning up and resurrecting Oleg's patch
to prevent the abuse of files_struct->count.
https://lore.kernel.org/linux-fsdevel/20180915160423.GA31461@redhat.com/

I am close, but dotting all of the i's and crossing all of the t's is
taking a bit.

Eric


Thread overview: 15+ messages
2020-07-27 21:03 [RFC][PATCH] exec: Freeze the other threads during a multi-threaded exec Eric W. Biederman
2020-07-28  0:20 ` Linus Torvalds
2020-07-28 12:39   ` Eric W. Biederman
2020-07-28 13:20     ` Eric W. Biederman
2020-07-28 18:17       ` Linus Torvalds
2020-07-30 13:16         ` Eric W. Biederman
2020-07-30 22:56           ` [RFC][PATCH] exec: Conceal the other threads from wakeups during exec Eric W. Biederman
2020-07-30 23:17             ` Linus Torvalds
2020-07-31 17:16               ` Eric W. Biederman
2020-07-31 17:41                 ` Linus Torvalds
2020-07-31 20:07                   ` Eric W. Biederman [this message]
2020-07-31  6:28             ` Oleg Nesterov
2020-07-31 16:50               ` Eric W. Biederman
2020-07-28  9:41 ` [RFC][PATCH] exec: Freeze the other threads during a multi-threaded exec Aleksa Sarai
2020-07-28 12:18   ` Eric W. Biederman
