From: Andy Lutomirski <luto@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [patch 4/5] sched: Delay task stack freeing on RT
Date: Fri, 1 Oct 2021 11:48:48 -0700
Message-ID: <CALCETrU7Fu7BA+DEk8HJPRkqsOSsC-NXR2tPsxW6VFF0pxSS6A@mail.gmail.com>
In-Reply-To: <87o8884q02.ffs@tglx>

On Fri, Oct 1, 2021 at 10:24 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Oct 01 2021 at 09:12, Andy Lutomirski wrote:
> > On Wed, Sep 29, 2021 at 4:54 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >> Having this logic split across two files seems unfortunate and prone to
> >> 'accidents'. Is there a real down-side to unconditionally doing it in
> >> delayed_put_task_struct() ?
> >>
> >> /me goes out for lunch... meanwhile tglx points at: 68f24b08ee89.
> >>
> >> Bah.. Andy?
> >
> > Could we make whatever we do here unconditional?
>
> Sure. I just was unsure about your reasoning in 68f24b08ee89.

Mmm, right.  The reasoning is that there are a lot of workloads that
frequently wait for a task to exit and immediately start a new task --
most shell scripts, for example.  I think I tested this with the
following amazing workload:

while true; do true; done

and we want to reuse the same stack each time from the cached stack
lookaside list instead of vfreeing and vmallocing a stack each time.
Deferring the release to the lookaside list breaks that.  That said, I
suppose the fact that it works well right now is a bit fragile --
we're waking the parent (sh, etc.) before releasing the stack, but
nothing gets to run until the stack is released.

>
> > And what actually causes the latency?  If it's vfree, shouldn't the
> > existing use of vfree_atomic() in free_thread_stack() handle it?  Or
> > is it the accounting?
>
> The accounting muck because it can go into the allocator and sleep in
> the worst case, which is nasty even on !RT kernels.

Wait, unaccounting memory can go into the allocator?  That seems quite nasty.

>
> But thinking some more, there is actually a way nastier issue on RT in
> the following case:
>
> CPU 0                           CPU 1
>   T1
>   spin_lock(L1)
>   rt_mutex_lock()
>       schedule()
>
>   T2
>      do_exit()
>      do_task_dead()             spin_unlock(L1)
>                                    wake(T1)
>      __schedule()
>        switch_to(T1)
>        finish_task_switch()
>          put_task_stack()
>            account()
>              ....
>              spin_lock(L2)
>
> So if L1 == L2 or L1 and L2 have a reverse dependency then this can just
> deadlock.
>
> We've never observed that, but the above case is obviously hard to
> hit. Nevertheless it's there.

Hmm.

ISTM it would be conceptually cleanest for do_exit() to handle its own freeing
in its own preemptible context.  Obviously that can't really work,
since we can't free a task_struct or a task stack while we're running
on it.  But I wonder if we could approximate it by putting this work
in a workqueue so that it all runs in a normal schedulable context.
To make the shell script case work nicely, we want to release the task
stack before notifying anyone waiting for the dying task to exit, but
maybe that's doable.  It could involve some nasty exit_signal hackery,
though.

Thread overview: 22+ messages
2021-09-28 12:24 [patch 0/5] sched: Miscellaneous RT related tweaks Thomas Gleixner
2021-09-28 12:24 ` [patch 1/5] sched: Limit the number of task migrations per batch on RT Thomas Gleixner
2021-10-01 15:05   ` [tip: sched/core] " tip-bot2 for Thomas Gleixner
2021-10-05 14:11   ` tip-bot2 for Thomas Gleixner
2021-09-28 12:24 ` [patch 2/5] sched: Disable TTWU_QUEUE " Thomas Gleixner
2021-10-01 15:05   ` [tip: sched/core] " tip-bot2 for Thomas Gleixner
2021-10-05 14:11   ` tip-bot2 for Thomas Gleixner
2021-09-28 12:24 ` [patch 3/5] sched: Move kprobes cleanup out of finish_task_switch() Thomas Gleixner
2021-10-01 15:05   ` [tip: sched/core] " tip-bot2 for Thomas Gleixner
2021-10-05 14:11   ` tip-bot2 for Thomas Gleixner
2021-09-28 12:24 ` [patch 4/5] sched: Delay task stack freeing on RT Thomas Gleixner
2021-09-29 11:54   ` Peter Zijlstra
2021-10-01 16:12     ` Andy Lutomirski
2021-10-01 17:24       ` Thomas Gleixner
2021-10-01 18:48         ` Andy Lutomirski [this message]
2021-10-01 19:02           ` Andy Lutomirski
2021-10-01 20:54             ` Thomas Gleixner
2021-09-28 12:24 ` [patch 5/5] sched: Move mmdrop to RCU " Thomas Gleixner
2021-09-29 12:02   ` Peter Zijlstra
2021-09-29 13:05     ` Thomas Gleixner
2021-10-01 15:05   ` [tip: sched/core] " tip-bot2 for Thomas Gleixner
2021-10-05 14:11   ` tip-bot2 for Thomas Gleixner
