From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Dmitry Vyukov <dvyukov@google.com>,
	kasan-dev@googlegroups.com,
	Frederic Weisbecker <frederic@kernel.org>
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread
Date: Thu, 6 Apr 2023 16:12:04 +0200
Message-ID: <CANpmjNOwo=4_VpUs1PYajtxb8gvt3hyhgwc-Bk9RN4VgupZCyQ@mail.gmail.com>
In-Reply-To: <20230316123028.2890338-1-elver@google.com>

On Thu, 16 Mar 2023 at 13:31, Marco Elver <elver@google.com> wrote:
>
> From: Dmitry Vyukov <dvyukov@google.com>
>
> POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
> thread of a thread group for signal delivery. However, this has a
> significant downside: it requires waking up a potentially idle thread.
>
> Instead, prefer to deliver signals to the current thread (in the same
> thread group) if SIGEV_THREAD_ID is not set by the user. This does not
> change guaranteed semantics, since POSIX process CPU time timers have
> never guaranteed that signal delivery is to a specific thread (without
> SIGEV_THREAD_ID set).
>
> The effect is that we no longer wake up potentially idle threads, and
> the kernel is no longer biased towards delivering the timer signal to
> any particular thread (which better distributes the timer signals esp.
> when multiple timers fire concurrently).
>
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v6:
> - Split test from this patch.
> - Update wording on what this patch aims to improve.
>
> v5:
> - Rebased onto v6.2.
>
> v4:
> - Restructured checks in send_sigqueue() as suggested.
>
> v3:
> - Switched to a completely different implementation (much simpler)
>   based on Oleg's idea.
>
> RFC v2:
> - Added additional Cc as Thomas asked.
> ---
>  kernel/signal.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 8cb28f1df294..605445fa27d4 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
>         /*
>          * Now find a thread we can wake up to take the signal off the queue.
>          *
> -        * If the main thread wants the signal, it gets first crack.
> -        * Probably the least surprising to the average bear.
> +        * Try the suggested task first (may or may not be the main thread).
>          */
>         if (wants_signal(sig, p))
>                 t = p;
> @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>
>         ret = -1;
>         rcu_read_lock();
> +       /*
> +        * This function is used by POSIX timers to deliver a timer signal.
> +        * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
> +        * set), the signal must be delivered to the specific thread (queues
> +        * into t->pending).
> +        *
> +        * Where type is not PIDTYPE_PID, signals must just be delivered to the
> +        * current process. In this case, prefer to deliver to current if it is
> +        * in the same thread group as the target, as it avoids unnecessarily
> +        * waking up a potentially idle task.
> +        */
>         t = pid_task(pid, type);
> -       if (!t || !likely(lock_task_sighand(t, &flags)))
> +       if (!t)
> +               goto ret;
> +       if (type != PIDTYPE_PID && same_thread_group(t, current))
> +               t = current;
> +       if (!likely(lock_task_sighand(t, &flags)))
>                 goto ret;
>
>         ret = 1; /* the signal is ignored */
> @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>         q->info.si_overrun = 0;
>
>         signalfd_notify(t, sig);
> +       /*
> +        * If the type is not PIDTYPE_PID, we just use shared_pending, which
> +        * won't guarantee that the specified task will receive the signal, but
> +        * is sufficient if t==current in the common case.
> +        */
>         pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
>         list_add_tail(&q->list, &pending->list);
>         sigaddset(&pending->signal, sig);
> --

One last semi-gentle ping. ;-)

1. We're seeing that in some applications that use POSIX timers
heavily, but where the main thread is mostly idle, the main thread
receives a disproportionate share of the signals and is woken up
constantly. This is bad, because the main thread usually waits on a
futex or in really long sleeps. Each wakeup makes the main thread
steal time (just to go back to sleep) from another thread that could
have instead just proceeded with whatever it was doing.

2. Delivering signals to random threads is currently way too
expensive. We need to resort to this crazy algorithm: 1) receive timer
signal, 2) check if main thread, 3) if main thread (which is likely),
pick a random thread and do tgkill. To find a random thread, we iterate
/proc/self/task, but that's just abysmal for various reasons. Other
alternatives, like inherited task-clock perf events, are too expensive
as soon as we need to enable/disable the timers (doing so requires
IPIs), and maintaining O(#threads) timers is just as horrible.
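
For illustration only, the userspace workaround above could look roughly
like the sketch below. It is not the code of any real application: all
names (list_tids, pick_forward_target, refresh_tids, on_timer_signal,
MAX_TIDS) are made up for this example, and it assumes the thread list
is snapshotted outside the handler, because opendir()/readdir() are not
async-signal-safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TIDS 1024	/* hypothetical cap on the snapshot size */

/* Collect up to max thread IDs from /proc/self/task; returns the count. */
static int list_tids(pid_t *tids, int max)
{
	DIR *dir;
	struct dirent *ent;
	int n = 0;

	assert(tids != NULL && max > 0);
	dir = opendir("/proc/self/task");
	if (!dir)
		return 0;
	while (n < max && (ent = readdir(dir)) != NULL) {
		if (ent->d_name[0] == '.')
			continue;	/* skip "." and ".." */
		tids[n++] = (pid_t)atoi(ent->d_name);
	}
	closedir(dir);
	return n;
}

/*
 * Steps 2 and 3: return 0 if the receiving thread should handle the
 * signal itself, otherwise the TID of a non-main thread chosen by rnd.
 */
static pid_t pick_forward_target(pid_t self, pid_t main_tid,
				 const pid_t *tids, int n, unsigned int rnd)
{
	int others = 0, i, k;

	if (self != main_tid)
		return 0;	/* not the main thread: handle it here */
	for (i = 0; i < n; i++)
		if (tids[i] != main_tid)
			others++;
	if (others == 0)
		return 0;	/* no other thread to forward to */
	k = rnd % others;
	for (i = 0; i < n; i++) {
		if (tids[i] == main_tid)
			continue;
		if (k-- == 0)
			return tids[i];
	}
	return 0;
}

static pid_t g_tids[MAX_TIDS];
static volatile sig_atomic_t g_ntids;
static unsigned int g_rnd = 1;

/* Assumption: called from normal (non-signal) context to refresh the snapshot. */
static void refresh_tids(void)
{
	g_ntids = list_tids(g_tids, MAX_TIDS);
}

/* Step 1: timer signal handler, to be installed with sigaction() by the caller. */
void on_timer_signal(int sig)
{
	int saved_errno = errno;
	pid_t self = (pid_t)syscall(SYS_gettid);
	pid_t target;

	/* Cheap pseudo-random step; rand() is not async-signal-safe. */
	g_rnd = g_rnd * 1103515245u + 12345u;
	target = pick_forward_target(self, getpid(), g_tids, g_ntids, g_rnd >> 16);
	if (target)
		syscall(SYS_tgkill, getpid(), target, sig);
	/* else: do the per-thread work for this timer tick here */
	errno = saved_errno;
}
```

Even in this simplified form, every tick that lands on the main thread
costs a wakeup plus an extra tgkill, and the snapshot can go stale as
threads are created and exit.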

This patch solves both the above issues.

We acknowledge that it is unfortunately hard to attribute this patch to
one clear subsystem and owner: it straddles signal delivery and POSIX
timers territory, and perhaps some scheduling. The patch itself only
touches kernel/signal.c.

If anyone has serious objections, please shout (soonish). Given the
patch has been reviewed by Oleg, and scrutinized by Dmitry and myself,
presumably we need to find a tree that currently takes kernel/signal.c
patches?

Thanks!

-- Marco
