linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] posix-timers: Support delivery of signals to the current thread
@ 2022-12-16 17:18 Dmitry Vyukov
  2023-01-11 15:49 ` Dmitry Vyukov
  2023-01-12 11:24 ` [RFC PATCH v2] " Dmitry Vyukov
  0 siblings, 2 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2022-12-16 17:18 UTC (permalink / raw)
  To: tglx; +Cc: linux-kernel, Dmitry Vyukov, Marco Elver

Support CLOCK_PROCESS_CPUTIME_ID timers with SIGEV_SIGNAL | SIGEV_THREAD_ID
and sigev_notify_thread_id == 0, which delivers the signal to the thread
that is currently running when the timer fires. This is useful for
relatively uniform sampling of process execution across all threads
with signals.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marco Elver <elver@google.com>

---

We are trying to implement a sampling program analysis based on this.
We don't need the 100% uniform sampling that a CPU profiler would need,
but we still need the signals to be reasonably distributed across
the process threads.

Thomas, does the idea look sane to you overall?
Are there any existing alternatives to this?

All alternatives we found are complex and/or have high overhead.
E.g. we considered using a CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
plus a perf_event(PERF_COUNT_SW_TASK_CLOCK) inherited by all threads.
When the timer fires we enable the perf event, use the signals
from the perf event, and then disable the perf event again.
But this has considerable memory overhead (a perf event per thread)
and issues IPIs to all CPUs on perf event enable/disable.

We also considered using a CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
and then manually scanning /proc/self/task/ to select some task at random.
But this is also complex and makes it hard to sample reasonably
proportionally to the activity of threads.

All alternatives are based on CLOCK_PROCESS_CPUTIME_ID in some way,
and it seems that a single CLOCK_PROCESS_CPUTIME_ID timer would be enough
if it could deliver signals to active threads (which is what this patch
does). The analysis we are trying to do is intended for production
systems, so we are aiming for as low an overhead as possible.

If this idea looks sane to you in general, I will add tests, and I am open
to suggestions on the user API (should it be a new SIGEV_CURRENT_THREAD?)
and on how to represent this in struct k_itimer. E.g. we could keep
it_pid=current but add a special flag that says to send the signal to
the current task rather than it_pid. This has the advantage that we can
add the following check to posix_timer_event()
(which would be pretty bad to violate):
WARN_ON(!same_thread_group(pid_task(timr->it_pid, PIDTYPE_PID), current));
---
 kernel/time/posix-timers.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 5dead89308b74..411ba087e0699 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
 int posix_timer_event(struct k_itimer *timr, int si_private)
 {
 	enum pid_type type;
+	struct pid *pid;
 	int ret;
 	/*
 	 * FIXME: if ->sigq is queued we can race with
@@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
 	 */
 	timr->sigq->info.si_sys_private = si_private;
 
+	pid = timr->it_pid ?: task_pid(current);
 	type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
-	ret = send_sigqueue(timr->sigq, timr->it_pid, type);
+	ret = send_sigqueue(timr->sigq, pid, type);
 	/* If we failed to send the signal the timer stops. */
 	return ret > 0;
 }
@@ -428,27 +430,31 @@ static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
 	return ret;
 }
 
-static struct pid *good_sigevent(sigevent_t * event)
+static struct pid *good_sigevent(sigevent_t *event, clockid_t which_clock)
 {
 	struct pid *pid = task_tgid(current);
 	struct task_struct *rtn;
 
 	switch (event->sigev_notify) {
 	case SIGEV_SIGNAL | SIGEV_THREAD_ID:
+		/* This will use the current task for signals. */
+		if (which_clock == CLOCK_PROCESS_CPUTIME_ID &&
+		    !event->sigev_notify_thread_id)
+			return NULL;
 		pid = find_vpid(event->sigev_notify_thread_id);
 		rtn = pid_task(pid, PIDTYPE_PID);
 		if (!rtn || !same_thread_group(rtn, current))
-			return NULL;
+			return ERR_PTR(-ENOENT);
 		fallthrough;
 	case SIGEV_SIGNAL:
 	case SIGEV_THREAD:
 		if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
-			return NULL;
+			return ERR_PTR(-EINVAL);
 		fallthrough;
 	case SIGEV_NONE:
 		return pid;
 	default:
-		return NULL;
+		return ERR_PTR(-EINVAL);
 	}
 }
 
@@ -502,6 +508,7 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
 	struct k_itimer *new_timer;
 	int error, new_timer_id;
 	int it_id_set = IT_ID_NOT_SET;
+	struct pid *pid;
 
 	if (!kc)
 		return -EINVAL;
@@ -527,9 +534,11 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
 
 	if (event) {
 		rcu_read_lock();
-		new_timer->it_pid = get_pid(good_sigevent(event));
+		pid = good_sigevent(event, which_clock);
+		if (!IS_ERR(pid))
+			new_timer->it_pid = get_pid(pid);
 		rcu_read_unlock();
-		if (!new_timer->it_pid) {
+		if (IS_ERR(pid)) {
 			error = -EINVAL;
 			goto out;
 		}

base-commit: 041fae9c105ae342a4245cf1e0dc56a23fbb9d3c
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] posix-timers: Support delivery of signals to the current thread
  2022-12-16 17:18 [RFC PATCH] posix-timers: Support delivery of signals to the current thread Dmitry Vyukov
@ 2023-01-11 15:49 ` Dmitry Vyukov
  2023-01-11 21:28   ` Thomas Gleixner
  2023-01-12 11:24 ` [RFC PATCH v2] " Dmitry Vyukov
  1 sibling, 1 reply; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-11 15:49 UTC (permalink / raw)
  To: tglx; +Cc: linux-kernel, Marco Elver

On Fri, 16 Dec 2022 at 18:18, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> Support CLOCK_PROCESS_CPUTIME_ID timers with SIGEV_SIGNAL | SIGEV_THREAD_ID
> and sigev_notify_thread_id == 0, which sends the signal to the current
> thread that fires the timer. This is useful for relatively uniform sampling
> of process execution across all threads with signals.
>
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marco Elver <elver@google.com>
>
> ---
>
> We are trying to implement sampling program analysis based on this.
> We don't need 100% uniform sampling as a CPU profiler would need,
> but we still need the signals to be reasonably distributed across
> the process threads.
>
> Thomas, does the idea look sane to you overall?
> Are there any existing alternatives to this?

Hi Thomas,

I guess this was lost in your inbox over the holiday period.
Please take a look. Should I mail this as a non-RFC patch?

Thanks

> [...]


* Re: [RFC PATCH] posix-timers: Support delivery of signals to the current thread
  2023-01-11 15:49 ` Dmitry Vyukov
@ 2023-01-11 21:28   ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2023-01-11 21:28 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: linux-kernel, Marco Elver

On Wed, Jan 11 2023 at 16:49, Dmitry Vyukov wrote:
> On Fri, 16 Dec 2022 at 18:18, Dmitry Vyukov <dvyukov@google.com> wrote:
>> We are trying to implement sampling program analysis based on this.
>> We don't need 100% uniform sampling as a CPU profiler would need,
>> but we still need the signals to be reasonably distributed across
>> the process threads.
>>
>> Thomas, does the idea look sane to you overall?
>> Are there any existing alternatives to this?
>
> Hi Thomas,
>
> I guess this was lost in your inbox during the holidays period.
> Please take a look. Should I mail this as a non-RFC patch?

It's not lost, it's just in that huge pile of backlog. I'll get to it in
the next few days.

Though it would probably be good to resend and explicitly CC

       Eric W. Biederman <ebiederm@xmission.com>
       Oleg Nesterov <oleg@redhat.com>
       Frederic Weisbecker <frederic@kernel.org>

Thanks,

        tglx


* [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2022-12-16 17:18 [RFC PATCH] posix-timers: Support delivery of signals to the current thread Dmitry Vyukov
  2023-01-11 15:49 ` Dmitry Vyukov
@ 2023-01-12 11:24 ` Dmitry Vyukov
  2023-01-25 12:43   ` Oleg Nesterov
  2023-01-26 10:51   ` [PATCH v3] posix-timers: Prefer " Dmitry Vyukov
  1 sibling, 2 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-12 11:24 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, Dmitry Vyukov, Eric W . Biederman, Oleg Nesterov,
	Frederic Weisbecker, Marco Elver

Support CLOCK_PROCESS_CPUTIME_ID timers with SIGEV_SIGNAL | SIGEV_THREAD_ID
and sigev_notify_thread_id == 0, which delivers the signal to the thread
that is currently running when the timer fires. This is useful for
relatively uniform sampling of process execution across all threads
with signals.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Marco Elver <elver@google.com>

---

We are trying to implement a sampling program analysis based on this.
We don't need the 100% uniform sampling that a CPU profiler would need,
but we still need the signals to be reasonably distributed across
the process threads.

Thomas, does the idea look sane to you overall?
Are there any existing alternatives to this?

All alternatives we found are complex and/or have high overhead.
E.g. we considered using a CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
plus a perf_event(PERF_COUNT_SW_TASK_CLOCK) inherited by all threads.
When the timer fires we enable the perf event, use the signals
from the perf event, and then disable the perf event again.
But this has considerable memory overhead (a perf event per thread)
and issues IPIs to all CPUs on perf event enable/disable.

We also considered using a CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
and then manually scanning /proc/self/task/ to select some task at random.
But this is also complex and makes it hard to sample reasonably
proportionally to the activity of threads.

All alternatives are based on CLOCK_PROCESS_CPUTIME_ID in some way,
and it seems that a single CLOCK_PROCESS_CPUTIME_ID timer would be enough
if it could deliver signals to active threads (which is what this patch
does). The analysis we are trying to do is intended for production
systems, so we are aiming for as low an overhead as possible.

If this idea looks sane to you in general, I will add tests, and I am open
to suggestions on the user API (should it be a new SIGEV_CURRENT_THREAD?)
and on how to represent this in struct k_itimer. E.g. we could keep
it_pid=current but add a special flag that says to send the signal to
the current task rather than it_pid. This has the advantage that we can
add the following check to posix_timer_event()
(which would be pretty bad to violate):
WARN_ON(!same_thread_group(pid_task(timr->it_pid, PIDTYPE_PID), current));

Changes in RFC v2:
 - added additional Cc as Thomas asked
---
 kernel/time/posix-timers.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 5dead89308b74..411ba087e0699 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
 int posix_timer_event(struct k_itimer *timr, int si_private)
 {
 	enum pid_type type;
+	struct pid *pid;
 	int ret;
 	/*
 	 * FIXME: if ->sigq is queued we can race with
@@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
 	 */
 	timr->sigq->info.si_sys_private = si_private;
 
+	pid = timr->it_pid ?: task_pid(current);
 	type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
-	ret = send_sigqueue(timr->sigq, timr->it_pid, type);
+	ret = send_sigqueue(timr->sigq, pid, type);
 	/* If we failed to send the signal the timer stops. */
 	return ret > 0;
 }
@@ -428,27 +430,31 @@ static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
 	return ret;
 }
 
-static struct pid *good_sigevent(sigevent_t * event)
+static struct pid *good_sigevent(sigevent_t *event, clockid_t which_clock)
 {
 	struct pid *pid = task_tgid(current);
 	struct task_struct *rtn;
 
 	switch (event->sigev_notify) {
 	case SIGEV_SIGNAL | SIGEV_THREAD_ID:
+		/* This will use the current task for signals. */
+		if (which_clock == CLOCK_PROCESS_CPUTIME_ID &&
+		    !event->sigev_notify_thread_id)
+			return NULL;
 		pid = find_vpid(event->sigev_notify_thread_id);
 		rtn = pid_task(pid, PIDTYPE_PID);
 		if (!rtn || !same_thread_group(rtn, current))
-			return NULL;
+			return ERR_PTR(-ENOENT);
 		fallthrough;
 	case SIGEV_SIGNAL:
 	case SIGEV_THREAD:
 		if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
-			return NULL;
+			return ERR_PTR(-EINVAL);
 		fallthrough;
 	case SIGEV_NONE:
 		return pid;
 	default:
-		return NULL;
+		return ERR_PTR(-EINVAL);
 	}
 }
 
@@ -502,6 +508,7 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
 	struct k_itimer *new_timer;
 	int error, new_timer_id;
 	int it_id_set = IT_ID_NOT_SET;
+	struct pid *pid;
 
 	if (!kc)
 		return -EINVAL;
@@ -527,9 +534,11 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
 
 	if (event) {
 		rcu_read_lock();
-		new_timer->it_pid = get_pid(good_sigevent(event));
+		pid = good_sigevent(event, which_clock);
+		if (!IS_ERR(pid))
+			new_timer->it_pid = get_pid(pid);
 		rcu_read_unlock();
-		if (!new_timer->it_pid) {
+		if (IS_ERR(pid)) {
 			error = -EINVAL;
 			goto out;
 		}

base-commit: e8f60cd7db24f94f2dbed6bec30dd16a68fc0828
-- 
2.39.0.314.g84b9a713c41-goog



* Re: [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2023-01-12 11:24 ` [RFC PATCH v2] " Dmitry Vyukov
@ 2023-01-25 12:43   ` Oleg Nesterov
  2023-01-25 15:17     ` Oleg Nesterov
  2023-01-26 10:51   ` [PATCH v3] posix-timers: Prefer " Dmitry Vyukov
  1 sibling, 1 reply; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-25 12:43 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On 01/12, Dmitry Vyukov wrote:
>
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
>  int posix_timer_event(struct k_itimer *timr, int si_private)
>  {
>  	enum pid_type type;
> +	struct pid *pid;
>  	int ret;
>  	/*
>  	 * FIXME: if ->sigq is queued we can race with
> @@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
>  	 */
>  	timr->sigq->info.si_sys_private = si_private;
>
> +	pid = timr->it_pid ?: task_pid(current);
>  	type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;

can't resist... somehow the line above looks confusing to me, perhaps you
can change it to

	type = (timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_PID : PIDTYPE_TGID;

> -static struct pid *good_sigevent(sigevent_t * event)
> +static struct pid *good_sigevent(sigevent_t *event, clockid_t which_clock)
>  {
>  	struct pid *pid = task_tgid(current);
>  	struct task_struct *rtn;
>
>  	switch (event->sigev_notify) {
>  	case SIGEV_SIGNAL | SIGEV_THREAD_ID:
> +		/* This will use the current task for signals. */
> +		if (which_clock == CLOCK_PROCESS_CPUTIME_ID &&
> +		    !event->sigev_notify_thread_id)
> +			return NULL;

this doesn't look right, this skips the "sigev_signo" check below.

Other than that I see nothing wrong in this patch, but I forgot everything
about posix timers many years ago ;)

> @@ -527,9 +534,11 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
>
>  	if (event) {
>  		rcu_read_lock();
> -		new_timer->it_pid = get_pid(good_sigevent(event));
> +		pid = good_sigevent(event, which_clock);
> +		if (!IS_ERR(pid))
> +			new_timer->it_pid = get_pid(pid);

Another cosmetic nit, feel free to ignore... If you change good_sigevent()

	case SIGEV_NONE:
-		return pid;
+		return get_pid(pid);

you can remove this "if (!IS_ERR(pid))" code above.

Oleg.



* Re: [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2023-01-25 12:43   ` Oleg Nesterov
@ 2023-01-25 15:17     ` Oleg Nesterov
  2023-01-25 15:34       ` Dmitry Vyukov
  0 siblings, 1 reply; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-25 15:17 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On 01/25, Oleg Nesterov wrote:
>
> Other than that I see nothing wrong in this patch, but I forgot everything
> about posix timers many years ago ;)

Stupid question. Am I right that in posix_timer_event()

	same_thread_group(current, pid_task(timr->it_pid, PIDTYPE_PID))

is always true?

If yes, perhaps we can do a much simpler change which doesn't even affect API?
See the trivial patch below.

send_sigqueue(PIDTYPE_TGID) can notify any thread in the thread group, so this
doesn't really change the semantics; complete_signal() will just likely choose
"current" as the target for signal_wake_up(), and iiuc this is what we want
for balancing?

Oleg.


diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 5dead89308b7..e38b53a0f814 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
 int posix_timer_event(struct k_itimer *timr, int si_private)
 {
 	enum pid_type type;
+	struct pid *pid;
 	int ret;
 	/*
 	 * FIXME: if ->sigq is queued we can race with
@@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
 	 */
 	timr->sigq->info.si_sys_private = si_private;
 
-	type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
-	ret = send_sigqueue(timr->sigq, timr->it_pid, type);
+	type = (timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_PID : PIDTYPE_TGID;
+	pid = (type == PIDTYPE_PID) ? timr->it_pid : task_pid(current);
+	ret = send_sigqueue(timr->sigq, pid, type);
 	/* If we failed to send the signal the timer stops. */
 	return ret > 0;
 }



* Re: [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2023-01-25 15:17     ` Oleg Nesterov
@ 2023-01-25 15:34       ` Dmitry Vyukov
  2023-01-25 16:31         ` Oleg Nesterov
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-25 15:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On Wed, 25 Jan 2023 at 16:17, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 01/25, Oleg Nesterov wrote:
> >
> > Other than that I see nothing wrong in this patch, but I forgot everything
> > about posix timers many years ago ;)
>
> Stupid question. Am I right that in posix_timer_event()
>
>         same_thread_group(current, pid_task(timr->it_pid, PIDTYPE_PID))
>
> is always true?
>
> If yes, perhaps we can do a much simpler change which doesn't even affect API?
> See the trivial patch below.
>
> send_sigqueue(PIDTYPE_TGID) notify any thread in thread group, so this doesn't
> really change the semantics, just complete_signal() will likely choose "current"
> as a target for signal_wake_up() and iiuc this is what we want for balancing?
>
> Oleg.
>
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 5dead89308b7..e38b53a0f814 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
>  int posix_timer_event(struct k_itimer *timr, int si_private)
>  {
>         enum pid_type type;
> +       struct pid *pid;
>         int ret;
>         /*
>          * FIXME: if ->sigq is queued we can race with
> @@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
>          */
>         timr->sigq->info.si_sys_private = si_private;
>
> -       type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
> -       ret = send_sigqueue(timr->sigq, timr->it_pid, type);
> +       type = (timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_PID : PIDTYPE_TGID;
> +       pid = (type == PIDTYPE_PID) ? timr->it_pid : task_pid(current);
> +       ret = send_sigqueue(timr->sigq, pid, type);
>         /* If we failed to send the signal the timer stops. */
>         return ret > 0;
>  }

Hi Oleg,

This is indeed much simpler!

Do I understand correctly that:
1. I would need to use SIGEV_SIGNAL (without SIGEV_THREAD_ID)
2. The signal is still queued into process shared_pending
3. If the current task has not blocked the signal (it shouldn't), then
it won't kick any other task
4. The current task will likely deliver the signal right on the timer
interrupt return to userspace
?

This changes the existing behavior (the "average bear" may be surprised :))
https://elixir.bootlin.com/linux/v6.2-rc5/source/kernel/signal.c#L1007
But currently it's also queued into shared_pending and any thread
could get the signal anyway. So I think this should be fine.

On the positive side: it should improve performance. Delivering to the
currently running task is better on all fronts (no kicking,
rescheduling, IPIs, better locality), right?

Let me test that it works for my use-case.


* Re: [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2023-01-25 15:34       ` Dmitry Vyukov
@ 2023-01-25 16:31         ` Oleg Nesterov
  2023-01-25 16:45           ` Oleg Nesterov
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-25 16:31 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On 01/25, Dmitry Vyukov wrote:
>
> > diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> > index 5dead89308b7..e38b53a0f814 100644
> > --- a/kernel/time/posix-timers.c
> > +++ b/kernel/time/posix-timers.c
> > @@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
> >  int posix_timer_event(struct k_itimer *timr, int si_private)
> >  {
> >         enum pid_type type;
> > +       struct pid *pid;
> >         int ret;
> >         /*
> >          * FIXME: if ->sigq is queued we can race with
> > @@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
> >          */
> >         timr->sigq->info.si_sys_private = si_private;
> >
> > -       type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
> > -       ret = send_sigqueue(timr->sigq, timr->it_pid, type);
> > +       type = (timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_PID : PIDTYPE_TGID;
> > +       pid = (type == PIDTYPE_PID) ? timr->it_pid : task_pid(current);
> > +       ret = send_sigqueue(timr->sigq, pid, type);
> >         /* If we failed to send the signal the timer stops. */
> >         return ret > 0;
> >  }
>
> Hi Oleg,
>
> This is indeed much simpler!
>
> Do I understand correctly that:
> 1. I would need to use SIGEV_SIGNAL (without SIGEV_THREAD_ID)

Yes,

> 2. The signal is still queued into process shared_pending

Yes. But just in case, please note that if this signal is not realtime
(sigev_signo < SIGRTMIN) and it is already queued, it will be dropped.
And I do not know if this can work for you.

However this is what we already have with SIGEV_SIGNAL w/o SIGEV_THREAD_ID,
and the same is true for SIGEV_THREAD_ID if the signal is already pending in
target_task->pending.

> 3. If the current task has not blocked the signal (it shouldn't), then
> it won't kick any other task

Yes,

> 4. The current task will likely deliver the signal right on the timer
> interrupt return to userspace
> ?

Yes.

But! I just noticed send_sigqueue() does pid_task(pid, type), so the patch
above needs another change


	--- a/kernel/signal.c
	+++ b/kernel/signal.c
	@@ -1970,7 +1970,8 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
	 
		ret = -1;
		rcu_read_lock();
	-	t = pid_task(pid, type);
	+	// comment to explain why we don't use "type"
	+	t = pid_task(pid, PIDTYPE_PID);
		if (!t || !likely(lock_task_sighand(t, &flags)))
			goto ret;
	 


> This changes the existing behavior (the "average bear" may be surprised :))
> https://elixir.bootlin.com/linux/v6.2-rc5/source/kernel/signal.c#L1007

this comment looks a bit misleading, s/main thread/target thread/

> But currnently it's also queued into shared_pending and any thread
> could get the signal anyway. So I think this should be fine.

Yes.

> On the positive side: it should improve performance. Delivering to the
> currently running task is better on all fronts (no kicking,
> rescheduling, IPIs, better locality), right?

Well, iiuc this was the goal of your patch? ;)

Oleg.



* Re: [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2023-01-25 16:31         ` Oleg Nesterov
@ 2023-01-25 16:45           ` Oleg Nesterov
  2023-01-25 17:07           ` Oleg Nesterov
  2023-01-26 10:56           ` Dmitry Vyukov
  2 siblings, 0 replies; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-25 16:45 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On 01/25, Oleg Nesterov wrote:
>
> > 2. The signal is still queued into process shared_pending
>
> Yes. But just in case, please note that if this signal is not realtime
> (sigev_signo < SIGRTMIN) and it is already queued, it will be dropped.
> And I do not know if this can work for you.
>
> However this is what we already have with SIGEV_SIGNAL w/o SIGEV_THREAD_ID,
> and the same is true for SIGEV_THREAD_ID if the signal is already pending in
> target_task->pending.

Damn ;) please ignore. I forgot that send_sigqueue() doesn't check
legacy_queue().

Oleg.



* Re: [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2023-01-25 16:31         ` Oleg Nesterov
  2023-01-25 16:45           ` Oleg Nesterov
@ 2023-01-25 17:07           ` Oleg Nesterov
  2023-01-26 10:58             ` Dmitry Vyukov
  2023-01-26 10:56           ` Dmitry Vyukov
  2 siblings, 1 reply; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-25 17:07 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On 01/25, Oleg Nesterov wrote:
>
> > >  int posix_timer_event(struct k_itimer *timr, int si_private)
> > >  {
> > >         enum pid_type type;
> > > +       struct pid *pid;
> > >         int ret;
> > >         /*
> > >          * FIXME: if ->sigq is queued we can race with
> > > @@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
> > >          */
> > >         timr->sigq->info.si_sys_private = si_private;
> > >
> > > -       type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
> > > -       ret = send_sigqueue(timr->sigq, timr->it_pid, type);
> > > +       type = (timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_PID : PIDTYPE_TGID;
> > > +       pid = (type == PIDTYPE_PID) ? timr->it_pid : task_pid(current);
> > > +       ret = send_sigqueue(timr->sigq, pid, type);
> > >         /* If we failed to send the signal the timer stops. */
> > >         return ret > 0;
> > >  }

...

> But! I just noticed send_sigqueue() does pid_task(pid, type), so the patch
> above needs another change
>
> 	--- a/kernel/signal.c
> 	+++ b/kernel/signal.c
> 	@@ -1970,7 +1970,8 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
> 	 
> 		ret = -1;
> 		rcu_read_lock();
> 	-	t = pid_task(pid, type);
> 	+	// comment to explain why don't we use "type"
> 	+	t = pid_task(pid, PIDTYPE_PID);
> 		if (!t || !likely(lock_task_sighand(t, &flags)))
> 			goto ret;

So, unless I missed something (quite possibly), we do not even need the patch
above. The one-liner change below can work just fine.

Oleg.

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1970,7 +1970,8 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 
 	ret = -1;
 	rcu_read_lock();
-	t = pid_task(pid, type);
+	/* GOOD COMMENT */
+	t = type == PIDTYPE_PID ? pid_task(pid, type) : current;
 	if (!t || !likely(lock_task_sighand(t, &flags)))
 		goto ret;
 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3] posix-timers: Prefer delivery of signals to the current thread
  2023-01-12 11:24 ` [RFC PATCH v2] " Dmitry Vyukov
  2023-01-25 12:43   ` Oleg Nesterov
@ 2023-01-26 10:51   ` Dmitry Vyukov
  2023-01-26 14:46     ` Oleg Nesterov
  2023-01-26 15:41     ` [PATCH v4] " Dmitry Vyukov
  1 sibling, 2 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-26 10:51 UTC (permalink / raw)
  To: tglx, oleg
  Cc: linux-kernel, Dmitry Vyukov, Eric W . Biederman,
	Frederic Weisbecker, Marco Elver

Prefer to deliver signals to the current thread if SIGEV_THREAD_ID
is not set. We used to prefer the main thread, but delivering
to the current thread is faster, allows user-space to sample actual
thread activity for CLOCK_PROCESS_CPUTIME_ID timers, and does not
change the semantics (since we queue into shared_pending, any thread
may receive the signal in both cases).

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Marco Elver <elver@google.com>

---

Changes in v3:
 - switched to a completely different, much simpler implementation
   based on Oleg's idea

Changes in RFC v2:
 - added additional Cc as Thomas asked
---
 kernel/signal.c                               | 19 ++++-
 tools/testing/selftests/timers/posix_timers.c | 73 +++++++++++++++++++
 2 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ae26da61c4d9f..6ed71701f302c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
 	/*
 	 * Now find a thread we can wake up to take the signal off the queue.
 	 *
-	 * If the main thread wants the signal, it gets first crack.
-	 * Probably the least surprising to the average bear.
+	 * Try the suggested task first (may or may not be the main thread).
 	 */
 	if (wants_signal(sig, p))
 		t = p;
@@ -1970,7 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 
 	ret = -1;
 	rcu_read_lock();
+	/*
+	 * This is called from posix timers. SIGEV_THREAD_ID signals
+	 * (type == PIDTYPE_PID) must be delivered to the user-specified
+	 * thread, so we use that and queue into t->pending.
+	 * Non-SIGEV_THREAD_ID signals must be delivered to "the process",
+	 * so we queue into shared_pending, but prefer to deliver to the
+	 * current thread. We used to prefer the main thread, but delivering
+	 * to the current thread is faster, allows user-space to sample
+	 * actual thread activity for CLOCK_PROCESS_CPUTIME_ID timers,
+	 * and does not change the semantics (since we queue into
+	 * shared_pending, any thread may receive the signal in both cases).
+	 * Note: current may be from a completely unrelated process for
+	 * non-cpu-time-based timers, so we must be careful not to kick it.
+	 */
 	t = pid_task(pid, type);
+	if (t && type != PIDTYPE_PID && same_thread_group(t, current))
+		t = current;
 	if (!t || !likely(lock_task_sighand(t, &flags)))
 		goto ret;
 
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index 0ba500056e635..fd3b933a191fe 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -188,6 +188,76 @@ static int check_timer_create(int which)
 	return 0;
 }
 
+int remain;
+__thread int got_signal;
+
+static void *distribution_thr(void *arg) {
+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+	return NULL;
+}
+
+static void distribution_handler(int nr)
+{
+	if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
+		__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
+}
+
+/* Test that all running threads receive CLOCK_PROCESS_CPUTIME_ID signals. */
+static int check_timer_distribution(void)
+{
+	int err, i;
+	timer_t id;
+	const int nthreads = 10;
+	pthread_t threads[nthreads];
+	struct itimerspec val = {
+		.it_value.tv_sec = 0,
+		.it_value.tv_nsec = 1000 * 1000,
+		.it_interval.tv_sec = 0,
+		.it_interval.tv_nsec = 1000 * 1000,
+	};
+
+	printf("Check timer_create() per process signal distribution... ");
+	fflush(stdout);
+
+	remain = nthreads + 1;  /* worker threads + this thread */
+	signal(SIGALRM, distribution_handler);
+	err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
+	if (err < 0) {
+		perror("Can't create timer\n");
+		return -1;
+	}
+	err = timer_settime(id, 0, &val, NULL);
+	if (err < 0) {
+		perror("Can't set timer\n");
+		return -1;
+	}
+
+	for (i = 0; i < nthreads; i++) {
+		if (pthread_create(&threads[i], NULL, distribution_thr, NULL)) {
+			perror("Can't create thread\n");
+			return -1;
+		}
+	}
+
+	/* Wait for all threads to receive the signal. */
+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+
+	for (i = 0; i < nthreads; i++) {
+		if (pthread_join(threads[i], NULL)) {
+			perror("Can't join thread\n");
+			return -1;
+		}
+	}
+
+	if (timer_delete(id)) {
+		perror("Can't delete timer\n");
+		return -1;
+	}
+
+	printf("[OK]\n");
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	printf("Testing posix timers. False negative may happen on CPU execution \n");
@@ -217,5 +287,8 @@ int main(int argc, char **argv)
 	if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
 		return ksft_exit_fail();
 
+	if (check_timer_distribution() < 0)
+		return ksft_exit_fail();
+
 	return ksft_exit_pass();
 }

base-commit: 7c46948a6e9cf47ed03b0d489fde894ad46f1437
-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2023-01-25 16:31         ` Oleg Nesterov
  2023-01-25 16:45           ` Oleg Nesterov
  2023-01-25 17:07           ` Oleg Nesterov
@ 2023-01-26 10:56           ` Dmitry Vyukov
  2 siblings, 0 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-26 10:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On Wed, 25 Jan 2023 at 17:31, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 01/25, Dmitry Vyukov wrote:
> >
> > > diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> > > index 5dead89308b7..e38b53a0f814 100644
> > > --- a/kernel/time/posix-timers.c
> > > +++ b/kernel/time/posix-timers.c
> > > @@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
> > >  int posix_timer_event(struct k_itimer *timr, int si_private)
> > >  {
> > >         enum pid_type type;
> > > +       struct pid *pid;
> > >         int ret;
> > >         /*
> > >          * FIXME: if ->sigq is queued we can race with
> > > @@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
> > >          */
> > >         timr->sigq->info.si_sys_private = si_private;
> > >
> > > -       type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
> > > -       ret = send_sigqueue(timr->sigq, timr->it_pid, type);
> > > +       type = (timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_PID : PIDTYPE_TGID;
> > > +       pid = (type == PIDTYPE_PID) ? timr->it_pid : task_pid(current);
> > > +       ret = send_sigqueue(timr->sigq, pid, type);
> > >         /* If we failed to send the signal the timer stops. */
> > >         return ret > 0;
> > >  }
> >
> > Hi Oleg,
> >
> > This is indeed much simpler!
> >
> > Do I understand correctly that:
> > 1. I would need to use SIGEV_SIGNAL (without SIGEV_THREAD_ID)
>
> Yes,
>
> > 2. The signal is still queued into process shared_pending
>
> Yes. But just in case, please note that if this signal is not realtime
> (sigev_signo < SIGRTMIN) and it is already queued, it will be dropped.
> And I do not know if this can work for you.
>
> However this is what we already have with SIGEV_SIGNAL w/o SIGEV_THREAD_ID,
> and the same is true for SIGEV_THREAD_ID if the signal is already pending in
> target_task->pending.
>
> > 3. If the current task has not blocked the signal (it shouldn't), then
> > it won't kick any other task
>
> Yes,
>
> > 4. The current task will likely deliver the signal right on the timer
> > interrupt return to userspace
> > ?
>
> Yes.
>
> But! I just noticed send_sigqueue() does pid_task(pid, type), so the patch
> above needs another change
>
>
>         --- a/kernel/signal.c
>         +++ b/kernel/signal.c
>         @@ -1970,7 +1970,8 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>
>                 ret = -1;
>                 rcu_read_lock();
>         -       t = pid_task(pid, type);
>         +       // comment to explain why don't we use "type"
>         +       t = pid_task(pid, PIDTYPE_PID);
>                 if (!t || !likely(lock_task_sighand(t, &flags)))
>                         goto ret;
>
>
>
> > This changes the existing behavior (the "average bear" may be surprised :))
> > https://elixir.bootlin.com/linux/v6.2-rc5/source/kernel/signal.c#L1007
>
> this comment looks a bit misleading, s/main thread/target thread/
>
> But currently it's also queued into shared_pending and any thread
> > could get the signal anyway. So I think this should be fine.
>
> Yes.
>
> > On the positive side: it should improve performance. Delivering to the
> > currently running task is better on all fronts (no kicking,
> > rescheduling, IPIs, better locality), right?
>
> Well, IIUC this was the goal of your patch? ;)

No, it actually is not. The actual goal is sampling the activity of
threads. For CLOCK_PROCESS_CPUTIME_ID timers you get signals
proportional to the total activity of all threads (good), but all
signals are delivered to the main thread, without any indication of
which thread caused the signal (questionable).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH v2] posix-timers: Support delivery of signals to the current thread
  2023-01-25 17:07           ` Oleg Nesterov
@ 2023-01-26 10:58             ` Dmitry Vyukov
  0 siblings, 0 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-26 10:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On Wed, 25 Jan 2023 at 18:07, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 01/25, Oleg Nesterov wrote:
> >
> > > >  int posix_timer_event(struct k_itimer *timr, int si_private)
> > > >  {
> > > >         enum pid_type type;
> > > > +       struct pid *pid;
> > > >         int ret;
> > > >         /*
> > > >          * FIXME: if ->sigq is queued we can race with
> > > > @@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
> > > >          */
> > > >         timr->sigq->info.si_sys_private = si_private;
> > > >
> > > > -       type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
> > > > -       ret = send_sigqueue(timr->sigq, timr->it_pid, type);
> > > > +       type = (timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_PID : PIDTYPE_TGID;
> > > > +       pid = (type == PIDTYPE_PID) ? timr->it_pid : task_pid(current);
> > > > +       ret = send_sigqueue(timr->sigq, pid, type);
> > > >         /* If we failed to send the signal the timer stops. */
> > > >         return ret > 0;
> > > >  }
>
> ...
>
> > But! I just noticed send_sigqueue() does pid_task(pid, type), so the patch
> > above needs another change
> >
> >       --- a/kernel/signal.c
> >       +++ b/kernel/signal.c
> >       @@ -1970,7 +1970,8 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
> >
> >               ret = -1;
> >               rcu_read_lock();
> >       -       t = pid_task(pid, type);
> >       +       // comment to explain why don't we use "type"
> >       +       t = pid_task(pid, PIDTYPE_PID);
> >               if (!t || !likely(lock_task_sighand(t, &flags)))
> >                       goto ret;
>
> So, unless I missed something (quite possibly), we do not even need the patch
> above. The one-liner change below can work just fine.
>
> Oleg.
>
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1970,7 +1970,8 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>
>         ret = -1;
>         rcu_read_lock();
> -       t = pid_task(pid, type);
> +       /* GOOD COMMENT */
> +       t = type == PIDTYPE_PID ? pid_task(pid, type) : current;
>         if (!t || !likely(lock_task_sighand(t, &flags)))
>                 goto ret;

This works for my use-case.

However, this is a bit trickier since current may be completely
unrelated to the original timer/process (think of monotonic timers).

Please take a look at v3:
https://lore.kernel.org/lkml/20221216171807.760147-1-dvyukov@google.com/T/#me78642cc096184da681cf91c39932be2bd2b74e1

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3] posix-timers: Prefer delivery of signals to the current thread
  2023-01-26 10:51   ` [PATCH v3] posix-timers: Prefer " Dmitry Vyukov
@ 2023-01-26 14:46     ` Oleg Nesterov
  2023-01-26 15:41     ` [PATCH v4] " Dmitry Vyukov
  1 sibling, 0 replies; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-26 14:46 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On 01/26, Dmitry Vyukov wrote:
>
>  	t = pid_task(pid, type);
> +	if (t && type != PIDTYPE_PID && same_thread_group(t, current))
> +		t = current;
>  	if (!t || !likely(lock_task_sighand(t, &flags)))
>  		goto ret;

Feel free to ignore, this is cosmetic/subjective, but

	t = pid_task(pid, type);
	if (!t)
		goto ret;
	if (type == PIDTYPE_TGID && same_thread_group(t, current))
		t = current;
	if (!likely(lock_task_sighand(t, &flags)))
		goto ret;

looks a bit more readable/clean to me.

LGTM. Let's wait for Thomas's verdict.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-26 10:51   ` [PATCH v3] posix-timers: Prefer " Dmitry Vyukov
  2023-01-26 14:46     ` Oleg Nesterov
@ 2023-01-26 15:41     ` Dmitry Vyukov
  2023-01-26 17:51       ` Marco Elver
  2023-02-20 14:43       ` [PATCH v5] " Dmitry Vyukov
  1 sibling, 2 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-26 15:41 UTC (permalink / raw)
  To: tglx, oleg
  Cc: linux-kernel, Dmitry Vyukov, Eric W . Biederman,
	Frederic Weisbecker, Marco Elver

Prefer to deliver signals to the current thread if SIGEV_THREAD_ID
is not set. We used to prefer the main thread, but delivering
to the current thread is faster, allows user-space to sample actual
thread activity for CLOCK_PROCESS_CPUTIME_ID timers, and does not
change the semantics (since we queue into shared_pending, any thread
may receive the signal in both cases).

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Marco Elver <elver@google.com>

---

Changes in v4:
 - restructured checks in send_sigqueue() as suggested

Changes in v3:
 - switched to a completely different, much simpler implementation
   based on Oleg's idea

Changes in RFC v2:
 - added additional Cc as Thomas asked
---
 kernel/signal.c                               | 23 +++++-
 tools/testing/selftests/timers/posix_timers.c | 73 +++++++++++++++++++
 2 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ae26da61c4d9f..706ad3a21933d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
 	/*
 	 * Now find a thread we can wake up to take the signal off the queue.
 	 *
-	 * If the main thread wants the signal, it gets first crack.
-	 * Probably the least surprising to the average bear.
+	 * Try the suggested task first (may or may not be the main thread).
 	 */
 	if (wants_signal(sig, p))
 		t = p;
@@ -1970,8 +1969,26 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 
 	ret = -1;
 	rcu_read_lock();
+	/*
+	 * This is called from posix timers. SIGEV_THREAD_ID signals
+	 * (type == PIDTYPE_PID) must be delivered to the user-specified
+	 * thread, so we use that and queue into t->pending.
+	 * Non-SIGEV_THREAD_ID signals must be delivered to "the process",
+	 * so we queue into shared_pending, but prefer to deliver to the
+	 * current thread. We used to prefer the main thread, but delivering
+	 * to the current thread is faster, allows user-space to sample
+	 * actual thread activity for CLOCK_PROCESS_CPUTIME_ID timers,
+	 * and does not change the semantics (since we queue into
+	 * shared_pending, any thread may receive the signal in both cases).
+	 * Note: current may be from a completely unrelated process for
+	 * non-cpu-time-based timers, so we must be careful not to kick it.
+	 */
 	t = pid_task(pid, type);
-	if (!t || !likely(lock_task_sighand(t, &flags)))
+	if (!t)
+		goto ret;
+	if (type != PIDTYPE_PID && same_thread_group(t, current))
+		t = current;
+	if (!likely(lock_task_sighand(t, &flags)))
 		goto ret;
 
 	ret = 1; /* the signal is ignored */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index 0ba500056e635..fd3b933a191fe 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -188,6 +188,76 @@ static int check_timer_create(int which)
 	return 0;
 }
 
+int remain;
+__thread int got_signal;
+
+static void *distribution_thr(void *arg) {
+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+	return NULL;
+}
+
+static void distribution_handler(int nr)
+{
+	if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
+		__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
+}
+
+/* Test that all running threads receive CLOCK_PROCESS_CPUTIME_ID signals. */
+static int check_timer_distribution(void)
+{
+	int err, i;
+	timer_t id;
+	const int nthreads = 10;
+	pthread_t threads[nthreads];
+	struct itimerspec val = {
+		.it_value.tv_sec = 0,
+		.it_value.tv_nsec = 1000 * 1000,
+		.it_interval.tv_sec = 0,
+		.it_interval.tv_nsec = 1000 * 1000,
+	};
+
+	printf("Check timer_create() per process signal distribution... ");
+	fflush(stdout);
+
+	remain = nthreads + 1;  /* worker threads + this thread */
+	signal(SIGALRM, distribution_handler);
+	err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
+	if (err < 0) {
+		perror("Can't create timer\n");
+		return -1;
+	}
+	err = timer_settime(id, 0, &val, NULL);
+	if (err < 0) {
+		perror("Can't set timer\n");
+		return -1;
+	}
+
+	for (i = 0; i < nthreads; i++) {
+		if (pthread_create(&threads[i], NULL, distribution_thr, NULL)) {
+			perror("Can't create thread\n");
+			return -1;
+		}
+	}
+
+	/* Wait for all threads to receive the signal. */
+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+
+	for (i = 0; i < nthreads; i++) {
+		if (pthread_join(threads[i], NULL)) {
+			perror("Can't join thread\n");
+			return -1;
+		}
+	}
+
+	if (timer_delete(id)) {
+		perror("Can't delete timer\n");
+		return -1;
+	}
+
+	printf("[OK]\n");
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	printf("Testing posix timers. False negative may happen on CPU execution \n");
@@ -217,5 +287,8 @@ int main(int argc, char **argv)
 	if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
 		return ksft_exit_fail();
 
+	if (check_timer_distribution() < 0)
+		return ksft_exit_fail();
+
 	return ksft_exit_pass();
 }

base-commit: 7c46948a6e9cf47ed03b0d489fde894ad46f1437
-- 
2.39.1.456.gfc5497dd1b-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-26 15:41     ` [PATCH v4] " Dmitry Vyukov
@ 2023-01-26 17:51       ` Marco Elver
  2023-01-26 19:57         ` Thomas Gleixner
  2023-02-20 14:43       ` [PATCH v5] " Dmitry Vyukov
  1 sibling, 1 reply; 26+ messages in thread
From: Marco Elver @ 2023-01-26 17:51 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, oleg, linux-kernel, Eric W . Biederman, Frederic Weisbecker

On Thu, 26 Jan 2023 at 16:41, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> Prefer to deliver signals to the current thread if SIGEV_THREAD_ID
> is not set. We used to prefer the main thread, but delivering
> to the current thread is faster, allows user-space to sample actual
> thread activity for CLOCK_PROCESS_CPUTIME_ID timers, and does not
> change the semantics (since we queue into shared_pending, any thread
> may receive the signal in both cases).
>
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Marco Elver <elver@google.com>
>
> ---
>
> Changes in v4:
>  - restructured checks in send_sigqueue() as suggested
>
> Changes in v3:
>  - switched to the completely different implementation (much simpler)
>    based on the Oleg's idea
>
> Changes in RFC v2:
>  - added additional Cc as Thomas asked
> ---
>  kernel/signal.c                               | 23 +++++-
>  tools/testing/selftests/timers/posix_timers.c | 73 +++++++++++++++++++
>  2 files changed, 93 insertions(+), 3 deletions(-)

Reviewed-by: Marco Elver <elver@google.com>

Nice - and given the test, hopefully this behaviour won't regress in the future.


> diff --git a/kernel/signal.c b/kernel/signal.c
> index ae26da61c4d9f..706ad3a21933d 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
>         /*
>          * Now find a thread we can wake up to take the signal off the queue.
>          *
> -        * If the main thread wants the signal, it gets first crack.
> -        * Probably the least surprising to the average bear.
> +        * Try the suggested task first (may or may not be the main thread).
>          */
>         if (wants_signal(sig, p))
>                 t = p;
> @@ -1970,8 +1969,26 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
>
>         ret = -1;
>         rcu_read_lock();
> +       /*
> +        * This is called from posix timers. SIGEV_THREAD_ID signals
> +        * (type == PIDTYPE_PID) must be delivered to the user-specified
> +        * thread, so we use that and queue into t->pending.
> +        * Non-SIGEV_THREAD_ID signals must be delivered to "the process",
> +        * so we queue into shared_pending, but prefer to deliver to the
> +        * current thread. We used to prefer the main thread, but delivering
> +        * to the current thread is faster, allows user-space to sample
> +        * actual thread activity for CLOCK_PROCESS_CPUTIME_ID timers,
> +        * and does not change the semantics (since we queue into
> +        * shared_pending, any thread may receive the signal in both cases).
> +        * Note: current may be from a completely unrelated process for
> +        * non-cpu-time-based timers, so we must be careful not to kick it.
> +        */
>         t = pid_task(pid, type);
> -       if (!t || !likely(lock_task_sighand(t, &flags)))
> +       if (!t)
> +               goto ret;
> +       if (type != PIDTYPE_PID && same_thread_group(t, current))
> +               t = current;
> +       if (!likely(lock_task_sighand(t, &flags)))
>                 goto ret;
>
>         ret = 1; /* the signal is ignored */
> diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
> index 0ba500056e635..fd3b933a191fe 100644
> --- a/tools/testing/selftests/timers/posix_timers.c
> +++ b/tools/testing/selftests/timers/posix_timers.c
> @@ -188,6 +188,76 @@ static int check_timer_create(int which)
>         return 0;
>  }
>
> +int remain;
> +__thread int got_signal;
> +
> +static void *distribution_thr(void *arg) {
> +       while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> +       return NULL;
> +}
> +
> +static void distribution_handler(int nr)
> +{
> +       if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
> +               __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
> +}
> +
> +/* Test that all running threads receive CLOCK_PROCESS_CPUTIME_ID signals. */
> +static int check_timer_distribution(void)
> +{
> +       int err, i;
> +       timer_t id;
> +       const int nthreads = 10;
> +       pthread_t threads[nthreads];
> +       struct itimerspec val = {
> +               .it_value.tv_sec = 0,
> +               .it_value.tv_nsec = 1000 * 1000,
> +               .it_interval.tv_sec = 0,
> +               .it_interval.tv_nsec = 1000 * 1000,
> +       };
> +
> +       printf("Check timer_create() per process signal distribution... ");
> +       fflush(stdout);
> +
> +       remain = nthreads + 1;  /* worker threads + this thread */
> +       signal(SIGALRM, distribution_handler);
> +       err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
> +       if (err < 0) {
> +               perror("Can't create timer\n");
> +               return -1;
> +       }
> +       err = timer_settime(id, 0, &val, NULL);
> +       if (err < 0) {
> +               perror("Can't set timer\n");
> +               return -1;
> +       }
> +
> +       for (i = 0; i < nthreads; i++) {
> +               if (pthread_create(&threads[i], NULL, distribution_thr, NULL)) {
> +                       perror("Can't create thread\n");
> +                       return -1;
> +               }
> +       }
> +
> +       /* Wait for all threads to receive the signal. */
> +       while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> +
> +       for (i = 0; i < nthreads; i++) {
> +               if (pthread_join(threads[i], NULL)) {
> +                       perror("Can't join thread\n");
> +                       return -1;
> +               }
> +       }
> +
> +       if (timer_delete(id)) {
> +               perror("Can't delete timer\n");
> +               return -1;
> +       }
> +
> +       printf("[OK]\n");
> +       return 0;
> +}
> +
>  int main(int argc, char **argv)
>  {
>         printf("Testing posix timers. False negative may happen on CPU execution \n");
> @@ -217,5 +287,8 @@ int main(int argc, char **argv)
>         if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
>                 return ksft_exit_fail();
>
> +       if (check_timer_distribution() < 0)
> +               return ksft_exit_fail();
> +
>         return ksft_exit_pass();
>  }
>
> base-commit: 7c46948a6e9cf47ed03b0d489fde894ad46f1437
> --
> 2.39.1.456.gfc5497dd1b-goog
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-26 17:51       ` Marco Elver
@ 2023-01-26 19:57         ` Thomas Gleixner
  2023-01-27  6:58           ` Dmitry Vyukov
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2023-01-26 19:57 UTC (permalink / raw)
  To: Marco Elver, Dmitry Vyukov
  Cc: oleg, linux-kernel, Eric W . Biederman, Frederic Weisbecker

On Thu, Jan 26 2023 at 18:51, Marco Elver wrote:
> On Thu, 26 Jan 2023 at 16:41, Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> Prefer to deliver signals to the current thread if SIGEV_THREAD_ID
>> is not set. We used to prefer the main thread, but delivering
>> to the current thread is faster, allows user-space to sample actual
>> thread activity for CLOCK_PROCESS_CPUTIME_ID timers, and does not
>> change the semantics (since we queue into shared_pending, any thread
>> may receive the signal in both cases).
>
> Reviewed-by: Marco Elver <elver@google.com>
>
> Nice - and given the test, hopefully this behaviour won't regress in the future.

The test does not tell much. It just waits until each thread has
received a signal once, which can take quite a while. It says nothing
about the distribution of the signals, which can be randomly skewed
towards a few threads.

Also, for real-world use cases where you have multiple threads with
different periods and runtimes per period, I have a hard time
understanding how that signal measures anything useful.

The most time-consuming thread might actually trigger rarely, while
other short-running threads end up being the ones which cause the timer to fire.

What's the usefulness of this information?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-26 19:57         ` Thomas Gleixner
@ 2023-01-27  6:58           ` Dmitry Vyukov
  2023-01-28 19:56             ` Oleg Nesterov
  2023-02-02  7:36             ` Dmitry Vyukov
  0 siblings, 2 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-27  6:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Marco Elver, oleg, linux-kernel, Eric W . Biederman, Frederic Weisbecker

On Thu, 26 Jan 2023 at 20:57, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, Jan 26 2023 at 18:51, Marco Elver wrote:
> > On Thu, 26 Jan 2023 at 16:41, Dmitry Vyukov <dvyukov@google.com> wrote:
> >>
> >> Prefer to deliver signals to the current thread if SIGEV_THREAD_ID
> >> is not set. We used to prefer the main thread, but delivering
> >> to the current thread is faster, allows user-space to sample actual
> >> thread activity for CLOCK_PROCESS_CPUTIME_ID timers, and does not
> >> change the semantics (since we queue into shared_pending, any thread
> >> may receive the signal in both cases).
> >
> > Reviewed-by: Marco Elver <elver@google.com>
> >
> > Nice - and given the test, hopefully this behaviour won't regress in future.
>
> The test does not tell much. It just waits until each thread has received
> a signal once, which can take quite a while. It says nothing about the
> distribution of the signals, which can be completely randomly skewed
> towards a few threads.
>
> Also, for real-world use cases where you have multiple threads with
> different periods and runtimes per period, I have a hard time
> understanding how that signal measures anything useful.
>
> The most time consuming thread might actually trigger rarely while other
> short threads end up being the ones which cause the timer to fire.
>
> What's the usefulness of this information?
>
> Thanks,
>
>         tglx

Hi Thomas,

Our goal is to sample what threads are doing in production with low
frequency and low overhead. We did not find any reasonable existing
way of doing it on Linux today, as outlined in the RFC version of the
patch (other solutions are either much more complex and/or incur
higher memory and/or CPU overheads):
https://lore.kernel.org/all/20221216171807.760147-1-dvyukov@google.com/

This sampling does not need to be 100% precise as a CPU profiler would
require; high precision generally requires more complexity and overhead.
The emphasis is on production use and low overhead.
Consider that we sample with an O(seconds) interval, so some activities
(those taking less than a second) can go unsampled whatever we do here.
But on the other hand, the intention is to use this over billions of CPU
hours, so at a global scale we still observe more-or-less everything.

Currently, practically all signals are delivered to the main thread, and
the added test does not pass (at least I couldn't wait long enough).
After this change the test passes quickly (within a second for me).
Testing the actual distribution without flaky failures is very hard in
unit tests. After rounds of complaints and deflaking, such tests usually
devolve into roughly what this test is doing -- checking that all threads
get at least something.
If we want to test ultimate fairness, we would need to start with the
scheduler itself. If threads don't get fair fractions of CPU time, then
signals won't be evenly distributed either. I am not sure if there are
unit tests for the scheduler that ensure this in all configurations (e.g.
an uneven ratio of runnable threads to CPUs, running in VMs, etc.).
I agree this test is not perfect, but as I said, it does not pass now.
So it is useful and will detect a future regression in this logic. It
ensures that running threads eventually get signals.

But regardless of our motivation, this change looks like an improvement
in general. Consider performance alone (why wake another thread, possibly
sending an IPI and evicting caches?). Sending the signal to the thread
that overflowed the counter also looks reasonable. For some programs it
may actually give a good picture. Say thread A runs for a prolonged time,
then thread B runs. The program will first get signals in thread A and
then in thread B (instead of getting them on an unrelated thread).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-27  6:58           ` Dmitry Vyukov
@ 2023-01-28 19:56             ` Oleg Nesterov
  2023-01-28 20:15               ` Oleg Nesterov
  2023-01-30  9:00               ` Dmitry Vyukov
  2023-02-02  7:36             ` Dmitry Vyukov
  1 sibling, 2 replies; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-28 19:56 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Thomas Gleixner, Marco Elver, linux-kernel, Eric W . Biederman,
	Frederic Weisbecker

Dmitry,

I agree with what you said, just one note...

On 01/27, Dmitry Vyukov wrote:
>
> After this change the test passes quickly (within a second for me).

yet perhaps it makes sense to slightly change it? It does

	+static void *distribution_thr(void *arg) {
	+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
	+	return NULL;
	+}

so distribution_thr() eats CPU even after this thread gets a signal, and
thus (in theory) it can "steal" cpu_timer_fire() from other threads for
unpredictably long? How about

	-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
	+	while (__atomic_load_n(&got_signal, __ATOMIC_RELAXED));

?

Oleg.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-28 19:56             ` Oleg Nesterov
@ 2023-01-28 20:15               ` Oleg Nesterov
  2023-01-30  9:00               ` Dmitry Vyukov
  1 sibling, 0 replies; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-28 20:15 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Thomas Gleixner, Marco Elver, linux-kernel, Eric W . Biederman,
	Frederic Weisbecker

Forgot to mention ...

On 01/28, Oleg Nesterov wrote:
>
> Dmitry,
>
> I agree with what you said, just one note...
>
> On 01/27, Dmitry Vyukov wrote:
> >
> > After this change the test passes quickly (within a second for me).
>
> yet perhaps it makes sense to slightly change it? It does
>
> 	+static void *distribution_thr(void *arg) {
> 	+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> 	+	return NULL;
> 	+}
>
> so distribution_thr() eats CPU even after this thread gets a signal and thus
> (in theory) it can "steal" cpu_timer_fire() from other threads unpredictably
> long ? How about
>
> 	-	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> 	+	while (__atomic_load_n(&got_signal, __ATOMIC_RELAXED));
>
> ?

Of course, in this case it also makes sense to change the main() function the
same way and add BUG_ON(remain) after the "for (...) pthread_join()" block.

Oleg.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-28 19:56             ` Oleg Nesterov
  2023-01-28 20:15               ` Oleg Nesterov
@ 2023-01-30  9:00               ` Dmitry Vyukov
  2023-01-30 16:49                 ` Oleg Nesterov
  1 sibling, 1 reply; 26+ messages in thread
From: Dmitry Vyukov @ 2023-01-30  9:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Thomas Gleixner, Marco Elver, linux-kernel, Eric W . Biederman,
	Frederic Weisbecker

On Sat, 28 Jan 2023 at 20:56, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Dmitry,
>
> I agree with what you said, just one note...
>
> On 01/27, Dmitry Vyukov wrote:
> >
> > After this change the test passes quickly (within a second for me).
>
> yet perhaps it makes sense to slightly change it? It does
>
>         +static void *distribution_thr(void *arg) {
>         +       while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
>         +       return NULL;
>         +}
>
> so distribution_thr() eats CPU even after this thread gets a signal and thus
> (in theory) it can "steal" cpu_timer_fire() from other threads unpredictably
> long ? How about
>
>         -       while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
>         +       while (__atomic_load_n(&got_signal, __ATOMIC_RELAXED));
> ?

But why?
IIUC this makes the test even "weaker". As Thomas notes it's already
somewhat "weak". And this would make it even "weaker". So if it passes
in the current version, I would keep it as is. It makes sense to relax
it only if it's known to fail sometimes. But it doesn't fail as far as
I know. And the intention is really that the current version must pass
-- all threads must get signals even if other threads are running.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-30  9:00               ` Dmitry Vyukov
@ 2023-01-30 16:49                 ` Oleg Nesterov
  0 siblings, 0 replies; 26+ messages in thread
From: Oleg Nesterov @ 2023-01-30 16:49 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Thomas Gleixner, Marco Elver, linux-kernel, Eric W . Biederman,
	Frederic Weisbecker

On 01/30, Dmitry Vyukov wrote:
>
> On Sat, 28 Jan 2023 at 20:56, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Dmitry,
> >
> > I agree with what you said, just one note...
> >
> > On 01/27, Dmitry Vyukov wrote:
> > >
> > > After this change the test passes quickly (within a second for me).
> >
> > yet perhaps it makes sense to slightly change it? It does
> >
> >         +static void *distribution_thr(void *arg) {
> >         +       while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> >         +       return NULL;
> >         +}
> >
> > so distribution_thr() eats CPU even after this thread gets a signal and thus
> > (in theory) it can "steal" cpu_timer_fire() from other threads unpredictably
> > long ? How about
> >
> >         -       while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
> >         +       while (__atomic_load_n(&got_signal, __ATOMIC_RELAXED));
> > ?
>
> But why?
> IIUC this makes the test even "weaker". As Thomas notes it's already
> somewhat "weak". And this would make it even "weaker".

Not sure I understand why this change would make the test weaker...

IIUC, _in theory_ the test-case can "hang" forever: since all threads
keep running, nothing guarantees that every thread will get a chance to
call cpu_timer_fire() and receive a signal.

With this change that is not possible, and the test-case would still
verify that all threads get a signal.

Nevermind,

> So if it passes
> in the current version, I would keep it as is.

OK, I won't insist, please forget.

Oleg.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
  2023-01-27  6:58           ` Dmitry Vyukov
  2023-01-28 19:56             ` Oleg Nesterov
@ 2023-02-02  7:36             ` Dmitry Vyukov
  1 sibling, 0 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2023-02-02  7:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Marco Elver, oleg, linux-kernel, Eric W . Biederman, Frederic Weisbecker

On Fri, 27 Jan 2023 at 07:58, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, 26 Jan 2023 at 20:57, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Thu, Jan 26 2023 at 18:51, Marco Elver wrote:
> > > On Thu, 26 Jan 2023 at 16:41, Dmitry Vyukov <dvyukov@google.com> wrote:
> > >>
> > >> Prefer to deliver signals to the current thread if SIGEV_THREAD_ID
> > >> is not set. We used to prefer the main thread, but delivering
> > >> to the current thread is both faster and allows sampling of actual
> > >> thread activity for CLOCK_PROCESS_CPUTIME_ID timers, and it does not
> > >> change the semantics (since we queue into shared_pending, all threads
> > >> may receive the signal in both cases).
> > >
> > > Reviewed-by: Marco Elver <elver@google.com>
> > >
> > > Nice - and given the test, hopefully this behaviour won't regress in future.
> >
> > The test does not tell much. It just waits until each thread has received
> > a signal once, which can take quite a while. It says nothing about the
> > distribution of the signals, which can be completely randomly skewed
> > towards a few threads.
> >
> > Also, for real-world use cases where you have multiple threads with
> > different periods and runtimes per period, I have a hard time
> > understanding how that signal measures anything useful.
> >
> > The most time consuming thread might actually trigger rarely while other
> > short threads end up being the ones which cause the timer to fire.
> >
> > What's the usefulness of this information?
> >
> > Thanks,
> >
> >         tglx
>
> Hi Thomas,
>
> Our goal is to sample what threads are doing in production with low
> frequency and low overhead. We did not find any reasonable existing
> way of doing it on Linux today, as outlined in the RFC version of the
> patch (other solutions are either much more complex and/or incur
> higher memory and/or CPU overheads):
> https://lore.kernel.org/all/20221216171807.760147-1-dvyukov@google.com/
>
> This sampling does not need to be 100% precise as a CPU profiler would
> require; high precision generally requires more complexity and overhead.
> The emphasis is on production use and low overhead.
> Consider that we sample with an O(seconds) interval, so some activities
> (those taking less than a second) can go unsampled whatever we do here.
> But on the other hand, the intention is to use this over billions of CPU
> hours, so at a global scale we still observe more-or-less everything.
>
> Currently, practically all signals are delivered to the main thread, and
> the added test does not pass (at least I couldn't wait long enough).
> After this change the test passes quickly (within a second for me).
> Testing the actual distribution without flaky failures is very hard in
> unit tests. After rounds of complaints and deflaking, such tests usually
> devolve into roughly what this test is doing -- checking that all threads
> get at least something.
> If we want to test ultimate fairness, we would need to start with the
> scheduler itself. If threads don't get fair fractions of CPU time, then
> signals won't be evenly distributed either. I am not sure if there are
> unit tests for the scheduler that ensure this in all configurations (e.g.
> an uneven ratio of runnable threads to CPUs, running in VMs, etc.).
> I agree this test is not perfect, but as I said, it does not pass now.
> So it is useful and will detect a future regression in this logic. It
> ensures that running threads eventually get signals.
>
> But regardless of our motivation, this change looks like an improvement
> in general. Consider performance alone (why wake another thread, possibly
> sending an IPI and evicting caches?). Sending the signal to the thread
> that overflowed the counter also looks reasonable. For some programs it
> may actually give a good picture. Say thread A runs for a prolonged time,
> then thread B runs. The program will first get signals in thread A and
> then in thread B (instead of getting them on an unrelated thread).



Hi Thomas,

Has this answered your question? Do you have any other concerns?
If not, please take this into some tree for upstreaming.

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5] posix-timers: Prefer delivery of signals to the current thread
  2023-01-26 15:41     ` [PATCH v4] " Dmitry Vyukov
  2023-01-26 17:51       ` Marco Elver
@ 2023-02-20 14:43       ` Dmitry Vyukov
  2023-02-22 15:19         ` Oleg Nesterov
  1 sibling, 1 reply; 26+ messages in thread
From: Dmitry Vyukov @ 2023-02-20 14:43 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, oleg, Dmitry Vyukov, Eric W . Biederman,
	Frederic Weisbecker, Marco Elver

Prefer to deliver signals to the current thread if SIGEV_THREAD_ID
is not set. We used to prefer the main thread, but delivering
to the current thread is both faster and allows sampling of actual
thread activity for CLOCK_PROCESS_CPUTIME_ID timers, and it does not
change the semantics (since we queue into shared_pending, all threads
may receive the signal in both cases).

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Marco Elver <elver@google.com>

---
Changes in v5:
 - rebased onto v6.2

Changes in v4:
 - restructured checks in send_sigqueue() as suggested

Changes in v3:
 - switched to the completely different implementation (much simpler)
   based on the Oleg's idea

Changes in RFC v2:
 - added additional Cc as Thomas asked
---
 kernel/signal.c                               | 23 +++++-
 tools/testing/selftests/timers/posix_timers.c | 73 +++++++++++++++++++
 2 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ae26da61c4d9f..706ad3a21933d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
 	/*
 	 * Now find a thread we can wake up to take the signal off the queue.
 	 *
-	 * If the main thread wants the signal, it gets first crack.
-	 * Probably the least surprising to the average bear.
+	 * Try the suggested task first (may or may not be the main thread).
 	 */
 	if (wants_signal(sig, p))
 		t = p;
@@ -1970,8 +1969,26 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
 
 	ret = -1;
 	rcu_read_lock();
+	/*
+	 * This is called from posix timers. SIGEV_THREAD_ID signals
+	 * (type == PIDTYPE_PID) must be delivered to the user-specified
+	 * thread, so we use that and queue into t->pending.
+	 * Non-SIGEV_THREAD_ID signals must be delivered to "the process",
+	 * so we queue into shared_pending, but prefer to deliver to the
+	 * current thread. We used to prefer the main thread, but delivering
+	 * to the current thread is both faster, and allows user-space to
+	 * sample actual thread activity for CLOCK_PROCESS_CPUTIME_ID timers,
+	 * and does not change the semantics (since we queue into
+	 * shared_pending, all threads may receive the signal in both cases).
+	 * Note: current may be from a completely unrelated process for
+	 * non-cpu-time-based timers, we must be careful to not kick it.
+	 */
 	t = pid_task(pid, type);
-	if (!t || !likely(lock_task_sighand(t, &flags)))
+	if (!t)
+		goto ret;
+	if (type != PIDTYPE_PID && same_thread_group(t, current))
+		t = current;
+	if (!likely(lock_task_sighand(t, &flags)))
 		goto ret;
 
 	ret = 1; /* the signal is ignored */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index 0ba500056e635..fd3b933a191fe 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -188,6 +188,76 @@ static int check_timer_create(int which)
 	return 0;
 }
 
+int remain;
+__thread int got_signal;
+
+static void *distribution_thr(void *arg) {
+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+	return NULL;
+}
+
+static void distribution_handler(int nr)
+{
+	if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
+		__atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
+}
+
+/* Test that all running threads receive CLOCK_PROCESS_CPUTIME_ID signals. */
+static int check_timer_distribution(void)
+{
+	int err, i;
+	timer_t id;
+	const int nthreads = 10;
+	pthread_t threads[nthreads];
+	struct itimerspec val = {
+		.it_value.tv_sec = 0,
+		.it_value.tv_nsec = 1000 * 1000,
+		.it_interval.tv_sec = 0,
+		.it_interval.tv_nsec = 1000 * 1000,
+	};
+
+	printf("Check timer_create() per process signal distribution... ");
+	fflush(stdout);
+
+	remain = nthreads + 1;  /* worker threads + this thread */
+	signal(SIGALRM, distribution_handler);
+	err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
+	if (err < 0) {
+		perror("Can't create timer\n");
+		return -1;
+	}
+	err = timer_settime(id, 0, &val, NULL);
+	if (err < 0) {
+		perror("Can't set timer\n");
+		return -1;
+	}
+
+	for (i = 0; i < nthreads; i++) {
+		if (pthread_create(&threads[i], NULL, distribution_thr, NULL)) {
+			perror("Can't create thread\n");
+			return -1;
+		}
+	}
+
+	/* Wait for all threads to receive the signal. */
+	while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+
+	for (i = 0; i < nthreads; i++) {
+		if (pthread_join(threads[i], NULL)) {
+			perror("Can't join thread\n");
+			return -1;
+		}
+	}
+
+	if (timer_delete(id)) {
+		perror("Can't delete timer\n");
+		return -1;
+	}
+
+	printf("[OK]\n");
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	printf("Testing posix timers. False negative may happen on CPU execution \n");
@@ -217,5 +287,8 @@ int main(int argc, char **argv)
 	if (check_timer_create(CLOCK_PROCESS_CPUTIME_ID) < 0)
 		return ksft_exit_fail();
 
+	if (check_timer_distribution() < 0)
+		return ksft_exit_fail();
+
 	return ksft_exit_pass();
 }

base-commit: c9c3395d5e3dcc6daee66c6908354d47bf98cb0c
-- 
2.39.2.637.g21b0678d19-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] posix-timers: Prefer delivery of signals to the current thread
  2023-02-20 14:43       ` [PATCH v5] " Dmitry Vyukov
@ 2023-02-22 15:19         ` Oleg Nesterov
  2023-03-14  8:25           ` Dmitry Vyukov
  0 siblings, 1 reply; 26+ messages in thread
From: Oleg Nesterov @ 2023-02-22 15:19 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On 02/20, Dmitry Vyukov wrote:
>
>  kernel/signal.c                               | 23 +++++-
>  tools/testing/selftests/timers/posix_timers.c | 73 +++++++++++++++++++
>  2 files changed, 93 insertions(+), 3 deletions(-)

For the change in kernel/signal.c

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5] posix-timers: Prefer delivery of signals to the current thread
  2023-02-22 15:19         ` Oleg Nesterov
@ 2023-03-14  8:25           ` Dmitry Vyukov
  0 siblings, 0 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2023-03-14  8:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, linux-kernel, Eric W . Biederman, Frederic Weisbecker, Marco Elver

On Wed, 22 Feb 2023 at 16:19, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 02/20, Dmitry Vyukov wrote:
> >
> >  kernel/signal.c                               | 23 +++++-
> >  tools/testing/selftests/timers/posix_timers.c | 73 +++++++++++++++++++
> >  2 files changed, 93 insertions(+), 3 deletions(-)
>
> For the change in kernel/signal.c
>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>

Hi Thomas,

Do you have any remaining concerns about this patch?
Should I drop the test to not confuse people? Then it will be just a
few lines in signal.c.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-03-14  8:27 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-16 17:18 [RFC PATCH] posix-timers: Support delivery of signals to the current thread Dmitry Vyukov
2023-01-11 15:49 ` Dmitry Vyukov
2023-01-11 21:28   ` Thomas Gleixner
2023-01-12 11:24 ` [RFC PATCH v2] " Dmitry Vyukov
2023-01-25 12:43   ` Oleg Nesterov
2023-01-25 15:17     ` Oleg Nesterov
2023-01-25 15:34       ` Dmitry Vyukov
2023-01-25 16:31         ` Oleg Nesterov
2023-01-25 16:45           ` Oleg Nesterov
2023-01-25 17:07           ` Oleg Nesterov
2023-01-26 10:58             ` Dmitry Vyukov
2023-01-26 10:56           ` Dmitry Vyukov
2023-01-26 10:51   ` [PATCH v3] posix-timers: Prefer " Dmitry Vyukov
2023-01-26 14:46     ` Oleg Nesterov
2023-01-26 15:41     ` [PATCH v4] " Dmitry Vyukov
2023-01-26 17:51       ` Marco Elver
2023-01-26 19:57         ` Thomas Gleixner
2023-01-27  6:58           ` Dmitry Vyukov
2023-01-28 19:56             ` Oleg Nesterov
2023-01-28 20:15               ` Oleg Nesterov
2023-01-30  9:00               ` Dmitry Vyukov
2023-01-30 16:49                 ` Oleg Nesterov
2023-02-02  7:36             ` Dmitry Vyukov
2023-02-20 14:43       ` [PATCH v5] " Dmitry Vyukov
2023-02-22 15:19         ` Oleg Nesterov
2023-03-14  8:25           ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).