[RFC PATCH] posix-timers: Support delivery of signals to the current thread

* [RFC PATCH] posix-timers: Support delivery of signals to the current thread
@ 2022-12-16 17:18 Dmitry Vyukov
  2023-01-11 15:49 ` Dmitry Vyukov
  2023-01-12 11:24 ` [RFC PATCH v2] " Dmitry Vyukov
  0 siblings, 2 replies; 26+ messages in thread
From: Dmitry Vyukov @ 2022-12-16 17:18 UTC (permalink / raw)
  To: tglx; +Cc: linux-kernel, Dmitry Vyukov, Marco Elver

Support CLOCK_PROCESS_CPUTIME_ID timers with SIGEV_SIGNAL | SIGEV_THREAD_ID
and sigev_notify_thread_id == 0, which sends the signal to the current
thread that fires the timer. This is useful for relatively uniform sampling
of process execution across all threads with signals.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marco Elver <elver@google.com>

---

We are trying to implement sampling program analysis based on this.
We don't need 100% uniform sampling as a CPU profiler would need,
but we still need the signals to be reasonably distributed across
the process threads.

Thomas, does the idea look sane to you overall?
Are there any existing alternatives to this?

All alternatives we found are complex and/or have high overhead.
E.g. we considered using CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
plus inherited for all threads perf_event(PERF_COUNT_SW_TASK_CLOCK).
When the timer fires we enable the perf event, and then use the signals
from the perf event, and then disable the perf event.
But this has considerable memory overhead (perf event per thread)
and issues IPIs to all CPUs for perf event enable/disable.

We also considered using CLOCK_PROCESS_CPUTIME_ID+SIGEV_SIGNAL timer
and then manually scan /proc/self/task/ and select some task at random.
But this is also complex and makes it hard to do reasonable sampling
proportional to activity of threads.

All alternatives are based on CLOCK_PROCESS_CPUTIME_ID in some way,
and it seems that just a single CLOCK_PROCESS_CPUTIME_ID timer is enough
if it could deliver signals to active threads (what this patch is doing).
The analysis we are trying to do is intended for production systems
so we are aiming at as low overhead as possible.

If this idea looks sane to you in general, I will add tests and I am open
to suggestions on the user API (should it be a new SIGEV_CURRENT_THREAD?)
and on how to represent this in the struct k_itimer. E.g. we could keep
it_pid=current but add a special flag that says to send the signal to
the current task rather than it_pid. This has an advantage that we can
add the following check to posix_timer_event()
(which would be pretty bad to violate):
WARN_ON(!same_thread_group(pid_task(timr->it_pid, PIDTYPE_PID), current));
---
 kernel/time/posix-timers.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 5dead89308b74..411ba087e0699 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -336,6 +336,7 @@ void posixtimer_rearm(struct kernel_siginfo *info)
 int posix_timer_event(struct k_itimer *timr, int si_private)
 {
 	enum pid_type type;
+	struct pid *pid;
 	int ret;
 	/*
 	 * FIXME: if ->sigq is queued we can race with
@@ -350,8 +351,9 @@ int posix_timer_event(struct k_itimer *timr, int si_private)
 	 */
 	timr->sigq->info.si_sys_private = si_private;
 
+	pid = timr->it_pid ?: task_pid(current);
 	type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
-	ret = send_sigqueue(timr->sigq, timr->it_pid, type);
+	ret = send_sigqueue(timr->sigq, pid, type);
 	/* If we failed to send the signal the timer stops. */
 	return ret > 0;
 }
@@ -428,27 +430,31 @@ static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
 	return ret;
 }
 
-static struct pid *good_sigevent(sigevent_t * event)
+static struct pid *good_sigevent(sigevent_t *event, clockid_t which_clock)
 {
 	struct pid *pid = task_tgid(current);
 	struct task_struct *rtn;
 
 	switch (event->sigev_notify) {
 	case SIGEV_SIGNAL | SIGEV_THREAD_ID:
+		/* This will use the current task for signals. */
+		if (which_clock == CLOCK_PROCESS_CPUTIME_ID &&
+		    !event->sigev_notify_thread_id)
+			return NULL;
 		pid = find_vpid(event->sigev_notify_thread_id);
 		rtn = pid_task(pid, PIDTYPE_PID);
 		if (!rtn || !same_thread_group(rtn, current))
-			return NULL;
+			return ERR_PTR(-ENOENT);
 		fallthrough;
 	case SIGEV_SIGNAL:
 	case SIGEV_THREAD:
 		if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
-			return NULL;
+			return ERR_PTR(-EINVAL);
 		fallthrough;
 	case SIGEV_NONE:
 		return pid;
 	default:
-		return NULL;
+		return ERR_PTR(-EINVAL);
 	}
 }
 
@@ -502,6 +508,7 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
 	struct k_itimer *new_timer;
 	int error, new_timer_id;
 	int it_id_set = IT_ID_NOT_SET;
+	struct pid *pid;
 
 	if (!kc)
 		return -EINVAL;
@@ -527,9 +534,11 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
 
 	if (event) {
 		rcu_read_lock();
-		new_timer->it_pid = get_pid(good_sigevent(event));
+		pid = good_sigevent(event, which_clock);
+		if (!IS_ERR(pid))
+			new_timer->it_pid = get_pid(pid);
 		rcu_read_unlock();
-		if (!new_timer->it_pid) {
+		if (IS_ERR(pid)) {
 			error = -EINVAL;
 			goto out;
 		}

base-commit: 041fae9c105ae342a4245cf1e0dc56a23fbb9d3c
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread