* [patch V3 0/3] posix-cpu-timers: Move expiry into task work context
@ 2020-07-30 10:14 Thomas Gleixner
  2020-07-30 10:14 ` [patch V3 1/3] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Thomas Gleixner @ 2020-07-30 10:14 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

This is the 3rd installment of the series which moves the heavy lifting
of posix CPU timer expiry out of hard interrupt context.

Running posix CPU timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take
   sighand lock, which is a 'sleeping spinlock' in RT. The original RT
   approach of offloading the posix CPU timer handling into a high
   priority thread was clumsy and provided no real benefit in general.

 - For fine-grained accounting it's just wrong to run this in the context
   of the timer interrupt because that way process-specific CPU time is
   accounted to the timer interrupt.

 - Long running timer interrupts are caused by a large number of expiring
   timers, which can be created and armed by unprivileged user space (see
   the example below).

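For illustration, such a timer is created with the standard POSIX timer
API. A minimal userspace sketch (the helper name is made up; compile with
-lrt) arming a high frequency per-thread CPU timer:

    #include <signal.h>
    #include <time.h>

    static timer_t tid;

    static void arm_cpu_timer(void)
    {
            /* A real program would install a SIGALRM handler first */
            struct sigevent sev = {
                    .sigev_notify = SIGEV_SIGNAL,
                    .sigev_signo  = SIGALRM,
            };
            /* 1ns expiry and period: fires at effectively every tick */
            struct itimerspec its = {
                    .it_value    = { .tv_sec = 0, .tv_nsec = 1 },
                    .it_interval = { .tv_sec = 0, .tv_nsec = 1 },
            };

            timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &tid);
            timer_settime(tid, 0, &its, NULL);
    }

No privilege is required and a process can create many such timers.
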
There is no hard requirement to expire them in interrupt context.

If the signal is targeted at the task itself then it won't be delivered
before the task returns to user space anyway. If the signal is targeted at
a supervisor process then it might be slightly delayed, but posix CPU
timers are inherently inaccurate because they are tied to the tick.
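
The vehicle to get the heavy lifting out of interrupt context is
task_work: the timer interrupt only notes that expiry is due and queues a
callback which runs before the task returns to user space. A condensed
sketch of the pattern (simplified from patch 2/3, not the literal kernel
code):

    #include <linux/task_work.h>

    static void expiry_work_fn(struct callback_head *head)
    {
            /* Task context; the sleeping sighand lock is fine here on RT */
            handle_posix_cpu_timers(current);
    }

    /* Called from the timer interrupt with interrupts disabled */
    static void defer_expiry(struct task_struct *tsk)
    {
            if (tsk->posix_cputimers_work.scheduled)
                    return;
            tsk->posix_cputimers_work.scheduled = true;
            task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
    }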

This series has no code dependency on the entry/KVM work, but it depends
functionally on KVM handling task work before entering guest mode. It
applies on mainline and passes all tests - except when KVM is active and
timers are armed on the KVM threads. This particular issue is solved once
the entry changes are applied as well. See below.

The previous version can be found here:

    https://lore.kernel.org/r/20200716201923.228696399@linutronix.de

The changes versus V2 are:

    - Address the ordering issues vs. RT (Peter)
    - Move task work head into task struct (Oleg)
    - Drop the last patch which is an optimization and needs more thought
      now with the reworked state handling.

The series is standalone except that KVM does not yet handle task work
before entering a guest. The necessary changes for this are in

    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/entry

and the whole lot (entry + posix CPU timers) is available from:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git timers/posixtimer
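
For testing, that branch can be fetched and checked out with plain git,
e.g.:

    git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git timers/posixtimer
    git checkout FETCH_HEAD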

Thanks,

	tglx

---
 arch/x86/Kconfig               |    1 
 include/linux/posix-timers.h   |   17 +++
 include/linux/sched.h          |    4 
 include/linux/seccomp.h        |    3 
 kernel/entry/common.c          |    4 
 kernel/time/Kconfig            |    9 +
 kernel/time/posix-cpu-timers.c |  216 ++++++++++++++++++++++++++++++++++++-----
 kernel/time/timer.c            |    1 
 8 files changed, 227 insertions(+), 28 deletions(-)


* [patch V3 1/3] posix-cpu-timers: Split run_posix_cpu_timers()
  2020-07-30 10:14 [patch V3 0/3] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
@ 2020-07-30 10:14 ` Thomas Gleixner
  2020-08-06 17:09   ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
  2020-07-30 10:14 ` [patch V3 2/3] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2020-07-30 10:14 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

From: Thomas Gleixner <tglx@linutronix.de>

Split run_posix_cpu_timers() up as a preparatory step to move the heavy
lifting out of interrupt context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/posix-cpu-timers.c |   43 ++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1080,32 +1080,15 @@ static inline bool fastpath_timer_check(
 	return false;
 }
 
-/*
- * This is called from the timer interrupt handler.  The irq handler has
- * already updated our counts.  We need to check if any timers fire now.
- * Interrupts are disabled.
- */
-void run_posix_cpu_timers(void)
+static void __run_posix_cpu_timers(struct task_struct *tsk)
 {
-	struct task_struct *tsk = current;
 	struct k_itimer *timer, *next;
 	unsigned long flags;
 	LIST_HEAD(firing);
 
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * The fast path checks that there are no expired thread or thread
-	 * group timers.  If that's so, just return.
-	 */
-	if (!fastpath_timer_check(tsk))
+	if (!lock_task_sighand(tsk, &flags))
 		return;
 
-	lockdep_posixtimer_enter();
-	if (!lock_task_sighand(tsk, &flags)) {
-		lockdep_posixtimer_exit();
-		return;
-	}
 	/*
 	 * Here we take off tsk->signal->cpu_timers[N] and
 	 * tsk->cpu_timers[N] all the timers that are firing, and
@@ -1147,6 +1130,28 @@ void run_posix_cpu_timers(void)
 			cpu_timer_fire(timer);
 		spin_unlock(&timer->it_lock);
 	}
+}
+
+/*
+ * This is called from the timer interrupt handler.  The irq handler has
+ * already updated our counts.  We need to check if any timers fire now.
+ * Interrupts are disabled.
+ */
+void run_posix_cpu_timers(void)
+{
+	struct task_struct *tsk = current;
+
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * The fast path checks that there are no expired thread or thread
+	 * group timers.  If that's so, just return.
+	 */
+	if (!fastpath_timer_check(tsk))
+		return;
+
+	lockdep_posixtimer_enter();
+	__run_posix_cpu_timers(tsk);
 	lockdep_posixtimer_exit();
 }
 

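As a reading aid (not part of the patch), the resulting call structure
is:

    run_posix_cpu_timers()              /* timer interrupt, irqs disabled */
        fastpath_timer_check()          /* cheap lockless check */
        __run_posix_cpu_timers()        /* heavy lifting under sighand lock */
            check_thread_timers()
            check_process_timers()
            cpu_timer_fire()            /* expiry / signal delivery */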


* [patch V3 2/3] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-30 10:14 [patch V3 0/3] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
  2020-07-30 10:14 ` [patch V3 1/3] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
@ 2020-07-30 10:14 ` Thomas Gleixner
  2020-08-06 17:09   ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
  2020-07-30 10:14 ` [patch V3 3/3] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2020-07-30 10:14 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

From: Thomas Gleixner <tglx@linutronix.de>

Running posix CPU timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take
   sighand lock, which is a 'sleeping spinlock' in RT. The original RT
   approach of offloading the posix CPU timer handling into a high
   priority thread was clumsy and provided no real benefit in general.

 - For fine-grained accounting it's just wrong to run this in the context
   of the timer interrupt because that way process-specific CPU time is
   accounted to the timer interrupt.

 - Long running timer interrupts are caused by a large number of expiring
   timers, which can be created and armed by unprivileged user space.

There is no hard requirement to expire them in interrupt context.

If the signal is targeted at the task itself then it won't be delivered
before the task returns to user space anyway. If the signal is targeted at
a supervisor process then it might be slightly delayed, but posix CPU
timers are inherently inaccurate because they are tied to the tick.

Provide infrastructure to schedule task work which allows splitting the
posix CPU timer code into a quick check in interrupt context and a thread
context expiry and signal delivery function. This has to be enabled by
architectures as it requires that the architecture specific KVM
implementation handles pending task work before entering guest mode.
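
Concretely, enablement means selecting the Kconfig symbol and flushing
task work before VM-entry. As a sketch (the select is literal and comes
from patch 3/3; the KVM side check is conceptual pseudo-code, not the
actual entry code):

    # arch/*/Kconfig
    select HAVE_POSIX_CPU_TIMERS_TASK_WORK

    /* vcpu run loop, conceptually, before entering the guest: */
    if (READ_ONCE(current->task_works))
            task_work_run();        /* runs posix CPU timer expiry */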

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Move the task work head to task_struct and just initialize it on
    init_task (Oleg)
    Address the ordering issues vs. RT and add comments documenting
    them (Peter)
    Drop atomic ops as they are not required anymore
---
 include/linux/posix-timers.h   |   17 +++
 include/linux/sched.h          |    4 
 kernel/time/Kconfig            |    9 +
 kernel/time/posix-cpu-timers.c |  185 ++++++++++++++++++++++++++++++++++++++---
 kernel/time/timer.c            |    1 
 5 files changed, 204 insertions(+), 12 deletions(-)

--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -6,6 +6,7 @@
 #include <linux/list.h>
 #include <linux/alarmtimer.h>
 #include <linux/timerqueue.h>
+#include <linux/task_work.h>
 
 struct kernel_siginfo;
 struct task_struct;
@@ -125,6 +126,16 @@ struct posix_cputimers {
 	unsigned int			expiry_active;
 };
 
+/**
+ * struct posix_cputimers_work - Container for task work based posix CPU timer expiry
+ * @work:	The task work to be scheduled
+ * @scheduled:  @work has been scheduled already, no further processing
+ */
+struct posix_cputimers_work {
+	struct callback_head	work;
+	unsigned int		scheduled;
+};
+
 static inline void posix_cputimers_init(struct posix_cputimers *pct)
 {
 	memset(pct, 0, sizeof(*pct));
@@ -165,6 +176,12 @@ static inline void posix_cputimers_group
 					      u64 cpu_limit) { }
 #endif
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+void posix_cputimers_init_work(void);
+#else
+static inline void posix_cputimers_init_work(void) { }
+#endif
+
 #define REQUEUE_PENDING 1
 
 /**
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -882,6 +882,10 @@ struct task_struct {
 	/* Empty if CONFIG_POSIX_CPUTIMERS=n */
 	struct posix_cputimers		posix_cputimers;
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+	struct posix_cputimers_work	posix_cputimers_work;
+#endif
+
 	/* Process credentials: */
 
 	/* Tracer's credentials at attach: */
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -52,6 +52,15 @@ config GENERIC_CLOCKEVENTS_MIN_ADJUST
 config GENERIC_CMOS_UPDATE
 	bool
 
+# Select to handle posix CPU timers from task_work
+# and not from the timer interrupt context
+config HAVE_POSIX_CPU_TIMERS_TASK_WORK
+	bool
+
+config POSIX_CPU_TIMERS_TASK_WORK
+	bool
+	default y if POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK
+
 if GENERIC_CLOCKEVENTS
 menu "Timers subsystem"
 
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -377,6 +377,7 @@ static int posix_cpu_clock_get(const clo
  */
 static int posix_cpu_timer_create(struct k_itimer *new_timer)
 {
+	static struct lock_class_key posix_cpu_timers_key;
 	struct pid *pid;
 
 	rcu_read_lock();
@@ -386,6 +387,17 @@ static int posix_cpu_timer_create(struct
 		return -EINVAL;
 	}
 
+	/*
+	 * If posix timer expiry is handled in task work context then
+	 * timer::it_lock can be taken without disabling interrupts as all
+	 * other locking happens in task context. This requires a separate
+	 * lock class key otherwise regular posix timer expiry would record
+	 * the lock class being taken in interrupt context and generate a
+	 * false positive warning.
+	 */
+	if (IS_ENABLED(CONFIG_POSIX_CPU_TIMERS_TASK_WORK))
+		lockdep_set_class(&new_timer->it_lock, &posix_cpu_timers_key);
+
 	new_timer->kclock = &clock_posix_cpu;
 	timerqueue_init(&new_timer->it.cpu.node);
 	new_timer->it.cpu.pid = get_pid(pid);
@@ -1080,26 +1092,163 @@ static inline bool fastpath_timer_check(
 	return false;
 }
 
-static void __run_posix_cpu_timers(struct task_struct *tsk)
+static void handle_posix_cpu_timers(struct task_struct *tsk);
+
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+static void posix_cpu_timers_work(struct callback_head *work)
+{
+	handle_posix_cpu_timers(current);
+}
+
+/*
+ * Initialize posix CPU timers task work in init task. Out of line to
+ * keep the callback static and to avoid header recursion hell.
+ */
+void __init posix_cputimers_init_work(void)
+{
+	init_task_work(&current->posix_cputimers_work.work,
+		       posix_cpu_timers_work);
+}
+
+/*
+ * Note: All operations on tsk->posix_cputimers_work.scheduled happen either
+ * in hard interrupt context or in task context with interrupts
+ * disabled. Aside from that the writer/reader interaction is always in the
+ * context of the current task, which means they are strict per CPU.
+ */
+static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
+{
+	return tsk->posix_cputimers_work.scheduled;
+}
+
+static inline void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	if (WARN_ON_ONCE(tsk->posix_cputimers_work.scheduled))
+		return;
+
+	/* Schedule task work to actually expire the timers */
+	tsk->posix_cputimers_work.scheduled = true;
+	task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
+}
+
+static inline bool posix_cpu_timers_enable_work(struct task_struct *tsk,
+						unsigned long start)
+{
+	bool ret = true;
+
+	/*
+	 * On !RT kernels interrupts are disabled while collecting expired
+	 * timers, so no tick can happen and the fast path check can be
+	 * reenabled without further checks.
+	 */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		tsk->posix_cputimers_work.scheduled = false;
+		return true;
+	}
+
+	/*
+	 * On RT enabled kernels ticks can happen while the expired timers
+	 * are collected under sighand lock. But any tick which observes
+	 * the posix_cputimers_work::scheduled flag set does not run the
+	 * fastpath checks. So reenabling the tick work has to be done
+	 * carefully:
+	 *
+	 * Disable interrupts and run the fast path check if jiffies have
+	 * advanced since the collecting of expired timers started. If
+	 * jiffies have not advanced or the fast path check did not find
+	 * newly expired timers, reenable the fast path check in the timer
+	 * interrupt. If there are newly expired timers, return false and
+	 * let the collection loop repeat.
+	 */
+	local_irq_disable();
+	if (start != jiffies && fastpath_timer_check(tsk))
+		ret = false;
+	else
+		tsk->posix_cputimers_work.scheduled = false;
+	local_irq_enable();
+
+	return ret;
+}
+#else /* CONFIG_POSIX_CPU_TIMERS_TASK_WORK */
+static inline void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	lockdep_posixtimer_enter();
+	handle_posix_cpu_timers(tsk);
+	lockdep_posixtimer_exit();
+}
+
+static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
+{
+	return false;
+}
+
+static inline bool posix_cpu_timers_enable_work(struct task_struct *tsk,
+						unsigned long start)
+{
+	return true;
+}
+#endif /* CONFIG_POSIX_CPU_TIMERS_TASK_WORK */
+
+static void handle_posix_cpu_timers(struct task_struct *tsk)
 {
 	struct k_itimer *timer, *next;
-	unsigned long flags;
+	unsigned long flags, start;
 	LIST_HEAD(firing);
 
 	if (!lock_task_sighand(tsk, &flags))
 		return;
 
-	/*
-	 * Here we take off tsk->signal->cpu_timers[N] and
-	 * tsk->cpu_timers[N] all the timers that are firing, and
-	 * put them on the firing list.
-	 */
-	check_thread_timers(tsk, &firing);
+	do {
+		/*
+		 * On RT, locking sighand lock does not disable interrupts,
+		 * so this needs to be careful vs. ticks. Store the current
+		 * jiffies value.
+		 */
+		start = READ_ONCE(jiffies);
+		barrier();
+
+		/*
+		 * Here we take off tsk->signal->cpu_timers[N] and
+		 * tsk->cpu_timers[N] all the timers that are firing, and
+		 * put them on the firing list.
+		 */
+		check_thread_timers(tsk, &firing);
+
+		check_process_timers(tsk, &firing);
 
-	check_process_timers(tsk, &firing);
+		/*
+		 * The above timer checks have updated the expiry cache and
+		 * because nothing can have queued or modified timers after
+		 * sighand lock was taken above it is guaranteed to be
+		 * consistent. So the next timer interrupt fastpath check
+		 * will find valid data.
+		 *
+		 * If timer expiry runs in the timer interrupt context then
+		 * the loop is not relevant as timers will be directly
+		 * expired in interrupt context. The stub function below
+		 * returns always true which allows the compiler to
+		 * optimize the loop out.
+		 *
+		 * If timer expiry is deferred to task work context then
+		 * the following rules apply:
+		 *
+		 * - On !RT kernels no tick can have happened on this CPU
+		 *   after sighand lock was acquired because interrupts are
+		 *   disabled. So reenabling task work before dropping
+		 *   sighand lock and reenabling interrupts is race free.
+		 *
+		 * - On RT kernels ticks might have happened but the tick
+		 *   work ignored posix CPU timer handling because the
+		 *   posix_cputimers_work::scheduled flag is set. Reenabling work
+		 *   must be done very carefully including a check whether
+		 *   ticks have happened since the start of the timer
+		 *   expiry checks. posix_cpu_timers_enable_work() takes
+		 *   care of that and eventually lets the expiry checks
+		 *   run again.
+		 */
+	} while (!posix_cpu_timers_enable_work(tsk, start));
 
 	/*
-	 * We must release these locks before taking any timer's lock.
+	 * We must release sighand lock before taking any timer's lock.
 	 * There is a potential race with timer deletion here, as the
 	 * siglock now protects our private firing list.  We have set
 	 * the firing flag in each timer, so that a deletion attempt
@@ -1117,6 +1266,13 @@ static void __run_posix_cpu_timers(struc
 	list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
 		int cpu_firing;
 
+		/*
+		 * spin_lock() is sufficient here even independent of the
+		 * expiry context. If expiry happens in hard interrupt
+		 * context it's obvious. For task work context it's safe
+		 * because all other operations on timer::it_lock happen in
+		 * task context (syscall or exit).
+		 */
 		spin_lock(&timer->it_lock);
 		list_del_init(&timer->it.cpu.elist);
 		cpu_firing = timer->it.cpu.firing;
@@ -1144,15 +1300,20 @@ void run_posix_cpu_timers(void)
 	lockdep_assert_irqs_disabled();
 
 	/*
+	 * If the actual expiry is deferred to task work context and the
+	 * work is already scheduled there is no point to do anything here.
+	 */
+	if (posix_cpu_timers_work_scheduled(tsk))
+		return;
+
+	/*
 	 * The fast path checks that there are no expired thread or thread
 	 * group timers.  If that's so, just return.
 	 */
 	if (!fastpath_timer_check(tsk))
 		return;
 
-	lockdep_posixtimer_enter();
 	__run_posix_cpu_timers(tsk);
-	lockdep_posixtimer_exit();
 }
 
 /*
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2046,6 +2046,7 @@ static void __init init_timer_cpus(void)
 void __init init_timers(void)
 {
 	init_timer_cpus();
+	posix_cputimers_init_work();
 	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
 }
 

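To make the jiffies check in posix_cpu_timers_enable_work() concrete,
consider this hypothetical RT sequence (an illustration, not a trace):

    task: lock_task_sighand(); start = jiffies           /* 1000 */
    tick: sees ::scheduled set, skips the fastpath       /* jiffies -> 1001 */
    task: collects expired timers; a new timer expires at 1001
    task: posix_cpu_timers_enable_work():
              start != jiffies && fastpath_timer_check() -> true
              -> return false, the collection loop runs again

Only when no tick has happened (start == jiffies) or the fast path finds
nothing new is ::scheduled cleared and the loop left.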


* [patch V3 3/3] x86: Select POSIX_CPU_TIMERS_TASK_WORK
  2020-07-30 10:14 [patch V3 0/3] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
  2020-07-30 10:14 ` [patch V3 1/3] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
  2020-07-30 10:14 ` [patch V3 2/3] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
@ 2020-07-30 10:14 ` Thomas Gleixner
  2020-08-06 17:09   ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
  2020-07-31 12:39 ` [patch V3 0/3] posix-cpu-timers: Move expiry into task work context Oleg Nesterov
  2020-08-04 13:20 ` peterz
  4 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2020-07-30 10:14 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

From: Thomas Gleixner <tglx@linutronix.de>

Move POSIX CPU timer expiry and signal delivery into task context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -209,6 +209,7 @@ config X86
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select MMU_GATHER_RCU_TABLE_FREE		if PARAVIRT
+	select HAVE_POSIX_CPU_TIMERS_TASK_WORK
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
 	select HAVE_FUNCTION_ARG_ACCESS_API



* Re: [patch V3 0/3] posix-cpu-timers: Move expiry into task work context
  2020-07-30 10:14 [patch V3 0/3] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-07-30 10:14 ` [patch V3 3/3] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner
@ 2020-07-31 12:39 ` Oleg Nesterov
  2020-08-04 13:20 ` peterz
  4 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2020-07-31 12:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Eric W. Biederman, Frederic Weisbecker, John Stultz,
	Paolo Bonzini

On 07/30, Thomas Gleixner wrote:
>
>  arch/x86/Kconfig               |    1 
>  include/linux/posix-timers.h   |   17 +++
>  include/linux/sched.h          |    4 
>  include/linux/seccomp.h        |    3 
>  kernel/entry/common.c          |    4 
>  kernel/time/Kconfig            |    9 +
>  kernel/time/posix-cpu-timers.c |  216 ++++++++++++++++++++++++++++++++++++-----
>  kernel/time/timer.c            |    1 
>  8 files changed, 227 insertions(+), 28 deletions(-)

FWIW,

Reviewed-by: Oleg Nesterov <oleg@redhat.com>



* Re: [patch V3 0/3] posix-cpu-timers: Move expiry into task work context
  2020-07-30 10:14 [patch V3 0/3] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
                   ` (3 preceding siblings ...)
  2020-07-31 12:39 ` [patch V3 0/3] posix-cpu-timers: Move expiry into task work context Oleg Nesterov
@ 2020-08-04 13:20 ` peterz
  4 siblings, 0 replies; 9+ messages in thread
From: peterz @ 2020-08-04 13:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

On Thu, Jul 30, 2020 at 12:14:04PM +0200, Thomas Gleixner wrote:

>  arch/x86/Kconfig               |    1 
>  include/linux/posix-timers.h   |   17 +++
>  include/linux/sched.h          |    4 
>  include/linux/seccomp.h        |    3 
>  kernel/entry/common.c          |    4 
>  kernel/time/Kconfig            |    9 +
>  kernel/time/posix-cpu-timers.c |  216 ++++++++++++++++++++++++++++++++++++-----
>  kernel/time/timer.c            |    1 
>  8 files changed, 227 insertions(+), 28 deletions(-)

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* [tip: timers/core] x86: Select POSIX_CPU_TIMERS_TASK_WORK
  2020-07-30 10:14 ` [patch V3 3/3] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner
@ 2020-08-06 17:09   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-08-06 17:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     0099808553ad4f9c04ad7afd966f6d7f470f247f
Gitweb:        https://git.kernel.org/tip/0099808553ad4f9c04ad7afd966f6d7f470f247f
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 30 Jul 2020 12:14:07 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 06 Aug 2020 16:50:59 +02:00

x86: Select POSIX_CPU_TIMERS_TASK_WORK

Move POSIX CPU timer expiry and signal delivery into task context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200730102337.888613724@linutronix.de
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fd03cef..a82e715 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -209,6 +209,7 @@ config X86
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select MMU_GATHER_RCU_TABLE_FREE		if PARAVIRT
+	select HAVE_POSIX_CPU_TIMERS_TASK_WORK
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
 	select HAVE_FUNCTION_ARG_ACCESS_API


* [tip: timers/core] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-30 10:14 ` [patch V3 2/3] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
@ 2020-08-06 17:09   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-08-06 17:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     1fb497dd003009be95ce67689ac800c446b7acc5
Gitweb:        https://git.kernel.org/tip/1fb497dd003009be95ce67689ac800c446b7acc5
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 30 Jul 2020 12:14:06 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 06 Aug 2020 16:50:59 +02:00

posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

Running posix CPU timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take
   sighand lock, which is a 'sleeping spinlock' in RT. The original RT
   approach of offloading the posix CPU timer handling into a high
   priority thread was clumsy and provided no real benefit in general.

 - For fine-grained accounting it's just wrong to run this in the context
   of the timer interrupt because that way process-specific CPU time is
   accounted to the timer interrupt.

 - Long running timer interrupts are caused by a large number of expiring
   timers, which can be created and armed by unprivileged user space.

There is no hard requirement to expire them in interrupt context.

If the signal is targeted at the task itself then it won't be delivered
before the task returns to user space anyway. If the signal is targeted at
a supervisor process then it might be slightly delayed, but posix CPU
timers are inherently inaccurate because they are tied to the tick.

Provide infrastructure to schedule task work which allows splitting the
posix CPU timer code into a quick check in interrupt context and a thread
context expiry and signal delivery function. This has to be enabled by
architectures as it requires that the architecture specific KVM
implementation handles pending task work before entering guest mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200730102337.783470146@linutronix.de
---
 include/linux/posix-timers.h   |  17 +++-
 include/linux/sched.h          |   4 +-
 kernel/time/Kconfig            |   9 ++-
 kernel/time/posix-cpu-timers.c | 185 +++++++++++++++++++++++++++++---
 kernel/time/timer.c            |   1 +-
 5 files changed, 204 insertions(+), 12 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index e3f0f85..896c16d 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -6,6 +6,7 @@
 #include <linux/list.h>
 #include <linux/alarmtimer.h>
 #include <linux/timerqueue.h>
+#include <linux/task_work.h>
 
 struct kernel_siginfo;
 struct task_struct;
@@ -125,6 +126,16 @@ struct posix_cputimers {
 	unsigned int			expiry_active;
 };
 
+/**
+ * struct posix_cputimers_work - Container for task work based posix CPU timer expiry
+ * @work:	The task work to be scheduled
+ * @scheduled:  @work has been scheduled already, no further processing
+ */
+struct posix_cputimers_work {
+	struct callback_head	work;
+	unsigned int		scheduled;
+};
+
 static inline void posix_cputimers_init(struct posix_cputimers *pct)
 {
 	memset(pct, 0, sizeof(*pct));
@@ -165,6 +176,12 @@ static inline void posix_cputimers_group_init(struct posix_cputimers *pct,
 					      u64 cpu_limit) { }
 #endif
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+void posix_cputimers_init_work(void);
+#else
+static inline void posix_cputimers_init_work(void) { }
+#endif
+
 #define REQUEUE_PENDING 1
 
 /**
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 06ec604..e9942ce 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -889,6 +889,10 @@ struct task_struct {
 	/* Empty if CONFIG_POSIX_CPUTIMERS=n */
 	struct posix_cputimers		posix_cputimers;
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+	struct posix_cputimers_work	posix_cputimers_work;
+#endif
+
 	/* Process credentials: */
 
 	/* Tracer's credentials at attach: */
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index fcc4235..a09b1d6 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -52,6 +52,15 @@ config GENERIC_CLOCKEVENTS_MIN_ADJUST
 config GENERIC_CMOS_UPDATE
 	bool
 
+# Select to handle posix CPU timers from task_work
+# and not from the timer interrupt context
+config HAVE_POSIX_CPU_TIMERS_TASK_WORK
+	bool
+
+config POSIX_CPU_TIMERS_TASK_WORK
+	bool
+	default y if POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK
+
 if GENERIC_CLOCKEVENTS
 menu "Timers subsystem"
 
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index e5ad873..a71758e 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -377,6 +377,7 @@ static int posix_cpu_clock_get(const clockid_t clock, struct timespec64 *tp)
  */
 static int posix_cpu_timer_create(struct k_itimer *new_timer)
 {
+	static struct lock_class_key posix_cpu_timers_key;
 	struct pid *pid;
 
 	rcu_read_lock();
@@ -386,6 +387,17 @@ static int posix_cpu_timer_create(struct k_itimer *new_timer)
 		return -EINVAL;
 	}
 
+	/*
+	 * If posix timer expiry is handled in task work context then
+	 * timer::it_lock can be taken without disabling interrupts as all
+	 * other locking happens in task context. This requires a separate
+	 * lock class key otherwise regular posix timer expiry would record
+	 * the lock class being taken in interrupt context and generate a
+	 * false positive warning.
+	 */
+	if (IS_ENABLED(CONFIG_POSIX_CPU_TIMERS_TASK_WORK))
+		lockdep_set_class(&new_timer->it_lock, &posix_cpu_timers_key);
+
 	new_timer->kclock = &clock_posix_cpu;
 	timerqueue_init(&new_timer->it.cpu.node);
 	new_timer->it.cpu.pid = get_pid(pid);
@@ -1080,26 +1092,163 @@ static inline bool fastpath_timer_check(struct task_struct *tsk)
 	return false;
 }
 
-static void __run_posix_cpu_timers(struct task_struct *tsk)
+static void handle_posix_cpu_timers(struct task_struct *tsk);
+
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+static void posix_cpu_timers_work(struct callback_head *work)
+{
+	handle_posix_cpu_timers(current);
+}
+
+/*
+ * Initialize posix CPU timers task work in init task. Out of line to
+ * keep the callback static and to avoid header recursion hell.
+ */
+void __init posix_cputimers_init_work(void)
+{
+	init_task_work(&current->posix_cputimers_work.work,
+		       posix_cpu_timers_work);
+}
+
+/*
+ * Note: All operations on tsk->posix_cputimers_work.scheduled happen either
+ * in hard interrupt context or in task context with interrupts
+ * disabled. Aside from that the writer/reader interaction is always in the
+ * context of the current task, which means they are strict per CPU.
+ */
+static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
+{
+	return tsk->posix_cputimers_work.scheduled;
+}
+
+static inline void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	if (WARN_ON_ONCE(tsk->posix_cputimers_work.scheduled))
+		return;
+
+	/* Schedule task work to actually expire the timers */
+	tsk->posix_cputimers_work.scheduled = true;
+	task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
+}
+
+static inline bool posix_cpu_timers_enable_work(struct task_struct *tsk,
+						unsigned long start)
+{
+	bool ret = true;
+
+	/*
+	 * On !RT kernels interrupts are disabled while collecting expired
+	 * timers, so no tick can happen and the fast path check can be
+	 * reenabled without further checks.
+	 */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		tsk->posix_cputimers_work.scheduled = false;
+		return true;
+	}
+
+	/*
+	 * On RT enabled kernels ticks can happen while the expired timers
+	 * are collected under sighand lock. But any tick which observes
+	 * the posix_cputimers_work::scheduled flag set does not run the
+	 * fastpath checks. So reenabling the tick work has to be done
+	 * carefully:
+	 *
+	 * Disable interrupts and run the fast path check if jiffies have
+	 * advanced since the collecting of expired timers started. If
+	 * jiffies have not advanced or the fast path check did not find
+	 * newly expired timers, reenable the fast path check in the timer
+	 * interrupt. If there are newly expired timers, return false and
+	 * let the collection loop repeat.
+	 */
+	local_irq_disable();
+	if (start != jiffies && fastpath_timer_check(tsk))
+		ret = false;
+	else
+		tsk->posix_cputimers_work.scheduled = false;
+	local_irq_enable();
+
+	return ret;
+}
+#else /* CONFIG_POSIX_CPU_TIMERS_TASK_WORK */
+static inline void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	lockdep_posixtimer_enter();
+	handle_posix_cpu_timers(tsk);
+	lockdep_posixtimer_exit();
+}
+
+static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
+{
+	return false;
+}
+
+static inline bool posix_cpu_timers_enable_work(struct task_struct *tsk,
+						unsigned long start)
+{
+	return true;
+}
+#endif /* CONFIG_POSIX_CPU_TIMERS_TASK_WORK */
+
+static void handle_posix_cpu_timers(struct task_struct *tsk)
 {
 	struct k_itimer *timer, *next;
-	unsigned long flags;
+	unsigned long flags, start;
 	LIST_HEAD(firing);
 
 	if (!lock_task_sighand(tsk, &flags))
 		return;
 
-	/*
-	 * Here we take off tsk->signal->cpu_timers[N] and
-	 * tsk->cpu_timers[N] all the timers that are firing, and
-	 * put them on the firing list.
-	 */
-	check_thread_timers(tsk, &firing);
+	do {
+		/*
+		 * On RT, locking sighand lock does not disable interrupts,
+		 * so this needs to be careful vs. ticks. Store the current
+		 * jiffies value.
+		 */
+		start = READ_ONCE(jiffies);
+		barrier();
 
-	check_process_timers(tsk, &firing);
+		/*
+		 * Here we take off tsk->signal->cpu_timers[N] and
+		 * tsk->cpu_timers[N] all the timers that are firing, and
+		 * put them on the firing list.
+		 */
+		check_thread_timers(tsk, &firing);
+
+		check_process_timers(tsk, &firing);
+
+		/*
+		 * The above timer checks have updated the expiry cache and
+		 * because nothing can have queued or modified timers after
+		 * sighand lock was taken above it is guaranteed to be
+		 * consistent. So the next timer interrupt fastpath check
+		 * will find valid data.
+		 *
+		 * If timer expiry runs in the timer interrupt context then
+		 * the loop is not relevant as timers will be directly
+		 * expired in interrupt context. The stub function below
+		 * always returns true, which allows the compiler to
+		 * optimize the loop out.
+		 *
+		 * If timer expiry is deferred to task work context then
+		 * the following rules apply:
+		 *
+		 * - On !RT kernels no tick can have happened on this CPU
+		 *   after sighand lock was acquired because interrupts are
+		 *   disabled. So reenabling task work before dropping
+		 *   sighand lock and reenabling interrupts is race free.
+		 *
+		 * - On RT kernels ticks might have happened but the tick
+		 *   work ignored posix CPU timer handling because the
+		 *   posix_cputimers_work::scheduled flag is set. Reenabling work
+		 *   must be done very carefully including a check whether
+		 *   ticks have happened since the start of the timer
+		 *   expiry checks. posix_cpu_timers_enable_work() takes
+		 *   care of that and eventually lets the expiry checks
+		 *   run again.
+		 */
+	} while (!posix_cpu_timers_enable_work(tsk, start));
 
 	/*
-	 * We must release these locks before taking any timer's lock.
+	 * We must release sighand lock before taking any timer's lock.
 	 * There is a potential race with timer deletion here, as the
 	 * siglock now protects our private firing list.  We have set
 	 * the firing flag in each timer, so that a deletion attempt
@@ -1117,6 +1266,13 @@ static void __run_posix_cpu_timers(struct task_struct *tsk)
 	list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
 		int cpu_firing;
 
+		/*
+		 * spin_lock() is sufficient here even independent of the
+		 * expiry context. If expiry happens in hard interrupt
+		 * context it's obvious. For task work context it's safe
+		 * because all other operations on timer::it_lock happen in
+		 * task context (syscall or exit).
+		 */
 		spin_lock(&timer->it_lock);
 		list_del_init(&timer->it.cpu.elist);
 		cpu_firing = timer->it.cpu.firing;
@@ -1144,15 +1300,20 @@ void run_posix_cpu_timers(void)
 	lockdep_assert_irqs_disabled();
 
 	/*
+	 * If the actual expiry is deferred to task work context and the
+	 * work is already scheduled there is no point to do anything here.
+	 */
+	if (posix_cpu_timers_work_scheduled(tsk))
+		return;
+
+	/*
 	 * The fast path checks that there are no expired thread or thread
 	 * group timers.  If that's so, just return.
 	 */
 	if (!fastpath_timer_check(tsk))
 		return;
 
-	lockdep_posixtimer_enter();
 	__run_posix_cpu_timers(tsk);
-	lockdep_posixtimer_exit();
 }
 
 /*
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ae5029f..a16764b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2017,6 +2017,7 @@ static void __init init_timer_cpus(void)
 void __init init_timers(void)
 {
 	init_timer_cpus();
+	posix_cputimers_init_work();
 	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
 }
 


* [tip: timers/core] posix-cpu-timers: Split run_posix_cpu_timers()
  2020-07-30 10:14 ` [patch V3 1/3] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
@ 2020-08-06 17:09   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-08-06 17:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     820903c784a01bf6e143253418508da4f5790cff
Gitweb:        https://git.kernel.org/tip/820903c784a01bf6e143253418508da4f5790cff
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 30 Jul 2020 12:14:05 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 06 Aug 2020 16:50:59 +02:00

posix-cpu-timers: Split run_posix_cpu_timers()

Split run_posix_cpu_timers() up as a preparatory step to move the heavy
lifting out of interrupt context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200730102337.677439437@linutronix.de
---
 kernel/time/posix-cpu-timers.c | 43 ++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 1651179..e5ad873 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1080,32 +1080,15 @@ static inline bool fastpath_timer_check(struct task_struct *tsk)
 	return false;
 }
 
-/*
- * This is called from the timer interrupt handler.  The irq handler has
- * already updated our counts.  We need to check if any timers fire now.
- * Interrupts are disabled.
- */
-void run_posix_cpu_timers(void)
+static void __run_posix_cpu_timers(struct task_struct *tsk)
 {
-	struct task_struct *tsk = current;
 	struct k_itimer *timer, *next;
 	unsigned long flags;
 	LIST_HEAD(firing);
 
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * The fast path checks that there are no expired thread or thread
-	 * group timers.  If that's so, just return.
-	 */
-	if (!fastpath_timer_check(tsk))
+	if (!lock_task_sighand(tsk, &flags))
 		return;
 
-	lockdep_posixtimer_enter();
-	if (!lock_task_sighand(tsk, &flags)) {
-		lockdep_posixtimer_exit();
-		return;
-	}
 	/*
 	 * Here we take off tsk->signal->cpu_timers[N] and
 	 * tsk->cpu_timers[N] all the timers that are firing, and
@@ -1147,6 +1130,28 @@ void run_posix_cpu_timers(void)
 			cpu_timer_fire(timer);
 		spin_unlock(&timer->it_lock);
 	}
+}
+
+/*
+ * This is called from the timer interrupt handler.  The irq handler has
+ * already updated our counts.  We need to check if any timers fire now.
+ * Interrupts are disabled.
+ */
+void run_posix_cpu_timers(void)
+{
+	struct task_struct *tsk = current;
+
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * The fast path checks that there are no expired thread or thread
+	 * group timers.  If that's so, just return.
+	 */
+	if (!fastpath_timer_check(tsk))
+		return;
+
+	lockdep_posixtimer_enter();
+	__run_posix_cpu_timers(tsk);
 	lockdep_posixtimer_exit();
 }
 

