linux-kernel.vger.kernel.org archive mirror
* [patch V2 0/5] posix-cpu-timers: Move expiry into task work context
@ 2020-07-16 20:19 Thomas Gleixner
  2020-07-16 20:19 ` [patch V2 1/5] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-16 20:19 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

This is the second attempt at moving the posix CPU timer heavy lifting out
of interrupt context. The initial version can be found here:

   https://lore.kernel.org/r/20190801143250.370326052@linutronix.de

Running posix cpu timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take sighand
   lock, which is a 'sleeping spinlock' in RT

 - For fine grained accounting it's just wrong to run this in the context
   of the timer interrupt because that way process specific CPU time is
   accounted to the timer interrupt.

 - Long running timer interrupts can be caused by a large number of
   expiring timers, which can be created and armed by unprivileged user
   space.

There is no real hard requirement to run the expiry code in hard interrupt
context. The posix CPU timers are an approximation anyway, so having them
expired and evaluated in task work context does not really make them worse.

This version is more or less a resend of the posix CPU timer bits of
V1. The back then observed issue that KVM does not handle pending task work
before going into guest mode has been solved differently. The related
patches have been posted here:

  https://lore.kernel.org/r/20200716182208.180916541@linutronix.de

This patch series has no code dependency on the entry/KVM work, but the
functional dependency vs. KVM exists.

It applies on mainline and passes all tests - except when KVM is active and
timers are armed on the KVM threads. This particular issue is solved when
the entry/KVM series is applied as well.

The entry/KVM changes are available from:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/entry

and the whole lot (entry/kvm + posix CPU timers) is available from:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git timers/posixtimer

Thanks,

	tglx

----
 b/arch/x86/Kconfig             |    1 
 include/linux/posix-timers.h   |   33 ++++--
 include/linux/sched/cputime.h  |    2 
 kernel/time/Kconfig            |    5 
 kernel/time/posix-cpu-timers.c |  213 ++++++++++++++++++++++++++++++-----------
 5 files changed, 192 insertions(+), 62 deletions(-)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [patch V2 1/5] posix-cpu-timers: Split run_posix_cpu_timers()
  2020-07-16 20:19 [patch V2 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
@ 2020-07-16 20:19 ` Thomas Gleixner
  2020-07-16 20:19 ` [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-16 20:19 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

Split it up as a preparatory step to move the heavy lifting out of
interrupt context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/posix-cpu-timers.c |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1080,27 +1080,12 @@ static inline bool fastpath_timer_check(
 	return false;
 }
 
-/*
- * This is called from the timer interrupt handler.  The irq handler has
- * already updated our counts.  We need to check if any timers fire now.
- * Interrupts are disabled.
- */
-void run_posix_cpu_timers(void)
+static void __run_posix_cpu_timers(struct task_struct *tsk)
 {
-	struct task_struct *tsk = current;
 	struct k_itimer *timer, *next;
 	unsigned long flags;
 	LIST_HEAD(firing);
 
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * The fast path checks that there are no expired thread or thread
-	 * group timers.  If that's so, just return.
-	 */
-	if (!fastpath_timer_check(tsk))
-		return;
-
 	lockdep_posixtimer_enter();
 	if (!lock_task_sighand(tsk, &flags)) {
 		lockdep_posixtimer_exit();
@@ -1151,6 +1136,27 @@ void run_posix_cpu_timers(void)
 }
 
 /*
+ * This is called from the timer interrupt handler.  The irq handler has
+ * already updated our counts.  We need to check if any timers fire now.
+ * Interrupts are disabled.
+ */
+void run_posix_cpu_timers(void)
+{
+	struct task_struct *tsk = current;
+
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * The fast path checks that there are no expired thread or thread
+	 * group timers.  If that's so, just return.
+	 */
+	if (!fastpath_timer_check(tsk))
+		return;
+
+	__run_posix_cpu_timers(tsk);
+}
+
+/*
  * Set one of the process-wide special case CPU timers or RLIMIT_CPU.
  * The tsk->sighand->siglock must be held by the caller.
  */



* [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap
  2020-07-16 20:19 [patch V2 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
  2020-07-16 20:19 ` [patch V2 1/5] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
@ 2020-07-16 20:19 ` Thomas Gleixner
  2020-07-21 12:34   ` Frederic Weisbecker
  2020-07-16 20:19 ` [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-16 20:19 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

For splitting the posix cpu timer code into an interrupt context check and
the actual expiry and signal handling it's required to be able to set and
test for the flags atomically.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/posix-timers.h   |   16 +++++++--------
 include/linux/sched/cputime.h  |    2 -
 kernel/time/posix-cpu-timers.c |   43 ++++++++++++++++++-----------------------
 3 files changed, 28 insertions(+), 33 deletions(-)

--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -109,20 +109,20 @@ struct posix_cputimer_base {
 	struct timerqueue_head	tqhead;
 };
 
+enum {
+	CPUTIMERS_ACTIVE,
+	CPUTIMERS_EXPIRING,
+};
+
 /**
  * posix_cputimers - Container for posix CPU timer related data
- * @bases:		Base container for posix CPU clocks
- * @timers_active:	Timers are queued.
- * @expiry_active:	Timer expiry is active. Used for
- *			process wide timers to avoid multiple
- *			task trying to handle expiry concurrently
- *
+ * @bases:	Base container for posix CPU clocks
+ * @flags:	Flags for various CPUTIMERS_* states
  * Used in task_struct and signal_struct
  */
 struct posix_cputimers {
 	struct posix_cputimer_base	bases[CPUCLOCK_MAX];
-	unsigned int			timers_active;
-	unsigned int			expiry_active;
+	unsigned long			flags;
 };
 
 static inline void posix_cputimers_init(struct posix_cputimers *pct)
--- a/include/linux/sched/cputime.h
+++ b/include/linux/sched/cputime.h
@@ -84,7 +84,7 @@ struct thread_group_cputimer *get_runnin
 	 * Check whether posix CPU timers are active. If not the thread
 	 * group accounting is not active either. Lockless check.
 	 */
-	if (!READ_ONCE(tsk->signal->posix_cputimers.timers_active))
+	if (!test_bit(CPUTIMERS_ACTIVE, &tsk->signal->posix_cputimers.flags))
 		return NULL;
 
 	/*
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -25,7 +25,7 @@ void posix_cputimers_group_init(struct p
 	posix_cputimers_init(pct);
 	if (cpu_limit != RLIM_INFINITY) {
 		pct->bases[CPUCLOCK_PROF].nextevt = cpu_limit * NSEC_PER_SEC;
-		pct->timers_active = true;
+		set_bit(CPUTIMERS_ACTIVE, &pct->flags);
 	}
 }
 
@@ -269,7 +269,7 @@ void thread_group_sample_cputime(struct
 	struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
 	struct posix_cputimers *pct = &tsk->signal->posix_cputimers;
 
-	WARN_ON_ONCE(!pct->timers_active);
+	WARN_ON_ONCE(!test_bit(CPUTIMERS_ACTIVE, &pct->flags));
 
 	proc_sample_cputime_atomic(&cputimer->cputime_atomic, samples);
 }
@@ -292,7 +292,7 @@ static void thread_group_start_cputime(s
 	struct posix_cputimers *pct = &tsk->signal->posix_cputimers;
 
 	/* Check if cputimer isn't running. This is accessed without locking. */
-	if (!READ_ONCE(pct->timers_active)) {
+	if (!test_bit(CPUTIMERS_ACTIVE, &pct->flags)) {
 		struct task_cputime sum;
 
 		/*
@@ -304,13 +304,12 @@ static void thread_group_start_cputime(s
 		update_gt_cputime(&cputimer->cputime_atomic, &sum);
 
 		/*
-		 * We're setting timers_active without a lock. Ensure this
-		 * only gets written to in one operation. We set it after
-		 * update_gt_cputime() as a small optimization, but
+		 * We're setting CPUTIMERS_ACTIVE without a lock. We set it
+		 * after update_gt_cputime() as a small optimization, but
 		 * barriers are not required because update_gt_cputime()
 		 * can handle concurrent updates.
 		 */
-		WRITE_ONCE(pct->timers_active, true);
+		set_bit(CPUTIMERS_ACTIVE, &pct->flags);
 	}
 	proc_sample_cputime_atomic(&cputimer->cputime_atomic, samples);
 }
@@ -335,7 +334,7 @@ static u64 cpu_clock_sample_group(const
 	struct posix_cputimers *pct = &p->signal->posix_cputimers;
 	u64 samples[CPUCLOCK_MAX];
 
-	if (!READ_ONCE(pct->timers_active)) {
+	if (!test_bit(CPUTIMERS_ACTIVE, &pct->flags)) {
 		if (start)
 			thread_group_start_cputime(p, samples);
 		else
@@ -862,7 +861,7 @@ static inline void stop_process_timers(s
 	struct posix_cputimers *pct = &sig->posix_cputimers;
 
 	/* Turn off the active flag. This is done without locking. */
-	WRITE_ONCE(pct->timers_active, false);
+	clear_bit(CPUTIMERS_ACTIVE, &pct->flags);
 	tick_dep_clear_signal(sig, TICK_DEP_BIT_POSIX_TIMER);
 }
 
@@ -906,16 +905,11 @@ static void check_process_timers(struct
 	 * RLIMIT_CPU) nothing to check. Also skip the process wide timer
 	 * processing when there is already another task handling them.
 	 */
-	if (!READ_ONCE(pct->timers_active) || pct->expiry_active)
+	if (!test_bit(CPUTIMERS_ACTIVE, &pct->flags) ||
+	    test_and_set_bit(CPUTIMERS_EXPIRING, &pct->flags))
 		return;
 
 	/*
-	 * Signify that a thread is checking for process timers.
-	 * Write access to this field is protected by the sighand lock.
-	 */
-	pct->expiry_active = true;
-
-	/*
 	 * Collect the current process totals. Group accounting is active
 	 * so the sample can be taken directly.
 	 */
@@ -959,7 +953,7 @@ static void check_process_timers(struct
 	if (expiry_cache_is_inactive(pct))
 		stop_process_timers(sig);
 
-	pct->expiry_active = false;
+	clear_bit(CPUTIMERS_EXPIRING, &pct->flags);
 }
 
 /*
@@ -1057,14 +1051,15 @@ static inline bool fastpath_timer_check(
 	 * a fastpath heuristic to determine whether we should try to
 	 * acquire the sighand lock to handle timer expiry.
 	 *
-	 * In the worst case scenario, if concurrently timers_active is set
-	 * or expiry_active is cleared, but the current thread doesn't see
-	 * the change yet, the timer checks are delayed until the next
-	 * thread in the group gets a scheduler interrupt to handle the
-	 * timer. This isn't an issue in practice because these types of
-	 * delays with signals actually getting sent are expected.
+	 * In the worst case scenario, if concurrently CPUTIMERS_ACTIVE is
+	 * set or CPUTIMERS_EXPIRING is cleared, but the current thread
+	 * doesn't see the change yet, the timer checks are delayed until
+	 * the next thread in the group gets a scheduler interrupt to
+	 * handle the timer. This isn't an issue in practice because these
+	 * types of delays with signals actually getting sent are expected.
 	 */
-	if (READ_ONCE(pct->timers_active) && !READ_ONCE(pct->expiry_active)) {
+	if (test_bit(CPUTIMERS_ACTIVE, &pct->flags) &&
+	    !test_bit(CPUTIMERS_EXPIRING, &pct->flags)) {
 		u64 samples[CPUCLOCK_MAX];
 
 		proc_sample_cputime_atomic(&sig->cputimer.cputime_atomic,



* [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-16 20:19 [patch V2 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
  2020-07-16 20:19 ` [patch V2 1/5] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
  2020-07-16 20:19 ` [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap Thomas Gleixner
@ 2020-07-16 20:19 ` Thomas Gleixner
  2020-07-16 22:50   ` Peter Zijlstra
                     ` (2 more replies)
  2020-07-16 20:19 ` [patch V2 4/5] posix-cpu-timers: Expiry timers directly when in task work context Thomas Gleixner
  2020-07-16 20:19 ` [patch V2 5/5] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner
  4 siblings, 3 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-16 20:19 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

Running posix cpu timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take
   sighand lock, which is a 'sleeping spinlock' in RT. The original RT
   approach of offloading the posix CPU timer handling into a high
   priority thread was clumsy and provided no real benefit in general.

 - For fine grained accounting it's just wrong to run this in the context
   of the timer interrupt because that way process specific CPU time is
   accounted to the timer interrupt.

 - Long running timer interrupts can be caused by a large number of
   expiring timers, which can be created and armed by unprivileged user
   space.

There is no hard requirement to expire them in interrupt context.

Provide infrastructure to schedule task work which allows splitting the
posix CPU timer code into a quick check in interrupt context and a thread
context expiry and signal delivery function. This has to be enabled by
architectures as it requires that the architecture specific KVM
implementation handles pending task work before exiting to guest mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/posix-timers.h   |   17 ++++++++++++++++
 kernel/time/Kconfig            |    5 ++++
 kernel/time/posix-cpu-timers.c |   42 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 63 insertions(+), 1 deletion(-)

--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -112,25 +112,42 @@ struct posix_cputimer_base {
 enum {
 	CPUTIMERS_ACTIVE,
 	CPUTIMERS_EXPIRING,
+	CPUTIMERS_WORK_SCHEDULED,
 };
 
 /**
  * posix_cputimers - Container for posix CPU timer related data
  * @bases:	Base container for posix CPU clocks
  * @flags:	Flags for various CPUTIMERS_* states
+ * @task_work:	Task work to defer timer expiry into task context
  * Used in task_struct and signal_struct
  */
 struct posix_cputimers {
 	struct posix_cputimer_base	bases[CPUCLOCK_MAX];
 	unsigned long			flags;
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+	struct callback_head		task_work;
+#endif
 };
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+void posix_cpu_timers_work(struct callback_head *work);
+
+static inline void posix_cputimer_init_work(struct posix_cputimers *pct)
+{
+	pct->task_work.func = posix_cpu_timers_work;
+}
+#else
+static inline void posix_cputimer_init_work(struct posix_cputimers *pct) { }
+#endif
+
 static inline void posix_cputimers_init(struct posix_cputimers *pct)
 {
 	memset(pct, 0, sizeof(*pct));
 	pct->bases[0].nextevt = U64_MAX;
 	pct->bases[1].nextevt = U64_MAX;
 	pct->bases[2].nextevt = U64_MAX;
+	posix_cputimer_init_work(pct);
 }
 
 void posix_cputimers_group_init(struct posix_cputimers *pct, u64 cpu_limit);
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -52,6 +52,11 @@ config GENERIC_CLOCKEVENTS_MIN_ADJUST
 config GENERIC_CMOS_UPDATE
 	bool
 
+# Select to handle posix CPU timers from task_work
+# and not from the timer interrupt context
+config POSIX_CPU_TIMERS_TASK_WORK
+	bool
+
 if GENERIC_CLOCKEVENTS
 menu "Timers subsystem"
 
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -14,6 +14,7 @@
 #include <linux/tick.h>
 #include <linux/workqueue.h>
 #include <linux/compat.h>
+#include <linux/task_work.h>
 #include <linux/sched/deadline.h>
 
 #include "posix-timers.h"
@@ -1075,7 +1076,9 @@ static inline bool fastpath_timer_check(
 	return false;
 }
 
-static void __run_posix_cpu_timers(struct task_struct *tsk)
+static inline void posix_cpu_timers_enable_work(struct task_struct *tsk);
+
+static void handle_posix_cpu_timers(struct task_struct *tsk)
 {
 	struct k_itimer *timer, *next;
 	unsigned long flags;
@@ -1096,6 +1099,12 @@ static void __run_posix_cpu_timers(struc
 	check_process_timers(tsk, &firing);
 
 	/*
+	 * Allow new work to be scheduled. The expiry cache
+	 * is up to date.
+	 */
+	posix_cpu_timers_enable_work(tsk);
+
+	/*
 	 * We must release these locks before taking any timer's lock.
 	 * There is a potential race with timer deletion here, as the
 	 * siglock now protects our private firing list.  We have set
@@ -1130,6 +1139,37 @@ static void __run_posix_cpu_timers(struc
 	lockdep_posixtimer_exit();
 }
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+
+void posix_cpu_timers_work(struct callback_head *work)
+{
+	handle_posix_cpu_timers(current);
+}
+
+static void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	struct posix_cputimers *pct = &tsk->posix_cputimers;
+
+	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
+		task_work_add(tsk, &pct->task_work, true);
+}
+
+static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
+{
+	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
+}
+
+#else
+
+static void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	handle_posix_cpu_timers(tsk);
+}
+
+static inline void posix_cpu_timers_enable_work(struct task_struct *tsk) { }
+
+#endif
+
 /*
  * This is called from the timer interrupt handler.  The irq handler has
  * already updated our counts.  We need to check if any timers fire now.



* [patch V2 4/5] posix-cpu-timers: Expiry timers directly when in task work context
  2020-07-16 20:19 [patch V2 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-07-16 20:19 ` [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
@ 2020-07-16 20:19 ` Thomas Gleixner
  2020-07-16 20:19 ` [patch V2 5/5] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner
  4 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-16 20:19 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

If the expiry happens in task context, there is no point in collecting the
expired timers on a list first. Just expire them directly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/posix-cpu-timers.c |   92 +++++++++++++++++++++++++++++++++++------
 1 file changed, 79 insertions(+), 13 deletions(-)

--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -753,13 +753,71 @@ static void posix_cpu_timer_get(struct k
 
 #define MAX_COLLECTED	20
 
-static u64 collect_timerqueue(struct timerqueue_head *head,
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+/*
+ * Expiry in task work context. Access to sighand->siglock is safe here.
+ */
+static void handle_expired_timer(struct cpu_timer *ctmr,
+				 struct list_head *firing)
+{
+	struct k_itimer *timer;
+	int cpu_firing;
+
+	/*
+	 * Unlock sighand lock so the timer can be locked. Keep interrupts
+	 * disabled across the lock switch.
+	 */
+	spin_unlock(&current->sighand->siglock);
+	timer = container_of(ctmr, struct k_itimer, it.cpu);
+	spin_lock(&timer->it_lock);
+	cpu_firing = timer->it.cpu.firing;
+	timer->it.cpu.firing = 0;
+	/*
+	 * The firing flag is -1 if this raced with a reset of the timer,
+	 * which already reported this almost-firing as an overrun.  So
+	 * don't generate an event.
+	 */
+	if (likely(cpu_firing >= 0))
+		cpu_timer_fire(timer);
+	/*
+	 * Drop timer lock again and reacquire sighand lock. Allow
+	 * interrupts to come in between so this won't block interrupts
+	 * across the delivery of a gazillion timers.
+	 */
+	spin_unlock_irq(&timer->it_lock);
+	spin_lock_irq(&current->sighand->siglock);
+}
+#else
+/*
+ * Expiry in interrupt context. Just move them to the firing list.
+ */
+static void handle_expired_timer(struct cpu_timer *ctmr,
+				 struct list_head *firing)
+{
+	list_add_tail(&ctmr->elist, firing);
+}
+#endif
+
+static u64 collect_timerqueue(struct posix_cputimer_base *base,
 			      struct list_head *firing, u64 now)
 {
 	struct timerqueue_node *next;
 	int i = 0;
 
-	while ((next = timerqueue_getnext(head))) {
+	/*
+	 * Reset the expiry cache first when expiry context is task work.
+	 * This is required because when sighand lock is dropped new timers
+	 * can be enqueued. That's not a problem for regular posix timers
+	 * as the expiry time would be correct when expire_timerqueue()
+	 * returns, but the expiry cache is also used by itimers which do
+	 * not have a corresponding posix timer and therefore a simple
+	 * update after expire_timerqueue() might overwrite their newly
+	 * written expiry time.
+	 */
+	if (IS_ENABLED(CONFIG_POSIX_CPU_TIMERS_TASK_WORK))
+		base->nextevt = U64_MAX;
+
+	while ((next = timerqueue_getnext(&base->tqhead))) {
 		struct cpu_timer *ctmr;
 		u64 expires;
 
@@ -771,7 +829,7 @@ static u64 collect_timerqueue(struct tim
 
 		ctmr->firing = 1;
 		cpu_timer_dequeue(ctmr);
-		list_add_tail(&ctmr->elist, firing);
+		handle_expired_timer(ctmr, firing);
 	}
 
 	return U64_MAX;
@@ -783,10 +841,8 @@ static void collect_posix_cputimers(stru
 	struct posix_cputimer_base *base = pct->bases;
 	int i;
 
-	for (i = 0; i < CPUCLOCK_MAX; i++, base++) {
-		base->nextevt = collect_timerqueue(&base->tqhead, firing,
-						    samples[i]);
-	}
+	for (i = 0; i < CPUCLOCK_MAX; i++, base++)
+		base->nextevt = collect_timerqueue(base, firing, samples[i]);
 }
 
 static inline void check_dl_overrun(struct task_struct *tsk)
@@ -812,9 +868,10 @@ static bool check_rlimit(u64 time, u64 l
 }
 
 /*
- * Check for any per-thread CPU timers that have fired and move them off
- * the tsk->cpu_timers[N] list onto the firing list.  Here we update the
- * tsk->it_*_expires values to reflect the remaining thread CPU timers.
+ * Check for any per-thread CPU timers that have fired and depending on the
+ * context (task work or interrupt) move them off the tsk->cpu_timers[N]
+ * list onto the firing list or expire them directly.  Update the expiry
+ * cache as well to reflect the remaining thread CPU timers.
  */
 static void check_thread_timers(struct task_struct *tsk,
 				struct list_head *firing)
@@ -889,9 +946,11 @@ static void check_cpu_itimer(struct task
 }
 
 /*
- * Check for any per-thread CPU timers that have fired and move them
- * off the tsk->*_timers list onto the firing list.  Per-thread timers
- * have already been taken off.
+ * Check for any per-process CPU timers that have fired and depending on
+ * the context (task work or interrupt) move them off the tsk->signal timer
+ * list onto the firing list or expire them directly.  Update the expiry
+ * cache to reflect the resulting state. Per-thread timers have already
+ * been handled.
  */
 static void check_process_timers(struct task_struct *tsk,
 				 struct list_head *firing)
@@ -1115,6 +1174,12 @@ static void handle_posix_cpu_timers(stru
 	unlock_task_sighand(tsk, &flags);
 
 	/*
+	 * If task work delivery is enabled, the timers are already
+	 * expired.
+	 */
+	if (IS_ENABLED(CONFIG_POSIX_CPU_TIMERS_TASK_WORK))
+		goto out;
+	/*
 	 * Now that all the timers on our list have the firing flag,
 	 * no one will touch their list entries but us.  We'll take
 	 * each timer's lock before clearing its firing flag, so no
@@ -1136,6 +1201,7 @@ static void handle_posix_cpu_timers(stru
 			cpu_timer_fire(timer);
 		spin_unlock(&timer->it_lock);
 	}
+out:
 	lockdep_posixtimer_exit();
 }
 



* [patch V2 5/5] x86: Select POSIX_CPU_TIMERS_TASK_WORK
  2020-07-16 20:19 [patch V2 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
                   ` (3 preceding siblings ...)
  2020-07-16 20:19 ` [patch V2 4/5] posix-cpu-timers: Expiry timers directly when in task work context Thomas Gleixner
@ 2020-07-16 20:19 ` Thomas Gleixner
  4 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-16 20:19 UTC (permalink / raw)
  To: LKML
  Cc: x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

Move POSIX CPU timer expiry and signal delivery into task context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -224,6 +224,7 @@ config X86
 	select NEED_SG_DMA_LENGTH
 	select PCI_DOMAINS			if PCI
 	select PCI_LOCKLESS_CONFIG		if PCI
+	select POSIX_CPU_TIMERS_TASK_WORK
 	select PERF_EVENTS
 	select RTC_LIB
 	select RTC_MC146818_LIB



* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-16 20:19 ` [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
@ 2020-07-16 22:50   ` Peter Zijlstra
  2020-07-17 18:37     ` Thomas Gleixner
  2020-07-23  1:03     ` Frederic Weisbecker
  2020-07-16 22:54   ` Peter Zijlstra
  2020-07-17 17:26   ` Oleg Nesterov
  2 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2020-07-16 22:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:

> @@ -1096,6 +1099,12 @@ static void __run_posix_cpu_timers(struc
>  	check_process_timers(tsk, &firing);
>  
>  	/*
> +	 * Allow new work to be scheduled. The expiry cache
> +	 * is up to date.
> +	 */
> +	posix_cpu_timers_enable_work(tsk);
> +
> +	/*
>  	 * We must release these locks before taking any timer's lock.
>  	 * There is a potential race with timer deletion here, as the
>  	 * siglock now protects our private firing list.  We have set

I think I would feel more comfortable if this was done at the very
beginning of that function, possibly even with:

> +static void __run_posix_cpu_timers(struct task_struct *tsk)
> +{
> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
> +
> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
> +		task_work_add(tsk, &pct->task_work, true);
> +}
> +
> +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
> +{
> +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
	/*
	 * Ensure we observe everything before a failing test_and_set()
	 * in __run_posix_cpu_timers().
	 */
	smp_mb__after_atomic();
> +}

Such that when another timer interrupt happens while we run this, we're
guaranteed to either see it, or get re-queued and thus re-run the
function.


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-16 20:19 ` [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
  2020-07-16 22:50   ` Peter Zijlstra
@ 2020-07-16 22:54   ` Peter Zijlstra
  2020-07-17 18:38     ` Thomas Gleixner
  2020-07-17 17:26   ` Oleg Nesterov
  2 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2020-07-16 22:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
> +static void __run_posix_cpu_timers(struct task_struct *tsk)
> +{
> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
> +
> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
> +		task_work_add(tsk, &pct->task_work, true);

s/true/TWA_RESUME/g

see: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")

> +}


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-16 20:19 ` [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
  2020-07-16 22:50   ` Peter Zijlstra
  2020-07-16 22:54   ` Peter Zijlstra
@ 2020-07-17 17:26   ` Oleg Nesterov
  2020-07-17 18:35     ` Thomas Gleixner
  2 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2020-07-17 17:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Eric W. Biederman, Frederic Weisbecker, John Stultz,
	Paolo Bonzini

Looks correct to me, but I forgot everything about posix-timers.c

this obviously means that the expired timer won't fire until the
task returns to user-mode but probably we don't care.

One cosmetic nit below,

On 07/16, Thomas Gleixner wrote:
>
> +#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
> +void posix_cpu_timers_work(struct callback_head *work);
> +
> +static inline void posix_cputimer_init_work(struct posix_cputimers *pct)
> +{
> +	pct->task_work.func = posix_cpu_timers_work;

init_task_work() ?

> +}
> +#else
> +static inline void posix_cputimer_init_work(struct posix_cputimers *pct) { }
> +#endif
> +
>  static inline void posix_cputimers_init(struct posix_cputimers *pct)
>  {
>  	memset(pct, 0, sizeof(*pct));
>  	pct->bases[0].nextevt = U64_MAX;
>  	pct->bases[1].nextevt = U64_MAX;
>  	pct->bases[2].nextevt = U64_MAX;
> +	posix_cputimer_init_work(pct);
>  }

And I can't resist. I know this is a common practice, please ignore, but to me

	static inline void posix_cputimers_init(struct posix_cputimers *pct)
	{
		memset(pct, 0, sizeof(*pct));
		pct->bases[0].nextevt = U64_MAX;
		pct->bases[1].nextevt = U64_MAX;
		pct->bases[2].nextevt = U64_MAX;
	#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
		init_task_work(&pct->task_work, posix_cpu_timers_work);
	#endif
	}

looks better than 2 posix_cputimer_init_work() definitions above.

Note also that signal_struct->posix_cputimers.task_work is never used, perhaps
it would be better to move this task_work into task_struct? This way we do not
even need to change posix_cputimers_init(), we can simply initialize
init_task.posix_task_work.

Oleg.



* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-17 17:26   ` Oleg Nesterov
@ 2020-07-17 18:35     ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-17 18:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, x86, Eric W. Biederman, Frederic Weisbecker, John Stultz,
	Paolo Bonzini

Oleg,

Oleg Nesterov <oleg@redhat.com> writes:
> Looks correct to me, but I forgot everything about posix-timers.c

that's not a problem because this is about posix-cpu-timers.c :)

> this obviously means that the expired timer won't fire until the
> task returns to user-mode but probably we don't care.

If the signal goes to the task itself it does not matter at all because
it's going to be delivered when the task goes out to user space.

If the signal goes to a supervisor process, then it will be slightly
delayed but I could not find a problem with that at all.

I'll add more reasoning to the changelog on V3.

>> +#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
>> +void posix_cpu_timers_work(struct callback_head *work);
>> +
>> +static inline void posix_cputimer_init_work(struct posix_cputimers *pct)
>> +{
>> +	pct->task_work.func = posix_cpu_timers_work;
>
> init_task_work() ?

Yeah.

>> +}
>> +#else
>> +static inline void posix_cputimer_init_work(struct posix_cputimers *pct) { }
>> +#endif
>> +
>>  static inline void posix_cputimers_init(struct posix_cputimers *pct)
>>  {
>>  	memset(pct, 0, sizeof(*pct));
>>  	pct->bases[0].nextevt = U64_MAX;
>>  	pct->bases[1].nextevt = U64_MAX;
>>  	pct->bases[2].nextevt = U64_MAX;
>> +	posix_cputimer_init_work(pct);
>>  }
>
> And I can't resist. I know this is a common practice, please ignore, but to me
>
> 	static inline void posix_cputimers_init(struct posix_cputimers *pct)
> 	{
> 		memset(pct, 0, sizeof(*pct));
> 		pct->bases[0].nextevt = U64_MAX;
> 		pct->bases[1].nextevt = U64_MAX;
> 		pct->bases[2].nextevt = U64_MAX;
> 	#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
> 		init_task_work(&pct->task_work, posix_cpu_timers_work);
> 	#endif
> 	}
>
> looks better than 2 posix_cputimer_init_work() definitions above.

Gah, I hate ifdefs in the middle of the code :)

> Note also that signal_struct->posix_cputimers.task_work is never used, perhaps
> it would be better to move this task_work into task_struct? This way we do not
> even need to change posix_cputimers_init(), we can simply initialize
> init_task.posix_task_work.

Let me look into that.

Thanks,

        tglx


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-16 22:50   ` Peter Zijlstra
@ 2020-07-17 18:37     ` Thomas Gleixner
  2020-07-23  1:03     ` Frederic Weisbecker
  1 sibling, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-17 18:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

Peter Zijlstra <peterz@infradead.org> writes:
> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
>
>> @@ -1096,6 +1099,12 @@ static void __run_posix_cpu_timers(struc
>>  	check_process_timers(tsk, &firing);
>>  
>>  	/*
>> +	 * Allow new work to be scheduled. The expiry cache
>> +	 * is up to date.
>> +	 */
>> +	posix_cpu_timers_enable_work(tsk);
>> +
>> +	/*
>>  	 * We must release these locks before taking any timer's lock.
>>  	 * There is a potential race with timer deletion here, as the
>>  	 * siglock now protects our private firing list.  We have set
>
> I think I would feel more comfortable if this was done at the very
> beginning of that function, possibly even with:
>
>> +static void __run_posix_cpu_timers(struct task_struct *tsk)
>> +{
>> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
>> +
>> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
>> +		task_work_add(tsk, &pct->task_work, true);
>> +}
>> +
>> +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
>> +{
>> +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
> 	/*
> 	 * Ensure we observe everything before a failing test_and_set()
> 	 * in __run_posix_cpu_timers().
> 	 */
> 	smp_mb__after_atomic();
>> +}
>
> Such that when another timer interrupt happens while we run this, we're
> guaranteed to either see it, or get re-queued and thus re-run the
> function.

Makes sense.

Thanks,

        tglx


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-16 22:54   ` Peter Zijlstra
@ 2020-07-17 18:38     ` Thomas Gleixner
  2020-07-19 19:33       ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-17 18:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

Peter Zijlstra <peterz@infradead.org> writes:
> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
>> +static void __run_posix_cpu_timers(struct task_struct *tsk)
>> +{
>> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
>> +
>> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
>> +		task_work_add(tsk, &pct->task_work, true);
>
> s/true/TWA_RESUME/g
>
> see: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")

Duh, yes.


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-17 18:38     ` Thomas Gleixner
@ 2020-07-19 19:33       ` Thomas Gleixner
  2020-07-21 18:50         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-19 19:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

Thomas Gleixner <tglx@linutronix.de> writes:
> Peter Zijlstra <peterz@infradead.org> writes:
>> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
>>> +static void __run_posix_cpu_timers(struct task_struct *tsk)
>>> +{
>>> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
>>> +
>>> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
>>> +		task_work_add(tsk, &pct->task_work, true);
>>
>> s/true/TWA_RESUME/g
>>
>> see: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")
>
> Duh, yes.

Bah, that creates a dependency on sched/core ...


* Re: [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap
  2020-07-16 20:19 ` [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap Thomas Gleixner
@ 2020-07-21 12:34   ` Frederic Weisbecker
  2020-07-21 16:10     ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-07-21 12:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, John Stultz, Paolo Bonzini

On Thu, Jul 16, 2020 at 10:19:25PM +0200, Thomas Gleixner wrote:
> --- a/kernel/time/posix-cpu-timers.c
> +++ b/kernel/time/posix-cpu-timers.c
> @@ -25,7 +25,7 @@ void posix_cputimers_group_init(struct p
>  	posix_cputimers_init(pct);
>  	if (cpu_limit != RLIM_INFINITY) {
>  		pct->bases[CPUCLOCK_PROF].nextevt = cpu_limit * NSEC_PER_SEC;
> -		pct->timers_active = true;
> +		set_bit(CPUTIMERS_ACTIVE, &pct->flags);

I guess this one could be __set_bit().

>  	}
>  }


* Re: [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap
  2020-07-21 12:34   ` Frederic Weisbecker
@ 2020-07-21 16:10     ` Thomas Gleixner
  2020-07-21 16:23       ` David Laight
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-21 16:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, John Stultz, Paolo Bonzini

Frederic Weisbecker <frederic@kernel.org> writes:
> On Thu, Jul 16, 2020 at 10:19:25PM +0200, Thomas Gleixner wrote:
>> --- a/kernel/time/posix-cpu-timers.c
>> +++ b/kernel/time/posix-cpu-timers.c
>> @@ -25,7 +25,7 @@ void posix_cputimers_group_init(struct p
>>  	posix_cputimers_init(pct);
>>  	if (cpu_limit != RLIM_INFINITY) {
>>  		pct->bases[CPUCLOCK_PROF].nextevt = cpu_limit * NSEC_PER_SEC;
>> -		pct->timers_active = true;
>> +		set_bit(CPUTIMERS_ACTIVE, &pct->flags);
>
> I guess this one could be __set_bit().

True :)


* RE: [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap
  2020-07-21 16:10     ` Thomas Gleixner
@ 2020-07-21 16:23       ` David Laight
  2020-07-21 18:30         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: David Laight @ 2020-07-21 16:23 UTC (permalink / raw)
  To: 'Thomas Gleixner', Frederic Weisbecker
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, John Stultz, Paolo Bonzini

From: Thomas Gleixner
> Sent: 21 July 2020 17:11
> 
> Frederic Weisbecker <frederic@kernel.org> writes:
> > On Thu, Jul 16, 2020 at 10:19:25PM +0200, Thomas Gleixner wrote:
> >> --- a/kernel/time/posix-cpu-timers.c
> >> +++ b/kernel/time/posix-cpu-timers.c
> >> @@ -25,7 +25,7 @@ void posix_cputimers_group_init(struct p
> >>  	posix_cputimers_init(pct);
> >>  	if (cpu_limit != RLIM_INFINITY) {
> >>  		pct->bases[CPUCLOCK_PROF].nextevt = cpu_limit * NSEC_PER_SEC;
> >> -		pct->timers_active = true;
> >> +		set_bit(CPUTIMERS_ACTIVE, &pct->flags);
> >
> > I guess this one could be __set_bit().
> 
> True :)

Hmmm... does this code need the bit operations to be atomic?
If not then a bitmap is completely the wrong thing to be using.

	David




* RE: [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap
  2020-07-21 16:23       ` David Laight
@ 2020-07-21 18:30         ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-21 18:30 UTC (permalink / raw)
  To: David Laight, Frederic Weisbecker
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, John Stultz, Paolo Bonzini

David Laight <David.Laight@ACULAB.COM> writes:
> From: Thomas Gleixner
>> Sent: 21 July 2020 17:11
>> 
>> Frederic Weisbecker <frederic@kernel.org> writes:
>> > On Thu, Jul 16, 2020 at 10:19:25PM +0200, Thomas Gleixner wrote:
>> >> --- a/kernel/time/posix-cpu-timers.c
>> >> +++ b/kernel/time/posix-cpu-timers.c
>> >> @@ -25,7 +25,7 @@ void posix_cputimers_group_init(struct p
>> >>  	posix_cputimers_init(pct);
>> >>  	if (cpu_limit != RLIM_INFINITY) {
>> >>  		pct->bases[CPUCLOCK_PROF].nextevt = cpu_limit * NSEC_PER_SEC;
>> >> -		pct->timers_active = true;
>> >> +		set_bit(CPUTIMERS_ACTIVE, &pct->flags);
>> >
>> > I guess this one could be __set_bit().
>> 
>> True :)
>
> Hmmm... does this code need the bit operations to be atomic?
> If not then a bitmap is completely the wrong thing to be using.

Some of it does, otherwise the booleans would have stayed, but I'm
reworking parts of it so this might change.

Thanks,

        tglx


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-19 19:33       ` Thomas Gleixner
@ 2020-07-21 18:50         ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-21 18:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker,
	John Stultz, Paolo Bonzini

Thomas Gleixner <tglx@linutronix.de> writes:
>
> Bah, that creates a dependency on sched/core ...

Only when looking at the wrong tree :)


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-16 22:50   ` Peter Zijlstra
  2020-07-17 18:37     ` Thomas Gleixner
@ 2020-07-23  1:03     ` Frederic Weisbecker
  2020-07-23  8:32       ` Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-07-23  1:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, x86, Oleg Nesterov, Eric W. Biederman,
	John Stultz, Paolo Bonzini

On Fri, Jul 17, 2020 at 12:50:34AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
> > +static void __run_posix_cpu_timers(struct task_struct *tsk)
> > +{
> > +	struct posix_cputimers *pct = &tsk->posix_cputimers;
> > +
> > +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
> > +		task_work_add(tsk, &pct->task_work, true);
> > +}
> > +
> > +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
> > +{
> > +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
> 	/*
> 	 * Ensure we observe everything before a failing test_and_set()
> 	 * in __run_posix_cpu_timers().
> 	 */
> 	smp_mb__after_atomic();
> > +}
> 
> Such that when another timer interrupt happens while we run this, we're
> guaranteed to either see it, or get re-queued and thus re-run the
> function.

But each thread in the process enqueues its own task work and flips its
own flags. So if task A runs the task work and task B runs __run_posix_cpu_timers(),
they wouldn't be ordering against the same flags.


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-23  1:03     ` Frederic Weisbecker
@ 2020-07-23  8:32       ` Thomas Gleixner
  2020-07-23 12:15         ` Frederic Weisbecker
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-07-23  8:32 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra
  Cc: LKML, x86, Oleg Nesterov, Eric W. Biederman, John Stultz, Paolo Bonzini

Frederic Weisbecker <frederic@kernel.org> writes:
> On Fri, Jul 17, 2020 at 12:50:34AM +0200, Peter Zijlstra wrote:
>> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
>> > +static void __run_posix_cpu_timers(struct task_struct *tsk)
>> > +{
>> > +	struct posix_cputimers *pct = &tsk->posix_cputimers;
>> > +
>> > +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
>> > +		task_work_add(tsk, &pct->task_work, true);
>> > +}
>> > +
>> > +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
>> > +{
>> > +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
>> 	/*
>> 	 * Ensure we observe everything before a failing test_and_set()
>> 	 * in __run_posix_cpu_timers().
>> 	 */
>> 	smp_mb__after_atomic();
>> > +}
>> 
>> Such that when another timer interrupt happens while we run this, we're
>> guaranteed to either see it, or get re-queued and thus re-run the
>> function.
>
> But each thread in the process enqueues its own task work and flips its
> own flags. So if task A runs the task work and task B runs __run_posix_cpu_timers(),
> they wouldn't be ordering against the same flags.

If two tasks queue work independently of each other then one of them will
find the work done already, which is the same as if two tasks of the same
process executed run_posix_cpu_timers() in parallel.

I really don't want to go down the rathole of making the work or the
synchronization process wide. That's a recipe for disaster.

Handling task work strictly per task is straightforward and simple. The
eventual contention on sighand lock in task work is unavoidable, but
that's a reasonable tradeoff vs. the complexity needed to handle task
work process wide.

Thanks,

        tglx


* Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  2020-07-23  8:32       ` Thomas Gleixner
@ 2020-07-23 12:15         ` Frederic Weisbecker
  0 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-07-23 12:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, LKML, x86, Oleg Nesterov, Eric W. Biederman,
	John Stultz, Paolo Bonzini

On Thu, Jul 23, 2020 at 10:32:54AM +0200, Thomas Gleixner wrote:
> Frederic Weisbecker <frederic@kernel.org> writes:
> > On Fri, Jul 17, 2020 at 12:50:34AM +0200, Peter Zijlstra wrote:
> >> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
> >> > +static void __run_posix_cpu_timers(struct task_struct *tsk)
> >> > +{
> >> > +	struct posix_cputimers *pct = &tsk->posix_cputimers;
> >> > +
> >> > +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
> >> > +		task_work_add(tsk, &pct->task_work, true);
> >> > +}
> >> > +
> >> > +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
> >> > +{
> >> > +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
> >> 	/*
> >> 	 * Ensure we observe everything before a failing test_and_set()
> >> 	 * in __run_posix_cpu_timers().
> >> 	 */
> >> 	smp_mb__after_atomic();
> >> > +}
> >> 
> >> Such that when another timer interrupt happens while we run this, we're
> >> guaranteed to either see it, or get re-queued and thus re-run the
> >> function.
> >
> > But each thread in the process enqueues its own task work and flips its
> > own flags. So if task A runs the task work and task B runs __run_posix_cpu_timers(),
> > they wouldn't be ordering against the same flags.
> 
> If two tasks queue work independent of each other then one of them will
> find it done already, which is the same as if two tasks of the same
> process execute run_posix_cpu_timers() in parallel.
> 
> I really don't want to go into the rathole of making the work or the
> synchronization process wide. That's a guarantee for disaster.
> 
> Handling task work strictly per task is straight forward and simple. The
> eventually resulting contention on sighand lock in task work is
> unavoidable, but that's a reasonable tradeoff vs. the complexity you
> need to handle task work process wide.

Definitely!

I was only commenting on the barrier suggestion. But I believe it shouldn't
be needed in the end.

If we were to have a per-task work for thread timers and a per-process work
for process timers, we would need to split the whole thing up, and also
take care of timers firing after exit_task_work(). That isn't an issue
in the thread case, as the work is simply ignored for an exiting task, but
it's a big issue in the case of process-wide handling.

Anyway, the current layout is simple enough.


Thread overview: 21+ messages
2020-07-16 20:19 [patch V2 0/5] posix-cpu-timers: Move expiry into task work context Thomas Gleixner
2020-07-16 20:19 ` [patch V2 1/5] posix-cpu-timers: Split run_posix_cpu_timers() Thomas Gleixner
2020-07-16 20:19 ` [patch V2 2/5] posix-cpu-timers: Convert the flags to a bitmap Thomas Gleixner
2020-07-21 12:34   ` Frederic Weisbecker
2020-07-21 16:10     ` Thomas Gleixner
2020-07-21 16:23       ` David Laight
2020-07-21 18:30         ` Thomas Gleixner
2020-07-16 20:19 ` [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Thomas Gleixner
2020-07-16 22:50   ` Peter Zijlstra
2020-07-17 18:37     ` Thomas Gleixner
2020-07-23  1:03     ` Frederic Weisbecker
2020-07-23  8:32       ` Thomas Gleixner
2020-07-23 12:15         ` Frederic Weisbecker
2020-07-16 22:54   ` Peter Zijlstra
2020-07-17 18:38     ` Thomas Gleixner
2020-07-19 19:33       ` Thomas Gleixner
2020-07-21 18:50         ` Thomas Gleixner
2020-07-17 17:26   ` Oleg Nesterov
2020-07-17 18:35     ` Thomas Gleixner
2020-07-16 20:19 ` [patch V2 4/5] posix-cpu-timers: Expiry timers directly when in task work context Thomas Gleixner
2020-07-16 20:19 ` [patch V2 5/5] x86: Select POSIX_CPU_TIMERS_TASK_WORK Thomas Gleixner
