All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 1/2] timers: Add pending timer bool in struct timer_base
@ 2021-06-10 12:59 Nicolas Saenz Julienne
  2021-06-10 12:59 ` [RFC 2/2] timers: Make sure irq_work is handled when no pending timers Nicolas Saenz Julienne
  2021-06-18 21:06 ` [RFC 1/2] timers: Add pending timer bool in struct timer_base Thomas Gleixner
  0 siblings, 2 replies; 9+ messages in thread
From: Nicolas Saenz Julienne @ 2021-06-10 12:59 UTC (permalink / raw)
  To: linux-rt-users, frederic; +Cc: tglx, mtosatti, Nicolas Saenz Julienne

We need to efficiently check whether a timer base has no pending events.
So introduce a new variable in struct timer_base to do so.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 kernel/time/timer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 4f7602724f9a..2d7d68296a3b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -201,6 +201,7 @@ struct timer_base {
 #ifdef CONFIG_PREEMPT_RT
 	spinlock_t		expiry_lock;
 	atomic_t		timer_waiters;
+	bool			pending;
 #endif
 	unsigned long		clk;
 	unsigned long		next_expiry;
@@ -596,6 +597,9 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
 		 */
 		base->next_expiry = bucket_expiry;
 		base->next_expiry_recalc = false;
+#ifdef CONFIG_PREEMPT_RT
+		base->pending = true;
+#endif
 		trigger_dyntick_cpu(base, timer);
 	}
 }
@@ -1598,6 +1602,9 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
 	}
 
 	base->next_expiry_recalc = false;
+#ifdef CONFIG_PREEMPT_RT
+	base->pending = (next != base->clk + NEXT_TIMER_MAX_DELTA);
+#endif
 
 	return next;
 }
@@ -1966,6 +1973,9 @@ int timers_prepare_cpu(unsigned int cpu)
 		base->clk = jiffies;
 		base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
 		base->is_idle = false;
+#ifdef CONFIG_PREEMPT_RT
+		base->pending = false;
+#endif
 	}
 	return 0;
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 2/2] timers: Make sure irq_work is handled when no pending timers
  2021-06-10 12:59 [RFC 1/2] timers: Add pending timer bool in struct timer_base Nicolas Saenz Julienne
@ 2021-06-10 12:59 ` Nicolas Saenz Julienne
  2021-06-18 22:47   ` Thomas Gleixner
  2021-06-30 17:43   ` Alison Chaiken
  2021-06-18 21:06 ` [RFC 1/2] timers: Add pending timer bool in struct timer_base Thomas Gleixner
  1 sibling, 2 replies; 9+ messages in thread
From: Nicolas Saenz Julienne @ 2021-06-10 12:59 UTC (permalink / raw)
  To: linux-rt-users, frederic; +Cc: tglx, mtosatti, Nicolas Saenz Julienne

PREEMPT_RT systems defer most irq_work handling into the timer softirq.
This softirq is only triggered when a timer is expired, which adds some
delay to the irq_work handling. It's a price PREEMPT_RT systems are
willing to pay in exchange for less IRQ noise.

This works fine for the majority of systems, but there's a catch. What
if no timer is ever armed after an irq_work is queued. This has been
observed on nohz_full CPUs while running oslat. The lack of armed timers
prevents a pending irq_work to run. Which in turn prevents the nohz code
from fully stopping the tick.

To avoid this situation introduce new logic in run_local_timers(). The
timer softirq will be triggered when an irq_work is pending but no
timers have been armed. This situation is only possible in PREEMPT_RT
systems, so make the code conditional to it.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---

NOTE: All in all, this is the best I could think of with my limited
      timers knowledge. A bigger hammer would be to unanimously trigger
      the softirq if irq_work_needs_cpu(). But I get the feeling this is
      something we want to avoid.

 kernel/time/timer.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d7d68296a3b..7611673cb172 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1771,6 +1771,28 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
 }
 
+#ifdef CONFIG_PREEMPT_RT
+static inline bool irq_work_needs_softirq(struct timer_base *base)
+{
+	/*
+	 * Neither bases have armed timers and an irq_work is pending. Since we
+	 * can't predict whether a timer will be armed in the future, request
+	 * the timer softirq to be triggered.
+	 */
+	if (!base->pending &&
+	    (IS_ENABLED(CONFIG_NO_HZ_COMMON) && !(base + 1)->pending) &&
+	    irq_work_needs_cpu())
+		return true;
+
+	return false;
+}
+#else
+static inline bool irq_work_needs_softirq(struct timer_base *base)
+{
+	return false;
+}
+#endif
+
 /*
  * Called by the local, per-CPU timer interrupt on SMP.
  */
@@ -1779,6 +1801,10 @@ static void run_local_timers(void)
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
 	hrtimer_run_queues();
+
+	if (irq_work_needs_softirq(base))
+		goto raise;
+
 	/* Raise the softirq only if required. */
 	if (time_before(jiffies, base->next_expiry)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
@@ -1788,6 +1814,8 @@ static void run_local_timers(void)
 		if (time_before(jiffies, base->next_expiry))
 			return;
 	}
+
+raise:
 	raise_softirq(TIMER_SOFTIRQ);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC 1/2] timers: Add pending timer bool in struct timer_base
  2021-06-10 12:59 [RFC 1/2] timers: Add pending timer bool in struct timer_base Nicolas Saenz Julienne
  2021-06-10 12:59 ` [RFC 2/2] timers: Make sure irq_work is handled when no pending timers Nicolas Saenz Julienne
@ 2021-06-18 21:06 ` Thomas Gleixner
  1 sibling, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2021-06-18 21:06 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, linux-rt-users, frederic
  Cc: mtosatti, Nicolas Saenz Julienne, LKML

Nicolas,

On Thu, Jun 10 2021 at 14:59, Nicolas Saenz Julienne wrote:

please always Cc the relevant mailing lists and the maintainers.
MAINTAINERS exists for a reason.

> We need to efficiently check whether a timer base has no pending
> events.

'We need' is not a technical explanation. That's close to 'I want a pony'.

Please describe what you are trying to solve and why the existing
mechanisms are not good enough.

See Documentation/process/submitting-patches.rst

> So introduce a new variable in struct timer_base to do so.

The variable solves your problem? Interesting solution.

>  		base->next_expiry = bucket_expiry;
>  		base->next_expiry_recalc = false;
> +#ifdef CONFIG_PREEMPT_RT
> +		base->pending = true;
> +#endif

What is RT specific about that?

>  		trigger_dyntick_cpu(base, timer);
>  	}
>  }
> @@ -1598,6 +1602,9 @@ static unsigned long __next_timer_interrupt(struct timer_base *base)
>  	}
>  
>  	base->next_expiry_recalc = false;
> +#ifdef CONFIG_PREEMPT_RT
> +	base->pending = (next != base->clk + NEXT_TIMER_MAX_DELTA);
> +#endif

This lacks any information about the semantics of this flag:

  - When is it valid and when not?
  - What is the valid use case for this flag?

Summary of the supplied information: We need a flag, so we added one.

Sorry that's not sufficient.

Thanks,

        tglx

  


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] timers: Make sure irq_work is handled when no pending timers
  2021-06-10 12:59 ` [RFC 2/2] timers: Make sure irq_work is handled when no pending timers Nicolas Saenz Julienne
@ 2021-06-18 22:47   ` Thomas Gleixner
  2021-06-19  8:22     ` Thomas Gleixner
  2021-06-22 13:44     ` Peter Zijlstra
  2021-06-30 17:43   ` Alison Chaiken
  1 sibling, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2021-06-18 22:47 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: linux-rt-users, frederic, mtosatti, LKML, Peter Zijlstra

Nicolas,

On Thu, Jun 10 2021 at 14:59, Nicolas Saenz Julienne wrote:
> PREEMPT_RT systems defer most irq_work handling into the timer softirq.
> This softirq is only triggered when a timer is expired, which adds some
> delay to the irq_work handling. It's a price PREEMPT_RT systems are
> willing to pay in exchange for less IRQ noise.
>
> This works fine for the majority of systems, but there's a catch. What
> if no timer is ever armed after an irq_work is queued. This has been
> observed on nohz_full CPUs while running oslat. The lack of armed timers
> prevents a pending irq_work to run. Which in turn prevents the nohz code
> from fully stopping the tick.
>
> To avoid this situation introduce new logic in run_local_timers(). The
> timer softirq will be triggered when an irq_work is pending but no
> timers have been armed. This situation is only possible in PREEMPT_RT
> systems, so make the code conditional to it.

now I can see the problem you are trying to solve, but unfortunately the
solution is fundamentally wrong.

> NOTE: All in all, this is the best I could think of with my limited
>       timers knowledge. A bigger hammer would be to unanimously trigger
>       the softirq if irq_work_needs_cpu(). But I get the feeling this is
>       something we want to avoid.

Technical decisions based on feelings are not solving anything and they
result in hard to analyse subtle issues:

Q: Assume there is a timer armed to expire 24h from now. What's the
   difference to no timer being armed?

A: None.

   Just because your use case has either no timers armed at all or has
   timers armed with short expiry is no reason to ignore the really
   simple and obvious technical facts.

But that aside, you analyzed the problem pretty good, but then you
stopped short of identifying the root cause and went off to cure the
symptom.

The irq_work deferring on RT to the timer softirq is the root cause of
the problem. It's a sloppy hack and I'm well aware of that. So this
needs to be fixed and not worked around by adding random undocumented
workarounds into the timer code.

Let's take a step back and analyze why this deferring to timer softirq
context exists on RT.

 1) The irq_work callbacks which are deferred on RT cannot be invoked from
    hard interrupt (IPI) context usually - due to spin_lock usage.

 2) Such irq_work has to be delegated to some suitable context and the
    trivial and lazy way out was to just stick into the timer softirq.

That hack survived for a long time and while I was aware of it, it was
not really high on my priority list of cleanups.

The real solution is to delegate this to a suitable context which is
executed independent of any other constraints.

There are two solutions:

  1) Create a IRQ_WORK softirq and raise that

  2) Simply delegate it to a workqueue

So now you might rightfully ask why I did not do that back then:

  #1 is not an option because we don't want to proliferate softirqs for
     various reasons.

  #2 was not feasible because back then queueing work from hard
     interrupt context was not possible.

So yes, I took the sloppy and easy way out and just glued it onto the
timer softirqs. Nobody complained so far.

As we since then made work queues RT aware and it's possible to queue
work from the irq_work IPI context, the obvious solution is to delegate
this to a work queue.

If we do a proper analysis of the affected irq work callbacks then this
probably makes a lot of sense independent of RT. There are only a few
really urgent irq work items which need to be handled immediately in the
IPI.

RT is special, but as we have demonstrated over time it's not _that_
special. It just needs a proper analysis and a real good argument why
something has to be special for RT and does not fit into the common
case. Or to demonstrate that the common case approach of 'do it right
away' is pointless or even harmfull.

Thanks,

        tglx
---
P.S.: I'm not blaming this on you as a newcomer. There are people on
      your team who should have known better.   

 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] timers: Make sure irq_work is handled when no pending timers
  2021-06-18 22:47   ` Thomas Gleixner
@ 2021-06-19  8:22     ` Thomas Gleixner
  2021-06-22 13:44     ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2021-06-19  8:22 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: linux-rt-users, frederic, mtosatti, LKML, Peter Zijlstra

On Sat, Jun 19 2021 at 00:47, Thomas Gleixner wrote:
> On Thu, Jun 10 2021 at 14:59, Nicolas Saenz Julienne wrote:
> As we since then made work queues RT aware and it's possible to queue
> work from the irq_work IPI context, the obvious solution is to delegate
> this to a work queue.
>
> If we do a proper analysis of the affected irq work callbacks then this
> probably makes a lot of sense independent of RT. There are only a few
> really urgent irq work items which need to be handled immediately in the
> IPI.
>
> RT is special, but as we have demonstrated over time it's not _that_
> special. It just needs a proper analysis and a real good argument why
> something has to be special for RT and does not fit into the common
> case. Or to demonstrate that the common case approach of 'do it right
> away' is pointless or even harmfull.

I skimmed most of the ~40 irq_work instances.

Most of them have no urgency at all. And out of those non-urgent cases
the majority does not even have the requirement to run on the current
CPU, so they can be pushed off to a global work queue which moves them
away from NOHZ full CPUs completely.

That has nothing to do with RT, that's a benefit in general.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] timers: Make sure irq_work is handled when no pending timers
  2021-06-18 22:47   ` Thomas Gleixner
  2021-06-19  8:22     ` Thomas Gleixner
@ 2021-06-22 13:44     ` Peter Zijlstra
  2021-06-22 17:33       ` Jiri Olsa
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-06-22 13:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Nicolas Saenz Julienne, linux-rt-users, frederic, mtosatti, LKML,
	Jiri Olsa

On Sat, Jun 19, 2021 at 12:47:04AM +0200, Thomas Gleixner wrote:
> There are two solutions:
> 
>   1) Create a IRQ_WORK softirq and raise that
> 
>   2) Simply delegate it to a workqueue

IIRC someone was looking to stick the whole thing in a kthread_worker.
Jiri, was that you?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] timers: Make sure irq_work is handled when no pending timers
  2021-06-22 13:44     ` Peter Zijlstra
@ 2021-06-22 17:33       ` Jiri Olsa
  2021-06-22 17:42         ` Jiri Olsa
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2021-06-22 17:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Nicolas Saenz Julienne, linux-rt-users,
	frederic, mtosatti, LKML, Jiri Olsa

On Tue, Jun 22, 2021 at 03:44:23PM +0200, Peter Zijlstra wrote:
> On Sat, Jun 19, 2021 at 12:47:04AM +0200, Thomas Gleixner wrote:
> > There are two solutions:
> > 
> >   1) Create a IRQ_WORK softirq and raise that
> > 
> >   2) Simply delegate it to a workqueue
> 
> IIRC someone was looking to stick the whole thing in a kthread_worker.
> Jiri, was that you?
> 

yep, I still plan on doing that

jirka


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] timers: Make sure irq_work is handled when no pending timers
  2021-06-22 17:33       ` Jiri Olsa
@ 2021-06-22 17:42         ` Jiri Olsa
  0 siblings, 0 replies; 9+ messages in thread
From: Jiri Olsa @ 2021-06-22 17:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Nicolas Saenz Julienne, linux-rt-users,
	frederic, mtosatti, LKML, Jiri Olsa

On Tue, Jun 22, 2021 at 07:33:23PM +0200, Jiri Olsa wrote:
> On Tue, Jun 22, 2021 at 03:44:23PM +0200, Peter Zijlstra wrote:
> > On Sat, Jun 19, 2021 at 12:47:04AM +0200, Thomas Gleixner wrote:
> > > There are two solutions:
> > > 
> > >   1) Create a IRQ_WORK softirq and raise that
> > > 
> > >   2) Simply delegate it to a workqueue
> > 
> > IIRC someone was looking to stick the whole thing in a kthread_worker.
> > Jiri, was that you?
> > 
> 
> yep, I still plan on doing that

hum, IIRC that was actually perf specific change we discussed some
time ago I should have read the whole thread before answering

I'll check what was my plan to do and get back ;-)

jirka


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 2/2] timers: Make sure irq_work is handled when no pending timers
  2021-06-10 12:59 ` [RFC 2/2] timers: Make sure irq_work is handled when no pending timers Nicolas Saenz Julienne
  2021-06-18 22:47   ` Thomas Gleixner
@ 2021-06-30 17:43   ` Alison Chaiken
  1 sibling, 0 replies; 9+ messages in thread
From: Alison Chaiken @ 2021-06-30 17:43 UTC (permalink / raw)
  To: nsaenzju; +Cc: linux-rt-users, frederic, Thomas Gleixner, mtosatti

On Thu, Jun 10, 2021 at 6:00 AM Nicolas Saenz Julienne
<nsaenzju@redhat.com> wrote:
> PREEMPT_RT systems defer most irq_work handling into the timer softirq.
> This softirq is only triggered when a timer is expired, which adds some
> delay to the irq_work handling. It's a price PREEMPT_RT systems are
> willing to pay in exchange for less IRQ noise.
>
> This works fine for the majority of systems, but there's a catch. What
> if no timer is ever armed after an irq_work is queued. This has been
> observed on nohz_full CPUs while running oslat. The lack of armed timers
> prevents a pending irq_work to run. Which in turn prevents the nohz code
> from fully stopping the tick.

In https://lore.kernel.org/linux-rt-users/162093721042.5649.1489711837264906533.git-patchwork-notify@kernel.org/T/#u,
Jakub Kicinski noted:
> Another thing while I have your attention - ____napi_schedule() does
> __raise_softirq_irqoff() which AFAIU does not wake the ksoftirq thread.
> On non-RT we get occasional NOHZ warnings when drivers schedule napi
> from process context, but on RT this is even more of a problem, right?
> ksoftirqd won't run until something else actually wakes it up?

It seems to me that Nicolas Saenz Julienne's solution which raises the
TIMER softirq might result in the same problem, as doing so might not
wake ksoftirqd and result in local_softirq_pending but for TIMER
rather than NET_RX.   Is this correct?

Why NET_RX softirq shows up as 08 in local_softirq_pending is itself a
bit unclear.     It appears to derive from __ARCH_IRQ_STAT, which
means that 08 is NET_RX on x86_64 but perhaps not on other processors.
  Is this correct?   The naive expectation that the number printed by
the message is in keeping with softirq_to_name[] is any event
obviously wrong.

Thanks,
Alison Chaiken
Aurora Innovation

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-06-30 17:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-10 12:59 [RFC 1/2] timers: Add pending timer bool in struct timer_base Nicolas Saenz Julienne
2021-06-10 12:59 ` [RFC 2/2] timers: Make sure irq_work is handled when no pending timers Nicolas Saenz Julienne
2021-06-18 22:47   ` Thomas Gleixner
2021-06-19  8:22     ` Thomas Gleixner
2021-06-22 13:44     ` Peter Zijlstra
2021-06-22 17:33       ` Jiri Olsa
2021-06-22 17:42         ` Jiri Olsa
2021-06-30 17:43   ` Alison Chaiken
2021-06-18 21:06 ` [RFC 1/2] timers: Add pending timer bool in struct timer_base Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.