All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: futex_wait() can  DoS the tick
@ 2015-06-10 12:44 Mike Galbraith
  2015-06-10 15:12 ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-10 12:44 UTC (permalink / raw)
  To: LKML; +Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt

Greetings,

Like so...

#include <time.h>
#include <sys/time.h>
#include <linux/futex.h>
#include  <sys/syscall.h>

int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3)
{
	return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}


int main()
{
	struct timespec t;
	int f = 1;

	clock_gettime(CLOCK_REALTIME, &t);
	t.tv_sec -= 10;

	while (1) {
		sys_futex(&f, FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME, 1, &t, NULL, FUTEX_BITSET_MATCH_ANY);
	}
}


The above was handed to me by a colleague working on a Xen guest that
livelocked.  I at first though Xen arch must have a weird problem, but
when I tried proggy on my desktop box, while it didn't stop the tick
completely as it did the Xen box, it slowed it to a crawl.  I noticed
that this did not happen with newer kernels, so a bisecting I did go,
and found that...

279f14614 x86: apic: Use tsc deadline for oneshot when available

..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a
Xen dom0 remains busted by that testcase whether that patch is applied
to the host or not, even though the hypervisor supports deadline timer,
and seemingly regardless of CPU type all together.

Of all the x86_64 bare metal boxen I've tested, only those with the TSC
deadline timer have shown the issue, and there it goes away as of v3.8
unless you boot lapic=notscdeadline.

However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
hasbeen) can be made to exhibit the symptom, there may be other arches
that get seriously dinged up or maybe even as thoroughly b0rked as Xen
does when hrtimer_interrupt() is pounded into the ground by userspace.

Alternatively, should someone out there know that all bare metal is in
fact fine post 279f14614, that person will likely also know what the Xen
folks need to do to fix up their busted arch. 

The below targets the symptom, consider it hrtimer cluebat attractant.

---
 kernel/time/hrtimer.c |   31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -933,6 +933,8 @@ remove_hrtimer(struct hrtimer *timer, st
 	return 0;
 }
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 		unsigned long delta_ns, const enum hrtimer_mode mode,
 		int wakeup)
@@ -980,8 +982,27 @@ int __hrtimer_start_range_ns(struct hrti
 		 * on dynticks target.
 		 */
 		wake_up_nohz_cpu(new_base->cpu_base->cpu);
-	} else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases) &&
-			hrtimer_reprogram(timer, new_base)) {
+	} else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases)) {
+		int res = hrtimer_reprogram(timer, new_base);
+
+		if (!res)
+			goto out;
+
+		/*
+		 * If a buggy app tries forever to be awakened in the past,
+		 * banging on hrtimer_interrupt() at high speed can stall
+		 * the tick, and on a Xen box, forever.  On haswell with
+		 * tsc_deadline_timer disabled you can see it, though it
+		 * only slows the tick way down.  Other bare metal boxes
+		 * may also be terminally affected.
+		 */
+		if (unlikely(wakeup && !ret && IS_ERR_VALUE(res) &&
+			    timer->function == hrtimer_wakeup)) {
+			debug_deactivate(timer);
+			__remove_hrtimer(timer, new_base, 0, 0);
+			ret = -ETIMEDOUT;
+		}
+
 		/*
 		 * Only allow reprogramming if the new base is on this CPU.
 		 * (it might still be on another CPU if the timer was pending)
@@ -994,7 +1015,10 @@ int __hrtimer_start_range_ns(struct hrti
 			 * lock ordering issue vs. rq->lock.
 			 */
 			raw_spin_unlock(&new_base->cpu_base->lock);
-			raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+			if (!IS_ERR_VALUE(ret))
+				raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+			else
+				hrtimer_wakeup(timer);
 			local_irq_restore(flags);
 			return ret;
 		} else {
@@ -1002,6 +1026,7 @@ int __hrtimer_start_range_ns(struct hrti
 		}
 	}
 
+out:
 	unlock_hrtimer_base(timer, &flags);
 
 	return ret;



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 12:44 RFC: futex_wait() can DoS the tick Mike Galbraith
@ 2015-06-10 15:12 ` Thomas Gleixner
  2015-06-10 16:29   ` Mike Galbraith
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-10 15:12 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Wed, 10 Jun 2015, Mike Galbraith wrote:
> The above was handed to me by a colleague working on a Xen guest that
> livelocked.  I at first though Xen arch must have a weird problem, but
> when I tried proggy on my desktop box, while it didn't stop the tick
> completely as it did the Xen box, it slowed it to a crawl.  I noticed
> that this did not happen with newer kernels, so a bisecting I did go,
> and found that...
> 
> 279f14614 x86: apic: Use tsc deadline for oneshot when available
> 
> ..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a

This does not make any sense at all. It does not matter whether the
box uses tscdeadline or local apic timer. We do not even program the
hardware because we see that the event is in the past already.

So we raise the hrtimer softirqd, which then expires the timer. So all
what happens is that ksoftirqd accumulates runtime, but I cannot at
all see how that amounts to a DoS and brings the machine to a grinding
halt.

> Xen dom0 remains busted by that testcase whether that patch is applied
> to the host or not, even though the hypervisor supports deadline timer,
> and seemingly regardless of CPU type all together.
>
> Of all the x86_64 bare metal boxen I've tested, only those with the TSC
> deadline timer have shown the issue, and there it goes away as of v3.8
> unless you boot lapic=notscdeadline.

I just booted a SNB with lapic=notscdeadline and ran that test
program. All what happens is - as expected - that ksoftirqd runs more
than we would like it to. I cannot observe any anomality vs. local
timer interrupts at all. If I run this pinned on an otherwise idle
core, then I get ~ CONFIG_HZ interrupts per second, which is what you
expect when the cpu never reaches idle.

With the changes pending in tip/timers/core we get more timer
interrupts instead of offloading crap to ksoftirqd, but they cannot
lead to a DoS either and we do not care whether the user spends its
cycles looping in user space or firing timer interrupts. It can only
do as long as it is on the cpu.

These timers (futex, nanosleep, poll, ...) are oneshot and all timers
which are self rearming are rate limited by the fact that we only
rearm when the previous event has been consumed by the task which
scheduled it. So the scheduler controls how many of these events can
be created from user space.

> However, given any x86_64 Intel box with TSC deadline timer (ivy, sandy,
> hasbeen) can be made to exhibit the symptom, there may be other arches
> that get seriously dinged up or maybe even as thoroughly b0rked as Xen
> does when hrtimer_interrupt() is pounded into the ground by userspace.
> 
> Alternatively, should someone out there know that all bare metal is in
> fact fine post 279f14614, that person will likely also know what the Xen
> folks need to do to fix up their busted arch. 
> 
> The below targets the symptom, consider it hrtimer cluebat attractant.

By now I know to take your patches with a grain of salt :)

Some more information about your symptoms in form of configuration,
extra patches, kernel traces etc. would be appreciated.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 15:12 ` Thomas Gleixner
@ 2015-06-10 16:29   ` Mike Galbraith
  2015-06-10 18:27     ` Thomas Gleixner
  2015-06-10 17:30   ` Mike Galbraith
  2015-06-11  7:35   ` Mike Galbraith
  2 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-10 16:29 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 5721 bytes --]

On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> > The above was handed to me by a colleague working on a Xen guest that
> > livelocked.  I at first though Xen arch must have a weird problem, but
> > when I tried proggy on my desktop box, while it didn't stop the tick
> > completely as it did the Xen box, it slowed it to a crawl.  I noticed
> > that this did not happen with newer kernels, so a bisecting I did go,
> > and found that...
> > 
> > 279f14614 x86: apic: Use tsc deadline for oneshot when available
> > 
> > ..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a
> 
> This does not make any sense at all. It does not matter whether the
> box uses tscdeadline or local apic timer. We do not even program the
> hardware because we see that the event is in the past already.

Yup.

> So we raise the hrtimer softirqd, which then expires the timer. So all
> what happens is that ksoftirqd accumulates runtime, but I cannot at
> all see how that amounts to a DoS and brings the machine to a grinding
> halt.

The tick certainly appears to crawl here, and Dom0 boxen gripe if you
let them not tick at all for a while.

> I just booted a SNB with lapic=notscdeadline and ran that test
> program. All what happens is - as expected - that ksoftirqd runs more
> than we would like it to. I cannot observe any anomality vs. local
> timer interrupts at all. If I run this pinned on an otherwise idle
> core, then I get ~ CONFIG_HZ interrupts per second, which is what you
> expect when the cpu never reaches idle.

Hm.  In order to successfully bisect the thing 3.7->3.8 I ran 2xCPUS
copies because the first bisect went gaga.  I'm not having any trouble
reproducing on master with a single pinned copy though, nor did I have
any on any of the kernels either stable or enterprise I tested, and
that's quite a few.  Whatever, that first bisect did go bad.

> > The below targets the symptom, consider it hrtimer cluebat attractant.
> 
> By now I know to take your patches with a grain of salt :)

Sodium being bad for blood pressure is a medical myth.

> Some more information about your symptoms in form of configuration,
> extra patches, kernel traces etc. would be appreciated.

Virgin source or kernels with zillion+ patches, doesn't matter.  To test
virgin source earlier than EFI_STUB I had to pollute the source with
EFI backports, but nothing else.

Just a sec while I check yet again that absolutely virgin master really
really does stall....  Yup.  I pinned the tescase to CPU3..

while sleep 1; do grep LOC /proc/interrupts; done
LOC:       6706       5367       5053       6217       3031       2866       5477       3022   Local timer interrupts
LOC:       6753       5391       5074       6238       3058       2894       5576       3034   Local timer interrupts
LOC:       6791       5422       5104       6265       3066       2903       5582       3039   Local timer interrupts
LOC:       6846       5472       5154       6293       3096       2909       5595       3042   Local timer interrupts
LOC:       6855       5518       5177       6325       3199       2920       5613       3046   Local timer interrupts
LOC:       6892       5552       5217       6338       3234       2935       5637       3053   Local timer interrupts
LOC:       6983       5568       5236       6347       3244       2944       5660       3065   Local timer interrupts
LOC:       7028       5583       5251       6363       3262       2963       5673       3071   Local timer interrupts
LOC:       7217       5676       5343       6383       3305       2976       5682       3078   Local timer interrupts
LOC:       7432       5803       5418       6387       3371       3039       5757       3080   Local timer interrupts <== here
LOC:       7560       6028       5632       6394       3538       3195       5937       3084   Local timer interrupts
LOC:       7747       6135       5720       6394       3543       3262       6087       3086   Local timer interrupts
LOC:       7930       6206       5785       6394       3571       3288       6303       3087   Local timer interrupts
LOC:       8057       6299       5842       6394       3606       3346       6415       3088   Local timer interrupts
LOC:       8236       6361       5921       6394       3632       3409       6630       3090   Local timer interrupts
LOC:       8382       6448       6004       6394       3664       3478       6754       3090   Local timer interrupts
LOC:       8460       6571       6124       6394       3690       3542       6951       3092   Local timer interrupts
LOC:       8605       6670       6224       6394       3723       3614       7078       3093   Local timer interrupts
LOC:       8710       6842       6323       6394       3776       3702       7295       3123   Local timer interrupts
LOC:       8868       6947       6402       6394       3828       3784       7422       3149   Local timer interrupts
LOC:       9077       7124       6523       6394       3901       3848       7637       3172   Local timer interrupts
LOC:       9222       7189       6596       6394       3971       3928       7763       3174   Local timer interrupts
LOC:       9336       7325       6699       6394       4020       3948       7912       3176   Local timer interrupts
LOC:       9423       7414       6849       6395       4089       3979       7940       3177   Local timer interrupts
LOC:       9637       7595       6923       6395       4111       4039       7942       3179   Local timer interrupts
LOC:       9807       7734       7095       6395       4232       4108       8069       3180   Local timer interrupts
^C

Config attached.

	-Mike

[-- Attachment #2: config.xz --]
[-- Type: application/x-xz, Size: 23776 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 15:12 ` Thomas Gleixner
  2015-06-10 16:29   ` Mike Galbraith
@ 2015-06-10 17:30   ` Mike Galbraith
  2015-06-10 18:59     ` Thomas Gleixner
  2015-06-11  7:35   ` Mike Galbraith
  2 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-10 17:30 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:

> Some more information about your symptoms in form of configuration,
> extra patches, kernel traces etc. would be appreciated.

homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
homer:~/tmp # perf script | grep irq_vectors:local_timer
     ksoftirqd/3    19 [003]  5586.564238: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5586.564239: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5586.568238: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5586.568239: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5586.632218: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5586.632219: irq_vectors:local_timer_exit: vector=239
 bnc886785_livel  9817 [003]  5587.056069: irq_vectors:local_timer_entry: vector=239
 bnc886785_livel  9817 [003]  5587.056071: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5587.307978: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5587.307978: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5588.255633: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5588.255635: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5589.627132: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5589.627133: irq_vectors:local_timer_exit: vector=239
 bnc886785_livel  9817 [003]  5590.334877: irq_vectors:local_timer_entry: vector=239
 bnc886785_livel  9817 [003]  5590.334883: irq_vectors:local_timer_exit: vector=239
 bnc886785_livel  9817 [003]  5590.510815: irq_vectors:local_timer_entry: vector=239
 bnc886785_livel  9817 [003]  5590.510816: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5590.558794: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5590.558796: irq_vectors:local_timer_exit: vector=239
 bnc886785_livel  9817 [003]  5590.822707: irq_vectors:local_timer_entry: vector=239
 bnc886785_livel  9817 [003]  5590.822709: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5592.346152: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5592.346153: irq_vectors:local_timer_exit: vector=239
 bnc886785_livel  9817 [003]  5592.558074: irq_vectors:local_timer_entry: vector=239
 bnc886785_livel  9817 [003]  5592.558076: irq_vectors:local_timer_exit: vector=239
 bnc886785_livel  9817 [003]  5593.741647: irq_vectors:local_timer_entry: vector=239
 bnc886785_livel  9817 [003]  5593.741649: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5594.953208: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5594.953209: irq_vectors:local_timer_exit: vector=239
 bnc886785_livel  9817 [003]  5595.009197: irq_vectors:local_timer_entry: vector=239
 bnc886785_livel  9817 [003]  5595.009198: irq_vectors:local_timer_exit: vector=239
     ksoftirqd/3    19 [003]  5595.397049: irq_vectors:local_timer_entry: vector=239
     ksoftirqd/3    19 [003]  5595.397050: irq_vectors:local_timer_exit: vector=239
 bnc886785_livel  9817 [003]  5595.473025: irq_vectors:local_timer_entry: vector=239
 bnc886785_livel  9817 [003]  5595.473027: irq_vectors:local_timer_exit: vector=239
homer:~/tmp # 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 16:29   ` Mike Galbraith
@ 2015-06-10 18:27     ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-10 18:27 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Wed, 10 Jun 2015, Mike Galbraith wrote:
> Virgin source or kernels with zillion+ patches, doesn't matter.  To test
> virgin source earlier than EFI_STUB I had to pollute the source with
> EFI backports, but nothing else.
> 
> Just a sec while I check yet again that absolutely virgin master really
> really does stall....  Yup.  I pinned the tescase to CPU3..
> 
> LOC:       7432       5803       5418       6387       3371       3039       5757       3080   Local timer interrupts <== here
> LOC:       7560       6028       5632       6394       3538       3195       5937       3084   Local timer interrupts
> LOC:       7747       6135       5720       6394       3543       3262       6087       3086   Local timer interrupts
> LOC:       7930       6206       5785       6394       3571       3288       6303       3087   Local timer interrupts

So now it would be helpful to have a trace to see what CPU3 is
actually doing.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 17:30   ` Mike Galbraith
@ 2015-06-10 18:59     ` Thomas Gleixner
  2015-06-10 19:15       ` Steven Rostedt
  2015-06-11  2:27       ` Mike Galbraith
  0 siblings, 2 replies; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-10 18:59 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Wed, 10 Jun 2015, Mike Galbraith wrote:

> On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> 
> > Some more information about your symptoms in form of configuration,
> > extra patches, kernel traces etc. would be appreciated.
> 
> homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> homer:~/tmp # perf script | grep irq_vectors:local_timer
>      ksoftirqd/3    19 [003]  5586.564238: irq_vectors:local_timer_entry: vector=239
>      ksoftirqd/3    19 [003]  5586.564239: irq_vectors:local_timer_exit: vector=239
>      ksoftirqd/3    19 [003]  5586.568238: irq_vectors:local_timer_entry: vector=239
>      ksoftirqd/3    19 [003]  5586.568239: irq_vectors:local_timer_exit: vector=239
>      ksoftirqd/3    19 [003]  5586.632218: irq_vectors:local_timer_entry: vector=239

That gap is odd. Can you fire up the function tracer please?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 18:59     ` Thomas Gleixner
@ 2015-06-10 19:15       ` Steven Rostedt
  2015-06-10 19:32         ` Thomas Gleixner
  2015-06-11  2:27       ` Mike Galbraith
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2015-06-10 19:15 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Mike Galbraith, LKML, Peter Zijlstra, Ingo Molnar

On Wed, 10 Jun 2015 20:59:49 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> 
> > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> > 
> > > Some more information about your symptoms in form of configuration,
> > > extra patches, kernel traces etc. would be appreciated.
> > 
> > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10

I would recommend using trace-cmd especially if you add function
tracing, as it's made more for fast tracing.

-- Steve

> > [ perf record: Woken up 2 times to write data ]
> > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> > homer:~/tmp # perf script | grep irq_vectors:local_timer
> >      ksoftirqd/3    19 [003]  5586.564238: irq_vectors:local_timer_entry: vector=239
> >      ksoftirqd/3    19 [003]  5586.564239: irq_vectors:local_timer_exit: vector=239
> >      ksoftirqd/3    19 [003]  5586.568238: irq_vectors:local_timer_entry: vector=239
> >      ksoftirqd/3    19 [003]  5586.568239: irq_vectors:local_timer_exit: vector=239
> >      ksoftirqd/3    19 [003]  5586.632218: irq_vectors:local_timer_entry: vector=239
> 
> That gap is odd. Can you fire up the function tracer please?
> 
> Thanks,
> 
> 	tglx


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 19:15       ` Steven Rostedt
@ 2015-06-10 19:32         ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-10 19:32 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mike Galbraith, LKML, Peter Zijlstra, Ingo Molnar

On Wed, 10 Jun 2015, Steven Rostedt wrote:

> On Wed, 10 Jun 2015 20:59:49 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Wed, 10 Jun 2015, Mike Galbraith wrote:
> > 
> > > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> > > 
> > > > Some more information about your symptoms in form of configuration,
> > > > extra patches, kernel traces etc. would be appreciated.
> > > 
> > > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
> 
> I would recommend using trace-cmd especially if you add function
> tracing, as it's made more for fast tracing.
> 
> -- Steve
> 
> > > [ perf record: Woken up 2 times to write data ]
> > > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> > > homer:~/tmp # perf script | grep irq_vectors:local_timer
> > >      ksoftirqd/3    19 [003]  5586.564238: irq_vectors:local_timer_entry: vector=239
> > >      ksoftirqd/3    19 [003]  5586.564239: irq_vectors:local_timer_exit: vector=239
> > >      ksoftirqd/3    19 [003]  5586.568238: irq_vectors:local_timer_entry: vector=239
> > >      ksoftirqd/3    19 [003]  5586.568239: irq_vectors:local_timer_exit: vector=239
> > >      ksoftirqd/3    19 [003]  5586.632218: irq_vectors:local_timer_entry: vector=239
> > 
> > That gap is odd. Can you fire up the function tracer please?

And please enable hrtimer events as well. Output of that cpu is good enough.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 18:59     ` Thomas Gleixner
  2015-06-10 19:15       ` Steven Rostedt
@ 2015-06-11  2:27       ` Mike Galbraith
  2015-06-11  8:34         ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-11  2:27 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Wed, 2015-06-10 at 20:59 +0200, Thomas Gleixner wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> 
> > On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> > 
> > > Some more information about your symptoms in form of configuration,
> > > extra patches, kernel traces etc. would be appreciated.
> > 
> > homer:~/tmp # perf record -C 3 -eirq:*,irq_vectors:local* -- sleep 10
> > [ perf record: Woken up 2 times to write data ]
> > [ perf record: Captured and wrote 1380.507 MB perf.data (19530595 samples) ]
> > homer:~/tmp # perf script | grep irq_vectors:local_timer
> >      ksoftirqd/3    19 [003]  5586.564238: irq_vectors:local_timer_entry: vector=239
> >      ksoftirqd/3    19 [003]  5586.564239: irq_vectors:local_timer_exit: vector=239
> >      ksoftirqd/3    19 [003]  5586.568238: irq_vectors:local_timer_entry: vector=239
> >      ksoftirqd/3    19 [003]  5586.568239: irq_vectors:local_timer_exit: vector=239
> >      ksoftirqd/3    19 [003]  5586.632218: irq_vectors:local_timer_entry: vector=239
> 
> That gap is odd. Can you fire up the function tracer please?

Well, trying to do that like so...

trace-cmd record -M 8 -p function -e irq:* -e irq_vectors:local* -e timer:hrtimer*  -- sleep 1

..capture takes much more than a second...

LOC:     248161     226536      42091      38892   Local timer interrupts
LOC:     248381     226783      42092      38901   Local timer interrupts
LOC:     248668     227038      42092      38903   Local timer interrupts
LOC:     248963     227277      42092      38904   Local timer interrupts
LOC:     249214     227515      42092      38905   Local timer interrupts
LOC:     249486     227732      42092      38905   Local timer interrupts
LOC:     249720     227961      42092      38905   Local timer interrupts
LOC:     249996     228200      42092      38906   Local timer interrupts
LOC:     250215     228450      42092      38906   Local timer interrupts
LOC:     250489     228704      42092      38906   Local timer interrupts
LOC:     250729     228952      42099      38964   Local timer interrupts
LOC:     250951     229141      42100      39215   Local timer interrupts
LOC:     251168     229374      42119      39468   Local timer interrupts
LOC:     251426     229641      42131      39719   Local timer interrupts
LOC:     251652     229868      42131      39970   Local timer interrupts
LOC:     251881     230141      42132      40220   Local timer interrupts
LOC:     252131     230371      42132      40471   Local timer interrupts
LOC:     252361     230646      42133      40721   Local timer interrupts
LOC:     252611     230875      42133      40972   Local timer interrupts
LOC:     252866     231149      42137      41222   Local timer interrupts
LOC:     253103     231386      42153      41450   Local timer interrupts

...but more importantly, when it gets cranked up..

homer:~/tmp # trace-cmd report > report
homer:~/tmp # grep local_timer_entry report|wc -l
252

...it scares the problem away.

(4 cpus because I fired up cpuset shield, isolates 2-3, takes 4-7 down)

	-Mike


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-10 15:12 ` Thomas Gleixner
  2015-06-10 16:29   ` Mike Galbraith
  2015-06-10 17:30   ` Mike Galbraith
@ 2015-06-11  7:35   ` Mike Galbraith
  2015-06-11  7:43     ` Mike Galbraith
  2 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-11  7:35 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:

> I just booted a SNB with lapic=notscdeadline and ran that test
> program. All what happens is - as expected - that ksoftirqd runs more
> than we would like it to. I cannot observe any anomality vs. local
> timer interrupts at all.

Ditto on a E5-4620 box this morning, no sign of it.  E5-2670 box I had
verified "went away" when I put master on it this morning, but that
doesn't matter much, having deadline timer feature alone definitely
doesn't mean box can be convinced to reproduce the stall.  Hohum.

	-Mike



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-11  7:35   ` Mike Galbraith
@ 2015-06-11  7:43     ` Mike Galbraith
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Galbraith @ 2015-06-11  7:43 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Thu, 2015-06-11 at 09:35 +0200, Mike Galbraith wrote:
> On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> 
> > I just booted a SNB with lapic=notscdeadline and ran that test
> > program. All what happens is - as expected - that ksoftirqd runs more
> > than we would like it to. I cannot observe any anomality vs. local
> > timer interrupts at all.
> 
> Ditto on a E5-4620 box this morning, no sign of it.  E5-2670 box I had
> verified "went away"

(dropped off the net on reboot, doesn't respond to remote button poking)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-11  2:27       ` Mike Galbraith
@ 2015-06-11  8:34         ` Thomas Gleixner
  2015-06-11 11:41           ` Mike Galbraith
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-11  8:34 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Wed, 2015-06-10 at 20:59 +0200, Thomas Gleixner wrote:
> Well, trying to do that like so...
> 
> trace-cmd record -M 8 -p function -e irq:* -e irq_vectors:local* -e timer:hrtimer*  -- sleep 1

> ..capture takes much more than a second...
> 
> LOC:     248161     226536      42091      38892   Local timer interrupts
> LOC:     248381     226783      42092      38901   Local timer interrupts
> LOC:     248668     227038      42092      38903   Local timer interrupts
> LOC:     248963     227277      42092      38904   Local timer interrupts
> LOC:     249214     227515      42092      38905   Local timer interrupts
> LOC:     249486     227732      42092      38905   Local timer interrupts
> LOC:     249720     227961      42092      38905   Local timer interrupts
> 
> ...but more importantly, when it gets cranked up..
> 
> homer:~/tmp # trace-cmd report > report
> homer:~/tmp # grep local_timer_entry report|wc -l
> 252
> 
> ...it scares the problem away.

Can you try the following, please?

Enable function tracer and hrtimer events manually. Then watch the irq
count on cpu3. If it stalls or becomes slow, then stop the trace with

      echo 0 >/sys/kernel/debug/tracing/tracing_on

If the overhead of the function tracer hides the problem, then try just
with hrtimer, sched_switch and irq events.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-11  8:34         ` Thomas Gleixner
@ 2015-06-11 11:41           ` Mike Galbraith
  2015-06-11 13:13             ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-11 11:41 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Thu, 2015-06-11 at 10:34 +0200, Thomas Gleixner wrote:

> Can you try the following, please?
> 
> Enable function tracer and hrtimer events manually. Then watch the irq
> count on cpu3. If it stalls or becomes slow, then stop the trace with
> 
>       echo 0 >/sys/kernel/debug/tracing/tracing_on
> 
> If the overhead of the function tracer hides the problem, then try just
> with hrtimer, sched_switch and irq events.

Yeah, function tracer hides it.  After some fiddling with buffer size to
be able to ever see a tick, I finally have a trace trimmed with a tick
on each end, 175ms gap between them.  It's still 1.5 after xz, sending
off list.

	-Mike



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-11 11:41           ` Mike Galbraith
@ 2015-06-11 13:13             ` Thomas Gleixner
  2015-06-11 13:58               ` Mike Galbraith
  2015-06-11 18:43               ` Mike Galbraith
  0 siblings, 2 replies; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-11 13:13 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Thu, 2015-06-11 at 10:34 +0200, Thomas Gleixner wrote:
> 
> > Can you try the following, please?
> > 
> > Enable function tracer and hrtimer events manually. Then watch the irq
> > count on cpu3. If it stalls or becomes slow, then stop the trace with
> > 
> >       echo 0 >/sys/kernel/debug/tracing/tracing_on
> > 
> > If the overhead of the function tracer hides the problem, then try just
> > with hrtimer, sched_switch and irq events.
> 
> Yeah, function tracer hides it.  After some fiddling with buffer size to
> be able to ever see a tick, I finally have a trace trimmed with a tick
> on each end, 175ms gap between them.  It's still 1.5 after xz, sending
> off list.

What you are looking at is the actual timer interrupt vector, but
that's not the the tick.

# grep 'expire_entry.*tick_sched_timer' trace2 | wc -l
45

175 / 45  =~ 4 . So I you have CPNFIG_HZ=250

The kernel does not care whether the actual interrupt happens or the
timer is expired by other means.

If Xen relies on the vector, it's none of my problems ....

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-11 13:13             ` Thomas Gleixner
@ 2015-06-11 13:58               ` Mike Galbraith
  2015-06-11 18:43               ` Mike Galbraith
  1 sibling, 0 replies; 18+ messages in thread
From: Mike Galbraith @ 2015-06-11 13:58 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:

> What you are looking at is the actual timer interrupt vector, but
> that's not the the tick.
> 
> # grep 'expire_entry.*tick_sched_timer' trace2 | wc -l
> 45
> 
> 175 / 45  =~ 4 . So I you have CPNFIG_HZ=250
> 
> The kernel does not care whether the actual interrupt happens or the
> timer is expired by other means.

Learn something new every day.  I (obviously) had no idea that it's
perfectly fine for interrupts to go away when the CPU is busy.

Oh well.  Thanks for the new wrinkle in the mush between my ears.

	-Mike



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-11 13:13             ` Thomas Gleixner
  2015-06-11 13:58               ` Mike Galbraith
@ 2015-06-11 18:43               ` Mike Galbraith
  2015-06-11 18:52                 ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2015-06-11 18:43 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:

> The kernel does not care whether the actual interrupt happens or the
> timer is expired by other means.

Final question: Notabug explanation fits.. but why would the behavior
delta be so radical and repeatable w/wo deadline timer on some boxen?

	-Mike



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-11 18:43               ` Mike Galbraith
@ 2015-06-11 18:52                 ` Thomas Gleixner
  2015-06-11 18:55                   ` Mike Galbraith
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-06-11 18:52 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Thu, 11 Jun 2015, Mike Galbraith wrote:
> On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
> 
> > The kernel does not care whether the actual interrupt happens or the
> > timer is expired by other means.
> 
> Final question: Notabug explanation fits.. but why would the behavior
> delta be so radical and repeatable w/wo deadline timer on some boxen?

Good question. Cannot answer that w/o looking at traces.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: RFC: futex_wait() can DoS the tick
  2015-06-11 18:52                 ` Thomas Gleixner
@ 2015-06-11 18:55                   ` Mike Galbraith
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Galbraith @ 2015-06-11 18:55 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Steven Rostedt

On Thu, 2015-06-11 at 20:52 +0200, Thomas Gleixner wrote:
> On Thu, 11 Jun 2015, Mike Galbraith wrote:
> > On Thu, 2015-06-11 at 15:13 +0200, Thomas Gleixner wrote:
> > 
> > > The kernel does not care whether the actual interrupt happens or the
> > > timer is expired by other means.
> > 
> > Final question: Notabug explanation fits.. but why would the behavior
> > delta be so radical and repeatable w/wo deadline timer on some boxen?
> 
> Good question. Cannot answer that w/o looking at traces.

I may come back with some, I spent enough time doing circular notabug
chasing all over hell and creation to be a tad annoyed ;-)

	-Mike



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-06-11 18:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-10 12:44 RFC: futex_wait() can DoS the tick Mike Galbraith
2015-06-10 15:12 ` Thomas Gleixner
2015-06-10 16:29   ` Mike Galbraith
2015-06-10 18:27     ` Thomas Gleixner
2015-06-10 17:30   ` Mike Galbraith
2015-06-10 18:59     ` Thomas Gleixner
2015-06-10 19:15       ` Steven Rostedt
2015-06-10 19:32         ` Thomas Gleixner
2015-06-11  2:27       ` Mike Galbraith
2015-06-11  8:34         ` Thomas Gleixner
2015-06-11 11:41           ` Mike Galbraith
2015-06-11 13:13             ` Thomas Gleixner
2015-06-11 13:58               ` Mike Galbraith
2015-06-11 18:43               ` Mike Galbraith
2015-06-11 18:52                 ` Thomas Gleixner
2015-06-11 18:55                   ` Mike Galbraith
2015-06-11  7:35   ` Mike Galbraith
2015-06-11  7:43     ` Mike Galbraith

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.