netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] net: sched: use pinned timers
@ 2014-09-21  1:01 Eric Dumazet
  2014-09-26  4:27 ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2014-09-21  1:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

While using a MQ + NETEM setup, I had confirmation that the default
timer migration ( /proc/sys/kernel/timer_migration ) is killing us.

Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
queues) :

EST="est 1sec 4sec"
for ETH in eth1
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
 tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
 tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
 tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
 tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
 tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
 tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
 tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
done

We can see that timers get migrated into a single cpu, presumably idle
at the time timers are set up.
Then all qdisc dequeues run from this cpu and huge lock contention
happens. This single cpu is stuck in softirq mode and cannot dequeue
fast enough.

    39.24%  [kernel]          [k] _raw_spin_lock
     2.65%  [kernel]          [k] netem_enqueue
     1.80%  [kernel]          [k] netem_dequeue                         
     1.63%  [kernel]          [k] copy_user_enhanced_fast_string
     1.45%  [kernel]          [k] _raw_spin_lock_bh        

By pinning qdisc timers on the cpu running the qdisc, we respect proper
XPS setting and remove this lock contention.

     5.84%  [kernel]          [k] netem_enqueue                      
     4.83%  [kernel]          [k] _raw_spin_lock
     2.92%  [kernel]          [k] copy_user_enhanced_fast_string

Current Qdiscs that benefit from this change are :

	netem, cbq, fq, hfsc, tbf, htb.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_api.c |    4 ++--
 net/sched/sch_cbq.c |    4 ++--
 net/sched/sch_htb.c |    2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ca6248345937..15e7beee266c 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -586,7 +586,7 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
 
 void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc)
 {
-	hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
 	wd->timer.function = qdisc_watchdog;
 	wd->qdisc = qdisc;
 }
@@ -602,7 +602,7 @@ void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires)
 
 	hrtimer_start(&wd->timer,
 		      ns_to_ktime(expires),
-		      HRTIMER_MODE_ABS);
+		      HRTIMER_MODE_ABS_PINNED);
 }
 EXPORT_SYMBOL(qdisc_watchdog_schedule_ns);
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index a3244a800501..d2cd981ba60d 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -617,7 +617,7 @@ static enum hrtimer_restart cbq_undelay(struct hrtimer *timer)
 
 		time = ktime_set(0, 0);
 		time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay));
-		hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS);
+		hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS_PINNED);
 	}
 
 	qdisc_unthrottled(sch);
@@ -1386,7 +1386,7 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt)
 	q->link.minidle = -0x7FFFFFFF;
 
 	qdisc_watchdog_init(&q->watchdog, sch);
-	hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
 	q->delay_timer.function = cbq_undelay;
 	q->toplevel = TC_CBQ_MAXLEVEL;
 	q->now = psched_get_time();
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 14408f262143..063e953d9848 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -932,7 +932,7 @@ ok:
 			ktime_t time = ns_to_ktime(next_event);
 			qdisc_throttled(q->watchdog.qdisc);
 			hrtimer_start(&q->watchdog.timer, time,
-				      HRTIMER_MODE_ABS);
+				      HRTIMER_MODE_ABS_PINNED);
 		}
 	} else {
 		schedule_work(&q->work);

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: sched: use pinned timers
  2014-09-21  1:01 [PATCH net-next] net: sched: use pinned timers Eric Dumazet
@ 2014-09-26  4:27 ` David Miller
  2015-01-07 10:06   ` Cosmin GIRADU
  0 siblings, 1 reply; 4+ messages in thread
From: David Miller @ 2014-09-26  4:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 20 Sep 2014 18:01:30 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> While using a MQ + NETEM setup, I had confirmation that the default
> timer migration ( /proc/sys/kernel/timer_migration ) is killing us.
> 
> Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
> queues) :
 ...
> We can see that timers get migrated into a single cpu, presumably idle
> at the time timers are set up.
> Then all qdisc dequeues run from this cpu and huge lock contention
> happens. This single cpu is stuck in softirq mode and cannot dequeue
> fast enough.
> 
>     39.24%  [kernel]          [k] _raw_spin_lock
>      2.65%  [kernel]          [k] netem_enqueue
>      1.80%  [kernel]          [k] netem_dequeue                         
>      1.63%  [kernel]          [k] copy_user_enhanced_fast_string
>      1.45%  [kernel]          [k] _raw_spin_lock_bh        
> 
> By pinning qdisc timers on the cpu running the qdisc, we respect proper
> XPS setting and remove this lock contention.
> 
>      5.84%  [kernel]          [k] netem_enqueue                      
>      4.83%  [kernel]          [k] _raw_spin_lock
>      2.92%  [kernel]          [k] copy_user_enhanced_fast_string
> 
> Current Qdiscs that benefit from this change are :
> 
> 	netem, cbq, fq, hfsc, tbf, htb.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Looks great, applied, thanks Eric.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: sched: use pinned timers
  2014-09-26  4:27 ` David Miller
@ 2015-01-07 10:06   ` Cosmin GIRADU
  2015-01-07 15:45     ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Cosmin GIRADU @ 2015-01-07 10:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1786 bytes --]

On 26/09/14 07:27, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 20 Sep 2014 18:01:30 -0700
>
>> From: Eric Dumazet <edumazet@google.com>
>>
>> While using a MQ + NETEM setup, I had confirmation that the default
>> timer migration ( /proc/sys/kernel/timer_migration ) is killing us.
>>
>> Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
>> queues) :
>  ...
>> Current Qdiscs that benefit from this change are :
>>
>> 	netem, cbq, fq, hfsc, tbf, htb.
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Looks great, applied, thanks Eric.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Hi Eric,

    I saw that this patch didn't make it's way into the stable branches.
So I have two questions:
        - Would it be safe to apply to linux-3.12.x stable?
        - If yes, would there be any [noticeable] efects on a [pretty
complex] HTB setup? (I know, test and I'll see,
           but if theory sais I won't, then there would be no point to
the test, would there?)

Thank you!

-- 

Cosmin GIRADU
OSS Engineer
RCS & RDS - Unified Services
Phone:  +40-31-400-6323
Mobile: +40-77-020-0858
http://www.rcs-rds.ro

..........................................................................
Privileged/Confidential Information may be contained in this message. If
you are not the addressee indicated in this message (or responsible for
delivery of the message to such person), you may not copy or deliver
this message to anyone. In such a case, you should destroy this message
and kindly notify the sender by reply e-mail.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: sched: use pinned timers
  2015-01-07 10:06   ` Cosmin GIRADU
@ 2015-01-07 15:45     ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2015-01-07 15:45 UTC (permalink / raw)
  To: Cosmin GIRADU; +Cc: netdev

On Wed, 2015-01-07 at 12:06 +0200, Cosmin GIRADU wrote:

> Hi Eric,
> 
>     I saw that this patch didn't make it's way into the stable branches.
> So I have two questions:
>         - Would it be safe to apply to linux-3.12.x stable?
>         - If yes, would there be any [noticeable] efects on a [pretty
> complex] HTB setup? (I know, test and I'll see,
>            but if theory sais I won't, then there would be no point to
> the test, would there?)
> 

It is safe to backport.

If you want to test an equivalent, without kernel patch, you simply can
do :

echo 0 >/proc/sys/kernel/timer_migration

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-01-07 15:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-21  1:01 [PATCH net-next] net: sched: use pinned timers Eric Dumazet
2014-09-26  4:27 ` David Miller
2015-01-07 10:06   ` Cosmin GIRADU
2015-01-07 15:45     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).