netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] net_sched: allow use of hrtimer slack
@ 2020-03-16 23:02 Eric Dumazet
  2020-03-16 23:02 ` [PATCH net-next 1/3] net_sched: add qdisc_watchdog_schedule_range_ns() Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Eric Dumazet @ 2020-03-16 23:02 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Willem de Bruijn, Eric Dumazet, Eric Dumazet

Packet schedulers have used hrtimers with exact expiry times.

Some of them can afford having a slack, in order to reduce
the number of timer interrupts and feed bigger batches
to increase efficiency.

FQ for example does not care if throttled packets are
sent with an additional (small) delay.

Original observation of having maybe too many interrupts
was made by Willem de Bruijn.

Eric Dumazet (3):
  net_sched: add qdisc_watchdog_schedule_range_ns()
  net_sched: do not reprogram a timer about to expire
  net_sched: sch_fq: enable use of hrtimer slack

 include/net/pkt_sched.h        | 10 +++++++++-
 include/uapi/linux/pkt_sched.h |  2 ++
 net/sched/sch_api.c            | 21 ++++++++++++++-------
 net/sched/sch_fq.c             | 19 +++++++++++++++----
 4 files changed, 40 insertions(+), 12 deletions(-)

-- 
2.25.1.481.gfbce0eb801-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next 1/3] net_sched: add qdisc_watchdog_schedule_range_ns()
  2020-03-16 23:02 [PATCH net-next 0/3] net_sched: allow use of hrtimer slack Eric Dumazet
@ 2020-03-16 23:02 ` Eric Dumazet
  2020-03-16 23:02 ` [PATCH net-next 2/3] net_sched: do not reprogram a timer about to expire Eric Dumazet
  2020-03-16 23:02 ` [PATCH net-next 3/3] net_sched: sch_fq: enable use of hrtimer slack Eric Dumazet
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2020-03-16 23:02 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Willem de Bruijn, Eric Dumazet, Eric Dumazet

Some packet schedulers might want to add a slack
when programming hrtimers. This can reduce number
of interrupts and increase batch sizes and thus
give good xmit_more savings.

This commit adds qdisc_watchdog_schedule_range_ns()
helper, with an extra delta_ns parameter.

Legacy qdisc_watchdog_schedule_n() becomes an inline
passing a zero slack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/pkt_sched.h | 10 +++++++++-
 net/sched/sch_api.c     | 12 +++++++-----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 20d2c6419612fd4d35b0960456394ea69ced7e7d..9092e697059e775af307be69a879386ebfd9924f 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -75,7 +75,15 @@ struct qdisc_watchdog {
 void qdisc_watchdog_init_clockid(struct qdisc_watchdog *wd, struct Qdisc *qdisc,
 				 clockid_t clockid);
 void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc);
-void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires);
+
+void qdisc_watchdog_schedule_range_ns(struct qdisc_watchdog *wd, u64 expires,
+				      u64 delta_ns);
+
+static inline void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd,
+					      u64 expires)
+{
+	return qdisc_watchdog_schedule_range_ns(wd, expires, 0ULL);
+}
 
 static inline void qdisc_watchdog_schedule(struct qdisc_watchdog *wd,
 					   psched_time_t expires)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 50794125bf0240031c142045d09b429cf945029f..83984be04f57d93b4efc50bb9cf390b116101fdd 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -618,7 +618,8 @@ void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_watchdog_init);
 
-void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires)
+void qdisc_watchdog_schedule_range_ns(struct qdisc_watchdog *wd, u64 expires,
+				      u64 delta_ns)
 {
 	if (test_bit(__QDISC_STATE_DEACTIVATED,
 		     &qdisc_root_sleeping(wd->qdisc)->state))
@@ -628,11 +629,12 @@ void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires)
 		return;
 
 	wd->last_expires = expires;
-	hrtimer_start(&wd->timer,
-		      ns_to_ktime(expires),
-		      HRTIMER_MODE_ABS_PINNED);
+	hrtimer_start_range_ns(&wd->timer,
+			       ns_to_ktime(expires),
+			       delta_ns,
+			       HRTIMER_MODE_ABS_PINNED);
 }
-EXPORT_SYMBOL(qdisc_watchdog_schedule_ns);
+EXPORT_SYMBOL(qdisc_watchdog_schedule_range_ns);
 
 void qdisc_watchdog_cancel(struct qdisc_watchdog *wd)
 {
-- 
2.25.1.481.gfbce0eb801-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 2/3] net_sched: do not reprogram a timer about to expire
  2020-03-16 23:02 [PATCH net-next 0/3] net_sched: allow use of hrtimer slack Eric Dumazet
  2020-03-16 23:02 ` [PATCH net-next 1/3] net_sched: add qdisc_watchdog_schedule_range_ns() Eric Dumazet
@ 2020-03-16 23:02 ` Eric Dumazet
  2020-03-16 23:02 ` [PATCH net-next 3/3] net_sched: sch_fq: enable use of hrtimer slack Eric Dumazet
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2020-03-16 23:02 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Willem de Bruijn, Eric Dumazet, Eric Dumazet

qdisc_watchdog_schedule_range_ns() can use the newly added slack
and avoid rearming the hrtimer a bit earlier than the current
value. This patch has no effect if delta_ns parameter
is zero.

Note that this means the max slack is potentially doubled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_api.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 83984be04f57d93b4efc50bb9cf390b116101fdd..0d99df1e764db812f5dfc78a9c54832c0f676f70 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -625,8 +625,13 @@ void qdisc_watchdog_schedule_range_ns(struct qdisc_watchdog *wd, u64 expires,
 		     &qdisc_root_sleeping(wd->qdisc)->state))
 		return;
 
-	if (wd->last_expires == expires)
-		return;
+	if (hrtimer_is_queued(&wd->timer)) {
+		/* If timer is already set in [expires, expires + delta_ns],
+		 * do not reprogram it.
+		 */
+		if (wd->last_expires - expires <= delta_ns)
+			return;
+	}
 
 	wd->last_expires = expires;
 	hrtimer_start_range_ns(&wd->timer,
-- 
2.25.1.481.gfbce0eb801-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 3/3] net_sched: sch_fq: enable use of hrtimer slack
  2020-03-16 23:02 [PATCH net-next 0/3] net_sched: allow use of hrtimer slack Eric Dumazet
  2020-03-16 23:02 ` [PATCH net-next 1/3] net_sched: add qdisc_watchdog_schedule_range_ns() Eric Dumazet
  2020-03-16 23:02 ` [PATCH net-next 2/3] net_sched: do not reprogram a timer about to expire Eric Dumazet
@ 2020-03-16 23:02 ` Eric Dumazet
  2020-03-16 23:16   ` Jakub Kicinski
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2020-03-16 23:02 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Willem de Bruijn, Eric Dumazet, Eric Dumazet

Add a new attribute to control the fq qdisc hrtimer slack.

Default is set to 10 usec.

When/if packets are throttled, fq set up an hrtimer that can
lead to one interrupt per packet in the throttled queue.

By using a timer slack, we allow better use of timer interrupts,
by giving them a chance to call multiple timer callbacks
at each hardware interrupt.

Also, giving a slack allows FQ to dequeue batches of packets
instead of a single one, thus increasing xmit_more efficiency.

This has no negative effect on the rate a TCP flow can sustain,
since each TCP flow maintains its own precise vtime (tp->tcp_wstamp_ns)

Tested:
 1000 concurrent flows all using paced packets.
 1,000,000 packets sent per second.

Before the patch :

$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 60726784  23628 3485992    0    0   138     1  977  535  0 12 87  0  0
 0  0      0 60714700  23628 3485628    0    0     0     0 1568827 26462  0 22 78  0  0
 1  0      0 60716012  23628 3485656    0    0     0     0 1570034 26216  0 22 78  0  0
 0  0      0 60722420  23628 3485492    0    0     0     0 1567230 26424  0 22 78  0  0
 0  0      0 60727484  23628 3485556    0    0     0     0 1568220 26200  0 22 78  0  0
 2  0      0 60718900  23628 3485380    0    0     0    40 1564721 26630  0 22 78  0  0
 2  0      0 60718096  23628 3485332    0    0     0     0 1562593 26432  0 22 78  0  0
 0  0      0 60719608  23628 3485064    0    0     0     0 1563806 26238  0 22 78  0  0
 1  0      0 60722876  23628 3485236    0    0     0   130 1565874 26566  0 22 78  0  0
 1  0      0 60722752  23628 3484908    0    0     0     0 1567646 26247  0 22 78  0  0

After the patch, slack of 10 usec, we can see a reduction of interrupts
per second, and a small decrease of reported cpu usage.

$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 60722564  23628 3484728    0    0   133     1  696  545  0 13 87  0  0
 1  0      0 60722568  23628 3484824    0    0     0     0 977278 25469  0 20 80  0  0
 0  0      0 60716396  23628 3484764    0    0     0     0 979997 25326  0 20 80  0  0
 0  0      0 60713844  23628 3484960    0    0     0     0 981394 25249  0 20 80  0  0
 2  0      0 60720468  23628 3484916    0    0     0     0 982860 25062  0 20 80  0  0
 1  0      0 60721236  23628 3484856    0    0     0     0 982867 25100  0 20 80  0  0
 1  0      0 60722400  23628 3484456    0    0     0     8 982698 25303  0 20 80  0  0
 0  0      0 60715396  23628 3484428    0    0     0     0 981777 25176  0 20 80  0  0
 0  0      0 60716520  23628 3486544    0    0     0    36 978965 27857  0 21 79  0  0
 0  0      0 60719592  23628 3486516    0    0     0    22 977318 25106  0 20 80  0  0

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/uapi/linux/pkt_sched.h |  2 ++
 net/sched/sch_fq.c             | 19 +++++++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index ea39287d59c812da7fc94379de41c2ba20e86557..7307a29a103e5191acd829c745860b637721be30 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -911,6 +911,8 @@ enum {
 
 	TCA_FQ_CE_THRESHOLD,	/* DCTCP-like CE-marking threshold */
 
+	TCA_FQ_TIMER_SLACK,	/* timer slack */
+
 	__TCA_FQ_MAX
 };
 
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 371ad84def3b6f1b5f0d0704b67723d0a8af22c6..9894577689faad16ed13c73725fb7da9b128bce9 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -121,6 +121,8 @@ struct fq_sched_data {
 	u64		stat_flows_plimit;
 	u64		stat_pkts_too_long;
 	u64		stat_allocation_errors;
+
+	u32		timer_slack; /* hrtimer slack in ns */
 	struct qdisc_watchdog watchdog;
 };
 
@@ -504,8 +506,9 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
 		head = &q->old_flows;
 		if (!head->first) {
 			if (q->time_next_delayed_flow != ~0ULL)
-				qdisc_watchdog_schedule_ns(&q->watchdog,
-							   q->time_next_delayed_flow);
+				qdisc_watchdog_schedule_range_ns(&q->watchdog,
+							q->time_next_delayed_flow,
+							q->timer_slack);
 			return NULL;
 		}
 	}
@@ -747,6 +750,7 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = {
 	[TCA_FQ_ORPHAN_MASK]		= { .type = NLA_U32 },
 	[TCA_FQ_LOW_RATE_THRESHOLD]	= { .type = NLA_U32 },
 	[TCA_FQ_CE_THRESHOLD]		= { .type = NLA_U32 },
+	[TCA_FQ_TIMER_SLACK]		= { .type = NLA_U32 },
 };
 
 static int fq_change(struct Qdisc *sch, struct nlattr *opt,
@@ -833,6 +837,9 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
 		q->ce_threshold = (u64)NSEC_PER_USEC *
 				  nla_get_u32(tb[TCA_FQ_CE_THRESHOLD]);
 
+	if (tb[TCA_FQ_TIMER_SLACK])
+		q->timer_slack = nla_get_u32(tb[TCA_FQ_TIMER_SLACK]);
+
 	if (!err) {
 		sch_tree_unlock(sch);
 		err = fq_resize(sch, fq_log);
@@ -884,6 +891,8 @@ static int fq_init(struct Qdisc *sch, struct nlattr *opt,
 	q->orphan_mask		= 1024 - 1;
 	q->low_rate_threshold	= 550000 / 8;
 
+	q->timer_slack = 10 * NSEC_PER_USEC; /* 10 usec of hrtimer slack */
+
 	/* Default ce_threshold of 4294 seconds */
 	q->ce_threshold		= (u64)NSEC_PER_USEC * ~0U;
 
@@ -924,7 +933,8 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    nla_put_u32(skb, TCA_FQ_LOW_RATE_THRESHOLD,
 			q->low_rate_threshold) ||
 	    nla_put_u32(skb, TCA_FQ_CE_THRESHOLD, (u32)ce_threshold) ||
-	    nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log))
+	    nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log) ||
+	    nla_put_u32(skb, TCA_FQ_TIMER_SLACK, q->timer_slack))
 		goto nla_put_failure;
 
 	return nla_nest_end(skb, opts);
@@ -947,7 +957,8 @@ static int fq_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 	st.flows_plimit		  = q->stat_flows_plimit;
 	st.pkts_too_long	  = q->stat_pkts_too_long;
 	st.allocation_errors	  = q->stat_allocation_errors;
-	st.time_next_delayed_flow = q->time_next_delayed_flow - ktime_get_ns();
+	st.time_next_delayed_flow = q->time_next_delayed_flow + q->timer_slack -
+				    ktime_get_ns();
 	st.flows		  = q->flows;
 	st.inactive_flows	  = q->inactive_flows;
 	st.throttled_flows	  = q->throttled_flows;
-- 
2.25.1.481.gfbce0eb801-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 3/3] net_sched: sch_fq: enable use of hrtimer slack
  2020-03-16 23:02 ` [PATCH net-next 3/3] net_sched: sch_fq: enable use of hrtimer slack Eric Dumazet
@ 2020-03-16 23:16   ` Jakub Kicinski
  2020-03-17  0:25     ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2020-03-16 23:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Willem de Bruijn, Eric Dumazet

On Mon, 16 Mar 2020 16:02:23 -0700 Eric Dumazet wrote:
> @@ -747,6 +750,7 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = {

nit: since nothing has been added to fq_policy since we introduced
     strict netlink checking please consider adding:

	[TCA_FQ_UNSPEC] = { .strict_start_type = TCA_FQ_TIMER_SLACK },

>  	[TCA_FQ_ORPHAN_MASK]		= { .type = NLA_U32 },
>  	[TCA_FQ_LOW_RATE_THRESHOLD]	= { .type = NLA_U32 },
>  	[TCA_FQ_CE_THRESHOLD]		= { .type = NLA_U32 },
> +	[TCA_FQ_TIMER_SLACK]		= { .type = NLA_U32 },


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 3/3] net_sched: sch_fq: enable use of hrtimer slack
  2020-03-16 23:16   ` Jakub Kicinski
@ 2020-03-17  0:25     ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2020-03-17  0:25 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David S . Miller, netdev, Willem de Bruijn, Eric Dumazet

On Mon, Mar 16, 2020 at 4:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 16 Mar 2020 16:02:23 -0700 Eric Dumazet wrote:
> > @@ -747,6 +750,7 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = {
>
> nit: since nothing has been added to fq_policy since we introduced
>      strict netlink checking please consider adding:
>
>         [TCA_FQ_UNSPEC] = { .strict_start_type = TCA_FQ_TIMER_SLACK },
>

Thanks for the hint, I will add this in v2.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-03-17  0:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-16 23:02 [PATCH net-next 0/3] net_sched: allow use of hrtimer slack Eric Dumazet
2020-03-16 23:02 ` [PATCH net-next 1/3] net_sched: add qdisc_watchdog_schedule_range_ns() Eric Dumazet
2020-03-16 23:02 ` [PATCH net-next 2/3] net_sched: do not reprogram a timer about to expire Eric Dumazet
2020-03-16 23:02 ` [PATCH net-next 3/3] net_sched: sch_fq: enable use of hrtimer slack Eric Dumazet
2020-03-16 23:16   ` Jakub Kicinski
2020-03-17  0:25     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).