All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033
@ 2019-02-25 19:09 Leslie Monis
  2019-02-25 19:09 ` [PATCH net-next v3 1/7] net: sched: pie: change value of QUEUE_THRESHOLD Leslie Monis
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Leslie Monis @ 2019-02-25 19:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, Leslie Monis, Mohit P . Tahiliani, Dave Taht, Jamal Hadi Salim

The current implementation of the PIE queuing discipline is according to the
IETF draft [http://tools.ietf.org/html/draft-pan-aqm-pie-00] and the paper
[PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem].
However, a lot of necessary modifications and enhancements have been proposed
in RFC 8033, which have not yet been incorporated in the source code of Linux.
This patch series helps in achieving the same.

Performance tests carried out using Flent [https://flent.org/]

Changes from v2 to v3:
  - Used div_u64() instead of direct division after explicit type casting as
    recommended by David

Changes from v1 to v2:
  - Excluded the patch setting PIE dynamically active/inactive as the test
    results were unsatisfactory
  - Fixed a scaling issue when adding more auto-tuning cases which caused
    local variables to underflow
  - Changed the long if/else chain to a loop as suggested by Stephen
  - Changed the position of the accu_prob variable in the pie_vars
    structure as recommended by Stephen

Mohit P. Tahiliani (7):
  net: sched: pie: change value of QUEUE_THRESHOLD
  net: sched: pie: change default value of pie_params->target
  net: sched: pie: change default value of pie_params->tupdate
  net: sched: pie: change initial value of pie_vars->burst_time
  net: sched: pie: add more cases to auto-tune alpha and beta
  net: sched: pie: add derandomization mechanism
  net: sched: pie: update references

 include/uapi/linux/pkt_sched.h |   2 +-
 net/sched/sch_pie.c            | 107 ++++++++++++++++++++-------------
 2 files changed, 66 insertions(+), 43 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 1/7] net: sched: pie: change value of QUEUE_THRESHOLD
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
@ 2019-02-25 19:09 ` Leslie Monis
  2019-02-25 19:09 ` [PATCH net-next v3 2/7] net: sched: pie: change default value of pie_params->target Leslie Monis
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Leslie Monis @ 2019-02-25 19:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, Mohit P. Tahiliani, Dave Taht, Jamal Hadi Salim,
	Dhaval Khandla, Hrishikesh Hiraskar, Manish Kumar B,
	Sachin D . Patil, Leslie Monis

From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>

RFC 8033 recommends a value of 16384 bytes for the queue
threshold.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/sch_pie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index d1429371592f..7778eff6cdb7 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -31,7 +31,7 @@
 #include <net/pkt_sched.h>
 #include <net/inet_ecn.h>
 
-#define QUEUE_THRESHOLD 10000
+#define QUEUE_THRESHOLD 16384
 #define DQCOUNT_INVALID -1
 #define MAX_PROB  0xffffffff
 #define PIE_SCALE 8
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 2/7] net: sched: pie: change default value of pie_params->target
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
  2019-02-25 19:09 ` [PATCH net-next v3 1/7] net: sched: pie: change value of QUEUE_THRESHOLD Leslie Monis
@ 2019-02-25 19:09 ` Leslie Monis
  2019-02-25 19:09 ` [PATCH net-next v3 3/7] net: sched: pie: change default value of pie_params->tupdate Leslie Monis
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Leslie Monis @ 2019-02-25 19:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, Mohit P. Tahiliani, Dave Taht, Jamal Hadi Salim,
	Dhaval Khandla, Hrishikesh Hiraskar, Manish Kumar B,
	Sachin D . Patil, Leslie Monis

From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>

RFC 8033 suggests a default value of 15 milliseconds for the
target queue delay.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/sch_pie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 7778eff6cdb7..91af9bf19852 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -83,7 +83,7 @@ static void pie_params_init(struct pie_params *params)
 	params->beta = 20;
 	params->tupdate = usecs_to_jiffies(30 * USEC_PER_MSEC);	/* 30 ms */
 	params->limit = 1000;	/* default of 1000 packets */
-	params->target = PSCHED_NS2TICKS(20 * NSEC_PER_MSEC);	/* 20 ms */
+	params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC);	/* 15 ms */
 	params->ecn = false;
 	params->bytemode = false;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 3/7] net: sched: pie: change default value of pie_params->tupdate
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
  2019-02-25 19:09 ` [PATCH net-next v3 1/7] net: sched: pie: change value of QUEUE_THRESHOLD Leslie Monis
  2019-02-25 19:09 ` [PATCH net-next v3 2/7] net: sched: pie: change default value of pie_params->target Leslie Monis
@ 2019-02-25 19:09 ` Leslie Monis
  2019-02-25 19:09 ` [PATCH net-next v3 4/7] net: sched: pie: change initial value of pie_vars->burst_time Leslie Monis
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Leslie Monis @ 2019-02-25 19:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, Mohit P. Tahiliani, Dave Taht, Jamal Hadi Salim,
	Dhaval Khandla, Hrishikesh Hiraskar, Manish Kumar B,
	Sachin D . Patil, Leslie Monis

From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>

RFC 8033 suggests a default value of 15 milliseconds for the
update interval.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/sch_pie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 91af9bf19852..702f75afc312 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -81,7 +81,7 @@ static void pie_params_init(struct pie_params *params)
 {
 	params->alpha = 2;
 	params->beta = 20;
-	params->tupdate = usecs_to_jiffies(30 * USEC_PER_MSEC);	/* 30 ms */
+	params->tupdate = usecs_to_jiffies(15 * USEC_PER_MSEC);	/* 15 ms */
 	params->limit = 1000;	/* default of 1000 packets */
 	params->target = PSCHED_NS2TICKS(15 * NSEC_PER_MSEC);	/* 15 ms */
 	params->ecn = false;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 4/7] net: sched: pie: change initial value of pie_vars->burst_time
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
                   ` (2 preceding siblings ...)
  2019-02-25 19:09 ` [PATCH net-next v3 3/7] net: sched: pie: change default value of pie_params->tupdate Leslie Monis
@ 2019-02-25 19:09 ` Leslie Monis
  2019-02-25 19:09 ` [PATCH net-next v3 5/7] net: sched: pie: add more cases to auto-tune alpha and beta Leslie Monis
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Leslie Monis @ 2019-02-25 19:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, Mohit P. Tahiliani, Dave Taht, Jamal Hadi Salim,
	Dhaval Khandla, Hrishikesh Hiraskar, Manish Kumar B,
	Sachin D . Patil, Leslie Monis

From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>

RFC 8033 suggests an initial value of 150 milliseconds for
the maximum time allowed for a burst of packets.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/sch_pie.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 702f75afc312..d88ab53593b3 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -92,8 +92,8 @@ static void pie_vars_init(struct pie_vars *vars)
 {
 	vars->dq_count = DQCOUNT_INVALID;
 	vars->avg_dq_rate = 0;
-	/* default of 100 ms in pschedtime */
-	vars->burst_time = PSCHED_NS2TICKS(100 * NSEC_PER_MSEC);
+	/* default of 150 ms in pschedtime */
+	vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
 }
 
 static bool drop_early(struct Qdisc *sch, u32 packet_size)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 5/7] net: sched: pie: add more cases to auto-tune alpha and beta
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
                   ` (3 preceding siblings ...)
  2019-02-25 19:09 ` [PATCH net-next v3 4/7] net: sched: pie: change initial value of pie_vars->burst_time Leslie Monis
@ 2019-02-25 19:09 ` Leslie Monis
  2019-02-25 19:10 ` [PATCH net-next v3 6/7] net: sched: pie: add derandomization mechanism Leslie Monis
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Leslie Monis @ 2019-02-25 19:09 UTC (permalink / raw)
  To: davem
  Cc: netdev, Mohit P. Tahiliani, Dave Taht, Jamal Hadi Salim,
	Dhaval Khandla, Hrishikesh Hiraskar, Manish Kumar B,
	Sachin D . Patil, Leslie Monis

From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>

The current implementation scales the local alpha and beta
variables in the calculate_probability function by the same
amount for all values of drop probability below 1%.

RFC 8033 suggests using additional cases for auto-tuning
alpha and beta when the drop probability is less than 1%.

In order to add more auto-tuning cases, MAX_PROB must be
scaled by u64 instead of u32 to prevent underflow when
scaling the local alpha and beta variables in the
calculate_probability function.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/uapi/linux/pkt_sched.h |  2 +-
 net/sched/sch_pie.c            | 65 +++++++++++++++++-----------------
 2 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 0d18b1d1fbbc..1eb572ef3f27 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -954,7 +954,7 @@ enum {
 #define TCA_PIE_MAX   (__TCA_PIE_MAX - 1)
 
 struct tc_pie_xstats {
-	__u32 prob;             /* current probability */
+	__u64 prob;             /* current probability */
 	__u32 delay;            /* current delay in ms */
 	__u32 avg_dq_rate;      /* current average dq_rate in bits/pie_time */
 	__u32 packets_in;       /* total number of packets enqueued */
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index d88ab53593b3..30f158582499 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -33,7 +33,7 @@
 
 #define QUEUE_THRESHOLD 16384
 #define DQCOUNT_INVALID -1
-#define MAX_PROB  0xffffffff
+#define MAX_PROB 0xffffffffffffffff
 #define PIE_SCALE 8
 
 /* parameters used */
@@ -49,7 +49,7 @@ struct pie_params {
 
 /* variables used */
 struct pie_vars {
-	u32 prob;		/* probability but scaled by u32 limit. */
+	u64 prob;		/* probability but scaled by u64 limit. */
 	psched_time_t burst_time;
 	psched_time_t qdelay;
 	psched_time_t qdelay_old;
@@ -99,8 +99,8 @@ static void pie_vars_init(struct pie_vars *vars)
 static bool drop_early(struct Qdisc *sch, u32 packet_size)
 {
 	struct pie_sched_data *q = qdisc_priv(sch);
-	u32 rnd;
-	u32 local_prob = q->vars.prob;
+	u64 rnd;
+	u64 local_prob = q->vars.prob;
 	u32 mtu = psched_mtu(qdisc_dev(sch));
 
 	/* If there is still burst allowance left skip random early drop */
@@ -124,11 +124,11 @@ static bool drop_early(struct Qdisc *sch, u32 packet_size)
 	 * probablity. Smaller packets will have lower drop prob in this case
 	 */
 	if (q->params.bytemode && packet_size <= mtu)
-		local_prob = (local_prob / mtu) * packet_size;
+		local_prob = (u64)packet_size * div_u64(local_prob, mtu);
 	else
 		local_prob = q->vars.prob;
 
-	rnd = prandom_u32();
+	prandom_bytes(&rnd, 8);
 	if (rnd < local_prob)
 		return true;
 
@@ -317,9 +317,10 @@ static void calculate_probability(struct Qdisc *sch)
 	u32 qlen = sch->qstats.backlog;	/* queue size in bytes */
 	psched_time_t qdelay = 0;	/* in pschedtime */
 	psched_time_t qdelay_old = q->vars.qdelay;	/* in pschedtime */
-	s32 delta = 0;		/* determines the change in probability */
-	u32 oldprob;
-	u32 alpha, beta;
+	s64 delta = 0;		/* determines the change in probability */
+	u64 oldprob;
+	u64 alpha, beta;
+	u32 power;
 	bool update_prob = true;
 
 	q->vars.qdelay_old = q->vars.qdelay;
@@ -339,38 +340,36 @@ static void calculate_probability(struct Qdisc *sch)
 	 * value for alpha as 0.125. In this implementation, we use values 0-32
 	 * passed from user space to represent this. Also, alpha and beta have
 	 * unit of HZ and need to be scaled before they can used to update
-	 * probability. alpha/beta are updated locally below by 1) scaling them
-	 * appropriately 2) scaling down by 16 to come to 0-2 range.
-	 * Please see paper for details.
-	 *
-	 * We scale alpha and beta differently depending on whether we are in
-	 * light, medium or high dropping mode.
+	 * probability. alpha/beta are updated locally below by scaling down
+	 * by 16 to come to 0-2 range.
 	 */
-	if (q->vars.prob < MAX_PROB / 100) {
-		alpha =
-		    (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 7;
-		beta =
-		    (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 7;
-	} else if (q->vars.prob < MAX_PROB / 10) {
-		alpha =
-		    (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 5;
-		beta =
-		    (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 5;
-	} else {
-		alpha =
-		    (q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
-		beta =
-		    (q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
+	alpha = ((u64)q->params.alpha * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
+	beta = ((u64)q->params.beta * (MAX_PROB / PSCHED_TICKS_PER_SEC)) >> 4;
+
+	/* We scale alpha and beta differently depending on how heavy the
+	 * congestion is. Please see RFC 8033 for details.
+	 */
+	if (q->vars.prob < MAX_PROB / 10) {
+		alpha >>= 1;
+		beta >>= 1;
+
+		power = 100;
+		while (q->vars.prob < div_u64(MAX_PROB, power) &&
+		       power <= 1000000) {
+			alpha >>= 2;
+			beta >>= 2;
+			power *= 10;
+		}
 	}
 
 	/* alpha and beta should be between 0 and 32, in multiples of 1/16 */
-	delta += alpha * ((qdelay - q->params.target));
-	delta += beta * ((qdelay - qdelay_old));
+	delta += alpha * (u64)(qdelay - q->params.target);
+	delta += beta * (u64)(qdelay - qdelay_old);
 
 	oldprob = q->vars.prob;
 
 	/* to ensure we increase probability in steps of no more than 2% */
-	if (delta > (s32)(MAX_PROB / (100 / 2)) &&
+	if (delta > (s64)(MAX_PROB / (100 / 2)) &&
 	    q->vars.prob >= MAX_PROB / 10)
 		delta = (MAX_PROB / 100) * 2;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 6/7] net: sched: pie: add derandomization mechanism
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
                   ` (4 preceding siblings ...)
  2019-02-25 19:09 ` [PATCH net-next v3 5/7] net: sched: pie: add more cases to auto-tune alpha and beta Leslie Monis
@ 2019-02-25 19:10 ` Leslie Monis
  2019-02-25 19:10 ` [PATCH net-next v3 7/7] net: sched: pie: update references Leslie Monis
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Leslie Monis @ 2019-02-25 19:10 UTC (permalink / raw)
  To: davem
  Cc: netdev, Mohit P. Tahiliani, Dave Taht, Jamal Hadi Salim,
	Dhaval Khandla, Hrishikesh Hiraskar, Manish Kumar B,
	Sachin D . Patil, Leslie Monis

From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>

Random dropping of packets to achieve latency control may
introduce outlier situations where packets are dropped too
close to each other or too far from each other. This can
cause the real drop percentage to temporarily deviate from
the intended drop probability. In certain scenarios, such
as a small number of simultaneous TCP flows, these
deviations can cause significant deviations in link
utilization and queuing latency.

RFC 8033 suggests using a derandomization mechanism to avoid
these deviations.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/sch_pie.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 30f158582499..916b878d3491 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -55,8 +55,10 @@ struct pie_vars {
 	psched_time_t qdelay_old;
 	u64 dq_count;		/* measured in bytes */
 	psched_time_t dq_tstamp;	/* drain rate */
+	u64 accu_prob;		/* accumulated drop probability */
 	u32 avg_dq_rate;	/* bytes per pschedtime tick,scaled */
 	u32 qlen_old;		/* in bytes */
+	u8 accu_prob_overflows;	/* overflows of accu_prob */
 };
 
 /* statistics gathering */
@@ -91,9 +93,11 @@ static void pie_params_init(struct pie_params *params)
 static void pie_vars_init(struct pie_vars *vars)
 {
 	vars->dq_count = DQCOUNT_INVALID;
+	vars->accu_prob = 0;
 	vars->avg_dq_rate = 0;
 	/* default of 150 ms in pschedtime */
 	vars->burst_time = PSCHED_NS2TICKS(150 * NSEC_PER_MSEC);
+	vars->accu_prob_overflows = 0;
 }
 
 static bool drop_early(struct Qdisc *sch, u32 packet_size)
@@ -128,9 +132,29 @@ static bool drop_early(struct Qdisc *sch, u32 packet_size)
 	else
 		local_prob = q->vars.prob;
 
+	if (local_prob == 0) {
+		q->vars.accu_prob = 0;
+		q->vars.accu_prob_overflows = 0;
+	}
+
+	if (local_prob > MAX_PROB - q->vars.accu_prob)
+		q->vars.accu_prob_overflows++;
+
+	q->vars.accu_prob += local_prob;
+
+	if (q->vars.accu_prob_overflows == 0 &&
+	    q->vars.accu_prob < (MAX_PROB / 100) * 85)
+		return false;
+	if (q->vars.accu_prob_overflows == 8 &&
+	    q->vars.accu_prob >= MAX_PROB / 2)
+		return true;
+
 	prandom_bytes(&rnd, 8);
-	if (rnd < local_prob)
+	if (rnd < local_prob) {
+		q->vars.accu_prob = 0;
+		q->vars.accu_prob_overflows = 0;
 		return true;
+	}
 
 	return false;
 }
@@ -168,6 +192,8 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
 out:
 	q->stats.dropped++;
+	q->vars.accu_prob = 0;
+	q->vars.accu_prob_overflows = 0;
 	return qdisc_drop(skb, sch, to_free);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v3 7/7] net: sched: pie: update references
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
                   ` (5 preceding siblings ...)
  2019-02-25 19:10 ` [PATCH net-next v3 6/7] net: sched: pie: add derandomization mechanism Leslie Monis
@ 2019-02-25 19:10 ` Leslie Monis
  2019-02-25 22:21 ` [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 David Miller
  2019-02-26  0:38 ` Stephen Hemminger
  8 siblings, 0 replies; 13+ messages in thread
From: Leslie Monis @ 2019-02-25 19:10 UTC (permalink / raw)
  To: davem
  Cc: netdev, Mohit P. Tahiliani, Dave Taht, Jamal Hadi Salim,
	Dhaval Khandla, Hrishikesh Hiraskar, Manish Kumar B,
	Sachin D . Patil, Leslie Monis

From: "Mohit P. Tahiliani" <tahiliani@nitk.edu.in>

RFC 8033 replaces the IETF draft for PIE

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/sch_pie.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 916b878d3491..f8314a14a256 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -17,9 +17,7 @@
  * University of Oslo, Norway.
  *
  * References:
- * IETF draft submission: http://tools.ietf.org/html/draft-pan-aqm-pie-00
- * IEEE  Conference on High Performance Switching and Routing 2013 :
- * "PIE: A * Lightweight Control Scheme to Address the Bufferbloat Problem"
+ * RFC 8033: https://tools.ietf.org/html/rfc8034
  */
 
 #include <linux/module.h>
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
                   ` (6 preceding siblings ...)
  2019-02-25 19:10 ` [PATCH net-next v3 7/7] net: sched: pie: update references Leslie Monis
@ 2019-02-25 22:21 ` David Miller
  2019-02-26  0:38 ` Stephen Hemminger
  8 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2019-02-25 22:21 UTC (permalink / raw)
  To: lesliemonis; +Cc: netdev, tahiliani, dave.taht, jhs

From: Leslie Monis <lesliemonis@gmail.com>
Date: Tue, 26 Feb 2019 00:39:54 +0530

> The current implementation of the PIE queuing discipline is according to the
> IETF draft [http://tools.ietf.org/html/draft-pan-aqm-pie-00] and the paper
> [PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem].
> However, a lot of necessary modifications and enhancements have been proposed
> in RFC 8033, which have not yet been incorporated in the source code of Linux.
> This patch series helps in achieving the same.
> 
> Performance tests carried out using Flent [https://flent.org/]
> 
> Changes from v2 to v3:
>   - Used div_u64() instead of direct division after explicit type casting as
>     recommended by David
> 
> Changes from v1 to v2:
>   - Excluded the patch setting PIE dynamically active/inactive as the test
>     results were unsatisfactory
>   - Fixed a scaling issue when adding more auto-tuning cases which caused
>     local variables to underflow
>   - Changed the long if/else chain to a loop as suggested by Stephen
>   - Changed the position of the accu_prob variable in the pie_vars
>     structure as recommended by Stephen

Series applied, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033
  2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
                   ` (7 preceding siblings ...)
  2019-02-25 22:21 ` [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 David Miller
@ 2019-02-26  0:38 ` Stephen Hemminger
  2019-02-26  1:02   ` Dave Taht
  2019-02-26  8:20   ` Leslie Monis
  8 siblings, 2 replies; 13+ messages in thread
From: Stephen Hemminger @ 2019-02-26  0:38 UTC (permalink / raw)
  To: Leslie Monis
  Cc: davem, netdev, Mohit P . Tahiliani, Dave Taht, Jamal Hadi Salim

On Tue, 26 Feb 2019 00:39:54 +0530
Leslie Monis <lesliemonis@gmail.com> wrote:

> The current implementation of the PIE queuing discipline is according to the
> IETF draft [http://tools.ietf.org/html/draft-pan-aqm-pie-00] and the paper
> [PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem].
> However, a lot of necessary modifications and enhancements have been proposed
> in RFC 8033, which have not yet been incorporated in the source code of Linux.
> This patch series helps in achieving the same.
> 
> Performance tests carried out using Flent [https://flent.org/]
> 
> Changes from v2 to v3:
>   - Used div_u64() instead of direct division after explicit type casting as
>     recommended by David
> 
> Changes from v1 to v2:
>   - Excluded the patch setting PIE dynamically active/inactive as the test
>     results were unsatisfactory
>   - Fixed a scaling issue when adding more auto-tuning cases which caused
>     local variables to underflow
>   - Changed the long if/else chain to a loop as suggested by Stephen
>   - Changed the position of the accu_prob variable in the pie_vars
>     structure as recommended by Stephen
> 
> Mohit P. Tahiliani (7):
>   net: sched: pie: change value of QUEUE_THRESHOLD
>   net: sched: pie: change default value of pie_params->target
>   net: sched: pie: change default value of pie_params->tupdate
>   net: sched: pie: change initial value of pie_vars->burst_time
>   net: sched: pie: add more cases to auto-tune alpha and beta
>   net: sched: pie: add derandomization mechanism
>   net: sched: pie: update references
> 
>  include/uapi/linux/pkt_sched.h |   2 +-
>  net/sched/sch_pie.c            | 107 ++++++++++++++++++++-------------
>  2 files changed, 66 insertions(+), 43 deletions(-)

Are you concerned at all that changes to default values might change
expected behavior of existing users?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033
  2019-02-26  0:38 ` Stephen Hemminger
@ 2019-02-26  1:02   ` Dave Taht
  2019-02-26  8:20   ` Leslie Monis
  1 sibling, 0 replies; 13+ messages in thread
From: Dave Taht @ 2019-02-26  1:02 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Leslie Monis, David S. Miller, Linux Kernel Network Developers,
	Mohit P . Tahiliani, Jamal Hadi Salim

On Mon, Feb 25, 2019 at 4:38 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 26 Feb 2019 00:39:54 +0530
> Leslie Monis <lesliemonis@gmail.com> wrote:
>
> > The current implementation of the PIE queuing discipline is according to the
> > IETF draft [http://tools.ietf.org/html/draft-pan-aqm-pie-00] and the paper
> > [PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem].
> > However, a lot of necessary modifications and enhancements have been proposed
> > in RFC 8033, which have not yet been incorporated in the source code of Linux.
> > This patch series helps in achieving the same.
> >
> > Performance tests carried out using Flent [https://flent.org/]
> >
> > Changes from v2 to v3:
> >   - Used div_u64() instead of direct division after explicit type casting as
> >     recommended by David
> >
> > Changes from v1 to v2:
> >   - Excluded the patch setting PIE dynamically active/inactive as the test
> >     results were unsatisfactory
> >   - Fixed a scaling issue when adding more auto-tuning cases which caused
> >     local variables to underflow
> >   - Changed the long if/else chain to a loop as suggested by Stephen
> >   - Changed the position of the accu_prob variable in the pie_vars
> >     structure as recommended by Stephen
> >
> > Mohit P. Tahiliani (7):
> >   net: sched: pie: change value of QUEUE_THRESHOLD
> >   net: sched: pie: change default value of pie_params->target
> >   net: sched: pie: change default value of pie_params->tupdate
> >   net: sched: pie: change initial value of pie_vars->burst_time
> >   net: sched: pie: add more cases to auto-tune alpha and beta
> >   net: sched: pie: add derandomization mechanism
> >   net: sched: pie: update references
> >
> >  include/uapi/linux/pkt_sched.h |   2 +-
> >  net/sched/sch_pie.c            | 107 ++++++++++++++++++++-------------
> >  2 files changed, 66 insertions(+), 43 deletions(-)
>
> Are you concerned at all that changes to default values might change
> expected behavior of existing users?

There's existing users?

Most of these changes are really subtle and came out as we went along.
I'd have to *really search* to find my notes from that period,
and I have not evaluated the sum of these new changes, but overall
they were a slight improvement from the defaults as we shipped it way
back when.

I'm not being snarky - are there existing users? Every last person I
know that ever evaluated pie switched to fq_codel. Cisco ended up
going with another bufferbloat-fighting solution in one new device and
that team dissolved.

docsis-pie, well, that's a standard. and that's a bit different from
this pie, but this pie is closer to that pie. And it works ok.

I certainly hope leslie's group has done similar testing to what we
did then ? If there's results I can dig up my old results and see what
compares. I imagine they plan a paper. our old flent test code and
data is all published, but am not willing to spend another minute of
my life on a single queued aqm design unless someone writes me a very
big check for it, which I'd then spend on fixing P4 to run fq-codel.

fq-pie (which is waiting in the wings behind this set of patches) is
ok, as at least, the BSD version was competetive with fq_codel in most
respects. I told 'em they needed to look hard at the rate estimator.

The ECN handling problem mentioned is there on all known versions of
pie, but it's that it reverts to drop too soon for ecn to be
as much benefit as it could be.

and then l4s, which is the subject of 3 upcoming talks at netdevconf,
well... not gonna talk about it. I'm willing to give them their day in
court.

sorry to come across as grumpy. should linux be rfc compliant even if
the rfc has issues? damned if I know.



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033
  2019-02-26  0:38 ` Stephen Hemminger
  2019-02-26  1:02   ` Dave Taht
@ 2019-02-26  8:20   ` Leslie Monis
  2019-02-26 15:50     ` Stephen Hemminger
  1 sibling, 1 reply; 13+ messages in thread
From: Leslie Monis @ 2019-02-26  8:20 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: davem, netdev, Mohit P . Tahiliani, Dave Taht, Jamal Hadi Salim

On Mon, Feb 25, 2019 at 04:38:11PM -0800, Stephen Hemminger wrote:
> On Tue, 26 Feb 2019 00:39:54 +0530
> Leslie Monis <lesliemonis@gmail.com> wrote:
> 
> > The current implementation of the PIE queuing discipline is according to the
> > IETF draft [http://tools.ietf.org/html/draft-pan-aqm-pie-00] and the paper
> > [PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem].
> > However, a lot of necessary modifications and enhancements have been proposed
> > in RFC 8033, which have not yet been incorporated in the source code of Linux.
> > This patch series helps in achieving the same.
> > 
> > Performance tests carried out using Flent [https://flent.org/]
> > 
> > Changes from v2 to v3:
> >   - Used div_u64() instead of direct division after explicit type casting as
> >     recommended by David
> > 
> > Changes from v1 to v2:
> >   - Excluded the patch setting PIE dynamically active/inactive as the test
> >     results were unsatisfactory
> >   - Fixed a scaling issue when adding more auto-tuning cases which caused
> >     local variables to underflow
> >   - Changed the long if/else chain to a loop as suggested by Stephen
> >   - Changed the position of the accu_prob variable in the pie_vars
> >     structure as recommended by Stephen
> > 
> > Mohit P. Tahiliani (7):
> >   net: sched: pie: change value of QUEUE_THRESHOLD
> >   net: sched: pie: change default value of pie_params->target
> >   net: sched: pie: change default value of pie_params->tupdate
> >   net: sched: pie: change initial value of pie_vars->burst_time
> >   net: sched: pie: add more cases to auto-tune alpha and beta
> >   net: sched: pie: add derandomization mechanism
> >   net: sched: pie: update references
> > 
> >  include/uapi/linux/pkt_sched.h |   2 +-
> >  net/sched/sch_pie.c            | 107 ++++++++++++++++++++-------------
> >  2 files changed, 66 insertions(+), 43 deletions(-)
> 
> Are you concerned at all that changes to default values might change
> expected behavior of existing users?

Hi Stephen,

As Dave mentioned, the changes which we have made do not really change the
behaviour of the aqm drastically. Our performance tests show that these changes
improve performance without any side-effects. So existing users (if there are
any) should not be negatively affected in any way.

Leslie

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033
  2019-02-26  8:20   ` Leslie Monis
@ 2019-02-26 15:50     ` Stephen Hemminger
  0 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2019-02-26 15:50 UTC (permalink / raw)
  To: Leslie Monis
  Cc: davem, netdev, Mohit P . Tahiliani, Dave Taht, Jamal Hadi Salim

On Tue, 26 Feb 2019 13:50:46 +0530
Leslie Monis <lesliemonis@gmail.com> wrote:

> On Mon, Feb 25, 2019 at 04:38:11PM -0800, Stephen Hemminger wrote:
> > On Tue, 26 Feb 2019 00:39:54 +0530
> > Leslie Monis <lesliemonis@gmail.com> wrote:
> >   
> > > The current implementation of the PIE queuing discipline is according to the
> > > IETF draft [http://tools.ietf.org/html/draft-pan-aqm-pie-00] and the paper
> > > [PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem].
> > > However, a lot of necessary modifications and enhancements have been proposed
> > > in RFC 8033, which have not yet been incorporated in the source code of Linux.
> > > This patch series helps in achieving the same.
> > > 
> > > Performance tests carried out using Flent [https://flent.org/]
> > > 
> > > Changes from v2 to v3:
> > >   - Used div_u64() instead of direct division after explicit type casting as
> > >     recommended by David
> > > 
> > > Changes from v1 to v2:
> > >   - Excluded the patch setting PIE dynamically active/inactive as the test
> > >     results were unsatisfactory
> > >   - Fixed a scaling issue when adding more auto-tuning cases which caused
> > >     local variables to underflow
> > >   - Changed the long if/else chain to a loop as suggested by Stephen
> > >   - Changed the position of the accu_prob variable in the pie_vars
> > >     structure as recommended by Stephen
> > > 
> > > Mohit P. Tahiliani (7):
> > >   net: sched: pie: change value of QUEUE_THRESHOLD
> > >   net: sched: pie: change default value of pie_params->target
> > >   net: sched: pie: change default value of pie_params->tupdate
> > >   net: sched: pie: change initial value of pie_vars->burst_time
> > >   net: sched: pie: add more cases to auto-tune alpha and beta
> > >   net: sched: pie: add derandomization mechanism
> > >   net: sched: pie: update references
> > > 
> > >  include/uapi/linux/pkt_sched.h |   2 +-
> > >  net/sched/sch_pie.c            | 107 ++++++++++++++++++++-------------
> > >  2 files changed, 66 insertions(+), 43 deletions(-)  
> > 
> > Are you concerned at all that changes to default values might change
> > expected behavior of existing users?  
> 
> Hi Stephen,
> 
> As Dave mentioned, the changes which we have made do not really change the
> behaviour of the aqm drastically. Our performance tests show that these changes
> improve performance without any side-effects. So existing users (if there are
> any) should not be negatively affected in any way.

Thanks for answering. Looks good

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-02-26 15:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-25 19:09 [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 Leslie Monis
2019-02-25 19:09 ` [PATCH net-next v3 1/7] net: sched: pie: change value of QUEUE_THRESHOLD Leslie Monis
2019-02-25 19:09 ` [PATCH net-next v3 2/7] net: sched: pie: change default value of pie_params->target Leslie Monis
2019-02-25 19:09 ` [PATCH net-next v3 3/7] net: sched: pie: change default value of pie_params->tupdate Leslie Monis
2019-02-25 19:09 ` [PATCH net-next v3 4/7] net: sched: pie: change initial value of pie_vars->burst_time Leslie Monis
2019-02-25 19:09 ` [PATCH net-next v3 5/7] net: sched: pie: add more cases to auto-tune alpha and beta Leslie Monis
2019-02-25 19:10 ` [PATCH net-next v3 6/7] net: sched: pie: add derandomization mechanism Leslie Monis
2019-02-25 19:10 ` [PATCH net-next v3 7/7] net: sched: pie: update references Leslie Monis
2019-02-25 22:21 ` [PATCH net-next v3 0/7] net: sched: pie: align PIE implementation with RFC 8033 David Miller
2019-02-26  0:38 ` Stephen Hemminger
2019-02-26  1:02   ` Dave Taht
2019-02-26  8:20   ` Leslie Monis
2019-02-26 15:50     ` Stephen Hemminger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.