All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/5] TCP Wave
@ 2017-07-28 19:59 Natale Patriciello
  2017-07-28 19:59 ` [RFC PATCH v1 1/5] tcp: Added callback for timed sender operations Natale Patriciello
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Natale Patriciello @ 2017-07-28 19:59 UTC (permalink / raw)
  To: David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, Ahmed Said, Natale Patriciello, Francesco Zampognaro,
	Cesare Roseti

Hi,
We are working on a new TCP congestion control algorithm, aiming at satisfying
new requirements coming from current networks. For instance, adaptation to
bandwidth/delay changes (due to mobility, dynamic switching, handover), and
optimal exploitation of very high link capacity and efficient transmission of
small objects, irrespective of the underlying link characteristics.

TCP Wave (TCPW) replaces the window-based transmission paradigm of the standard
TCP with a burst-based transmission, the ACK-clock scheduling with a
self-managed timer and the RTT-based congestion control loop with an Ack-based
Capacity and Congestion Estimation (ACCE) module. In non-technical words, it
sends data down the stack when its internal timer expires, and the timing of
the received ACKs contribute to updating this timer regularly.

We tried to add this new sender paradigm without deeply touching existing code.
In fact, we added four (optional) new congestion control functions:

+       /* get the expiration time for the send timer (optional) */
+       unsigned long (*get_send_timer_exp_time)(struct sock *sk);
+       /* no data to transmit at the timer expiration (optional) */
+       void (*no_data_to_transmit)(struct sock *sk);
+       /* the send timer is expired (optional) */
+       void (*send_timer_expired)(struct sock *sk);
+       /* the TCP has sent some segments (optional) */
+       void (*segment_sent)(struct sock *sk, u32 sent);

And a timer (tp->send_timer) which uses a send callback to push data down the
stack. If the first of these function, get_send_timer_exp_time,  is not
implemented by the current congestion control, then the timer sending timer is
never set, therefore falling back to the old, ACK-clocked, behavior.

The TCPW module itself extensively make use of the existing infrastructure and
parameters to calculate its timer, plus some heuristics when it is not possible
to have trustworthy values from the network.

You can find more stuff related to TCPW (extended results, the test programs
used and the setup for the experiments, a document describing the algorithm in
detail and so on) at:

[1] http://tlcsat.uniroma2.it/tcpwave4linux/

We would greatly appreciate any feedback from you, comments, suggestions,
corrections and so on. Thank you for your attention.

Cesare, Francesco, Ahmed, Natale

Natale Patriciello (5):
  tcp: Added callback for timed sender operations
  tcp: Implemented the timing-based operations
  tcp: PSH frames sent without timer involved
  tcp: Add initial delay to allow data queueing
  wave: Added basic version of TCP Wave

 MAINTAINERS           |   6 +
 include/linux/tcp.h   |   3 +
 include/net/tcp.h     |   8 +
 net/ipv4/Kconfig      |  16 +
 net/ipv4/Makefile     |   1 +
 net/ipv4/tcp.c        |   8 +-
 net/ipv4/tcp_ipv4.c   |   2 +
 net/ipv4/tcp_output.c |  73 +++-
 net/ipv4/tcp_wave.c   | 914 ++++++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 1023 insertions(+), 8 deletions(-)
 create mode 100644 net/ipv4/tcp_wave.c

-- 
2.13.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 1/5] tcp: Added callback for timed sender operations
  2017-07-28 19:59 [RFC PATCH v1 0/5] TCP Wave Natale Patriciello
@ 2017-07-28 19:59 ` Natale Patriciello
  2017-07-28 19:59 ` [RFC PATCH v1 2/5] tcp: Implemented the timing-based operations Natale Patriciello
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Natale Patriciello @ 2017-07-28 19:59 UTC (permalink / raw)
  To: David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, Ahmed Said, Natale Patriciello, Francesco Zampognaro,
	Cesare Roseti

Standard TCP is ACK-clocked, or in other words it must wait for ACKs after
sending a full window size bytes. However, in some particular cases, a
congestion control would like to be able to tell the TCP implementation when it
is possible to send segments through a timer. This patch adds the interface
(completely optional) between a congestion control and the TCP implementation.

Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
Tested-by: Ahmed Said <ahmed.said@uniroma2.it>
---
 include/net/tcp.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index be6223c586fa..bf661ccc53a2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -939,6 +939,14 @@ struct tcp_congestion_ops {
 	/* get info for inet_diag (optional) */
 	size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
 			   union tcp_cc_info *info);
+	/* get the expiration time for the send timer (optional) */
+	unsigned long (*get_send_timer_exp_time)(struct sock *sk);
+	/* no data to transmit at the timer expiration (optional) */
+	void (*no_data_to_transmit)(struct sock *sk);
+	/* the send timer is expired (optional) */
+	void (*send_timer_expired)(struct sock *sk);
+	/* the TCP has sent some segments (optional) */
+	void (*segment_sent)(struct sock *sk, u32 sent);
 
 	char 		name[TCP_CA_NAME_MAX];
 	struct module 	*owner;
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 2/5] tcp: Implemented the timing-based operations
  2017-07-28 19:59 [RFC PATCH v1 0/5] TCP Wave Natale Patriciello
  2017-07-28 19:59 ` [RFC PATCH v1 1/5] tcp: Added callback for timed sender operations Natale Patriciello
@ 2017-07-28 19:59 ` Natale Patriciello
  2017-07-29  1:46   ` David Miller
  2017-07-28 19:59 ` [RFC PATCH v1 3/5] tcp: PSH frames sent without timer involved Natale Patriciello
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Natale Patriciello @ 2017-07-28 19:59 UTC (permalink / raw)
  To: David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, Ahmed Said, Natale Patriciello, Francesco Zampognaro,
	Cesare Roseti

Timing the TCP operation based on the timer returned by the congestion control.
If the congestion control does not implement the timing interface, the TCP
behaves as usual, by sending down segments as soon as it is possible. Otherwise,
it will wait until the timer expires (and so respecting the timing constraint
set by the congestion control).

Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
Tested-by: Ahmed Said <ahmed.said@uniroma2.it>
---
 include/linux/tcp.h   |  3 +++
 net/ipv4/tcp_ipv4.c   |  2 ++
 net/ipv4/tcp_output.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index b6d5adcee8fc..140bc20ec17e 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -369,6 +369,9 @@ struct tcp_sock {
 	 */
 	struct request_sock *fastopen_rsk;
 	u32	*saved_syn;
+
+/* TCP send timer */
+	struct timer_list send_timer;
 };
 
 enum tsq_enum {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5ab2aac5ca19..ef5fdba096e8 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1351,6 +1351,8 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
 	if (*own_req)
 		tcp_move_syn(newtp, req);
 
+	init_timer(&newtp->send_timer);
+
 	return newsk;
 
 exit_overflow:
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4858e190f6ac..357b9cd5019e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2187,6 +2187,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			   int push_one, gfp_t gfp)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	const struct tcp_congestion_ops *ca_ops;
 	struct sk_buff *skb;
 	unsigned int tso_segs, sent_pkts;
 	int cwnd_quota;
@@ -2194,6 +2195,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	bool is_cwnd_limited = false, is_rwnd_limited = false;
 	u32 max_segs;
 
+	ca_ops = inet_csk(sk)->icsk_ca_ops;
 	sent_pkts = 0;
 
 	if (!push_one) {
@@ -2292,8 +2294,16 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			tcp_schedule_loss_probe(sk);
 		is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd);
 		tcp_cwnd_validate(sk, is_cwnd_limited);
+
+		/* Duplicated because of tp->prr_out value */
+		if (ca_ops && ca_ops->segment_sent)
+			ca_ops->segment_sent(sk, sent_pkts);
 		return false;
 	}
+
+	if (ca_ops && ca_ops->segment_sent)
+		ca_ops->segment_sent(sk, 0);
+
 	return !tp->packets_out && tcp_send_head(sk);
 }
 
@@ -2433,6 +2443,15 @@ void tcp_send_loss_probe(struct sock *sk)
 	tcp_rearm_rto(sk);
 }
 
+static void __tcp_push_pending_frames_handler(unsigned long data)
+{
+	struct sock *sk = (struct sock *)data;
+
+	lock_sock(sk);
+	tcp_push_pending_frames(sk);
+	release_sock(sk);
+}
+
 /* Push out any pending frames which were held back due to
  * TCP_CORK or attempt at coalescing tiny packets.
  * The socket must be locked by the caller.
@@ -2440,6 +2459,8 @@ void tcp_send_loss_probe(struct sock *sk)
 void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
 			       int nonagle)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
+
 	/* If we are closed, the bytes will have to remain here.
 	 * In time closedown will finish, we empty the write queue and
 	 * all will be happy.
@@ -2447,9 +2468,38 @@ void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
 	if (unlikely(sk->sk_state == TCP_CLOSE))
 		return;
 
-	if (tcp_write_xmit(sk, cur_mss, nonagle, 0,
-			   sk_gfp_mask(sk, GFP_ATOMIC)))
-		tcp_check_probe_timer(sk);
+	if (timer_pending(&tp->send_timer) == 0) {
+		/* Timer is not running, push data out */
+		int ret;
+		const struct tcp_congestion_ops *ca_ops;
+
+		ca_ops = inet_csk(sk)->icsk_ca_ops;
+
+		if (ca_ops && ca_ops->send_timer_expired)
+			ca_ops->send_timer_expired(sk);
+
+		if (tcp_write_xmit(sk, cur_mss, nonagle, 0, sk_gfp_mask(sk, GFP_ATOMIC)))
+			tcp_check_probe_timer(sk);
+
+		/* And now let's init the timer only if we have data */
+		if (tcp_send_head(sk)) {
+			if (ca_ops && ca_ops->get_send_timer_exp_time) {
+				unsigned long expiration;
+
+				setup_timer(&tp->send_timer,
+					    __tcp_push_pending_frames_handler,
+					    (unsigned long)sk);
+				expiration = ca_ops->get_send_timer_exp_time(sk);
+				ret = mod_timer(&tp->send_timer,
+						jiffies + expiration);
+				BUG_ON(ret != 0);
+			}
+		} else {
+			del_timer(&tp->send_timer);
+			if (ca_ops && ca_ops->no_data_to_transmit)
+				ca_ops->no_data_to_transmit(sk);
+		}
+	}
 }
 
 /* Send _single_ skb sitting at the send head. This function requires
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 3/5] tcp: PSH frames sent without timer involved
  2017-07-28 19:59 [RFC PATCH v1 0/5] TCP Wave Natale Patriciello
  2017-07-28 19:59 ` [RFC PATCH v1 1/5] tcp: Added callback for timed sender operations Natale Patriciello
  2017-07-28 19:59 ` [RFC PATCH v1 2/5] tcp: Implemented the timing-based operations Natale Patriciello
@ 2017-07-28 19:59 ` Natale Patriciello
  2017-07-28 19:59 ` [RFC PATCH v1 4/5] tcp: Add initial delay to allow data queueing Natale Patriciello
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Natale Patriciello @ 2017-07-28 19:59 UTC (permalink / raw)
  To: David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, Ahmed Said, Natale Patriciello, Francesco Zampognaro,
	Cesare Roseti

Segments flagged with 'PSH' should be sent as soon as possible,
ignoring the timing set by the congestion control (if any).
This patch avoids the waiting of 'PSH' segments in the TCP queue.

Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
Tested-by: Ahmed Said <ahmed.said@uniroma2.it>
---
 net/ipv4/tcp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 40aca7803cf2..ebaedbf75b63 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -975,9 +975,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 
 		if (forced_push(tp)) {
 			tcp_mark_push(tp, skb);
-			__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
-		} else if (skb == tcp_send_head(sk))
 			tcp_push_one(sk, mss_now);
+		} else if (skb == tcp_send_head(sk))
+			__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
 		continue;
 
 wait_for_sndbuf:
@@ -1320,9 +1320,9 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 
 		if (forced_push(tp)) {
 			tcp_mark_push(tp, skb);
-			__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
-		} else if (skb == tcp_send_head(sk))
 			tcp_push_one(sk, mss_now);
+		} else if (skb == tcp_send_head(sk))
+			__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
 		continue;
 
 wait_for_sndbuf:
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 4/5] tcp: Add initial delay to allow data queueing
  2017-07-28 19:59 [RFC PATCH v1 0/5] TCP Wave Natale Patriciello
                   ` (2 preceding siblings ...)
  2017-07-28 19:59 ` [RFC PATCH v1 3/5] tcp: PSH frames sent without timer involved Natale Patriciello
@ 2017-07-28 19:59 ` Natale Patriciello
  2017-07-28 19:59 ` [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave Natale Patriciello
  2017-07-29  5:33 ` [RFC PATCH v1 0/5] " Eric Dumazet
  5 siblings, 0 replies; 14+ messages in thread
From: Natale Patriciello @ 2017-07-28 19:59 UTC (permalink / raw)
  To: David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, Ahmed Said, Natale Patriciello, Francesco Zampognaro,
	Cesare Roseti

To be able to correctly send out data respecting the congestion control timing
requests, it is necessary to add a small delay at the beginning of the transfer.
It is necessary because, if there is no data in the TCP queue, the sending timer
is useless (and therefore not set). But, it can happen (especially at the
beginning) that an application can not appropriately fill the TCP queue,
therefore making the respect of the timing constraint set by the congestion
control impossible.

Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
Tested-by: Ahmed Said <ahmed.said@uniroma2.it>
---
 net/ipv4/tcp_output.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 357b9cd5019e..febce533c0a0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -64,6 +64,7 @@ int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
 
 static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			   int push_one, gfp_t gfp);
+static void __tcp_push_pending_frames_handler(unsigned long data);
 
 /* Account for new data that has been sent to the network. */
 static void tcp_event_new_data_sent(struct sock *sk, const struct sk_buff *skb)
@@ -1034,6 +1035,18 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	if (skb->len != tcp_header_size) {
 		tcp_event_data_sent(tp, sk);
 		tp->data_segs_out += tcp_skb_pcount(skb);
+	} else {
+		if (timer_pending(&tp->send_timer) == 0 &&
+		    (!(tcb->tcp_flags & TCPHDR_SYN))) {
+			/* Timer is not running, adding a bit of delay
+			 * to allow data to be queued.
+			 */
+			setup_timer(&tp->send_timer,
+				    __tcp_push_pending_frames_handler,
+				    (unsigned long)sk);
+			mod_timer(&tp->send_timer,
+				  jiffies + msecs_to_jiffies(5));
+		}
 	}
 
 	if (after(tcb->end_seq, tp->snd_nxt) || tcb->seq == tcb->end_seq)
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave
  2017-07-28 19:59 [RFC PATCH v1 0/5] TCP Wave Natale Patriciello
                   ` (3 preceding siblings ...)
  2017-07-28 19:59 ` [RFC PATCH v1 4/5] tcp: Add initial delay to allow data queueing Natale Patriciello
@ 2017-07-28 19:59 ` Natale Patriciello
  2017-07-28 23:15   ` Neal Cardwell
                     ` (4 more replies)
  2017-07-29  5:33 ` [RFC PATCH v1 0/5] " Eric Dumazet
  5 siblings, 5 replies; 14+ messages in thread
From: Natale Patriciello @ 2017-07-28 19:59 UTC (permalink / raw)
  To: David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, Ahmed Said, Natale Patriciello, Francesco Zampognaro,
	Cesare Roseti

TCP Wave (TCPW) replaces the window-based transmission paradigm of the
standard TCP with a burst-based transmission, the ACK-clock scheduling
with a self-managed timer and the RTT-based congestion control loop
with an Ack-based Capacity and Congestion Estimation (ACCE) module. In
non-technical words, it sends data down the stack when its internal
timer expires, and the timing of the received ACKs contribute to
updating this timer regularly.

It is the first TCP congestion control that uses the timing constraint
developed in the Linux kernel.

Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
Tested-by: Ahmed Said <ahmed.said@uniroma2.it>
---
 MAINTAINERS           |   6 +
 net/ipv4/Kconfig      |  16 +
 net/ipv4/Makefile     |   1 +
 net/ipv4/tcp_output.c |   4 +-
 net/ipv4/tcp_wave.c   | 914 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 940 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/tcp_wave.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 767e9d202adf..39c57bdc417d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12427,6 +12427,12 @@ W:	http://tcp-lp-mod.sourceforge.net/
 S:	Maintained
 F:	net/ipv4/tcp_lp.c
 
+TCP WAVE MODULE
+M:	"Natale Patriciello" <natale.patriciello@gmail.com>
+W:	http://tcp-lp-mod.sourceforge.net/
+S:	Maintained
+F:	net/ipv4/tcp_wave.c
+
 TDA10071 MEDIA DRIVER
 M:	Antti Palosaari <crope@iki.fi>
 L:	linux-media@vger.kernel.org
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 91a2557942fa..de23b3a04b98 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -492,6 +492,18 @@ config TCP_CONG_BIC
 	increase provides TCP friendliness.
 	See http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/
 
+config TCP_CONG_WAVE
+	tristate "Wave TCP"
+	default m
+	---help---
+	TCP Wave (TCPW) replaces the window-based transmission paradigm of the
+	standard TCP with a burst-based transmission, the ACK-clock scheduling
+	with a self-managed timer and the RTT-based congestion control loop with
+	an Ack-based Capacity and Congestion Estimation (ACCE) module. In
+	non-technical words, it sends data down the stack when its internal
+	timer expires, and the timing of the received ACKs contribute to
+	updating this timer regularly.
+
 config TCP_CONG_CUBIC
 	tristate "CUBIC TCP"
 	default y
@@ -690,6 +702,9 @@ choice
 	config DEFAULT_CUBIC
 		bool "Cubic" if TCP_CONG_CUBIC=y
 
+	config DEFAULT_WAVE
+		bool "Wave" if TCP_CONG_WAVE=y
+
 	config DEFAULT_HTCP
 		bool "Htcp" if TCP_CONG_HTCP=y
 
@@ -729,6 +744,7 @@ config DEFAULT_TCP_CONG
 	string
 	default "bic" if DEFAULT_BIC
 	default "cubic" if DEFAULT_CUBIC
+	default "wave" if DEFAULT_WAVE
 	default "htcp" if DEFAULT_HTCP
 	default "hybla" if DEFAULT_HYBLA
 	default "vegas" if DEFAULT_VEGAS
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index f83de23a30e7..c5b3ae3cf5b1 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o
 obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
 obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
 obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o
+obj-$(CONFIG_TCP_CONG_WAVE) += tcp_wave.o
 obj-$(CONFIG_TCP_CONG_DCTCP) += tcp_dctcp.o
 obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o
 obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index febce533c0a0..616daf46b3df 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2522,7 +2522,9 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)
 {
 	struct sk_buff *skb = tcp_send_head(sk);
 
-	BUG_ON(!skb || skb->len < mss_now);
+	/* Don't be forced to send not meaningful data */
+	if (!skb || skb->len < mss_now)
+		return;
 
 	tcp_write_xmit(sk, mss_now, TCP_NAGLE_PUSH, 1, sk->sk_allocation);
 }
diff --git a/net/ipv4/tcp_wave.c b/net/ipv4/tcp_wave.c
new file mode 100644
index 000000000000..079df4d223e2
--- /dev/null
+++ b/net/ipv4/tcp_wave.c
@@ -0,0 +1,914 @@
+/*
+ * TCP Wave
+ *
+ * Copyright 2017 Natale Patriciello <natale.patriciello@gmail.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include <net/tcp.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/slab.h>
+
+#define WAVE_DEBUG 1
+
+#ifdef WAVE_DEBUG
+	#define DBG(msg ...) printk(KERN_DEBUG "WAVE_DEBUG: " msg)
+#else
+	static inline void DBG(const char *msg, ...) { }
+#endif
+
+static uint init_burst __read_mostly = 10;
+static uint min_burst __read_mostly = 3;
+static uint init_timer_ms __read_mostly = 500;
+static uint beta_ms __read_mostly = 150;
+
+module_param(init_burst, uint, 0644);
+MODULE_PARM_DESC(init_burst, "initial burst (segments)");
+module_param(min_burst, uint, 0644);
+MODULE_PARM_DESC(min_burst, "minimum burst (segments)");
+module_param(init_timer_ms, uint, 0644);
+MODULE_PARM_DESC(init_timer_ms, "initial timer (ms)");
+module_param(beta_ms, uint, 0644);
+MODULE_PARM_DESC(beta_ms, "beta parameter (ms)");
+
+/* Shift factor for the exponentially weighted average. */
+#define AVG_SCALE 20
+#define AVG_UNIT (1 << AVG_SCALE)
+
+/* Taken from BBR */
+#define BW_SCALE 24
+#define BW_UNIT (1 << BW_SCALE)
+
+/* Tell if the driver is initialized (init has been called) */
+#define FLAG_INIT       0x1
+/* Tell if, as sender, the driver is started (after TX_START) */
+#define FLAG_START      0x2
+/* If it's true, we save the sent size as a burst */
+#define FLAG_SAVE       0x4
+
+/* List for saving the size of sent burst over time */
+struct wavetcp_burst_hist {
+	u16 size;               /* The burst size */
+	struct list_head list;  /* Kernel list declaration */
+};
+
+static __always_inline bool test_flag(u8 value, const u8 *flags)
+{
+	return (*flags & value) == value;
+}
+
+static __always_inline void set_flag(u8 value, u8 *flags)
+{
+	*flags |= value;
+}
+
+static __always_inline void clear_flag(u8 value, u8 *flags)
+{
+	*flags &= ~(value);
+}
+
+/* TCP Wave private struct */
+struct wavetcp {
+	/* The module flags */
+	u8 flags;
+	/* The current transmission timer (us) */
+	u32 tx_timer;
+	/* The current burst size (segments) */
+	u16 burst;
+	/* Represents a delta from the burst size of segments sent */
+	char delta_segments;
+	/* The segments acked in the round */
+	u16 pkts_acked;
+	/* Heuristic scale, to divide the RTT */
+	u8 heuristic_scale;
+	/* Previous ack_train_disp Value */
+	u32 previous_ack_train_disp;
+	/* First ACK time of the round */
+	u32 first_ack_time;
+	/* Backup value of the first ack time */
+	u32 backup_first_ack_time;
+	/* First RTT of the round */
+	u32 first_rtt;
+	/* Minimum RTT of the round */
+	u32 min_rtt;
+	/* Average RTT of the previous round */
+	u32 avg_rtt;
+	/* Maximum RTT */
+	u32 max_rtt;
+	/* Stability factor */
+	u8 stab_factor;
+	/* The memory cache for saving the burst sizes */
+	struct kmem_cache *cache;
+	/* The burst history */
+	struct wavetcp_burst_hist *history;
+	/* To Print TCP Source Port  */
+	u16 sport;
+};
+
+/* Called to setup Wave for the current socket after it enters the CONNECTED
+ * state (i.e., called after the SYN-ACK is received). The slow start should be
+ * 0 (see wavetcp_get_ssthresh) and we set the initial cwnd to the initial
+ * burst.
+ *
+ * After the ACK of the SYN-ACK is sent, the TCP will add a bit of delay to
+ * permit the queueing of data from the application, otherwise we will end up
+ * in a scattered situation (we have one segment -> send it -> no other segment,
+ * don't set the timer -> slightly after, another segment come and we loop).
+ *
+ * At the first expiration, the cwnd will be large enough to push init_burst
+ * segments out.
+ */
+static void wavetcp_init(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct wavetcp *ca = inet_csk_ca(sk);
+
+	ca->sport = ntohs(inet_sk(sk)->inet_sport);
+
+	DBG("%u sport: %u [%s]\n", tcp_time_stamp, ca->sport,
+	    __func__);
+
+	/* Setting the initial Cwnd to 0 will not call the TX_START event */
+	tp->snd_ssthresh = 0;
+	tp->snd_cwnd = init_burst;
+
+	/* Used to avoid to take the SYN-ACK measurements */
+	ca->flags = 0;
+	ca->flags = FLAG_INIT | FLAG_SAVE;
+
+	ca->burst = init_burst;
+	ca->delta_segments = init_burst;
+	ca->tx_timer = init_timer_ms * USEC_PER_MSEC;
+	ca->first_ack_time = 0;
+	ca->backup_first_ack_time = 0;
+	ca->heuristic_scale = 0;
+	ca->first_rtt = 0;
+	ca->min_rtt = -1; /* a lot of time */
+	ca->avg_rtt = 0;
+	ca->max_rtt = 0;
+	ca->stab_factor = 0;
+	ca->previous_ack_train_disp = 0;
+
+	ca->history = kmalloc(sizeof(*ca->history), GFP_KERNEL);
+
+	/* Init the history of bwnd */
+	INIT_LIST_HEAD(&ca->history->list);
+
+	/* Init our cache pool for the bwnd history */
+	ca->cache = KMEM_CACHE(wavetcp_burst_hist, 0);
+	BUG_ON(ca->cache == 0);
+}
+
+static void wavetcp_release(struct sock *sk)
+{
+	struct wavetcp *ca = inet_csk_ca(sk);
+	struct wavetcp_burst_hist *tmp;
+	struct list_head *pos, *q;
+
+	if (!test_flag(FLAG_INIT, &ca->flags))
+		return;
+
+	DBG("%u sport: %u [%s]\n", tcp_time_stamp, ca->sport,
+	    __func__);
+
+	list_for_each_safe(pos, q, &ca->history->list) {
+		tmp = list_entry(pos, struct wavetcp_burst_hist, list);
+		list_del(pos);
+		kmem_cache_free(ca->cache, tmp);
+	}
+
+	if (ca->history != 0)
+		kfree(ca->history);
+
+	/* Thanks for the cache, we don't need it anymore */
+	if (ca->cache != 0)
+		kmem_cache_destroy(ca->cache);
+}
+
+static void wavetcp_print_history(struct wavetcp *ca)
+{
+	struct wavetcp_burst_hist *tmp;
+	struct list_head *pos, *q;
+
+	list_for_each_safe(pos, q, &ca->history->list) {
+		tmp = list_entry(pos, struct wavetcp_burst_hist, list);
+		DBG("[%s] %u\n", __func__, tmp->size);
+	}
+}
+
+/* Please explain that we will be forever in congestion avoidance. */
+static u32 wavetcp_recalc_ssthresh(struct sock *sk)
+{
+	DBG("%u [%s]\n", tcp_time_stamp, __func__);
+	return 0;
+}
+
+static void wavetcp_state(struct sock *sk, u8 new_state)
+{
+	struct wavetcp *ca = inet_csk_ca(sk);
+
+	if (!test_flag(FLAG_INIT, &ca->flags))
+		return;
+
+	switch (new_state) {
+	case TCP_CA_Open:
+		DBG("%u sport: %u [%s] set CA_Open\n", tcp_time_stamp,
+		    ca->sport, __func__);
+		/* We have fully recovered, so reset some variables */
+		ca->delta_segments = 0;
+		break;
+	default:
+		DBG("%u sport: %u [%s] set state %u, ignored\n",
+		    tcp_time_stamp, ca->sport, __func__, new_state);
+	}
+}
+
+static u32 wavetcp_undo_cwnd(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	/* Not implemented yet. We stick to the decision made earlier */
+	DBG("%u [%s]\n", tcp_time_stamp, __func__);
+	return tp->snd_cwnd;
+}
+
+/* Add the size of the burst in the history of bursts */
+static void wavetcp_insert_burst(struct wavetcp *ca, u32 burst)
+{
+	struct wavetcp_burst_hist *cur;
+
+	DBG("%u sport: %u [%s] adding %u segment in the history of burst\n",
+	    tcp_time_stamp, ca->sport, __func__, burst);
+
+	/* Take the memory from the pre-allocated pool */
+	cur = (struct wavetcp_burst_hist *)kmem_cache_alloc(ca->cache,
+							    GFP_KERNEL);
+	BUG_ON(!cur);
+
+	cur->size = burst;
+	list_add_tail(&cur->list, &ca->history->list);
+}
+
+static void wavetcp_cwnd_event(struct sock *sk, enum tcp_ca_event event)
+{
+	struct wavetcp *ca = inet_csk_ca(sk);
+
+	if (!test_flag(FLAG_INIT, &ca->flags))
+		return;
+
+	switch (event) {
+	case CA_EVENT_TX_START:
+		/* first transmit when no packets in flight */
+		DBG("%u sport: %u [%s] TX_START\n", tcp_time_stamp,
+		    ca->sport, __func__);
+
+		set_flag(FLAG_START, &ca->flags);
+
+		break;
+	default:
+		DBG("%u sport: %u [%s] got event %u, ignored\n",
+		    tcp_time_stamp, ca->sport, __func__, event);
+		break;
+	}
+}
+
+static __always_inline void wavetcp_adj_mode(struct wavetcp *ca,
+					     unsigned long delta_rtt)
+{
+	ca->stab_factor = ca->avg_rtt / ca->tx_timer;
+
+	ca->min_rtt = -1; /* a lot of time */
+	ca->avg_rtt = ca->max_rtt;
+	ca->tx_timer = init_timer_ms * USEC_PER_MSEC;
+
+	DBG("%u sport: %u [%s] stab_factor %u, timer %u us, avg_rtt %u us\n",
+	    tcp_time_stamp, ca->sport, __func__, ca->stab_factor,
+	    ca->tx_timer, ca->avg_rtt);
+}
+
+static __always_inline void wavetcp_tracking_mode(struct wavetcp *ca,
+						  u32 ack_train_disp,
+						  u64 delta_rtt)
+{
+	if (ack_train_disp == 0) {
+		DBG("%u sport: %u [%s] ack_train_disp is 0. Impossible to do tracking.\n",
+		    tcp_time_stamp, ca->sport, __func__);
+		return;
+	}
+
+	ca->tx_timer = (ack_train_disp + (delta_rtt / 2));
+
+	if (ca->tx_timer == 0) {
+		DBG("%u sport: %u [%s] WARNING: tx timer is 0"
+		    ", forcefully set it to 1000 us\n",
+		    tcp_time_stamp, ca->sport, __func__);
+		ca->tx_timer = 1000;
+	}
+
+	DBG("%u sport: %u [%s] tx timer is %u us\n",
+	    tcp_time_stamp, ca->sport, __func__,
+	    ca->tx_timer);
+}
+
+/* The weight a is:
+ *
+ * a = (first_rtt - min_rtt) / first_rtt
+ *
+ */
+static __always_inline u64 wavetcp_compute_weight(u32 first_rtt,
+						  u32 min_rtt)
+{
+	u64 diff = first_rtt - min_rtt;
+
+	diff = diff * AVG_UNIT;
+
+	return diff / first_rtt;
+}
+
+static u32 heuristic_ack_train_disp(struct wavetcp *ca, const struct rate_sample *rs,
+				    u32 burst)
+{
+	u32 ack_train_disp = 0;
+	u32 backup_interval = 0;
+
+	BUG_ON (ca->previous_ack_train_disp != 0);
+
+	/*
+	 * The heuristic takes the RTT of the first ACK, the RTT of the
+	 * latest ACK, and uses the difference as ack_train_disp.
+	 *
+	 * If the sample for the first and last ACK are the same (e.g.,
+	 * one ACK per burst) we use as the latest option the value of
+	 * interval_us (which is the RTT). However, this value is
+	 * exponentially lowered each time we don't have any valid
+	 * sample (i.e., we perform a division by 2, by 4, and so on).
+	 * The increased transmitted rate, if it is out of the capacity
+	 * of the bottleneck, will be compensated by an higher
+	 * delta_rtt, and so limited by the adjustment algorithm. This
+	 * is a blind search, but we do not have any valid sample...
+	 */
+	if (rs->interval_us > 0) {
+		if (rs->interval_us >= ca->backup_first_ack_time) {
+			/* first heuristic */
+			backup_interval = rs->interval_us - ca->backup_first_ack_time;
+		} else {
+			/* this branch avoids an overflow. However, reaching
+			 * this point means that the ACK train is not aligned
+			 * with the sent burst.
+			 */
+			backup_interval = ca->backup_first_ack_time - rs->interval_us;
+		}
+
+		if (backup_interval == 0) {
+			/* Blind search */
+			ack_train_disp = rs->interval_us >> ca->heuristic_scale;
+			++ca->heuristic_scale;
+			DBG("%u sport: %u [%s] we received one BIG ack."
+			    " Doing an heuristic with scale %u, interval_us"
+			    " %li us, and setting ack_train_disp to %u us\n",
+			    tcp_time_stamp, ca->sport, __func__,
+			    ca->heuristic_scale, rs->interval_us, ack_train_disp);
+		} else {
+			ack_train_disp = backup_interval;
+			DBG("%u sport: %u [%s] we got the first ack with"
+			    " interval %u us, the last (this) with interval %li us."
+			    " Doing a substraction and setting ack_train_disp"
+			    " to %u us\n",
+			    tcp_time_stamp, ca->sport, __func__,
+			    ca->backup_first_ack_time, rs->interval_us,
+			    ack_train_disp);
+		}
+	} else {
+		DBG("%u sport: %u [%s] WARNING is not possible "
+		    "to heuristically calculate ack_train_disp, returning 0."
+		    "Delivered %u, interval_us %li\n",
+		    tcp_time_stamp, ca->sport, __func__,
+		    rs->delivered, rs->interval_us);
+		return 0;
+	}
+
+	return ack_train_disp;
+}
+
+static u32 calculate_ack_train_disp(struct wavetcp *ca,
+				    const struct rate_sample *rs,
+				    u32 burst, u64 delta_rtt)
+{
+	u32 ack_train_disp = jiffies_to_usecs(tcp_time_stamp - ca->first_ack_time);
+
+	if (ca->previous_ack_train_disp == 0 && ack_train_disp == 0) {
+		/* We received a cumulative ACK just after we sent the data, so
+		 * the dispersion would be close to zero, OR the connection
+		 * is so fast that tcp_time_stamp is not good enough to measure
+		 * time. Moreover, we don't have any valid sample from the past;
+		 * in this case, we use an heuristic to calculate
+		 * ack_train_disp.
+		 */
+		return heuristic_ack_train_disp(ca, rs, burst);
+	}
+
+	DBG("%u sport: %u [%s] using measured ack_train_disp %u",
+	    tcp_time_stamp, ca->sport, __func__, ack_train_disp);
+
+	/* resetting the heuristic scale because we have a real sample */
+	ca->heuristic_scale = 0;
+
+	if (ca->previous_ack_train_disp == 0) {
+		/* initialize the value */
+		ca->previous_ack_train_disp = ack_train_disp;
+	} else if (ack_train_disp > ca->previous_ack_train_disp) {
+		/* filter the measured value */
+		u64 alpha;
+		u64 left;
+		u64 right;
+
+		alpha = (delta_rtt * AVG_UNIT) / (beta_ms * 1000);
+		left = ((AVG_UNIT - alpha) * ca->previous_ack_train_disp) / AVG_UNIT;
+		right = (alpha * ack_train_disp) / AVG_UNIT;
+		DBG("%u sport: %u [%s] AVG_UNIT %i delta_rtt %llu beta %i alpha %llu "
+		    "rcv_ack_train_disp %u prv_ack_train_disp %u left %llu right %llu\n",
+		    tcp_time_stamp, ca->sport, __func__, AVG_UNIT, delta_rtt,
+		    beta_ms, alpha, ack_train_disp, ca->previous_ack_train_disp,
+		    left, right);
+
+		ack_train_disp = (u32)left + (u32)right;
+
+		DBG("%u sport: %u [%s] filtered_ack_train_disp %u (u32)left %u (u32)right %u\n",
+		    tcp_time_stamp, ca->sport, __func__, ack_train_disp,
+		    (u32)left, (u32)right);
+
+	} else if (ack_train_disp == 0) {
+		/* Use the plain previous value */
+		ack_train_disp = ca->previous_ack_train_disp;
+	} else {
+		/* In all other cases, update the previous value */
+		ca->previous_ack_train_disp = ack_train_disp;
+	}
+
+	DBG("%u sport: %u [%s] previous_ack_train_disp %u us, final ack_train_disp %u us\n",
+	    tcp_time_stamp, ca->sport, __func__,
+	    ca->previous_ack_train_disp, ack_train_disp);
+
+	return ack_train_disp;
+}
+
+static u64 calculate_delta_rtt(struct wavetcp *ca)
+{
+	if (ca->first_rtt == 0) {
+		ca->first_rtt = ca->avg_rtt;
+		DBG("%u sport: %u [%s] It was impossible to get any rtt "
+		    "in the train. Using the average value %u\n",
+		    tcp_time_stamp, ca->sport, __func__,
+		    ca->first_rtt);
+	}
+	/* Why the first if?
+	 *
+	 * a = (first_rtt - min_rtt) / first_rtt = 1 - (min_rtt/first_rtt)
+	 *
+	 * avg_rtt_0 = (1 - a) * first_rtt
+	 *           = (1 - (1 - (min_rtt/first_rtt))) * first_rtt
+	 *           = first_rtt - (first_rtt - min_rtt)
+	 *           = min_rtt
+	 *
+	 *
+	 * And.. what happen in the else branch? We calculate first a (scaled by
+	 * 1024), then do the substraction (1-a) by keeping in the consideration
+	 * the scale, and in the end coming back to the result removing the
+	 * scaling.
+	 *
+	 * We divide the equation
+	 *
+	 * AvgRtt = a * AvgRtt + (1-a)*Rtt
+	 *
+	 * in two part properly scaled, left and right, and then having a sum of
+	 * the two parts to avoid (possible) overflow.
+	 */
+	if (ca->avg_rtt == 0) {
+		ca->avg_rtt = ca->min_rtt;
+	} else if (ca->first_rtt > 0) {
+		u64 a;
+		u64 left;
+		u64 right;
+		a = wavetcp_compute_weight(ca->first_rtt, ca->min_rtt);
+
+		DBG("%u sport: %u [%s] init. avg %u us, first %u us, "
+		    "min %u us, a (shifted) %llu",
+		    tcp_time_stamp, ca->sport, __func__,
+		    ca->avg_rtt, ca->first_rtt, ca->min_rtt, a);
+
+		left = (a * ca->avg_rtt) / AVG_UNIT;
+		right = ((AVG_UNIT - a) * ca->first_rtt) / AVG_UNIT;
+
+		ca->avg_rtt = (u32)left + (u32)right;
+	} else {
+		DBG("%u sport: %u [%s] first_rtt is 0. It is impossible "
+		    "to calculate the average RTT. Using the old value.\n",
+		    tcp_time_stamp, ca->sport, __func__);
+	}
+
+	DBG("%u sport: %u [%s] final avg %u\n",
+	    tcp_time_stamp, ca->sport, __func__, ca->avg_rtt);
+	/* We clearly missed a measurements if this happens */
+	BUG_ON(ca->avg_rtt < ca->min_rtt);
+	return ca->avg_rtt - ca->min_rtt;
+}
+
+static void wavetcp_round_terminated(struct sock *sk, const struct rate_sample *rs,
+				     u32 burst)
+{
+	u64 delta_rtt;
+	struct wavetcp *ca = inet_csk_ca(sk);
+
+	DBG("%u sport: %u [%s] reached the burst size %u\n",
+	    tcp_time_stamp, ca->sport, __func__, burst);
+
+	BUG_ON(time_after((unsigned long)ca->first_ack_time,
+			  (unsigned long)tcp_time_stamp));
+
+	delta_rtt = calculate_delta_rtt(ca);
+	DBG("%u sport: %u [%s] delta rtt %llu us\n",
+	    tcp_time_stamp, ca->sport, __func__, delta_rtt);
+
+	/* If we have to wait, let's wait */
+	if (ca->stab_factor > 0) {
+		--ca->stab_factor;
+		DBG("%u sport: %u [%s] avoiding update for stability reasons\n",
+		    tcp_time_stamp, ca->sport, __func__);
+		return;
+	}
+
+	DBG("%u sport: %u [%s] drtt %llu\n",
+	    tcp_time_stamp, ca->sport, __func__, delta_rtt);
+
+	/* delta_rtt is in us, beta_ms in ms */
+	if (delta_rtt > beta_ms * 1000)
+		wavetcp_adj_mode(ca,  delta_rtt);
+	else
+		wavetcp_tracking_mode(ca, calculate_ack_train_disp(ca, rs,
+								   burst,
+								   delta_rtt),
+				      delta_rtt);
+}
+
+static void wavetcp_cong_control(struct sock *sk, const struct rate_sample *rs)
+{
+	struct wavetcp_burst_hist *tmp;
+	struct list_head *pos;
+	struct wavetcp *ca = inet_csk_ca(sk);
+
+	if (!test_flag(FLAG_INIT, &ca->flags))
+		return;
+
+	if (ca->backup_first_ack_time == 0 && rs->interval_us > 0)
+		ca->backup_first_ack_time = rs->interval_us;
+
+	pos = ca->history->list.next;
+	tmp = list_entry(pos, struct wavetcp_burst_hist, list);
+
+	if (tmp->size == 0) {
+		/* No burst in memory. Most likely we sent some segments out of
+		 * the allowed window (e.g., loss probe) */
+		DBG("%u sport: %u [%s] WARNING! empty burst\n",
+		    tcp_time_stamp, ca->sport, __func__);
+		wavetcp_print_history(ca);
+		goto reset;
+	}
+
+	DBG("%u sport: %u [%s] prior_delivered %u, delivered %i, interval_us %li, "
+	    "rtt_us %li, losses %i, ack_sack %u, prior_in_flight %u, is_app %i,"
+	    " is_retrans %i\n", tcp_time_stamp, ca->sport, __func__,
+	    rs->prior_delivered, rs->delivered, rs->interval_us, rs->rtt_us,
+	    rs->losses, rs->acked_sacked, rs->prior_in_flight,
+	    rs->is_app_limited, rs->is_retrans);
+
+	if (!test_flag(FLAG_INIT, &ca->flags))
+		return;
+
+	/* Train management.*/
+	ca->pkts_acked += rs->acked_sacked;
+
+	if (ca->pkts_acked < tmp->size)
+		return;
+
+	while (ca->pkts_acked >= tmp->size) {
+		/* Usually the burst end is also reflected in the rs->delivered
+		 * variable. If this is not the case, and such variable is
+		 * behind just for 1 segment, then do this experimental thing
+		 * to re-allineate the burst with the rs->delivered variable.
+		 * In the majority of cases, we went out of allineation because
+		 * of a tail loss probe. */
+		if (rs->delivered + 1 == tmp->size) {
+			DBG("%u sport: %u [%s] highly experimental:"
+			    " ignore 1 pkt. pkts_acked %u, delivered %u,"
+			    " burst %u\n", tcp_time_stamp, ca->sport, __func__,
+			    ca->pkts_acked, rs->delivered, tmp->size);
+			ca->pkts_acked--;
+			return;
+		}
+		wavetcp_round_terminated(sk, rs, tmp->size);
+
+		BUG_ON(ca->pkts_acked < tmp->size);
+
+		ca->pkts_acked -= tmp->size;
+
+		/* Delete the burst from the history */
+		list_del(pos);
+		kmem_cache_free(ca->cache, tmp);
+
+		/* Take next burst */
+		pos = ca->history->list.next;
+		tmp = list_entry(pos, struct wavetcp_burst_hist, list);
+
+		/* If we cycle, inside wavetcp_round_terminated we will take the
+		 * Linux path instead of the wave path.. first_rtt will not be
+		 * read, so don't waste a cycle to set it */
+		ca->first_ack_time = tcp_time_stamp;
+		ca->backup_first_ack_time = 0;
+	}
+
+reset:
+	/* Reset the variables needed for the beginning of the next round*/
+	ca->first_ack_time = 0;
+	ca->backup_first_ack_time = 0;
+	ca->first_rtt = 0;
+	DBG("%u sport: %u [%s] resetting RTT values for next round\n",
+	    tcp_time_stamp, ca->sport, __func__);
+}
+
+static void wavetcp_acce(struct wavetcp *ca, s32 rtt_us, u32 pkts_acked)
+{
+	if (ca->first_ack_time == 0) {
+		ca->first_ack_time = tcp_time_stamp;
+		DBG("%u sport: %u [%s] first ack of the train\n",
+		    tcp_time_stamp, ca->sport, __func__);
+	}
+
+	if (ca->first_rtt == 0 && rtt_us > 0) {
+		ca->first_rtt = rtt_us;
+
+		DBG("%u sport: %u [%s] first measurement rtt %i\n",
+		    tcp_time_stamp, ca->sport, __func__,
+		    ca->first_rtt);
+	}
+
+	if (rtt_us <= 0)
+		return;
+
+	/* Check the minimum rtt we have seen */
+	if (rtt_us < ca->min_rtt) {
+		ca->min_rtt = rtt_us;
+		DBG("%u sport: %u [%s] min rtt %u\n", tcp_time_stamp,
+		    ca->sport, __func__, rtt_us);
+	}
+
+	if (rtt_us > ca->max_rtt)
+		ca->max_rtt = rtt_us;
+}
+
+/* Invoked each time we receive an ACK. Obviously, this function also gets
+ * called when we receive the SYN-ACK, but we ignore it thanks to the
+ * FLAG_INIT flag.
+ *
+ * We close the cwnd of the amount of segments acked, because we don't like
+ * sending out segments if the timer is not expired. Without doing this, we
+ * would end with cwnd - in_flight > 0.
+ */
+static void wavetcp_acked(struct sock *sk, const struct ack_sample *sample)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct wavetcp *ca = inet_csk_ca(sk);
+
+	if (!test_flag(FLAG_INIT, &ca->flags))
+		return;
+
+	DBG("%u sport: %u [%s] pkts_acked %u, rtt_us %i, in_flight %u "
+	    ", cwnd %u, seq ack %u\n",
+	    tcp_time_stamp, ca->sport, __func__, sample->pkts_acked,
+	    sample->rtt_us, sample->in_flight, tp->snd_cwnd, tp->snd_una);
+
+	/* We can divide the ACCE function in two part: the first take care of
+	 * the RTT, and the second of the train management. Here we could have
+	 * pkts_acked == 0, but with RTT values (because the underlying TCP can
+	 * identify what segment has been ACKed through the SACK option). In any
+	 * case, therefore, we enter wavetcp_acce.*/
+	wavetcp_acce(ca, sample->rtt_us, sample->pkts_acked);
+
+	if (tp->snd_cwnd < sample->pkts_acked) {
+		/* We sent some scattered segments, so the burst segments and
+		 * the ACK we get is not aligned.
+		 */
+		DBG("%u sport: %u [%s] delta_seg %i\n",
+		    tcp_time_stamp, ca->sport, __func__,
+		    ca->delta_segments);
+
+		ca->delta_segments += sample->pkts_acked - tp->snd_cwnd;
+	}
+
+	DBG("%u sport: %u [%s] snd_cwnd %u pkts_acked %u delta %i\n",
+	    tcp_time_stamp, ca->sport, __func__, tp->snd_cwnd,
+	    sample->pkts_acked, ca->delta_segments);
+
+	/* Brutally set the cwnd in order to not let segment out */
+	tp->snd_cwnd = tcp_packets_in_flight(tp);
+
+	DBG("%u sport: %u [%s] new window %u in_flight %u delta %i\n",
+	    tcp_time_stamp, ca->sport, __func__, tp->snd_cwnd,
+	    tcp_packets_in_flight(tp), ca->delta_segments);
+}
+
+/* The TCP informs us that the timer is expired (or has never been set). We can
+ * infer the latter by the FLAG_STARTED flag: if it's false, don't increase the
+ * cwnd, because it is at its default value (init_burst) and we still have to
+ * transmit the first burst.
+ */
+static void wavetcp_timer_expired(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct wavetcp *ca = inet_csk_ca(sk);
+	u32 current_burst = ca->burst;
+
+	BUG_ON(!test_flag(FLAG_INIT, &ca->flags));
+
+	if (!test_flag(FLAG_START, &ca->flags)) {
+		DBG("%u sport: %u [%s] returning because of !FLAG_START, leaving cwnd %u\n",
+		    tcp_time_stamp, ca->sport, __func__, tp->snd_cwnd);
+		return;
+	}
+
+	DBG("%u sport: %u [%s] starting with delta %u current_burst %u\n",
+	    tcp_time_stamp, ca->sport, __func__, ca->delta_segments,
+	    current_burst);
+
+	if (ca->delta_segments < 0) {
+		/* In the previous round, we sent more than the allowed burst,
+		 * so reduce the current burst.
+		 */
+		BUG_ON(current_burst > ca->delta_segments);
+		current_burst += ca->delta_segments; /* please *reduce* */
+
+		/* Right now, we should send "current_burst" segments out */
+
+		if (tcp_packets_in_flight(tp) > tp->snd_cwnd) {
+			/* For some reasons (e.g., tcp loss probe)
+			 * we sent something outside the allowed window.
+			 * Add the amount of segments into the burst, in order
+			 * to effectively send the previous "current_burst"
+			 * segments, but without touching delta_segments.
+			 */
+			u32 diff = tcp_packets_in_flight(tp) - tp->snd_cwnd;
+
+			current_burst += diff;
+			DBG("%u sport: %u [%s] adding %u to balance "
+			    "segments sent out of window", tcp_time_stamp,
+			    ca->sport, __func__, diff);
+		}
+	}
+
+	ca->delta_segments = current_burst;
+	DBG("%u sport: %u [%s] setting delta_seg %u current burst %u\n",
+	    tcp_time_stamp, ca->sport, __func__,
+	    ca->delta_segments, current_burst);
+
+	if (current_burst < min_burst) {
+		DBG("%u sport: %u [%s] WARNING !! not min_burst",
+		    tcp_time_stamp, ca->sport, __func__);
+		ca->delta_segments += min_burst - current_burst;
+		current_burst = min_burst;
+	}
+
+	tp->snd_cwnd += current_burst;
+	set_flag(FLAG_SAVE, &ca->flags);
+
+	DBG("%u sport: %u [%s], increased window of %u segments, "
+	    "total %u, delta %i, in_flight %u\n",
+	    tcp_time_stamp, ca->sport, __func__, ca->burst,
+	    tp->snd_cwnd, ca->delta_segments, tcp_packets_in_flight(tp));
+
+	if (tp->snd_cwnd - tcp_packets_in_flight(tp) > current_burst) {
+		DBG("%u sport: %u [%s] WARNING! "
+		    " cwnd %u, in_flight %u, current burst %u\n",
+		    tcp_time_stamp, ca->sport, __func__,
+		    tp->snd_cwnd, tcp_packets_in_flight(tp),
+		    current_burst);
+	}
+}
+
+/* The TCP is asking for a timer value in jiffies. This will be subject to
+ * change for a realtime timer in the future.
+ */
+static unsigned long wavetcp_get_timer(struct sock *sk)
+{
+	struct wavetcp *ca = inet_csk_ca(sk);
+	u32 timer;
+
+	BUG_ON(!test_flag(FLAG_INIT, &ca->flags));
+
+	timer = min_t(unsigned long, ca->tx_timer, init_timer_ms * USEC_PER_MSEC);
+
+	DBG("%u sport: %u [%s] returning timer of %u us\n",
+	    tcp_time_stamp, ca->sport, __func__, timer);
+
+	return usecs_to_jiffies(timer);
+}
+
+static void wavetcp_segment_sent(struct sock *sk, u32 sent)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct wavetcp *ca = inet_csk_ca(sk);
+
+	if (test_flag(FLAG_SAVE, &ca->flags) && sent > 0) {
+		wavetcp_insert_burst(ca, sent);
+		clear_flag(FLAG_SAVE, &ca->flags);
+	} else {
+		DBG("%u sport: %u [%s] not saving burst, sent %u\n",
+		    tcp_time_stamp, ca->sport, __func__, sent);
+	}
+
+	if (sent > ca->burst) {
+		DBG("%u sport: %u [%s] WARNING! sent %u, burst %u"
+		    " cwnd %u delta_seg %i\n, TSO very probable",
+		    tcp_time_stamp, ca->sport, __func__, sent,
+		    ca->burst, tp->snd_cwnd, ca->delta_segments);
+	}
+
+	ca->delta_segments -= sent;
+
+	if (ca->delta_segments >= 0 &&
+	    ca->burst > sent &&
+	    tcp_packets_in_flight(tp) <= tp->snd_cwnd) {
+		/* Reduce the cwnd accordingly, because we didn't sent enough
+		 * to cover it (we are app limited probably) */
+		u32 diff = ca->burst - sent;
+
+		if (tp->snd_cwnd >= diff)
+			tp->snd_cwnd -= diff;
+		else
+			tp->snd_cwnd = 0;
+		DBG("%u sport: %u [%s] reducing cwnd by %u, value %u\n",
+		    tcp_time_stamp, ca->sport, __func__,
+		    ca->burst - sent, tp->snd_cwnd);
+	}
+}
+
+static void wavetcp_no_data(struct sock *sk)
+{
+	DBG("%u [%s]\n", tcp_time_stamp, __func__);
+}
+
+static u32 wavetcp_sndbuf_expand(struct sock *sk)
+{
+	return 10;
+}
+
+static struct tcp_congestion_ops wave_cong_tcp __read_mostly = {
+	.init				= wavetcp_init,
+	.release			= wavetcp_release,
+	.ssthresh			= wavetcp_recalc_ssthresh,
+/*	.cong_avoid		= wavetcp_cong_avoid, */
+	.cong_control			= wavetcp_cong_control,
+	.set_state			= wavetcp_state,
+	.undo_cwnd			= wavetcp_undo_cwnd,
+	.cwnd_event			= wavetcp_cwnd_event,
+	.pkts_acked			= wavetcp_acked,
+	.sndbuf_expand			= wavetcp_sndbuf_expand,
+	.owner				= THIS_MODULE,
+	.name				= "wave",
+	.get_send_timer_exp_time	= wavetcp_get_timer,
+	.send_timer_expired		= wavetcp_timer_expired,
+	.no_data_to_transmit		= wavetcp_no_data,
+	.segment_sent			= wavetcp_segment_sent,
+};
+
+static int __init wavetcp_register(void)
+{
+	BUILD_BUG_ON(sizeof(struct wavetcp) > ICSK_CA_PRIV_SIZE);
+
+	return tcp_register_congestion_control(&wave_cong_tcp);
+}
+
+static void __exit wavetcp_unregister(void)
+{
+	tcp_unregister_congestion_control(&wave_cong_tcp);
+}
+
+module_init(wavetcp_register);
+module_exit(wavetcp_unregister);
+
+MODULE_AUTHOR("Natale Patriciello");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("WAVE TCP");
+MODULE_VERSION("0.1");
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave
  2017-07-28 19:59 ` [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave Natale Patriciello
@ 2017-07-28 23:15   ` Neal Cardwell
  2017-07-29  1:51   ` David Miller
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Neal Cardwell @ 2017-07-28 23:15 UTC (permalink / raw)
  To: Natale Patriciello
  Cc: David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, Ahmed Said,
	Francesco Zampognaro, Cesare Roseti

On Fri, Jul 28, 2017 at 3:59 PM, Natale Patriciello
<natale.patriciello@gmail.com> wrote:
> TCP Wave (TCPW) replaces the window-based transmission paradigm of the
> standard TCP with a burst-based transmission, the ACK-clock scheduling
> with a self-managed timer and the RTT-based congestion control loop
> with an Ack-based Capacity and Congestion Estimation (ACCE) module. In
> non-technical words, it sends data down the stack when its internal
> timer expires, and the timing of the received ACKs contribute to
> updating this timer regularly.
>
> It is the first TCP congestion control that uses the timing constraint
> developed in the Linux kernel.
>
> Signed-off-by: Natale Patriciello <natale.patriciello@gmail.com>
> Tested-by: Ahmed Said <ahmed.said@uniroma2.it>
> ---
>  MAINTAINERS           |   6 +
>  net/ipv4/Kconfig      |  16 +
>  net/ipv4/Makefile     |   1 +
>  net/ipv4/tcp_output.c |   4 +-
>  net/ipv4/tcp_wave.c   | 914 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 940 insertions(+), 1 deletion(-)
>  create mode 100644 net/ipv4/tcp_wave.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 767e9d202adf..39c57bdc417d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12427,6 +12427,12 @@ W:     http://tcp-lp-mod.sourceforge.net/
>  S:     Maintained
>  F:     net/ipv4/tcp_lp.c
>
> +TCP WAVE MODULE
> +M:     "Natale Patriciello" <natale.patriciello@gmail.com>
> +W:     http://tcp-lp-mod.sourceforge.net/

This URL does not work for me... I get "Unable to connect to database server".

> @@ -2522,7 +2522,9 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)
>  {
>         struct sk_buff *skb = tcp_send_head(sk);
>
> -       BUG_ON(!skb || skb->len < mss_now);
> +       /* Don't be forced to send not meaningful data */
> +       if (!skb || skb->len < mss_now)
> +               return;
>
>         tcp_write_xmit(sk, mss_now, TCP_NAGLE_PUSH, 1, sk->sk_allocation);
>  }

This seems unrelated to the rest of the patch, and should probably be
its own patch? Also, IMHO it would be better to leave at least a
WARN_ON or WARN_ON_ONCE here, rather than completely turning this into
a silent failure.

thanks,
neal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 2/5] tcp: Implemented the timing-based operations
  2017-07-28 19:59 ` [RFC PATCH v1 2/5] tcp: Implemented the timing-based operations Natale Patriciello
@ 2017-07-29  1:46   ` David Miller
  0 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2017-07-29  1:46 UTC (permalink / raw)
  To: natale.patriciello
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, ahmed.said, zampognaro, roseti

From: Natale Patriciello <natale.patriciello@gmail.com>
Date: Fri, 28 Jul 2017 21:59:16 +0200

> @@ -369,6 +369,9 @@ struct tcp_sock {
>  	 */
>  	struct request_sock *fastopen_rsk;
>  	u32	*saved_syn;
> +
> +/* TCP send timer */
> +	struct timer_list send_timer;
>  };
>  
>  enum tsq_enum {

If this is congestion control specific it should go into the congestion
control algorithm metadata.  If not, then it's OK to be here I guess :)

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 4858e190f6ac..357b9cd5019e 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2187,6 +2187,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>  			   int push_one, gfp_t gfp)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
> +	const struct tcp_congestion_ops *ca_ops;
>  	struct sk_buff *skb;
>  	unsigned int tso_segs, sent_pkts;
>  	int cwnd_quota;

Please maintain the reverse christmas tree (longest to shortest) line ordering
of all local variable declarations.

> +	if (timer_pending(&tp->send_timer) == 0) {
> +		/* Timer is not running, push data out */
> +		int ret;
> +		const struct tcp_congestion_ops *ca_ops;

Likewise.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave
  2017-07-28 19:59 ` [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave Natale Patriciello
  2017-07-28 23:15   ` Neal Cardwell
@ 2017-07-29  1:51   ` David Miller
  2017-07-29  1:52   ` David Miller
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2017-07-29  1:51 UTC (permalink / raw)
  To: natale.patriciello
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, ahmed.said, zampognaro, roseti

From: Natale Patriciello <natale.patriciello@gmail.com>
Date: Fri, 28 Jul 2017 21:59:19 +0200

> +/* TCP Wave private struct */
> +struct wavetcp {
> +	/* The module flags */
> +	u8 flags;
> +	/* The current transmission timer (us) */
> +	u32 tx_timer;
> +	/* The current burst size (segments) */
> +	u16 burst;

This style of declaring a data structure wastes a lot of vertical
screen space.  Instead use:

	type	name;	/* comment */

> +static void wavetcp_init(struct sock *sk)
> +{
> +	struct tcp_sock *tp = tcp_sk(sk);
> +	struct wavetcp *ca = inet_csk_ca(sk);

Always declare local variables in longest to shortest line order.

> +	DBG("%u sport: %u [%s]\n", tcp_time_stamp, ca->sport,
> +	    __func__);

DO NOT define your own custom debug logging facilities.

The kernel has millions of mechanism by which you can log information
either in the kernel log buffer or in the kernel trace log.  THere is
everything from dynamic fine-grained run time enable/disable, to
compile time controls.

There is absolutely therefore never a reason to define custom
mechanisms like you are here.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave
  2017-07-28 19:59 ` [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave Natale Patriciello
  2017-07-28 23:15   ` Neal Cardwell
  2017-07-29  1:51   ` David Miller
@ 2017-07-29  1:52   ` David Miller
  2017-07-29 15:32   ` Stephen Hemminger
  2017-07-31 13:39   ` David Laight
  4 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2017-07-29  1:52 UTC (permalink / raw)
  To: natale.patriciello
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, ahmed.said, zampognaro, roseti

From: Natale Patriciello <natale.patriciello@gmail.com>
Date: Fri, 28 Jul 2017 21:59:19 +0200

> +static __always_inline bool test_flag(u8 value, const u8 *flags)

Never, ever, declare functions as inline in foo.c files.

Always let the compiler decide.  No matter how brilliant you think
you are, it always knows better.

And when it doesn't, that's a bug that should be fixed instead of
worked around in our code.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/5] TCP Wave
  2017-07-28 19:59 [RFC PATCH v1 0/5] TCP Wave Natale Patriciello
                   ` (4 preceding siblings ...)
  2017-07-28 19:59 ` [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave Natale Patriciello
@ 2017-07-29  5:33 ` Eric Dumazet
  2017-09-02 20:33   ` Natale Patriciello
  5 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2017-07-29  5:33 UTC (permalink / raw)
  To: Natale Patriciello
  Cc: David S . Miller, netdev, Ahmed Said, Francesco Zampognaro,
	Cesare Roseti

On Fri, 2017-07-28 at 21:59 +0200, Natale Patriciello wrote:
> Hi,
> We are working on a new TCP congestion control algorithm, aiming at satisfying
> new requirements coming from current networks. For instance, adaptation to
> bandwidth/delay changes (due to mobility, dynamic switching, handover), and
> optimal exploitation of very high link capacity and efficient transmission of
> small objects, irrespective of the underlying link characteristics.
> 
> TCP Wave (TCPW) replaces the window-based transmission paradigm of the standard
> TCP with a burst-based transmission, the ACK-clock scheduling with a
> self-managed timer and the RTT-based congestion control loop with an Ack-based
> Capacity and Congestion Estimation (ACCE) module. In non-technical words, it
> sends data down the stack when its internal timer expires, and the timing of
> the received ACKs contribute to updating this timer regularly.
> 
> We tried to add this new sender paradigm without deeply touching existing code.
> In fact, we added four (optional) new congestion control functions:
> 
> +       /* get the expiration time for the send timer (optional) */
> +       unsigned long (*get_send_timer_exp_time)(struct sock *sk);
> +       /* no data to transmit at the timer expiration (optional) */
> +       void (*no_data_to_transmit)(struct sock *sk);
> +       /* the send timer is expired (optional) */
> +       void (*send_timer_expired)(struct sock *sk);
> +       /* the TCP has sent some segments (optional) */
> +       void (*segment_sent)(struct sock *sk, u32 sent);
> 
> And a timer (tp->send_timer) which uses a send callback to push data down the
> stack. If the first of these function, get_send_timer_exp_time,  is not
> implemented by the current congestion control, then the timer sending timer is
> never set, therefore falling back to the old, ACK-clocked, behavior.

trimmed CC

This patch series seems to have missed recent efforts in TCP stack,
namely TCP pacing.

commit 218af599fa635b107cfe10acf3249c4dfe5e4123 ("tcp: internal
implementation for pacing") added a timer already to get fine grained
packet xmits.

I suggest you rebase your work and try to reuse existing mechanisms.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave
  2017-07-28 19:59 ` [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave Natale Patriciello
                     ` (2 preceding siblings ...)
  2017-07-29  1:52   ` David Miller
@ 2017-07-29 15:32   ` Stephen Hemminger
  2017-07-31 13:39   ` David Laight
  4 siblings, 0 replies; 14+ messages in thread
From: Stephen Hemminger @ 2017-07-29 15:32 UTC (permalink / raw)
  To: Natale Patriciello
  Cc: David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, Ahmed Said,
	Francesco Zampognaro, Cesare Roseti

On Fri, 28 Jul 2017 21:59:19 +0200
Natale Patriciello <natale.patriciello@gmail.com> wrote:

> +
> +#define WAVE_DEBUG 1
> +
> +#ifdef WAVE_DEBUG
> +	#define DBG(msg ...) printk(KERN_DEBUG "WAVE_DEBUG: " msg)
> +#else
> +	static inline void DBG(const char *msg, ...) { }
> +#endif
> +

Don't reinvent your own debug macros.
Use standard pr_debug instead.

+
+	if (ca->history != 0)
+		kfree(ca->history);

First, off the comparison should be with NULL not 0.
Secondly, kfree already does the right thing with kfree(NULL).

+		/* Usually the burst end is also reflected in the rs->delivered
+		 * variable. If this is not the case, and such variable is
+		 * behind just for 1 segment, then do this experimental thing
+		 * to re-allineate the burst with the rs->delivered variable.
+		 * In the majority of cases, we went out of allineation because
+		 * of a tail loss probe. */

Put the last */ in column with other parts of block.
		* of a tail loss probe.
		*/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave
  2017-07-28 19:59 ` [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave Natale Patriciello
                     ` (3 preceding siblings ...)
  2017-07-29 15:32   ` Stephen Hemminger
@ 2017-07-31 13:39   ` David Laight
  4 siblings, 0 replies; 14+ messages in thread
From: David Laight @ 2017-07-31 13:39 UTC (permalink / raw)
  To: 'Natale Patriciello',
	David S . Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, Ahmed Said, Francesco Zampognaro, Cesare Roseti

From: Natale Patriciello
> Sent: 28 July 2017 20:59
..
> +static __always_inline bool test_flag(u8 value, const u8 *flags)
> +{
> +	return (*flags & value) == value;
> +}
...
> +	if (!test_flag(FLAG_INIT, &ca->flags))
> +		return;
...

That is a completely unnecessary 'helper'.
It has its arguments in the wrong order.
Doesn't need to pass by reference.
Since you only ever check one bit you don't need the '=='.
Any error seems to be silently ignored.
I bet they can't actually happen at all.

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/5] TCP Wave
  2017-07-29  5:33 ` [RFC PATCH v1 0/5] " Eric Dumazet
@ 2017-09-02 20:33   ` Natale Patriciello
  0 siblings, 0 replies; 14+ messages in thread
From: Natale Patriciello @ 2017-09-02 20:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Ahmed Said, Francesco Zampognaro, Cesare Roseti

Hello all,
first of all, we would like to thank everyone that commented our
patches; we are working to include all the suggestions (we are at a good
point).

However, we have two questions:

1) How to retrieve information about each connection? Right now we used
debug messages, but we understand it isn't the best option. TCP Wave
users have other values to track rather than congestion window and slow
start threshold. It seems we have two alternatives: (a) use get_info,
that returns strings to be read with ss, (b) open a file under /proc/net
and write data to it, in the same way as tcp_probe do. With option (a)
it is necessary a poll from userspace (for instance, using watch), but
is subjected to delays and maybe not suitable for fast connections
(watch minimum interval is 100 ms). Option (b), toggled with a module
parameter, seems the more viable. Is that correct?

The second one is inline:

On 28/07/17 at 10:33pm, Eric Dumazet wrote:
> On Fri, 2017-07-28 at 21:59 +0200, Natale Patriciello wrote:
> > Hi,
[cut]
> > TCP Wave (TCPW) replaces the window-based transmission paradigm of the standard
> > TCP with a burst-based transmission, the ACK-clock scheduling with a
> > self-managed timer and the RTT-based congestion control loop with an Ack-based
> > Capacity and Congestion Estimation (ACCE) module. In non-technical words, it
> > sends data down the stack when its internal timer expires, and the timing of
> > the received ACKs contribute to updating this timer regularly.
>
> This patch series seems to have missed recent efforts in TCP stack,
> namely TCP pacing.
>
> commit 218af599fa635b107cfe10acf3249c4dfe5e4123 ("tcp: internal
> implementation for pacing") added a timer already to get fine grained
> packet xmits.

Thank you, Eric, for this suggestion; in fact, we had problems with our
implementation of the timer, and we would like to switch to the new
pacing timer entirely. However, a pacing approach is exactly the
opposite we would like to achieve: we want to send a burst of data
(let's say, ten segments) and then wait some amount of time. Do you
think that adding a new congestion control callback that returns the
number of segments to send when the timer expires (default to 1) and another
callback for retrieving the pacing time can be a sound strategy?

Thank you again, have a nice day.

Natale

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-09-02 20:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-28 19:59 [RFC PATCH v1 0/5] TCP Wave Natale Patriciello
2017-07-28 19:59 ` [RFC PATCH v1 1/5] tcp: Added callback for timed sender operations Natale Patriciello
2017-07-28 19:59 ` [RFC PATCH v1 2/5] tcp: Implemented the timing-based operations Natale Patriciello
2017-07-29  1:46   ` David Miller
2017-07-28 19:59 ` [RFC PATCH v1 3/5] tcp: PSH frames sent without timer involved Natale Patriciello
2017-07-28 19:59 ` [RFC PATCH v1 4/5] tcp: Add initial delay to allow data queueing Natale Patriciello
2017-07-28 19:59 ` [RFC PATCH v1 5/5] wave: Added basic version of TCP Wave Natale Patriciello
2017-07-28 23:15   ` Neal Cardwell
2017-07-29  1:51   ` David Miller
2017-07-29  1:52   ` David Miller
2017-07-29 15:32   ` Stephen Hemminger
2017-07-31 13:39   ` David Laight
2017-07-29  5:33 ` [RFC PATCH v1 0/5] " Eric Dumazet
2017-09-02 20:33   ` Natale Patriciello

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.