netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order
@ 2019-09-13 19:36 Thomas Higdon
  2019-09-13 19:36 ` [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO Thomas Higdon
  2019-09-13 20:55 ` [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order Neal Cardwell
  0 siblings, 2 replies; 7+ messages in thread
From: Thomas Higdon @ 2019-09-13 19:36 UTC (permalink / raw)
  To: netdev
  Cc: Jonathan Lemon, Dave Jones, Eric Dumazet, Neal Cardwell,
	Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh

For receive-heavy cases on the server-side, we want to track the
connection quality for individual client IPs. This counter, similar to
the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
tracks out-of-order packet reception. By providing this counter in
TCP_INFO, it will allow understanding to what degree receive-heavy
sockets are experiencing out-of-order delivery and packet drops
indicating congestion.

Please note that this is similar to the counter in NetBSD TCP_INFO, and
has the same name.

Signed-off-by: Thomas Higdon <tph@fb.com>
---

no changes from v3

 include/linux/tcp.h      | 2 ++
 include/uapi/linux/tcp.h | 2 ++
 net/ipv4/tcp.c           | 2 ++
 net/ipv4/tcp_input.c     | 1 +
 4 files changed, 7 insertions(+)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f3a85a7fb4b1..a01dc78218f1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -393,6 +393,8 @@ struct tcp_sock {
 	 */
 	struct request_sock *fastopen_rsk;
 	u32	*saved_syn;
+
+	u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */
 };
 
 enum tsq_enum {
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index b3564f85a762..20237987ccc8 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -270,6 +270,8 @@ struct tcp_info {
 	__u64	tcpi_bytes_retrans;  /* RFC4898 tcpEStatsPerfOctetsRetrans */
 	__u32	tcpi_dsack_dups;     /* RFC4898 tcpEStatsStackDSACKDups */
 	__u32	tcpi_reord_seen;     /* reordering events seen */
+
+	__u32	tcpi_rcv_ooopack;    /* Out-of-order packets received */
 };
 
 /* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 94df48bcecc2..4cf58208270e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2653,6 +2653,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tp->rx_opt.saw_tstamp = 0;
 	tp->rx_opt.dsack = 0;
 	tp->rx_opt.num_sacks = 0;
+	tp->rcv_ooopack = 0;
 
 
 	/* Clean up fastopen related fields */
@@ -3295,6 +3296,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_bytes_retrans = tp->bytes_retrans;
 	info->tcpi_dsack_dups = tp->dsack_dups;
 	info->tcpi_reord_seen = tp->reord_seen;
+	info->tcpi_rcv_ooopack = tp->rcv_ooopack;
 	unlock_sock_fast(sk, slow);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 706cbb3b2986..2ef333354026 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4555,6 +4555,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	tp->pred_flags = 0;
 	inet_csk_schedule_ack(sk);
 
+	tp->rcv_ooopack += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
 	seq = TCP_SKB_CB(skb)->seq;
 	end_seq = TCP_SKB_CB(skb)->end_seq;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
  2019-09-13 19:36 [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order Thomas Higdon
@ 2019-09-13 19:36 ` Thomas Higdon
  2019-09-13 21:02   ` Neal Cardwell
  2019-09-13 20:55 ` [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order Neal Cardwell
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Higdon @ 2019-09-13 19:36 UTC (permalink / raw)
  To: netdev
  Cc: Jonathan Lemon, Dave Jones, Eric Dumazet, Neal Cardwell,
	Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh

Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
performance problems --
> (1) Usually when we're diagnosing TCP performance problems, we do so
> from the sender, since the sender makes most of the
> performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> From the sender-side the thing that would be most useful is to see
> tp->snd_wnd, the receive window that the receiver has advertised to
> the sender.

This serves the purpose of adding an additional __u32 to avoid the
would-be hole caused by the addition of the tcpi_rcvi_ooopack field.

Signed-off-by: Thomas Higdon <tph@fb.com>
---
changes from v3:
 - changed from rcv_wnd to snd_wnd

 include/uapi/linux/tcp.h | 1 +
 net/ipv4/tcp.c           | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 20237987ccc8..240654f22d98 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -272,6 +272,7 @@ struct tcp_info {
 	__u32	tcpi_reord_seen;     /* reordering events seen */
 
 	__u32	tcpi_rcv_ooopack;    /* Out-of-order packets received */
+	__u32	tcpi_snd_wnd;        /* Remote peer's advertised recv window size */
 };
 
 /* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4cf58208270e..79c325a07ba5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3297,6 +3297,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_dsack_dups = tp->dsack_dups;
 	info->tcpi_reord_seen = tp->reord_seen;
 	info->tcpi_rcv_ooopack = tp->rcv_ooopack;
+	info->tcpi_snd_wnd = tp->snd_wnd;
 	unlock_sock_fast(sk, slow);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order
  2019-09-13 19:36 [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order Thomas Higdon
  2019-09-13 19:36 ` [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO Thomas Higdon
@ 2019-09-13 20:55 ` Neal Cardwell
  1 sibling, 0 replies; 7+ messages in thread
From: Neal Cardwell @ 2019-09-13 20:55 UTC (permalink / raw)
  To: Thomas Higdon
  Cc: netdev, Jonathan Lemon, Dave Jones, Eric Dumazet, Dave Taht,
	Yuchung Cheng, Soheil Hassas Yeganeh

On Fri, Sep 13, 2019 at 3:37 PM Thomas Higdon <tph@fb.com> wrote:
>
> For receive-heavy cases on the server-side, we want to track the
> connection quality for individual client IPs. This counter, similar to
> the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
> tracks out-of-order packet reception. By providing this counter in
> TCP_INFO, it will allow understanding to what degree receive-heavy
> sockets are experiencing out-of-order delivery and packet drops
> indicating congestion.
>
> Please note that this is similar to the counter in NetBSD TCP_INFO, and
> has the same name.
>
> Signed-off-by: Thomas Higdon <tph@fb.com>
> ---
>
> no changes from v3
>
>  include/linux/tcp.h      | 2 ++
>  include/uapi/linux/tcp.h | 2 ++
>  net/ipv4/tcp.c           | 2 ++
>  net/ipv4/tcp_input.c     | 1 +
>  4 files changed, 7 insertions(+)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index f3a85a7fb4b1..a01dc78218f1 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -393,6 +393,8 @@ struct tcp_sock {
>          */
>         struct request_sock *fastopen_rsk;
>         u32     *saved_syn;
> +
> +       u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */

Thanks for adding this.

A thought: putting the new rcv_ooopack field here makes struct
tcp_sock bigger, and increases the odds of taking a cache miss
(according to "pahole" this field is the only one in a new cache
line).

I'd suggest putting the new rcv_ooopack field immediately before
rcv_rtt_last_tsecr. That would use up a 4-byte hole, and would place
it in a cache line already used on TCP receivers (for rcv_rtt logic).
This would make it less likely this new field causes more cache misses
or uses more space.

Details: looking at the output of "pahole" for tcp_sock in various cases:

net-next before this patch:
-------------------------------------
...
        u8                         bpf_sock_ops_cb_flags; /*  2076     1 */

        /* XXX 3 bytes hole, try to pack */

        u32                        rcv_rtt_last_tsecr;   /*  2080     4 */

        /* XXX 4 bytes hole, try to pack */

        struct {
                u32                rtt_us;               /*  2088     4 */
                u32                seq;                  /*  2092     4 */
                u64                time;                 /*  2096     8 */
        } rcv_rtt_est;                                   /*  2088    16 */
...
        /* size: 2176, cachelines: 34, members: 134 */
        /* sum members: 2164, holes: 4, sum holes: 12 */
        /* paddings: 3, sum paddings: 12 */


net-next with this patch:
-------------------------------------
...
        u32 *                      saved_syn;            /*  2168     8 */
        /* --- cacheline 34 boundary (2176 bytes) --- */
        u32                        rcv_ooopack;          /*  2176     4 */
...
        /* size: 2184, cachelines: 35, members: 135 */
        /* sum members: 2168, holes: 4, sum holes: 12 */
        /* padding: 4 */
        /* paddings: 3, sum paddings: 12 */
        /* last cacheline: 8 bytes */


net-next with this field in the suggested spot:
-------------------------------------
...
       /* XXX 3 bytes hole, try to pack */

        u32                        rcv_ooopack;          /*  2080     4 */
        u32                        rcv_rtt_last_tsecr;   /*  2084     4 */
        struct {
                u32                rtt_us;               /*  2088     4 */
                u32                seq;                  /*  2092     4 */
                u64                time;                 /*  2096     8 */
        } rcv_rtt_est;                                   /*  2088    16 */
...
        /* size: 2176, cachelines: 34, members: 135 */
        /* sum members: 2168, holes: 3, sum holes: 8 */
        /* paddings: 3, sum paddings: 12 */

neal


neal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
  2019-09-13 19:36 ` [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO Thomas Higdon
@ 2019-09-13 21:02   ` Neal Cardwell
  2019-09-13 21:28     ` Yuchung Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Neal Cardwell @ 2019-09-13 21:02 UTC (permalink / raw)
  To: Thomas Higdon
  Cc: netdev, Jonathan Lemon, Dave Jones, Eric Dumazet, Dave Taht,
	Yuchung Cheng, Soheil Hassas Yeganeh

On Fri, Sep 13, 2019 at 3:36 PM Thomas Higdon <tph@fb.com> wrote:
>
> Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> performance problems --
> > (1) Usually when we're diagnosing TCP performance problems, we do so
> > from the sender, since the sender makes most of the
> > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> > From the sender-side the thing that would be most useful is to see
> > tp->snd_wnd, the receive window that the receiver has advertised to
> > the sender.
>
> This serves the purpose of adding an additional __u32 to avoid the
> would-be hole caused by the addition of the tcpi_rcvi_ooopack field.
>
> Signed-off-by: Thomas Higdon <tph@fb.com>
> ---
> changes from v3:
>  - changed from rcv_wnd to snd_wnd
>
>  include/uapi/linux/tcp.h | 1 +
>  net/ipv4/tcp.c           | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 20237987ccc8..240654f22d98 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -272,6 +272,7 @@ struct tcp_info {
>         __u32   tcpi_reord_seen;     /* reordering events seen */
>
>         __u32   tcpi_rcv_ooopack;    /* Out-of-order packets received */
> +       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv window size */
>  };

Thanks for adding this!

My run of ./scripts/checkpatch.pl is showing a warning on this line:

WARNING: line over 80 characters
#19: FILE: include/uapi/linux/tcp.h:273:
+       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv
window size */

What if the comment is shortened up to fit in 80 columns and the units
(bytes) are added, something like:

        __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */

neal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
  2019-09-13 21:02   ` Neal Cardwell
@ 2019-09-13 21:28     ` Yuchung Cheng
  2019-09-13 21:53       ` Neal Cardwell
  0 siblings, 1 reply; 7+ messages in thread
From: Yuchung Cheng @ 2019-09-13 21:28 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Thomas Higdon, netdev, Jonathan Lemon, Dave Jones, Eric Dumazet,
	Dave Taht, Soheil Hassas Yeganeh

On Fri, Sep 13, 2019 at 2:02 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Sep 13, 2019 at 3:36 PM Thomas Higdon <tph@fb.com> wrote:
> >
> > Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> > performance problems --
> > > (1) Usually when we're diagnosing TCP performance problems, we do so
> > > from the sender, since the sender makes most of the
> > > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> > > From the sender-side the thing that would be most useful is to see
> > > tp->snd_wnd, the receive window that the receiver has advertised to
> > > the sender.
> >
> > This serves the purpose of adding an additional __u32 to avoid the
> > would-be hole caused by the addition of the tcpi_rcvi_ooopack field.
> >
> > Signed-off-by: Thomas Higdon <tph@fb.com>
> > ---
> > changes from v3:
> >  - changed from rcv_wnd to snd_wnd
> >
> >  include/uapi/linux/tcp.h | 1 +
> >  net/ipv4/tcp.c           | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> > index 20237987ccc8..240654f22d98 100644
> > --- a/include/uapi/linux/tcp.h
> > +++ b/include/uapi/linux/tcp.h
> > @@ -272,6 +272,7 @@ struct tcp_info {
> >         __u32   tcpi_reord_seen;     /* reordering events seen */
> >
> >         __u32   tcpi_rcv_ooopack;    /* Out-of-order packets received */
> > +       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv window size */
> >  };
>
> Thanks for adding this!
>
> My run of ./scripts/checkpatch.pl is showing a warning on this line:
>
> WARNING: line over 80 characters
> #19: FILE: include/uapi/linux/tcp.h:273:
> +       __u32   tcpi_snd_wnd;        /* Remote peer's advertised recv
> window size */
>
> What if the comment is shortened up to fit in 80 columns and the units
> (bytes) are added, something like:
>
>         __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */
just a thought: will tcpi_peer_rcv_wnd be more self-explanatory?
>
> neal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
  2019-09-13 21:28     ` Yuchung Cheng
@ 2019-09-13 21:53       ` Neal Cardwell
  2019-09-13 22:41         ` Yuchung Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Neal Cardwell @ 2019-09-13 21:53 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Thomas Higdon, netdev, Jonathan Lemon, Dave Jones, Eric Dumazet,
	Dave Taht, Soheil Hassas Yeganeh

On Fri, Sep 13, 2019 at 5:29 PM Yuchung Cheng <ycheng@google.com> wrote:
> > What if the comment is shortened up to fit in 80 columns and the units
> > (bytes) are added, something like:
> >
> >         __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */
> just a thought: will tcpi_peer_rcv_wnd be more self-explanatory?

Good suggestion. I'm on the fence about that one. By itself, I agree
tcpi_peer_rcv_wnd sounds much more clear. But tcpi_snd_wnd has the
virtue of matching both the kernel code (tp->snd_wnd) and RFC 793
(SND.WND). So they both have pros and cons. Maybe someone else feels
more strongly one way or the other.

neal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO
  2019-09-13 21:53       ` Neal Cardwell
@ 2019-09-13 22:41         ` Yuchung Cheng
  0 siblings, 0 replies; 7+ messages in thread
From: Yuchung Cheng @ 2019-09-13 22:41 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Thomas Higdon, netdev, Jonathan Lemon, Dave Jones, Eric Dumazet,
	Dave Taht, Soheil Hassas Yeganeh

On Fri, Sep 13, 2019 at 2:53 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Sep 13, 2019 at 5:29 PM Yuchung Cheng <ycheng@google.com> wrote:
> > > What if the comment is shortened up to fit in 80 columns and the units
> > > (bytes) are added, something like:
> > >
> > >         __u32   tcpi_snd_wnd;        /* peer's advertised recv window (bytes) */
> > just a thought: will tcpi_peer_rcv_wnd be more self-explanatory?
>
> Good suggestion. I'm on the fence about that one. By itself, I agree
> tcpi_peer_rcv_wnd sounds much more clear. But tcpi_snd_wnd has the
> virtue of matching both the kernel code (tp->snd_wnd) and RFC 793
> (SND.WND). So they both have pros and cons. Maybe someone else feels
> more strongly one way or the other.
no strong preference -- snd_wnd is just fine too with proper comment
(for consistency).

also it might be good to comment whether it is scaled or not. it may
not be obvious if the actual value and scale are small.
>
> neal

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-09-13 22:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-13 19:36 [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order Thomas Higdon
2019-09-13 19:36 ` [PATCH v4 2/2] tcp: Add snd_wnd to TCP_INFO Thomas Higdon
2019-09-13 21:02   ` Neal Cardwell
2019-09-13 21:28     ` Yuchung Cheng
2019-09-13 21:53       ` Neal Cardwell
2019-09-13 22:41         ` Yuchung Cheng
2019-09-13 20:55 ` [PATCH v4 1/2] tcp: Add TCP_INFO counter for packets received out-of-order Neal Cardwell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).