Netdev Archive on lore.kernel.org
 help / color / Atom feed
From: Tyler Hicks <tyhicks@canonical.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jonathan Looney <jtl@netflix.com>,
	Neal Cardwell <ncardwell@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	Bruce Curtis <brucec@netflix.com>,
	Jonathan Lemon <jonathan.lemon@gmail.com>
Subject: Re: [PATCH net 3/4] tcp: add tcp_min_snd_mss sysctl
Date: Mon, 17 Jun 2019 12:18:01 -0500
Message-ID: <20190617171800.GA5577@lindsey> (raw)
In-Reply-To: <20190617170354.37770-4-edumazet@google.com>

On 2019-06-17 10:03:53, Eric Dumazet wrote:
> Some TCP peers announce a very small MSS option in their SYN and/or
> SYN/ACK messages.
> 
> This forces the stack to send packets with a very high network/cpu
> overhead.
> 
> Linux has enforced a minimal value of 48. Since this value includes
> the size of TCP options, and that the options can consume up to 40
> bytes, this means that each segment can include only 8 bytes of payload.
> 
> In some cases, it can be useful to increase the minimal value
> to a saner value.
> 
> We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
> reasons.
> 
> Note that TCP_MAXSEG socket option enforces a minimal value
> of (TCP_MIN_MSS). David Miller increased this minimal value
> in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.")
> from 64 to 88.
> 
> We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
> 
> CVE-2019-11479 -- tcp mss hardcoded to 48
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Suggested-by: Jonathan Looney <jtl@netflix.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Tyler Hicks <tyhicks@canonical.com>

I've given the two sysctl patches a close review and some testing.

Acked-by: Tyler Hicks <tyhicks@canonical.com>

Tyler

> Cc: Bruce Curtis <brucec@netflix.com>
> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  Documentation/networking/ip-sysctl.txt |  8 ++++++++
>  include/net/netns/ipv4.h               |  1 +
>  net/ipv4/sysctl_net_ipv4.c             | 11 +++++++++++
>  net/ipv4/tcp_ipv4.c                    |  1 +
>  net/ipv4/tcp_output.c                  |  3 +--
>  5 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 288aa264ac26d98637a5bb1babc334bfc699bef1..22f6b8b1110ad20c36e7ceea6d67fd2cc938eb7b 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -255,6 +255,14 @@ tcp_base_mss - INTEGER
>  	Path MTU discovery (MTU probing).  If MTU probing is enabled,
>  	this is the initial MSS used by the connection.
>  
> +tcp_min_snd_mss - INTEGER
> +	TCP SYN and SYNACK messages usually advertise an ADVMSS option,
> +	as described in RFC 1122 and RFC 6691.
> +	If this ADVMSS option is smaller than tcp_min_snd_mss,
> +	it is silently capped to tcp_min_snd_mss.
> +
> +	Default : 48 (at least 8 bytes of payload per segment)
> +
>  tcp_congestion_control - STRING
>  	Set the congestion control algorithm to be used for new
>  	connections. The algorithm "reno" is always available, but
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 7698460a3dd1e5070e12d406b3ee58834688cdc9..623cfbb7b8dcbb2a6d8325ec010aff78bbdf8839 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -117,6 +117,7 @@ struct netns_ipv4 {
>  #endif
>  	int sysctl_tcp_mtu_probing;
>  	int sysctl_tcp_base_mss;
> +	int sysctl_tcp_min_snd_mss;
>  	int sysctl_tcp_probe_threshold;
>  	u32 sysctl_tcp_probe_interval;
>  
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index fa213bd8e233b577114815ca2227f08264e7df06..b6f14af926faf80f1686549bee7154c584dc63e6 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -39,6 +39,8 @@ static int ip_local_port_range_min[] = { 1, 1 };
>  static int ip_local_port_range_max[] = { 65535, 65535 };
>  static int tcp_adv_win_scale_min = -31;
>  static int tcp_adv_win_scale_max = 31;
> +static int tcp_min_snd_mss_min = TCP_MIN_SND_MSS;
> +static int tcp_min_snd_mss_max = 65535;
>  static int ip_privileged_port_min;
>  static int ip_privileged_port_max = 65535;
>  static int ip_ttl_min = 1;
> @@ -769,6 +771,15 @@ static struct ctl_table ipv4_net_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	},
> +	{
> +		.procname	= "tcp_min_snd_mss",
> +		.data		= &init_net.ipv4.sysctl_tcp_min_snd_mss,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &tcp_min_snd_mss_min,
> +		.extra2		= &tcp_min_snd_mss_max,
> +	},
>  	{
>  		.procname	= "tcp_probe_threshold",
>  		.data		= &init_net.ipv4.sysctl_tcp_probe_threshold,
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index bc86f9735f4577d50d94f42b10edb6ba95bb7a05..cfa81190a1b1af30d05f4f6cd84c05b025a6afeb 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -2628,6 +2628,7 @@ static int __net_init tcp_sk_init(struct net *net)
>  	net->ipv4.sysctl_tcp_ecn_fallback = 1;
>  
>  	net->ipv4.sysctl_tcp_base_mss = TCP_BASE_MSS;
> +	net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS;
>  	net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
>  	net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
>  
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 1bb1c46b4abad100622d3f101a0a3ca0a6c8e881..00c01a01b547ec67c971dc25a74c9258563cf871 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1459,8 +1459,7 @@ static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu)
>  	mss_now -= icsk->icsk_ext_hdr_len;
>  
>  	/* Then reserve room for full set of TCP options and 8 bytes of data */
> -	if (mss_now < TCP_MIN_SND_MSS)
> -		mss_now = TCP_MIN_SND_MSS;
> +	mss_now = max(mss_now, sock_net(sk)->ipv4.sysctl_tcp_min_snd_mss);
>  	return mss_now;
>  }
>  
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 

  parent reply index

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-17 17:03 [PATCH net 0/4] tcp: make sack processing more robust Eric Dumazet
2019-06-17 17:03 ` [PATCH net 1/4] tcp: limit payload size of sacked skbs Eric Dumazet
2019-06-17 17:14   ` Jonathan Lemon
2019-06-17 17:03 ` [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits Eric Dumazet
2019-06-17 17:14   ` Jonathan Lemon
2019-06-18  0:18   ` Christoph Paasch
2019-06-18  2:28     ` Eric Dumazet
2019-06-18  3:19       ` Christoph Paasch
2019-06-18  3:44         ` Eric Dumazet
2019-06-18  3:53           ` Christoph Paasch
2019-06-18  4:08             ` Eric Dumazet
2019-07-10 18:23         ` Prout, Andrew - LLSC - MITLL
2019-07-10 18:28           ` Eric Dumazet
2019-07-10 18:53             ` Prout, Andrew - LLSC - MITLL
2019-07-10 19:26               ` Eric Dumazet
2019-07-11  7:28                 ` Christoph Paasch
2019-07-11  9:19                   ` Eric Dumazet
2019-07-11 18:26                     ` Michal Kubecek
2019-07-11 18:50                       ` Eric Dumazet
2019-07-11 10:18                   ` Eric Dumazet
2019-07-11 17:14                 ` Prout, Andrew - LLSC - MITLL
2019-07-11 18:28                   ` Eric Dumazet
2019-07-11 19:04                     ` Jonathan Lemon
2019-07-12  7:05                       ` Eric Dumazet
2019-07-16 15:13                   ` Prout, Andrew - LLSC - MITLL
2019-06-17 17:03 ` [PATCH net 3/4] tcp: add tcp_min_snd_mss sysctl Eric Dumazet
2019-06-17 17:15   ` Jonathan Lemon
2019-06-17 17:18   ` Tyler Hicks [this message]
2019-06-17 17:03 ` [PATCH net 4/4] tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() Eric Dumazet
2019-06-17 17:16   ` Jonathan Lemon
2019-06-17 17:18   ` Tyler Hicks
2019-06-17 17:41 ` [PATCH net 0/4] tcp: make sack processing more robust David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190617171800.GA5577@lindsey \
    --to=tyhicks@canonical.com \
    --cc=brucec@netflix.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jonathan.lemon@gmail.com \
    --cc=jtl@netflix.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git