From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757856AbcBCP6I (ORCPT ); Wed, 3 Feb 2016 10:58:08 -0500 Received: from mail-ig0-f171.google.com ([209.85.213.171]:38333 "EHLO mail-ig0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757802AbcBCP6D convert rfc822-to-8bit (ORCPT ); Wed, 3 Feb 2016 10:58:03 -0500 MIME-Version: 1.0 In-Reply-To: <1454488017-8822-1-git-send-email-hans.westgaard.ry@oracle.com> References: <568F87AC.60405@oracle.com> <1454488017-8822-1-git-send-email-hans.westgaard.ry@oracle.com> Date: Wed, 3 Feb 2016 07:58:01 -0800 Message-ID: Subject: Re: [PATCH v3] net:Add sysctl_max_skb_frags From: Alexander Duyck To: Hans Westgaard Ry Cc: "David S. Miller" , Alexey Kuznetsov , James Morris , Hideaki YOSHIFUJI , Patrick McHardy , Tom Herbert , Pablo Neira Ayuso , Eric Dumazet , Florian Westphal , Jiri Pirko , Alexander Duyck , Michal Hocko , =?UTF-8?Q?Linus_L=C3=BCssing?= , Hannes Frederic Sowa , Herbert Xu , Tejun Heo , Andrew Morton , Alexey Kodanev , =?UTF-8?B?SMOla29uIEJ1Z2dl?= , open list , "open list:NETWORKING [GENERAL]" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 3, 2016 at 12:26 AM, Hans Westgaard Ry wrote: > Devices may have limits on the number of fragments in an skb they support. > Current codebase uses a constant as maximum for number of fragments one > skb can hold and use. > When enabling scatter/gather and running traffic with many small messages > the codebase uses the maximum number of fragments and may thereby violate > the max for certain devices. > The patch introduces a global variable as max number of fragments. > > Signed-off-by: Hans Westgaard Ry > Reviewed-by: HÃ¥kon Bugge > > --- > include/linux/skbuff.h | 1 + > net/core/skbuff.c | 2 ++ > net/core/sysctl_net_core.c | 10 ++++++++++ > net/ipv4/tcp.c | 4 ++-- > 4 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 4355129..fe47ad3 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -219,6 +219,7 @@ struct sk_buff; > #else > #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1) > #endif > +extern int sysctl_max_skb_frags; > > typedef struct skb_frag_struct skb_frag_t; > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 152b9c7..c336b97 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -79,6 +79,8 @@ > > struct kmem_cache *skbuff_head_cache __read_mostly; > static struct kmem_cache *skbuff_fclone_cache __read_mostly; > +int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS; > +EXPORT_SYMBOL(sysctl_max_skb_frags); > > /** > * skb_panic - private function for out-of-line support > diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c > index 95b6139..a6beb7b 100644 > --- a/net/core/sysctl_net_core.c > +++ b/net/core/sysctl_net_core.c I really don't think these changes belong in the core. Below you only modify the TCP code path so this more likely belongs in the TCP path unless you are going to guarantee that all other code paths obey the sysctl. It probably belongs in net/ipv4/sysctl_net_ipv4.c > @@ -26,6 +26,7 @@ static int zero = 0; > static int one = 1; > static int min_sndbuf = SOCK_MIN_SNDBUF; > static int min_rcvbuf = SOCK_MIN_RCVBUF; > +static int max_skb_frags = MAX_SKB_FRAGS; > > static int net_msg_warn; /* Unused, but still a sysctl */ > > @@ -392,6 +393,15 @@ static struct ctl_table net_core_table[] = { > .mode = 0644, > .proc_handler = proc_dointvec > }, > + { > + .procname = "max_skb_frags", > + .data = &sysctl_max_skb_frags, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = proc_dointvec_minmax, > + .extra1 = &one, > + .extra2 = &max_skb_frags, > + }, > { } > }; I'm not really a fan of this name either. Maybe it should be something like tcp_max_gso_frags. > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index c82cca1..3dc7a2fd 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -938,7 +938,7 @@ new_segment: > > i = skb_shinfo(skb)->nr_frags; > can_coalesce = skb_can_coalesce(skb, i, page, offset); > - if (!can_coalesce && i >= MAX_SKB_FRAGS) { > + if (!can_coalesce && i >= sysctl_max_skb_frags) { > tcp_mark_push(tp, skb); > goto new_segment; > } > @@ -1211,7 +1211,7 @@ new_segment: > > if (!skb_can_coalesce(skb, i, pfrag->page, > pfrag->offset)) { > - if (i == MAX_SKB_FRAGS || !sg) { > + if (i == sysctl_max_skb_frags || !sg) { > tcp_mark_push(tp, skb); > goto new_segment; > } This bit looks good. I was wondering. Have you considered looking at something like what was done with gso_max_size? It seems like it is meant to address a problem similar to what you have described where the NICs only support a certain layout for the GSO frame. Though now that I look over the code it seems like it might be flawed in that I don't see bridges or tunnels really respecting the value so it seems like they could cause issues. - Alex