From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>,
	"Alexey Kuznetsov" <kuznet@ms2.inr.ac.ru>,
	"James Morris" <jmorris@namei.org>,
	"Hideaki YOSHIFUJI" <yoshfuji@linux-ipv6.org>,
	"Patrick McHardy" <kaber@trash.net>,
	"Alexei Starovoitov" <ast@plumgrid.com>,
	"Jiri Pirko" <jiri@mellanox.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Nicolas Dichtel" <nicolas.dichtel@6wind.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Salam Noureddine" <noureddine@arista.com>,
	"Jarod Wilson" <jarod@redhat.com>,
	"Toshiaki Makita" <makita.toshiaki@lab.ntt.co.jp>,
	"Julian Anastasov" <ja@ssi.bg>,
	"Ying Xue" <ying.xue@windriver.com>,
	"Craig Gallek" <kraig@google.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"hannes@stressinduktion.org" <hannes@stressinduktion.org>,
	"Edward Jee" <edjee@google.com>,
	"Julia Lawall" <julia.lawall@lip6.fr>,
	<netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	"Haakon Bugge" <haakon.bugge@oracle.com>,
	"Knut Omang" <knut.omang@oracle.com>,
	"Wei Lin Guay" <wei.lin.guay@oracle.com>,
	"Santosh Shilimkar" <santosh.shilimkar@oracle.com>,
	"Yuval Shaia" <yuval.shaia@oracle.com>
Subject: [PATCH] net: add per device sg_max_frags for skb
Date: Wed, 6 Jan 2016 14:16:22 +0100
Message-ID: <1452086182-26748-1-git-send-email-hans.westgaard.ry@oracle.com>

Devices may have limits on the number of fragments in an skb they
support. The current codebase uses a constant (MAX_SKB_FRAGS) as the
maximum number of fragments one skb can hold and use.

When scatter/gather is enabled and the traffic consists of many small
messages, the codebase uses the maximum number of fragments and thereby
exceeds the limit of certain devices.

An example of such a violation is running IPoIB on an HCA supporting
16 SGEs on an architecture with a 4K page size. MAX_SKB_FRAGS will be
17 (64K/4K+1), and because IPoIB adds yet another segment we end up
with send requests with 18 SGEs, resulting in a kernel panic.

The patch allows the device to limit the maximum number of fragments
used in one skb.

The functionality corresponds to gso_max_size/gso_max_segs for GSO.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 include/linux/netdevice.h | 8 ++++++++
 include/net/sock.h        | 2 ++
 net/core/dev.c            | 1 +
 net/core/sock.c           | 1 +
 net/ipv4/tcp.c            | 4 ++--
 5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3b5d134..c661865 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1513,6 +1513,8 @@ enum netdev_priv_flags {
  *				NIC for GSO
  *	@gso_min_segs:	Minimum number of segments that can be passed to the
  *			NIC for GSO
+ *	@sg_max_frags:	Maximum number of fragments that can be passed to the
+ *			NIC for SG
  *
  *	@dcbnl_ops:	Data Center Bridging netlink ops
  *	@num_tc:	Number of traffic classes in the net device
@@ -1799,6 +1801,7 @@ struct net_device {
 	struct phy_device *phydev;
 	struct lock_class_key *qdisc_tx_busylock;
 	bool proto_down;
+	u16 sg_max_frags;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)

@@ -3794,6 +3797,11 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
 {
 	dev->gso_max_size = size;
 }
+static inline void netif_set_sg_max_frags(struct net_device *dev,
+					  u16 max)
+{
+	dev->sg_max_frags = min_t(u16, MAX_SKB_FRAGS, max);
+}

 static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 					int pulled_hlen, u16 mac_offset,
diff --git a/include/net/sock.h b/include/net/sock.h
index 52d27ee..c884104 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -274,6 +274,7 @@ struct cg_proto;
   *	@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
   *	@sk_gso_max_size: Maximum GSO segment size to build
   *	@sk_gso_max_segs: Maximum number of GSO segments
+  *	@sk_sg_max_frags: Maximum number of SG fragments
   *	@sk_lingertime: %SO_LINGER l_linger setting
   *	@sk_backlog: always used with the per-socket spinlock held
   *	@sk_callback_lock: used with the callbacks in the end of this struct
@@ -456,6 +457,7 @@ struct sock {
 	int			(*sk_backlog_rcv)(struct sock *sk,
 						  struct sk_buff *skb);
 	void			(*sk_destruct)(struct sock *sk);
+	u16			sk_sg_max_frags;
 };

 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
diff --git a/net/core/dev.c b/net/core/dev.c
index ae00b89..abfbd3a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7106,6 +7106,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gso_max_size = GSO_MAX_SIZE;
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->gso_min_segs = 0;
+	dev->sg_max_frags = MAX_SKB_FRAGS;

 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
diff --git a/net/core/sock.c b/net/core/sock.c
index e31dfce..53d0cf0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1621,6 +1621,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 		}
 	}
 	sk->sk_gso_max_segs = max_segs;
+	sk->sk_sg_max_frags = dst->dev->sg_max_frags;
 }
 EXPORT_SYMBOL_GPL(sk_setup_caps);

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c82cca1..ca5f7a0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -938,7 +938,7 @@ new_segment:

 		i = skb_shinfo(skb)->nr_frags;
 		can_coalesce = skb_can_coalesce(skb, i, page, offset);
-		if (!can_coalesce && i >= MAX_SKB_FRAGS) {
+		if (!can_coalesce && i >= sk->sk_sg_max_frags) {
 			tcp_mark_push(tp, skb);
 			goto new_segment;
 		}
@@ -1211,7 +1211,7 @@ new_segment:

 				if (!skb_can_coalesce(skb, i, pfrag->page,
 						      pfrag->offset)) {
-					if (i == MAX_SKB_FRAGS || !sg) {
+					if (i >= sk->sk_sg_max_frags || !sg) {
 						tcp_mark_push(tp, skb);
 						goto new_segment;
 					}
-- 
2.4.3
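
The arithmetic in the commit message can be checked with a small user-space model. This is only a sketch, not kernel code: `model_dev`, `model_set_sg_max_frags()`, `sges_needed()` and `frag_limit_for()` are hypothetical names that do not appear in the patch, and the constants (4K pages, 16 SGEs, one extra segment added by IPoIB) are taken from the example in the message. The clamp mirrors what `netif_set_sg_max_frags()` does with `min_t()`.

```c
#include <assert.h>

/* Assumed values from the commit message: 4K pages, 64K maximum payload. */
#define MODEL_PAGE_SIZE 4096u
#define MAX_SKB_FRAGS   (65536u / MODEL_PAGE_SIZE + 1u)  /* 64K/4K + 1 = 17 */

/* Hypothetical stand-in for the one field the patch adds to struct net_device. */
struct model_dev {
	unsigned short sg_max_frags;
};

/* Mirrors the clamp in the patch: never allow more than MAX_SKB_FRAGS. */
static inline void model_set_sg_max_frags(struct model_dev *dev,
					  unsigned short max)
{
	dev->sg_max_frags =
		(unsigned short)(max < MAX_SKB_FRAGS ? max : MAX_SKB_FRAGS);
}

/* Hypothetical helper: SGEs in a send request = page frags + extra segments. */
static inline unsigned int sges_needed(unsigned int frags, unsigned int extra)
{
	return frags + extra;
}

/* Hypothetical helper: largest frag count that still fits the hardware limit. */
static inline unsigned short frag_limit_for(unsigned int hw_max_sge,
					    unsigned int extra)
{
	assert(hw_max_sge > extra);
	return (unsigned short)(hw_max_sge - extra);
}
```

Under these assumptions, an unrestricted skb needs 17 + 1 = 18 SGEs, which exceeds the 16 the HCA supports, while a device limit of 16 - 1 = 15 fragments keeps every send request within 16 SGEs.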