From: Hannes Frederic Sowa <hannes@stressinduktion.org>
To: Eric Dumazet <edumazet@google.com>,
	Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Cc: David Laight <David.Laight@aculab.com>,
	"David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	James Morris <jmorris@namei.org>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Patrick McHardy <kaber@trash.net>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Jiri Pirko <jiri@mellanox.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Nicolas Dichtel <nicolas.dichtel@6wind.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Salam Noureddine <noureddine@arista.com>,
	Jarod Wilson <jarod@redhat.com>,
	Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>,
	Julian Anastasov <ja@ssi.bg>, Ying Xue <ying.xue@windriver.com>,
	Craig Gallek <kraig@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Edward Jee <edjee@google.com>,
	Julia Lawall <julia.lawall@lip6.fr>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Haakon Bugge <haakon.bugge@oracle.com>,
	Knut Omang <knut.omang@oracle.com>,
	Wei Lin Guay <wei.lin.guay@oracle.com>,
	Santosh Shilimkar <santosh.shilimkar@oracle.com>,
	Yuval Shaia <yuval.shaia@oracle.com>
Subject: Re: [PATCH] net: add per device sg_max_frags for skb
Date: Wed, 13 Jan 2016 16:07:09 +0100
Message-ID: <5696681D.4060002@stressinduktion.org>
In-Reply-To: <CANn89iJtB1qJCbBWUTXFo2LRWobQe6aDFb_KEWUhBNiZCNpdWA@mail.gmail.com>

On 13.01.2016 15:19, Eric Dumazet wrote:
> 1) There are no arches with 1K page sizes. Most certainly, if we had
> MAX_SKB_FRAGS=65, some assumptions in the stack would fail.
>
> 2) The TCP stack has coalescing support. write(2) or sendmsg(2) should
> append data into the last skb in the write queue, and still use 32 KB
> frags.
>      You get pathological skbs when using sendpage() or when one thread
> writes data into _multiple_ TCP sockets, since the TCP stack uses
>      a per-thread 32 KB reserve (
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5640f7685831e088fe6c2e1f863a6805962f8e81
> )
>
> 3) As I said, implementing a limit in the TCP stack is not enough. Your
> patch therefore adds complexity for all users, but is not a
> general solution.
>
>     GRO, the tun device and many other things can still cook 'big skbs'.
>
>      You need to properly implement a fallback, possibly using
> ndo_features_check(), or directly from your ndo_start_xmit().
>
> 4) We currently have a very dumb way to fall back, forcing a linearize
> call, which is likely to fail if memory is fragmented and the skb is big.
>
>      You could instead provide a smart helper, trying to reduce the
> number of frags in an skb by choosing adjacent frags and
> re-allocating/merging them.
>
>      By choosing, I mean trying to pick the smallest ones to minimize copy
> cost, to get one skb with X fewer fragments. (X=1 in your case?)
>
>     I know for example that bnx2x could benefit from such a helper, as
> it has a 13-frag limit
>     (bnx2x_pkt_req_lin(), called from bnx2x ndo_start_xmit()).

As I proposed, we could limit the maximum number of frags per skb
globally (or per netns). I think this would be acceptable and could be
the best alternative to installing slow paths that could be hit quite
frequently.

Otherwise, fallbacks like the ones Eric proposed are needed. I do not
see any other choice.

Thanks,
Hannes
