From: Alexander H Duyck <alexander.duyck@gmail.com>
To: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, netdev <netdev@vger.kernel.org>,
	Coco Li <lixiaoyan@google.com>,
	Alexander Duyck <alexanderduyck@fb.com>
Subject: Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
Date: Fri, 04 Mar 2022 11:00:49 -0800
Message-ID: <ea73ca6cb4569847d5f2b2a3a5e1f88d78ba1c1a.camel@gmail.com>
In-Reply-To: <CANn89i+qJmD9At7otrptkCpnqVUCNi6wXNYnKiwJ1jnse5qNgg@mail.gmail.com>

On Fri, 2022-03-04 at 09:09 -0800, Eric Dumazet wrote:
> On Fri, Mar 4, 2022 at 7:48 AM Alexander H Duyck
> <alexander.duyck@gmail.com> wrote:
> > 
> > On Thu, 2022-03-03 at 21:33 -0700, David Ahern wrote:
> > > On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > > > From: Coco Li <lixiaoyan@google.com>
> > > > 
> > > > Instead of simply forcing a 0 payload_len in IPv6 header,
> > > > implement RFC 2675 and insert a custom extension header.
> > > > 
> > > > Note that only TCP stack is currently potentially generating
> > > > jumbograms, and that this extension header is purely local,
> > > > it won't be sent on a physical link.
> > > > 
> > > > This is needed so that packet capture (tcpdump and friends)
> > > > can properly dissect these large packets.
> > > > 
> > > 
> > > 
> > > I am fairly certain I know how you are going to respond, but I will ask
> > > this anyways :-) :
> > > 
> > > The networking stack as it stands today does not care that skb->len >
> > > 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> > > Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> > > the larger packet size it just works.
> > > 
> > > The jumbogram header is getting added at the L3/IPv6 layer and then
> > > removed by the drivers before pushing to hardware. So, the only benefit
> > > of the push and pop of the jumbogram header is for packet sockets and
> > > tc/ebpf programs - assuming those programs understand the header
> > > (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> > > it is a standard header so apps have a chance to understand the larger
> > > packet size, but what is the likelihood that random apps or even ebpf
> > > programs will understand it?
> > > 
> > > Alternative solutions to the packet socket (ebpf programs have access to
> > > skb->len) problem would allow IPv4 to join the Big TCP party. I am
> > > wondering how feasible an alternative solution is to get large packet
> > > sizes across the board with less overhead and changes.
> > 
> > I agree that the header insertion and removal seems like a lot of extra
> > overhead for the sake of correctness. In the Microsoft case I am pretty
> > sure their LSOv2 supported both v4 and v6. I think we could do
> > something similar, we would just need to make certain the device
> > supports it and as such maybe it would make sense to implement it as a
> > gso type flag?
> > 
> > Could we handle the length field like we handle the checksum and place
> > a value in there that we know is wrong, but could be used to provide
> > additional data? Perhaps we could even use it to store the MSS in the
> > form of the length of the first packet so if examined, the packet would
> > look like the first frame of the flow with a set of trailing data.
> > 
> 
> I am a bit sad you did not give all this feedback back in August when
> I presented BIG TCP.
> 

As I recall, I was thinking along the same lines as what you have done
here, but Dave's question about including IPv4 does bring up an
interesting point, and the Microsoft version supported both.

> We did a lot of work in the last 6 months to implement, test all this,
> making sure this worked.
> 
> I am not sure I want to spend another 6 months implementing what you suggest.

I am not saying we have to do this. I am simply stating a "what if"
just to gauge this approach. You could think of it as thinking out
loud, but in written form.

> For instance, input path will not like packets larger than 64KB.
> 
> There is this thing trimming padding bytes, you probably do not want
> to mess with this.

I had overlooked the fact that this is also being used on the input
path; the trimming would be an issue there. I suppose the fact that
LSOv2 didn't have an Rx counterpart would be one reason for us not to
consider the IPv4 approach.
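
For reference, the extension header under discussion is small. The
sketch below is an illustration only, assuming the RFC 2675 layout: an
8-byte Hop-by-Hop Options header holding a single Jumbo Payload option,
with the payload_len field of the IPv6 header left at 0. The names
build_jumbo_hbh() and parse_jumbo_hbh() and the macros are hypothetical
and are not taken from the patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined by RFC 2675; the macro names are illustrative. */
#define OPT_JUMBO_PAYLOAD   0xC2  /* Jumbo Payload option type */
#define OPT_JUMBO_DATA_LEN  4     /* option data: one 32-bit length */
#define JUMBO_HBH_LEN       8     /* size of the whole extension header */

/*
 * Write an 8-byte Hop-by-Hop header carrying only a Jumbo Payload option.
 * next_hdr:  protocol following this header (6 for TCP).
 * jumbo_len: IPv6 payload length in bytes, counting this 8-byte header;
 *            RFC 2675 requires it to be greater than 65535.
 * Returns the number of bytes written, or 0 if no jumbogram is needed.
 */
size_t build_jumbo_hbh(uint8_t buf[JUMBO_HBH_LEN], uint8_t next_hdr,
                       uint32_t jumbo_len)
{
	assert(buf != NULL);
	if (jumbo_len <= 65535)
		return 0;

	buf[0] = next_hdr;
	buf[1] = 0;                    /* Hdr Ext Len: 8-octet units past the first 8 */
	buf[2] = OPT_JUMBO_PAYLOAD;
	buf[3] = OPT_JUMBO_DATA_LEN;
	buf[4] = (uint8_t)(jumbo_len >> 24);  /* length in network byte order */
	buf[5] = (uint8_t)(jumbo_len >> 16);
	buf[6] = (uint8_t)(jumbo_len >> 8);
	buf[7] = (uint8_t)jumbo_len;
	return JUMBO_HBH_LEN;
}

/*
 * Read the jumbo length back out of such a header.
 * Returns 0 if the bytes are not a lone, valid Jumbo Payload option.
 */
uint32_t parse_jumbo_hbh(const uint8_t buf[JUMBO_HBH_LEN])
{
	uint32_t len;

	assert(buf != NULL);
	if (buf[1] != 0 || buf[2] != OPT_JUMBO_PAYLOAD ||
	    buf[3] != OPT_JUMBO_DATA_LEN)
		return 0;

	len = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
	      ((uint32_t)buf[6] << 8) | (uint32_t)buf[7];
	return len > 65535 ? len : 0;
}
```

With next_hdr = 6 (TCP) and a 100000-byte payload, the eight bytes come
out as 06 00 c2 04 00 01 86 a0. A packet socket program that does not
know option type 0xC2 would see payload_len == 0 and has to fall back
on the capture length, which is the concern David raised above.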
