netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin KaFai Lau <kafai@fb.com>
To: Sandesh Dhawaskar Sathyanarayana 
	<Sandesh.DhawaskarSathyanarayana@colorado.edu>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"Fingerhut, John Andy" <john.andy.fingerhut@intel.com>,
	bpf <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"Petr Lapukhov" <petr@fb.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Prashanth Kannan" <pkannan@fb.com>
Subject: Re: How to limit TCP packet lengths given to TC egress EBPF programs?
Date: Fri, 16 Jul 2021 13:48:17 -0700	[thread overview]
Message-ID: <20210716204817.bsh7xpoyrdmvupix@kafai-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CAADnVQL2zxCjq0AwTXVgrfs9Cem_7vyzTJj2novVJqObGFK52w@mail.gmail.com>

On Thu, Jul 15, 2021 at 05:14:45PM -0700, Alexei Starovoitov wrote:
> On Thu, Jul 15, 2021 at 4:26 PM Sandesh Dhawaskar Sathyanarayana
> <Sandesh.DhawaskarSathyanarayana@colorado.edu> wrote:
> >
> > Hi ,
> 
> Please do not top post and do not use html in your replies.
> 
> > I tested the new TCP experimental headers as INT headers in TCP options. But this does not work.
> > Programmable switch looks for the INT header after 20 bytes of TCP header. If it finds INT then it just appends its own INT data by parsing INT field in TCP,else it appends its own INT header with data after 20 bytes and if any TCP option is present it will append that after INT.
Does it mean the switch can only look for INT at the fixed 20 bytes offset?

> > Now if we use the TCP options field in the end host as INT fields, the switch looks at TCP header options as INT and appends just the data. Now that the switch has consumed TCP option as INT data, it will not find TCP options to append after it puts its INT data as result the packets will be dropped in the switch.
This part I still don't understand after parsing a few times.

Does it mean the switch has an issue when "INT" is put under a proper
TCP header option like (kind, length, INT)?

> >
> > Hence we need a new way to create INT header space in the TCP kernel stack itself.
> >
> >
> > Here is what I did:
> >
> > 1. Reserved the space in the TCP header option using BPF_SOCK_OPS_HDR_OPT_LEN_CB.
> > 2. Used the TC-eBPF at egress to write INT header in this field.
> 
> Hard to guess without looking at the actual code,
> but sounds like you did bpf_reserve_hdr_opt() as sockops program,
> but didn't do bpf_store_hdr_opt() in BPF_SOCK_OPS_WRITE_HDR_OPT_CB ?
> and instead tried to write it in TC layer?
> That won't work of course.
> Please see progs/test_tcp_hdr_options.c example.
> 
> cc-ing Martin for further questions.
> 
> > But these packets get dropped at switch as the TCP length doesn;t match.
Why the length does not match?  Which box changed the header option
without changing the offset in the tcp header?

> >
> > Also another challenge in appending the INT in the end host at TC-eBPF (currently no support for TCP) is it breaks the TCP SYN and ACK mechanism resulting in spurious retransmissions.  As kernel is not aware of appended data in TC-eBPF at egress.
hmm... so this TC-eBPF approach is a separate attempt unrelated to
the tcp@BPF_SOCK_OPS_HDR_OPT_LEN_CB above, right?

and what caused the tcp syn get dropped?  the tcp data offset (th->doff)?

I think we are not on the same page.  May be start with these questions:
1. Please share the bpf code handling BPF_SOCK_OPS_HDR_OPT_LEN_CB.
2. Exactly how you wanted the TCP header, INT, header option to look like
   for an outgoing packet from the sender end-host.  To be practical,
   lets assume there is always a timestamp option.
   Please spell out the value in th->doff (data offset) and
   also what header option "kind" you have used to store the "INT".
   It seems you have used the experimental kind (254)?
3. When the switch sees the "INT" header option, how does it
   append its data and how the tcp header will look like at the end.

      reply	other threads:[~2021-07-16 20:49 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-09 18:38 How to limit TCP packet lengths given to TC egress EBPF programs? Fingerhut, John Andy
2021-07-13 23:51 ` Alexei Starovoitov
     [not found]   ` <CABX6iNqj-ojymaPhPtgeOGxtUS6evyrvN69MrLD7s_+Z3xAK+w@mail.gmail.com>
2021-07-16  0:14     ` Alexei Starovoitov
2021-07-16 20:48       ` Martin KaFai Lau [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210716204817.bsh7xpoyrdmvupix@kafai-mbp.dhcp.thefacebook.com \
    --to=kafai@fb.com \
    --cc=Sandesh.DhawaskarSathyanarayana@colorado.edu \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.andy.fingerhut@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=petr@fb.com \
    --cc=pkannan@fb.com \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).