All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Florian Westphal <fw@strlen.de>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Maxim Mikityanskiy <maximmi@nvidia.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>, Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Brendan Jackman <jackmanb@google.com>,
	Florent Revest <revest@chromium.org>,
	Joe Stringer <joe@cilium.io>, Lorenz Bauer <lmb@cloudflare.com>,
	Tariq Toukan <tariqt@nvidia.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	clang-built-linux@googlegroups.com
Subject: Re: [PATCH bpf-next 07/10] bpf: Add helpers to query conntrack info
Date: Thu, 21 Oct 2021 09:36:25 +0200	[thread overview]
Message-ID: <20211021073625.GE7604@breakpoint.cc> (raw)
In-Reply-To: <87r1cfe7sx.fsf@toke.dk>

Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> Florian Westphal <fw@strlen.de> writes:
> 
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> > Lookups should be fine.  Insertions are the problem.
> >> >
> >> > NAT hooks are expected to execute before the insertion into the
> >> > conntrack table.
> >> >
> >> > If you insert before, NAT hooks won't execute, i.e.
> >> > rules that use dnat/redirect/masquerade have no effect.
> >> 
> >> Well yes, if you insert the wrong state into the conntrack table, you're
> >> going to get wrong behaviour. That's sorta expected, there are lots of
> >> things XDP can do to disrupt the packet flow (like just dropping the
> >> packets :)).
> >
> > Sure, but I'm not sure I understand the use case.
> >
> > Insertion at XDP layer turns off netfilters NAT capability, so its
> > incompatible with the classic forwarding path.
> >
> > If thats fine, why do you need to insert into the conntrack table to
> > begin with?  The entire infrastructure its designed for is disabled...
> 
> One of the major selling points of XDP is that you can reuse the
> existing kernel infrastructure instead of having to roll your own. So
> sure, one could implement their own conntrack using BPF maps (as indeed,
> e.g., Cilium has done), but why do that when you can take advantage of
> the existing one in the kernel? Same reason we have the bpf_fib_lookup()
> helper...

Insertion to conntrack via ebpf seems to be bad to me precisely because it
bypasses the existing infra.

In the bypass scenario you're envisioning, who is responsible for
fastpath-or-not decision?

> > In the HW offload case, conntrack is bypassed completely. There is an
> > IPS_(HW)_OFFLOAD_BIT that prevents the flow from expiring.
> 
> That's comparable in execution semantics (stack is bypassed entirely),
> but not in control plane semantics (we lookup from XDP instead of
> pushing flows down to an offload).

I'm not following.  As soon as you do insertion via XDP existing
control plane (*tables ruleset, xfrm and so on) becomes irrelevant.

Say e.g. user has a iptables ruleset that disables conntrack for udp dport
53 to avoid conntrack overhead for local resolver cache.

No longer relevant, ebpf overrides or whatever generates the epbf prog
needs to emulate existing config.

> > I suspect we'd want a way to notify/call an ebpf program instead so we
> > can avoid the ctnetlink -> userspace -> update dance and do the XDP
> > 'flow bypass information update' from inside the kernel and ebpf/XDP
> > reimplementation of the nf flow table (it uses the netfilter ingress
> > hook on the configured devices; everyhing it does should be doable
> > from XDP).
> 
> But the point is exactly that we don't have to duplicate the state into
> BPF, we can make XDP look it up directly.

Normally for fast bypass I'd expect that the bypass infra would want to
access all info in one lookup, but conntrack only gives you the NAT
transformation, so you'll also need a sk lookup and possibly a FIB
lookup later to get the route.
Also maybe an xfrm lookup as well if your bypass infra needs to support
ipsec.

So I neither understand the need for conntrack lookup (*for fast bypass use
case*) nor the need for insert IFF the control plane we have is to be
respected.

  parent reply	other threads:[~2021-10-21  7:36 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-19 14:46 [PATCH bpf-next 00/10] New BPF helpers to accelerate synproxy Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 01/10] bpf: Use ipv6_only_sock in bpf_tcp_gen_syncookie Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 02/10] bpf: Support dual-stack sockets in bpf_tcp_check_syncookie Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 03/10] bpf: Use EOPNOTSUPP " Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 04/10] bpf: Make errors of bpf_tcp_check_syncookie distinguishable Maxim Mikityanskiy
2021-10-20  3:28   ` John Fastabend
2021-10-20 13:16     ` Maxim Mikityanskiy
2021-10-20 15:26       ` Lorenz Bauer
2021-10-19 14:46 ` [PATCH bpf-next 05/10] bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 06/10] bpf: Expose struct nf_conn to BPF Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 07/10] bpf: Add helpers to query conntrack info Maxim Mikityanskiy
2021-10-20  3:56   ` Kumar Kartikeya Dwivedi
2021-10-20  9:28     ` Florian Westphal
2021-10-20  9:48       ` Toke Høiland-Jørgensen
2021-10-20  9:58         ` Florian Westphal
2021-10-20 12:21           ` Toke Høiland-Jørgensen
2021-10-20 12:44             ` Florian Westphal
2021-10-20 20:54               ` Toke Høiland-Jørgensen
2021-10-20 22:55                 ` David Ahern
2021-10-21  7:36                 ` Florian Westphal [this message]
2021-10-20 13:18     ` Maxim Mikityanskiy
2021-10-20 19:17       ` Kumar Kartikeya Dwivedi
2021-10-20  9:46   ` Toke Høiland-Jørgensen
2021-10-19 14:46 ` [PATCH bpf-next 08/10] bpf: Add helpers to issue and check SYN cookies in XDP Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 09/10] bpf: Add a helper to issue timestamp " Maxim Mikityanskiy
2021-10-19 16:45   ` Eric Dumazet
2021-10-20 13:16     ` Maxim Mikityanskiy
2021-10-20 15:56   ` Lorenz Bauer
2021-10-20 16:16     ` Toke Høiland-Jørgensen
2021-10-22 16:56       ` Maxim Mikityanskiy
2021-10-27  8:34         ` Lorenz Bauer
2021-11-01 11:14       ` Maxim Mikityanskiy
2021-11-03  2:10         ` Yonghong Song
2021-11-03 14:02           ` Maxim Mikityanskiy
2021-11-09  7:11             ` Yonghong Song
2021-11-25 14:34               ` Maxim Mikityanskiy
2021-11-26  5:43                 ` Yonghong Song
2021-11-26 16:50                   ` Maxim Mikityanskiy
2021-11-26 17:07                     ` Yonghong Song
2021-11-29 17:51                       ` Maxim Mikityanskiy
2021-12-01  6:39                         ` Yonghong Song
2021-12-01 18:06                           ` Andrii Nakryiko
2021-10-19 14:46 ` [PATCH bpf-next 10/10] bpf: Add sample for raw syncookie helpers Maxim Mikityanskiy
2021-10-20 18:01   ` Joe Stringer
2021-10-21 17:19     ` Maxim Mikityanskiy
2021-10-21  1:06   ` Alexei Starovoitov
2021-10-21 17:31     ` Maxim Mikityanskiy
2021-10-21 18:50       ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211021073625.GE7604@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=clang-built-linux@googlegroups.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=jackmanb@google.com \
    --cc=joe@cilium.io \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=maximmi@nvidia.com \
    --cc=memxor@gmail.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=revest@chromium.org \
    --cc=songliubraving@fb.com \
    --cc=tariqt@nvidia.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.