All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Florian Westphal <fw@strlen.de>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Maxim Mikityanskiy <maximmi@nvidia.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>, Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Brendan Jackman <jackmanb@google.com>,
	Florent Revest <revest@chromium.org>,
	Joe Stringer <joe@cilium.io>, Lorenz Bauer <lmb@cloudflare.com>,
	Tariq Toukan <tariqt@nvidia.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	clang-built-linux@googlegroups.com
Subject: Re: [PATCH bpf-next 07/10] bpf: Add helpers to query conntrack info
Date: Wed, 20 Oct 2021 14:44:57 +0200	[thread overview]
Message-ID: <20211020124457.GA7604@breakpoint.cc> (raw)
In-Reply-To: <875ytrga3p.fsf@toke.dk>

Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > Lookups should be fine.  Insertions are the problem.
> >
> > NAT hooks are expected to execute before the insertion into the
> > conntrack table.
> >
> > If you insert before, NAT hooks won't execute, i.e.
> > rules that use dnat/redirect/masquerade have no effect.
> 
> Well yes, if you insert the wrong state into the conntrack table, you're
> going to get wrong behaviour. That's sorta expected, there are lots of
> things XDP can do to disrupt the packet flow (like just dropping the
> packets :)).

Sure, but I'm not sure I understand the use case.

Insertion at XDP layer turns off netfilters NAT capability, so its
incompatible with the classic forwarding path.

If thats fine, why do you need to insert into the conntrack table to
begin with?  The entire infrastructure its designed for is disabled...

> > I don't think there is anything that stands in the way of replicating
> > this via XDP.
> 
> What I want to be able to do is write an XDP program that does the following:
> 
> 1. Parse the packet header and determine if it's a packet type we know
>    how to handle. If not, just return XDP_PASS and let the stack deal
>    with corner cases.
> 
> 2. If we know how to handle the packet (say, it's TCP or UDP), do a
>    lookup into conntrack to figure out if there's state for it and we
>    need to do things like NAT.
> 
> 3. If we need to NAT, rewrite the packet based on the information we got
>    back from conntrack.

You could already do that by storing that info in bpf maps
The ctnetlink event generated on conntrack insertion contains the NAT
mapping information, so you could have a userspace daemon that
intercepts those to update the map.

> 4. Update the conntrack state to be consistent with the packet, and then
>    redirect it out the destination interface.
> 
> I.e., in the common case the packet doesn't go through the stack at all;
> but we need to make conntrack aware that we processed the packet so the
> entry doesn't expire (and any state related to the flow gets updated).

In the HW offload case, conntrack is bypassed completely. There is an
IPS_(HW)_OFFLOAD_BIT that prevents the flow from expiring.

> Ideally we should also be able to create new state for a flow we haven't
> seen before.

The way HW offload was intended to work is to allow users to express
what flows should be offloaded via 'flow add' expression in nftables, so
they can e.g. use byte counters or rate estimators etc. to make such
a decision.  So initial packet always passes via normal stack.

This is also needed to consider e.g. XFRM -- nft_flow_offload.c won't
offload if the packet has a secpath attached (i.e., will get encrypted
later).

I suspect we'd want a way to notify/call an ebpf program instead so we
can avoid the ctnetlink -> userspace -> update dance and do the XDP
'flow bypass information update' from inside the kernel and ebpf/XDP
reimplementation of the nf flow table (it uses the netfilter ingress
hook on the configured devices; everyhing it does should be doable
from XDP).

> This requires updating of state, but I see no reason why this shouldn't
> be possible?

Updating ct->status is problematic, there would have to be extra checks
that prevent non-atomic writes and toggling of special bits such as
CONFIRMED, TEMPLATE or DYING.  Adding a helper to toggle something
specific, e.g. the offload state bit, should be okay.

  reply	other threads:[~2021-10-20 12:46 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-19 14:46 [PATCH bpf-next 00/10] New BPF helpers to accelerate synproxy Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 01/10] bpf: Use ipv6_only_sock in bpf_tcp_gen_syncookie Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 02/10] bpf: Support dual-stack sockets in bpf_tcp_check_syncookie Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 03/10] bpf: Use EOPNOTSUPP " Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 04/10] bpf: Make errors of bpf_tcp_check_syncookie distinguishable Maxim Mikityanskiy
2021-10-20  3:28   ` John Fastabend
2021-10-20 13:16     ` Maxim Mikityanskiy
2021-10-20 15:26       ` Lorenz Bauer
2021-10-19 14:46 ` [PATCH bpf-next 05/10] bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 06/10] bpf: Expose struct nf_conn to BPF Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 07/10] bpf: Add helpers to query conntrack info Maxim Mikityanskiy
2021-10-20  3:56   ` Kumar Kartikeya Dwivedi
2021-10-20  9:28     ` Florian Westphal
2021-10-20  9:48       ` Toke Høiland-Jørgensen
2021-10-20  9:58         ` Florian Westphal
2021-10-20 12:21           ` Toke Høiland-Jørgensen
2021-10-20 12:44             ` Florian Westphal [this message]
2021-10-20 20:54               ` Toke Høiland-Jørgensen
2021-10-20 22:55                 ` David Ahern
2021-10-21  7:36                 ` Florian Westphal
2021-10-20 13:18     ` Maxim Mikityanskiy
2021-10-20 19:17       ` Kumar Kartikeya Dwivedi
2021-10-20  9:46   ` Toke Høiland-Jørgensen
2021-10-19 14:46 ` [PATCH bpf-next 08/10] bpf: Add helpers to issue and check SYN cookies in XDP Maxim Mikityanskiy
2021-10-19 14:46 ` [PATCH bpf-next 09/10] bpf: Add a helper to issue timestamp " Maxim Mikityanskiy
2021-10-19 16:45   ` Eric Dumazet
2021-10-20 13:16     ` Maxim Mikityanskiy
2021-10-20 15:56   ` Lorenz Bauer
2021-10-20 16:16     ` Toke Høiland-Jørgensen
2021-10-22 16:56       ` Maxim Mikityanskiy
2021-10-27  8:34         ` Lorenz Bauer
2021-11-01 11:14       ` Maxim Mikityanskiy
2021-11-03  2:10         ` Yonghong Song
2021-11-03 14:02           ` Maxim Mikityanskiy
2021-11-09  7:11             ` Yonghong Song
2021-11-25 14:34               ` Maxim Mikityanskiy
2021-11-26  5:43                 ` Yonghong Song
2021-11-26 16:50                   ` Maxim Mikityanskiy
2021-11-26 17:07                     ` Yonghong Song
2021-11-29 17:51                       ` Maxim Mikityanskiy
2021-12-01  6:39                         ` Yonghong Song
2021-12-01 18:06                           ` Andrii Nakryiko
2021-10-19 14:46 ` [PATCH bpf-next 10/10] bpf: Add sample for raw syncookie helpers Maxim Mikityanskiy
2021-10-20 18:01   ` Joe Stringer
2021-10-21 17:19     ` Maxim Mikityanskiy
2021-10-21  1:06   ` Alexei Starovoitov
2021-10-21 17:31     ` Maxim Mikityanskiy
2021-10-21 18:50       ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211020124457.GA7604@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=clang-built-linux@googlegroups.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=jackmanb@google.com \
    --cc=joe@cilium.io \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=maximmi@nvidia.com \
    --cc=memxor@gmail.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=revest@chromium.org \
    --cc=songliubraving@fb.com \
    --cc=tariqt@nvidia.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.