From: Vladimir Nikishkin <vladimir@nikishkin.pw>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com,
	eng.alaamohamedsoliman.am@gmail.com, gnault@redhat.com,
	razor@blackwall.org, idosch@nvidia.com, liuhangbin@gmail.com,
	eyal.birger@gmail.com, jtoppins@redhat.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net-next v7 1/2] Add nolocalbypass option to vxlan.
Date: Tue, 02 May 2023 13:50:38 +0800
Message-ID: <87ednz9rxn.fsf@laptop.lockywolf.net>
In-Reply-To: <20230501101215.46682967@hermes.local>


Stephen Hemminger <stephen@networkplumber.org> writes:

> On Tue,  2 May 2023 00:25:29 +0800
> Vladimir Nikishkin <vladimir@nikishkin.pw> wrote:
>
>> If a packet needs to be encapsulated towards a local destination IP and
>> a VXLAN device that matches the destination port and VNI exists, then
>> the packet will be injected into the Rx path as if it was received by
>> the target VXLAN device without undergoing encapsulation. If such a
>> device does not exist, the packet will be dropped.
>> 
>> There are scenarios where we do not want to drop such packets and
>> instead want to let them be encapsulated and locally received by a user
>> space program that post-processes these VXLAN packets.
>> 
>> To that end, add a new VXLAN device attribute that controls whether such
>> packets are dropped or not. When set ("localbypass") these packets are
>> dropped and when unset ("nolocalbypass") the packets are encapsulated
>> and locally delivered to the listening user space application. Default
>> to "localbypass" to maintain existing behavior.
>> 
>> Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw>
>
> Is there some way to use BPF for this. Rather than a special case
> for some userspace program?

Well, in the first patch this was not a special case, but rather a change
to the default behaviour. (Which, I guess, was a little too
audacious.)

I am not sure about BPF, but the concrete use-case I have is solvable by
sending the packets to a bogus IP and doing an nftables double-NAT
(source and destination) to 127.0.0.1. That is how I am solving
this problem now and, I suspect, what most sysadmins who need this
feature would be doing without this patch.

In fact, among all the people I have talked to about this issue (on
#networking@libera.chat, and elsewhere), nobody considered dropping
packets to be an intuitive thing. The "intuitive logic" here is the
following:

1) I am sending packets to an IP and a port,
2) I have a process listening to packets on this IP and port,
3) Why on Earth are packets not arriving?
4) Even further, why does local behaviour differ from remote behaviour?

So the "special case" is already there by design. The new option is
turning off the special case.
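
To make the "process listening on this IP and port" concrete, here is a
minimal sketch of such a user space listener. It assumes the default
VXLAN UDP port 4789 and the 8-byte header layout from RFC 7348; the
names parse_vxlan and listen are made up for illustration and are not
part of the patch. It only shows what a post-processing program would
see if the encapsulated packets were delivered locally.

```python
import socket
import struct

VXLAN_PORT = 4789       # assumption: the IANA-assigned VXLAN UDP port
VXLAN_FLAG_VNI = 0x08   # "I" flag from RFC 7348: the VNI field is valid


def parse_vxlan(datagram: bytes):
    """Split a VXLAN UDP payload into (vni, inner_ethernet_frame)."""
    if len(datagram) < 8:
        raise ValueError("datagram shorter than the 8-byte VXLAN header")
    # Header: 1 byte flags, 3 reserved bytes, 24-bit VNI, 1 reserved byte.
    flags, vni_and_reserved = struct.unpack("!B3xI", datagram[:8])
    if not flags & VXLAN_FLAG_VNI:
        raise ValueError("VNI flag not set")
    return vni_and_reserved >> 8, datagram[8:]


def listen(bind_ip: str = "127.0.0.1", port: int = VXLAN_PORT) -> None:
    """Hypothetical listener: print the VNI and size of each inner frame."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_ip, port))
        while True:
            datagram, sender = sock.recvfrom(65535)
            try:
                vni, frame = parse_vxlan(datagram)
            except ValueError as err:
                print(f"ignoring datagram from {sender}: {err}")
                continue
            print(f"VNI {vni}: {len(frame)}-byte inner frame from {sender}")
```

Calling listen() blocks and prints one line per received datagram; the
bind address and port are placeholders and would have to match the
remote address and destination port configured on the sending VXLAN
device.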

I am aware of the fact that heavy-duty network processing people have a
different perspective on this issue, and that in high-load environments
every tiny bit of performance is of crucial importance, hence "local
bypass" is seen not as a dirty heuristic, but rather as an essential
feature which vastly increases performance, but for "kitchen sink"
sysadmins the current (not documented) behaviour is just baffling.

So I would argue that having an option that, even though it might not be
the most frequently used one, is clearly documented as enabling the most
straightforward behaviour, would be worth it.

And although having a userspace process listening to a vxlan "for
processing" might not be the most frequently used thing (although I do
need it), at least being able to see the packets being sent to local
ports, with, say, tcpdump, in exactly the same way as the packets being
sent to remote addresses, would help sysadmins debug their setups better
even when only the most basic tools are available.

I hope that this is convincing enough.

P.S. I apologise for not adding the vxlan: and testing/selftests/net:
prefixes to the patches. I will add them to the next attempt, in
addition to fixing the other issues that might be discovered.

-- 
Yours sincerely,
Vladimir Nikishkin (MiEr, lockywolf)
(Laptop)
--
Fastmail.


Thread overview: 9+ messages
2023-05-01 16:25 [PATCH net-next v7 1/2] Add nolocalbypass option to vxlan Vladimir Nikishkin
2023-05-01 16:25 ` [PATCH net-next v7 2/2] Add tests for vxlan nolocalbypass option Vladimir Nikishkin
2023-05-02 10:14   ` Paolo Abeni
2023-05-04 15:58   ` Ido Schimmel
2023-05-05  1:33     ` Vladimir Nikishkin
2023-05-05  8:52       ` Ido Schimmel
2023-05-01 17:12 ` [PATCH net-next v7 1/2] Add nolocalbypass option to vxlan Stephen Hemminger
2023-05-02  5:50   ` Vladimir Nikishkin [this message]
2023-05-04 13:05 ` Ido Schimmel
