All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tom Herbert <tom@herbertland.com>
To: David Miller <davem@davemloft.net>
Cc: Alex Duyck <aduyck@mirantis.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
	Hannes Frederic Sowa <hannes@redhat.com>,
	Jesse Gross <jesse@kernel.org>,
	Eugenia Emantayev <eugenia@mellanox.com>,
	Jiri Benc <jbenc@redhat.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	Ariel Elior <ariel.elior@qlogic.com>,
	michael.chan@broadcom.com, Dept-GELinuxNICDev@qlogic.com
Subject: Re: [net-next PATCH v3 00/17] Future-proof tunnel offload handlers
Date: Mon, 20 Jun 2016 10:05:01 -0700	[thread overview]
Message-ID: <CALx6S34Y3Qq6nMHuMcC2LC5duLBO3KVN3+A+0qmPrNvFfY7gJg@mail.gmail.com> (raw)
In-Reply-To: <20160617.202641.1821023739498595024.davem@davemloft.net>

> And finally, he added a big comment explaining that new tunnel types
> should not be added to the tunnel type list, and those that exist
> should only be used for RX.
>
> Therefore, this isn't openning the door for new random offloads, quite
> the contrary.  Instead, if it making clearer what the existing
> facilitites support, and putting an explicit cap on them.
>
We have no reason to believe there won't be more offloaded UDP
protocols. There are plenty of other UDP encapsulations, transport
protocols, and application layer protocols that could be offloaded and
some of these may provide protocol specific features that have no
generic analogue. The problem is that this interface is wrong.

The interface and current NIC implementations are based on the
assumption that UDP port number is sufficient to decode a UDP payload.
This is incorrect, UDP port numbers do not have global meaning they
can only be correctly interrupted by the endpoints in the
communication (RFC7605).  Just because port number 4789 is assigned to
VXLAN does not mean that any packet a device sees that has destination
port 4789 is VXLAN, it may very well be something else completely
different. This opens up the very real possibility that devices in the
network, including NICs, misinterpret packets because they are looking
only at port number. Furthermore, if a device modifies packets based
on just port number, say like in GRO, this opens the door to silent
data corruption. This will be a real problem with a router that is
trying UDP offload packets just being forwarded, but even on just a
host this can be an issue when using ns with ipvlan of macvlan.

The design of a UDP offload interface should be predicated on the fact
that we are offloading the receive path of a UDP socket. It follows
that the requirement of the offload device is to match packets that
will be received by the UDP socket. Generally, this means it needs to
at least match by local addresses and port for an unconnected/unbound
socket, the source address for an unconnected/bound socket, a the full
4-tuple for a connected socket. VLAN, macvlan, or anything else that
could affect the selection of receive socket would also need to be
taken in to account. The typical driver interface that provides for
fine grained packet matching is n-tuple filtering.

To address the immediate problem with the current existing UDP HW
offloads I suggest:

1) Disallow UDP HW offload when forwarding is enabled. Applying
offloads to UDP encapsulated packets that are only being forwarded is
not correct.
2) Open a socket for the encapsulation port in each created ns
(similar to how RDS does this). This prevents a UDP port being used
for encapsulation from being bound for other purposes in ns.

btw, we also have this problem in the stack. GRO for UDP encapsulation
is performed before we have verified the packet is for us ("us" being
the encapsulation socket in the default name space).  The solution
here I believe is to move UDP GRO to be in udp_receive, although if
the above are implemented that would also address the issue in SW GRO
at least for the short term. flow_dissector does not attempt crack
open UDP encapsulation so we don't have the issue there.

Thanks,
Tom

WARNING: multiple messages have this Message-ID (diff)
From: Tom Herbert <tom@herbertland.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [net-next PATCH v3 00/17] Future-proof tunnel offload handlers
Date: Mon, 20 Jun 2016 10:05:01 -0700	[thread overview]
Message-ID: <CALx6S34Y3Qq6nMHuMcC2LC5duLBO3KVN3+A+0qmPrNvFfY7gJg@mail.gmail.com> (raw)
In-Reply-To: <20160617.202641.1821023739498595024.davem@davemloft.net>

> And finally, he added a big comment explaining that new tunnel types
> should not be added to the tunnel type list, and those that exist
> should only be used for RX.
>
> Therefore, this isn't openning the door for new random offloads, quite
> the contrary.  Instead, if it making clearer what the existing
> facilitites support, and putting an explicit cap on them.
>
We have no reason to believe there won't be more offloaded UDP
protocols. There are plenty of other UDP encapsulations, transport
protocols, and application layer protocols that could be offloaded and
some of these may provide protocol specific features that have no
generic analogue. The problem is that this interface is wrong.

The interface and current NIC implementations are based on the
assumption that UDP port number is sufficient to decode a UDP payload.
This is incorrect, UDP port numbers do not have global meaning they
can only be correctly interrupted by the endpoints in the
communication (RFC7605).  Just because port number 4789 is assigned to
VXLAN does not mean that any packet a device sees that has destination
port 4789 is VXLAN, it may very well be something else completely
different. This opens up the very real possibility that devices in the
network, including NICs, misinterpret packets because they are looking
only at port number. Furthermore, if a device modifies packets based
on just port number, say like in GRO, this opens the door to silent
data corruption. This will be a real problem with a router that is
trying UDP offload packets just being forwarded, but even on just a
host this can be an issue when using ns with ipvlan of macvlan.

The design of a UDP offload interface should be predicated on the fact
that we are offloading the receive path of a UDP socket. It follows
that the requirement of the offload device is to match packets that
will be received by the UDP socket. Generally, this means it needs to
at least match by local addresses and port for an unconnected/unbound
socket, the source address for an unconnected/bound socket, a the full
4-tuple for a connected socket. VLAN, macvlan, or anything else that
could affect the selection of receive socket would also need to be
taken in to account. The typical driver interface that provides for
fine grained packet matching is n-tuple filtering.

To address the immediate problem with the current existing UDP HW
offloads I suggest:

1) Disallow UDP HW offload when forwarding is enabled. Applying
offloads to UDP encapsulated packets that are only being forwarded is
not correct.
2) Open a socket for the encapsulation port in each created ns
(similar to how RDS does this). This prevents a UDP port being used
for encapsulation from being bound for other purposes in ns.

btw, we also have this problem in the stack. GRO for UDP encapsulation
is performed before we have verified the packet is for us ("us" being
the encapsulation socket in the default name space).  The solution
here I believe is to move UDP GRO to be in udp_receive, although if
the above are implemented that would also address the issue in SW GRO
at least for the short term. flow_dissector does not attempt crack
open UDP encapsulation so we don't have the issue there.

Thanks,
Tom

  reply	other threads:[~2016-06-20 17:05 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16 19:20 [net-next PATCH v3 00/17] Future-proof tunnel offload handlers Alexander Duyck
2016-06-16 19:20 ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:20 ` [net-next PATCH v3 01/17] vxlan/geneve: Include udp_tunnel.h in vxlan/geneve.h and fixup includes Alexander Duyck
2016-06-16 19:20   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 23:06   ` Hannes Frederic Sowa
2016-06-16 23:06     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-16 19:20 ` [net-next PATCH v3 02/17] net: Combine GENEVE and VXLAN port notifiers into single functions Alexander Duyck
2016-06-16 19:20   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 22:45   ` Hannes Frederic Sowa
2016-06-16 22:45     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-16 19:21 ` [net-next PATCH v3 03/17] net: Merge VXLAN and GENEVE push notifiers into a single notifier Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 22:47   ` Hannes Frederic Sowa
2016-06-16 22:47     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-16 19:21 ` [net-next PATCH v3 04/17] bnx2x: Move all UDP port notifiers to single function Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:21 ` [net-next PATCH v3 05/17] bnxt: Update drivers to support unified UDP encapsulation offload functions Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:21 ` [net-next PATCH v3 06/17] bnxt: Move GENEVE support from hard-coded port to using port notifier Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 23:12   ` Michael Chan
2016-06-16 23:12     ` [Intel-wired-lan] " Michael Chan
2016-06-16 19:21 ` [net-next PATCH v3 07/17] benet: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:21 ` [net-next PATCH v3 08/17] fm10k: " Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 09/17] i40e: Move all UDP port notifiers to single function Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 10/17] ixgbe: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 11/17] mlx4_en: " Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 12/17] mlx5_en: " Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 13/17] nfp: " Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 14/17] qede: Move all UDP port notifiers to single function Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:23 ` [net-next PATCH v3 15/17] qlcnic: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port Alexander Duyck
2016-06-16 19:23   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:23 ` [net-next PATCH v3 16/17] net: Remove deprecated tunnel specific UDP offload functions Alexander Duyck
2016-06-16 19:23   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 22:59   ` Hannes Frederic Sowa
2016-06-16 22:59     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-16 19:23 ` [net-next PATCH v3 17/17] vxlan: Add new UDP encapsulation offload type for VXLAN-GPE Alexander Duyck
2016-06-16 19:23   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 23:01   ` Hannes Frederic Sowa
2016-06-16 23:01     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-18  3:26 ` [net-next PATCH v3 00/17] Future-proof tunnel offload handlers David Miller
2016-06-18  3:26   ` [Intel-wired-lan] " David Miller
2016-06-20 17:05   ` Tom Herbert [this message]
2016-06-20 17:05     ` Tom Herbert
2016-06-20 18:11     ` Hannes Frederic Sowa
2016-06-20 18:11       ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-20 19:27       ` Tom Herbert
2016-06-20 19:27         ` [Intel-wired-lan] " Tom Herbert
2016-06-20 21:36         ` Hannes Frederic Sowa
2016-06-20 21:36           ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-20 21:45           ` Tom Herbert
2016-06-20 21:45             ` [Intel-wired-lan] " Tom Herbert
2016-06-21  8:34       ` David Miller
2016-06-21  8:34         ` [Intel-wired-lan] " David Miller
2016-06-21  8:22     ` David Miller
2016-06-21  8:22       ` [Intel-wired-lan] " David Miller
2016-06-21 10:41       ` Edward Cree
2016-06-21 10:41         ` [Intel-wired-lan] " Edward Cree
2016-06-21 15:23       ` Hannes Frederic Sowa
2016-06-21 15:23         ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-21 17:05       ` Alexander Duyck
2016-06-21 17:05         ` [Intel-wired-lan] " Alexander Duyck
2016-06-21 17:27         ` Edward Cree
2016-06-21 17:27           ` [Intel-wired-lan] " Edward Cree
2016-06-21 17:40           ` Hannes Frederic Sowa
2016-06-21 17:40             ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-21 18:17             ` Alexander Duyck
2016-06-21 18:17               ` [Intel-wired-lan] " Alexander Duyck
2016-06-21 18:42               ` Tom Herbert
2016-06-21 18:42                 ` [Intel-wired-lan] " Tom Herbert
2016-06-21 21:34                 ` Hannes Frederic Sowa
2016-06-21 21:34                   ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-21 18:23             ` Edward Cree
2016-06-21 18:23               ` [Intel-wired-lan] " Edward Cree

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALx6S34Y3Qq6nMHuMcC2LC5duLBO3KVN3+A+0qmPrNvFfY7gJg@mail.gmail.com \
    --to=tom@herbertland.com \
    --cc=Dept-GELinuxNICDev@qlogic.com \
    --cc=aduyck@mirantis.com \
    --cc=alexander.duyck@gmail.com \
    --cc=ariel.elior@qlogic.com \
    --cc=davem@davemloft.net \
    --cc=eugenia@mellanox.com \
    --cc=hannes@redhat.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jbenc@redhat.com \
    --cc=jesse@kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.