All of lore.kernel.org
 help / color / mirror / Atom feed
From: Edward Cree <ecree@solarflare.com>
To: David Miller <davem@davemloft.net>, <tom@herbertland.com>
Cc: <aduyck@mirantis.com>, <netdev@vger.kernel.org>,
	<intel-wired-lan@lists.osuosl.org>, <hannes@redhat.com>,
	<jesse@kernel.org>, <eugenia@mellanox.com>, <jbenc@redhat.com>,
	<alexander.duyck@gmail.com>, <saeedm@mellanox.com>,
	<ariel.elior@qlogic.com>, <michael.chan@broadcom.com>,
	<Dept-GELinuxNICDev@qlogic.com>
Subject: Re: [net-next PATCH v3 00/17] Future-proof tunnel offload handlers
Date: Tue, 21 Jun 2016 11:41:10 +0100	[thread overview]
Message-ID: <e5c04987-b203-4eb4-8a41-e2fd208f66c9@solarflare.com> (raw)
In-Reply-To: <20160621.042211.945844554759834352.davem@davemloft.net>

On 21/06/16 09:22, David Miller wrote:
> From: Tom Herbert <tom@herbertland.com> Date: Mon, 20 Jun 2016 10:05:01 -0700
>> Generally, this means it needs to at least match by local addresses and port for an unconnected/unbound socket, the source address for an unconnected/bound socket, a the full 4-tuple for a connected socket. 
> These lookup keys are all insufficient. At the very least the network namespace must be in the lookup key as well if you want to match "sockets".
But the card doesn't have to be told that; instead, only push a socket to
a device offload if the device is in the same ns as the socket.  Wouldn't
that work?
Anything beyond that - i.e. supporting cross-ns offloads - would require
knowing how packets / addresses get transformed in bridging them from one
ns to another and in general that's quite a wide set of possibilities; it
doesn't seem worth while.  Especially since the likely use-case of tunnels
plus containers is that the host does the decapsulation and transparently
gives the container a virtual ethernet device, which keeps the hardware
and the "socket" in the same ns.
> But anyways, the vastness of the key is why we want to keep "sockets"
> out of network cards, because proper support of "sockets" requires
> access to information the card simply does not and should not have.

I think Tom's talk of "sockets" is a red herring; it's more a question of
"flows".  If we think of our host as a black box, its decisions ("is this
traffic encapsulated?") necessarily depend upon the 5-tuple plus the
(implicit) information that the traffic is being received on a particular
interface.
Netns are another red herring: even without them, what if our host is a
router with NAT, forwarding traffic to another host?  Now you're trying to
match a "socket" on another host (in, perhaps, another IP-address
namespace), but the "flow" is still the same: it's defined in terms of the
addresses on the incoming traffic, not what they might get NATted to by
the time the packets hit an actual socket.

So AFAICT, flow matching up to and including 5-tuple is both necessary and
sufficient for correct UDP tunnel detection in HW.  Sadly most HW
(including our latest here at sfc) thinks it only needs UDP dest port :(
and for such HW, Tom is right that we can't mix it with forwarding, and
have to reserve the port in all ns.

-Ed

WARNING: multiple messages have this Message-ID (diff)
From: Edward Cree <ecree@solarflare.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [net-next PATCH v3 00/17] Future-proof tunnel offload handlers
Date: Tue, 21 Jun 2016 11:41:10 +0100	[thread overview]
Message-ID: <e5c04987-b203-4eb4-8a41-e2fd208f66c9@solarflare.com> (raw)
In-Reply-To: <20160621.042211.945844554759834352.davem@davemloft.net>

On 21/06/16 09:22, David Miller wrote:
> From: Tom Herbert <tom@herbertland.com> Date: Mon, 20 Jun 2016 10:05:01 -0700
>> Generally, this means it needs to at least match by local addresses and port for an unconnected/unbound socket, the source address for an unconnected/bound socket, a the full 4-tuple for a connected socket. 
> These lookup keys are all insufficient. At the very least the network namespace must be in the lookup key as well if you want to match "sockets".
But the card doesn't have to be told that; instead, only push a socket to
a device offload if the device is in the same ns as the socket.  Wouldn't
that work?
Anything beyond that - i.e. supporting cross-ns offloads - would require
knowing how packets / addresses get transformed in bridging them from one
ns to another and in general that's quite a wide set of possibilities; it
doesn't seem worth while.  Especially since the likely use-case of tunnels
plus containers is that the host does the decapsulation and transparently
gives the container a virtual ethernet device, which keeps the hardware
and the "socket" in the same ns.
> But anyways, the vastness of the key is why we want to keep "sockets"
> out of network cards, because proper support of "sockets" requires
> access to information the card simply does not and should not have.

I think Tom's talk of "sockets" is a red herring; it's more a question of
"flows".  If we think of our host as a black box, its decisions ("is this
traffic encapsulated?") necessarily depend upon the 5-tuple plus the
(implicit) information that the traffic is being received on a particular
interface.
Netns are another red herring: even without them, what if our host is a
router with NAT, forwarding traffic to another host?  Now you're trying to
match a "socket" on another host (in, perhaps, another IP-address
namespace), but the "flow" is still the same: it's defined in terms of the
addresses on the incoming traffic, not what they might get NATted to by
the time the packets hit an actual socket.

So AFAICT, flow matching up to and including 5-tuple is both necessary and
sufficient for correct UDP tunnel detection in HW.  Sadly most HW
(including our latest here at sfc) thinks it only needs UDP dest port :(
and for such HW, Tom is right that we can't mix it with forwarding, and
have to reserve the port in all ns.

-Ed


  reply	other threads:[~2016-06-21 10:53 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16 19:20 [net-next PATCH v3 00/17] Future-proof tunnel offload handlers Alexander Duyck
2016-06-16 19:20 ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:20 ` [net-next PATCH v3 01/17] vxlan/geneve: Include udp_tunnel.h in vxlan/geneve.h and fixup includes Alexander Duyck
2016-06-16 19:20   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 23:06   ` Hannes Frederic Sowa
2016-06-16 23:06     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-16 19:20 ` [net-next PATCH v3 02/17] net: Combine GENEVE and VXLAN port notifiers into single functions Alexander Duyck
2016-06-16 19:20   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 22:45   ` Hannes Frederic Sowa
2016-06-16 22:45     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-16 19:21 ` [net-next PATCH v3 03/17] net: Merge VXLAN and GENEVE push notifiers into a single notifier Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 22:47   ` Hannes Frederic Sowa
2016-06-16 22:47     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-16 19:21 ` [net-next PATCH v3 04/17] bnx2x: Move all UDP port notifiers to single function Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:21 ` [net-next PATCH v3 05/17] bnxt: Update drivers to support unified UDP encapsulation offload functions Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:21 ` [net-next PATCH v3 06/17] bnxt: Move GENEVE support from hard-coded port to using port notifier Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 23:12   ` Michael Chan
2016-06-16 23:12     ` [Intel-wired-lan] " Michael Chan
2016-06-16 19:21 ` [net-next PATCH v3 07/17] benet: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:21 ` [net-next PATCH v3 08/17] fm10k: " Alexander Duyck
2016-06-16 19:21   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 09/17] i40e: Move all UDP port notifiers to single function Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 10/17] ixgbe: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 11/17] mlx4_en: " Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 12/17] mlx5_en: " Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 13/17] nfp: " Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:22 ` [net-next PATCH v3 14/17] qede: Move all UDP port notifiers to single function Alexander Duyck
2016-06-16 19:22   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:23 ` [net-next PATCH v3 15/17] qlcnic: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port Alexander Duyck
2016-06-16 19:23   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 19:23 ` [net-next PATCH v3 16/17] net: Remove deprecated tunnel specific UDP offload functions Alexander Duyck
2016-06-16 19:23   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 22:59   ` Hannes Frederic Sowa
2016-06-16 22:59     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-16 19:23 ` [net-next PATCH v3 17/17] vxlan: Add new UDP encapsulation offload type for VXLAN-GPE Alexander Duyck
2016-06-16 19:23   ` [Intel-wired-lan] " Alexander Duyck
2016-06-16 23:01   ` Hannes Frederic Sowa
2016-06-16 23:01     ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-18  3:26 ` [net-next PATCH v3 00/17] Future-proof tunnel offload handlers David Miller
2016-06-18  3:26   ` [Intel-wired-lan] " David Miller
2016-06-20 17:05   ` Tom Herbert
2016-06-20 17:05     ` [Intel-wired-lan] " Tom Herbert
2016-06-20 18:11     ` Hannes Frederic Sowa
2016-06-20 18:11       ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-20 19:27       ` Tom Herbert
2016-06-20 19:27         ` [Intel-wired-lan] " Tom Herbert
2016-06-20 21:36         ` Hannes Frederic Sowa
2016-06-20 21:36           ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-20 21:45           ` Tom Herbert
2016-06-20 21:45             ` [Intel-wired-lan] " Tom Herbert
2016-06-21  8:34       ` David Miller
2016-06-21  8:34         ` [Intel-wired-lan] " David Miller
2016-06-21  8:22     ` David Miller
2016-06-21  8:22       ` [Intel-wired-lan] " David Miller
2016-06-21 10:41       ` Edward Cree [this message]
2016-06-21 10:41         ` Edward Cree
2016-06-21 15:23       ` Hannes Frederic Sowa
2016-06-21 15:23         ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-21 17:05       ` Alexander Duyck
2016-06-21 17:05         ` [Intel-wired-lan] " Alexander Duyck
2016-06-21 17:27         ` Edward Cree
2016-06-21 17:27           ` [Intel-wired-lan] " Edward Cree
2016-06-21 17:40           ` Hannes Frederic Sowa
2016-06-21 17:40             ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-21 18:17             ` Alexander Duyck
2016-06-21 18:17               ` [Intel-wired-lan] " Alexander Duyck
2016-06-21 18:42               ` Tom Herbert
2016-06-21 18:42                 ` [Intel-wired-lan] " Tom Herbert
2016-06-21 21:34                 ` Hannes Frederic Sowa
2016-06-21 21:34                   ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-06-21 18:23             ` Edward Cree
2016-06-21 18:23               ` [Intel-wired-lan] " Edward Cree

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e5c04987-b203-4eb4-8a41-e2fd208f66c9@solarflare.com \
    --to=ecree@solarflare.com \
    --cc=Dept-GELinuxNICDev@qlogic.com \
    --cc=aduyck@mirantis.com \
    --cc=alexander.duyck@gmail.com \
    --cc=ariel.elior@qlogic.com \
    --cc=davem@davemloft.net \
    --cc=eugenia@mellanox.com \
    --cc=hannes@redhat.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jbenc@redhat.com \
    --cc=jesse@kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    --cc=tom@herbertland.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.