* Re: [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements
@ 2020-11-20 15:09 Alexander Lobakin
  2020-11-21 11:58 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Lobakin @ 2020-11-20 15:09 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexander Lobakin, netfilter-devel, davem, netdev, kuba, fw,
	razor, jeremy, tobias

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri, 20 Nov 2020 13:49:12 +0100

> Hi,

Hi Pablo,

> The following patchset augments the Netfilter flowtable fastpath to
> support network topologies that combine IP forwarding, bridge and
> VLAN devices.

I'm curious whether this new infra can later be expanded to shortcut
other VLAN-like virtual netdevs, e.g. DSA switch slaves.

I mean, usually we have port0...portX physical port representors
backed by a CPU port with an ethX representor. When it comes to NAT,
portX is set as the destination. Flow offload calls dev_queue_xmit()
on it, the switch stack pushes the CPU tag into the skb, changes
skb->dev to ethX and calls dev_queue_xmit() again.

If we could (using the new .ndo_fill_forward_path()) tell Netfilter
that our real dest is ethX and push the CPU tag via dev_hard_header(),
this would skip one more dev_queue_xmit() and a bunch of indirect
calls and checks.
This might require some sort of "custom" or "private" cookie in the
N-tuple, though, to separate flows from/to different switch ports (as
is done for VLAN: proto + VID).

If so, I'd like to try to implement and publish that idea for review
after this one lands in nf-next.
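
For illustration, such a callback might look roughly like the sketch
below. Everything here is hypothetical: the DEV_PATH_DSA path type,
the path->dsa fields and the ctx layout are assumed extensions of the
infrastructure from patch #3, not part of this series.

#include <linux/netdevice.h>
#include <net/dsa.h>

/* Hypothetical .ndo_fill_forward_path for a DSA slave port: report the
 * backing CPU port netdevice (ethX) as the next hop and record the
 * switch port index, so that the xmit path can build the CPU tag via
 * dev_hard_header() and call dev_queue_xmit() on ethX directly.
 */
static int dsa_slave_fill_forward_path(struct net_device_path_ctx *ctx,
				       struct net_device_path *path)
{
	struct dsa_port *dp = dsa_slave_to_port(ctx->dev);
	struct dsa_port *cpu_dp = dp->cpu_dp;

	path->dev = ctx->dev;
	path->type = DEV_PATH_DSA;			/* hypothetical */
	path->dsa.proto = cpu_dp->tag_ops->proto;	/* tagging protocol */
	path->dsa.port = dp->index;			/* egress switch port */
	ctx->dev = cpu_dp->master;			/* next hop: ethX */

	return 0;
}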

> This v5 includes updates for:
> 
> - Patch #2: fix incorrect xmit type in IPv6 path, per Florian Westphal.
> - Patch #3: fix possible off by one in dev_fill_forward_path() stack logic,
>             per Florian Westphal.
> - Patch #7: add a note to patch description to specify that FDB topology
>             updates are not supported at this stage, per Jakub Kicinski.
> 
> A typical scenario that can benefit from this infrastructure is composed
> of several VMs connected to bridge ports where the bridge master device
> 'br0' has an IP address. A DHCP server is also assumed to be running to
> provide connectivity to the VMs. The VMs reach the Internet through
> 'br0' as default gateway, which makes the packet enter the IP forwarding
> path. Then, netfilter is used to NAT the packets before they leave
> through the wan device.
> 
> Something like this:
> 
>                        fast path
>                 .------------------------.
>                /                          \
>                |           IP forwarding   |
>                |          /             \  .
>                |       br0               eth0
>                .       / \
>                -- veth1  veth2
>                    .
>                    .
>                    .
>                  eth0
>            ab:cd:ef:ab:cd:ef
>                   VM
> 
> The idea is to accelerate forwarding by building a fast path that takes
> packets from the ingress path of the bridge port and places them in the
> egress path of the wan device (and vice versa), hence skipping the
> classic bridge and IP stack paths.
> 
> This patchset is composed of:
> 
> Patch #1 adds a placeholder for the hash calculation, instead of using
>          the dir field.
> 
> Patch #2 adds the transmit path type field to the flow tuple. Two transmit
>          paths are supported so far: the neighbour and the xfrm transmit
>          paths. This patch prepares for the addition of a new direct ethernet
>          transmit path (see patch #7).
> 
> Patch #3 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
>          netdev_ops. This new function describes the list of netdevice hops
>          to reach a given destination MAC address in the local network
>          topology (see the type sketch after this patch list), e.g.
> 
>                            IP forwarding
>                           /             \
>                        br0              eth0
>                        / \
>                    veth1 veth2
>                     .
>                     .
>                     .
>                    eth0
>              ab:cd:ef:ab:cd:ef
> 
>           where veth1 and veth2 are bridge ports and eth0 provides Internet
>           connectivity. eth0 is the interface in the VM which is connected to
>           the veth1 bridge port. Then, for packets going to br0 whose
>           destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path()
>           provides the following path: br0 -> veth1.
> 
> Patch #4 adds .ndo_fill_forward_path for VLAN devices, which provides the next
>          device hop via vlan->real_dev and annotates the VLAN ID and protocol.
>          This is useful to know which VLAN headers to expect from the ingress
>          device, and which VLAN headers to push in the egress path.
> 
> Patch #5 adds .ndo_fill_forward_path for bridge devices, which performs FDB
>          lookups to locate the next device hop (the bridge port) in the
>          forwarding path.
> 
> Patch #6 updates the flowtable to use the dev_fill_forward_path()
>          infrastructure to obtain the ingress device in the fastpath.
> 
> Patch #7 updates the flowtable to use dev_fill_forward_path() to obtain the
>          egress device in the forwarding path. This also adds the direct
>          ethernet transmit path, which pushes the ethernet header onto the
>          packet and sends it through dev_queue_xmit(). This patch adds
>          support for the bridge, so bridge ports use this direct xmit path.
> 
> Patch #8 adds ingress VLAN support (up to 2 VLAN tags, QinQ). The VLAN
>          information is also provided by dev_fill_forward_path(). The VLAN ID
>          and protocol are stored in the flow tuple for hash lookups. The VLAN
>          support in the xmit path is achieved by annotating the first vlan
>          device found in the xmit path and by calling dev_hard_header()
>          (added in patch #7) before dev_queue_xmit().
> 
> Patch #9 extends the nft_flowtable.sh selftest, adding a test to cover the
>          bridge and VLAN support introduced in this patchset.
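
To pin down the moving parts named above, a compressed sketch of the
types these descriptions imply follows. The exact names and layouts
are assumptions reconstructed from this cover letter, not code lifted
from the patches.

#include <linux/netdevice.h>

/* Patch #2: transmit path type carried in the flow tuple (assumed names). */
enum flow_offload_xmit_type {
	FLOW_OFFLOAD_XMIT_NEIGH,	/* classic neighbour output */
	FLOW_OFFLOAD_XMIT_XFRM,		/* IPsec transformation */
	FLOW_OFFLOAD_XMIT_DIRECT,	/* patch #7: prebuilt ethernet header,
					 * straight to dev_queue_xmit() */
};

/* Patch #3: one resolved hop in the forwarding path (assumed layout). */
struct net_device_path {
	enum { DEV_PATH_ETHERNET, DEV_PATH_VLAN, DEV_PATH_BRIDGE } type;
	const struct net_device	*dev;
	struct {
		u16	id;	/* patch #4: VLAN ID for the tuple... */
		__be16	proto;	/* ...and the VLAN protocol */
	} encap;
};

#define NET_DEVICE_PATH_STACK_MAX	5	/* assumed depth limit */

struct net_device_path_stack {
	int			num_paths;
	struct net_device_path	path[NET_DEVICE_PATH_STACK_MAX];
};

/* Patch #3: walk each .ndo_fill_forward_path() hop by hop to resolve,
 * e.g., br0 -> veth1 for a destination MAC address (assumed signature).
 */
int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
			  struct net_device_path_stack *stack);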
> 
> = Performance numbers
> 
> My testbed environment consists of three containers:
> 
>   192.168.20.2     .20.1     .10.1   10.141.10.2
>          veth0       veth0 veth1      veth0
>         ns1 <---------> nsr1 <--------> ns2
>                             SNAT
>      iperf -c                          iperf -s
> 
> where nsr1 is used for forwarding. There is a bridge device br0 in nsr1,
> and veth0 is a port of br0. SNAT is performed on the veth1 device of nsr1.
> 
> - ns2 runs iperf -s
> - ns1 runs iperf -c 10.141.10.2 -n 100G
> 
> My results are:
> 
> - Baseline (no flowtable, classic forwarding path + netfilter): ~16 Gbit/s
> - Fastpath (with flowtable, this patchset): ~25 Gbit/s
> 
> This is an improvement of ~50% compared to baseline.

Anyway, great work, thanks!

> Please, apply. Thank you.
> 
> Pablo Neira Ayuso (9):
>   netfilter: flowtable: add hash offset field to tuple
>   netfilter: flowtable: add xmit path types
>   net: resolve forwarding path from virtual netdevice and HW destination address
>   net: 8021q: resolve forwarding path for vlan devices
>   bridge: resolve forwarding path for bridge devices
>   netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device
>   netfilter: flowtable: use dev_fill_forward_path() to obtain egress device
>   netfilter: flowtable: add vlan support
>   selftests: netfilter: flowtable bridge and VLAN support
> 
>  include/linux/netdevice.h                     |  35 +++
>  include/net/netfilter/nf_flow_table.h         |  43 +++-
>  net/8021q/vlan_dev.c                          |  15 ++
>  net/bridge/br_device.c                        |  27 +++
>  net/core/dev.c                                |  46 ++++
>  net/netfilter/nf_flow_table_core.c            |  51 +++--
>  net/netfilter/nf_flow_table_ip.c              | 200 ++++++++++++++----
>  net/netfilter/nft_flow_offload.c              | 159 +++++++++++++-
>  .../selftests/netfilter/nft_flowtable.sh      |  82 +++++++
>  9 files changed, 598 insertions(+), 60 deletions(-)
> 
> --
> 2.20.1

Al



* Re: [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements
  2020-11-20 15:09 [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements Alexander Lobakin
@ 2020-11-21 11:58 ` Pablo Neira Ayuso
  0 siblings, 0 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2020-11-21 11:58 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: netfilter-devel, davem, netdev, kuba, fw, razor, jeremy, tobias

Hi,

On Fri, Nov 20, 2020 at 03:09:37PM +0000, Alexander Lobakin wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Fri, 20 Nov 2020 13:49:12 +0100
[...]
> > The following patchset augments the Netfilter flowtable fastpath to
> > support network topologies that combine IP forwarding, bridge and
> > VLAN devices.
> 
> I'm curious whether this new infra can later be expanded to shortcut
> other VLAN-like virtual netdevs, e.g. DSA switch slaves.
> 
> I mean, usually we have port0...portX physical port representors
> backed by a CPU port with an ethX representor. When it comes to NAT,
> portX is set as the destination. Flow offload calls dev_queue_xmit()
> on it, the switch stack pushes the CPU tag into the skb, changes
> skb->dev to ethX and calls dev_queue_xmit() again.
> 
> If we could (using the new .ndo_fill_forward_path()) tell Netfilter
> that our real dest is ethX and push the CPU tag via dev_hard_header(),
> this would skip one more dev_queue_xmit() and a bunch of indirect
> calls and checks.

If the XMIT_DIRECT path can be used for this with minimal changes,
that would be good.

> This might require some sort of "custom" or "private" cookie in the
> N-tuple, though, to separate flows from/to different switch ports (as
> is done for VLAN: proto + VID).

Probably the VLAN proto + VID in the tuple can be reused for this too.
Maybe add some extra information to tell whether this is a VLAN or a
DSA frame. It should be just one extra check for skb->protocol
matching the DSA ethertype. That looks like a very minimal change to
support this.
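
Roughly the kind of check being discussed; this is a sketch only, where
the tuple->encap[] fields are the (assumed) VLAN id/proto storage from
patch #8 and ETH_P_XDSA is the generic "multiplexed DSA" ethertype.

#include <linux/if_ether.h>
#include <linux/skbuff.h>
#include <net/netfilter/nf_flow_table.h>

/* Sketch: accept one extra encapsulation protocol, so that DSA-tagged
 * frames can share the VLAN encap slot in the flow tuple (assumed
 * fields, not from this series).
 */
static bool nf_flow_encap_matches(const struct flow_offload_tuple *tuple,
				  const struct sk_buff *skb)
{
	switch (ntohs(skb->protocol)) {
	case ETH_P_8021Q:
	case ETH_P_8021AD:
	case ETH_P_XDSA:	/* the one extra check for DSA frames */
		return tuple->encap[0].proto == skb->protocol;
	default:
		return !tuple->encap[0].proto;	/* untagged traffic */
	}
}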

> If so, I'd like to try to implement and publish that idea for review
> after this one lands in nf-next.

Exploring new extensions is fine.

I received another email from someone else who would like to extend
this to support PPPoE devices on PC Engines APU routers. In general,
adding .ndo_fill_forward_path implementations for more device types is
possible.


* Re: [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements
  2020-11-22 14:51 ` Alexander Lobakin
@ 2020-11-22 20:15   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2020-11-22 20:15 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: netfilter-devel, davem, netdev, kuba, fw, razor, jeremy, tobias,
	linux-kernel

On Sun, Nov 22, 2020 at 02:51:18PM +0000, Alexander Lobakin wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Sun, 22 Nov 2020 12:42:19 +0100
> 
> > On Sun, Nov 22, 2020 at 10:26:16AM +0000, Alexander Lobakin wrote:
> >> From: Pablo Neira Ayuso <pablo@netfilter.org>
> >> Date: Fri, 20 Nov 2020 13:49:12 +0100
> > [...]
> >>> Something like this:
> >>>
> >>>                        fast path
> >>>                 .------------------------.
> >>>                /                          \
> >>>                |           IP forwarding   |
> >>>                |          /             \  .
> >>>                |       br0               eth0
> >>>                .       / \
> >>>                -- veth1  veth2
> >>>                    .
> >>>                    .
> >>>                    .
> >>>                  eth0
> >>>            ab:cd:ef:ab:cd:ef
> >>>                   VM
> >>
> >> I'm concerned about bypassing vlan and bridge's .ndo_start_xmit() in
> >> case of this shortcut. We'll have incomplete netdevice Tx stats for
> >> these two, as they get updated inside these callbacks.
> >
> > TX device stats are being updated accordingly.
> >
> > # ip netns exec nsr1 ip -s link
> > 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >     RX: bytes  packets  errors  dropped overrun mcast   
> >     0          0        0       0       0       0       
> >     TX: bytes  packets  errors  dropped carrier collsns 
> >     0          0        0       0       0       0       
> > 2: veth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> >     link/ether 82:0d:f3:b5:59:5d brd ff:ff:ff:ff:ff:ff link-netns ns1
> >     RX: bytes  packets  errors  dropped overrun mcast   
> >     213290848248 4869765  0       0       0       0       
> >     TX: bytes  packets  errors  dropped carrier collsns 
> >     315346667  4777953  0       0       0       0       
> > 3: veth1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> >     link/ether 4a:81:2d:9a:02:88 brd ff:ff:ff:ff:ff:ff link-netns ns2
> >     RX: bytes  packets  errors  dropped overrun mcast   
> >     315337919  4777833  0       0       0       0       
> >     TX: bytes  packets  errors  dropped carrier collsns 
> >     213290844826 4869708  0       0       0       0       
> > 4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> >     link/ether 82:0d:f3:b5:59:5d brd ff:ff:ff:ff:ff:ff
> >     RX: bytes  packets  errors  dropped overrun mcast   
> >     4101       73       0       0       0       0       
> >     TX: bytes  packets  errors  dropped carrier collsns 
> >     5256       74       0       0       0       0       
> 
> Aren't these counters very low for br0, given that br0 is an
> intermediate point of the traffic flow?

Most packets follow the flowtable fast path, which bypasses the br0
device. Bumping the br0 stats would be misleading: it would make the
user think that packets follow the classic bridge layer path when they
do not. The flowtable has its own counters to let the user collect
stats on the packets that take the fastpath.


* Re: [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements
  2020-11-22 10:26 Alexander Lobakin
  2020-11-22 11:42 ` Pablo Neira Ayuso
@ 2020-11-22 14:51 ` Alexander Lobakin
  2020-11-22 20:15   ` Pablo Neira Ayuso
  1 sibling, 1 reply; 7+ messages in thread
From: Alexander Lobakin @ 2020-11-22 14:51 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexander Lobakin, netfilter-devel, davem, netdev, kuba, fw,
	razor, jeremy, tobias, linux-kernel

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 22 Nov 2020 12:42:19 +0100

> On Sun, Nov 22, 2020 at 10:26:16AM +0000, Alexander Lobakin wrote:
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Fri, 20 Nov 2020 13:49:12 +0100
> [...]
>>> Something like this:
>>>
>>>                        fast path
>>>                 .------------------------.
>>>                /                          \
>>>                |           IP forwarding   |
>>>                |          /             \  .
>>>                |       br0               eth0
>>>                .       / \
>>>                -- veth1  veth2
>>>                    .
>>>                    .
>>>                    .
>>>                  eth0
>>>            ab:cd:ef:ab:cd:ef
>>>                   VM
>>
>> I'm concerned about bypassing vlan and bridge's .ndo_start_xmit() in
>> case of this shortcut. We'll have incomplete netdevice Tx stats for
>> these two, as they get updated inside these callbacks.
>
> TX device stats are being updated accordingly.
>
> # ip netns exec nsr1 ip -s link
> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     RX: bytes  packets  errors  dropped overrun mcast   
>     0          0        0       0       0       0       
>     TX: bytes  packets  errors  dropped carrier collsns 
>     0          0        0       0       0       0       
> 2: veth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 82:0d:f3:b5:59:5d brd ff:ff:ff:ff:ff:ff link-netns ns1
>     RX: bytes  packets  errors  dropped overrun mcast   
>     213290848248 4869765  0       0       0       0       
>     TX: bytes  packets  errors  dropped carrier collsns 
>     315346667  4777953  0       0       0       0       
> 3: veth1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 4a:81:2d:9a:02:88 brd ff:ff:ff:ff:ff:ff link-netns ns2
>     RX: bytes  packets  errors  dropped overrun mcast   
>     315337919  4777833  0       0       0       0       
>     TX: bytes  packets  errors  dropped carrier collsns 
>     213290844826 4869708  0       0       0       0       
> 4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 82:0d:f3:b5:59:5d brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast   
>     4101       73       0       0       0       0       
>     TX: bytes  packets  errors  dropped carrier collsns 
>     5256       74       0       0       0       0       

Aren't these counters very low for br0, given that br0 is an
intermediate point of the traffic flow?

> 5: veth0.10@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
>     link/ether 82:0d:f3:b5:59:5d brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast   
>     4101       73       0       0       0       62      
>     TX: bytes  packets  errors  dropped carrier collsns 
>     315342363  4777893  0       0       0       0       



* Re: [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements
  2020-11-22 10:26 Alexander Lobakin
@ 2020-11-22 11:42 ` Pablo Neira Ayuso
  2020-11-22 14:51 ` Alexander Lobakin
  1 sibling, 0 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2020-11-22 11:42 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: netfilter-devel, davem, netdev, kuba, fw, razor, jeremy, tobias,
	linux-kernel

On Sun, Nov 22, 2020 at 10:26:16AM +0000, Alexander Lobakin wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Fri, 20 Nov 2020 13:49:12 +0100
[...]
> > Something like this:
> > 
> >                        fast path
> >                 .------------------------.
> >                /                          \
> >                |           IP forwarding   |
> >                |          /             \  .
> >                |       br0               eth0
> >                .       / \
> >                -- veth1  veth2
> >                    .
> >                    .
> >                    .
> >                  eth0
> >            ab:cd:ef:ab:cd:ef
> >                   VM
> 
> I'm concerned about bypassing vlan and bridge's .ndo_start_xmit() in
> case of this shortcut. We'll have incomplete netdevice Tx stats for
> these two, as they get updated inside these callbacks.

TX device stats are being updated accordingly.

# ip netns exec nsr1 ip -s link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0       
2: veth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 82:0d:f3:b5:59:5d brd ff:ff:ff:ff:ff:ff link-netns ns1
    RX: bytes  packets  errors  dropped overrun mcast   
    213290848248 4869765  0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    315346667  4777953  0       0       0       0       
3: veth1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 4a:81:2d:9a:02:88 brd ff:ff:ff:ff:ff:ff link-netns ns2
    RX: bytes  packets  errors  dropped overrun mcast   
    315337919  4777833  0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    213290844826 4869708  0       0       0       0       
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 82:0d:f3:b5:59:5d brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    4101       73       0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    5256       74       0       0       0       0       
5: veth0.10@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether 82:0d:f3:b5:59:5d brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    4101       73       0       0       0       62      
    TX: bytes  packets  errors  dropped carrier collsns 
    315342363  4777893  0       0       0       0       



* Re: [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements
@ 2020-11-22 10:26 Alexander Lobakin
  2020-11-22 11:42 ` Pablo Neira Ayuso
  2020-11-22 14:51 ` Alexander Lobakin
  0 siblings, 2 replies; 7+ messages in thread
From: Alexander Lobakin @ 2020-11-22 10:26 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexander Lobakin, netfilter-devel, davem, netdev, kuba, fw,
	razor, jeremy, tobias, linux-kernel

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri, 20 Nov 2020 13:49:12 +0100

> Hi,
> 
> The following patchset augments the Netfilter flowtable fastpath to
> support network topologies that combine IP forwarding, bridge and
> VLAN devices.
> 
> This v5 includes updates for:
> 
> - Patch #2: fix incorrect xmit type in IPv6 path, per Florian Westphal.
> - Patch #3: fix possible off by one in dev_fill_forward_path() stack logic,
>             per Florian Westphal.
> - Patch #7: add a note to patch description to specify that FDB topology
>             updates are not supported at this stage, per Jakub Kicinski.
> 
> A typical scenario that can benefit from this infrastructure is composed
> of several VMs connected to bridge ports where the bridge master device
> 'br0' has an IP address. A DHCP server is also assumed to be running to
> provide connectivity to the VMs. The VMs reach the Internet through
> 'br0' as default gateway, which makes the packet enter the IP forwarding
> path. Then, netfilter is used to NAT the packets before they leave
> through the wan device.
> 
> Something like this:
> 
>                        fast path
>                 .------------------------.
>                /                          \
>                |           IP forwarding   |
>                |          /             \  .
>                |       br0               eth0
>                .       / \
>                -- veth1  veth2
>                    .
>                    .
>                    .
>                  eth0
>            ab:cd:ef:ab:cd:ef
>                   VM

I'm concerned about bypassing vlan and bridge's .ndo_start_xmit() in
case of this shortcut. We'll have incomplete netdevice Tx stats for
these two, as they get updated inside these callbacks.

> The idea is to accelerate forwarding by building a fast path that takes
> packets from the ingress path of the bridge port and places them in the
> egress path of the wan device (and vice versa), hence skipping the
> classic bridge and IP stack paths.
> 
> This patchset is composed of:
> 
> Patch #1 adds a placeholder for the hash calculation, instead of using
>          the dir field.
> 
> Patch #2 adds the transmit path type field to the flow tuple. Two transmit
>          paths are supported so far: the neighbour and the xfrm transmit
>          paths. This patch prepares for the addition of a new direct ethernet
>          transmit path (see patch #7).
> 
> Patch #3 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
>          netdev_ops. This new function describes the list of netdevice hops
>          to reach a given destination MAC address in the local network topology,
>          e.g.
> 
>                            IP forwarding
>                           /             \
>                        br0              eth0
>                        / \
>                    veth1 veth2
>                     .
>                     .
>                     .
>                    eth0
>              ab:cd:ef:ab:cd:ef
> 
>           where veth1 and veth2 are bridge ports and eth0 provides Internet
>           connectivity. eth0 is the interface in the VM which is connected to
>           the veth1 bridge port. Then, for packets going to br0 whose
>           destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path()
>           provides the following path: br0 -> veth1.
> 
> Patch #4 adds .ndo_fill_forward_path for VLAN devices, which provides the next
>          device hop via vlan->real_dev and annotates the VLAN ID and protocol.
>          This is useful to know which VLAN headers to expect from the ingress
>          device, and which VLAN headers to push in the egress path.
> 
> Patch #5 adds .ndo_fill_forward_path for bridge devices, which performs FDB
>          lookups to locate the next device hop (the bridge port) in the
>          forwarding path.
> 
> Patch #6 updates the flowtable to use the dev_fill_forward_path()
>          infrastructure to obtain the ingress device in the fastpath.
> 
> Patch #7 updates the flowtable to use dev_fill_forward_path() to obtain the
>          egress device in the forwarding path. This also adds the direct
>          ethernet transmit path, which pushes the ethernet header onto the
>          packet and sends it through dev_queue_xmit(). This patch adds
>          support for the bridge, so bridge ports use this direct xmit path.
> 
> Patch #8 adds ingress VLAN support (up to 2 VLAN tags, QinQ). The VLAN
>          information is also provided by dev_fill_forward_path(). The VLAN ID
>          and protocol are stored in the flow tuple for hash lookups. The VLAN
>          support in the xmit path is achieved by annotating the first vlan
>          device found in the xmit path and by calling dev_hard_header()
>          (added in patch #7) before dev_queue_xmit().
> 
> Patch #9 extends the nft_flowtable.sh selftest, adding a test to cover the
>          bridge and VLAN support introduced in this patchset.
> 
> = Performance numbers
> 
> My testbed environment consists of three containers:
> 
>   192.168.20.2     .20.1     .10.1   10.141.10.2
>          veth0       veth0 veth1      veth0
>         ns1 <---------> nsr1 <--------> ns2
>                             SNAT
>      iperf -c                          iperf -s
> 
> where nsr1 is used for forwarding. There is a bridge device br0 in nsr1,
> and veth0 is a port of br0. SNAT is performed on the veth1 device of nsr1.
> 
> - ns2 runs iperf -s
> - ns1 runs iperf -c 10.141.10.2 -n 100G
> 
> My results are:
> 
> - Baseline (no flowtable, classic forwarding path + netfilter): ~16 Gbit/s
> - Fastpath (with flowtable, this patchset): ~25 Gbit/s
> 
> This is an improvement of ~50% compared to baseline.
> 
> Please, apply. Thank you.
> 
> Pablo Neira Ayuso (9):
>   netfilter: flowtable: add hash offset field to tuple
>   netfilter: flowtable: add xmit path types
>   net: resolve forwarding path from virtual netdevice and HW destination address
>   net: 8021q: resolve forwarding path for vlan devices
>   bridge: resolve forwarding path for bridge devices
>   netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device
>   netfilter: flowtable: use dev_fill_forward_path() to obtain egress device
>   netfilter: flowtable: add vlan support
>   selftests: netfilter: flowtable bridge and VLAN support
> 
>  include/linux/netdevice.h                     |  35 +++
>  include/net/netfilter/nf_flow_table.h         |  43 +++-
>  net/8021q/vlan_dev.c                          |  15 ++
>  net/bridge/br_device.c                        |  27 +++
>  net/core/dev.c                                |  46 ++++
>  net/netfilter/nf_flow_table_core.c            |  51 +++--
>  net/netfilter/nf_flow_table_ip.c              | 200 ++++++++++++++----
>  net/netfilter/nft_flow_offload.c              | 159 +++++++++++++-
>  .../selftests/netfilter/nft_flowtable.sh      |  82 +++++++
>  9 files changed, 598 insertions(+), 60 deletions(-)
> 
> --
> 2.20.1

Al



* [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements
@ 2020-11-20 12:49 Pablo Neira Ayuso
  0 siblings, 0 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2020-11-20 12:49 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, fw, razor, jeremy, tobias

Hi,

The following patchset augments the Netfilter flowtable fastpath to
support network topologies that combine IP forwarding, bridge and
VLAN devices.

This v5 includes updates for:

- Patch #2: fix incorrect xmit type in IPv6 path, per Florian Westphal.
- Patch #3: fix possible off by one in dev_fill_forward_path() stack logic,
            per Florian Westphal.
- Patch #7: add a note to patch description to specify that FDB topology
            updates are not supported at this stage, per Jakub Kicinski.

A typical scenario that can benefit from this infrastructure is composed
of several VMs connected to bridge ports where the bridge master device
'br0' has an IP address. A DHCP server is also assumed to be running to
provide connectivity to the VMs. The VMs reach the Internet through
'br0' as default gateway, which makes the packet enter the IP forwarding
path. Then, netfilter is used to NAT the packets before they leave
through the wan device.

Something like this:

                       fast path
                .------------------------.
               /                          \
               |           IP forwarding   |
               |          /             \  .
               |       br0               eth0
               .       / \
               -- veth1  veth2
                   .
                   .
                   .
                 eth0
           ab:cd:ef:ab:cd:ef
                  VM

The idea is to accelerate forwarding by building a fast path that takes
packets from the ingress path of the bridge port and places them in the
egress path of the wan device (and vice versa), hence skipping the
classic bridge and IP stack paths.

This patchset is composed of:

Patch #1 adds a placeholder for the hash calculation, instead of using
         the dir field.

Patch #2 adds the transmit path type field to the flow tuple. Two transmit
         paths are supported so far: the neighbour and the xfrm transmit
         paths. This patch prepares for the addition of a new direct ethernet
         transmit path (see patch #7).

Patch #3 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
         netdev_ops. This new function describes the list of netdevice hops
         to reach a given destination MAC address in the local network topology,
         e.g.

                           IP forwarding
                          /             \
                       br0              eth0
                       / \
                   veth1 veth2
                    .
                    .
                    .
                   eth0
             ab:cd:ef:ab:cd:ef

          where veth1 and veth2 are bridge ports and eth0 provides Internet
          connectivity. eth0 is the interface in the VM which is connected to
          the veth1 bridge port. Then, for packets going to br0 whose
          destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path()
          provides the following path: br0 -> veth1.

Patch #4 adds .ndo_fill_forward_path for VLAN devices, which provides the next
         device hop via vlan->real_dev and annotates the VLAN ID and protocol.
         This is useful to know which VLAN headers to expect from the ingress
         device, and which VLAN headers to push in the egress path.

Patch #5 adds .ndo_fill_forward_path for bridge devices, which performs FDB
         lookups to locate the next device hop (the bridge port) in the
         forwarding path.

Patch #6 updates the flowtable to use the dev_fill_forward_path()
         infrastructure to obtain the ingress device in the fastpath.

Patch #7 updates the flowtable to use dev_fill_forward_path() to obtain the
         egress device in the forwarding path. This also adds the direct
         ethernet transmit path, which pushes the ethernet header onto the
         packet and sends it through dev_queue_xmit(). This patch adds
         support for the bridge, so bridge ports use this direct xmit path.

Patch #8 adds ingress VLAN support (up to 2 VLAN tags, QinQ). The VLAN
         information is also provided by dev_fill_forward_path(). The VLAN ID
         and protocol are stored in the flow tuple for hash lookups. The VLAN
         support in the xmit path is achieved by annotating the first vlan
         device found in the xmit path and by calling dev_hard_header()
         (added in patch #7) before dev_queue_xmit().

Patch #9 extends the nft_flowtable.sh selftest, adding a test to cover the
         bridge and VLAN support introduced in this patchset.

= Performance numbers

My testbed environment consists of three containers:

  192.168.20.2     .20.1     .10.1   10.141.10.2
         veth0       veth0 veth1      veth0
        ns1 <---------> nsr1 <--------> ns2
                            SNAT
     iperf -c                          iperf -s

where nsr1 is used for forwarding. There is a bridge device br0 in nsr1,
and veth0 is a port of br0. SNAT is performed on the veth1 device of nsr1.

- ns2 runs iperf -s
- ns1 runs iperf -c 10.141.10.2 -n 100G

My results are:

- Baseline (no flowtable, classic forwarding path + netfilter): ~16 Gbit/s
- Fastpath (with flowtable, this patchset): ~25 Gbit/s

This is an improvement of ~50% compared to baseline.

Please, apply. Thank you.

Pablo Neira Ayuso (9):
  netfilter: flowtable: add hash offset field to tuple
  netfilter: flowtable: add xmit path types
  net: resolve forwarding path from virtual netdevice and HW destination address
  net: 8021q: resolve forwarding path for vlan devices
  bridge: resolve forwarding path for bridge devices
  netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device
  netfilter: flowtable: use dev_fill_forward_path() to obtain egress device
  netfilter: flowtable: add vlan support
  selftests: netfilter: flowtable bridge and VLAN support

 include/linux/netdevice.h                     |  35 +++
 include/net/netfilter/nf_flow_table.h         |  43 +++-
 net/8021q/vlan_dev.c                          |  15 ++
 net/bridge/br_device.c                        |  27 +++
 net/core/dev.c                                |  46 ++++
 net/netfilter/nf_flow_table_core.c            |  51 +++--
 net/netfilter/nf_flow_table_ip.c              | 200 ++++++++++++++----
 net/netfilter/nft_flow_offload.c              | 159 +++++++++++++-
 .../selftests/netfilter/nft_flowtable.sh      |  82 +++++++
 9 files changed, 598 insertions(+), 60 deletions(-)

--
2.20.1


