netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlad Buslov <vladbu@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	Saeed Mahameed <saeed@kernel.org>,
	"David S. Miller" <davem@davemloft.net>, <netdev@vger.kernel.org>,
	Mark Bloch <mbloch@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Or Gerlitz <gerlitz.or@gmail.com>
Subject: Re: [net-next V2 01/17] net/mlx5: E-Switch, Refactor setting source port
Date: Tue, 9 Feb 2021 21:17:11 +0200	[thread overview]
Message-ID: <ygnhlfbxgifc.fsf@nvidia.com> (raw)
In-Reply-To: <20210209100504.119925c1@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>


On Tue 09 Feb 2021 at 20:05, Jakub Kicinski <kuba@kernel.org> wrote:
> On Tue, 9 Feb 2021 16:22:26 +0200 Vlad Buslov wrote:
>> No, tunnel IP is configured on VF. That particular VF is in host
>> namespace. When mlx5 resolves tunneling the code checks if tunnel
>> endpoint IP address is on such mlx5 VF, since the VF is in same
>> namespace as eswitch manager (e.g. on host) and route returned by
>> ip_route_output_key() is resolved through rt->dst.dev==tunVF device.
>> After establishing that tunnel is on VF the goal is to process two
>> resulting TC rules (in both directions) fully in hardware without
>> exposing the packet on tunneling device or tunnel VF in sw, which is
>> implemented with all the infrastructure from this series.
>> 
>> So, to summarize with IP addresses from TC examples presented in cover letter,
>> we have underlay network 7.7.7.0/24 in host namespace with tunnel endpoint IP
>> address on VF:
>> 
>> $ ip a show dev enp8s0f0v0
>> 1537: enp8s0f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
>>     link/ether 52:e5:6d:f2:00:69 brd ff:ff:ff:ff:ff:ff
>>     altname enp8s0f0np0v0
>>     inet 7.7.7.5/24 scope global enp8s0f0v0
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::50e5:6dff:fef2:69/64 scope link
>>        valid_lft forever preferred_lft forever
>
> Isn't this 100% the wrong way around. Disable the offloads. Does the
> traffic hit the VF encapsulated?
>
> IIUC SW will do this:
>
>         PHY port
>            |
> device     |             ,-----.
> -----------|------------|-------|----------
> kernel     |            |       |
>         (UL/PF)       (VFr)    (VF)
>            |            |       |
>         [TC ing]>redir -`       V
>
> And the packet never hits encap.

We can look at dumps on every stage (produced by running exactly the
same test with OVS option other_config:tc-policy=skip_hw):

1. Traffic arrives at UL with vxlan encapsulation

$ sudo tcpdump -ni enp8s0f0 -vvv -c 3
dropped privs to tcpdump
tcpdump: listening on enp8s0f0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:01:28.619346 IP (tos 0x0, ttl 64, id 65187, offset 0, flags [none], proto UDP (17), length 102)
    7.7.7.1.52277 > 7.7.7.5.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 43919, offset 0, flags [DF], proto TCP (6), length 52)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0x467b (correct), seq 2194968387, ack 2680742983, win 24576, options [nop,nop,TS val 1092282319 ecr 348802330], length 0
21:01:28.619505 IP (tos 0x0, ttl 64, id 888, offset 0, flags [none], proto UDP (17), length 1500)
    7.7.7.5.40092 > 7.7.7.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 6662, offset 0, flags [DF], proto TCP (6), length 1450)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [.], cksum 0x8025 (correct), seq 673837:675235, ack 0, win 502, options [nop,nop,TS val 348802333 ecr 1092282319], length 1398
21:01:28.619506 IP (tos 0x0, ttl 64, id 889, offset 0, flags [none], proto UDP (17), length 1500)
    7.7.7.5.40092 > 7.7.7.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 6663, offset 0, flags [DF], proto TCP (6), length 1450)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [.], cksum 0x19d1 (correct), seq 675235:676633, ack 0, win 502, options [nop,nop,TS val 348802333 ecr 1092282319], length 1398


2. By TC rule traffic is redirected to tunnel VF that has IP address
7.7.7.5 (still encapsulated as there is no decap action attached to
filter on enp8s0f0):

$ sudo tcpdump -ni enp8s0f0v0 -vvv -c 3
dropped privs to tcpdump
tcpdump: listening on enp8s0f0v0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:03:41.524244 IP (tos 0x0, ttl 64, id 48184, offset 0, flags [none], proto UDP (17), length 1500)
    7.7.7.5.40092 > 7.7.7.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 52619, offset 0, flags [DF], proto TCP (6), length 1450)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [.], cksum 0xaddb (correct), seq 279895999:279897397, ack 2194968387, win 502, options [nop,nop,TS val 348935238 ecr 1092415214], length 1398
21:03:41.568055 IP (tos 0x0, ttl 64, id 701, offset 0, flags [none], proto UDP (17), length 102)
    7.7.7.1.52277 > 7.7.7.5.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 44938, offset 0, flags [DF], proto TCP (6), length 52)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0xc623 (correct), seq 1, ack 1398, win 24576, options [nop,nop,TS val 1092415267 ecr 348935238], length 0
21:03:41.568384 IP (tos 0x0, ttl 64, id 48191, offset 0, flags [none], proto UDP (17), length 1500)
    7.7.7.5.40092 > 7.7.7.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 52620, offset 0, flags [DF], proto TCP (6), length 1450)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [.], cksum 0xe1b9 (correct), seq 1398:2796, ack 1, win 502, options [nop,nop,TS val 348935282 ecr 1092415267], length 1398


3. Traffic gets to tunnel device, where it gets decapsulated and
redirected to destination VF by TC rule on vxlan_sys_4789:

$ sudo tcpdump -ni vxlan_sys_4789 -vvv -c 3
dropped privs to tcpdump
tcpdump: listening on vxlan_sys_4789, link-type EN10MB (Ethernet), capture size 262144 bytes
21:07:39.836141 IP (tos 0x0, ttl 64, id 15565, offset 0, flags [DF], proto TCP (6), length 52)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0xbe91 (correct), seq 2194968387, ack 4279285947, win 24576, options [nop,nop,TS val 1092653536 ecr 349173547], length 0
21:07:39.836202 IP (tos 0x0, ttl 64, id 50774, offset 0, flags [DF], proto TCP (6), length 64360)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [P.], cksum 0x0f6b (incorrect -> 0x1d69), seq 746533:810841, ack 0, win 502, options [nop,nop,TS val 349173550 ecr 1092653536], length 64308
21:07:39.836449 IP (tos 0x0, ttl 64, id 15566, offset 0, flags [DF], proto TCP (6), length 52)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0x610f (correct), seq 0, ack 89473, win 24576, options [nop,nop,TS val 1092653536 ecr 349173548], length 0


4. Decapsulated payload appears on namespaced VF with IP address
5.5.5.5:

$ sudo ip  netns exec ns0 tcpdump -ni enp8s0f0v1 -vvv -c 3
yp_bind_client_create_v3: RPC: Unable to send
dropped privs to tcpdump
tcpdump: listening on enp8s0f0v1, link-type EN10MB (Ethernet), capture size 262144 bytes
21:09:06.758107 IP (tos 0x0, ttl 64, id 27527, offset 0, flags [DF], proto TCP (6), length 32206)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [P.], cksum 0x91d0 (incorrect -> 0x2a2a), seq 1198920825:1198952979, ack 2194968387, win 502, options [nop,nop,TS val 349260472 ecr 1092740448], length 32154
21:09:06.758697 IP (tos 0x0, ttl 64, id 3008, offset 0, flags [DF], proto TCP (6), length 64)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0x6a1a (correct), seq 1, ack 4294942132, win 24576, options [nop,nop,TS val 1092740458 ecr 349260463,nop,nop,sack 1 {0:32154}], length 0
21:09:06.758748 IP (tos 0x0, ttl 64, id 27550, offset 0, flags [DF], proto TCP (6), length 25216)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [P.], cksum 0x7682 (incorrect -> 0x7627), seq 4294942132:0, ack 1, win 502, options [nop,nop,TS val 349260473 ecr 1092740458], length 25164


As you can see from the dump Tx is symmetrical. And that is exactly the
behavior we are reproducing with offloads. So I guess correct diagram
would be:

        PHY port
           |
device     |             ,(vxlan)
-----------|------------|-------|----------
kernel     |            |       |
        (UL/PF)       (VFr)    (VF)
           |            |       |
        [TC ing]>redir -`       V

Regards,
Vlad

  reply	other threads:[~2021-02-09 19:40 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-06  5:02 [pull request][net-next V2 00/17] mlx5 updates 2021-02-04 Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 01/17] net/mlx5: E-Switch, Refactor setting source port Saeed Mahameed
2021-02-06 18:13   ` Marcelo Ricardo Leitner
2021-02-08  8:21     ` Vlad Buslov
2021-02-08 13:25       ` Marcelo Ricardo Leitner
2021-02-08 13:31         ` Vlad Buslov
2021-02-08 13:42           ` Marcelo Ricardo Leitner
2021-02-08 20:22       ` Jakub Kicinski
2021-02-09 14:22         ` Vlad Buslov
2021-02-09 16:10           ` Or Gerlitz
2021-02-10 13:56             ` Marcelo Ricardo Leitner
2021-02-10 16:44               ` Vlad Buslov
2021-02-09 18:05           ` Jakub Kicinski
2021-02-09 19:17             ` Vlad Buslov [this message]
2021-02-09 19:50               ` Jakub Kicinski
2021-02-10 11:25                 ` Vlad Buslov
2021-02-10 19:43                   ` Jakub Kicinski
2021-02-09  0:20   ` patchwork-bot+netdevbpf
2021-02-06  5:02 ` [net-next V2 02/17] net/mlx5e: E-Switch, Maintain vhca_id to vport_num mapping Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 03/17] net/mlx5e: Always set attr mdev pointer Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 04/17] net/mlx5: E-Switch, Refactor rule offload forward action processing Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 05/17] net/mlx5e: VF tunnel TX traffic offloading Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 06/17] net/mlx5e: Refactor tun routing helpers Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 07/17] net/mlx5: E-Switch, Indirect table infrastructure Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 08/17] net/mlx5e: Remove redundant match on tunnel destination mac Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 09/17] net/mlx5e: VF tunnel RX traffic offloading Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 10/17] net/mlx5e: Refactor reg_c1 usage Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 11/17] net/mlx5e: Match recirculated packet miss in slow table using reg_c1 Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 12/17] net/mlx5e: Extract tc tunnel encap/decap code to dedicated file Saeed Mahameed
2021-02-09 20:35   ` Guenter Roeck
2021-02-06  5:02 ` [net-next V2 13/17] net/mlx5e: Create route entry infrastructure Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 14/17] net/mlx5e: Refactor neigh update infrastructure Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 15/17] net/mlx5e: TC preparation refactoring for routing update event Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 16/17] net/mlx5e: Rename some encap-specific API to generic names Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 17/17] net/mlx5e: Handle FIB events to update tunnel endpoint device Saeed Mahameed
2021-02-08 21:55 ` [pull request][net-next V2 00/17] mlx5 updates 2021-02-04 Or Gerlitz
2021-02-09  8:42 ` Or Gerlitz
2021-02-09  8:43   ` Or Gerlitz
2021-02-10 16:51   ` Vlad Buslov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ygnhlfbxgifc.fsf@nvidia.com \
    --to=vladbu@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=gerlitz.or@gmail.com \
    --cc=kuba@kernel.org \
    --cc=marcelo.leitner@gmail.com \
    --cc=mbloch@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeed@kernel.org \
    --cc=saeedm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).