All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlad Buslov <vladbu@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	Saeed Mahameed <saeed@kernel.org>,
	"David S. Miller" <davem@davemloft.net>, <netdev@vger.kernel.org>,
	Mark Bloch <mbloch@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Or Gerlitz <gerlitz.or@gmail.com>
Subject: Re: [net-next V2 01/17] net/mlx5: E-Switch, Refactor setting source port
Date: Tue, 9 Feb 2021 21:17:11 +0200	[thread overview]
Message-ID: <ygnhlfbxgifc.fsf@nvidia.com> (raw)
In-Reply-To: <20210209100504.119925c1@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>


On Tue 09 Feb 2021 at 20:05, Jakub Kicinski <kuba@kernel.org> wrote:
> On Tue, 9 Feb 2021 16:22:26 +0200 Vlad Buslov wrote:
>> No, tunnel IP is configured on VF. That particular VF is in host
>> namespace. When mlx5 resolves tunneling the code checks if tunnel
>> endpoint IP address is on such mlx5 VF, since the VF is in same
>> namespace as eswitch manager (e.g. on host) and route returned by
>> ip_route_output_key() is resolved through rt->dst.dev==tunVF device.
>> After establishing that tunnel is on VF the goal is to process two
>> resulting TC rules (in both directions) fully in hardware without
>> exposing the packet on tunneling device or tunnel VF in sw, which is
>> implemented with all the infrastructure from this series.
>> 
>> So, to summarize with IP addresses from TC examples presented in cover letter,
>> we have underlay network 7.7.7.0/24 in host namespace with tunnel endpoint IP
>> address on VF:
>> 
>> $ ip a show dev enp8s0f0v0
>> 1537: enp8s0f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
>>     link/ether 52:e5:6d:f2:00:69 brd ff:ff:ff:ff:ff:ff
>>     altname enp8s0f0np0v0
>>     inet 7.7.7.5/24 scope global enp8s0f0v0
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::50e5:6dff:fef2:69/64 scope link
>>        valid_lft forever preferred_lft forever
>
> Isn't this 100% the wrong way around. Disable the offloads. Does the
> traffic hit the VF encapsulated?
>
> IIUC SW will do this:
>
>         PHY port
>            |
> device     |             ,-----.
> -----------|------------|-------|----------
> kernel     |            |       |
>         (UL/PF)       (VFr)    (VF)
>            |            |       |
>         [TC ing]>redir -`       V
>
> And the packet never hits encap.

We can look at dumps on every stage (produced by running exactly the
same test with OVS option other_config:tc-policy=skip_hw):

1. Traffic arrives at UL with vxlan encapsulation

$ sudo tcpdump -ni enp8s0f0 -vvv -c 3
dropped privs to tcpdump
tcpdump: listening on enp8s0f0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:01:28.619346 IP (tos 0x0, ttl 64, id 65187, offset 0, flags [none], proto UDP (17), length 102)
    7.7.7.1.52277 > 7.7.7.5.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 43919, offset 0, flags [DF], proto TCP (6), length 52)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0x467b (correct), seq 2194968387, ack 2680742983, win 24576, options [nop,nop,TS val 1092282319 ecr 348802330], length 0
21:01:28.619505 IP (tos 0x0, ttl 64, id 888, offset 0, flags [none], proto UDP (17), length 1500)
    7.7.7.5.40092 > 7.7.7.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 6662, offset 0, flags [DF], proto TCP (6), length 1450)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [.], cksum 0x8025 (correct), seq 673837:675235, ack 0, win 502, options [nop,nop,TS val 348802333 ecr 1092282319], length 1398
21:01:28.619506 IP (tos 0x0, ttl 64, id 889, offset 0, flags [none], proto UDP (17), length 1500)
    7.7.7.5.40092 > 7.7.7.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 6663, offset 0, flags [DF], proto TCP (6), length 1450)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [.], cksum 0x19d1 (correct), seq 675235:676633, ack 0, win 502, options [nop,nop,TS val 348802333 ecr 1092282319], length 1398


2. By TC rule traffic is redirected to tunnel VF that has IP address
7.7.7.5 (still encapsulated as there is no decap action attached to
filter on enp8s0f0):

$ sudo tcpdump -ni enp8s0f0v0 -vvv -c 3
dropped privs to tcpdump
tcpdump: listening on enp8s0f0v0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:03:41.524244 IP (tos 0x0, ttl 64, id 48184, offset 0, flags [none], proto UDP (17), length 1500)
    7.7.7.5.40092 > 7.7.7.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 52619, offset 0, flags [DF], proto TCP (6), length 1450)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [.], cksum 0xaddb (correct), seq 279895999:279897397, ack 2194968387, win 502, options [nop,nop,TS val 348935238 ecr 1092415214], length 1398
21:03:41.568055 IP (tos 0x0, ttl 64, id 701, offset 0, flags [none], proto UDP (17), length 102)
    7.7.7.1.52277 > 7.7.7.5.vxlan: [udp sum ok] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 44938, offset 0, flags [DF], proto TCP (6), length 52)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0xc623 (correct), seq 1, ack 1398, win 24576, options [nop,nop,TS val 1092415267 ecr 348935238], length 0
21:03:41.568384 IP (tos 0x0, ttl 64, id 48191, offset 0, flags [none], proto UDP (17), length 1500)
    7.7.7.5.40092 > 7.7.7.1.vxlan: [no cksum] VXLAN, flags [I] (0x08), vni 98
IP (tos 0x0, ttl 64, id 52620, offset 0, flags [DF], proto TCP (6), length 1450)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [.], cksum 0xe1b9 (correct), seq 1398:2796, ack 1, win 502, options [nop,nop,TS val 348935282 ecr 1092415267], length 1398


3. Traffic gets to tunnel device, where it gets decapsulated and
redirected to destination VF by TC rule on vxlan_sys_4789:

$ sudo tcpdump -ni vxlan_sys_4789 -vvv -c 3
dropped privs to tcpdump
tcpdump: listening on vxlan_sys_4789, link-type EN10MB (Ethernet), capture size 262144 bytes
21:07:39.836141 IP (tos 0x0, ttl 64, id 15565, offset 0, flags [DF], proto TCP (6), length 52)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0xbe91 (correct), seq 2194968387, ack 4279285947, win 24576, options [nop,nop,TS val 1092653536 ecr 349173547], length 0
21:07:39.836202 IP (tos 0x0, ttl 64, id 50774, offset 0, flags [DF], proto TCP (6), length 64360)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [P.], cksum 0x0f6b (incorrect -> 0x1d69), seq 746533:810841, ack 0, win 502, options [nop,nop,TS val 349173550 ecr 1092653536], length 64308
21:07:39.836449 IP (tos 0x0, ttl 64, id 15566, offset 0, flags [DF], proto TCP (6), length 52)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0x610f (correct), seq 0, ack 89473, win 24576, options [nop,nop,TS val 1092653536 ecr 349173548], length 0


4. Decapsulated payload appears on namespaced VF with IP address
5.5.5.5:

$ sudo ip  netns exec ns0 tcpdump -ni enp8s0f0v1 -vvv -c 3
yp_bind_client_create_v3: RPC: Unable to send
dropped privs to tcpdump
tcpdump: listening on enp8s0f0v1, link-type EN10MB (Ethernet), capture size 262144 bytes
21:09:06.758107 IP (tos 0x0, ttl 64, id 27527, offset 0, flags [DF], proto TCP (6), length 32206)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [P.], cksum 0x91d0 (incorrect -> 0x2a2a), seq 1198920825:1198952979, ack 2194968387, win 502, options [nop,nop,TS val 349260472 ecr 1092740448], length 32154
21:09:06.758697 IP (tos 0x0, ttl 64, id 3008, offset 0, flags [DF], proto TCP (6), length 64)
    5.5.5.1.targus-getdata1 > 5.5.5.5.34538: Flags [.], cksum 0x6a1a (correct), seq 1, ack 4294942132, win 24576, options [nop,nop,TS val 1092740458 ecr 349260463,nop,nop,sack 1 {0:32154}], length 0
21:09:06.758748 IP (tos 0x0, ttl 64, id 27550, offset 0, flags [DF], proto TCP (6), length 25216)
    5.5.5.5.34538 > 5.5.5.1.targus-getdata1: Flags [P.], cksum 0x7682 (incorrect -> 0x7627), seq 4294942132:0, ack 1, win 502, options [nop,nop,TS val 349260473 ecr 1092740458], length 25164


As you can see from the dump Tx is symmetrical. And that is exactly the
behavior we are reproducing with offloads. So I guess correct diagram
would be:

        PHY port
           |
device     |             ,(vxlan)
-----------|------------|-------|----------
kernel     |            |       |
        (UL/PF)       (VFr)    (VF)
           |            |       |
        [TC ing]>redir -`       V

Regards,
Vlad

  reply	other threads:[~2021-02-09 19:40 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-06  5:02 [pull request][net-next V2 00/17] mlx5 updates 2021-02-04 Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 01/17] net/mlx5: E-Switch, Refactor setting source port Saeed Mahameed
2021-02-06 18:13   ` Marcelo Ricardo Leitner
2021-02-08  8:21     ` Vlad Buslov
2021-02-08 13:25       ` Marcelo Ricardo Leitner
2021-02-08 13:31         ` Vlad Buslov
2021-02-08 13:42           ` Marcelo Ricardo Leitner
2021-02-08 20:22       ` Jakub Kicinski
2021-02-09 14:22         ` Vlad Buslov
2021-02-09 16:10           ` Or Gerlitz
2021-02-10 13:56             ` Marcelo Ricardo Leitner
2021-02-10 16:44               ` Vlad Buslov
2021-02-09 18:05           ` Jakub Kicinski
2021-02-09 19:17             ` Vlad Buslov [this message]
2021-02-09 19:50               ` Jakub Kicinski
2021-02-10 11:25                 ` Vlad Buslov
2021-02-10 19:43                   ` Jakub Kicinski
2021-02-09  0:20   ` patchwork-bot+netdevbpf
2021-02-06  5:02 ` [net-next V2 02/17] net/mlx5e: E-Switch, Maintain vhca_id to vport_num mapping Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 03/17] net/mlx5e: Always set attr mdev pointer Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 04/17] net/mlx5: E-Switch, Refactor rule offload forward action processing Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 05/17] net/mlx5e: VF tunnel TX traffic offloading Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 06/17] net/mlx5e: Refactor tun routing helpers Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 07/17] net/mlx5: E-Switch, Indirect table infrastructure Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 08/17] net/mlx5e: Remove redundant match on tunnel destination mac Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 09/17] net/mlx5e: VF tunnel RX traffic offloading Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 10/17] net/mlx5e: Refactor reg_c1 usage Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 11/17] net/mlx5e: Match recirculated packet miss in slow table using reg_c1 Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 12/17] net/mlx5e: Extract tc tunnel encap/decap code to dedicated file Saeed Mahameed
2021-02-09 20:35   ` Guenter Roeck
2021-02-06  5:02 ` [net-next V2 13/17] net/mlx5e: Create route entry infrastructure Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 14/17] net/mlx5e: Refactor neigh update infrastructure Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 15/17] net/mlx5e: TC preparation refactoring for routing update event Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 16/17] net/mlx5e: Rename some encap-specific API to generic names Saeed Mahameed
2021-02-06  5:02 ` [net-next V2 17/17] net/mlx5e: Handle FIB events to update tunnel endpoint device Saeed Mahameed
2021-02-08 21:55 ` [pull request][net-next V2 00/17] mlx5 updates 2021-02-04 Or Gerlitz
2021-02-09  8:42 ` Or Gerlitz
2021-02-09  8:43   ` Or Gerlitz
2021-02-10 16:51   ` Vlad Buslov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ygnhlfbxgifc.fsf@nvidia.com \
    --to=vladbu@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=gerlitz.or@gmail.com \
    --cc=kuba@kernel.org \
    --cc=marcelo.leitner@gmail.com \
    --cc=mbloch@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeed@kernel.org \
    --cc=saeedm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.