xdp-newbies.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* xdp-cpumap-tc and VLAN not working together in my setup
@ 2021-07-26  1:16 Ethy H. Brito
  2021-07-26  8:26 ` Ivan Koveshnikov
  0 siblings, 1 reply; 3+ messages in thread
From: Ethy H. Brito @ 2021-07-26  1:16 UTC (permalink / raw)
  To: xdp-newbies; +Cc: Jesper Dangaard Brouer


Hi everyone.

(Long and very verbose email follows. Sorry about that - be patient)

I can't make xdp-cpumap-tc work if vlan is used at WAN interface.

If the packet gets redirected , that is, if it hits 

	"return bpf_redirect_map(&cpu_map, cpu_dest, 0);"

in xdp_iphash_to_cpu_kern (function parse_ipv4) the packet never arrives
at client. It gets dropped somewhere. 

The test setup comprises three boxes:
	1) a client - vanila ubuntu 20
	2) a middle router box in-between (1) and (3) (that runs XDP and tc_classify)
	3) a server - vanila ubuntu 20

They are almost completely isolated from production environment.

A vlan (nic-br) is set between (2) and (3).

xdp-cpumap-tc was git download with 

	git clone --recurse-submodules https://github.com/xdp-project/xdp-cpumap-tc

and compiled yesterday. No errors.

The VLANs are created like that:
	at middle box:
		#ip link add link eth1 name nic-br type vlan id 1003
	at server box
		#ip link add link eth0 name nic-br type vlan id 1003
The routes are
	at middle box:
		# ip r sh 
		10.16.239.0/24 dev eth0 scope link 
		187.17.36.69 dev eth0 scope link 
		192.168.1.0/24 dev nic-br proto kernel scope link src 192.168.1.1 

	at server box:
		# ip r s
		default via 192.168.1.1 dev nic-br 
		10.16.239.0/24 via 192.168.1.1 dev nic-br 
		192.168.1.0/24 dev nic-br proto kernel scope link src 192.168.1.2 

Client box has two IP  addresses configured at its only interface (eth0)
	inet 10.16.239.213/32 scope global client
	inet 187.17.36.69/32 scope global client

Both IPs "pings" server ip address 192.168.1.2 thru "middle" when XDP is *OFF*.

# ping -I 187.17.36.69 192.168.1.2
PING 192.168.1.2 (192.168.1.2) from 187.17.36.69 : 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.391 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.360 ms

# ping -I 10.16.239.213 192.168.1.2
PING 192.168.1.2 (192.168.1.2) from 10.16.239.213 : 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.410 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.389 ms

But when the script bellow is executed at "middle", It is not possible 
to ping the server from IP 187.17.36.69 anymore. 
10.16.239.213 works Ok.

As you will see bellow I only "mapped" the 187... address.

Since I have no clue where to investigate, I watched all I could think of.
These are my observations up to now:

	1) if the packet is redirected to some CPU, it disappear inside
	the kernel never hitting the client.

	2) if I unconditionally returns XDP_PASS at the end of parse_ipv4
	(xdp_iphash_to_cpu_kern) both pings work.

	3) If I comment out the last line of the script bellow (the mapping line), 
	then flush XDP and TC, and run the script again, both pings work.
	(since I do not have a map hit and the cpu redirect never occurs)

	4) If I kill the vlans and route the packets thru the "naked" eth's
	both pings work. (no need to reload XDP or tc_classify - it just works right away)

	5) I put some bpf_debug messages at the VLAN detection code, at both
	xdp_iphash and tc_classify, and they are both never hit.

	6) locally, at middle box, I can always ping 187.17.36.69 and 10.16.239.213
	(even with XDP *ON*)

	7) If I execute: 
	/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.255' --classid '1:fffd' --cpu 0
	pings from 10.16.239.213 stop immediately (since now these IP packets gets redirected to a CPU). 
	Deleting the IP entry from map, restores the ping immediately. 

	8) Any packet ARRIVING thru middle's WAN (eth1) interface has its VLAN 
	header removed with XDP	loaded into kernel - observed with tcpdump as bellow:

Dump with XDP *OFF* (VLAN header OK - packet make thru client)
# stdbuf -o 0 -e 0 tcpdump -nei eth1 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes

19:32:45.676423 00:1b:21:8d:72:99 > 00:16:3e:f4:e9:88, ethertype *802.1Q (0x8100)*, 
length 102: vlan 1003, p 0, ethertype IPv4, 187.17.36.69 > 192.168.1.2: ICMP echo request, id 1163, seq 137, length 64

19:32:45.676563 00:16:3e:f4:e9:88 > 00:1b:21:8d:72:99, ethertype *802.1Q (0x8100)*, 
length 102: vlan 1003, p 0, ethertype IPv4, 192.168.1.2 > 187.17.36.69: ICMP echo reply, id 1163, seq 137, length 64


Dump with XDP *ON* (NO VLAN header - no packet get out thru middle's LAN (eth0))
# stdbuf -o 0 -e 0 tcpdump -nei eth1 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes

19:30:43.852543 00:1b:21:8d:72:99 > 00:16:3e:f4:e9:88, ethertype *802.1Q (0x8100)*, 
length 102: vlan 1003, p 0, ethertype IPv4, 187.17.36.69 > 192.168.1.2: ICMP echo request, id 1163, seq 18, length 64

19:30:43.852695 00:16:3e:f4:e9:88 > 00:1b:21:8d:72:99, ethertype *IPv4 (0x0800)*, 
length 98: 192.168.1.2 > 187.17.36.69: ICMP echo reply, id 1163, seq 18, length 64

This is bpftool output:
# bpftool net
xdp:
eth0(3) driver id 796
eth1(4) driver id 801

tc:
eth0(3) clsact/egress tc_classify_kern.o:[tc_classify] id 797
eth1(4) clsact/egress tc_classify_kern.o:[tc_classify] id 802

flow_dissector:


This is the script I use to start xdp-cpumap. It is a fragment form a much
larger script that runs at my production box, stripped to a bare minimum where
the problem still happens.

----------------------------------- 8< -------------------------------------------
#!/bin/bash

# Flushes all XDP maps
###################################################
/usr/local/bin/xdp_iphash_to_cpu_cmdline --clear &>/dev/null

/sbin/ip link set eth0 up
# Turn off eth0's XPS
for xps_cpus in $(ls /sys/class/net/eth0/queues/tx-*/xps_cpus | sort --field-separator='-' -k2n); do
        echo 0 > $xps_cpus
done

# remove any existing qdiscs
/sbin/tc qdisc del dev eth0 root 2> /dev/null
/sbin/tc qdisc del dev eth0 ingress 2> /dev/null

# Multiqueue root discipline, handle 7fff: handle 
/sbin/tc qdisc replace dev eth0 root handle 7FFF: mq

#Cria as disciplinas de fila (filhas de cada qdisc MQ acima) para atrelar (mais tarde) a cada CPU.
i=0
for dir in /sys/class/net/eth0/queues/tx-* ; do
    x=$((i++))
    # Qdisc HTB $i: under parent 7FFF:$i
    i_str=$(printf '%x' $i)
    
    # "root" class
    /sbin/tc qdisc add dev eth0 parent 7FFF:$i_str handle $i_str: hfsc default fffd

    # inner classes
    /sbin/tc class add dev eth0 parent $i_str: classid $i_str:1 hfsc sc m2 7gibit ul rate 7gibit
    # - set default class rate 
    /sbin/tc class add dev eth0 parent $i_str:1 classid $i_str:fffd hfsc sc m2 7gibit ul rate 7gibit
    # Change the qdisc on default class 
    /sbin/tc qdisc add dev eth0 parent $i_str:fffd fq_codel
done

# Load XDP module
/usr/local/bin/xdp_iphash_to_cpu --dev eth0 --all-cpus --wan --quiet &>/dev/null
# Put all CPUs to dance. CPU X will process queue X+1 that holds the X+1 class
/usr/local/bin/tc_classify --dev-egress eth0 --base-setup  --quiet &>/dev/null


/sbin/ip link set eth1 up
# Turn off eth1's XPS
for xps_cpus in $(ls /sys/class/net/eth1/queues/tx-*/xps_cpus | sort --field-separator='-' -k2n); do
        echo 0 > $xps_cpus
done

# remove any existing qdiscs
/sbin/tc qdisc del dev eth1 root 2> /dev/null
/sbin/tc qdisc del dev eth1 ingress 2> /dev/null

# Multiqueue root discipline, handle 7fff: handle 
/sbin/tc qdisc replace dev eth1 root handle 7FFF: mq

#Cria as disciplinas de fila (filhas de cada qdisc MQ acima) para atrelar (mais tarde) a cada CPU.
i=0
for dir in /sys/class/net/eth1/queues/tx-* ; do
    x=$((i++))
    # Qdisc HTB $i: under parent 7FFF:$i
    i_str=$(printf '%x' $i)
    
    # "root" class
    /sbin/tc qdisc add dev eth1 parent 7FFF:$i_str handle $i_str: hfsc default fffd

    # inner classes
    /sbin/tc class add dev eth1 parent $i_str: classid $i_str:1 hfsc sc m2 7gibit ul rate 7gibit
    # - set default class rate 
    /sbin/tc class add dev eth1 parent $i_str:1 classid $i_str:fffd hfsc sc m2 7gibit ul rate 7gibit
    # Change the qdisc on default class 
    /sbin/tc qdisc add dev eth1 parent $i_str:fffd fq_codel
done

# Load XDP module
/usr/local/bin/xdp_iphash_to_cpu --dev eth1 --all-cpus --wan --quiet &>/dev/null
# Put all CPUs to dance. CPU X will process queue X+1 that holds the X+1 class
/usr/local/bin/tc_classify --dev-egress eth1 --base-setup  --quiet &>/dev/null

#/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.255' --classid '1:fffd' --cpu 0 >&/dev/null
#/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.254' --classid '1:fffe' --cpu 0 >&/dev/null
#/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.253' --classid '1:fffc' --cpu 0 >&/dev/null

# Put the client's packets in its shaper
#from client: classid=4:105;VEL=52428800;
/sbin/tc class add dev eth1 parent 4:1 classid 4:105 hfsc sc m2 50mibit ul rate 50mibit
/sbin/tc qdisc add dev eth1 parent 4:105 fq_codel

#to client: classid=4:105;VEL=52428800;
/sbin/tc class add dev eth0 parent 4:1 classid 4:105 hfsc sc m2 50mibit ul rate 50mibit
/sbin/tc qdisc add dev eth0 parent 4:105 fq_codel

/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '187.17.36.69' --classid '4:105' --cpu 3 >&/dev/null

exit 0


----------------------------------- 8< -------------------------------------------

Some bad interaction is happening when I use XDP and VLANs together.

Can you guys help me with this??

Regards

Ethy

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: xdp-cpumap-tc and VLAN not working together in my setup
  2021-07-26  1:16 xdp-cpumap-tc and VLAN not working together in my setup Ethy H. Brito
@ 2021-07-26  8:26 ` Ivan Koveshnikov
  2021-07-26 21:17   ` Ethy H. Brito
  0 siblings, 1 reply; 3+ messages in thread
From: Ivan Koveshnikov @ 2021-07-26  8:26 UTC (permalink / raw)
  To: Ethy H. Brito; +Cc: xdp-newbies, Jesper Dangaard Brouer

Hi Ethy,

>        8) Any packet ARRIVING thru middle's WAN (eth1) interface has its VLAN
>        header removed with XDP loaded into kernel - observed with tcpdump as bellow

Have you disabled vlan rx offloading on your nic, when trying XDP?
When vlan rx offloading is enabled, nic cuts off vlan header from
packets and sends them separately as metadata. When a packet is
processed  by the kernel, this metadata is saved into `struct
sk_buff->vlan_tci`, xdp hooks happen before that and don't work with
skbs. But for the time being there is no uniform interface to address
such metadata from XDP, and all XDP can see is a packet with vlan
headers stripped. I believe you need to switch off vlan rx offload
(`ethtool -K ${ifname} rxvlan off `) to get proper behaviour.

Best regards,
Ivan Koveshnikov

On Mon, 26 Jul 2021 at 06:17, Ethy H. Brito <ethy.brito@inexo.com.br> wrote:
>
>
> Hi everyone.
>
> (Long and very verbose email follows. Sorry about that - be patient)
>
> I can't make xdp-cpumap-tc work if vlan is used at WAN interface.
>
> If the packet gets redirected , that is, if it hits
>
>         "return bpf_redirect_map(&cpu_map, cpu_dest, 0);"
>
> in xdp_iphash_to_cpu_kern (function parse_ipv4) the packet never arrives
> at client. It gets dropped somewhere.
>
> The test setup comprises three boxes:
>         1) a client - vanila ubuntu 20
>         2) a middle router box in-between (1) and (3) (that runs XDP and tc_classify)
>         3) a server - vanila ubuntu 20
>
> They are almost completely isolated from production environment.
>
> A vlan (nic-br) is set between (2) and (3).
>
> xdp-cpumap-tc was git download with
>
>         git clone --recurse-submodules https://github.com/xdp-project/xdp-cpumap-tc
>
> and compiled yesterday. No errors.
>
> The VLANs are created like that:
>         at middle box:
>                 #ip link add link eth1 name nic-br type vlan id 1003
>         at server box
>                 #ip link add link eth0 name nic-br type vlan id 1003
> The routes are
>         at middle box:
>                 # ip r sh
>                 10.16.239.0/24 dev eth0 scope link
>                 187.17.36.69 dev eth0 scope link
>                 192.168.1.0/24 dev nic-br proto kernel scope link src 192.168.1.1
>
>         at server box:
>                 # ip r s
>                 default via 192.168.1.1 dev nic-br
>                 10.16.239.0/24 via 192.168.1.1 dev nic-br
>                 192.168.1.0/24 dev nic-br proto kernel scope link src 192.168.1.2
>
> Client box has two IP  addresses configured at its only interface (eth0)
>         inet 10.16.239.213/32 scope global client
>         inet 187.17.36.69/32 scope global client
>
> Both IPs "pings" server ip address 192.168.1.2 thru "middle" when XDP is *OFF*.
>
> # ping -I 187.17.36.69 192.168.1.2
> PING 192.168.1.2 (192.168.1.2) from 187.17.36.69 : 56(84) bytes of data.
> 64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.391 ms
> 64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.360 ms
>
> # ping -I 10.16.239.213 192.168.1.2
> PING 192.168.1.2 (192.168.1.2) from 10.16.239.213 : 56(84) bytes of data.
> 64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.410 ms
> 64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.389 ms
>
> But when the script bellow is executed at "middle", It is not possible
> to ping the server from IP 187.17.36.69 anymore.
> 10.16.239.213 works Ok.
>
> As you will see bellow I only "mapped" the 187... address.
>
> Since I have no clue where to investigate, I watched all I could think of.
> These are my observations up to now:
>
>         1) if the packet is redirected to some CPU, it disappear inside
>         the kernel never hitting the client.
>
>         2) if I unconditionally returns XDP_PASS at the end of parse_ipv4
>         (xdp_iphash_to_cpu_kern) both pings work.
>
>         3) If I comment out the last line of the script bellow (the mapping line),
>         then flush XDP and TC, and run the script again, both pings work.
>         (since I do not have a map hit and the cpu redirect never occurs)
>
>         4) If I kill the vlans and route the packets thru the "naked" eth's
>         both pings work. (no need to reload XDP or tc_classify - it just works right away)
>
>         5) I put some bpf_debug messages at the VLAN detection code, at both
>         xdp_iphash and tc_classify, and they are both never hit.
>
>         6) locally, at middle box, I can always ping 187.17.36.69 and 10.16.239.213
>         (even with XDP *ON*)
>
>         7) If I execute:
>         /usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.255' --classid '1:fffd' --cpu 0
>         pings from 10.16.239.213 stop immediately (since now these IP packets gets redirected to a CPU).
>         Deleting the IP entry from map, restores the ping immediately.
>
>         8) Any packet ARRIVING thru middle's WAN (eth1) interface has its VLAN
>         header removed with XDP loaded into kernel - observed with tcpdump as bellow:
>
> Dump with XDP *OFF* (VLAN header OK - packet make thru client)
> # stdbuf -o 0 -e 0 tcpdump -nei eth1 icmp
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
>
> 19:32:45.676423 00:1b:21:8d:72:99 > 00:16:3e:f4:e9:88, ethertype *802.1Q (0x8100)*,
> length 102: vlan 1003, p 0, ethertype IPv4, 187.17.36.69 > 192.168.1.2: ICMP echo request, id 1163, seq 137, length 64
>
> 19:32:45.676563 00:16:3e:f4:e9:88 > 00:1b:21:8d:72:99, ethertype *802.1Q (0x8100)*,
> length 102: vlan 1003, p 0, ethertype IPv4, 192.168.1.2 > 187.17.36.69: ICMP echo reply, id 1163, seq 137, length 64
>
>
> Dump with XDP *ON* (NO VLAN header - no packet get out thru middle's LAN (eth0))
> # stdbuf -o 0 -e 0 tcpdump -nei eth1 icmp
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
>
> 19:30:43.852543 00:1b:21:8d:72:99 > 00:16:3e:f4:e9:88, ethertype *802.1Q (0x8100)*,
> length 102: vlan 1003, p 0, ethertype IPv4, 187.17.36.69 > 192.168.1.2: ICMP echo request, id 1163, seq 18, length 64
>
> 19:30:43.852695 00:16:3e:f4:e9:88 > 00:1b:21:8d:72:99, ethertype *IPv4 (0x0800)*,
> length 98: 192.168.1.2 > 187.17.36.69: ICMP echo reply, id 1163, seq 18, length 64
>
> This is bpftool output:
> # bpftool net
> xdp:
> eth0(3) driver id 796
> eth1(4) driver id 801
>
> tc:
> eth0(3) clsact/egress tc_classify_kern.o:[tc_classify] id 797
> eth1(4) clsact/egress tc_classify_kern.o:[tc_classify] id 802
>
> flow_dissector:
>
>
> This is the script I use to start xdp-cpumap. It is a fragment form a much
> larger script that runs at my production box, stripped to a bare minimum where
> the problem still happens.
>
> ----------------------------------- 8< -------------------------------------------
> #!/bin/bash
>
> # Flushes all XDP maps
> ###################################################
> /usr/local/bin/xdp_iphash_to_cpu_cmdline --clear &>/dev/null
>
> /sbin/ip link set eth0 up
> # Turn off eth0's XPS
> for xps_cpus in $(ls /sys/class/net/eth0/queues/tx-*/xps_cpus | sort --field-separator='-' -k2n); do
>         echo 0 > $xps_cpus
> done
>
> # remove any existing qdiscs
> /sbin/tc qdisc del dev eth0 root 2> /dev/null
> /sbin/tc qdisc del dev eth0 ingress 2> /dev/null
>
> # Multiqueue root discipline, handle 7fff: handle
> /sbin/tc qdisc replace dev eth0 root handle 7FFF: mq
>
> #Cria as disciplinas de fila (filhas de cada qdisc MQ acima) para atrelar (mais tarde) a cada CPU.
> i=0
> for dir in /sys/class/net/eth0/queues/tx-* ; do
>     x=$((i++))
>     # Qdisc HTB $i: under parent 7FFF:$i
>     i_str=$(printf '%x' $i)
>
>     # "root" class
>     /sbin/tc qdisc add dev eth0 parent 7FFF:$i_str handle $i_str: hfsc default fffd
>
>     # inner classes
>     /sbin/tc class add dev eth0 parent $i_str: classid $i_str:1 hfsc sc m2 7gibit ul rate 7gibit
>     # - set default class rate
>     /sbin/tc class add dev eth0 parent $i_str:1 classid $i_str:fffd hfsc sc m2 7gibit ul rate 7gibit
>     # Change the qdisc on default class
>     /sbin/tc qdisc add dev eth0 parent $i_str:fffd fq_codel
> done
>
> # Load XDP module
> /usr/local/bin/xdp_iphash_to_cpu --dev eth0 --all-cpus --wan --quiet &>/dev/null
> # Put all CPUs to dance. CPU X will process queue X+1 that holds the X+1 class
> /usr/local/bin/tc_classify --dev-egress eth0 --base-setup  --quiet &>/dev/null
>
>
> /sbin/ip link set eth1 up
> # Turn off eth1's XPS
> for xps_cpus in $(ls /sys/class/net/eth1/queues/tx-*/xps_cpus | sort --field-separator='-' -k2n); do
>         echo 0 > $xps_cpus
> done
>
> # remove any existing qdiscs
> /sbin/tc qdisc del dev eth1 root 2> /dev/null
> /sbin/tc qdisc del dev eth1 ingress 2> /dev/null
>
> # Multiqueue root discipline, handle 7fff: handle
> /sbin/tc qdisc replace dev eth1 root handle 7FFF: mq
>
> #Cria as disciplinas de fila (filhas de cada qdisc MQ acima) para atrelar (mais tarde) a cada CPU.
> i=0
> for dir in /sys/class/net/eth1/queues/tx-* ; do
>     x=$((i++))
>     # Qdisc HTB $i: under parent 7FFF:$i
>     i_str=$(printf '%x' $i)
>
>     # "root" class
>     /sbin/tc qdisc add dev eth1 parent 7FFF:$i_str handle $i_str: hfsc default fffd
>
>     # inner classes
>     /sbin/tc class add dev eth1 parent $i_str: classid $i_str:1 hfsc sc m2 7gibit ul rate 7gibit
>     # - set default class rate
>     /sbin/tc class add dev eth1 parent $i_str:1 classid $i_str:fffd hfsc sc m2 7gibit ul rate 7gibit
>     # Change the qdisc on default class
>     /sbin/tc qdisc add dev eth1 parent $i_str:fffd fq_codel
> done
>
> # Load XDP module
> /usr/local/bin/xdp_iphash_to_cpu --dev eth1 --all-cpus --wan --quiet &>/dev/null
> # Put all CPUs to dance. CPU X will process queue X+1 that holds the X+1 class
> /usr/local/bin/tc_classify --dev-egress eth1 --base-setup  --quiet &>/dev/null
>
> #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.255' --classid '1:fffd' --cpu 0 >&/dev/null
> #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.254' --classid '1:fffe' --cpu 0 >&/dev/null
> #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.253' --classid '1:fffc' --cpu 0 >&/dev/null
>
> # Put the client's packets in its shaper
> #from client: classid=4:105;VEL=52428800;
> /sbin/tc class add dev eth1 parent 4:1 classid 4:105 hfsc sc m2 50mibit ul rate 50mibit
> /sbin/tc qdisc add dev eth1 parent 4:105 fq_codel
>
> #to client: classid=4:105;VEL=52428800;
> /sbin/tc class add dev eth0 parent 4:1 classid 4:105 hfsc sc m2 50mibit ul rate 50mibit
> /sbin/tc qdisc add dev eth0 parent 4:105 fq_codel
>
> /usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '187.17.36.69' --classid '4:105' --cpu 3 >&/dev/null
>
> exit 0
>
>
> ----------------------------------- 8< -------------------------------------------
>
> Some bad interaction is happening when I use XDP and VLANs together.
>
> Can you guys help me with this??
>
> Regards
>
> Ethy

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: xdp-cpumap-tc and VLAN not working together in my setup
  2021-07-26  8:26 ` Ivan Koveshnikov
@ 2021-07-26 21:17   ` Ethy H. Brito
  0 siblings, 0 replies; 3+ messages in thread
From: Ethy H. Brito @ 2021-07-26 21:17 UTC (permalink / raw)
  To: Ivan Koveshnikov; +Cc: xdp-newbies, Jesper Dangaard Brouer

On Mon, 26 Jul 2021 13:26:08 +0500
Ivan Koveshnikov <ikoveshnik@gmail.com> wrote:

> Hi Ethy,
> 
> >        8) Any packet ARRIVING thru middle's WAN (eth1) interface has its VLAN
> >        header removed with XDP loaded into kernel - observed with tcpdump as bellow  
> 
> Have you disabled vlan rx offloading on your nic, when trying XDP?
> When vlan rx offloading is enabled, nic cuts off vlan header from
> packets and sends them separately as metadata. When a packet is
> processed  by the kernel, this metadata is saved into `struct
> sk_buff->vlan_tci`, xdp hooks happen before that and don't work with
> skbs. But for the time being there is no uniform interface to address
> such metadata from XDP, and all XDP can see is a packet with vlan
> headers stripped. I believe you need to switch off vlan rx offload
> (`ethtool -K ${ifname} rxvlan off `) to get proper behaviour.
> 
> Best regards,
> Ivan Koveshnikov

Hi Ivan.

Thank you for your help.

"ethtool -K ${ifname} rxvlan off" solved the problem.

Now packets gets correctly classified and shaped.

But what is not clear for me is why, even with xdp_iphash_kern receiving the IP packet 
and have redirected it to the correct CPU,  the packets get "dropped" somewhere.
Any clues on this?

Regards

Ethy




> 
> On Mon, 26 Jul 2021 at 06:17, Ethy H. Brito <ethy.brito@inexo.com.br> wrote:
> >
> >
> > Hi everyone.
> >
> > (Long and very verbose email follows. Sorry about that - be patient)
> >
> > I can't make xdp-cpumap-tc work if vlan is used at WAN interface.
> >
> > If the packet gets redirected , that is, if it hits
> >
> >         "return bpf_redirect_map(&cpu_map, cpu_dest, 0);"
> >
> > in xdp_iphash_to_cpu_kern (function parse_ipv4) the packet never arrives
> > at client. It gets dropped somewhere.
> >
> > The test setup comprises three boxes:
> >         1) a client - vanila ubuntu 20
> >         2) a middle router box in-between (1) and (3) (that runs XDP and tc_classify)
> >         3) a server - vanila ubuntu 20
> >
> > They are almost completely isolated from production environment.
> >
> > A vlan (nic-br) is set between (2) and (3).
> >
> > xdp-cpumap-tc was git download with
> >
> >         git clone --recurse-submodules https://github.com/xdp-project/xdp-cpumap-tc
> >
> > and compiled yesterday. No errors.
> >
> > The VLANs are created like that:
> >         at middle box:
> >                 #ip link add link eth1 name nic-br type vlan id 1003
> >         at server box
> >                 #ip link add link eth0 name nic-br type vlan id 1003
> > The routes are
> >         at middle box:
> >                 # ip r sh
> >                 10.16.239.0/24 dev eth0 scope link
> >                 187.17.36.69 dev eth0 scope link
> >                 192.168.1.0/24 dev nic-br proto kernel scope link src 192.168.1.1
> >
> >         at server box:
> >                 # ip r s
> >                 default via 192.168.1.1 dev nic-br
> >                 10.16.239.0/24 via 192.168.1.1 dev nic-br
> >                 192.168.1.0/24 dev nic-br proto kernel scope link src 192.168.1.2
> >
> > Client box has two IP  addresses configured at its only interface (eth0)
> >         inet 10.16.239.213/32 scope global client
> >         inet 187.17.36.69/32 scope global client
> >
> > Both IPs "pings" server ip address 192.168.1.2 thru "middle" when XDP is *OFF*.
> >
> > # ping -I 187.17.36.69 192.168.1.2
> > PING 192.168.1.2 (192.168.1.2) from 187.17.36.69 : 56(84) bytes of data.
> > 64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.391 ms
> > 64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.360 ms
> >
> > # ping -I 10.16.239.213 192.168.1.2
> > PING 192.168.1.2 (192.168.1.2) from 10.16.239.213 : 56(84) bytes of data.
> > 64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.410 ms
> > 64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.389 ms
> >
> > But when the script bellow is executed at "middle", It is not possible
> > to ping the server from IP 187.17.36.69 anymore.
> > 10.16.239.213 works Ok.
> >
> > As you will see bellow I only "mapped" the 187... address.
> >
> > Since I have no clue where to investigate, I watched all I could think of.
> > These are my observations up to now:
> >
> >         1) if the packet is redirected to some CPU, it disappear inside
> >         the kernel never hitting the client.
> >
> >         2) if I unconditionally returns XDP_PASS at the end of parse_ipv4
> >         (xdp_iphash_to_cpu_kern) both pings work.
> >
> >         3) If I comment out the last line of the script bellow (the mapping line),
> >         then flush XDP and TC, and run the script again, both pings work.
> >         (since I do not have a map hit and the cpu redirect never occurs)
> >
> >         4) If I kill the vlans and route the packets thru the "naked" eth's
> >         both pings work. (no need to reload XDP or tc_classify - it just works right away)
> >
> >         5) I put some bpf_debug messages at the VLAN detection code, at both
> >         xdp_iphash and tc_classify, and they are both never hit.
> >
> >         6) locally, at middle box, I can always ping 187.17.36.69 and 10.16.239.213
> >         (even with XDP *ON*)
> >
> >         7) If I execute:
> >         /usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.255' --classid '1:fffd' --cpu 0
> >         pings from 10.16.239.213 stop immediately (since now these IP packets gets redirected to a CPU).
> >         Deleting the IP entry from map, restores the ping immediately.
> >
> >         8) Any packet ARRIVING thru middle's WAN (eth1) interface has its VLAN
> >         header removed with XDP loaded into kernel - observed with tcpdump as bellow:
> >
> > Dump with XDP *OFF* (VLAN header OK - packet make thru client)
> > # stdbuf -o 0 -e 0 tcpdump -nei eth1 icmp
> > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> > listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
> >
> > 19:32:45.676423 00:1b:21:8d:72:99 > 00:16:3e:f4:e9:88, ethertype *802.1Q (0x8100)*,
> > length 102: vlan 1003, p 0, ethertype IPv4, 187.17.36.69 > 192.168.1.2: ICMP echo request, id 1163, seq 137, length 64
> >
> > 19:32:45.676563 00:16:3e:f4:e9:88 > 00:1b:21:8d:72:99, ethertype *802.1Q (0x8100)*,
> > length 102: vlan 1003, p 0, ethertype IPv4, 192.168.1.2 > 187.17.36.69: ICMP echo reply, id 1163, seq 137, length 64
> >
> >
> > Dump with XDP *ON* (NO VLAN header - no packet get out thru middle's LAN (eth0))
> > # stdbuf -o 0 -e 0 tcpdump -nei eth1 icmp
> > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> > listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
> >
> > 19:30:43.852543 00:1b:21:8d:72:99 > 00:16:3e:f4:e9:88, ethertype *802.1Q (0x8100)*,
> > length 102: vlan 1003, p 0, ethertype IPv4, 187.17.36.69 > 192.168.1.2: ICMP echo request, id 1163, seq 18, length 64
> >
> > 19:30:43.852695 00:16:3e:f4:e9:88 > 00:1b:21:8d:72:99, ethertype *IPv4 (0x0800)*,
> > length 98: 192.168.1.2 > 187.17.36.69: ICMP echo reply, id 1163, seq 18, length 64
> >
> > This is bpftool output:
> > # bpftool net
> > xdp:
> > eth0(3) driver id 796
> > eth1(4) driver id 801
> >
> > tc:
> > eth0(3) clsact/egress tc_classify_kern.o:[tc_classify] id 797
> > eth1(4) clsact/egress tc_classify_kern.o:[tc_classify] id 802
> >
> > flow_dissector:
> >
> >
> > This is the script I use to start xdp-cpumap. It is a fragment form a much
> > larger script that runs at my production box, stripped to a bare minimum where
> > the problem still happens.
> >
> > ----------------------------------- 8< -------------------------------------------
> > #!/bin/bash
> >
> > # Flushes all XDP maps
> > ###################################################
> > /usr/local/bin/xdp_iphash_to_cpu_cmdline --clear &>/dev/null
> >
> > /sbin/ip link set eth0 up
> > # Turn off eth0's XPS
> > for xps_cpus in $(ls /sys/class/net/eth0/queues/tx-*/xps_cpus | sort --field-separator='-' -k2n); do
> >         echo 0 > $xps_cpus
> > done
> >
> > # remove any existing qdiscs
> > /sbin/tc qdisc del dev eth0 root 2> /dev/null
> > /sbin/tc qdisc del dev eth0 ingress 2> /dev/null
> >
> > # Multiqueue root discipline, handle 7fff: handle
> > /sbin/tc qdisc replace dev eth0 root handle 7FFF: mq
> >
> > #Cria as disciplinas de fila (filhas de cada qdisc MQ acima) para atrelar (mais tarde) a cada CPU.
> > i=0
> > for dir in /sys/class/net/eth0/queues/tx-* ; do
> >     x=$((i++))
> >     # Qdisc HTB $i: under parent 7FFF:$i
> >     i_str=$(printf '%x' $i)
> >
> >     # "root" class
> >     /sbin/tc qdisc add dev eth0 parent 7FFF:$i_str handle $i_str: hfsc default fffd
> >
> >     # inner classes
> >     /sbin/tc class add dev eth0 parent $i_str: classid $i_str:1 hfsc sc m2 7gibit ul rate 7gibit
> >     # - set default class rate
> >     /sbin/tc class add dev eth0 parent $i_str:1 classid $i_str:fffd hfsc sc m2 7gibit ul rate 7gibit
> >     # Change the qdisc on default class
> >     /sbin/tc qdisc add dev eth0 parent $i_str:fffd fq_codel
> > done
> >
> > # Load XDP module
> > /usr/local/bin/xdp_iphash_to_cpu --dev eth0 --all-cpus --wan --quiet &>/dev/null
> > # Put all CPUs to dance. CPU X will process queue X+1 that holds the X+1 class
> > /usr/local/bin/tc_classify --dev-egress eth0 --base-setup  --quiet &>/dev/null
> >
> >
> > /sbin/ip link set eth1 up
> > # Turn off eth1's XPS
> > for xps_cpus in $(ls /sys/class/net/eth1/queues/tx-*/xps_cpus | sort --field-separator='-' -k2n); do
> >         echo 0 > $xps_cpus
> > done
> >
> > # remove any existing qdiscs
> > /sbin/tc qdisc del dev eth1 root 2> /dev/null
> > /sbin/tc qdisc del dev eth1 ingress 2> /dev/null
> >
> > # Multiqueue root discipline, handle 7fff: handle
> > /sbin/tc qdisc replace dev eth1 root handle 7FFF: mq
> >
> > #Cria as disciplinas de fila (filhas de cada qdisc MQ acima) para atrelar (mais tarde) a cada CPU.
> > i=0
> > for dir in /sys/class/net/eth1/queues/tx-* ; do
> >     x=$((i++))
> >     # Qdisc HTB $i: under parent 7FFF:$i
> >     i_str=$(printf '%x' $i)
> >
> >     # "root" class
> >     /sbin/tc qdisc add dev eth1 parent 7FFF:$i_str handle $i_str: hfsc default fffd
> >
> >     # inner classes
> >     /sbin/tc class add dev eth1 parent $i_str: classid $i_str:1 hfsc sc m2 7gibit ul rate 7gibit
> >     # - set default class rate
> >     /sbin/tc class add dev eth1 parent $i_str:1 classid $i_str:fffd hfsc sc m2 7gibit ul rate 7gibit
> >     # Change the qdisc on default class
> >     /sbin/tc qdisc add dev eth1 parent $i_str:fffd fq_codel
> > done
> >
> > # Load XDP module
> > /usr/local/bin/xdp_iphash_to_cpu --dev eth1 --all-cpus --wan --quiet &>/dev/null
> > # Put all CPUs to dance. CPU X will process queue X+1 that holds the X+1 class
> > /usr/local/bin/tc_classify --dev-egress eth1 --base-setup  --quiet &>/dev/null
> >
> > #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.255' --classid '1:fffd' --cpu 0 >&/dev/null
> > #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.254' --classid '1:fffe' --cpu 0 >&/dev/null
> > #/usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '255.255.255.253' --classid '1:fffc' --cpu 0 >&/dev/null
> >
> > # Put the client's packets in its shaper
> > #from client: classid=4:105;VEL=52428800;
> > /sbin/tc class add dev eth1 parent 4:1 classid 4:105 hfsc sc m2 50mibit ul rate 50mibit
> > /sbin/tc qdisc add dev eth1 parent 4:105 fq_codel
> >
> > #to client: classid=4:105;VEL=52428800;
> > /sbin/tc class add dev eth0 parent 4:1 classid 4:105 hfsc sc m2 50mibit ul rate 50mibit
> > /sbin/tc qdisc add dev eth0 parent 4:105 fq_codel
> >
> > /usr/local/bin/xdp_iphash_to_cpu_cmdline --add --ip '187.17.36.69' --classid '4:105' --cpu 3 >&/dev/null
> >
> > exit 0
> >
> >
> > ----------------------------------- 8< -------------------------------------------
> >
> > Some bad interaction is happening when I use XDP and VLANs together.
> >
> > Can you guys help me with this??
> >
> > Regards
> >
> > Ethy  


-- 

Ethy H. Brito         /"\
InterNexo Ltda.       \ /  CAMPANHA DA FITA ASCII - CONTRA MAIL HTML
+55 (12) 3797-6860     X   ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
S.J.Campos - Brasil   / \ 
 
PGP key: http://www.inexo.com.br/~ethy/0xC3F222A0.asc

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-07-26 21:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-26  1:16 xdp-cpumap-tc and VLAN not working together in my setup Ethy H. Brito
2021-07-26  8:26 ` Ivan Koveshnikov
2021-07-26 21:17   ` Ethy H. Brito

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).