netdev.vger.kernel.org archive mirror
* Veth pair swallow packets for XDP_TX operation
@ 2020-01-15 22:35 Hanlin Shi
  2020-01-16  9:01 ` Toshiaki Makita
  0 siblings, 1 reply; 5+ messages in thread
From: Hanlin Shi @ 2020-01-15 22:35 UTC (permalink / raw)
  To: netdev; +Cc: Cheng-Chun William Tu

Hi community,

I’m prototyping an XDP program and hit issues with the XDP_TX operation on a veth device. The following code snippet works as expected on 4.15.0-54-generic, but is NOT working on 4.20.17-042017-lowlatency (I got the kernel here: https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20.17/).

Here’s my setup: I created a veth pair (namely veth1 and veth2) and put the two ends in two namespaces (namely ns1 and ns2). I assigned address 60.0.0.1 to veth1 and 60.0.0.2 to veth2, and set each device as the default interface in its namespace (e.g. in ns1, run “ip route add default dev veth1”). Then in ns1, I ping 60.0.0.2 and tcpdump on veth1’s RX for ICMP.
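
For reference, the setup was created with roughly the following commands (this is a reconstruction for reproduction purposes; the /24 prefix and exact ordering are my best recollection rather than a verbatim transcript):

ip netns add ns1
ip netns add ns2
ip link add veth1 type veth peer name veth2
ip link set veth1 netns ns1
ip link set veth2 netns ns2
ip netns exec ns1 ip addr add 60.0.0.1/24 dev veth1
ip netns exec ns2 ip addr add 60.0.0.2/24 dev veth2
ip netns exec ns1 ip link set dev veth1 up
ip netns exec ns2 ip link set dev veth2 up
ip netns exec ns1 ip route add default dev veth1
ip netns exec ns2 ip route add default dev veth2
# then, in ns1: ping 60.0.0.2, and in another shell: tcpdump -ni veth1 icmp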

Before loading any XDP program on veth2, I can see ICMP replies on the veth1 interface. Then I load a program which does “XDP_TX” for all packets on veth2. I expected to see the same ICMP packets being returned, but I saw nothing.

I added some debugging messages in the XDP program, so I’m sure the packets are processed on veth2, but on veth1, even with promisc mode on, I cannot see any ICMP packets or even ARP packets. In my understanding, 4.15 is using generic XDP mode whereas 4.20 is using native XDP mode for veth, so I guess there’s something wrong with veth native XDP and I need some help fixing the issue.

Please let me know if you need help on reproducing the issue.

Thanks,
Hanlin

PS: here’s the src code for the XDP program:
#include <stddef.h>
#include <string.h>
#include <linux/if_vlan.h>
#include <stdbool.h>
#include <bpf/bpf_endian.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <linux/in.h>
#define DEBUG
#include "bpf_helpers.h"

SEC("xdp")
int loadbal(struct xdp_md *ctx) {
  bpf_printk("got packet, direct return\n");
  return XDP_TX;
}
char _license[] SEC("license") = "GPL";

"bpf_helpers.h" can be found here: https://github.com/dropbox/goebpf/raw/master/bpf_helpers.h

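In case it helps with reproducing, this is roughly how I build and attach it (the file name, clang flags and section name below are a plausible sketch of my workflow, not necessarily the exact invocation I used):

clang -O2 -g -target bpf -c xdp_tx_kern.c -o xdp_tx_kern.o
ip netns exec ns2 ip link set dev veth2 xdp obj xdp_tx_kern.o sec xdp
# the bpf_printk() output shows up in /sys/kernel/debug/tracing/trace_pipe
cat /sys/kernel/debug/tracing/trace_pipe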

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Veth pair swallow packets for XDP_TX operation
  2020-01-15 22:35 Veth pair swallow packets for XDP_TX operation Hanlin Shi
@ 2020-01-16  9:01 ` Toshiaki Makita
  2020-01-16 21:28   ` William Tu
  2020-01-16 22:54   ` Hanlin Shi
  0 siblings, 2 replies; 5+ messages in thread
From: Toshiaki Makita @ 2020-01-16  9:01 UTC (permalink / raw)
  To: Hanlin Shi, netdev; +Cc: Cheng-Chun William Tu

Hi Hanlin,

On 2020/01/16 7:35, Hanlin Shi wrote:
> Hi community,
> 
> I’m prototyping an XDP program and hit issues with the XDP_TX operation on a veth device. The following code snippet works as expected on 4.15.0-54-generic, but is NOT working on 4.20.17-042017-lowlatency (I got the kernel here: https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20.17/).
> 
> Here’s my setup: I created a veth pair (namely veth1 and veth2) and put the two ends in two namespaces (namely ns1 and ns2). I assigned address 60.0.0.1 to veth1 and 60.0.0.2 to veth2, and set each device as the default interface in its namespace (e.g. in ns1, run “ip route add default dev veth1”). Then in ns1, I ping 60.0.0.2 and tcpdump on veth1’s RX for ICMP.
> 
> Before loading any XDP program on veth2, I can see ICMP replies on the veth1 interface. Then I load a program which does “XDP_TX” for all packets on veth2. I expected to see the same ICMP packets being returned, but I saw nothing.
> 
> I added some debugging messages in the XDP program, so I’m sure the packets are processed on veth2, but on veth1, even with promisc mode on, I cannot see any ICMP packets or even ARP packets. In my understanding, 4.15 is using generic XDP mode whereas 4.20 is using native XDP mode for veth, so I guess there’s something wrong with veth native XDP and I need some help fixing the issue.

You need to load a dummy program to receive packets from peer XDP_TX when using native veth XDP.

The dummy program is something like this:
int xdp_pass(struct xdp_md *ctx) {
	return XDP_PASS;
}
And load this program on "veth1".
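
For example, assuming the dummy program is compiled into xdp_dummy_kern.o with an "xdp" section (the names here are only illustrative), attaching it would look like:

ip netns exec ns1 ip link set dev veth1 xdp obj xdp_dummy_kern.o sec xdp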

For more information please refer to these slides.
https://netdevconf.info/0x13/session.html?talk-veth-xdp

Also there is a working example here.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_xdp_veth.sh

Toshiaki Makita

> 
> Please let me know if you need help on reproducing the issue.
> 
> Thanks,
> Hanlin
> 
> PS: here’s the src code for the XDP program:
> #include <stddef.h>
> #include <string.h>
> #include <linux/if_vlan.h>
> #include <stdbool.h>
> #include <bpf/bpf_endian.h>
> #include <linux/if_ether.h>
> #include <linux/ip.h>
> #include <linux/tcp.h>
> #include <linux/udp.h>
> #include <linux/in.h>
> #define DEBUG
> #include "bpf_helpers.h"
> 
> SEC("xdp")
> int loadbal(struct xdp_md *ctx) {
>    bpf_printk("got packet, direct return\n");
>    return XDP_TX;
> }
> char _license[] SEC("license") = "GPL";
> 
> "bpf_helpers.h" can be found here: https://github.com/dropbox/goebpf/raw/master/bpf_helpers.h
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Veth pair swallow packets for XDP_TX operation
  2020-01-16  9:01 ` Toshiaki Makita
@ 2020-01-16 21:28   ` William Tu
  2020-01-16 22:54   ` Hanlin Shi
  1 sibling, 0 replies; 5+ messages in thread
From: William Tu @ 2020-01-16 21:28 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: Hanlin Shi, netdev

On Thu, Jan 16, 2020 at 1:13 AM Toshiaki Makita
<toshiaki.makita1@gmail.com> wrote:
>
> Hi Hanlin,
>
> On 2020/01/16 7:35, Hanlin Shi wrote:
> > Hi community,
> >
> > I’m prototyping an XDP program and hit issues with the XDP_TX operation on a veth device. The following code snippet works as expected on 4.15.0-54-generic, but is NOT working on 4.20.17-042017-lowlatency (I got the kernel here: https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20.17/).
> >
> > Here’s my setup: I created a veth pair (namely veth1 and veth2) and put the two ends in two namespaces (namely ns1 and ns2). I assigned address 60.0.0.1 to veth1 and 60.0.0.2 to veth2, and set each device as the default interface in its namespace (e.g. in ns1, run “ip route add default dev veth1”). Then in ns1, I ping 60.0.0.2 and tcpdump on veth1’s RX for ICMP.
> >
> > Before loading any XDP program on veth2, I can see ICMP replies on the veth1 interface. Then I load a program which does “XDP_TX” for all packets on veth2. I expected to see the same ICMP packets being returned, but I saw nothing.
> >
> > I added some debugging messages in the XDP program, so I’m sure the packets are processed on veth2, but on veth1, even with promisc mode on, I cannot see any ICMP packets or even ARP packets. In my understanding, 4.15 is using generic XDP mode whereas 4.20 is using native XDP mode for veth, so I guess there’s something wrong with veth native XDP and I need some help fixing the issue.
>
> You need to load a dummy program to receive packets from peer XDP_TX when using native veth XDP.
>
> The dummy program is something like this:
> int xdp_pass(struct xdp_md *ctx) {
>         return XDP_PASS;
> }
> And load this program on "veth1".
>
> For more information please refer to these slides.
> https://netdevconf.info/0x13/session.html?talk-veth-xdp
>
> Also there is a working example here.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_xdp_veth.sh
>
> Toshiaki Makita
>
> >
> > Please let me know if you need help on reproducing the issue.
> >
> > Thanks,
> > Hanlin
> >
> > PS: here’s the src code for the XDP program:
> > #include <stddef.h>
> > #include <string.h>
> > #include <linux/if_vlan.h>
> > #include <stdbool.h>
> > #include <bpf/bpf_endian.h>
> > #include <linux/if_ether.h>
> > #include <linux/ip.h>
> > #include <linux/tcp.h>
> > #include <linux/udp.h>
> > #include <linux/in.h>
> > #define DEBUG
> > #include "bpf_helpers.h"
> >
> > SEC("xdp")
> > int loadbal(struct xdp_md *ctx) {
> >    bpf_printk("got packet, direct return\n");
> >    return XDP_TX;
> > }
> > char _license[] SEC("license") = "GPL";
> >
> > "bpf_helpers.h" can be found here: https://github.com/dropbox/goebpf/raw/master/bpf_helpers.h
> >

Hi Hanlin,

I tested XDP_TX and the case you mentioned, and it works OK on my 5.3 kernel.
I used the script below to test it; can you give it a try?

#!/bin/bash
ip netns add at_ns0
ip link add p0 type veth peer name afxdp-p0
ip link set p0 netns at_ns0
ip addr add 10.1.1.2/24 dev afxdp-p0
ip link set dev afxdp-p0 up

ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
ip addr add "10.1.1.1/24" dev p0
ip link set dev p0 up
NS_EXEC_HEREDOC

tcpdump -l -n -i afxdp-p0 -w /tmp/t.pcap icmp &
ping -c 100 10.1.1.1 &

# the xdp program returns XDP_TX
ip netns exec at_ns0 ip link set dev p0 xdp obj xdp1_kern.o sec xdp1

$ tcpdump -r /tmp/t.pcap
13:24:59.925165 IP 10.1.1.1 > 10.1.1.2: ICMP echo reply, id 31521, seq
3, length 64
13:25:00.949240 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 4, length 64
13:25:00.949273 IP 10.1.1.1 > 10.1.1.2: ICMP echo reply, id 31521, seq
4, length 64
13:25:01.972369 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 5, length 64
13:25:01.972402 IP 10.1.1.1 > 10.1.1.2: ICMP echo reply, id 31521, seq
5, length 64
load the XDP_TX program...
13:25:02.995996 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 6, length 64
13:25:04.021256 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 7, length 64
13:25:05.044943 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 8, length 64
13:25:06.069341 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 9, length 64
13:25:07.093121 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 10, length 64
13:25:08.115943 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 11, length 64
13:25:09.141542 IP 10.1.1.2 > 10.1.1.1: ICMP echo request, id 31521,
seq 12, length 64

William

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Veth pair swallow packets for XDP_TX operation
  2020-01-16  9:01 ` Toshiaki Makita
  2020-01-16 21:28   ` William Tu
@ 2020-01-16 22:54   ` Hanlin Shi
  2020-01-17  6:00     ` Toshiaki Makita
  1 sibling, 1 reply; 5+ messages in thread
From: Hanlin Shi @ 2020-01-16 22:54 UTC (permalink / raw)
  To: Toshiaki Makita, netdev; +Cc: Cheng-Chun William Tu

Hi Toshiaki,

Thanks for your advice; it's now working as expected in my environment. However, I still have concerns about this issue. Is this dummy program approach a short-term workaround? The behavior of native XDP is different from generic XDP, which could cause confusion for developers. Also, I'm planning to load the XDP program in a container (specifically, a Kubernetes pod), and I'm not sure it's feasible for me to access the veth peer that is connected to the bridge (Linux bridge or OVS).

I wonder if it would be OK to have a fix where, if no XDP program is found on the veth peer, the kernel falls back to a dummy XDP_PASS behavior, just like what you demonstrated? If needed, I can help with the fix.

Thanks,
Hanlin

On 1/16/20, 1:01 AM, "Toshiaki Makita" <toshiaki.makita1@gmail.com> wrote:

    Hi Hanlin,
    
    On 2020/01/16 7:35, Hanlin Shi wrote:
    > Hi community,
    > 
    > I’m prototyping an XDP program and hit issues with the XDP_TX operation on a veth device. The following code snippet works as expected on 4.15.0-54-generic, but is NOT working on 4.20.17-042017-lowlatency (I got the kernel here: https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.20.17/).
    > 
    > Here’s my setup: I created a veth pair (namely veth1 and veth2) and put the two ends in two namespaces (namely ns1 and ns2). I assigned address 60.0.0.1 to veth1 and 60.0.0.2 to veth2, and set each device as the default interface in its namespace (e.g. in ns1, run “ip route add default dev veth1”). Then in ns1, I ping 60.0.0.2 and tcpdump on veth1’s RX for ICMP.
    > 
    > Before loading any XDP program on veth2, I can see ICMP replies on the veth1 interface. Then I load a program which does “XDP_TX” for all packets on veth2. I expected to see the same ICMP packets being returned, but I saw nothing.
    > 
    > I added some debugging messages in the XDP program, so I’m sure the packets are processed on veth2, but on veth1, even with promisc mode on, I cannot see any ICMP packets or even ARP packets. In my understanding, 4.15 is using generic XDP mode whereas 4.20 is using native XDP mode for veth, so I guess there’s something wrong with veth native XDP and I need some help fixing the issue.
    
    You need to load a dummy program to receive packets from peer XDP_TX when using native veth XDP.
    
    The dummy program is something like this:
    int xdp_pass(struct xdp_md *ctx) {
    	return XDP_PASS;
    }
    And load this program on "veth1".
    
    For more information please refer to these slides.
    https://netdevconf.info/0x13/session.html?talk-veth-xdp
    
    Also there is a working example here.
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/test_xdp_veth.sh
    
    Toshiaki Makita
    
    > 
    > Please let me know if you need help on reproducing the issue.
    > 
    > Thanks,
    > Hanlin
    > 
    > PS: here’s the src code for the XDP program:
    > #include <stddef.h>
    > #include <string.h>
    > #include <linux/if_vlan.h>
    > #include <stdbool.h>
    > #include <bpf/bpf_endian.h>
    > #include <linux/if_ether.h>
    > #include <linux/ip.h>
    > #include <linux/tcp.h>
    > #include <linux/udp.h>
    > #include <linux/in.h>
    > #define DEBUG
    > #include "bpf_helpers.h"
    > 
    > SEC("xdp")
    > int loadbal(struct xdp_md *ctx) {
    >    bpf_printk("got packet, direct return\n");
    >    return XDP_TX;
    > }
    > char _license[] SEC("license") = "GPL";
    > 
    > "bpf_helpers.h" can be found here: https://github.com/dropbox/goebpf/raw/master/bpf_helpers.h
    > 
    


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Veth pair swallow packets for XDP_TX operation
  2020-01-16 22:54   ` Hanlin Shi
@ 2020-01-17  6:00     ` Toshiaki Makita
  0 siblings, 0 replies; 5+ messages in thread
From: Toshiaki Makita @ 2020-01-17  6:00 UTC (permalink / raw)
  To: Hanlin Shi, netdev; +Cc: Cheng-Chun William Tu

Please avoid top-posting in netdev mailing list.

On 2020/01/17 7:54, Hanlin Shi wrote:
> Hi Toshiaki,
> 
> Thanks for your advice; it's now working as expected in my environment. However, I still have concerns about this issue. Is this dummy program approach a short-term workaround?

This is a long-standing problem and should be fixed in some way, but it is not easy.

Your packets were dropped because the peer device did not prepare the necessary
resources to receive XDP frames. The resource allocation is triggered by
attaching a (possibly dummy) XDP program, which is unfortunately unintuitive.
Typically this kind of problem happens when other devices redirect frames to
some device with XDP_REDIRECT. If the redirect target device has not prepared
the necessary resources, redirected frames will be dropped. This is a common
issue with other XDP drivers, and the netdev community is seeking the right solution.

For veth there may be one more option: attaching an XDP program could trigger
allocation of the "peer" resources. But this means we would need to allocate
resources on both ends when only one of them attaches XDP. This is not necessary
when the program only does XDP_DROP or XDP_PASS, so I'm not sure this is a good idea.

Anyway, with the current behavior the peer (i.e. the container host) needs to
explicitly allow XDP_TX by attaching some program on the host side.

> The behavior of native XDP is different from generic XDP, which could cause confusion for developers.

Native XDP is generally hard to set up, which is one of the reasons why generic XDP was introduced.

> Also, I'm planning to load the XDP program in a container (specifically, a Kubernetes pod), and I'm not sure it's feasible for me to access the veth peer that is connected to the bridge (Linux bridge or OVS).

So the veth devices will be created by CNI plugins? Then basically your CNI
plugin needs to attach an XDP program on the host side if you want to allow
XDP_TX in containers.
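
For example, right after creating the pair the plugin could run something along
these lines (the interface and object names are placeholders, this is only a
sketch of the idea):

# host-side peer of the pod's veth; a dummy XDP_PASS program is enough to
# allocate the resources needed to receive XDP_TX frames from the pod
ip link set dev $HOST_PEER xdp obj xdp_dummy_kern.o sec xdp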

> 
> I wonder if it would be OK to have a fix where, if no XDP program is found on the veth peer, the kernel falls back to a dummy XDP_PASS behavior, just like what you demonstrated? If needed, I can help with the fix.

I proposed a similar workaround when I introduced veth native XDP, but it was
rejected. If we do not allocate additional resources on the peer, we need to use
the legacy data path, which does not have a bulk interface, and that makes the
XDP_TX performance lower. That would be a harder-to-fix problem than dropping...

Toshiaki Makita

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-01-17  6:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-15 22:35 Veth pair swallow packets for XDP_TX operation Hanlin Shi
2020-01-16  9:01 ` Toshiaki Makita
2020-01-16 21:28   ` William Tu
2020-01-16 22:54   ` Hanlin Shi
2020-01-17  6:00     ` Toshiaki Makita

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).