From: Josh Coombs
Subject: Re: Possible bug in traffic control?
Date: Wed, 10 Oct 2018 11:52:14 -0400
To: netdev@vger.kernel.org

2.3 billion 1-byte packets failed to re-create the bug. To try and
simplify the setup I removed macsec from the equation, using a single
host in the middle as the bridge. Interestingly, rather than 1.3 Gbit/s
in both directions, it ran at around 8 Mbit/s. Switching the filter
from u32 to matchall didn't change the performance.

Going back to the four-machine test bed, again removing macsec and
just bridging through radically decreased the throughput, to around
8 Mbit/s. Flip macsec back on for the bridge and it's 1.3 Gbit/s again?

On Tue, Oct 9, 2018 at 11:58 AM Josh Coombs wrote:
>
> Hello all, I'm looking for some guidance in chasing what I believe to
> be a bug in kernel traffic control filters. If I'm pinging the wrong
> list let me know.
>
> I have a homebrew MACSec bridge setup using two pairs of PCs. I
> establish a MACSec link between them, and then use TC to bridge a
> second ethernet interface over the MACSec link. The second interface
> is connected to a Juniper switch at each end, and I'm using LACP over
> the links to bond them up for redundancy. It turns out I need that
> redundancy as after a while one pair of bridges will stop flowing
> packets in one direction. I've since replicated this failure with a
> group of VMs as well.
>
> My test setup to replicate the failure inside ESXi:
> - Two MACSec bridge VMs, A and Z
> - Two IPerf VMs, A and Z
> My VMs are currently built using Ubuntu Server 18.04 to be quick, no
> additional packages are required outside of iperf3. Kernel ver as
> shipped currently is 4.15.0-36. I highly advise using a CPU with AES
> instruction support as MACSec eats CPU without it and will take longer
> to reproduce the symptoms.
>
> - A 'MACSec Bridge' network
> - A 'A Side link' network
> - A 'Z Side link' network
> In ESXi I used a dedicated vSwitch, 9000 MTU (to allow full 1500 eth
> packets + MACSec to pass on the bridge) and the security policy is
> full open (allow promiscuous, allow forged, allow mac changes) as
> we're abusing the networks as direct point to point links. If using
> physical machines, just cable up, my example script bumps the MTU as
> required.
>
> The MACSec boxes have two ethernet interfaces each. One pair is on
> the MACSec Bridge network. The other interfaces go to the A and Z
> IPerf boxes respectively via their dedicated networks. A and Z need
> their interfaces configured with IPs in a common subnet, such as
> 192.168.0.1/30 and 192.168.0.2/30.
>
> My script sets up MACSec, tweaks MTUs, and touches a few sysctls to
> turn the involved interfaces into silent actors. It then uses TC to
> start the actual bridging. From there I've been firing up iperf 3
> sessions in both directions between A and Z to hammer the bridge until
> it fails. When it does, I can see packets stop being bridged in one
> direction on one MACSec host, but not the other. The second host
> continues to flow packets in both directions. Nothing is logged to
> dmesg when this fault occurs. The fault seems to occur at roughly the
> same packet / traffic amount each time. On my main application it's
> after approximately 2.5TB of traffic (random mix of sizes) and with my
> test bed it was after 5.5TB of 1500 byte packets.
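For comparing the runs in packets rather than bytes, here is a rough
sketch of the arithmetic. It assumes 5.5TB means 5.5 * 10^12 bytes and
that every packet in the test bed run was exactly 1500 bytes; the
helper name packets_for is made up for illustration. It only puts the
two runs on the same scale and does not by itself point at a cause.

```shell
#!/bin/bash
# packets_for BYTES PKT_SIZE
# Prints how many whole packets of PKT_SIZE bytes fit in BYTES.
# (Hypothetical helper, not part of the original setup script.)
packets_for() {
    local bytes=$1 pkt_size=$2
    echo $(( bytes / pkt_size ))
}

# Assumption: 5.5TB = 5.5 * 10^12 bytes, all packets 1500 bytes long.
failing_run=$(packets_for 5500000000000 1500)

# Packet count of the 1-byte run reported at the top of this message.
one_byte_run=2300000000

echo "failing test bed run: $failing_run packets"
echo "1-byte packet run:    $one_byte_run packets"
echo "difference:           $(( failing_run - one_byte_run )) packets"
```

Under those assumptions the failing run passed roughly 3.67 billion
packets, so the 2.3 billion packet run stopped well short of it. If
5.5TB was really measured in binary units the figure is somewhat
higher still.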
>
> On the impacted MACSec node, watching interface packet counters via
> ifconfig and actual traffic with tcpdump I can see packets coming in
> MACSec and going out the host interface, the host reply coming in but
> not showing up on the MACSec interface to cross the bridge. Clearing
> out the tc filter and qdisc and re-adding does not restore traffic
> flow.
>
> There is a PPA with 4.18 available for Ubuntu that I'm going to test
> with next to see if that makes a difference in behavior. In the mean
> time I'd appreciate any suggestions on how to diagnose this.
>
> My MACSec bridge setup script, update sif, dif, the keys and rxmac to
> match your setup. The rxmac is the mac addy of the remote bridge
> interface. Keys need to be flipped between systems.
> -----------------------
> #!/bin/bash
>
> # Interfaces:
> # sif = Ingress physical interface (Source)
> # dif = Egress physical interface (Dest)
> # eif = Encrypted interface
> sif=eno2
> dif=enp1s0f0
> eif=macsec0
>
> # MACSec Keys:
> # txkey = Transmit (Local) key
> # rxkey = Receive (Remote) key
> # rxmac = Receive (Remote) MAC addy
> txkey=00000000000000000000000000000000
> rxkey=99999999999999999999999999999999
> rxmac=00:11:22:33:44:55
>
> # Use jumbo frames for macsec to allow full 1500 MTU passthrough:
> echo "* MTU update"
> ip link set "$sif" mtu 9000
> ip link set "$dif" mtu 9000
>
> # Bring up macsec:
> echo "* Enable MACSec"
> modprobe macsec
> ip link add link "$dif" "$eif" type macsec
> ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"
> ip macsec add "$eif" rx address "$rxmac" port 1
> ip macsec add "$eif" rx address "$rxmac" port 1 sa 0 pn 1 on key 01 "$rxkey"
> ip link set "$eif" type macsec encrypt on
> #ip link set "$eif" type macsec replay on window 64
>
> # Keep system from trying to respond to observed traffic:
> echo "* Clamp the system so bridge ports NEVER respond to traffic"
> sysctl -w net.ipv4.conf.default.arp_filter=1
> sysctl -w net.ipv4.conf.all.arp_filter=1
> ip link set "$sif" down promisc on arp off multicast off
> sysctl -w net.ipv6.conf."$sif".autoconf=0
> sysctl -w net.ipv6.conf."$sif".accept_ra=0
> sysctl -w net.ipv4.conf."$sif".arp_ignore=8
> sysctl -w net.ipv4.conf."$sif".rp_filter=0
> ip link set "$dif" down promisc on arp off multicast off
> sysctl -w net.ipv6.conf."$dif".autoconf=0
> sysctl -w net.ipv6.conf."$dif".accept_ra=0
> sysctl -w net.ipv4.conf."$dif".arp_ignore=8
> sysctl -w net.ipv4.conf."$dif".rp_filter=0
> ip link set "$eif" down promisc on arp off multicast off
> sysctl -w net.ipv6.conf."$eif".autoconf=0
> sysctl -w net.ipv6.conf."$eif".accept_ra=0
> sysctl -w net.ipv4.conf."$eif".arp_ignore=8
> sysctl -w net.ipv4.conf."$eif".rp_filter=0
>
> # Set up traffic mirroring:
> echo "* Start Port Mirror"
> # sif to eif
> tc qdisc add dev "$sif" ingress
> tc filter add dev "$sif" parent ffff: \
>   protocol all \
>   u32 match u8 0 0 \
>   action mirred egress mirror dev "$eif"
>
> # eif to sif
> tc qdisc add dev "$eif" ingress
> tc filter add dev "$eif" parent ffff: \
>   protocol all \
>   u32 match u8 0 0 \
>   action mirred egress mirror dev "$sif"
>
> # Bring up the interfaces:
> echo "* Light tunnel NICS"
> ip link set "$sif" up
> ip link set "$dif" up
> ip link set "$eif" up
>
> echo " --=[ MACSec Up ]=--"
> -----------------------
>
> Josh Coombs
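
The counter watching described in the quoted message can be scripted
so a stall in one direction is easier to spot. The following is only a
sketch: it assumes the standard Linux sysfs counters under
/sys/class/net/<dev>/statistics/ and that mirrored packets are counted
in tx_packets on the output device. The function names read_counter,
stalled and check_direction are made up for illustration and are not
part of the setup script.

```shell
#!/bin/bash
# read_counter DEV NAME [BASE]
# Prints one per-interface counter (e.g. rx_packets) from sysfs.
# BASE defaults to /sys/class/net and is only a parameter so the
# function can be pointed at a different directory for testing.
read_counter() {
    local dev=$1 name=$2 base=${3:-/sys/class/net}
    cat "$base/$dev/statistics/$name"
}

# stalled IN_BEFORE IN_AFTER OUT_BEFORE OUT_AFTER
# Succeeds when the input counter advanced but the output counter
# did not move at all, i.e. packets arrive but are not forwarded.
stalled() {
    [ "$2" -gt "$1" ] && [ "$4" -eq "$3" ]
}

# check_direction IN_DEV OUT_DEV [SECONDS] [BASE]
# Samples rx_packets on IN_DEV and tx_packets on OUT_DEV twice,
# SECONDS apart, and reports the deltas for that direction.
check_direction() {
    local in_dev=$1 out_dev=$2 secs=${3:-5} base=${4:-/sys/class/net}
    local in0 out0 in1 out1
    in0=$(read_counter "$in_dev" rx_packets "$base")
    out0=$(read_counter "$out_dev" tx_packets "$base")
    sleep "$secs"
    in1=$(read_counter "$in_dev" rx_packets "$base")
    out1=$(read_counter "$out_dev" tx_packets "$base")
    if stalled "$in0" "$in1" "$out0" "$out1"; then
        echo "$in_dev -> $out_dev: STALLED (rx +$(( in1 - in0 )), tx +0)"
    else
        echo "$in_dev -> $out_dev: rx +$(( in1 - in0 )), tx +$(( out1 - out0 ))"
    fi
}

# Example use on a bridge host, with sif and eif set as in the
# setup script and traffic running:
#   check_direction "$sif" "$eif"
#   check_direction "$eif" "$sif"
```

Running both directions on each MACSec host while iperf3 is active
should show which host and which direction stops forwarding. Other
traffic sent out of the output device would hide a stall, so treat a
non-stalled result as inconclusive rather than as proof of health.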