* Possible bug in traffic control?
From: Josh Coombs @ 2018-10-09 15:58 UTC
  To: netdev

Hello all, I'm looking for some guidance in chasing what I believe to
be a bug in kernel traffic control filters.  If I'm pinging the wrong
list let me know.

I have a homebrew MACSec bridge setup using two pairs of PCs.  I
establish a MACSec link between them, and then use TC to bridge a
second ethernet interface over the MACSec link. The second interface
is connected to a Juniper switch at each end, and I'm using LACP over
the links to bond them up for redundancy.  It turns out I need that
redundancy, because after a while one pair of bridges stops passing
packets in one direction.  I've since replicated this failure with a
group of VMs as well.

My test setup to replicate the failure inside ESXi:
- Two MACSec bridge VMs, A and Z
- Two IPerf VMs, A and Z
My VMs are currently built using Ubuntu Server 18.04 to be quick, no
additional packages are required outside of iperf3.  Kernel ver as
shipped currently is 4.15.0-36.  I highly advise using a CPU with AES
instruction support as MACSec eats CPU without it and will take longer
to reproduce the symptoms.

- A 'MACSec Bridge' network
- A 'A Side link' network
- A 'Z Side link' network
In ESXi I used a dedicated vSwitch, 9000 MTU (to allow full 1500 eth
packets + MACSec to pass on the bridge) and the security policy is
full open (allow promiscuous, allow forged, allow mac changes) as
we're abusing the networks as direct point to point links.  If using
physical machines, just cable up, my example script bumps the MTU as
required.

The MACSec boxes have two ethernet interfaces each.  One pair is on
the MACSec Bridge network.  The other interfaces go to the A and Z
IPerf boxes respectively via their dedicated networks.  A and Z need
their interfaces configured with IPs in a common subnet, such as
192.168.0.1/30 and 192.168.0.2/30.

My script sets up MACSec, tweaks MTUs, and touches a few sysctls to
turn the involved interfaces into silent actors.  It then uses TC to
start the actual bridging.  From there I've been firing up iperf 3
sessions in both directions between A and Z to hammer the bridge until
it fails.  When it does, I can see packets stop being bridged in one
direction on one MACSec host, but not the other.  The second host
continues to flow packets in both directions.  Nothing is logged to
dmesg when this fault occurs.  The fault seems to occur at roughly the
same packet / traffic amount each time.  On my main application it's
after approximately 2.5TB of traffic (random mix of sizes) and with my
test bed it was after 5.5TB of 1500 byte packets.

On the impacted MACSec node, I watched the interface packet counters
with ifconfig and the actual traffic with tcpdump.  Packets arrive on
the MACSec interface and go out the host interface, and the host's
replies come back in on the host interface, but they never show up on
the MACSec interface to cross the bridge.  Clearing out the tc filter
and qdisc and re-adding them does not restore traffic flow.
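
The counter watching described above can be sketched as a small helper that samples the kernel's per-interface packet counters twice and prints the difference. This is a rough sketch, not the exact procedure used in this report: the function names and the SYSFS_NET override are hypothetical, and the only thing assumed about the system is the standard /sys/class/net/<dev>/statistics/ layout.

```shell
#!/bin/bash
# Sketch: sample an interface's kernel packet counters twice and report
# how far they moved. SYSFS_NET is a hypothetical override so the functions
# can be pointed at a fake tree; by default they read the real
# /sys/class/net/<dev>/statistics/ files.

read_counter() {
    # $1 = interface name, $2 = counter name (e.g. tx_packets)
    cat "${SYSFS_NET:-/sys/class/net}/$1/statistics/$2"
}

counter_delta() {
    # $1 = interface, $2 = counter, $3 = seconds between the two samples
    local before after
    before=$(read_counter "$1" "$2") || return 1
    sleep "$3"
    after=$(read_counter "$1" "$2") || return 1
    echo $(( after - before ))
}

report_flow() {
    # Print one line per direction. A delta of 0 while traffic is being
    # offered marks the direction that has stopped.
    local dev=$1 interval=${2:-5} c
    for c in rx_packets tx_packets; do
        echo "$dev $c +$(counter_delta "$dev" "$c" "$interval")"
    done
}

# Example, using the interface names from the setup script:
#   report_flow eno2 5
#   report_flow macsec0 5
```

Running report_flow against both the host-facing interface and macsec0 while iperf3 is active should show which side's transmit counter has stopped moving.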

There is a PPA with 4.18 available for Ubuntu that I'm going to test
with next to see if that makes a difference in behavior.  In the
meantime I'd appreciate any suggestions on how to diagnose this.

My MACSec bridge setup script follows.  Update sif, dif, the keys and
rxmac to match your setup.  The rxmac is the MAC address of the remote
bridge interface.  The keys need to be swapped between the two systems.
-----------------------
#!/bin/bash

# Interfaces:
# sif = Ingress physical interface (Source)
# dif = Egress physical interface (Dest)
# eif = Encrypted interface
sif=eno2
dif=enp1s0f0
eif=macsec0

# MACSec Keys:
# txkey = Transmit (Local) key
# rxkey = Receive (Remote) key
# rxmac = Receive (Remote) MAC addy
txkey=00000000000000000000000000000000
rxkey=99999999999999999999999999999999
rxmac=00:11:22:33:44:55

# Use jumbo frames for macsec to allow full 1500 MTU passthrough:
echo "* MTU update"
ip link set "$sif" mtu 9000
ip link set "$dif" mtu 9000

# Bring up macsec:
echo "* Enable MACSec"
modprobe macsec
ip link add link "$dif" "$eif" type macsec
ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"
ip macsec add "$eif" rx address "$rxmac" port 1
ip macsec add "$eif" rx address "$rxmac" port 1 sa 0 pn 1 on key 01 "$rxkey"
ip link set "$eif" type macsec encrypt on
#ip link set "$eif" type macsec replay on window 64

# Keep system from trying to respond to observed traffic:
echo "* Clamp the system so bridge ports NEVER respond to traffic"
sysctl -w net.ipv4.conf.default.arp_filter=1
sysctl -w net.ipv4.conf.all.arp_filter=1
ip link set "$sif" down promisc on arp off multicast off
sysctl -w net.ipv6.conf."$sif".autoconf=0
sysctl -w net.ipv6.conf."$sif".accept_ra=0
sysctl -w net.ipv4.conf."$sif".arp_ignore=8
sysctl -w net.ipv4.conf."$sif".rp_filter=0
ip link set "$dif" down promisc on arp off multicast off
sysctl -w net.ipv6.conf."$dif".autoconf=0
sysctl -w net.ipv6.conf."$dif".accept_ra=0
sysctl -w net.ipv4.conf."$dif".arp_ignore=8
sysctl -w net.ipv4.conf."$dif".rp_filter=0
ip link set "$eif" down promisc on arp off multicast off
sysctl -w net.ipv6.conf."$eif".autoconf=0
sysctl -w net.ipv6.conf."$eif".accept_ra=0
sysctl -w net.ipv4.conf."$eif".arp_ignore=8
sysctl -w net.ipv4.conf."$eif".rp_filter=0

# Set up traffic mirroring:
echo "* Start Port Mirror"
# sif to eif
tc qdisc add dev "$sif" ingress
tc filter add dev "$sif" parent ffff: \
          protocol all \
          u32 match u8 0 0 \
          action mirred egress mirror dev "$eif"

# eif to sif
tc qdisc add dev "$eif" ingress
tc filter add dev "$eif" parent ffff: \
          protocol all \
          u32 match u8 0 0 \
          action mirred egress mirror dev "$sif"

# Bring up the interfaces:
echo "* Light tunnel NICS"
ip link set "$sif" up
ip link set "$dif" up
ip link set "$eif" up

echo " --=[ MACSec Up ]=--"
-----------------------

Josh Coombs


* Re: Possible bug in traffic control?
From: Josh Coombs @ 2018-10-10 15:52 UTC
  To: netdev

2.3 billion 1-byte packets failed to re-create the bug.  To try to
simplify the setup I removed macsec from the equation, using a single
host in the middle as the bridge.  Interestingly, rather than 1.3 Gbit/s
in both directions, it ran at around 8 Mbit/s.  Switching
the filter from u32 to matchall didn't change the performance.  Going
back to the four-machine test bed, again removing macsec and just
bridging through, radically decreased the throughput to around 8 Mbit/s.
Flip macsec back on for the bridge and it's 1.3 Gbit/s again?
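
For reference, the matchall variant of the mirror filters would look roughly like the sketch below. It is an illustration, not the exact commands run for this test: mirror_cmds is a hypothetical helper that only prints the tc commands for a two-way mirror between two interfaces, so they can be reviewed before being run as root.

```shell
#!/bin/bash
# Sketch: print the tc commands for a two-way mirror that uses the matchall
# classifier in place of "u32 match u8 0 0". The function only prints;
# pipe its output to "sh -x" as root to apply it.
# mirror_cmds is a hypothetical helper name.

mirror_cmds() {
    # $1 = first interface, $2 = second interface
    local a=$1 b=$2 pair src dst
    for pair in "$a $b" "$b $a"; do
        src=${pair% *}    # text before the space
        dst=${pair#* }    # text after the space
        echo "tc qdisc add dev $src ingress"
        echo "tc filter add dev $src parent ffff: protocol all matchall action mirred egress mirror dev $dst"
    done
}

# Example, with the two physical interfaces of a single bridging host:
#   mirror_cmds eno2 enp1s0f0
```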


* Re: Possible bug in traffic control?
From: Cong Wang @ 2018-10-10 16:39 UTC
  To: jcoombs; +Cc: Linux Kernel Network Developers

On Wed, Oct 10, 2018 at 8:54 AM Josh Coombs <jcoombs@staff.gwi.net> wrote:
>
> 2.3 billion 1-byte packets failed to re-create the bug.  To try to
> simplify the setup I removed macsec from the equation, using a single
> host in the middle as the bridge.  Interestingly, rather than 1.3 Gbit/s
> in both directions, it ran at around 8 Mbit/s.  Switching
> the filter from u32 to matchall didn't change the performance.  Going
> back to the four-machine test bed, again removing macsec and just
> bridging through, radically decreased the throughput to around 8 Mbit/s.
> Flip macsec back on for the bridge and it's 1.3 Gbit/s again?

This narrows it down nicely! We can rule out macsec as the culprit.

Can you share a minimum reproducer for this problem? If so I can take
a look.


* Re: Possible bug in traffic control?
From: Josh Coombs @ 2018-10-11 14:05 UTC
  To: xiyou.wangcong; +Cc: netdev

I'm actually leaning towards macsec now.  I'm at 6TB transferred over a
double-hop bridge setup with no macsec, without triggering the
fault.  I'm going to let it continue to churn and set up a second
testbed that uses JUST macsec, without traffic control bridging, to see
if I can trip the issue there.  That should determine whether it's macsec
itself or an interaction between macsec and traffic control.

Joshua Coombs
GWI

office 207-494-2140
www.gwi.net



* Re: Possible bug in traffic control?
From: Josh Coombs @ 2018-10-12 13:59 UTC
  To: Cong Wang; +Cc: netdev

I've been able to narrow the scope down: the issue is with macsec
itself.  I set up two hosts with a macsec link between them, and let a
couple of iperf3 sessions blast traffic across.  At approximately 4.2
billion packets / 6TB of data transferred, one end stopped transmitting
packets.  A tcpdump on the impacted node's macsec0 interface
shows packets coming in from the remote node (in this case arp
requests) and arp replies from the local host, but the interface
counters for macsec0 record no packets being transmitted.  Again,
nothing in dmesg implies an error.

Deleting the macsec interface via ip link delete macsec0 and
re-creating it gets traffic flowing again without a reboot.

Meanwhile my traffic control bridge without macsec has shuffled 19TB
via 22 billion packets and not skipped a beat, so it appears my
initial assumption of it being the culprit was wrong.
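
The numbers in this report are worth a quick check. MACsec's default cipher suite tags each frame with a 32-bit packet number (the "pn 1" in the scripts sets its starting value), and 4.2 billion is close to 2^32. Whether that is what stops the transmitter is an assumption, not something established in this thread. The sketch below only does the arithmetic, taking 6TB as 6 * 10^12 bytes.

```shell
#!/bin/bash
# Arithmetic only; bash integers are 64-bit, so nothing here overflows.

pn_space=$(( 1 << 32 ))          # number of values a 32-bit counter can hold
observed_packets=4200000000      # "approximately 4.2 billion" from the report
observed_bytes=6000000000000     # "6TB", assumed to mean 6 * 10^12 bytes
bridge_packets=22000000000       # "22 billion" on the bridge without macsec

echo "2^32                       = $pn_space"
echo "stall point as % of 2^32   = $(( observed_packets * 100 / pn_space ))"
echo "average bytes per packet   = $(( observed_bytes / observed_packets ))"
echo "plain bridge, in 2^32s     = $(( bridge_packets / pn_space ))"
```

The stall point works out to roughly 97% of 2^32, given that the packet count is itself only approximate, while the bridge without macsec has passed about five times that many packets.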

To replicate, set up two hosts with a direct ethernet link between them.
- Bring up macsec between the two hosts and set up a dedicated /30 on the link.
- Use iperf3 or another traffic generating tool over the /30, one
session for each direction.
- Wait for traffic to stop.
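
The traffic generation step could be driven along these lines. This is a sketch under assumptions and not the exact commands used here: load_cmds is a hypothetical helper, the two port numbers and the one-day duration are arbitrary choices, and the function only prints the iperf3 client commands so they can be started by hand.

```shell
#!/bin/bash
# Sketch: print iperf3 client commands for one session per direction
# across the macsec /30. load_cmds is a hypothetical helper; the ports
# and the default duration are assumptions.
#
# The remote host is expected to be running two servers already:
#   iperf3 -s -p 5201
#   iperf3 -s -p 5202

load_cmds() {
    # $1 = address of the remote host on the /30, $2 = seconds per run
    local peer=$1 secs=${2:-86400}
    echo "iperf3 -c $peer -p 5201 -t $secs"       # local -> remote
    echo "iperf3 -c $peer -p 5202 -t $secs -R"    # remote -> local (reverse mode)
}

# Example, if the local end is 192.168.211.1/30:
#   load_cmds 192.168.211.2
```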

My test bed is on Ubuntu Server 18.04 currently, kernel 4.15.0-36.
I'm going to spin up a vanilla kernel on 4.15 and then -current to see
if this is an Ubuntu-ism from their patches, specific to 4.15, or a
general issue with macsec.

The script I used on each host (keys, rxmacs and IPs updated as appropriate):
-----------------------
#!/bin/bash

# Interfaces:
# dif = Egress physical interface (Dest)
# eif = Encrypted interface
dif=ens224
eif=macsec0

# MACSec Keys:
# txkey = Transmit (Local) key
# rxkey = Receive (Remote) key
# rxmac = Receive (Remote) MAC address
txkey=60995924232808431491190820961556
rxkey=87345530111733181210202106249824
rxmac=00:0c:29:c5:95:df

# Clear any existing IP config on the physical interface
ifconfig "$dif" 0.0.0.0

# Bring up macsec:
echo "* Enable MACSec"
modprobe macsec
ip link add link "$dif" "$eif" type macsec
ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"
ip macsec add "$eif" rx address "$rxmac" port 1
ip macsec add "$eif" rx address "$rxmac" port 1 sa 0 pn 1 on key 01 "$rxkey"
ip link set "$eif" type macsec encrypt on

# Bring up the interfaces:
echo "* Light tunnel NICS"
ip link set "$dif" up
ip link set "$eif" up

# Set IP on the encrypted interface (use the other /30 address on the peer)
ifconfig "$eif" 192.168.211.1/30

echo " --=[ MACSec Up ]=--"
-----------------------
