Race condition in route lookup

* Race condition in route lookup
@ 2019-10-09 16:00 Jesse Hathaway
  2019-10-10  8:31 ` Ido Schimmel
  0 siblings, 1 reply; 23+ messages in thread
From: Jesse Hathaway @ 2019-10-09 16:00 UTC
  To: netdev

We have been experiencing a route lookup race condition on our internet-facing
Linux routers. I have been able to reproduce the issue, but would love more
help in isolating the cause.

Looking up a route found in the main table returns `*` rather than the directly
connected interface about once every 10 to 20 million requests. From my
reading of the iproute2 source code, an asterisk indicates that the kernel
returned an interface index of 0 rather than the index of the correct directly
connected interface.

This is reproducible with the following bash snippet on 5.4-rc2:

  $ cat route-race
  #!/bin/bash

  # Generate 50 million individual route gets to feed as batch input to `ip`
  function ip-cmds() {
          route_get='route get 192.168.11.142 from 192.168.180.10 iif vlan180'
          for ((i = 0; i < 50000000; i++)); do
                  printf '%s\n' "${route_get}"
          done

  }

  ip-cmds | ip -d -o -batch - | grep -E 'dev \*' | uniq -c

Example output:

  $ ./route-race
        6 unicast 192.168.11.142 from 192.168.180.10 dev * table main \    cache iif vlan180
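
To put a number on the failure rate, the same batch output can be tallied
instead of filtered. The following is only a minimal sketch: the helper name
`count_star_routes` is hypothetical, and it assumes one route per line, as
produced by `ip -o`.

```shell
#!/bin/bash

# Hypothetical helper: reads `ip -o` route-get output on stdin and prints
# "<lines reporting dev *> <total lines>".
count_star_routes() {
        awk '/ dev \* / { bad++ } END { printf "%d %d\n", bad, NR }'
}
```

In the reproducer it would take the place of the `grep | uniq -c` stage, for
example `ip-cmds | ip -d -o -batch - | count_star_routes`.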

These routers have multiple routing tables and are ingesting full BGP routing
tables from multiple ISPs:

  $ ip route show table all | wc -l
  3105543

  $ ip route show table main | wc -l
  54

Please let me know what other information I can provide. Thanks in advance,
Jesse Hathaway
