* [net-next PATCH 0/5] netfilter: conntrack: optimization, remove central spinlock
From: Jesper Dangaard Brouer @ 2014-02-27 16:41 UTC
  To: netdev, Eric Dumazet, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, David S. Miller, Florian Westphal

This patchset changes the conntrack locking and provides a huge
performance improvement.

This patchset is based upon Eric Dumazet's proposed patch:
  http://thread.gmane.org/gmane.linux.network/268758/focus=47306
In agreement with Eric Dumazet, I have taken over this patch (and
turned it into an entire patchset).

The primary focus is to remove the central spinlock nf_conntrack_lock.
Achieving this requires several steps.

Patch01: Trivial cleanups

Patch02: Moves the "special" dying/unconfirmed/template lists to use a
 per cpu spinlock.

Patch03: Prepares for patch04 by addressing a race condition. This is
 done as a separate patch for the reviewers' sake.

Patch04: Separates expect locking from nf_conntrack_lock. The expect
 list is small (default max 256), thus it just gets a single lock.

Patch05: Finally removes nf_conntrack_lock, and instead uses an
 array of hashed spinlocks to protect insertions/deletions of
 conntracks into the hash table, while still allowing dynamic
 resizing of the hash table (see the sketch below).
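
To illustrate the idea in Patch05 (a sketch of the locking pattern
only; names like CONNTRACK_LOCKS and nf_conntrack_locks[] are the
ones the patch introduces), each bucket hash maps onto one of the
hashed spinlocks, and the two bucket locks are always taken in a
fixed order to avoid ABBA deadlock:

 /* Sketch: lock the buckets for hashes h1 and h2 in a fixed order */
 h1 %= CONNTRACK_LOCKS;
 h2 %= CONNTRACK_LOCKS;
 if (h1 <= h2) {
         spin_lock(&nf_conntrack_locks[h1]);
         if (h1 != h2)
                 spin_lock_nested(&nf_conntrack_locks[h2],
                                  SINGLE_DEPTH_NESTING);
 } else {
         spin_lock(&nf_conntrack_locks[h2]);
         spin_lock_nested(&nf_conntrack_locks[h1],
                          SINGLE_DEPTH_NESTING);
 }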


Testing
-------
For expectations, I've mostly tested the FTP nf_conntrack_ftp
helper module, with the following commands:

 for x in `seq 1 300`; do \
   echo $x; \
   echo -e "USER anonymous\nPASS nothing\nPASV" | nc 192.168.42.129 21; \
 done

 wget ftp://192.168.42.129/pub/delete.me.4k -O /dev/null

For overload/DoS testing, I've primarily done SYN-flood attack testing.
Results on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with the tool trafgen):

 Base kernel :   810.405 new conntrack/sec
 Fixed kernel: 2.233.876 new conntrack/sec

Note that other flood attacks (SYN+ACK or ACK) can easily be deflected using:
 # iptables -A INPUT -m state --state INVALID -j DROP
 # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

E.g. this machine can deflect 6.481.463 "invalid" conntrack/sec (from
an ACK-flood).

Perf data:
----------
The nf_conntrack_lock suffers from huge contention on current
generation servers (8 or more cores/threads).  The data below was
captured under SYN-flooding (without a listen socket).

The lock contention is very "visible" in perf on a base kernel:

    -  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
       - _raw_spin_lock_bh
          + 25.33% init_conntrack
          + 24.86% nf_ct_delete_from_lists
          + 24.62% __nf_conntrack_confirm
          + 24.38% destroy_conntrack
          + 0.70% tcp_packet
    +   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
    +   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
    +   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
    +   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
    +   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table

Perf after the patchset (SYN-flood attack):

+   9.62%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
+   3.78%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
+   2.71%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
+   2.55%  ksoftirqd/6  [kernel.kallsyms]    [k] check_leaf
+   2.38%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table
+   2.06%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_alloc
+   1.94%  ksoftirqd/6  [nf_conntrack]       [k] __nf_conntrack_alloc
-   1.94%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock
   - _raw_spin_lock
      + 90.32% nf_conntrack_double_lock
      + 3.61% get_partial_node
      + 1.81% nf_ct_delete_from_lists
      + 1.68% __nf_conntrack_confirm
      + 1.03% sch_direct_xmit
      + 0.52% scheduler_tick
+   1.86%  ksoftirqd/6  [kernel.kallsyms]    [k] nf_iterate
+   1.80%  ksoftirqd/6  [nf_conntrack]       [k] init_conntrack
+   1.77%  ksoftirqd/6  [kernel.kallsyms]    [k] __neigh_event_send
-   1.70%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
   - _raw_spin_lock_bh
      + 32.55% nf_ct_del_from_dying_or_unconfirmed_list
      + 25.33% init_conntrack
      + 19.88% tcp_packet
      + 17.97% nf_ct_delete_from_lists
      + 1.62% nf_conntrack_in
      + 1.33% ixgbe_poll
      + 0.74% destroy_conntrack
+   1.64%  ksoftirqd/6  [nf_conntrack]       [k] hash_conntrack_raw
+   1.58%  ksoftirqd/6  [kernel.kallsyms]    [k] __netif_receive_skb_core
+   1.51%  ksoftirqd/6  [nf_conntrack]       [k] __nf_conntrack_find_get
+   1.48%  ksoftirqd/6  [kernel.kallsyms]    [k] __cmpxchg_double_slab
+   1.46%  ksoftirqd/6  [nf_conntrack]       [k] nf_conntrack_in
+   1.45%  ksoftirqd/6  [kernel.kallsyms]    [k] __local_bh_enable_ip


---

Jesper Dangaard Brouer (5):
      netfilter: conntrack: remove central spinlock nf_conntrack_lock
      netfilter: conntrack: separate expect locking from nf_conntrack_lock
      netfilter: avoid race with exp->master ct
      netfilter: conntrack: spinlock per cpu to protect special lists.
      netfilter: trivial code cleanup and doc changes


 include/net/netfilter/nf_conntrack.h      |   11 +
 include/net/netfilter/nf_conntrack_core.h |    9 +
 include/net/netns/conntrack.h             |   13 +
 net/netfilter/nf_conntrack_core.c         |  427 ++++++++++++++++++++---------
 net/netfilter/nf_conntrack_expect.c       |   36 ++
 net/netfilter/nf_conntrack_h323_main.c    |    4 
 net/netfilter/nf_conntrack_helper.c       |   37 ++-
 net/netfilter/nf_conntrack_netlink.c      |  128 +++++----
 net/netfilter/nf_conntrack_sip.c          |    8 -
 9 files changed, 456 insertions(+), 217 deletions(-)

-- 


* [net-next PATCH 1/5] netfilter: trivial code cleanup and doc changes
From: Jesper Dangaard Brouer @ 2014-02-27 16:41 UTC
  To: netdev, Eric Dumazet, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, David S. Miller, Florian Westphal

Trivial changes made while reading through the netfilter code.

Added a hint about how the conntrack nf_conn refcnt is accessed,
and renamed repl_hash to reply_hash for readability.
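
As a small illustration (a sketch, not part of the patch) of the
distinction the new comment documents, using only the existing
helpers:

 enum ip_conntrack_info ctinfo;
 struct nf_conn *ct;

 ct = nf_ct_get(skb, &ctinfo);      /* reads skb->nfct, no refcnt change */
 nf_conntrack_get(&ct->ct_general); /* increments the refcnt */
 /* ... use ct ... */
 nf_ct_put(ct);                     /* decrements, like nf_conntrack_put() */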

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/netfilter/nf_conntrack.h |    8 +++++++-
 net/netfilter/nf_conntrack_core.c    |   20 ++++++++++----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index b2ac624..e10d1fa 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -73,7 +73,13 @@ struct nf_conn_help {
 
 struct nf_conn {
 	/* Usage count in here is 1 for hash table/destruct timer, 1 per skb,
-           plus 1 for any connection(s) we are `master' for */
+	 * plus 1 for any connection(s) we are `master' for
+	 *
+	 * Hint: SKBs address this struct, and its refcnt, via skb->nfct
+	 * and the helpers nf_conntrack_get() and nf_conntrack_put().
+	 * The helper nf_ct_put() equals nf_conntrack_put() (dec refcnt);
+	 * beware that nf_ct_get() is different and does not inc the refcnt.
+	 */
 	struct nf_conntrack ct_general;
 
 	spinlock_t lock;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 356bef5..965693e 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -408,21 +408,21 @@ EXPORT_SYMBOL_GPL(nf_conntrack_find_get);
 
 static void __nf_conntrack_hash_insert(struct nf_conn *ct,
 				       unsigned int hash,
-				       unsigned int repl_hash)
+				       unsigned int reply_hash)
 {
 	struct net *net = nf_ct_net(ct);
 
 	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
 			   &net->ct.hash[hash]);
 	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode,
-			   &net->ct.hash[repl_hash]);
+			   &net->ct.hash[reply_hash]);
 }
 
 int
 nf_conntrack_hash_check_insert(struct nf_conn *ct)
 {
 	struct net *net = nf_ct_net(ct);
-	unsigned int hash, repl_hash;
+	unsigned int hash, reply_hash;
 	struct nf_conntrack_tuple_hash *h;
 	struct hlist_nulls_node *n;
 	u16 zone;
@@ -430,7 +430,7 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 	zone = nf_ct_zone(ct);
 	hash = hash_conntrack(net, zone,
 			      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
-	repl_hash = hash_conntrack(net, zone,
+	reply_hash = hash_conntrack(net, zone,
 				   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
 
 	spin_lock_bh(&nf_conntrack_lock);
@@ -441,7 +441,7 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 				      &h->tuple) &&
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
 			goto out;
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[repl_hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
 		if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
 				      &h->tuple) &&
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
@@ -451,7 +451,7 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 	smp_wmb();
 	/* The caller holds a reference to this object */
 	atomic_set(&ct->ct_general.use, 2);
-	__nf_conntrack_hash_insert(ct, hash, repl_hash);
+	__nf_conntrack_hash_insert(ct, hash, reply_hash);
 	NF_CT_STAT_INC(net, insert);
 	spin_unlock_bh(&nf_conntrack_lock);
 
@@ -483,7 +483,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_tmpl_insert);
 int
 __nf_conntrack_confirm(struct sk_buff *skb)
 {
-	unsigned int hash, repl_hash;
+	unsigned int hash, reply_hash;
 	struct nf_conntrack_tuple_hash *h;
 	struct nf_conn *ct;
 	struct nf_conn_help *help;
@@ -507,7 +507,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	/* reuse the hash saved before */
 	hash = *(unsigned long *)&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev;
 	hash = hash_bucket(hash, net);
-	repl_hash = hash_conntrack(net, zone,
+	reply_hash = hash_conntrack(net, zone,
 				   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
 
 	/* We're not in hash table, and we refuse to set up related
@@ -540,7 +540,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 				      &h->tuple) &&
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
 			goto out;
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[repl_hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
 		if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
 				      &h->tuple) &&
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
@@ -570,7 +570,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	 * guarantee that no other CPU can find the conntrack before the above
 	 * stores are visible.
 	 */
-	__nf_conntrack_hash_insert(ct, hash, repl_hash);
+	__nf_conntrack_hash_insert(ct, hash, reply_hash);
 	NF_CT_STAT_INC(net, insert);
 	spin_unlock_bh(&nf_conntrack_lock);
 


* [net-next PATCH 2/5] netfilter: conntrack: spinlock per cpu to protect special lists.
From: Jesper Dangaard Brouer @ 2014-02-27 16:41 UTC
  To: netdev, Eric Dumazet, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, David S. Miller, Florian Westphal

One spinlock per cpu to protect the dying/unconfirmed/template special
lists.  (These lists are now per cpu, a bit like the untracked ct.)
Add a @cpu field to nf_conn, to make sure we hold the appropriate
spinlock at removal time (see the sketch below).
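
A minimal sketch of the removal side (using the names this patch
introduces): the lock is chosen via the recorded ct->cpu, not via the
CPU currently executing, since the conntrack may be freed on a
different CPU than the one that inserted it:

 struct ct_pcpu *pcpu = per_cpu_ptr(nf_ct_net(ct)->ct.pcpu_lists, ct->cpu);

 spin_lock_bh(&pcpu->lock);
 hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
 spin_unlock_bh(&pcpu->lock);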

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/netfilter/nf_conntrack.h |    3 -
 include/net/netns/conntrack.h        |   11 ++-
 net/netfilter/nf_conntrack_core.c    |  139 +++++++++++++++++++++++++---------
 net/netfilter/nf_conntrack_helper.c  |   11 ++-
 net/netfilter/nf_conntrack_netlink.c |   81 +++++++++++---------
 5 files changed, 166 insertions(+), 79 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index e10d1fa..37252f7 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -82,7 +82,8 @@ struct nf_conn {
 	 */
 	struct nf_conntrack ct_general;
 
-	spinlock_t lock;
+	spinlock_t	lock;
+	u16		cpu;
 
 	/* XXX should I move this to the tail ? - Y.K */
 	/* These are my tuples; original and reply */
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index fbcc7fa..c6a8994 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -62,6 +62,13 @@ struct nf_ip_net {
 #endif
 };
 
+struct ct_pcpu {
+	spinlock_t		lock;
+	struct hlist_nulls_head unconfirmed;
+	struct hlist_nulls_head dying;
+	struct hlist_nulls_head tmpl;
+};
+
 struct netns_ct {
 	atomic_t		count;
 	unsigned int		expect_count;
@@ -86,9 +93,7 @@ struct netns_ct {
 	struct kmem_cache	*nf_conntrack_cachep;
 	struct hlist_nulls_head	*hash;
 	struct hlist_head	*expect_hash;
-	struct hlist_nulls_head	unconfirmed;
-	struct hlist_nulls_head	dying;
-	struct hlist_nulls_head tmpl;
+	struct ct_pcpu __percpu *pcpu_lists;
 	struct ip_conntrack_stat __percpu *stat;
 	struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb;
 	struct nf_exp_event_notifier __rcu *nf_expect_event_cb;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 965693e..ac85fd1 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -192,6 +192,47 @@ clean_from_lists(struct nf_conn *ct)
 	nf_ct_remove_expectations(ct);
 }
 
+static void nf_ct_add_to_dying_list(struct nf_conn *ct)
+{
+	struct ct_pcpu *pcpu;
+
+	/* add this conntrack to the (per cpu) dying list */
+	ct->cpu = raw_smp_processor_id();
+	pcpu = per_cpu_ptr(nf_ct_net(ct)->ct.pcpu_lists, ct->cpu);
+
+	spin_lock_bh(&pcpu->lock);
+	hlist_nulls_add_head(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
+			     &pcpu->dying);
+	spin_unlock_bh(&pcpu->lock);
+}
+
+static void nf_ct_add_to_unconfirmed_list(struct nf_conn *ct)
+{
+	struct ct_pcpu *pcpu;
+
+	/* add this conntrack to the (per cpu) unconfirmed list */
+	ct->cpu = raw_smp_processor_id();
+	pcpu = per_cpu_ptr(nf_ct_net(ct)->ct.pcpu_lists, ct->cpu);
+
+	spin_lock_bh(&pcpu->lock);
+	hlist_nulls_add_head(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
+			     &pcpu->unconfirmed);
+	spin_unlock_bh(&pcpu->lock);
+}
+
+static void nf_ct_del_from_dying_or_unconfirmed_list(struct nf_conn *ct)
+{
+	struct ct_pcpu *pcpu;
+
+	/* We overload first tuple to link into unconfirmed or dying list.*/
+	pcpu = per_cpu_ptr(nf_ct_net(ct)->ct.pcpu_lists, ct->cpu);
+
+	spin_lock_bh(&pcpu->lock);
+	BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
+	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
+	spin_unlock_bh(&pcpu->lock);
+}
+
 static void
 destroy_conntrack(struct nf_conntrack *nfct)
 {
@@ -220,9 +261,7 @@ destroy_conntrack(struct nf_conntrack *nfct)
 	 * too. */
 	nf_ct_remove_expectations(ct);
 
-	/* We overload first tuple to link into unconfirmed or dying list.*/
-	BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
-	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
+	nf_ct_del_from_dying_or_unconfirmed_list(ct);
 
 	NF_CT_STAT_INC(net, delete);
 	spin_unlock_bh(&nf_conntrack_lock);
@@ -244,9 +283,7 @@ static void nf_ct_delete_from_lists(struct nf_conn *ct)
 	 * Otherwise we can get spurious warnings. */
 	NF_CT_STAT_INC(net, delete_list);
 	clean_from_lists(ct);
-	/* add this conntrack to the dying list */
-	hlist_nulls_add_head(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
-			     &net->ct.dying);
+	nf_ct_add_to_dying_list(ct);
 	spin_unlock_bh(&nf_conntrack_lock);
 }
 
@@ -467,15 +504,21 @@ EXPORT_SYMBOL_GPL(nf_conntrack_hash_check_insert);
 /* deletion from this larval template list happens via nf_ct_put() */
 void nf_conntrack_tmpl_insert(struct net *net, struct nf_conn *tmpl)
 {
+	struct ct_pcpu *pcpu;
+
 	__set_bit(IPS_TEMPLATE_BIT, &tmpl->status);
 	__set_bit(IPS_CONFIRMED_BIT, &tmpl->status);
 	nf_conntrack_get(&tmpl->ct_general);
 
-	spin_lock_bh(&nf_conntrack_lock);
+	/* add this conntrack to the (per cpu) tmpl list */
+	tmpl->cpu = raw_smp_processor_id();
+	pcpu = per_cpu_ptr(nf_ct_net(tmpl)->ct.pcpu_lists, tmpl->cpu);
+
+	spin_lock_bh(&pcpu->lock);
 	/* Overload tuple linked list to put us in template list. */
 	hlist_nulls_add_head_rcu(&tmpl->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
-				 &net->ct.tmpl);
-	spin_unlock_bh(&nf_conntrack_lock);
+				 &pcpu->tmpl);
+	spin_unlock_bh(&pcpu->lock);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_tmpl_insert);
 
@@ -546,8 +589,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
 			goto out;
 
-	/* Remove from unconfirmed list */
-	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
+	nf_ct_del_from_dying_or_unconfirmed_list(ct);
 
 	/* Timer relative to confirmation time, not original
 	   setting time, otherwise we'd get timer wrap in
@@ -880,12 +922,11 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	/* Now it is inserted into the unconfirmed list, bump refcount */
 	nf_conntrack_get(&ct->ct_general);
 
-	/* Overload tuple linked list to put us in unconfirmed list. */
-	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
-		       &net->ct.unconfirmed);
-
 	spin_unlock_bh(&nf_conntrack_lock);
 
+	nf_ct_add_to_unconfirmed_list(ct);
+
+
 	if (exp) {
 		if (exp->expectfn)
 			exp->expectfn(ct, exp);
@@ -1254,6 +1295,7 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 	struct nf_conntrack_tuple_hash *h;
 	struct nf_conn *ct;
 	struct hlist_nulls_node *n;
+	int cpu;
 
 	spin_lock_bh(&nf_conntrack_lock);
 	for (; *bucket < net->ct.htable_size; (*bucket)++) {
@@ -1265,12 +1307,19 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 				goto found;
 		}
 	}
-	hlist_nulls_for_each_entry(h, n, &net->ct.unconfirmed, hnnode) {
-		ct = nf_ct_tuplehash_to_ctrack(h);
-		if (iter(ct, data))
-			set_bit(IPS_DYING_BIT, &ct->status);
-	}
 	spin_unlock_bh(&nf_conntrack_lock);
+
+	for_each_possible_cpu(cpu) {
+		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
+
+		spin_lock_bh(&pcpu->lock);
+		hlist_nulls_for_each_entry(h, n, &pcpu->unconfirmed, hnnode) {
+			ct = nf_ct_tuplehash_to_ctrack(h);
+			if (iter(ct, data))
+				set_bit(IPS_DYING_BIT, &ct->status);
+		}
+		spin_unlock_bh(&pcpu->lock);
+	}
 	return NULL;
 found:
 	atomic_inc(&ct->ct_general.use);
@@ -1323,14 +1372,19 @@ static void nf_ct_release_dying_list(struct net *net)
 	struct nf_conntrack_tuple_hash *h;
 	struct nf_conn *ct;
 	struct hlist_nulls_node *n;
+	int cpu;
 
-	spin_lock_bh(&nf_conntrack_lock);
-	hlist_nulls_for_each_entry(h, n, &net->ct.dying, hnnode) {
-		ct = nf_ct_tuplehash_to_ctrack(h);
-		/* never fails to remove them, no listeners at this point */
-		nf_ct_kill(ct);
+	for_each_possible_cpu(cpu) {
+		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
+
+		spin_lock_bh(&pcpu->lock);
+		hlist_nulls_for_each_entry(h, n, &pcpu->dying, hnnode) {
+			ct = nf_ct_tuplehash_to_ctrack(h);
+			/* never fails to remove them, no listeners at this point */
+			nf_ct_kill(ct);
+		}
+		spin_unlock_bh(&pcpu->lock);
 	}
-	spin_unlock_bh(&nf_conntrack_lock);
 }
 
 static int untrack_refs(void)
@@ -1417,6 +1471,7 @@ i_see_dead_people:
 		kmem_cache_destroy(net->ct.nf_conntrack_cachep);
 		kfree(net->ct.slabname);
 		free_percpu(net->ct.stat);
+		free_percpu(net->ct.pcpu_lists);
 	}
 }
 
@@ -1629,37 +1684,43 @@ void nf_conntrack_init_end(void)
 
 int nf_conntrack_init_net(struct net *net)
 {
-	int ret;
+	int ret = -ENOMEM;
+	int cpu;
 
 	atomic_set(&net->ct.count, 0);
-	INIT_HLIST_NULLS_HEAD(&net->ct.unconfirmed, UNCONFIRMED_NULLS_VAL);
-	INIT_HLIST_NULLS_HEAD(&net->ct.dying, DYING_NULLS_VAL);
-	INIT_HLIST_NULLS_HEAD(&net->ct.tmpl, TEMPLATE_NULLS_VAL);
-	net->ct.stat = alloc_percpu(struct ip_conntrack_stat);
-	if (!net->ct.stat) {
-		ret = -ENOMEM;
+
+	net->ct.pcpu_lists = alloc_percpu(struct ct_pcpu);
+	if (!net->ct.pcpu_lists)
 		goto err_stat;
+
+	for_each_possible_cpu(cpu) {
+		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
+
+		spin_lock_init(&pcpu->lock);
+		INIT_HLIST_NULLS_HEAD(&pcpu->unconfirmed, UNCONFIRMED_NULLS_VAL);
+		INIT_HLIST_NULLS_HEAD(&pcpu->dying, DYING_NULLS_VAL);
+		INIT_HLIST_NULLS_HEAD(&pcpu->tmpl, TEMPLATE_NULLS_VAL);
 	}
 
+	net->ct.stat = alloc_percpu(struct ip_conntrack_stat);
+	if (!net->ct.stat)
+		goto err_pcpu_lists;
+
 	net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
-	if (!net->ct.slabname) {
-		ret = -ENOMEM;
+	if (!net->ct.slabname)
 		goto err_slabname;
-	}
 
 	net->ct.nf_conntrack_cachep = kmem_cache_create(net->ct.slabname,
 							sizeof(struct nf_conn), 0,
 							SLAB_DESTROY_BY_RCU, NULL);
 	if (!net->ct.nf_conntrack_cachep) {
 		printk(KERN_ERR "Unable to create nf_conn slab cache\n");
-		ret = -ENOMEM;
 		goto err_cache;
 	}
 
 	net->ct.htable_size = nf_conntrack_htable_size;
 	net->ct.hash = nf_ct_alloc_hashtable(&net->ct.htable_size, 1);
 	if (!net->ct.hash) {
-		ret = -ENOMEM;
 		printk(KERN_ERR "Unable to create nf_conntrack_hash\n");
 		goto err_hash;
 	}
@@ -1701,6 +1762,8 @@ err_cache:
 	kfree(net->ct.slabname);
 err_slabname:
 	free_percpu(net->ct.stat);
+err_pcpu_lists:
+	free_percpu(net->ct.pcpu_lists);
 err_stat:
 	return ret;
 }
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 974a2a4..27d9302 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -396,6 +396,7 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
 	const struct hlist_node *next;
 	const struct hlist_nulls_node *nn;
 	unsigned int i;
+	int cpu;
 
 	/* Get rid of expectations */
 	for (i = 0; i < nf_ct_expect_hsize; i++) {
@@ -414,8 +415,14 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
 	}
 
 	/* Get rid of expecteds, set helpers to NULL. */
-	hlist_nulls_for_each_entry(h, nn, &net->ct.unconfirmed, hnnode)
-		unhelp(h, me);
+	for_each_possible_cpu(cpu) {
+		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
+
+		spin_lock_bh(&pcpu->lock);
+		hlist_nulls_for_each_entry(h, nn, &pcpu->unconfirmed, hnnode)
+			unhelp(h, me);
+		spin_unlock_bh(&pcpu->lock);
+	}
 	for (i = 0; i < net->ct.htable_size; i++) {
 		hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
 			unhelp(h, me);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index bb322d0..ee0a49a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1138,50 +1138,65 @@ static int ctnetlink_done_list(struct netlink_callback *cb)
 }
 
 static int
-ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb,
-		    struct hlist_nulls_head *list)
+ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb, bool dying)
 {
-	struct nf_conn *ct, *last;
+	struct nf_conn *ct, *last = NULL;
 	struct nf_conntrack_tuple_hash *h;
 	struct hlist_nulls_node *n;
 	struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
 	u_int8_t l3proto = nfmsg->nfgen_family;
 	int res;
+	int cpu;
+	struct hlist_nulls_head *list;
+	struct net *net = sock_net(skb->sk);
 
 	if (cb->args[2])
 		return 0;
 
-	spin_lock_bh(&nf_conntrack_lock);
-	last = (struct nf_conn *)cb->args[1];
-restart:
-	hlist_nulls_for_each_entry(h, n, list, hnnode) {
-		ct = nf_ct_tuplehash_to_ctrack(h);
-		if (l3proto && nf_ct_l3num(ct) != l3proto)
+	if (cb->args[0] == nr_cpu_ids)
+		return 0;
+
+	for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
+		struct ct_pcpu *pcpu;
+
+		if (!cpu_possible(cpu))
 			continue;
-		if (cb->args[1]) {
-			if (ct != last)
+
+		pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
+		spin_lock_bh(&pcpu->lock);
+		last = (struct nf_conn *)cb->args[1];
+		list = dying ? &pcpu->dying : &pcpu->unconfirmed;
+restart:
+		hlist_nulls_for_each_entry(h, n, list, hnnode) {
+			ct = nf_ct_tuplehash_to_ctrack(h);
+			if (l3proto && nf_ct_l3num(ct) != l3proto)
 				continue;
-			cb->args[1] = 0;
-		}
-		rcu_read_lock();
-		res = ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).portid,
-					  cb->nlh->nlmsg_seq,
-					  NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
-					  ct);
-		rcu_read_unlock();
-		if (res < 0) {
-			nf_conntrack_get(&ct->ct_general);
-			cb->args[1] = (unsigned long)ct;
-			goto out;
+			if (cb->args[1]) {
+				if (ct != last)
+					continue;
+				cb->args[1] = 0;
+			}
+			rcu_read_lock();
+			res = ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).portid,
+						  cb->nlh->nlmsg_seq,
+						  NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
+						  ct);
+			rcu_read_unlock();
+			if (res < 0) {
+				nf_conntrack_get(&ct->ct_general);
+				cb->args[1] = (unsigned long)ct;
+				spin_unlock_bh(&pcpu->lock);
+				goto out;
+			}
 		}
+		if (cb->args[1]) {
+			cb->args[1] = 0;
+			goto restart;
+		} else
+			cb->args[2] = 1;
+		spin_unlock_bh(&pcpu->lock);
 	}
-	if (cb->args[1]) {
-		cb->args[1] = 0;
-		goto restart;
-	} else
-		cb->args[2] = 1;
 out:
-	spin_unlock_bh(&nf_conntrack_lock);
 	if (last)
 		nf_ct_put(last);
 
@@ -1191,9 +1206,7 @@ out:
 static int
 ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct net *net = sock_net(skb->sk);
-
-	return ctnetlink_dump_list(skb, cb, &net->ct.dying);
+	return ctnetlink_dump_list(skb, cb, true);
 }
 
 static int
@@ -1215,9 +1228,7 @@ ctnetlink_get_ct_dying(struct sock *ctnl, struct sk_buff *skb,
 static int
 ctnetlink_dump_unconfirmed(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct net *net = sock_net(skb->sk);
-
-	return ctnetlink_dump_list(skb, cb, &net->ct.unconfirmed);
+	return ctnetlink_dump_list(skb, cb, false);
 }
 
 static int


* [net-next PATCH 3/5] netfilter: avoid race with exp->master ct
From: Jesper Dangaard Brouer @ 2014-02-27 16:41 UTC
  To: netdev, Eric Dumazet, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, David S. Miller, Florian Westphal

Preparation for disconnecting the nf_conntrack_lock from the
expectations code.  Once the nf_conntrack_lock is lifted, a race
condition is exposed.

The expectation's master conntrack, exp->master, can race with
delete operations, as the refcnt increment happens too late in
init_conntrack().  The race is against other CPUs invoking
->destroy() (destroy_conntrack()), or nf_ct_delete() (via timeout
or early_drop()).

Avoid this race in nf_ct_find_expectation() by using atomic_inc_not_zero()
and by checking nf_ct_is_dying() (which covers the nf_ct_delete() path).
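
In sketch form, the reference is now taken early in
nf_ct_find_expectation() and handed over to the new conntrack in
init_conntrack(), so the late nf_conntrack_get() there can be dropped:

 /* nf_ct_find_expectation(): take the ref early, or bail out */
 if (unlikely(nf_ct_is_dying(exp->master) ||
              !atomic_inc_not_zero(&exp->master->ct_general.use)))
         return NULL;

 /* init_conntrack(): the ref is already held, just store the pointer */
 ct->master = exp->master;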

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---

 net/netfilter/nf_conntrack_core.c   |    2 +-
 net/netfilter/nf_conntrack_expect.c |   16 +++++++++++++++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ac85fd1..a822720 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -898,6 +898,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 			 ct, exp);
 		/* Welcome, Mr. Bond.  We've been expecting you... */
 		__set_bit(IPS_EXPECTED_BIT, &ct->status);
+		/* exp->master safe, refcnt bumped in nf_ct_find_expectation */
 		ct->master = exp->master;
 		if (exp->helper) {
 			help = nf_ct_helper_ext_add(ct, exp->helper,
@@ -912,7 +913,6 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 #ifdef CONFIG_NF_CONNTRACK_SECMARK
 		ct->secmark = exp->master->secmark;
 #endif
-		nf_conntrack_get(&ct->master->ct_general);
 		NF_CT_STAT_INC(net, expect_new);
 	} else {
 		__nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC);
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 4fd1ca9..2c4ffdb 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -147,13 +147,27 @@ nf_ct_find_expectation(struct net *net, u16 zone,
 	if (!exp)
 		return NULL;
 
+	/* Avoid race with other CPUs, that for exp->master ct, is
+	 * about to invoke ->destroy(), or nf_ct_delete() via timeout
+	 * or early_drop().
+	 *
+	 * The atomic_inc_not_zero() check tells:  If that fails, we
+	 * know that the ct is being destroyed.  If it succeeds, we
+	 * can be sure the ct cannot disappear underneath.
+	 */
+	if (unlikely(nf_ct_is_dying(exp->master) ||
+		     !atomic_inc_not_zero(&exp->master->ct_general.use)))
+		return NULL;
+
 	/* If master is not in hash table yet (ie. packet hasn't left
 	   this machine yet), how can other end know about expected?
 	   Hence these are not the droids you are looking for (if
 	   master ct never got confirmed, we'd hold a reference to it
 	   and weird things would happen to future packets). */
-	if (!nf_ct_is_confirmed(exp->master))
+	if (!nf_ct_is_confirmed(exp->master)) {
+		atomic_dec(&exp->master->ct_general.use);
 		return NULL;
+	}
 
 	if (exp->flags & NF_CT_EXPECT_PERMANENT) {
 		atomic_inc(&exp->use);


* [net-next PATCH 4/5] netfilter: conntrack: separate expect locking from nf_conntrack_lock
From: Jesper Dangaard Brouer @ 2014-02-27 16:41 UTC
  To: netdev, Eric Dumazet, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, David S. Miller, Florian Westphal

Netfilter expectations are protected with the same lock as conntrack
entries (nf_conntrack_lock).  This patch splits out expectation locking
to use its own lock (nf_conntrack_expect_lock), as sketched below.
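
A sketch of the resulting hot path in init_conntrack() (as implemented
in the hunks below): the new lock is only taken when expectations
actually exist, so the common case avoids it entirely:

 local_bh_disable();
 if (net->ct.expect_count) {
         spin_lock(&nf_conntrack_expect_lock);
         exp = nf_ct_find_expectation(net, zone, tuple);
         /* ... set up the expected conntrack from exp ... */
         spin_unlock(&nf_conntrack_expect_lock);
 }
 if (!exp)
         __nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC);
 local_bh_enable();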

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/netfilter/nf_conntrack_core.h |    2 +
 net/netfilter/nf_conntrack_core.c         |   61 ++++++++++++++++-------------
 net/netfilter/nf_conntrack_expect.c       |   20 +++++-----
 net/netfilter/nf_conntrack_h323_main.c    |    4 +-
 net/netfilter/nf_conntrack_helper.c       |   14 ++++---
 net/netfilter/nf_conntrack_netlink.c      |   32 ++++++++-------
 net/netfilter/nf_conntrack_sip.c          |    8 ++--
 7 files changed, 76 insertions(+), 65 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 15308b8..d12a631 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -79,4 +79,6 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
 
 extern spinlock_t nf_conntrack_lock ;
 
+extern spinlock_t nf_conntrack_expect_lock;
+
 #endif /* _NF_CONNTRACK_CORE_H */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index a822720..6ed5dec 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -63,6 +63,9 @@ EXPORT_SYMBOL_GPL(nfnetlink_parse_nat_setup_hook);
 DEFINE_SPINLOCK(nf_conntrack_lock);
 EXPORT_SYMBOL_GPL(nf_conntrack_lock);
 
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock);
+EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
+
 unsigned int nf_conntrack_htable_size __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
@@ -244,9 +247,6 @@ destroy_conntrack(struct nf_conntrack *nfct)
 	NF_CT_ASSERT(atomic_read(&nfct->use) == 0);
 	NF_CT_ASSERT(!timer_pending(&ct->timeout));
 
-	/* To make sure we don't get any weird locking issues here:
-	 * destroy_conntrack() MUST NOT be called with a write lock
-	 * to nf_conntrack_lock!!! -HW */
 	rcu_read_lock();
 	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
 	if (l4proto && l4proto->destroy)
@@ -254,17 +254,18 @@ destroy_conntrack(struct nf_conntrack *nfct)
 
 	rcu_read_unlock();
 
-	spin_lock_bh(&nf_conntrack_lock);
+	local_bh_disable();
 	/* Expectations will have been removed in clean_from_lists,
 	 * except TFTP can create an expectation on the first packet,
 	 * before connection is in the list, so we need to clean here,
-	 * too. */
+	 * too.
+	 */
 	nf_ct_remove_expectations(ct);
 
 	nf_ct_del_from_dying_or_unconfirmed_list(ct);
 
 	NF_CT_STAT_INC(net, delete);
-	spin_unlock_bh(&nf_conntrack_lock);
+	local_bh_enable();
 
 	if (ct->master)
 		nf_ct_put(ct->master);
@@ -847,7 +848,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	struct nf_conn_help *help;
 	struct nf_conntrack_tuple repl_tuple;
 	struct nf_conntrack_ecache *ecache;
-	struct nf_conntrack_expect *exp;
+	struct nf_conntrack_expect *exp = NULL;
 	u16 zone = tmpl ? nf_ct_zone(tmpl) : NF_CT_DEFAULT_ZONE;
 	struct nf_conn_timeout *timeout_ext;
 	unsigned int *timeouts;
@@ -891,30 +892,35 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 				 ecache ? ecache->expmask : 0,
 			     GFP_ATOMIC);
 
-	spin_lock_bh(&nf_conntrack_lock);
-	exp = nf_ct_find_expectation(net, zone, tuple);
-	if (exp) {
-		pr_debug("conntrack: expectation arrives ct=%p exp=%p\n",
-			 ct, exp);
-		/* Welcome, Mr. Bond.  We've been expecting you... */
-		__set_bit(IPS_EXPECTED_BIT, &ct->status);
-		/* exp->master safe, refcnt bumped in nf_ct_find_expectation */
-		ct->master = exp->master;
-		if (exp->helper) {
-			help = nf_ct_helper_ext_add(ct, exp->helper,
-						    GFP_ATOMIC);
-			if (help)
-				rcu_assign_pointer(help->helper, exp->helper);
-		}
+	local_bh_disable();
+	if (net->ct.expect_count) {
+		spin_lock(&nf_conntrack_expect_lock);
+		exp = nf_ct_find_expectation(net, zone, tuple);
+		if (exp) {
+			pr_debug("conntrack: expectation arrives ct=%p exp=%p\n",
+				 ct, exp);
+			/* Welcome, Mr. Bond.  We've been expecting you... */
+			__set_bit(IPS_EXPECTED_BIT, &ct->status);
+			/* exp->master safe, refcnt bumped in nf_ct_find_expectation */
+			ct->master = exp->master;
+			if (exp->helper) {
+				help = nf_ct_helper_ext_add(ct, exp->helper,
+							    GFP_ATOMIC);
+				if (help)
+					rcu_assign_pointer(help->helper, exp->helper);
+			}
 
 #ifdef CONFIG_NF_CONNTRACK_MARK
-		ct->mark = exp->master->mark;
+			ct->mark = exp->master->mark;
 #endif
 #ifdef CONFIG_NF_CONNTRACK_SECMARK
-		ct->secmark = exp->master->secmark;
+			ct->secmark = exp->master->secmark;
 #endif
-		NF_CT_STAT_INC(net, expect_new);
-	} else {
+			NF_CT_STAT_INC(net, expect_new);
+		}
+		spin_unlock(&nf_conntrack_expect_lock);
+	}
+	if (!exp) {
 		__nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC);
 		NF_CT_STAT_INC(net, new);
 	}
@@ -922,11 +928,10 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	/* Now it is inserted into the unconfirmed list, bump refcount */
 	nf_conntrack_get(&ct->ct_general);
 
-	spin_unlock_bh(&nf_conntrack_lock);
+	local_bh_enable();
 
 	nf_ct_add_to_unconfirmed_list(ct);
 
-
 	if (exp) {
 		if (exp->expectfn)
 			exp->expectfn(ct, exp);
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 2c4ffdb..f50e4c8 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -66,9 +66,9 @@ static void nf_ct_expectation_timed_out(unsigned long ul_expect)
 {
 	struct nf_conntrack_expect *exp = (void *)ul_expect;
 
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	nf_ct_unlink_expect(exp);
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 	nf_ct_expect_put(exp);
 }
 
@@ -191,12 +191,14 @@ void nf_ct_remove_expectations(struct nf_conn *ct)
 	if (!help)
 		return;
 
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	hlist_for_each_entry_safe(exp, next, &help->expectations, lnode) {
 		if (del_timer(&exp->timeout)) {
 			nf_ct_unlink_expect(exp);
 			nf_ct_expect_put(exp);
 		}
 	}
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 }
 EXPORT_SYMBOL_GPL(nf_ct_remove_expectations);
 
@@ -231,12 +233,12 @@ static inline int expect_matches(const struct nf_conntrack_expect *a,
 /* Generally a bad idea to call this: could have matched already. */
 void nf_ct_unexpect_related(struct nf_conntrack_expect *exp)
 {
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	if (del_timer(&exp->timeout)) {
 		nf_ct_unlink_expect(exp);
 		nf_ct_expect_put(exp);
 	}
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 }
 EXPORT_SYMBOL_GPL(nf_ct_unexpect_related);
 
@@ -349,7 +351,7 @@ static int nf_ct_expect_insert(struct nf_conntrack_expect *exp)
 	setup_timer(&exp->timeout, nf_ct_expectation_timed_out,
 		    (unsigned long)exp);
 	helper = rcu_dereference_protected(master_help->helper,
-					   lockdep_is_held(&nf_conntrack_lock));
+					   lockdep_is_held(&nf_conntrack_expect_lock));
 	if (helper) {
 		exp->timeout.expires = jiffies +
 			helper->expect_policy[exp->class].timeout * HZ;
@@ -409,7 +411,7 @@ static inline int __nf_ct_expect_check(struct nf_conntrack_expect *expect)
 	}
 	/* Will be over limit? */
 	helper = rcu_dereference_protected(master_help->helper,
-					   lockdep_is_held(&nf_conntrack_lock));
+					   lockdep_is_held(&nf_conntrack_expect_lock));
 	if (helper) {
 		p = &helper->expect_policy[expect->class];
 		if (p->max_expected &&
@@ -436,7 +438,7 @@ int nf_ct_expect_related_report(struct nf_conntrack_expect *expect,
 {
 	int ret;
 
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	ret = __nf_ct_expect_check(expect);
 	if (ret <= 0)
 		goto out;
@@ -444,11 +446,11 @@ int nf_ct_expect_related_report(struct nf_conntrack_expect *expect,
 	ret = nf_ct_expect_insert(expect);
 	if (ret < 0)
 		goto out;
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 	nf_ct_expect_event_report(IPEXP_NEW, expect, portid, report);
 	return ret;
 out:
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(nf_ct_expect_related_report);
diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c
index 70866d1..3a3a60b 100644
--- a/net/netfilter/nf_conntrack_h323_main.c
+++ b/net/netfilter/nf_conntrack_h323_main.c
@@ -1476,7 +1476,7 @@ static int process_rcf(struct sk_buff *skb, struct nf_conn *ct,
 		nf_ct_refresh(ct, skb, info->timeout * HZ);
 
 		/* Set expect timeout */
-		spin_lock_bh(&nf_conntrack_lock);
+		spin_lock_bh(&nf_conntrack_expect_lock);
 		exp = find_expect(ct, &ct->tuplehash[dir].tuple.dst.u3,
 				  info->sig_port[!dir]);
 		if (exp) {
@@ -1486,7 +1486,7 @@ static int process_rcf(struct sk_buff *skb, struct nf_conn *ct,
 			nf_ct_dump_tuple(&exp->tuple);
 			set_expect_timeout(exp, info->timeout);
 		}
-		spin_unlock_bh(&nf_conntrack_lock);
+		spin_unlock_bh(&nf_conntrack_expect_lock);
 	}
 
 	return 0;
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 27d9302..608f449 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -258,7 +258,7 @@ static inline int unhelp(struct nf_conntrack_tuple_hash *i,
 
 	if (help && rcu_dereference_protected(
 			help->helper,
-			lockdep_is_held(&nf_conntrack_lock)
+			lockdep_is_held(&nf_conntrack_expect_lock)
 			) == me) {
 		nf_conntrack_event(IPCT_HELPER, ct);
 		RCU_INIT_POINTER(help->helper, NULL);
@@ -284,17 +284,17 @@ static LIST_HEAD(nf_ct_helper_expectfn_list);
 
 void nf_ct_helper_expectfn_register(struct nf_ct_helper_expectfn *n)
 {
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	list_add_rcu(&n->head, &nf_ct_helper_expectfn_list);
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_register);
 
 void nf_ct_helper_expectfn_unregister(struct nf_ct_helper_expectfn *n)
 {
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	list_del_rcu(&n->head);
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_unregister);
 
@@ -399,13 +399,14 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
 	int cpu;
 
 	/* Get rid of expectations */
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	for (i = 0; i < nf_ct_expect_hsize; i++) {
 		hlist_for_each_entry_safe(exp, next,
 					  &net->ct.expect_hash[i], hnode) {
 			struct nf_conn_help *help = nfct_help(exp->master);
 			if ((rcu_dereference_protected(
 					help->helper,
-					lockdep_is_held(&nf_conntrack_lock)
+					lockdep_is_held(&nf_conntrack_expect_lock)
 					) == me || exp->helper == me) &&
 			    del_timer(&exp->timeout)) {
 				nf_ct_unlink_expect(exp);
@@ -413,6 +414,7 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
 			}
 		}
 	}
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 
 	/* Get rid of expecteds, set helpers to NULL. */
 	for_each_possible_cpu(cpu) {
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index ee0a49a..7a9b936 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1377,14 +1377,14 @@ ctnetlink_change_helper(struct nf_conn *ct, const struct nlattr * const cda[])
 					    nf_ct_protonum(ct));
 	if (helper == NULL) {
 #ifdef CONFIG_MODULES
-		spin_unlock_bh(&nf_conntrack_lock);
+		spin_unlock_bh(&nf_conntrack_expect_lock);
 
 		if (request_module("nfct-helper-%s", helpname) < 0) {
-			spin_lock_bh(&nf_conntrack_lock);
+			spin_lock_bh(&nf_conntrack_expect_lock);
 			return -EOPNOTSUPP;
 		}
 
-		spin_lock_bh(&nf_conntrack_lock);
+		spin_lock_bh(&nf_conntrack_expect_lock);
 		helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
 						    nf_ct_protonum(ct));
 		if (helper)
@@ -1822,9 +1822,9 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
 	err = -EEXIST;
 	ct = nf_ct_tuplehash_to_ctrack(h);
 	if (!(nlh->nlmsg_flags & NLM_F_EXCL)) {
-		spin_lock_bh(&nf_conntrack_lock);
+		spin_lock_bh(&nf_conntrack_expect_lock);
 		err = ctnetlink_change_conntrack(ct, cda);
-		spin_unlock_bh(&nf_conntrack_lock);
+		spin_unlock_bh(&nf_conntrack_expect_lock);
 		if (err == 0) {
 			nf_conntrack_eventmask_report((1 << IPCT_REPLY) |
 						      (1 << IPCT_ASSURED) |
@@ -2153,9 +2153,9 @@ ctnetlink_nfqueue_parse(const struct nlattr *attr, struct nf_conn *ct)
 	if (ret < 0)
 		return ret;
 
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	ret = ctnetlink_nfqueue_parse_ct((const struct nlattr **)cda, ct);
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 
 	return ret;
 }
@@ -2710,13 +2710,13 @@ ctnetlink_del_expect(struct sock *ctnl, struct sk_buff *skb,
 		}
 
 		/* after list removal, usage count == 1 */
-		spin_lock_bh(&nf_conntrack_lock);
+		spin_lock_bh(&nf_conntrack_expect_lock);
 		if (del_timer(&exp->timeout)) {
 			nf_ct_unlink_expect_report(exp, NETLINK_CB(skb).portid,
 						   nlmsg_report(nlh));
 			nf_ct_expect_put(exp);
 		}
-		spin_unlock_bh(&nf_conntrack_lock);
+		spin_unlock_bh(&nf_conntrack_expect_lock);
 		/* have to put what we 'get' above.
 		 * after this line usage count == 0 */
 		nf_ct_expect_put(exp);
@@ -2725,7 +2725,7 @@ ctnetlink_del_expect(struct sock *ctnl, struct sk_buff *skb,
 		struct nf_conn_help *m_help;
 
 		/* delete all expectations for this helper */
-		spin_lock_bh(&nf_conntrack_lock);
+		spin_lock_bh(&nf_conntrack_expect_lock);
 		for (i = 0; i < nf_ct_expect_hsize; i++) {
 			hlist_for_each_entry_safe(exp, next,
 						  &net->ct.expect_hash[i],
@@ -2740,10 +2740,10 @@ ctnetlink_del_expect(struct sock *ctnl, struct sk_buff *skb,
 				}
 			}
 		}
-		spin_unlock_bh(&nf_conntrack_lock);
+		spin_unlock_bh(&nf_conntrack_expect_lock);
 	} else {
 		/* This basically means we have to flush everything*/
-		spin_lock_bh(&nf_conntrack_lock);
+		spin_lock_bh(&nf_conntrack_expect_lock);
 		for (i = 0; i < nf_ct_expect_hsize; i++) {
 			hlist_for_each_entry_safe(exp, next,
 						  &net->ct.expect_hash[i],
@@ -2756,7 +2756,7 @@ ctnetlink_del_expect(struct sock *ctnl, struct sk_buff *skb,
 				}
 			}
 		}
-		spin_unlock_bh(&nf_conntrack_lock);
+		spin_unlock_bh(&nf_conntrack_expect_lock);
 	}
 
 	return 0;
@@ -2982,11 +2982,11 @@ ctnetlink_new_expect(struct sock *ctnl, struct sk_buff *skb,
 	if (err < 0)
 		return err;
 
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	exp = __nf_ct_expect_find(net, zone, &tuple);
 
 	if (!exp) {
-		spin_unlock_bh(&nf_conntrack_lock);
+		spin_unlock_bh(&nf_conntrack_expect_lock);
 		err = -ENOENT;
 		if (nlh->nlmsg_flags & NLM_F_CREATE) {
 			err = ctnetlink_create_expect(net, zone, cda,
@@ -3000,7 +3000,7 @@ ctnetlink_new_expect(struct sock *ctnl, struct sk_buff *skb,
 	err = -EEXIST;
 	if (!(nlh->nlmsg_flags & NLM_F_EXCL))
 		err = ctnetlink_change_expect(exp, cda);
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 
 	return err;
 }
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 466410e..4c3ba1c 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -800,7 +800,7 @@ static int refresh_signalling_expectation(struct nf_conn *ct,
 	struct hlist_node *next;
 	int found = 0;
 
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	hlist_for_each_entry_safe(exp, next, &help->expectations, lnode) {
 		if (exp->class != SIP_EXPECT_SIGNALLING ||
 		    !nf_inet_addr_cmp(&exp->tuple.dst.u3, addr) ||
@@ -815,7 +815,7 @@ static int refresh_signalling_expectation(struct nf_conn *ct,
 		found = 1;
 		break;
 	}
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 	return found;
 }
 
@@ -825,7 +825,7 @@ static void flush_expectations(struct nf_conn *ct, bool media)
 	struct nf_conntrack_expect *exp;
 	struct hlist_node *next;
 
-	spin_lock_bh(&nf_conntrack_lock);
+	spin_lock_bh(&nf_conntrack_expect_lock);
 	hlist_for_each_entry_safe(exp, next, &help->expectations, lnode) {
 		if ((exp->class != SIP_EXPECT_SIGNALLING) ^ media)
 			continue;
@@ -836,7 +836,7 @@ static void flush_expectations(struct nf_conn *ct, bool media)
 		if (!media)
 			break;
 	}
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock_bh(&nf_conntrack_expect_lock);
 }
 
 static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff,


* [net-next PATCH 5/5] netfilter: conntrack: remove central spinlock nf_conntrack_lock
From: Jesper Dangaard Brouer @ 2014-02-27 16:41 UTC
  To: netdev, Eric Dumazet, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, David S. Miller, Florian Westphal

nf_conntrack_lock is a monolithic lock and suffers from huge contention
on current generation servers (8 or more cores/threads).

The lock contention is clear in perf on the base kernel:

-  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
   - _raw_spin_lock_bh
      + 25.33% init_conntrack
      + 24.86% nf_ct_delete_from_lists
      + 24.62% __nf_conntrack_confirm
      + 24.38% destroy_conntrack
      + 0.70% tcp_packet
+   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
+   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
+   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
+   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
+   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table

This patch changes the conntrack locking and provides a huge performance
improvement.  SYN-flood attack tested on a 24-core E5-2695v2(ES) with
10Gbit/s ixgbe (with the tool trafgen):

 Base kernel:   810.405 new conntrack/sec
 After patch: 2.233.876 new conntrack/sec

Note that other flood attacks (SYN+ACK or ACK) can easily be deflected using:
 # iptables -A INPUT -m state --state INVALID -j DROP
 # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

Use an array of hashed spinlocks to protect insertions/deletions of
conntracks into the hash table. 1024 spinlocks seem to give good
results, at minimal cost (4KB of memory). Due to the lockdep max depth,
1024 becomes 8 if CONFIG_LOCKDEP=y.

The hash resize is a bit tricky, because we need to take all locks in
the array. A seqcount_t is used to synchronize the hash table users
with the resizing process; see the sketch below.
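
In sketch form, every insert/delete path recomputes both hashes under
the seqcount and retries if a resize ran in between (the tuple
arguments are abbreviated here):

 do {
         sequence = read_seqcount_begin(&net->ct.generation);
         hash = hash_conntrack(net, zone, orig_tuple);
         reply_hash = hash_conntrack(net, zone, reply_tuple);
 } while (nf_conntrack_double_lock(net, hash, reply_hash, sequence));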

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/netfilter/nf_conntrack_core.h |    7 +
 include/net/netns/conntrack.h             |    2 
 net/netfilter/nf_conntrack_core.c         |  219 +++++++++++++++++++++--------
 net/netfilter/nf_conntrack_helper.c       |   12 +-
 net/netfilter/nf_conntrack_netlink.c      |   15 ++
 5 files changed, 188 insertions(+), 67 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index d12a631..cc0c188 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -77,7 +77,12 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
             const struct nf_conntrack_l3proto *l3proto,
             const struct nf_conntrack_l4proto *proto);
 
-extern spinlock_t nf_conntrack_lock ;
+#ifdef CONFIG_LOCKDEP
+# define CONNTRACK_LOCKS 8
+#else
+# define CONNTRACK_LOCKS 1024
+#endif
+extern spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS];
 
 extern spinlock_t nf_conntrack_expect_lock;
 
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index c6a8994..773cce3 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -5,6 +5,7 @@
 #include <linux/list_nulls.h>
 #include <linux/atomic.h>
 #include <linux/netfilter/nf_conntrack_tcp.h>
+#include <linux/seqlock.h>
 
 struct ctl_table_header;
 struct nf_conntrack_ecache;
@@ -90,6 +91,7 @@ struct netns_ct {
 	int			sysctl_checksum;
 
 	unsigned int		htable_size;
+	seqcount_t		generation;
 	struct kmem_cache	*nf_conntrack_cachep;
 	struct hlist_nulls_head	*hash;
 	struct hlist_head	*expect_hash;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 6ed5dec..64c1f1a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -60,12 +60,60 @@ int (*nfnetlink_parse_nat_setup_hook)(struct nf_conn *ct,
 				      const struct nlattr *attr) __read_mostly;
 EXPORT_SYMBOL_GPL(nfnetlink_parse_nat_setup_hook);
 
-DEFINE_SPINLOCK(nf_conntrack_lock);
-EXPORT_SYMBOL_GPL(nf_conntrack_lock);
+__cacheline_aligned_in_smp spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS];
+EXPORT_SYMBOL_GPL(nf_conntrack_locks);
 
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock);
 EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
 
+static void nf_conntrack_double_unlock(unsigned int h1, unsigned int h2)
+{
+	h1 %= CONNTRACK_LOCKS;
+	h2 %= CONNTRACK_LOCKS;
+	spin_unlock(&nf_conntrack_locks[h1]);
+	if (h1 != h2)
+		spin_unlock(&nf_conntrack_locks[h2]);
+}
+
+/* return true if we need to recompute hashes (in case hash table was resized) */
+static bool nf_conntrack_double_lock(struct net *net, unsigned int h1,
+				     unsigned int h2, unsigned int sequence)
+{
+	h1 %= CONNTRACK_LOCKS;
+	h2 %= CONNTRACK_LOCKS;
+	if (h1 <= h2) {
+		spin_lock(&nf_conntrack_locks[h1]);
+		if (h1 != h2)
+			spin_lock_nested(&nf_conntrack_locks[h2],
+					 SINGLE_DEPTH_NESTING);
+	} else {
+		spin_lock(&nf_conntrack_locks[h2]);
+		spin_lock_nested(&nf_conntrack_locks[h1],
+				 SINGLE_DEPTH_NESTING);
+	}
+	if (read_seqcount_retry(&net->ct.generation, sequence)) {
+		nf_conntrack_double_unlock(h1, h2);
+		return true;
+	}
+	return false;
+}
+
+static void nf_conntrack_all_lock(void)
+{
+	int i;
+
+	for (i = 0; i < CONNTRACK_LOCKS; i++)
+		spin_lock_nested(&nf_conntrack_locks[i], i);
+}
+
+static void nf_conntrack_all_unlock(void)
+{
+	int i;
+
+	for (i = 0; i < CONNTRACK_LOCKS; i++)
+		spin_unlock(&nf_conntrack_locks[i]);
+}
+
 unsigned int nf_conntrack_htable_size __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
@@ -277,15 +325,28 @@ destroy_conntrack(struct nf_conntrack *nfct)
 static void nf_ct_delete_from_lists(struct nf_conn *ct)
 {
 	struct net *net = nf_ct_net(ct);
+	unsigned int hash, reply_hash;
+	u16 zone = nf_ct_zone(ct);
+	unsigned int sequence;
 
 	nf_ct_helper_destroy(ct);
-	spin_lock_bh(&nf_conntrack_lock);
-	/* Inside lock so preempt is disabled on module removal path.
-	 * Otherwise we can get spurious warnings. */
-	NF_CT_STAT_INC(net, delete_list);
+
+	local_bh_disable();
+	do {
+		sequence = read_seqcount_begin(&net->ct.generation);
+		hash = hash_conntrack(net, zone,
+				      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+		reply_hash = hash_conntrack(net, zone,
+					   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+	} while (nf_conntrack_double_lock(net, hash, reply_hash, sequence));
+
 	clean_from_lists(ct);
+	nf_conntrack_double_unlock(hash, reply_hash);
+
 	nf_ct_add_to_dying_list(ct);
-	spin_unlock_bh(&nf_conntrack_lock);
+
+	NF_CT_STAT_INC(net, delete_list);
+	local_bh_enable();
 }
 
 static void death_by_event(unsigned long ul_conntrack)
@@ -369,8 +430,6 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
  * Warning :
  * - Caller must take a reference on returned object
  *   and recheck nf_ct_tuple_equal(tuple, &h->tuple)
- * OR
- * - Caller must lock nf_conntrack_lock before calling this function
  */
 static struct nf_conntrack_tuple_hash *
 ____nf_conntrack_find(struct net *net, u16 zone,
@@ -464,14 +523,18 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 	struct nf_conntrack_tuple_hash *h;
 	struct hlist_nulls_node *n;
 	u16 zone;
+	unsigned int sequence;
 
 	zone = nf_ct_zone(ct);
-	hash = hash_conntrack(net, zone,
-			      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
-	reply_hash = hash_conntrack(net, zone,
-				   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
 
-	spin_lock_bh(&nf_conntrack_lock);
+	local_bh_disable();
+	do {
+		sequence = read_seqcount_begin(&net->ct.generation);
+		hash = hash_conntrack(net, zone,
+				      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+		reply_hash = hash_conntrack(net, zone,
+					   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+	} while (nf_conntrack_double_lock(net, hash, reply_hash, sequence));
 
 	/* See if there's one in the list already, including reverse */
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
@@ -490,14 +553,15 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 	/* The caller holds a reference to this object */
 	atomic_set(&ct->ct_general.use, 2);
 	__nf_conntrack_hash_insert(ct, hash, reply_hash);
+	nf_conntrack_double_unlock(hash, reply_hash);
 	NF_CT_STAT_INC(net, insert);
-	spin_unlock_bh(&nf_conntrack_lock);
-
+	local_bh_enable();
 	return 0;
 
 out:
+	nf_conntrack_double_unlock(hash, reply_hash);
 	NF_CT_STAT_INC(net, insert_failed);
-	spin_unlock_bh(&nf_conntrack_lock);
+	local_bh_enable();
 	return -EEXIST;
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_hash_check_insert);
@@ -536,6 +600,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	enum ip_conntrack_info ctinfo;
 	struct net *net;
 	u16 zone;
+	unsigned int sequence;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	net = nf_ct_net(ct);
@@ -548,31 +613,37 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		return NF_ACCEPT;
 
 	zone = nf_ct_zone(ct);
-	/* reuse the hash saved before */
-	hash = *(unsigned long *)&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev;
-	hash = hash_bucket(hash, net);
-	reply_hash = hash_conntrack(net, zone,
-				   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+	local_bh_disable();
+
+	do {
+		sequence = read_seqcount_begin(&net->ct.generation);
+		/* reuse the hash saved before */
+		hash = *(unsigned long *)&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev;
+		hash = hash_bucket(hash, net);
+		reply_hash = hash_conntrack(net, zone,
+					   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+
+	} while (nf_conntrack_double_lock(net, hash, reply_hash, sequence));
 
 	/* We're not in hash table, and we refuse to set up related
-	   connections for unconfirmed conns.  But packet copies and
-	   REJECT will give spurious warnings here. */
+	 * connections for unconfirmed conns.  But packet copies and
+	 * REJECT will give spurious warnings here.
+	 */
 	/* NF_CT_ASSERT(atomic_read(&ct->ct_general.use) == 1); */
 
 	/* No external references means no one else could have
-	   confirmed us. */
+	 * confirmed us.
+	 */
 	NF_CT_ASSERT(!nf_ct_is_confirmed(ct));
 	pr_debug("Confirming conntrack %p\n", ct);
-
-	spin_lock_bh(&nf_conntrack_lock);
-
 	/* We have to check the DYING flag inside the lock to prevent
 	   a race against nf_ct_get_next_corpse() possibly called from
 	   user context, else we insert an already 'dead' hash, blocking
 	   further use of that particular connection -JM */
 
 	if (unlikely(nf_ct_is_dying(ct))) {
-		spin_unlock_bh(&nf_conntrack_lock);
+		nf_conntrack_double_unlock(hash, reply_hash);
+		local_bh_enable();
 		return NF_ACCEPT;
 	}
 
@@ -614,8 +685,9 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	 * stores are visible.
 	 */
 	__nf_conntrack_hash_insert(ct, hash, reply_hash);
+	nf_conntrack_double_unlock(hash, reply_hash);
 	NF_CT_STAT_INC(net, insert);
-	spin_unlock_bh(&nf_conntrack_lock);
+	local_bh_enable();
 
 	help = nfct_help(ct);
 	if (help && help->helper)
@@ -626,8 +698,9 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	return NF_ACCEPT;
 
 out:
+	nf_conntrack_double_unlock(hash, reply_hash);
 	NF_CT_STAT_INC(net, insert_failed);
-	spin_unlock_bh(&nf_conntrack_lock);
+	local_bh_enable();
 	return NF_DROP;
 }
 EXPORT_SYMBOL_GPL(__nf_conntrack_confirm);
@@ -670,39 +743,48 @@ EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
 
 /* There's a small race here where we may free a just-assured
    connection.  Too bad: we're in trouble anyway. */
-static noinline int early_drop(struct net *net, unsigned int hash)
+static noinline int early_drop(struct net *net, unsigned int _hash)
 {
 	/* Use oldest entry, which is roughly LRU */
 	struct nf_conntrack_tuple_hash *h;
 	struct nf_conn *ct = NULL, *tmp;
 	struct hlist_nulls_node *n;
-	unsigned int i, cnt = 0;
+	unsigned int i = 0, cnt = 0;
 	int dropped = 0;
+	unsigned int hash, sequence;
+	spinlock_t *lockp;
 
-	rcu_read_lock();
-	for (i = 0; i < net->ct.htable_size; i++) {
+	local_bh_disable();
+restart:
+	sequence = read_seqcount_begin(&net->ct.generation);
+	hash = hash_bucket(_hash, net);
+	for (; i < net->ct.htable_size; i++) {
+		lockp = &nf_conntrack_locks[hash % CONNTRACK_LOCKS];
+		spin_lock(lockp);
+		if (read_seqcount_retry(&net->ct.generation, sequence)) {
+			spin_unlock(lockp);
+			goto restart;
+		}
 		hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash],
 					 hnnode) {
 			tmp = nf_ct_tuplehash_to_ctrack(h);
-			if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
+			if (!test_bit(IPS_ASSURED_BIT, &tmp->status) &&
+			    !nf_ct_is_dying(tmp) &&
+			    atomic_inc_not_zero(&tmp->ct_general.use)) {
 				ct = tmp;
+				break;
+			}
 			cnt++;
 		}
 
-		if (ct != NULL) {
-			if (likely(!nf_ct_is_dying(ct) &&
-				   atomic_inc_not_zero(&ct->ct_general.use)))
-				break;
-			else
-				ct = NULL;
-		}
+		hash = (hash + 1) % net->ct.htable_size;
+		spin_unlock(lockp);
 
-		if (cnt >= NF_CT_EVICTION_RANGE)
+		if (ct || cnt >= NF_CT_EVICTION_RANGE)
 			break;
 
-		hash = (hash + 1) % net->ct.htable_size;
 	}
-	rcu_read_unlock();
+	local_bh_enable();
 
 	if (!ct)
 		return dropped;
@@ -751,7 +833,7 @@ __nf_conntrack_alloc(struct net *net, u16 zone,
 
 	if (nf_conntrack_max &&
 	    unlikely(atomic_read(&net->ct.count) > nf_conntrack_max)) {
-		if (!early_drop(net, hash_bucket(hash, net))) {
+		if (!early_drop(net, hash)) {
 			atomic_dec(&net->ct.count);
 			net_warn_ratelimited("nf_conntrack: table full, dropping packet\n");
 			return ERR_PTR(-ENOMEM);
@@ -1301,18 +1383,24 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 	struct nf_conn *ct;
 	struct hlist_nulls_node *n;
 	int cpu;
+	spinlock_t *lockp;
 
-	spin_lock_bh(&nf_conntrack_lock);
 	for (; *bucket < net->ct.htable_size; (*bucket)++) {
-		hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], hnnode) {
-			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
-				continue;
-			ct = nf_ct_tuplehash_to_ctrack(h);
-			if (iter(ct, data))
-				goto found;
+		lockp = &nf_conntrack_locks[*bucket % CONNTRACK_LOCKS];
+		local_bh_disable();
+		spin_lock(lockp);
+		if (*bucket < net->ct.htable_size) {
+			hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], hnnode) {
+				if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
+					continue;
+				ct = nf_ct_tuplehash_to_ctrack(h);
+				if (iter(ct, data))
+					goto found;
+			}
 		}
+		spin_unlock(lockp);
+		local_bh_enable();
 	}
-	spin_unlock_bh(&nf_conntrack_lock);
 
 	for_each_possible_cpu(cpu) {
 		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
@@ -1328,7 +1416,8 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 	return NULL;
 found:
 	atomic_inc(&ct->ct_general.use);
-	spin_unlock_bh(&nf_conntrack_lock);
+	spin_unlock(lockp);
+	local_bh_enable();
 	return ct;
 }
 
@@ -1529,12 +1618,16 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 	if (!hash)
 		return -ENOMEM;
 
+	local_bh_disable();
+	nf_conntrack_all_lock();
+	write_seqcount_begin(&init_net.ct.generation);
+
 	/* Lookups in the old hash might happen in parallel, which means we
 	 * might get false negatives during connection lookup. New connections
 	 * created because of a false negative won't make it into the hash
-	 * though since that required taking the lock.
+	 * though since that required taking the locks.
 	 */
-	spin_lock_bh(&nf_conntrack_lock);
+
 	for (i = 0; i < init_net.ct.htable_size; i++) {
 		while (!hlist_nulls_empty(&init_net.ct.hash[i])) {
 			h = hlist_nulls_entry(init_net.ct.hash[i].first,
@@ -1551,7 +1644,10 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 
 	init_net.ct.htable_size = nf_conntrack_htable_size = hashsize;
 	init_net.ct.hash = hash;
-	spin_unlock_bh(&nf_conntrack_lock);
+
+	write_seqcount_end(&init_net.ct.generation);
+	nf_conntrack_all_unlock();
+	local_bh_enable();
 
 	nf_ct_free_hashtable(old_hash, old_size);
 	return 0;
@@ -1573,7 +1669,10 @@ EXPORT_SYMBOL_GPL(nf_ct_untracked_status_or);
 int nf_conntrack_init_start(void)
 {
 	int max_factor = 8;
-	int ret, cpu;
+	int i, ret, cpu;
+
+	for (i = 0; i < ARRAY_SIZE(nf_conntrack_locks); i++)
+		spin_lock_init(&nf_conntrack_locks[i]);
 
 	/* Idea from tcp.c: use 1/16384 of memory.  On i386: 32MB
 	 * machine has 512 buckets. >= 1GB machines have 16384 buckets. */
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 608f449..38e491c 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -425,10 +425,16 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
 			unhelp(h, me);
 		spin_unlock_bh(&pcpu->lock);
 	}
+	local_bh_disable();
 	for (i = 0; i < net->ct.htable_size; i++) {
-		hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
-			unhelp(h, me);
+		spin_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+		if (i < net->ct.htable_size) {
+			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
+				unhelp(h, me);
+		}
+		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
 	}
+	local_bh_enable();
 }
 
 void nf_conntrack_helper_unregister(struct nf_conntrack_helper *me)
@@ -446,10 +452,8 @@ void nf_conntrack_helper_unregister(struct nf_conntrack_helper *me)
 	synchronize_rcu();
 
 	rtnl_lock();
-	spin_lock_bh(&nf_conntrack_lock);
 	for_each_net(net)
 		__nf_conntrack_helper_unregister(me, net);
-	spin_unlock_bh(&nf_conntrack_lock);
 	rtnl_unlock();
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_unregister);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 7a9b936..17badeb 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -764,14 +764,23 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
 	u_int8_t l3proto = nfmsg->nfgen_family;
 	int res;
+	spinlock_t *lockp;
+
 #ifdef CONFIG_NF_CONNTRACK_MARK
 	const struct ctnetlink_dump_filter *filter = cb->data;
 #endif
 
-	spin_lock_bh(&nf_conntrack_lock);
 	last = (struct nf_conn *)cb->args[1];
+
+	local_bh_disable();
 	for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) {
 restart:
+		lockp = &nf_conntrack_locks[cb->args[0] % CONNTRACK_LOCKS];
+		spin_lock(lockp);
+		if (cb->args[0] >= net->ct.htable_size) {
+			spin_unlock(lockp);
+			goto out;
+		}
 		hlist_nulls_for_each_entry(h, n, &net->ct.hash[cb->args[0]],
 					 hnnode) {
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
@@ -803,16 +812,18 @@ restart:
 			if (res < 0) {
 				nf_conntrack_get(&ct->ct_general);
 				cb->args[1] = (unsigned long)ct;
+				spin_unlock(lockp);
 				goto out;
 			}
 		}
+		spin_unlock(lockp);
 		if (cb->args[1]) {
 			cb->args[1] = 0;
 			goto restart;
 		}
 	}
 out:
-	spin_unlock_bh(&nf_conntrack_lock);
+	local_bh_enable();
 	if (last)
 		nf_ct_put(last);
 

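To make the new locking scheme easier to follow, here is a minimal
stand-alone sketch of the pattern that nf_conntrack_double_lock() and
nf_conntrack_all_lock() implement above. This is not the kernel code:
pthread mutexes stand in for the spinlock array, a plain atomic counter
stands in for the net->ct.generation seqcount, and every name in it
(LOCKS_SZ, double_lock, resize_table, the placeholder hash values) is
illustrative only.

    /* Minimal sketch, not kernel code: hashed bucket locks plus a
     * generation counter to detect a concurrent table resize.
     * Compile with: cc -pthread sketch.c
     */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    #define LOCKS_SZ 1024            /* stand-in for CONNTRACK_LOCKS */

    static pthread_mutex_t locks[LOCKS_SZ];
    static atomic_uint generation;   /* stand-in for net->ct.generation */

    static void locks_init(void)
    {
        for (int i = 0; i < LOCKS_SZ; i++)
            pthread_mutex_init(&locks[i], NULL);
    }

    static void double_unlock(unsigned int h1, unsigned int h2)
    {
        h1 %= LOCKS_SZ;
        h2 %= LOCKS_SZ;
        pthread_mutex_unlock(&locks[h1]);
        if (h1 != h2)
            pthread_mutex_unlock(&locks[h2]);
    }

    /* Take both bucket locks in ascending index order; return true if
     * the generation moved, i.e. the caller must recompute its hashes. */
    static bool double_lock(unsigned int h1, unsigned int h2, unsigned int seq)
    {
        unsigned int lo, hi;

        h1 %= LOCKS_SZ;
        h2 %= LOCKS_SZ;
        lo = h1 < h2 ? h1 : h2;
        hi = h1 < h2 ? h2 : h1;
        pthread_mutex_lock(&locks[lo]);
        if (lo != hi)
            pthread_mutex_lock(&locks[hi]);
        if (atomic_load(&generation) != seq) {
            double_unlock(h1, h2);
            return true;
        }
        return false;
    }

    /* Resize side: own every bucket lock, then bump the generation so
     * racing double_lock() callers back off and recompute. */
    static void resize_table(void)
    {
        for (int i = 0; i < LOCKS_SZ; i++)
            pthread_mutex_lock(&locks[i]);
        atomic_fetch_add(&generation, 1);
        /* ... reallocate and rehash the table here ... */
        for (int i = 0; i < LOCKS_SZ; i++)
            pthread_mutex_unlock(&locks[i]);
    }

    /* Caller pattern, mirroring nf_ct_delete_from_lists() above. */
    static void delete_entry(void)
    {
        unsigned int seq, h_orig, h_reply;

        do {
            seq = atomic_load(&generation);
            h_orig  = 13;   /* placeholder: hash of the original-direction tuple */
            h_reply = 42;   /* placeholder: hash of the reply-direction tuple */
        } while (double_lock(h_orig, h_reply, seq));

        /* ... unlink the entry from both hash chains ... */
        double_unlock(h_orig, h_reply);
    }

    int main(void)
    {
        locks_init();
        delete_entry();
        resize_table();
        return 0;
    }

Taking the pair of locks in ascending index order is what keeps the
scheme deadlock-free: two CPUs that need the same two buckets always
contend on the lower-numbered lock first, so an ABBA deadlock cannot
form. The real code differs from the sketch in three ways: it takes
the second lock with spin_lock_nested(SINGLE_DEPTH_NESTING) so that
lockdep accepts the ordered pair, it disables bottom halves around the
whole sequence, and its seqcount also lets readers detect a resize
that is still in flight (odd sequence number), which matters once
there is a real table to hash against.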

* Re: [net-next PATCH 0/5] netfilter: conntrack: optimization, remove central spinlock
  2014-02-27 16:41 [net-next PATCH 0/5] netfilter: conntrack: optimization, remove central spinlock Jesper Dangaard Brouer
                   ` (4 preceding siblings ...)
  2014-02-27 16:41 ` [net-next PATCH 5/5] netfilter: conntrack: remove central spinlock nf_conntrack_lock Jesper Dangaard Brouer
@ 2014-02-27 17:15 ` Jesper Dangaard Brouer
  2014-02-27 17:23   ` Pablo Neira Ayuso
  5 siblings, 1 reply; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2014-02-27 17:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Pablo Neira Ayuso, netfilter-devel
  Cc: netdev, David S. Miller


Hi Pablo,

This should obviously have been for nf-next, and I also forgot to cc
netfilter-devel@vger.kernel.org ... do you want me to repost?

--Jesper


On Thu, 27 Feb 2014 17:41:10 +0100 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> This patchset change the conntrack locking and provides a huge
> performance improvements.
> 
> This patchset is based upon Eric Dumazet's proposed patch:
>   http://thread.gmane.org/gmane.linux.network/268758/focus=47306
> I have in agreement with Eric Dumazet, taken over this patch (and
> turned it into a entire patchset).
> 
> Primary focus is to remove the central spinlock nf_conntrack_lock.
> This requires several steps to be acheived.
> 
> Patch01: Trivial cleanups
> 
> Patch02: Moves the "special" dying/unconfirmed/template lists to use a
>  per cpu spinlock.
> 
> Patch03: Is preparing for patch04, as it address a race
>  condition. Doing this a seperate patch for reviewers sake.
> 
> Patch04: Seperates expect locking from nf_conntrack_lock. The expect
>  list is small (default max 256), this it just get a single lock.
> 
> Patch05: Finally can remove nf_conntrack_lock, and instead uses an
>  array of hashed spinlocks to protect insertions/deletions of
>  conntracks into the hash table.  While still allowing dynamic
>  resizing of the hash table.
> 
> 
> Testing
> -------
> For expectations I've mostly tested the FTP nf_conntrack_ftp
> helper module, by commands:
> 
>  for x in `seq 1 300`; do \
>    echo $x; \
>    echo -e "USER anonymous\nPASS nothing\nPASV" | nc 192.168.42.129 21; \
>  done
> 
>  wget ftp://192.168.42.129/pub/delete.me.4k -O /dev/null
> 
> For overload/DoS testing, I've primarily done, SYN-flood attack testing.
> Results on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with tool trafgen)
> 
>  Base kernel : New   810.405 conntrack/sec
>  Fixed kernel: New 2.233.876 conntrack/sec
> 
> Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
>  # iptables -A INPUT -m state --state INVALID -j DROP
>  # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0
> 
> E.g. this machine can reflect 6.481.463 "invalid" conntrack/sec (from
> an ACK-flood).
> 
> Perf data:
> ----------
> The nf_conntrack_lock is suffers from huge contention on current
> generation servers (8 or more core/threads).  Data from under
> SYN-flooding (without a listen socket)
> 
> Perf locking congestion is very "visible" on a base kernel:
> 
>     -  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
>        - _raw_spin_lock_bh
>           + 25.33% init_conntrack
>           + 24.86% nf_ct_delete_from_lists
>           + 24.62% __nf_conntrack_confirm
>           + 24.38% destroy_conntrack
>           + 0.70% tcp_packet
>     +   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
>     +   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
>     +   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
>     +   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
>     +   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table
> 
> Perf after the patchset (SYN-flood attack):
> 
> +   9.62%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
> +   3.78%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
> +   2.71%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
> +   2.55%  ksoftirqd/6  [kernel.kallsyms]    [k] check_leaf
> +   2.38%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table
> +   2.06%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_alloc
> +   1.94%  ksoftirqd/6  [nf_conntrack]       [k] __nf_conntrack_alloc
> -   1.94%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock
>    - _raw_spin_lock
>       + 90.32% nf_conntrack_double_lock
>       + 3.61% get_partial_node
>       + 1.81% nf_ct_delete_from_lists
>       + 1.68% __nf_conntrack_confirm
>       + 1.03% sch_direct_xmit
>       + 0.52% scheduler_tick
> +   1.86%  ksoftirqd/6  [kernel.kallsyms]    [k] nf_iterate
> +   1.80%  ksoftirqd/6  [nf_conntrack]       [k] init_conntrack
> +   1.77%  ksoftirqd/6  [kernel.kallsyms]    [k] __neigh_event_send
> -   1.70%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
>    - _raw_spin_lock_bh
>       + 32.55% nf_ct_del_from_dying_or_unconfirmed_list
>       + 25.33% init_conntrack
>       + 19.88% tcp_packet
>       + 17.97% nf_ct_delete_from_lists
>       + 1.62% nf_conntrack_in
>       + 1.33% ixgbe_poll
>       + 0.74% destroy_conntrack
> +   1.64%  ksoftirqd/6  [nf_conntrack]       [k] hash_conntrack_raw
> +   1.58%  ksoftirqd/6  [kernel.kallsyms]    [k] __netif_receive_skb_core
> +   1.51%  ksoftirqd/6  [nf_conntrack]       [k] __nf_conntrack_find_get
> +   1.48%  ksoftirqd/6  [kernel.kallsyms]    [k] __cmpxchg_double_slab
> +   1.46%  ksoftirqd/6  [nf_conntrack]       [k] nf_conntrack_in
> +   1.45%  ksoftirqd/6  [kernel.kallsyms]    [k] __local_bh_enable_ip
> 
> 
> ---
> 
> Jesper Dangaard Brouer (5):
>       netfilter: conntrack: remove central spinlock nf_conntrack_lock
>       netfilter: conntrack: seperate expect locking from nf_conntrack_lock
>       netfilter: avoid race with exp->master ct
>       netfilter: conntrack: spinlock per cpu to protect special lists.
>       netfilter: trivial code cleanup and doc changes
> 
> 
>  include/net/netfilter/nf_conntrack.h      |   11 +
>  include/net/netfilter/nf_conntrack_core.h |    9 +
>  include/net/netns/conntrack.h             |   13 +
>  net/netfilter/nf_conntrack_core.c         |  427 ++++++++++++++++++++---------
>  net/netfilter/nf_conntrack_expect.c       |   36 ++
>  net/netfilter/nf_conntrack_h323_main.c    |    4 
>  net/netfilter/nf_conntrack_helper.c       |   37 ++-
>  net/netfilter/nf_conntrack_netlink.c      |  128 +++++----
>  net/netfilter/nf_conntrack_sip.c          |    8 -
>  9 files changed, 456 insertions(+), 217 deletions(-)
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [net-next PATCH 0/5] netfilter: conntrack: optimization, remove central spinlock
  2014-02-27 17:15 ` [net-next PATCH 0/5] netfilter: conntrack: optimization, remove central spinlock Jesper Dangaard Brouer
@ 2014-02-27 17:23   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2014-02-27 17:23 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netfilter-devel, netdev, David S. Miller

On Thu, Feb 27, 2014 at 06:15:22PM +0100, Jesper Dangaard Brouer wrote:
> Hi Pablo,
> 
> This should obviously have been for nf-next, and I also forgot to cc
> netfilter-devel@vger.kernel.org ... do you want me to repost?

Yes please, I would like to make sure that anyone on nf-devel can also
have a look at this.

Thanks Jesper.


