All of lore.kernel.org
 help / color / mirror / Atom feed
* [nf-next PATCH 0/5] (repost) netfilter: conntrack: optimization, remove central spinlock
@ 2014-02-27 18:23 Jesper Dangaard Brouer
  2014-02-27 18:23 ` [nf-next PATCH 1/5] netfilter: trivial code cleanup and doc changes Jesper Dangaard Brouer
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2014-02-27 18:23 UTC (permalink / raw)
  To: netfilter-devel, Eric Dumazet, Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, netdev, David S. Miller,
	Florian Westphal, Patrick McHardy

(Repost to netfilter-devel list)

This patchset change the conntrack locking and provides a huge
performance improvements.

This patchset is based upon Eric Dumazet's proposed patch:
  http://thread.gmane.org/gmane.linux.network/268758/focus=47306
I have in agreement with Eric Dumazet, taken over this patch (and
turned it into a entire patchset).

Primary focus is to remove the central spinlock nf_conntrack_lock.
This requires several steps to be acheived.

Patch01: Trivial cleanups

Patch02: Moves the "special" dying/unconfirmed/template lists to use a
 per cpu spinlock.

Patch03: Is preparing for patch04, as it address a race
 condition. Doing this a seperate patch for reviewers sake.

Patch04: Seperates expect locking from nf_conntrack_lock. The expect
 list is small (default max 256), this it just get a single lock.

Patch05: Finally can remove nf_conntrack_lock, and instead uses an
 array of hashed spinlocks to protect insertions/deletions of
 conntracks into the hash table.  While still allowing dynamic
 resizing of the hash table.


Testing
-------
For expectations I've mostly tested the FTP nf_conntrack_ftp
helper module, by commands:

 for x in `seq 1 300`; do \
   echo $x; \
   echo -e "USER anonymous\nPASS nothing\nPASV" | nc 192.168.42.129 21; \
 done

 wget ftp://192.168.42.129/pub/delete.me.4k -O /dev/null

For overload/DoS testing, I've primarily done, SYN-flood attack testing.
Results on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with tool trafgen)

 Base kernel : New   810.405 conntrack/sec
 Fixed kernel: New 2.233.876 conntrack/sec

Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
 # iptables -A INPUT -m state --state INVALID -j DROP
 # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

E.g. this machine can reflect 6.481.463 "invalid" conntrack/sec (from
an ACK-flood).

Perf data:
----------
The nf_conntrack_lock is suffers from huge contention on current
generation servers (8 or more core/threads).  Data from under
SYN-flooding (without a listen socket)

Perf locking congestion is very "visible" on a base kernel:

  -  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
     - _raw_spin_lock_bh
        + 25.33% init_conntrack
        + 24.86% nf_ct_delete_from_lists
        + 24.62% __nf_conntrack_confirm
        + 24.38% destroy_conntrack
        + 0.70% tcp_packet
  +   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
  +   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
  +   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
  +   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
  +   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table

Perf after the patchset (SYN-flood attack):

 +   9.62%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
 +   3.78%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
 +   2.71%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
 +   2.55%  ksoftirqd/6  [kernel.kallsyms]    [k] check_leaf
 +   2.38%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table
 +   2.06%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_alloc
 +   1.94%  ksoftirqd/6  [nf_conntrack]       [k] __nf_conntrack_alloc
 -   1.94%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock
   - _raw_spin_lock
      + 90.32% nf_conntrack_double_lock
      + 3.61% get_partial_node
      + 1.81% nf_ct_delete_from_lists
      + 1.68% __nf_conntrack_confirm
      + 1.03% sch_direct_xmit
      + 0.52% scheduler_tick
 +   1.86%  ksoftirqd/6  [kernel.kallsyms]    [k] nf_iterate
 +   1.80%  ksoftirqd/6  [nf_conntrack]       [k] init_conntrack
 +   1.77%  ksoftirqd/6  [kernel.kallsyms]    [k] __neigh_event_send
 -   1.70%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
   - _raw_spin_lock_bh
      + 32.55% nf_ct_del_from_dying_or_unconfirmed_list
      + 25.33% init_conntrack
      + 19.88% tcp_packet
      + 17.97% nf_ct_delete_from_lists
      + 1.62% nf_conntrack_in
      + 1.33% ixgbe_poll
      + 0.74% destroy_conntrack
 +   1.64%  ksoftirqd/6  [nf_conntrack]       [k] hash_conntrack_raw
 +   1.58%  ksoftirqd/6  [kernel.kallsyms]    [k] __netif_receive_skb_core
 +   1.51%  ksoftirqd/6  [nf_conntrack]       [k] __nf_conntrack_find_get
 +   1.48%  ksoftirqd/6  [kernel.kallsyms]    [k] __cmpxchg_double_slab
 +   1.46%  ksoftirqd/6  [nf_conntrack]       [k] nf_conntrack_in
 +   1.45%  ksoftirqd/6  [kernel.kallsyms]    [k] __local_bh_enable_ip


---

Jesper Dangaard Brouer (5):
      netfilter: conntrack: remove central spinlock nf_conntrack_lock
      netfilter: conntrack: seperate expect locking from nf_conntrack_lock
      netfilter: avoid race with exp->master ct
      netfilter: conntrack: spinlock per cpu to protect special lists.
      netfilter: trivial code cleanup and doc changes


 include/net/netfilter/nf_conntrack.h      |   11 +
 include/net/netfilter/nf_conntrack_core.h |    9 +
 include/net/netns/conntrack.h             |   13 +
 net/netfilter/nf_conntrack_core.c         |  427 ++++++++++++++++++++---------
 net/netfilter/nf_conntrack_expect.c       |   36 ++
 net/netfilter/nf_conntrack_h323_main.c    |    4 
 net/netfilter/nf_conntrack_helper.c       |   37 ++-
 net/netfilter/nf_conntrack_netlink.c      |  128 +++++----
 net/netfilter/nf_conntrack_sip.c          |    8 -
 9 files changed, 456 insertions(+), 217 deletions(-)

-- 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-03-03 11:33 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-27 18:23 [nf-next PATCH 0/5] (repost) netfilter: conntrack: optimization, remove central spinlock Jesper Dangaard Brouer
2014-02-27 18:23 ` [nf-next PATCH 1/5] netfilter: trivial code cleanup and doc changes Jesper Dangaard Brouer
2014-02-27 18:23 ` [nf-next PATCH 2/5] netfilter: conntrack: spinlock per cpu to protect special lists Jesper Dangaard Brouer
2014-02-27 18:23 ` [nf-next PATCH 3/5] netfilter: avoid race with exp->master ct Jesper Dangaard Brouer
2014-02-27 21:34   ` Florian Westphal
2014-02-28 11:30     ` Jesper Dangaard Brouer
2014-02-27 18:23 ` [nf-next PATCH 4/5] netfilter: conntrack: seperate expect locking from nf_conntrack_lock Jesper Dangaard Brouer
2014-02-27 18:23 ` [nf-next PATCH 5/5] netfilter: conntrack: remove central spinlock nf_conntrack_lock Jesper Dangaard Brouer
2014-02-27 23:34 ` [nf-next PATCH 0/5] (repost) netfilter: conntrack: optimization, remove central spinlock David Miller
2014-02-28  9:47   ` Jesper Dangaard Brouer
2014-02-28 12:16 ` [nf-next PATCH V2 0/5] " Jesper Dangaard Brouer
2014-02-28 12:16   ` [nf-next PATCH V2 1/5] netfilter: trivial code cleanup and doc changes Jesper Dangaard Brouer
2014-02-28 12:17   ` [nf-next PATCH V2 2/5] netfilter: conntrack: spinlock per cpu to protect special lists Jesper Dangaard Brouer
2014-02-28 12:17   ` [nf-next PATCH V2 3/5] netfilter: avoid race with exp->master ct Jesper Dangaard Brouer
2014-02-28 12:17   ` [nf-next PATCH V2 4/5] netfilter: conntrack: seperate expect locking from nf_conntrack_lock Jesper Dangaard Brouer
2014-02-28 15:08     ` Florian Westphal
2014-03-03 11:33       ` Jesper Dangaard Brouer
2014-02-28 12:17   ` [nf-next PATCH V2 5/5] netfilter: conntrack: remove central spinlock nf_conntrack_lock Jesper Dangaard Brouer
2014-03-03  1:14   ` [nf-next PATCH V2 0/5] netfilter: conntrack: optimization, remove central spinlock David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.