All of lore.kernel.org
 help / color / mirror / Atom feed
From: kaber@trash.net
To: davem@davemloft.net
Cc: netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH 58/79] netfilter: x_table: speedup compat operations
Date: Wed, 19 Jan 2011 20:14:58 +0100	[thread overview]
Message-ID: <1295464519-21763-59-git-send-email-kaber@trash.net> (raw)
In-Reply-To: <1295464519-21763-1-git-send-email-kaber@trash.net>

From: Eric Dumazet <eric.dumazet@gmail.com>

One iptables invocation with 135000 rules takes 35 seconds of cpu time
on a recent server, using a 32bit distro and a 64bit kernel.

We eventually trigger NMI/RCU watchdog.

INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)

COMPAT mode has quadratic behavior and consume 16 bytes of memory per
rule.

Switch the xt_compat algos to use an array instead of list, and use a
binary search to locate an offset in the sorted array.

This halves memory need (8 bytes per rule), and removes quadratic
behavior [ O(N*N) -> O(N*log2(N)) ]

Time of iptables goes from 35 s to 150 ms.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/linux/netfilter/x_tables.h |    3 +-
 net/bridge/netfilter/ebtables.c    |    1 +
 net/ipv4/netfilter/arp_tables.c    |    2 +
 net/ipv4/netfilter/ip_tables.c     |    2 +
 net/ipv6/netfilter/ip6_tables.c    |    2 +
 net/netfilter/x_tables.c           |   82 +++++++++++++++++++++---------------
 6 files changed, 57 insertions(+), 35 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 742bec0..0f04d98 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -611,8 +611,9 @@ struct _compat_xt_align {
 extern void xt_compat_lock(u_int8_t af);
 extern void xt_compat_unlock(u_int8_t af);
 
-extern int xt_compat_add_offset(u_int8_t af, unsigned int offset, short delta);
+extern int xt_compat_add_offset(u_int8_t af, unsigned int offset, int delta);
 extern void xt_compat_flush_offsets(u_int8_t af);
+extern void xt_compat_init_offsets(u_int8_t af, unsigned int number);
 extern int xt_compat_calc_jump(u_int8_t af, unsigned int offset);
 
 extern int xt_compat_match_offset(const struct xt_match *match);
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 16df053..5f1825d 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1764,6 +1764,7 @@ static int compat_table_info(const struct ebt_table_info *info,
 
 	newinfo->entries_size = size;
 
+	xt_compat_init_offsets(AF_INET, info->nentries);
 	return EBT_ENTRY_ITERATE(entries, size, compat_calc_entry, info,
 							entries, newinfo);
 }
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 3fac340..47e5178 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -883,6 +883,7 @@ static int compat_table_info(const struct xt_table_info *info,
 	memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
 	newinfo->initial_entries = 0;
 	loc_cpu_entry = info->entries[raw_smp_processor_id()];
+	xt_compat_init_offsets(NFPROTO_ARP, info->number);
 	xt_entry_foreach(iter, loc_cpu_entry, info->size) {
 		ret = compat_calc_entry(iter, info, loc_cpu_entry, newinfo);
 		if (ret != 0)
@@ -1350,6 +1351,7 @@ static int translate_compat_table(const char *name,
 	duprintf("translate_compat_table: size %u\n", info->size);
 	j = 0;
 	xt_compat_lock(NFPROTO_ARP);
+	xt_compat_init_offsets(NFPROTO_ARP, number);
 	/* Walk through entries, checking offsets. */
 	xt_entry_foreach(iter0, entry0, total_size) {
 		ret = check_compat_entry_size_and_hooks(iter0, info, &size,
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index a846d63..c5a75d7 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1080,6 +1080,7 @@ static int compat_table_info(const struct xt_table_info *info,
 	memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
 	newinfo->initial_entries = 0;
 	loc_cpu_entry = info->entries[raw_smp_processor_id()];
+	xt_compat_init_offsets(AF_INET, info->number);
 	xt_entry_foreach(iter, loc_cpu_entry, info->size) {
 		ret = compat_calc_entry(iter, info, loc_cpu_entry, newinfo);
 		if (ret != 0)
@@ -1681,6 +1682,7 @@ translate_compat_table(struct net *net,
 	duprintf("translate_compat_table: size %u\n", info->size);
 	j = 0;
 	xt_compat_lock(AF_INET);
+	xt_compat_init_offsets(AF_INET, number);
 	/* Walk through entries, checking offsets. */
 	xt_entry_foreach(iter0, entry0, total_size) {
 		ret = check_compat_entry_size_and_hooks(iter0, info, &size,
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 4555823..0c9973a 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1093,6 +1093,7 @@ static int compat_table_info(const struct xt_table_info *info,
 	memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
 	newinfo->initial_entries = 0;
 	loc_cpu_entry = info->entries[raw_smp_processor_id()];
+	xt_compat_init_offsets(AF_INET6, info->number);
 	xt_entry_foreach(iter, loc_cpu_entry, info->size) {
 		ret = compat_calc_entry(iter, info, loc_cpu_entry, newinfo);
 		if (ret != 0)
@@ -1696,6 +1697,7 @@ translate_compat_table(struct net *net,
 	duprintf("translate_compat_table: size %u\n", info->size);
 	j = 0;
 	xt_compat_lock(AF_INET6);
+	xt_compat_init_offsets(AF_INET6, number);
 	/* Walk through entries, checking offsets. */
 	xt_entry_foreach(iter0, entry0, total_size) {
 		ret = check_compat_entry_size_and_hooks(iter0, info, &size,
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8046350..ee5de3a 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -38,9 +38,8 @@ MODULE_DESCRIPTION("{ip,ip6,arp,eb}_tables backend module");
 #define SMP_ALIGN(x) (((x) + SMP_CACHE_BYTES-1) & ~(SMP_CACHE_BYTES-1))
 
 struct compat_delta {
-	struct compat_delta *next;
-	unsigned int offset;
-	int delta;
+	unsigned int offset; /* offset in kernel */
+	int delta; /* delta in 32bit user land */
 };
 
 struct xt_af {
@@ -49,7 +48,9 @@ struct xt_af {
 	struct list_head target;
 #ifdef CONFIG_COMPAT
 	struct mutex compat_mutex;
-	struct compat_delta *compat_offsets;
+	struct compat_delta *compat_tab;
+	unsigned int number; /* number of slots in compat_tab[] */
+	unsigned int cur; /* number of used slots in compat_tab[] */
 #endif
 };
 
@@ -414,54 +415,67 @@ int xt_check_match(struct xt_mtchk_param *par,
 EXPORT_SYMBOL_GPL(xt_check_match);
 
 #ifdef CONFIG_COMPAT
-int xt_compat_add_offset(u_int8_t af, unsigned int offset, short delta)
+int xt_compat_add_offset(u_int8_t af, unsigned int offset, int delta)
 {
-	struct compat_delta *tmp;
+	struct xt_af *xp = &xt[af];
 
-	tmp = kmalloc(sizeof(struct compat_delta), GFP_KERNEL);
-	if (!tmp)
-		return -ENOMEM;
+	if (!xp->compat_tab) {
+		if (!xp->number)
+			return -EINVAL;
+		xp->compat_tab = vmalloc(sizeof(struct compat_delta) * xp->number);
+		if (!xp->compat_tab)
+			return -ENOMEM;
+		xp->cur = 0;
+	}
 
-	tmp->offset = offset;
-	tmp->delta = delta;
+	if (xp->cur >= xp->number)
+		return -EINVAL;
 
-	if (xt[af].compat_offsets) {
-		tmp->next = xt[af].compat_offsets->next;
-		xt[af].compat_offsets->next = tmp;
-	} else {
-		xt[af].compat_offsets = tmp;
-		tmp->next = NULL;
-	}
+	if (xp->cur)
+		delta += xp->compat_tab[xp->cur - 1].delta;
+	xp->compat_tab[xp->cur].offset = offset;
+	xp->compat_tab[xp->cur].delta = delta;
+	xp->cur++;
 	return 0;
 }
 EXPORT_SYMBOL_GPL(xt_compat_add_offset);
 
 void xt_compat_flush_offsets(u_int8_t af)
 {
-	struct compat_delta *tmp, *next;
-
-	if (xt[af].compat_offsets) {
-		for (tmp = xt[af].compat_offsets; tmp; tmp = next) {
-			next = tmp->next;
-			kfree(tmp);
-		}
-		xt[af].compat_offsets = NULL;
+	if (xt[af].compat_tab) {
+		vfree(xt[af].compat_tab);
+		xt[af].compat_tab = NULL;
+		xt[af].number = 0;
 	}
 }
 EXPORT_SYMBOL_GPL(xt_compat_flush_offsets);
 
 int xt_compat_calc_jump(u_int8_t af, unsigned int offset)
 {
-	struct compat_delta *tmp;
-	int delta;
-
-	for (tmp = xt[af].compat_offsets, delta = 0; tmp; tmp = tmp->next)
-		if (tmp->offset < offset)
-			delta += tmp->delta;
-	return delta;
+	struct compat_delta *tmp = xt[af].compat_tab;
+	int mid, left = 0, right = xt[af].cur - 1;
+
+	while (left <= right) {
+		mid = (left + right) >> 1;
+		if (offset > tmp[mid].offset)
+			left = mid + 1;
+		else if (offset < tmp[mid].offset)
+			right = mid - 1;
+		else
+			return mid ? tmp[mid - 1].delta : 0;
+	}
+	WARN_ON_ONCE(1);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(xt_compat_calc_jump);
 
+void xt_compat_init_offsets(u_int8_t af, unsigned int number)
+{
+	xt[af].number = number;
+	xt[af].cur = 0;
+}
+EXPORT_SYMBOL(xt_compat_init_offsets);
+
 int xt_compat_match_offset(const struct xt_match *match)
 {
 	u_int16_t csize = match->compatsize ? : match->matchsize;
@@ -1337,7 +1351,7 @@ static int __init xt_init(void)
 		mutex_init(&xt[i].mutex);
 #ifdef CONFIG_COMPAT
 		mutex_init(&xt[i].compat_mutex);
-		xt[i].compat_offsets = NULL;
+		xt[i].compat_tab = NULL;
 #endif
 		INIT_LIST_HEAD(&xt[i].target);
 		INIT_LIST_HEAD(&xt[i].match);
-- 
1.7.2.3


  parent reply	other threads:[~2011-01-19 19:15 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-19 19:14 [PATCH 00/79] netfilter: netfilter update kaber
2011-01-19 19:14 ` [PATCH 01/79] netfilter: nf_conntrack: don't always initialize ct->proto kaber
2011-01-19 19:14 ` [PATCH 02/79] netfilter: xt_NFQUEUE: remove modulo operations kaber
2011-01-19 19:14 ` [PATCH 03/79] netfilter: xt_LOG: do print MAC header on FORWARD kaber
2011-01-19 19:14 ` [PATCH 04/79] netfilter: ct_extend: fix the wrong alloc_size kaber
2011-01-19 19:14 ` [PATCH 05/79] netfilter: nf_conntrack: define ct_*_info as needed kaber
2011-01-19 19:14 ` [PATCH 06/79] netfilter: nf_nat: don't use atomic bit operation kaber
2011-01-19 19:14 ` [PATCH 07/79] netfilter: ct_extend: define NF_CT_EXT_* as needed kaber
2011-01-19 19:14 ` [PATCH 08/79] netfilter: nf_nat: define nat_pptp_info " kaber
2011-01-19 19:14 ` [PATCH 09/79] netfilter: xt_CLASSIFY: add ARP support, allow CLASSIFY target on any table kaber
2011-01-19 19:14 ` [PATCH 10/79] netfilter: add __rcu annotations kaber
2011-01-19 19:14 ` [PATCH 11/79] netfilter: nf_ct_frag6_sysctl_table is static kaber
2011-01-19 19:14 ` [PATCH 12/79] netfilter: add __rcu annotations kaber
2011-01-19 19:14 ` [PATCH 13/79] netfilter: nf_nat_amanda: rename a variable kaber
2011-01-19 19:14 ` [PATCH 14/79] netfilter: rcu sparse cleanups kaber
2011-01-19 19:14 ` [PATCH 15/79] IPVS: Add persistence engine to connection entry kaber
2011-01-19 19:14 ` [PATCH 16/79] IPVS: Only match pe_data created by the same pe kaber
2011-01-19 19:14 ` [PATCH 17/79] IPVS: Make the cp argument to ip_vs_sync_conn() static kaber
2011-01-19 19:14 ` [PATCH 18/79] IPVS: Remove useless { } block from ip_vs_process_message() kaber
2011-01-19 19:40   ` Joe Perches
2011-01-25  2:10     ` Simon Horman
2011-01-25  5:16       ` Simon Horman
2011-01-19 19:14 ` [PATCH 19/79] IPVS: buffer argument to ip_vs_process_message() should not be const kaber
2011-01-19 19:14 ` [PATCH 20/79] ipvs: add static and read_mostly attributes kaber
2011-01-19 19:14 ` [PATCH 21/79] ipvs: remove shadow rt variable kaber
2011-01-19 19:14 ` [PATCH 22/79] ipvs: allow transmit of GRO aggregated skbs kaber
2011-01-19 19:14 ` [PATCH 23/79] netfilter: nf_conntrack: one less atomic op in nf_ct_expect_insert() kaber
2011-01-19 19:14 ` [PATCH 24/79] IPVS: Backup, Prepare for transferring firewall marks (fwmark) to the backup daemon kaber
2011-01-19 19:14 ` [PATCH 25/79] IPVS: Split ports[2] into src_port and dst_port kaber
2011-01-19 19:14 ` [PATCH 26/79] IPVS: skb defrag in L7 helpers kaber
2011-01-19 19:14 ` [PATCH 27/79] IPVS: Handle Scheduling errors kaber
2011-01-19 19:14 ` [PATCH 28/79] IPVS: Backup, Adding structs for new sync format kaber
2011-01-19 19:14 ` [PATCH 29/79] IPVS: Backup, Adding Version 1 receive capability kaber
2011-01-19 19:14 ` [PATCH 30/79] IPVS: Backup, Change sending to Version 1 format kaber
2011-01-19 19:14 ` [PATCH 31/79] IPVS: Backup, adding version 0 sending capabilities kaber
2011-01-19 19:14 ` [PATCH 32/79] netfilter: xtables: use guarded types kaber
2011-01-19 19:14 ` [PATCH 33/79] netfilter: fix compilation when conntrack is disabled but tproxy is enabled kaber
2011-01-19 19:14 ` [PATCH 34/79] IPVS: netns, add basic init per netns kaber
2011-01-19 19:14 ` [PATCH 35/79] IPVS: netns to services part 1 kaber
2011-01-19 19:14 ` [PATCH 36/79] IPVS: netns awarness to lblcr sheduler kaber
2011-01-19 19:14 ` [PATCH 37/79] IPVS: netns awarness to lblc sheduler kaber
2011-01-19 19:14 ` [PATCH 38/79] IPVS: netns, prepare protocol kaber
2011-01-19 19:14 ` [PATCH 39/79] IPVS: netns preparation for proto_tcp kaber
2011-01-19 19:14 ` [PATCH 40/79] IPVS: netns preparation for proto_udp kaber
2011-01-19 19:14 ` [PATCH 41/79] IPVS: netns preparation for proto_sctp kaber
2011-01-19 19:14 ` [PATCH 42/79] IPVS: netns preparation for proto_ah_esp kaber
2011-01-19 19:14 ` [PATCH 43/79] IPVS: netns, use ip_vs_proto_data as param kaber
2011-01-19 19:14 ` [PATCH 44/79] IPVS: netns, common protocol changes and use of appcnt kaber
2011-01-19 19:14 ` [PATCH 45/79] IPVS: netns awareness to ip_vs_app kaber
2011-01-19 19:14 ` [PATCH 46/79] IPVS: netns awareness to ip_vs_est kaber
2011-01-19 19:14 ` [PATCH 47/79] IPVS: netns awareness to ip_vs_sync kaber
2011-01-19 19:14 ` [PATCH 48/79] IPVS: netns, ip_vs_stats and its procfs kaber
2011-01-19 19:14 ` [PATCH 49/79] IPVS: netns, connection hash got net as param kaber
2011-01-19 19:14 ` [PATCH 50/79] IPVS: netns, ip_vs_ctl local vars moved to ipvs struct kaber
2011-01-19 19:14 ` [PATCH 51/79] IPVS: netns, defense work timer kaber
2011-01-19 19:14 ` [PATCH 52/79] IPVS: netns, trash handling kaber
2011-01-19 19:14 ` [PATCH 53/79] IPVS: netns, svc counters moved in ip_vs_ctl,c kaber
2011-01-19 19:14 ` [PATCH 54/79] IPVS: netns, misc init_net removal in core kaber
2011-01-19 19:14 ` [PATCH 55/79] IPVS: netns, final patch enabling network name space kaber
2011-01-19 19:14 ` [PATCH 56/79] netfilter: xt_comment: drop unneeded unsigned qualifier kaber
2011-01-19 19:14 ` [PATCH 57/79] netfilter: xt_conntrack: support matching on port ranges kaber
2011-01-19 19:14 ` kaber [this message]
2011-01-19 19:14 ` [PATCH 59/79] netfilter: ebt_ip6: allow matching on ipv6-icmp types/codes kaber
2011-01-19 19:15 ` [PATCH 60/79] netfilter: fix Kconfig dependencies kaber
2011-01-19 19:15 ` [PATCH 61/79] netfilter: nf_conntrack: use is_vmalloc_addr() kaber
2011-01-19 19:15 ` [PATCH 62/79] netfilter: audit target to record accepted/dropped packets kaber
2011-01-19 19:15 ` [PATCH 63/79] netfilter: create audit records for x_tables replaces kaber
2011-01-19 19:15 ` [PATCH 64/79] netfilter: xtables: add missing aliases for autoloading via iptables kaber
2011-01-19 19:15 ` [PATCH 65/79] audit: export symbol for use with xt_AUDIT kaber
2011-01-19 19:15 ` [PATCH 66/79] netfilter: xt_connlimit: use hotdrop jump mark kaber
2011-01-19 19:15 ` [PATCH 67/79] netfilter: xtables: use __uXX guarded types for userspace exports kaber
2011-01-19 19:15 ` [PATCH 68/79] netfilter: xtables: add missing header files to export list kaber
2011-01-19 19:15 ` [PATCH 69/79] netfilter: nf_nat: fix conversion to non-atomic bit ops kaber
2011-01-19 19:15 ` [PATCH 70/79] netfilter: nf_conntrack: remove an atomic bit operation kaber
2011-01-19 19:15 ` [PATCH 71/79] netfilter: Kconfig: NFQUEUE is useless without NETFILTER_NETLINK_QUEUE kaber
2011-01-19 19:15 ` [PATCH 72/79] netfilter: nfnetlink_queue: return error number to caller kaber
2011-01-19 19:15 ` [PATCH 73/79] netfilter: nfnetlink_queue: do not free skb on error kaber
2011-01-19 19:15 ` [PATCH 74/79] netfilter: reduce NF_VERDICT_MASK to 0xff kaber
2011-01-19 19:15 ` [PATCH 75/79] netfilter: allow NFQUEUE bypass if no listener is available kaber
2011-01-19 19:15 ` [PATCH 76/79] netfilter: ipt_CLUSTERIP: remove "no conntrack!" kaber
2011-01-19 19:15 ` [PATCH 77/79] netfilter: nf_conntrack: nf_conntrack snmp helper kaber
2011-01-19 19:15 ` [PATCH 78/79] netfilter: nf_conntrack_tstamp: add flow-based timestamp extension kaber
2011-01-19 19:15 ` [PATCH 79/79] netfilter: nf_conntrack: fix lifetime display for disabled connections kaber
2011-01-19 21:55 ` [PATCH 00/79] netfilter: netfilter update David Miller
2011-01-20  0:50   ` David Miller
2011-01-20  0:59     ` Jan Engelhardt
2011-01-20  1:13     ` Patrick McHardy
2011-01-20  1:36       ` Jan Engelhardt
2011-01-20  7:49         ` Patrick McHardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1295464519-21763-59-git-send-email-kaber@trash.net \
    --to=kaber@trash.net \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.