* [PATCH v2 nf-next] netfilter: x_tables: speed up iptables-restore
@ 2017-10-10 21:39 Florian Westphal
  2017-10-10 21:39 ` [PATCH v2 nf-next 1/2] netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore Florian Westphal
  2017-10-10 21:39 ` [PATCH v2 nf-next 2/2] netfilter: x_tables: don't use seqlock when fetching old counters Florian Westphal
  0 siblings, 2 replies; 5+ messages in thread
From: Florian Westphal @ 2017-10-10 21:39 UTC (permalink / raw)
  To: netdev, edumazet

iptables-restore can take quite a long time when the system is busy,
on the order of half a minute or more.
The main reason for this is the way ip(6)tables performs the table
swap, or, more precisely, the expensive sequence lock synchronization
done when reading counters.

When xt_replace_table assigns the new ruleset pointer, it does
not wait for other processors to finish with old ruleset.

Instead it relies on the counter sequence lock in get_counters()
to do this.

This works, but it is very costly if the system is busy, as each
counter read operation can be restarted indefinitely.

Instead, make xt_replace_table wait until all processors are
known to not use the old ruleset anymore.

This allows the old counters to be read without any locking: no cpu
is using the old ruleset anymore, so the counters can't change either.
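
For background, the packet path brackets every ruleset traversal with
the per-cpu xt_recseq counter.  A simplified sketch of the
packet-processing side, modelled on the ipt_do_table() fast path (the
actual rule walk is elided):

	struct xt_table_info *private;
	unsigned int addend;

	local_bh_disable();
	addend = xt_write_recseq_begin();	/* this cpu's xt_recseq goes odd */

	private = table->private;		/* snapshot of the current ruleset */
	/* ... walk the rules, update the per-cpu packet/byte counters ... */

	xt_write_recseq_end(addend);		/* xt_recseq goes back to even */
	local_bh_enable();

So once every cpu has shown an even xt_recseq after the pointer swap,
no cpu can still be traversing the old ruleset.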

 ipv4/netfilter/arp_tables.c |   22 ++++++++++++++++++++--
 ipv4/netfilter/ip_tables.c  |   23 +++++++++++++++++++++--
 ipv6/netfilter/ip6_tables.c |   22 ++++++++++++++++++++--
 netfilter/x_tables.c        |   15 ++++++++++++---
 4 files changed, 73 insertions(+), 9 deletions(-)


* [PATCH v2 nf-next 1/2] netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore
  2017-10-10 21:39 [PATCH v2 nf-next] netfilter: x_tables: speed up iptables-restore Florian Westphal
@ 2017-10-10 21:39 ` Florian Westphal
  2017-10-11 13:23   ` Eric Dumazet
  2017-10-10 21:39 ` [PATCH v2 nf-next 2/2] netfilter: x_tables: don't use seqlock when fetching old counters Florian Westphal
  1 sibling, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2017-10-10 21:39 UTC (permalink / raw)
  To: netdev, edumazet; +Cc: Florian Westphal, Dan Williams

xt_replace_table relies on the counter retrieval performed during
table replacement (which uses xt_recseq to synchronize the pcpu
counters).

This is fine, however with a large rule set get_counters() can take
a very long time -- it needs to synchronize all counters because
it has to assume concurrent modifications can occur.

Make xt_replace_table do this synchronization itself by waiting
until every cpu has been observed with an even seqcount.

This allows a followup patch to copy the counters of the old ruleset
without any synchronization after xt_replace_table has completed.

Cc: Dan Williams <dcbw@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: fix Eric's email address

 net/netfilter/x_tables.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index c83a3b5e1c6c..f2d4a365768f 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1153,6 +1153,7 @@ xt_replace_table(struct xt_table *table,
 	      int *error)
 {
 	struct xt_table_info *private;
+	unsigned int cpu;
 	int ret;
 
 	ret = xt_jumpstack_alloc(newinfo);
@@ -1184,12 +1185,20 @@ xt_replace_table(struct xt_table *table,
 
 	/*
 	 * Even though table entries have now been swapped, other CPU's
-	 * may still be using the old entries. This is okay, because
-	 * resynchronization happens because of the locking done
-	 * during the get_counters() routine.
+	 * may still be using the old entries...
 	 */
 	local_bh_enable();
 
+	/* ... so wait for even xt_recseq on all cpus */
+	for_each_possible_cpu(cpu) {
+		seqcount_t *s = &per_cpu(xt_recseq, cpu);
+
+		while (raw_read_seqcount(s) & 1)
+			cpu_relax();
+
+		cond_resched();
+	}
+
 #ifdef CONFIG_AUDIT
 	if (audit_enabled) {
 		audit_log(current->audit_context, GFP_KERNEL,
-- 
2.13.6


* [PATCH v2 nf-next 2/2] netfilter: x_tables: don't use seqlock when fetching old counters
  2017-10-10 21:39 [PATCH v2 nf-next] netfilter: x_tables: speed up iptables-restore Florian Westphal
  2017-10-10 21:39 ` [PATCH v2 nf-next 1/2] netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore Florian Westphal
@ 2017-10-10 21:39 ` Florian Westphal
  1 sibling, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2017-10-10 21:39 UTC (permalink / raw)
  To: netdev, edumazet; +Cc: Florian Westphal, Dan Williams

After the previous commit, xt_replace_table waits until every cpu has
been observed with an even seqcount (i.e., no cpu is still accessing
the old ruleset).

Add an 'old' counter retrieval version that doesn't synchronize
counters.  It's not needed; the old counters are no longer in use at
this point.

This speeds up table replacement on busy systems with large tables
(and many cores).

Cc: Dan Williams <dcbw@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: fix Eric's email address

 net/ipv4/netfilter/arp_tables.c | 22 ++++++++++++++++++++--
 net/ipv4/netfilter/ip_tables.c  | 23 +++++++++++++++++++++--
 net/ipv6/netfilter/ip6_tables.c | 22 ++++++++++++++++++++--
 3 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 9e2770fd00be..f88221aebc9d 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -634,6 +634,25 @@ static void get_counters(const struct xt_table_info *t,
 	}
 }
 
+static void get_old_counters(const struct xt_table_info *t,
+			     struct xt_counters counters[])
+{
+	struct arpt_entry *iter;
+	unsigned int cpu, i;
+
+	for_each_possible_cpu(cpu) {
+		i = 0;
+		xt_entry_foreach(iter, t->entries, t->size) {
+			struct xt_counters *tmp;
+
+			tmp = xt_get_per_cpu_counter(&iter->counters, cpu);
+			ADD_COUNTER(counters[i], tmp->bcnt, tmp->pcnt);
+			++i;
+		}
+		cond_resched();
+	}
+}
+
 static struct xt_counters *alloc_counters(const struct xt_table *table)
 {
 	unsigned int countersize;
@@ -910,8 +929,7 @@ static int __do_replace(struct net *net, const char *name,
 	    (newinfo->number <= oldinfo->initial_entries))
 		module_put(t->me);
 
-	/* Get the old counters, and synchronize with replace */
-	get_counters(oldinfo, counters);
+	get_old_counters(oldinfo, counters);
 
 	/* Decrease module usage counts and free resource */
 	loc_cpu_old_entry = oldinfo->entries;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 39286e543ee6..4cbe5e80f3bf 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -781,6 +781,26 @@ get_counters(const struct xt_table_info *t,
 	}
 }
 
+static void get_old_counters(const struct xt_table_info *t,
+			     struct xt_counters counters[])
+{
+	struct ipt_entry *iter;
+	unsigned int cpu, i;
+
+	for_each_possible_cpu(cpu) {
+		i = 0;
+		xt_entry_foreach(iter, t->entries, t->size) {
+			const struct xt_counters *tmp;
+
+			tmp = xt_get_per_cpu_counter(&iter->counters, cpu);
+			ADD_COUNTER(counters[i], tmp->bcnt, tmp->pcnt);
+			++i; /* macro does multi eval of i */
+		}
+
+		cond_resched();
+	}
+}
+
 static struct xt_counters *alloc_counters(const struct xt_table *table)
 {
 	unsigned int countersize;
@@ -1070,8 +1090,7 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
 	    (newinfo->number <= oldinfo->initial_entries))
 		module_put(t->me);
 
-	/* Get the old counters, and synchronize with replace */
-	get_counters(oldinfo, counters);
+	get_old_counters(oldinfo, counters);
 
 	/* Decrease module usage counts and free resource */
 	xt_entry_foreach(iter, oldinfo->entries, oldinfo->size)
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 01bd3ee5ebc6..f06e25065a34 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -800,6 +800,25 @@ get_counters(const struct xt_table_info *t,
 	}
 }
 
+static void get_old_counters(const struct xt_table_info *t,
+			     struct xt_counters counters[])
+{
+	struct ip6t_entry *iter;
+	unsigned int cpu, i;
+
+	for_each_possible_cpu(cpu) {
+		i = 0;
+		xt_entry_foreach(iter, t->entries, t->size) {
+			const struct xt_counters *tmp;
+
+			tmp = xt_get_per_cpu_counter(&iter->counters, cpu);
+			ADD_COUNTER(counters[i], tmp->bcnt, tmp->pcnt);
+			++i;
+		}
+		cond_resched();
+	}
+}
+
 static struct xt_counters *alloc_counters(const struct xt_table *table)
 {
 	unsigned int countersize;
@@ -1090,8 +1109,7 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
 	    (newinfo->number <= oldinfo->initial_entries))
 		module_put(t->me);
 
-	/* Get the old counters, and synchronize with replace */
-	get_counters(oldinfo, counters);
+	get_old_counters(oldinfo, counters);
 
 	/* Decrease module usage counts and free resource */
 	xt_entry_foreach(iter, oldinfo->entries, oldinfo->size)
-- 
2.13.6
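
For contrast, the synchronized get_counters() that the replace path no
longer needs samples each per-cpu counter pair under a seqcount retry
loop; roughly (simplified from the ip_tables.c version of the time):

	for_each_possible_cpu(cpu) {
		seqcount_t *s = &per_cpu(xt_recseq, cpu);

		i = 0;
		xt_entry_foreach(iter, t->entries, t->size) {
			struct xt_counters *tmp;
			unsigned int start;
			u64 bcnt, pcnt;

			tmp = xt_get_per_cpu_counter(&iter->counters, cpu);
			do {
				start = read_seqcount_begin(s);
				bcnt = tmp->bcnt;
				pcnt = tmp->pcnt;
			} while (read_seqcount_retry(s, start));

			ADD_COUNTER(counters[i], bcnt, pcnt);
			++i;
		}
	}

On a busy system each of those retry loops can restart many times;
that is the cost get_old_counters() avoids.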


* Re: [PATCH v2 nf-next 1/2] netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore
  2017-10-10 21:39 ` [PATCH v2 nf-next 1/2] netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore Florian Westphal
@ 2017-10-11 13:23   ` Eric Dumazet
  2017-10-11 13:45     ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2017-10-11 13:23 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Dan Williams

On Tue, Oct 10, 2017 at 2:39 PM, Florian Westphal <fw@strlen.de> wrote:
> xt_replace_table relies on the counter retrieval performed during
> table replacement (which uses xt_recseq to synchronize the pcpu
> counters).
>
> This is fine, however with a large rule set get_counters() can take
> a very long time -- it needs to synchronize all counters because
> it has to assume concurrent modifications can occur.
>
> Make xt_replace_table do this synchronization itself by waiting
> until every cpu has been observed with an even seqcount.
>
> This allows a followup patch to copy the counters of the old ruleset
> without any synchronization after xt_replace_table has completed.
>
> Cc: Dan Williams <dcbw@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  v2: fix Eric's email address
>
>  net/netfilter/x_tables.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index c83a3b5e1c6c..f2d4a365768f 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -1153,6 +1153,7 @@ xt_replace_table(struct xt_table *table,
>               int *error)
>  {
>         struct xt_table_info *private;
> +       unsigned int cpu;
>         int ret;
>
>         ret = xt_jumpstack_alloc(newinfo);
> @@ -1184,12 +1185,20 @@ xt_replace_table(struct xt_table *table,
>
>         /*
>          * Even though table entries have now been swapped, other CPU's
> -        * may still be using the old entries. This is okay, because
> -        * resynchronization happens because of the locking done
> -        * during the get_counters() routine.
> +        * may still be using the old entries...
>          */
>         local_bh_enable();
>
> +       /* ... so wait for even xt_recseq on all cpus */
> +       for_each_possible_cpu(cpu) {
> +               seqcount_t *s = &per_cpu(xt_recseq, cpu);
> +
> +               while (raw_read_seqcount(s) & 1)
> +                       cpu_relax();
> +
> +               cond_resched();
> +       }

It seems that we could also stop waiting on a given cpu as soon as
either:

1) the low-order bit of the sequence is 0, or

2) the value has changed (that cpu has left the critical section it
   was in, so it can no longer be using the old ruleset):

       for_each_possible_cpu(cpu) {
               seqcount_t *s = &per_cpu(xt_recseq, cpu);
               u32 seq = raw_read_seqcount(s);

               if (seq & 1) {
                       do {
                               cpu_relax();
                       } while (seq == raw_read_seqcount(s));
               }
       }

Thanks!


* Re: [PATCH v2 nf-next 1/2] netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore
  2017-10-11 13:23   ` Eric Dumazet
@ 2017-10-11 13:45     ` Florian Westphal
  0 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2017-10-11 13:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netdev, Dan Williams

Eric Dumazet <edumazet@google.com> wrote:
> > +       /* ... so wait for even xt_recseq on all cpus */
> > +       for_each_possible_cpu(cpu) {
> > +               seqcount_t *s = &per_cpu(xt_recseq, cpu);
> > +
> > +               while (raw_read_seqcount(s) & 1)
> > +                       cpu_relax();
> > +
> > +               cond_resched();
> > +       }
> 
> It seems that we could also stop waiting on a given cpu as soon as
> either:
> 
> 1) the low-order bit of the sequence is 0, or
> 
> 2) the value has changed (that cpu has left the critical section it
>    was in, so it can no longer be using the old ruleset):
> 
>        for_each_possible_cpu(cpu) {
>                seqcount_t *s = &per_cpu(xt_recseq, cpu);
>                u32 seq = raw_read_seqcount(s);
> 
>                if (seq & 1) {
>                        do {
>                                cpu_relax();
>                        } while (seq == raw_read_seqcount(s));
>                }
>        }
> 

Actually I first used

for_each_possible_cpu(cpu) {
  seqcount_t *s = &per_cpu(xt_recseq, cpu);

  (void)__read_seqcount_begin(s);
}

but it looked confusing (not paired with a _retry function).
I'll respin with a loop like your suggestion above.  Thanks!
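
For reference, a respun wait loop along those lines might look like
this (a sketch of the suggested change, not the posted follow-up):

	/* ... so wait for even xt_recseq on all cpus */
	for_each_possible_cpu(cpu) {
		seqcount_t *s = &per_cpu(xt_recseq, cpu);
		u32 seq = raw_read_seqcount(s);

		if (seq & 1) {
			do {
				cpu_relax();
			} while (seq == raw_read_seqcount(s));
		}

		cond_resched();
	}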

