* [PATCH v3 nf-next 0/2] netfilter: x_tables: speed up iptables-restore
@ 2017-10-11 14:26 Florian Westphal
  2017-10-11 14:26 ` [PATCH v3 nf-next 1/2] netfilter: x_tables: wait until old table isn't used anymore Florian Westphal
  2017-10-11 14:26 ` [PATCH v3 nf-next 2/2] netfilter: x_tables: don't use seqlock when fetching old counters Florian Westphal

From: Florian Westphal @ 2017-10-11 14:26 UTC
To: netfilter-devel

iptables-restore can take quite a long time when the system is busy, on
the order of half a minute or more.

The main reason for this is the way ip(6)tables performs the table swap:
when xt_replace_table assigns the new ruleset pointer, it does not wait
for other processors to finish with the old ruleset.  Instead it relies
on the counter sequence lock in get_counters().

This works, but it is costly on a busy system, as each counter read
operation can be restarted indefinitely.

Instead, make xt_replace_table wait until all processors are known to no
longer use the old ruleset.  This allows the old counters to be read
without any locking: no cpu is using the ruleset anymore, so the
counters cannot change either.
* [PATCH v3 nf-next 1/2] netfilter: x_tables: wait until old table isn't used anymore

From: Florian Westphal @ 2017-10-11 14:26 UTC
To: netfilter-devel; +Cc: Florian Westphal, Dan Williams, Eric Dumazet

xt_replace_table relies on table replacement counter retrieval (which
uses xt_recseq to synchronize pcpu counters).

This is fine, however with a large rule set get_counters() can take a
very long time -- it needs to synchronize all counters because it has
to assume concurrent modifications can occur.

Make xt_replace_table synchronize by itself by waiting until all cpus
have had an even seqcount.

This allows a followup patch to copy the counters of the old ruleset
without any synchronization after xt_replace_table has completed.

Cc: Dan Williams <dcbw@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
v3: check for 'seq is uneven' OR 'has changed' since last check.
    It's fine if seq is uneven iff it's a different sequence number
    than the initial one.

v2: fix Eric's email address

 net/netfilter/x_tables.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index c83a3b5e1c6c..ffd1c7a76e29 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1153,6 +1153,7 @@ xt_replace_table(struct xt_table *table,
 		 int *error)
 {
 	struct xt_table_info *private;
+	unsigned int cpu;
 	int ret;
 
 	ret = xt_jumpstack_alloc(newinfo);
@@ -1184,12 +1185,23 @@ xt_replace_table(struct xt_table *table,
 
 	/*
 	 * Even though table entries have now been swapped, other CPU's
-	 * may still be using the old entries. This is okay, because
-	 * resynchronization happens because of the locking done
-	 * during the get_counters() routine.
+	 * may still be using the old entries...
 	 */
 	local_bh_enable();
 
+	/* ... so wait for even xt_recseq on all cpus */
+	for_each_possible_cpu(cpu) {
+		seqcount_t *s = &per_cpu(xt_recseq, cpu);
+		u32 seq = raw_read_seqcount(s);
+
+		if (seq & 1) {
+			do {
+				cond_resched();
+				cpu_relax();
+			} while (seq == raw_read_seqcount(s));
+		}
+	}
+
 #ifdef CONFIG_AUDIT
 	if (audit_enabled) {
 		audit_log(current->audit_context, GFP_KERNEL,
-- 
2.13.6
* Re: [PATCH v3 nf-next 1/2] netfilter: x_tables: wait until old table isn't used anymore

From: Eric Dumazet @ 2017-10-11 15:09 UTC
To: Florian Westphal; +Cc: netfilter-devel, Dan Williams

On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@strlen.de> wrote:
> xt_replace_table relies on table replacement counter retrieval (which
> uses xt_recseq to synchronize pcpu counters).
>
> This is fine, however with a large rule set get_counters() can take a
> very long time -- it needs to synchronize all counters because it has
> to assume concurrent modifications can occur.
>
> Make xt_replace_table synchronize by itself by waiting until all cpus
> have had an even seqcount.
>
> This allows a followup patch to copy the counters of the old ruleset
> without any synchronization after xt_replace_table has completed.
>
> Cc: Dan Williams <dcbw@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Eric Dumazet <edumazet@google.com>

But it seems we need an extra smp_wmb() after the assignment:

 	smp_wmb();
 	table->private = newinfo;
+	smp_wmb();

Otherwise we have no guarantee other cpus actually see the new
->private value.
* Re: [PATCH v3 nf-next 1/2] netfilter: x_tables: wait until old table isn't used anymore

From: Florian Westphal @ 2017-10-11 17:48 UTC
To: Eric Dumazet; +Cc: Florian Westphal, netfilter-devel, Dan Williams

Eric Dumazet <edumazet@google.com> wrote:
> Reviewed-by: Eric Dumazet <edumazet@google.com>
>
> But it seems we need an extra smp_wmb() after the assignment:
>
>  	smp_wmb();
>  	table->private = newinfo;
> +	smp_wmb();
>
> Otherwise we have no guarantee other cpus actually see the new
> ->private value.

Seems to be unrelated to this change, so I will submit a separate
patch for nf.git that adds this.

Thanks!
* Re: [PATCH v3 nf-next 1/2] netfilter: x_tables: wait until old table isn't used anymore

From: Eric Dumazet @ 2017-10-11 17:57 UTC
To: Florian Westphal; +Cc: netfilter-devel, Dan Williams

On Wed, Oct 11, 2017 at 10:48 AM, Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <edumazet@google.com> wrote:
>> But it seems we need an extra smp_wmb() after the assignment:
>>
>>  	smp_wmb();
>>  	table->private = newinfo;
>> +	smp_wmb();
>>
>> Otherwise we have no guarantee other cpus actually see the new
>> ->private value.
>
> Seems to be unrelated to this change, so I will submit a separate
> patch for nf.git that adds this.

This is related to this change; please read the comment before the
local_bh_enable():

	/*
	 * Even though table entries have now been swapped, other CPU's
	 * may still be using the old entries. This is okay, because
	 * resynchronization happens because of the locking done
	 * during the get_counters() routine.
	 */

Since your new code happens right after

	table->private = newinfo;

the smp_wmb() is required.
* Re: [PATCH v3 nf-next 1/2] netfilter: x_tables: wait until old table isn't used anymore

From: Florian Westphal @ 2017-10-11 18:18 UTC
To: Eric Dumazet; +Cc: Florian Westphal, netfilter-devel, Dan Williams

Eric Dumazet <edumazet@google.com> wrote:
> This is related to this change; please read the comment before the
> local_bh_enable():
>
> 	/*
> 	 * Even though table entries have now been swapped, other CPU's
> 	 * may still be using the old entries. This is okay, because
> 	 * resynchronization happens because of the locking done
> 	 * during the get_counters() routine.
> 	 */

Hmm, but get_counters() does not issue a wmb, and the 'new' code added
here is essentially the same as get_counters(), except that we only
read the seqcount until we see a change (and not for each counter in
the rule set).

What am I missing?
* Re: [PATCH v3 nf-next 1/2] netfilter: x_tables: wait until old table isn't used anymore

From: Eric Dumazet @ 2017-10-11 18:23 UTC
To: Florian Westphal; +Cc: netfilter-devel, Dan Williams

On Wed, Oct 11, 2017 at 11:18 AM, Florian Westphal <fw@strlen.de> wrote:
> Hmm, but get_counters() does not issue a wmb, and the 'new' code added
> here is essentially the same as get_counters(), except that we only
> read the seqcount until we see a change (and not for each counter in
> the rule set).
>
> What am I missing?

Your sync code does nothing interesting if we are not sure the new
table->private value is visible.

Without barriers, the compiler/cpu could do this:

+	/* ... so wait for even xt_recseq on all cpus */
+	for_each_possible_cpu(cpu) {
+		seqcount_t *s = &per_cpu(xt_recseq, cpu);
+		u32 seq = raw_read_seqcount(s);
+
+		if (seq & 1) {
+			do {
+				cond_resched();
+				cpu_relax();
+			} while (seq == raw_read_seqcount(s));
+		}
+	}

	/* finally, write new private value */
	table->private = newinfo;

Basically, your loop is now useless and you could remove it.

So there is definitely a bug.
* [PATCH v3 nf-next 2/2] netfilter: x_tables: don't use seqlock when fetching old counters

From: Florian Westphal @ 2017-10-11 14:26 UTC
To: netfilter-devel; +Cc: Florian Westphal, Dan Williams, Eric Dumazet

After the previous commit, xt_replace_table waits until all cpus have
had an even seqcount (i.e., no cpu is accessing the old ruleset
anymore).

Add an 'old' counter retrieval version that doesn't synchronize
counters.  Synchronization isn't needed: the old counters are no
longer in use at this point.

This speeds up table replacement on busy systems with large tables
(and many cores).

Cc: Dan Williams <dcbw@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
no changes in v3; v2 only fixed the Cc: tag in the changelog.

 net/ipv4/netfilter/arp_tables.c | 22 ++++++++++++++++++++--
 net/ipv4/netfilter/ip_tables.c  | 23 +++++++++++++++++++++--
 net/ipv6/netfilter/ip6_tables.c | 22 ++++++++++++++++++++--
 3 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 9e2770fd00be..f88221aebc9d 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -634,6 +634,25 @@ static void get_counters(const struct xt_table_info *t,
 	}
 }
 
+static void get_old_counters(const struct xt_table_info *t,
+			     struct xt_counters counters[])
+{
+	struct arpt_entry *iter;
+	unsigned int cpu, i;
+
+	for_each_possible_cpu(cpu) {
+		i = 0;
+		xt_entry_foreach(iter, t->entries, t->size) {
+			struct xt_counters *tmp;
+
+			tmp = xt_get_per_cpu_counter(&iter->counters, cpu);
+			ADD_COUNTER(counters[i], tmp->bcnt, tmp->pcnt);
+			++i;
+		}
+		cond_resched();
+	}
+}
+
 static struct xt_counters *alloc_counters(const struct xt_table *table)
 {
 	unsigned int countersize;
@@ -910,8 +929,7 @@ static int __do_replace(struct net *net, const char *name,
 	    (newinfo->number <= oldinfo->initial_entries))
 		module_put(t->me);
 
-	/* Get the old counters, and synchronize with replace */
-	get_counters(oldinfo, counters);
+	get_old_counters(oldinfo, counters);
 
 	/* Decrease module usage counts and free resource */
 	loc_cpu_old_entry = oldinfo->entries;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 39286e543ee6..4cbe5e80f3bf 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -781,6 +781,26 @@ get_counters(const struct xt_table_info *t,
 	}
 }
 
+static void get_old_counters(const struct xt_table_info *t,
+			     struct xt_counters counters[])
+{
+	struct ipt_entry *iter;
+	unsigned int cpu, i;
+
+	for_each_possible_cpu(cpu) {
+		i = 0;
+		xt_entry_foreach(iter, t->entries, t->size) {
+			const struct xt_counters *tmp;
+
+			tmp = xt_get_per_cpu_counter(&iter->counters, cpu);
+			ADD_COUNTER(counters[i], tmp->bcnt, tmp->pcnt);
+			++i; /* macro does multi eval of i */
+		}
+
+		cond_resched();
+	}
+}
+
 static struct xt_counters *alloc_counters(const struct xt_table *table)
 {
 	unsigned int countersize;
@@ -1070,8 +1090,7 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
 	    (newinfo->number <= oldinfo->initial_entries))
 		module_put(t->me);
 
-	/* Get the old counters, and synchronize with replace */
-	get_counters(oldinfo, counters);
+	get_old_counters(oldinfo, counters);
 
 	/* Decrease module usage counts and free resource */
 	xt_entry_foreach(iter, oldinfo->entries, oldinfo->size)
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 01bd3ee5ebc6..f06e25065a34 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -800,6 +800,25 @@ get_counters(const struct xt_table_info *t,
 	}
 }
 
+static void get_old_counters(const struct xt_table_info *t,
+			     struct xt_counters counters[])
+{
+	struct ip6t_entry *iter;
+	unsigned int cpu, i;
+
+	for_each_possible_cpu(cpu) {
+		i = 0;
+		xt_entry_foreach(iter, t->entries, t->size) {
+			const struct xt_counters *tmp;
+
+			tmp = xt_get_per_cpu_counter(&iter->counters, cpu);
+			ADD_COUNTER(counters[i], tmp->bcnt, tmp->pcnt);
+			++i;
+		}
+		cond_resched();
+	}
+}
+
 static struct xt_counters *alloc_counters(const struct xt_table *table)
 {
 	unsigned int countersize;
@@ -1090,8 +1109,7 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
 	    (newinfo->number <= oldinfo->initial_entries))
 		module_put(t->me);
 
-	/* Get the old counters, and synchronize with replace */
-	get_counters(oldinfo, counters);
+	get_old_counters(oldinfo, counters);
 
 	/* Decrease module usage counts and free resource */
 	xt_entry_foreach(iter, oldinfo->entries, oldinfo->size)
-- 
2.13.6