All of lore.kernel.org
 help / color / mirror / Atom feed
From: subashab@codeaurora.org
To: Florian Westphal <fw@strlen.de>
Cc: Will Deacon <will@kernel.org>,
	pablo@netfilter.org, Sean Tranchetti <stranche@codeaurora.org>,
	netfilter-devel@vger.kernel.org, peterz@infradead.org,
	tglx@linutronix.de
Subject: Re: [PATCH nf] x_tables: Properly close read section with read_seqcount_retry
Date: Thu, 19 Nov 2020 22:53:13 -0700	[thread overview]
Message-ID: <7d8bc917b7a6790fa789085ba8324b08@codeaurora.org> (raw)
In-Reply-To: <20201118211007.GA15137@breakpoint.cc>

> The three _do_table() functions need to use rcu_dereference().
> 
> This looks wrong.  I know its ok, but perhaps its better to add this:
> 
> struct xt_table_info *xt_table_get_private_protected(const struct
> xt_table *table)
> {
>  return rcu_dereference_protected(table->private,
> mutex_is_locked(&xt[table->af].mutex));
> }
> EXPORT_SYMBOL(xt_table_get_private_protected);
> 
> to x_tables.c.
> 
> If you dislike this extra function, add
> 
> #define xt_table_get_private_protected(t) 
> rcu_access_pointer((t)->private)
> 
> in include/linux/netfilter/x_tables.h, with a bit fat comment telling
> that the xt table mutex must be held.
> 
> But I'd rather have/use the helper function as it documents when its
> safe to do this (and there will be splats if misused).
> AFAICS the local_bh_disable/enable calls can be removed too after this,
> if we're interrupted by softirq calling any of the _do_table()
> functions changes to the xt seqcount do not matter anymore.
> 
> 
> We need this additional hunk to switch to rcu for replacement/sync, no?
> 
> -       local_bh_enable();
> -
> -       /* ... so wait for even xt_recseq on all cpus */
> -       for_each_possible_cpu(cpu) {
> -               seqcount_t *s = &per_cpu(xt_recseq, cpu);
> -               u32 seq = raw_read_seqcount(s);
> -
> -               if (seq & 1) {
> -                       do {
> -                               cond_resched();
> -                               cpu_relax();
> -                       } while (seq == raw_read_seqcount(s));
> -               }
> -       }
> +       synchronize_rcu();

I've updated the patch with your comments.
Do you expect a performance impact either in datapath or perhaps more in
the rule installation with the rcu changes.
As a I understand, the change in the barrier types would be sufficient 
to
resolve the race.

diff --git a/include/linux/netfilter/x_tables.h 
b/include/linux/netfilter/x_tables.h
index 5deb099..8ebb641 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -227,7 +227,7 @@ struct xt_table {
  	unsigned int valid_hooks;

  	/* Man behind the curtain... */
-	struct xt_table_info *private;
+	struct xt_table_info __rcu *private;

  	/* Set this to THIS_MODULE if you are a module, otherwise NULL */
  	struct module *me;
@@ -448,6 +448,9 @@ xt_get_per_cpu_counter(struct xt_counters *cnt, 
unsigned int cpu)

  struct nf_hook_ops *xt_hook_ops_alloc(const struct xt_table *, 
nf_hookfn *);

+struct xt_table_info
+*xt_table_get_private_protected(const struct xt_table *table);
+
  #ifdef CONFIG_COMPAT
  #include <net/compat.h>

diff --git a/net/ipv4/netfilter/arp_tables.c 
b/net/ipv4/netfilter/arp_tables.c
index d1e04d2..dda5d8f 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -203,7 +203,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,

  	local_bh_disable();
  	addend = xt_write_recseq_begin();
-	private = READ_ONCE(table->private); /* Address dependency. */
+	private = xt_table_get_private_protected(table);
  	cpu     = smp_processor_id();
  	table_base = private->entries;
  	jumpstack  = (struct arpt_entry **)private->jumpstack[cpu];
@@ -649,7 +649,7 @@ static struct xt_counters *alloc_counters(const 
struct xt_table *table)
  {
  	unsigned int countersize;
  	struct xt_counters *counters;
-	const struct xt_table_info *private = table->private;
+	const struct xt_table_info *private = 
xt_table_get_private_protected(table);

  	/* We need atomic snapshot of counters: rest doesn't change
  	 * (other than comefrom, which userspace doesn't care
@@ -673,7 +673,7 @@ static int copy_entries_to_user(unsigned int 
total_size,
  	unsigned int off, num;
  	const struct arpt_entry *e;
  	struct xt_counters *counters;
-	struct xt_table_info *private = table->private;
+	struct xt_table_info *private = xt_table_get_private_protected(table);
  	int ret = 0;
  	void *loc_cpu_entry;

@@ -1330,7 +1330,7 @@ static int compat_copy_entries_to_user(unsigned 
int total_size,
  				       void __user *userptr)
  {
  	struct xt_counters *counters;
-	const struct xt_table_info *private = table->private;
+	const struct xt_table_info *private = 
xt_table_get_private_protected(table);
  	void __user *pos;
  	unsigned int size;
  	int ret = 0;
diff --git a/net/ipv4/netfilter/ip_tables.c 
b/net/ipv4/netfilter/ip_tables.c
index f15bc21..5ec422c 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -258,7 +258,7 @@ ipt_do_table(struct sk_buff *skb,
  	WARN_ON(!(table->valid_hooks & (1 << hook)));
  	local_bh_disable();
  	addend = xt_write_recseq_begin();
-	private = READ_ONCE(table->private); /* Address dependency. */
+	private = xt_table_get_private_protected(table);
  	cpu        = smp_processor_id();
  	table_base = private->entries;
  	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
@@ -791,7 +791,7 @@ static struct xt_counters *alloc_counters(const 
struct xt_table *table)
  {
  	unsigned int countersize;
  	struct xt_counters *counters;
-	const struct xt_table_info *private = table->private;
+	const struct xt_table_info *private = 
xt_table_get_private_protected(table);

  	/* We need atomic snapshot of counters: rest doesn't change
  	   (other than comefrom, which userspace doesn't care
@@ -815,7 +815,7 @@ copy_entries_to_user(unsigned int total_size,
  	unsigned int off, num;
  	const struct ipt_entry *e;
  	struct xt_counters *counters;
-	const struct xt_table_info *private = table->private;
+	const struct xt_table_info *private = 
xt_table_get_private_protected(table);
  	int ret = 0;
  	const void *loc_cpu_entry;

@@ -1543,7 +1543,7 @@ compat_copy_entries_to_user(unsigned int 
total_size, struct xt_table *table,
  			    void __user *userptr)
  {
  	struct xt_counters *counters;
-	const struct xt_table_info *private = table->private;
+	const struct xt_table_info *private = 
xt_table_get_private_protected(table);
  	void __user *pos;
  	unsigned int size;
  	int ret = 0;
diff --git a/net/ipv6/netfilter/ip6_tables.c 
b/net/ipv6/netfilter/ip6_tables.c
index 2e2119b..91d8364 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -280,7 +280,7 @@ ip6t_do_table(struct sk_buff *skb,

  	local_bh_disable();
  	addend = xt_write_recseq_begin();
-	private = READ_ONCE(table->private); /* Address dependency. */
+	private = xt_table_get_private_protected(table);
  	cpu        = smp_processor_id();
  	table_base = private->entries;
  	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
@@ -807,7 +807,7 @@ static struct xt_counters *alloc_counters(const 
struct xt_table *table)
  {
  	unsigned int countersize;
  	struct xt_counters *counters;
-	const struct xt_table_info *private = table->private;
+	const struct xt_table_info *private = 
xt_table_get_private_protected(table);

  	/* We need atomic snapshot of counters: rest doesn't change
  	   (other than comefrom, which userspace doesn't care
@@ -831,7 +831,7 @@ copy_entries_to_user(unsigned int total_size,
  	unsigned int off, num;
  	const struct ip6t_entry *e;
  	struct xt_counters *counters;
-	const struct xt_table_info *private = table->private;
+	const struct xt_table_info *private = 
xt_table_get_private_protected(table);
  	int ret = 0;
  	const void *loc_cpu_entry;

@@ -1552,7 +1552,7 @@ compat_copy_entries_to_user(unsigned int 
total_size, struct xt_table *table,
  			    void __user *userptr)
  {
  	struct xt_counters *counters;
-	const struct xt_table_info *private = table->private;
+	const struct xt_table_info *private = 
xt_table_get_private_protected(table);
  	void __user *pos;
  	unsigned int size;
  	int ret = 0;
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index af22dbe..416a617 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1349,6 +1349,14 @@ struct xt_counters *xt_counters_alloc(unsigned 
int counters)
  }
  EXPORT_SYMBOL(xt_counters_alloc);

+struct xt_table_info
+*xt_table_get_private_protected(const struct xt_table *table)
+{
+	return rcu_dereference_protected(table->private,
+					 mutex_is_locked(&xt[table->af].mutex));
+}
+EXPORT_SYMBOL(xt_table_get_private_protected);
+
  struct xt_table_info *
  xt_replace_table(struct xt_table *table,
  	      unsigned int num_counters,
@@ -1356,7 +1364,6 @@ xt_replace_table(struct xt_table *table,
  	      int *error)
  {
  	struct xt_table_info *private;
-	unsigned int cpu;
  	int ret;

  	ret = xt_jumpstack_alloc(newinfo);
@@ -1366,8 +1373,7 @@ xt_replace_table(struct xt_table *table,
  	}

  	/* Do the substitution. */
-	local_bh_disable();
-	private = table->private;
+	private = xt_table_get_private_protected(table);

  	/* Check inside lock: is the old number correct? */
  	if (num_counters != private->number) {
@@ -1379,34 +1385,9 @@ xt_replace_table(struct xt_table *table,
  	}

  	newinfo->initial_entries = private->initial_entries;
-	/*
-	 * Ensure contents of newinfo are visible before assigning to
-	 * private.
-	 */
-	smp_wmb();
-	table->private = newinfo;
-
-	/* make sure all cpus see new ->private value */
-	smp_wmb();

-	/*
-	 * Even though table entries have now been swapped, other CPU's
-	 * may still be using the old entries...
-	 */
-	local_bh_enable();
-
-	/* ... so wait for even xt_recseq on all cpus */
-	for_each_possible_cpu(cpu) {
-		seqcount_t *s = &per_cpu(xt_recseq, cpu);
-		u32 seq = raw_read_seqcount(s);
-
-		if (seq & 1) {
-			do {
-				cond_resched();
-				cpu_relax();
-			} while (seq == raw_read_seqcount(s));
-		}
-	}
+	rcu_assign_pointer(table->private, newinfo);
+	synchronize_rcu();

  	audit_log_nfcfg(table->name, table->af, private->number,
  			!private->number ? AUDIT_XT_OP_REGISTER :
@@ -1442,12 +1423,12 @@ struct xt_table *xt_register_table(struct net 
*net,
  	}

  	/* Simplifies replace_table code. */
-	table->private = bootstrap;
+	rcu_assign_pointer(table->private, bootstrap);

  	if (!xt_replace_table(table, 0, newinfo, &ret))
  		goto unlock;

-	private = table->private;
+	private = xt_table_get_private_protected(table);
  	pr_debug("table->private->number = %u\n", private->number);

  	/* save number of initial entries */
@@ -1470,7 +1451,8 @@ void *xt_unregister_table(struct xt_table *table)
  	struct xt_table_info *private;

  	mutex_lock(&xt[table->af].mutex);
-	private = table->private;
+	private = xt_table_get_private_protected(table);
+	RCU_INIT_POINTER(table->private, NULL);
  	list_del(&table->list);
  	mutex_unlock(&xt[table->af].mutex);
  	audit_log_nfcfg(table->name, table->af, private->number,

  reply	other threads:[~2020-11-20  5:53 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14  2:21 [PATCH nf] x_tables: Properly close read section with read_seqcount_retry Sean Tranchetti
2020-11-14 16:53 ` Florian Westphal
2020-11-16  6:32   ` subashab
2020-11-16 14:18     ` Florian Westphal
2020-11-16 16:26       ` subashab
2020-11-16 17:04         ` Florian Westphal
2020-11-16 17:51           ` subashab
2020-11-16 18:20             ` Florian Westphal
2020-11-18 12:13               ` Will Deacon
2020-11-18 12:42                 ` Florian Westphal
2020-11-18 12:54                   ` Will Deacon
2020-11-18 13:14                     ` Florian Westphal
2020-11-18 20:39                       ` subashab
2020-11-18 21:10                         ` Florian Westphal
2020-11-20  5:53                           ` subashab [this message]
2020-11-20  6:31                             ` Florian Westphal
2020-11-20  9:44                             ` Peter Zijlstra
2020-11-20  9:53                               ` Peter Zijlstra
2020-11-20 10:20                                 ` Florian Westphal
2020-11-20 10:47                                   ` Peter Zijlstra
2020-11-21  1:27                                     ` subashab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7d8bc917b7a6790fa789085ba8324b08@codeaurora.org \
    --to=subashab@codeaurora.org \
    --cc=fw@strlen.de \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=peterz@infradead.org \
    --cc=stranche@codeaurora.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.