All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will@kernel.org>
To: Florian Westphal <fw@strlen.de>
Cc: subashab@codeaurora.org, pablo@netfilter.org,
	Sean Tranchetti <stranche@codeaurora.org>,
	netfilter-devel@vger.kernel.org, peterz@infradead.org,
	tglx@linutronix.de
Subject: Re: [PATCH nf] x_tables: Properly close read section with read_seqcount_retry
Date: Wed, 18 Nov 2020 12:54:06 +0000	[thread overview]
Message-ID: <20201118125406.GA2029@willie-the-truck> (raw)
In-Reply-To: <20201118124228.GJ22792@breakpoint.cc>

On Wed, Nov 18, 2020 at 01:42:28PM +0100, Florian Westphal wrote:
> Will Deacon <will@kernel.org> wrote:
> > On Mon, Nov 16, 2020 at 07:20:28PM +0100, Florian Westphal wrote:
> > > subashab@codeaurora.org <subashab@codeaurora.org> wrote:
> > > > > > Unfortunately we are seeing it on ARM64 regression systems which
> > > > > > runs a
> > > > > > variety of
> > > > > > usecases so the exact steps are not known.
> > > > > 
> > > > > Ok.  Would you be willing to run some of those with your suggested
> > > > > change to see if that resolves the crashes or is that so rare that this
> > > > > isn't practical?
> > > > 
> > > > I can try that out. Let me know if you have any other suggestions as well
> > > > and I can try that too.
> > > > 
> > > > I assume we cant add locks here as it would be in the packet processing
> > > > path.
> > > 
> > > Yes.  We can add a synchronize_net() in xt_replace_table if needed
> > > though, before starting to put the references on the old ruleset
> > > This would avoid the free of the jumpstack while skbs are still
> > > in-flight.
> > 
> > I tried to make sense of what's going on here, and it looks to me like
> > the interaction between xt_replace_table() and ip6t_do_table() relies on
> > store->load order that is _not_ enforced in the code.
> > 
> > xt_replace_table() does:
> > 
> > 	table->private = newinfo;
> > 
> > 	/* make sure all cpus see new ->private value */
> > 	smp_wmb();
> > 
> > 	/*
> > 	 * Even though table entries have now been swapped, other CPU's
> > 	 * may still be using the old entries...
> > 	 */
> > 	local_bh_enable();
> > 
> > 	/* ... so wait for even xt_recseq on all cpus */
> > 	for_each_possible_cpu(cpu) {
> > 		seqcount_t *s = &per_cpu(xt_recseq, cpu);
> > 		u32 seq = raw_read_seqcount(s);
> > 
> > 		if (seq & 1) {
> > 			do {
> > 				cond_resched();
> > 				cpu_relax();
> > 			} while (seq == raw_read_seqcount(s));
> > 		}
> > 	}
> > 
> > and I think the idea here is that we swizzle the table->private pointer
> > to point to the new data, and then wait for all pre-existing readers to
> > finish with the old table (isn't this what RCU is for?).
> 
> Yes and yes.   Before the current 'for_each_possible_cpu() + seqcount
> abuse the abuse was just implicit; xt_replace_table did NOT wait for
> readers to go away and relied on arp, eb and ip(6)tables to store the
> counters back to userspace after xt_replace_table().

Thanks for the background.

> This meant a read_seqcount_begin/retry for each counter & eventually
> gave complaits from k8s users that x_tables rule replacement was too
> slow.
> 
> [..]
> 
> > Given that this appears to be going wrong in practice, I've included a quick
> > hack below which should fix the ordering. However, I think it would be
> > better if we could avoid abusing the seqcount code for this sort of thing.
> 
> I'd be ok with going via the simpler solution & wait if k8s users
> complain that its too slow.  Those memory blobs can be huge so I would
> not use call_rcu here.

If you can stomach the cost of synchronize_rcu() then this at least
gets rid of that for_each_possible_cpu() loop! Also...

> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index af22dbe85e2c..b5911985d1eb 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -1384,7 +1384,7 @@ xt_replace_table(struct xt_table *table,
>          * private.
>          */
>         smp_wmb();
> -       table->private = newinfo;
> +       WRITE_ONCE(table->private, newinfo);

... you could make this rcu_assign_pointer() and get rid of the preceding
smp_wmb()...

>  
>         /* make sure all cpus see new ->private value */
>         smp_wmb();

... and this smp_wmb() is no longer needed because synchronize_rcu()
takes care of the ordering.

> @@ -1394,19 +1394,7 @@ xt_replace_table(struct xt_table *table,
>          * may still be using the old entries...
>          */
>         local_bh_enable();
> -
> -       /* ... so wait for even xt_recseq on all cpus */
> -       for_each_possible_cpu(cpu) {
> -               seqcount_t *s = &per_cpu(xt_recseq, cpu);
> -               u32 seq = raw_read_seqcount(s);
> -
> -               if (seq & 1) {
> -                       do {
> -                               cond_resched();
> -                               cpu_relax();
> -                       } while (seq == raw_read_seqcount(s));
> -               }
> -       }

Do we still need xt_write_recseq_{begin,end}() with this removed?

Cheers,

Will

  reply	other threads:[~2020-11-18 12:54 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14  2:21 [PATCH nf] x_tables: Properly close read section with read_seqcount_retry Sean Tranchetti
2020-11-14 16:53 ` Florian Westphal
2020-11-16  6:32   ` subashab
2020-11-16 14:18     ` Florian Westphal
2020-11-16 16:26       ` subashab
2020-11-16 17:04         ` Florian Westphal
2020-11-16 17:51           ` subashab
2020-11-16 18:20             ` Florian Westphal
2020-11-18 12:13               ` Will Deacon
2020-11-18 12:42                 ` Florian Westphal
2020-11-18 12:54                   ` Will Deacon [this message]
2020-11-18 13:14                     ` Florian Westphal
2020-11-18 20:39                       ` subashab
2020-11-18 21:10                         ` Florian Westphal
2020-11-20  5:53                           ` subashab
2020-11-20  6:31                             ` Florian Westphal
2020-11-20  9:44                             ` Peter Zijlstra
2020-11-20  9:53                               ` Peter Zijlstra
2020-11-20 10:20                                 ` Florian Westphal
2020-11-20 10:47                                   ` Peter Zijlstra
2020-11-21  1:27                                     ` subashab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201118125406.GA2029@willie-the-truck \
    --to=will@kernel.org \
    --cc=fw@strlen.de \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=peterz@infradead.org \
    --cc=stranche@codeaurora.org \
    --cc=subashab@codeaurora.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.