All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Chris Mi <chrism@mellanox.com>
Subject: Re: [Patch net 00/15] net_sched: remove RCU callbacks from TC
Date: Wed, 25 Oct 2017 17:19:16 -0700	[thread overview]
Message-ID: <20171026001916.GU3659@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAM_iQpVLHLx9eVgO0S--iq-XkhZE82Y30Z0REwU6qx8OGgm+mA@mail.gmail.com>

On Wed, Oct 25, 2017 at 03:37:40PM -0700, Cong Wang wrote:
> On Tue, Oct 24, 2017 at 6:43 PM, David Miller <davem@davemloft.net> wrote:
> > From: Cong Wang <xiyou.wangcong@gmail.com>
> > Date: Mon, 23 Oct 2017 15:02:49 -0700
> >
> >> Recently, the RCU callbacks used in TC filters and TC actions keep
> >> drawing my attention, they introduce at least 4 race condition bugs:
> >
> > Like Eric, I think doing a full RCU sync on every delete is too big
> > a pill to swallow.  This is a major control plane performance
> > regression.
> >
> > Please find another reasonable way to fix this.
> >
> 
> Alright... I finally find a way to make everyone happy.
> 
> My solution is introducing a workqueue for tc filters
> and let each RCU callback defer the work to this
> workqueue. I solve the flush_workqueue() deadlock
> by queuing another work in the same workqueue
> at the end, so the execution order should be as same
> as it is now. The ugly part is now tcf_block_put() which
> looks like below:
> 
> 
> static void tcf_block_put_final(struct work_struct *work)
> {
>         struct tcf_block *block = container_of(work, struct tcf_block, work);
>         struct tcf_chain *chain, *tmp;
> 
>         /* At this point, all the chains should have refcnt == 1. */
>         rtnl_lock();
>         list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
>                 tcf_chain_put(chain);
>         rtnl_unlock();
>         kfree(block);
> }

I am guessing that tcf_chain_put() sometimes does a call_rcu(),
and the callback function in turn calls schedule_work(), and that
tcf_block_put_deferred() is the workqueue handler function.

> static void tcf_block_put_deferred(struct work_struct *work)
> {
>         struct tcf_block *block = container_of(work, struct tcf_block, work);
>         struct tcf_chain *chain;
> 
>         rtnl_lock();
>         /* Hold a refcnt for all chains, except 0, in case they are gone. */
>         list_for_each_entry(chain, &block->chain_list, list)
>                 if (chain->index)
>                         tcf_chain_hold(chain);
> 
>         /* No race on the list, because no chain could be destroyed. */
>         list_for_each_entry(chain, &block->chain_list, list)
>                 tcf_chain_flush(chain);
> 
>         INIT_WORK(&block->work, tcf_block_put_final);
>         /* Wait for RCU callbacks to release the reference count and make
>          * sure their works have been queued before this.
>          */
>         rcu_barrier();

This one can take awhile...  Though in recent kernels it will often
be a bit faster than synchronize_rcu().

Note that rcu_barrier() only waits for callbacks posted via call_rcu()
before the rcu_barrier() starts, if that matters.

>         tcf_queue_work(&block->work);
>         rtnl_unlock();
> }

And it looks like tcf_block_put_deferred() queues itself as work as well.
Or maybe instead?

> void tcf_block_put(struct tcf_block *block)
> {
>         if (!block)
>                 return;
> 
>         INIT_WORK(&block->work, tcf_block_put_deferred);
>         /* Wait for existing RCU callbacks to cool down, make sure their works
>          * have been queued before this. We can not flush pending works here
>          * because we are holding the RTNL lock.
>          */
>         rcu_barrier();
>         tcf_queue_work(&block->work);
> }
> 
> 
> Paul, does this make any sense to you? ;)

 would be surprised if I fully understand the problem to be solved,
but my current guess is that the constraints are as follows:

1.	Things removed must be processed in order.

2.	Things removes must not be processed until a grace period
	has elapsed.

3.	Things being processed after a grace period should not be
	processed concurrently with each other or with subsequent
	removals.

4.	A given removal is not finalized until its reference count
	reaches zero.

5.	RTNL might not be held when the reference count reaches zero.

Or did I lose the thread somewhere?

							Thanx, Paul

  reply	other threads:[~2017-10-26  0:19 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-23 22:02 [Patch net 00/15] net_sched: remove RCU callbacks from TC Cong Wang
2017-10-23 22:02 ` [Patch net 01/15] net_sched: remove RCU callbacks in basic filter Cong Wang
2017-10-23 22:02 ` [Patch net 02/15] net_sched: remove RCU callbacks in bpf filter Cong Wang
2017-10-23 22:02 ` [Patch net 03/15] net_sched: remove RCU callbacks in flower filter Cong Wang
2017-10-23 22:02 ` [Patch net 04/15] net_sched: remove RCU callbacks in matchall filter Cong Wang
2017-10-23 22:02 ` [Patch net 05/15] net_sched: remove RCU callbacks in cgroup filter Cong Wang
2017-10-23 22:02 ` [Patch net 06/15] net_sched: remove RCU callbacks in rsvp filter Cong Wang
2017-10-23 22:02 ` [Patch net 07/15] net_sched: remove RCU callbacks in flow filter Cong Wang
2017-10-23 22:02 ` [Patch net 08/15] net_sched: remove RCU callbacks in tcindex filter Cong Wang
2017-10-23 22:02 ` [Patch net 09/15] net_sched: remove RCU callbacks in u32 filter Cong Wang
2017-10-23 22:02 ` [Patch net 10/15] net_sched: remove RCU callbacks in fw filter Cong Wang
2017-10-23 22:03 ` [Patch net 11/15] net_sched: remove RCU callbacks in route filter Cong Wang
2017-10-23 22:03 ` [Patch net 12/15] net_sched: remove RCU callbacks in sample action Cong Wang
2017-10-23 22:03 ` [Patch net 13/15] net_sched: add rtnl assertion to tcf_exts_destroy() Cong Wang
2017-10-23 22:03 ` [Patch net 14/15] selftests: Introduce a new script to generate tc batch file Cong Wang
2017-10-24  4:56   ` Chris Mi
2017-10-23 22:03 ` [Patch net 15/15] selftests: Introduce a new test case to tc testsuite Cong Wang
2017-10-23 23:16 ` [Patch net 00/15] net_sched: remove RCU callbacks from TC Eric Dumazet
2017-10-23 23:23   ` Cong Wang
2017-10-23 23:31     ` Eric Dumazet
2017-10-25 20:46       ` Cong Wang
2017-10-25  1:43 ` David Miller
2017-10-25 22:37   ` Cong Wang
2017-10-26  0:19     ` Paul E. McKenney [this message]
2017-10-26  4:49       ` Cong Wang
2017-10-26 13:58         ` Paul E. McKenney
2017-10-26 19:10           ` Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171026001916.GU3659@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=chrism@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=jhs@mojatatu.com \
    --cc=john.fastabend@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.