From: Cong Wang
Subject: Re: [Patch net 00/15] net_sched: remove RCU callbacks from TC
Date: Wed, 25 Oct 2017 15:37:40 -0700
To: David Miller
Cc: Linux Kernel Network Developers, "Paul E. McKenney", Jamal Hadi Salim,
    John Fastabend, Chris Mi
References: <20171023220304.2268-1-xiyou.wangcong@gmail.com>
    <20171025.104353.1317026584880877750.davem@davemloft.net>
In-Reply-To: <20171025.104353.1317026584880877750.davem@davemloft.net>

On Tue, Oct 24, 2017 at 6:43 PM, David Miller wrote:
> From: Cong Wang
> Date: Mon, 23 Oct 2017 15:02:49 -0700
>
>> Recently, the RCU callbacks used in TC filters and TC actions keep
>> drawing my attention; they introduce at least 4 race condition bugs:
>
> Like Eric, I think doing a full RCU sync on every delete is too big
> a pill to swallow.  This is a major control plane performance
> regression.
>
> Please find another reasonable way to fix this.
>

Alright... I finally found a way to make everyone happy.

My solution is to introduce a workqueue for tc filters and let each RCU
callback defer its work to this workqueue. I solve the flush_workqueue()
deadlock by queuing another work item on the same workqueue at the end,
so the execution order stays the same as it is now. The ugly part is now
tcf_block_put(), which looks like this:

static void tcf_block_put_final(struct work_struct *work)
{
	struct tcf_block *block = container_of(work, struct tcf_block, work);
	struct tcf_chain *chain, *tmp;

	/* At this point, all the chains should have refcnt == 1. */
	rtnl_lock();
	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
		tcf_chain_put(chain);
	rtnl_unlock();
	kfree(block);
}

static void tcf_block_put_deferred(struct work_struct *work)
{
	struct tcf_block *block = container_of(work, struct tcf_block, work);
	struct tcf_chain *chain;

	rtnl_lock();
	/* Hold a refcnt for all chains, except 0, in case they are gone. */
	list_for_each_entry(chain, &block->chain_list, list)
		if (chain->index)
			tcf_chain_hold(chain);

	/* No race on the list, because no chain could be destroyed. */
	list_for_each_entry(chain, &block->chain_list, list)
		tcf_chain_flush(chain);

	INIT_WORK(&block->work, tcf_block_put_final);
	/* Wait for RCU callbacks to release the reference count and make
	 * sure their works have been queued before this.
	 */
	rcu_barrier();
	tcf_queue_work(&block->work);
	rtnl_unlock();
}

void tcf_block_put(struct tcf_block *block)
{
	if (!block)
		return;

	INIT_WORK(&block->work, tcf_block_put_deferred);
	/* Wait for existing RCU callbacks to cool down, make sure their works
	 * have been queued before this. We cannot flush pending works here
	 * because we are holding the RTNL lock.
	 */
	rcu_barrier();
	tcf_queue_work(&block->work);
}

Paul, does this make any sense to you? ;)
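
For anyone reading along, below is a minimal sketch of what the shared
workqueue and the tcf_queue_work() helper referenced above could look
like, plus how a filter's RCU callback would defer its teardown to it.
The workqueue name, the init helper, and the foo_filter example are
illustrative assumptions only, not the actual patch:

#include <linux/init.h>
#include <linux/workqueue.h>
#include <linux/rtnetlink.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* One workqueue shared by all tc filters (name assumed). */
static struct workqueue_struct *tc_filter_wq;

static int __init tc_filter_wq_init(void)
{
	/* Ordered: work items run in the order they are queued, which is
	 * what the tcf_block_put_deferred()/tcf_block_put_final() two-stage
	 * scheme above relies on.
	 */
	tc_filter_wq = alloc_ordered_workqueue("tc_filter_workqueue", 0);
	return tc_filter_wq ? 0 : -ENOMEM;
}

/* Helper used by the code above: queue a work item on the shared wq. */
bool tcf_queue_work(struct work_struct *work)
{
	return queue_work(tc_filter_wq, work);
}

/* Hypothetical filter: its RCU callback no longer frees anything that
 * needs RTNL; it only schedules a work item, and the work item may then
 * take rtnl_lock() safely.
 */
struct foo_filter {
	struct work_struct work;
	struct rcu_head rcu;
	/* ... filter state ... */
};

static void foo_filter_free_work(struct work_struct *work)
{
	struct foo_filter *f = container_of(work, struct foo_filter, work);

	rtnl_lock();
	/* release actions, drop module refs, etc. */
	rtnl_unlock();
	kfree(f);
}

static void foo_filter_free_rcu(struct rcu_head *head)
{
	struct foo_filter *f = container_of(head, struct foo_filter, rcu);

	INIT_WORK(&f->work, foo_filter_free_work);
	tcf_queue_work(&f->work);
}

/* Delete path (after unlinking the filter from its RCU-visible list):
 *	call_rcu(&f->rcu, foo_filter_free_rcu);
 */

The design note, under the same assumptions: because the final free runs
from process context on an ordered workqueue instead of from the RCU
callback, it may sleep and take RTNL, while the queuing order preserves
the relative ordering the callers expect today.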