From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81799C433DF for ; Tue, 13 Oct 2020 10:08:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 47F42207C3 for ; Tue, 13 Oct 2020 10:08:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728974AbgJMKIF (ORCPT ); Tue, 13 Oct 2020 06:08:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727894AbgJMKIF (ORCPT ); Tue, 13 Oct 2020 06:08:05 -0400 Received: from orbyte.nwl.cc (orbyte.nwl.cc [IPv6:2001:41d0:e:133a::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A67C5C0613D0 for ; Tue, 13 Oct 2020 03:08:04 -0700 (PDT) Received: from n0-1 by orbyte.nwl.cc with local (Exim 4.94) (envelope-from ) id 1kSHDv-0007eo-5w; Tue, 13 Oct 2020 12:08:03 +0200 Date: Tue, 13 Oct 2020 12:08:03 +0200 From: Phil Sutter To: Pablo Neira Ayuso Cc: netfilter-devel@vger.kernel.org, Florian Westphal Subject: Re: [iptables PATCH 3/3] nft: Fix for concurrent noflush restore calls Message-ID: <20201013100803.GW13016@orbyte.nwl.cc> Mail-Followup-To: Phil Sutter , Pablo Neira Ayuso , netfilter-devel@vger.kernel.org, Florian Westphal References: <20201005144858.11578-1-phil@nwl.cc> <20201005144858.11578-4-phil@nwl.cc> <20201012125450.GA26934@salvia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20201012125450.GA26934@salvia> Sender: Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org On Mon, Oct 12, 2020 at 02:54:50PM +0200, Pablo Neira Ayuso wrote: [...] > > - if (h->family == NFPROTO_BRIDGE) > > - nftnl_chain_set_u32(c, NFTNL_CHAIN_POLICY, NF_ACCEPT); > > + list = nft_chain_list_get(h, table, chain); > > + if (list) > > + nftnl_chain_list_add(c, list); > > + } else { > > + /* If the chain should vanish meanwhile, kernel genid changes > > + * and the transaction is refreshed enabling the chain add > > + * object. With the handle still set, kernel interprets it as a > > + * chain replace job and errors since it is not found anymore. > > + */ > > + nftnl_chain_unset(c, NFTNL_CHAIN_HANDLE); > > + } > > > > - if (!created) > > - return 1; > > + __nft_rule_flush(h, table, chain, false, created); > > I like this trick. > > If I understood correct, you always place an "add chain" command in > first place before the flush, whether the chain exists or not, so the > flush always succeeds. Yes, this was already done for "add table". So I'm "merely" extending Florian's refresh transaction logic with regards to 'skip' flag. > > - if (!batch_chain_add(h, NFT_COMPAT_CHAIN_USER_ADD, c)) > > + obj = batch_chain_add(h, NFT_COMPAT_CHAIN_USER_ADD, c); > > + if (!obj) > > return 0; > > > > - list = nft_chain_list_get(h, table, chain); > > - if (list) > > - nftnl_chain_list_add(c, list); > > + obj->skip = !created; > > > > /* the core expects 1 for success and 0 for error */ > > return 1; > > @@ -2649,11 +2649,6 @@ static void nft_refresh_transaction(struct nft_handle *h) > > h->error.lineno = 0; > > > > list_for_each_entry_safe(n, tmp, &h->obj_list, head) { > > - if (n->implicit) { > > - batch_obj_del(h, n); > > - continue; > > - } > > - > > switch (n->type) { > > case NFT_COMPAT_TABLE_FLUSH: > > tablename = nftnl_table_get_str(n->table, NFTNL_TABLE_NAME); > > @@ -2679,14 +2674,22 @@ static void nft_refresh_transaction(struct nft_handle *h) > > > > c = nft_chain_find(h, tablename, chainname); > > if (c) { > > - /* -restore -n flushes existing rules from redefined user-chain */ > > - __nft_rule_flush(h, tablename, > > - chainname, false, true); > > n->skip = 1; > > } else if (!c) { > > n->skip = 0; > > } > > break; > > + case NFT_COMPAT_RULE_FLUSH: > > + tablename = nftnl_rule_get_str(n->rule, NFTNL_RULE_TABLE); > > + if (!tablename) > > + continue; > > + > > + chainname = nftnl_rule_get_str(n->rule, NFTNL_RULE_CHAIN); > > + if (!chainname) > > + continue; > > + > > + n->skip = !nft_chain_find(h, tablename, chainname); > > So n->skip equals true if the chain does not exists in the cache, so > this flush does not fail (after this chain is gone because someone > else have remove it). Yes, the tricky bit is being able to adjust the batch object list no matter how kernel ruleset changes. The only way is to assume the most overhead and temporary disable individual jobs if not needed. This way we can enable/disable while refreshing the transaction. > Patch LGTM, thanks Phil. > > What I don't clearly see yet is what scenario is triggering the bug in > the existing code, if you don't mind to explain. See the test case attached to the patch: An other iptables-restore process may add references (i.e., jumps) to a chain the own iptables-restore process wants to delete. This should not be a problem because these references are added to a chain that is being flushed by the own process as well. But if that chain doesn't exist while the own process fetches kernel's ruleset, this flush job is not created. Cheers, Phil