From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.5 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS, URIBL_BLOCKED,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66D8EC43381 for ; Fri, 15 Feb 2019 11:30:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 28EBB2177B for ; Fri, 15 Feb 2019 11:30:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="59rWB3as" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2394279AbfBOLaw (ORCPT ); Fri, 15 Feb 2019 06:30:52 -0500 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:50309 "EHLO out2-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391129AbfBOLaw (ORCPT ); Fri, 15 Feb 2019 06:30:52 -0500 Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id 0684A222F2; Fri, 15 Feb 2019 06:30:49 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Fri, 15 Feb 2019 06:30:49 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm2; bh=vTpSqF Xg5i5DE6RNv39Uu9D6IjZb1WZSZfPq0WJnQ5g=; b=59rWB3asZzXMs1SkyqA7YS NP7ZBu4wwpEgwUYqC47lk35S5hg/TON9GQomNaArdXwsRBttiMxP8Y2sc7okpnwS vrvnsTFxEZNLvTFrFuUT0vkgVdB8xqSkN2BLcr7JH9I9e4lpzP4woXrMeggcC970 KqMcSd5VOWxfqAHB0aQSJy+s+0XoJqcdEX+052an8kQoycLZipQcuQzoARFW4bKL EfoSSCg/BJlzjiCyP4fxMcWqWZ30gSGZSeRKTW5xNZJGN6cLn1+XQ2fLIUhnXyYC QJrHKjrOJu+8N3YmtYgibF58mvHwVM7XKmAubfYJvvnZxeWxuyiJwv3vAc/vfSSw == X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedtledruddtjedgvdekucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfquhhtnecuuegrihhlohhuthemucef tddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpeffhffvuffkfh ggtggujggfsehttdertddtredvnecuhfhrohhmpefkughoucfutghhihhmmhgvlhcuoehi ughoshgthhesihguohhstghhrdhorhhgqeenucfkphepjeelrddujeeirdejrdduudehne curfgrrhgrmhepmhgrihhlfhhrohhmpehiughoshgthhesihguohhstghhrdhorhhgnecu vehluhhsthgvrhfuihiivgeptd X-ME-Proxy: Received: from localhost (bzq-79-176-7-115.red.bezeqint.net [79.176.7.115]) by mail.messagingengine.com (Postfix) with ESMTPA id 0EC0010312; Fri, 15 Feb 2019 06:30:43 -0500 (EST) Date: Fri, 15 Feb 2019 13:30:41 +0200 From: Ido Schimmel To: Vlad Buslov Cc: "netdev@vger.kernel.org" , "jhs@mojatatu.com" , "xiyou.wangcong@gmail.com" , "jiri@resnulli.us" , "davem@davemloft.net" , "ast@kernel.org" , "daniel@iogearbox.net" Subject: Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex Message-ID: <20190215113041.GA10511@splinter> References: <20190211085548.7190-1-vladbu@mellanox.com> <20190211085548.7190-8-vladbu@mellanox.com> <20190214182442.GA19269@splinter> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Fri, Feb 15, 2019 at 10:02:11AM +0000, Vlad Buslov wrote: > > On Thu 14 Feb 2019 at 18:24, Ido Schimmel wrote: > > On Mon, Feb 11, 2019 at 10:55:38AM +0200, Vlad Buslov wrote: > >> Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain > >> when accessing filter_chain list, instead of relying on rtnl lock. > >> Dereference filter_chain with tcf_chain_dereference() lockdep macro to > >> verify that all users of chain_list have the lock taken. > >> > >> Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute > >> all necessary code while holding chain lock in order to prevent > >> invalidation of chain_info structure by potential concurrent change. This > >> also serializes calls to tcf_chain0_head_change(), which allows head change > >> callbacks to rely on filter_chain_lock for synchronization instead of rtnl > >> mutex. > >> > >> Signed-off-by: Vlad Buslov > >> Acked-by: Jiri Pirko > > > > With this sequence [1] I get the following trace [2]. Bisected it to > > this patch. Note that second filter is rejected by the device driver > > (that's the intention). I guess it is not properly removed from the > > filter chain in the error path? > > > > Thanks > > > > [1] > > # tc qdisc add dev swp3 clsact > > # tc filter add dev swp3 ingress pref 1 matchall skip_sw \ > > action mirred egress mirror dev swp7 > > # tc filter add dev swp3 ingress pref 2 matchall skip_sw \ > > action mirred egress mirror dev swp7 > > RTNETLINK answers: File exists > > We have an error talking to the kernel, -1 > > # tc qdisc del dev swp3 clsact > > > > [2] > > [ 70.545131] kasan: GPF could be caused by NULL-ptr deref or user memory access > > [ 70.553394] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI > > [ 70.560618] CPU: 2 PID: 2268 Comm: tc Not tainted 5.0.0-rc5-custom-02103-g415d39427317 #1304 > > [ 70.570065] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 > > [ 70.580204] RIP: 0010:mall_reoffload+0x14a/0x760 > > [ 70.585382] Code: c1 0f 85 ba 05 00 00 31 c0 4d 8d 6c 24 34 b9 06 00 00 00 4c 89 ff f3 48 ab 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 14 02 4c 89 e8 83 > > e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 bd > > [ 70.606382] RSP: 0018:ffff888231faefc0 EFLAGS: 00010207 > > [ 70.612235] RAX: dffffc0000000000 RBX: 1ffff110463f5dfe RCX: 0000000000000000 > > [ 70.620220] RDX: 0000000000000006 RSI: 1ffff110463f5e01 RDI: ffff888231faf040 > > [ 70.628206] RBP: ffff8881ef151a00 R08: 0000000000000000 R09: ffffed10463f5dfa > > [ 70.636192] R10: ffffed10463f5dfa R11: 0000000000000003 R12: 0000000000000000 > > [ 70.644177] R13: 0000000000000034 R14: 0000000000000000 R15: ffff888231faf010 > > [ 70.652163] FS: 00007f46b5bf0800(0000) GS:ffff888236c00000(0000) knlGS:0000000000000000 > > [ 70.661218] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 70.667649] CR2: 0000000001d590a8 CR3: 0000000231c3c000 CR4: 00000000001006e0 > > [ 70.675633] Call Trace: > > [ 70.693046] tcf_block_playback_offloads+0x94/0x230 > > [ 70.710617] __tcf_block_cb_unregister+0xf7/0x2d0 > > [ 70.734143] mlxsw_sp_setup_tc+0x20f/0x660 > > [ 70.738739] tcf_block_offload_unbind+0x22b/0x350 > > [ 70.748898] __tcf_block_put+0x203/0x630 > > [ 70.769700] tcf_block_put_ext+0x2f/0x40 > > [ 70.774098] clsact_destroy+0x7a/0xb0 > > [ 70.782604] qdisc_destroy+0x11a/0x5c0 > > [ 70.786810] qdisc_put+0x5a/0x70 > > [ 70.790435] notify_and_destroy+0x8e/0xa0 > > [ 70.794933] qdisc_graft+0xbb7/0xef0 > > [ 70.809009] tc_get_qdisc+0x518/0xa30 > > [ 70.821530] rtnetlink_rcv_msg+0x397/0xa20 > > [ 70.835510] netlink_rcv_skb+0x132/0x380 > > [ 70.848419] netlink_unicast+0x4c0/0x690 > > [ 70.866019] netlink_sendmsg+0x929/0xe10 > > [ 70.882134] sock_sendmsg+0xc8/0x110 > > [ 70.886144] ___sys_sendmsg+0x77a/0x8f0 > > [ 70.924064] __sys_sendmsg+0xf7/0x250 > > [ 70.955727] do_syscall_64+0x14d/0x610 > > Hi Ido, > > Thanks for reporting this. > > I looked at the code and problem seems to be matchall classifier > specific. My implementation of unlocked cls API assumes that concurrent > insertions are possible and checks for it when deleting "empty" tp. > Since classifiers don't expose number of elements, the only way to test > this is to do tp->walk() on them and assume that walk callback is called > once per filter on every classifier. In your example new tp is created > for second filter, filter insertion fails, number of elements on newly > created tp is checked with tp->walk() before deleting it. However, > matchall classifier always calls the tp->walk() callback once, even when > it doesn't have a valid filter (in this case with NULL filter pointer). > > Possible ways to fix this: > > 1) Check for NULL filter pointer in tp->walk() callback and ignore it > when counting filters on tp - will work but I don't like it because I > don't think it is ever correct to call tp->walk() callback with NULL > filter pointer. > > 2) Fix matchall to not call tp->walk() callback with NULL filter > pointers - my preferred simple solution. > > 3) Extend tp API to have direct way to count filters by implementing > tp->count - requires change to every classifier. Or maybe add it as > optional API that only unlocked classifiers are required to implement > and just delete rtnl lock dependent tp's without checking for concurrent > insertions. > > What do you think? Since the problem is matchall-specific, then it makes sense to fix it in matchall like you suggested in option #2. Can you please use this opportunity and audit other classifiers to confirm problem is indeed specific to matchall? In any case, feel free to send me a patch and I'll test it. Thanks