From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 305F1C433F5 for ; Tue, 28 Aug 2018 00:03:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 856E0208B9 for ; Tue, 28 Aug 2018 00:03:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 856E0208B9 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ZenIV.linux.org.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727199AbeH1DwK (ORCPT ); Mon, 27 Aug 2018 23:52:10 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:39378 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726994AbeH1DwJ (ORCPT ); Mon, 27 Aug 2018 23:52:09 -0400 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.87 #1 (Red Hat Linux)) id 1fuRTS-0000kN-Kd; Tue, 28 Aug 2018 00:03:10 +0000 Date: Tue, 28 Aug 2018 01:03:10 +0100 From: Al Viro To: Cong Wang Cc: Jamal Hadi Salim , Kees Cook , LKML , Jiri Pirko , David Miller , Linux Kernel Network Developers Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL Message-ID: <20180828000310.GE6515@ZenIV.linux.org.uk> References: <20180826055801.GA42063@beast> <20180826061534.GT6515@ZenIV.linux.org.uk> <20180826173236.GU6515@ZenIV.linux.org.uk> <20180826225749.GY6515@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.1 (2017-09-22) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote: > > I cant think of any challenges. Cong/Jiri? Would it require development > > time classifiers/actions/qdiscs to sit in that directory (I suspect you > > dont want them in include/net). > > BTW, the idea of improving grep-ability of the code by prefixing the > > ops appropriately makes sense. i.e we should have ops->cls_init, > > ops->act_init etc. > > Hmm? Isn't struct tcf_proto_ops used and must be provided > by each tc filter module? How does it work if you move it into > net/sched/* for out-of-tree modules? Are they supposed to > include "..../net/sched/tcf_proto.h"?? Or something else? If you care about out-of-tree modules, that could easily live in include/net/tcf_proto.h, provided that it's not pulled by indirect includes into hell knows how many places. Try make allmodconfig make >/dev/null 2>&1 find -name '.*.cmd'|xargs grep sch_generic.h That finds 2977 files here, most of them having nothing to do with net/sched. > BTW, we need some grep tool that really understands C syntax, > not making each variable friendly to plain grep. This isn't the matter of C syntax; it needs to handle C typization, and you really can't do that anywhere near reliably without looking at preprocessor output. Which very much depends upon .config... BTW, something odd in cls_u32.c: what happens if we have the following graph: tcf_proto , it's ->data being and ->root - tc_u_common , in its ->hlist , in its ->ht[0] and set ->ht_down in to the ? AFAICS, there's nothing to prevent that - TCA_U32_LINK being 0x80000000 will do just that. What happens upon u32_destroy() in that case? Unless I'm misreading that code, refcounts will be : 1 : 2 : 1 and in u32_destroy() we'll get this: root_ht = tp_c = if (root_ht && --root_ht->refcnt == 0) u32_destroy_hnode(tp, root_ht, extack); decrements refcnt to 1 and does nothing else. if (--tp_c->refcnt == 0) { is satisfied hlist_del(&tp_c->hnode); unhashed while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) { we take ht = u32_clear_hnode(tp, ht, extack); which does for (h = 0; h <= ht->divisor; h++) { while ((n = rtnl_dereference(ht->ht[h])) != NULL) { n = RCU_INIT_POINTER(ht->ht[h], rtnl_dereference(n->next)); remove from ->ht[0] tcf_unbind_filter(tp, &n->res); u32_remove_hw_knode(tp, n, extack); idr_remove(&ht->handle_idr, n->handle); if (tcf_exts_get_net(&n->exts)) tcf_queue_work(&n->rwork, u32_delete_key_freepf_work); else u32_destroy_key(n->tp, n, true); ... and we hit u32_destroy_key(, , true), which does struct tc_u_hnode *ht = rtnl_dereference(n->ht_down); ht = tcf_exts_destroy(&n->exts); tcf_exts_put_net(&n->exts); if (ht && --ht->refcnt == 0) kfree(ht); *NOW* ->refcnt is 0, and we free the damn thing. .... kfree(n); is freed and we return to u32_destroy_hnode() where we see that there's nothing else left in ->ht[...] and return to u32_destroy(). Where RCU_INIT_POINTER(tp_c->hlist, ht->next); sets ->hlist to ->next, aka . Which is already freed. /* u32_destroy_key() will later free ht for us, if it's * still referenced by some knode */ if (--ht->refcnt == 0) kfree_rcu(ht, rcu); ->refcnt reaches 0 and we free it (RCU-delayed) } ... and we go for the next iteration, this time with ht = . Doing all kinds of unsanitary things to the memory it used to occupy... Incidentally, if we hit tcf_queue_work(&n->rwork, u32_delete_key_freepf_work); instead of u32_destroy_key(), the things don't seem to be any better - we won't do anything to until rtnl is dropped, so u32_destroy() won't break on the second pass through the loop - it'll free there and return. Setting us up for trouble, since when u32_delete_key_freepf_work() finally gets to u32_destroy_key() we'll have ->ht_down pointing to freed memory and decrementing its contents... What am I missing in there? Is it just "we should never have ->ht_down pointing to anyone's ->root"? If so, I'm not sure how to detect that; if not... what should happen to the orphaned root_ht? Should it remain on the list? We might have two tcf_proto sharing tp->data, so tp_c and its list might very well survive the u32_destroy()... Note, BTW, that if we do leave the orphan on the list and later change the tc_u_knode so that ->ht_down doesn't point to that thing anymore, we'll get its refcount incremented to 2 in u32_init_knode(), then decremented to 1 by u32_set_parms() and then arrange for u32_delete_key_work() to be run. Which will drive the refcount to 0 and free the damn thing. While it's still in the middle of ->hlist...