linux-kernel.vger.kernel.org archive mirror
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Kees Cook <keescook@chromium.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Jiri Pirko <jiri@resnulli.us>, David Miller <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL
Date: Tue, 28 Aug 2018 01:03:10 +0100
Message-ID: <20180828000310.GE6515@ZenIV.linux.org.uk>
In-Reply-To: <CAM_iQpVEyq9hR3bbOtLFKoLo6nHCtiL6A__uEz3JdDO79GF_8A@mail.gmail.com>

On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote:
> > I can't think of any challenges. Cong/Jiri? Would it require development
> > time classifiers/actions/qdiscs to sit in that directory (I suspect you
> > don't want them in include/net).
> > BTW, the idea of improving grep-ability of the code by prefixing the
> > ops appropriately makes sense, i.e. we should have ops->cls_init,
> > ops->act_init etc.
> 
> Hmm? Isn't struct tcf_proto_ops used and must be provided
> by each tc filter module? How does it work if you move it into
> net/sched/* for out-of-tree modules? Are they supposed to
> include "..../net/sched/tcf_proto.h"?? Or something else?

If you care about out-of-tree modules, that could easily live in
include/net/tcf_proto.h, provided that it's not pulled by indirect
includes into hell knows how many places.  Try
	make allmodconfig
	make >/dev/null 2>&1
	find -name '.*.cmd'|xargs grep sch_generic.h

That finds 2977 files here, most of them having nothing to do with
net/sched.

> BTW, we need some grep tool that really understands C syntax,
> not making each variable friendly to plain grep.

This isn't a matter of C syntax; it needs to handle C typing,
and you really can't do that anywhere near reliably without looking
at preprocessor output.  Which very much depends upon .config...

BTW, something odd in cls_u32.c: what happens if we have the following
graph:
tcf_proto <tp>, its ->data being <c0> and ->root being <ht0>
tc_u_common <c0>, in its ->hlist
	<ht1>, in its ->ht[0]
		<knode>
	<ht0>
and set ->ht_down in <knode> to <ht0>?  AFAICS,
there's nothing to prevent that - TCA_U32_LINK being
0x80000000 will do just that.  What happens upon u32_destroy()
in that case?  Unless I'm misreading that code, refcounts will be
	<c0>:	1
	<ht0>:	2
	<ht1>:	1
and in u32_destroy() we'll get this:
	root_ht = <ht0>
	tp_c = <c0>
        if (root_ht && --root_ht->refcnt == 0)
                u32_destroy_hnode(tp, root_ht, extack);
decrements refcnt to 1 and does nothing else.
        if (--tp_c->refcnt == 0) {
is satisfied
                hlist_del(&tp_c->hnode);
<c0> unhashed
                while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
we take ht = <ht1>
                        u32_clear_hnode(tp, ht, extack);
which does
        for (h = 0; h <= ht->divisor; h++) {
                while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
n = <knode>
                        RCU_INIT_POINTER(ht->ht[h],
                                         rtnl_dereference(n->next));
remove <knode> from <ht1>->ht[0]
                        tcf_unbind_filter(tp, &n->res);
                        u32_remove_hw_knode(tp, n, extack);
                        idr_remove(&ht->handle_idr, n->handle);
                        if (tcf_exts_get_net(&n->exts))
                                tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
                        else
                                u32_destroy_key(n->tp, n, true);
... and we hit u32_destroy_key(<tp>, <knode>, true), which does
        struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
ht = <ht0>
        tcf_exts_destroy(&n->exts);
        tcf_exts_put_net(&n->exts);
        if (ht && --ht->refcnt == 0)
                kfree(ht);
*NOW* <ht0>->refcnt is 0, and we free the damn thing.
	....
	kfree(n);
<knode> is freed and we return to u32_clear_hnode() where we
see that there's nothing else left in <ht1>->ht[...] and return
to u32_destroy().  Where
                        RCU_INIT_POINTER(tp_c->hlist, ht->next);
sets <c0>->hlist to <ht1>->next, aka <ht0>.  Which is already freed.

                        /* u32_destroy_key() will later free ht for us, if it's
                         * still referenced by some knode
                         */
                        if (--ht->refcnt == 0)
                                kfree_rcu(ht, rcu);
<ht1>->refcnt reaches 0 and we free it (RCU-delayed)
                }
... and we go for the next iteration, this time with ht = <ht0>.
Doing all kinds of unsanitary things to the memory it used to occupy...
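
The walk above can be replayed in userspace.  What follows is only a
sketch, not kernel code: the struct and function names are made up for
illustration, a "freed" flag stands in for kfree()/kfree_rcu(), and each
hash table is assumed to have a single bucket.  It counts how many times
the loop touches an hnode after it has been marked freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for tc_u_hnode / tc_u_knode. */
struct knode;
struct hnode {
	const char *name;
	int refcnt;
	bool freed;		/* set instead of calling kfree() */
	struct hnode *next;	/* link in tc_u_common->hlist */
	struct knode *bucket;	/* models ht->ht[0] */
};
struct knode {
	struct hnode *ht_down;
	bool freed;
};

static int uaf_count;	/* touches of already-"freed" hnodes */

static void touch(struct hnode *ht)
{
	if (ht->freed) {
		uaf_count++;
		printf("use-after-free: %s\n", ht->name);
	}
}

/* models u32_destroy_key(): drop the ->ht_down reference, free the knode */
static void destroy_key(struct knode *n)
{
	struct hnode *ht = n->ht_down;

	assert(!n->freed);
	if (ht) {
		touch(ht);
		if (--ht->refcnt == 0)
			ht->freed = true;
	}
	n->freed = true;
}

/* models u32_clear_hnode() on the synchronous path */
static void clear_hnode(struct hnode *ht)
{
	struct knode *n = ht->bucket;

	if (n) {
		ht->bucket = NULL;
		destroy_key(n);
	}
}

/* models u32_destroy() for the graph described above */
int model_destroy(void)
{
	struct hnode ht0 = { "ht0", 2, false, NULL, NULL };
	struct knode kn = { &ht0, false };
	struct hnode ht1 = { "ht1", 1, false, &ht0, &kn };
	struct hnode *hlist = &ht1, *root_ht = &ht0, *ht;
	int c0_refcnt = 1;

	uaf_count = 0;
	if (root_ht && --root_ht->refcnt == 0)
		root_ht->freed = true;	/* not taken: refcnt goes 2 -> 1 */
	if (--c0_refcnt == 0) {
		while ((ht = hlist) != NULL) {
			touch(ht);
			clear_hnode(ht);
			hlist = ht->next;
			if (--ht->refcnt == 0)
				ht->freed = true;
		}
	}
	return uaf_count;
}
```

With the refcounts listed above (<c0>: 1, <ht0>: 2, <ht1>: 1),
model_destroy() returns 1: the second pass through the loop picks up
<ht0> after destroy_key() has already marked it freed.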

Incidentally, if we hit
                                tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
instead of u32_destroy_key(), things don't seem to be any better - we
won't do anything to <knode> until rtnl is dropped, so u32_destroy() won't
break on the second pass through the loop - it'll free <ht0> there and
return.  Setting us up for trouble, since when u32_delete_key_freepf_work()
finally gets to u32_destroy_key() we'll have <knode>->ht_down pointing
to freed memory and decrementing its contents...
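
The deferred variant can be modelled the same way.  Again a sketch under
assumptions: the d_* names are hypothetical, a "freed" flag replaces the
real frees, and the queued work is simply run after the loop to stand in
for "after rtnl is dropped".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for tc_u_hnode / tc_u_knode. */
struct d_knode;
struct d_hnode {
	int refcnt;
	bool freed;		/* set instead of calling kfree() */
	struct d_hnode *next;	/* link in tc_u_common->hlist */
	struct d_knode *bucket;	/* models ht->ht[0] */
};
struct d_knode {
	struct d_hnode *ht_down;
};

struct d_result {
	int uaf_in_loop;	/* freed hnodes touched by the destroy loop */
	int uaf_in_work;	/* freed hnodes touched by the deferred work */
};

struct d_result model_deferred_destroy(void)
{
	struct d_result r = { 0, 0 };
	struct d_hnode ht0 = { 2, false, NULL, NULL };
	struct d_knode kn = { &ht0 };
	struct d_hnode ht1 = { 1, false, &ht0, &kn };
	struct d_hnode *hlist = &ht1, *ht;
	struct d_knode *queued = NULL;

	--ht0.refcnt;		/* root_ht drop in u32_destroy(): 2 -> 1 */

	while ((ht = hlist) != NULL) {
		if (ht->freed)
			r.uaf_in_loop++;
		if (ht->bucket) {	/* models tcf_queue_work(): defer the knode */
			queued = ht->bucket;
			ht->bucket = NULL;
		}
		hlist = ht->next;
		if (--ht->refcnt == 0)
			ht->freed = true;
	}

	/* rtnl dropped; the work finally reaches the u32_destroy_key() step */
	if (queued) {
		struct d_hnode *down = queued->ht_down;

		if (down->freed)
			r.uaf_in_work++;
		--down->refcnt;
	}
	return r;
}
```

In this model the loop itself finishes cleanly (uaf_in_loop is 0), and
the single bad access shows up only when the deferred work runs
(uaf_in_work is 1).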

What am I missing in there?  Is it just "we should never have ->ht_down
pointing to anyone's ->root"?  If so, I'm not sure how to detect that;
if not... what should happen to the orphaned root_ht?  Should it
remain on the list?  We might have two tcf_proto sharing tp->data,
so tp_c and its list might very well survive the u32_destroy()...

Note, BTW, that if we do leave the orphan on the list and later
change the tc_u_knode so that ->ht_down doesn't point to that
thing anymore, we'll get its refcount incremented to 2 in
u32_init_knode(), then decremented to 1 by u32_set_parms() and
then arrange for u32_delete_key_work() to be run.  Which will 
drive the refcount to 0 and free the damn thing.  While it's
still in the middle of ->hlist...
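
The refcount arithmetic of that last scenario, as a sketch.  The o_hnode
type and the function name are hypothetical, and the comments only name
the kernel function each step is meant to stand for; nothing here calls
real kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an orphaned tc_u_hnode left on ->hlist. */
struct o_hnode {
	int refcnt;
	bool freed;		/* set instead of calling kfree() */
	struct o_hnode *next;
};

/* Returns true if a node still linked on the list ends up "freed". */
bool model_orphan_relink(void)
{
	/* the only remaining reference is the knode's ->ht_down */
	struct o_hnode orphan = { 1, false, NULL };
	struct o_hnode *hlist = &orphan;
	struct o_hnode *ht;

	orphan.refcnt++;		/* u32_init_knode(): copy keeps ->ht_down */
	orphan.refcnt--;		/* u32_set_parms(): copy's ->ht_down replaced */
	if (--orphan.refcnt == 0)	/* u32_delete_key_work(): old knode goes away */
		orphan.freed = true;

	for (ht = hlist; ht != NULL; ht = ht->next)
		if (ht->freed)
			return true;
	return false;
}
```

The count goes 1, 2, 1, 0, so the function returns true: the orphan is
marked freed while hlist still points at it.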
