All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jens Axboe <axboe@fb.com>, "Ted Ts'o" <tytso@mit.edu>,
	Christoph Lameter <cl@linux.com>,
	David Miller <davem@davemloft.net>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Aaron Conole <aconole@bytheb.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	NetFilter <netfilter-devel@vger.kernel.org>
Subject: Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))
Date: Mon, 10 Oct 2016 02:51:05 +0200	[thread overview]
Message-ID: <20161010005105.GA18349@breakpoint.cc> (raw)
In-Reply-To: <CA+55aFz2U83pG0v12E--nd5c6GZAUpVnT3jHuAwHCFk5XbVX0w@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sun, Oct 9, 2016 at 12:11 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Anyway, I don't think I can bisect it, but I'll try to narrow it down
> > a *bit* at least.
> >
> > Not doing any more pulls on this unstable base, I've been puttering
> > around in trying to clean up some stupid printk logging issues
> > instead.
> 
> So I finally got a oops with slub debugging enabled. It doesn't really
> narrow things down, though, it kind of extends on the possible
> suspects. Now adding David Miller and Pablo, because it looks like it
> may be netfilter that does something bad and corrupts memory.

Quite possible, the netns interactions are not nice :-/

> Without further ado, here's the new oops:
> 
>    general protection fault: 0000 [#1] SMP
>    CPU: 7 PID: 169 Comm: kworker/u16:7 Not tainted 4.8.0-11288-gb66484cd7470 #1
>    Hardware name: System manufacturer System Product Name/Z170-K, BIOS
..
>    Call Trace:
>      netfilter_net_exit+0x2f/0x60
>      ops_exit_list.isra.4+0x38/0x60
>      cleanup_net+0x1ba/0x2a0
>      process_one_work+0x1f1/0x480
>      worker_thread+0x48/0x4d0
>      ? process_one_work+0x480/0x480

..

> like it's a pointer loaded from a free'd allocation.
> 
> The code disassembles to
> 
>    0: 0f b6 ca             movzbl %dl,%ecx
>    3: 48 8d 84 c8 00 01 00 lea    0x100(%rax,%rcx,8),%rax
>    a: 00
>    b: 49 8b 5c c5 00       mov    0x0(%r13,%rax,8),%rbx
>   10: 48 85 db             test   %rbx,%rbx
>   13: 0f 84 cb 00 00 00     je     0xe4
>   19: 4c 3b 63 40           cmp    0x40(%rbx),%r12
>   1d: 48 8b 03             mov    (%rbx),%rax
>   20: 0f 84 e9 00 00 00     je     0x10f
>   26: 48 85 c0             test   %rax,%rax
>   29: 74 26                 je     0x51
>   2b:* 4c 3b 60 40           cmp    0x40(%rax),%r12 <-- trapping instruction
>   2f: 75 08                 jne    0x39
>   31: e9 ef 00 00 00       jmpq   0x125
>   36: 48 89 d8             mov    %rbx,%rax
>   39: 48 8b 18             mov    (%rax),%rbx
>   3c: 48 85 db             test   %rbx,%rbx
> 
> and that oopsing instruction seems to be the compare of
> "hooks_entry->orig_ops" from hooks_entry in this expression:
> 
>         if (hooks_entry && hooks_entry->orig_ops == reg) {
> 
> so hooks_entry() is bogus. It was gotten from
> 
>         hooks_entry = nf_hook_entry_head(net, reg);
> 
> but that's as far as I dug. And yes, I do have
> CONFIG_NETFILTER_INGRESS=y in case that matters.
> 
> And all this code has changed pretty radically in commit e3b37f11e6e4
> ("netfilter: replace list_head with single linked list"), and there
> was clearly already something wrong with that code, with commit
> 5119e4381a90 ("netfilter: Fix potential null pointer dereference")
> adding the test against NULL. But I suspect that only hid the "oops,
> it's actually not NULL, it loaded some uninitialized value" problem.
> 
> Over to the networking guys.. Ideas?

Sorry, not off the top of my head.
Pablo is currently travelling back home from netdev 1.2 in Tokyo,
I can help starting Wednesday when I am back.

One shot in the dark (not even compile tested; wonder if we can end up
zapping bogus hook ...)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index c9d90eb..fd6a2ce 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -189,6 +189,9 @@ void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
 
 unlock:
        mutex_unlock(&nf_hook_mutex);
+
+       WARN_ON(hooks_entry && hooks_entry->orig_ops != reg);
+
        if (!hooks_entry) {
                WARN(1, "nf_unregister_net_hook: hook not found!\n");
                return;

  reply	other threads:[~2016-10-10  0:52 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-09 21:31 slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice)) Linus Torvalds
2016-10-10  0:51 ` Florian Westphal [this message]
2016-10-10  1:35   ` Aaron Conole
2016-10-10  2:49     ` Linus Torvalds
2016-10-10  3:41       ` Linus Torvalds
2016-10-10  3:57         ` slab corruption with current -git David Miller
2016-10-10  8:24           ` David Miller
2016-10-10 16:15             ` Linus Torvalds
2016-10-11 13:17             ` Michal Kubecek
2016-10-11 13:55               ` Aaron Conole
2016-10-10 13:49         ` slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice)) Aaron Conole
2016-10-10 16:28           ` Linus Torvalds
2016-10-10 19:05             ` Linus Torvalds
2016-10-10 19:18               ` Aaron Conole
2016-10-11  0:30               ` slab corruption with current -git David Miller
2016-10-11  0:54                 ` Linus Torvalds
2016-10-11  5:39         ` slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice)) Linus Torvalds
2016-10-11  5:47           ` Linus Torvalds
2016-10-11  8:57             ` slab corruption with current -git David Miller
2016-10-13  6:02               ` Markus Trippelsdorf
2016-10-13  6:06                 ` Markus Trippelsdorf
     [not found]                   ` <CA+55aFwsUR4-YmOYgJOOO4a2e48M4_tk7YhAo4s5KZQQxUjpZw@mail.gmail.com>
2016-10-13  6:27                     ` Markus Trippelsdorf
2016-10-13  6:27                       ` Markus Trippelsdorf
2016-10-13 19:49                       ` Linus Torvalds
2016-10-13 20:43                         ` Florian Westphal
2016-10-13 21:32                         ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161010005105.GA18349@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=aconole@bytheb.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@fb.com \
    --cc=cl@linux.com \
    --cc=davem@davemloft.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.