From: Aaron Conole <aconole@bytheb.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Westphal <fw@strlen.de>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jens Axboe <axboe@fb.com>, "Ted Ts'o" <tytso@mit.edu>,
	Christoph Lameter <cl@linux.com>,
	David Miller <davem@davemloft.net>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	NetFilter <netfilter-devel@vger.kernel.org>
Subject: Re: slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))
Date: Mon, 10 Oct 2016 15:18:13 -0400
Message-ID: <f7td1j8xbbe.fsf@redhat.com>
In-Reply-To: <CA+55aFy0szySf+SnysjXTyfiU=RMBo9U1sHAVaTKG=tUTF+XGw@mail.gmail.com> (Linus Torvalds's message of "Mon, 10 Oct 2016 12:05:17 -0700")

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Oct 10, 2016 at 9:28 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So as I already answered to Dave, I'm not actually sure that this was
>> the buggy code, or that my patch would make any difference at all.
>
> My patch does seem to fix things, and in fact the warning about "hook
> not found" now triggers.
>
> So I think the bug really was that the singly-linked list handling
> code did not correctly handle the case of not finding the entry, and
> then freed (incorrectly) the last one that wasn't actually unlinked.
>
> In fact, I get quite a few warnings (56 total) about 30 seconds after
> logging in:
>
> [   54.213170] WARNING: CPU: 1 PID: 111 at net/netfilter/core.c:151
> nf_unregister_net_hook+0x8e/0x170
> ... repeat 54 times ...
> [   54.445520] WARNING: CPU: 7 PID: 111 at net/netfilter/core.c:151
> nf_unregister_net_hook+0x8e/0x170
>
> and looking in the journal, the first one is (again) immediately
> preceded by that systemd-hostnamed service stopping:
>
>   Oct 10 11:45:47 i7 audit[1546]: USER_LOGIN
>   ...
>   Oct 10 11:46:11 i7 audit[1]: SERVICE_STOP pid=1 uid=0
> auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
> msg='unit=fprintd comm="systemd" exe="/usr/lib/systemd/systemd"
> hostname=? addr=? terminal=? res=success'
>   Oct 10 11:46:13 i7 pulseaudio[1697]: [pulseaudio] bluez5-util.c:
> GetManagedObjects() failed: org.freedesktop.DBus.Error.NoReply: Did
> not receive a reply. Possible causes include: the remote application
> did not send a reply, the message bus security policy blocked the
> reply, the reply timeout expir
>   Oct 10 11:46:13 i7 dbus-daemon[1003]: [system] Failed to activate
> service 'org.bluez': timed out
>   Oct 10 11:46:20 i7 audit[1]: SERVICE_STOP pid=1 uid=0
> auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
> msg='unit=systemd-hostnamed comm="systemd"
> exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
> res=success'
>   Oct 10 11:46:20 i7 kernel: ------------[ cut here ]------------
>   Oct 10 11:46:20 i7 kernel: WARNING: CPU: 1 PID: 111 at
> net/netfilter/core.c:151 nf_unregister_net_hook+0x8e/0x170
>
> so I do think it's something to do with some network startup service
> thing (perhaps dhcp, perhaps chrome, who knows) as I do my initial
> login.
>
> David - I think that also explains what was wrong with the old code.
> In the old code, this loop:
>
>         while (hooks_entry && nf_entry_dereference(hooks_entry->next)) {
>
> would exit with "hooks_entry" pointing to the last list entry (because
> ->next was NULL). Nothing was ever unlinked in the loop itself,
> because it never actually found a matching entry, but then after the
> loop it would free that last entry because it *thought* that was the
> match.
>
> My list rewrite fixes that.
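
The failure mode described above can be sketched in isolation. This is a
minimal, simplified illustration and not the code from net/netfilter/core.c:
`struct entry`, `find_buggy()` and `unlink_entry()` are hypothetical names,
and the pointer-to-pointer walk is only one common way to get the not-found
case right, not necessarily what the actual patch does.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hook list entry; not the kernel's struct. */
struct entry {
    int id;
    struct entry *next;
};

/*
 * Buggy pattern: keep walking while there is a next entry. If nothing
 * matches, the loop stops with 'e' on the last entry, and the caller
 * cannot tell that apart from a real match.
 */
static struct entry *find_buggy(struct entry *head, int id)
{
    struct entry *e = head;

    while (e && e->next) {
        if (e->id == id)
            break;
        e = e->next;
    }
    return e;   /* on a miss this is the last entry, still linked */
}

/*
 * Pointer-to-pointer walk: 'pp' always points at the link that would
 * have to be rewritten. Returns the unlinked entry, or NULL when the
 * id is not on the list, so the caller can warn instead of freeing.
 */
static struct entry *unlink_entry(struct entry **head, int id)
{
    struct entry **pp;

    for (pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            struct entry *found = *pp;

            *pp = found->next;
            found->next = NULL;
            return found;
        }
    }
    return NULL;
}
```

With a three-entry list and an id that is not present, `find_buggy()` hands
back the tail entry, which a caller would then free while it is still
linked. `unlink_entry()` returns NULL in that case and leaves the list
untouched.
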
>
> Anyway, I'm assuming it will come to me from the networking tree after
> more testing by the maintainers. You can add my
>
>   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> to the patch, though.
>
> David, if you want me to just commit that thing directly, I can
> obviously do so, but I do think somebody should look at
>
>  (a) that I actually got the priority list ordering right on the
>  insertion side

It looks correct.

Reviewed-by: Aaron Conole <aconole@bytheb.org>
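
For reference, the ordering property in question can be sketched on a plain
singly-linked list. This is a hedged illustration only: `struct prio_entry`
and `insert_by_priority()` are hypothetical names, and the sketch assumes
ascending priority order with a new entry placed after existing entries of
equal priority.

```c
#include <stddef.h>

/* Hypothetical entry type; the real hook entries carry more state. */
struct prio_entry {
    int priority;
    struct prio_entry *next;
};

/*
 * Insert 'new_entry' before the first entry with a strictly larger
 * priority. Assumption: lower values run first, and entries with equal
 * priority keep their registration order.
 */
static void insert_by_priority(struct prio_entry **head,
                               struct prio_entry *new_entry)
{
    struct prio_entry **pp;

    for (pp = head; *pp; pp = &(*pp)->next) {
        if (new_entry->priority < (*pp)->priority)
            break;
    }
    new_entry->next = *pp;
    *pp = new_entry;
}
```

Walking with a pointer to the link, rather than to the entry, means that
inserting at the head, in the middle, or at the tail all go through the
same two assignments.
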

>  (b) what it is that makes it try to unregister that hook that isn't
> on the list in the first place

This is still a problem, I think.  I wasn't able to reproduce the issue
on a Fedora 23 VM.  My Fedora 24 bare-metal system does trigger it,
though.  I'm not sure what changed on the userspace/kernel interaction
side (not an excuse, just an observation).

> but on the whole I consider this issue explained and solved. I'll
> continue to run with my patch on my machine (just not committed).

Okay.  Very sorry for this, again.

>               Linus
