All of lore.kernel.org
 help / color / mirror / Atom feed
* [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
@ 2015-01-14 21:13 Jeff Layton
  2015-01-14 22:48 ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2015-01-14 21:13 UTC (permalink / raw)
  To: netdev; +Cc: Johannes Berg

While running the trinity fuzzer in a KVM guest, I ^C'ed it at an
(apparent) inopportune moment, and saw a bunch of these WARN_ONs pop:

[ 2375.793396] ------------[ cut here ]------------
[ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
[ 2375.793417] Modules linked in: af_key crypto_user nfnetlink vmw_vsock_vmci_transport vmw_vmci vsock nfc af_802154 ieee802154 bluetooth rfkill pppoe pppox ppp_generic slhc irda crc_ccitt rds rose atm netrom appletalk ipx p8023 p8022 psnap ax25 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm iosf_mbi crct10dif_pclmul snd_timer crc32_pclmul ppdev crc32c_intel snd ghash_clmulni_intel pa
 rport_pc joydev parport serio_raw pcspkr virtio_balloon soundcore virtio_console pvpanic i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm virtio_pci ata_generic virtio_ring drm pata_acpi virtio floppy
[ 2375.793424] CPU: 0 PID: 1149 Comm: trinity-main Tainted: G        W      3.19.0-rc4+ #26
[ 2375.793424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[ 2375.793426]  0000000000000000 000000002a81588d ffff8800b932bd18 ffffffff8186adb8
[ 2375.793427]  0000000000000000 0000000000000000 ffff8800b932bd58 ffffffff810ac9da
[ 2375.793428]  ffffffff81f8c940 0000000000000000 ffffffff81f8c940 000000000000000f
[ 2375.793429] Call Trace:
[ 2375.793431]  [<ffffffff8186adb8>] dump_stack+0x4c/0x65
[ 2375.793432]  [<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
[ 2375.793433]  [<ffffffff810acb0a>] warn_slowpath_null+0x1a/0x20
[ 2375.793435]  [<ffffffff8175e580>] genl_unbind+0xc0/0xd0
[ 2375.793436]  [<ffffffff8175cbdd>] netlink_release+0x19d/0x3f0
[ 2375.793437]  [<ffffffff81700aaf>] sock_release+0x1f/0x90
[ 2375.793439]  [<ffffffff81700b32>] sock_close+0x12/0x20
[ 2375.793440]  [<ffffffff81275098>] __fput+0xf8/0x220
[ 2375.793441]  [<ffffffff8127520e>] ____fput+0xe/0x10
[ 2375.793443]  [<ffffffff810d29f4>] task_work_run+0xb4/0xe0
[ 2375.793444]  [<ffffffff810b040d>] do_exit+0x37d/0xcf0
[ 2375.793445]  [<ffffffff810b0e24>] do_group_exit+0x54/0xe0
[ 2375.793447]  [<ffffffff810b0ec4>] SyS_exit_group+0x14/0x20
[ 2375.793448]  [<ffffffff81874ff5>] tracesys_phase2+0xd4/0xd9
[ 2375.793449] ---[ end trace be07601dc6d6ab5c ]---

trinity was run with no arguments. I had it happen a couple of times
while running it, but it doesn't happen every time. It seems to be
somewhat contingent on signaling it at the right time.

-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
  2015-01-14 21:13 [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0() Jeff Layton
@ 2015-01-14 22:48 ` Johannes Berg
  2015-01-14 23:25   ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2015-01-14 22:48 UTC (permalink / raw)
  To: Jeff Layton; +Cc: netdev

On Wed, 2015-01-14 at 16:13 -0500, Jeff Layton wrote:
> While running the trinity fuzzer in a KVM guest, I ^C'ed it at an
> (apparent) inopportune moment, and saw a bunch of these WARN_ONs pop:

A bunch?!

> [ 2375.793396] ------------[ cut here ]------------
> [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()

This warning is supposed to happen only when you somehow manage to
unsubscribe from a generic netlink group that doesn't actually exist, or
so.

Do you have any sort of log/trace of what trinity actually did at/before
this point?

johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
  2015-01-14 22:48 ` Johannes Berg
@ 2015-01-14 23:25   ` Johannes Berg
  2015-01-15  2:20     ` Jeff Layton
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2015-01-14 23:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: netdev

On Wed, 2015-01-14 at 23:48 +0100, Johannes Berg wrote:

> > [ 2375.793396] ------------[ cut here ]------------
> > [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
> 
> This warning is supposed to happen only when you somehow manage to
> unsubscribe from a generic netlink group that doesn't actually exist, or
> so.

Ok - after long deliberation I found a way to trigger it. It requires
that you leave a multicast group (likely by destroying a socket) at the
same time as the kernel unregisters the generic netlink group. I have no
idea what generic netlink group you might be using here, but I could
reproduce it with a strategically placed delay in the netlink code and
the nl80211 genl group by opening a socket, closing the socket, and
removing the cfg80211 module (to unregister the nl80211 genl group)
while the socket was still being closed.

I'll think about a fix tomorrow - it doesn't seem trivial due to
possible locking concerns.

On the bright side, I cannot see a way to reproduce this without
removing the genl family at the same time - which is good because it
means that I've just again audited the case I was worried about (the
bind/unbind not being symmetric) - it is asymmetric but only in the case
of genl family removal which seems reasonable (but I should document
it.)

johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
  2015-01-14 23:25   ` Johannes Berg
@ 2015-01-15  2:20     ` Jeff Layton
  2015-01-15  8:37       ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2015-01-15  2:20 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Jeff Layton, netdev

On Thu, 15 Jan 2015 00:25:46 +0100
Johannes Berg <johannes@sipsolutions.net> wrote:

> On Wed, 2015-01-14 at 23:48 +0100, Johannes Berg wrote:
> 
> > > [ 2375.793396] ------------[ cut here ]------------
> > > [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
> > 
> > This warning is supposed to happen only when you somehow manage to
> > unsubscribe from a generic netlink group that doesn't actually exist, or
> > so.
> 
> Ok - after long deliberation I found a way to trigger it. It requires
> that you leave a multicast group (likely by destroying a socket) at the
> same time as the kernel unregisters the generic netlink group. I have no
> idea what generic netlink group you might be using here, but I could
> reproduce it with a strategically placed delay in the netlink code and
> the nl80211 genl group by opening a socket, closing the socket, and
> removing the cfg80211 module (to unregister the nl80211 genl group)
> while the socket was still being closed.
> 
> I'll think about a fix tomorrow - it doesn't seem trivial due to
> possible locking concerns.
> 
> On the bright side, I cannot see a way to reproduce this without
> removing the genl family at the same time - which is good because it
> means that I've just again audited the case I was worried about (the
> bind/unbind not being symmetric) - it is asymmetric but only in the case
> of genl family removal which seems reasonable (but I should document
> it.)
> 
> johannes
> 

Cool.

FWIW, it popped around a dozen times or so?

Unfortunately, I didn't save the logs from the run. I'll try to
reproduce it again tomorrow (and save the logs this time), but I don't
see it every time.

-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
  2015-01-15  2:20     ` Jeff Layton
@ 2015-01-15  8:37       ` Johannes Berg
       [not found]         ` <20150115071426.563573e7@tlielax.poochiereds.net>
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2015-01-15  8:37 UTC (permalink / raw)
  To: Jeff Layton; +Cc: netdev

On Wed, 2015-01-14 at 21:20 -0500, Jeff Layton wrote:

> > Ok - after long deliberation I found a way to trigger it. It requires
> > that you leave a multicast group (likely by destroying a socket) at the
> > same time as the kernel unregisters the generic netlink group. I have no
> > idea what generic netlink group you might be using here, but I could
> > reproduce it with a strategically placed delay in the netlink code and
> > the nl80211 genl group by opening a socket, closing the socket, and
> > removing the cfg80211 module (to unregister the nl80211 genl group)
> > while the socket was still being closed.
> > 
> > I'll think about a fix tomorrow - it doesn't seem trivial due to
> > possible locking concerns.

> FWIW, it popped around a dozen times or so?

Yeah it would pop up for every multicast group that any socket you owned
while closing the program (which of course closes the sockets) would
have opened on that genl family.

The thing that confuses me is how you managed to unregister a genl
family at literally the same time, but I cannot find - from code review
- a way to trigger it without that. If the family goes away cleanly
before then the groups of all open sockets are cleared so it can't
happen, and if the family is still alive when the socket is closed then
of course it also can't happen.

> Unfortunately, I didn't save the logs from the run. I'll try to
> reproduce it again tomorrow (and save the logs this time), but I don't
> see it every time.

If you do manage to do that it would be great to confirm that it is
indeed the scenario I found (and reproduced).

Thanks,
johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
       [not found]           ` <1421324985.1962.5.camel@sipsolutions.net>
@ 2015-01-15 12:49             ` Jeff Layton
  2015-01-15 13:01               ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2015-01-15 12:49 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Jeff Layton, netdev

On Thu, 15 Jan 2015 13:29:45 +0100
Johannes Berg <johannes@sipsolutions.net> wrote:

> On Thu, 2015-01-15 at 07:14 -0500, Jeff Layton wrote:
> 
> > I was able to reproduce it again this morning. This time it generated
> > 28 warnings (all with similar stack traces to the one I posted
> > yesterday).
> 
> Thanks. I just took a look at this, and it seems that sadly it doesn't
> include netlink messages in the log? At least I didn't find them - so I
> don't really know what happened. What I was thinking happened would
> (mostly) play out over netlink messages.
> 

(re-cc'ing netdev -- I sent Johannes the log in a private email as
 it was 12M compressed)

It may have. I just recently started playing with trinity, so I don't
know much about what it does. I do see these sorts of messages in the
logs that imply that it's opening netlink sockets:

[main] fd[661] = domain:16 (PF_NETLINK) type:0x2 protocol:10
[main] fd[675] = domain:16 (PF_NETLINK) type:0x2 protocol:0

...and then it does random I/Os on those fds.

> > I haven't tested your patchset out yet, but I can try to do that
> > later.
> 
> You won't be able to reproduce the warning with it since I removed it
> there. I'm reasonably certain though that you triggered one of the two
> cases I found, more likely the former that doesn't involve removing
> genetlink families (don't see how you could have done that with trinity
> unless it randomly loads and unloads modules?)
> 

Ok, sounds good.

FWIW, it tries to call delete_module, but I was running this as an
unprivileged user so I don't think that can succeed.

-- 
Jeff Layton <jlayton@primarydata.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0()
  2015-01-15 12:49             ` Jeff Layton
@ 2015-01-15 13:01               ` Johannes Berg
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2015-01-15 13:01 UTC (permalink / raw)
  To: Jeff Layton; +Cc: netdev

On Thu, 2015-01-15 at 07:49 -0500, Jeff Layton wrote:

> It may have. I just recently started playing with trinity, so I don't
> know much about what it does. I do see these sorts of messages in the
> logs that imply that it's opening netlink sockets:
> 
> [main] fd[661] = domain:16 (PF_NETLINK) type:0x2 protocol:10
> [main] fd[675] = domain:16 (PF_NETLINK) type:0x2 protocol:0
> 
> ...and then it does random I/Os on those fds.

Yes. We're looking for generic netlink, so one of these (protocol 16):

[main] fd[427] = domain:16 (PF_NETLINK) type:0x3 protocol:16
[main] fd[442] = domain:16 (PF_NETLINK) type:0x2 protocol:16
[main] fd[774] = domain:16 (PF_NETLINK) type:0x2 protocol:16

However, I didn't see much happening on these fds, and even if it
wouldn't have recorded the kind of messages.

OTOH, multicast group stuff is done with setsockopt() and/or bind, which
I didn't really find here?

> FWIW, it tries to call delete_module, but I was running this as an
> unprivileged user so I don't think that can succeed.

Ok.

johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-01-15 13:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-14 21:13 [ 2375.793397] WARNING: CPU: 0 PID: 1149 at net/netlink/genetlink.c:1037 genl_unbind+0xc0/0xd0() Jeff Layton
2015-01-14 22:48 ` Johannes Berg
2015-01-14 23:25   ` Johannes Berg
2015-01-15  2:20     ` Jeff Layton
2015-01-15  8:37       ` Johannes Berg
     [not found]         ` <20150115071426.563573e7@tlielax.poochiereds.net>
     [not found]           ` <1421324985.1962.5.camel@sipsolutions.net>
2015-01-15 12:49             ` Jeff Layton
2015-01-15 13:01               ` Johannes Berg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.