linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* net: BUG still has locks held in unix_stream_splice_read
@ 2016-10-09 10:05 Dmitry Vyukov
  2016-10-09 10:06 ` Dmitry Vyukov
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2016-10-09 10:05 UTC (permalink / raw)
  To: Al Viro, David Miller, Hannes Frederic Sowa, Eric Dumazet, netdev, LKML

Hello,

While running syzkaller fuzzer on commit
b66484cd74706fa8681d051840fe4b18a3da40ff (Oct 7), I am getting:

[ BUG: syz-executor/15138 still has locks held! ]
4.8.0+ #29 Not tainted
-------------------------------------
1 lock held by syz-executor/15138:
 #0:  (&pipe->mutex/1){+.+.+.}, at: [<     inline     >]
pipe_lock_nested fs/pipe.c:66
 #0:  (&pipe->mutex/1){+.+.+.}, at: [<ffffffff81844c8b>]
pipe_lock+0x5b/0x70 fs/pipe.c:74

stack backtrace:
CPU: 1 PID: 15138 Comm: syz-executor Not tainted 4.8.0+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
 ffff880044d4fa38 ffffffff82d383c9 ffffffff00000000 fffffbfff1097248
 ffff88005a44a3c0 ffff88005a44a3c0 dffffc0000000000 ffff88005a44a3c0
 ffff8800541fb9b8 ffff880044d4fa58 ffffffff81463cd5 0000000000000000
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82d383c9>] dump_stack+0x12e/0x185 lib/dump_stack.c:51
 [<     inline     >] print_held_locks_bug kernel/locking/lockdep.c:4296
 [<ffffffff81463cd5>] debug_check_no_locks_held+0x125/0x140
kernel/locking/lockdep.c:4302
 [<     inline     >] try_to_freeze include/linux/freezer.h:65
 [<     inline     >] freezer_count include/linux/freezer.h:127
 [<     inline     >] freezable_schedule_timeout include/linux/freezer.h:192
 [<     inline     >] unix_stream_data_wait net/unix/af_unix.c:2223
 [<ffffffff860bae67>] unix_stream_read_generic+0x1317/0x1b70
net/unix/af_unix.c:2332
 [<ffffffff860bb81b>] unix_stream_splice_read+0x15b/0x1d0
net/unix/af_unix.c:2506
 [<ffffffff85afc56e>] sock_splice_read+0xbe/0x100 net/socket.c:775
 [<ffffffff818d121f>] do_splice_to+0x10f/0x170 fs/splice.c:908
 [<     inline     >] do_splice fs/splice.c:1196
 [<     inline     >] SYSC_splice fs/splice.c:1420
 [<ffffffff818d6aec>] SyS_splice+0x114c/0x15b0 fs/splice.c:1403
 [<ffffffff86da6d05>] entry_SYSCALL_64_fastpath+0x23/0xc6


I suspect this is:

commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Sep 17 21:02:10 2016 -0400
    skb_splice_bits(): get rid of callback
    since pipe_lock is the outermost now, we don't need to drop/regain
    socket locks around the call of splice_to_pipe() from skb_splice_bits(),
    which kills the need to have a socket-specific callback; we can just
    call splice_to_pipe() and be done with that.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: BUG still has locks held in unix_stream_splice_read
  2016-10-09 10:05 net: BUG still has locks held in unix_stream_splice_read Dmitry Vyukov
@ 2016-10-09 10:06 ` Dmitry Vyukov
  2016-10-10  2:46   ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2016-10-09 10:06 UTC (permalink / raw)
  To: Al Viro, David Miller, Hannes Frederic Sowa, Eric Dumazet, netdev, LKML
  Cc: syzkaller

Hello,

While running syzkaller fuzzer on commit
b66484cd74706fa8681d051840fe4b18a3da40ff (Oct 7), I am getting:

[ BUG: syz-executor/15138 still has locks held! ]
4.8.0+ #29 Not tainted
-------------------------------------
1 lock held by syz-executor/15138:
 #0:  (&pipe->mutex/1){+.+.+.}, at: [<     inline     >]
pipe_lock_nested fs/pipe.c:66
 #0:  (&pipe->mutex/1){+.+.+.}, at: [<ffffffff81844c8b>]
pipe_lock+0x5b/0x70 fs/pipe.c:74

stack backtrace:
CPU: 1 PID: 15138 Comm: syz-executor Not tainted 4.8.0+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
 ffff880044d4fa38 ffffffff82d383c9 ffffffff00000000 fffffbfff1097248
 ffff88005a44a3c0 ffff88005a44a3c0 dffffc0000000000 ffff88005a44a3c0
 ffff8800541fb9b8 ffff880044d4fa58 ffffffff81463cd5 0000000000000000
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82d383c9>] dump_stack+0x12e/0x185 lib/dump_stack.c:51
 [<     inline     >] print_held_locks_bug kernel/locking/lockdep.c:4296
 [<ffffffff81463cd5>] debug_check_no_locks_held+0x125/0x140
kernel/locking/lockdep.c:4302
 [<     inline     >] try_to_freeze include/linux/freezer.h:65
 [<     inline     >] freezer_count include/linux/freezer.h:127
 [<     inline     >] freezable_schedule_timeout include/linux/freezer.h:192
 [<     inline     >] unix_stream_data_wait net/unix/af_unix.c:2223
 [<ffffffff860bae67>] unix_stream_read_generic+0x1317/0x1b70
net/unix/af_unix.c:2332
 [<ffffffff860bb81b>] unix_stream_splice_read+0x15b/0x1d0
net/unix/af_unix.c:2506
 [<ffffffff85afc56e>] sock_splice_read+0xbe/0x100 net/socket.c:775
 [<ffffffff818d121f>] do_splice_to+0x10f/0x170 fs/splice.c:908
 [<     inline     >] do_splice fs/splice.c:1196
 [<     inline     >] SYSC_splice fs/splice.c:1420
 [<ffffffff818d6aec>] SyS_splice+0x114c/0x15b0 fs/splice.c:1403
 [<ffffffff86da6d05>] entry_SYSCALL_64_fastpath+0x23/0xc6


I suspect this is:

commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Sep 17 21:02:10 2016 -0400
    skb_splice_bits(): get rid of callback
    since pipe_lock is the outermost now, we don't need to drop/regain
    socket locks around the call of splice_to_pipe() from skb_splice_bits(),
    which kills the need to have a socket-specific callback; we can just
    call splice_to_pipe() and be done with that.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: BUG still has locks held in unix_stream_splice_read
  2016-10-09 10:06 ` Dmitry Vyukov
@ 2016-10-10  2:46   ` Al Viro
  2016-10-10  3:14     ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2016-10-10  2:46 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, Hannes Frederic Sowa, Eric Dumazet, netdev, LKML,
	syzkaller

On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
> I suspect this is:
> 
> commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Sat Sep 17 21:02:10 2016 -0400
>     skb_splice_bits(): get rid of callback
>     since pipe_lock is the outermost now, we don't need to drop/regain
>     socket locks around the call of splice_to_pipe() from skb_splice_bits(),
>     which kills the need to have a socket-specific callback; we can just
>     call splice_to_pipe() and be done with that.

Unlikely, since that particular commit removes unlocking/relocking ->iolock
around the call of splice_to_pipe().  Original would've retaken the same
lock on the way out; it's not as if we could leave the syscall there.

It might be splice-related, but I don't believe that you've got the right
commit here.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: BUG still has locks held in unix_stream_splice_read
  2016-10-10  2:46   ` Al Viro
@ 2016-10-10  3:14     ` Al Viro
  2016-10-10  8:01       ` Dmitry Vyukov
  2016-11-17 21:44       ` Cong Wang
  0 siblings, 2 replies; 8+ messages in thread
From: Al Viro @ 2016-10-10  3:14 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, Hannes Frederic Sowa, Eric Dumazet, netdev, LKML,
	syzkaller

On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
> > I suspect this is:
> > 
> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
> > Author: Al Viro <viro@zeniv.linux.org.uk>
> > Date:   Sat Sep 17 21:02:10 2016 -0400
> >     skb_splice_bits(): get rid of callback
> >     since pipe_lock is the outermost now, we don't need to drop/regain
> >     socket locks around the call of splice_to_pipe() from skb_splice_bits(),
> >     which kills the need to have a socket-specific callback; we can just
> >     call splice_to_pipe() and be done with that.
> 
> Unlikely, since that particular commit removes unlocking/relocking ->iolock
> around the call of splice_to_pipe().  Original would've retaken the same
> lock on the way out; it's not as if we could leave the syscall there.
> 
> It might be splice-related, but I don't believe that you've got the right
> commit here.

It's not that commit, all right - it's "can't call unix_stream_read_generic()
with any locks held" stepped onto a couple of commits prior by
"splice: lift pipe_lock out of splice_to_pipe()".  Could somebody explain
what is that about?

E.g what will happen if some code does a read on AF_UNIX socket with
some local mutex held?  AFAICS, there are exactly two callers of
freezable_schedule_timeout() - this one and one in XFS; the latter is
in a kernel thread where we do have good warranties about the locking
environment, but here it's in the bleeding ->recvmsg/->splice_read and
for those assumption that caller doesn't hold any locks is pretty
strong, especially since it's not documented anywhere.

What's going on there?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: BUG still has locks held in unix_stream_splice_read
  2016-10-10  3:14     ` Al Viro
@ 2016-10-10  8:01       ` Dmitry Vyukov
  2016-11-17 18:02         ` Dmitry Vyukov
  2016-11-17 21:44       ` Cong Wang
  1 sibling, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2016-10-10  8:01 UTC (permalink / raw)
  To: Al Viro
  Cc: David Miller, Hannes Frederic Sowa, Eric Dumazet, netdev, LKML,
	syzkaller

On Mon, Oct 10, 2016 at 5:14 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
>> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
>> > I suspect this is:
>> >
>> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
>> > Author: Al Viro <viro@zeniv.linux.org.uk>
>> > Date:   Sat Sep 17 21:02:10 2016 -0400
>> >     skb_splice_bits(): get rid of callback
>> >     since pipe_lock is the outermost now, we don't need to drop/regain
>> >     socket locks around the call of splice_to_pipe() from skb_splice_bits(),
>> >     which kills the need to have a socket-specific callback; we can just
>> >     call splice_to_pipe() and be done with that.
>>
>> Unlikely, since that particular commit removes unlocking/relocking ->iolock
>> around the call of splice_to_pipe().  Original would've retaken the same
>> lock on the way out; it's not as if we could leave the syscall there.
>>
>> It might be splice-related, but I don't believe that you've got the right
>> commit here.
>
> It's not that commit

It's highly likely. Sorry for falsely pointing to your commit.


> , all right - it's "can't call unix_stream_read_generic()
> with any locks held" stepped onto a couple of commits prior by
> "splice: lift pipe_lock out of splice_to_pipe()".  Could somebody explain
> what is that about?
>
> E.g what will happen if some code does a read on AF_UNIX socket with
> some local mutex held?  AFAICS, there are exactly two callers of
> freezable_schedule_timeout() - this one and one in XFS; the latter is
> in a kernel thread where we do have good warranties about the locking
> environment, but here it's in the bleeding ->recvmsg/->splice_read and
> for those assumption that caller doesn't hold any locks is pretty
> strong, especially since it's not documented anywhere.
>
> What's going on there?

I never saw that warning before. There is some possibility that fuzzer
has discovered some new paths, but it's much more likely that
something has changed recently (the stack looks quite simple -- just a
splice from unix socket). And my previous pull was like a week ago.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: BUG still has locks held in unix_stream_splice_read
  2016-10-10  8:01       ` Dmitry Vyukov
@ 2016-11-17 18:02         ` Dmitry Vyukov
  0 siblings, 0 replies; 8+ messages in thread
From: Dmitry Vyukov @ 2016-11-17 18:02 UTC (permalink / raw)
  To: Al Viro
  Cc: David Miller, Hannes Frederic Sowa, Eric Dumazet, netdev, LKML,
	syzkaller

On Mon, Oct 10, 2016 at 10:01 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Oct 10, 2016 at 5:14 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
>>> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
>>> > I suspect this is:
>>> >
>>> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
>>> > Author: Al Viro <viro@zeniv.linux.org.uk>
>>> > Date:   Sat Sep 17 21:02:10 2016 -0400
>>> >     skb_splice_bits(): get rid of callback
>>> >     since pipe_lock is the outermost now, we don't need to drop/regain
>>> >     socket locks around the call of splice_to_pipe() from skb_splice_bits(),
>>> >     which kills the need to have a socket-specific callback; we can just
>>> >     call splice_to_pipe() and be done with that.
>>>
>>> Unlikely, since that particular commit removes unlocking/relocking ->iolock
>>> around the call of splice_to_pipe().  Original would've retaken the same
>>> lock on the way out; it's not as if we could leave the syscall there.
>>>
>>> It might be splice-related, but I don't believe that you've got the right
>>> commit here.
>>
>> It's not that commit
>
> It's highly likely. Sorry for falsely pointing to your commit.
>
>
>> , all right - it's "can't call unix_stream_read_generic()
>> with any locks held" stepped onto a couple of commits prior by
>> "splice: lift pipe_lock out of splice_to_pipe()".  Could somebody explain
>> what is that about?
>>
>> E.g what will happen if some code does a read on AF_UNIX socket with
>> some local mutex held?  AFAICS, there are exactly two callers of
>> freezable_schedule_timeout() - this one and one in XFS; the latter is
>> in a kernel thread where we do have good warranties about the locking
>> environment, but here it's in the bleeding ->recvmsg/->splice_read and
>> for those assumption that caller doesn't hold any locks is pretty
>> strong, especially since it's not documented anywhere.
>>
>> What's going on there?
>
> I never saw that warning before. There is some possibility that fuzzer
> has discovered some new paths, but it's much more likely that
> something has changed recently (the stack looks quite simple -- just a
> splice from unix socket). And my previous pull was like a week ago.

Ping. Just hit it again on 4.9-rc5


[ BUG: syz-executor/15922 still has locks held! ]
4.9.0-rc5+ #43 Not tainted
-------------------------------------
1 lock held by syz-executor/15922:
 #0: [ 1441.143288]  (
[<     inline     >] pipe_lock_nested fs/pipe.c:66
[<ffffffff81a8d72b>] pipe_lock+0x5b/0x70 fs/pipe.c:74

stack backtrace:
CPU: 3 PID: 15922 Comm: syz-executor Not tainted 4.9.0-rc5+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88004a98f568 ffffffff834c2a19 ffffffff00000003 1ffff10009531e40
 ffffed0009531e38 0000000041b58ab3 ffffffff895758b0 ffffffff834c272b
 0000000000000003 ffff880035256640 0000000000000003 ffff88006d122cd8
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff834c2a19>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [<     inline     >] print_held_locks_bug kernel/locking/lockdep.c:4296
 [<ffffffff81560245>] debug_check_no_locks_held+0x125/0x140
kernel/locking/lockdep.c:4302
 [<     inline     >] try_to_freeze include/linux/freezer.h:65
 [<     inline     >] freezer_count include/linux/freezer.h:127
 [<     inline     >] freezable_schedule_timeout include/linux/freezer.h:192
 [<ffffffff8716ed5d>] unix_stream_data_wait+0x4fd/0x910 net/unix/af_unix.c:2223
 [<ffffffff871749a2>] unix_stream_read_generic+0x11e2/0x2240
net/unix/af_unix.c:2332
 [<ffffffff87175c7f>] unix_stream_splice_read+0x27f/0x400
net/unix/af_unix.c:2506
 [<ffffffff86a6578e>] sock_splice_read+0xbe/0x100 net/socket.c:772
 [<ffffffff81b3f90f>] do_splice_to+0x10f/0x170 fs/splice.c:897
 [<     inline     >] do_splice fs/splice.c:1185
 [<     inline     >] SYSC_splice fs/splice.c:1409
 [<ffffffff81b4683a>] SyS_splice+0xfaa/0x16a0 fs/splice.c:1392
 [<ffffffff88147905>] entry_SYSCALL_64_fastpath+0x23/0xc6

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: BUG still has locks held in unix_stream_splice_read
  2016-10-10  3:14     ` Al Viro
  2016-10-10  8:01       ` Dmitry Vyukov
@ 2016-11-17 21:44       ` Cong Wang
  2016-11-17 22:27         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 8+ messages in thread
From: Cong Wang @ 2016-11-17 21:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Dmitry Vyukov, David Miller, Hannes Frederic Sowa, Eric Dumazet,
	netdev, LKML, syzkaller, Colin Cross, Mandeep Singh Baines

On Sun, Oct 9, 2016 at 8:14 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> E.g what will happen if some code does a read on AF_UNIX socket with
> some local mutex held?  AFAICS, there are exactly two callers of
> freezable_schedule_timeout() - this one and one in XFS; the latter is
> in a kernel thread where we do have good warranties about the locking
> environment, but here it's in the bleeding ->recvmsg/->splice_read and
> for those assumption that caller doesn't hold any locks is pretty
> strong, especially since it's not documented anywhere.
>
> What's going on there?

Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably correct
at that time, but later, commit 2b514574f7e88c8498027ee366
("net: af_unix: implement splice for stream af_unix sockets") breaks its
requirement for a freezable sleep:

    commit 0f9548ca10916dec166eaf74c816bded7d8e611d

    lockdep: check that no locks held at freeze time

    We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
    deadlock if the lock is later acquired in the suspend or hibernate path
    (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
    cgroup_freezer if a lock is held inside a frozen cgroup that is later
    acquired by a process outside that group.

So probably we just need to revert commit 2b15af6f95 now.

I am going to send a revert for at least -net and -stable, since Dmitry
saw this warning again.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: BUG still has locks held in unix_stream_splice_read
  2016-11-17 21:44       ` Cong Wang
@ 2016-11-17 22:27         ` Hannes Frederic Sowa
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Frederic Sowa @ 2016-11-17 22:27 UTC (permalink / raw)
  To: Cong Wang, Al Viro
  Cc: Dmitry Vyukov, David Miller, Eric Dumazet, netdev, LKML,
	syzkaller, Colin Cross, Mandeep Singh Baines

On 17.11.2016 22:44, Cong Wang wrote:
> On Sun, Oct 9, 2016 at 8:14 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> E.g what will happen if some code does a read on AF_UNIX socket with
>> some local mutex held?  AFAICS, there are exactly two callers of
>> freezable_schedule_timeout() - this one and one in XFS; the latter is
>> in a kernel thread where we do have good warranties about the locking
>> environment, but here it's in the bleeding ->recvmsg/->splice_read and
>> for those assumption that caller doesn't hold any locks is pretty
>> strong, especially since it's not documented anywhere.
>>
>> What's going on there?
> 
> Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
> converts schedule_timeout() to its freezable version, it was probably correct
> at that time, but later, commit 2b514574f7e88c8498027ee366
> ("net: af_unix: implement splice for stream af_unix sockets") breaks its
> requirement for a freezable sleep:
> 
>     commit 0f9548ca10916dec166eaf74c816bded7d8e611d
> 
>     lockdep: check that no locks held at freeze time
> 
>     We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
>     deadlock if the lock is later acquired in the suspend or hibernate path
>     (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
>     cgroup_freezer if a lock is held inside a frozen cgroup that is later
>     acquired by a process outside that group.
> 
> So probably we just need to revert commit 2b15af6f95 now.
> 
> I am going to send a revert for at least -net and -stable, since Dmitry
> saw this warning again.

I am not an expert on freezing but this looks around right from the
freezer code. Awesome, thanks a lot for spotting this one!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-11-17 22:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-09 10:05 net: BUG still has locks held in unix_stream_splice_read Dmitry Vyukov
2016-10-09 10:06 ` Dmitry Vyukov
2016-10-10  2:46   ` Al Viro
2016-10-10  3:14     ` Al Viro
2016-10-10  8:01       ` Dmitry Vyukov
2016-11-17 18:02         ` Dmitry Vyukov
2016-11-17 21:44       ` Cong Wang
2016-11-17 22:27         ` Hannes Frederic Sowa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).