netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 9p/trans_fd lockup
@ 2020-08-06 11:08 Alexey Kardashevskiy
  2020-08-06 12:38 ` Dominique Martinet
  0 siblings, 1 reply; 3+ messages in thread
From: Alexey Kardashevskiy @ 2020-08-06 11:08 UTC (permalink / raw)
  To: v9fs-developer; +Cc: Greg Kurz, Dominique Martinet, netdev

Hi!

I am seeing another bug in 9p under syzkaller, the reprocase is:

r0 = open$dir(&(0x7f0000000040)='./file0\x00', 0x88142, 0x182)

r1 = openat$null(0xffffffffffffff9c, &(0x7f0000000640)='/dev/null\x00',
0x0, 0x0)
mount$9p_fd(0x0, &(0x7f0000000000)='./file0\x00',
&(0x7f00000000c0)='9p\x00', 0x0, &(0x7f0000000100)={'trans=fd,',
{'rfdno', 0x3d, r1}, 0x2$, {'wfdno', 0x3d, r0}})



The default behaviour of syzkaller is to call syscalls concurrently (I
think), at least it forks by default and executes the same sequence in
both threads.

In this example both threads makes it to:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/9p/client.c?h=v5.8#n757

and sit there with the only difference which is thread#1 goes via

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/9p/client.c?h=v5.8#n767

I am pretty sure things should not have gone that far but I cannot
clearly see what needs fixing. Ideas? Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 9p/trans_fd lockup
  2020-08-06 11:08 9p/trans_fd lockup Alexey Kardashevskiy
@ 2020-08-06 12:38 ` Dominique Martinet
  2020-08-07  1:43   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 3+ messages in thread
From: Dominique Martinet @ 2020-08-06 12:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: v9fs-developer, Greg Kurz, netdev

Alexey Kardashevskiy wrote on Thu, Aug 06, 2020:
> I am seeing another bug in 9p under syzkaller, the reprocase is:
> 
> r0 = open$dir(&(0x7f0000000040)='./file0\x00', 0x88142, 0x182)
> 
> r1 = openat$null(0xffffffffffffff9c, &(0x7f0000000640)='/dev/null\x00',
> 0x0, 0x0)
> mount$9p_fd(0x0, &(0x7f0000000000)='./file0\x00',
> &(0x7f00000000c0)='9p\x00', 0x0, &(0x7f0000000100)={'trans=fd,',
> {'rfdno', 0x3d, r1}, 0x2$, {'wfdno', 0x3d, r0}})
> 
> 
> 
> The default behaviour of syzkaller is to call syscalls concurrently (I
> think), at least it forks by default and executes the same sequence in
> both threads.
> 
> In this example both threads makes it to:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/9p/client.c?h=v5.8#n757
> 
> and sit there with the only difference which is thread#1 goes via
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/9p/client.c?h=v5.8#n767
> 
> I am pretty sure things should not have gone that far but I cannot
> clearly see what needs fixing. Ideas? Thanks,

Unkillable threads there happen with the current p9_client_rpc when
there is no real server (or server bug etc); the code is really stupid.

Basically what happens is that when you send a first signal (^C or
whatever), the function catches the signal, sends a flush, and
indefinitely waits for the flush to come back.
If you send another signal there no more flush comes but it goes back to
waiting -- it's using wait_event_killable but it's a lie it's not really
killable it just loops on that wait until the flush finally comes, which
will never come in your case.
(the rpc that came by the way is probably version or whatever is first
done on mount)


Dmitry reported that to me ages ago and I have a fix which is just to
stop waiting for the flush -- just make it asynchronous, send and
forget. That removes the whole signal handling logic and it won't hang
there anymore.

I sent the patches to the list last year, but didn't get much feedback
and didn't have time to run all the tests I wanted to run on it.


I have some free time at the end of the month so I was planning to
finish it for 5.10 (e.g. won't send it for 5.9 but once 5.9 initial
merge window passed leave it in -next for a couple of months and push it
for 5.10), so your timing is pretty good :)
An extra pair of eyes would be more than appreciated.

You can find the original mails there:
https://lore.kernel.org/lkml/1544532108-21689-3-git-send-email-asmadeus@codewreck.org/

They're also in my 9p-test branch on git://github.com/martinetd/linux


Cheers & thanks for the attention,
-- 
Dominique

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 9p/trans_fd lockup
  2020-08-06 12:38 ` Dominique Martinet
@ 2020-08-07  1:43   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 3+ messages in thread
From: Alexey Kardashevskiy @ 2020-08-07  1:43 UTC (permalink / raw)
  To: Dominique Martinet; +Cc: v9fs-developer, Greg Kurz, netdev



On 06/08/2020 22:38, Dominique Martinet wrote:
> Alexey Kardashevskiy wrote on Thu, Aug 06, 2020:
>> I am seeing another bug in 9p under syzkaller, the reprocase is:
>>
>> r0 = open$dir(&(0x7f0000000040)='./file0\x00', 0x88142, 0x182)
>>
>> r1 = openat$null(0xffffffffffffff9c, &(0x7f0000000640)='/dev/null\x00',
>> 0x0, 0x0)
>> mount$9p_fd(0x0, &(0x7f0000000000)='./file0\x00',
>> &(0x7f00000000c0)='9p\x00', 0x0, &(0x7f0000000100)={'trans=fd,',
>> {'rfdno', 0x3d, r1}, 0x2$, {'wfdno', 0x3d, r0}})
>>
>>
>>
>> The default behaviour of syzkaller is to call syscalls concurrently (I
>> think), at least it forks by default and executes the same sequence in
>> both threads.
>>
>> In this example both threads makes it to:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/9p/client.c?h=v5.8#n757
>>
>> and sit there with the only difference which is thread#1 goes via
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/9p/client.c?h=v5.8#n767
>>
>> I am pretty sure things should not have gone that far but I cannot
>> clearly see what needs fixing. Ideas? Thanks,
> 
> Unkillable threads there happen with the current p9_client_rpc when
> there is no real server (or server bug etc); the code is really stupid.
> 
> Basically what happens is that when you send a first signal (^C or
> whatever), the function catches the signal, sends a flush, and
> indefinitely waits for the flush to come back.
> If you send another signal there no more flush comes but it goes back to
> waiting -- it's using wait_event_killable but it's a lie it's not really
> killable it just loops on that wait until the flush finally comes, which
> will never come in your case.
> (the rpc that came by the way is probably version or whatever is first
> done on mount)
> 
> 
> Dmitry reported that to me ages ago and I have a fix which is just to
> stop waiting for the flush -- just make it asynchronous, send and
> forget. That removes the whole signal handling logic and it won't hang
> there anymore.
> 
> I sent the patches to the list last year, but didn't get much feedback
> and didn't have time to run all the tests I wanted to run on it.
> 
> 
> I have some free time at the end of the month so I was planning to
> finish it for 5.10 (e.g. won't send it for 5.9 but once 5.9 initial
> merge window passed leave it in -next for a couple of months and push it
> for 5.10), so your timing is pretty good :)
> An extra pair of eyes would be more than appreciated.
> 
> You can find the original mails there:
> https://lore.kernel.org/lkml/1544532108-21689-3-git-send-email-asmadeus@codewreck.org/
> 
> They're also in my 9p-test branch on git://github.com/martinetd/linux

Thanks for the patches, they fix my case indeed and I'll continue with
them, let's see what else syzkaller finds :)


> 
> 
> Cheers & thanks for the attention,
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-08-07  1:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-06 11:08 9p/trans_fd lockup Alexey Kardashevskiy
2020-08-06 12:38 ` Dominique Martinet
2020-08-07  1:43   ` Alexey Kardashevskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).