* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-12 4:08 ` Al Viro
@ 2019-03-12 8:00 ` Jani Nikula
2019-03-12 14:29 ` Tetsuo Handa
2019-03-12 17:10 ` Dmitry Vyukov
2 siblings, 0 replies; 17+ messages in thread
From: Jani Nikula @ 2019-03-12 8:00 UTC (permalink / raw)
To: Al Viro, syzbot
Cc: airlied, akpm, amir73il, chris, darrick.wong, david, dri-devel,
dvyukov, eparis, hannes, hughd, intel-gfx, jack, joonas.lahtinen,
jrdr.linux, linux-kernel, linux-mm, mingo, mszeredi,
penguin-kernel, peterz, rodrigo.vivi, syzkaller-bugs, willy
On Tue, 12 Mar 2019, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Mar 11, 2019 at 08:59:00PM -0700, syzbot wrote:
>> syzbot has bisected this bug to:
>>
>> commit 34e07e42c55aeaa78e93b057a6664e2ecde3fadb
>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>> Date: Thu Feb 8 10:54:48 2018 +0000
>>
>> drm/i915: Add missing kerneldoc for 'ent' in i915_driver_init_early
>>
>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13220283200000
>> start commit: 34e07e42 drm/i915: Add missing kerneldoc for 'ent' in i915..
>> git tree: upstream
>> final crash: https://syzkaller.appspot.com/x/report.txt?x=10a20283200000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=17220283200000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
>> dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
>> userspace arch: amd64
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000
>>
>> Reported-by: syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com
>> Fixes: 34e07e42 ("drm/i915: Add missing kerneldoc for 'ent' in
>> i915_driver_init_early")
>
> Umm... Might be a good idea to add some plausibility filters - it is,
> in theory, possible that adding a line in a comment changes behaviour
> (without compiler bugs, even - playing with __LINE__ is all it would
> take), but the odds that it's _not_ a false positive are very low.
If it's not a false positive, it's bound to be good source material for
IOCCC.
BR,
Jani.
--
Jani Nikula, Intel Open Source Graphics Center
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-12 4:08 ` Al Viro
2019-03-12 8:00 ` Jani Nikula
@ 2019-03-12 14:29 ` Tetsuo Handa
2019-03-12 17:15 ` Dmitry Vyukov
2019-03-12 17:10 ` Dmitry Vyukov
2 siblings, 1 reply; 17+ messages in thread
From: Tetsuo Handa @ 2019-03-12 14:29 UTC (permalink / raw)
To: syzbot, dvyukov, syzkaller-bugs; +Cc: Al Viro, linux-kernel
(Moving most recipients to bcc: in order to avoid flooding.)
On 2019/03/12 13:08, Al Viro wrote:
> Umm... Might be a good idea to add some plausibility filters - it is,
> in theory, possible that adding a line in a comment changes behaviour
> (without compiler bugs, even - playing with __LINE__ is all it would
> take), but the odds that it's _not_ a false positive are very low.
Well, 108 out of the 168 test runs done during this bisection failed to test.
With such a high failure ratio, it is possible that, by chance, no crash
happened during the few runs that completed for a specific commit, causing a
wrong bisection result. I expect that, before concluding "git bisect good" for
a specific commit, the tests should be repeated until 8 runs have completed
successfully without a crash.
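The rule proposed above can be sketched in C roughly as follows. This is only
an illustration under assumptions: the names run_result, verdict and
judge_commit are hypothetical and are not taken from syzkaller, and the
threshold (8 in the proposal) is passed in as a parameter. A run that fails
for infrastructure reasons counts neither for nor against the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of one run of the reproducer on one kernel build. */
enum run_result { RUN_OK, RUN_CRASHED, RUN_INFRA_ERROR };

/* Verdict for one commit during bisection. */
enum verdict { VERDICT_BAD, VERDICT_GOOD, VERDICT_RETRY };

/*
 * Hypothetical decision rule (not syzkaller's implementation):
 *  - any crash marks the commit bad,
 *  - the commit is good only after required_ok runs completed cleanly,
 *  - otherwise too many runs were lost to infra errors: test again.
 */
enum verdict judge_commit(const enum run_result *runs, size_t n,
                          size_t required_ok)
{
    size_t ok = 0;

    assert(runs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (runs[i] == RUN_CRASHED)
            return VERDICT_BAD;
        if (runs[i] == RUN_OK)
            ok++;
    }
    return ok >= required_ok ? VERDICT_GOOD : VERDICT_RETRY;
}
```

With this rule, a commit where 6 of 8 runs timed out and 2 passed is reported
as VERDICT_RETRY rather than being marked good.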
Also, this bisection is finding multiple different crash patterns, which
suggests that the crashed tests are not giving correct feedback to syzbot.
$ grep -F 'run #' bisect.txt\?x\=13220283200000 | wc -l
168
$ grep -F 'Connection timed out' bisect.txt\?x\=13220283200000 | wc -l
108
$ grep -F 'crashed' bisect.txt\?x\=13220283200000
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in sendfile64
run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in do_iter_write
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-12 14:29 ` Tetsuo Handa
@ 2019-03-12 17:15 ` Dmitry Vyukov
2019-03-12 21:11 ` Tetsuo Handa
0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-12 17:15 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: syzbot, syzkaller-bugs, Al Viro, LKML
On Tue, Mar 12, 2019 at 3:30 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> (Moving most recipients to bcc: in order to avoid flooding.)
>
> On 2019/03/12 13:08, Al Viro wrote:
> > Umm... Might be a good idea to add some plausibility filters - it is,
> > in theory, possible that adding a line in a comment changes behaviour
> > (without compiler bugs, even - playing with __LINE__ is all it would
> > take), but the odds that it's _not_ a false positive are very low.
>
> Well, 108 out of 168 tests done during this bisection failed to test.
> With such high failure ratio, it is possible that by chance no crash
> happened during few tests for specific commit; causing a wrong bisection
> result. I expect that when trying to conclude "git bisect good" for
> specific commit, the tests should be repeated until no crash happened
> during 8 successful tests.
Added to https://github.com/google/syzkaller/issues/1051:
Tetsuo points out that if lots of tests (say, 7/8) failed with infra
problems, then we should retry/skip or something. Such failures zero out the
effect of having multiple independent tests.
Thanks.
> Also, this bisection is finding multiple different crash patterns, which
> suggests that the crashed tests are not giving correct feedback to syzbot.
Treating different crashes as just "crash" is intended. Kernel bugs
can manifest in very different ways.
If you want some fun, search for "bpf: sockhash, disallow bpf_tcp_close and update
in parallel" in https://syzkaller.appspot.com/?fixed=upstream
It led to 50+ different failure modes.
> $ grep -F 'run #' bisect.txt\?x\=13220283200000 | wc -l
> 168
> $ grep -F 'Connection timed out' bisect.txt\?x\=13220283200000 | wc -l
> 108
> $ grep -F 'crashed' bisect.txt\?x\=13220283200000
> run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
> run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in sys_sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #4: crashed: INFO: rcu detected stall in sys_sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in sendfile64
> run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #3: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in do_iter_write
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-12 17:15 ` Dmitry Vyukov
@ 2019-03-12 21:11 ` Tetsuo Handa
2019-03-13 6:43 ` Dmitry Vyukov
0 siblings, 1 reply; 17+ messages in thread
From: Tetsuo Handa @ 2019-03-12 21:11 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: syzbot, syzkaller-bugs, Al Viro, LKML
On 2019/03/13 2:15, Dmitry Vyukov wrote:
>> Also, this bisection is finding multiple different crash patterns, which
>> suggests that the crashed tests are not giving correct feedback to syzbot.
>
> Treating different crashes as just "crash" is intended. Kernel bugs
> can manifest in very different ways.
> Want fun, search for "bpf: sockhash, disallow bpf_tcp_close and update
> in parallel" in https://syzkaller.appspot.com/?fixed=upstream
> It lead to 50+ different failure modes.
>
But syzbot already found a rather simple C reproducer
( https://syzkaller.appspot.com/text?tag=ReproC&x=116fc7a8c00000 ) for this bug.
Was this reproducer used for bisection? I guess that if this reproducer was used,
syzbot did not hit "WARNING: ODEBUG bug in netdev_freemem" cases.
Also, humans can sometimes find simpler C reproducers starting from syzbot-provided
reproducers. It would be nice if syzbot could accept and use a user-defined C
reproducer for testing.
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-12 21:11 ` Tetsuo Handa
@ 2019-03-13 6:43 ` Dmitry Vyukov
2019-03-13 16:37 ` Theodore Ts'o
2019-03-13 23:40 ` Eric Biggers
0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-13 6:43 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: syzbot, syzkaller-bugs, Al Viro, LKML
On Tue, Mar 12, 2019 at 10:11 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/13 2:15, Dmitry Vyukov wrote:
> >> Also, this bisection is finding multiple different crash patterns, which
> >> suggests that the crashed tests are not giving correct feedback to syzbot.
> >
> > Treating different crashes as just "crash" is intended. Kernel bugs
> > can manifest in very different ways.
> > Want fun, search for "bpf: sockhash, disallow bpf_tcp_close and update
> > in parallel" in https://syzkaller.appspot.com/?fixed=upstream
> > It lead to 50+ different failure modes.
> >
>
> But syzbot already found a rather simple C reproducer
> ( https://syzkaller.appspot.com/text?tag=ReproC&x=116fc7a8c00000 ) for this bug.
> Was this reproducer used for bisection?
The C reproducer used for bisection is provided as "C reproducer" in
the bisection report.
> I guess that if this reproducer was used,
> syzbot did not hit "WARNING: ODEBUG bug in netdev_freemem" cases.
Maybe. But we won't have more than 1 reproducer in the future. Currently syzbot
bisects over a backlog of crashes, some of which accumulated multiple
reproducers over weeks/months/years. When it bisects newly
reported bugs as they are found, there will be only 1 reproducer. E.g.
these two for this bug were found within a month.
> Also, humans can sometimes find more simpler C reproducers from syzbot provided
> reproducers. It would be nice if syzbot can accept and use a user defined C
> reproducer for testing.
It would be more useful to accept patches from these people that make
syzkaller create better reproducers. Manual work is not scalable. We
would need 10 reproducers per day for a dozen OSes (incl. some
private kernels/branches). Anybody is free to run syzkaller manually
and do full manual (perfect) reporting. But for us it became clear
very early that it won't work. Then see above: while that human is
sleeping/on weekend/vacation, syzbot will already have bisected its own
reproducer. Adding a manual reproducer later won't help in any way.
syzkaller already does lots of smart work for reproducers. Let's not
give up on the last mile and switch back to all manual work.
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-13 6:43 ` Dmitry Vyukov
@ 2019-03-13 16:37 ` Theodore Ts'o
2019-03-13 16:56 ` Dmitry Vyukov
2019-03-13 23:40 ` Eric Biggers
1 sibling, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2019-03-13 16:37 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: Tetsuo Handa, syzbot, syzkaller-bugs, Al Viro, LKML
On Wed, Mar 13, 2019 at 07:43:38AM +0100, Dmitry Vyukov wrote:
> It would be more useful to accept patches that make syzkaller create
> better reproducers from these people. Manual work is not scalable. We
> would need 10 reproducers per day for a dozen of OSes (incl some
> private kernels/branches). Anybody is free to run syzkaller manually
> and do full manual (perfect) reporting. But for us it become clear
> very early that it won't work. Then see above, while that human is
> sleeping/on weekend/vacation, syzbot will already bisect own
> reproducer. Adding manual reproducer later won't help in any way.
> syzkaller already does lots of smart work for reproducers. Let's not
> give up on the last mile and switch back to all manual work.
I suspect a scalable solution that would significantly improve things
is one where Syzbot tries N times for a "good" result to make sure
it's not a flaky pass. N could either be hard-coded to some value
like 8 or 10, or Syzbot could experimentally try to figure out how
reliable the reproducer happens to be, and figure out what an ideal
"N" value should be for a particular reproducer.
- Ted
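How large N has to be for a given reproducer can be estimated with a short
calculation. The sketch below rests on assumptions that are not stated in the
thread: runs are independent, and on a buggy kernel each run crashes with a
fixed probability p. Then N crash-free runs in a row occur on a buggy commit
with probability (1 - p)^N, and the function returns the smallest N that
pushes this below a chosen bound eps. The name runs_needed and its parameters
are hypothetical.

```c
#include <assert.h>

/*
 * Smallest n with (1 - p)^n <= eps, i.e. the number of crash-free runs
 * needed before a "good" verdict is wrong with probability at most eps.
 * Returns -1 if more than max_n runs would be needed.
 */
int runs_needed(double p, double eps, int max_n)
{
    double miss = 1.0; /* probability that n runs in a row all miss the bug */

    assert(p > 0.0 && p <= 1.0);
    assert(eps > 0.0 && eps < 1.0);
    for (int n = 1; n <= max_n; n++) {
        miss *= 1.0 - p;
        if (miss <= eps)
            return n;
    }
    return -1;
}
```

For a reproducer that crashes in half of the runs, runs_needed(0.5, 0.01, 100)
gives 7, and 8 runs leave a false "good" chance of 1/256. For a reproducer
that crashes in only 10% of the runs, 44 runs are needed for the same 1%
bound, so a fixed N of 8 or 10 would not be enough.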
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-13 16:37 ` Theodore Ts'o
@ 2019-03-13 16:56 ` Dmitry Vyukov
0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-13 16:56 UTC (permalink / raw)
To: Theodore Ts'o, Dmitry Vyukov, Tetsuo Handa, syzbot,
syzkaller-bugs, Al Viro, LKML
On Wed, Mar 13, 2019 at 5:37 PM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Mar 13, 2019 at 07:43:38AM +0100, Dmitry Vyukov wrote:
> > It would be more useful to accept patches that make syzkaller create
> > better reproducers from these people. Manual work is not scalable. We
> > would need 10 reproducers per day for a dozen of OSes (incl some
> > private kernels/branches). Anybody is free to run syzkaller manually
> > and do full manual (perfect) reporting. But for us it become clear
> > very early that it won't work. Then see above, while that human is
> > sleeping/on weekend/vacation, syzbot will already bisect own
> > reproducer. Adding manual reproducer later won't help in any way.
> > syzkaller already does lots of smart work for reproducers. Let's not
> > give up on the last mile and switch back to all manual work.
>
> I suspect a scalable solution that would significantly improve things
> is one where Syzbot tries N times for a "good" result to make sure
> it's not a flaky pass. N could either be hard-coded to some value
> like 8 or 10, or Syzbot could experimentally try to figure out how
> reliable the reproducer happens to be, and figure out what an ideal
> "N" value should be for a particular reproducer.
It currently tries 8 times, see e.g.:
https://syzkaller.appspot.com/text?tag=Log&x=13354d9d200000
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-13 6:43 ` Dmitry Vyukov
2019-03-13 16:37 ` Theodore Ts'o
@ 2019-03-13 23:40 ` Eric Biggers
2019-03-14 10:52 ` Tetsuo Handa
1 sibling, 1 reply; 17+ messages in thread
From: Eric Biggers @ 2019-03-13 23:40 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: Tetsuo Handa, syzbot, syzkaller-bugs, Al Viro, LKML
On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > Also, humans can sometimes find more simpler C reproducers from syzbot provided
> > reproducers. It would be nice if syzbot can accept and use a user defined C
> > reproducer for testing.
>
> It would be more useful to accept patches that make syzkaller create
> better reproducers from these people. Manual work is not scalable. We
> would need 10 reproducers per day for a dozen of OSes (incl some
> private kernels/branches). Anybody is free to run syzkaller manually
> and do full manual (perfect) reporting. But for us it become clear
> very early that it won't work. Then see above, while that human is
> sleeping/on weekend/vacation, syzbot will already bisect own
> reproducer. Adding manual reproducer later won't help in any way.
> syzkaller already does lots of smart work for reproducers. Let's not
> give up on the last mile and switch back to all manual work.
>
Well, it's very tough and not many people are familiar with the syzkaller
codebase, let alone have time to contribute. But having simplified a lot of
the syzkaller reproducers manually, the main things I do are:
- Replace bare system calls with proper C library calls. For example:
#include <sys/syscall.h>
syscall(__NR_socket, 0xa, 6, 0);
becomes:
#include <sys/socket.h>
socket(AF_INET6, SOCK_DCCP, 0);
- Do the same for structs. Use the appropriate C header rather than filling in
each struct manually. For example:
*(uint16_t*)0x20000000 = 0xa;
*(uint16_t*)0x20000002 = htobe16(0x4e20);
*(uint32_t*)0x20000004 = 0;
*(uint8_t*)0x20000008 = 0;
*(uint8_t*)0x20000009 = 0;
*(uint8_t*)0x2000000a = 0;
*(uint8_t*)0x2000000b = 0;
*(uint8_t*)0x2000000c = 0;
*(uint8_t*)0x2000000d = 0;
*(uint8_t*)0x2000000e = 0;
*(uint8_t*)0x2000000f = 0;
*(uint8_t*)0x20000010 = 0;
*(uint8_t*)0x20000011 = 0;
*(uint8_t*)0x20000012 = 0;
*(uint8_t*)0x20000013 = 0;
*(uint8_t*)0x20000014 = 0;
*(uint8_t*)0x20000015 = 0;
*(uint8_t*)0x20000016 = 0;
*(uint8_t*)0x20000017 = 0;
*(uint32_t*)0x20000018 = 0;
becomes:
struct sockaddr_in6 addr = { .sin6_family = AF_INET6, .sin6_port = htobe16(0x4e20) };
- Put arguments on the stack rather than in a mmap'd region, if possible.
- Simplify any calls to the helper functions that syzkaller emits, e.g.
syz_open_dev(), syz_kvm_setup_vcpu(), or the networking setup stuff. Usually
the reproducer needs a small subset of the functionality to work.
- For multithreaded reproducers, try to incrementally simplify the threading
strategy. For example, reduce the number of threads by combining operations.
Also try running the operations in loops. Also, using fork() can often result
in a simpler reproducer than pthreads.
- Instead of using the 'r[]' array to hold all integer return values, give them
appropriate names.
- Remove duplicate #includes.
- Considering the actual kernel code and the bug, if possible find a different
way to trigger the same bug that's simpler or more reliable. If the problem
is obvious it may be possible to jump right to this step from the beginning.
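The first two items above can be put together into a small compilable sketch.
It assumes Linux, where AF_INET6 is 0xa, SOCK_DCCP is 6 and struct
sockaddr_in6 is 28 bytes (the range 0x20000000..0x2000001b written by the raw
stores). The function names make_addr and open_dccp_socket are hypothetical,
and htons() is used here in place of htobe16() since both produce the same
big-endian 16-bit value.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Build the address from the example using named fields. */
struct sockaddr_in6 make_addr(void)
{
    struct sockaddr_in6 addr = {
        .sin6_family = AF_INET6,    /* 0xa on Linux */
        .sin6_port = htons(0x4e20), /* same bytes as htobe16(0x4e20) */
        /* flowinfo, address and scope id are zero-initialized */
    };

    /* Linux-specific: same 28 bytes that the raw stores covered. */
    assert(sizeof(addr) == 28);
    return addr;
}

/*
 * Symbolic form of syscall(__NR_socket, 0xa, 6, 0).
 * Returns -1 if DCCP is not available on the running kernel.
 */
int open_dccp_socket(void)
{
    return socket(AF_INET6, SOCK_DCCP, 0);
}
```

The designated initializer leaves every unnamed field zero, which replaces
the long run of single-byte zero stores in the generated reproducer.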
Some gotchas:
- fault-nth injections are fragile, since the number of memory allocations in a
particular system call varies by kernel config and kernel version.
Incrementing n starting from 1 is more reliable.
- Some of the perf_event_open() reproducers are fragile because they hardcode a
trace event ID, which can change in every kernel version. Reading the trace
event ID from /sys/kernel/debug/tracing/events/ is more reliable.
- Reproducers using the KVM API sometimes only work on certain processors (e.g.
Intel but not AMD) or even depend on the host kernel.
- Reproducers that access the local filesystem sometimes assume that it's ext4.
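For the perf_event_open() gotcha, reading the ID at run time could look
roughly like the sketch below. The helper name read_trace_event_id is
hypothetical; the caller passes the path of an event's "id" file under
/sys/kernel/debug/tracing/events/, which normally requires debugfs to be
mounted and sufficient privileges.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a trace event ID from its "id" file instead of hardcoding it.
 * Returns the ID, or -1 if the file cannot be opened or parsed.
 */
long read_trace_event_id(const char *id_path)
{
    long id = -1;
    FILE *f;

    assert(id_path != NULL);
    f = fopen(id_path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &id) != 1)
        id = -1;
    fclose(f);
    return id;
}
```

A reproducer would then store the result in perf_event_attr.config for a
PERF_TYPE_TRACEPOINT event, for example after calling
read_trace_event_id("/sys/kernel/debug/tracing/events/sched/sched_switch/id"),
where sched_switch is only an example event.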
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-13 23:40 ` Eric Biggers
@ 2019-03-14 10:52 ` Tetsuo Handa
2019-03-20 12:49 ` Dmitry Vyukov
2019-03-20 13:45 ` Dmitry Vyukov
0 siblings, 2 replies; 17+ messages in thread
From: Tetsuo Handa @ 2019-03-14 10:52 UTC (permalink / raw)
To: Eric Biggers, Dmitry Vyukov; +Cc: syzbot, syzkaller-bugs, Al Viro, LKML
On 2019/03/14 8:40, Eric Biggers wrote:
> On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
>>> Also, humans can sometimes find more simpler C reproducers from syzbot provided
>>> reproducers. It would be nice if syzbot can accept and use a user defined C
>>> reproducer for testing.
>>
>> It would be more useful to accept patches that make syzkaller create
>> better reproducers from these people. Manual work is not scalable. We
>> would need 10 reproducers per day for a dozen of OSes (incl some
>> private kernels/branches). Anybody is free to run syzkaller manually
>> and do full manual (perfect) reporting. But for us it become clear
>> very early that it won't work. Then see above, while that human is
>> sleeping/on weekend/vacation, syzbot will already bisect own
>> reproducer. Adding manual reproducer later won't help in any way.
>> syzkaller already does lots of smart work for reproducers. Let's not
>> give up on the last mile and switch back to all manual work.
>>
>
> Well, it's very tough and not many people are familiar with the syzkaller
> codebase, let alone have time to contribute.
Right. I don't read/write Go programs. I don't have access to environments
for running syzbot. But instead I try to write kernel patches.
Also, although anybody is free to do full manual (perfect) reporting,
I can't afford to check such reports posted to e.g. LKML. I can afford
to check only https://syzkaller.appspot.com/ .
I have seen a Japanese article which explains how to run syzbot. But I felt that
the article does not explain what to do if syzbot finds a bug. If people find a crash
by running syzbot in their environments, it would be nice if they could export
the report and import it to https://syzkaller.appspot.com/ (i.e. the dashboard
acts like a bugzilla).
> But having simplified a lot of
> the syzkaller reproducers manually, the main things I do are:
Yes. I'm doing similar things. Other things not listed here are:
Try to remove any syscall() which passes EOF as the fd argument, for it should be
unrelated to the problem unless such a call affects subtle timing.
Try to remove code for testing fuse / tun etc. if the problem seems to be
unrelated to fuse / tun etc.
syzbot is satisfied once it finds one C reproducer, but I wish that syzbot
would continue trying to find smaller C reproducers by e.g. eliminating unrelated
calls.
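Eliminating unrelated calls can be sketched as a greedy one-at-a-time
reduction. This is a generic illustration, not a description of how syzkaller
minimizes programs. The names minimize_calls and still_crashes_fn are
hypothetical, and demo_still_crashes is only a stand-in for "rebuild the
reproducer with the remaining calls and rerun it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Does the bug still trigger with only the calls whose keep[i] is true? */
typedef bool (*still_crashes_fn)(const bool *keep, size_t n, void *ctx);

/*
 * Greedy elimination: drop each call in turn and keep it dropped
 * if the bug still reproduces. Returns the number of calls kept.
 */
size_t minimize_calls(bool *keep, size_t n, still_crashes_fn still_crashes,
                      void *ctx)
{
    size_t kept = 0;

    assert(keep != NULL && still_crashes != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!keep[i])
            continue;
        keep[i] = false;    /* tentatively drop call i */
        if (!still_crashes(keep, n, ctx))
            keep[i] = true; /* call i is needed, restore it */
    }
    for (size_t i = 0; i < n; i++)
        kept += keep[i];
    return kept;
}

/* Stand-in predicate: pretend the bug needs exactly calls 1 and 3. */
bool demo_still_crashes(const bool *keep, size_t n, void *ctx)
{
    (void)ctx;
    return n > 3 && keep[1] && keep[3];
}
```

Because a dropped call may only have affected timing, a real predicate would
have to rerun the reduced reproducer several times before reporting that the
bug no longer triggers.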
>
> - Replace bare system calls with proper C library calls. For example:
>
> #include <sys/syscall.h>
>
> syscall(__NR_socket, 0xa, 6, 0);
>
> becomes:
>
> #include <sys/socket.h>
>
> socket(AF_INET6, SOCK_DCCP, 0);
Yes. It would be nice if C reproducers were provided using symbolic names. I run
syzbot-provided C reproducers under strace because strace gives me more hints
about symbols and structures.
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-14 10:52 ` Tetsuo Handa
@ 2019-03-20 12:49 ` Dmitry Vyukov
2019-03-20 13:45 ` Dmitry Vyukov
1 sibling, 0 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 12:49 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Eric Biggers, syzbot, syzkaller-bugs, Al Viro, LKML
On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/14 8:40, Eric Biggers wrote:
> > On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> >>> Also, humans can sometimes find more simpler C reproducers from syzbot provided
> >>> reproducers. It would be nice if syzbot can accept and use a user defined C
> >>> reproducer for testing.
> >>
> >> It would be more useful to accept patches that make syzkaller create
> >> better reproducers from these people. Manual work is not scalable. We
> >> would need 10 reproducers per day for a dozen of OSes (incl some
> >> private kernels/branches). Anybody is free to run syzkaller manually
> >> and do full manual (perfect) reporting. But for us it become clear
> >> very early that it won't work. Then see above, while that human is
> >> sleeping/on weekend/vacation, syzbot will already bisect own
> >> reproducer. Adding manual reproducer later won't help in any way.
> >> syzkaller already does lots of smart work for reproducers. Let's not
> >> give up on the last mile and switch back to all manual work.
> >>
> >
> > Well, it's very tough and not many people are familiar with the syzkaller
> > codebase, let alone have time to contribute.
>
> Right. I don't read/write go programs. I don't have access to environments
> for running syzbot. But instead I try to write kernel patches.
>
> Also, although anybody is free to do full manual (perfect) reporting,
> I can't afford checking such reports posted to e.g. LKML. I can afford
> checking only https://syzkaller.appspot.com/ .
>
> I have seen a Japanese article which explains how to run syzbot. But I felt that
> that article lacks what to do if syzbot found a bug. If people found a crash
> by running syzbot in their environments, it would be nice if they can export
> the report and import it to https://syzkaller.appspot.com/ (i.e. dashboard
> acts as if a bugzilla).
>
> > But having simplified a lot of
> > the syzkaller reproducers manually, the main things I do are:
>
> Yes. I'm doing similar things. Other things not listed here are:
>
> Try to remove syscall() which passes EOF as fd argument, for it should be
> unrelated to the problem unless such call affects subtle timing.
>
> Try to remove code for testing fuse / tun etc. if the problem seems to be
> unrelated to fuse / tun etc.
>
> syzbot gets pleased with finding one C reproducer, but I wish that syzbot
> continues trying to find smaller C reproducers by e.g. eliminating unrelated
> calls.
>
> >
> > - Replace bare system calls with proper C library calls. For example:
> >
> > #include <sys/syscall.h>
> >
> > syscall(__NR_socket, 0xa, 6, 0);
> >
> > becomes:
> >
> > #include <sys/socket.h>
> >
> > socket(AF_INET6, SOCK_DCCP, 0);
>
> Yes. It would be nice if C reproducers are provided using symbols. I run
> syzbot provided C reproducers under strace because strace gives me more hints
> about symbols and structures.
I will answer re reproducers first.
Thanks for the suggestions, I've filed
https://github.com/google/syzkaller/issues/1070 for this. Lots of them
are implementable within the current framework. Things on kernel
mailing lists (suggestions, bug reports, patches) get lost very
quickly.
As far as I can see, most of them are cosmetic (not saying that
they're not useful, they just won't affect bisection results).
From my experience the most powerful simplifications are possible only
when I have already root caused the bug and understand its mechanics.
Then it's possible to reorder syscalls, remove all/most of threading,
etc. But in that case bisection is no longer so useful.
Some of them can alter the load on the kernel (e.g. using libc structs and
syscall wrappers), which can lead to triggering a different bug...
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-14 10:52 ` Tetsuo Handa
2019-03-20 12:49 ` Dmitry Vyukov
@ 2019-03-20 13:45 ` Dmitry Vyukov
1 sibling, 0 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 13:45 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Eric Biggers, syzbot, syzkaller-bugs, Al Viro, LKML
On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/14 8:40, Eric Biggers wrote:
> > On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> >>> Also, humans can sometimes find more simpler C reproducers from syzbot provided
> >>> reproducers. It would be nice if syzbot can accept and use a user defined C
> >>> reproducer for testing.
> >>
> >> It would be more useful to accept patches that make syzkaller create
> >> better reproducers from these people. Manual work is not scalable. We
> >> would need 10 reproducers per day for a dozen of OSes (incl some
> >> private kernels/branches). Anybody is free to run syzkaller manually
> >> and do full manual (perfect) reporting. But for us it become clear
> >> very early that it won't work. Then see above, while that human is
> >> sleeping/on weekend/vacation, syzbot will already bisect own
> >> reproducer. Adding manual reproducer later won't help in any way.
> >> syzkaller already does lots of smart work for reproducers. Let's not
> >> give up on the last mile and switch back to all manual work.
> >>
> >
> > Well, it's very tough and not many people are familiar with the syzkaller
> > codebase, let alone have time to contribute.
>
> Right. I don't read/write go programs. I don't have access to environments
> for running syzbot. But instead I try to write kernel patches.
>
> Also, although anybody is free to do full manual (perfect) reporting,
> I can't afford checking such reports posted to e.g. LKML. I can afford
> checking only https://syzkaller.appspot.com/ .
>
> I have seen a Japanese article which explains how to run syzbot. But I felt that
> that article lacks what to do if syzbot found a bug. If people found a crash
> by running syzbot in their environments, it would be nice if they can export
> the report and import it to https://syzkaller.appspot.com/ (i.e. dashboard
> acts as if a bugzilla).
Problem 1 (smaller). Neither providing a custom program nor manually
specifying the bisection range (as you suggested in another thread
https://groups.google.com/d/msg/syzkaller-bugs/nFeC8-UG1gg/1OTVIuzBAgAJ)
will make kernel bug bisection reliable. The problems with kernel
bisection are deeper. Consider a bug that is inherently hard to
trigger: even if one provides their own reproducer, it's still hard to
trigger and bisection can diverge. What happened in the other bug:
bisection diverged because the reproducer triggered another bug. Now
consider that this happens within the bisection range. Even if you
give your own range, it won't help. And there are lots of other problems
like, say, large ranges where the kernel build is broken.
And this will introduce its own problems: e.g. it's very easy to give
syzbot a reproducer that doesn't actually trigger the bug for it
(because you can't match its environment precisely).
Also: if you can't bisect locally and can't test, how do you know the
right range generally? Again, that one bug was a single corner case.
Also: a semi-manual process will also lead to some suboptimal results,
and then other kernel developers will come and ask questions and
somebody will need to answer these questions. But in this case syzbot
is not even accountable for what happened.
I don't think there is a simple substitute for a qualified engineer
doing their job (guiding each step of bisection manually).
It's possible to imagine a very complex workflow (super hard to
implement, test and maintain too) that would allow that. And it
becomes mostly offloading build/boot/test of a given configuration to
the cloud. And this brings us to the second problem.
Problem 2. What you are proposing effectively looks like some kind of
custom workload offloading service for kernel developers. Just instead
of console commands (raw cloud VMs) it has a somewhat higher-level
interface (e.g. here is the kernel config, compiler, command line,
sysctls, machine configuration and test case, go build and test it).
I don't think this should be bolted on top of syzbot.
Developing and running syzbot is already a _huge_ amount of work
(frequently thankless). I simply cannot take on developing, testing,
deploying, maintaining and operating another service. And that service
will involve much more complex human interactions, so it will be much
more complex overall.
If such service is provided I think it needs to run on Linux
Foundation infrastructure that runs CI and other testing. Yes, I know,
it does not exist. But that would be the right place. It would benefit
work on all other kernel bugs too. Lots of things people attribute to
syzbot are really not specific to syzbot in any way. For example that
service would help with bisection of all other bugs too. And it seems
that a much simpler solution would be just to provide free VMs for
developers, because your main point seems to be "I would like to do
something custom, but I don't have resources for that". This is out of
scope for syzbot.
The current syzbot scope is: automating as much as possible, solving
common cases at scale (including other OSes and kernel branches),
bringing developers enough information to pick up the bug from there
and do any custom work necessary to debug and fix the bug (there
always will be custom work! even perfect bisection can get you nowhere
re root causing and there are still bugs without reproducers). We can
solve some surrounding problems too _iff_ they are common enough, have
high bang for the buck, are reasonably easy to implement and don't cause
a long-term maintenance toll. This one does not look like such a problem.
Sorry.
* Re: INFO: rcu detected stall in sys_sendfile64 (2)
2019-03-12 4:08 ` Al Viro
2019-03-12 8:00 ` Jani Nikula
2019-03-12 14:29 ` Tetsuo Handa
@ 2019-03-12 17:10 ` Dmitry Vyukov
2 siblings, 0 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-12 17:10 UTC (permalink / raw)
To: Al Viro
Cc: syzbot, David Airlie, Andrew Morton, Amir Goldstein,
Chris Wilson, Darrick J. Wong, Dave Chinner, DRI, eparis,
Johannes Weiner, Hugh Dickins, intel-gfx, Jan Kara, Jani Nikula,
Joonas Lahtinen, Souptick Joarder, LKML, Linux-MM, Ingo Molnar,
mszeredi, Tetsuo Handa, Peter Zijlstra, Rodrigo Vivi,
syzkaller-bugs, Matthew Wilcox
On Tue, Mar 12, 2019 at 5:08 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Mar 11, 2019 at 08:59:00PM -0700, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit 34e07e42c55aeaa78e93b057a6664e2ecde3fadb
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Thu Feb 8 10:54:48 2018 +0000
> >
> > drm/i915: Add missing kerneldoc for 'ent' in i915_driver_init_early
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13220283200000
> > start commit: 34e07e42 drm/i915: Add missing kerneldoc for 'ent' in i915..
> > git tree: upstream
> > final crash: https://syzkaller.appspot.com/x/report.txt?x=10a20283200000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17220283200000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
> > dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
> > userspace arch: amd64
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000
> >
> > Reported-by: syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com
> > Fixes: 34e07e42 ("drm/i915: Add missing kerneldoc for 'ent' in
> > i915_driver_init_early")
>
> Umm... Might be a good idea to add some plausibility filters - it is,
> in theory, possible that adding a line in a comment changes behaviour
> (without compiler bugs, even - playing with __LINE__ is all it would
> take), but the odds that it's _not_ a false positive are very low.
Thanks for pointing this out.
I've started collecting all such cases, so that we are able to draw
broader conclusions later:
https://github.com/google/syzkaller/issues/1051
I added the following for this one:
=========
A mix of problems: unrelated bug triggered by the same repro
("WARNING: ODEBUG bug in netdev_freemem"); lots of infrastructure
failures ("failed to copy test binary to VM"); also the original
failure seems to be flaky. All this contributed to pointing to a
random commit.
Al Viro points out that the commit only touches comments, so we could
mark the end result as suspicious.
=========
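To illustrate why a flaky failure alone can send a bisection to a
random commit, here is a back-of-the-envelope sketch. The model and
all numbers are assumptions made up for illustration, not syzbot's
actual settings: each run on a buggy commit triggers the crash
independently with probability p_trigger, a commit is declared good
when none of runs_per_step runs crash, and bad_steps is the number of
bisection steps that land on a buggy commit.

```python
def p_wrong_bisection(p_trigger, runs_per_step, bad_steps):
    """Chance that at least one buggy commit is misjudged as good.

    Hypothetical model: independent runs, no infrastructure failures,
    no unrelated crashes. Real bisections are worse on all three counts.
    """
    # A buggy commit looks good only if every single run misses the bug.
    p_miss = (1.0 - p_trigger) ** runs_per_step
    # One misjudged step is enough to derail the whole bisection.
    return 1.0 - (1.0 - p_miss) ** bad_steps


if __name__ == "__main__":
    # Bug fires in 20% of runs, 10 runs per step, 8 steps on buggy commits.
    print(p_wrong_bisection(0.2, 10, 8))
```

With those made-up inputs the result is about 0.6, i.e. under this
model the bisection is more likely than not to take a wrong turn
somewhere, before counting infrastructure failures or unrelated bugs.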
The infrastructure problems are definitely something we need to fix
("failed to copy test binary to VM") (currently the machine hangs
periodically with lots of time consumed by dmcrypt, but I don't know
yet if it's related or not).
Re the comment-only changes, I would like to see more cases where it
would help before we start creating new universes for this. We could
parse sources with clang to understand that a change was comment-only,
but I guess the kernel has been mostly broken with clang throughout its
history....
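For what it's worth, a much cruder check than parsing with clang is
possible. The sketch below is only an assumption about how such a
plausibility filter might look, not something syzbot does: it drops
C comments with a regex, keeps string and character literals verbatim,
and compares the remaining whitespace-separated chunks of the old and
new file contents. The function names are hypothetical. It knows
nothing about the preprocessor, so it cannot see the __LINE__ effect
mentioned above, and it treats pure whitespace reflow as "no change".

```python
import re

# Comments, or string/char literals (matched so that "//" inside a
# literal is not mistaken for a comment).
_TOKEN = re.compile(
    r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'',
    re.DOTALL,
)


def significant_chunks(src):
    """Return the non-comment content of C source as a list of chunks."""
    out = []
    pos = 0
    for m in _TOKEN.finditer(src):
        out.extend(src[pos:m.start()].split())
        tok = m.group(0)
        if not tok.startswith("/"):
            out.append(tok)  # literal: keep verbatim, spaces included
        pos = m.end()
    out.extend(src[pos:].split())
    return out


def is_comment_only_change(before, after):
    """True if the two versions differ only in comments/whitespace."""
    return significant_chunks(before) == significant_chunks(after)


if __name__ == "__main__":
    old = "/**\n * @dev: device\n */\nint f(int dev, int ent) { return dev / ent; }\n"
    new = "/**\n * @dev: device\n * @ent: entry\n */\nint f(int dev, int ent) { return dev / ent; }\n"
    print(is_comment_only_change(old, new))
```

A result flagged this way would only be marked as suspicious, not
discarded, since a comment-only commit can still change behaviour in
rare cases.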