All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Vyukov <dvyukov@google.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.cz>,
	syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Tejun Heo <tj@kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	linux-block@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: general protection fault in wb_workfn (2)
Date: Fri, 8 Jun 2018 19:14:38 +0200	[thread overview]
Message-ID: <CACT4Y+Z4i=bD49cb4pJ5zOMxVu675NT8wCNk+c2n2XJVOXG2bg@mail.gmail.com> (raw)
In-Reply-To: <CACT4Y+bHDWDapwOonO1EpR6TiRa=qf9nSDtHArq75yCGuHf=gg@mail.gmail.com>

On Fri, Jun 8, 2018 at 6:53 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Jun 8, 2018 at 5:16 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Fri, Jun 8, 2018 at 4:31 AM, Tetsuo Handa
>>> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>> Dmitry Vyukov wrote:
>>>>> On Tue, Jun 5, 2018 at 3:45 PM, Tetsuo Handa
>>>>> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>>> > Dmitry, can you assign VM resources for a git tree for this bug? This bug wants to fight
>>>>> > against https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches ...
>>>>>
>>>>> Hi Tetsuo,
>>>>>
>>>>> Most of the reasons for not doing it still stand. A syzkaller instance
>>>>> will produce not just this bug, it will produce hundreds of different
>>>>> bugs. Then the question is: what to do with these bugs? Report all to
>>>>> mailing lists?
>>>>
>>>> Is it possible to add linux-next.git tree as a target for fuzzing? If yes,
>>>> we can try debug patches easily, in addition to find bugs earlier than now.
>>>
>>> syzbot tested linux-next and mmotm initially, but they were removed at
>>> the request of kernel developers. See:
>>> https://groups.google.com/d/msg/syzkaller/0H0LHW_ayR8/dsK5qGB_AQAJ
>>> and:
>>> https://groups.google.com/d/msg/syzkaller-bugs/FeAgni6Atlk/U0JGoR0AAwAJ
>>> Indeed, linux-next produces around 50 assorted one-off unexplainable
>>> bug reports.
>>>
>>>
>>>>> I think the solution here is just to run syzkaller instance locally.
>>>>> It's just a program anybody can run it on any kernel with any custom
>>>>> patches. Moreover for local instance it's also possible to limit set
>>>>> of tested syscalls to increase probability of hitting this bug and at
>>>>> the same time filter out most of other bugs.
>>>>
>>>> If this bug is reproducible with VM resources individual developer can afford...
>>>>
>>>> Since my Linux development environment is VMware guests on a Windows PC, I can't
>>>> run VM instance which needs KVM acceleration. Also, due to security policy, I can't
>>>> utilize external VM resources available on the Internet, as well as I can't use ssh
>>>> and git protocols. Speak of this bug, even with a lot of VM instances, syzbot can
>>>> reproduce this bug only once or twice per a day. Thus, the question for me boils
>>>> down to, whether I can reproduce this bug using one VMware guest instance with 4GB
>>>> of memory. Effectively, I don't have access to environments for running syzkaller
>>>> instance...
>>>
>>> Well, I don't know what to say, it does require some resources.
>>>
>>>>> Do we have any idea about the guilty subsystem? You mentioned
>>>>> bdi_unregister, why? What would be the set of syscalls to concentrate
>>>>> on?
>>>>> I will do a custom run when I get around to it, if nobody else beats me to it.
>>>>
>>>> Because bdi_unregister() does "bdi->dev = NULL;" which wb_workfn() is hitting
>>>> NULL pointer dereference.
>>>
>>> Right, wb_workfn is not a generic function, it's fs-specific function.
>>>
>>> Trying to reproduce this locally now.
>>
>>
>> No luck so far.
>>
>> Trying to look from a different angle: is it possible that bdi->dev is
>> not set yet, rather then already reset?
>
>
> I was able to reproduce this once locally running syz-crush utility
> replaying one of the crash logs. Now running with Tetsuo's patch.
>
> I can say we hunting a very subtle race condition with short
> inconsistency window, perhaps few instructions.


Here we go:

[ 2853.033175] WARNING: wb_workfn: device is NULL
[ 2853.034709] wb->state=2
[ 2853.035486] (wb == &wb->bdi->wb)=0
[ 2853.036489] list_empty(&wb->work_list)=1
[ 2853.037603] list_empty(&wb->bdi->bdi_list)=0
[ 2853.038843] wb->bdi->wb.state=0
[ 2853.039819] (wb->congested->__bdi == wb->bdi)=1
[ 2853.041062] list_empty(&wb->congested->__bdi->bdi_list)=0
[ 2853.042609] wb->congested->__bdi->wb.state=0
[ 2853.043793] kasan: CONFIG_KASAN_INLINE enabled
[ 2853.045315] kasan: GPF could be caused by NULL-ptr deref or user
memory access
[ 2853.047376] general protection fault: 0000 [#1] SMP KASAN
[ 2853.048980] CPU: 1 PID: 13971 Comm: kworker/u12:8 Not tainted 4.17.0+ #21
[ 2853.050762] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[ 2853.053034] Workqueue: writeback wb_workfn
[ 2853.054193] RIP: 0010:wb_workfn+0x187/0xab0
[ 2853.055360] Code: 85 70 fd ff ff 48 83 e8 10 48 89 85 60 fd ff ff
e8 5e 38 ab ff 48 8d 7b 50 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
c1 ea 03 <80> 3c 02 00 0f 85 05 08 00 00 4c 8b 63 50 4d 85 e4 0f 84 b5
05 00
[ 2853.060692] RSP: 0018:ffff8800600ef480 EFLAGS: 00010206
[ 2853.062210] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2853.064215] RDX: 000000000000000a RSI: ffffffff81cf0312 RDI: 0000000000000050
[ 2853.066198] RBP: ffff8800600ef750 R08: ffff880061e30400 R09: ffffed000d8ccfc0
[ 2853.068037] R10: ffffed000d8ccfc0 R11: ffff88006c667e07 R12: 1ffff1000c01dead
[ 2853.069970] R13: ffff8800600ef728 R14: ffff8800676bd180 R15: dffffc0000000000
[ 2853.071932] FS:  0000000000000000(0000) GS:ffff88006c640000(0000)
knlGS:0000000000000000
[ 2853.074080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2853.075699] CR2: 00007ffc8478c000 CR3: 0000000008c6a006 CR4: 00000000001626e0
[ 2853.077633] Call Trace:
[ 2853.078341]  ? inode_wait_for_writeback+0x40/0x40
[ 2853.079642]  ? graph_lock+0x170/0x170
[ 2853.080710]  ? lock_downgrade+0x8e0/0x8e0
[ 2853.081889]  ? find_held_lock+0x36/0x1c0
[ 2853.083047]  ? graph_lock+0x170/0x170
[ 2853.084105]  ? lock_acquire+0x1dc/0x520
[ 2853.085216]  ? process_one_work+0xb8c/0x1b70
[ 2853.086425]  ? kasan_check_read+0x11/0x20
[ 2853.087608]  ? __lock_is_held+0xb5/0x140
[ 2853.088720]  process_one_work+0xc64/0x1b70
[ 2853.089875]  ? finish_task_switch+0x182/0x840
[ 2853.091085]  ? pwq_dec_nr_in_flight+0x490/0x490
[ 2853.092358]  ? __schedule+0x809/0x1e30
[ 2853.093475]  ? retint_kernel+0x10/0x10
[ 2853.094561]  ? retint_kernel+0x10/0x10
[ 2853.095610]  ? graph_lock+0x170/0x170
[ 2853.096623]  ? trace_hardirqs_on_caller+0x421/0x5c0
[ 2853.097981]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 2853.099298]  ? find_held_lock+0x36/0x1c0
[ 2853.100394]  ? lock_acquire+0x1dc/0x520
[ 2853.101472]  ? worker_thread+0x3d4/0x13a0
[ 2853.102558]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 2853.104060]  ? move_linked_works+0x2f6/0x470
[ 2853.105204]  ? trace_event_raw_event_workqueue_execute_start+0x290/0x290
[ 2853.107013]  ? do_raw_spin_trylock+0x1b0/0x1b0
[ 2853.108243]  worker_thread+0x9e5/0x13a0
[ 2853.109304]  ? process_one_work+0x1b70/0x1b70
[ 2853.110488]  ? graph_lock+0x170/0x170
[ 2853.111514]  ? find_held_lock+0x36/0x1c0
[ 2853.112610]  ? find_held_lock+0x36/0x1c0
[ 2853.113674]  ? __schedule+0x1e30/0x1e30
[ 2853.114714]  ? do_raw_spin_unlock+0x1f9/0x2e0
[ 2853.115885]  ? do_raw_spin_trylock+0x1b0/0x1b0
[ 2853.117060]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 2853.118492]  ? __kthread_parkme+0x111/0x1d0
[ 2853.119616]  ? parse_args.cold.15+0x1b3/0x1b3
[ 2853.120804]  ? trace_hardirqs_on_caller+0x421/0x5c0
[ 2853.122071]  ? trace_hardirqs_on+0xd/0x10
[ 2853.123117]  kthread+0x345/0x410
[ 2853.123999]  ? process_one_work+0x1b70/0x1b70
[ 2853.125174]  ? kthread_bind+0x40/0x40
[ 2853.126201]  ret_from_fork+0x3a/0x50
[ 2853.127199] Modules linked in:
[ 2853.128022] Dumping ftrace buffer:
[ 2853.128901]    (ftrace buffer empty)
[ 2853.129986] ---[ end trace 3ba28e076cb32fda ]---
[ 2853.131269] RIP: 0010:wb_workfn+0x187/0xab0
[ 2853.132441] Code: 85 70 fd ff ff 48 83 e8 10 48 89 85 60 fd ff ff
e8 5e 38 ab ff 48 8d 7b 50 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
c1 ea 03 <80> 3c 02 00 0f 85 05 08 00 00 4c 8b 63 50 4d 85 e4 0f 84 b5
05 00
[ 2853.137449] RSP: 0018:ffff8800600ef480 EFLAGS: 00010206
[ 2853.138618] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2853.140176] RDX: 000000000000000a RSI: ffffffff81cf0312 RDI: 0000000000000050
[ 2853.141722] RBP: ffff8800600ef750 R08: ffff880061e30400 R09: ffffed000d8ccfc0
[ 2853.143300] R10: ffffed000d8ccfc0 R11: ffff88006c667e07 R12: 1ffff1000c01dead
[ 2853.144841] R13: ffff8800600ef728 R14: ffff8800676bd180 R15: dffffc0000000000
[ 2853.146391] FS:  0000000000000000(0000) GS:ffff88006c640000(0000)
knlGS:0000000000000000
[ 2853.148141] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2853.149406] CR2: 00007ffc8478c000 CR3: 0000000008c6a006 CR4: 00000000001626e0
[ 2853.150968] Kernel panic - not syncing: Fatal exception
[ 2853.152419] Dumping ftrace buffer:
[ 2853.153121]    (ftrace buffer empty)
[ 2853.153786] Kernel Offset: disabled
[ 2853.154442] Rebooting in 86400 seconds..

  reply	other threads:[~2018-06-08 17:14 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-26  9:15 general protection fault in wb_workfn (2) syzbot
2018-05-27  0:47 ` Tetsuo Handa
2018-05-27  2:21   ` [PATCH] bdi: Fix another oops in wb_workfn() Tetsuo Handa
2018-05-27  2:36     ` Tejun Heo
2018-05-27  4:43       ` Tetsuo Handa
2018-05-29 13:46         ` Tejun Heo
2018-05-28 13:35   ` general protection fault in wb_workfn (2) Jan Kara
2018-05-30 16:00     ` Tetsuo Handa
2018-05-30 16:00       ` Tetsuo Handa
2018-05-31 11:42       ` Jan Kara
2018-05-31 13:19         ` Tetsuo Handa
2018-05-31 13:42           ` Jan Kara
2018-05-31 16:56             ` Jens Axboe
2018-06-05 13:45               ` Tetsuo Handa
2018-06-07 18:46                 ` Dmitry Vyukov
2018-06-08  2:31                   ` Tetsuo Handa
2018-06-08 14:45                     ` Dmitry Vyukov
2018-06-08 15:16                       ` Dmitry Vyukov
2018-06-08 16:53                         ` Dmitry Vyukov
2018-06-08 17:14                           ` Dmitry Vyukov [this message]
2018-06-09  5:30                             ` Tetsuo Handa
2018-06-09 14:00                               ` [PATCH] bdi: Fix another oops in wb_workfn() Tetsuo Handa
2018-06-11  9:12                                 ` Jan Kara
2018-06-11 16:01                                   ` Tejun Heo
2018-06-11 16:29                                     ` Jan Kara
2018-06-11 17:20                                       ` Tejun Heo
2018-06-12 15:57                                         ` Jan Kara
2018-06-13 10:43                                           ` Tetsuo Handa
2018-06-13 11:51                                             ` Tetsuo Handa
2018-06-13 14:06                                             ` Linus Torvalds
2018-06-13 14:46                                             ` Jan Kara
2018-06-13 14:46                                               ` Jan Kara
2018-06-13 14:55                                               ` Linus Torvalds
2018-06-13 16:20                                               ` Tetsuo Handa
2018-06-13 16:25                                                 ` Linus Torvalds
2018-06-13 16:45                                                   ` Jan Kara
2018-06-13 21:04                                                     ` Tetsuo Handa
2018-06-14 10:11                                                       ` Jan Kara
2018-06-13 14:33                                           ` Tejun Heo
2018-06-15 12:06                                             ` Jan Kara
2018-06-15 12:06                                               ` Jan Kara
2018-06-18 12:27                                               ` Jan Kara
2018-06-01  2:30             ` general protection fault in wb_workfn (2) Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACT4Y+Z4i=bD49cb4pJ5zOMxVu675NT8wCNk+c2n2XJVOXG2bg@mail.gmail.com' \
    --to=dvyukov@google.com \
    --cc=axboe@kernel.dk \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.