From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753750AbdKMTNb (ORCPT ); Mon, 13 Nov 2017 14:13:31 -0500 Received: from mail-it0-f65.google.com ([209.85.214.65]:57309 "EHLO mail-it0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752547AbdKMTN0 (ORCPT ); Mon, 13 Nov 2017 14:13:26 -0500 X-Google-Smtp-Source: AGs4zMb7G8BGCEIiczcJJ20HYqQDEaSzl9sBze6yRV7adc2uMRnMWKvNy2Ff1sNww3W9Fpbp5T8yxg== Date: Mon, 13 Nov 2017 11:13:22 -0800 From: Eric Biggers To: Eryu Guan Cc: Fengguang Wu , Lukas Czerner , Jan Kara , "Darrick J. Wong" , Jeff Moyer , Jens Axboe , Linus Torvalds , Linux Kernel Mailing List , linux-fsdevel Subject: Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220 Message-ID: <20171113191322.GA123488@gmail.com> References: <20171029225155.qcum5i75awrt5tzm@wfg-t540p.sh.intel.com> <20171030072021.gcgpaolo5m3myuln@wfg-t540p.sh.intel.com> <20171030074429.GC3283@eguan.usersys.redhat.com> <20171031001041.5qjzn5pjertpdc3e@wfg-t540p.sh.intel.com> <20171031065417.GD3283@eguan.usersys.redhat.com> <20171106011306.GB11631@zzz.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171106011306.GB11631@zzz.localdomain> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Nov 05, 2017 at 05:13:06PM -0800, Eric Biggers wrote: > On Tue, Oct 31, 2017 at 02:54:17PM +0800, Eryu Guan wrote: > > On Tue, Oct 31, 2017 at 01:10:41AM +0100, Fengguang Wu wrote: > > > Hi Eryu, > > > > > > On Mon, Oct 30, 2017 at 03:44:29PM +0800, Eryu Guan wrote: > > > > Hi Fengguang, > > > > > > > > On Mon, Oct 30, 2017 at 08:20:21AM +0100, Fengguang Wu wrote: > > > > > CC fsdevel. > > > > > > > > > > On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote: > > > > > > Hi Linus, > > > > > > > > > > > > Up to now we see the below boot error/warnings when testing v4.14-rc6. > > > > > > > > > > > > They hit the RC release mainly due to various imperfections in 0day's > > > > > > auto bisection. So I manually list them here and CC the likely easy to > > > > > > debug ones to the corresponding maintainers in the followup emails. > > > > > > > > > > > > boot_successes: 4700 > > > > > > boot_failures: 247 > > > > > > > > > > [...] > > > > > > > > > > > WARNING:at_fs/direct-io.c:#dio_complete: 7 > > > > > > WARNING:at_fs/iomap.c:#iomap_dio_complete: 3 > > > > > > WARNING:at_fs/iomap.c:#iomap_dio_rw: 1 > > > > > > > > > > The first warning happens on btrfs and is bisected to this commit. > > > > > The other 2 warnings happen on xfs. > > > > > > > > I noticed that the warnings are triggered by generic/095 and > > > > generic/208, they're known to generate such warnings and I think these > > > > warnings are kind of 'known issue', please see comments above > > > > _filter_aiodio_dmesg() in fstests/common/filter. > > > > > > > > Please make sure your local fstests contains the following 3 commits: > > > > > > > > ca93e26865ab common: move _filter_xfs_dmesg() to common/filter > > > > 5aa662733ab0 common: turn _filter_xfs_dmesg() into _filter_aiodio_dmesg() > > > > 228aee780f13 generic/036,208: whitelist [iomap_]dio_complete() WARNs > > > > > > OK. > > > > > > > we filtered out such warnings in fstests on purpose so the affected > > > > tests won't fail because of such dmesg warnings. > > > > > > We may also teach 0day robot to ignore the warning when running the > > > above 2 fstests. > > > > > > The more generic way of filtering may be to inject a dmesg line like > > > > > > THIS TEST WILL TRIGGER KERNEL WARNING, PLEASE IGNORE. > > > > > > just before the specific tests startup. Then 3rd party dmesg parsing > > > scripts can handle such purpose-made warnings in a uniform way. > > > > fstests doesn't know, prior to the test, if the warnings the test is > > going to trigger are intentional or real bugs, fstests records the dmesg > > log and analyzes the log after test and reports PASS if all the warnings > > are intentional (based on the whitelist filter). > > > > But I think it's possible to insert such a message to dmesg *after* test > > if fstests finds that all the warnings are intentional. Does that work > > for 0day robot? > > > > Please don't hack around this in tests. WARN_ON() is only meant to be used to > indicate kernel bugs. It should never be reachable by userspace. I'm seeing > the same WARNING when fuzzing with syzkaller (see below). If it's supposed to > be reachable by userspace, then it *must* be removed, or perhaps replaced with > pr_warn_ratelimited() with a message that is actually understandable, like > "process 'foo' mixes buffered I/O with direct I/O". > > WARNING: CPU: 2 PID: 27399 at fs/direct-io.c:293 dio_complete+0x532/0x7e0 fs/direct-io.c:293 > Kernel panic - not syncing: panic_on_warn set ... > > CPU: 2 PID: 27399 Comm: syz-executor0 Not tainted 4.14.0-rc7-00174-g2d6349944d96 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.3-20171021_125229-anatol 04/01/2014 > Call Trace: > __dump_stack lib/dump_stack.c:17 [inline] > dump_stack+0x115/0x1da lib/dump_stack.c:53 > panic+0x1b4/0x39c kernel/panic.c:181 > __warn+0x1bf/0x1d4 kernel/panic.c:542 > report_bug+0x23f/0x2c0 lib/bug.c:184 > fixup_bug+0x3f/0x90 arch/x86/kernel/traps.c:178 > do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] > do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 > do_error_trap+0x11c/0x350 arch/x86/kernel/traps.c:298 > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 > invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:906 > RIP: 0010:dio_complete+0x532/0x7e0 fs/direct-io.c:293 > RSP: 0018:ffff880039dcf208 EFLAGS: 00010212 > RAX: 0000000000010000 RBX: ffff880061a1f040 RCX: 0000000000000ebd > RDX: ffffffff816b97f2 RSI: ffffc90000696000 RDI: ffffed00073b9e36 > RBP: ffff880039dcf250 R08: ffff88007ffd709c R09: ffff880037874640 > R10: 00000000000002e7 R11: ffff88007ffd709d R12: 0000000000000200 > R13: 0000000000000200 R14: ffff8800593cf8f0 R15: ffffffff898005d0 > do_blockdev_direct_IO fs/direct-io.c:1374 [inline] > __blockdev_direct_IO+0x46b9/0x8a70 fs/direct-io.c:1400 > ext4_direct_IO_write fs/ext4/inode.c:3685 [inline] > ext4_direct_IO+0x98e/0x1950 fs/ext4/inode.c:3807 > generic_file_direct_write+0x1e6/0x440 mm/filemap.c:2928 > __generic_file_write_iter+0x21f/0x5b0 mm/filemap.c:3107 > ext4_file_write_iter+0x5c2/0x10a0 fs/ext4/file.c:264 > call_write_iter include/linux/fs.h:1771 [inline] > new_sync_write fs/read_write.c:469 [inline] > __vfs_write+0x681/0x970 fs/read_write.c:482 > vfs_write+0x18b/0x480 fs/read_write.c:544 > SYSC_write fs/read_write.c:589 [inline] > SyS_write+0xef/0x230 fs/read_write.c:581 > entry_SYSCALL_64_fastpath+0x1f/0xbe > RIP: 0033:0x4585c9 > RSP: 002b:00007ff87eff2bd8 EFLAGS: 00000216 ORIG_RAX: 0000000000000001 > RAX: ffffffffffffffda RBX: 0000000000738020 RCX: 00000000004585c9 > RDX: 0000000000000200 RSI: 0000000020010000 RDI: 0000000000000015 > RBP: 0000000000000046 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000216 R12: 00000000006d1048 > R13: 00000000ffffffff R14: 00007ff87eff36bc R15: 0000000000000000 > Dumping ftrace buffer: > (ftrace buffer empty) > Kernel Offset: 0x8000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > Rebooting in 86400 seconds.. Ping, this still needs to be fixed. Lukas, it looks like you added the bogus WARN_ON_ONCE(); can you please fix it? If not you will get bug reports from fuzzers forever. Eric