linux-kernel.vger.kernel.org archive mirror
* [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal
@ 2015-10-05 15:22 Theodore Ts'o
  2015-10-05 15:58 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Theodore Ts'o @ 2015-10-05 15:22 UTC
  To: Dave Hansen, Andrew Morton, Linus Torvalds; +Cc: linux-ext4, linux-kernel

I've been tracking down a test failure in xfstests generic/208 in the
data_journal configuration, which I've been testing using:

	gce-xfstests -c data_journal -C 20 generic/208

I've bisected it down to commit 998ef75ddb: "fs: do not prefault
sys_write() user buffer pages".  I've confirmed that 4.3-rc2 fails as
detailed below, but with 998ef75ddb reverted, the problem goes away.

The generic/208 test tries to run the test program
aio-dio-invalidate-failure[1] 20 times.

On a successful pass, the test runs without incident.  On a failure,
the syslog gets flooded with messages which look like this:

Oct  5 08:46:40 xfstests-201510050844 kernel: JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 33797). There's a risk of filesystem corruption in case of system crash.

... and eventually, almost always before 5 successful test runs, and
often before even a single successful test run, we end up triggering
a BUG_ON[2].

Before commit 998ef75ddb, if we needed to prefault in the page, we did so
before we attempted the copy.  After this commit, we attempt the copy
and if it fails because pagefaults have been turned off, we call
write_end(), then unlock the page, prefault in the pages, and then
retry the copy.
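
The two orderings described above can be sketched as a toy userspace
model.  This is only an illustration of the sequence of steps as stated
in this message, not the kernel's write path; the function names
write_loop_old() and write_loop_new() and the event strings are
hypothetical, and the model assumes a prefaulted page stays resident.

```python
# Toy model only: it mimics the *order* of steps described in the text,
# not the real kernel write loop. All names here are hypothetical.

def write_loop_old(page_resident):
    """Ordering before 998ef75ddb: prefault (if needed) first, then copy."""
    events = []
    if not page_resident:
        events.append("prefault")
        page_resident = True
    events.append("write_begin")
    # Simplifying assumption: the page stays resident after the prefault.
    events.append("atomic_copy:ok")
    events.append("write_end(copied=n)")
    return events


def write_loop_new(page_resident):
    """Ordering after 998ef75ddb: try the copy first, prefault on failure."""
    events = []
    while True:
        events.append("write_begin")
        if page_resident:
            events.append("atomic_copy:ok")
            events.append("write_end(copied=n)")
            return events
        # The copy fails because page faults are disabled during the copy.
        events.append("atomic_copy:fault")
        events.append("write_end(copied=0)")
        events.append("unlock_page")
        events.append("prefault")
        page_resident = True  # the retry will now succeed


if __name__ == "__main__":
    print(write_loop_old(False))
    print(write_loop_new(False))
```

The point of interest in the model is that only the new ordering can
produce a write_begin / failed copy / write_end(copied=0) sequence for a
page that was not resident.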

What I think is going on is that when we do attempt the copy, we end
up marking the page dirty before we notice that we need to page fault
in the page.  That ends up triggering the jbd2 warning that a
buffer_head that is supposed to be journaled has been marked dirty
without calling ext4_handle_dirty_metadata().  That call is handled by
ext4_journalled_write_end(), but it is now happening out of order
given this commit.
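
The kind of check this hypothesis relies on can be illustrated with a
small toy model.  It is a sketch under stated assumptions: ToyBuffer,
dirty_via_journal(), dirty_directly() and journal_check() are
hypothetical names, and the model only captures the idea that a
journaled buffer should not carry a plain dirty bit set outside the
journal path.  It does not reproduce jbd2's actual logic.

```python
# Toy model of the hypothesis in the text. All names are hypothetical;
# this is not how jbd2 is implemented.

class ToyBuffer:
    """Stand-in for a buffer that is supposed to be journaled."""

    def __init__(self, blocknr):
        self.blocknr = blocknr
        self.dirty = False          # plain dirty bit, set outside the journal
        self.journal_dirty = False  # dirtied through the journal path


def dirty_via_journal(buf):
    """Expected path: the buffer is dirtied through the journal."""
    buf.journal_dirty = True


def dirty_directly(buf):
    """Unexpected path: the plain dirty bit is set behind the journal's back."""
    buf.dirty = True


def journal_check(buf):
    """Return a warning string if the plain dirty bit is set, else None."""
    if buf.dirty:
        return f"spotted dirty buffer (blocknr = {buf.blocknr})"
    return None


if __name__ == "__main__":
    ok = ToyBuffer(1)
    dirty_via_journal(ok)
    print(journal_check(ok))

    bad = ToyBuffer(33797)
    dirty_directly(bad)      # dirtied before the journal path runs
    dirty_via_journal(bad)
    print(journal_check(bad))
```

In this model the order matters: the warning appears only when the
direct dirtying happens at all, regardless of whether the journal path
runs afterwards.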

Is it possible that we can change iov_iter_copy_from_user_atomic() to
check for the error case before it marks the page dirty?  Or can we
create a function which checks to see if the page needs to be faulted
in, but which is lighter weight than iov_iter_fault_in_readable()?
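
The first question can be pictured with a minimal sketch.  It assumes,
purely for illustration, that the hypothesis above holds and that the
dirty marking happens inside the copy step; whether the real
iov_iter_copy_from_user_atomic() behaves this way is exactly what is
being asked.  The functions copy_dirty_first() and copy_check_first()
and the dict-based page are hypothetical.

```python
# Toy comparison of two orderings inside a copy step. Hypothetical names;
# this does not describe the real iov_iter_copy_from_user_atomic().

def copy_dirty_first(page, src_resident):
    """Mark the page dirty, then discover that the source would fault."""
    page["dirty"] = True
    if not src_resident:
        return 0            # nothing copied, but the page is already dirty
    return page["len"]


def copy_check_first(page, src_resident):
    """Check for the error case before touching the dirty state."""
    if not src_resident:
        return 0            # nothing copied, page state untouched
    page["dirty"] = True
    return page["len"]


if __name__ == "__main__":
    a = {"dirty": False, "len": 4096}
    b = {"dirty": False, "len": 4096}
    print(copy_dirty_first(a, src_resident=False), a["dirty"])
    print(copy_check_first(b, src_resident=False), b["dirty"])
```

Both variants report zero bytes copied on a fault; they differ only in
whether the page is left marked dirty afterwards.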

Thanks,

						- Ted


[1] https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/tree/src/aio-dio-regress/aio-dio-invalidate-failure.c

[2] ------------[ cut here ]------------
kernel BUG at /usr/projects/linux/ext4/fs/jbd2/commit.c:1030!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC 
CPU: 1 PID: 9842 Comm: jbd2/dm-0-8 Not tainted 4.2.0-ext4-09906-g998ef75 #92
Hardware name: Google Google, BIOS Google 01/01/2011
task: ffff8802132a73c0 ti: ffff8800ade8c000 task.ti: ffff8800ade8c000
RIP: 0010:[<ffffffff81269145>]  [<ffffffff81269145>] jbd2_journal_commit_transaction+0xfbc/0x1592
RSP: 0018:ffff8800ade8fcc0  EFLAGS: 00010202
RAX: 0000000000a20003 RBX: ffff8800b8a5a888 RCX: ffff8800b8a5a088
RDX: 0000000000000001 RSI: ffff8800ade8fc78 RDI: ffff880200f63c30
RBP: ffff8800ade8fe30 R08: 0000013f3a2c21fa R09: 0000000000000002
R10: ffff8800ade8fbf0 R11: 0000000000000774 R12: ffff8800b66b21a0
R13: ffff8800b8a5a088 R14: ffff880200f63800 R15: ffff8800b8b9d800
FS:  0000000000000000(0000) GS:ffff88021df00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9005ff2050 CR3: 000000020fea6000 CR4: 00000000001406e0
Stack:
ffff8800b4462488 000001375a588074 ffff8800b8b9d8ac ffff8802132a73c0
0000001000000024 ffff8800ada9b000 ffff8802132a73c0 ffff8800b8b9d850
00000000ffffffff ffff880000000784 0000000000000000 ffff8802102ab288
Call Trace:
[<ffffffff8126d26f>] ? kjournald2+0xb6/0x1e5
[<ffffffff8126d26f>] ? kjournald2+0xb6/0x1e5
[<ffffffff810f7ba6>] ? __wake_up_common+0x71/0x71
[<ffffffff8126d1b9>] ? commit_timeout+0xa/0xa
[<ffffffff810e1f8c>] ? kthread+0xc6/0xce
[<ffffffff810e1ec6>] ? __kthread_parkme+0x5a/0x5a
[<ffffffff8168ac5f>] ? ret_from_fork+0x3f/0x70
[<ffffffff810e1ec6>] ? __kthread_parkme+0x5a/0x5a
Code: 8b 03 a9 00 00 40 00 74 1b 4c 89 fe 4c 89 e7 45 31 ed e8 19 13 00 00 41 f6 06 02 74 1d f0 80 63 02 bf eb 16 48 8b 03 a8 02 74 02 <0f> 0b 45 31 ed 49 83 7c 24 30 00 41 0f 94 c5 4c 89 e7 e8 d5 eb 
RIP  [<ffffffff81269145>] jbd2_journal_commit_transaction+0xfbc/0x1592
RSP <ffff8800ade8fcc0>
---[ end trace 2c7d9ab15164cf1c ]---

Thread overview: 23+ messages
2015-10-05 15:22 [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal Theodore Ts'o
2015-10-05 15:58 ` Linus Torvalds
2015-10-05 16:23   ` Dave Hansen
2015-10-05 20:22     ` Linus Torvalds
2015-10-05 20:48       ` Dave Hansen
2015-10-05 21:18         ` Linus Torvalds
2015-10-05 21:55           ` Linus Torvalds
2015-10-05 23:33             ` Dave Hansen
2015-10-06  9:01               ` Linus Torvalds
2015-10-05 20:49       ` H. Peter Anvin
2015-10-06  7:56         ` Ingo Molnar
2015-10-06  9:10           ` Linus Torvalds
2015-10-06  9:27             ` Ingo Molnar
2015-10-06 13:29               ` Linus Torvalds
2015-10-06 13:42                 ` Ingo Molnar
2015-10-05 16:03 ` Dave Hansen
2015-10-05 18:04 ` Dave Hansen
2015-10-07  3:34   ` Theodore Ts'o
2015-10-07  7:32     ` Linus Torvalds
2015-10-07 15:43       ` Theodore Ts'o
2015-10-09  4:01         ` [PATCH] ext4: use private version of page_zero_new_buffers() for data=journal mode Theodore Ts'o
2015-10-13  6:06           ` Leonid V. Fedorenchik
2015-10-15 11:17           ` Jan Kara
