From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752440AbbJEQDD (ORCPT ); Mon, 5 Oct 2015 12:03:03 -0400 Received: from mga02.intel.com ([134.134.136.20]:16795 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751032AbbJEQDB (ORCPT ); Mon, 5 Oct 2015 12:03:01 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.17,639,1437462000"; d="scan'208";a="658213716" Subject: Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/ data=journal To: "Theodore Ts'o" , Andrew Morton , Linus Torvalds , linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org References: <20151005152236.GA8140@thunk.org> From: Dave Hansen Message-ID: <56129F34.5070203@linux.intel.com> Date: Mon, 5 Oct 2015 09:03:00 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <20151005152236.GA8140@thunk.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/05/2015 08:22 AM, Theodore Ts'o wrote: ... > I've bisected it down to commit 998ef75ddb: "fs: do not prefault > sys_write() user buffer pages". I've confirmed that 4.3-rc2 fails as > detailed below, but with 998ef75ddb reverted, the problem goes away. ... > Before commit 998ef75ddb, if we need to prefault in the page, we do so > before we attempt the copy. After this commit, we attempt the copy > and if it fails because pagefaults have been turned off, we call > write_end(), the unlock the page, prefault in the pages, and then > retry the commit. That's nasty. Thanks for the bug report! I'll go see if I can reproduce this. > What I think is going on is that when we do attempt the copy, we end > up marking the page dirty before we notice that we need to page fault > in the page, which ends up triggering the warning that jbd2 > buffer_head that is supposed to be journaled has been marked dirty > without calling ext4_handle_dirty_metadata() --- which is handled by > ext4_journalled_write_end(), but which is now happening out of order > given this commit. > > Is it possible that we can change iov_iter_copy_from_user_atomic(), to > check for the error case before it marks the page dirty? Or can we > create a light-weight function which checks to see if the page needs > to be faulted in which is lighter weight than > iov_iter_fault_in_readable? Maybe I'm not following the macro magic, but I don't see where iov_iter_copy_from_user_atomic() is setting 'page' dirty. It'll set the dirty bit in the PTEs of course, but I don't see it touching 'page' except to kmap() it. I do see some ->write_end() implementations doing set_page_dirty(), though. Could we have just been confused and dirtied a page under ->write_end() when we had copied=0 and it wasn't _really_ dirtied?