From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.kernel.org ([198.145.29.99]:47324 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932312AbeEWK4D (ORCPT ); Wed, 23 May 2018 06:56:03 -0400 Message-ID: <2917996756c844877e220b9f9e8740182d3aa2f9.camel@kernel.org> Subject: Re: [PATCH] fs: clear writeback errors in inode_init_always From: Jeff Layton To: Dave Chinner Cc: "Darrick J. Wong" , Matthew Wilcox , "Theodore Y. Ts'o" , linux-fsdevel@vger.kernel.org, fstests@vger.kernel.org, xfs Date: Wed, 23 May 2018 06:56:00 -0400 In-Reply-To: <20180522220926.GW10363@dastard> References: <20180518225037.GA26206@thunk.org> <630faadb74f608aa5a42649b81657e8b62d46bc3.camel@kernel.org> <20180519152700.GB4507@magnolia> <20180519153619.GA24698@bombadil.infradead.org> <20180521175457.GI23858@magnolia> <20180522220926.GW10363@dastard> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Wed, 2018-05-23 at 08:09 +1000, Dave Chinner wrote: > On Tue, May 22, 2018 at 06:30:40AM -0400, Jeff Layton wrote: > > On Mon, 2018-05-21 at 10:54 -0700, Darrick J. Wong wrote: > > > On Sat, May 19, 2018 at 08:36:19AM -0700, Matthew Wilcox wrote: > > > > On Sat, May 19, 2018 at 08:27:00AM -0700, Darrick J. Wong wrote: > > > > > From: Darrick J. Wong > > > > > > > > > > In inode_init_always(), we clear the inode mapping flags, which clears > > > > > any retained error (AS_EIO, AS_ENOSC) bits. Unfortunately, we do not > > > > > > > > typo of ENOSPC in case you do a new version > > > > > > Fixed, thanks. > > > > > > > > also clear wb_err, which means that old mapping errors can leak through > > > > > to new inodes. > > > > > > > > > > This is crucial for the XFS inode allocation path because we recycle old > > > > > in-core inodes and we do not want error state from an old file to leak > > > > > into the new file. This bug was discovered by running generic/036 and > > > > > generic/047 in a loop and noticing that the EIOs generated by the > > > > > collision of direct and buffered writes in generic/036 would survive the > > > > > remount between 036 and 047, and get reported to the fsyncs (on > > > > > different files on a reformatted fs!) in generic/047. > > > > > > > > > > Since we're changing the semantics of inode_init_always, we must also > > > > > change xfs_reinit_inode to retain the writeback error state when we go > > > > > to recover an inode that has been torn down in the vfs but not yet > > > > > disposed of by XFS. > > > > > > > > Don't you also need to preserve inode->i_mapping->flags across the > > > > reinitialisation for the users which aren't yet using ->wb_err? > > > > > > At first I thought (mistakenly) that xfs had been completely converted, > > > but digging further we still use the old filemap_check_errors. It seems > > > reasonable to me that if we're going to resurrect an incore inode then > > > we should try to hang on to AS_EIO/AS_ENOSPC for as long as we can. > > > > > > > > > > Yes, I think the patch you sent makes sense. > > > > Most of the XFS callers are using filemap_write_and_wait{_range}, and > > most of those seem to be called from ioctls (->setattr op being the > > notable exception). > > > > We could (in principle) pass a pointer to file->f_wb_err to most of > > those callers, and use that to check for errors instead of looking for > > AS_EIO/AS_ENOSPC. We wouldn't want to advance the error cursor for those > > (as that should only be done in the context of an fsync), but it might > > be more reliable than using the flags here. > > Why wouldn't we advance the error pointer? The data writeback error > caused an operation to fail and the error has been reported to > userspace, so how is that different to fsync() failing and reporting > the error to userspace? > > This seems like a slippery slope of inconsistency to me, where data > writeback errors are only reported once on fsync(), but can be > reported multiple times through different filesystem operations... > If I get back an error on ioctl, why should I think that that has anything to do with writeback? For example, suppose I do: write = 0 ioctl = -EIO fsync = 0 The ioctl returns an error but the fsync returns 0. It would be reasonable to assume that my writes had made it to disk. I think that we really do want to ensure that fsync always returns an error if writeback failed since the last fsync. That was sort of the point of the errseq_t work in the first place. -- Jeff Layton