From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: viro@zeniv.linux.org.uk, konishi.ryusuke@lab.ntt.co.jp,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-nilfs@vger.kernel.org,
	NeilBrown <neilb@suse.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH 0/3] mm/fs: get PG_error out of the writeback reporting business
Date: Mon, 6 Mar 2017 16:08:01 -0700
Message-ID: <20170306230801.GA28111@linux.intel.com>
In-Reply-To: <1488724854.2925.6.camel@redhat.com>

On Sun, Mar 05, 2017 at 09:40:54AM -0500, Jeff Layton wrote:
> On Sun, 2017-03-05 at 08:35 -0500, Jeff Layton wrote:
> > I recently did some work to wire up -ENOSPC handling in ceph, and found
> > I could get back -EIO errors in some cases when I should have instead
> > gotten -ENOSPC. The problem was that the ceph writeback code would set
> > PG_error on a writeback error, and that error would clobber the mapping
> > error.
> > 
> 
> I should also note that relying on PG_error to report writeback errors
> is inherently unreliable as well. If someone calls sync() before your
> fsync gets in there, then you'll likely lose it anyway.
> 
> filemap_fdatawait_keep_errors will preserve the error in the mapping,
> but not the individual PG_error flags, so I think we do want to ensure
> that the mapping error is set when there is a writeback error and not
> rely on the PG_error bit for that.
> 
> > While I fixed that problem by simply not setting that bit on errors,
> > that led me down a rabbit hole of looking at how PG_error is being
> > handled in the kernel.
> > 
> > This patch series is a few fixes for things that I 100% noticed by
> > inspection. I don't have a great way to test these since they involve
> > error handling. I can certainly doctor up a kernel to inject errors
> > in this code and test by hand, however, if these look plausible up front.
> > 
> > Jeff Layton (3):
> >   nilfs2: set the mapping error when calling SetPageError on writeback
> >   mm: don't TestClearPageError in __filemap_fdatawait_range
> >   mm: set mapping error when launder_pages fails
> > 
> >  fs/nilfs2/segment.c |  1 +
> >  mm/filemap.c        | 19 ++++---------------
> >  mm/truncate.c       |  6 +++++-
> >  3 files changed, 10 insertions(+), 16 deletions(-)
> > 
> 
> (cc'ing Ross...)
> 
> Just when I thought that only NILFS2 needed a little work here, I see
> another spot...
> 
> I think that we also need to fix dax_writeback_mapping_range to
> set a mapping error on writeback as well. It looks like that's not
> happening today. Something like the patch below (obviously untested).
> 
> I'll also plan to follow up with a patch to vfs.txt to outline how
> writeback errors should be handled by filesystems, assuming that this
> patchset isn't completely off base.
> 
> -------------------8<-----------------------
> 
> [PATCH] dax: set error in mapping when writeback fails
> 
> In order to get proper error codes from fsync, we must set an error in
> the mapping range when writeback fails.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/dax.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index c45598b912e1..9005d90deeda 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -888,8 +888,10 @@ int dax_writeback_mapping_range(struct address_space *mapping,
>  
>  			ret = dax_writeback_one(bdev, mapping, indices[i],
>  					pvec.pages[i]);
> -			if (ret < 0)
> +			if (ret < 0) {
> +				mapping_set_error(mapping, ret);
>  				return ret;
> +			}

(Adding Jan)

I tested this a bit, and for the DAX case at least I don't think this does
what you want.  The current code already returns -EIO if dax_writeback_one()
hits an error, which bubbles up through the call stack and makes the fsync()
call in userspace fail with EIO, as we want.  With both ext4 and xfs this
patch (applied to v4.10) makes it so that we fail the current fsync() due to
the return value of -EIO, then we fail the next fsync() as well because only
then do we actually process the AS_EIO flag inside of filemap_check_errors().

I think maybe the missing piece is that our normal DAX fsync call stack
doesn't include a call to filemap_check_errors() if we return -EIO.  Here's
our stack in xfs:

    dax_writeback_mapping_range+0x32/0x70
    xfs_vm_writepages+0x8c/0xf0
    do_writepages+0x21/0x30
    __filemap_fdatawrite_range+0xc6/0x100
    filemap_write_and_wait_range+0x44/0x90
    xfs_file_fsync+0x7a/0x2c0
    vfs_fsync_range+0x4b/0xb0
    ? trace_hardirqs_on_caller+0xf5/0x1b0
    do_fsync+0x3d/0x70
    SyS_fsync+0x10/0x20
    entry_SYSCALL_64_fastpath+0x1f/0xc2

On the subsequent fsync() call we *do* end up calling filemap_check_errors()
via filemap_fdatawait_range(), which tests & clears the AS_EIO flag in the
mapping:

    filemap_fdatawait_range+0x3b/0x80
    filemap_write_and_wait_range+0x5a/0x90
    xfs_file_fsync+0x7a/0x2c0
    vfs_fsync_range+0x4b/0xb0
    ? trace_hardirqs_on_caller+0xf5/0x1b0
    do_fsync+0x3d/0x70
    SyS_fsync+0x10/0x20
    entry_SYSCALL_64_fastpath+0x1f/0xc2

Was your concern just that you didn't think that fsync() was properly
returning an error when dax_writeback_one() hit an error?  Or is there another
path by which we need to report the error, where it is actually important that
we set AS_EIO?  If it's the latter, then I think we need to rework the fsync
call path so that we both generate and consume AS_EIO on the same call,
probably in filemap_write_and_wait_range().


