All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jeff Layton <jlayton@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-ext4@vger.kernel.org, akpm@linux-foundation.org,
	tytso@mit.edu, jack@suse.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it
Date: Wed, 05 Apr 2017 08:28:24 +1000	[thread overview]
Message-ID: <87pogriz93.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20170404115358.GH30811@bombadil.infradead.org>

[-- Attachment #1: Type: text/plain, Size: 4833 bytes --]

On Tue, Apr 04 2017, Matthew Wilcox wrote:

> On Tue, Apr 04, 2017 at 01:03:22PM +1000, NeilBrown wrote:
>> On Mon, Apr 03 2017, Jeff Layton wrote:
>> 
>> > On Mon, 2017-04-03 at 12:16 -0700, Matthew Wilcox wrote:
>> >> So, OK, that makes sense, we should keep allowing filesystems to report
>> >> ENOSPC as a writeback error.  But I think much of the argument below
>> >> still holds, and we should continue to have a prior EIO to be reported
>> >> over a new ENOSPC (even if the program has already consumed the EIO).
>> >
>> > I'm fine with that (though I'd like Neil's thoughts before we decide
>> > anything) there.
>> 
>> I'd like there be a well defined time when old errors were forgotten.
>> It does make sense for EIO to persist even if ENOSPC or EDQUOT is
>> received, but not forever.
>> Clearing the remembered errors when put_write_access() causes
>> i_writecount to reach zero is one option (as suggested), but I'm not
>> sure I'm happy with it.
>> 
>> Local filesystems, or network filesystems which receive strong write
>> delegations, should only ever return EIO to fsync.  We should
>> concentrate on them first, I think.  As there is only one possible
>> error, the seq counter is sufficient to "clear" it once it has been
>> reported to fsync() (or write()?).
>> 
>> Other network filesystems could return a whole host of errors: ENOSPC
>> EDQUOT ESTALE EPERM EFBIG ...
>> Do we want to limit exactly which errors are allowed in generic code, or
>> do we just support EIO generically and expect the filesystem to sort out
>> the details for anything else?
>
> I'd like us to focus on our POSIX compliance here and not return
> arbitrary errors.  The relevant pages are here:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
>
> For close(), we have to map every error to EIO.
> For fsync(), we can return any error that write() could have.  That limits
> us to:
>
> EFBIG ENOSPC EIO ENOBUFS ENXIO
>
> I think EFBIG really isn't a writeback error; are there any network
> filesystems that don't know the file size limit at the time they accept
> the original write?  ENOBUFS seems like a transient error (*this* call to
> fsync() failed, but the next one may succeed ... it's the equivalent of
> ENOMEM).  ENXIO seems to me like it's a submission error, not a writeback
> error.  So that leaves us with ENOSPC and EIO, as we have support today.

I guess Posix doesn't acknowledge the existence of disk quotas?
I think we need to add EDQUOT to your list.
Other hypothetical errors errors from the server such as EPERM or ESTALE
can reasonably be mapped to EIO.

>
>> One possible approach a filesystem could take is just to allow a single
>> async writeback error.  After that error, all subsequent write()
>> system calls become synchronous. As write() or fsync() is called on each
>> file descriptor (which could possibly have sent the write which caused
>> the error), an error is returned and that fact is counted.  Once we have
>> returned as many errors as there are open file descriptors
>> (i_writecount?), and have seen a successful write, the filesystem
>> forgets all recorded errors and switches back to async writes (for that
>> inode).   NFS does this switch-to-sync-on-error.  See nfs_need_check_write().
>> 
>> The "which could possibly have sent the write which caused the error" is
>> an explicit reference to NFS.  NFS doesn't use the AS_EIO/AS_ENOSPC
>> flags to return async errors.  It allocates an nfs_open_context for each
>> user who opens a given inode, and stores an error in there.  Each dirty
>> pages is associated with one of these, so errors a sure to go to the
>> correct user, though not necessarily the correct fd at present.
>
> ... and you need the nfs_open_context in order to use the correct
> credentials when writing a page to the server, correct?

Correct.

Thanks,
NeilBrown


>
>> When we specify the new behaviour we should be careful to be as vague as
>> possible while still saying what we need.  This allows filesystems some
>> flexibility.
>> 
>>   If an error happens during writeback, the next write() or fsync() (or
>>   ....) on the file descriptor to which data was written will return -1
>>   with errno set to EIO or some other relevant error.  Other file
>>   descriptors open on the same file may receive EIO or some other error
>>   on a subsequent appropriate system call.
>>   It should not be assumed that close() will return an error.  fsync()
>>   must be called before close() if writeback errors are important to the
>>   application.
>
> Thanks for explaining what NFS does today.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  parent reply	other threads:[~2017-04-04 22:28 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-31 19:25 [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 1/4] fs: new infrastructure for writeback error handling and reporting Jeff Layton
2017-04-03  7:12   ` Nikolay Borisov
2017-04-03 10:28     ` Jeff Layton
2017-04-03 14:47   ` Matthew Wilcox
2017-04-03 15:19     ` Jeff Layton
2017-04-03 16:15       ` Matthew Wilcox
2017-04-03 16:30         ` Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 2/4] dax: set errors in mapping when writeback fails Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 3/4] buffer: set wb errors using both new and old infrastructure for now Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 4/4] ext4: wire it up to the new writeback error reporting infrastructure Jeff Layton
2017-04-03  4:25 ` [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it NeilBrown
2017-04-03 10:28   ` Jeff Layton
2017-04-03 14:32     ` Matthew Wilcox
2017-04-03 17:47       ` Jeff Layton
2017-04-03 18:09         ` Jeremy Allison
2017-04-03 18:18           ` Jeff Layton
2017-04-03 18:36             ` Jeremy Allison
2017-04-03 18:40               ` Jeremy Allison
2017-04-03 18:49                 ` Jeff Layton
2017-04-03 19:16         ` Matthew Wilcox
2017-04-03 20:16           ` Jeff Layton
2017-04-04  2:45             ` Matthew Wilcox
2017-04-04  3:03             ` NeilBrown
2017-04-04 11:41               ` Jeff Layton
2017-04-04 22:41                 ` NeilBrown
2017-04-04 11:53               ` Matthew Wilcox
2017-04-04 12:17                 ` Jeff Layton
2017-04-04 16:12                   ` Matthew Wilcox
2017-04-04 16:25                     ` Jeff Layton
2017-04-04 17:09                       ` Matthew Wilcox
2017-04-04 18:08                         ` Jeff Layton
2017-04-04 22:50                         ` NeilBrown
2017-04-05 19:49                         ` Jeff Layton
2017-04-05 21:03                           ` Matthew Wilcox
2017-04-06  0:19                             ` NeilBrown
2017-04-06  0:02                           ` NeilBrown
2017-04-06  2:55                             ` Matthew Wilcox
2017-04-06  5:12                               ` NeilBrown
2017-04-06 13:31                                 ` Matthew Wilcox
2017-04-06 21:53                                   ` NeilBrown
2017-04-06 14:02                             ` Jeff Layton
2017-04-06 19:14                             ` Jeff Layton
2017-04-06 20:05                               ` Matthew Wilcox
2017-04-07 13:12                                 ` Jeff Layton
2017-04-09 23:15                                   ` NeilBrown
2017-04-10 13:19                                     ` Jeff Layton
2017-04-06 22:15                               ` NeilBrown
2017-04-04 23:13                       ` NeilBrown
2017-04-05 11:14                         ` Jeff Layton
2017-04-06  0:24                           ` NeilBrown
2017-04-04 13:38                 ` Theodore Ts'o
2017-04-04 22:28                 ` NeilBrown [this message]
2017-04-03 14:51   ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pogriz93.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=jlayton@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.