From: Jeff Layton <jlayton@redhat.com>
To: Martin Steigerwald <martin@lichtvoll.de>
Cc: 焦晓冬 <milestonejxd@gmail.com>,
	R.E.Wolff@bitwizard.nl, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	"Matthew Wilcox" <willy@infradead.org>
Subject: Re: POSIX violation by writeback error
Date: Wed, 05 Sep 2018 07:42:46 -0400
Message-ID: <5fec9eccdb2e7418d7c594ce353557ed1c394d96.camel@redhat.com>
In-Reply-To: <1959947.mKHFU3S0Eq@merkaba>

On Wed, 2018-09-05 at 09:37 +0200, Martin Steigerwald wrote:
> Jeff Layton - 04.09.18, 17:44:
> > > - If the following read() could be served by a page in memory, just
> > > return the data. If the following read() could not be served by a
> > > page in memory and the inode/address_space has a writeback error
> > > mark, return EIO. If there is a writeback error on the file, and
> > > the requested data could not be served by a page in memory, it
> > > means we are reading a (partially) corrupted (out-of-date) file.
> > > Receiving an EIO is expected.
> > 
> > No, an error on read is not expected there. Consider this:
> > 
> > Suppose the backend filesystem (maybe an NFSv3 export) is really r/o,
> > but was mounted r/w. An application queues up a bunch of writes that
> > of course can't be written back (they get EROFS or something when
> > they're flushed back to the server), but that application never calls
> > fsync.
> > 
> > A completely unrelated application is running as a user that can open
> > the file for read, but not r/w. It then goes to open and read the file
> > and then gets EIO back or maybe even EROFS.
> > 
> > Why should that application (which did zero writes) have any reason to
> > think that the error was due to prior writeback failure by a
> > completely separate process? Does EROFS make sense when you're
> > attempting to do a read anyway?
> > 
> > Moreover, what is that application's remedy in this case? It just
> > wants to read the file, but may not be able to even open it for write
> > to issue an fsync to "clear" the error. How do we get things moving
> > again so it can do what it wants?
> > 
> > I think your suggestion would open the floodgates for local DoS
> > attacks.
> 
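
To make the scenario quoted above concrete, here is a minimal sketch in Python of how a writing process would normally learn about a writeback failure. The helper name write_and_sync is hypothetical and not taken from this thread. It relies only on the usual Linux behaviour that write() can succeed once the data is in the page cache, and that a later writeback error is reported to the caller of fsync(). A process that only reads the file never makes that call.

```python
import os


def write_and_sync(path, data):
    """Write data to path and report whether writeback succeeded.

    Hypothetical helper, for illustration only. Returns None on success,
    or the errno reported by fsync() if flushing the data failed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # write() may return success even if the data can never reach
        # stable storage: at this point it has only dirtied the page cache.
        # (Short writes are ignored here to keep the sketch small.)
        os.write(fd, data)
        try:
            # Writeback errors (EIO, ENOSPC, ...) surface here, if at all.
            os.fsync(fd)
        except OSError as err:
            return err.errno
        return None
    finally:
        os.close(fd)
```

An application that skips the fsync() step, as in the example above, gets no error indication at all, which is why the question of who else should see the error comes up.
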
> I wonder whether a new error for reporting writeback errors like this 
> could help out of the situation. But from all I read here so far, this 
> is a really challenging situation to deal with.
> 
> I still remember how AmigaOS dealt with this case, and from a usability
> point of view it was close to ideal: if a disk was removed, like a
> floppy disk, a network disk provided by Envoy or even a hard disk, it
> popped up a dialog "You MUST insert volume <name of volume> again". And if
> you did, it continued writing. That worked even with networked devices.
> I tested it. I unplugged the ethernet cable and replugged it and it
> continued writing.
> 
> I can imagine that this would be quite challenging to implement within
> Linux. I remember that a Google Summer of Code project to implement this
> was at least offered for NetBSD, but I never got to know whether it was
> taken or even implemented. If so, it might serve as an inspiration.
> Anyway, AmigaOS did this even for stationary hard disks. I had the issue
> of a flaky connection through an IDE to SCSI and then SCSI to UWSCSI
> adapter. And when the hard disk had connection issues that dialog
> popped up, with the name of the operating system volume for example.
> 
> Every access to it was blocked then. It simply blocked all processes
> that accessed it till it became available again (with a stationary
> device I usually rebooted, because I had to open the case or because
> hot plug was not available or not working).
> 
> But AFAIR AmigaOS also did not have a notion of caching writes for
> longer than maybe a few seconds or so, and I think just within the device
> driver. Writes were (almost) immediate. There have been some
> asynchronous I/O libraries and I would expect a delay in the dialog
> popping up in that case.
> 
> It would be challenging to implement for Linux even just for removable
> devices. You have page dirtying and delayed writeback, which is still
> a performance issue with NFS over 1 GBit: when rsyncing huge files from
> local storage that is faster than 1 GBit, reducing the dirty memory ratio
> may help to halve the time needed to complete the copy operation. And
> you would need to communicate all the way to userspace to let the user
> know about the issue.
> 

You may be interested in Project Banbury:

http://www.wil.cx/~willy/banbury.html

> Still, at least for removable media, this would be close to the most
> user-friendly approach. With robust filesystems (Amiga Old
> Filesystem and Fast Filesystem were not robust in case of sudden write
> interruption, so the "MUST" was meant that way) one may even offer
> "Please insert device <name of device> again to write out unwritten data
> or choose to discard that data" in a dialog. And for removable media it
> may even work, as blocking the processes that access it usually would not
> block the whole system. But for the operating system disk? I know how
> Plasma desktop behaves during massive I/O operations. It usually just
> grinds to a complete halt. It seems to me that its processes do some
> I/O almost all of the time … or that the Linux kernel blocks other
> syscalls too during heavy I/O load.
> 
> I just wanted to mention it as another crazy idea. But I bet it would
> practically require rewriting the I/O subsystem in Linux to a great
> extent, probably diminishing its performance in situations of write
> pressure. Or maybe a genius finds a way to implement both. :)
> 
> What I do think, though, is that the dirty page caching of Linux with its
> current standard settings is excessive. 5% / 10% of available memory
> often is a lot these days. There has been a discussion about reducing the
> default, but AFAIK it was never done. Linus suggested in that discussion
> limiting it to about what the storage can write out in 3 to 5 seconds. That
> may even help with error reporting, as reducing the dirty memory ratio will
> reduce the memory pressure and so you may choose to add some memory
> allocations for error handling. And the time till you know it's not
> working may be less.
> 
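
As a rough illustration of the sizing argument in the quoted paragraph above, the following Python sketch compares a percentage-based dirty limit with a limit derived from "what the storage can write out in a few seconds". All numbers (16 GiB of RAM, a 10% ratio, 100 MiB/s of sustained write throughput, a 5 second budget) are assumptions chosen for the example, not measurements or kernel defaults, and the function names are hypothetical.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def dirty_limit_from_ratio(mem_bytes, ratio_percent):
    """Dirty-page limit expressed as a percentage of memory."""
    return mem_bytes * ratio_percent // 100


def dirty_limit_from_throughput(bytes_per_sec, seconds):
    """Dirty-page limit sized to what the device can flush in `seconds`."""
    return bytes_per_sec * seconds


# Assumed example machine: 16 GiB RAM, 10% ratio, 100 MiB/s storage.
ram = 16 * GiB
throughput = 100 * MiB

ratio_limit = dirty_limit_from_ratio(ram, 10)
time_limit = dirty_limit_from_throughput(throughput, 5)

# Time needed to flush a full ratio-based backlog at the assumed rate.
drain_seconds = ratio_limit / throughput

print(f"ratio-based limit: {ratio_limit // MiB} MiB")
print(f"5-second limit:    {time_limit // MiB} MiB")
print(f"time to drain ratio-based backlog: {drain_seconds:.1f} s")
```

Under these assumed numbers the percentage-based limit allows roughly three times as much unwritten data as the time-based one, so a writeback failure could also take correspondingly longer to surface. On Linux an absolute limit of this kind can be expressed through the vm.dirty_bytes and vm.dirty_background_bytes sysctls instead of the ratio-based ones.
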
-- 
Jeff Layton <jlayton@redhat.com>
