From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mail-yw1-f65.google.com ([209.85.161.65]:42029 "EHLO mail-yw1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727104AbeIEQMn (ORCPT ); Wed, 5 Sep 2018 12:12:43 -0400
Received: by mail-yw1-f65.google.com with SMTP id n207-v6so2488613ywn.9 for ; Wed, 05 Sep 2018 04:42:51 -0700 (PDT)
Message-ID: <5fec9eccdb2e7418d7c594ce353557ed1c394d96.camel@redhat.com>
Subject: Re: POSIX violation by writeback error
From: Jeff Layton
To: Martin Steigerwald
Cc: 焦晓冬, R.E.Wolff@bitwizard.nl, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox
Date: Wed, 05 Sep 2018 07:42:46 -0400
In-Reply-To: <1959947.mKHFU3S0Eq@merkaba>
References: <1959947.mKHFU3S0Eq@merkaba>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

On Wed, 2018-09-05 at 09:37 +0200, Martin Steigerwald wrote:
> Jeff Layton - 04.09.18, 17:44:
> > > - If the following read() could be served by a page in memory, just
> > > return the data. If the following read() could not be served by a
> > > page in memory and the inode/address_space has a writeback error
> > > mark, return EIO. If there is a writeback error on the file, and the
> > > requested data could not be served by a page in memory, it means we
> > > are reading a (partially) corrupted (out-of-date) file. Receiving an
> > > EIO is expected.
> >
> > No, an error on read is not expected there. Consider this:
> >
> > Suppose the backend filesystem (maybe an NFSv3 export) is really r/o,
> > but was mounted r/w. An application queues up a bunch of writes that
> > of course can't be written back (they get EROFS or something when
> > they're flushed back to the server), but that application never calls
> > fsync.
> >
> > A completely unrelated application is running as a user that can open
> > the file for read, but not r/w. It then goes to open and read the file
> > and gets EIO back, or maybe even EROFS.
> >
> > Why should that application (which did zero writes) have any reason to
> > think that the error was due to a prior writeback failure by a
> > completely separate process? Does EROFS even make sense when you're
> > attempting a read?
> >
> > Moreover, what is that application's remedy in this case? It just
> > wants to read the file, but may not even be able to open it for write
> > to issue an fsync to "clear" the error. How do we get things moving
> > again so it can do what it wants?
> >
> > I think your suggestion would open the floodgates for local DoS
> > attacks.
>
> I wonder whether a new error code for reporting writeback errors like
> this could help out of the situation. But from all I have read here so
> far, this is a really challenging situation to deal with.
>
> I still remember how AmigaOS dealt with this case, and from a usability
> point of view it was close to ideal: if a disk was removed (a floppy
> disk, a network disk provided by Envoy, or even a hard disk), it popped
> up a dialog "You MUST insert volume again". And if you did, it continued
> writing. That worked even with networked devices. I tested it: I
> unplugged the ethernet cable, replugged it, and it continued writing.
>
> I can imagine that this would be quite challenging to implement within
> Linux. I remember that a Google Summer of Code project to implement
> this was at least offered for NetBSD, but I never learned whether it
> was taken or even implemented. If so, it might serve as an inspiration.
> Anyway, AmigaOS did this even for stationary hard disks. I had the
> issue of a flaky connection through an IDE-to-SCSI and then a
> SCSI-to-UWSCSI adapter.
> And when the hard disk had connection issues, that dialog popped up,
> with the name of the operating system volume for example.
>
> Every access to it was blocked then. It simply blocked all processes
> that accessed it until it became available again (usually I rebooted in
> the case of a stationary device, since I would have had to open the
> case, or no hot-plug was available or working).
>
> But AFAIR AmigaOS also did not have a notion of caching writes for
> longer than maybe a few seconds or so, and I think just within the
> device driver. Writes were (almost) immediate. There were some
> asynchronous I/O libraries, and I would expect a delay in the dialog
> popping up in that case.
>
> It would be challenging to implement for Linux even just for removable
> devices. You have page dirtying and delayed writeback, which is still a
> performance issue with NFS over 1 GBit when rsyncing huge files from
> local storage that is faster than 1 GBit; reducing the dirty memory
> ratio may help to halve the time needed to complete the rsync copy
> operation. And you would need to communicate all the way up to
> userspace to let the user know about the issue.

You may be interested in Project Banbury:

    http://www.wil.cx/~willy/banbury.html

> Still, at least for removable media, this would be almost the most
> usability-friendly approach. With robust filesystems (the Amiga Old
> Filesystem and Fast Filesystem were not robust against sudden write
> interruption, so the "MUST" was meant that way) one may even offer
> "Please insert device again to write out unwritten data, or choose to
> discard that data" in a dialog. And for removable media it may even
> work, as blocking the processes that access it usually would not block
> the whole system. But for the operating system disk? I know how the
> Plasma desktop behaves during massive I/O operations. It usually just
> grinds to a halt.
> It seems to me that its processes do some I/O almost all of the time …
> or that the Linux kernel blocks other syscalls too during heavy I/O
> load.
>
> I just liked to mention it as another crazy idea. But I bet it would
> practically require rewriting the I/O subsystem in Linux to a great
> extent, probably diminishing its performance in situations of write
> pressure. Or maybe a genius finds a way to implement both. :)
>
> What I do think, though, is that the dirty page caching of Linux with
> its current default settings is excessive. 5% / 10% of available
> memory is often a lot these days. There has been a discussion about
> reducing the default, but AFAIK it was never done. Linus suggested in
> that discussion reducing it to about what the storage can write out in
> 3 to 5 seconds. That may even help with error reporting, as reducing
> the dirty memory ratio will reduce memory pressure, and so you may
> choose to add some memory allocations for error handling. And the time
> until you know it is not working may be shorter.

-- 
Jeff Layton
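For what it's worth, the "3 to 5 seconds of writeback" sizing can already be approximated today with the byte-based knobs instead of the percentage ones. A hedged sketch, assuming a sustained device write rate of ~100 MB/s (a made-up figure; measure your own storage):

```shell
# Size the dirty-page limit to roughly 5 seconds of writeback,
# instead of the default percentage-of-RAM vm.dirty_ratio.
RATE_MB=100                                  # assumed device throughput, MB/s
DIRTY_BYTES=$((RATE_MB * 1024 * 1024 * 5))   # ~5 s of writeback at that rate
echo "vm.dirty_bytes=$DIRTY_BYTES"

# Applying it requires root, e.g.:
#   sysctl -w vm.dirty_bytes=$DIRTY_BYTES
#   sysctl -w vm.dirty_background_bytes=$((DIRTY_BYTES / 4))
```

Setting vm.dirty_bytes makes the kernel ignore vm.dirty_ratio (and likewise for the background pair), so a machine with lots of RAM no longer accumulates gigabytes of unwritten dirty pages before writeback kicks in.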