From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f66.google.com ([209.85.218.66]:38731 "EHLO
	mail-oi0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1726401AbeIEMyO (ORCPT );
	Wed, 5 Sep 2018 08:54:14 -0400
MIME-Version: 1.0
References: <20180904075347.GH11854@BitWizard.nl>
	<82ffc434137c2ca47a8edefbe7007f5cbecd1cca.camel@redhat.com>
	<20180904161203.GD17478@fieldses.org>
	<20180904162348.GN17123@BitWizard.nl>
	<20180904185411.GA22166@fieldses.org>
In-Reply-To:
From: 焦晓冬
Date: Wed, 5 Sep 2018 16:24:57 +0800
Message-ID:
Subject: Re: POSIX violation by writeback error
To: jlayton@redhat.com
Cc: bfields@fieldses.org, R.E.Wolff@bitwizard.nl,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, Sep 5, 2018 at 4:18 AM Jeff Layton wrote:
>
> On Tue, 2018-09-04 at 14:54 -0400, J. Bruce Fields wrote:
> > On Tue, Sep 04, 2018 at 06:23:48PM +0200, Rogier Wolff wrote:
> > > On Tue, Sep 04, 2018 at 12:12:03PM -0400, J. Bruce Fields wrote:
> > > > Well, I think the point was that in the above examples you'd prefer that
> > > > the read just fail--no need to keep the data. A bit marking the file
> > > > (or even the entire filesystem) unreadable would satisfy posix, I guess.
> > > > Whether that's practical, I don't know.
> > >
> > > When you would do it like that (mark the whole filesystem as "in
> > > error") things go from bad to worse even faster. The Linux kernel
> > > tries to keep the system up even in the face of errors.
> > >
> > > With that suggestion, having one application run into a writeback
> > > error would effectively crash the whole system because the filesystem
> > > may be the root filesystem and stuff like "sshd" that you need to
> > > diagnose the problem needs to be read from the disk....
> >
> > Well, the absolutist position on posix compliance here would be that a
> > crash is still preferable to returning the wrong data. And for the
> > cases 焦晓冬 gives, that sounds right? Maybe it's the wrong balance in
> > general, I don't know. And we do already have filesystems with
> > panic-on-error options, so if they aren't used maybe then maybe users
> > have already voted against that level of strictness.
> >
>
> Yeah, idk. The problem here is that this is squarely in the domain of
> implementation defined behavior. I do think that the current "policy"
> (if you call it that) of what to do after a wb error is weird and wrong.
> What we probably ought to do is start considering how we'd like it to
> behave.
>
> How about something like this?
>
> Mark the pages as "uncleanable" after a writeback error. We'll satisfy
> reads from the cached data until someone calls fsync, at which point
> we'd return the error and invalidate the uncleanable pages.

Totally agree with you.

>
> If no one calls fsync and scrapes the error, we'll hold on to it for as
> long as we can (or up to some predefined limit) and then after that
> we'll invalidate the uncleanable pages and start returning errors on
> reads. If someone eventually calls fsync afterward, we can return to
> normal operation.

Agree with you, except that using fsync() as `clear_error_mark()` seems
weird and counter-intuitive.

>
> As always though...what about mmap? Would we need to SIGBUS at the point
> where we'd start returning errors on read()?

I think SIGBUS on an mmap() access is the same thing as EIO from read().

>
> Would that approximate the current behavior enough and make sense?
> Implementing it all sounds non-trivial though...

No.
No problems are reported because nowadays we rely on the underlying
disk drives. They transparently remap bad sectors and use S.M.A.R.T. to
warn us long before a real EIO could be seen.
As for network filesystems, if I'm not mistaken, the close() op calls
fsync() inside the implementation, so there is no problem there either.

>
> --
> Jeff Layton
>
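
P.S. For illustration only, here is a minimal sketch of how an application can pick up a writeback error today by checking the results of write(), fsync() and close(). The helper name `write_and_sync` is hypothetical and not something from this thread or from the kernel. It only shows where an EIO would surface to user space and does not model the "uncleanable pages" idea discussed above.

```python
import errno
import os
import tempfile


def write_and_sync(path, data):
    """Write data to path and flush it to stable storage.

    Returns None on success, or the errno reported by open(), write(),
    fsync() or close() (for example errno.EIO after a writeback failure).
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as exc:
        return exc.errno

    err = None
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # A pending writeback error for this file is reported here.
        os.fsync(fd)
    except OSError as exc:
        err = exc.errno
    finally:
        try:
            # Some filesystems flush on close, so check this result too.
            os.close(fd)
        except OSError as exc:
            if err is None:
                err = exc.errno
    return err


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = write_and_sync(os.path.join(tmp, "demo.txt"), b"hello\n")
        print("ok" if result is None else os.strerror(result))
```

The point of returning the errno instead of raising is just to make the three places where an error can show up easy to see; a real application would probably retry or report the failure rather than carry on.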