From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from imap.thunk.org ([74.207.234.97]:53160 "EHLO imap.thunk.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1725868AbeIERjD (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
        Wed, 5 Sep 2018 13:39:03 -0400
Date: Wed, 5 Sep 2018 09:08:45 -0400
From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: =?utf-8?B?54Sm5pmT5Yas?= <milestonejxd@gmail.com>
Cc: jlayton@redhat.com, R.E.Wolff@bitwizard.nl,
        linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: POSIX violation by writeback error
Message-ID: <20180905130845.GE23909@thunk.org>
References: <CAJDTihz-rFb2SGaxZsQnXGnee_2qW_ynhPe=tZ4yzQBSV_KQ1g@mail.gmail.com>
 <20180904075347.GH11854@BitWizard.nl>
 <CAJDTihzqn3whQ47uUOxGYk4Je4S10ehNEQCtfb=j--iCsdDqgQ@mail.gmail.com>
 <82ffc434137c2ca47a8edefbe7007f5cbecd1cca.camel@redhat.com>
 <CAJDTihw7T8WLme09W8VHCRfiALq4fxg1ZsywcSjn6hXsAw5wRw@mail.gmail.com>
 <cd137e88c9e882200c08c7336aa7b5a1c84a7ba3.camel@redhat.com>
 <CAJDTihzO0Y0ZE8em2LqT-Ac-Kga7W51Uwwb2uv9oTFhkJ8vKgA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CAJDTihzO0Y0ZE8em2LqT-Ac-Kga7W51Uwwb2uv9oTFhkJ8vKgA@mail.gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Wed, Sep 05, 2018 at 04:09:42PM +0800, 焦晓冬 wrote:
> Well, since the reader application and the writer application are reading
> a same file, they are indeed related. The reader here is expecting
> to read the lasted data the writer offers, not any data available. The
> reader is surely not expecting to read partially new and partially old data.
> Right? And, that `read() should return the lasted write()` by POSIX
> supports this expectation.

The core assumption of Unix, and therefore of Linux, is that the
primary abstraction is the file.  So if you say that all applications
which read or write the same file are related, that's equivalent to
saying, "all applications are related".  Consider that a text editor
can read a config file, or a source file, or any other text file.
Consider shell commands such as "cat", "sort", "uniq".  Heck, /bin/cp
copies any type of file.  Does that mean that /bin/cp, as a reader
application, is related to all applications on the system?

The real problem here is that we're trying to guess the motivations
and usage of programs that are reading the file, and there's no good
way to do that.  It could be that the reader is someone who wants to
be informed that the file is in the page cache, but was never persisted to
disk.  It could be that the user has figured out something has gone
terribly wrong, and is desperately trying to rescue all the data she
can by copying it to another disk.  In that case, stopping the reader
from being able to access the contents is exactly the wrong thing to
do if what you care about is preventing data loss.

The other thing which you seem to be assuming is that applications
which care about precious data won't use fsync(2).  And in general,
it's been fairly well known for decades that if you care about your
data, you have to use fsync(2) or O_DIRECT writes; and you *must*
check the error return of both the fsync(2) and the close(2) system
calls.  Emacs got that right in the mid-1980's --- over 30 years ago.
We mocked GNOME and KDE's toy notepad applications for getting this
wrong a decade ago, and they've since fixed it.
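
As a rough illustration of "check the error return of both the
fsync(2) and the close(2)", here is a minimal sketch in Python, whose
os.write(), os.fsync() and os.close() are thin wrappers that raise
OSError when the underlying system call fails.  The function name
write_durably is made up for this example, and it creates a new file
with O_EXCL rather than overwriting an existing one.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Create `path`, write `data`, and raise OSError if any step fails.

    Hypothetical helper for illustration only.
    """
    # O_EXCL: refuse to touch an existing file instead of truncating it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:
            # write(2) may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # fsync(2) is where a writeback error (e.g. EIO) is reported.
        os.fsync(fd)
    except BaseException:
        # Already failing: close on a best-effort basis, keep the first error.
        try:
            os.close(fd)
        except OSError:
            pass
        raise
    # Not wrapped in "finally": an error from close(2) must reach the caller.
    os.close(fd)
```

A caller that gets an OSError out of this function should treat the
data as not safely on disk, and should not assume that simply calling
fsync again will make it so.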

Actually, the GNOME and KDE applications, because they were too lazy
to preserve the xattrs and ACLs, decided it was better to truncate the
file and then rewrite it.  So if you crashed after the
truncate... your data was toast.  This was a decade ago, and again, it
was considered spectacularly bad application programming then, and it's
since been fixed.  The point here is that there will always be lousy
application programs.  And it is a genuine systems design question how
much we should sacrifice performance and efficiency to accommodate
stupid application programs.
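
For contrast with truncate-and-rewrite, the commonly used alternative
is to write a temporary file, fsync it, and rename it over the
original, so a crash leaves either the old or the new contents.  The
sketch below is an assumption about how that might look, not a
description of what any GNOME or KDE application does; the function
name and the ".tmp" naming are made up, and it assumes a POSIX system
where a directory can be opened and fsync'ed.  Note that it does *not*
copy xattrs or ACLs to the new file, which is exactly the extra work
mentioned above.

```python
import os


def replace_file_atomically(path: str, data: bytes) -> None:
    """Replace the contents of `path` via write-temp, fsync, rename.

    Hypothetical helper; does not carry over xattrs, ACLs or ownership.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so rename(2) stays on one fs.
    tmp_path = os.path.join(directory, "." + os.path.basename(path) + ".tmp")

    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        try:
            view = memoryview(data)
            while view:
                view = view[os.write(fd, view):]  # handle short writes
            os.fsync(fd)  # new data must be on disk before the rename
        finally:
            os.close(fd)
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Something failed before the rename completed: drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # fsync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If any step before the rename fails, the original file is left
untouched, which is the property the truncate approach gives up.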

For example, we could make close(2) imply an fsync(2), and return the
error in close(2).  But *that* assumes that applications actually
check the return value for close(2) --- and there will be those that
don't.  It would also completely trash performance for builds, since
it would slow down writing generated files such as all the *.o object
files.  Since those are generated files, they aren't precious, so
forcing an fsync(2) after writing each of them would destroy your
system performance for no benefit.

						- Ted