linux-kernel.vger.kernel.org archive mirror
From: "Theodore Ts'o" <tytso@mit.edu>
To: "Barczak, Mariusz" <mariusz.barczak@intel.com>
Cc: Andreas Dilger <adilger@dilger.ca>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Wysoczanski, Michal" <michal.wysoczanski@intel.com>,
	"Baldyga, Robert" <robert.baldyga@intel.com>,
	"Roman, Agnieszka" <agnieszka.roman@intel.com>
Subject: Re: [BUG] Possible silent data corruption in filesystems/page cache
Date: Mon, 6 Jun 2016 09:35:39 -0400
Message-ID: <20160606133539.GE22108@thunk.org>
In-Reply-To: <842E055448A75D44BEB94DEB9E5166E91877C26F@irsmsx110.ger.corp.intel.com>

On Mon, Jun 06, 2016 at 07:29:42AM +0000, Barczak, Mariusz wrote:
> Hi, let me describe the problem in detail.
> 
> For buffered IO, data is copied into memory pages. In this case
> the write IO is (generally) not submitted. In the background,
> opportunistic cleaning of dirty pages takes place and IO is issued to
> the device. An IO error is observed on this path, and the application
> is not informed about it. In summary, flushing of the dirty page fails,
> and this page is probably dropped, but in fact it should not be.
> So if the above situation happens between the application's write and
> sync, then no error is reported. In addition, after some time, when the
> application reads the same LBA on which the IO error occurred, the old
> data content is fetched.

The application will be informed about it if it asks --- if it calls
fsync(), the I/O will be forced and if there is an error it will be
returned to the user.  But if the user has not asked, there is no way
for the user space to know that there is a problem --- for that
matter, it may have exited already by the time we do the buffered
writeback, so there may be nobody to inform.

If the error happens between the write and the sync, then the address
space mapping's AS_EIO bit will be set.  (See filemap_check_errors()
and do a git grep on AS_EIO.)  So the user will be informed when they
call fsync(2).

The problem with simply not dropping the page is that if we do that,
the page will never be cleaned, and in the worst case this can lead
to memory exhaustion.  Consider the case where a user is writing huge
numbers of pages (e.g., dd if=/dev/zero
of=/dev/device-that-will-go-away): if those pages are never dropped,
then that memory will never be freed.

In other words, the current behavior was carefully considered, and
deliberately chosen as the best design.

If you want to know for sure whether or not there was an I/O error,
you need to call fsync(2) and then check the return values of both
fsync(2) *and* close(2).  This is a known, documented part of
Unix/Linux and has been true for literally decades.  (Emacs, for
example, learned this and fixed it back in the late 1980s to avoid
losing user data when the user went over quota on their Andrew File
System on a BSD 4.3 system.)  If you're using some editor that comes
with some desktop package or some whizzy IDE, all bets are off, of
course.  But if you're using such tools, you probably care about eye
candy way more than you care about your data; certainly the authors of
such programs seem to have this tendency, anyway.  :-)

Cheers,

						- Ted


Thread overview: 5+ messages
2016-06-01  9:51 [BUG] Possible silent data corruption in filesystems/page cache Barczak, Mariusz
2016-06-02 19:32 ` Andreas Dilger
2016-06-06  7:29   ` Barczak, Mariusz
2016-06-06 13:35     ` Theodore Ts'o [this message]
2016-06-07  7:36       ` Barczak, Mariusz
