linux-kernel.vger.kernel.org archive mirror
From: "Xin Zhao" <uszhaoxin@gmail.com>
To: "Zach Brown" <zab@zabbo.net>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: How to know when file data has been flushed into disk?
Date: Fri, 7 Apr 2006 21:39:50 -0400
Message-ID: <4ae3c140604071839v1b570d37y57c7e06233028e8f@mail.gmail.com>
In-Reply-To: <4436A770.3080905@zabbo.net>

This answered all my questions! Many thanks! Will check the phase 2 code.

Xin


On 4/7/06, Zach Brown <zab@zabbo.net> wrote:
>
> > If a program accesses data like this:
> >
> > 1. open the file
> > 2. write a lot of data into this file
>
> You don't say if this is an extending write or overwriting existing file
> data.  I'm going to assume extending writes so that data=ordered kicks in.
>
> > 3. close the file
>
> > So my questions are:
> > 1. How will the file system be notified after all data has been
> > flushed to disk?
>
> Look at phase 2 in journal_commit_transaction().  The kjournald thread
> issues the writeback of the file data by walking t_sync_datalist and
> then waits for the writeback to complete by using wait_on_buffer()
> before committing the transaction.
>
> > 2. Unlike data=journal mode, in data=ordered mode the data could be
> > lost if the system crashes while data is being flushed to disk. When
> > the system reboots, does the journal contain the old metadata for undo?
>
> No, ext3 isn't roll-backward.  It doesn't store the *old* data in the
> journal and undo the change if it fails halfway through.  It's
> roll-forward.  It stores the *new* data in the journal and replays
> complete transactions in the journal that weren't moved out to their
> final place on disk at the time of the crash.
>
> So if the machine reboots during the writeback phase then the
> transaction won't be committed yet and recovery won't replay that
> transaction from the journal.  From the metadata's point of view the
> file extension will never have happened.
>
> > 3. Does sys_close() have to block until all data and metadata
> > are committed?
>
> No, and neither does sys_getpid() :)
>
> > to take subsequent operation. However, the data flush could fail. In
> > this case, the file system seems to mislead the application. Is this true?
>
> No.  The application has no grounds for assuming that a successful
> close() has synced previous operations to disk.  It's simply not part of
> the API.
>
> > If so, any solutions?
>
> The application should rely on tools like fsync(), fdatasync(), O_SYNC,
> mount -o sync, etc.  Whatever suits it best.
>
> - z
>
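
For reference, the fsync() advice above can be sketched roughly as follows.
This is a minimal illustration assuming a POSIX system; write_durably() is a
hypothetical helper name, not an existing API, and the open flags and file
mode are arbitrary choices for the example. The point is only that the
application treats the fsync() return value, not a successful close(), as
the signal that the data was written out.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_durably() is a hypothetical helper for illustration only.
 * Returns 0 once fsync() and close() have both succeeded, or -1 with
 * errno set if any step fails.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int saved_errno;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* write() may be short or interrupted, so loop until all bytes are queued. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Block until the kernel reports that the file's data has been written out. */
    if (fsync(fd) < 0)
        goto fail;

    /* close() can still report an error, but it is not a durability barrier. */
    return close(fd);

fail:
    /* Preserve the original error across the cleanup close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return -1;
}
```

A caller would do something like write_durably("out.dat", data, size) and
only proceed with dependent work if it returns 0. For a newly created file,
the fsync(2) man page notes that the directory entry may need its own
fsync() on the containing directory; that step is left out here.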


