linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Nico Williams <nico@cryptonector.com>
Cc: david@lang.hm,
	"General Discussion of SQLite Database" <sqlite-users@sqlite.org>,
	"杨苏立 Yang Su Li" <suli@cs.wisc.edu>,
	linux-fsdevel@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	drh@hwaci.com
Subject: Re: [sqlite] light weight write barriers
Date: Thu, 25 Oct 2012 02:02:31 -0400	[thread overview]
Message-ID: <20121025060231.GC9860@thunk.org> (raw)
In-Reply-To: <CAK3OfOjtH9qVghb6+33HSb8+dPZVexLC9Z5XO1KKvjoTYiYp4A@mail.gmail.com>

On Thu, Oct 25, 2012 at 12:18:47AM -0500, Nico Williams wrote:
> 
> By trusting fsync().  And if you don't care about immediate Durability
> you can run the fsync() in a background thread and mark the associated
> transaction as completed in the next transaction to be written after
> the fsync() completes.

The challenge is when you have entagled metadata updates.  That is,
you update file A, and file B, and file A and B might share metadata.
In order to sync file A, you also have to update part of the metadata
for the updates to file B, which means calculating the dependencies of
what you have to drag in can get very complicated.  You can keep track
of what bits of the metadata you have to undo and then redo before
writing out the metadata for fsync(A), but that basically means you
have to implement soft updates, and all of the complexity this
implies: http://lwn.net/Articles/339337/

If you can keep all of the metadata separate, this can be somewhat
mitigated, but usually the block allocation records (regardless of
whether you use a tree, or a bitmap, or some other data structure)
tends of have entanglement problems.

It certainly is not impossible; RDBMS's have implemented this.  On the
other hand, they generally aren't as fast as file systems for
non-transactional workloads, and people really care about performance
on those sorts of workloads for file systems.  (About a decade ago,
Oracle tried to claim that you could run file system workloads using
an Oracle databsae as a back-end.  Everyone laughed at them, and the
idea died a quick, merciful death.)

Still, if you want to try to implement such a thing, by all means,
give it a try.  But I think you'll find that creating a file system
that can compete with existing file systems for performance, and
*then* also supports a transactional model, is going to be quite a
challenge.

     	      		      	     	      	 - Ted

  reply	other threads:[~2012-10-25  6:02 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CALwJ=MzHjAOs4J4kGH6HLdwP8E88StDWyAPVumNg9zCWpS9Tdg@mail.gmail.com>
2012-10-10 17:17 ` light weight write barriers Andi Kleen
2012-10-11 16:32   ` [sqlite] " 杨苏立 Yang Su Li
2012-10-11 17:41     ` Christoph Hellwig
2012-10-23 19:53     ` Vladislav Bolkhovitin
2012-10-24 21:17       ` Nico Williams
2012-10-24 22:03         ` david
2012-10-25  0:20           ` Nico Williams
2012-10-25  1:04             ` david
2012-10-25  5:18               ` Nico Williams
2012-10-25  6:02                 ` Theodore Ts'o [this message]
2012-10-25  6:58                   ` david
2012-10-25 14:03                     ` Theodore Ts'o
2012-10-25 18:03                       ` david
2012-10-25 18:29                         ` Theodore Ts'o
2012-11-05 20:03                           ` Pavel Machek
2012-11-05 22:04                             ` Theodore Ts'o
     [not found]                               ` <CALwJ=Mx-uEFLXK2wywekk=0dwrwVFb68wocnH9bjXJmHRsJx3w@mail.gmail.com>
2012-11-05 23:00                                 ` Theodore Ts'o
2012-10-30 23:49                   ` Nico Williams
2012-10-25  5:42           ` Theodore Ts'o
2012-10-25  7:11             ` david
2012-10-27  1:52         ` Vladislav Bolkhovitin
2012-10-25  5:14       ` Theodore Ts'o
2012-10-25 13:03         ` Alan Cox
2012-10-25 13:50           ` Theodore Ts'o
2012-10-27  1:55             ` Vladislav Bolkhovitin
2012-10-27  1:54         ` Vladislav Bolkhovitin
2012-10-27  4:44           ` Theodore Ts'o
2012-10-30 22:22             ` Vladislav Bolkhovitin
2012-10-31  9:54               ` Alan Cox
2012-11-01 20:18                 ` Vladislav Bolkhovitin
2012-11-01 21:24                   ` Alan Cox
2012-11-02  0:15                     ` Vladislav Bolkhovitin
2012-11-02  0:38                     ` Howard Chu
2012-11-02 12:33                       ` Alan Cox
2012-11-13  3:41                         ` Vladislav Bolkhovitin
2012-11-13 17:40                           ` Alan Cox
2012-11-13 19:13                             ` Nico Williams
2012-11-15  1:17                               ` Vladislav Bolkhovitin
2012-11-15 12:07                                 ` David Lang
2012-11-16 15:06                                   ` Howard Chu
2012-11-16 15:31                                     ` Ric Wheeler
2012-11-16 15:54                                       ` Howard Chu
2012-11-16 18:03                                         ` Ric Wheeler
2012-11-16 19:14                                     ` David Lang
2012-11-17  5:02                                   ` Vladislav Bolkhovitin
     [not found]                                   ` <CABK4GYNGrbes2Yhig4ioh-37OXg6iy6gqb3u8A2P2_dqNpMqoQ@mail.gmail.com>
2012-11-17  5:02                                     ` Vladislav Bolkhovitin
2012-11-15 17:06                                 ` Ryan Johnson
2012-11-15 22:35                                   ` Chris Friesen
2012-11-17  5:02                                     ` Vladislav Bolkhovitin
2012-11-20  1:23                                       ` Vladislav Bolkhovitin
2012-11-26 20:05                                         ` Nico Williams
2012-11-29  2:15                                           ` Vladislav Bolkhovitin
2012-11-15  1:16                             ` Vladislav Bolkhovitin
2012-11-13  3:37                       ` Vladislav Bolkhovitin
     [not found]                       ` <CALwJ=MwtFAz7uby+YzPPp2eBG-y+TUTOu9E9tEJbygDQW+s_tg@mail.gmail.com>
2012-11-13  3:41                         ` Vladislav Bolkhovitin
     [not found]           ` <CABK4GYMmigmi7YM9A5Aga21ZWoMKgUe3eX-AhPzLw9CnYhpcGA@mail.gmail.com>
2012-11-13  3:42             ` Vladislav Bolkhovitin
     [not found]   ` <CALwJ=MyR+nU3zqi3V3JMuEGNwd8FUsw9xLACJvd0HoBv3kRi0w@mail.gmail.com>
2012-10-11 16:38     ` Nico Williams
2012-10-11 16:48       ` Nico Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121025060231.GC9860@thunk.org \
    --to=tytso@mit.edu \
    --cc=david@lang.hm \
    --cc=drh@hwaci.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nico@cryptonector.com \
    --cc=sqlite-users@sqlite.org \
    --cc=suli@cs.wisc.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).