linux-kernel.vger.kernel.org archive mirror
From: david@lang.hm
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Nico Williams" <nico@cryptonector.com>,
	"General Discussion of SQLite Database" <sqlite-users@sqlite.org>,
	"杨苏立 Yang Su Li" <suli@cs.wisc.edu>,
	linux-fsdevel@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	drh@hwaci.com
Subject: Re: [sqlite] light weight write barriers
Date: Thu, 25 Oct 2012 00:11:43 -0700 (PDT)
Message-ID: <alpine.DEB.2.02.1210242358570.31862@asgard.lang.hm>
In-Reply-To: <20121025054255.GB9860@thunk.org>

On Thu, 25 Oct 2012, Theodore Ts'o wrote:

> On Wed, Oct 24, 2012 at 03:03:00PM -0700, david@lang.hm wrote:
>> Like what is being described for sqlite, losing the tail end of the
>> messages is not a big problem under normal conditions. But there is
>> a need to be sure that what is there is complete up to the point
>> where it's lost.
>>
>> this is similar in concept to write-ahead-logs done for databases
>> (without the absolute durability requirement)
>
> If that's what you require, and you are using ext3/4, using data
> journalling might meet your requirements.  It's something you can
> enable on a per-file basis, via chattr +j; you don't have to force all
> file systems to use data journaling via the data=journal mount
> option.
>
> The potential downsides that you may or may not care about for this
> particular application:
>
> (a) This will definitely have a performance impact, especially if you
> are doing lots of small (less than 4k) writes, since the data blocks
> will get run through the journal, and will only get written to their
> final location on disk.
>
> (b) You don't get atomicity if the write spans a 4k block boundary.
> All of the bytes before i_size will be written, so you don't have to
> worry about "holes"; but the last message written to the log file
> might be truncated.
>
> (c) There will be a performance impact, since the contents of data
> blocks will be written at least twice (once to the journal, and once
> to the final location on disk).  If you do lots of small, sub-4k
> writes, the performance might be even worse, since data blocks might
> be written multiple times to the journal.

I'll have to dig into this option. In the case of rsyslog it sounds
like it could work (not as good as a filesystem-independent way of doing
things, but better than full fsyncs).
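
For reference, a minimal sketch of what chattr +j amounts to when done
from a program rather than the command line. The function name is
hypothetical, and the ioctl numbers are built assuming the generic Linux
_IOR/_IOW encoding (as on x86 and ARM); treat it as an illustration, not
as anything rsyslog does today.

```python
import fcntl
import os
import struct

# Assumption: generic Linux ioctl encoding (x86, ARM). Some architectures
# encode the direction bits differently, so these values are not universal.
_LONG = struct.calcsize("l")
FS_IOC_GETFLAGS = (2 << 30) | (_LONG << 16) | (ord("f") << 8) | 1
FS_IOC_SETFLAGS = (1 << 30) | (_LONG << 16) | (ord("f") << 8) | 2
FS_JOURNAL_DATA_FL = 0x00004000  # the 'j' attribute shown by lsattr


def enable_data_journaling(path):
    """Try to set the 'j' attribute on one file; return True on success."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Read the current inode flags, then write them back with 'j' added.
        raw = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
        flags = struct.unpack("I", raw)[0]
        fcntl.ioctl(fd, FS_IOC_SETFLAGS,
                    struct.pack("I", flags | FS_JOURNAL_DATA_FL))
        return True
    except OSError:
        # Filesystem does not support the flag, or we lack the privilege.
        return False
    finally:
        os.close(fd)
```

A False return is expected on filesystems other than ext3/4 and when
running without sufficient privilege, so a caller would need a fallback
(for example, plain fsync).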

Truncated messages are not great, but they are a detectable and
acceptable risk.
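
One way to make truncation detectable is sketched below, assuming a
made-up record format (length and CRC32 header in front of each
message). The format and the function names are illustrative
assumptions, not the rsyslog on-disk format.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def frame(message: bytes) -> bytes:
    """Prefix a message with its length and checksum."""
    return HEADER.pack(len(message), zlib.crc32(message)) + message


def read_complete(log: bytes):
    """Return the messages that are fully present, stopping at the first
    truncated or corrupt record."""
    messages, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        body = log[pos + HEADER.size:pos + HEADER.size + length]
        # length == 0 is treated as end-of-log so a zero-filled tail is
        # not mistaken for a run of empty messages.
        if length == 0 or len(body) < length or zlib.crc32(body) != crc:
            break
        messages.append(body)
        pos += HEADER.size + length
    return messages
```

With this layout, a log whose last record was cut short yields every
earlier message intact and simply drops the partial tail.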

While the average message size is much smaller than 4K (on my network it's
~250 bytes), the metadata that's broken out expands this somewhat, and we
can afford to waste disk space if it makes things safer or more efficient.

If we do update in place with flags with each message, each message will
need to be written up to three times (on receipt, being processed, finished
processing). With high message burst rates, I'm worried that we would fill
up the journal. Is there a good way to deal with this?
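
To make the three-writes-per-message pattern concrete, here is a rough
sketch assuming a hypothetical layout of one status byte followed by the
message text. The names and status values are invented for illustration.

```python
import os

# Hypothetical one-byte status flags for the three stages.
RECEIVED, PROCESSING, DONE = b"R", b"P", b"D"


def append_message(fd, payload: bytes) -> int:
    """Append status byte + payload; return the record's offset."""
    offset = os.lseek(fd, 0, os.SEEK_END)
    os.write(fd, RECEIVED + payload)  # write 1: on receipt
    return offset


def set_status(fd, offset: int, status: bytes) -> None:
    """Overwrite only the one-byte status flag in place."""
    # The file must not be opened with O_APPEND: on Linux, pwrite() on an
    # O_APPEND descriptor appends regardless of the offset given.
    os.pwrite(fd, status, offset)  # writes 2 and 3: processing, done
```

Each set_status() call changes a single byte but dirties the whole
block holding it, which is where the concern about repeated journal
writes comes from.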

I believe that ext4 can put the journal on a different device from the
filesystem. Would this help a lot?

If you were to put the journal for an ext4 filesystem on a ram disk, you
would lose the data recovery protection of the journal, but could you use
this trick to get ordered data writes onto the filesystem?

David Lang
