* [linux-lvm] fsync() and LVM
@ 2009-03-13 17:46 Marco Colombo
  2009-03-13 20:08 ` Stuart D. Gathman
  0 siblings, 1 reply; 39+ messages in thread
From: Marco Colombo @ 2009-03-13 17:46 UTC (permalink / raw)
  To: LVM general discussion and development

Hi, I'm a long-time user of both PostgreSQL and LVM. So far I've been quite
happy with both. But a recent thread on the PostgreSQL list made me
uncomfortable. What is this thing they're referring to, fsync()'s being
ignored? It makes me feel like I'm running on thin ice without even
knowing it. Before I start phasing out LVM from all my PostgreSQL
installations (as they suggest), I'd like to hear some kind of confirmation.
This is quite scary.

http://archives.postgresql.org/pgsql-general/2009-03/msg00204.php

In my understanding:

fsync():  force data from OS memory to disk (ending up in the disk cache)
write barrier: force data from disk cache to disk platters

If you disable the write-back cache on the disks, you no longer need write
barriers. But apparently they claim LVM is unsafe even with the disk
caches in write-through mode, which surprises me a lot.
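
To put it in code, my understanding is (a trivial sketch, file name made up):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[] = "some data\n";
	int fd = open("somefile", O_WRONLY|O_CREAT, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	write(fd, buf, strlen(buf));	/* data sits in OS memory only */
	fsync(fd);			/* forces it down to the disk (cache) */
	/* whether it reaches the platters depends on the cache mode and
	 * on write barriers issued below the filesystem */
	close(fd);
	return 0;
}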

.TM.


* Re: [linux-lvm] fsync() and LVM
  2009-03-13 17:46 [linux-lvm] fsync() and LVM Marco Colombo
@ 2009-03-13 20:08 ` Stuart D. Gathman
  2009-03-13 20:29   ` Ben Chobot
  2009-03-13 20:38   ` Alasdair G Kergon
  0 siblings, 2 replies; 39+ messages in thread
From: Stuart D. Gathman @ 2009-03-13 20:08 UTC (permalink / raw)
  To: LVM general discussion and development

On Fri, 13 Mar 2009, Marco Colombo wrote:

> Hi, I'm a long-time user of both PostgreSQL and LVM. So far I've been quite
> happy with both. But a recent thread on the PostgreSQL list made me
> uncomfortable. What is this thing they're referring to, fsync()'s being
> ignored? It makes me feel like I'm running on thin ice without even
> knowing it. Before I start phasing out LVM from all my PostgreSQL
> installations (as they suggest), I'd like to hear some kind of confirmation.

> http://archives.postgresql.org/pgsql-general/2009-03/msg00204.php

The discussion doesn't make a lot of sense.  fsync() is a filesystem
call - it can't possibly be handled (or ignored) at a lower level, because the
lower level doesn't know which blocks belong to the file.
I *can* imagine that perhaps the raw block writes used by the filesystem
code might be ignored - or improperly cached.  Clearly, they are not
ignored (filesystems do get updated) - so if there is any substance to
the charge, it must be that LVM reorders writes somehow.  Caching
doesn't really break anything - it is the *reordering* of writes that
could be a problem.  A "write barrier" says "finish these writes before you
start any more, but otherwise reorder how you like".  I did some experiments
with iostat, and I am convinced that LVM does not itself do any reordering of
writes.

Here is my theory as to what is really going on:

LVM is not really "ignoring" fsync(), because it would never see it.
However, in the presence of hardware writeback caching in disk drives,
fsync() would need to tell the hardware to "finish all these writes
before you start any more" (i.e. - a write barrier) for fsync() to be
effective. 

I suspect that LVM simply fails to pass the *write barrier* through
to underlying layers (i.e. it ignores the write-barrier call).  Thus, you should
be fine if you simply turn off write-back caching in your disk drives.  If you
could guarantee that the disk drives stay powered long enough after the main
system stops, even a drive write-back cache would not be much of a risk.
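
For SATA/PATA drives that would be something like this (assuming the
drives honor the flag):

	hdparm -W0 /dev/sda	# disable the drive's write-back cache
	hdparm -W0 /dev/sdb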

-- 
	      Stuart D. Gathman <stuart@bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] fsync() and LVM
  2009-03-13 20:08 ` Stuart D. Gathman
@ 2009-03-13 20:29   ` Ben Chobot
  2009-03-13 20:38   ` Alasdair G Kergon
  1 sibling, 0 replies; 39+ messages in thread
From: Ben Chobot @ 2009-03-13 20:29 UTC (permalink / raw)
  To: LVM general discussion and development

On Fri, 13 Mar 2009, Stuart D. Gathman wrote:

> I suspect that LVM simply fails to pass the *write barrier* through
> to underlying layers (i.e. it ignores the write-barrier call).  Thus, you should
> be fine if you simply turn off write-back caching in your disk drives.  If you
> could guarantee that the disk drives stay powered long enough after the main
> system stops, even a drive write-back cache would not be much of a risk.

The big question in my mind is which software layers don't get to see the 
write barrier. If any of them can reorder writes, that could (however 
unlikely) lead to data corruption.


* Re: [linux-lvm] fsync() and LVM
  2009-03-13 20:08 ` Stuart D. Gathman
  2009-03-13 20:29   ` Ben Chobot
@ 2009-03-13 20:38   ` Alasdair G Kergon
  2009-03-14  3:16     ` Marco Colombo
  2009-03-14  9:07     ` Dietmar Maurer
  1 sibling, 2 replies; 39+ messages in thread
From: Alasdair G Kergon @ 2009-03-13 20:38 UTC (permalink / raw)
  To: LVM general discussion and development

Let's try to clear up the confusion.

Kernel device-mapper (which lvm uses) does not support write barriers
except in very restricted circumstances (when only one device is
involved and the mapping is trivial).  If dm receives a write barrier
which is not supported it notifies the caller (typically a filesystem)
so appropriate action can be taken if it wishes.

Several kernel releases ago, the implementation of the 'flush device'
operation in the block layer was changed from a simple function call
that dm supported to a mechanism involving barriers that is trickier for
dm to support.  Previously 'flush' could not fail, so callers generally do
not have strategies to handle such a failure.
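
Roughly, a caller's flush path sees something like this (a sketch
against the 2.6-era block API; fs_flush_device() is a made-up name,
not actual dm or filesystem code):

#include <linux/blkdev.h>

static int fs_flush_device(struct block_device *bdev)
{
	int err = blkdev_issue_flush(bdev, NULL);

	if (err == -EOPNOTSUPP) {
		/* dm with a multi-device, non-trivial mapping ends up
		 * here: the flush is not silently ignored, but the
		 * caller gets no "data is on the platters" guarantee
		 * and must decide for itself what to do next. */
	}
	return err;
}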

The latest of several attempts to support barriers is contained in
patches here:
  http://patchwork.kernel.org/project/dm-devel/list/?q=barriers

Please review and test if you are interested!

Alasdair
-- 
agk@redhat.com


* Re: [linux-lvm] fsync() and LVM
  2009-03-13 20:38   ` Alasdair G Kergon
@ 2009-03-14  3:16     ` Marco Colombo
  2009-03-14  9:07     ` Dietmar Maurer
  1 sibling, 0 replies; 39+ messages in thread
From: Marco Colombo @ 2009-03-14  3:16 UTC (permalink / raw)
  To: LVM general discussion and development

Alasdair G Kergon wrote:
> Several kernel releases ago, the implementation of the 'flush device'
> operation in the block layer was changed from a simple function call
> that dm supported to a mechanism involving barriers that is trickier for
> dm to support.  Previously 'flush' could not fail, so callers generally do
> not have strategies to handle such a failure.

The 'caller' here would be fsync() in the FS. What strategies are available
to handle a failing 'flush'? Is there anything that can be done at the
application level (userland)?

More than anything, does LVM (or the device mapper) really reorder writes?
Is it safe with disk caches in write-through mode? (hdparm -W0)

.TM.


* RE: [linux-lvm] fsync() and LVM
  2009-03-13 20:38   ` Alasdair G Kergon
  2009-03-14  3:16     ` Marco Colombo
@ 2009-03-14  9:07     ` Dietmar Maurer
  2009-03-14 14:31       ` Stuart D. Gathman
  1 sibling, 1 reply; 39+ messages in thread
From: Dietmar Maurer @ 2009-03-14  9:07 UTC (permalink / raw)
  To: LVM general discussion and development

> Let's try to clear up the confusion.
> 
> Kernel device-mapper (which lvm uses) does not support write barriers
> except in very restricted circumstances (when only one device is
> involved and the mapping is trivial).  If dm receives a write barrier
> which is not supported it notifies the caller (typically a filesystem)
> so appropriate action can be taken if it wishes.

Does that mean I should never use more than one device if I have
applications depending on fsync (databases)?

- Dietmar


* RE: [linux-lvm] fsync() and LVM
  2009-03-14  9:07     ` Dietmar Maurer
@ 2009-03-14 14:31       ` Stuart D. Gathman
  2009-03-15  0:51         ` Marco Colombo
  2009-03-15  8:51         ` Dietmar Maurer
  0 siblings, 2 replies; 39+ messages in thread
From: Stuart D. Gathman @ 2009-03-14 14:31 UTC (permalink / raw)
  To: LVM general discussion and development

On Sat, 14 Mar 2009, Dietmar Maurer wrote:

> > Let's try to clear up the confusion.
> > 
> > Kernel device-mapper (which lvm uses) does not support write barriers
> > except in very restricted circumstances (when only one device is
> > involved and the mapping is trivial).  If dm receives a write barrier
> > which is not supported it notifies the caller (typically a filesystem)
> > so appropriate action can be taken if it wishes.
> 
> Does that mean I should never use more than one device if I have
> applications depending on fsync (databases)?

It just means that write barriers won't get passed to the device.
This is only a problem if the devices have write caches.  Note 
that with multiple devices, even a FIFO write cache could cause 
reordering between devices (one device could finish faster than another).
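
For example (a made-up two-device sequence, just to illustrate):

	write(journal_fd, commit_rec, len);	/* queued in disk A's cache */
	write(data_fd, data_block, len);	/* queued in disk B's cache */

Each cache may be strictly FIFO, but if disk B drains faster than disk
A, the data block reaches the platters before the commit record does.
Only a barrier honored across both devices prevents that.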

-- 
	      Stuart D. Gathman <stuart@bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] fsync() and LVM
  2009-03-14 14:31       ` Stuart D. Gathman
@ 2009-03-15  0:51         ` Marco Colombo
  2009-03-16 11:02           ` Charles Marcus
  2009-03-16 17:17           ` Stuart D. Gathman
  2009-03-15  8:51         ` Dietmar Maurer
  1 sibling, 2 replies; 39+ messages in thread
From: Marco Colombo @ 2009-03-15  0:51 UTC (permalink / raw)
  To: LVM general discussion and development

Stuart D. Gathman wrote:
> On Sat, 14 Mar 2009, Dietmar Maurer wrote: 
> It just means that write barriers won't get passed to the device.
> This is only a problem if the devices have write caches.  Note 
> that with multiple devices, even a FIFO write cache could cause 
> reordering between devices (one device could finish faster than another).

No, it's more than that. The PostgreSQL gurus say LVM doesn't honor fsync(),
that data doesn't even get to the controller, and that it doesn't matter
whether the disks have write caches enabled or not, or whether they have
battery-backed caches. Please read the thread I linked. If what they say is
true, you can't use LVM for anything that needs fsync(), including mail
queues (sendmail), mail storage (imapd), and such. So I'd really like to know.

.TM.


* RE: [linux-lvm] fsync() and LVM
  2009-03-14 14:31       ` Stuart D. Gathman
  2009-03-15  0:51         ` Marco Colombo
@ 2009-03-15  8:51         ` Dietmar Maurer
  2009-03-15 23:31           ` Marco Colombo
  2009-03-17 18:12           ` Les Mikesell
  1 sibling, 2 replies; 39+ messages in thread
From: Dietmar Maurer @ 2009-03-15  8:51 UTC (permalink / raw)
  To: LVM general discussion and development

> > Does that mean I should never use more than one device if I have
> > applications depending on fsync (databases)?
> 
> It just means that write barriers won't get passed to the device.
> This is only a problem if the devices have write caches.

But fsync is implemented using 'write barriers' - so fsync does not
work?

After fsync, all data should be sent from the OS to the disk controller:

a.) does this work perfectly with LVM?

b.) does this not work at all with LVM?

c.) does it work when you use a single physical drive with LVM?

I am confused. The thread on the postfix list claims that it does not
work at all?

- Dietmar


* Re: [linux-lvm] fsync() and LVM
  2009-03-15  8:51         ` Dietmar Maurer
@ 2009-03-15 23:31           ` Marco Colombo
  2009-03-17 18:12           ` Les Mikesell
  1 sibling, 0 replies; 39+ messages in thread
From: Marco Colombo @ 2009-03-15 23:31 UTC (permalink / raw)
  To: LVM general discussion and development

[Please forgive double-posting, I'm not sure my previous attempt
succeeded]
Dietmar Maurer wrote:
>>> Does that mean I should never use more than one device if I have
>>> applications depending on fsync (databases)?
>> It just means that write barriers won't get passed to the device.
>> This is only a problem if the devices have write caches.
> 
> But fsync is implemented using 'write barriers' - so fsync does not
> work?
> 
> After fsync, all data should be sent from the OS to the disk controller:
> 
> a.) does this work perfectly with LVM?
> 
> b.) does this not work at all with LVM?
> 
> c.) does it work when you use a single physical drive with LVM?
> 
> I am confused. The thread on the postfix list claims that it does not
> work at all?

Well, it's on the PostgreSQL list, not postfix. But it may affect postfix
as well. Quoting postfix documentation:

 Gory details: the Postfix mail queue requires that (1) the file system
 can rename a file to a near-by directory without changing the file's
 inode number, and that (2) mail is safely stored after fsync() of that
 file (not its parent directory) returns successfully, even when that
 file is renamed to a near-by directory at some later point in time.

If fsync() doesn't work, point (2) is not fulfilled.
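
In code, the pattern that requirement describes looks more or less like
this (my sketch, made-up paths, error handling trimmed):

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int queue_message(const char *buf, size_t len)
{
	int fd = open("queue/tmp/msg", O_WRONLY|O_CREAT|O_EXCL, 0600);

	if (fd < 0 || write(fd, buf, len) != (ssize_t)len)
		return -1;
	if (fsync(fd) < 0)	/* point (2): only after this returns
				 * successfully may the MTA tell the
				 * sender "accepted" */
		return -1;
	close(fd);
	/* possibly much later, with no further fsync(): */
	return rename("queue/tmp/msg", "queue/active/msg");
}

If fsync() silently fails to reach stable storage, the "accepted" reply
goes out with nothing durable behind it.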


Please note: what's on the PostgreSQL list is not speculation. It comes from
measurements. Benchmarks show transaction rates that are too high, just as if
fsync() were disabled. The explanation (they provided) is that LVM does
not honor fsync().


From some reading I've done, I'm not sure. Is it blkdev_issue_flush() we're
talking about? Please see: http://lkml.org/lkml/2007/5/25/71
Is an LVM (well, device-mapper) device still a "FLUSHABLE device" by
that definition? Apparently it's OK not to support BIO_RW_BARRIER, as
long as you support blkdev_issue_flush(). Has something changed since then?

How would you classify an LVM device? SAFE, FLUSHABLE, BARRIER or
something else (UNSAFE)?

.TM.


* Re: [linux-lvm] fsync() and LVM
  2009-03-15  0:51         ` Marco Colombo
@ 2009-03-16 11:02           ` Charles Marcus
  2009-03-16 11:05             ` Martin Schröder
  2009-03-16 14:36             ` Marco Colombo
  2009-03-16 17:17           ` Stuart D. Gathman
  1 sibling, 2 replies; 39+ messages in thread
From: Charles Marcus @ 2009-03-16 11:02 UTC (permalink / raw)
  To: LVM general discussion and development

On 3/14/2009 8:51 PM, Marco Colombo wrote:
> Stuart D. Gathman wrote:
>> On Sat, 14 Mar 2009, Dietmar Maurer wrote: 
>> It just means that write barriers won't get passed to the device.
>> This is only a problem if the devices have write caches.  Note 
>> that with multiple devices, even a FIFO write cache could cause 
>> reordering between devices (one device could finish faster than another).

> No, it's more than that. The PostgreSQL gurus say LVM doesn't honor fsync(),
> that data doesn't even get to the controller, and that it doesn't matter
> whether the disks have write caches enabled or not, or whether they have
> battery-backed caches. Please read the thread I linked. If what they say is
> true, you can't use LVM for anything that needs fsync(), including mail
> queues (sendmail), mail storage (imapd), and such. So I'd really like to know.

Seeing as my /var (with both postfix & courier-imap using it for mail
storage) has been on lvm for almost 4 years, that would be news to me...

;)

-- 

Best regards,

Charles


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 11:02           ` Charles Marcus
@ 2009-03-16 11:05             ` Martin Schröder
  2009-03-16 11:18               ` Charles Marcus
  2009-03-16 14:36             ` Marco Colombo
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Schröder @ 2009-03-16 11:05 UTC (permalink / raw)
  To: LVM general discussion and development

2009/3/16, Charles Marcus <CMarcus@media-brokers.com>:
> Seeing as my /var (with both postfix & courier-imap using it for mail
>  storage) has been on lvm for almost 4 years, that would be news to me...

And how often has the computer crashed needing an fsck in those years?
It's most likely no problem if the fs is always unmounted cleanly.

Best
   Martin


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 11:05             ` Martin Schröder
@ 2009-03-16 11:18               ` Charles Marcus
  2009-03-16 11:25                 ` Dietmar Maurer
  0 siblings, 1 reply; 39+ messages in thread
From: Charles Marcus @ 2009-03-16 11:18 UTC (permalink / raw)
  To: LVM general discussion and development

On 3/16/2009, Martin Schröder (martin@oneiros.de) wrote:
> And how often has the computer crashed needing an fsck in those years?
> It's most likely no problem if the fs is always unmounted cleanly.

There have been probably 4 unclean shutdowns (due to extended power
outages) in these 4 years, 2 of which required an extended fsck...

Running reiserfs too...

Zero problems to date (knock on wood)...

-- 

Best regards,

Charles


* RE: [linux-lvm] fsync() and LVM
  2009-03-16 11:18               ` Charles Marcus
@ 2009-03-16 11:25                 ` Dietmar Maurer
  0 siblings, 0 replies; 39+ messages in thread
From: Dietmar Maurer @ 2009-03-16 11:25 UTC (permalink / raw)
  To: LVM general discussion and development

> On 3/16/2009, Martin Schröder (martin@oneiros.de) wrote:
> > And how often has the computer crashed needing an fsck in those
> years?
> > It's most likely no problem if the fs is always unmounted cleanly.
> 
> There have been probably 4 unclean shutdowns (due to extended power
> outages) in these 4 years, 2 of which required an extended fsck...
> 
> Running reiserfs too...
> 
> Zero problems to date (knock on wood)...

The question is whether fsync is implemented correctly or not.

- Dietmar


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 11:02           ` Charles Marcus
  2009-03-16 11:05             ` Martin Schröder
@ 2009-03-16 14:36             ` Marco Colombo
  2009-03-16 17:13               ` Stuart D. Gathman
  1 sibling, 1 reply; 39+ messages in thread
From: Marco Colombo @ 2009-03-16 14:36 UTC (permalink / raw)
  To: LVM general discussion and development

Charles Marcus wrote:
> On 3/14/2009 8:51 PM, Marco Colombo wrote:
>> Stuart D. Gathman wrote:
>>> On Sat, 14 Mar 2009, Dietmar Maurer wrote: 
>>> It just means that write barriers won't get passed to the device.
>>> This is only a problem if the devices have write caches.  Note 
>>> that with multiple devices, even a FIFO write cache could cause 
>>> reordering between devices (one device could finish faster than another).
> 
>> No, it's more than that. The PostgreSQL gurus say LVM doesn't honor fsync(),
>> that data doesn't even get to the controller, and that it doesn't matter
>> whether the disks have write caches enabled or not, or whether they have
>> battery-backed caches. Please read the thread I linked. If what they say is
>> true, you can't use LVM for anything that needs fsync(), including mail
>> queues (sendmail), mail storage (imapd), and such. So I'd really like to know.
> 
> Seeing as my /var (with both postfix & courier-imap using it for mail
> storage) has been on lvm for almost 4 years, that would be news to me...
> 
> ;)
> 

Believe me or not, they both depend on fsync(). Anyway, even if you lost
a message, how would you expect to know? If you have any user base large
enough, you're used to 'missing' messages (99% of the user-deleted or
user-never-sent kind). A truly lost one may have gone unnoticed in the noise.

A lying fsync() doesn't blow your whole mail repository up; you may just
lose one or two messages on a crash. Or a transaction, speaking of databases.
If that's the case, I would like to know, that's all.

.TM.


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 14:36             ` Marco Colombo
@ 2009-03-16 17:13               ` Stuart D. Gathman
  0 siblings, 0 replies; 39+ messages in thread
From: Stuart D. Gathman @ 2009-03-16 17:13 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, 16 Mar 2009, Marco Colombo wrote:

> >> No, it's more than that. The PostgreSQL gurus say LVM doesn't honor fsync(),
> >> that data doesn't even get to the controller, and that it doesn't matter

If that were the case, then just *attempting* to call fsync would corrupt
your filesystem/database - "dirty" blocks would not actually get written, but
would still get marked "clean".  Clearly, LVM does not interfere with writing
to the disk.  It is only write barriers (waiting for the writes to
actually finish) that don't get passed through (in all but the most
simple cases).  It simply returns you to the old days when the man
page for sync() said "this queues dirty blocks for writing but does not
wait for them to finish" and shutdown scripts called sync() multiple times
with sleeps in between.
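
In C terms, the old ritual amounted to something like this (a sketch
from memory, not from any particular script):

#include <unistd.h>

void old_school_shutdown_sync(void)
{
	sync();		/* queue the dirty blocks for writing... */
	sleep(2);	/* ...give the drives some time... */
	sync();
	sleep(2);	/* ...and hope everything made it */
}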

> A lying fsync() doesn't blow your whole mail repository up; you may just
> lose one or two messages on a crash. Or a transaction, speaking of databases.
> If that's the case, I would like to know, that's all.

Since the flush returns "fail" when LVM can't map it to multiple devices,
fsync() isn't exactly "lying".  And one possible response to a failure might
be to wait a bit.

According to the Red Hat guy, this problem came up when the simple block-device
"flush" call was replaced with the more complex write barrier.  LVM had
no problem passing through a simple block-device flush.  (Why couldn't
the simple "flush" call still be available?)  I would like to know which
kernel version made this change.

-- 
	      Stuart D. Gathman <stuart@bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] fsync() and LVM
  2009-03-15  0:51         ` Marco Colombo
  2009-03-16 11:02           ` Charles Marcus
@ 2009-03-16 17:17           ` Stuart D. Gathman
  2009-03-16 18:50             ` Les Mikesell
  2009-03-17 16:00             ` Marco Colombo
  1 sibling, 2 replies; 39+ messages in thread
From: Stuart D. Gathman @ 2009-03-16 17:17 UTC (permalink / raw)
  To: LVM general discussion and development

On Sun, 15 Mar 2009, Marco Colombo wrote:

> Stuart D. Gathman wrote:
> > On Sat, 14 Mar 2009, Dietmar Maurer wrote: 
> > It just means that write barriers won't get passed to the device.
> > This is only a problem if the devices have write caches.  Note 
> > that with multiple devices, even a FIFO write cache could cause 
> > reordering between devices (one device could finish faster than another).
> 
> No, it's more than that. The PostgreSQL gurus say LVM doesn't honor fsync(),

That is clearly wrong - since fsync() isn't LVM's responsibility.
I think they mean that fsync() can't guarantee that any writes are
actually on the platter.

> that data doesn't even get to the controller, and that it doesn't matter
> whether the disks have write caches enabled or not, or whether they have
> battery-backed caches. Please read the thread I linked. If what they say
> is true,

That is clearly wrong.  If writes don't work, nothing works.

> you can't use LVM for anything that needs fsync(), including mail queues
> (sendmail), mail storage (imapd), and such. So I'd really like to know.

fsync() is a file system call that writes dirty buffers, and then waits
for the physical writes to complete.  It is only the waiting part that
is broken.

-- 
	      Stuart D. Gathman <stuart@bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 17:17           ` Stuart D. Gathman
@ 2009-03-16 18:50             ` Les Mikesell
  2009-03-16 19:36               ` Greg Freemyer
  2009-03-17 16:00             ` Marco Colombo
  1 sibling, 1 reply; 39+ messages in thread
From: Les Mikesell @ 2009-03-16 18:50 UTC (permalink / raw)
  To: LVM general discussion and development

Stuart D. Gathman wrote:
>
>> No, it's more than that. The PostgreSQL gurus say LVM doesn't honor fsync(),
> 
> That is clearly wrong - since fsync() isn't LVM's responsibility.
> I think they mean that fsync() can't guarantee that any writes are
> actually on the platter.
> 
>> that data doesn't even get to the controller, and that it doesn't matter
>> whether the disks have write caches enabled or not, or whether they have
>> battery-backed caches. Please read the thread I linked. If what they say
>> is true,
> 
> That is clearly wrong.  If writes don't work, nothing works.
> 
>> you can't use LVM for anything that needs fsync(), including mail queues
>> (sendmail), mail storage (imapd), and such. So I'd really like to know.
> 
> fsync() is a file system call that writes dirty buffers, and then waits
> for the physical writes to complete.  It is only the waiting part that
> is broken.

It's a yes or no question...  fsync() either guarantees that the write
is committed to physical media, so the application can continue knowing
that its own transactional expectations are met (i.e. you can crash and
recover that piece of data), or it is broken.  If it doesn't wait for
completion, it can't possibly report the correct status.

-- 
   Les Mikesell
    lesmikesell@gmail.com


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 18:50             ` Les Mikesell
@ 2009-03-16 19:36               ` Greg Freemyer
  2009-03-16 19:55                 ` [linux-lvm] liblvm status question ben scott
  2009-03-16 20:28                 ` [linux-lvm] fsync() and LVM Les Mikesell
  0 siblings, 2 replies; 39+ messages in thread
From: Greg Freemyer @ 2009-03-16 19:36 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, Mar 16, 2009 at 2:50 PM, Les Mikesell <lesmikesell@gmail.com> wrote:
> Stuart D. Gathman wrote:
>>
>>> No, it's more than that. The PostgreSQL gurus say LVM doesn't honor fsync(),
>>
>> That is clearly wrong - since fsync() isn't LVM's responsibility.
>> I think they mean that fsync() can't guarantee that any writes are
>> actually on the platter.
>>
>>> that data doesn't even get to the controller, and that it doesn't matter
>>> whether the disks have write caches enabled or not, or whether they have
>>> battery-backed caches. Please read the thread I linked. If what they say
>>> is true,
>>
>> That is clearly wrong.  If writes don't work, nothing works.
>>
>>> you can't use LVM for anything that needs fsync(), including mail queues
>>> (sendmail), mail storage (imapd), and such. So I'd really like to know.
>>
>> fsync() is a file system call that writes dirty buffers, and then waits
>> for the physical writes to complete.  It is only the waiting part that
>> is broken.
>
> It's a yes or no question...  fsync() either guarantees that the write
> is committed to physical media, so the application can continue knowing
> that its own transactional expectations are met (i.e. you can crash and
> recover that piece of data), or it is broken.  If it doesn't wait for
> completion, it can't possibly report the correct status.
>

This discussion seems a bit bizarre to me.  Many apps require that data
get to stable storage in a well-defined way.  Barriers are certainly one
way to do that, but I don't think barriers are supported by LVM, mdraid,
or drbd.

Those are some very significant subsystems.  I have to believe
filesystems have another way to implement fsync if barriers are not
supported in the stack of block subsystems.
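
Presumably by writing and waiting on each buffer instead of ordering
whole queues - something like this 2.6-style fragment (a guess on my
part, not actual filesystem code):

#include <linux/buffer_head.h>

/* Write each dirty buffer and wait for the device to report completion,
 * one by one.  This gives ordering without barriers, but with a drive
 * write cache enabled, "completed" still doesn't mean "on the platters". */
static int flush_buffers_without_barriers(struct buffer_head **bhs, int n)
{
	int i, err = 0;

	for (i = 0; i < n; i++) {
		int rc = sync_dirty_buffer(bhs[i]);
		if (rc && !err)
			err = rc;
	}
	return err;
}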

Maybe this discussion needs to move to a filesystem list, since it is
the filesystem that is responsible for making fsync() work even in the
absence of barriers.

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* [linux-lvm] liblvm status question
  2009-03-16 19:36               ` Greg Freemyer
@ 2009-03-16 19:55                 ` ben scott
  2009-03-16 20:58                   ` Greg Freemyer
  2009-03-16 20:28                 ` [linux-lvm] fsync() and LVM Les Mikesell
  1 sibling, 1 reply; 39+ messages in thread
From: ben scott @ 2009-03-16 19:55 UTC (permalink / raw)
  To: LVM general discussion and development

Is the liblvm project at a state where vgs, lvs or pvs functionality is mostly
working? I am writing a program for working with logical volumes and it would
be very helpful if I could start integrating liblvm now, even if it is still
buggy at the moment. Also, where can I find the files or CVS?

Thank you


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 19:36               ` Greg Freemyer
  2009-03-16 19:55                 ` [linux-lvm] liblvm status question ben scott
@ 2009-03-16 20:28                 ` Les Mikesell
  2009-03-16 20:54                   ` Greg Freemyer
  1 sibling, 1 reply; 39+ messages in thread
From: Les Mikesell @ 2009-03-16 20:28 UTC (permalink / raw)
  To: LVM general discussion and development

Greg Freemyer wrote:
>
>>>> you can't use LVM for anything that needs fsync(), including mail queues
>>>> (sendmail), mail storage (imapd), and such. So I'd really like to know.
>>> fsync() is a file system call that writes dirty buffers, and then waits
>>> for the physical writes to complete.  It is only the waiting part that
>>> is broken.
>> It's a yes or no question...  fsync() either guarantees that the write
>> is committed to physical media, so the application can continue knowing
>> that its own transactional expectations are met (i.e. you can crash and
>> recover that piece of data), or it is broken.  If it doesn't wait for
>> completion, it can't possibly report the correct status.
>>
> 
> This discussion seems a bit bizarre to me.

You can't avoid a discussion of expected but missing functionality.

> Many apps require that data
> get to stable storage in a well-defined way.  Barriers are certainly one
> way to do that, but I don't think barriers are supported by LVM, mdraid,
> or drbd.
> 
> Those are some very significant subsystems.  I have to believe
> filesystems have another way to implement fsync if barriers are not
> supported in the stack of block subsystems.

If you can't get the completion status from the underlying layer, how 
can a filesystem possibly implement it?

> Maybe this discussion needs to move to a filesystem list, since it is
> the filesystem that is responsible for making fsync() work even in the
> absence of barriers.

I thought Linux ended up doing a sync of the entire outstanding buffered
data for a partition, with horrible performance, at least on ext3.

-- 
   Les Mikesell
    lesmikesell@gmail.com


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 20:28                 ` [linux-lvm] fsync() and LVM Les Mikesell
@ 2009-03-16 20:54                   ` Greg Freemyer
  2009-03-16 21:17                     ` Les Mikesell
  0 siblings, 1 reply; 39+ messages in thread
From: Greg Freemyer @ 2009-03-16 20:54 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, Mar 16, 2009 at 4:28 PM, Les Mikesell <lesmikesell@gmail.com> wrote:
> Greg Freemyer wrote:
>>
>>>>> you can't use LVM for anything that needs fsync(), including mail
>>>>> queues
>>>>> (sendmail), mail storage (imapd), and such. So I'd really like to know.
>>>>
>>>> fsync() is a file system call that writes dirty buffers, and then waits
>>>> for the physical writes to complete.  It is only the waiting part that
>>>> is broken.
>>>
>>> It's a yes or no question...  fsync() either guarantees that the write
>>> is committed to physical media, so the application can continue knowing
>>> that its own transactional expectations are met (i.e. you can crash and
>>> recover that piece of data), or it is broken.  If it doesn't wait for
>>> completion, it can't possibly report the correct status.
>>>
>>
>> This discussion seems a bit bizarre to me.
>
> You can't avoid a discussion of expected but missing functionality.
>
>> Many apps require that data
>> get to stable storage in a well-defined way.  Barriers are certainly one
>> way to do that, but I don't think barriers are supported by LVM, mdraid,
>> or drbd.
>>
>> Those are some very significant subsystems.  I have to believe
>> filesystems have another way to implement fsync if barriers are not
>> supported in the stack of block subsystems.
>
> If you can't get the completion status from the underlying layer, how can a
> filesystem possibly implement it?

Barriers are a specific technology, and they were only implemented in
Linux around 2005, I think (see Documentation/block/barrier.txt).

Surely there was a mechanism in place before that.

>> Maybe this discussion needs to move to a filesystem list, since it is
>> the filesystem that is responsible for making fsync() work even in the
>> absence of barriers.
>
> I thought Linux ended up doing a sync of the entire outstanding buffered
> data for a partition, with horrible performance, at least on ext3.

Yes, I understand fsync is horribly slow in ext3, and that may be the
reason.  It's supposedly much better in ext4.  Still, if a userspace app
calls fsync and in turn the filesystem does something really slow due
to the lack of barriers, then this conversation should be about the
poor performance of fsync() when using lvm (or mdraid, or drbd), not
the total lack of fsync() support.

> --
>    Les Mikesell
>     lesmikesell@gmail.com

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: [linux-lvm] liblvm status question
  2009-03-16 19:55                 ` [linux-lvm] liblvm status question ben scott
@ 2009-03-16 20:58                   ` Greg Freemyer
  2009-03-17 10:38                     ` Bryn M. Reeves
  0 siblings, 1 reply; 39+ messages in thread
From: Greg Freemyer @ 2009-03-16 20:58 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, Mar 16, 2009 at 3:55 PM, ben scott <benscott@nwlink.com> wrote:
> Is the liblvm project at a state where vgs, lvs or pvs functionality is mostly
> working? I am writing a program for working with logical volumes and it would
> be very helpful if I could start integrating liblvm now, even if it is still
> buggy at the moment. Also, where can I find the files or CVS?
>
> Thank you

I think you mean libdevmapper, don't you?

I'm pretty sure libdevmapper is used by the core LVM2 tools.
Unfortunately, I'm not aware of any documentation about what the API
is.  I guess you have to read the source.

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 20:54                   ` Greg Freemyer
@ 2009-03-16 21:17                     ` Les Mikesell
  2009-03-16 21:36                       ` Greg Freemyer
  0 siblings, 1 reply; 39+ messages in thread
From: Les Mikesell @ 2009-03-16 21:17 UTC (permalink / raw)
  To: LVM general discussion and development

Greg Freemyer wrote:
> 
>>> Those are some very significant subsystems.  I have to believe
>>> filesystems have another way to implement fsync if barriers are not
>>> supported in the stack of block subsystems.
>> If you can't get the completion status from the underlying layer, how can a
>> filesystem possibly implement it?
> 
>> Barriers are a specific technology, and they were only implemented in
>> Linux around 2005, I think (see Documentation/block/barrier.txt).
>> 
>> Surely there was a mechanism in place before that.

I'm not sure that's a reasonable assumption.

>>> Maybe this discussion needs to move to a filesystem list, since it is
>>> the filesystem that is responsible for making fsync() work even in the
>>> absence of barriers.
>> I thought Linux ended up doing a sync of the entire outstanding buffered
>> data for a partition, with horrible performance, at least on ext3.
> 
> Yes, I understand fsync is horribly slow in ext3, and that may be the
> reason.  It's supposedly much better in ext4.  Still, if a userspace app
> calls fsync and in turn the filesystem does something really slow due
> to the lack of barriers, then this conversation should be about the
> poor performance of fsync() when using lvm (or mdraid, or drbd), not
> the total lack of fsync() support.

I haven't seen anyone claim yet that there is support for fsync(), which 
must return the status of the completion of the operation to the 
application.  If it does, then the discussion could turn to performance.

-- 
   Les Mikesell
    lesmikesell@gmail.com


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 21:17                     ` Les Mikesell
@ 2009-03-16 21:36                       ` Greg Freemyer
  2009-03-16 21:53                         ` Les Mikesell
  2009-03-16 21:57                         ` Allen, Jack
  0 siblings, 2 replies; 39+ messages in thread
From: Greg Freemyer @ 2009-03-16 21:36 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, Mar 16, 2009 at 5:17 PM, Les Mikesell <lesmikesell@gmail.com> wrote:
> Greg Freemyer wrote:
>>
>>>> Those are some very significant subsystems.  I have to believe
>>>> filesystems have another way to implement fsync if barriers are not
>>>> supported in the stack of block subsystems.
>>>
>>> If you can't get the completion status from the underlying layer, how
>>> can a filesystem possibly implement it?
>>
>> Barriers are a specific technology, and they were only implemented in
>> Linux around 2005, I think (see Documentation/block/barrier.txt).
>>
>> Surely there was a mechanism in place before that.
>
> I'm not sure that's a reasonable assumption.
>
>>>> Maybe this discussion needs to move to a filesystem list, since it is
>>>> the filesystem that is responsible for making fsync() work even in the
>>>> absence of barriers.
>>>
>>> I thought Linux ended up doing a sync of the entire outstanding buffered
>>> data for a partition, with horrible performance, at least on ext3.
>>
>> Yes, I understand fsync is horribly slow in ext3, and that may be the
>> reason.  It's supposedly much better in ext4.  Still, if a userspace app
>> calls fsync and in turn the filesystem does something really slow due
>> to the lack of barriers, then this conversation should be about the
>> poor performance of fsync() when using lvm (or mdraid, or drbd), not
>> the total lack of fsync() support.
>
> I haven't seen anyone claim yet that there is support for fsync(), which
> must return the status of the completion of the operation to the
> application.  If it does, then the discussion could turn to performance.
>
> --
>    Les Mikesell
>     lesmikesell@gmail.com

Is your specific interest in ext3?  If so, I suggest you post a
question there, along the lines of:

Device Mapper does not support barriers if more than one physical
device is in use by the LV.  If I'm using ext3 on an LV and I call
fsync() from user space, how is fsync() implemented?  Or is it not?

The ext4 list is <linux-ext4@vger.kernel.org>.  I see some ext3 stuff
posted there, or it may have its own list.

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 21:36                       ` Greg Freemyer
@ 2009-03-16 21:53                         ` Les Mikesell
  2009-03-16 22:51                           ` Joshua D. Drake
  2009-03-16 21:57                         ` Allen, Jack
  1 sibling, 1 reply; 39+ messages in thread
From: Les Mikesell @ 2009-03-16 21:53 UTC (permalink / raw)
  To: LVM general discussion and development

Greg Freemyer wrote:
>> I haven't seen anyone claim yet that there is support for fsync(), which
>> must return the status of the completion of the operation to the
>> application.  If it does, then the discussion could turn to performance.
>> 
> Is your specific interest in ext3?

No, it is whether a useful fsync() is possible over LVM.

> If so, I suggest you post a
> question there, along the lines of:
> 
> Device Mapper does not support barriers if more than one physical
> device is in use by the LV.  If I'm using ext3 on an LV and I call
> fsync() from user space, how is fsync() implemented?  Or is it not?

The point of fsync() is for an application to know that a write has been
safely committed, as for example sendmail would do before acknowledging
to the sender that a message has been accepted.  The question isn't
whether an application can call fsync() but rather whether its return
status is lying, making the underlying storage unsuitable for anything
that needs reliability.

-- 
   Les Mikesell
    lesmikesell@gmail.com


* RE: [linux-lvm] fsync() and LVM
  2009-03-16 21:36                       ` Greg Freemyer
  2009-03-16 21:53                         ` Les Mikesell
@ 2009-03-16 21:57                         ` Allen, Jack
  1 sibling, 0 replies; 39+ messages in thread
From: Allen, Jack @ 2009-03-16 21:57 UTC (permalink / raw)
  To: LVM general discussion and development

 

-----Original Message-----
From: linux-lvm-bounces@redhat.com [mailto:linux-lvm-bounces@redhat.com] On Behalf Of Greg Freemyer
Sent: Monday, March 16, 2009 5:36 PM
To: LVM general discussion and development
Subject: Re: [linux-lvm] fsync() and LVM

On Mon, Mar 16, 2009 at 5:17 PM, Les Mikesell <lesmikesell@gmail.com> wrote:
> Greg Freemyer wrote:
>>
>>>> Those are some very significant subsystems.  I have to believe
>>>> filesystems have another way to implement fsync if barriers are not
>>>> supported in the stack of block subsystems.
>>>
>>> If you can't get the completion status from the underlying layer, how
>>> can a filesystem possibly implement it?
>>
>> Barriers are a specific technology, and they were only implemented in
>> Linux around 2005, I think (see Documentation/block/barrier.txt).
>>
>> Surely there was a mechanism in place before that.
>
> I'm not sure that's a reasonable assumption.
>
>>>> Maybe this discussion needs to move to a filesystem list, since it is
>>>> the filesystem that is responsible for making fsync() work even in the
>>>> absence of barriers.
>>>
>>> I thought Linux ended up doing a sync of the entire outstanding buffered
>>> data for a partition, with horrible performance, at least on ext3.
>>
>> Yes, I understand fsync is horribly slow in ext3, and that may be the
>> reason.  It's supposedly much better in ext4.  Still, if a userspace app
>> calls fsync and in turn the filesystem does something really slow due
>> to the lack of barriers, then this conversation should be about the
>> poor performance of fsync() when using lvm (or mdraid, or drbd), not
>> the total lack of fsync() support.
>
> I haven't seen anyone claim yet that there is support for fsync(), which
> must return the status of the completion of the operation to the
> application.  If it does, then the discussion could turn to performance.
>
> --
>    Les Mikesell
>     lesmikesell@gmail.com

Is your specific interest in ext3?  If so, I suggest you post a
question there, along the lines of:

Device Mapper does not support barriers if more than one physical
device is in use by the LV.  If I'm using ext3 on an LV and I call
fsync() from user space, how is fsync() implemented?  Or is it not?

The ext4 list is <linux-ext4@vger.kernel.org>.  I see some ext3 stuff
posted there, or it may have its own list.

Greg
-- 

========================================
So what happens if there is a database implemented directly on a Logical Volume, with no File System involved at all?

Should the fsync man page describe what happens when it is used on each type of File System, Logical Volume, disk partition and/or combination?

-----
Thanks:
Jack Allen


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 21:53                         ` Les Mikesell
@ 2009-03-16 22:51                           ` Joshua D. Drake
  2009-03-17 15:33                             ` Joshua D. Drake
  0 siblings, 1 reply; 39+ messages in thread
From: Joshua D. Drake @ 2009-03-16 22:51 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, 2009-03-16 at 16:53 -0500, Les Mikesell wrote:

> The point of fsync() is for an application to know that a write has been 
> safely committed, as for example sendmail would do before acknowledging 
> to the sender that a message has been accepted.  The question isn't 
> whether an application can call fsync() but rather whether its return 
> status is lying, making the underlying storage unsuitable for anything 
> that needs reliability.

Right, and for databases this is critical. So enlightenment here would be
good.

Sincerely,

Joshua D. Drake


> 
-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


* Re: [linux-lvm] liblvm status question
  2009-03-16 20:58                   ` Greg Freemyer
@ 2009-03-17 10:38                     ` Bryn M. Reeves
  2009-03-17 18:42                       ` ben scott
  2009-03-17 20:52                       ` Greg Freemyer
  0 siblings, 2 replies; 39+ messages in thread
From: Bryn M. Reeves @ 2009-03-17 10:38 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, 2009-03-16 at 16:58 -0400, Greg Freemyer wrote:
> On Mon, Mar 16, 2009 at 3:55 PM, ben scott <benscott@nwlink.com> wrote:
> > Is the liblvm project at a state where vgs, lvs or pvs functionality is mostly
> > working? I am writing a program for working with logical volumes and it would
> > be very helpful if I could start integrating liblvm now, even if it is still
> > buggy at the moment. Also, where can I find the files or CVS?
> >
> > Thank you
> 
> I think you mean libdevmapper, don't you?

No, he means liblvm:

http://fedoraproject.org/wiki/Features/liblvm
http://fedoraproject.org/wiki/LVM/liblvm

Patches are just starting to be merged but it's still a
work-in-progress:

http://www.redhat.com/archives/lvm-devel/2009-March/msg00008.html

Follow the lvm-devel mailing list to keep track of what's going on.

Regards,
Bryn.


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 22:51                           ` Joshua D. Drake
@ 2009-03-17 15:33                             ` Joshua D. Drake
  2009-03-19  9:20                               ` Tim Post
  0 siblings, 1 reply; 39+ messages in thread
From: Joshua D. Drake @ 2009-03-17 15:33 UTC (permalink / raw)
  To: LVM general discussion and development

On Mon, 2009-03-16 at 15:51 -0700, Joshua D. Drake wrote:
> On Mon, 2009-03-16 at 16:53 -0500, Les Mikesell wrote:
> 
> > The point of fsync() is for an application to know that a write has been 
> > safely committed, as for example sendmail would do before acknowledging 
> > to the sender that a message has been accepted.  The question isn't 
> > whether an application can call fsync() but rather whether its return 
> > status is lying, making the underlying storage unsuitable for anything 
> > that needs reliability.
> 
> Right, and for databases this is critical. So enlightenment here would be
> good.

Anyone?

Joshua D. Drake
 
-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


* Re: [linux-lvm] fsync() and LVM
  2009-03-16 17:17           ` Stuart D. Gathman
  2009-03-16 18:50             ` Les Mikesell
@ 2009-03-17 16:00             ` Marco Colombo
  2009-03-17 17:40               ` Stuart D. Gathman
  1 sibling, 1 reply; 39+ messages in thread
From: Marco Colombo @ 2009-03-17 16:00 UTC (permalink / raw)
  To: LVM general discussion and development

[-- Attachment #1: Type: text/plain, Size: 4543 bytes --]

Stuart D. Gathman wrote:
> That is clearly wrong - since fsync() isn't LVM's responsibility.
> I think they mean that fsync() can't guarantee that any writes are
> actually on the platter.

Even if the disk cache is in write-thru mode, that is.

>> that data doesn't even get to the controller, and that it doesn't matter
>> whether the disks have write caches enabled or not, or whether they have
>> battery-backed caches. Please read the thread I linked. If what they say
>> is true,
> 
> That is clearly wrong.  If writes don't work, nothing works.

It's the flush (= write NOW) that supposedly isn't working, not the write.
Writes happen, just later and potentially not in order. You seem to assume
that fsync() is the only way to have the data written. That's clearly not
the case: most userland processes just issue write(), never fsync(), and
the data gets written anyway, sooner or later.

>> you can't use LVM for anything that needs fsync(), including mail queues
>> (sendmail), mail storage (imapd), and such. So I'd really like to know.
> 
> fsync() is a file system call that writes dirty buffers,

sure, but it's not the only way to have dirty pages flushed. There's
a kernel thread that flushes them every now and then, and there's
also memory pressure. So a broken fsync() can go unnoticed; you become
aware of it if and only if:

1) you run some application that needs it (most don't even use it);
2) the system crashes (power loss);
3) you are unlucky enough to hit the window of vulnerability.

If any of these conditions is not met, you won't be aware of a
malfunctioning fsync().

But I think I understand what you mean: if the API to flush to physical
storage is the same (used by fsync(), by pdflush, by the VM system),
then you're right, everything is broken. But I've been using LVM
for years now, so I'm assuming that's not the case. :)

> and then waits
> for the physical writes to complete.  It is only the waiting part that
> is broken.

Half-broken is broken. And the bigger issue here isn't even the delay.
The issue is ordering. For a database, losing the last transactions is bad
enough; losing transactions in the middle of the timeline is even worse.

For the mail subsystems, there's almost no ordering requirement, but
losing messages is still no good.

---------------

Ehm, I've decided to write a small test program. My system is Fedora 7,
so nowhere near recent. My setup:

/home is an LV, belonging to VG 'vg_data', whose only PV is /dev/md6.
/dev/md6 is a RAID1 md device, whose members are /dev/sda10 and /dev/sdb10.
/dev/sda and /dev/sdb are both Seagate ST3320620AS SATA disks.

The filesystem is EXT3, mounted with noatime,data=ordered.

The attached program writes the same block to a file N times (looping on
lseek/write). Depending on how it's compiled, it issues an fdatasync() after
each write.
Here are the results, for 32MB of data written:

$ time ./test_nosync

real    0m0.056s
user    0m0.004s
sys     0m0.052s

clearly, no disk activity here.

$ time ./test_sync

real    0m2.070s
user    0m0.002s
sys     0m0.152s

Now the same after hdparm -W0 /dev/sda; hdparm -W0 /dev/sdb:

$ time ./test_sync

real    1m16.431s
user    0m0.004s
sys     0m0.273s

These are 4096 "transactions" of size 8192, without the overhead of
allocating new blocks (it writes to the same block over and over).
The first test is meaningless (the writes are never really committed).
In the second test, it's about 2000 transactions per second. Too many.
In the third test, I got only about 50 transactions per second,
which makes a lot of sense (a 7200 RPM disk spins 120 times a second,
so ~50 committed writes/s is roughly one write every couple of
revolutions).

It seems to me that in my setup, disabling the caches on the disks does
bring data to the platters, and that no one is "lying" about fsync.

Now I'm _really_ confused.


(the following isn't meaningful for the discussion)

For the curious among you (I was): I commented out the lseek(). For the
_nosync version it's the same (half a second).

For the _sync version, with -W1 I get:

$ time ./test_sync

real    0m48.816s
user    0m0.002s
sys     0m0.483s

and with -W0:

$ time ./test_sync

real    3m6.674s
user    0m0.006s
sys     0m0.526s

Since all the tests were done deleting the file each time, I think what
happens here is that the file is increasing in size, so each fdatasync()
triggers a write of the inode as well. That's two writes per loop.
So I tried keeping the file around, having my test program write to
preallocated blocks.

With -W1:
$ time ./test_sync

real    0m11.253s
user    0m0.001s
sys     0m0.244s


with -W0:
$ time ./test_sync

real    0m46.353s
user    0m0.005s
sys     0m0.249s


.TM.

[-- Attachment #2: test.c --]
[-- Type: text/x-csrc, Size: 807 bytes --]

/*
 * compile with -DDO_FSYNC=1 and then with -DDO_FSYNC=0
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#if !defined(DO_FSYNC)
# error "You must define DO_FSYNC"
#endif

#define MYBUFSIZ BUFSIZ
#define BYTES_TO_WRITE (32*1024*1024) /* 32MB */

int main(int argc, char *argv[])
{
	int fd, rc, i;
	char buf[MYBUFSIZ] = { '\0', };

	fd = open("testfile", O_WRONLY|O_CREAT, 0600);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	for (i = 0; i < (BYTES_TO_WRITE/MYBUFSIZ); i++) {
		rc = lseek(fd, 0, SEEK_SET);
		if (rc < 0) {
			perror("lseek");
			exit(1);
		}
		rc = write(fd, buf, sizeof(buf));
		if (rc < 0) {
			perror("write");
			exit(1);
		}
#if DO_FSYNC
		rc = fdatasync(fd);
		if (rc < 0) {
			perror("fdatasync");
			exit(1);
		}
#endif
	}
	close(fd);
	return 0;
}


* Re: [linux-lvm] fsync() and LVM
  2009-03-17 16:00             ` Marco Colombo
@ 2009-03-17 17:40               ` Stuart D. Gathman
  2009-03-17 18:17                 ` Les Mikesell
  0 siblings, 1 reply; 39+ messages in thread
From: Stuart D. Gathman @ 2009-03-17 17:40 UTC (permalink / raw)
  To: LVM general discussion and development

On Tue, 17 Mar 2009, Marco Colombo wrote:

> It seems to me that in my setup, disabling the caches on the disks does
> bring data to the platters, and that noone is "lying" about fsync.
> 
> Now I'm _really_ confused.

That's been my claim all along - that the broken fsync only affects
the on-disk cache.  LVM itself does not reorder writes in any way - it just
fails to pass along the write barrier.  fsync() does *start* writing
the dirty buffers (implemented in the fs code).  It just doesn't
wait for the writes to finish getting to the platters.  Apparently,
it does wait for the write to get to the drive (but I'm not certain).

-- 
	      Stuart D. Gathman <stuart@bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [linux-lvm] fsync() and LVM
  2009-03-15  8:51         ` Dietmar Maurer
  2009-03-15 23:31           ` Marco Colombo
@ 2009-03-17 18:12           ` Les Mikesell
  2009-03-17 18:19             ` Dietmar Maurer
  1 sibling, 1 reply; 39+ messages in thread
From: Les Mikesell @ 2009-03-17 18:12 UTC (permalink / raw)
  To: LVM general discussion and development

Dietmar Maurer wrote:
>>> Does that mean I should never use more than one device if I have
>>> applications depending on fsync (databases)?
>> It just means that write barriers won't get passed to the device.
>> This is only a problem if the devices have write caches.
> 
> But fsync is implemented using 'write barriers' - so fsync does not
> work?
> 
> After fsync, all data should be sent from the OS to the disk controller:
> 
> > a.) this works perfectly using LVM?
> 
> b.) this does not work at all using LVM?
> 
> c.) it works when you use one single physical drive with LVM?
> 
> I am confused. The thread on the PostgreSQL list claims that it does
> not work at all?

Everything will seem to work until you have an inconvenient crash or 
disk error.  That is, data will be written normally - whether you fsync 
or not.  The point of fsync(), though, is for an application to confirm 
that the file is committed to stable media and will be recoverable even 
if the application (or OS) crashes or the system loses power.  The 
correct next action of the application will depend on the return status 
of the fsync() operation (e.g., acknowledging receipt of a mail message, 
considering a database change to be committed, etc.).   What I believe 
is happening is that fsync() always returns as though it were successful 
even though the underlying operations haven't completed.  That's 
ummm..., optimistic at best.   But, everything will still work (and more 
quickly) as long as the physical write of the file and associated 
directory metadata eventually succeeds.   Realistically, for most things 
it doesn't matter because for critical data you still have to deal with 
the possibility of a disk write that succeeds being unreadable later for 
a variety of reasons - and the rest isn't critical anyway.  However, it 
would be good to know exactly what to expect here.
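
To make that contract concrete, here is a minimal sketch (just the
pattern, not how any particular MTA or database actually implements
it): the application acknowledges only after fsync() reports success,
so an fsync() that lies silently voids the whole protocol.

#include <sys/types.h>
#include <unistd.h>

/* Return 0 only once the record is (reportedly) on stable media. */
static int commit_record(int fd, const char *rec, size_t len)
{
	ssize_t n = write(fd, rec, len);

	if (n < 0 || (size_t)n != len)
		return -1;	/* not fully written: don't acknowledge */
	if (fsync(fd) < 0)
		return -1;	/* not durable: don't acknowledge */
	return 0;		/* only now is it safe to ack the sender */
}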

-- 
  Les Mikesell
    lesmikesell@gmail.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [linux-lvm] fsync() and LVM
  2009-03-17 17:40               ` Stuart D. Gathman
@ 2009-03-17 18:17                 ` Les Mikesell
  2009-03-18  0:37                   ` Marco Colombo
  0 siblings, 1 reply; 39+ messages in thread
From: Les Mikesell @ 2009-03-17 18:17 UTC (permalink / raw)
  To: LVM general discussion and development

Stuart D. Gathman wrote:
> On Tue, 17 Mar 2009, Marco Colombo wrote:
> 
>> It seems to me that in my setup, disabling the caches on the disks does
>> bring data to the platters, and that no one is "lying" about fsync.
>>
>> Now I'm _really_ confused.
> 
> That's been my claim all along - that the broken fsync only affects
> the on-disk cache.  LVM itself does not reorder writes in any way - it just
> fails to pass along the write barrier.  fsync() does *start* writing
> the dirty buffers (implemented in the fs code).  It just doesn't 
> wait for the writes to finish getting to the platters.  Apparently,
> it does wait for the write to get to the drive (but I'm not certain).

Given that fsync() is supposed to return the status of the completion of 
the physical write, that sounds broken to me.  Do the LVMs in question 
here have more than one underlying device, and does it matter?

-- 
   Les Mikesell
    lesmikesell@gmail.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [linux-lvm] fsync() and LVM
  2009-03-17 18:12           ` Les Mikesell
@ 2009-03-17 18:19             ` Dietmar Maurer
  0 siblings, 0 replies; 39+ messages in thread
From: Dietmar Maurer @ 2009-03-17 18:19 UTC (permalink / raw)
  To: LVM general discussion and development

> Dietmar Maurer wrote:
> >>> Does that mean I should never use more than one device if I have
> >>> applications depending on fsync (databases)?
> >> It just means that write barriers won't get passed to the device.
> >> This is only a problem if the devices have write caches.
> >
> > But fsync is implemented using 'write barriers' - so fsync does not
> > work?
> >
> > After fsync, all data should be sent from the OS to the disk
> controller:
> >
> > a.) this works perfectly using LVM?
> >
> > b.) this does not work at all using LVM?
> >
> > c.) it works when you use one single physical drive with LVM?

Please, can someone answer those questions?

- Dietmar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [linux-lvm] liblvm status question
  2009-03-17 10:38                     ` Bryn M. Reeves
@ 2009-03-17 18:42                       ` ben scott
  2009-03-17 20:52                       ` Greg Freemyer
  1 sibling, 0 replies; 39+ messages in thread
From: ben scott @ 2009-03-17 18:42 UTC (permalink / raw)
  To: LVM general discussion and development

On Tuesday 17 March 2009 3:38:24 am Bryn M. Reeves wrote:
> On Mon, 2009-03-16 at 16:58 -0400, Greg Freemyer wrote:
> > On Mon, Mar 16, 2009 at 3:55 PM, ben scott <benscott@nwlink.com> wrote:
> > > Is the liblvm project at a state where vgs, lvs or pvs functionality is
> > > mostly working? I am writing a program for working with logical volumes
> > > and it would be very helpful if I could start integrating liblvm now
> > > even if it is still buggy at the moment. Also, where can I  find the
> > > files or cvs?
> > >
> > > Thank you
> >
> > I think you mean libdevmapper don't you?
>
> No, he means liblvm:
>
> http://fedoraproject.org/wiki/Features/liblvm
> http://fedoraproject.org/wiki/LVM/liblvm
>
> Patches are just starting to be merged but it's still a
> work-in-progress:
>
> http://www.redhat.com/archives/lvm-devel/2009-March/msg00008.html
>
> Follow the lvm-devel mailing list to keep track of what's going on.
>
> Regards,
> Bryn.

Thank you.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [linux-lvm] liblvm status question
  2009-03-17 10:38                     ` Bryn M. Reeves
  2009-03-17 18:42                       ` ben scott
@ 2009-03-17 20:52                       ` Greg Freemyer
  1 sibling, 0 replies; 39+ messages in thread
From: Greg Freemyer @ 2009-03-17 20:52 UTC (permalink / raw)
  To: LVM general discussion and development

On Tue, Mar 17, 2009 at 6:38 AM, Bryn M. Reeves <bmr@redhat.com> wrote:
> On Mon, 2009-03-16 at 16:58 -0400, Greg Freemyer wrote:
>> On Mon, Mar 16, 2009 at 3:55 PM, ben scott <benscott@nwlink.com> wrote:
>> > Is the liblvm project at a state where vgs, lvs or pvs functionality is mostly
>> > working? I am writing a program for working with logical volumes and it would
>> > be very helpful if I could start integrating liblvm now even if it is still
>> > buggy at the moment. Also, where can I find the files or CVS?
>> >
>> > Thank you
>>
>> I think you mean libdevmapper don't you?
>
> No, he means liblvm:
>
> http://fedoraproject.org/wiki/Features/liblvm
> http://fedoraproject.org/wiki/LVM/liblvm
>
> Patches are just starting to be merged but it's still a
> work-in-progress:
>
> http://www.redhat.com/archives/lvm-devel/2009-March/msg00008.html
>

Very cool.  I just recommended libdevmapper to a project team doing
some work.  liblvm looks like a much better fit for them.

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [linux-lvm] fsync() and LVM
  2009-03-17 18:17                 ` Les Mikesell
@ 2009-03-18  0:37                   ` Marco Colombo
  0 siblings, 0 replies; 39+ messages in thread
From: Marco Colombo @ 2009-03-18  0:37 UTC (permalink / raw)
  To: LVM general discussion and development

Les Mikesell wrote:
> Stuart D. Gathman wrote:
>>
>> That's been my claim all along - that the broken fsync only affects
>> the on-disk cache.  LVM itself does not reorder writes in any way - it just
>> fails to pass along the write barrier.  fsync() does *start* writing
>> the dirty buffers (implemented in the fs code).  It just doesn't wait
>> for the writes to finish getting to the platters.  Apparently,
>> it does wait for the write to get to the drive (but I'm not certain).
> 
> Given that fsync() is supposed to return the status of the completion of
> the physical write, that sounds broken to me.  Do the LVM's in question
> here have more than one underlying device, and does it matter?
> 

According to my tests, you get a 50x speedup when you turn the cache on.
It means that fsync is waiting for something to happen, and this "something"
happens 50 times faster when you turn the disk write-back cache on.

It seems to me that the only explanation is that fsync is waiting for disk
I/O to complete (and not just to begin; otherwise the time would be the
same). With the cache enabled, the disk reports completion when the data is
in the cache (write-back behaviour); with the cache disabled, it waits for
the data to be on the platters (write-thru behaviour).
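
A quick way to see which behaviour you're getting (a sketch; it assumes
a rotational disk, ~8.3 ms per revolution at 7200 RPM - per-op times
well below that mean a cache is answering):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define LOOPS 512

int main(void)
{
	char buf[4096] = { '\0', };
	struct timeval t0, t1;
	double ms;
	int fd, i;

	fd = open("probefile", O_WRONLY|O_CREAT, 0600);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	gettimeofday(&t0, NULL);
	for (i = 0; i < LOOPS; i++) {
		/* overwrite the same block, then force it out */
		if (lseek(fd, 0, SEEK_SET) < 0 ||
		    write(fd, buf, sizeof(buf)) < 0 ||
		    fdatasync(fd) < 0) {
			perror("probe");
			exit(1);
		}
	}
	gettimeofday(&t1, NULL);
	ms = ((t1.tv_sec - t0.tv_sec) * 1e6 +
	      (t1.tv_usec - t0.tv_usec)) / LOOPS / 1000.0;
	printf("%.2f ms per write+fdatasync\n", ms);
	close(fd);
	return 0;
}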

.TM.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [linux-lvm] fsync() and LVM
  2009-03-17 15:33                             ` Joshua D. Drake
@ 2009-03-19  9:20                               ` Tim Post
  0 siblings, 0 replies; 39+ messages in thread
From: Tim Post @ 2009-03-19  9:20 UTC (permalink / raw)
  To: jd, LVM general discussion and development

On Tue, 2009-03-17 at 08:33 -0700, Joshua D. Drake wrote:
> On Mon, 2009-03-16 at 15:51 -0700, Joshua D. Drake wrote:
> > On Mon, 2009-03-16 at 16:53 -0500, Les Mikesell wrote:
> > 
> > > The point of fsync() is for an application to know that a write has been 
> > > safely committed, as for example sendmail would do before acknowledging 
> > > to the sender that a message has been accepted.  The question isn't 
> > > whether an application can call fsync() but rather whether its return 
> > > status is lying, making the underlying storage unsuitable for anything 
> > > that needs reliability.
> > 
> > Right and for databases this is critical. So enlightenment here would be
> > good.
> 
> Anyone?
> 
> Joshua D. Drake

If a logical volume spans physical devices where write caching is
enabled, the results of fsync() cannot be trusted. This is an issue
with device mapper; LVM is just one of several consumers of DM.

Now it gets interesting:

Enter virtualization. When you have something like this:

fsync -> guest block device -> block tap driver -> CLVM -> iscsi ->
storage -> physical disk.

Even if device mapper passed along the write barrier, would it be
reliable? Is every part of that chain going to pass it along, and
how many opportunities for re-ordering are presented in the above?

So, even if it's fixed in DM, can fsync() still be trusted? I think, at
the least, more testing should be done with various configurations even
after a suitable patch to DM is merged. What about PGSQL users using
some kind of elastic hosting?

Given the craze in 'cloud' technology, it's an important question to ask
(and research).


Cheers,
--Tim

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2009-03-19  9:21 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-13 17:46 [linux-lvm] fsync() and LVM Marco Colombo
2009-03-13 20:08 ` Stuart D. Gathman
2009-03-13 20:29   ` Ben Chobot
2009-03-13 20:38   ` Alasdair G Kergon
2009-03-14  3:16     ` Marco Colombo
2009-03-14  9:07     ` Dietmar Maurer
2009-03-14 14:31       ` Stuart D. Gathman
2009-03-15  0:51         ` Marco Colombo
2009-03-16 11:02           ` Charles Marcus
2009-03-16 11:05             ` Martin Schröder
2009-03-16 11:18               ` Charles Marcus
2009-03-16 11:25                 ` Dietmar Maurer
2009-03-16 14:36             ` Marco Colombo
2009-03-16 17:13               ` Stuart D. Gathman
2009-03-16 17:17           ` Stuart D. Gathman
2009-03-16 18:50             ` Les Mikesell
2009-03-16 19:36               ` Greg Freemyer
2009-03-16 19:55                 ` [linux-lvm] liblvm status question ben scott
2009-03-16 20:58                   ` Greg Freemyer
2009-03-17 10:38                     ` Bryn M. Reeves
2009-03-17 18:42                       ` ben scott
2009-03-17 20:52                       ` Greg Freemyer
2009-03-16 20:28                 ` [linux-lvm] fsync() and LVM Les Mikesell
2009-03-16 20:54                   ` Greg Freemyer
2009-03-16 21:17                     ` Les Mikesell
2009-03-16 21:36                       ` Greg Freemyer
2009-03-16 21:53                         ` Les Mikesell
2009-03-16 22:51                           ` Joshua D. Drake
2009-03-17 15:33                             ` Joshua D. Drake
2009-03-19  9:20                               ` Tim Post
2009-03-16 21:57                         ` Allen, Jack
2009-03-17 16:00             ` Marco Colombo
2009-03-17 17:40               ` Stuart D. Gathman
2009-03-17 18:17                 ` Les Mikesell
2009-03-18  0:37                   ` Marco Colombo
2009-03-15  8:51         ` Dietmar Maurer
2009-03-15 23:31           ` Marco Colombo
2009-03-17 18:12           ` Les Mikesell
2009-03-17 18:19             ` Dietmar Maurer
