* block device journal durability
From: Phil Carns @ 2010-06-16 14:46 UTC
To: ceph-devel
I noticed that Ceph issues a warning if it detects that you are using a
raw block device as the journal and write caching is enabled on that
device.
When it opens the block device file, however, the FileJournal is using
O_DIRECT|O_SYNC. In recent kernels, syncing a block device file
actually triggers a proper write barrier operation
(http://lxr.linux.no/linux+v2.6.34/fs/block_dev.c#L420). The barrier
operation is now supported on MD and LVM as well, if you happen to
have a journal on a multi-disk volume.
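For readers unfamiliar with those flags, here is a minimal C sketch of a journal write path along those lines, assuming Linux/glibc. It is not Ceph's FileJournal code: the helper names (open_journal_dev, alloc_journal_buf, write_durable) and the 4096-byte alignment are assumptions for illustration. O_DIRECT generally requires the buffer, length and offset to be aligned to the device's logical block size, which is why the buffer comes from posix_memalign.

```c
#define _GNU_SOURCE  /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment; the real requirement is the device's logical
   block size, which may be 512 or 4096 bytes. */
#define JOURNAL_ALIGN 4096

/* Hypothetical helper: open a raw journal device with the flags the
   thread attributes to FileJournal. Returns an fd, or -1 with errno set. */
static int open_journal_dev(const char *path)
{
    return open(path, O_WRONLY | O_DIRECT | O_SYNC);
}

/* O_DIRECT I/O needs an aligned buffer; posix_memalign provides one. */
static void *alloc_journal_buf(size_t len)
{
    void *p = NULL;
    if (posix_memalign(&p, JOURNAL_ALIGN, len) != 0)
        return NULL;
    return p;
}

/* Write len bytes at offset off, retrying on EINTR and short writes,
   then fsync. Returns 0 on success or a negative errno value. */
static int write_durable(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (n == 0)
            return -EIO;  /* no progress; avoid spinning forever */
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return fsync(fd) == 0 ? 0 : -errno;
}
```

With O_SYNC, each pwrite() is meant to return only once the kernel considers the data stable; the trailing fsync() is then redundant, but it keeps the helper correct for descriptors opened without O_SYNC. Whether either one actually gets past an enabled drive write cache is exactly the kernel question raised in this thread.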
Does this mean that, with a new enough kernel and a block device that
understands barriers, you can safely leave the write cache enabled for
the journal device? It seems that way to me, but I wanted
to make sure that I am not missing a more subtle issue related to how
Ceph performs its journaling.
Kudos on the user-friendly warning messages in Ceph in general, by the way.
thanks,
-Phil
* Re: block device journal durability
From: Sage Weil @ 2010-06-16 17:03 UTC
To: Phil Carns; +Cc: ceph-devel
On Wed, 16 Jun 2010, Phil Carns wrote:
> I noticed that Ceph issues a warning if it detects that you are using a raw
> block device as the journal and write caching is enabled on that device.
>
> When it opens the block device file, however, the FileJournal is using
> O_DIRECT|O_SYNC. In recent kernels, syncing a block device file actually
> triggers a proper write barrier operation
> (http://lxr.linux.no/linux+v2.6.34/fs/block_dev.c#L420). The barrier
> operation is now supported on MD and LVM as well, if you happen to have a
> journal on a multi-disk volume.
>
> Does this mean that, with a new enough kernel and a block device that
> understands barriers, you can safely leave the write cache enabled for
> the journal device? It seems that way to me, but I wanted to make sure that I
> am not missing a more subtle issue related to how Ceph performs its
> journaling.
You're correct. The only concern is that the data is safely on disk when
the write returns, and it sounds like recent kernels issue the barriers to
make that happen. Depending on how recent that behavior is, we can
probably either remove the warning entirely, or try to guess based on
kernel version.
> Kudos on the user-friendly warning messages in Ceph in general, by the way.
Thanks!
sage
* Re: block device journal durability
From: Phil Carns @ 2010-06-16 17:32 UTC
To: Sage Weil; +Cc: ceph-devel
On 06/16/2010 01:03 PM, Sage Weil wrote:
> On Wed, 16 Jun 2010, Phil Carns wrote:
>
>> I noticed that Ceph issues a warning if it detects that you are using a raw
>> block device as the journal and write caching is enabled on that device.
>>
>> When it opens the block device file, however, the FileJournal is using
>> O_DIRECT|O_SYNC. In recent kernels, syncing a block device file actually
>> triggers a proper write barrier operation
>> (http://lxr.linux.no/linux+v2.6.34/fs/block_dev.c#L420). The barrier
>> operation is now supported on MD and LVM as well, if you happen to have a
>> journal on a multi-disk volume.
>>
>> Does this mean that, with a new enough kernel and a block device that
>> understands barriers, you can safely leave the write cache enabled for
>> the journal device? It seems that way to me, but I wanted to make sure that I
>> am not missing a more subtle issue related to how Ceph performs its
>> journaling.
>>
> You're correct. The only concern is that the data is safely on disk when
> the write returns, and it sounds like recent kernels issue the barriers to
> make that happen.
Great, thanks for the confirmation.
> Depending on how recent that behavior is, we can
> probably either remove the warning entirely, or try to guess based on
> kernel version.
>
As best I can tell, this first appeared in 2.6.33 (for both single
devices and md/lvm). It's too bad there's not a better way to
detect the fsync semantics from user space. I don't know of any way
other than by checking the kernel version. It is actually an even
tougher issue if an app wants to figure that out for an arbitrary file,
because in that case it depends on the file system and the mount options
as well. In some ways it would be nice to have a "tell me what fsync
semantic this file descriptor supports" ioctl :-)
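Along the lines Sage suggests, a kernel-version guess could look something like this C sketch. The 2.6.33 cutoff comes from Phil's "as best I can tell" above; the function names are hypothetical, and such a check cannot see distribution backports or tell whether the device itself honors cache flushes.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: parse the leading "major.minor[.patch]" of a
   release string such as "2.6.34-12-generic". Returns 1 if it is at
   least want_major.want_minor.want_patch, 0 if older, -1 if unparseable.
   A missing patch component (e.g. "3.0-rc1") is treated as 0. */
static int release_at_least(const char *release, int want_major,
                            int want_minor, int want_patch)
{
    int major = 0, minor = 0, patch = 0;
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;
    if (major != want_major)
        return major > want_major;
    if (minor != want_minor)
        return minor > want_minor;
    return patch >= want_patch;
}

/* Heuristic only: does the running kernel look new enough that fsync on
   a raw block device issues a cache flush? Assumes a 2.6.33 cutoff. */
static int kernel_flushes_blockdev_on_fsync(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, 2, 6, 33);
}
```

A caller could suppress the write-cache warning only when kernel_flushes_blockdev_on_fsync() returns 1, and keep warning on 0 or -1 so that an unparseable version errs on the side of caution.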
-Phil
Thread overview: 3 messages
2010-06-16 14:46 block device journal durability Phil Carns
2010-06-16 17:03 ` Sage Weil
2010-06-16 17:32 ` Phil Carns