* block device journal durability
From: Phil Carns @ 2010-06-16 14:46 UTC
  To: ceph-devel

I noticed that Ceph issues a warning if it detects that you are using a 
raw block device as the journal and write caching is enabled on that 
device.

When it opens the block device file, however, the FileJournal is using 
O_DIRECT|O_SYNC.  In recent kernels, syncing a block device file 
actually triggers a proper write barrier operation
(http://lxr.linux.no/linux+v2.6.34/fs/block_dev.c#L420).  The barrier 
operation is now also supported on MD and LVM, if you happen to 
have a journal on a multi-disk volume.

Does this mean that if you have a new enough kernel, and a block device 
that understands barriers, that you can safely leave the write cache 
enabled for the journal device?  It seems that way to me, but I wanted 
to make sure that I am not missing a more subtle issue related to how 
Ceph performs its journaling.
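
A minimal sketch of the open mode described above, assuming Linux with glibc. The names journal_open, journal_write, and JOURNAL_ALIGN are hypothetical and are not taken from Ceph's FileJournal. The 4096-byte alignment is an assumption (O_DIRECT requires aligned buffers, lengths, and offsets, and the actual requirement depends on the device). The fallback to O_SYNC alone is added only so the sketch also runs on filesystems that reject O_DIRECT.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* not exposed here: sketch degrades to O_SYNC only */
#endif

#define JOURNAL_ALIGN 4096     /* assumed alignment; real devices may differ */

/* Open a journal for synchronous, page-cache-bypassing writes.
 * O_CREAT lets the sketch run against a regular file; it has no effect
 * on an existing block device node.
 * Falls back to O_SYNC alone if the target rejects O_DIRECT (EINVAL). */
int journal_open(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    return fd;
}

/* Write one record at an aligned offset, zero-padded to a multiple of
 * JOURNAL_ALIGN. Returns 0 on success, -1 on error or short write. */
int journal_write(int fd, off_t offset, const void *data, size_t len)
{
    void *buf = NULL;
    size_t padded = (len + JOURNAL_ALIGN - 1) / JOURNAL_ALIGN * JOURNAL_ALIGN;

    assert(offset % JOURNAL_ALIGN == 0);   /* O_DIRECT needs aligned offsets */
    if (padded == 0)
        return 0;                          /* nothing to write */

    int rc = posix_memalign(&buf, JOURNAL_ALIGN, padded);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    memset(buf, 0, padded);
    memcpy(buf, data, len);

    /* With O_SYNC, pwrite() returns only after the kernel reports the
     * write as complete. */
    ssize_t n = pwrite(fd, buf, padded, offset);
    int saved = errno;
    free(buf);
    errno = saved;
    return n == (ssize_t)padded ? 0 : -1;
}
```

Whether "complete" also means the device's volatile write cache has been flushed is not something this code can observe. It depends on the kernel issuing the barrier, which is the question being asked here.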

Kudos on the user-friendly warning messages in Ceph in general, by the way.

thanks,
-Phil



* Re: block device journal durability
From: Sage Weil @ 2010-06-16 17:03 UTC
  To: Phil Carns; +Cc: ceph-devel

On Wed, 16 Jun 2010, Phil Carns wrote:
> I noticed that Ceph issues a warning if it detects that you are using a raw
> block device as the journal and write caching is enabled on that device.
> 
> When it opens the block device file, however, the FileJournal is using
> O_DIRECT|O_SYNC.  In recent kernels, syncing a block device file actually
> triggers a proper write barrier operation
> (http://lxr.linux.no/linux+v2.6.34/fs/block_dev.c#L420).  The barrier
> operation is also supported on MD and LVM now as well if you happen to have a
> journal on a multi-disk volume.
>
> Does this mean that if you have a new enough kernel, and a block device that
> understands barriers, that you can safely leave the write cache enabled for
> the journal device?  It seems that way to me, but I wanted to make sure that I
> am not missing a more subtle issue related to how Ceph performs its
> journaling.

You're correct.  The only concern is that the data is safely on disk when 
the write returns, and it sounds like recent kernels issue the barriers to 
make that happen.  Depending on how recent that behavior is, we can 
probably either remove the warning entirely, or try to guess based on 
kernel version.

> Kudos on the user-friendly warning messages in Ceph in general, by the way.

Thanks!
sage



* Re: block device journal durability
From: Phil Carns @ 2010-06-16 17:32 UTC
  To: Sage Weil; +Cc: ceph-devel

On 06/16/2010 01:03 PM, Sage Weil wrote:
> On Wed, 16 Jun 2010, Phil Carns wrote:
>    
>> I noticed that Ceph issues a warning if it detects that you are using a raw
>> block device as the journal and write caching is enabled on that device.
>>
>> When it opens the block device file, however, the FileJournal is using
>> O_DIRECT|O_SYNC.  In recent kernels, syncing a block device file actually
>> triggers a proper write barrier operation
>> (http://lxr.linux.no/linux+v2.6.34/fs/block_dev.c#L420).  The barrier
>> operation is also supported on MD and LVM now as well if you happen to have a
>> journal on a multi-disk volume.
>>
>> Does this mean that if you have a new enough kernel, and a block device that
>> understands barriers, that you can safely leave the write cache enabled for
>> the journal device?  It seems that way to me, but I wanted to make sure that I
>> am not missing a more subtle issue related to how Ceph performs its
>> journaling.
>>      
> You're correct.  The only concern is that the data is safely on disk when
> the write returns, and it sounds like recent kernels issue the barriers to
> make that happen.

Great, thanks for the confirmation.

>   Depending on how recent that behavior is, we can
> probably either remove the warning entirely, or try to guess based on
> kernel version.
>    

It looks like this first appeared in 2.6.33 (for both single devices and 
md/lvm) as best I can tell.  It's too bad there's not a better way to 
detect the fsync semantics from user space.  I don't know of any way 
other than by checking the kernel version.  It is actually an even 
tougher issue if an app wants to figure that out for an arbitrary file, 
because in that case it depends on the file system and the mount options 
as well.  In some ways it would be nice to have a "tell me what fsync 
semantic this file descriptor supports" ioctl :-)

-Phil
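
The kernel-version check discussed above could be sketched as follows, assuming the 2.6.33 threshold given in this message is right (it is stated only "as best I can tell"). The function names are hypothetical and do not come from Ceph. As the message notes, a version check is only a guess: it says nothing about whether the particular device or volume manager honors barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Parse "major.minor.patch" from a release string such as "2.6.34-rc1".
 * Missing components default to 0. Returns false if no leading number
 * is found. */
bool parse_release(const char *release, int *major, int *minor, int *patch)
{
    assert(release && major && minor && patch);
    *major = *minor = *patch = 0;
    return sscanf(release, "%d.%d.%d", major, minor, patch) >= 1;
}

/* True if the version is at least 2.6.33, the release this thread
 * identifies as the first where syncing a block device issues a barrier. */
bool release_has_blkdev_barrier(int major, int minor, int patch)
{
    if (major != 2)
        return major > 2;
    if (minor != 6)
        return minor > 6;
    return patch >= 33;
}

/* Check the running kernel. Returns false if the version cannot be
 * determined, so a caller would keep the write-cache warning. */
bool running_kernel_has_blkdev_barrier(void)
{
    struct utsname u;
    int major, minor, patch;

    if (uname(&u) != 0 || !parse_release(u.release, &major, &minor, &patch))
        return false;
    return release_has_blkdev_barrier(major, minor, patch);
}
```

A caller could suppress the write-cache warning only when running_kernel_has_blkdev_barrier() returns true, and keep warning in every other case, including when the version string cannot be parsed.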

