* Not starting an OSD journal
@ 2017-01-29 12:38 Willem Jan Withagen
  2017-01-29 16:21 ` Nathan Cutler
  0 siblings, 1 reply; 7+ messages in thread
From: Willem Jan Withagen @ 2017-01-29 12:38 UTC (permalink / raw)
  To: Ceph Development

Hi all,

I'm rummaging through the options, but I do not really see an option to
fully disable journaling. Is there one?

One of the reasons for testing this is that ZFS already has very good
journaling functionality of its own, so I'd like to see what kind of
performance difference that makes.

Or is this done by setting the journal size to 0, or pointing the
journal path at /dev/null?

--WjW



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Not starting an OSD journal
  2017-01-29 12:38 Not starting an OSD journal Willem Jan Withagen
@ 2017-01-29 16:21 ` Nathan Cutler
  2017-01-29 20:09   ` Willem Jan Withagen
  0 siblings, 1 reply; 7+ messages in thread
From: Nathan Cutler @ 2017-01-29 16:21 UTC (permalink / raw)
  To: Willem Jan Withagen, Ceph Development

> I'm rummaging through the options, but I do not really see an option to
> fully disable journaling. Is there one?
>
> One of the reasons for testing this is that ZFS already has very good
> journaling functionality of its own, so I'd like to see what kind of
> performance difference that makes.
>
> Or is this done by setting the journal size to 0, or pointing the
> journal path at /dev/null?

All writes go through the journal, so it is required. However, the 
journal can be in a file within the OSD data partition. To deploy an OSD 
in this configuration, it should be sufficient to *not* supply the 
JOURNAL positional parameter to "ceph-disk prepare" [1].

By doing this, you of course lose the option of putting the journal on a 
separate (SSD) disk. If your data partition is on an HDD, journal-on-SSD 
is going to give superior performance.

[1] See "ceph-disk prepare --help" for a description of these arguments.

-- 
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037


* Re: Not starting an OSD journal
  2017-01-29 16:21 ` Nathan Cutler
@ 2017-01-29 20:09   ` Willem Jan Withagen
  2017-01-29 22:45     ` Nathan Cutler
  2017-01-30  9:17     ` Kostas Liakakis
  0 siblings, 2 replies; 7+ messages in thread
From: Willem Jan Withagen @ 2017-01-29 20:09 UTC (permalink / raw)
  To: Nathan Cutler, Ceph Development

On 29-1-2017 17:21, Nathan Cutler wrote:
>> I'm rummaging through the options, but I do not really see an option to
>> fully disable journaling. Is there one?
>>
>> One of the reasons for testing this is that ZFS already has very good
>> journaling functionality of its own, so I'd like to see what kind of
>> performance difference that makes.
>>
>> Or is this done by setting the journal size to 0, or pointing the
>> journal path at /dev/null?
> 
> All writes go through the journal, so it is required. However, the
> journal can be in a file within the OSD data partition. To deploy an OSD
> in this configuration, it should be sufficient to *not* supply the
> JOURNAL positional parameter to "ceph-disk prepare" [1].
> 
> By doing this, you of course lose the option of putting the journal on a
> separate (SSD) disk. If your data partition is on an HDD, journal-on-SSD
> is going to give superior performance.
> 
> [1] See "ceph-disk prepare --help" for a description of these arguments.

Hmm, too bad.

The not-so-bad part is that the journal is probably okay in a file on ZFS
if that ZFS pool is backed by a ZIL (the ZFS intent log) and an L2ARC
(the ZFS second-level read cache).

The disadvantage is that there will be a double write per original write:
 (ceph) first write goes to the journal file
    (zfs) write is stored in the write queue
    (zfs) write goes to the ZIL (SSD) if it is a synced write
    (zfs) async write to disk when a write slot is available
 (ceph) read from the ZFS store
    (zfs) data is delivered from either the ARC (RAM), the L2ARC (SSD), or the HDD
 (ceph) write of the data to the filestore
    (zfs) write is stored in the write queue
    (zfs) write goes to the ZIL (SSD) if it is a synced write
    (zfs) async write to disk when a write slot is available

And I hoped to forgo the Ceph journal write/read cycle.
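The double write amplification sketched above can be put into a toy model (illustrative Python, not Ceph or ZFS code; all names are made up for the sketch):

```python
from collections import Counter

def zfs_write(counter, sync):
    """One write arriving at a ZFS pool backed by a ZIL and L2ARC."""
    counter["zfs_write_queue"] += 1    # queued in the ZFS transaction group
    if sync:
        counter["ssd_zil_write"] += 1  # synced writes hit the ZIL (SSD) first
    counter["hdd_async_write"] += 1    # later flushed asynchronously to disk

def ceph_client_write(counter, sync=True):
    """One logical client write as handled by a filestore OSD."""
    zfs_write(counter, sync)  # 1) OSD appends the write to its journal file
    zfs_write(counter, sync)  # 2) OSD writes the object to the filestore

counter = Counter()
for _ in range(100):
    ceph_client_write(counter, sync=True)

print(counter["ssd_zil_write"])    # 200: every client write lands on the SSD twice
print(counter["hdd_async_write"])  # 200: and twice on the data disks
```

So for 100 logical writes the model counts 200 ZIL hits and 200 disk flushes, which is the double write/erase-cycle cost being discussed.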

The other way to do this is to not use a ZIL in ZFS and depend on the
journal in Ceph. But doing sync writes to ZFS without a ZIL is not a
really sensible thing to do... and keeping both burns double the SSD
space and double the write cycles.

Another division would be to create a separate ZFS pool, with both a ZIL
and an L2ARC, that is used only for the journals... But then the question
is whether the actual writes to the store are also done synchronously,
because that would again require a ZIL.

Now, that would not be so bad, because ZILs are typically around 1 GB in
size. But in the end it will cut into the available bandwidth of the SSDs.

--WjW



* Re: Not starting an OSD journal
  2017-01-29 20:09   ` Willem Jan Withagen
@ 2017-01-29 22:45     ` Nathan Cutler
  2017-01-29 22:55       ` Willem Jan Withagen
  2017-01-30  9:17     ` Kostas Liakakis
  1 sibling, 1 reply; 7+ messages in thread
From: Nathan Cutler @ 2017-01-29 22:45 UTC (permalink / raw)
  To: Willem Jan Withagen, Ceph Development

Hi Willem:

Sounds like you want to put your journals in a partition, not a file. 
Since the Ceph journal uses the journal partition directly (without any 
underlying filesystem), you will not have ZFS to worry about there.
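For illustration, a ceph.conf fragment along these lines points an OSD's journal at a raw partition (the [osd] options are real filestore-era settings, but the device path and size here are placeholders; deployments prepared with ceph-disk normally set the journal link up automatically):

```ini
[osd]
# Journal on a raw partition: Ceph writes to the block device directly,
# so no filesystem (and hence no ZFS) sits underneath the journal.
osd journal = /dev/sdc1
# Journal size in MB (relevant when the journal is a file, not a device).
osd journal size = 5120
```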

Nathan



* Re: Not starting an OSD journal
  2017-01-29 22:45     ` Nathan Cutler
@ 2017-01-29 22:55       ` Willem Jan Withagen
  0 siblings, 0 replies; 7+ messages in thread
From: Willem Jan Withagen @ 2017-01-29 22:55 UTC (permalink / raw)
  To: Nathan Cutler, Ceph Development

On 29-1-2017 23:45, Nathan Cutler wrote:
> Hi Willem:
> 
> Sounds like you want to put your journals in a partition, not a file.
> Since the Ceph journal uses the journal partition directly (without any
> underlying filesystem), you will not have ZFS to worry about there.

Hi Nathan,

That is one approach. ;)
Sorry for being so persistent.

I was looking for ways to use the properties of ZFS to the advantage of
the system, and removing a complete write/read cycle (even if it is on
SSD) would be an interesting option.

This assumes that ZFS with a ZIL is rather reliable (I'd dare to say very
reliable). It has not let me down in the 10 years I've been running it.

--WjW




* Re: Not starting an OSD journal
  2017-01-29 20:09   ` Willem Jan Withagen
  2017-01-29 22:45     ` Nathan Cutler
@ 2017-01-30  9:17     ` Kostas Liakakis
  2017-01-30  9:47       ` Willem Jan Withagen
  1 sibling, 1 reply; 7+ messages in thread
From: Kostas Liakakis @ 2017-01-30  9:17 UTC (permalink / raw)
  To: Ceph Development

On 2017-01-29 22:09, Willem Jan Withagen wrote:
> The disadvantage is that there will be a double write per original write:
>  (ceph) first write goes to the journal file
>     (zfs) write is stored in the write queue
>     (zfs) write goes to the ZIL (SSD) if it is a synced write
>     (zfs) async write to disk when a write slot is available
>  (ceph) read from the ZFS store
>     (zfs) data is delivered from either the ARC (RAM), the L2ARC (SSD), or the HDD
>  (ceph) write of the data to the filestore
>     (zfs) write is stored in the write queue
>     (zfs) write goes to the ZIL (SSD) if it is a synced write
>     (zfs) async write to disk when a write slot is available
>
> And I hoped to forgo the Ceph journal write/read cycle.
You've got a slight misconception there, not that it matters much to
your problem.

A Ceph OSD will never read its journal under normal operation. The OSD
only commits data to the journal before it commits it to its filestore.
The journal is only replayed after an OSD crash or other abnormal
termination.
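That write-ahead behaviour can be sketched in a minimal model (illustrative Python, not Ceph code; the class and method names are invented for the example):

```python
class Osd:
    """Toy filestore OSD: a write-ahead journal plus an object store."""

    def __init__(self):
        self.journal = []     # append-only log, written on every op
        self.store = {}       # the authoritative object data
        self.journal_reads = 0

    def write(self, key, value):
        self.journal.append((key, value))  # 1) commit to the journal first
        self.store[key] = value            # 2) then apply to the filestore
        # Note: the journal is never read on this path.

    def replay_after_crash(self, store_snapshot):
        # Only on recovery is the journal read back, re-applying entries
        # the filestore may have lost.
        self.store = dict(store_snapshot)
        for key, value in self.journal:
            self.journal_reads += 1
            self.store[key] = value

osd = Osd()
osd.write("obj1", b"data1")
osd.write("obj2", b"data2")
print(osd.journal_reads)    # 0: normal operation never reads the journal

osd.replay_after_crash({})  # simulate losing the filestore contents
print(osd.store["obj2"])    # entries are recovered from the journal
```

The journal is pure write traffic until a crash, at which point replay reconstructs whatever the filestore had not yet durably applied.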

Given the way Ceph OSDs with filestore have been put together, I don't
think there is a way to fully utilize the ZFS features.

-K.



* Re: Not starting an OSD journal
  2017-01-30  9:17     ` Kostas Liakakis
@ 2017-01-30  9:47       ` Willem Jan Withagen
  0 siblings, 0 replies; 7+ messages in thread
From: Willem Jan Withagen @ 2017-01-30  9:47 UTC (permalink / raw)
  To: Kostas Liakakis, Ceph Development

On 30-1-2017 10:17, Kostas Liakakis wrote:
> On 2017-01-29 22:09, Willem Jan Withagen wrote:
>> The disadvantage is that there will be a double write per original write:
>>  (ceph) first write goes to the journal file
>>     (zfs) write is stored in the write queue
>>     (zfs) write goes to the ZIL (SSD) if it is a synced write
>>     (zfs) async write to disk when a write slot is available
>>  (ceph) read from the ZFS store
>>     (zfs) data is delivered from either the ARC (RAM), the L2ARC (SSD), or the HDD
>>  (ceph) write of the data to the filestore
>>     (zfs) write is stored in the write queue
>>     (zfs) write goes to the ZIL (SSD) if it is a synced write
>>     (zfs) async write to disk when a write slot is available
>>
>> And I hoped to forgo the Ceph journal write/read cycle.
> You've got a slight misconception there, not that it matters much to
> your problem.
> 
> A Ceph OSD will never read its journal under normal operation. The OSD
> only commits data to the journal before it commits it to its filestore.
> The journal is only replayed after an OSD crash or other abnormal
> termination.

Ah, okay... Thanks for the clarification.

This is also how ZFS does it: the ZIL is only read in case a node crashes
and transactions need to be replayed.

> The way Ceph OSDs w/ filestore has been put in place I don't think there
> is a way to fully utilize zfs features.

I'm running some benchmarks with a simple config just to see what is
going on, and your explanation is in accordance with what I was seeing.
I do see some reads from the journal, but not at the same rate at which
things are written.

Guess I'm going to be toying with this a bit more...

--WjW




end of thread, other threads:[~2017-01-30  9:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-29 12:38 Not starting an OSD journal Willem Jan Withagen
2017-01-29 16:21 ` Nathan Cutler
2017-01-29 20:09   ` Willem Jan Withagen
2017-01-29 22:45     ` Nathan Cutler
2017-01-29 22:55       ` Willem Jan Withagen
2017-01-30  9:17     ` Kostas Liakakis
2017-01-30  9:47       ` Willem Jan Withagen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.