* RBD Cache and rbd-nbd
@ 2018-05-10 19:03 Marc Schöchlin
       [not found] ` <f57c4834-517e-0c0e-7496-831689327bac-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Schöchlin @ 2018-05-10 19:03 UTC (permalink / raw)
  To: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users



Hello list,

I map ~30 RBDs per XenServer
<https://github.com/vico-research-and-consulting/RBDSR> host using
rbd-nbd to run virtual machines on these devices.

I have the following questions:

  * Is it possible to use the RBD cache with rbd-nbd? I assume that it
    is, but the documentation does not make a clear statement about this
    (http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/).
  * If I configure caches as described at
    http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, is there a
    dedicated cache per rbd-nbd/krbd device, or only a single cache area?
    How can I identify the RBD cache with the tools provided by the
    operating system?
  * Can you provide some hints about adequate cache settings for a
    write-intensive environment (70% write, 30% read)?
    Is it a good idea to specify a huge RBD cache of 1 GB with a max
    dirty age of 10 seconds?

Regards
Marc

Our system:

  * Luminous/12.2.5
  * Ubuntu 16.04
  * 5 OSD nodes (24*8 TB HDD OSDs, 48*1 TB SSD OSDs, Bluestore,
    6 GB cache size per OSD, 192 GB RAM, 56 HT CPUs)
  * 3 Mons (64 GB RAM, 200GB SSD, 4 visible CPUs)
  * 2 * 10 GBIT, SFP+, bonded xmit_hash_policy layer3+4



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RBD Cache and rbd-nbd
       [not found] ` <f57c4834-517e-0c0e-7496-831689327bac-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
@ 2018-05-10 19:18   ` Jason Dillaman
       [not found]     ` <CA+aFP1BYyE-opsc8eLH79A8EK-2N0cv_qjah788dKdOYLx7ZYg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Dillaman @ 2018-05-10 19:18 UTC (permalink / raw)
  To: Marc Schöchlin; +Cc: ceph-devel, ceph-users

On Thu, May 10, 2018 at 12:03 PM, Marc Schöchlin <ms@256bit.org> wrote:
> Hello list,
>
> i map ~30 rbds  per xenserver host by using rbd-nbd to run virtual machines
> on these devices.
>
> I have the following questions:
>
> Is it possible to use rbd cache for rbd-nbd? I assume that this is true, but
> the documentation does not make a clear statement about this.
> (http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/(

It's on by default, since rbd-nbd is a librbd client and the cache is
enabled in librbd's default settings.

> If i configure caches like described at
> http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, are there dedicated
> caches per rbd-nbd/krbd device or is there a only a single cache area.

The librbd cache is per device, but if you aren't performing direct
IOs to the device, you would also have the unified Linux pagecache on
top of all the devices.

> How can i identify the rbd cache with the tools provided by the operating
> system?

Identify how? You can enable the admin sockets and use "ceph
--admin-daemon config show" to display the in-use settings.
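
For example, a minimal sketch (paths and the client id are placeholders,
adjust them to your setup; the setting only affects newly started clients):

  # ceph.conf on the rbd-nbd host
  [client]
      admin socket = /var/run/ceph/$cluster-$name.asok

  # query the socket of a running rbd-nbd client
  ceph --admin-daemon /var/run/ceph/ceph-client.<id>.asok config show | grep rbd_cache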

> Can you provide some hints how to about adequate cache settings for a write
> intensive environment (70% write, 30% read)?
> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
> of 10 seconds?

The librbd cache is really only useful for sequential read-ahead and
for small writes (assuming writeback is enabled). Assuming you aren't
using direct IO, I'd suspect your best performance would be to disable
the librbd cache and rely on the Linux pagecache to work its magic.
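
If you want to try that, a sketch (whether you scope it to a specific
client section is up to you):

  [client]
      rbd cache = false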

>
> Regards
> Marc
>
> Our system:
>
> Luminous/12.2.5
> Ubuntu 16.04
> 5 OSD Nodes (24*8 TB HDD OSDs, 48*1TB SSD OSDS, Bluestore, 6Gb Cache per
> OSD)
> Size per OSD, 192GB RAM, 56 HT CPUs)
> 3 Mons (64 GB RAM, 200GB SSD, 4 visible CPUs)
> 2 * 10 GBIT, SFP+, bonded xmit_hash_policy layer3+4
>
>



-- 
Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RBD Cache and rbd-nbd
       [not found]     ` <CA+aFP1BYyE-opsc8eLH79A8EK-2N0cv_qjah788dKdOYLx7ZYg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-05-11  7:59       ` Marc Schöchlin
       [not found]         ` <de7be619-2da0-7f4f-08cf-aafb80bffeb9-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Schöchlin @ 2018-05-11  7:59 UTC (permalink / raw)
  To: dillaman-H+wXaHxf7aLQT0dZR+AlfA; +Cc: ceph-devel, ceph-users



Hello Jason,

Thanks for your response.


On 10.05.2018 at 21:18, Jason Dillaman wrote:

>> If i configure caches like described at
>> http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, are there dedicated
>> caches per rbd-nbd/krbd device or is there a only a single cache area.
> The librbd cache is per device, but if you aren't performing direct
> IOs to the device, you would also have the unified Linux pagecache on
> top of all the devices.
XenServer utilizes the nbd devices directly; as I understand it, they
are connected to the virtual machines via blkback (dom-0) and blkfront
(dom-U).
In my understanding the pagecache only comes into play when data is
accessed through mounted filesystems (VFS usage).
Therefore it would be a good thing to use the RBD cache for rbd-nbd
(/dev/nbdX).
>> How can i identify the rbd cache with the tools provided by the operating
>> system?
> Identify how? You can enable the admin sockets and use "ceph
> --admin-deamon config show" to display the in-use settings.

Ah, OK. I discovered that I can gather the configuration settings by
executing the following (xen_test is the identity of the Xen rbd-nbd user):

ceph --id xen_test --admin-daemon
/var/run/ceph/ceph-client.xen_test.asok config show | less -p rbd_cache

Sorry, my question was a bit imprecise: I was looking for usage
statistics of the RBD cache.
Is there also a possibility to gather rbd_cache usage statistics as a
basis for verifying and optimizing the cache settings?

Since an RBD cache is created for every device, I assume that the cache
is simply part of the rbd-nbd process memory.
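
As a quick cross-check (just a sketch), that per-device memory should
then show up in the RSS of the individual rbd-nbd processes:

  ps -C rbd-nbd -o pid,rss,args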


>> Can you provide some hints how to about adequate cache settings for a write
>> intensive environment (70% write, 30% read)?
>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
>> of 10 seconds?
> The librbd cache is really only useful for sequential read-ahead and
> for small writes (assuming writeback is enabled). Assuming you aren't
> using direct IO, I'd suspect your best performance would be to disable
> the librbd cache and rely on the Linux pagecache to work its magic.
As described, XenServer utilizes the nbd devices directly.

Over 70 percent of our typical workload originates from database write
operations in the virtual machines.
Therefore collecting write operations in the RBD cache and writing them
to Ceph in chunks might be a good thing.
A higher limit for "rbd cache max dirty" might be adequate here.
On the other hand, our read workload typically reads huge files
sequentially.

Therefore it might be useful to start with a configuration like this:

rbd cache size = 64MB
rbd cache max dirty = 48MB
rbd cache target dirty = 32MB
rbd cache max dirty age = 10
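
In ceph.conf this would roughly be the following (a sketch; I am not
sure whether the MB suffix is parsed on Luminous, so plain byte values
are shown, scoped to the xen_test client as an example):

  [client.xen_test]
      # 64 MB / 48 MB / 32 MB expressed in bytes
      rbd cache = true
      rbd cache size = 67108864
      rbd cache max dirty = 50331648
      rbd cache target dirty = 33554432
      rbd cache max dirty age = 10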

What strategy does librbd use to write data from the rbd_cache to the
storage when "rbd cache max dirty = 48MB" is reached?
Is there a reduction of IO operations (merging of IOs) compared to the
write granularity of my virtual machines?

Additionally, I would leave the readahead settings at the nbd level at
their defaults, so that this can still be configured at the operating
system level of the VMs.

The operating systems in our virtual machines currently use a readahead
of 256 sectors (256*512 = 128 KB).
From my point of view it would be a good thing for sequential reads of
big files to increase the readahead to a higher value.
We haven't changed the default RBD object size of 4 MB - nevertheless it
might be a good idea to increase the readahead to 1024 sectors (= 512 KB)
to decrease read requests by a factor of 4 for sequential reads.
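
Inside the VMs that would be something like (a sketch; /dev/xvda is just
an assumed guest device name):

  blockdev --getra /dev/xvda          # shows the current 256 sectors
  blockdev --setra 1024 /dev/xvda     # 1024 sectors = 512 KB readahead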

What do you think about this?

Regards
Marc



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RBD Cache and rbd-nbd
       [not found]         ` <de7be619-2da0-7f4f-08cf-aafb80bffeb9-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
@ 2018-05-11 15:02           ` Jason Dillaman
       [not found]             ` <CA+aFP1Dn-yVRwTt7Wfj2HX+56FUPzJwYD5+us8ESLPvZ9EjHCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Dillaman @ 2018-05-11 15:02 UTC (permalink / raw)
  To: Marc Schöchlin; +Cc: ceph-devel, ceph-users

On Fri, May 11, 2018 at 3:59 AM, Marc Schöchlin <ms@256bit.org> wrote:
> Hello Jason,
>
> thanks for your response.
>
>
> Am 10.05.2018 um 21:18 schrieb Jason Dillaman:
>
> If i configure caches like described at
> http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, are there dedicated
> caches per rbd-nbd/krbd device or is there a only a single cache area.
>
> The librbd cache is per device, but if you aren't performing direct
> IOs to the device, you would also have the unified Linux pagecache on
> top of all the devices.
>
> XENServer directly utilizes nbd devices which are connected in my
> understanding by blkback (dom-0) and blkfront (dom-U) to the virtual
> machines.
> In my understanding pagecache is only part of the game if i use data on
> mounted filesystems (VFS usage).
> Therefore it would be a good thing to use rbd cache for rbd-nbd (/dev/nbdX).

I cannot speak for Xen, but in general IO to a block device will hit
the pagecache unless the IO operation is flagged as direct (e.g.
O_DIRECT) to bypass the pagecache and directly send it to the block
device.

> How can i identify the rbd cache with the tools provided by the operating
> system?
>
> Identify how? You can enable the admin sockets and use "ceph
> --admin-deamon config show" to display the in-use settings.
>
>
> Ah ok, i discovered that i can gather configuration settings by executing:
> (xen_test is the identity of the xen rbd_nbd user)
>
> ceph --id xen_test --admin-daemon /var/run/ceph/ceph-client.xen_test.asok
> config show | less -p rbd_cache
>
> Sorry, my question was a bit unprecice: I was searching for usage statistics
> of the rbd cache.
> Is there also a possibility to gather rbd_cache usage statistics as a source
> of verification for optimizing the cache settings?

You can run "perf dump" instead of "config show" to dump out the
current performance counters. There are some stats from the in-memory
cache included in there.
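
For example (reusing the socket path from your earlier command; the exact
counter names differ between versions, so just browse the output):

  ceph --id xen_test --admin-daemon /var/run/ceph/ceph-client.xen_test.asok \
      perf dump | python -m json.tool | less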

> Due to the fact that a rbd cache is created for every device, i assume that
> the rbd cache simply part of the rbd-nbd process memory.

Correct.

>
> Can you provide some hints how to about adequate cache settings for a write
> intensive environment (70% write, 30% read)?
> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
> of 10 seconds?

Depends on your workload and your testing results. I suspect a
database on top of RBD is going to do its own read caching and will be
issuing lots of flush calls to the block device, potentially negating
the need for a large cache.

> The librbd cache is really only useful for sequential read-ahead and
> for small writes (assuming writeback is enabled). Assuming you aren't
> using direct IO, I'd suspect your best performance would be to disable
> the librbd cache and rely on the Linux pagecache to work its magic.
>
> As described, xenserver directly utilizes the nbd devices.
>
> Our typical workload is originated over 70 percent in database write
> operations in the virtual machines.
> Therefore collecting write operations with rbd cache and writing them in
> chunks to ceph might be a good thing.
> A higher limit for "rbd cache max dirty" might be a adequate here.
> At the other side our read workload typically reads huge files in sequential
> manner.
>
> Therefore it might be useful to do start with a configuration like that:
>
> rbd cache size = 64MB
> rbd cache max dirty = 48MB
> rbd cache target dirty = 32MB
> rbd cache max dirty age = 10
>
> What is the strategy of librbd to write data to the storage from rbd_cache
> if "rbd cache max dirty = 48MB" is reached?
> Is there a reduction of io operations (merging of ios) compared to the
> granularity of writes of my virtual machines?

If the cache is full, incoming IO will be stalled as the dirty bits
are written back to the backing RBD image to make room available for
the new IO request.

> Additionally, i would do no non-default settings for readahead on nbd level
> to have the possibility to configure this at operating system level of the
> vms.
>
> Our operating systems in the virtual machines use currently a readahead of
> 256 (256*512 = 128KB).
> From my point of view it would be a good thing for sequential reads in big
> files to increase readahead to a higher value.
> We haven't changed the default rbd object size of 4MB - nevertheless it
> might be a good thing to increase the readahead to 1024 (=512KB) to decrease
> read requests by factor of 4 for sequential reads.
>
> What do you think about this?

Depends on your workload.

> Regards
> Marc
>



-- 
Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RBD Cache and rbd-nbd
       [not found]             ` <CA+aFP1Dn-yVRwTt7Wfj2HX+56FUPzJwYD5+us8ESLPvZ9EjHCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-05-14  7:15               ` Marc Schöchlin
       [not found]                 ` <74367bd7-d96f-6dfd-427a-ff74662b90bc-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Schöchlin @ 2018-05-14  7:15 UTC (permalink / raw)
  To: dillaman-H+wXaHxf7aLQT0dZR+AlfA; +Cc: ceph-devel, ceph-users

Hello Jason,

Many thanks for your informative response!

On 11.05.2018 at 17:02, Jason Dillaman wrote:
> I cannot speak for Xen, but in general IO to a block device will hit
> the pagecache unless the IO operation is flagged as direct (e.g.
> O_DIRECT) to bypass the pagecache and directly send it to the block
> device.
Sure, but it seems that XenServer just forwards IO from the virtual
machines (VM: blkfront, dom-0: blkback) to the nbd device in dom-0.
>> Sorry, my question was a bit unprecice: I was searching for usage statistics
>> of the rbd cache.
>> Is there also a possibility to gather rbd_cache usage statistics as a source
>> of verification for optimizing the cache settings?
> You can run "perf dump" instead of "config show" to dump out the
> current performance counters. There are some stats from the in-memory
> cache included in there.
Great, I was not aware of that.
There are really a lot of statistics which might be useful for analyzing
what's going on and whether the optimizations improve the performance of
our systems.
>> Can you provide some hints how to about adequate cache settings for a write
>> intensive environment (70% write, 30% read)?
>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
>> of 10 seconds?
> Depends on your workload and your testing results. I suspect a
> database on top of RBD is going to do its own read caching and will be
> issuing lots of flush calls to the block device, potentially negating
> the need for a large cache.

Sure, reducing flushes, at the cost of a degraded level of reliability,
seems to be one important key to improved performance.

>>
>> Our typical workload is originated over 70 percent in database write
>> operations in the virtual machines.
>> Therefore collecting write operations with rbd cache and writing them in
>> chunks to ceph might be a good thing.
>> A higher limit for "rbd cache max dirty" might be a adequate here.
>> At the other side our read workload typically reads huge files in sequential
>> manner.
>>
>> Therefore it might be useful to do start with a configuration like that:
>>
>> rbd cache size = 64MB
>> rbd cache max dirty = 48MB
>> rbd cache target dirty = 32MB
>> rbd cache max dirty age = 10
>>
>> What is the strategy of librbd to write data to the storage from rbd_cache
>> if "rbd cache max dirty = 48MB" is reached?
>> Is there a reduction of io operations (merging of ios) compared to the
>> granularity of writes of my virtual machines?
> If the cache is full, incoming IO will be stalled as the dirty bits
> are written back to the backing RBD image to make room available for
> the new IO request.
Sure, I will have a look at the statistics and the throughput.
Is there any consolidation of write requests in the RBD cache?

Example:
If a VM writes small IO requests to the nbd device which belong to the
same RADOS object, does librbd consolidate these requests into a single
Ceph IO?
What strategies does librbd use for that?

Regards
Marc


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RBD Cache and rbd-nbd
       [not found]                 ` <74367bd7-d96f-6dfd-427a-ff74662b90bc-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
@ 2018-05-14 12:33                   ` Jason Dillaman
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Dillaman @ 2018-05-14 12:33 UTC (permalink / raw)
  To: Marc Schöchlin; +Cc: ceph-devel, ceph-users

On Mon, May 14, 2018 at 12:15 AM, Marc Schöchlin <ms@256bit.org> wrote:
> Hello Jason,
>
> many thanks for your informative response!
>
> Am 11.05.2018 um 17:02 schrieb Jason Dillaman:
>> I cannot speak for Xen, but in general IO to a block device will hit
>> the pagecache unless the IO operation is flagged as direct (e.g.
>> O_DIRECT) to bypass the pagecache and directly send it to the block
>> device.
> Sure, but it seems that xenserver just forwards io from virtual machines
> (vm: blkfront, dom-0: blkback) to the ndb device in dom-0.
>>> Sorry, my question was a bit unprecice: I was searching for usage statistics
>>> of the rbd cache.
>>> Is there also a possibility to gather rbd_cache usage statistics as a source
>>> of verification for optimizing the cache settings?
>> You can run "perf dump" instead of "config show" to dump out the
>> current performance counters. There are some stats from the in-memory
>> cache included in there.
> Great, i was not aware of that.
> There are really a lot of statistics which might be useful for analyzing
> whats going on or if the optimizations improve the performance of our
> systems.
>>> Can you provide some hints how to about adequate cache settings for a write
>>> intensive environment (70% write, 30% read)?
>>> Is it a good idea to specify a huge rbd cache of 1 GB with a max dirty age
>>> of 10 seconds?
>> Depends on your workload and your testing results. I suspect a
>> database on top of RBD is going to do its own read caching and will be
>> issuing lots of flush calls to the block device, potentially negating
>> the need for a large cache.
>
> Sure, reducing flushes with the acceptance of a degraded level of
> reliability seems to be one import key for improved performance.
>
>>>
>>> Our typical workload is originated over 70 percent in database write
>>> operations in the virtual machines.
>>> Therefore collecting write operations with rbd cache and writing them in
>>> chunks to ceph might be a good thing.
>>> A higher limit for "rbd cache max dirty" might be a adequate here.
>>> At the other side our read workload typically reads huge files in sequential
>>> manner.
>>>
>>> Therefore it might be useful to do start with a configuration like that:
>>>
>>> rbd cache size = 64MB
>>> rbd cache max dirty = 48MB
>>> rbd cache target dirty = 32MB
>>> rbd cache max dirty age = 10
>>>
>>> What is the strategy of librbd to write data to the storage from rbd_cache
>>> if "rbd cache max dirty = 48MB" is reached?
>>> Is there a reduction of io operations (merging of ios) compared to the
>>> granularity of writes of my virtual machines?
>> If the cache is full, incoming IO will be stalled as the dirty bits
>> are written back to the backing RBD image to make room available for
>> the new IO request.
> Sure, i will have a look at the statistics and the throughput.
> Is there any consolidation of write requests in rbd cache?
>
> Example:
> If a vm writes small io-requests to the ndb device with belong to the
> same rados object - does librbd consollidate these requests to  a single
> ceph io?
> What strategies does librd use for that?

The librbd cache will consolidate sequential dirty extents within the
same object, but it does not consolidate all dirty extents within the
same object into a single write request.

> Regards
> Marc
>



-- 
Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-05-14 12:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-10 19:03 RBD Cache and rbd-nbd Marc Schöchlin
     [not found] ` <f57c4834-517e-0c0e-7496-831689327bac-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
2018-05-10 19:18   ` Jason Dillaman
     [not found]     ` <CA+aFP1BYyE-opsc8eLH79A8EK-2N0cv_qjah788dKdOYLx7ZYg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-05-11  7:59       ` Marc Schöchlin
     [not found]         ` <de7be619-2da0-7f4f-08cf-aafb80bffeb9-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
2018-05-11 15:02           ` Jason Dillaman
     [not found]             ` <CA+aFP1Dn-yVRwTt7Wfj2HX+56FUPzJwYD5+us8ESLPvZ9EjHCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-05-14  7:15               ` Marc Schöchlin
     [not found]                 ` <74367bd7-d96f-6dfd-427a-ff74662b90bc-aJA5TdoZkU0dnm+yROfE0A@public.gmane.org>
2018-05-14 12:33                   ` Jason Dillaman
