* Re: wip-librbd-caching
2012-04-12 19:45 ` wip-librbd-caching Sage Weil
@ 2012-04-12 19:48 ` Damien Churchill
2012-04-12 19:54 ` wip-librbd-caching Tommi Virtanen
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Damien Churchill @ 2012-04-12 19:48 UTC (permalink / raw)
To: Sage Weil; +Cc: Martin Mailand, ceph-devel, Josh Durgin
On 12 April 2012 20:45, Sage Weil <sage@newdream.net> wrote:
> I'm not familiar with the performance implications of KSM, but the
> objectcacher doesn't modify existing buffers in place, so I suspect it's a
> good candidate. And it looks like there's minimal effort in enabling
> it...
It uses some CPU when calculating hashes, although I believe that if it
becomes too resource-hungry you can disable it and keep using the pages
it has already merged; the kernel just stops updating or checking for
any other pages that could be shared.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: wip-librbd-caching
2012-04-12 19:45 ` wip-librbd-caching Sage Weil
2012-04-12 19:48 ` wip-librbd-caching Damien Churchill
@ 2012-04-12 19:54 ` Tommi Virtanen
2012-04-12 20:20 ` wip-librbd-caching Sage Weil
2012-04-12 19:55 ` wip-librbd-caching Greg Farnum
2012-04-18 12:50 ` wip-librbd-caching Martin Mailand
3 siblings, 1 reply; 9+ messages in thread
From: Tommi Virtanen @ 2012-04-12 19:54 UTC (permalink / raw)
To: Sage Weil; +Cc: Martin Mailand, ceph-devel, Josh Durgin
On Thu, Apr 12, 2012 at 12:45, Sage Weil <sage@newdream.net> wrote:
>> So maybe we could reduce the memory footprint of the cache, but keep its
>> performance.
>
> I'm not familiar with the performance implications of KSM, but the
> objectcacher doesn't modify existing buffers in place, so I suspect it's a
> good candidate. And it looks like there's minimal effort in enabling
> it...
Are the objectcacher cache entries full pages, page aligned, with no
bookkeeping data inside the page? Those are pretty much the
requirements for page-granularity dedup to work..
* Re: wip-librbd-caching
2012-04-12 19:54 ` wip-librbd-caching Tommi Virtanen
@ 2012-04-12 20:20 ` Sage Weil
0 siblings, 0 replies; 9+ messages in thread
From: Sage Weil @ 2012-04-12 20:20 UTC (permalink / raw)
To: Tommi Virtanen; +Cc: Martin Mailand, ceph-devel, Josh Durgin
On Thu, 12 Apr 2012, Tommi Virtanen wrote:
> On Thu, Apr 12, 2012 at 12:45, Sage Weil <sage@newdream.net> wrote:
> >> So maybe we could reduce the memory footprint of the cache, but keep its
> >> performance.
> >
> > I'm not familiar with the performance implications of KSM, but the
> > objectcacher doesn't modify existing buffers in place, so I suspect it's a
> > good candidate. And it looks like there's minimal effort in enabling
> > it...
>
> Are the objectcacher cache entries full pages, page aligned, with no
> bookkeeping data inside the page? Those are pretty much the
> requirements for page-granularity dedup to work..
Some buffers are, some aren't, but we'd only want to madvise on page
aligned ones. The messenger is careful to read things into aligned
memory, and librbd will only be getting block-sized (probably page-sized,
if we say we have 4k blocks) IO... so that should include every buffer in
this case.
sage
* Re: wip-librbd-caching
2012-04-12 19:45 ` wip-librbd-caching Sage Weil
2012-04-12 19:48 ` wip-librbd-caching Damien Churchill
2012-04-12 19:54 ` wip-librbd-caching Tommi Virtanen
@ 2012-04-12 19:55 ` Greg Farnum
2012-04-18 12:50 ` wip-librbd-caching Martin Mailand
3 siblings, 0 replies; 9+ messages in thread
From: Greg Farnum @ 2012-04-12 19:55 UTC (permalink / raw)
To: Sage Weil; +Cc: Martin Mailand, ceph-devel, Josh Durgin
On Thursday, April 12, 2012 at 12:45 PM, Sage Weil wrote:
> On Thu, 12 Apr 2012, Martin Mailand wrote:
> > The other point is, that the cache is not KSM enabled, therefore identical
> > pages will not be merged, could that be changed, what would be the downside?
> >
> > So maybe we could reduce the memory footprint of the cache, but keep its
> > performance.
>
>
>
> I'm not familiar with the performance implications of KSM, but the
> objectcacher doesn't modify existing buffers in place, so I suspect it's a
> good candidate. And it looks like there's minimal effort in enabling
> it...
But if you're supposed to advise the kernel that the memory is a good candidate, then we probably shouldn't be making that madvise call on every buffer (I imagine it's doing a SHA-1 on each page and then examining a tree), especially since we (probably) flush all that data out relatively quickly. And RBD doesn't currently have any information about whether the data is OS or user data… (I guess in the future, with layering, we could call madvise on pages which were read from an underlying gold image.)
Also, TV is wondering whether the data is even page-aligned; I can't recall off-hand.
-Greg
* Re: wip-librbd-caching
2012-04-12 19:45 ` wip-librbd-caching Sage Weil
` (2 preceding siblings ...)
2012-04-12 19:55 ` wip-librbd-caching Greg Farnum
@ 2012-04-18 12:50 ` Martin Mailand
2012-04-18 16:27 ` wip-librbd-caching Greg Farnum
2012-04-18 17:44 ` wip-librbd-caching Sage Weil
3 siblings, 2 replies; 9+ messages in thread
From: Martin Mailand @ 2012-04-18 12:50 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel, Josh Durgin
Am 12.04.2012 21:45, schrieb Sage Weil:
> The config options you'll want to look at are client_oc_* (in case you
> didn't see that already :). "oc" is short for objectcacher, and it isn't
> only used for client (libcephfs), so it might be worth renaming these
> options before people start using them.
Hi,
I changed the values and the performance is still very good and the
memory footprint is much smaller.
OPTION(client_oc_size, OPT_INT, 1024*1024*50)        // MB * n
OPTION(client_oc_max_dirty, OPT_INT, 1024*1024*25)   // MB * n (dirty OR tx.. bigish)
OPTION(client_oc_target_dirty, OPT_INT, 1024*1024*8) // target dirty (keep this smallish)
// note: the max amount of "in flight" dirty data is roughly (max - target)
But I am not quite sure about the meaning of the values.
client_oc_size Max size of the cache?
client_oc_max_dirty max dirty value before the writeback starts?
client_oc_target_dirty ???
-martin
* Re: wip-librbd-caching
2012-04-18 12:50 ` wip-librbd-caching Martin Mailand
@ 2012-04-18 16:27 ` Greg Farnum
2012-04-18 17:44 ` wip-librbd-caching Sage Weil
1 sibling, 0 replies; 9+ messages in thread
From: Greg Farnum @ 2012-04-18 16:27 UTC (permalink / raw)
To: Martin Mailand; +Cc: Sage Weil, ceph-devel, Josh Durgin
On Wednesday, April 18, 2012 at 5:50 AM, Martin Mailand wrote:
> Hi,
>
> I changed the values and the performance is still very good and the
> memory footprint is much smaller.
>
> OPTION(client_oc_size, OPT_INT, 1024*1024*50)        // MB * n
> OPTION(client_oc_max_dirty, OPT_INT, 1024*1024*25)   // MB * n (dirty OR tx.. bigish)
> OPTION(client_oc_target_dirty, OPT_INT, 1024*1024*8) // target dirty (keep this smallish)
> // note: the max amount of "in flight" dirty data is roughly (max - target)
>
> But I am not quite sure about the meaning of the values.
> client_oc_size Max size of the cache?
> client_oc_max_dirty max dirty value before the writeback starts?
> client_oc_target_dirty ???
>
Right now the cache writeout algorithms are based on amount of dirty data, rather than something like how long the data has been dirty.
client_oc_size is the max (and therefore typical) size of the cache.
client_oc_max_dirty is the largest amount of dirty data allowed in the cache: if this much is already dirty and you try to dirty more, the write will block until some of the existing dirty data has been committed.
client_oc_target_dirty is the amount of dirty data that will trigger the cache to start flushing data out.
* Re: wip-librbd-caching
2012-04-18 12:50 ` wip-librbd-caching Martin Mailand
2012-04-18 16:27 ` wip-librbd-caching Greg Farnum
@ 2012-04-18 17:44 ` Sage Weil
1 sibling, 0 replies; 9+ messages in thread
From: Sage Weil @ 2012-04-18 17:44 UTC (permalink / raw)
To: Martin Mailand; +Cc: ceph-devel, Josh Durgin
On Wed, 18 Apr 2012, Martin Mailand wrote:
> Am 12.04.2012 21:45, schrieb Sage Weil:
> > The config options you'll want to look at are client_oc_* (in case you
> > didn't see that already :). "oc" is short for objectcacher, and it isn't
> > only used for client (libcephfs), so it might be worth renaming these
> > options before people start using them.
>
> Hi,
>
> I changed the values and the performance is still very good and the memory
> footprint is much smaller.
>
> OPTION(client_oc_size, OPT_INT, 1024*1024*50)        // MB * n
> OPTION(client_oc_max_dirty, OPT_INT, 1024*1024*25)   // MB * n (dirty OR tx.. bigish)
> OPTION(client_oc_target_dirty, OPT_INT, 1024*1024*8) // target dirty (keep this smallish)
> // note: the max amount of "in flight" dirty data is roughly (max - target)
>
> But I am not quite sure about the meaning of the values.
> client_oc_size Max size of the cache?
yes
> client_oc_max_dirty max dirty value before the writeback starts?
before writes block and wait for writeback to bring the dirty level down
> client_oc_target_dirty ???
before writeback starts
BTW I renamed 'rbd cache enabled' -> 'rbd cache'. I'd like to rename the
objectcacher settings too so they aren't nested under client_ (which is
the fs client code).
objectcacher_*?
sage