From: Haomai Wang <haomaiwang@gmail.com>
To: Sage Weil <sage@inktank.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Refactor DBObjectMap Proposal
Date: Sun, 22 Dec 2013 14:02:45 +0800
Message-ID: <6F5C0608-09BB-4370-9599-FC3BCFDE46B5@gmail.com>
In-Reply-To: <alpine.DEB.2.00.1312212119140.2168@cobra.newdream.net>


On Dec 22, 2013, at 1:20 PM, Sage Weil <sage@inktank.com> wrote:

> On Sat, 21 Dec 2013, Haomai Wang wrote:
>> On Dec 13, 2013, at 1:01 AM, Sage Weil <sage@inktank.com> wrote:
>> 
>>> On Thu, 12 Dec 2013, Haomai Wang wrote:
>>>> On Thu, Dec 12, 2013 at 1:26 PM, Sage Weil <sage@inktank.com> wrote:
>>>>> [adding cc ceph-devel]
>>> 
>>> [attempt 2]
>>> 
>>>>> 
>>>>> On Wed, 11 Dec 2013, Haomai Wang wrote:
>>>>>> Hi Sage,
>>>>>> 
>>>>>> At the last CDS, you pointed out the jobs below:
>>>>>> 
>>>>>> ============================
>>>>>> 2. DBObjectMap: refactor interface
>>>>>>   1. expose underlying KeyValueDB transactions to caller, so they
>>>>>> can bundle several DBObjectMap ops together and capture an entire
>>>>>> ObjectStore::Transaction's worth of work
>>>>>>   2. expose the user prefixes in a generic way, instead of
>>>>>> hard-coding in the omap, xattr, and various internal namespaces
>>>>>> 
>>>>>> 3. stripe file data over keys
>>>>>>   1. Build a class that will implement a file data interface (read
>>>>>> extent, write extent, truncate, zero, etc.) on top of DBObjectMap
>>>>>>   2. stripe data over keys of size X (e.g., 1MB, which seems to be
>>>>>> the limit people are converging around)
>>>>>>   3. store file size information in a metadata key.  maybe this can
>>>>>> be DBObjectMap::Header; maybe not
>>>>>>   4. contemplate future optimizations that put small objects
>>>>>> "inline" in the Header (or equivalent) key
>>>>>> ============================
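
A minimal sketch of the extent-to-key mapping that item 3 above implies,
assuming a fixed stripe size and a hypothetical helper named map_extent()
(neither the name nor the struct comes from the Ceph tree). It only shows
the arithmetic for splitting a read or write extent into per-stripe
pieces; key naming and the metadata key are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One piece of a logical extent that falls inside a single stripe key.
// Hypothetical type, for illustration only.
struct StripeExtent {
  uint64_t stripe_no;  // index of the stripe; a real store would encode this in the key name
  uint64_t offset;     // byte offset inside that stripe
  uint64_t length;     // number of bytes of the extent that live in that stripe
};

// Split the logical extent [off, off + len) into per-stripe pieces.
std::vector<StripeExtent> map_extent(uint64_t off, uint64_t len,
                                     uint64_t stripe_size) {
  assert(stripe_size > 0);
  std::vector<StripeExtent> out;
  while (len > 0) {
    uint64_t stripe_no = off / stripe_size;
    uint64_t in_off = off % stripe_size;
    // Never cross a stripe boundary within one piece.
    uint64_t n = std::min(len, stripe_size - in_off);
    out.push_back({stripe_no, in_off, n});
    off += n;
    len -= n;
  }
  return out;
}
```

With a 1MB stripe size, an extent that starts 100 bytes before the end of
stripe 0 and is 300 bytes long comes back as two pieces: 100 bytes at the
tail of stripe 0 and 200 bytes at the head of stripe 1.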
>>>>>> 
>>>>>> I'm interested in implementing this, and I don't know whether you or
>>>>>> others have started on it. Let me describe my idea.
>>>>> 
>>>>> Nobody is working on this just yet, although there is a lot of interest in
>>>>> this area so your timing is very good!
>>>>> 
>>>>>> Following your comments, I am thinking about implementing the striping
>>>>>> of file data over keys in a KeyValueStore class. Add a field called
>>>>>> "userdata" to DBObjectMap::Header, which is interpreted by the caller,
>>>>>> such as KeyValueStore. Of course, we need to add CRUD interfaces for
>>>>>> the "userdata" field. KeyValueStore will then use "userdata" to manage
>>>>>> the striping layer, maybe with a metadata table mapping
>>>>>> offset->key_name.
>>>>> 
>>>>> Yes.  My original thought is to make the DBObjectMap type fields a bit
>>>>> more general (instead of the hard-coded #defines), but I don't think it
>>>>> matters too much.
>>>>> 
>>>>> For the metadata table, yes eventually.. but I would keep it simple for
>>>>> the first pass and iterate from there.
>>>>> 
>>>>>> Although DBObjectMap already implements a clone operation on
>>>>>> "USER_PREFIX" keys, I really don't like operations such as
>>>>>> lookup_parent, which create a chain of dependent lookups and degrade
>>>>>> performance, just as in librbd. I suspect that using the current
>>>>>> DBObjectMap methods to manage cloned objects would cause performance
>>>>>> problems. So DBObjectMap needs to expose plain KeyValueDB interfaces
>>>>>> that KeyValueStore can call to store striped keys, which are tracked
>>>>>> by the metadata table mentioned above. Other namespaces, such as
>>>>>> xattr and omap, will be left intact. The clone operation will be
>>>>>> implemented via DBObjectMap::clone: the actual object data won't be
>>>>>> changed, and only the metadata table referenced by "userdata" will be
>>>>>> copied. Any subsequent write will be redirected to a new key. In
>>>>>> other words, it looks like what librbd does, but implemented as
>>>>>> redirect-on-write (ROW) rather than copy-on-write (COW).
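
A toy sketch of the redirect-on-write idea described above, under stated
assumptions: ToyRowStore, ToyObject, and the "data.N" key names are
hypothetical and are not DBObjectMap or KeyValueStore code. Clone copies
only the stripe->key table; every write goes to a freshly allocated key,
so the clone keeps seeing the old data. Reclaiming keys that are no
longer referenced is deliberately left out.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Per-object metadata table: stripe number -> name of the key holding the data.
struct ToyObject {
  std::map<uint64_t, std::string> table;
};

class ToyRowStore {
 public:
  // Redirect-on-write: never overwrite an existing data key, always
  // allocate a new key and repoint this object's table entry at it.
  void write(const std::string& oid, uint64_t stripe, const std::string& data) {
    std::string key = "data." + std::to_string(next_id_++);
    kv_[key] = data;
    objects_[oid].table[stripe] = key;
  }

  // Clone copies only the metadata table; no object data is copied.
  void clone(const std::string& src, const std::string& dst) {
    objects_[dst] = objects_[src];
  }

  // Returns an empty string for a missing object or stripe.
  std::string read(const std::string& oid, uint64_t stripe) const {
    auto o = objects_.find(oid);
    if (o == objects_.end()) return "";
    auto t = o->second.table.find(stripe);
    if (t == o->second.table.end()) return "";
    return kv_.at(t->second);
  }

 private:
  std::map<std::string, std::string> kv_;        // stand-in for the key/value backend
  std::map<std::string, ToyObject> objects_;     // object id -> metadata table
  uint64_t next_id_ = 0;                         // source of fresh key names
};
```

Reads on either object need a single table lookup plus one key fetch,
with no parent chain to walk, which is the property the paragraph above
is after.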
>>>>>> 
>>>>>> The reasons for this design are:
>>>>>> 1. Move more of the work into KeyValueStore rather than DBObjectMap;
>>>>>> DBObjectMap is used by FileStore, which limits how much we can change it
>>>>> 
>>>>> Yes; we need to be a bit careful here.  I'm hoping the main changes though
>>>>> are really just moving the transaction create and submit boilerplate in
>>>>> each method into the FileStore callers?
>>>> 
>>>> In my mind, I don't want to change the calling code such as FileStore.
>>>> It works well now. ;-)
>>> 
>>> True.  We can also just make a second layer of methods (_foo() instead of 
>>> foo() or something) that take the transaction as an argument.
>>> 
>>> Or just fork DBObjectMap entirely so that we don't need to worry about 
>>> breaking FileStore ondisk compatibility; we will likely want/need to do 
>>> something like that eventually anyway!
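
A rough sketch of the "second layer of methods" idea, using toy stand-in
types: ToyTransaction, ToyKVDB, and ToyObjectMap are hypothetical and are
not the real KeyValueDB or DBObjectMap classes. The existing-style method
keeps creating and submitting its own transaction, while the underscore
variants only record operations into a transaction the caller owns, so
several ops can be bundled into one submit.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Buffered operations, applied together on submit.
struct ToyTransaction {
  std::vector<std::pair<std::string, std::string>> sets;
  std::vector<std::string> removes;
};

class ToyKVDB {
 public:
  // Apply every buffered operation in one step.
  void submit(const ToyTransaction& t) {
    for (const auto& kv : t.sets) data_[kv.first] = kv.second;
    for (const auto& k : t.removes) data_.erase(k);
    ++submits_;
  }
  bool has(const std::string& k) const { return data_.count(k) != 0; }
  std::size_t size() const { return data_.size(); }
  int submits() const { return submits_; }

 private:
  std::map<std::string, std::string> data_;
  int submits_ = 0;
};

class ToyObjectMap {
 public:
  explicit ToyObjectMap(ToyKVDB& db) : db_(db) {}

  // Existing style: each call builds and submits its own transaction,
  // so current callers do not have to change.
  void set_key(const std::string& oid, const std::string& k,
               const std::string& v) {
    ToyTransaction t;
    _set_key(t, oid, k, v);
    db_.submit(t);
  }

  // Second layer: record the op in the caller's transaction, no submit.
  void _set_key(ToyTransaction& t, const std::string& oid,
                const std::string& k, const std::string& v) const {
    t.sets.emplace_back(oid + "/" + k, v);
  }
  void _rm_key(ToyTransaction& t, const std::string& oid,
               const std::string& k) const {
    t.removes.push_back(oid + "/" + k);
  }

 private:
  ToyKVDB& db_;
};
```

A caller that wants to capture a whole batch of work would create one
ToyTransaction, call the underscore methods as many times as needed, and
submit once at the end.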
>> 
>> I'm confused by the "_remove" interface in FileStore, which doesn't seem
>> to remove the omap keys belonging to the object. I dumped the transaction
>> that "rados rm object -p data" produces, and it contains no delete
>> operations on omap keys.
>> 
>> So I wonder whether it is correct that we don't remove the omap keys? I
>> notice that MemStore does an omap erase operation:
>>  c->object_map.erase(oid);
>>  c->object_hash.erase(oid);
> 
> FileStore::_remove() calls lfn_unlink(), which calls 
> object_map->clear(...) (if nlink == 0).
> 
> I think that's what you're looking for?

Oh, it seems I missed that previously. Thank you.
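
For reference, a toy model of the call chain described above, under
stated assumptions: ToyStore and its members are hypothetical and this is
not the FileStore code. It only illustrates that the omap entries are
cleared as a side effect of unlinking the last remaining link, rather
than by an explicit omap delete op in the transaction.

```cpp
#include <map>
#include <set>
#include <string>

struct ToyStore {
  std::map<std::string, int> nlink;                    // object id -> remaining links
  std::map<std::string, std::set<std::string>> omap;   // object id -> omap keys

  // Stand-in for the unlink step: drop one link, and clear the omap
  // only once no links are left.
  void unlink(const std::string& oid) {
    auto it = nlink.find(oid);
    if (it == nlink.end()) return;
    if (--it->second == 0) {
      omap.erase(oid);
      nlink.erase(it);
    }
  }

  // Stand-in for the remove step: it carries no omap op of its own.
  void remove(const std::string& oid) { unlink(oid); }
};
```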

> 
> sage
> 
> 
>> 
>>> 
>>> sage
>>> 
>>>>> 
>>>>>> 2. Reading and writing object data is much more frequent than omap or
>>>>>> xattr operations, so it needs a dedicated handler that we can
>>>>>> optimize now or in the future.
>>>>>> 3. Different kv backends may have different features, just like
>>>>>> FileSystemBackend; we would like to deal with these in KeyValueStore,
>>>>>> not in DBObjectMap or an upper class.
>>>>>> 4. DBObjectMap is a little complicated and may not be suitable for taking on more.
>>>>> 
>>>>> I'm not fully following this description, but it sounds like you're
>>>>> thinking about the right issues.  A few comments:
>>>>> 
>>>>> - In the ideal case, we'd like to minimize the number of lookups/keys we
>>>>> query to access an object.  This is a bit less important for objects that
>>>>> are cloned (they tend to be snapshots... mostly).
>>>>> 
>>>>> - I think it makes sense to make the main header key for an object be able
>>>>> to embed various bits of useful data, like
>>>>> 
>>>>> - all of the xattrs, if there aren't many of them
>>>>> - the file size
>>>>> - the file content, if it is small
>>>>> 
>>>>> No need for this in the initial implementation, but we should design
>>>>> something that can accommodate it.
>>>>> 
>>>>> - It would be nice to capture the striping CRUD stuff in a separate class;
>>>>> a child of DBObjectMap or something similar.  This will make it easy to
>>>>> swap out and/or experiment with different approaches.
>>>>> 
>>>>>> So in this proposal, DBObjectMap will serve as a bridge in front
>>>>>> of KeyValueDB. KeyValueStore mainly uses the DBObjectMap API to store
>>>>>> striped objects and DBObjectMap::Header to store metadata. If so, my
>>>>>> previous implementation can be fully reused. :-)
>>>>> 
>>>>> That's great news!  Let me know if there is anything we can do to help
>>>>> here.
>>>>> 
>>>>> sage
>>>> 
>>>> Thanks for your comments!
>>>> 
>>>> 
>>>> -- 
>>>> Best Regards,
>>>> 
>>>> Wheat
>> 
>> Best regards,
>> Wheats

Best regards,
Wheats



