* Re: Refactor DBObjectMap Proposal
From: Sage Weil @ 2013-12-12 17:01 UTC
To: Haomai Wang; +Cc: ceph-devel

On Thu, 12 Dec 2013, Haomai Wang wrote:
> On Thu, Dec 12, 2013 at 1:26 PM, Sage Weil <sage@inktank.com> wrote:
> > [adding cc ceph-devel]

[attempt 2]

> > On Wed, 11 Dec 2013, Haomai Wang wrote:
> >> Hi Sage,
> >>
> >> At the last CDS you listed the following jobs:
> >>
> >> ============================
> >> 2. DBObjectMap: refactor interface
> >>    1. expose underlying KeyValueDB transactions to the caller, so
> >>       they can bundle several DBObjectMap ops together (and capture
> >>       an entire ObjectStore::Transaction's worth of work)
> >>    2. expose the user prefixes in a generic way, instead of
> >>       hard-coding in the omap, xattr, and various internal namespaces
> >>
> >> 3. stripe file data over keys
> >>    1. Build a class that will implement a file data interface (read
> >>       extent, write extent, truncate, zero, etc.) on top of DBObjectMap
> >>    2. stripe data over keys of size X (e.g., 1MB, which seems to be
> >>       the limit people are converging around)
> >>    3. store file size information in a metadata key. maybe this can
> >>       be DBObjectMap::Header; maybe not
> >>    4. contemplate future optimizations that put small objects
> >>       "inline" in the Header (or equivalent) key
> >> ============================
> >>
> >> I'm interested in implementing this, and I don't know whether you or
> >> others have started on it. Here is my idea.
> >
> > Nobody is working on this just yet, although there is a lot of interest in
> > this area so your timing is very good!
> >
> >> Following your comments, I am thinking about implementing the striping
> >> of file data over keys in a KeyValueStore class. Add a field called
> >> "userdata" to DBObjectMap::Header, which is interpreted by the caller,
> >> such as KeyValueStore. Of course, we need to add CRUD interfaces for
> >> the "userdata" field. KeyValueStore will then use "userdata" to
> >> manage the striping layer. Maybe a metadata table to map
> >> offset->key_name.
> >
> > Yes. My original thought was to make the DBObjectMap type fields a bit
> > more general (instead of the hard-coded #defines), but I don't think it
> > matters too much.
> >
> > For the metadata table, yes eventually.. but I would keep it simple for
> > the first pass and iterate from there.
> >
> >> Although DBObjectMap already implements a clone operation on
> >> "USER_PREFIX" keys, I really don't like operations like lookup_parent,
> >> which cause a dependent lookup chain and degrade performance, just
> >> like librbd. I suspect that using the current DBObjectMap methods to
> >> manage cloned objects may cause performance problems. So DBObjectMap
> >> needs to expose pure KeyValueDB interfaces, called by KeyValueStore,
> >> to store the striped keys, which are controlled by the metadata table
> >> mentioned above. Others, such as the xattr and omap namespaces, will
> >> be left intact. The clone operation will be implemented via
> >> DBObjectMap::clone; the actual object data won't be changed and only
> >> the metadata table referenced by "userdata" will be copied. Any write
> >> operation will be redirected to a new key. In other words, it may look
> >> like what librbd does, but here we implement it as ROW
> >> (redirect-on-write), not COW (copy-on-write).
> >>
> >> The reasons for this design are:
> >> 1. Move more of the work into KeyValueStore rather than DBObjectMap;
> >>    DBObjectMap is used by FileStore, which limits big changes
> >
> > Yes; we need to be a bit careful here. I'm hoping the main changes though
> > are really just moving the transaction create and submit boilerplate in
> > each method into the FileStore callers?
>
> In my mind, I don't want to change the calling code such as FileStore.
> It works well now. ;-)

True. We can also just make a second layer of methods (_foo() instead of
foo() or something) that take the transaction as an argument.

Or just fork DBObjectMap entirely so that we don't need to worry about
breaking FileStore ondisk compatibility; we will likely want/need to do
something like that eventually anyway!

sage

> >> 2. Reading/writing objects is a much more frequent operation than
> >>    omap or xattr operations, so we need a more specialized handler,
> >>    now or in the future, to optimize it.
> >> 3. Different kv backends may have different features, just like
> >>    FileSystemBackend; we would like to deal with these in
> >>    KeyValueStore, not in DBObjectMap or an upper class.
> >> 4. DBObjectMap is a little replicated and may not be suitable for
> >>    doing more things.
> >
> > I'm not fully following this description, but it sounds like you're
> > thinking about the right issues. A few comments:
> >
> > - In the ideal case, we'd like to minimize the number of lookups/keys we
> >   query to access an object. This is a bit less important for objects that
> >   are cloned (they tend to be snapshots... mostly).
> >
> > - I think it makes sense to make the main header key for an object be able
> >   to embed various bits of useful data, like
> >
> >   - all of the xattrs, if there aren't many of them
> >   - the file size
> >   - the file content, if it is small
> >
> >   No need for this in the initial implementation, but we should design
> >   something that can accommodate it.
> >
> > - It would be nice to capture the striping CRUD stuff in a separate class;
> >   a child of DBObjectMap or something similar. This will make it easy to
> >   swap out and/or experiment with different approaches.
> >
> >> So in this proposal, DBObjectMap will serve as a bridge in front
> >> of KeyValueDB. KeyValueStore mainly uses the DBObjectMap API to store
> >> striped objects and DBObjectMap::Header to store metadata. If so, my
> >> previous implementation can be fully reused. :-)
> >
> > That's great news! Let me know if there is anything we can do to help
> > here.
> >
> > sage
>
> Thanks for your comments!
>
> --
> Best Regards,
>
> Wheat
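
Item 3 of the proposal quoted above (stripe data over keys of size X, with
a read-extent/write-extent interface) comes down to splitting a byte range
into per-stripe pieces. The following is only a minimal sketch of that
arithmetic; StripeExtent and map_extent are hypothetical names used for
illustration and are not part of DBObjectMap or any Ceph API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One piece of a logical extent that falls inside a single stripe.
// (Hypothetical type; a real store would also derive a key name from stripe_no.)
struct StripeExtent {
  uint64_t stripe_no;  // index of the stripe that holds this piece
  uint64_t offset;     // byte offset inside that stripe's value
  uint64_t length;     // number of bytes touched in that stripe
};

// Split the logical byte range [offset, offset + length) into per-stripe
// pieces. stripe_size plays the role of "size X" from the proposal.
std::vector<StripeExtent> map_extent(uint64_t offset, uint64_t length,
                                     uint64_t stripe_size) {
  assert(stripe_size > 0);
  std::vector<StripeExtent> out;
  while (length > 0) {
    uint64_t stripe_no = offset / stripe_size;
    uint64_t in_stripe = offset % stripe_size;
    // Never cross a stripe boundary within one piece.
    uint64_t n = std::min(length, stripe_size - in_stripe);
    out.push_back({stripe_no, in_stripe, n});
    offset += n;
    length -= n;
  }
  return out;
}
```

With a 1 MB stripe size, a 1 MB write starting at offset 1.5 MB would touch
the second half of stripe 1 and the first half of stripe 2, so a write
helper built on this would have to read-modify-write two keys.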
* Re: Refactor DBObjectMap Proposal
From: Haomai Wang @ 2013-12-21 14:33 UTC
To: Sage Weil; +Cc: ceph-devel

On Dec 13, 2013, at 1:01 AM, Sage Weil <sage@inktank.com> wrote:

> On Thu, 12 Dec 2013, Haomai Wang wrote:
>> In my mind, I don't want to change the calling code such as FileStore.
>> It works well now. ;-)
>
> True. We can also just make a second layer of methods (_foo() instead of
> foo() or something) that take the transaction as an argument.
>
> Or just fork DBObjectMap entirely so that we don't need to worry about
> breaking FileStore ondisk compatibility; we will likely want/need to do
> something like that eventually anyway!

I'm confused by the "_remove" interface in FileStore, which doesn't seem
to remove the omap keys of the corresponding object. I dumped the
transaction generated by "rados rm object -p data", and it contains no
delete operations on omap keys.

So I wonder: is it correct that we don't remove the omap keys? I notice
that MemStore does an omap erase operation:
  c->object_map.erase(oid);
  c->object_hash.erase(oid);

Best regards,
Wheats
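
The "second layer of methods" that Sage suggests in the text quoted above
could look roughly like the sketch below: the public method keeps its
signature and builds its own transaction, while the underscore variant only
records work in a transaction owned by the caller. Everything here (Txn,
KVStoreSketch, ObjectMapSketch, the "oid/key" naming) is a hypothetical
stand-in chosen for illustration, not the real DBObjectMap or KeyValueDB
interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a key/value transaction: a batch of pending ops.
struct Txn {
  std::vector<std::pair<std::string, std::string>> sets;  // key -> value writes
  std::vector<std::string> removes;                        // keys to delete
};

// Hypothetical stand-in for the key/value backend.
struct KVStoreSketch {
  std::map<std::string, std::string> data;
  int submits = 0;  // number of commits, to make the bundling visible
  void submit(const Txn& t) {
    for (const auto& kv : t.sets) data[kv.first] = kv.second;
    for (const auto& k : t.removes) data.erase(k);
    ++submits;
  }
};

class ObjectMapSketch {
 public:
  explicit ObjectMapSketch(KVStoreSketch& db) : db_(db) {}

  // Existing-style entry point: creates and submits its own transaction,
  // so a FileStore-like caller would not need to change.
  void set_key(const std::string& oid, const std::string& k,
               const std::string& v) {
    Txn t;
    _set_key(oid, k, v, t);
    db_.submit(t);
  }

  // Second layer: only appends to the caller's transaction.
  void _set_key(const std::string& oid, const std::string& k,
                const std::string& v, Txn& t) const {
    assert(!oid.empty());
    t.sets.emplace_back(oid + "/" + k, v);
  }
  void _rm_key(const std::string& oid, const std::string& k, Txn& t) const {
    assert(!oid.empty());
    t.removes.push_back(oid + "/" + k);
  }

  // The caller decides when the bundled work is committed.
  void submit(const Txn& t) { db_.submit(t); }

 private:
  KVStoreSketch& db_;
};
```

A KeyValueStore-like caller could then call _set_key() and _rm_key()
several times on one Txn and submit once, which is the bundling that item
2.1 of the original job list asks for.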
* Re: Refactor DBObjectMap Proposal
From: Sage Weil @ 2013-12-22 5:20 UTC
To: Haomai Wang; +Cc: ceph-devel

On Sat, 21 Dec 2013, Haomai Wang wrote:
> I'm confused by the "_remove" interface in FileStore, which doesn't seem
> to remove the omap keys of the corresponding object. I dumped the
> transaction generated by "rados rm object -p data", and it contains no
> delete operations on omap keys.
>
> So I wonder: is it correct that we don't remove the omap keys? I notice
> that MemStore does an omap erase operation:
>   c->object_map.erase(oid);
>   c->object_hash.erase(oid);

FileStore::_remove() calls lfn_unlink(), which calls
object_map->clear(...) (if nlink == 0).

I think that's what you're looking for?

sage
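
The call chain described in the reply above (remove, then unlink, then
clearing the omap data only once the link count reaches zero) can be
modelled with a small in-memory sketch. StoreSketch and its members are
hypothetical and only mirror the behaviour as stated in the thread; this is
not the FileStore code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory model of the behaviour described in the thread.
struct StoreSketch {
  // object id -> number of links to its backing file
  std::map<std::string, int> nlink;
  // object id -> omap key/value pairs
  std::map<std::string, std::map<std::string, std::string>> omap;

  // Plays the role described for lfn_unlink(): drop one link, and clear
  // the object's omap data only when no links remain.
  void unlink(const std::string& oid) {
    auto it = nlink.find(oid);
    assert(it != nlink.end() && it->second > 0);
    if (--it->second == 0) {
      nlink.erase(it);
      omap.erase(oid);  // stands in for object_map->clear(...)
    }
  }

  // Plays the role described for _remove(): it issues no omap delete of
  // its own and relies on unlink() to do the cleanup.
  void remove(const std::string& oid) { unlink(oid); }
};
```

In this model the omap cleanup is a side effect of the last unlink rather
than a separate delete operation, which would be consistent with a dumped
transaction that shows no explicit omap deletes.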
* Re: Refactor DBObjectMap Proposal
From: Haomai Wang @ 2013-12-22 6:02 UTC
To: Sage Weil; +Cc: ceph-devel

On Dec 22, 2013, at 1:20 PM, Sage Weil <sage@inktank.com> wrote:

> FileStore::_remove() calls lfn_unlink(), which calls
> object_map->clear(...) (if nlink == 0).
>
> I think that's what you're looking for?

Oh, it seems I missed that previously. Thank you.

Best regards,
Wheats
* Re: Refactor DBObjectMap Proposal
From: Haomai Wang @ 2013-12-22 9:44 UTC
To: Sage Weil; +Cc: ceph-devel

On Dec 22, 2013, at 2:02 PM, Haomai Wang <haomaiwang@gmail.com> wrote:

>>>>> On Thu, Dec 12, 2013 at 1:26 PM, Sage Weil <sage@inktank.com> wrote:
>>>>>> On Wed, 11 Dec 2013, Haomai Wang wrote:
>>>>>>> So in this proposal, DBObjectMap will serve as a bridge in front
>>>>>>> of KeyValueDB. KeyValueStore mainly uses the DBObjectMap API to store
>>>>>>> striped objects and DBObjectMap::Header to store metadata. If so, my
>>>>>>> previous implementation can be fully reused. :-)
>>>>>>
>>>>>> That's great news! Let me know if there is anything we can do to help
>>>>>> here.

Another problem:

The DBObjectMap API only accepts ghobject_t and has no information about
which collection an object belongs to. So with the existing DBObjectMap
API, it can't support ObjectStore APIs such as "collection_list" and
"collection_empty".

If we add a "coll_t" argument to the DBObjectMap API and a new
obj_name->key function "DBObjectMap::ghobject_key_v1", it looks like most
of the API would need to be rewritten, which I don't want.

Another way is to add a collection->objects mapping, which may increase
the number of operations in each transaction.

Best regards,
Wheats
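
To make the first option in the message above more concrete: if the
collection were part of the key, "collection_list" and "collection_empty"
could be answered with a prefix scan over an ordered key space. This is
only a sketch under that assumption; the "<collection>\0<object>" layout,
make_key, and the std::map used as the store are all hypothetical and are
not how DBObjectMap builds its keys.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Ordered map standing in for a sorted key/value database (hypothetical).
using KV = std::map<std::string, std::string>;

// Hypothetical key layout: "<collection>\0<object name>".
// Assumes collection names never contain a NUL byte, so that "c1" is
// never mistaken for a prefix of "c10".
std::string make_key(const std::string& coll, const std::string& oid) {
  assert(coll.find('\0') == std::string::npos);
  std::string k = coll;
  k.push_back('\0');
  k += oid;
  return k;
}

// List the object names in one collection by scanning its key prefix.
std::vector<std::string> collection_list(const KV& db, const std::string& coll) {
  const std::string prefix = make_key(coll, "");
  std::vector<std::string> out;
  for (auto it = db.lower_bound(prefix);
       it != db.end() && it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    out.push_back(it->first.substr(prefix.size()));
  }
  return out;
}

// A collection is empty if no key carries its prefix.
bool collection_empty(const KV& db, const std::string& coll) {
  const std::string prefix = make_key(coll, "");
  auto it = db.lower_bound(prefix);
  return it == db.end() || it->first.compare(0, prefix.size(), prefix) != 0;
}
```

The second option from the message, a separate collection->objects
mapping, would avoid changing the key function but would add at least one
extra key update to every transaction that creates or removes an object.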