* mempool update
From: Sage Weil @ 2016-10-10 19:05 UTC
  To: ceph-devel; +Cc: allen.samuels

Hi Allen, everyone,

I spent the end of last week and today messing around some more with the 
mempool framework.  My primary complaints were that the debug mode with 
containers wasn't stable (because containers would come and go and race 
with an enumeration of the list) and (more importantly) that the 
per-container tracking wasn't actually what I wanted to see.  I wanted to 
know "how much memory is consume in the entire pool by container 
declaration x," and not how many items/bytes are in each individual 
container instance.

This led me to restructure the way the containers are handled by defining 
a template allocator *type* for the contained type (or 
container-specific wrapper, like std::_Rb_tree_node) that has a static 
member containing the actual allocator instance.  That means that for 
each type we declare to be in a container there has to be a matching .cc 
definition of the allocator structure (slightly more awkward), but it 
totally eliminated the per-container-instance allocator state and made the 
debug mode very light-weight (we enumerate the *types*, not instances).

The new version works something like so:

 struct obj {
   MEMBER_OF_MEMPOOL();
   int a;
   int b;
   obj() : a(1), b(1) {}
   obj(int _a) : a(_a), b(2) {}
   obj(int _a,int _b) : a(_a), b(_b) {}
   friend inline bool operator<(const obj& l, const obj& r) {
     return l.a < r.a;
   }
 };

 // and in a .cc file,
 MEMPOOL_DEFINE_OBJECT_FACTORY(obj, obj, unittest_2);

(That duplicate obj arg is just because scoped types like BlueStore::Foo 
can't be used to name the _factory_* static instance variable, so the user 
has to pass in a second variable-nameable name.  Not sure if there is some 
preprocessor magic that can automate that or not...)

If I want to make containers, I can do this:

 unittest_1::vector<int> v;

 // and in a .cc file,
 MEMPOOL_DEFINE_FACTORY(int, int, unittest_1);

Vector works out of the box because it doesn't have a container-internal 
wrapper, but for set and map and list you have to do

 unittest_1::set<int> s;
 unittest_1::map<int,obj> m;
 unittest_1::list<int> l;

 MEMPOOL_DEFINE_SET_FACTORY(int, int, unittest_1);
 MEMPOOL_DEFINE_MAP_FACTORY(int, obj, int_obj, unittest_1);
 MEMPOOL_DEFINE_LIST_FACTORY(int, int, unittest_1);

for the std::_List_node etc. allocator instances.  Now, when you enable 
debug mode and dump the stats, you can see information like this (for 
bluestore):

    "bluestore_meta_onode": {
        "total_bytes": 174760,
        "total_items": 257,
        "by_type": [
            {
                "type": "BlueStore::Onode",
                "bytes": 174760,
                "items": 257
            }
        ]
    },
    "bluestore_meta_other": {
        "total_bytes": 7969096,
        "total_items": 76899,
        "by_type": [
            {
                "type": "std::_Rb_tree_node<std::pair<unsigned long const, bluestore_extent_ref_m
ap_t::record_t> >",
                "bytes": 1146048,
                "items": 23876
            },
            {
                "type": "BlueStore::ExtentMap::Shard",
                "bytes": 111384,
                "items": 1989
            },
            {
                "type": "BlueStore::SharedBlob",
                "bytes": 2494848,
                "items": 12994
            },
            {
                "type": "std::_Rb_tree_node<std::pair<int const, boost::intrusive_ptr<BlueStore::
Blob> > >",
                "bytes": 55056,
                "items": 1147
            },
            {
                "type": "BlueStore::Blob",
                "bytes": 3014608,
                "items": 12994
            },
            {
                "type": "BlueStore::Extent",
                "bytes": 1147152,
                "items": 23899
            },
            {
                "type": "std::_Rb_tree_node<std::pair<unsigned long const, std::unique_ptr<BlueSt
ore::Buffer, std::default_delete<BlueStore::Buffer> > > >",
                "bytes": 0,
                "items": 0
            },
            {
                "type": "BlueStore::Buffer",
                "bytes": 0,
                "items": 0
            }
        ]
    }

which tells pretty much the full story about where the memory is going and 
doesn't have the per-instance ugliness.  Notably, the container types in 
the mempool namespace are just convenient names for std::map<> 
etc--there's no longer a need to define the container types.
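
To make that concrete, here is a sketch of how such convenience names could
be spelled -- again made-up names (unittest_1_tag, unittest_1_sketch, the
alloc alias), reusing the pool_allocator sketched earlier rather than the
real header.  It also shows why vector works out of the box while set, map
and list need their node types defined: the node-based containers rebind
the allocator to std::_Rb_tree_node / std::_List_node and allocate those,
not the element type itself:

 #include <functional>
 #include <list>
 #include <map>
 #include <set>
 #include <utility>
 #include <vector>

 struct unittest_1_tag {};

 namespace unittest_1_sketch {
   template <typename T>
   using alloc = pool_allocator<T, unittest_1_tag>;

   template <typename T>
   using vector = std::vector<T, alloc<T>>;      // allocates T directly

   template <typename T, typename C = std::less<T>>
   using set = std::set<T, C, alloc<T>>;         // actually allocates tree nodes

   template <typename K, typename V, typename C = std::less<K>>
   using map = std::map<K, V, C, alloc<std::pair<const K, V>>>;

   template <typename T>
   using list = std::list<T, alloc<T>>;          // actually allocates list nodes
 }

 // usage then looks just like the real thing:
 //   unittest_1_sketch::map<int, obj> m;
 // (each element and node type still needs its per-type stats definition in
 // a .cc file, mirroring the *_FACTORY macros above)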

Here's the latest iteration:

	https://github.com/liewegas/ceph/blob/wip-mempool/src/include/mempool.h

Note that this version also removes all of the slab allocator logic--it's 
just the accounting infrastructure.  Mark's most recent run shows that the 
bluestore mempool is set at 1 GB and the RSS is almost 4 GB, so either 
there are lots of allocations we're not capturing yet or the heap is 
fragmenting heavily.  I suspect we'll want to add the slab behavior back 
in soon so that we have memory that is nicely packed, but I'd prefer to 
keep that change orthogonal so that we understand the full behavior!
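
(Purely as an illustration of what "nicely packed" could mean here -- a toy
sketch, not the removed mempool slab code: a fixed-size slab for a single
type carves slots out of a few large chunks, so same-type objects end up
adjacent instead of scattered across the general heap.)

 #include <cstddef>
 #include <new>
 #include <vector>

 template <typename T, std::size_t SlotsPerSlab = 256>
 class toy_slab {
   union slot { slot* next_free; alignas(T) unsigned char storage[sizeof(T)]; };
   std::vector<slot*> slabs;     // one large contiguous chunk per slab
   slot* free_list = nullptr;

   void grow() {
     slot* s = static_cast<slot*>(::operator new(sizeof(slot) * SlotsPerSlab));
     slabs.push_back(s);
     for (std::size_t i = 0; i < SlotsPerSlab; ++i) {  // thread slots onto the free list
       s[i].next_free = free_list;
       free_list = &s[i];
     }
   }

 public:
   // hands out raw storage for one T; the caller constructs it with placement new
   void* allocate() {
     if (!free_list) grow();
     slot* s = free_list;
     free_list = s->next_free;
     return s->storage;
   }
   // the caller has already destroyed the T living at p
   void deallocate(void* p) {
     slot* s = static_cast<slot*>(p);
     s->next_free = free_list;
     free_list = s;
   }
   ~toy_slab() { for (slot* s : slabs) ::operator delete(s); }
 };

Whether slabs come back looking anything like that is easier to judge once
the plain accounting numbers are well understood, hence keeping the two
changes separate.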

Thoughts?
sage


* Re: mempool update
From: Mark Nelson @ 2016-10-10 19:13 UTC
  To: Sage Weil, ceph-devel; +Cc: allen.samuels

On 10/10/2016 02:05 PM, Sage Weil wrote:
> Hi Allen, everyone,
>
> I spent the end of last week and today messing around some more with the
> mempool framework.  My primary complaints were that the debug mode with
> containers wasn't stable (because containers would come and go and race
> with an enumeration of the list) and (more importantly) that the
> per-container tracking wasn't actually what I wanted to see.  I wanted to
> know "how much memory is consume in the entire pool by container
> declaration x," and not how many items/bytes are in each individual
> container instance.
>
> This led me to restructure the way the containers are handled by defining
> a template allocator *type* for the contained type (or
> container-specific wrapper, like std::_Rb_tree_node) that has a static
> member containing the actual allocator instance.  That means that for
> each type we declare to be in a container there has to be a matching .cc
> definition of the allocator structure (slightly more awkward), but it
> totally eliminated the per-container-instance allocator state and made the
> debug mode very light-weight (we enumerate the *types*, not instances).
>
> The new version works something like so:
>
>  struct obj {
>    MEMBER_OF_MEMPOOL();
>    int a;
>    int b;
>    obj() : a(1), b(1) {}
>    obj(int _a) : a(_a), b(2) {}
>    obj(int _a,int _b) : a(_a), b(_b) {}
>    friend inline bool operator<(const obj& l, const obj& r) {
>      return l.a < r.a;
>    }
>  };
>
>  // and in a .cc file,
>  MEMPOOL_DEFINE_OBJECT_FACTORY(obj, obj, unittest_2);
>
> (That duplicate obj arg is just because scoped types like BlueStore::Foo
> can't be used to name the _factory_* static instance variable, so the user
> has to pass in a second variable-nameable name.  Not sure if there is some
> preprocessor magic that can automate that or not...)
>
> If I want to make containers, I can do this:
>
>  unittest_1::vector<int> v;
>
>  // and in a .cc file,
>  MEMPOOL_DEFINE_FACTORY(int, int, unittest_1);
>
> Vector works out of the box because it doesn't have a container-internal
> wrapper, but for set and map and list you have to do
>
>  unittest_1::set<int> s;
>  unittest_1::map<int,obj> m;
>  unittest_1::list<int> l;
>
>  MEMPOOL_DEFINE_SET_FACTORY(int, int, unittest_1);
>  MEMPOOL_DEFINE_MAP_FACTORY(int, obj, int_obj, unittest_1);
>  MEMPOOL_DEFINE_LIST_FACTORY(int, int, unittest_1);
>
> for the std::_List_node etc. allocator instances.  Now, when you enable
> debug mode and dump the stats, you can see information like this (for
> bluestore):
>
>     "bluestore_meta_onode": {
>         "total_bytes": 174760,
>         "total_items": 257,
>         "by_type": [
>             {
>                 "type": "BlueStore::Onode",
>                 "bytes": 174760,
>                 "items": 257
>             }
>         ]
>     },
>     "bluestore_meta_other": {
>         "total_bytes": 7969096,
>         "total_items": 76899,
>         "by_type": [
>             {
>                 "type": "std::_Rb_tree_node<std::pair<unsigned long const, bluestore_extent_ref_map_t::record_t> >",
>                 "bytes": 1146048,
>                 "items": 23876
>             },
>             {
>                 "type": "BlueStore::ExtentMap::Shard",
>                 "bytes": 111384,
>                 "items": 1989
>             },
>             {
>                 "type": "BlueStore::SharedBlob",
>                 "bytes": 2494848,
>                 "items": 12994
>             },
>             {
>                 "type": "std::_Rb_tree_node<std::pair<int const, boost::intrusive_ptr<BlueStore::Blob> > >",
>                 "bytes": 55056,
>                 "items": 1147
>             },
>             {
>                 "type": "BlueStore::Blob",
>                 "bytes": 3014608,
>                 "items": 12994
>             },
>             {
>                 "type": "BlueStore::Extent",
>                 "bytes": 1147152,
>                 "items": 23899
>             },
>             {
>                 "type": "std::_Rb_tree_node<std::pair<unsigned long const, std::unique_ptr<BlueStore::Buffer, std::default_delete<BlueStore::Buffer> > > >",
>                 "bytes": 0,
>                 "items": 0
>             },
>             {
>                 "type": "BlueStore::Buffer",
>                 "bytes": 0,
>                 "items": 0
>             }
>         ]
>     }
>

Beautiful!  Let me see if I can gather some more information with this 
version.

> which tells pretty much the full story about where the memory is going and
> doesn't have the per-instance ugliness.  Notably, the container types in
> the mempool namespace are just convenient names for std::map<>
> etc--there's no longer a need to define the container types.
>
> Here's the latest iteration:
>
> 	https://github.com/liewegas/ceph/blob/wip-mempool/src/include/mempool.h
>
> Note that this version also removes all of the slab allocator logic--it's
> just the accounting infrastructure.  Mark's most recent run shows that the
> bluestore mempool is set at 1 GB and the RSS is almost 4 GB, so either
> there are lots of allocations we're not capturing yet or the heap is
> fragmenting heavily.  I suspect we'll want to add the slab behavior back
> in soon so that we have memory that is nicely packed, but I'd prefer to
> keep that change orthogonal so that we understand the full behavior!

FWIW, there has been some discrepancy in the massif heap results vs RSS 
too, though perhaps less of one than what we are seeing here.  I would 
hazard a guess that it's a bit of both.

>
> Thoughts?
> sage


* Re: mempool update
From: Mark Nelson @ 2016-10-11 14:16 UTC
  To: Sage Weil, ceph-devel; +Cc: allen.samuels



On 10/10/2016 02:05 PM, Sage Weil wrote:
> Hi Allen, everyone,

<snip>

>
>     "bluestore_meta_onode": {
>         "total_bytes": 174760,
>         "total_items": 257,
>         "by_type": [
>             {
>                 "type": "BlueStore::Onode",
>                 "bytes": 174760,
>                 "items": 257
>             }
>         ]
>     },
>     "bluestore_meta_other": {
>         "total_bytes": 7969096,
>         "total_items": 76899,
>         "by_type": [
>             {
>                 "type": "std::_Rb_tree_node<std::pair<unsigned long const, bluestore_extent_ref_map_t::record_t> >",
>                 "bytes": 1146048,
>                 "items": 23876
>             },
>             {
>                 "type": "BlueStore::ExtentMap::Shard",
>                 "bytes": 111384,
>                 "items": 1989
>             },
>             {
>                 "type": "BlueStore::SharedBlob",
>                 "bytes": 2494848,
>                 "items": 12994
>             },
>             {
>                 "type": "std::_Rb_tree_node<std::pair<int const, boost::intrusive_ptr<BlueStore::Blob> > >",
>                 "bytes": 55056,
>                 "items": 1147
>             },
>             {
>                 "type": "BlueStore::Blob",
>                 "bytes": 3014608,
>                 "items": 12994
>             },
>             {
>                 "type": "BlueStore::Extent",
>                 "bytes": 1147152,
>                 "items": 23899
>             },
>             {
>                 "type": "std::_Rb_tree_node<std::pair<unsigned long const, std::unique_ptr<BlueStore::Buffer, std::default_delete<BlueStore::Buffer> > > >",
>                 "bytes": 0,
>                 "items": 0
>             },
>             {
>                 "type": "BlueStore::Buffer",
>                 "bytes": 0,
>                 "items": 0
>             }
>         ]
>     }
>

I ran some tests yesterday, sadly without 11394 until we've got a 
working version we can apply to wip-mempool.  I dumped the stats (this 
was with about 14GB of RSS per OSD!) and compared them against Sage's 
numbers:

https://drive.google.com/file/d/0B2gTBZrkrnpZdlNjNWkzbU4xMk0/view?usp=sharing

Looks like in my run some of the types are a bit smaller.  Not sure if 
that's due to fixes being applied to wip-mempool after Sage's run.  The 
total amount of memory used by the cache is way smaller than the RSS of 
the OSD, but that's probably to be expected right now.

Mark


* Re: mempool update
From: Sage Weil @ 2016-10-11 19:21 UTC
  To: ceph-devel; +Cc: allen.samuels

On Mon, 10 Oct 2016, Sage Weil wrote:
> Here's the latest iteration:
> 
> 	https://github.com/liewegas/ceph/blob/wip-mempool/src/include/mempool.h

I've cleaned this up and it's ready for review and test.

	https://github.com/ceph/ceph/pull/11331

- fixed linking (libglobal now)
- generic asok command and config mempool_debug option (default false)
- can safely enable/disable debug at runtime

and a bunch of small fixes and cleanups!

sage

