* bitmap allocator granularity
@ 2016-09-06 19:27 Sage Weil
  2016-09-07  6:02 ` Ramesh Chander
  0 siblings, 1 reply; 3+ messages in thread
From: Sage Weil @ 2016-09-06 19:27 UTC (permalink / raw)
  To: Ramesh.Chander, ceph-devel

Hi Ramesh,

It looks like the fsck error I've been chasing on my branch is a general
problem with the bitmap granularity.  The
ObjectStore/StoreTest.SyntheticMatrixCsumVsCompression/2 test sets
min_alloc_size to 32k and then to something smaller after that.  My branch
adds an occasional umount+fsck+mount to the synthetic workload test, which
uncovers a problem: if we start with a small min_alloc_size, write some
objects, and then umount and remount with a larger min_alloc_size (say,
32k), things can go wrong.  The allocator defines its bits in terms of
min_alloc_size, but some used extents are smaller than that, and when they
get released we trigger an assert like

/home/sage/src/ceph/src/os/bluestore/BitMapAllocator.cc: In function 'void BitMapAllocator::insert_free(uint64_t, uint64_t)' thread 7ffb44deb700 time 2016-09-06 15:23:39.055902
/home/sage/src/ceph/src/os/bluestore/BitMapAllocator.cc: 76: FAILED assert(!(off % m_block_size))
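
To make the failure mode concrete, here is a minimal sketch of the alignment condition involved. The helper name is hypothetical and this is not the actual BitMapAllocator code; it only shows why an extent allocated under a small min_alloc_size cannot be described in larger units.

```cpp
#include <cstdint>

// Hypothetical helper (not from the Ceph tree): returns true when an
// extent can be expressed in whole allocator units of size `unit`.
bool extent_fits_granularity(std::uint64_t off, std::uint64_t len,
                             std::uint64_t unit) {
  return off % unit == 0 && len % unit == 0;
}
```

For example, an extent at offset 4096 with length 4096 fits a 4K unit but not a 32K unit, so releasing it after a remount with a 32K min_alloc_size would hit a check like the one above.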

There was a related issue with fsck: its used_blocks bitmap was also at
min_alloc_size granularity.

I see two options: we can either unconditionally maintain the bitmap in 
block_size units, or we can store persistently the smallest min_alloc_size 
that we have ever mounted with and use that ("min_min_alloc_size?").

What do you think?
sage


* RE: bitmap allocator granularity
  2016-09-06 19:27 bitmap allocator granularity Sage Weil
@ 2016-09-07  6:02 ` Ramesh Chander
  2016-09-07 13:17   ` Sage Weil
  0 siblings, 1 reply; 3+ messages in thread
From: Ramesh Chander @ 2016-09-07  6:02 UTC (permalink / raw)
  To: Sage Weil, ceph-devel

Hi Sage,

Thanks for root-causing this and pinpointing the problem.

In my opinion, keeping it persistent is a good way to go.

This is because min_alloc_size is the unit that the allocator has to guarantee it will allocate contiguously.

If we store something smaller (block size) in the allocator, then we might end up searching for N contiguous bits every time, even for a single-unit allocation.
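
A rough sketch of that cost, assuming a flat bitmap and a naive linear scan (the function names are hypothetical, and the real allocator's search is not shown in this thread):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bits that one minimum-size allocation spans when the bitmap unit is
// smaller than min_alloc_size (assumes min_alloc_size is a multiple of
// bitmap_unit).
std::size_t bits_per_alloc(std::uint64_t min_alloc_size,
                           std::uint64_t bitmap_unit) {
  return static_cast<std::size_t>(min_alloc_size / bitmap_unit);
}

// Naive scan for the first run of `n` free bits. used[i] is true when
// unit i is allocated. Returns the start index of the run, or
// used.size() when no such run exists.
std::size_t find_free_run(const std::vector<bool>& used, std::size_t n) {
  std::size_t run = 0;
  for (std::size_t i = 0; i < used.size(); ++i) {
    run = used[i] ? 0 : run + 1;
    if (n > 0 && run == n) {
      return i + 1 - n;
    }
  }
  return used.size();
}
```

With a 4K bitmap unit and a 32K min_alloc_size, every allocation has to find 8 adjacent free bits; with the bitmap kept at min_alloc_size granularity it needs only one.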

This might happen in any case but aligning it to min_alloc_size will at least avoid it in good configurations.

Also, in a misconfiguration where the block size is very small, the allocator might take more memory even if the minimum allocation unit is larger.
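
As a back-of-the-envelope illustration of the memory point, assuming a flat bitmap with one bit per unit and ignoring any index or per-zone overhead the real allocator may carry (the function name is hypothetical):

```cpp
#include <cstdint>

// Bytes needed for a flat one-bit-per-unit bitmap covering device_size
// bytes, rounded up to a whole byte.
constexpr std::uint64_t bitmap_bytes(std::uint64_t device_size,
                                     std::uint64_t unit) {
  return (device_size / unit + 7) / 8;
}
```

For a 1 TiB device this gives 256 MiB of bitmap at a 512-byte unit, 32 MiB at 4K, and 2 MiB at 64K.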

I assume that min_alloc_size is not something that will change too frequently in the lifetime of an OSD, and we just have to make sure it works, so what you suggested for persisting the minimum min_alloc_size will work.
Could it be stored in the superblock or with the freelist manager metadata at mount and unmount time?

If you agree, I can make these changes.

-Ramesh




* RE: bitmap allocator granularity
  2016-09-07  6:02 ` Ramesh Chander
@ 2016-09-07 13:17   ` Sage Weil
  0 siblings, 0 replies; 3+ messages in thread
From: Sage Weil @ 2016-09-07 13:17 UTC (permalink / raw)
  To: Ramesh Chander; +Cc: ceph-devel

On Wed, 7 Sep 2016, Ramesh Chander wrote:
> Hi Sage,
> 
> Thanks for root-causing this and pinpointing the problem.
> 
> In my opinion, keeping it persistent is a good way to go.
> 
> This is because min_alloc_size is the unit that the allocator has to 
> guarantee it will allocate contiguously.
> 
> If we store something smaller (block size) in the allocator, then we 
> might end up searching for N contiguous bits every time, even for a 
> single-unit allocation.
> 
> This might happen in any case, but aligning it to min_alloc_size will at 
> least avoid it in good configurations.
> 
> Also, in a misconfiguration where the block size is very small, the 
> allocator might take more memory even if the minimum allocation unit is 
> larger.
> 
> I assume that min_alloc_size is not something that will change too 
> frequently in the lifetime of an OSD, and we just have to make sure it 
> works, so what you suggested for persisting the minimum min_alloc_size 
> will work.

If we go this route I see two options: specify min_alloc_size only at mkfs 
time and store and use that value forever after, or store the 
min_min_alloc_size and use that for the allocator granularity.  Any 
preferences?

The min_min_alloc_size isn't difficult, but it's weird that starting with 
min_alloc_size of 64K, switching to 4K, and then switching back to 64K 
will not behave/perform the same as having it at 64K the whole time.  We 
can just say as much on startup in the log, I suppose.
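
The 64K, 4K, 64K sequence can be sketched like this. The struct and function are hypothetical stand-ins, not the actual BlueStore super metadata code; a real implementation would also write the updated value back to disk.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the persisted value; the field name follows
// the "min_min_alloc_size" nickname used in this thread.
struct SuperMeta {
  std::uint64_t min_min_alloc_size = 0;  // 0 means "not yet recorded"
};

// Called on each mount with the configured min_alloc_size. Records the
// smallest value ever seen and returns it as the allocator granularity.
std::uint64_t mount_granularity(SuperMeta& super,
                                std::uint64_t min_alloc_size) {
  if (super.min_min_alloc_size == 0) {
    super.min_min_alloc_size = min_alloc_size;
  } else {
    super.min_min_alloc_size =
        std::min(super.min_min_alloc_size, min_alloc_size);
  }
  return super.min_min_alloc_size;
}
```

Mounting with 64K, then 4K, then 64K again leaves the granularity at 4K on the third mount, which is the case where a startup log message would be useful.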

> Could it be stored in the superblock or with the freelist manager 
> metadata at mount and unmount time?
> If you agree, I can make these changes.

Sure!  There's the "super" area that's read during startup 
(_open_super_meta) and a corresponding write function too.

Thanks!
sage




