* Bluestore Bitmap allocator crash
@ 2016-07-21  1:44 Somnath Roy
  2016-07-21  3:53 ` Mark Nelson
  0 siblings, 1 reply; 3+ messages in thread
From: Somnath Roy @ 2016-07-21  1:44 UTC (permalink / raw)
  To: Ramesh Chander; +Cc: ceph-devel

Ramesh,
I am hitting the following crash in the IO path the moment I start IO.

os/bluestore/BitMapAllocator.cc: 76: FAILED assert(!(off % m_block_size))

 ceph version 11.0.0-696-ga3438ba (a3438bac71a54cb43e5feb93ad09228bf69942ae)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55804242de40]
 2: (BitMapAllocator::insert_free(unsigned long, unsigned long)+0x2e3) [0x558042132813]
 3: (BitMapAllocator::commit_finish()+0x2a5) [0x558042132e55]
 4: (BlueStore::_kv_sync_thread()+0x142d) [0x558041fff61d]
 5: (BlueStore::KVSyncThread::entry()+0xd) [0x558042028c2d]
 6: (Thread::entry_wrapper()+0x75) [0x55804240d755]
 7: (()+0x76fa) [0x7f36699076fa]
 8: (clone()+0x6d) [0x7f3667767b5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


What I did :
------------

1. Put the WAL on a separate partition on an NVRAM device

2. Changed min_alloc_size to 16K

3. Ran 4k rw.
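
The failed check, assert(!(off % m_block_size)), is a plain alignment test. Below is a minimal sketch of it, assuming (this is not confirmed anywhere in the thread) that m_block_size follows the 16K min_alloc_size while the 4k workload produces extents at 4K-aligned offsets. is_block_aligned is a hypothetical helper, not a Ceph function:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the condition inside
// assert(!(off % m_block_size)); not taken from the Ceph source.
inline bool is_block_aligned(uint64_t off, uint64_t block_size) {
  assert(block_size != 0);       // guard against division by zero
  return off % block_size == 0;  // true only when off is a multiple of block_size
}
```

Under that assumption, is_block_aligned(4096, 16384) is false, which is the condition that trips the assert, while is_block_aligned(16384, 16384) is true.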

Let me know if you need further details.

Thanks & Regards
Somnath



PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


* Re: Bluestore Bitmap allocator crash
  2016-07-21  1:44 Bluestore Bitmap allocator crash Somnath Roy
@ 2016-07-21  3:53 ` Mark Nelson
  2016-07-21  4:40   ` Ramesh Chander
  0 siblings, 1 reply; 3+ messages in thread
From: Mark Nelson @ 2016-07-21  3:53 UTC (permalink / raw)
  To: Somnath Roy, Ramesh Chander; +Cc: ceph-devel

Hi Somnath & Ramesh,

Speaking of allocator failures, dmick noticed tonight that we are 
failing make check specifically in unittest_bit_alloc.  We both did 
bisects and came to the same conclusion that it first appeared with the 
merge this morning of:

https://github.com/ceph/ceph/pull/10257

Ultimately the assert we are failing is here:

https://github.com/ceph/ceph/blob/c98ced1d5ae3d3709d0cd38c5b075b1b2c458a74/src/os/bluestore/BitAllocator.cc#L1518

I've been digging in tonight with gdb.  Here are the relevant parts of the backtrace:

#2  0x00007ffff3712566 in __assert_fail_base (fmt=0x7ffff3862288 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x60ffe0 "start_block + num_blocks <= size()",
     file=file@entry=0x60fea8 "/home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc", line=line@entry=1518,
     function=function@entry=0x610300 <BitAllocator::set_blocks_used(long, long)::__PRETTY_FUNCTION__> "virtual void BitAllocator::set_blocks_used(int64_t, int64_t)") at assert.c:92
#3  0x00007ffff3712612 in __GI___assert_fail (assertion=0x60ffe0 "start_block + num_blocks <= size()", file=0x60fea8 "/home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc", line=1518,
     function=0x610300 <BitAllocator::set_blocks_used(long, long)::__PRETTY_FUNCTION__> "virtual void BitAllocator::set_blocks_used(int64_t, int64_t)") at assert.c:101
#4  0x000000000051d328 in BitAllocator::set_blocks_used (this=0x8b0e8a0, start_block=1035, num_blocks=501) at /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1518
#5  0x000000000051d0a1 in BitAllocator::BitAllocator (this=0x8b0e8a0, total_blocks=1035, zone_size_block=512, mode=CONCURRENT) at /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1223
#6  0x00000000004edbcd in BitAllocator_test_bmap_alloc_Test::TestBody (this=<optimized out>) at /home/ubuntu/src/markhpc/ceph/src/test/objectstore/BitAllocator_test.cc:448

We can see that in frame 5 the total_blocks is 1035 and zone_size_block 
is 512:

#5  0x000000000051d0a1 in BitAllocator::BitAllocator (this=0x8b0e8a0, total_blocks=1035, zone_size_block=512, mode=CONCURRENT) at /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1223
1223	  init_check(total_blocks, zone_size_block, mode, false, false);

While in frame 4 the start_block is 1035 and the num_blocks is 501, and 
size() is 1035, so start_block + num_blocks = 1536 > 1035, which is why 
the assert is failing.

(gdb) frame 4
#4  0x000000000051d328 in BitAllocator::set_blocks_used (this=0x8b0e8a0, start_block=1035, num_blocks=501) at /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1518
1518	  debug_assert(start_block + num_blocks <= size());
(gdb) print size()
$19 = 1035

Ultimately the code in BitAllocator::init_check seems like it might be 
relevant.  I haven't dug in enough to understand what's going on in 
there, but it seems like we must be hitting a corner case given that the 
start_block is the same as size().
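
One observation on these numbers: 1035 rounded up to a multiple of the 512-block zone size is 1536, and 1536 - 1035 = 501, which matches num_blocks. That suggests (an assumption, not confirmed in this thread) that the constructor marks the padding blocks past total_blocks as used while size() still reports the unpadded 1035. The arithmetic can be sketched as follows; round_up_to_zone and padding_blocks are hypothetical helpers, not functions from BitAllocator.cc:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round total_blocks up to the next multiple of zone_size.
inline int64_t round_up_to_zone(int64_t total_blocks, int64_t zone_size) {
  assert(zone_size > 0);
  return ((total_blocks + zone_size - 1) / zone_size) * zone_size;
}

// Hypothetical helper: number of padding blocks between the requested
// total and the zone-aligned total.
inline int64_t padding_blocks(int64_t total_blocks, int64_t zone_size) {
  return round_up_to_zone(total_blocks, zone_size) - total_blocks;
}
```

With total_blocks = 1035 and zone_size = 512, round_up_to_zone gives 1536 and padding_blocks gives 501, so a call equivalent to set_blocks_used(1035, 501) would need size() to be at least 1536 for the assert to pass.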

Mark


* RE: Bluestore Bitmap allocator crash
  2016-07-21  3:53 ` Mark Nelson
@ 2016-07-21  4:40   ` Ramesh Chander
  0 siblings, 0 replies; 3+ messages in thread
From: Ramesh Chander @ 2016-07-21  4:40 UTC (permalink / raw)
  To: Mark Nelson, Somnath Roy; +Cc: ceph-devel

Hi Somnath, Mark,

I will look into it now as a priority.

-Ramesh

