From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ramesh Chander
Subject: RE: Bluestore Bitmap allocator crash
Date: Thu, 21 Jul 2016 04:40:12 +0000
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
Received: from mail-by2nam01on0062.outbound.protection.outlook.com ([104.47.34.62]:51760
 "EHLO NAM01-BY2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL)
 by vger.kernel.org with ESMTP id S1750955AbcGUFOL convert rfc822-to-8bit;
 Thu, 21 Jul 2016 01:14:11 -0400
Content-Language: en-US
Sender: ceph-devel-owner@vger.kernel.org
To: Mark Nelson, Somnath Roy
Cc: ceph-devel

Hi Somnath, Mark,

I will look into it now with priority.

-Ramesh

> -----Original Message-----
> From: Mark Nelson [mailto:mnelson@redhat.com]
> Sent: Thursday, July 21, 2016 9:24 AM
> To: Somnath Roy; Ramesh Chander
> Cc: ceph-devel
> Subject: Re: Bluestore Bitmap allocator crash
>
> Hi Somnath & Ramesh,
>
> Speaking of allocator failures, dmick noticed tonight that we are failing make
> check specifically in unittest_bit_alloc. We both did bisects and came to the
> same conclusion that it first appeared with the merge this morning of:
>
> https://github.com/ceph/ceph/pull/10257
>
> Ultimately the assert we are failing is here:
>
> https://github.com/ceph/ceph/blob/c98ced1d5ae3d3709d0cd38c5b075b1b2c458a74/src/os/bluestore/BitAllocator.cc#L1518
>
> I've been digging in tonight with gdb.
> Here's the relevant parts of the bt:
>
> #2 0x00007ffff3712566 in __assert_fail_base (fmt=0x7ffff3862288
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> assertion=assertion@entry=0x60ffe0 "start_block + num_blocks <= size()",
> file=file@entry=0x60fea8
> "/home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc",
> line=line@entry=1518,
> function=function@entry=0x610300
> "virtual void BitAllocator::set_blocks_used(int64_t, int64_t)") at
> assert.c:92
> #3 0x00007ffff3712612 in __GI___assert_fail (assertion=0x60ffe0
> "start_block + num_blocks <= size()", file=0x60fea8
> "/home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc",
> line=1518,
> function=0x610300 long)::__PRETTY_FUNCTION__> "virtual void
> BitAllocator::set_blocks_used(int64_t, int64_t)") at assert.c:101
> #4 0x000000000051d328 in BitAllocator::set_blocks_used (this=0x8b0e8a0,
> start_block=1035, num_blocks=501) at
> /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1518
> #5 0x000000000051d0a1 in BitAllocator::BitAllocator (this=0x8b0e8a0,
> total_blocks=1035, zone_size_block=512, mode=CONCURRENT) at
> /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1223
> #6 0x00000000004edbcd in BitAllocator_test_bmap_alloc_Test::TestBody
> (this=) at
> /home/ubuntu/src/markhpc/ceph/src/test/objectstore/BitAllocator_test.cc:
> 448
>
> We can see that in frame 5 the total_blocks is 1035 and zone_size_block is
> 512:
>
> #5 0x000000000051d0a1 in BitAllocator::BitAllocator (this=0x8b0e8a0,
> total_blocks=1035, zone_size_block=512, mode=CONCURRENT) at
> /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1223
> 1223 init_check(total_blocks, zone_size_block, mode, false, false);
>
> While in frame 4 the start_block is 1035 and the num_blocks is 501, and
> size() is 1035, which is why the assert is failing.
>
> (gdb) frame 4
> #4 0x000000000051d328 in BitAllocator::set_blocks_used (this=0x8b0e8a0,
> start_block=1035, num_blocks=501) at
> /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1518
> 1518 debug_assert(start_block + num_blocks <= size());
> (gdb) print size()
> $19 = 1035
>
> Ultimately the code in BitAllocator::init_check seems like it might be relevant.
> I haven't dug in enough to understand what's going on in there, but it seems
> like we must be hitting a corner case given that the start_block is the same as
> size().
>
> Mark
>
> On 07/20/2016 08:44 PM, Somnath Roy wrote:
> > Ramesh,
> > I am hitting the following crash during IO path the moment I started io.
> >
> > os/bluestore/BitMapAllocator.cc: 76: FAILED assert(!(off % m_block_size))
> >
> > ceph version 11.0.0-696-ga3438ba
> > (a3438bac71a54cb43e5feb93ad09228bf69942ae)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x80) [0x55804242de40]
> > 2: (BitMapAllocator::insert_free(unsigned long, unsigned long)+0x2e3)
> > [0x558042132813]
> > 3: (BitMapAllocator::commit_finish()+0x2a5) [0x558042132e55]
> > 4: (BlueStore::_kv_sync_thread()+0x142d) [0x558041fff61d]
> > 5: (BlueStore::KVSyncThread::entry()+0xd) [0x558042028c2d]
> > 6: (Thread::entry_wrapper()+0x75) [0x55804240d755]
> > 7: (()+0x76fa) [0x7f36699076fa]
> > 8: (clone()+0x6d) [0x7f3667767b5d]
> > NOTE: a copy of the executable, or `objdump -rdS ` is needed
> > to interpret this.
> >
> >
> > What I did :
> > ------------
> >
> > 1. I have a separate wal partition to a nvram device
> >
> > 2. change min_alloc_size to 16K
> >
> > 3. ran 4k rw.
> >
> > Let me know if you need further details.
> >
> > Thanks & Regards
> > Somnath
> >
> >
> >
> > PLEASE NOTE: The information contained in this electronic mail message is
> > intended only for the use of the designated recipient(s) named above.
> > If the reader of this message is not the intended recipient, you are hereby
> > notified that you have received this message in error and that any review,
> > dissemination, distribution, or copying of this message is strictly
> > prohibited. If you have received this communication in error, please notify
> > the sender by telephone or e-mail (as shown above) immediately and destroy
> > any and all copies of this message in your possession (whether hard copies
> > or electronically stored copies).