From: Ramesh Chander
Subject: RE: Anybody else hitting this panic in latest master with bluestore?
Date: Wed, 8 Jun 2016 06:00:36 +0000
To: Somnath Roy, Mark Nelson, Sage Weil
Cc: ceph-devel

Hi Somnath,

I think setting the block size to 4k is done intentionally:

  // Operate as though the block size is 4 KB.  The backing file
  // blksize doesn't strictly matter except that some file systems may
  // require a read/modify/write if we write something smaller than
  // it.
  block_size = g_conf->bdev_block_size;
  if (block_size != (unsigned)st.st_blksize) {
    dout(1) << __func__ << " backing device/file reports st_blksize "
            << st.st_blksize << ", using bdev_block_size "
            << block_size << " anyway" << dendl;
  }

Other than more fragmentation, we should not see any issue from taking the
block size as 4k instead of 512; at least I am not aware of one.

How do I reproduce it? I can have a look.

-Ramesh

> -----Original Message-----
> From: Somnath Roy
> Sent: Wednesday, June 08, 2016 5:04 AM
> To: Somnath Roy; Mark Nelson; Sage Weil
> Cc: Ramesh Chander; ceph-devel
> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>
> Ok, I think I found out what is happening in my environment. This drive is
> formatted with a 512-byte logical block size.
> The bitmap allocator works with a 4K block size by default, and the
> calculation is breaking (?). I have reformatted the device with 4K and it
> worked fine.
> I don't think taking this logical block size as a user-input parameter is
> *wise*, since the OS requires every device to advertise its correct
> logical block size here:
>
> /sys/block/sdb/queue/logical_block_size
>
> The allocator needs to read the correct size from the above location.
> Sage/Ramesh?
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Tuesday, June 07, 2016 1:12 PM
> To: Mark Nelson; Sage Weil
> Cc: Ramesh Chander; ceph-devel
> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>
> Mark/Sage,
> That problem seems to be gone. BTW, the rocksdb folder is not cleaned by
> 'make clean'; I took the latest master and manually cleaned the rocksdb
> folder as you suggested.
> But now I am hitting the following crash on some of my drives. It seems to
> be related to block alignment.
>
>  0> 2016-06-07 11:50:12.353375 7f5c0fe938c0 -1
> os/bluestore/BitmapFreelistManager.cc: In function 'void
> BitmapFreelistManager::_xor(uint64_t, uint64_t, KeyValueDB::Transaction)'
> thread 7f5c0fe938c0 time 2016-06-07 11:50:12.349722
> os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset &
> block_mask) == offset)
>
> ceph version 10.2.0-2021-g55cb608
> (55cb608f63787f7969514ad0d7222da68ab84d88)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x80) [0x5652219dd0a0]
> 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long,
> std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x5652216af96d]
> 3: (BitmapFreelistManager::create(unsigned long,
> std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x5652216b034f]
> 4: (BlueStore::_open_fm(bool)+0xcd3) [0x565221596683]
> 5: (BlueStore::mkfs()+0x8b9) [0x5652215d89b9]
> 6: (OSD::mkfs(CephContext*, ObjectStore*,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, uuid_d, int)+0x117) [0x5652212776c7]
> 7: (main()+0x1003) [0x565221209533]
> 8: (__libc_start_main()+0xf0) [0x7f5c0c8f7830]
> 9: (_start()+0x29) [0x5652212588b9]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> Here are my disk partitions.
>
> osd.15 on /dev/sdi crashed:
>
> sdi       8:128  0    7T  0 disk
> ├─sdi1    8:129  0   10G  0 part /var/lib/ceph/osd/ceph-15
> └─sdi2    8:130  0    7T  0 part
> nvme0n1 259:0    0 15.4G  0 disk
>
> root@emsnode11:~/ceph-master/src# fdisk /dev/sdi
>
> Welcome to fdisk (util-linux 2.27.1).
> Changes will remain in memory only, until you decide to write them.
> Be careful before using the write command.
>
> Command (m for help): p
> Disk /dev/sdi: 7 TiB, 7681501126656 bytes, 15002931888 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 16384 bytes
> I/O size (minimum/optimal): 16384 bytes / 16384 bytes
> Disklabel type: gpt
> Disk identifier: 4A3182B9-23EA-441A-A113-FE904E81BF3E
>
> Device         Start         End     Sectors Size Type
> /dev/sdi1       2048    20973567    20971520  10G Linux filesystem
> /dev/sdi2   20973568 15002931854 14981958287   7T Linux filesystem
>
> The partitions seem to be aligned properly, so what alignment is the
> bitmap allocator looking for (Ramesh?)?
> I will debug further and update.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Somnath Roy
> Sent: Tuesday, June 07, 2016 11:06 AM
> To: 'Mark Nelson'; Sage Weil
> Cc: Ramesh Chander; ceph-devel
> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>
> I will try now and let you know.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Mark Nelson [mailto:mnelson@redhat.com]
> Sent: Tuesday, June 07, 2016 10:57 AM
> To: Somnath Roy; Sage Weil
> Cc: Ramesh Chander; ceph-devel
> Subject: Re: Anybody else hitting this panic in latest master with bluestore?
>
> Hi Somnath,
>
> Did Sage's suggestion fix it for you? In my tests rocksdb wasn't building
> properly after an upstream commit to detect when jemalloc isn't present:
>
> https://github.com/facebook/rocksdb/commit/0850bc514737a64dc8ca13de8510fcad4756616a
>
> I've submitted a fix that is now in master. If you clean the rocksdb
> folder and try again with current master, I believe it should work for you.
>
> Thanks,
> Mark
>
> On 06/07/2016 09:23 AM, Somnath Roy wrote:
> > Sage,
> > I did a global 'make clean' before the build; isn't that sufficient? Do I
> > still need to go to the rocksdb folder and clean it?
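
[Editor's note: returning to the alignment assert quoted earlier in this
message, the following is a minimal standalone sketch of the arithmetic it
performs. It assumes the value being checked is a byte count derived from
the 512-byte-sector partition size shown in the fdisk output, and that
block_mask is the usual ~(block_size - 1) mask; neither is confirmed from
the BlueStore source, so treat the names and numbers as illustrative only.]

  // Sketch: why a length that is 512-byte aligned but not 4K aligned would
  // fail a check of the form (offset & block_mask) == offset.
  #include <cstdint>
  #include <iostream>

  int main() {
    // /dev/sdi2 from the fdisk output: 14981958287 sectors of 512 bytes.
    const uint64_t sectors = 14981958287ULL;
    const uint64_t bytes   = sectors * 512;   // odd sector count -> not a 4096 multiple

    for (uint64_t block_size : {512ULL, 4096ULL}) {
      const uint64_t block_mask = ~(block_size - 1);   // assumed mask construction
      const bool ok = ((bytes & block_mask) == bytes); // same test as the assert
      std::cout << "block_size " << block_size
                << (ok ? ": aligned" : ": NOT aligned -> assert would fire")
                << std::endl;
    }
    return 0;
  }

With a 512-byte block size the check passes; with 4 KB it does not, which
is consistent with the crash appearing only on devices formatted with
512-byte logical sectors.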
> >
> >
> > -----Original Message-----
> > From: Sage Weil [mailto:sage@newdream.net]
> > Sent: Tuesday, June 07, 2016 6:06 AM
> > To: Mark Nelson
> > Cc: Somnath Roy; Ramesh Chander; ceph-devel
> > Subject: Re: Anybody else hitting this panic in latest master with bluestore?
> >
> > On Tue, 7 Jun 2016, Mark Nelson wrote:
> >> I believe this is due to the rocksdb submodule update in PR #9466.
> >> I'm working on tracking down the commit in rocksdb that's causing it.
> >
> > Is it possible that the problem is that your build *didn't* update rocksdb?
> >
> > The ceph makefile isn't smart enough to notice changes in the rocksdb/
> > dir and rebuild. You have to 'cd rocksdb ; make clean ; cd ..' after the
> > submodule updates to get a fresh build.
> >
> > Maybe you didn't do that, and some of the ceph code is built using the
> > new headers and data structures that don't match the previously compiled
> > rocksdb code?
> >
> > sage
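
[Editor's note: as a footnote to Somnath's suggestion that the allocator
read the logical block size from sysfs, here is a rough sketch of what such
a read could look like. It is illustrative only: the function name and
error handling are invented, partitions and symlinks are not handled, and
this is not a description of how BlueStore actually does or should do it.]

  // Sketch: read the logical block size the kernel advertises for a whole
  // block device, e.g. /dev/sdb -> /sys/block/sdb/queue/logical_block_size.
  #include <fstream>
  #include <iostream>
  #include <string>

  // Returns the advertised logical block size in bytes, or 0 on failure.
  static unsigned logical_block_size(const std::string& dev /* e.g. "sdb" */) {
    std::ifstream f("/sys/block/" + dev + "/queue/logical_block_size");
    unsigned size = 0;
    f >> size;
    return f ? size : 0;
  }

  int main() {
    unsigned size = logical_block_size("sdb");
    if (size)
      std::cout << "logical block size: " << size << " bytes" << std::endl;
    else
      std::cout << "could not read logical block size" << std::endl;
    return 0;
  }

The BLKSSZGET ioctl on the opened device is the other common way to obtain
the same value.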