* Anybody else hitting this panic in latest master with bluestore?
From: Ramesh Chander @ 2016-06-07 4:59 UTC
To: ceph-devel

Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster. Do not use feature with important data.

    -1> 2016-06-06 21:58:50.008174 7fbd0a313500  1 using id 15
     0> 2016-06-06 21:58:50.010667 7fbd0a313500 -1 *** Caught signal (Segmentation fault) ** in thread 7fbd0a313500 thread_name:ceph_test_objec

 ceph version 10.2.0-1899-ged6dc92 (ed6dc9291628a8b1f488cc8b06f46c2815b64c04)
 1: (()+0x5b75fa) [0x7fbd0a8f35fa]
 2: (()+0x10340) [0x7fbd087fb340]
 3: (()+0x152686) [0x7fbd071bb686]
 4: (rocksdb::ColumnFamilyOptions::ColumnFamilyOptions(rocksdb::Options const&)+0x12a) [0x7fbd0a87594a]
 5: (rocksdb::GetOptionsFromString(rocksdb::Options const&, std::string const&, rocksdb::Options*)+0x18a) [0x7fbd0a87db1a]
 6: (RocksDBStore::ParseOptionsFromString(std::string, rocksdb::Options&)+0x10c) [0x7fbd0a7acfc4]
 7: (RocksDBStore::init(std::string)+0x74) [0x7fbd0a7ad6ec]
 8: (BlueStore::_open_db(bool)+0x261a) [0x7fbd0a62f24a]
 9: (BlueStore::mkfs()+0xdb5) [0x7fbd0a647705]
 10: (StoreTest::SetUp()+0x34b) [0x7fbd0a5e6edd]
 11: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x33) [0x7fbd0a8ecb73]
 12: (testing::Test::Run()+0x50) [0x7fbd0a8e2860]
 13: (testing::TestInfo::Run()+0x9e) [0x7fbd0a8e296e]
 14: (testing::TestCase::Run()+0xa5) [0x7fbd0a8e2a75]
 15: (testing::internal::UnitTestImpl::RunAllTests()+0x248) [0x7fbd0a8e44e8]
 16: (testing::UnitTest::Run()+0x54) [0x7fbd0a8e47a4]
 17: (main()+0x522) [0x7fbd0a585156]
 18: (__libc_start_main()+0xf5) [0x7fbd0708aec5]
 19: (()+0x248107) [0x7fbd0a584107]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

-Regards,
Ramesh Chander

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
* RE: Anybody else hitting this panic in latest master with bluestore?
From: Somnath Roy @ 2016-06-07 6:37 UTC
To: Ramesh Chander, ceph-devel

Yes, Mark and I are both getting this. See my earlier post.

-----Original Message-----
From: Ramesh Chander
Sent: Monday, June 06, 2016 10:00 PM
To: ceph-devel
Subject: Anybody else hitting this panic in latest master with bluestore?

[quoted warning and stack trace snipped; identical to the original message above]

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Anybody else hitting this panic in latest master with bluestore?
From: Mark Nelson @ 2016-06-07 11:33 UTC
To: Somnath Roy, Ramesh Chander, ceph-devel

I believe this is due to the rocksdb submodule update in PR #9466. I'm working on tracking down the commit in rocksdb that's causing it.

Mark

On 06/07/2016 01:37 AM, Somnath Roy wrote:
> Yes, Mark and I are both getting this. See my earlier post.
>
> [quoted original report and stack trace snipped]
* Re: Anybody else hitting this panic in latest master with bluestore?
From: Sage Weil @ 2016-06-07 13:05 UTC
To: Mark Nelson; +Cc: Somnath Roy, Ramesh Chander, ceph-devel

On Tue, 7 Jun 2016, Mark Nelson wrote:
> I believe this is due to the rocksdb submodule update in PR #9466. I'm
> working on tracking down the commit in rocksdb that's causing it.

Is it possible that the problem is that your build *didn't* update rocksdb?

The ceph makefile isn't smart enough to notice changes in the rocksdb/ dir and rebuild. You have to 'cd rocksdb ; make clean ; cd ..' after the submodule updates to get a fresh build.

Maybe you didn't do that, and some of the ceph code is built using the new headers and data structures that don't match the previously compiled rocksdb code?

sage
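The full sequence Sage describes, after a submodule bump, can be sketched as shell commands. This is a sketch for the pre-cmake autotools tree of that era, not an exact recipe, and it is guarded so it does nothing outside a ceph checkout:

```shell
# Rebuild sketch: force a clean rebuild of the rocksdb submodule, since the
# top-level 'make clean' does not recurse into rocksdb/.
if [ -d rocksdb ]; then
  git submodule update --init      # check out the rocksdb commit the supermodule pins
  (cd rocksdb && make clean)       # clean the submodule's own objects
  make                             # rebuild ceph against the freshly built rocksdb
else
  echo "not in a ceph source tree; nothing to do"
fi
```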
* RE: Anybody else hitting this panic in latest master with bluestore?
From: Somnath Roy @ 2016-06-07 14:23 UTC
To: Sage Weil, Mark Nelson; +Cc: Ramesh Chander, ceph-devel

Sage,
I did a global 'make clean' before the build; isn't that sufficient? Do I still need to go into the rocksdb folder and clean?

-----Original Message-----
From: Sage Weil
Sent: Tuesday, June 07, 2016 6:06 AM
To: Mark Nelson
Cc: Somnath Roy; Ramesh Chander; ceph-devel
Subject: Re: Anybody else hitting this panic in latest master with bluestore?

[quoted message snipped; identical to Sage's reply above]
* Re: Anybody else hitting this panic in latest master with bluestore?
From: Mark Nelson @ 2016-06-07 17:57 UTC
To: Somnath Roy, Sage Weil; +Cc: Ramesh Chander, ceph-devel

Hi Somnath,

Did Sage's suggestion fix it for you? In my tests rocksdb wasn't building properly after an upstream commit to detect when jemalloc isn't present:

https://github.com/facebook/rocksdb/commit/0850bc514737a64dc8ca13de8510fcad4756616a

I've submitted a fix that is now in master. If you clean the rocksdb folder and try again with current master I believe it should work for you.

Thanks,
Mark

On 06/07/2016 09:23 AM, Somnath Roy wrote:
> Sage,
> I did a global 'make clean' before the build; isn't that sufficient? Do I still need to go into the rocksdb folder and clean?
>
> [quoted earlier messages snipped]
* RE: Anybody else hitting this panic in latest master with bluestore?
From: Somnath Roy @ 2016-06-07 18:05 UTC
To: Mark Nelson, Sage Weil; +Cc: Ramesh Chander, ceph-devel

I will try now and let you know.

Thanks & Regards
Somnath

-----Original Message-----
From: Mark Nelson
Sent: Tuesday, June 07, 2016 10:57 AM
Subject: Re: Anybody else hitting this panic in latest master with bluestore?

[quoted message snipped; identical to Mark's reply above]
* RE: Anybody else hitting this panic in latest master with bluestore?
From: Somnath Roy @ 2016-06-07 20:12 UTC
To: Mark Nelson, Sage Weil; +Cc: Ramesh Chander, ceph-devel

Mark/Sage,
That problem seems to be gone. BTW, the rocksdb folder is not cleaned by 'make clean'; I took the latest master and manually cleaned the rocksdb folder as you suggested.
But now I am hitting the following crash on some of my drives. It seems to be related to block alignment.

     0> 2016-06-07 11:50:12.353375 7f5c0fe938c0 -1 os/bluestore/BitmapFreelistManager.cc: In function 'void BitmapFreelistManager::_xor(uint64_t, uint64_t, KeyValueDB::Transaction)' thread 7f5c0fe938c0 time 2016-06-07 11:50:12.349722
os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & block_mask) == offset)

 ceph version 10.2.0-2021-g55cb608 (55cb608f63787f7969514ad0d7222da68ab84d88)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5652219dd0a0]
 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x5652216af96d]
 3: (BitmapFreelistManager::create(unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x5652216b034f]
 4: (BlueStore::_open_fm(bool)+0xcd3) [0x565221596683]
 5: (BlueStore::mkfs()+0x8b9) [0x5652215d89b9]
 6: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x117) [0x5652212776c7]
 7: (main()+0x1003) [0x565221209533]
 8: (__libc_start_main()+0xf0) [0x7f5c0c8f7830]
 9: (_start()+0x29) [0x5652212588b9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Here are my disk partitions. osd.15 on /dev/sdi crashed:

sdi       8:128  0    7T  0 disk
├─sdi1    8:129  0   10G  0 part /var/lib/ceph/osd/ceph-15
└─sdi2    8:130  0    7T  0 part
nvme0n1 259:0    0 15.4G  0 disk

root@emsnode11:~/ceph-master/src# fdisk /dev/sdi

Welcome to fdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/sdi: 7 TiB, 7681501126656 bytes, 15002931888 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 16384 bytes
I/O size (minimum/optimal): 16384 bytes / 16384 bytes
Disklabel type: gpt
Disk identifier: 4A3182B9-23EA-441A-A113-FE904E81BF3E

Device        Start         End     Sectors Size Type
/dev/sdi1      2048    20973567    20971520  10G Linux filesystem
/dev/sdi2  20973568 15002931854 14981958287   7T Linux filesystem

The partitions seem to be aligned properly; what alignment is the bitmap allocator looking for (Ramesh?).
I will debug further and update.

Thanks & Regards
Somnath

[earlier messages quoted in full snipped]

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: Anybody else hitting this panic in latest master with bluestore?
From: Somnath Roy @ 2016-06-07 23:33 UTC
To: Mark Nelson, Sage Weil; +Cc: Ramesh Chander, ceph-devel

Ok, I think I found out what is happening in my environment. This drive is formatted with a 512-byte logical block size. The bitmap allocator works with a 4K block size by default, and the calculation is breaking (?). I reformatted the device with 4K and it worked fine.
I don't think taking this logical block size as a user-supplied parameter is *wise*, since the OS requires every device to advertise its correct logical block size here:

/sys/block/sdb/queue/logical_block_size

The allocator needs to read the correct size from the above location. Sage/Ramesh?

Thanks & Regards
Somnath

-----Original Message-----
From: Somnath Roy
Sent: Tuesday, June 07, 2016 1:12 PM
Subject: RE: Anybody else hitting this panic in latest master with bluestore?

[previous message, crash dump, and fdisk output quoted in full snipped]
* RE: Anybody else hitting this panic in latest master with bluestore?
From: Ramesh Chander @ 2016-06-08 6:00 UTC
To: Somnath Roy, Mark Nelson, Sage Weil; +Cc: ceph-devel

Hi Somnath,

I think setting a 4K block size is done intentionally:

  // Operate as though the block size is 4 KB.  The backing file
  // blksize doesn't strictly matter except that some file systems may
  // require a read/modify/write if we write something smaller than
  // it.
  block_size = g_conf->bdev_block_size;
  if (block_size != (unsigned)st.st_blksize) {
    dout(1) << __func__ << " backing device/file reports st_blksize "
            << st.st_blksize << ", using bdev_block_size "
            << block_size << " anyway" << dendl;
  }

Other than more fragmentation, we should not see any issue from taking the block size as 4K instead of 512. At least none that I am aware of.

How can I reproduce it? I can have a look.

-Ramesh

> -----Original Message-----
> From: Somnath Roy
> Sent: Wednesday, June 08, 2016 5:04 AM
> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>
> [previous messages quoted in full snipped]
> > Thanks, > Mark > > On 06/07/2016 09:23 AM, Somnath Roy wrote: > > Sage, > > I did a global 'make clean' before build, isn't that sufficient ? Still need to go > to rocksdb folder and clean ? > > > > > > -----Original Message----- > > From: Sage Weil [mailto:sage@newdream.net] > > Sent: Tuesday, June 07, 2016 6:06 AM > > To: Mark Nelson > > Cc: Somnath Roy; Ramesh Chander; ceph-devel > > Subject: Re: Anybody else hitting this panic in latest master with bluestore? > > > > On Tue, 7 Jun 2016, Mark Nelson wrote: > >> I believe this is due to the rocksdb submodule update in PR #9466. > >> I'm working on tracking down the commit in rocksdb that's causing it. > > > > Is it possible that the problem is that your build *didn't* update rocksdb? > > > > The ceph makefile isn't smart enough to notice changes in the rocksdb/ dir > and rebuild. You have to 'cd rocksdb ; make clean ; cd ..' after the submodule > updates to get a fresh build. > > > > Maybe you didn't do that, and some of the ceph code is build using the > new headers and data structures that don't match the previously compiled > rocksdb code? > > > > sage > > PLEASE NOTE: The information contained in this electronic mail message is > intended only for the use of the designated recipient(s) named above. If the > reader of this message is not the intended recipient, you are hereby notified > that you have received this message in error and that any review, > dissemination, distribution, or copying of this message is strictly prohibited. If > you have received this communication in error, please notify the sender by > telephone or e-mail (as shown above) immediately and destroy any and all > copies of this message in your possession (whether hard copies or > electronically stored copies). 
> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 15+ messages in thread
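[Editorial note: the failing check discussed above can be reproduced arithmetically from the numbers in this thread. The sketch below is standalone illustration, not Ceph code; the function name is hypothetical, but the constants come straight from the fdisk output and the freelist log: /dev/sdi2 is 14981958287 sectors of 512 bytes, a size that is 512-aligned but not 4096-aligned.]

```cpp
#include <cstdint>

// Block-alignment check equivalent to the assert that fires in
// BitmapFreelistManager::_xor: (offset & block_mask) == offset.
// block_size must be a power of two.
bool is_block_aligned(uint64_t offset, uint64_t block_size) {
    const uint64_t block_mask = ~(block_size - 1);
    return (offset & block_mask) == offset;
}

// From the thread: 14981958287 sectors * 512 bytes = 0x6f9fd151e00 bytes.
// That offset is 512-aligned but not 4096-aligned, so with the default
// bdev_block_size of 4096 the end-of-device offset trips the assert:
//
//   is_block_aligned(0x6f9fd151e00, 512)  -> true
//   is_block_aligned(0x6f9fd151e00, 4096) -> false
```

So the crash is not about partition *start* alignment (both partitions start on 4K boundaries); it is the partition *size* in bytes that is not a multiple of 4K.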
* RE: Anybody else hitting this panic in latest master with bluestore? 2016-06-08 6:00 ` Ramesh Chander @ 2016-06-08 7:14 ` Somnath Roy 2016-07-10 13:52 ` Kevan Rehm 0 siblings, 1 reply; 15+ messages in thread From: Somnath Roy @ 2016-06-08 7:14 UTC (permalink / raw) To: Ramesh Chander, Mark Nelson, Sage Weil; +Cc: ceph-devel Try to format a device with 512 sector size. I will revert back the same device to 512 sector tomorrow and see if I can still reproduce. Here is the verbose log I collected, see if that helps. 2016-06-07 13:32:25.431373 7fce0cee28c0 10 stupidalloc commit_start releasing 0 in extents 0 2016-06-07 13:32:25.431580 7fce0cee28c0 10 stupidalloc commit_finish released 0 in extents 0 2016-06-07 13:32:25.431733 7fce0cee28c0 10 stupidalloc reserve need 1048576 num_free 306824863744 num_reserved 0 2016-06-07 13:32:25.431743 7fce0cee28c0 10 stupidalloc allocate want_size 1048576 alloc_unit 1048576 hint 0 2016-06-07 13:32:25.435021 7fce0cee28c0 4 rocksdb: DB pointer 0x7fce08909200 2016-06-07 13:32:25.435049 7fce0cee28c0 1 bluestore(/var/lib/ceph/osd/ceph-15) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=3,recycle_log_file_num=16 2016-06-07 13:32:25.435057 7fce0cee28c0 20 bluestore(/var/lib/ceph/osd/ceph-15) _open_fm initializing freespace 2016-06-07 13:32:25.435066 7fce0cee28c0 10 freelist _init_misc bytes_per_key 0x80000, key_mask 0xfffffffffff80000 2016-06-07 13:32:25.435074 7fce0cee28c0 10 freelist create rounding blocks up from 0x6f9fd151e00 to 0x6f9fd180000 (0x6f9fd180 blocks) 2016-06-07 13:32:25.438853 7fce0cee28c0 -1 os/bluestore/BitmapFreelistManager.cc: In function 'void BitmapFreelistManager::_xor(uint64_t, uint64_t, KeyValueDB::Transaction)' thread 7fce0cee28c0 time 2016-06-07 13:32:25.435087 os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & block_mask) == offset) ceph version 10.2.0-2021-g55cb608 (55cb608f63787f7969514ad0d7222da68ab84d88) 1: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x562bdda880a0] 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x562bdd75a96d] 3: (BitmapFreelistManager::create(unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x562bdd75b34f] 4: (BlueStore::_open_fm(bool)+0xcd3) [0x562bdd641683] 5: (BlueStore::mkfs()+0x8b9) [0x562bdd6839b9] 6: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x117) [0x562bdd3226c7] 7: (main()+0x1003) [0x562bdd2b4533] 8: (__libc_start_main()+0xf0) [0x7fce09946830] 9: (_start()+0x29) [0x562bdd3038b9] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. Thanks & Regards Somnath -----Original Message----- From: Ramesh Chander Sent: Tuesday, June 07, 2016 11:01 PM To: Somnath Roy; Mark Nelson; Sage Weil Cc: ceph-devel Subject: RE: Anybody else hitting this panic in latest master with bluestore? Hi Somnath, I think setting 4k block size is done intentionally. 127 128 // Operate as though the block size is 4 KB. The backing file 129 // blksize doesn't strictly matter except that some file systems may 130 // require a read/modify/write if we write something smaller than 131 // it. 132 block_size = g_conf->bdev_block_size; 133 if (block_size != (unsigned)st.st_blksize) { 134 dout(1) << __func__ << " backing device/file reports st_blksize " 135 << st.st_blksize << ", using bdev_block_size " 136 << block_size << " anyway" << dendl; 137 } 138 Other than more fragmentation we should not see any issue by taking block size as 4k instead of 512. At least I am not aware of. How to reproduce it? I can have a look. 
-Ramesh > -----Original Message----- > From: Somnath Roy > Sent: Wednesday, June 08, 2016 5:04 AM > To: Somnath Roy; Mark Nelson; Sage Weil > Cc: Ramesh Chander; ceph-devel > Subject: RE: Anybody else hitting this panic in latest master with bluestore? > > Ok , I think I found out what is happening in my environment. This > drive is formatted with 512 logical block size. > BitMap allocator is by default is working with 4K block size and the > calculation is breaking (?). I have reformatted the device with 4K and it worked fine. > I don't think taking this logical block size parameter as user input > may not be *wise*. > Since OS needs that all devices is advertising the correct logical > block size here. > > /sys/block/sdb/queue/logical_block_size > > Allocator needs to read the correct size from the above location. > Sage/Ramesh ? > > Thanks & Regards > Somnath > > -----Original Message----- > From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel- > owner@vger.kernel.org] On Behalf Of Somnath Roy > Sent: Tuesday, June 07, 2016 1:12 PM > To: Mark Nelson; Sage Weil > Cc: Ramesh Chander; ceph-devel > Subject: RE: Anybody else hitting this panic in latest master with bluestore? > > Mark/Sage, > That problem seems to be gone. BTW, rocksdb folder is not cleaned with > 'make clean'. I took latest master and manually clean rocksdb folder > as you suggested.. > But, now I am hitting the following crash in some of my drives. It > seems to be related to block alignment. 
> > 0> 2016-06-07 11:50:12.353375 7f5c0fe938c0 -1 > os/bluestore/BitmapFreelistManager.cc: In function 'void > BitmapFreelistManager::_xor(uint64_t, uint64_t, KeyValueDB::Transaction)' > thread 7f5c0fe938c0 time 2016-06-07 11:50:12.349722 > os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & > block_mask) == offset) > > ceph version 10.2.0-2021-g55cb608 > (55cb608f63787f7969514ad0d7222da68ab84d88) > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x80) [0x5652219dd0a0] > 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, > std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x5652216af96d] > 3: (BitmapFreelistManager::create(unsigned long, > std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x5652216b034f] > 4: (BlueStore::_open_fm(bool)+0xcd3) [0x565221596683] > 5: (BlueStore::mkfs()+0x8b9) [0x5652215d89b9] > 6: (OSD::mkfs(CephContext*, ObjectStore*, > std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > > const&, uuid_d, int)+0x117) [0x5652212776c7] > 7: (main()+0x1003) [0x565221209533] > 8: (__libc_start_main()+0xf0) [0x7f5c0c8f7830] > 9: (_start()+0x29) [0x5652212588b9] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > needed to interpret this. > > Here is my disk partitions.. > > Osd.15 on /dev/sdi crashed.. > > > sdi 8:128 0 7T 0 disk > ├─sdi1 8:129 0 10G 0 part /var/lib/ceph/osd/ceph-15 > └─sdi2 8:130 0 7T 0 part > nvme0n1 259:0 0 15.4G 0 disk > root@emsnode11:~/ceph-master/src# fdisk /dev/sdi > > Welcome to fdisk (util-linux 2.27.1). > Changes will remain in memory only, until you decide to write them. > Be careful before using the write command. 
> > > Command (m for help): p > Disk /dev/sdi: 7 TiB, 7681501126656 bytes, 15002931888 sectors > Units: sectors of 1 * 512 = 512 bytes > Sector size (logical/physical): 512 bytes / 16384 bytes I/O size > (minimum/optimal): 16384 bytes / 16384 bytes Disklabel type: gpt Disk > identifier: 4A3182B9-23EA-441A-A113-FE904E81BF3E > > Device Start End Sectors Size Type > /dev/sdi1 2048 20973567 20971520 10G Linux filesystem > /dev/sdi2 20973568 15002931854 14981958287 7T Linux filesystem > > Seems to be aligned properly , what alignment bitmap allocator is > looking for (Ramesh ?). > I will debug further and update. > > Thanks & Regards > Somnath > > -----Original Message----- > From: Somnath Roy > Sent: Tuesday, June 07, 2016 11:06 AM > To: 'Mark Nelson'; Sage Weil > Cc: Ramesh Chander; ceph-devel > Subject: RE: Anybody else hitting this panic in latest master with bluestore? > > I will try now and let you know. > > Thanks & Regards > Somnath > > -----Original Message----- > From: Mark Nelson [mailto:mnelson@redhat.com] > Sent: Tuesday, June 07, 2016 10:57 AM > To: Somnath Roy; Sage Weil > Cc: Ramesh Chander; ceph-devel > Subject: Re: Anybody else hitting this panic in latest master with bluestore? > > Hi Somnath, > > Did Sage's suggestion fix it for you? In my tests rocksdb wasn't > building properly after an upstream commit to detect when jemalloc > isn't > present: > > https://github.com/facebook/rocksdb/commit/0850bc514737a64dc8ca13de8 > 510fcad4756616a > > I've submitted a fix that is now in master. If you clean the rocksdb > folder and try again with current master I believe it should work for you. > > Thanks, > Mark > > On 06/07/2016 09:23 AM, Somnath Roy wrote: > > Sage, > > I did a global 'make clean' before build, isn't that sufficient ? > > Still need to go > to rocksdb folder and clean ? 
> > > > > > -----Original Message----- > > From: Sage Weil [mailto:sage@newdream.net] > > Sent: Tuesday, June 07, 2016 6:06 AM > > To: Mark Nelson > > Cc: Somnath Roy; Ramesh Chander; ceph-devel > > Subject: Re: Anybody else hitting this panic in latest master with bluestore? > > > > On Tue, 7 Jun 2016, Mark Nelson wrote: > >> I believe this is due to the rocksdb submodule update in PR #9466. > >> I'm working on tracking down the commit in rocksdb that's causing it. > > > > Is it possible that the problem is that your build *didn't* update rocksdb? > > > > The ceph makefile isn't smart enough to notice changes in the > > rocksdb/ dir > and rebuild. You have to 'cd rocksdb ; make clean ; cd ..' after the > submodule updates to get a fresh build. > > > > Maybe you didn't do that, and some of the ceph code is build using > > the > new headers and data structures that don't match the previously > compiled rocksdb code? > > > > sage > > PLEASE NOTE: The information contained in this electronic mail > > message is > intended only for the use of the designated recipient(s) named above. > If the reader of this message is not the intended recipient, you are > hereby notified that you have received this message in error and that > any review, dissemination, distribution, or copying of this message is > strictly prohibited. If you have received this communication in error, > please notify the sender by telephone or e-mail (as shown above) > immediately and destroy any and all copies of this message in your > possession (whether hard copies or electronically stored copies). > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" > in the body of a message to majordomo@vger.kernel.org More majordomo > info at http://vger.kernel.org/majordomo-info.html PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. 
If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).

^ permalink raw reply [flat|nested] 15+ messages in thread
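[Editorial note: Somnath's suggestion above is that the allocator read the advertised logical block size from sysfs rather than trusting a configured value. A minimal sketch of that idea follows; the helper names are hypothetical, not Ceph's actual API, and only the sysfs path itself comes from the thread.]

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: parse the contents of
// /sys/block/<dev>/queue/logical_block_size ("512\n", "4096\n", ...).
// Returns 0 on parse failure so the caller can fall back to a default.
uint32_t parse_logical_block_size(const std::string& contents) {
    std::istringstream in(contents);
    uint32_t size = 0;
    in >> size;
    return size;
}

// Hypothetical caller: read the value the kernel advertises for a
// whole-disk device name such as "sdb".
uint32_t read_logical_block_size(const std::string& dev) {
    std::ifstream f("/sys/block/" + dev + "/queue/logical_block_size");
    std::stringstream buf;
    buf << f.rdbuf();
    return parse_logical_block_size(buf.str());
}
```

Note that for a partition like sdi1 the attribute lives under the parent disk's queue directory, so a real implementation would also have to map the partition back to its parent device.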
* Re: Anybody else hitting this panic in latest master with bluestore? 2016-06-08 7:14 ` Somnath Roy @ 2016-07-10 13:52 ` Kevan Rehm 2016-07-10 14:52 ` Somnath Roy 0 siblings, 1 reply; 15+ messages in thread From: Kevan Rehm @ 2016-07-10 13:52 UTC (permalink / raw) To: Somnath Roy; +Cc: ceph-devel

Somnath,

I hit this same bug while testing bluestore with a PMEM device; ceph-deploy had created a partition whose size did not fall on a 4096-byte boundary. I opened ceph issue 16644 to document the problem; see the issue for a 3-line patch I proposed that fixes it.

Kevan

On 6/8/16, 2:14 AM, "ceph-devel-owner@vger.kernel.org on behalf of Somnath Roy" <ceph-devel-owner@vger.kernel.org on behalf of Somnath.Roy@sandisk.com> wrote:

>Try to format a device with 512 sector size. I will revert back the same
>device to 512 sector tomorrow and see if I can still reproduce. Here is
>the verbose log I collected, see if that helps.
>
>2016-06-07 13:32:25.431373 7fce0cee28c0 10 stupidalloc commit_start releasing 0 in extents 0
>2016-06-07 13:32:25.431580 7fce0cee28c0 10 stupidalloc commit_finish released 0 in extents 0
>2016-06-07 13:32:25.431733 7fce0cee28c0 10 stupidalloc reserve need 1048576 num_free 306824863744 num_reserved 0
>2016-06-07 13:32:25.431743 7fce0cee28c0 10 stupidalloc allocate want_size 1048576 alloc_unit 1048576 hint 0
>2016-06-07 13:32:25.435021 7fce0cee28c0 4 rocksdb: DB pointer 0x7fce08909200
>2016-06-07 13:32:25.435049 7fce0cee28c0 1 bluestore(/var/lib/ceph/osd/ceph-15) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=3,recycle_log_file_num=16
>2016-06-07 13:32:25.435057 7fce0cee28c0 20 bluestore(/var/lib/ceph/osd/ceph-15) _open_fm initializing freespace
>2016-06-07 13:32:25.435066 7fce0cee28c0 10 freelist _init_misc bytes_per_key 0x80000, key_mask 0xfffffffffff80000
>2016-06-07 13:32:25.435074 7fce0cee28c0 10 freelist create rounding blocks up from 0x6f9fd151e00 to
0x6f9fd180000 (0x6f9fd180 blocks) >2016-06-07 13:32:25.438853 7fce0cee28c0 -1 >os/bluestore/BitmapFreelistManager.cc: In function 'void >BitmapFreelistManager::_xor(uint64_t, uint64_t, KeyValueDB::Transaction)' >thread 7fce0cee28c0 time 2016-06-07 13:32:25.435087 >os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & >block_mask) == offset) > > ceph version 10.2.0-2021-g55cb608 >(55cb608f63787f7969514ad0d7222da68ab84d88) > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >const*)+0x80) [0x562bdda880a0] > 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, >std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x562bdd75a96d] > 3: (BitmapFreelistManager::create(unsigned long, >std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x562bdd75b34f] > 4: (BlueStore::_open_fm(bool)+0xcd3) [0x562bdd641683] > 5: (BlueStore::mkfs()+0x8b9) [0x562bdd6839b9] > 6: (OSD::mkfs(CephContext*, ObjectStore*, >std::__cxx11::basic_string<char, std::char_traits<char>, >std::allocator<char> > const&, uuid_d, int)+0x117) [0x562bdd3226c7] > 7: (main()+0x1003) [0x562bdd2b4533] > 8: (__libc_start_main()+0xf0) [0x7fce09946830] > 9: (_start()+0x29) [0x562bdd3038b9] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed >to interpret this. > >Thanks & Regards >Somnath > > >-----Original Message----- >From: Ramesh Chander >Sent: Tuesday, June 07, 2016 11:01 PM >To: Somnath Roy; Mark Nelson; Sage Weil >Cc: ceph-devel >Subject: RE: Anybody else hitting this panic in latest master with >bluestore? > >Hi Somnath, > >I think setting 4k block size is done intentionally. > >127 >128 // Operate as though the block size is 4 KB. The backing file >129 // blksize doesn't strictly matter except that some file systems may >130 // require a read/modify/write if we write something smaller than >131 // it. 
>132 block_size = g_conf->bdev_block_size; >133 if (block_size != (unsigned)st.st_blksize) { >134 dout(1) << __func__ << " backing device/file reports st_blksize " >135 << st.st_blksize << ", using bdev_block_size " >136 << block_size << " anyway" << dendl; >137 } >138 > >Other than more fragmentation we should not see any issue by taking block >size as 4k instead of 512. At least I am not aware of. > >How to reproduce it? I can have a look. > >-Ramesh > >> -----Original Message----- >> From: Somnath Roy >> Sent: Wednesday, June 08, 2016 5:04 AM >> To: Somnath Roy; Mark Nelson; Sage Weil >> Cc: Ramesh Chander; ceph-devel >> Subject: RE: Anybody else hitting this panic in latest master with >>bluestore? >> >> Ok , I think I found out what is happening in my environment. This >> drive is formatted with 512 logical block size. >> BitMap allocator is by default is working with 4K block size and the >> calculation is breaking (?). I have reformatted the device with 4K and >>it worked fine. >> I don't think taking this logical block size parameter as user input >> may not be *wise*. >> Since OS needs that all devices is advertising the correct logical >> block size here. >> >> /sys/block/sdb/queue/logical_block_size >> >> Allocator needs to read the correct size from the above location. >> Sage/Ramesh ? >> >> Thanks & Regards >> Somnath >> >> -----Original Message----- >> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel- >> owner@vger.kernel.org] On Behalf Of Somnath Roy >> Sent: Tuesday, June 07, 2016 1:12 PM >> To: Mark Nelson; Sage Weil >> Cc: Ramesh Chander; ceph-devel >> Subject: RE: Anybody else hitting this panic in latest master with >>bluestore? >> >> Mark/Sage, >> That problem seems to be gone. BTW, rocksdb folder is not cleaned with >> 'make clean'. I took latest master and manually clean rocksdb folder >> as you suggested.. >> But, now I am hitting the following crash in some of my drives. It >> seems to be related to block alignment. 
>> >> 0> 2016-06-07 11:50:12.353375 7f5c0fe938c0 -1 >> os/bluestore/BitmapFreelistManager.cc: In function 'void >> BitmapFreelistManager::_xor(uint64_t, uint64_t, >>KeyValueDB::Transaction)' >> thread 7f5c0fe938c0 time 2016-06-07 11:50:12.349722 >> os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & >> block_mask) == offset) >> >> ceph version 10.2.0-2021-g55cb608 >> (55cb608f63787f7969514ad0d7222da68ab84d88) >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x80) [0x5652219dd0a0] >> 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, >> std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x5652216af96d] >> 3: (BitmapFreelistManager::create(unsigned long, >> std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x5652216b034f] >> 4: (BlueStore::_open_fm(bool)+0xcd3) [0x565221596683] >> 5: (BlueStore::mkfs()+0x8b9) [0x5652215d89b9] >> 6: (OSD::mkfs(CephContext*, ObjectStore*, >> std::__cxx11::basic_string<char, std::char_traits<char>, >> std::allocator<char> >> > const&, uuid_d, int)+0x117) [0x5652212776c7] >> 7: (main()+0x1003) [0x565221209533] >> 8: (__libc_start_main()+0xf0) [0x7f5c0c8f7830] >> 9: (_start()+0x29) [0x5652212588b9] >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is >> needed to interpret this. >> >> Here is my disk partitions.. >> >> Osd.15 on /dev/sdi crashed.. >> >> >> sdi 8:128 0 7T 0 disk >> ├─sdi1 8:129 0 10G 0 part /var/lib/ceph/osd/ceph-15 >> └─sdi2 8:130 0 7T 0 part >> nvme0n1 259:0 0 15.4G 0 disk >> root@emsnode11:~/ceph-master/src# fdisk /dev/sdi >> >> Welcome to fdisk (util-linux 2.27.1). >> Changes will remain in memory only, until you decide to write them. >> Be careful before using the write command. 
>> >> >> Command (m for help): p >> Disk /dev/sdi: 7 TiB, 7681501126656 bytes, 15002931888 sectors >> Units: sectors of 1 * 512 = 512 bytes >> Sector size (logical/physical): 512 bytes / 16384 bytes I/O size >> (minimum/optimal): 16384 bytes / 16384 bytes Disklabel type: gpt Disk >> identifier: 4A3182B9-23EA-441A-A113-FE904E81BF3E >> >> Device Start End Sectors Size Type >> /dev/sdi1 2048 20973567 20971520 10G Linux filesystem >> /dev/sdi2 20973568 15002931854 14981958287 7T Linux filesystem >> >> Seems to be aligned properly , what alignment bitmap allocator is >> looking for (Ramesh ?). >> I will debug further and update. >> >> Thanks & Regards >> Somnath >> >> -----Original Message----- >> From: Somnath Roy >> Sent: Tuesday, June 07, 2016 11:06 AM >> To: 'Mark Nelson'; Sage Weil >> Cc: Ramesh Chander; ceph-devel >> Subject: RE: Anybody else hitting this panic in latest master with >>bluestore? >> >> I will try now and let you know. >> >> Thanks & Regards >> Somnath >> >> -----Original Message----- >> From: Mark Nelson [mailto:mnelson@redhat.com] >> Sent: Tuesday, June 07, 2016 10:57 AM >> To: Somnath Roy; Sage Weil >> Cc: Ramesh Chander; ceph-devel >> Subject: Re: Anybody else hitting this panic in latest master with >>bluestore? >> >> Hi Somnath, >> >> Did Sage's suggestion fix it for you? In my tests rocksdb wasn't >> building properly after an upstream commit to detect when jemalloc >> isn't >> present: >> >> https://github.com/facebook/rocksdb/commit/0850bc514737a64dc8ca13de8 >> 510fcad4756616a >> >> I've submitted a fix that is now in master. If you clean the rocksdb >> folder and try again with current master I believe it should work for >>you. >> >> Thanks, >> Mark >> >> On 06/07/2016 09:23 AM, Somnath Roy wrote: >> > Sage, >> > I did a global 'make clean' before build, isn't that sufficient ? >> > Still need to go >> to rocksdb folder and clean ? 
>> > >> > >> > -----Original Message----- >> > From: Sage Weil [mailto:sage@newdream.net] >> > Sent: Tuesday, June 07, 2016 6:06 AM >> > To: Mark Nelson >> > Cc: Somnath Roy; Ramesh Chander; ceph-devel >> > Subject: Re: Anybody else hitting this panic in latest master with >>bluestore? >> > >> > On Tue, 7 Jun 2016, Mark Nelson wrote: >> >> I believe this is due to the rocksdb submodule update in PR #9466. >> >> I'm working on tracking down the commit in rocksdb that's causing it. >> > >> > Is it possible that the problem is that your build *didn't* update >>rocksdb? >> > >> > The ceph makefile isn't smart enough to notice changes in the >> > rocksdb/ dir >> and rebuild. You have to 'cd rocksdb ; make clean ; cd ..' after the >> submodule updates to get a fresh build. >> > >> > Maybe you didn't do that, and some of the ceph code is build using >> > the >> new headers and data structures that don't match the previously >> compiled rocksdb code? >> > >> > sage >> > PLEASE NOTE: The information contained in this electronic mail >> > message is >> intended only for the use of the designated recipient(s) named above. >> If the reader of this message is not the intended recipient, you are >> hereby notified that you have received this message in error and that >> any review, dissemination, distribution, or copying of this message is >> strictly prohibited. If you have received this communication in error, >> please notify the sender by telephone or e-mail (as shown above) >> immediately and destroy any and all copies of this message in your >> possession (whether hard copies or electronically stored copies). >> > >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" >> in the body of a message to majordomo@vger.kernel.org More majordomo >> info at http://vger.kernel.org/majordomo-info.html >PLEASE NOTE: The information contained in this electronic mail message is >intended only for the use of the designated recipient(s) named above. 
If >the reader of this message is not the intended recipient, you are hereby >notified that you have received this message in error and that any >review, dissemination, distribution, or copying of this message is >strictly prohibited. If you have received this communication in error, >please notify the sender by telephone or e-mail (as shown above) >immediately and destroy any and all copies of this message in your >possession (whether hard copies or electronically stored copies). >-- >To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
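[Editorial note: Kevan's actual 3-line patch is in ceph issue 16644, which is not reproduced in this thread. The sketch below only illustrates the underlying idea he describes, clamping an odd-sized device to the allocator block size; the function name is hypothetical.]

```cpp
#include <cstdint>

// Round a device/partition size down to a multiple of the allocator
// block size, so every offset handed to the freelist satisfies
// (offset & block_mask) == offset. block_size must be a power of two.
uint64_t align_down(uint64_t size, uint64_t block_size) {
    return size & ~(block_size - 1);
}

// For the 7T partition in this thread (0x6f9fd151e00 bytes), the
// trailing partial 4K block is simply ignored:
//   align_down(0x6f9fd151e00, 4096) -> 0x6f9fd151000
```

Rounding the size down (rather than rounding the block count up, as the "freelist create rounding blocks up" log line shows the code doing) keeps every bitmap offset inside the device and 4K-aligned.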
* RE: Anybody else hitting this panic in latest master with bluestore? 2016-07-10 13:52 ` Kevan Rehm @ 2016-07-10 14:52 ` Somnath Roy 2016-07-10 15:15 ` Ramesh Chander 0 siblings, 1 reply; 15+ messages in thread From: Somnath Roy @ 2016-07-10 14:52 UTC (permalink / raw) To: Kevan Rehm; +Cc: ceph-devel

Thanks, Kevan, for confirming this. After I properly reformatted the drives I didn't hit the issue, so I didn't bother chasing it.

Ramesh, could you please look into this?

Regards
Somnath

-----Original Message-----
From: Kevan Rehm [mailto:krehm@cray.com]
Sent: Sunday, July 10, 2016 6:53 AM
To: Somnath Roy
Cc: ceph-devel
Subject: Re: Anybody else hitting this panic in latest master with bluestore?

Somnath,

I hit this same bug while testing bluestore with a PMEM device; ceph-deploy had created a partition whose size did not fall on a 4096-byte boundary. I opened ceph issue 16644 to document the problem; see the issue for a 3-line patch I proposed that fixes it.

Kevan

On 6/8/16, 2:14 AM, "ceph-devel-owner@vger.kernel.org on behalf of Somnath Roy" <ceph-devel-owner@vger.kernel.org on behalf of Somnath.Roy@sandisk.com> wrote:

>Try to format a device with 512 sector size. I will revert back the
>same device to 512 sector tomorrow and see if I can still reproduce.
>Here is the verbose log I collected, see if that helps.
> >2016-06-07 13:32:25.431373 7fce0cee28c0 10 stupidalloc commit_start >releasing 0 in extents 0 >2016-06-07 13:32:25.431580 7fce0cee28c0 10 stupidalloc commit_finish >released 0 in extents 0 >2016-06-07 13:32:25.431733 7fce0cee28c0 10 stupidalloc reserve need >1048576 num_free 306824863744 num_reserved 0 >2016-06-07 13:32:25.431743 7fce0cee28c0 10 stupidalloc allocate >want_size >1048576 alloc_unit 1048576 hint 0 >2016-06-07 13:32:25.435021 7fce0cee28c0 4 rocksdb: DB pointer >0x7fce08909200 >2016-06-07 13:32:25.435049 7fce0cee28c0 1 >bluestore(/var/lib/ceph/osd/ceph-15) _open_db opened rocksdb path db >options >compression=kNoCompression,max_write_buffer_number=16,min_write_buffer_ >num >ber_to_merge=3,recycle_log_file_num=16 >2016-06-07 13:32:25.435057 7fce0cee28c0 20 >bluestore(/var/lib/ceph/osd/ceph-15) _open_fm initializing freespace >2016-06-07 13:32:25.435066 7fce0cee28c0 10 freelist _init_misc >bytes_per_key 0x80000, key_mask 0xfffffffffff80000 >2016-06-07 13:32:25.435074 7fce0cee28c0 10 freelist create rounding >blocks up from 0x6f9fd151e00 to 0x6f9fd180000 (0x6f9fd180 blocks) >2016-06-07 13:32:25.438853 7fce0cee28c0 -1 >os/bluestore/BitmapFreelistManager.cc: In function 'void >BitmapFreelistManager::_xor(uint64_t, uint64_t, KeyValueDB::Transaction)' >thread 7fce0cee28c0 time 2016-06-07 13:32:25.435087 >os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & >block_mask) == offset) > > ceph version 10.2.0-2021-g55cb608 >(55cb608f63787f7969514ad0d7222da68ab84d88) > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >const*)+0x80) [0x562bdda880a0] > 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, >std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x562bdd75a96d] > 3: (BitmapFreelistManager::create(unsigned long, >std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x562bdd75b34f] > 4: (BlueStore::_open_fm(bool)+0xcd3) [0x562bdd641683] > 5: (BlueStore::mkfs()+0x8b9) [0x562bdd6839b9] > 6: (OSD::mkfs(CephContext*, 
ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x117) [0x562bdd3226c7]
> 7: (main()+0x1003) [0x562bdd2b4533]
> 8: (__libc_start_main()+0xf0) [0x7fce09946830]
> 9: (_start()+0x29) [0x562bdd3038b9]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
>Thanks & Regards
>Somnath
>
>-----Original Message-----
>From: Ramesh Chander
>Sent: Tuesday, June 07, 2016 11:01 PM
>To: Somnath Roy; Mark Nelson; Sage Weil
>Cc: ceph-devel
>Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>
>Hi Somnath,
>
>I think setting the 4k block size is intentional:
>
>127
>128   // Operate as though the block size is 4 KB.  The backing file
>129   // blksize doesn't strictly matter except that some file systems may
>130   // require a read/modify/write if we write something smaller than
>131   // it.
>132   block_size = g_conf->bdev_block_size;
>133   if (block_size != (unsigned)st.st_blksize) {
>134     dout(1) << __func__ << " backing device/file reports st_blksize "
>135             << st.st_blksize << ", using bdev_block_size "
>136             << block_size << " anyway" << dendl;
>137   }
>138
>
>Other than more fragmentation, we should not see any issue from taking the block size as 4k instead of 512. At least none that I am aware of.
>
>How do I reproduce it? I can have a look.
>
>-Ramesh
>
>> -----Original Message-----
>> From: Somnath Roy
>> Sent: Wednesday, June 08, 2016 5:04 AM
>> To: Somnath Roy; Mark Nelson; Sage Weil
>> Cc: Ramesh Chander; ceph-devel
>> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>>
>> Ok, I think I found out what is happening in my environment. This drive is formatted with a 512-byte logical block size.
>> The bitmap allocator by default works with a 4K block size, and the calculation is breaking (?). I have reformatted the device with 4K and it worked fine.
>> I don't think taking this logical block size parameter as user input is *wise*,
>> since the OS requires that all devices advertise the correct logical block size here:
>>
>> /sys/block/sdb/queue/logical_block_size
>>
>> The allocator needs to read the correct size from the above location.
>> Sage/Ramesh ?
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
>> Sent: Tuesday, June 07, 2016 1:12 PM
>> To: Mark Nelson; Sage Weil
>> Cc: Ramesh Chander; ceph-devel
>> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>>
>> Mark/Sage,
>> That problem seems to be gone. BTW, the rocksdb folder is not cleaned by 'make clean'. I took latest master and manually cleaned the rocksdb folder as you suggested.
>> But now I am hitting the following crash on some of my drives. It seems to be related to block alignment.
>>
>>     0> 2016-06-07 11:50:12.353375 7f5c0fe938c0 -1 os/bluestore/BitmapFreelistManager.cc: In function 'void BitmapFreelistManager::_xor(uint64_t, uint64_t, KeyValueDB::Transaction)' thread 7f5c0fe938c0 time 2016-06-07 11:50:12.349722
>> os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & block_mask) == offset)
>>
>> ceph version 10.2.0-2021-g55cb608 (55cb608f63787f7969514ad0d7222da68ab84d88)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5652219dd0a0]
>> 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x5652216af96d]
>> 3: (BitmapFreelistManager::create(unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x5652216b034f]
>> 4: (BlueStore::_open_fm(bool)+0xcd3) [0x565221596683]
>> 5: (BlueStore::mkfs()+0x8b9) [0x5652215d89b9]
>> 6: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, uuid_d, int)+0x117) [0x5652212776c7]
>> 7: (main()+0x1003) [0x565221209533]
>> 8: (__libc_start_main()+0xf0) [0x7f5c0c8f7830]
>> 9: (_start()+0x29) [0x5652212588b9]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> Here are my disk partitions. Osd.15 on /dev/sdi crashed:
>>
>> sdi       8:128  0    7T  0 disk
>> ├─sdi1    8:129  0   10G  0 part /var/lib/ceph/osd/ceph-15
>> └─sdi2    8:130  0    7T  0 part
>> nvme0n1 259:0    0 15.4G  0 disk
>>
>> root@emsnode11:~/ceph-master/src# fdisk /dev/sdi
>>
>> Welcome to fdisk (util-linux 2.27.1).
>> Changes will remain in memory only, until you decide to write them.
>> Be careful before using the write command.
>>
>> Command (m for help): p
>> Disk /dev/sdi: 7 TiB, 7681501126656 bytes, 15002931888 sectors
>> Units: sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 16384 bytes
>> I/O size (minimum/optimal): 16384 bytes / 16384 bytes
>> Disklabel type: gpt
>> Disk identifier: 4A3182B9-23EA-441A-A113-FE904E81BF3E
>>
>> Device         Start         End     Sectors Size Type
>> /dev/sdi1       2048    20973567    20971520  10G Linux filesystem
>> /dev/sdi2   20973568 15002931854 14981958287   7T Linux filesystem
>>
>> The partitions seem to be aligned properly; what alignment is the bitmap allocator looking for (Ramesh?)
>> I will debug further and update.
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Somnath Roy
>> Sent: Tuesday, June 07, 2016 11:06 AM
>> To: 'Mark Nelson'; Sage Weil
>> Cc: Ramesh Chander; ceph-devel
>> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>>
>> I will try now and let you know.
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mnelson@redhat.com]
>> Sent: Tuesday, June 07, 2016 10:57 AM
>> To: Somnath Roy; Sage Weil
>> Cc: Ramesh Chander; ceph-devel
>> Subject: Re: Anybody else hitting this panic in latest master with bluestore?
>>
>> Hi Somnath,
>>
>> Did Sage's suggestion fix it for you?  In my tests rocksdb wasn't building properly after an upstream commit to detect when jemalloc isn't present:
>>
>> https://github.com/facebook/rocksdb/commit/0850bc514737a64dc8ca13de8510fcad4756616a
>>
>> I've submitted a fix that is now in master.  If you clean the rocksdb folder and try again with current master I believe it should work for you.
>>
>> Thanks,
>> Mark
>>
>> On 06/07/2016 09:23 AM, Somnath Roy wrote:
>> > Sage,
>> > I did a global 'make clean' before the build; isn't that sufficient? Do I still need to go to the rocksdb folder and clean?
>> >
>> > -----Original Message-----
>> > From: Sage Weil [mailto:sage@newdream.net]
>> > Sent: Tuesday, June 07, 2016 6:06 AM
>> > To: Mark Nelson
>> > Cc: Somnath Roy; Ramesh Chander; ceph-devel
>> > Subject: Re: Anybody else hitting this panic in latest master with bluestore?
>> >
>> > On Tue, 7 Jun 2016, Mark Nelson wrote:
>> >> I believe this is due to the rocksdb submodule update in PR #9466.
>> >> I'm working on tracking down the commit in rocksdb that's causing it.
>> >
>> > Is it possible that the problem is that your build *didn't* update rocksdb?
>> >
>> > The ceph makefile isn't smart enough to notice changes in the rocksdb/ dir and rebuild.  You have to 'cd rocksdb ; make clean ; cd ..' after the submodule updates to get a fresh build.
>> >
>> > Maybe you didn't do that, and some of the ceph code is built using the new headers and data structures that don't match the previously compiled rocksdb code?
>> >
>> > sage
>> > PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above.
>> If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread
* RE: Anybody else hitting this panic in latest master with bluestore?
  2016-07-10 14:52 ` Somnath Roy
@ 2016-07-10 15:15 ` Ramesh Chander
  2016-07-10 15:57 ` Kevan Rehm
  0 siblings, 1 reply; 15+ messages in thread
From: Ramesh Chander @ 2016-07-10 15:15 UTC (permalink / raw)
To: Somnath Roy, Kevan Rehm; +Cc: ceph-devel

I think there are some calculations that expect storage to be 4k aligned in both allocators.

I will look into it.

-Ramesh

> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Sunday, July 10, 2016 8:22 PM
> To: Kevan Rehm
> Cc: ceph-devel
> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>
> Thanks Kevan for confirming this.
> After I properly reformatted the drives, I didn't hit the issue, so I didn't bother chasing it.
> Ramesh,
> Could you please look into this ?
>
> Regards
> Somnath
>
> -----Original Message-----
> From: Kevan Rehm [mailto:krehm@cray.com]
> Sent: Sunday, July 10, 2016 6:53 AM
> To: Somnath Roy
> Cc: ceph-devel
> Subject: Re: Anybody else hitting this panic in latest master with bluestore?
>
> Somnath,
>
> I hit this same bug while testing bluestore with a PMEM device; ceph-deploy created a partition whose size did not fall on a 4096-byte boundary.
>
> I opened ceph issue 16644 to document the problem; see the issue for a 3-line patch I proposed that fixes it.
>
> Kevan
>
>
> On 6/8/16, 2:14 AM, "ceph-devel-owner@vger.kernel.org on behalf of Somnath Roy" <ceph-devel-owner@vger.kernel.org on behalf of Somnath.Roy@sandisk.com> wrote:
>
> >Try to format a device with a 512 sector size. I will revert the same device back to 512 sectors tomorrow and see if I can still reproduce.
> >Here is the verbose log I collected, see if that helps.
> >[...]

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Anybody else hitting this panic in latest master with bluestore?
  2016-07-10 15:15 ` Ramesh Chander
@ 2016-07-10 15:57 ` Kevan Rehm
  0 siblings, 0 replies; 15+ messages in thread
From: Kevan Rehm @ 2016-07-10 15:57 UTC (permalink / raw)
To: Ramesh Chander, Somnath Roy; +Cc: ceph-devel

Ramesh,

Yes, and I'm not suggesting a change to that. Bluestore already has some logic in it to "round down" the size of the block device to a blocks_per_key boundary, by marking any trailing blocks as "in-use". I just tweaked that code to detect and include any trailing partial block in the range to be marked as in-use.

Kevan

On 7/10/16, 10:15 AM, "Ramesh Chander" <Ramesh.Chander@sandisk.com> wrote:

>I think there are some calculations that expect storage to be 4k aligned
>in both allocators.
>
>I will look into it.
>
>-Ramesh
>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Somnath Roy
>> Sent: Sunday, July 10, 2016 8:22 PM
>> To: Kevan Rehm
>> Cc: ceph-devel
>> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>>
>> Thanks Kevan for confirming this.
>> After I properly reformatted the drives, I didn't hit the issue, so I didn't bother chasing it.
>> Ramesh,
>> Could you please look into this ?
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Kevan Rehm [mailto:krehm@cray.com]
>> Sent: Sunday, July 10, 2016 6:53 AM
>> To: Somnath Roy
>> Cc: ceph-devel
>> Subject: Re: Anybody else hitting this panic in latest master with bluestore?
>>
>> Somnath,
>>
>> I hit this same bug while testing bluestore with a PMEM device; ceph-deploy created a partition whose size did not fall on a 4096-byte boundary.
>>
>> I opened ceph issue 16644 to document the problem; see the issue for a 3-line patch I proposed that fixes it.
>> >> Kevan >> >> >> On 6/8/16, 2:14 AM, "ceph-devel-owner@vger.kernel.org on behalf of >> Somnath Roy" <ceph-devel-owner@vger.kernel.org on behalf of >> Somnath.Roy@sandisk.com> wrote: >> >> >Try to format a device with 512 sector size. I will revert back the >> >same device to 512 sector tomorrow and see if I can still reproduce. >> >Here is the verbose log I collected, see if that helps. >> > >> >2016-06-07 13:32:25.431373 7fce0cee28c0 10 stupidalloc commit_start >> >releasing 0 in extents 0 >> >2016-06-07 13:32:25.431580 7fce0cee28c0 10 stupidalloc commit_finish >> >released 0 in extents 0 >> >2016-06-07 13:32:25.431733 7fce0cee28c0 10 stupidalloc reserve need >> >1048576 num_free 306824863744 num_reserved 0 >> >2016-06-07 13:32:25.431743 7fce0cee28c0 10 stupidalloc allocate >> >want_size >> >1048576 alloc_unit 1048576 hint 0 >> >2016-06-07 13:32:25.435021 7fce0cee28c0 4 rocksdb: DB pointer >> >0x7fce08909200 >> >2016-06-07 13:32:25.435049 7fce0cee28c0 1 >> >bluestore(/var/lib/ceph/osd/ceph-15) _open_db opened rocksdb path db >> >options >> >compression=kNoCompression,max_write_buffer_number=16,min_write_ >> buffer_ >> >num >> >ber_to_merge=3,recycle_log_file_num=16 >> >2016-06-07 13:32:25.435057 7fce0cee28c0 20 >> >bluestore(/var/lib/ceph/osd/ceph-15) _open_fm initializing freespace >> >2016-06-07 13:32:25.435066 7fce0cee28c0 10 freelist _init_misc >> >bytes_per_key 0x80000, key_mask 0xfffffffffff80000 >> >2016-06-07 13:32:25.435074 7fce0cee28c0 10 freelist create rounding >> >blocks up from 0x6f9fd151e00 to 0x6f9fd180000 (0x6f9fd180 blocks) >> >2016-06-07 13:32:25.438853 7fce0cee28c0 -1 >> >os/bluestore/BitmapFreelistManager.cc: In function 'void >> >BitmapFreelistManager::_xor(uint64_t, uint64_t, >>KeyValueDB::Transaction)' >> >thread 7fce0cee28c0 time 2016-06-07 13:32:25.435087 >> >os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & >> >block_mask) == offset) >> > >> > ceph version 10.2.0-2021-g55cb608 >> 
>(55cb608f63787f7969514ad0d7222da68ab84d88) >> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> >const*)+0x80) [0x562bdda880a0] >> > 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, >> >std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) >> [0x562bdd75a96d] >> > 3: (BitmapFreelistManager::create(unsigned long, >> >std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x562bdd75b34f] >> > 4: (BlueStore::_open_fm(bool)+0xcd3) [0x562bdd641683] >> > 5: (BlueStore::mkfs()+0x8b9) [0x562bdd6839b9] >> > 6: (OSD::mkfs(CephContext*, ObjectStore*, >> >std::__cxx11::basic_string<char, std::char_traits<char>, >> >std::allocator<char> > const&, uuid_d, int)+0x117) [0x562bdd3226c7] >> > 7: (main()+0x1003) [0x562bdd2b4533] >> > 8: (__libc_start_main()+0xf0) [0x7fce09946830] >> > 9: (_start()+0x29) [0x562bdd3038b9] >> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is >> >needed to interpret this. >> > >> >Thanks & Regards >> >Somnath >> > >> > >> >-----Original Message----- >> >From: Ramesh Chander >> >Sent: Tuesday, June 07, 2016 11:01 PM >> >To: Somnath Roy; Mark Nelson; Sage Weil >> >Cc: ceph-devel >> >Subject: RE: Anybody else hitting this panic in latest master with >> >bluestore? >> > >> >Hi Somnath, >> > >> >I think setting 4k block size is done intentionally. >> > >> >127 >> >128 // Operate as though the block size is 4 KB. The backing file >> >129 // blksize doesn't strictly matter except that some file systems >>may >> >130 // require a read/modify/write if we write something smaller than >> >131 // it. >> >132 block_size = g_conf->bdev_block_size; >> >133 if (block_size != (unsigned)st.st_blksize) { >> >134 dout(1) << __func__ << " backing device/file reports >>st_blksize " >> >135 << st.st_blksize << ", using bdev_block_size " >> >136 << block_size << " anyway" << dendl; >> >137 } >> >138 >> > >> >Other than more fragmentation we should not see any issue by taking >> >block size as 4k instead of 512. 
At least I am not aware of. >> > >> >How to reproduce it? I can have a look. >> > >> >-Ramesh >> > >> >> -----Original Message----- >> >> From: Somnath Roy >> >> Sent: Wednesday, June 08, 2016 5:04 AM >> >> To: Somnath Roy; Mark Nelson; Sage Weil >> >> Cc: Ramesh Chander; ceph-devel >> >> Subject: RE: Anybody else hitting this panic in latest master with >> >>bluestore? >> >> >> >> Ok , I think I found out what is happening in my environment. This >> >>drive is formatted with 512 logical block size. >> >> BitMap allocator is by default is working with 4K block size and the >> >>calculation is breaking (?). I have reformatted the device with 4K and >> >>it worked fine. >> >> I don't think taking this logical block size parameter as user input >> >>may not be *wise*. >> >> Since OS needs that all devices is advertising the correct logical >> >>block size here. >> >> >> >> /sys/block/sdb/queue/logical_block_size >> >> >> >> Allocator needs to read the correct size from the above location. >> >> Sage/Ramesh ? >> >> >> >> Thanks & Regards >> >> Somnath >> >> >> >> -----Original Message----- >> >> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel- >> >>owner@vger.kernel.org] On Behalf Of Somnath Roy >> >> Sent: Tuesday, June 07, 2016 1:12 PM >> >> To: Mark Nelson; Sage Weil >> >> Cc: Ramesh Chander; ceph-devel >> >> Subject: RE: Anybody else hitting this panic in latest master with >> >>bluestore? >> >> >> >> Mark/Sage, >> >> That problem seems to be gone. BTW, rocksdb folder is not cleaned >> >> with 'make clean'. I took latest master and manually clean rocksdb >> >> folder as you suggested.. >> >> But, now I am hitting the following crash in some of my drives. It >> >> seems to be related to block alignment. 
>> >>
>> >>      0> 2016-06-07 11:50:12.353375 7f5c0fe938c0 -1 os/bluestore/BitmapFreelistManager.cc: In function 'void BitmapFreelistManager::_xor(uint64_t, uint64_t, KeyValueDB::Transaction)' thread 7f5c0fe938c0 time 2016-06-07 11:50:12.349722
>> >> os/bluestore/BitmapFreelistManager.cc: 477: FAILED assert((offset & block_mask) == offset)
>> >>
>> >> ceph version 10.2.0-2021-g55cb608 (55cb608f63787f7969514ad0d7222da68ab84d88)
>> >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5652219dd0a0]
>> >> 2: (BitmapFreelistManager::_xor(unsigned long, unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x12ed) [0x5652216af96d]
>> >> 3: (BitmapFreelistManager::create(unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x33f) [0x5652216b034f]
>> >> 4: (BlueStore::_open_fm(bool)+0xcd3) [0x565221596683]
>> >> 5: (BlueStore::mkfs()+0x8b9) [0x5652215d89b9]
>> >> 6: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x117) [0x5652212776c7]
>> >> 7: (main()+0x1003) [0x565221209533]
>> >> 8: (__libc_start_main()+0xf0) [0x7f5c0c8f7830]
>> >> 9: (_start()+0x29) [0x5652212588b9]
>> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >>
>> >> Here are my disk partitions.
>> >>
>> >> osd.15 on /dev/sdi crashed:
>> >>
>> >> sdi       8:128  0    7T  0 disk
>> >> ├─sdi1    8:129  0   10G  0 part /var/lib/ceph/osd/ceph-15
>> >> └─sdi2    8:130  0    7T  0 part
>> >> nvme0n1 259:0    0 15.4G  0 disk
>> >>
>> >> root@emsnode11:~/ceph-master/src# fdisk /dev/sdi
>> >>
>> >> Welcome to fdisk (util-linux 2.27.1).
>> >> Changes will remain in memory only, until you decide to write them.
>> >> Be careful before using the write command.
>> >>
>> >> Command (m for help): p
>> >> Disk /dev/sdi: 7 TiB, 7681501126656 bytes, 15002931888 sectors
>> >> Units: sectors of 1 * 512 = 512 bytes
>> >> Sector size (logical/physical): 512 bytes / 16384 bytes
>> >> I/O size (minimum/optimal): 16384 bytes / 16384 bytes
>> >> Disklabel type: gpt
>> >> Disk identifier: 4A3182B9-23EA-441A-A113-FE904E81BF3E
>> >>
>> >> Device         Start         End     Sectors Size Type
>> >> /dev/sdi1       2048    20973567    20971520  10G Linux filesystem
>> >> /dev/sdi2   20973568 15002931854 14981958287   7T Linux filesystem
>> >>
>> >> It seems to be aligned properly. What alignment is the bitmap allocator looking for (Ramesh?)?
>> >> I will debug further and update.
>> >>
>> >> Thanks & Regards
>> >> Somnath
>> >>
>> >> -----Original Message-----
>> >> From: Somnath Roy
>> >> Sent: Tuesday, June 07, 2016 11:06 AM
>> >> To: 'Mark Nelson'; Sage Weil
>> >> Cc: Ramesh Chander; ceph-devel
>> >> Subject: RE: Anybody else hitting this panic in latest master with bluestore?
>> >>
>> >> I will try now and let you know.
>> >>
>> >> Thanks & Regards
>> >> Somnath
>> >>
>> >> -----Original Message-----
>> >> From: Mark Nelson [mailto:mnelson@redhat.com]
>> >> Sent: Tuesday, June 07, 2016 10:57 AM
>> >> To: Somnath Roy; Sage Weil
>> >> Cc: Ramesh Chander; ceph-devel
>> >> Subject: Re: Anybody else hitting this panic in latest master with bluestore?
>> >>
>> >> Hi Somnath,
>> >>
>> >> Did Sage's suggestion fix it for you? In my tests rocksdb wasn't building properly after an upstream commit to detect when jemalloc isn't present:
>> >>
>> >> https://github.com/facebook/rocksdb/commit/0850bc514737a64dc8ca13de8510fcad4756616a
>> >>
>> >> I've submitted a fix that is now in master. If you clean the rocksdb folder and try again with current master I believe it should work for you.
>> >>
>> >> Thanks,
>> >> Mark
>> >>
>> >> On 06/07/2016 09:23 AM, Somnath Roy wrote:
>> >> > Sage,
>> >> > I did a global 'make clean' before the build; isn't that sufficient? Do I still need to go to the rocksdb folder and clean?
>> >> >
>> >> > -----Original Message-----
>> >> > From: Sage Weil [mailto:sage@newdream.net]
>> >> > Sent: Tuesday, June 07, 2016 6:06 AM
>> >> > To: Mark Nelson
>> >> > Cc: Somnath Roy; Ramesh Chander; ceph-devel
>> >> > Subject: Re: Anybody else hitting this panic in latest master with bluestore?
>> >> >
>> >> > On Tue, 7 Jun 2016, Mark Nelson wrote:
>> >> >> I believe this is due to the rocksdb submodule update in PR #9466. I'm working on tracking down the commit in rocksdb that's causing it.
>> >> >
>> >> > Is it possible that the problem is that your build *didn't* update rocksdb?
>> >> >
>> >> > The ceph makefile isn't smart enough to notice changes in the rocksdb/ dir and rebuild. You have to 'cd rocksdb ; make clean ; cd ..' after the submodule updates to get a fresh build.
>> >> >
>> >> > Maybe you didn't do that, and some of the ceph code is built using the new headers and data structures that don't match the previously compiled rocksdb code?
>> >> >
>> >> > sage
>> >> > PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>> >> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
end of thread, other threads:[~2016-07-10 16:06 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-07  4:59 Anybody else hitting this panic in latest master with bluestore? Ramesh Chander
2016-06-07  6:37 ` Somnath Roy
2016-06-07 11:33 ` Mark Nelson
2016-06-07 13:05 ` Sage Weil
2016-06-07 14:23 ` Somnath Roy
2016-06-07 17:57 ` Mark Nelson
2016-06-07 18:05 ` Somnath Roy
2016-06-07 20:12 ` Somnath Roy
2016-06-07 23:33 ` Somnath Roy
2016-06-08  6:00 ` Ramesh Chander
2016-06-08  7:14 ` Somnath Roy
2016-07-10 13:52 ` Kevan Rehm
2016-07-10 14:52 ` Somnath Roy
2016-07-10 15:15 ` Ramesh Chander
2016-07-10 15:57 ` Kevan Rehm