From: "Kamble, Nitin A" <Nitin.Kamble@Teradata.com>
To: Sage Weil <sage@newdream.net>
Cc: Somnath Roy <Somnath.Roy@sandisk.com>,
	Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: Bluestore OSD support in ceph-disk
Date: Fri, 16 Sep 2016 23:09:23 +0000	[thread overview]
Message-ID: <B3E46566-5536-4414-8A51-7223ABDA463E@Teradata.com> (raw)
In-Reply-To: <alpine.DEB.2.11.1609162053270.1040@piezo.us.to>


> On Sep 16, 2016, at 1:54 PM, Sage Weil <sage@newdream.net> wrote:
> 
> On Fri, 16 Sep 2016, Kamble, Nitin A wrote:
>>> On Sep 16, 2016, at 12:23 PM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>> 
>>> How did you configure bluestore, all default? i.e. all in a single partition, no separate partition for db/wal?
>> It uses separate partitions for data (SSD), wal (SSD), rocksdb (SSD), & block store (HDD).
>> 
>>> Wondering if you are out of db space/disk space?
>> I notice a misconfiguration on the cluster now. The wal & db partition assignments got swapped, so the db is getting just a 128MB partition now. Probably this is the cause of the assert.
> 
> FWIW bluefs is supposed to fall back on any allocation failure to the next 
> larger/slower device (wal -> db -> primary), so having a tiny wal or tiny 
> db shouldn't actually matter.  A bluefs log (debug bluefs = 10 or 20) 
> leading up to any crash there would be helpful.
> 
> Thanks!
> sage
> 

Good to know about this fallback mechanism.
In my previous run the partition sizes in the config did not match the actual partitions. I saw ceph-daemon-dump
showing 900MB+ used for the db while the db partition was only 128MB. I thought it had started overwriting
the next partition, but as per this fallback logic it instead started using the HDD for the db.
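
A rough sketch of that fallback order as I understand it (this is not the actual BlueFS code; the device names and sizes below are made up for illustration):

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// One entry per BlueFS device, in fallback order: wal -> db -> slow.
struct Device {
  std::string name;
  uint64_t free_bytes;
};

// Try each device in order; return the index of the first device that can
// satisfy the allocation, or -1 if none can (which is where an
// "allocate failed" assert would fire).
int allocate_with_fallback(std::vector<Device>& devs, uint64_t want) {
  for (size_t i = 0; i < devs.size(); ++i) {
    if (devs[i].free_bytes >= want) {
      devs[i].free_bytes -= want;
      return static_cast<int>(i);
    }
    std::cout << devs[i].name << " is full, falling back to next device\n";
  }
  return -1;
}

int main() {
  // A tiny 128MB db partition (as in my misconfigured cluster) plus a big HDD.
  std::vector<Device> devs = {
      {"wal (SSD)", 512ull << 20},
      {"db (SSD)", 128ull << 20},
      {"slow (HDD)", 4ull << 40},
  };
  int idx = allocate_with_fallback(devs, 900ull << 20);  // ~900MB of db data
  if (idx >= 0)
    std::cout << "allocated on " << devs[idx].name << "\n";
  else
    std::cout << "allocate failed\n";
  return 0;
}

So the 900MB+ of db data simply spills onto the slow device rather than corrupting a neighbouring partition.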

One issue I see is that ceph.conf lists the sizes of the wal, db, & block devices, but the actual
partitions may have different sizes. From the ceph-daemon-dump output it looks like the code is not
reading the partitions' real sizes; instead it takes the sizes from the config file as the partition
sizes. I think probing the size of the existing devices/files would be better than blindly trusting
the sizes in the config file.
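
For illustration, a minimal sketch of what I mean by probing (plain Linux calls, not Ceph code; the /dev/sdb2 path is just a placeholder):

#include <fcntl.h>
#include <linux/fs.h>     // BLKGETSIZE64
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

// Return the size in bytes of a block device or regular file, or 0 on error.
static uint64_t probe_size(const char* path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return 0;
  uint64_t bytes = 0;
  struct stat st;
  if (fstat(fd, &st) == 0) {
    if (S_ISBLK(st.st_mode)) {
      // Ask the kernel for the real partition size.
      if (ioctl(fd, BLKGETSIZE64, &bytes) != 0) bytes = 0;
    } else {
      // Regular file (e.g. a file-backed db): use its current size.
      bytes = static_cast<uint64_t>(st.st_size);
    }
  }
  close(fd);
  return bytes;
}

int main(int argc, char** argv) {
  const char* path = (argc > 1) ? argv[1] : "/dev/sdb2";  // hypothetical db partition
  printf("%s: %llu bytes\n", path, (unsigned long long)probe_size(path));
  return 0;
}

With something like this the daemon would report the partition's real 128MB rather than whatever size is written in ceph.conf.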

After 5 hours or so, 6+ OSDs were down out of 30.
We will be running the stress test once again with the fixed partition configuration at debug level
0, to get maximum performance. If that run fails, I will switch to debug level 10 or 20 and gather
some detailed logs.
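
For that run I am assuming a ceph.conf setting along these lines on the affected OSD nodes (following the debug_bluefs = 20/20 suggestion; treat it as a sketch, not a verified config):

[osd]
        debug bluefs = 20/20

Once the logs are captured we can drop it back to 0 to recover the performance.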

Thanks,
Nitin

> 
>>> We had some issues on this front some time back which were fixed; maybe this is a new issue (?). Need a verbose log for at least bluefs (debug_bluefs = 20/20).
>> 
>> Let me fix the cluster configuration to give more space to the DB partition. If the issue comes up again after that, I will try capturing detailed logs.
>> 
>>> BTW, what is your workload (block size, IO pattern)?
>> 
>> The workload is an internal Teradata benchmark, which simulates the IO pattern of database disk access with various block sizes and IO patterns.
>> 
>> Thanks,
>> Nitin
>> 
>> 
>> 
>>> 
>>> -----Original Message-----
>>> From: Kamble, Nitin A [mailto:Nitin.Kamble@Teradata.com]
>>> Sent: Friday, September 16, 2016 12:00 PM
>>> To: Somnath Roy
>>> Cc: Sage Weil; Ceph Development
>>> Subject: Re: Bluestore OSD support in ceph-disk
>>> 
>>> 
>>>> On Sep 16, 2016, at 11:43 AM, Somnath Roy <Somnath.Roy@sandisk.com> wrote:
>>>> 
>>>> Please send the snippet (the very first trace; go up in the log) where it is actually printing the assert.
>>>> BTW, what workload are you running?
>>>> 
>>>> Thanks & Regards
>>>> Somnath
>>>> 
>>> Here it is.
>>> 
>>> 2016-09-16 08:49:30.605845 7fb5a96ba700 -1 /build/nitin/nightly_builds/20160914_125459-master/ceph.git/rpmbuild/BUILD/ceph-v11.0.0-2309.g9096ad3/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_allocate(uint8_t, uint64_t, std::vector<bluefs_extent_t>*)' thread 7fb5a96ba700 time 2016-09-16 08:49:30.602139
>>> /build/nitin/nightly_builds/20160914_125459-master/ceph.git/rpmbuild/BUILD/ceph-v11.0.0-2309.g9096ad3/src/os/bluestore/BlueFS.cc: 1686: FAILED assert(0 == "allocate failed... wtf")
>>> 
>>> ceph version v11.0.0-2309-g9096ad3 (9096ad37f2c0798c26d7784fb4e7a781feb72cb8)
>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fb5bf43a11b]
>>> 2: (BlueFS::_allocate(unsigned char, unsigned long, std::vector<bluefs_extent_t, std::allocator<bluefs_extent_t> >*)+0x8ad) [0x7fb5bf2735dd]
>>> 3: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0xb4f) [0x7fb5bf27aa1f]
>>> 4: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x29b) [0x7fb5bf27bc9b]
>>> 5: (BlueRocksWritableFile::Sync()+0x4e) [0x7fb5bf29125e]
>>> 6: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x139) [0x7fb5bf388699]
>>> 7: (rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x7fb5bf389238]
>>> 8: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x13cf) [0x7fb5bf2e0a2f]
>>> 9: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x27) [0x7fb5bf2e1637]
>>> 10: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x5b) [0x7fb5bf21a14b]
>>> 11: (BlueStore::_kv_sync_thread()+0xf5a) [0x7fb5bf1e7ffa]
>>> 12: (BlueStore::KVSyncThread::entry()+0xd) [0x7fb5bf1f5a6d]
>>> 13: (()+0x80a4) [0x7fb5bb4a70a4]
>>> 14: (clone()+0x6d) [0x7fb5ba32004d]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>> 
>>> 
>>> 
>>> Thanks,
>>> Nitin
>>> 
>> 

