ceph-devel.vger.kernel.org archive mirror
From: Robert LeBlanc <robert@leblancnet.us>
To: ceph-devel <ceph-devel@vger.kernel.org>, ceph-users <ceph-users@ceph.io>
Subject: Re: Nautilus 14.2.19 mon 100% CPU
Date: Thu, 8 Apr 2021 11:24:36 -0600
Message-ID: <CAANLjFrSgYw3qDu46gJPOUQXOojz9yuPj3EOYwvLtQnTmgGe_w@mail.gmail.com>
In-Reply-To: <CAANLjFpjRLtV+GR4WV15iXXCvkig6tJAr_G=_bZpZ=jKnYfvTQ@mail.gmail.com>

On Thu, Apr 8, 2021 at 10:22 AM Robert LeBlanc <robert@leblancnet.us> wrote:
>
> I upgraded our Luminous cluster to Nautilus a couple of weeks ago and converted the last batch of FileStore OSDs to BlueStore about 36 hours ago. Yesterday our monitor cluster went nuts and started constantly calling elections because the monitor nodes were at 100% CPU and wouldn't respond to heartbeats. I reduced the monitor cluster to one monitor to prevent the constant elections, and that let the system limp along until the backfills finished. There are long stretches where ceph commands hang while the CPU is at 100%. When the CPU drops, I see a lot of work getting done in the monitor logs, which stops as soon as the CPU is at 100% again.
>
> I did a `perf top` on the node to see what's taking all the time, and it appears to be in the rocksdb code path. I've set `mon_compact_on_start = true` in ceph.conf, but that does not appear to help. The `/var/lib/ceph/mon/` directory is 311 MB, down from 3.0 GB while the backfills were going on. I've tried adding a second monitor, but it goes back to the constant elections. I tried restarting all the services without luck. I also pulled the monitor from the network and tried restarting the mon service in isolation (this helped a couple of weeks ago when `ceph -s` would cause 100% CPU and lock up the service much worse than this), and I didn't see the high CPU load. So I'm guessing it's triggered by some external source.
>
> I'm happy to provide more info, just let me know what would be helpful.

Sent this to the dev list, but forgot it needed to be plain text. Here
is text output of the `perf top` taken a bit later, so not exactly the
same as the screenshot earlier.

Samples: 20M of event 'cycles', 4000 Hz, Event count (approx.): 61966526527 lost: 0/0 drop: 0/0
Overhead  Shared Object                             Symbol
 11.52%  ceph-mon                                  [.] rocksdb::MemTable::KeyComparator::operator()
  6.80%  ceph-mon                                  [.] rocksdb::MemTable::KeyComparator::operator()
  4.75%  ceph-mon                                  [.] rocksdb::InlineSkipList<rocksdb::MemTableRep::KeyComparator const&>::FindGreaterOrEqual
  2.89%  libc-2.27.so                              [.] vfprintf
  2.54%  libtcmalloc.so.4.3.0                      [.] tc_deletearray_nothrow
  2.31%  ceph-mon                                  [.] TLS init function for rocksdb::perf_context
  2.14%  ceph-mon                                  [.] rocksdb::DBImpl::GetImpl
  1.53%  libc-2.27.so                              [.] 0x000000000018acf8
  1.44%  libc-2.27.so                              [.] _IO_default_xsputn
  1.34%  ceph-mon                                  [.] memcmp@plt
  1.32%  libtcmalloc.so.4.3.0                      [.] tc_malloc
  1.28%  ceph-mon                                  [.] rocksdb::Version::Get
  1.27%  libc-2.27.so                              [.] 0x000000000018abf4
  1.17%  ceph-mon                                  [.] RocksDBStore::get
  1.08%  ceph-mon                                  [.] 0x0000000000639a33
  1.04%  ceph-mon                                  [.] 0x0000000000639a0e
  0.89%  ceph-mon                                  [.] 0x0000000000639a46
  0.86%  ceph-mon                                  [.] rocksdb::TableCache::Get
  0.72%  libc-2.27.so                              [.] 0x000000000018abfe
  0.68%  libceph-common.so.0                       [.] ceph_str_hash_rjenkins
  0.66%  ceph-mon                                  [.] rocksdb::Hash
  0.63%  ceph-mon                                  [.] rocksdb::MemTable::Get
  0.62%  ceph-mon                                  [.] 0x00000000006399ff
  0.57%  libc-2.27.so                              [.] 0x000000000018abf0
  0.57%  ceph-mon                                  [.] rocksdb::GetContext::GetContext
  0.57%  ceph-mon                                  [.] rocksdb::BlockBasedTable::Get
  0.57%  ceph-mon                                  [.] rocksdb::BlockBasedTable::GetFilter
  0.55%  [vdso]                                    [.] __vdso_clock_gettime
  0.54%  ceph-mon                                  [.] 0x00000000005afa17
  0.53%  ceph-mgr                                  [.] std::_Rb_tree<pg_t, pg_t, std::_Identity<pg_t>, std::less<pg_t>, std::allocator<pg_t> >::equal_range
  0.51%  libceph-common.so.0                       [.] PerfCounters::tinc
  0.50%  ceph-mon                                  [.] OSDMonitor::make_snap_epoch_key[abi:cxx11]
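
For anyone who wants to total this kind of output rather than eyeball it, here is a rough Python sketch that sums the overhead column per shared object and per symbol group. It is only an illustration: `parse_rows`, `overhead_by`, and the "rocksdb"/"other" grouping are hypothetical names made up for this example, and `SAMPLE` holds just a handful of rows copied from the output above, so the totals it produces cover those rows only. The parser assumes the three-column `perf top` text layout shown here and tolerates symbols that the mail client wrapped onto the next line.

```python
import re
from collections import defaultdict

# A few rows copied from the perf top output above (not the full listing).
# Two of them keep the mail-client line wrap on purpose.
SAMPLE = """\
 11.52%  ceph-mon                                  [.]
rocksdb::MemTable::KeyComparator::operator()
  6.80%  ceph-mon                                  [.]
rocksdb::MemTable::KeyComparator::operator()
  2.89%  libc-2.27.so                              [.] vfprintf
  2.54%  libtcmalloc.so.4.3.0                      [.] tc_deletearray_nothrow
  2.14%  ceph-mon                                  [.] rocksdb::DBImpl::GetImpl
  1.17%  ceph-mon                                  [.] RocksDBStore::get
"""

# "<pct>%  <shared object>  [<type>] <symbol, possibly empty if wrapped>"
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[(.)\]\s*(.*)$")


def parse_rows(text):
    """Return (percent, shared_object, symbol) tuples from perf top text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append([float(match.group(1)), match.group(2),
                         match.group(4).strip()])
        elif rows and line.strip():
            # Not a new row: treat it as the wrapped tail of the last symbol.
            rows[-1][2] = (rows[-1][2] + " " + line.strip()).strip()
    return [tuple(row) for row in rows]


def overhead_by(rows, key):
    """Sum the overhead percentages grouped by key(shared_object, symbol)."""
    totals = defaultdict(float)
    for pct, obj, sym in rows:
        totals[key(obj, sym)] += pct
    return dict(totals)


def rocksdb_group(obj, sym):
    # Assumption: anything under rocksdb:: or Ceph's RocksDBStore wrapper
    # counts as time spent in the RocksDB code path.
    return "rocksdb" if sym.startswith(("rocksdb::", "RocksDBStore::")) else "other"


rows = parse_rows(SAMPLE)
by_object = overhead_by(rows, lambda obj, sym: obj)
by_group = overhead_by(rows, rocksdb_group)

for name, pct in sorted(by_object.items(), key=lambda item: -item[1]):
    print(f"{pct:6.2f}%  {name}")
for name, pct in sorted(by_group.items(), key=lambda item: -item[1]):
    print(f"{pct:6.2f}%  {name}")
```

Pasting the full listing into `SAMPLE` in place of the excerpt gives totals for the whole capture. Raw addresses without symbol names fall into "other" under this grouping.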

Thread overview: 8+ messages
     [not found] <CAANLjFpjRLtV+GR4WV15iXXCvkig6tJAr_G=_bZpZ=jKnYfvTQ@mail.gmail.com>
2021-04-08 17:24 ` Robert LeBlanc [this message]
2021-04-08 19:11 ` [ceph-users] Nautilus 14.2.19 mon 100% CPU Stefan Kooman
2021-04-08 20:26   ` Robert LeBlanc
     [not found]     ` <CAKTRiELqxD+0LtRXan9gMzot3y4A4M4x=km-MB2aET6wP_5mQg@mail.gmail.com>
2021-04-09  3:48       ` Robert LeBlanc
2021-04-09 13:40         ` Robert LeBlanc
2021-04-09 15:25           ` [ceph-users] " Stefan Kooman
2021-04-09 16:41             ` Robert LeBlanc
2021-04-09 17:01               ` Robert LeBlanc
