From: Robert LeBlanc <robert@leblancnet.us>
To: Zizon Qiu <zzdtsv@gmail.com>
Cc: Stefan Kooman <stefan@bit.nl>,
	ceph-devel <ceph-devel@vger.kernel.org>,
	ceph-users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Nautilus 14.2.19 mon 100% CPU
Date: Thu, 8 Apr 2021 21:48:48 -0600	[thread overview]
Message-ID: <CAANLjFrhHbuM-jW5HuuyBMFVu3GWnG23Ama8_vKs55GpOCTA-w@mail.gmail.com> (raw)
In-Reply-To: <CAKTRiELqxD+0LtRXan9gMzot3y4A4M4x=km-MB2aET6wP_5mQg@mail.gmail.com>

Good thought. The storage for the monitor data is a RAID-0 over three
NVMe devices. Watching iostat, they are completely idle, maybe 0.8% to
1.4% for a second every minute or so.
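A minimal sketch of that utilization check, assuming the Linux /proc/diskstats layout (the tenth stats field is milliseconds spent doing I/O; the derived percentage corresponds to what iostat reports as %util — device names will vary):

```python
import time

def read_io_ticks():
    """Return {device: ms spent doing I/O} from /proc/diskstats."""
    ticks = {}
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            # major, minor, name, then 11+ stat fields; the 10th stat
            # field (parts[12]) is io_ticks in milliseconds.
            if len(parts) >= 13:
                ticks[parts[2]] = int(parts[12])
    return ticks

def disk_util(interval=1.0):
    """Sample io_ticks twice and return approximate %util per device."""
    before = read_io_ticks()
    time.sleep(interval)
    after = read_io_ticks()
    return {dev: 100.0 * (after[dev] - before[dev]) / (interval * 1000.0)
            for dev in after if dev in before}

if __name__ == "__main__":
    for dev, util in sorted(disk_util().items()):
        print(f"{dev}: {util:.1f}%")
```

An idle NVMe device should show values near 0% here, matching the iostat observation above.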
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Thu, Apr 8, 2021 at 7:48 PM Zizon Qiu <zzdtsv@gmail.com> wrote:
>
> Could it be related to some kind of disk issue on the node that mon is located on,
> which may occasionally slow down I/O and, in turn, RocksDB?
>
>
> On Fri, Apr 9, 2021 at 4:29 AM Robert LeBlanc <robert@leblancnet.us> wrote:
>>
>> I found this thread that matches a lot of what I'm seeing. I see the
>> ms_dispatch thread going to 100%, but I'm at a single MON, the
>> recovery is done and the rocksdb MON database is ~300MB. I've tried
>> all the settings mentioned in that thread with no noticeable
>> improvement. I was hoping that once the recovery was done (backfills
>> to reformatted OSDs) that it would clear up, but not yet. So any other
>> ideas would be really helpful. Our MDS is functioning, but stalls a
>> lot because the mons miss heartbeats.
>>
>> mon_compact_on_start = true
>> rocksdb_cache_size = 1342177280
>> mon_lease = 30
>> mon_osd_cache_size = 200000
>> mon_sync_max_payload_size = 4096
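For anyone wanting to try the same settings, a sketch of the corresponding [mon] section in ceph.conf (values copied from the list above; whether they help will depend on the cluster):

```ini
[mon]
# Compact the RocksDB store each time the mon starts
mon_compact_on_start = true
# 1.25 GiB RocksDB cache (1342177280 bytes)
rocksdb_cache_size = 1342177280
# Longer mon lease to ride out missed heartbeats (default is 5 s)
mon_lease = 30
mon_osd_cache_size = 200000
# Smaller sync payload, often advised for mon sync stalls (default is 1 MiB)
mon_sync_max_payload_size = 4096
```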
>>
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>> On Thu, Apr 8, 2021 at 1:11 PM Stefan Kooman <stefan@bit.nl> wrote:
>> >
>> > On 4/8/21 6:22 PM, Robert LeBlanc wrote:
>> > > I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
>> > > converted the last batch of FileStore OSDs to BlueStore about 36 hours
>> > > ago. Yesterday our monitor cluster went nuts and started constantly
>> > > calling elections because monitor nodes were at 100% and wouldn't
>> > > respond to heartbeats. I reduced the monitor cluster to one to prevent
>> > > the constant elections and that let the system limp along until the
>> > > backfills finished. There are large amounts of time where ceph commands
>> > > hang with the CPU is at 100%, when the CPU drops I see a lot of work
>> > > getting done in the monitor logs which stops as soon as the CPU is at
>> > > 100% again.
>> >
>> >
>> > Try reducing mon_sync_max_payload_size to 4096. I have seen Frank Schilder
>> > advise this several times because of monitor issues, most recently for a
>> > cluster that got upgraded from Luminous -> Mimic -> Nautilus.
>> >
>> > Worth a shot.
>> >
>> > Otherwise I'll try to look in depth and see if I can come up with
>> > something smart (for now I need to go catch some sleep).
>> >
>> > Gr. Stefan


Thread overview: 8+ messages
     [not found] <CAANLjFpjRLtV+GR4WV15iXXCvkig6tJAr_G=_bZpZ=jKnYfvTQ@mail.gmail.com>
2021-04-08 17:24 ` Nautilus 14.2.19 mon 100% CPU Robert LeBlanc
2021-04-08 19:11 ` [ceph-users] " Stefan Kooman
2021-04-08 20:26   ` Robert LeBlanc
     [not found]     ` <CAKTRiELqxD+0LtRXan9gMzot3y4A4M4x=km-MB2aET6wP_5mQg@mail.gmail.com>
2021-04-09  3:48       ` Robert LeBlanc [this message]
2021-04-09 13:40         ` Robert LeBlanc
2021-04-09 15:25           ` [ceph-users] " Stefan Kooman
2021-04-09 16:41             ` Robert LeBlanc
2021-04-09 17:01               ` Robert LeBlanc
