From: M Ranga Swami Reddy <swamireddy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Alexandru Cucu <me-8/hOu4Zd+9ihKNWrAYCRhA@public.gmane.org>
Cc: ceph-users <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>,
	ceph-devel <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: Ceph cluster stability
Date: Fri, 22 Feb 2019 15:53:45 +0530
Message-ID: <CANA9Uk4wzgWMDBz_DkMDY_6BPjRnbCmLryaBSEk3xTU+RjaXeA@mail.gmail.com>
In-Reply-To: <CAHrLTFnMc3Jep4P8WwdR-8J+qNeWhXHtgba4E88V=NBx99YJqw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Yep... these settings are already in place. We have also followed all the
recommendations for performance, but an OSD going down still impacts us,
even though we have 2000+ OSDs.
We are using 3 pools, with different HW nodes for each pool. When one pool's
OSD goes down, it also impacts the other pools' performance,
which is not expected with Ceph (we are using separate NICs for
data and replication).

On Wed, Feb 20, 2019 at 9:25 PM Alexandru Cucu <me@alexcucu.ro> wrote:
>
> Hi,
>
> I would decrease max active recovery processes per OSD and increase
> recovery sleep:
>     osd recovery max active = 1 (default is 3)
>     osd recovery sleep = 1 (default is 0 or 0.1)
>
> osd max backfills defaults to 1 so that should be OK if he's using the
> default :D
>
> Disabling scrubbing during recovery should also help:
>     osd scrub during recovery = false
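A minimal sketch, assuming the values above are meant for the [osd] section of ceph.conf: the Python below renders them in INI syntax with the standard configparser module. The dictionary name and the helper function are hypothetical; Ceph treats underscores and spaces in option names as equivalent, so osd_recovery_max_active is the same option as "osd recovery max active".

```python
import configparser
import io

# Values suggested in the thread for gentler recovery.
# Assumption: they go in the [osd] section of ceph.conf.
RECOVERY_TUNING = {
    "osd_recovery_max_active": "1",        # thread says the default is 3
    "osd_recovery_sleep": "1",             # thread says the default is 0 or 0.1
    "osd_max_backfills": "1",              # thread says this already defaults to 1
    "osd_scrub_during_recovery": "false",  # no scrubbing while recovering
}


def render_osd_section(options: dict) -> str:
    """Render the given options as an [osd] section in INI syntax."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser["osd"] = options
    buf = io.StringIO()
    parser.write(buf)  # writes "[osd]", then "key = value" lines
    return buf.getvalue()


if __name__ == "__main__":
    print(render_osd_section(RECOVERY_TUNING), end="")
```

On a running cluster the same options can usually be applied without restarting OSDs (for example via `ceph tell osd.* injectargs`); check the documentation for your release before relying on that.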
>
> On Wed, Feb 20, 2019 at 5:47 PM Darius Kasparavičius <daznis@gmail.com> wrote:
> >
> > Hello,
> >
> >
> > Check your CPU usage when you are doing those kinds of operations. We
> > had a similar issue where our CPU monitoring was reporting a healthy
> > < 40% usage, but the load on the nodes was high, in the 60-80 range. If
> > possible, try disabling HT (hyper-threading) and check the actual CPU usage.
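To make the load-versus-CPU% point concrete, here is a small Python sketch, assuming "load" means the Unix load average. It divides that load by logical CPUs and by physical cores; the helper names are hypothetical, and the physical core count is passed in by the caller because the Python standard library does not report it.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Load average divided by the number of cores it is compared against."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def describe(load_avg: float, logical_cpus: int, physical_cores: int) -> str:
    # With hyper-threading on, logical_cpus is typically 2 x physical_cores,
    # so a per-logical-CPU figure can look comfortable while each physical
    # core has more runnable work queued than it can serve.
    per_logical = load_per_core(load_avg, logical_cpus)
    per_physical = load_per_core(load_avg, physical_cores)
    return (f"load {load_avg:.1f}: {per_logical:.2f} per logical CPU, "
            f"{per_physical:.2f} per physical core")


if __name__ == "__main__":
    one_min, _, _ = os.getloadavg()  # Unix only
    logical = os.cpu_count() or 1
    # Assumption: a 2-way HT host, so halving the logical count is a guess
    # at the physical core count.
    physical = max(1, logical // 2)
    print(describe(one_min, logical, physical))
```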
> > If you are hitting CPU limits, you can try disabling CRC on messages:
> > ms_nocrc
> > ms_crc_data
> > ms_crc_header
> >
> > And set all your debug levels to 0.
> > If you haven't done so already, you can also lower your recovery settings a little:
> > osd recovery max active
> > osd max backfills
> >
> > You can also lower your filestore threads:
> > filestore op threads
> >
> >
> > If you can, also switch from filestore to bluestore. This will also
> > lower your CPU usage. I'm not sure that it is bluestore itself that does
> > it, but I'm seeing lower CPU usage after moving to bluestore + rocksdb
> > compared to filestore + leveldb.
> >
> >
> > On Wed, Feb 20, 2019 at 4:27 PM M Ranga Swami Reddy
> > <swamireddy@gmail.com> wrote:
> > >
> > > That's expected from Ceph by design. But in our case, we follow all the
> > > recommendations, like a rack failure domain, a separate replication network, etc.,
> > > and still face client IO performance issues when one OSD is down.
> > >
> > > On Tue, Feb 19, 2019 at 10:56 PM David Turner <drakonstein@gmail.com> wrote:
> > > >
> > > > With a RACK failure domain, you should be able to have an entire rack powered down without noticing any major impact on the clients.  I regularly take down OSDs and nodes for maintenance and upgrades without seeing any problems with client IO.
> > > >
> > > > On Tue, Feb 12, 2019 at 5:01 AM M Ranga Swami Reddy <swamireddy@gmail.com> wrote:
> > > >>
> > > >> Hello - I have a couple of questions on Ceph cluster stability, even
> > > >> though we follow all the recommendations below:
> > > >> - Having separate replication n/w and data n/w
> > > >> - RACK is the failure domain
> > > >> - Using SSDs for journals (1:4 ratio)
> > > >>
> > > >> Q1 - If one OSD goes down, cluster IO drops drastically and customer apps are impacted.
> > > >> Q2 - What is the stability ratio? With the above setup, is the Ceph cluster
> > > >> still in a workable condition if one OSD or one node goes down, etc.?
> > > >>
> > > >> Thanks
> > > >> Swami
> > > >> _______________________________________________
> > > >> ceph-users mailing list
> > > >> ceph-users@lists.ceph.com
> > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
