From: Wido den Hollander <wido@42on.com>
To: Gregory Farnum <gfarnum@redhat.com>
Cc: huang jun <hjwsm1989@gmail.com>,
	Dan van der Ster <dan@vanderster.com>,
	Xiaoxi Chen <superdebuger@gmail.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: Failing OSDs (suicide timeout) due to flaky clients
Date: Tue, 5 Jul 2016 20:59:11 +0200 (CEST)
Message-ID: <39488107.541.1467745151556@ox.pcextreme.nl>
In-Reply-To: <CAJ4mKGaFPDzZ3jh9B5SxPbOiadXu7J+W+-gJ7WnOJZahM42_ng@mail.gmail.com>


> On 5 July 2016 at 20:35, Gregory Farnum <gfarnum@redhat.com> wrote:
> 
> 
> Uh, searching for OpTracker in my github emails leads me to
> https://github.com/ceph/ceph/pull/7148
> 

Ah, yes! That's the one probably.

Looking at it, this was only backported to Jewel, not to Hammer or Firefly.

- http://tracker.ceph.com/issues/14248
- https://github.com/ceph/ceph/commit/67be35cba7c384353b0b6d49284a4ead94c4152e

It applies cleanly on Hammer. I'm building packages and will see whether it resolves the issue. Now I need to find a way to reproduce this and test it.

Wido

> I didn't try and trace the backports but there should be links from
> the referenced Redmine ticket, or you can search the git logs.
> -Greg
> 
> On Tue, Jul 5, 2016 at 11:32 AM, Wido den Hollander <wido@42on.com> wrote:
> >
> >> On 5 July 2016 at 19:48, Gregory Farnum <gfarnum@redhat.com> wrote:
> >>
> >>
> >> On Tue, Jul 5, 2016 at 10:45 AM, Wido den Hollander <wido@42on.com> wrote:
> >> >
> >> >> On 5 July 2016 at 19:27, Gregory Farnum <gfarnum@redhat.com> wrote:
> >> >>
> >> >>
> >> >> On Tue, Jul 5, 2016 at 2:10 AM, Wido den Hollander <wido@42on.com> wrote:
> >> >> >
> >> >> >> On 5 July 2016 at 10:56, huang jun <hjwsm1989@gmail.com> wrote:
> >> >> >>
> >> >> >>
> >> >> >> I have seen OSDs time out many times.
> >> >> >> In SimpleMessenger mode, when sending a message, the PipeConnection
> >> >> >> takes a lock that may already be held by another thread.
> >> >> >> This was reported before: http://tracker.ceph.com/issues/9921
> >> >> >>
> >> >> >
> >> >> > Thank you! It surely looks like the same symptoms we are seeing in this cluster.
> >> >> >
> >> >> > The bug has been marked as resolved, but are you sure it is?
> >> >>
> >> >> Pretty sure about that bug being done.
> >> >>
> >> >> The conntrack filling thing sounds vaguely familiar though. Is this
> >> >> the latest hammer? I think there were some leaks of messages while
> >> >> sending replies that might have blocked up incoming queues that got
> >> >> resolved later.
> >> >
> >> > Keep in mind that it's the conntrack table filling up on the client, which results in >50% packet loss on that client.
> >> >
> >> > The cluster is not firewalled and doesn't do any connection tracking.
> >> >
> >> > This is hammer 0.94.5. If this is fixed in .6 or .7, do you have an idea which commit I should look for? Is it (Simple)Messenger related?
> >>
> >> If it is one of the op leaks, it'll be in the OSD OpTracker stuff to
> >> avoid keeping around message references for tracking purposes and
> >> unblocking the client Throttles.
> >
> > Thanks! I've been looking through the hammer and master branches, but I don't think I've found the right commit. I've been looking for 45 minutes now, and nothing has caught my attention.
> >
> > If you have the time, would you be so kind as to take a look?
> >
> > Wido
> >
> >> -Greg
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thread overview: 13+ messages
2016-07-04 14:41 Failing OSDs (suicide timeout) due to flaky clients Wido den Hollander
2016-07-04 15:54 ` Dan van der Ster
2016-07-04 16:48   ` Wido den Hollander
2016-07-05  1:38     ` Xiaoxi Chen
2016-07-05  7:26       ` Wido den Hollander
2016-07-05  8:56         ` huang jun
2016-07-05  9:10           ` Wido den Hollander
2016-07-05 17:27             ` Gregory Farnum
2016-07-05 17:45               ` Wido den Hollander
2016-07-05 17:48                 ` Gregory Farnum
2016-07-05 18:32                   ` Wido den Hollander
2016-07-05 18:35                     ` Gregory Farnum
2016-07-05 18:59                       ` Wido den Hollander [this message]
