From: Gregory Farnum <gregory.farnum@dreamhost.com>
To: Sage Weil <sage@newdream.net>
Cc: "Jim Schutt" <jaschut@sandia.gov>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: cosd multi-second stalls cause "wrongly marked me down"
Date: Wed, 23 Feb 2011 11:12:35 -0800
Message-ID: <99643591EE67457AAE61648D53093F40@gmail.com>
In-Reply-To: <Pine.LNX.4.64.1102231051070.2601@cobra.newdream.net>


On Wednesday, February 23, 2011 at 10:54 AM, Sage Weil wrote: 
> On Wed, 23 Feb 2011, Gregory Farnum wrote:
> > I have managed to get OSDs wrongly marking each other down during 
> > startup when they're peering large numbers of PGs/pools, as they 
> > disagree on who they need to be heartbeating (due to the slow handling 
> > of new osd maps and pg creates); if you're mostly seeing OSDs get 
> > incorrectly marked down during low epochs (your original email said 
> > epoch 7) this is probably what you're finding.
> 
> FWIW, this isn't supposed to happen either.. the implementation may be 
> broken somewhat. The idea is that once an OSD starts to expect a 
> heartbeat it should tell them so. And if an OSD is told that a future 
> epoch says it should send heartbeats to node foo, then it will do so, at 
> least until it processes that epoch.
Hmmm -- I don't think they're telling the other OSDs that they're heartbeat partners! At least I didn't see anything that would make that happen. They just start expecting pings, and in some cases they will start sending them because they notice they're a local replica too, but there's nothing in those messages like "you owe me pings as of epoch x".
Are there stubs you know of that I should look at in re-implementing this behavior?
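
To make the behavior under discussion concrete, here is a minimal sketch of the "you owe me pings as of epoch x" idea: a peer announces that it expects heartbeats starting at some map epoch, and the recipient keeps pinging it at least until it has processed that epoch itself. Everything here (the `Osd` class, `handle_ping_request`, `process_map`, `heartbeat_targets`) is hypothetical and written for illustration; it is not taken from the cosd source and says nothing about how the real messages are or should be encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Osd:
    """Toy OSD that tracks whom it must ping (hypothetical, not cosd code)."""
    osd_id: int
    epoch: int = 0  # newest OSD map epoch this OSD has processed
    # peer id -> epoch as of which that peer said it expects our pings
    requested_by: dict = field(default_factory=dict)
    # peers that our own processed map says we should ping
    map_peers: set = field(default_factory=set)

    def handle_ping_request(self, from_osd, as_of_epoch):
        """A peer announces: 'you owe me pings as of epoch x'."""
        prev = self.requested_by.get(from_osd, 0)
        self.requested_by[from_osd] = max(prev, as_of_epoch)

    def process_map(self, epoch, peers):
        """Catch up to `epoch`; requests for epochs <= epoch are now
        covered by the map itself and can be dropped."""
        self.epoch = epoch
        self.map_peers = set(peers)
        self.requested_by = {
            p: e for p, e in self.requested_by.items() if e > epoch
        }

    def heartbeat_targets(self):
        """Ping map-derived peers plus anyone who asked on a future epoch."""
        return self.map_peers | set(self.requested_by)
```

In this sketch an OSD that is still at epoch 5 but has been told "as of epoch 7" keeps the requester in its target set, and only falls back to its own map once it has processed epoch 7.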

> > We still have no idea what could be causing the stall *inside* of 
> > tick(), though. :/
> 
> You mean heartbeat(), right? Yep, still no clue... :(
> 
Well, the 28-second stall is inside of tick(), as it arms a timer for the next tick. Heartbeat is definitely failing, but nobody's quite sure why, as I recall.
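
One generic way to narrow down where a stall like this sits is to bracket the suspect step with monotonic timestamps and report when it takes longer than expected. The sketch below is only an illustration of that idea; `timed_section`, its parameters, and the threshold are assumptions made for the example, not anything from the cosd code or its logging.

```python
import time


def timed_section(label, fn, threshold_s, clock=time.monotonic, report=print):
    """Run fn() and return (result, elapsed seconds).

    Calls report() with a message if fn() took longer than threshold_s.
    `clock` and `report` are injectable so the helper can be tested.
    """
    start = clock()
    result = fn()
    elapsed = clock() - start
    if elapsed > threshold_s:
        report(f"{label} stalled for {elapsed:.3f}s")
    return result, elapsed
```

Wrapping just the timer-arming step, rather than all of tick(), would show whether the time is lost in that call or somewhere around it.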

