* Monitors fallen apart
From: Székelyi Szabolcs @ 2011-08-03  9:16 UTC
  To: ceph-devel

Hello,

I'm running ceph 0.32, and for a while now it has looked as if, when a monitor
fails, the cluster doesn't find a new one.

I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon, which is 
also running the client. If I stop one of the cmds+cosd+cmon nodes, ceph -w 
run on the cmds+cmon node prints nothing but

2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >> 
<killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first fault

indefinitely, and the filesystem stops working (processes using files in it block
forever). It looks like it tries to connect to the killed monitor instead of
failing over to a working one.
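
For illustration only, the failover behaviour expected here can be sketched as a loop that walks the list of known monitors and settles on the first one that answers. This is not Ceph's monclient code: `hunt_for_monitor`, `tcp_probe` and the 10.0.0.x addresses are hypothetical names chosen for the sketch, and only the port 6789 is taken from the log line above.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Addr = Tuple[str, int]  # (host, port)


def tcp_probe(addr: Addr, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to addr succeeds within timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the monitor as down.
        return False


def hunt_for_monitor(
    monitors: Iterable[Addr],
    probe: Callable[[Addr], bool] = tcp_probe,
) -> Optional[Addr]:
    """Return the first monitor that answers the probe, or None if none do.

    Hypothetical helper: a dead monitor is skipped instead of being
    retried forever, which is the behaviour the report expects.
    """
    for addr in monitors:
        if probe(addr):
            return addr
    return None


if __name__ == "__main__":
    # Assumed addresses for a three-monitor setup like the one described.
    mons = [("10.0.0.1", 6789), ("10.0.0.2", 6789), ("10.0.0.3", 6789)]
    dead = {("10.0.0.1", 6789)}  # pretend the first monitor was killed
    print(hunt_for_monitor(mons, probe=lambda a: a not in dead))
```

The probe is passed in as a parameter so the selection logic can be exercised without a network; a real client would also need to re-run the hunt when its current monitor stops responding.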

The first message after killing the node was:

2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon

Do you have any idea what I'm doing wrong?

Thanks,
-- 
cc



* Re: Monitors fallen apart
From: Yehuda Sadeh Weinraub @ 2011-08-04 18:14 UTC
  To: Székelyi Szabolcs; +Cc: ceph-devel

2011/8/3 Székelyi Szabolcs <szekelyi@niif.hu>:
> Hello,
>
> I'm running ceph 0.32, and since a while it looks like if a monitor fails,
> then the cluster doesn't find a new one.
>
> I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon, which is
> also running the client. If I stop one of the cmds+cosd+cmon nodes, ceph -w
> run on the cmds+cmon node tells nothing but
>
> 2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >>
> <killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first fault
>
> infinitely and the filesystem stops working (processes using files in it block
> forever). Looks like it tries to connect to the killed monitor instead of
> failing over to a working one.
>
> The first message after killing the node was:
>
> 2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon
>
> Do you have any idea what I'm doing wrong?
>

Do you have the mon logs, or any core files?


Thanks,
Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Monitors fallen apart
From: Székelyi Szabolcs @ 2011-08-05 13:34 UTC
  To: Yehuda Sadeh Weinraub; +Cc: ceph-devel

On 2011. August 4. 20:14:54 Yehuda Sadeh Weinraub wrote:
> 2011/8/3 Székelyi Szabolcs <szekelyi@niif.hu>:
> > I'm running ceph 0.32, and since a while it looks like if a monitor
> > fails, then the cluster doesn't find a new one.
> > 
> > I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon,
> > which is also running the client. If I stop one of the cmds+cosd+cmon
> > nodes, ceph -w run on the cmds+cmon node tells nothing but
> > 
> > 2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >>
> > <killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first
> > fault
> > 
> > infinitely and the filesystem stops working (processes using files in it
> > block forever). Looks like it tries to connect to the killed monitor
> > instead of failing over to a working one.
> > 
> > The first message after killing the node was:
> > 
> > 2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon
> > 
> > Do you have any idea what I'm doing wrong?
> 
> Do you have the mon logs, or any core files?

No, and I have nuked my ceph cluster since then, because I realized that the
monmap was kinda screwed up. When I ran the client with the -m option, it
reported errors saying it was unable to connect to some strange hostnames with
binary characters in them. I guess the monmap got broken during the torture I
put my cluster through.
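
As a rough illustration of how such corruption could be spotted, the sketch below flags monitor names that cannot be valid hostnames or IPv4 addresses, for example because they contain control or other binary characters. It does not read a real monmap: the function names and the sample list are hypothetical, and the names would have to be extracted from the monmap by other means.

```python
import string
from typing import Iterable, List

# Characters allowed in a hostname or dotted IPv4 address.
ALLOWED = set(string.ascii_letters + string.digits + "-.")


def is_plausible_hostname(name: str) -> bool:
    """Return True if name looks like a hostname or IPv4 address."""
    if not name or len(name) > 253:
        return False
    if any(ch not in ALLOWED for ch in name):
        return False  # binary, control, or other unexpected characters
    # Every dot-separated label must be non-empty, at most 63 characters,
    # and must not start or end with a hyphen.
    return all(
        label
        and len(label) <= 63
        and not label.startswith("-")
        and not label.endswith("-")
        for label in name.split(".")
    )


def suspicious_names(names: Iterable[str]) -> List[str]:
    """Return the names that fail the plausibility check, in input order."""
    return [n for n in names if not is_plausible_hostname(n)]


if __name__ == "__main__":
    # Made-up sample: two sane names and one containing binary characters.
    sample = ["node1", "mon-b.example.org", "no\x00de\x7f"]
    print(suspicious_names(sample))
```

A check like this only catches obviously garbled names; it says nothing about whether a well-formed name actually points at a running monitor.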

I'll report if I experience anything like this with the fresh cluster.

Thanks,
-- 
cc
