* Monitors fallen apart
@ 2011-08-03 9:16 Székelyi Szabolcs
2011-08-04 18:14 ` Yehuda Sadeh Weinraub
0 siblings, 1 reply; 3+ messages in thread
From: Székelyi Szabolcs @ 2011-08-03 9:16 UTC (permalink / raw)
To: ceph-devel
Hello,
I'm running ceph 0.32, and for a while now it has looked as though, when a
monitor fails, the cluster doesn't find a new one.
I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon, which is
also running the client. If I stop one of the cmds+cosd+cmon nodes, ceph -w
running on the cmds+cmon node prints nothing but
2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >>
<killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first fault
over and over, and the filesystem stops working (processes using files in it
block forever). It looks like it tries to connect to the killed monitor instead
of failing over to a working one.
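To illustrate the failover behaviour expected here, the following is a minimal
sketch and not Ceph's actual monclient code. It walks a list of monitor
addresses and returns the first one that accepts a TCP connection, skipping
dead ones. The names first_reachable and tcp_probe and the example addresses
are hypothetical; only port 6789 is taken from the log line above.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Addr = Tuple[str, int]


def tcp_probe(addr: Addr, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to addr succeeds within timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or unresolvable: treat as down.
        return False


def first_reachable(
    monitors: Iterable[Addr],
    probe: Callable[[Addr], bool] = tcp_probe,
) -> Optional[Addr]:
    """Return the first monitor address for which probe() is True, else None.

    Hypothetical helper: it only shows the "skip the dead one, try the next"
    idea and says nothing about how Ceph itself picks a monitor.
    """
    for addr in monitors:
        if probe(addr):
            return addr
    return None


if __name__ == "__main__":
    # Placeholder addresses (assumption); substitute the real monitor nodes.
    mons = [("192.0.2.1", 6789), ("192.0.2.2", 6789), ("192.0.2.3", 6789)]
    print(first_reachable(mons))
```

Running this from the client node with the real monitor addresses would at
least show whether the surviving monitors are reachable at the TCP level.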
The first message after killing the node was:
2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon
Do you have any idea what I'm doing wrong?
Thanks,
--
cc
* Re: Monitors fallen apart
2011-08-03 9:16 Monitors fallen apart Székelyi Szabolcs
@ 2011-08-04 18:14 ` Yehuda Sadeh Weinraub
2011-08-05 13:34 ` Székelyi Szabolcs
0 siblings, 1 reply; 3+ messages in thread
From: Yehuda Sadeh Weinraub @ 2011-08-04 18:14 UTC (permalink / raw)
To: Székelyi Szabolcs; +Cc: ceph-devel
2011/8/3 Székelyi Szabolcs <szekelyi@niif.hu>:
> Hello,
>
> I'm running ceph 0.32, and since a while it looks like if a monitor fails,
> then the cluster doesn't find a new one.
>
> I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon, which is
> also running the client. If I stop one of the cmds+cosd+cmon nodes, ceph -w
> run on the cmds+cmon node tells nothing but
>
> 2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >>
> <killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first fault
>
> infinitely and the filesystem stops working (processes using files in it block
> forever). Looks like it tries to connect to the killed monitor instead of
> failing over to a working one.
>
> The first message after killing the node was:
>
> 2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon
>
> Do you have any idea what I'm doing wrong?
>
Do you have the mon logs, or any core files?
Thanks,
Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Monitors fallen apart
2011-08-04 18:14 ` Yehuda Sadeh Weinraub
@ 2011-08-05 13:34 ` Székelyi Szabolcs
0 siblings, 0 replies; 3+ messages in thread
From: Székelyi Szabolcs @ 2011-08-05 13:34 UTC (permalink / raw)
To: Yehuda Sadeh Weinraub; +Cc: ceph-devel
On 2011. August 4. 20:14:54 Yehuda Sadeh Weinraub wrote:
> 2011/8/3 Székelyi Szabolcs <szekelyi@niif.hu>:
> > I'm running ceph 0.32, and since a while it looks like if a monitor
> > fails, then the cluster doesn't find a new one.
> >
> > I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon,
> > which is also running the client. If I stop one of the cmds+cosd+cmon
> > nodes, ceph -w run on the cmds+cmon node tells nothing but
> >
> > 2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >>
> > <killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first
> > fault
> >
> > infinitely and the filesystem stops working (processes using files in it
> > block forever). Looks like it tries to connect to the killed monitor
> > instead of failing over to a working one.
> >
> > The first message after killing the node was:
> >
> > 2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon
> >
> > Do you have any idea what I'm doing wrong?
>
> Do you have the mon logs, or any core files?
No, and I have nuked my ceph cluster since then, because I realized that the
monmap was kinda screwed up. When I ran the client with the -m option, it
reported errors saying it was unable to connect to some strange hostnames with
binary characters in them. I guess it got broken by the torture I've put my
cluster through.
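As a rough way to spot this kind of corruption earlier, here is a minimal
sketch that flags monitor names or addresses containing anything other than
plain hostname characters. The function name suspicious_names and the allowed
character set are assumptions made for illustration; the input is simply
whatever list of monitor names you have copied out of your own configuration
or monmap dump.

```python
import string
from typing import Iterable, List

# Assumption: letters, digits, '.', '-' and ':' (for host:port) are the only
# characters expected in a monitor name or address. Adjust for your setup.
ALLOWED = set(string.ascii_letters + string.digits + ".-:")


def suspicious_names(names: Iterable[str]) -> List[str]:
    """Return the names that are empty or contain unexpected characters."""
    return [
        name
        for name in names
        if not name or any(ch not in ALLOWED for ch in name)
    ]


if __name__ == "__main__":
    # Example input (hypothetical): the last entry mimics binary garbage.
    names = ["node1", "10.0.0.2:6789", "no\x00de\x7f"]
    for bad in suspicious_names(names):
        # repr() makes non-printable bytes visible in the terminal.
        print("suspicious monitor name:", repr(bad))
```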
I'll report if I experience anything like this with the fresh cluster.
Thanks,
--
cc