* Ceph behavior in case of network failure
@ 2012-01-29  4:52 Madhusudhan
  2012-01-30 17:14 ` Gregory Farnum
  2012-01-31 23:34 ` Jon
  0 siblings, 2 replies; 5+ messages in thread
From: Madhusudhan @ 2012-01-29  4:52 UTC (permalink / raw)
  To: ceph-devel

I have configured Ceph on CentOS 5.6 after a
very long fight, and I am now evaluating it.
Forgive me if my question seems amateurish.
Consider a situation where my core switch
fails, resulting in a network failure across
the entire data center. What happens to the
Ceph cluster? Will it sustain the network
failure and come back online when the network
is restored?



* Re: Ceph behavior in case of network failure
  2012-01-29  4:52 Ceph behavior in case of network failure Madhusudhan
@ 2012-01-30 17:14 ` Gregory Farnum
  2012-01-31  5:13   ` madhusudhan
  2012-01-31 23:34 ` Jon
  1 sibling, 1 reply; 5+ messages in thread
From: Gregory Farnum @ 2012-01-30 17:14 UTC (permalink / raw)
  To: Madhusudhan; +Cc: ceph-devel

On Sat, Jan 28, 2012 at 8:52 PM, Madhusudhan
<madhusudhana.u.acharya@gmail.com> wrote:
> I have configured Ceph on CentOS 5.6 after a
> very long fight, and I am now evaluating it.
> Forgive me if my question seems amateurish.
> Consider a situation where my core switch
> fails, resulting in a network failure across
> the entire data center. What happens to the
> Ceph cluster? Will it sustain the network
> failure and come back online when the network
> is restored?

Hmmm. If your network breaks horribly, you will probably need to
restart the daemons — once their communication breaks they'll start
marking each other down and the monitors will probably accept those
reports once the network starts working again. (Actually, maybe we
should update that so the monitors reject sufficiently old reports.)
But it will be a transient effect; restarting your machines will be
enough to restore service. :)
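
If it does come to a manual restart, something along these lines is
usually enough (a sketch assuming the stock sysvinit script from an
mkcephfs-style install; service names and paths vary by distro):

  /etc/init.d/ceph -a restart   # with -a, the script restarts the daemons on every host in ceph.conf
  ceph health                   # should return to HEALTH_OK once the maps settle
  ceph osd stat                 # quick count of how many OSDs are up/in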
-Greg

* Re: Ceph behavior in case of network failure
  2012-01-30 17:14 ` Gregory Farnum
@ 2012-01-31  5:13   ` madhusudhan
  2012-01-31 18:29     ` Gregory Farnum
  0 siblings, 1 reply; 5+ messages in thread
From: madhusudhan @ 2012-01-31  5:13 UTC (permalink / raw)
  To: ceph-devel

Gregory Farnum <gregory.farnum <at> dreamhost.com> writes:

> 
> On Sat, Jan 28, 2012 at 8:52 PM, Madhusudhan
> <madhusudhana.u.acharya <at> gmail.com> wrote:
> > I have configured Ceph on CentOS 5.6 after a
> > very long fight, and I am now evaluating it.
> > Forgive me if my question seems amateurish.
> > Consider a situation where my core switch
> > fails, resulting in a network failure across
> > the entire data center. What happens to the
> > Ceph cluster? Will it sustain the network
> > failure and come back online when the network
> > is restored?
> 
> Hmmm. If your network breaks horribly, you will probably need to
> restart the daemons — once their communication breaks they'll start
> marking each other down and the monitors will probably accept those
> reports once the network starts working again. (Actually, maybe we
> should update that so the monitors reject sufficiently old reports.)
> But it will be a transient effect; restarting your machines will be
> enough to restore service. :)
> -Greg
> 
> 
Thank you, Greg, for the reply. Do we have to restart both the osd and
mon daemons on all the nodes? In one case I rebooted my OSD node (while
it was running, to check the fault tolerance), and when it came back
online its journal was corrupted; I had to reinitialize the node by
erasing all of its data. And rebooting the entire cluster (in case of a
network failure) doesn't seem like a good idea to me, since clients will
start mounting the cluster immediately and begin reading from and
writing to it.




* Re: Ceph behavior in case of network failure
  2012-01-31  5:13   ` madhusudhan
@ 2012-01-31 18:29     ` Gregory Farnum
  0 siblings, 0 replies; 5+ messages in thread
From: Gregory Farnum @ 2012-01-31 18:29 UTC (permalink / raw)
  To: madhusudhan; +Cc: ceph-devel

On Mon, Jan 30, 2012 at 9:13 PM, madhusudhan
<madhusudhana.u.acharya@gmail.com> wrote:
> Gregory Farnum <gregory.farnum <at> dreamhost.com> writes:
>
>>
>> On Sat, Jan 28, 2012 at 8:52 PM, Madhusudhan
>> <madhusudhana.u.acharya <at> gmail.com> wrote:
>> > I have configured Ceph on CentOS 5.6 after a
>> > very long fight, and I am now evaluating it.
>> > Forgive me if my question seems amateurish.
>> > Consider a situation where my core switch
>> > fails, resulting in a network failure across
>> > the entire data center. What happens to the
>> > Ceph cluster? Will it sustain the network
>> > failure and come back online when the network
>> > is restored?
>>
>> Hmmm. If your network breaks horribly, you will probably need to
>> restart the daemons — once their communication breaks they'll start
>> marking each other down and the monitors will probably accept those
>> reports once the network starts working again. (Actually, maybe we
>> should update that so the monitors reject sufficiently old reports.)
>> But it will be a transient effect; restarting your machines will be
>> enough to restore service. :)
>> -Greg
>>
> Thank you, Greg, for the reply. Do we have to restart both the osd and
> mon daemons on all the nodes? In one case I rebooted my OSD node (while
> it was running, to check the fault tolerance), and when it came back
> online its journal was corrupted; I had to reinitialize the node by
> erasing all of its data. And rebooting the entire cluster (in case of a
> network failure) doesn't seem like a good idea to me, since clients will
> start mounting the cluster immediately and begin reading from and
> writing to it.

Hmm, actually I checked with some coworkers and you shouldn't need to
restart anything at all — the OSDs will correct the report themselves.
So it should all be good! You'll likely experience some slowness while
the OSD states flap (up and down), but it will all be transparent to
the clients.
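
If you'd rather watch it settle than take it on faith, the standard
ceph CLI is enough (exact output format differs a bit across versions):

  ceph -w          # streams cluster events; you'll see OSDs marked down and then back up
  ceph osd tree    # shows which OSDs are currently up or down
  ceph health      # reports HEALTH_OK once the maps stop churning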
-Greg

* Re: Ceph behavior in case of network failure
  2012-01-29  4:52 Ceph behavior in case of network failure Madhusudhan
  2012-01-30 17:14 ` Gregory Farnum
@ 2012-01-31 23:34 ` Jon
  1 sibling, 0 replies; 5+ messages in thread
From: Jon @ 2012-01-31 23:34 UTC (permalink / raw)
  To: Madhusudhan; +Cc: ceph-devel

I look forward to the availability of TRILL on commodity ethernet switches.

http://en.wikipedia.org/wiki/TRILL_(computing)

I expect that clustered systems like Ceph will help increase demand for TRILL.

On Jan 28, 2012, at 8:52 PM, Madhusudhan wrote:

> I have configured Ceph on CentOS 5.6 after a
> very long fight, and I am now evaluating it.
> Forgive me if my question seems amateurish.
> Consider a situation where my core switch
> fails, resulting in a network failure across
> the entire data center. What happens to the
> Ceph cluster? Will it sustain the network
> failure and come back online when the network
> is restored?
> 

