Primary mds failure

* Primary mds failure
@ 2011-07-27 20:26 Jojy Varghese
  2011-07-27 20:35 ` Sage Weil
From: Jojy Varghese @ 2011-07-27 20:26 UTC
  To: Sage Weil; +Cc: ceph-devel

Hi,
   We are observing that when the primary mds goes away (say, as an OOM
killer victim), the client keeps trying forever to write to it (the
try_write method in the messenger), which eventually results in a
filesystem hang. So the questions are:

 - Why does the kernel client not attempt another mds?
 - Is replication (mds) guaranteed to take place before the primary
mds goes down? In other words, is replication done preemptively or due
to a trigger (scheduled or event based)?

thanks again
Jojy
