* OSDs marked OUT wrongly after monitor failover
From: Ridge Chen @ 2016-10-27  2:36 UTC (permalink / raw)
  To: ceph-devel

Hi Experts,

Recently we found an issue with our ceph cluster; the version is 0.94.6.

We want to add more RAM to the ceph nodes, so we need to stop the
ceph service on each node first. When we did that on the first node,
the OSDs on that node were marked OUT and backfill started (only DOWN
is expected in this case). The first node is somewhat special in that
it also hosts the leader monitor.

We then checked the monitor log and found the following:

cluster [INF] osd.0 out (down for 3375169.141844)

It looks like the monitor (which had just become the leader) had stale
"down_pending_out" records, computed a very long DOWN time from them
(3375169 seconds is about 39 days), and finally decided to mark the OSDs OUT.

After reading the related code, the reason could be the following:

1. "down_pending_out" is set a month ago for those OSDs because of a
network issue.
2. The down OSDs up and join the cluster again. "down_pending_out" is
cleared in the "OSDMonitor::tick()" method. But only happened on
leader monitor.
3. When we stop the ceph service on the first node. The monitor group
failover. The new leader monitor will recognize the OSDs kept in DOWN
status for a a very long time, and mark them OUT wrongly.
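
For clarity, here is a minimal sketch of how I understand the per-tick
logic involved. This is not the actual Hammer code; the struct, the
helper name mark_out() and the grace-period check are simplified
stand-ins I made up, and only "down_pending_out", "OSDMonitor::tick()"
and mon_osd_down_out_interval correspond to real Ceph names:

    #include <chrono>
    #include <map>
    #include <set>

    using Clock = std::chrono::steady_clock;

    // Simplified stand-in for the monitor state involved; not the real Ceph types.
    struct FakeOsdMonitor {
      // osd id -> time the osd was first seen down; populated on every monitor
      // when the osdmap update marking the osd down is processed.
      std::map<int, Clock::time_point> down_pending_out;
      std::set<int> up_osds;                        // osds currently up in the osdmap
      bool is_leader = false;
      std::chrono::seconds down_out_interval{300};  // mon_osd_down_out_interval

      void mark_out(int /*osd*/) { /* propose an osdmap change marking the osd out */ }

      // Rough shape of the per-tick processing described above.
      void tick(Clock::time_point now) {
        if (!is_leader)
          return;  // peons never prune or act on down_pending_out

        for (auto it = down_pending_out.begin(); it != down_pending_out.end();) {
          int osd = it->first;
          if (up_osds.count(osd)) {
            // The osd came back up: only the *leader* ever erases the entry,
            // so a peon that becomes leader later still holds the old timestamp.
            it = down_pending_out.erase(it);
            continue;
          }
          if (now - it->second >= down_out_interval) {
            // "osd.N out (down for ...)": with a stale entry this fires on the
            // new leader's very first tick instead of after the grace period.
            mark_out(osd);
            it = down_pending_out.erase(it);
            continue;
          }
          ++it;
        }
      }
    };

With a stale entry whose timestamp is a month old, step 3 above reduces
to the new leader's first tick() seeing a down time far beyond the 300s
default and marking the OSDs out right away.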


What do you think of this?

Regards
Ridge


* Re: OSDs marked OUT wrongly after monitor failover
From: Ridge Chen @ 2016-10-27  2:48 UTC (permalink / raw)
  To: ceph-devel

I also raised a bug report: http://tracker.ceph.com/issues/17719



* Re: OSDs marked OUT wrongly after monitor failover
From: Dong Wu @ 2016-10-27  3:01 UTC (permalink / raw)
  To: Ridge Chen; +Cc: ceph-devel

We can easily reproduce this (a rough simulation of the steps is sketched below):
1. stop osd.0;
2. start osd.0;
3. wait longer than mon_osd_down_out_interval, e.g. 300s;
4. stop osd.0;
5. stop monA (the leader);
6. monB wins the election, becomes the leader, and then osd.0 is marked out.
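
If it helps, here is a small self-contained simulation of those six
steps under the same simplified model as the sketch earlier in the
thread. Again only a sketch under assumptions: the Mon struct and the
simulated clock are made up, and only down_pending_out and
mon_osd_down_out_interval map to real names:

    #include <chrono>
    #include <iostream>
    #include <map>

    using namespace std::chrono_literals;
    using Time = std::chrono::seconds;    // simulated wall clock

    // Per-monitor view of when each osd went down; only the leader prunes it.
    struct Mon {
      std::map<int, Time> down_pending_out;
      bool leader = false;
    };

    int main() {
      Mon a, b;
      a.leader = true;                    // monA is the initial leader
      const Time grace = 300s;            // mon_osd_down_out_interval
      bool osd0_up = true;

      // 1. stop osd.0 at t=0: every monitor records the down time.
      Time t = 0s;
      osd0_up = false;
      a.down_pending_out[0] = t;
      b.down_pending_out[0] = t;

      // 2. start osd.0: only the leader's tick erases the entry.
      t = 10s;
      osd0_up = true;
      a.down_pending_out.erase(0);        // monA prunes; monB keeps the stale entry

      // 3. wait longer than the grace period, then 4. stop osd.0 again.
      t = 400s;
      osd0_up = false;
      a.down_pending_out[0] = t;          // monA records the new down time

      // 5. stop monA.  6. monB wins the election and runs its first tick.
      b.leader = true;
      Time down_for = t - b.down_pending_out[0];   // 400s, from the stale stamp
      if (b.leader && !osd0_up && down_for >= grace)
        std::cout << "osd.0 out (down for " << down_for.count() << "s)\n";
      return 0;
    }

The point is that monB's entry from step 1 was never cleared, so right
after the failover the computed down time already exceeds the grace
period and osd.0 is marked out, just as in the "down for 3375169" case
above.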


