From: Two Spirit
Subject: Re: clearing unfound objects
Date: Tue, 12 Sep 2017 17:07:54 -0700
To: Sage Weil
Cc: John Spray, ceph-devel

I attached the complete output to the previous email.

...
    "objects": [
        {
            "oid": {
                "oid": "200.0000052d",
                "key": "",
                "snapid": -2,
                "hash": 2728386690,
                "max": 0,
                "pool": 6,
                "namespace": ""
            },
            "need": "1496'15853",
            "have": "0'0",
            "flags": "none",
            "locations": []
        }

So the chain goes Filename -> OID -> PG -> OSD? If I trace "200.0000052d" back, should I be able to clear the problem?

I seem to be getting files in the lost+found directory, I think from fsck. Does deep scrubbing eventually clear these after a week, or will they always require manual intervention?

On Tue, Sep 12, 2017 at 3:48 PM, Sage Weil wrote:
> On Tue, 12 Sep 2017, Two Spirit wrote:
>> >On Tue, 12 Sep 2017, Two Spirit wrote:
>> >> I don't have any OSDs that are down, so I think the 1 unfound object
>> >> needs to be cleared manually. I ran across a webpage a while ago that
>> >> talked about how to clear it, but if you have a reference, it would
>> >> save me a little time.
>> >
>> >http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
>>
>> Thanks. That was the page I had read earlier.
>>
>> I've attached the full outputs to this mail and show just clips below.
>>
>> # ceph health detail
>> OBJECT_UNFOUND 1/731529 unfound (0.000%)
>>     pg 6.2 has 1 unfound objects
>>
>> There is one number that looks like it shouldn't be there...
>> # ceph pg 6.2 list_missing
>> {
>>     "offset": {
>>         ...
>>         "pool": -9223372036854775808,
>>         "namespace": ""
>>     },
>>     ...
>
> I think you've snipped out the bit that has the name of the unfound
> object?
>
> sage
>
>>
>> # ceph -s
>>     osd: 6 osds: 6 up, 6 in; 10 remapped pgs
>>
>> The pg query shows that something believes osd "2" is down, but all
>> OSDs are up, as the previous ceph -s output shows.
>> # ceph pg 6.2 query
>>     "recovery_state": [
>>         {
>>             "name": "Started/Primary/Active",
>>             "enter_time": "2017-09-12 10:33:11.193486",
>>             "might_have_unfound": [
>>                 {
>>                     "osd": "0",
>>                     "status": "already probed"
>>                 },
>>                 {
>>                     "osd": "1",
>>                     "status": "already probed"
>>                 },
>>                 {
>>                     "osd": "2",
>>                     "status": "osd is down"
>>                 },
>>                 {
>>                     "osd": "4",
>>                     "status": "already probed"
>>                 },
>>                 {
>>                     "osd": "5",
>>                     "status": "already probed"
>>                 }
>>
>>
>> If I go to a couple of other OSDs and run the same command, osd "2"
>> is listed as "already probed". They are not in sync. I double-checked
>> that all the OSDs were up all 3 times I ran the command.
>>
>> Now, my question for debugging this, so I can decide whether I want to
>> "revert|delete", is: what in the heck are the file(s)/object(s)
>> associated with this pg? I assume this might be in the MDS, but I'd
>> like to see a file name associated with it to make a further
>> determination of what I should do. I don't have enough information at
>> this point to figure out how I should recover.
>>
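For the Filename -> OID -> PG question, here is a minimal Python sketch of the OID -> PG half, under stated assumptions. It assumes (1) the CephFS object naming convention `<inode in hex>.<object index as 8 hex digits>`, (2) that the `hash` field in `list_missing` is the raw placement hash that Ceph reduces to a PG seed with its `ceph_stable_mod()` helper (ported here as `stable_mod`), and (3) a hypothetical pg_num of 64 for pool 6, since the thread doesn't give the real value (`ceph osd pool get <pool> pg_num` reports it). The helper names `unfound_objects`, `split_cephfs_name` and `pg_for_hash` are illustrative only, not a Ceph API.

```python
import json

# Trimmed copy of the unfound-object entry from `ceph pg 6.2 list_missing`
# quoted in this thread. Real output has more fields, e.g. "offset".
LIST_MISSING = """
{
  "objects": [
    {
      "oid": {"oid": "200.0000052d", "key": "", "snapid": -2,
              "hash": 2728386690, "max": 0, "pool": 6, "namespace": ""},
      "need": "1496'15853", "have": "0'0", "flags": "none", "locations": []
    }
  ]
}
"""

# "pool" value from the "offset" block of the same output.
OFFSET_POOL = -9223372036854775808
INT64_MIN = -(2 ** 63)


def unfound_objects(doc):
    """Yield (pool, object name, hash) for each entry in list_missing JSON."""
    for entry in doc.get("objects", []):
        oid = entry["oid"]
        yield oid["pool"], oid["oid"], oid["hash"]


def split_cephfs_name(name):
    """Split an assumed CephFS-style '<inode hex>.<index hex>' object name."""
    ino_hex, index_hex = name.split(".")
    return int(ino_hex, 16), int(index_hex, 16)


def stable_mod(x, b, bmask):
    """Port of Ceph's ceph_stable_mod(): map a hash onto b buckets."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_for_hash(obj_hash, pg_num):
    """PG seed for a raw object hash, given the pool's pg_num."""
    # pg_num_mask: smallest 2^k - 1 that covers pg_num - 1.
    mask = (1 << (pg_num - 1).bit_length()) - 1
    return stable_mod(obj_hash, pg_num, mask)


if __name__ == "__main__":
    doc = json.loads(LIST_MISSING)
    for pool, name, obj_hash in unfound_objects(doc):
        ino, index = split_cephfs_name(name)
        # pg_num = 64 is hypothetical; PG ids print the seed in hex.
        # prints "6.2 0x200 1325" for the entry above
        print(f"{pool}.{pg_for_hash(obj_hash, 64):x}", hex(ino), index)
    # The odd "offset" pool value is exactly INT64_MIN (-2^63).
    print(OFFSET_POOL == INT64_MIN)
```

Under those assumptions "200.0000052d" decodes to inode 0x200, object index 1325, and its hash lands in seed 2 (pg 6.2, matching `ceph health detail`) for any power-of-two pg_num from 4 through 128. If I read CephFS's reserved-inode layout correctly, inode 0x200 is the rank 0 MDS journal rather than a user file, so there may be no file name to trace it to. The `-9223372036854775808` in the `offset` block is INT64_MIN, which looks like a "start of listing" sentinel rather than a real pool ID.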