From: Sage Weil <sweil@redhat.com>
To: Two Spirit <twospirit6905@gmail.com>
Cc: John Spray <jspray@redhat.com>, ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: clearing unfound objects
Date: Wed, 13 Sep 2017 01:54:37 +0000 (UTC)
Message-ID: <alpine.DEB.2.11.1709130154080.24068@piezo.novalocal>
In-Reply-To: <CAKRxpuuMbavH4V7zEDaOzJFzG=eLU0XH2KjgF6EJh1EWevxxQw@mail.gmail.com>

On Tue, 12 Sep 2017, Two Spirit wrote:
> I attached the complete output to the previous email.
> 
> ...
>     "objects": [
>         {
>             "oid": {
>                 "oid": "200.0000052d",

This is an MDS journal object, so the MDS is stuck replaying its journal 
because the object is unfound.

In this case I would do 'revert'.
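
For reference, the corresponding command (documented on the 
troubleshooting page linked further down in this thread) is:

  # roll the unfound object back to its last known-good version,
  # or use 'delete' instead to forget about it entirely:
  ceph pg 6.2 mark_unfound_lost revert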

sage


>                 "key": "",
>                 "snapid": -2,
>                 "hash": 2728386690,
>                 "max": 0,
>                 "pool": 6,
>                 "namespace": ""
>             },
>             "need": "1496'15853",
>             "have": "0'0",
>             "flags": "none",
>             "locations": []
>         }
> 
> 
> So it goes filename -> OID -> PG -> OSD? So if I trace down
> "200.0000052d" I should be able to clear the problem? I seem to get
> files in the lost+found directory, I think from fsck. Does deep
> scrubbing eventually clear these after a week, or will they always
> require manual intervention?
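
Roughly, yes: file name -> inode -> object name (OID) -> PG -> OSDs. 
One way to trace the last steps by hand is 'ceph osd map', which 
prints the PG and acting OSD set for any object name; the pool name 
below is a placeholder for whatever pool 6 is on this cluster 
(presumably the cephfs metadata pool):

  ceph osd map <pool-name> 200.0000052d

The object name encodes the owning inode in hex: 0x200 is the rank-0 
MDS journal inode and 0x0000052d the object's index within that 
journal, which is why no regular file name maps to it. And scrubbing 
will not clear an unfound object on its own; it needs the manual 
mark_unfound_lost step above.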
> 
> On Tue, Sep 12, 2017 at 3:48 PM, Sage Weil <sweil@redhat.com> wrote:
> > On Tue, 12 Sep 2017, Two Spirit wrote:
> >> >On Tue, 12 Sep 2017, Two Spirit wrote:
> >> >> I don't have any OSDs that are down, so the 1 unfound object I think
> >> >> needs to be manually cleared. I ran across a webpage a while ago that
> >> >> talked about how to clear it, but if you have a reference, would save
> >> >> me a little time.
> >> >
> >> >http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
> >>
> >> Thanks. That was the page I had read earlier.
> >>
> >> I've attached the full outputs to this mail and show just clips below.
> >>
> >> # ceph health detail
> >> OBJECT_UNFOUND 1/731529 unfound (0.000%)
> >>     pg 6.2 has 1 unfound objects
> >>
> >> One number looks like it shouldn't be there...
> >> # ceph pg 6.2 list_missing
> >> {
> >>     "offset": {
> >> ...
> >>         "pool": -9223372036854775808,
> >>         "namespace": ""
> >>     },
> >> ...
> >
> > I think you've snipped out the bit that has the name of the unfound
> > object?
> >
> > sage
> >
> >>
> >> # ceph -s
> >>     osd: 6 osds: 6 up, 6 in; 10 remapped pgs
> >>
> >> The pg query below shows that something believes osd "2" is down,
> >> but all OSDs are up, as seen in the previous ceph -s output.
> >> # ceph pg 6.2 query
> >>     "recovery_state": [
> >>         {
> >>             "name": "Started/Primary/Active",
> >>             "enter_time": "2017-09-12 10:33:11.193486",
> >>             "might_have_unfound": [
> >>                 {
> >>                     "osd": "0",
> >>                     "status": "already probed"
> >>                 },
> >>                 {
> >>                     "osd": "1",
> >>                     "status": "already probed"
> >>                 },
> >>                 {
> >>                     "osd": "2",
> >>                     "status": "osd is down"
> >>                 },
> >>                 {
> >>                     "osd": "4",
> >>                     "status": "already probed"
> >>                 },
> >>                 {
> >>                     "osd": "5",
> >>                     "status": "already probed"
> >>                 }
> >>
> >>
> >> If I go to a couple of other OSDs and run the same command,
> >> osd "2" is listed as "already probed". They are not in sync. I
> >> double-checked that all the OSDs were up all 3 times I ran the
> >> command.
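
One way to get the primary to re-probe osd.2 (assuming osd.2 really 
is up and healthy) is to mark it down so that it re-asserts itself 
and the PG re-peers; a sketch, not a guaranteed fix:

  ceph osd down 2

then watch 'ceph -w' until peering settles and re-run the pg query.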
> >>
> >> Now, my question for debugging this and deciding between
> >> "revert|delete": what in the heck are the file(s)/object(s)
> >> associated with this pg? I assume this might be in the MDS, but I'd
> >> like to see a file name associated with it before deciding what to
> >> do. I don't have enough information at this point to figure out how
> >> I should recover.
> >>
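
On mapping objects back to file names in general: for regular CephFS 
file data, the object name prefix is the file's inode number in hex, 
so a rough sketch (the example OID and mount point here are 
hypothetical) is:

  # convert the hex inode prefix of an OID like 10000000abc.00000000
  printf '%d\n' 0x10000000abc
  # then search the mounted filesystem by inode number
  find /mnt/cephfs -xdev -inum <decimal-inode>

Inode 0x200, though, is the MDS journal itself, so this particular 
object will never correspond to a visible file.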
