From: Łukasz Chrustek
Subject: Re: Problem with query and any operation on PGs
Date: Thu, 25 May 2017 00:09:04 +0200
Message-ID: <804507840.20170525000904@tlen.pl>
References: <175484591.20170523135449@tlen.pl> <483467685.20170523144818@tlen.pl> <1464688590.20170523185052@tlen.pl> <1075363645.20170523234331@tlen.pl> <135176900.20170524151952@tlen.pl> <1203308391.20170524155848@tlen.pl> <379087365.20170524161815@tlen.pl> <2910218531.20170524233842@tlen.pl>
Reply-To: Łukasz Chrustek
Sender: ceph-devel-owner@vger.kernel.org
To: Sage Weil
Cc: ceph-devel@vger.kernel.org

Hi,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> Hello,
>>
>> >>
>> >> > This
>> >>
>> >> osd 6 - isn't startable
>>
>> > Disk completely 100% dead, or just broken enough that ceph-osd won't
>> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> > from this osd to recover any important writes on that osd.
>>
>> >> osd 10, 37, 72 are startable
>>
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>>
>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > random osd (not one of these ones), import the pg into that osd, and start
>> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> > point.  repeat with the same basic process with the other pg.
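
For reference, the export/import sequence quoted above could be sketched roughly as follows. This is only a dry-run helper that prints the commands in order; it runs nothing. The PG id, the target OSD id, and the export file path are hypothetical placeholders (the thread does not name them), and the data and journal paths assume a default FileStore layout under /var/lib/ceph/osd. Stopping and starting the target OSD depends on the init system, so those steps are left as comments.

```python
import shlex

# Hypothetical values for illustration only; none of them come from the thread
# except the id of the OSD that will not start (6).
PGID = "1.2f"
SOURCE_OSD = 6
TARGET_OSD = 99
EXPORT_FILE = "/tmp/pg-export.bin"


def osd_paths(osd_id):
    """Return (data_path, journal_path), assuming a default FileStore layout."""
    data = f"/var/lib/ceph/osd/ceph-{osd_id}"
    return data, f"{data}/journal"


def export_cmd(osd_id, pgid, path):
    """Build the argv that exports one PG from a stopped OSD to a file."""
    data, journal = osd_paths(osd_id)
    return ["ceph-objectstore-tool", "--data-path", data,
            "--journal-path", journal, "--pgid", pgid,
            "--op", "export", "--file", path]


def import_cmd(osd_id, path):
    """Build the argv that imports an exported PG into another stopped OSD."""
    data, journal = osd_paths(osd_id)
    return ["ceph-objectstore-tool", "--data-path", data,
            "--journal-path", journal,
            "--op", "import", "--file", path]


def recovery_plan(pgid, source_osd, target_osd, path):
    """Return the commands in the order described in the quoted text."""
    return [
        export_cmd(source_osd, pgid, path),
        # (stop the target OSD here; it must not be running during import)
        import_cmd(target_osd, path),
        # (start the target OSD again and wait until it is up)
        ["ceph", "osd", "lost", str(source_osd), "--yes-i-really-mean-it"],
        ["ceph", "pg", pgid, "query"],
    ]


if __name__ == "__main__":
    for argv in recovery_plan(PGID, SOURCE_OSD, TARGET_OSD, EXPORT_FILE):
        print(shlex.join(argv))
```

The same plan would be repeated with the second PG id. Printing the commands first, rather than executing them, leaves room to check each path before touching an OSD's data.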
>>
>> Here is output from ceph-objectstore-tool - it also didn't succeed:
>>
>> https://pastebin.com/7XGAHdKH

> Hmm, btrfs:

> 2017-05-24 23:28:58.547456 7f500948e940 -1
> filestore(/var/lib/ceph/osd/ceph-84) ERROR:
> /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> losing new data

> You could try setting --osd-use-stale-snap as suggested.

Yes... tried... and I simply got rid of 39GB of data...

> Is it the same error with the other one?

Yes: https://pastebin.com/7XGAHdKH

> in particular, osd 37 38 48 67 all have incomplete copies of the PG (they
> are mid-backfill) and 68 has nothing.  Some data is lost unless you can
> recover another OSD with that PG.

> The set of OSDs that might have data are: 6,10,33,72,84

> If that bears no fruit, then you can force last_backfill to report

How do I force last_backfill?

> complete on one of those OSDs and it'll think it has all the data even
> though some of it is likely gone.  (We can pick one that is farther
> along... 38 48 and 67 seem to all match.)

> sage

-- 
Regards,
 Łukasz Chrustek