From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: MDS stuck in a crash loop
Date: Thu, 22 Oct 2015 06:14:59 -0700 (PDT)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Return-path:
Received: from cobra.newdream.net ([66.33.216.30]:39414 "EHLO cobra.newdream.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751718AbbJVNPA (ORCPT ); Thu, 22 Oct 2015 09:15:00 -0400
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: John Spray
Cc: Milosz Tanski, Gregory Farnum, ceph-devel

On Thu, 22 Oct 2015, John Spray wrote:
> On Thu, Oct 22, 2015 at 1:43 PM, Milosz Tanski wrote:
> > On Wed, Oct 21, 2015 at 5:33 PM, John Spray wrote:
> >> On Wed, Oct 21, 2015 at 10:33 PM, John Spray wrote:
> >>>> John, I know you've got
> >>>> https://github.com/ceph/ceph-qa-suite/pull/647. I think that's
> >>>> supposed to be for this, but I'm not sure if you spotted any issues
> >>>> with it or if we need to do some more diagnosing?
> >>>
> >>> That test path is just verifying that we do handle dirs without dying
> >>> in at least one case -- it passes with the existing ceph code, so it's
> >>> not reproducing this issue.
> >>
> >> Clicked send to soon, I was about to add...
> >>
> >> Milosz mentioned that they don't have the data from the system in the
> >> broken state, so I don't have any bright ideas about learning more
> >> about what went wrong here unfortunately.
> >>
> >
> > Sorry about that, wasn't thinking at the time and just wanted to get
> > this up and going as quickly as possible :(
> >
> > If this happens next time I'll be more careful to keep more evidence.
> > I think multi-fs in the same rados namespace support would actually
> > helped here, since it makes it easier to create a newfs and leave the
> > other one around (for investigation)
>
> Yep, good point. I am a known enthusiast for multi-filesystem support :-)

A rados pool export on the metadata pool would have helped, too.
That doesn't include data object backtrace metadata, though. I wonder if
we should make a cephfs metadata imager tool (similar to the tools that
are available for xfs) that captures both the metadata pool and the
backtrace metadata on the data objects. On the data pool side it'd just
record the object names, xattrs, and object sizes, ignoring the data. It
wouldn't anonymize filenames (that is tricky without breaking the mds dir
hashing), but it would exclude file data and would probably be sufficient
for most users...

sage
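
To make the data-pool half of that idea concrete, here is a rough sketch in Python of what "record the object names, xattrs, and object size, ignoring the data" could look like. No such tool exists in this thread; the function names image_data_pool and write_image and the JSON-lines output format are made up for illustration. The ioctx argument is assumed to behave like a python-rados Ioctx (list_objects(), stat(), get_xattrs()), and the code never reads object contents.

```python
import base64
import json


def image_data_pool(ioctx):
    """Yield one record per object: name, size and xattrs, never the data.

    `ioctx` is assumed to look like a python-rados Ioctx:
      - list_objects() yields objects with a `.key` attribute
      - stat(key) returns a (size, mtime) tuple
      - get_xattrs(key) yields (name, value_bytes) pairs
    """
    for obj in ioctx.list_objects():
        name = obj.key
        size, _mtime = ioctx.stat(name)
        # xattr values are raw bytes (the backtrace is binary), so they are
        # base64-encoded to keep every record JSON-safe.
        xattrs = {
            xname: base64.b64encode(value).decode("ascii")
            for xname, value in ioctx.get_xattrs(name)
        }
        yield {"name": name, "size": size, "xattrs": xattrs}


def write_image(ioctx, out):
    """Write one JSON record per line to `out`; return the object count."""
    count = 0
    for record in image_data_pool(ioctx):
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count
```

Against a real cluster, the ioctx would presumably come from python-rados, e.g. rados.Rados(conffile=...) followed by connect() and open_ioctx() on the data pool. Filenames inside the backtrace xattrs are left as they are, matching the "no anonymization" caveat above.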