From: Marc MERLIN <marc@merlins.org>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Chris Mason <clm@fb.com>,
	bo.li.liu@oracle.com, fdmanana@suse.com,
	Josef Bacik <jbacik@fb.com>, Qu Wenruo <quwenruo@cn.fujitsu.com>,
	David Sterba <dsterba@suse.cz>
Subject: Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?
Date: Mon, 1 May 2017 22:11:55 -0700
Message-ID: <20170502051155.igmbqcqgzi4ukefg@merlins.org>
In-Reply-To: <CAJCQCtRaeejj4b+J6M7VTsDAe7Sn4PUtBwcvHXgfLNAFe=8Jqg@mail.gmail.com>

On Mon, May 01, 2017 at 10:56:06PM -0600, Chris Murphy wrote:
> > Right, of course, I was being way over optimistic here. I kind of forgot
> > that metadata wasn't COW, my bad.
> 
> Well it is COW. But there's more to the file system than fs trees, and
> just because an fs tree gets snapshot doesn't mean all data is
> snapshot. So whether snapshot or not, there's metadata that becomes
> obsolete as the file system is updated and those areas get freed up
> and eventually overwritten.

Got it, thanks for explaining.

> > Also, how is --mode=lowmem being useful?
> 
> Testing. lowmem is a different implementation, so it might find
> different things from the regular check.
 
I see.
I've fired off some scrub -r and then check to run overnight. I'll see
if they finish, assuming the kernel doesn't crash again. (Yeah, just to
make things simpler, I'm hitting another issue when I/O piles up on
btrfs on top of dmcrypt on top of bcache:
http://lkml.iu.edu/hypermail/linux/kernel/1705.0/00626.html
https://pastebin.com/YqE4riw0
but that's not a bcache bug, just something else getting in the way.)
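
As a rough sketch of that overnight run: a read-only scrub on the mounted
filesystem, then both check implementations on the unmounted device. The
mount point and device path are placeholders (not taken from this thread),
and the helper names run and overnight_checks are made up for illustration.
By default it only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Sketch only: paths are placeholders, helper names are hypothetical.

# Print the command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

overnight_checks() {
    mnt=$1   # mounted filesystem, e.g. /mnt/pool (placeholder)
    dev=$2   # underlying block device, e.g. /dev/mapper/pool (placeholder)

    # -B: stay in the foreground, -r: read-only scrub (report, fix nothing)
    run btrfs scrub start -B -r "$mnt"

    # btrfs check expects the filesystem to be unmounted
    run umount "$mnt"

    # Regular mode first, then the separate lowmem implementation;
    # both read-only, so nothing on disk is modified.
    run btrfs check --readonly "$dev"
    run btrfs check --readonly --mode=lowmem "$dev"
}
```

Calling overnight_checks /mnt/pool /dev/mapper/pool without RUN=1 just
lists the four commands, which makes it easy to review them before letting
them loose on a damaged filesystem.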

> > And for re-parenting a sub-subvolume, is that possible?
> > (I want to delete /sub1/ but I can't because I have /sub1/sub2 that's also a subvolume
> > and I'm not sure how to re-parent sub2 to somewhere else so that I can subvolume delete
> > sub1)
> 
> Well you can move sub2 out of sub1 just like a directory and then
> delete sub1. If it's read-only it can't be moved, but you can use
> btrfs property get/set ro true/false to temporarily make it not
> read-only, move it, then make it read-only again, and it's still fine
> to use with btrfs send receive.

Ah, I didn't think mv would work from inside a subvolume to outside of a
subvolume without copying data (it doesn't for files) but I guess it
would for subvolumes, good point.
I'll try that, thanks.
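
For the archives, the steps described above could be sketched like this.
All paths are placeholders, and the helper names run and
reparent_and_delete are hypothetical. Whether the nested subvolume is
read-only is passed in as an argument here; on a real system
"btrfs property get -ts <path> ro" reports it. By default the commands are
only printed; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch only: paths are placeholders, helper names are hypothetical.

# Print the command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

reparent_and_delete() {
    parent=$1   # subvolume to delete, e.g. /mnt/pool/sub1
    child=$2    # nested subvolume to keep, e.g. /mnt/pool/sub1/sub2
    dest=$3     # new location on the same filesystem, e.g. /mnt/pool/sub2
    was_ro=$4   # "true" if the nested subvolume is currently read-only

    # A read-only subvolume cannot be moved, so clear the flag first.
    if [ "$was_ro" = "true" ]; then
        run btrfs property set -ts "$child" ro false
    fi

    # Moving a subvolume within the same filesystem is a rename, not a copy.
    run mv "$child" "$dest"

    # Restore the read-only flag at the new location.
    if [ "$was_ro" = "true" ]; then
        run btrfs property set -ts "$dest" ro true
    fi

    # The parent no longer contains a nested subvolume and can be deleted.
    run btrfs subvolume delete "$parent"
}
```

Example: reparent_and_delete /mnt/pool/sub1 /mnt/pool/sub1/sub2 /mnt/pool/sub2 true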

> Not understanding the problem, it's by definition naive for me to
> suggest it should go read-only sooner before hosing itself. But I'd
> like to think it's possible for Btrfs to look backward every once in a
> while for sanity checking, to limit damage should it be occurring even
> if the hardware isn't reporting any problems.

Fair point. To be honest, maybe btrfs could indeed have detected
problems earlier, but ultimately it's not really its fault if bad things
happen when I'm having repeated storage errors underneath. For all I
know, some data got written after getting corrupted and btrfs would not
notice that right away.
Now, I kind of naively thought I could simply unroll all writes done
after a certain point. You pointed out (rightfully so) that it's not
nearly as simple as I was hoping.

So at this point, I think it's just a matter of me providing
check/repair logs if they are useful, and someone looking into this
balance causing a kernel crash, which is IMO the only real thing that
btrfs should reasonably fix.

I'll update the thread when I have more logs and have moved further on
the recovery.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
