From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f49.google.com ([209.85.218.49]:36412 "EHLO
	mail-oi0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751249AbcFYNUa (ORCPT );
	Sat, 25 Jun 2016 09:20:30 -0400
Received: by mail-oi0-f49.google.com with SMTP id f189so149270210oig.3
	for ; Sat, 25 Jun 2016 06:20:30 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20160625010610.Horde.tUycS31CmgVWfy3CPu7qJCD@mail.sapo.pt>
References: <5356822.A3RRKHDHNy@linux-omuo> <20160625010610.Horde.tUycS31CmgVWfy3CPu7qJCD@mail.sapo.pt>
From: Chris Murphy
Date: Sat, 25 Jun 2016 07:20:28 -0600
Message-ID:
Subject: Re: Bad hard drive - checksum verify failure forces readonly mount
To: Vasco Almeida
Cc: Chris Murphy, Btrfs BTRFS
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Fri, Jun 24, 2016 at 6:06 PM, Vasco Almeida wrote:
> Quoting Chris Murphy :
>> A lot of changes have happened since 4.1.2 I would still use something
>> newer and try to repair it.
>
>
> By repair do you mean issue "btrfs check --repair /device" ?

Once you have copied off the important stuff, yes. It's less likely to
make things worse now. However, there are some things to do first:

> dmesg http://paste.fedoraproject.org/384352/80842814/

[ 1837.386732] BTRFS info (device dm-9): continuing balance
[ 1838.006038] BTRFS info (device dm-9): relocating block group 15799943168 flags 34
[ 1838.684892] BTRFS info (device dm-9): relocating block group 10934550528 flags 36
[ 1839.301453] ------------[ cut here ]------------
[ 1839.301495] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x45c/0x5a0 [btrfs]()

followed by

[ 1839.301797] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x29d/0x2d0 [btrfs]()
[ 1839.301798] BTRFS: Transaction aborted (error -5)
[...]
[ 1839.301972] BTRFS: error (device dm-9) in btrfs_run_delayed_refs:2946: errno=-5 IO failure
[ 1839.301975] BTRFS info (device dm-9): forced readonly

So it looks like it was resuming a balance automatically, and while
processing delayed references it ran into something it doesn't expect
and has no way to fix, so it goes read-only to avoid causing more
problems.

I would do a couple of things, in order:

1. Mount ro and copy off what you want, in case the whole thing gets
worse and can't ever be mounted again.

2. Mount with only these options:

-o skip_balance,subvolid=5,nospace_cache

If it mounts rw, don't do anything with it, just see if it cleans up
after itself. From the previous trace it also looks like it was trying
to remove a snapshot, and there are complaints of problems in that
snapshot. So hopefully after about 5 minutes of doing nothing it'll
clean up after itself. You can check with top for any btrfs-related
tasks that are still running, including the btrfs-cleaner process;
wait until they're done, then umount.

If you want, you could have two other consoles ready first: one for
'journalctl -f' and another to issue sysrq+t in case you get a hang.
This doesn't fix anything, but it collects more information for a bug
report for the devs.

Once you get it umounted, normally or by force, the next things to do
are:

3. btrfs-image, so that devs can see what's causing the problem that
the current code isn't handling well enough.

4. btrfs check --repair

Let's see the results of that repair. You can use 'script
btrfsrepair.txt' first and then 'btrfs check --repair', and it will log
everything. After btrfs check completes, use 'exit' to stop script from
recording, and you should have a btrfsrepair.txt file you can post
somewhere. When redirecting with > not everything gets logged for some
reason, but script will capture everything.

Depending on how the repair goes, there might be a couple more options left.

-- 
Chris Murphy
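
For reference, the four steps above can be sketched as a dry-run shell function. It only prints the commands in the suggested order and executes none of them, so it is safe to run as a checklist. The function name plan_recovery, the device, mount point and backup paths, the image file name, and the use of 'cp -a' for the copy step are all placeholders assumed for illustration; none of them come from the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands in order, runs nothing.
# plan_recovery and all paths passed to it are hypothetical placeholders.

plan_recovery() {
    dev="$1"      # block device holding the btrfs filesystem
    mnt="$2"      # mount point
    backup="$3"   # directory on a different, healthy filesystem

    # 1. Mount read-only and copy off what matters.
    #    ('cp -a' is an assumption; any copy tool would do.)
    echo "mount -o ro $dev $mnt"
    echo "cp -a $mnt/. $backup/"
    echo "umount $mnt"

    # 2. Mount with only these options, then leave it alone for a while
    #    (watch top for btrfs-cleaner) before unmounting.
    echo "mount -o skip_balance,subvolid=5,nospace_cache $dev $mnt"
    echo "umount $mnt"

    # 3. Capture a metadata image for the developers.
    echo "btrfs-image $dev $backup/btrfs-image.bin"

    # 4. Record the whole repair session with script(1).
    echo "script btrfsrepair.txt"
    echo "btrfs check --repair $dev"
    echo "exit"
}

# Example invocation with placeholder paths.
plan_recovery /dev/mapper/example /mnt/example /backup/example
```

Printing rather than executing is deliberate: step 2 needs a human to wait and watch, and step 4 should only be run after the copy in step 1 has been verified.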