From: Chris Murphy
Date: Thu, 23 Jun 2016 18:54:28 -0600
Subject: Re: Bad hard drive - checksum verify failure forces readonly mount
To: Vasco Almeida
Cc: Btrfs BTRFS

On Thu, Jun 23, 2016 at 2:30 PM, Vasco Almeida wrote:
> I was running OpenSuse Leap 42.1 with btrfs and
> LVM (Logical Volume Management).
> Last time I've checked smartd log, I noticed there were
> 30 sector pending reallocation and 1 unrecoverable bad
> sector on hard drive.
> I think my hard drive got some sector corrupted and now btrfs fails
> some checksum and forces mount readonly.
> The device is successfully mounted readonly.
>
> OpenSuse dmesg reported:
>
> BTRFS: dm-1 checksum verify failed on 437944320 wanted 39F45669 found
> 8BF8C752 leval 0
> (more 2 times)
> BTRFS: error (device dm-1) in btrfs_drop_snapshot:???: error=-5 IO failure
> BTRFS: info (device dm-1): forced readonly
>
> Now I'm on System Rescue CD and that is not reported.
> I've written down those log line on paper, so there may be some typo.
> Seemingly there is no journalctl installed on this system to check
> OpenSuse logs again.
>
> All the following logs are on System Rescue CD.
> mount -o ro,recovery /dev/mapper/vg_pupu-lv_opensuse_root /mnt/opensuse
> https://bpaste.net/show/263e5f7ae9d4
>
> After mounting and umounting several times with and without "-o ro,recovery"
> https://bpaste.net/show/43eb64decb63
>
> btrfs check --readonly /dev/mapper/vg_pupu-lv_opensuse_root
> https://bpaste.net/show/7ecf422c73a2
>
>
> Would it be apropriate to run any of "btrfs check --repair /device" or
> "btrfs check --init-csum-tree /device" to be able to mount readwrite again?
>
> smartctl --all /dev/disk/by-id/ata-SAMSUNG_HD154UI_S1Y6JDWSC01351
> https://bpaste.net/show/a6c132618974
>
> btrfs check manpage: https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
> btrfsck page: https://btrfs.wiki.kernel.org/index.php/Btrfsck

Normally, if only data blocks are corrupted, the file system still
mounts rw and just flags the affected file in kernel messages, so you
can delete it and replace it. Since that's not happening, the
corruption is probably in metadata. But then there should be two
copies of it, unless this is on an SSD or the file system was otherwise
created with -m single. If there are two copies of the metadata and
both are wrong, that's unusual.

From the pasted kernel messages:

> Linux version 3.18.34-std473-amd64 (root@rl-sysrcd-p11) (gcc version
> 4.8.5 (Gentoo 4.8.5 p1.3, pie-0.6.2) ) #2 SMP Tue May 24 20:34:19 UTC 2016

3.18.34 is ancient. Find something newer and try to remount normally.
If that fails, try again with recovery (don't use ro; see if it'll
mount rw and fix itself). If that doesn't work either, try btrfs check
with a newer version of btrfs-progs. I can't tell from the pasted
output what version you're using, but since the kernel is so old,
there's a decent chance the btrfsck is old too.

--
Chris Murphy
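
For reference, the order of attempts suggested above could be written down roughly as follows. This is only a dry-run sketch: the function recovery_plan is a hypothetical helper that prints the commands and runs nothing. The device and mount point are the ones from the original report. Using --readonly for the check step is an assumption carried over from the original poster's command, not something the reply specifies, and newer kernels also accept usebackuproot as the spelling of the recovery option.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands to try, in order. It executes none of them.
# DEV and MNT are taken from the original report; adjust for your system.
DEV=/dev/mapper/vg_pupu-lv_opensuse_root
MNT=/mnt/opensuse

# recovery_plan is a hypothetical helper name, not part of btrfs-progs.
recovery_plan() {
    dev=$1
    mnt=$2
    # First confirm that the rescue system really has a newer kernel and btrfs-progs.
    echo "uname -r"
    echo "btrfs --version"
    # 1. Plain mount (read-write by default).
    echo "mount $dev $mnt"
    # 2. If that fails, retry with the recovery option, still without ro.
    echo "mount -o recovery $dev $mnt"
    # 3. If mounting still fails, run check without modifying the device
    #    (--readonly is an assumption, copied from the original poster's command).
    echo "btrfs check --readonly $dev"
}

recovery_plan "$DEV" "$MNT"
```

Each later step is only worth trying if the one before it failed; none of them involves --repair or --init-csum-tree.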