From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Murphy
Date: Tue, 2 Apr 2019 12:54:47 -0600
Subject: Re: File system corruption in two hard disks
To: Luciano ES
Cc: XFS mailing list
In-Reply-To: <20190402132357.0f72e3a9@lud1.home>
List-Id: xfs

On Tue, Apr 2, 2019 at 10:24 AM Luciano ES wrote:

> [ 3.790321] sd 1:0:0:0: [sdb] tag#12 CDB: Read(10) 28 00 00 00 0a 00 00 01 00 00
> [ 3.790323] blk_update_request: I/O error, dev sdb, sector 2640

Common bad sector error; it includes the LBA of the failing sector.
There's a slim chance it's recoverable if the drive supports
configurable SCT ERC and just happens to have a low timeout value
(common on NAS and enterprise drives). You can check with:

# smartctl -l scterc /dev/sdb
# cat /sys/block/sdb/device/timeout

These are two different things. The first is internal to the drive
(firmware). The second is the kernel's command timer for that block
device. If the SCT ERC value is something short like 70 deciseconds
(7 seconds), you can try disabling it:

# smartctl -l scterc,0,0 /dev/sdb

And then increase the kernel command timer to something ridiculous
like 180 seconds:

# echo 180 > /sys/block/sdb/device/timeout

Try your repair again. xfs_repair might appear to hang. My guess is it
fails again right away, but there's some chance that giving the drive
more time lets it recover that sector.

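As a rough illustration of how the two timeouts relate, here is a minimal shell sketch. The function name erc_vs_timeout is made up for this example and is not part of smartctl or the kernel; it only compares two numbers you would read yourself from the commands above, and it assumes the SCT ERC value is given in deciseconds and the kernel timer in seconds.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartctl or the kernel): compare the
# drive's SCT ERC read timeout with the kernel's command timer.
#   $1 = SCT ERC value in deciseconds (from: smartctl -l scterc /dev/sdb)
#   $2 = kernel command timer in seconds (from: cat /sys/block/sdb/device/timeout)
erc_vs_timeout() {
    erc_ds=$1
    timeout_s=$2
    if [ "$erc_ds" -eq 0 ]; then
        # 0 means ERC is disabled: the drive retries for as long as its
        # firmware allows, so only the kernel timer bounds the wait.
        echo "ERC disabled; kernel timer is ${timeout_s}s"
    elif [ "$erc_ds" -lt $((timeout_s * 10)) ]; then
        # The drive stops retrying before the kernel timer expires.
        echo "drive reports the error first"
    else
        echo "kernel timer expires first or at the same time"
    fi
}

erc_vs_timeout 70 30    # a 7 second ERC against a 30 second kernel timer
erc_vs_timeout 0 180    # ERC disabled, kernel timer raised to 180 seconds
```

With ERC disabled and the kernel timer raised, the drive gets far more time to retry the read before the kernel aborts the command.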
Thing is, if there's no problem with the contents of that bad sector,
it likely won't be overwritten, and a bad sector only gets "repaired"
by an overwrite. Once xfs_repair completes, and if it's successful,
you'll want to mount the file system rw, make some trivial change like
touching a file, then unmount. A reboot will reset all of these
timeout values, and you'll quickly learn whether this is fixed. If
not... well, cross that bridge later, depending on what results you
get.

> [ 8.298754] sd 1:0:0:0: [sdb] tag#14 Add. Sense: Unrecovered read error - auto reallocate failed
> [ 8.298757] sd 1:0:0:0: [sdb] tag#14 CDB: Read(10) 28 00 00 00 0a 56 00 00 02 00
> [ 8.298758] blk_update_request: I/O error, dev sdb, sector 2646

2640 and 2646 are likely in the same 4096-byte physical sector; they
show up as different numbers because of 512-byte sector emulation.
What do you get for:

# blockdev --getss --getpbsz /dev/sdb

> I didn't have time to investigate more so I didn't even try smartctl on it.
> But looks like that disk is dead, doesn't it?
> :-(

Uncertain. Some number of bad sectors is considered acceptable by the
manufacturer as long as they remap. Yours went bad before the remap,
though, so I'd complain if the drive is under warranty. But that's
separate from recovery...

# smartctl -x /dev/sdb

-- 
Chris Murphy
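
For reference, the sector arithmetic behind the 2640/2646 remark can be sketched in shell. This is only an illustration: the function name phys_sector is made up, and it assumes 512-byte logical sectors, 4096-byte physical sectors, and no alignment offset, which the blockdev output would need to confirm.

```shell
#!/bin/sh
# Hypothetical helper: map a logical sector number (LBA) to the index of
# the physical sector that contains it.
#   $1 = LBA as reported in the kernel log
#   $2 = logical sector size in bytes  (assumed 512 if omitted)
#   $3 = physical sector size in bytes (assumed 4096 if omitted)
phys_sector() {
    lba=$1
    logical=${2:-512}
    physical=${3:-4096}
    # Byte offset of the LBA, integer-divided by the physical sector size.
    echo $(( lba * logical / physical ))
}

phys_sector 2640    # prints 330
phys_sector 2646    # prints 330
```

Under those assumptions both LBAs fall in physical sector 330, which covers LBAs 2640 through 2647.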