From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CF9CC432C3 for ; Wed, 27 Nov 2019 14:15:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D9963206F0 for ; Wed, 27 Nov 2019 14:15:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="K5rh3al9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727028AbfK0OPs (ORCPT ); Wed, 27 Nov 2019 09:15:48 -0500 Received: from mail-qt1-f175.google.com ([209.85.160.175]:46029 "EHLO mail-qt1-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727022AbfK0OPq (ORCPT ); Wed, 27 Nov 2019 09:15:46 -0500 Received: by mail-qt1-f175.google.com with SMTP id 30so25444786qtz.12 for ; Wed, 27 Nov 2019 06:15:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-language:content-transfer-encoding; bh=bVHtNgvsO+pqoxlp3DiiLUwne81VLxMm55SwI+5rWp8=; b=K5rh3al9XG2h/QwYEtZ75TSRuj9okcgKNDqETlQFwZB4/jWmTComVA2KWc+Oo4kxEH eB4kibVgcu8Aqsfb7WEsbmXoNe9ErTQSXrF9YdA+cejUzvFJEOQqlDDJu3/yFxfc4XK1 yFLD6M5K72a4+iDw/gO/tfYCGxYQRKar/1GBfbqkrMCSXu0PGvCX36SurF7A2CVrhHKe VirhbUqLGiZW/7+lNp5efUsHzGQGectXhyjOFZErpeN+efEpZT3r6CSrqowbmBGrXt3s EGs0cnuRzXIdPqRKZM6vdM8sQb53Zc5lYiblpwY42kKPxhHcbT+kUQR563OsIOPGVQlD 8Tig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=bVHtNgvsO+pqoxlp3DiiLUwne81VLxMm55SwI+5rWp8=; b=Gyv7sEEo6ODG0TEdAlRr3CqY63G+lNZN2Lv5pHhCehMaTE5lDdMjDrod3kFaoZkjvY dZ+mZsQQv3lq2jga2wWMSDog4qeD56l9daydPc6dkkO31+5BxiohRsVc01G/lzjvZiZ8 BaLLrL835Em1m+bRxEP4dOgAe2pWOAkz1Nr7wGKhsb3He8cB9CGp1yyeukzKjmMnGklE to6SsSoFF5RKLmGAwHS2E13oucvzzsjuZEUcgE55nBupmJkh1NCgHjgo6vNSCFknwR8f rSYfYFOxUYW5N6eauL7NDk/lbbkNRU+GWMG783raRa4cxMOwLNvOBSYr24FP2ti0gs79 earg== X-Gm-Message-State: APjAAAW5r+/2voD66WcRI9DM+7DgwvUzeWIqFbsouLDwSv7kRCCo6L12 HyhqIzKSX0cN0scrjIsenv1ORwVGus4= X-Google-Smtp-Source: APXvYqwuuEUuBPaFAHNpySczfW40dIWoWYbtuGz6Yc2kAdsQHZZIgxyoB1qzuOE1OQ1RpZnMuWCOQA== X-Received: by 2002:ac8:7405:: with SMTP id p5mr3838088qtq.113.1574864143913; Wed, 27 Nov 2019 06:15:43 -0800 (PST) Received: from [192.168.255.200] (user-12l2ihc.cable.mindspring.com. [69.81.74.44]) by smtp.gmail.com with ESMTPSA id i203sm2095393qke.126.2019.11.27.06.15.43 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 27 Nov 2019 06:15:43 -0800 (PST) Subject: Re: bad sector / bad block support To: Christopher Staples , linux-btrfs@vger.kernel.org References: From: "Austin S. Hemmelgarn" Message-ID: Date: Wed, 27 Nov 2019 09:15:41 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On 2019-11-26 22:30, Christopher Staples wrote: > will their ever be a better way to handle bad sectors ? I keep > getting silent corruption from random bad sectors > scrubs keep passing with out showing any errors , but if I do a > ddrescue backup to a new drive I find the bad sectors Zygo is correct, if there are no checksum errors, it's almost certainly not the storage device. Put simply, for a media error to cause corruption without a checksum error, all of the following need to happen at the same time: * At least one sector in a data block has to go bad without the storage device's built-in error correction catching it. If the ECC functionality of the drive caught it, it would either return the correct data or a read error. This is actually rather unlikely for small numbers of devices (but the likelihood of it happening goes up as you increase the number of devices involved). * A mix of similar errors in the block containing the checksum for the data block has to similarly go bad without being detected or corrected and it has to match up with the corrupted data _or_ the checksum for the corrupted data block has to be valid for the corrupted data. This is astronomically unlikely, to a level that you're far more likely to be struck and killed by a meteor than this happening. * The above then has to happen for the checksum for the metadata block containing the checksum for that data block, and in turn the same condition has to repeat for each block up the tree to the root (usually 3+ times). This is so unlikely to be a statistical impossibility. So, as Zygo suggests, check your RAM, check your CPU, possibly check your PSU (bad power supplies can cause all kinds of weird things). > > > the only thing I can do for now is mark I/O error files as bad buy > renaming them and make another file copy onto the file system , > > > I like btrfs for the snapshot ability , but when it comes to keeping > data safe ext4 seems better ? at least it looks for bad sectors and > marks them , btrfs just seems to write and assumes its written .. ext4 wouldn't save you here, because you almost certainly aren't dealing with bad sectors. BTRFS doesn't include bad sector support like ext4 because it solves the issue a different way, namely by checksumming everything and then validating the checksums on read. On a slightly separate note, you should never need the ext4 bad block functionality on any modern hardware unless your storage devices are way beyond the point at which they should be replaced. The original reason for having bad block lists in the filesystem was that disk drives didn't remap bad sectors, which in turn meant that the filesystem had to deal with them. It's been multiple decades since that was the case though, and all modern storage devices (except possibly some really cheap USB flash drives) remap bad sectors and only let the OS see them when they run out of space to remap them to.