linux-btrfs.vger.kernel.org archive mirror
From: Anand Jain <anand.jain@oracle.com>
To: waxhead@dirtcellar.net, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS bad block management. Does it exist?
Date: Tue, 16 Oct 2018 17:57:32 +0800
Message-ID: <2287c62d-6dbb-3b30-1134-d754e42941ab@oracle.com>
In-Reply-To: <42b1965a-356c-25c9-8c49-788a9a8a11aa@dirtcellar.net>




On 10/14/2018 07:08 PM, waxhead wrote:
> In case BTRFS fails to WRITE to a disk. What happens?
> Does the bad area get mapped out somehow?

There was a proposed patch, but it is not convincing: disks do the
bad-block relocation transparently to the host, and if a disk runs out
of its reserved list, it is probably time to replace the disk. In my
experience the disk would have failed from some other, non-media error
before it runs out of the reserved list, and in that case relocation
performed by the host won't help. Furthermore, at the file-system level
you can't accurately determine whether a block write failed because of
a bad-media error rather than a fault in the target circuitry.

> Does it try again until it succeed or until it "times out" or reach a
> threshold counter?

Block I/O timeout and retry are properties of the block layer, which
should decide whether to retry depending on the type of error.

The SD module already retries 5 times (when failfast is not set); this
should be tunable, and I think there was a patch for that on the ML.
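
To make the retry behaviour above concrete, here is a rough Python
sketch. It is not the SD driver code: the names `submit_with_retries`
and `MAX_RETRIES` are made up for this example, and treating the 5
retries as coming on top of the first attempt is an assumption. Only
the count of 5 and the failfast condition come from the text above.

```python
from typing import Callable, Tuple

MAX_RETRIES = 5  # retry count mentioned above; a parameter here, i.e. "tunable"


def submit_with_retries(do_io: Callable[[], bool],
                        failfast: bool = False,
                        max_retries: int = MAX_RETRIES) -> Tuple[bool, int]:
    """Call do_io() until it succeeds or the retry budget is used up.

    Returns (success, attempts). With failfast set, the I/O is tried
    exactly once and the error is reported immediately.
    """
    # Assumption: one initial attempt plus max_retries retries.
    allowed = 1 if failfast else 1 + max_retries
    attempts = 0
    for _ in range(allowed):
        attempts += 1
        if do_io():
            return True, attempts
    return False, attempts
```

With failfast set the caller sees the error after a single attempt,
which is what an upper layer that has its own redundancy would want.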

We had a few discussions on the retry part in the past [1].

[1]
https://www.spinics.net/lists/linux-btrfs/msg70240.html
https://www.spinics.net/lists/linux-btrfs/msg71779.html


> Does it eventually try to write to a different disk (in case of using 
> the raid1/10 profile?)

When there is a mirror copy, the file system does not go into RO mode,
and it leaves write holes scattered across transactions, because we
don't fail the disk at the first failed transaction. That means that if
a disk is at the nth transaction per its super-block, it is not
guaranteed that all previous transactions made it to that disk
successfully in mirrored configs. I consider this a bug. There is also
a danger of reading junk data, which is hard but not impossible to hit,
due to our unreasonable hard-coded pid-based read-mirror policy (there
is a patch on the ML to address that as well).
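
A toy Python model of the problem described above, for illustration
only. The classes and helpers (`Mirror`, `commit`, `read_block`) are
hypothetical and are not btrfs code; the sketch deliberately ignores
checksum and generation verification. The only parts taken from the
text are that the mirror is chosen from the reader's pid and that a
disk which missed a transaction is not failed.

```python
from typing import Dict, FrozenSet, List


def pick_mirror(pid: int, num_mirrors: int) -> int:
    """Pid-based policy: the reader's pid alone selects the mirror."""
    return pid % num_mirrors


class Mirror:
    def __init__(self) -> None:
        self.blocks: Dict[int, str] = {}  # block number -> contents
        self.super_generation = 0         # transaction id in the super-block


def commit(mirrors: List[Mirror], generation: int, writes: Dict[int, str],
           failing: FrozenSet[int] = frozenset()) -> None:
    """Apply one transaction; mirrors listed in `failing` miss it entirely."""
    for idx, mirror in enumerate(mirrors):
        if idx in failing:
            continue  # write error, but the device is NOT marked failed
        mirror.blocks.update(writes)
        mirror.super_generation = generation


def read_block(mirrors: List[Mirror], pid: int, block: int) -> str:
    return mirrors[pick_mirror(pid, len(mirrors))].blocks[block]


mirrors = [Mirror(), Mirror()]
commit(mirrors, 1, {0: "a1"})
commit(mirrors, 2, {0: "a2"}, failing=frozenset({1}))  # mirror 1 misses txn 2
commit(mirrors, 3, {1: "b3"})                          # mirror 1 works again

# Both super-blocks now say generation 3, yet mirror 1 has a hole:
even_pid_view = read_block(mirrors, pid=100, block=0)  # mirror 0, current data
odd_pid_view = read_block(mirrors, pid=101, block=0)   # mirror 1, old data
```

In this model two processes reading the same block get different
contents purely because of their pids, even though both devices report
the same transaction id.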

I sent a patch to fail the disk when the first write fails, so that we
know the last good state of the FS on that disk based on the
transaction ID. That was a long time back; I still believe it is an
important patch. There weren't enough comments, I guess, for it to go
to the next step.
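
The idea can be sketched in Python as follows. This is only an
illustration of the policy, not the patch itself; `Device`,
`commit_fail_fast`, and the `write_ok` map are names invented for the
example.

```python
from typing import Dict, List


class Device:
    def __init__(self, name: str) -> None:
        self.name = name
        self.failed = False
        self.last_good_generation = 0  # last transaction fully written


def commit_fail_fast(devices: List[Device], generation: int,
                     write_ok: Dict[str, bool]) -> None:
    """Commit one transaction, failing a device on its first write error.

    `write_ok` maps device name -> whether its writes succeeded
    (missing names are treated as success).
    """
    for dev in devices:
        if dev.failed:
            continue  # never write to a failed device again
        if write_ok.get(dev.name, True):
            dev.last_good_generation = generation
        else:
            dev.failed = True  # its last_good_generation stays frozen


devs = [Device("sda"), Device("sdb")]
commit_fail_fast(devs, 1, {})
commit_fail_fast(devs, 2, {"sdb": False})  # first write error on sdb
commit_fail_fast(devs, 3, {})              # sdb is skipped from now on
```

After these three commits, `sdb` is marked failed with its last good
generation still at 1, so everything up to that transaction is known
to be complete on it and nothing later is trusted.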

The current solution is to replace the offending disk _without_ reading
from it, to get a good recovery from the failed disk. As data centers
can't rely on admin-initiated manual recovery, there are also patches
on the ML to do this automatically using the auto-replace feature.
Again, there weren't enough comments, I guess, for them to go to the
next step.

Thanks, Anand

Thread overview: 4+ messages
2018-10-14 11:08 BTRFS bad block management. Does it exist? waxhead
2018-10-14 11:31 ` Qu Wenruo
2018-10-15 12:09 ` Austin S. Hemmelgarn
2018-10-16  9:57 ` Anand Jain [this message]
