From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Anand Jain <anand.jain@oracle.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
Date: Mon, 16 Nov 2015 08:41:49 -0500
Message-ID: <5649DD1D.3090104@gmail.com>
In-Reply-To: <1447066589-3835-1-git-send-email-anand.jain@oracle.com>

On 2015-11-09 05:56, Anand Jain wrote:
> This set of patches provides btrfs hot spare and auto replace support
> for your review and comments.
>
> First, here are simple example steps to configure it:
>
> Add a spare device:
>      btrfs spare add /dev/sde -f
>
> Or, if a spare device was already added before, just run:
>
>      btrfs dev scan [/dev/sde]
>
> This will register the spare device with the kernel.
>
>      btrfs fi show
>      Label: none  uuid: 52f170c1-725c-457d-8cfd-d57090460091
> 	Total devices 2 FS bytes used 112.00KiB
> 	devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
> 	devid    2 size 2.00GiB used 417.50MiB path /dev/sdd
>
>      Global spare
> 	device size 3.00GiB path /dev/sde
>
> That's it.
>
> Auto replace:
>   Replace happens automatically: when any write or flush fails, the
>   device is marked as failed, which stops any further IO attempts to
>   that device. Then, in the next commit thread cycle, auto replace
>   picks the spare device (/dev/sde in the above example) to replace
>   the failed device, and so the btrfs volume is back to a healthy
>   state.
>
>
> It's a btrfs global spare:
>   As of now only global hot spares are supported, that is, the hot
>   spare(s) serve all the btrfs filesystems in the system.
>
> No spare when the device failed:
>   It scans for a spare device at the rate of transaction commits and
>   triggers the auto replace whenever a spare device is added.
>
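A minimal C sketch of the auto replace flow described above: a write or flush error marks the device failed, further IO to it is refused, and each transaction commit tries to swap in the global spare, retrying on later commits while no spare exists. This is a userspace model under stated assumptions, not the patch code: all sim_* names are hypothetical, a single global spare pointer stands in for the spare pool, and the data rebuild a real replace performs is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device model; not struct btrfs_device. */
enum sim_dev_state { SIM_DEV_ONLINE, SIM_DEV_FAILED };

struct sim_dev {
	const char *path;
	enum sim_dev_state state;
};

#define SIM_MAX_DEVS 4

struct sim_fs {
	struct sim_dev *devs[SIM_MAX_DEVS];
	size_t ndevs;
};

/* Assumption: one global spare shared by every filesystem (NULL if none). */
static struct sim_dev *sim_global_spare;

/* Write/flush completion: any error marks the device failed. */
static void sim_end_io(struct sim_dev *dev, int err)
{
	if (err)
		dev->state = SIM_DEV_FAILED;
}

/* New IO to a failed device is refused instead of attempted. */
static int sim_submit_io(const struct sim_dev *dev)
{
	return dev->state == SIM_DEV_FAILED ? -EIO : 0;
}

/*
 * Called once per transaction commit.  Replaces the first failed device
 * with the global spare and returns true; returns false if nothing was
 * replaced (no failed device, or no spare yet, so the next commit retries).
 */
static bool sim_commit_cycle(struct sim_fs *fs)
{
	assert(fs->ndevs <= SIM_MAX_DEVS);
	for (size_t i = 0; i < fs->ndevs; i++) {
		if (fs->devs[i]->state != SIM_DEV_FAILED)
			continue;
		if (sim_global_spare == NULL)
			return false;
		/* A real replace would rebuild data onto the spare; omitted. */
		fs->devs[i] = sim_global_spare;
		sim_global_spare = NULL;
		return true;
	}
	return false;
}
```

Calling sim_commit_cycle() once per commit mirrors the "scan at the rate of transaction commits" behavior: a failed device simply stays failed, with IO blocked, until a spare shows up.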
> Priority:
>   In future work there could be a chronological order for picking
>   a spare and the failed device.
>
>
> Patches:
>
> Kernel:
> First, it needs Qu's per-chunk missing device patchset, which is
> part of this set, and also a light optimization (patch 5/15) that
> was required as part of this enhancement.
>
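A hedged sketch of what a per-chunk degraded check means: each chunk is checked against the tolerance of its own RAID profile, rather than against one filesystem-wide number. The sim_* names are hypothetical, and the per-profile tolerances (0 for single/dup/raid0, 1 for raid1/raid10/raid5, 2 for raid6) are the usual redundancy levels, assumed here rather than taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_profile {
	SIM_SINGLE, SIM_DUP, SIM_RAID0, SIM_RAID1, SIM_RAID10, SIM_RAID5, SIM_RAID6
};

/* Assumed number of missing devices each profile survives. */
static int sim_max_tolerated(enum sim_profile p)
{
	switch (p) {
	case SIM_RAID1:
	case SIM_RAID10:
	case SIM_RAID5:
		return 1;
	case SIM_RAID6:
		return 2;
	default: /* single, dup (both copies on one device), raid0 */
		return 0;
	}
}

/* Hypothetical chunk: its profile and the devids holding its stripes. */
struct sim_chunk {
	enum sim_profile profile;
	const int *stripe_devids;
	size_t nstripes;
};

static bool sim_is_missing(int devid, const int *missing, size_t nmissing)
{
	for (size_t i = 0; i < nmissing; i++)
		if (missing[i] == devid)
			return true;
	return false;
}

/* Degraded mount is OK only if every chunk is within its own tolerance. */
static bool sim_all_chunks_ok(const struct sim_chunk *chunks, size_t nchunks,
			      const int *missing, size_t nmissing)
{
	assert(chunks != NULL || nchunks == 0);
	for (size_t c = 0; c < nchunks; c++) {
		int lost = 0;
		for (size_t s = 0; s < chunks[c].nstripes; s++)
			if (sim_is_missing(chunks[c].stripe_devids[s], missing, nmissing))
				lost++;
		if (lost > sim_max_tolerated(chunks[c].profile))
			return false;
	}
	return true;
}
```

With a raid1 chunk on devids 1 and 2 plus a single chunk on devid 1, losing devid 2 still passes the check, while losing devid 1 does not.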
> Next, patches 7 and 8/15 bring in support to dynamically manage the
> transition of devices from online (no state) to the offline or failed
> state, on top of static device states such as the current "missing" state.
>
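A small sketch of the dynamic state transitions, assuming (not taken from the patches) that only an online device can move to offline or failed at runtime, while "missing" is a static state set when the filesystem is assembled. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; the patches' real flags may differ. */
enum sim_dev_state {
	SIM_STATE_ONLINE,   /* "no state": normal operation */
	SIM_STATE_OFFLINE,  /* went away at runtime */
	SIM_STATE_FAILED,   /* hit critical write/flush errors at runtime */
	SIM_STATE_MISSING,  /* static: absent when the fs was assembled */
};

/* Assumption: only an online device moves to offline or failed at runtime. */
static bool sim_can_transition(enum sim_dev_state from, enum sim_dev_state to)
{
	return from == SIM_STATE_ONLINE &&
	       (to == SIM_STATE_OFFLINE || to == SIM_STATE_FAILED);
}

/* Apply a requested transition; disallowed requests leave the state as is. */
static enum sim_dev_state sim_set_state(enum sim_dev_state cur,
					enum sim_dev_state next)
{
	assert(cur <= SIM_STATE_MISSING && next <= SIM_STATE_MISSING);
	return sim_can_transition(cur, next) ? next : cur;
}
```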
> Patch 9/15 fixes a bug: incompatible features should have been blocked
> at the device scan/add level, instead of (or in addition to) the mount
> level. This is because there is no need to bring a device into the
> device list if it is incompatible.
>
> Next, patches 10, 11, 12 and 13/15 add support for the spare device.
> For details on how to add a spare device, see above. Kernels without
> spare feature support keep the spare device away, and kernels that
> do support it refuse to mount it. Further, this patch set provides
> helper functions to pick a spare device and to release a spare device
> back to the spare device pool.
>
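A minimal sketch of a global spare pool with get and put helpers, where put both registers a newly scanned spare and releases one back to the pool. The names, the fixed capacity and the LIFO pick order are assumptions (the cover letter leaves pick order to future work), and locking is omitted; a kernel version would need to serialize access, since every filesystem shares the pool.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_SPARES 8

/* Hypothetical global pool shared by all filesystems; no locking here. */
struct sim_spare_pool {
	const char *paths[SIM_MAX_SPARES];
	size_t count;
};

/* Register or release a spare (e.g. found by "btrfs dev scan"); -1 if full. */
static int sim_spare_put(struct sim_spare_pool *pool, const char *path)
{
	assert(path != NULL);
	if (pool->count == SIM_MAX_SPARES)
		return -1;
	pool->paths[pool->count++] = path;
	return 0;
}

/* Take a spare for a replace; NULL if none is available.  LIFO order. */
static const char *sim_spare_get(struct sim_spare_pool *pool)
{
	if (pool->count == 0)
		return NULL;
	return pool->paths[--pool->count];
}
```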
> Patch 14/15 provides functions for auto replace, mainly derived from
> the existing replace code; in the long run I see an opportunity to
> merge this code with the replace code that is triggered from
> user space.
>
> Finally, patch 15/15 uses all these facilities: it picks a failed
> device and triggers an auto replace in a kthread (casualty_kthread()).
>
>
> Progs:
> Needs 4 patches, posted separately as the btrfs-progs series.
>
>
> Known Bug:
>
> As of now, I see the stale kmem cache below during module unload,
> which I am digging into.
> ------
> BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on kmem_cache_close()
> ------
>
> Anand Jain (10):
>    btrfs: optimize btrfs_check_degradable() for calls outside of barrier
>    btrfs: introduce device dynamic state transition to offline or failed
>    btrfs: check device for critical errors and mark failed
>    btrfs: block incompatible optional features at scan
>    btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
>    btrfs: add check not to mount a spare device
>    btrfs: support btrfs dev scan for spare device
>    btrfs: provide framework to get and put a spare device
>    btrfs: introduce helper functions to perform hot replace
>    btrfs: check for failed device and hot replace
>
> Qu Wenruo (5):
>    btrfs: Introduce a new function to check if all chunks a OK for
>      degraded mount
>    btrfs: Do per-chunk check for mount time check
>    btrfs: Do per-chunk degraded check for remount
>    btrfs: Allow barrier_all_devices to do per-chunk device check
>    btrfs: Cleanup num_tolerated_disk_barrier_failures
>
>   fs/btrfs/ctree.h       |   7 +-
>   fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
>   fs/btrfs/dev-replace.h |   1 +
>   fs/btrfs/disk-io.c     | 211 +++++++++++++++++++++++-------------
>   fs/btrfs/disk-io.h     |   2 -
>   fs/btrfs/super.c       |  20 +++-
>   fs/btrfs/transaction.c |   3 +-
>   fs/btrfs/volumes.c     | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
>   fs/btrfs/volumes.h     |  27 +++++
>   9 files changed, 571 insertions(+), 99 deletions(-)
>
I've thrown everything I can think of at this over the weekend, and 
nothing broke (at least, nothing that had anything to do with these 
patches; I ended up triggering a couple of known bugs that I had 
completely forgotten about), so you can add:
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>


Thread overview: 43+ messages
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
2015-11-09 10:56 ` [PATCH 02/15] btrfs: Do per-chunk check for mount time check Anand Jain
2015-11-09 10:56 ` [PATCH 03/15] btrfs: Do per-chunk degraded check for remount Anand Jain
2015-11-09 10:56 ` [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
2015-11-09 10:56 ` [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier Anand Jain
2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
2015-12-05  7:16   ` Qu Wenruo
2015-11-09 10:56 ` [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
2015-11-09 10:56 ` [PATCH 08/15] btrfs: check device for critical errors and mark failed Anand Jain
2015-11-09 10:56 ` [PATCH 09/15] btrfs: block incompatible optional features at scan Anand Jain
2015-11-09 10:56 ` [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
2015-11-09 10:56 ` [PATCH 11/15] btrfs: add check not to mount a spare device Anand Jain
2015-11-09 10:56 ` [PATCH 12/15] btrfs: support btrfs dev scan for " Anand Jain
2015-11-09 10:56 ` [PATCH 13/15] btrfs: provide framework to get and put a " Anand Jain
2015-11-09 10:56 ` [PATCH 14/15] btrfs: introduce helper functions to perform hot replace Anand Jain
2015-11-09 10:56 ` [PATCH 15/15] btrfs: check for failed device and " Anand Jain
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
2015-11-09 10:58   ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
2015-11-09 10:58   ` [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
2015-11-09 10:58   ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
2015-11-09 10:58   ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
2015-11-09 21:29   ` Duncan
2015-11-10 12:13     ` Austin S Hemmelgarn
2015-11-13 10:17       ` Anand Jain
2015-11-13 12:25         ` Austin S Hemmelgarn
2015-11-15 18:10         ` Christoph Anton Mitterer
2015-11-12  2:15 ` Qu Wenruo
2015-11-12  6:46   ` Duncan
2015-11-12 13:04   ` Austin S Hemmelgarn
2015-11-13  1:07     ` Qu Wenruo
2015-11-13 10:20       ` Anand Jain
2015-11-14  0:54         ` Qu Wenruo
2015-11-16 13:39           ` Austin S Hemmelgarn
2015-11-12 19:08   ` Goffredo Baroncelli
2015-11-13 10:18   ` Anand Jain
2015-11-12 19:21 ` Goffredo Baroncelli
2015-11-13 10:20   ` Anand Jain
2015-11-14 11:05     ` Goffredo Baroncelli
2015-11-16 13:41 ` Austin S Hemmelgarn [this message]
2015-11-16 22:07   ` Anand Jain
2015-11-17 12:28     ` Austin S Hemmelgarn
