From: Goffredo Baroncelli <kreijack@inwind.it>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>,
	Anand Jain <anand.jain@oracle.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
Date: Thu, 12 Nov 2015 20:08:41 +0100
Message-ID: <5644E3B9.5030800@inwind.it>
In-Reply-To: <5643F62D.6050703@cn.fujitsu.com>

On 2015-11-12 03:15, Qu Wenruo wrote:
> Hi Anand,
> 
> Nice work.
> But I have some small questions about it.
> 
> Anand Jain wrote on 2015/11/09 18:56 +0800:
>> This set of patches provides btrfs hot spare and auto replace support
>> for your review and comments.
>>
>> First, here are simple example steps to configure it:
>>
>> Add a spare device:
>>      btrfs spare add /dev/sde -f
> 
> I'm sorry, but I don't quite see the benefit of a spare device.
> 
> Let's take the following example:
> 
> 1) 2 RAID1 + 1 spare
>    (A + B) + C
> 
> 2) 3 RAID1
>    (A + B + C)
> 
> Let's assume the disks are all 12G in size, and there are 3 RAID1 chunks,
> each 3G in size.
> 
> In my understanding, in the normal operation case:
> 
> For case 1), all RAID1 chunks are allocated on the 2 RAID disks only,
> and the spare contains no RAID1 chunks.
> 
>   A       B       C
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  |    |
> |3Ga1|  |3Ga2|  |    |
> ------  ------  |    |
> |3Gb1|  |3Gb2|  |    |
> ------  ------  |    |
> |3Gc1|  |3Gc2|  |    |
> ------  ------  ------
> 
> 
> For case 2), the RAID1 chunks are allocated across all 3 disks, making the allocation fairer.
>   A       B       C
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  ------
> |3Gb2|  |3Ga1|  |3Ga2|
> ------  ------  ------
> |3Gc1|  |3Gc2|  |3Gb1|
> ------  ------  ------
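The more even spread in case 2) follows from the chunk allocator preferring the devices with the most free space. The C sketch below is a simplified model under that assumption, not the kernel allocator: alloc_raid1_chunk() and MAX_DEVS are hypothetical names, sizes are whole GiB, and ties go to the lower device index.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 8 /* hypothetical upper bound for this model */

/* Simplified model: place one RAID1 chunk (two copies) on the two devices
 * with the most free space, lowest index winning ties.
 * free_gib[] is updated in place; picked[] receives the chosen devices.
 * Returns 0 on success, -1 if fewer than two devices can fit the chunk. */
static int alloc_raid1_chunk(unsigned free_gib[], size_t ndevs,
                             unsigned chunk_gib, size_t picked[2])
{
    size_t best = MAX_DEVS, second = MAX_DEVS; /* MAX_DEVS = "none yet" */

    assert(ndevs <= MAX_DEVS);
    for (size_t i = 0; i < ndevs; i++) {
        if (free_gib[i] < chunk_gib)
            continue; /* device too full for this chunk */
        if (best == MAX_DEVS || free_gib[i] > free_gib[best]) {
            second = best;
            best = i;
        } else if (second == MAX_DEVS || free_gib[i] > free_gib[second]) {
            second = i;
        }
    }
    if (second == MAX_DEVS)
        return -1;
    free_gib[best] -= chunk_gib;
    free_gib[second] -= chunk_gib;
    picked[0] = best;
    picked[1] = second;
    return 0;
}
```

In this model, allocating three 3G chunks on three 12G devices leaves each device with 6G free (two chunk copies per disk). Passing only the two non-spare devices leaves each with 3G free, as in case 1).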
> 
> 
> At least in normal operation, case 1) leaves device C unused and reduces the total usable space.
> 
> In the case of disk B failing:
> 
> For case 1), we can auto replace B with C.
> It will copy all data chunks from A to C,
> which means copying 9G of data.
> 
> And after replace:
>   A       B       C
> ------  ------  ------
> |free|  | X  |  |free|
> ------  ------  ------
> |3Ga1|  | X  |->|3Ga2|
> ------  ------  ------
> |3Gb1|  | X  |->|3Gb2|
> ------  ------  ------
> |3Gc1|  | X  |->|3Gc2|
> ------  ------  ------
> 
> 
> 
> For case 2), we can just relocate and recover the bad chunks in B.
> This should only need to copy 6G of data.
> 
> And after the "recovery", it should be much the same as case 1):
>   A       B       C
> ------  ------  ------
> |free|  | X  |  |free|
> ------  ------  ------
> |3Ga1|<\| X  |/>|3Gc2|
> ------  ------  ------
> |3Gb2| || X  |/ |3Ga2|
> ------  ------  ------
> |3Gc1| \| X  |  |3Gb1|
> ------  ------  ------
> 
> 
> IIRC, the only benefit of a spare device is that we can ensure there is enough space to replace a device (if the failing one is no larger than the spare).
> 
> But the cost is more data to copy during replace, and unfair chunk allocation.
> 
> So I am not sure the benefit is worth the cost.
> At least, enhancing chunk relocation to handle case 2) would need a much smaller code base.
> 
> Thanks,
> Qu
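Qu's numbers can be reproduced with simple arithmetic. A minimal C sketch, assuming equal-size disks and RAID1 chunks spread evenly across the disks that actually hold data (the spare in case 1) holds none); the function names are hypothetical:

```c
#include <assert.h>

/* Assumes equal-size disks and an even spread of RAID1 chunks over the
 * disks that receive data. All sizes are in GiB. */

/* Usable RAID1 space: raw space of the data disks, halved for two copies. */
static unsigned raid1_usable_gib(unsigned data_disks, unsigned disk_gib)
{
    assert(data_disks >= 2);
    return data_disks * disk_gib / 2;
}

/* Chunk copies held by one disk: 2 * nchunks copies over data_disks disks. */
static unsigned chunks_on_one_disk(unsigned nchunks, unsigned data_disks)
{
    assert(data_disks >= 2);
    return 2 * nchunks / data_disks;
}

/* Data to copy after one disk fails: every chunk copy it held must be
 * re-mirrored from the surviving copy. */
static unsigned rebuild_gib(unsigned nchunks, unsigned chunk_gib,
                            unsigned data_disks)
{
    return chunks_on_one_disk(nchunks, data_disks) * chunk_gib;
}
```

Here raid1_usable_gib(2, 12) gives 12 and raid1_usable_gib(3, 12) gives 18, while rebuild_gib(3, 3, 2) gives 9 and rebuild_gib(3, 3, 3) gives 6: the 9G and 6G figures in the analysis.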

Interesting analysis. Another difference between the two scenarios is that in the first case (A+B+spare) the spare stays idle until it is needed: this means less power consumption, and when it is needed you are using a new disk instead of a used one.

>>
>> OR, if a spare device was already added before, just
>> run
>>
>>      btrfs dev scan [/dev/sde]
>>
>> This will register the spare device with the kernel.
>>
>>      btrfs fi show
>>      Label: none  uuid: 52f170c1-725c-457d-8cfd-d57090460091
>>     Total devices 2 FS bytes used 112.00KiB
>>     devid    1 size 2.00GiB used 417.50MiB path /dev/sdc
>>     devid    2 size 2.00GiB used 417.50MiB path /dev/sdd
>>
>>      Global spare
>>     device size 3.00GiB path /dev/sde
>>
>> That's it.
>>
>> Auto replace:
>>   Replace happens automatically: when any write or flush to a device
>>   fails, the device is marked as failed, which stops any further IO
>>   attempts to that device. In the next commit thread cycle, auto
>>   replace picks the spare device (/dev/sde in the above example) to
>>   replace the failed device, and so the btrfs volume is back to a
>>   healthy state.
>>
>>
>> Global spare:
>>   As of now only global hot spares are supported, that is, hot spare(s)
>>   serve all the btrfs filesystems in the system.
>>
>> No spare when a device fails:
>>   The kernel scans for a spare device at every transaction commit and
>>   triggers the auto replace as soon as a spare device is added.
>>
>> Priority:
>>   Future work could add some chronological order for picking
>>   a spare and the failed device.
>>
>>
>> Patches:
>>
>> Kernel:
>> First, it needs Qu's per-chunk missing device patchset,
>> which is included in this set, plus a light optimization
>> (patch 5/15) that was required as part of this enhancement.
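The per-chunk idea replaces a single filesystem-wide count of tolerated missing devices with a check of each chunk against its own profile. A hedged C sketch with hypothetical, simplified types; the tolerated-failure counts (1 for RAID1, RAID10 and RAID5, 2 for RAID6, 0 otherwise) follow the usual btrfs profile semantics:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; the kernel structures differ. */
enum profile { PROFILE_SINGLE, PROFILE_DUP, PROFILE_RAID0, PROFILE_RAID1,
               PROFILE_RAID10, PROFILE_RAID5, PROFILE_RAID6 };

struct chunk {
    enum profile profile;
    size_t nstripes;
    const int *stripe_devid; /* device id holding each stripe */
};

/* How many missing devices a chunk of this profile can survive. */
static int tolerated_failures(enum profile p)
{
    switch (p) {
    case PROFILE_RAID1:
    case PROFILE_RAID10:
    case PROFILE_RAID5:
        return 1;
    case PROFILE_RAID6:
        return 2;
    default:
        return 0; /* single, dup, raid0 */
    }
}

static bool dev_missing(int devid, const int *missing, size_t nmissing)
{
    for (size_t i = 0; i < nmissing; i++)
        if (missing[i] == devid)
            return true;
    return false;
}

/* A degraded mount is allowed only if every chunk still has enough stripes. */
static bool all_chunks_degradable(const struct chunk *chunks, size_t nchunks,
                                  const int *missing, size_t nmissing)
{
    assert(chunks != NULL || nchunks == 0);
    for (size_t c = 0; c < nchunks; c++) {
        int lost = 0;
        for (size_t s = 0; s < chunks[c].nstripes; s++)
            if (dev_missing(chunks[c].stripe_devid[s], missing, nmissing))
                lost++;
        if (lost > tolerated_failures(chunks[c].profile))
            return false;
    }
    return true;
}
```

With Qu's case 2) layout (chunks on devices {2,3}, {3,1} and {1,2}), losing device 2 alone is tolerated, but losing devices 2 and 3 together is not.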
>>
>> Next, patches 7,8/15 bring in support to dynamically manage the
>> transition of devices from online (no state) to the offline OR failed
>> state, on top of static device states like the current "missing" state.
>>
>> Patch 9/15 fixes a bug wherein we should have blocked incompatible
>> features at the device scan/add level, instead of (or in addition to)
>> the mount level. This is because there is no need to bring a device
>> into the device list if it is incompatible.
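The scan-time check boils down to masking the superblock's incompat flags against what the kernel understands. A C sketch; the bit values and the FEAT_* / SUPPORTED_INCOMPAT names are hypothetical, not the real BTRFS_FEATURE_INCOMPAT_* definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define FEAT_INCOMPAT_MIXED_BG  (1ULL << 0)
#define FEAT_INCOMPAT_RAID56    (1ULL << 1)
#define FEAT_INCOMPAT_SPARE_DEV (1ULL << 2)

/* Incompat features this (hypothetical) kernel understands. */
#define SUPPORTED_INCOMPAT (FEAT_INCOMPAT_MIXED_BG | FEAT_INCOMPAT_RAID56)

/* Scan-time check: a device whose superblock sets any unknown incompat
 * bit is never added to the device list, instead of failing at mount. */
static bool scan_accepts_device(uint64_t sb_incompat, uint64_t supported)
{
    return (sb_incompat & ~supported) == 0;
}
```

In this model, a kernel that does not know the spare-device bit never adds such a device to its device list, which is presumably one way the spare is "kept away" from older kernels.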
>>
>> Next, patches 10,11,12,13/15 add support for spare devices. For
>> details on how to add a spare device, kindly see the steps above.
>> Kernels without spare device support keep the spare device away,
>> and kernels that do support it refuse to mount it. Further, this
>> patch set provides helper functions to pick a spare device and to
>> release a spare device back to the spare device pool.
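A global spare pool with get/put helpers can be sketched as follows. This is a hypothetical C model (struct spare_pool, spare_get(), spare_put() and MAX_SPARES are not the patch's names) that also shows the no-spare case: when spare_get() returns NULL, the caller leaves the failed device alone and tries again at the next transaction commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global spare pool: one pool serves every btrfs
 * filesystem in the system. */
#define MAX_SPARES 4

struct spare_pool {
    const char *dev[MAX_SPARES]; /* device paths, e.g. "/dev/sde" */
    size_t count;
};

/* Take a spare out of the pool. NULL means none is available; the caller
 * keeps the failed device as-is and retries at the next commit. */
static const char *spare_get(struct spare_pool *pool)
{
    if (pool->count == 0)
        return NULL;
    return pool->dev[--pool->count];
}

/* Return a spare to the pool, e.g. when a replace could not start.
 * Returns 0 on success, -1 if the pool is full. */
static int spare_put(struct spare_pool *pool, const char *dev)
{
    assert(dev != NULL);
    if (pool->count == MAX_SPARES)
        return -1;
    pool->dev[pool->count++] = dev;
    return 0;
}
```

Because the pool is global rather than per filesystem, whichever filesystem reaches its commit first with a failed device takes the spare.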
>>
>> Patch 14/15 provides functions for auto replace, mainly derived from
>> the existing replace code; in the long run I see an opportunity to
>> merge this code with the replace code that is triggered from
>> user space.
>>
>> Last, 15/15 uses all these facilities: it picks a failed device and
>> triggers an auto replace in a kthread (casualty_kthread()).
>>
>>
>> Progs:
>> Would need 4 patches as listed below.
>>
>>
>> Known Bug:
>>
>> As of now I see the stale kmem cache below during module unload,
>> which I am digging into.
>> ------
>> BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on kmem_cache_close()
>> ------
>>
>> Anand Jain (10):
>>    btrfs: optimize btrfs_check_degradable() for calls outside of barrier
>>    btrfs: introduce device dynamic state transition to offline or failed
>>    btrfs: check device for critical errors and mark failed
>>    btrfs: block incompatible optional features at scan
>>    btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
>>    btrfs: add check not to mount a spare device
>>    btrfs: support btrfs dev scan for spare device
>>    btrfs: provide framework to get and put a spare device
>>    btrfs: introduce helper functions to perform hot replace
>>    btrfs: check for failed device and hot replace
>>
>> Qu Wenruo (5):
>>    btrfs: Introduce a new function to check if all chunks a OK for
>>      degraded mount
>>    btrfs: Do per-chunk check for mount time check
>>    btrfs: Do per-chunk degraded check for remount
>>    btrfs: Allow barrier_all_devices to do per-chunk device check
>>    btrfs: Cleanup num_tolerated_disk_barrier_failures
>>
>>   fs/btrfs/ctree.h       |   7 +-
>>   fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
>>   fs/btrfs/dev-replace.h |   1 +
>>   fs/btrfs/disk-io.c     | 211 +++++++++++++++++++++++-------------
>>   fs/btrfs/disk-io.h     |   2 -
>>   fs/btrfs/super.c       |  20 +++-
>>   fs/btrfs/transaction.c |   3 +-
>>   fs/btrfs/volumes.c     | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
>>   fs/btrfs/volumes.h     |  27 +++++
>>   9 files changed, 571 insertions(+), 99 deletions(-)
>>


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Thread overview: 43+ messages
2015-11-09 10:56 [PATCH 00/15] btrfs: Hot spare and Auto replace Anand Jain
2015-11-09 10:56 ` [PATCH 01/15] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Anand Jain
2015-11-09 10:56 ` [PATCH 02/15] btrfs: Do per-chunk check for mount time check Anand Jain
2015-11-09 10:56 ` [PATCH 03/15] btrfs: Do per-chunk degraded check for remount Anand Jain
2015-11-09 10:56 ` [PATCH 04/15] btrfs: Allow barrier_all_devices to do per-chunk device check Anand Jain
2015-11-09 10:56 ` [PATCH 05/15] btrfs: optimize btrfs_check_degradable() for calls outside of barrier Anand Jain
2015-11-09 10:56 ` [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures Anand Jain
2015-12-05  7:16   ` Qu Wenruo
2015-11-09 10:56 ` [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed Anand Jain
2015-11-09 10:56 ` [PATCH 08/15] btrfs: check device for critical errors and mark failed Anand Jain
2015-11-09 10:56 ` [PATCH 09/15] btrfs: block incompatible optional features at scan Anand Jain
2015-11-09 10:56 ` [PATCH 10/15] btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV Anand Jain
2015-11-09 10:56 ` [PATCH 11/15] btrfs: add check not to mount a spare device Anand Jain
2015-11-09 10:56 ` [PATCH 12/15] btrfs: support btrfs dev scan for " Anand Jain
2015-11-09 10:56 ` [PATCH 13/15] btrfs: provide framework to get and put a " Anand Jain
2015-11-09 10:56 ` [PATCH 14/15] btrfs: introduce helper functions to perform hot replace Anand Jain
2015-11-09 10:56 ` [PATCH 15/15] btrfs: check for failed device and " Anand Jain
2015-11-09 10:58 ` [PATCH 0/4] btrfs-progs: Hot spare and Auto replace Anand Jain
2015-11-09 10:58   ` [PATCH 1/4] btrfs-progs: Introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV SB flags Anand Jain
2015-11-09 10:58   ` [PATCH 2/4] btrfs-progs: Introduce btrfs spare subcommand Anand Jain
2015-11-09 10:58   ` [PATCH 3/4] btrfs-progs: add fi show for spare Anand Jain
2015-11-09 10:58   ` [PATCH 4/4] btrfs-progs: add global spare device list to filesystem show Anand Jain
2015-11-09 14:09 ` [PATCH 00/15] btrfs: Hot spare and Auto replace Austin S Hemmelgarn
2015-11-09 21:29   ` Duncan
2015-11-10 12:13     ` Austin S Hemmelgarn
2015-11-13 10:17       ` Anand Jain
2015-11-13 12:25         ` Austin S Hemmelgarn
2015-11-15 18:10         ` Christoph Anton Mitterer
2015-11-12  2:15 ` Qu Wenruo
2015-11-12  6:46   ` Duncan
2015-11-12 13:04   ` Austin S Hemmelgarn
2015-11-13  1:07     ` Qu Wenruo
2015-11-13 10:20       ` Anand Jain
2015-11-14  0:54         ` Qu Wenruo
2015-11-16 13:39           ` Austin S Hemmelgarn
2015-11-12 19:08   ` Goffredo Baroncelli [this message]
2015-11-13 10:18   ` Anand Jain
2015-11-12 19:21 ` Goffredo Baroncelli
2015-11-13 10:20   ` Anand Jain
2015-11-14 11:05     ` Goffredo Baroncelli
2015-11-16 13:41 ` Austin S Hemmelgarn
2015-11-16 22:07   ` Anand Jain
2015-11-17 12:28     ` Austin S Hemmelgarn
