linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Goffredo Baroncelli <kreijack@inwind.it>
To: bo.li.liu@oracle.com
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes
Date: Wed, 2 Aug 2017 00:14:27 +0200	[thread overview]
Message-ID: <dc65edc8-82bd-ac1b-fa90-6076bea13df9@inwind.it> (raw)
In-Reply-To: <20170801172411.GE26357@localhost.localdomain>

On 2017-08-01 19:24, Liu Bo wrote:
> On Tue, Aug 01, 2017 at 07:42:14PM +0200, Goffredo Baroncelli wrote:
>> Hi Liu,
>>
>> On 2017-08-01 18:14, Liu Bo wrote:
>>> This aims to fix write hole issue on btrfs raid5/6 setup by adding a
>>> separate disk as a journal (aka raid5/6 log), so that after unclean
>>> shutdown we can make sure data and parity are consistent on the raid
>>> array by replaying the journal.
>>>
>>
>> it would be possible to have more information ?
>> - what is logged ? data, parity or data + parity ?
> 
> Patch 5 has more details(sorry for not making it clear that in the
> cover letter).
> 
> So both data and parity are logged so that while replaying the journal
> everything is written to whichever disk it should be written to.

It is correct reading this as: all data is written two times ? Or are logged only the stripes involved by a RMW cycle (i.e. if a stripe is fully written, the log is bypassed )?
> 
>> - in the past I thought that it would be sufficient to log only the stripe position involved by a RMW cycle, and then start a scrub on these stripes in case of an unclean shutdown: do you think that it is feasible ?
> 
> An unclean shutdown causes inconsistence between data and parity, so
> scrub won't help as it's not able to tell which one (data or parity)
> is valid
Scrub compares data against its checksum; so it knows if the data is correct. If no disk is lost, a scrub process is sufficient/needed to rebuild the parity/data.

The problem born when after "an unclean shutdown" a disk failure happens. But these  are *two* distinct failures. These together break the BTRFS raid5 redundancy. But if you run a scrub process between these two failures, the btrfs raid5 redundancy is still effective.


> 
> With nodatacow, we do overwrite, so RMW during unclean shutdown is not safe.
> With datacow, we don't do overwrite, but the following situation may happen,
> say we have a raid5 setup with 3 disks, the stripe length is 64k, so
> 
> 1) write 64K  --> now the raid layout is
> [64K data + 64K random + 64K parity]
> 2) write another 64K --> now the raid layout after RMW is
> [64K 1)'s data + 64K 2)'s data + 64K new parity]
> 
> If unclean shutdown occurs before 2) finishes, then parity may be
> corrupted and then 1)'s data may be recovered wrongly if the disk
> which holds 1)'s data is offline.
> 
>> - does this journal disk also host other btrfs log ?
>>
> 
> No, purely data/parity and some associated metadata.
> 
> Thanks,
> 
> -liubo
> 
>>> The idea and the code are similar to the write-through mode of md
>>> raid5-cache, so ppl(partial parity log) is also feasible to implement.
>>> (If you've been familiar with md, you may find this patch set is
>>> boring to read...)
>>>
>>> Patch 1-3 are about adding a log disk, patch 5-8 are the main part of
>>> the implementation, the rest patches are improvements and bugfixes,
>>> eg. readahead for recovery, checksum.
>>>
>>> Two btrfs-progs patches are required to play with this patch set, one
>>> is to enhance 'btrfs device add' to add a disk as raid5/6 log with the
>>> option '-L', the other is to teach 'btrfs-show-super' to show
>>> %journal_tail.
>>>
>>> This is currently based on 4.12-rc3.
>>>
>>> The patch set is tagged with RFC, and comments are always welcome,
>>> thanks.
>>>
>>> Known limitations:
>>> - Deleting a log device is not implemented yet.
>>>
>>>
>>> Liu Bo (14):
>>>   Btrfs: raid56: add raid56 log via add_dev v2 ioctl
>>>   Btrfs: raid56: do not allocate chunk on raid56 log
>>>   Btrfs: raid56: detect raid56 log on mount
>>>   Btrfs: raid56: add verbose debug
>>>   Btrfs: raid56: add stripe log for raid5/6
>>>   Btrfs: raid56: add reclaim support
>>>   Btrfs: raid56: load r5log
>>>   Btrfs: raid56: log recovery
>>>   Btrfs: raid56: add readahead for recovery
>>>   Btrfs: raid56: use the readahead helper to get page
>>>   Btrfs: raid56: add csum support
>>>   Btrfs: raid56: fix error handling while adding a log device
>>>   Btrfs: raid56: initialize raid5/6 log after adding it
>>>   Btrfs: raid56: maintain IO order on raid5/6 log
>>>
>>>  fs/btrfs/ctree.h                |   16 +-
>>>  fs/btrfs/disk-io.c              |   16 +
>>>  fs/btrfs/ioctl.c                |   48 +-
>>>  fs/btrfs/raid56.c               | 1429 ++++++++++++++++++++++++++++++++++-----
>>>  fs/btrfs/raid56.h               |   82 +++
>>>  fs/btrfs/transaction.c          |    2 +
>>>  fs/btrfs/volumes.c              |   56 +-
>>>  fs/btrfs/volumes.h              |    7 +-
>>>  include/uapi/linux/btrfs.h      |    3 +
>>>  include/uapi/linux/btrfs_tree.h |    4 +
>>>  10 files changed, 1487 insertions(+), 176 deletions(-)
>>>
>>
>>
>> -- 
>> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


  reply	other threads:[~2017-08-01 22:22 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-01 16:14 [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes Liu Bo
2017-08-01 16:14 ` [PATCH 01/14] Btrfs: raid56: add raid56 log via add_dev v2 ioctl Liu Bo
2017-08-02 19:25   ` Nikolay Borisov
2017-08-01 16:14 ` [PATCH 02/14] Btrfs: raid56: do not allocate chunk on raid56 log Liu Bo
2017-08-01 16:14 ` [PATCH 03/14] Btrfs: raid56: detect raid56 log on mount Liu Bo
2017-08-01 16:14 ` [PATCH 04/14] Btrfs: raid56: add verbose debug Liu Bo
2017-08-01 16:14 ` [PATCH 05/14] Btrfs: raid56: add stripe log for raid5/6 Liu Bo
2017-08-01 16:14 ` [PATCH 06/14] Btrfs: raid56: add reclaim support Liu Bo
2017-08-01 16:14 ` [PATCH 07/14] Btrfs: raid56: load r5log Liu Bo
2017-08-01 16:14 ` [PATCH 08/14] Btrfs: raid56: log recovery Liu Bo
2017-08-01 16:14 ` [PATCH 09/14] Btrfs: raid56: add readahead for recovery Liu Bo
2017-08-01 16:14 ` [PATCH 10/14] Btrfs: raid56: use the readahead helper to get page Liu Bo
2017-08-01 16:14 ` [PATCH 11/14] Btrfs: raid56: add csum support Liu Bo
2017-08-01 16:14 ` [PATCH 12/14] Btrfs: raid56: fix error handling while adding a log device Liu Bo
2017-08-01 16:14 ` [PATCH 13/14] Btrfs: raid56: initialize raid5/6 log after adding it Liu Bo
2017-08-01 16:14 ` [PATCH 14/14] Btrfs: raid56: maintain IO order on raid5/6 log Liu Bo
2017-08-01 16:14 ` [PATCH 1/2] Btrfs-progs: add option to add raid5/6 log device Liu Bo
2017-08-01 16:14 ` [PATCH 2/2] Btrfs-progs: introduce super_journal_tail to inspect-dump-super Liu Bo
2017-08-01 17:25 ` [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes Roman Mamedov
2017-08-01 17:03   ` Liu Bo
2017-08-01 17:39   ` Austin S. Hemmelgarn
2017-08-01 17:07     ` Liu Bo
2017-08-02 18:47     ` Chris Mason
2018-05-03 19:16       ` Goffredo Baroncelli
2017-08-01 17:28 ` Hugo Mills
2017-08-01 16:56   ` Liu Bo
2017-08-01 18:15     ` Hugo Mills
2017-08-01 17:42 ` Goffredo Baroncelli
2017-08-01 17:24   ` Liu Bo
2017-08-01 22:14     ` Goffredo Baroncelli [this message]
2017-08-02 17:57       ` Liu Bo
2017-08-02 20:41         ` Goffredo Baroncelli
2017-08-02 20:27           ` Liu Bo
2017-08-03  4:02             ` Duncan
2017-08-03  4:40               ` Goffredo Baroncelli
2017-08-23 15:28             ` Chris Murphy
2017-08-23 15:47               ` Austin S. Hemmelgarn
2017-08-25 13:53               ` Goffredo Baroncelli
2017-08-01 21:00 ` Christoph Anton Mitterer
2017-08-01 22:24   ` Goffredo Baroncelli
     [not found] <CAL5DHTFA8G5fq=BQ7N7qQimesjygKTiBR2qZF4YBRAsAjB_L5Q@mail.gmail.com>
2019-07-30 16:20 ` Goffredo Baroncelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dc65edc8-82bd-ac1b-fa90-6076bea13df9@inwind.it \
    --to=kreijack@inwind.it \
    --cc=bo.li.liu@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).