linux-block.vger.kernel.org archive mirror
From: "jianchao.wang" <jianchao.w.wang@oracle.com>
To: Bob Liu <bob.liu@oracle.com>, linux-block@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	martin.petersen@oracle.com, shirley.ma@oracle.com,
	allison.henderson@oracle.com, david@fromorbit.com,
	darrick.wong@oracle.com, hch@infradead.org, adilger@dilger.ca
Subject: Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry
Date: Tue, 19 Feb 2019 09:29:16 +0800
Message-ID: <5053beac-658e-d6a7-bcf8-050e0590f18d@oracle.com>
In-Reply-To: <c2b6ec5c-ad8a-6f69-c8b9-8e264cc6aa0d@oracle.com>



On 2/18/19 4:08 PM, jianchao.wang wrote:
> Hi Bob
> 
> On 2/13/19 5:50 PM, Bob Liu wrote:
>> Motivation:
>> When a filesystem data or metadata checksum does not match, the lower block devices
>> may hold other, correct copies. For example, if XFS successfully reads a metadata buffer off a raid1 but
>> decides that the metadata is garbage, today it will shut down the entire
>> filesystem without trying any of the other mirrors.  This is a severe
>> loss of service, and we propose these patches to have XFS try harder to
>> avoid failure.
>>
>> This patch set prototypes the mirror retry idea by:
>> * Adding @nr_mirrors to struct request_queue. Similar to blk_queue_nonrot(),
>>   a filesystem can grab the device request queue and check the maximum number
>>   of mirrors this block device has.
>>   Helper functions were also added to get/set nr_mirrors.
>>
>> * Introducing bi_rd_hint, just like bi_write_hint, except that bi_rd_hint is a
>>   long bitmap in order to support the stacked layer case.
> 
> Why do we need a bitmap to know which underlying device has been tried?
> For example, consider the following scenario:
> 
>                     md8
>                    / | \
>                sda sdb sdc
> 
> If raid1 reads the data from sda, and the fs checks it and finds that it is corrupted,
> then we may just need to let raid1 know that the data came from sda. Based on this
> hint, raid1 could handle it with handle_read_error to try another replica and fix the
> error.

This doesn't work.
The md raid1 can only see IO success or failure, so fix_read_error won't fix this.
Sorry for the noise.

Thanks
Jianchao
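
To illustrate the point above, here is a minimal user-space sketch (not
kernel code). All toy_* names are hypothetical stand-ins invented for this
example. It only shows that a read can complete with a success status while
the payload is bad, so a layer that looks at status alone has nothing to
react to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy block: payload plus a checksum owned by the filesystem. */
struct toy_block {
	uint8_t  payload[8];
	uint32_t csum;
};

/* Toy multiplicative hash, standing in for a real CRC. */
static uint32_t toy_csum(const uint8_t *p, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s = s * 31 + p[i];
	return s;
}

/*
 * What the mirror layer sees: the transfer completed, so it reports
 * success (0). It never inspects the payload.
 */
static int toy_mirror_read(const struct toy_block *disk, struct toy_block *out)
{
	assert(disk && out);
	*out = *disk;
	return 0;
}

/* What the filesystem sees: it can recompute and compare the checksum. */
static bool toy_fs_verify(const struct toy_block *b)
{
	return toy_csum(b->payload, sizeof(b->payload)) == b->csum;
}
```

If a payload byte is flipped after the checksum was stored,
toy_mirror_read() still returns 0 and only toy_fs_verify() notices.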

> 
> If this is feasible, we just need to modify the bio as follows and needn't add any
> bytes to it.
> 
> struct bio {
>     ...
>     union {
>         unsigned short bi_write_hint;
>         unsigned short bi_read_hint;
>     };
>     ...
> };
> 
> Thanks
> Jianchao
>>
>> * Modify md/raid1 to support this retry feature.
>>
>> * Adapt xfs to use this feature.
>>   If the read verify fails, we loop over the available mirrors and retry the read.
>>
>> * Rewrite retried read
>>   When the read verification fails but the retry succeeds,
>>   write the buffer back to correct the bad mirror.
>>
>> * Add tracepoints and logging to alternate device retry.
>>   This patch adds new log entries and trace points to the alternate device retry
>>   error path.
>>
>> Changes v2:
>> - No longer reuse bi_write_hint
>> - Stacked layer support (see patch 4/9)
>> - Other feedback fixes
>>
>> Allison Henderson (5):
>>   Add b_alt_retry to xfs_buf
>>   xfs: Add b_rd_hint to xfs_buf
>>   xfs: Add device retry
>>   xfs: Rewrite retried read
>>   xfs: Add tracepoints and logging to alternate device retry
>>
>> Bob Liu (4):
>>   block: add nr_mirrors to request_queue
>>   block: add rd_hint to bio and request
>>   md:raid1: set mirrors correctly
>>   md:raid1: rd_hint support and consider stacked layer case
>>
>>  Documentation/block/biodoc.txt |   3 +
>>  block/bio.c                    |   1 +
>>  block/blk-core.c               |   4 ++
>>  block/blk-merge.c              |   6 ++
>>  block/blk-settings.c           |  24 +++++++
>>  block/bounce.c                 |   1 +
>>  drivers/md/raid1.c             | 123 ++++++++++++++++++++++++++++++++-
>>  fs/xfs/xfs_buf.c               |  58 +++++++++++++++-
>>  fs/xfs/xfs_buf.h               |  14 ++++
>>  fs/xfs/xfs_trace.h             |   6 +-
>>  include/linux/blk_types.h      |   1 +
>>  include/linux/blkdev.h         |   4 ++
>>  include/linux/types.h          |   3 +
>>  13 files changed, 244 insertions(+), 4 deletions(-)
>>
> 
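
The retry and rewrite steps described in the cover letter can be sketched in
user space as follows. This is an illustration under assumptions, not the
patch set's code: every toy_* name is hypothetical, the "tried" bitmap only
loosely models what a read hint might carry (bit i set means mirror i was
already read), and the checksum is a toy hash rather than the real XFS
verifier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_MIRRORS 8
#define TOY_BLOCK_SIZE  8

/* Hypothetical mirror set: nr_mirrors copies of one block. */
struct toy_mirror_set {
	unsigned nr_mirrors;
	uint8_t  data[TOY_MAX_MIRRORS][TOY_BLOCK_SIZE];
};

/* Toy multiplicative hash, standing in for a real CRC. */
static uint32_t toy_csum(const uint8_t *p, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s = s * 31 + p[i];
	return s;
}

/*
 * Read from the first mirror whose bit is clear in *tried and mark it
 * as tried. Returns the mirror index, or -1 if every mirror was tried.
 */
static int toy_read_untried(const struct toy_mirror_set *ms,
			    unsigned long *tried, uint8_t *buf)
{
	for (unsigned i = 0; i < ms->nr_mirrors; i++) {
		if (*tried & (1UL << i))
			continue;
		*tried |= 1UL << i;
		memcpy(buf, ms->data[i], TOY_BLOCK_SIZE);
		return (int)i;
	}
	return -1;
}

/*
 * Loop over the mirrors until one copy verifies. On success, write the
 * good buffer back over every mirror that returned bad data. Returns
 * the index of the good mirror, or -1 if no copy verified.
 */
static int toy_read_verify_retry(struct toy_mirror_set *ms,
				 uint32_t want_csum, uint8_t *buf)
{
	unsigned long tried = 0, bad = 0;
	int idx;

	assert(ms->nr_mirrors <= TOY_MAX_MIRRORS);
	while ((idx = toy_read_untried(ms, &tried, buf)) >= 0) {
		if (toy_csum(buf, TOY_BLOCK_SIZE) == want_csum) {
			for (unsigned i = 0; i < ms->nr_mirrors; i++)
				if (bad & (1UL << i))
					memcpy(ms->data[i], buf, TOY_BLOCK_SIZE);
			return idx;
		}
		bad |= 1UL << idx;
	}
	return -1;
}
```

With three mirrors where only the first copy is corrupted, the sketch returns
index 1 and repairs mirror 0 from the verified buffer. In this toy model the
caller must track which copies it has already seen, because the mirror layer
itself gets no signal that a completed read was bad.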


Thread overview: 28+ messages
2019-02-13  9:50 [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 1/9] block: add nr_mirrors to request_queue Bob Liu
2019-02-13 10:26   ` Andreas Dilger
2019-02-13 16:04   ` Theodore Y. Ts'o
2019-02-14  5:57     ` Bob Liu
2019-02-18 17:56       ` Theodore Y. Ts'o
2019-02-13  9:50 ` [RFC PATCH v2 2/9] block: add rd_hint to bio and request Bob Liu
2019-02-13 16:18   ` Jens Axboe
2019-02-14  6:10     ` Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 3/9] md:raid1: set mirrors correctly Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 4/9] md:raid1: rd_hint support and consider stacked layer case Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 5/9] Add b_alt_retry to xfs_buf Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 6/9] xfs: Add b_rd_hint " Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 7/9] xfs: Add device retry Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 8/9] xfs: Rewrite retried read Bob Liu
2019-02-13  9:50 ` [RFC PATCH v2 9/9] xfs: Add tracepoints and logging to alternate device retry Bob Liu
2019-02-18  8:08 ` [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror " jianchao.wang
2019-02-19  1:29   ` jianchao.wang [this message]
2019-02-18 21:31 ` Dave Chinner
2019-02-19  2:55   ` Darrick J. Wong
2019-02-19  3:33     ` Dave Chinner
2019-02-28 14:22   ` Bob Liu
2019-02-28 21:49     ` Dave Chinner
2019-03-03  2:37       ` Bob Liu
2019-03-03 23:18         ` Dave Chinner
2019-02-28 23:28     ` Andreas Dilger
2019-03-01 14:14       ` Bob Liu
2019-03-03 23:45       ` Dave Chinner
