From: Ruan Shiyang <ruansy.fnst@cn.fujitsu.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-raid@vger.kernel.org,
	darrick.wong@oracle.com, hch@lst.de, song@kernel.org,
	rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com
Subject: Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink
Date: Wed, 2 Dec 2020 15:12:20 +0800	[thread overview]
Message-ID: <e0aa187f-e124-1ddc-0f5a-6a8c41a3dc66@cn.fujitsu.com> (raw)
In-Reply-To: <20201129224723.GG2842436@dread.disaster.area>

Hi Dave,

On 2020/11/30 6:47 AM, Dave Chinner wrote:
> On Mon, Nov 23, 2020 at 08:41:10AM +0800, Shiyang Ruan wrote:
>> 
>> The call trace is like this:
>>   memory_failure()
>>     pgmap->ops->memory_failure()   => pmem_pgmap_memory_failure()
>>      gendisk->fops->block_lost()   => pmem_block_lost() or
>>                                           md_blk_block_lost()
>>       sb->s_ops->storage_lost()    => xfs_fs_storage_lost()
>>        xfs_rmap_query_range()
>>         xfs_storage_lost_helper()
>>          mf_recover_controller->recover_fn => \
>>                              memory_failure_dev_pagemap_kill_procs()
>>
>> The collect_procs() and kill_procs() are moved into a callback which
>> is passed from memory_failure() to xfs_storage_lost_helper().  So we
>> can call it when an associated file is found, instead of creating a
>> file list and iterating over it.
>>
>> The fsdax & reflink support for XFS is not contained in this patchset.
> 
> This looks promising - the overall architecture is a lot more
> generic and less dependent on knowing about memory, dax or memory
> failures. A few comments that I think would further improve
> understanding the patchset and the implementation:

Thank you for your kind comments.  They give me confidence.

> 
> - the order of the patches is inverted. It should start with a
>    single patch introducing the mf_recover_controller structure for
>    callbacks, then introduce pgmap->ops->memory_failure, then
>    ->block_lost, then the pmem and md implementations of
>    ->block_lost, then ->storage_lost and the XFS implementations of
>    ->storage_lost.

Yes, it will be easier to understand the patchset in this order.

But one thing I am unsure about: if, for example, I introduce
->memory_failure() first, its implementation needs to call
->block_lost(), which would only be introduced in the next patch.  So
what should the pmem implementation of ->memory_failure() contain at
that point?  To avoid this situation, I committed the patches in the
inverted order: lowest level first, then its caller, and then the
caller's caller.

I am trying to sort out the order.  How about this:
  Patch i.
    Introduce ->memory_failure()
       - just introduce the interface, without an implementation
         (see the sketch below)
  Patch i++.
    Introduce ->block_lost()
       - introduce the interface and implement ->memory_failure()
          in pmem, so that it can call ->block_lost()
  Patch i++.
    (similar to the above, skipped...)
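
For Patch i, I imagine something like this minimal sketch -- just the
callback slot in dev_pagemap_ops plus a check at the call site, with
no implementation yet.  (The exact signature here is only my
assumption, for illustration.)

  /* include/linux/memremap.h: new op, no implementor yet */
  struct dev_pagemap_ops {
          /* ... existing ops: page_free, migrate_to_ram, ... */

          /*
           * Handle a memory failure that hits this pagemap.
           * Return 0 if handled, -EOPNOTSUPP if not supported.
           */
          int (*memory_failure)(struct dev_pagemap *pgmap,
                                unsigned long pfn, int flags);
  };

  /* mm/memory-failure.c: inside memory_failure_dev_pagemap() */
          if (pgmap->ops && pgmap->ops->memory_failure)
                  return pgmap->ops->memory_failure(pgmap, pfn, flags);
          /* otherwise fall through to the current generic handling */

Then the next patch can give pmem an implementation that calls
->block_lost() (or ->corrupt_range(), per the naming below).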

> 
> - I think the names "block_lost" and "storage_lost" are misleading.
>    It's more like a "media failure" or a general "data corruption"
>    event at a specific physical location. The data may not be "lost"
>    but only damaged, so we might be able to recover from it without
>    "losing" anything. Hence I think they could be better named,
>    perhaps just "->corrupt_range"

'corrupt' sounds better.  (I'm not good at naming functions...)

> 
> - need to pass a {offset,len} pair through the chain, not just a
>    single offset. This will allow other types of devices to report
>    different ranges of failures, from a single sector to an entire
>    device.

Yes, it's better to add the length.  I had narrowly assumed that a
memory failure on pmem would only affect a single page at a time.
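
So each hook in the chain would carry a byte range, roughly like this
(only a sketch -- these signatures are my assumption, not final):

  /* gendisk->fops: the device reports a corrupt range */
  int (*corrupt_range)(struct gendisk *disk, loff_t offset, size_t len);

  /* sb->s_ops: the filesystem maps the range back to files */
  int (*corrupt_range)(struct super_block *sb, loff_t offset, size_t len);

A single bad sector would then be reported as (offset, 512), and a
whole-device failure as (0, size of the device).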

> 
> - I'm not sure that passing the mf_recover_controller structure
>    through the corruption event chain is the right thing to do here.
>    A block device could generate this storage failure callback if it
>    detects an unrecoverable error (e.g. during a MD media scrub or
>    rebuild/resilver failure) and in that case we don't have PFNs or
>    memory device failure functions to perform.
> 
>    IOWs, I think the action that is taken needs to be independent of
>    the source that generated the error. Even for a pmem device, we
>    can be using the page cache, so it may be possible to recover the
>    pmem error by writing the cached page (if it exists) back over the
>    pmem.
> 
>    Hence I think that the recover function probably needs to be moved
>    to the address space ops, because what we do to recover from the
>    error is going to be dependent on type of mapping the filesystem
>    is using. If it's a DAX mapping, we call back into a generic DAX
>    function that does the vma walk and process kill functions. If it
>    is a page cache mapping, then if the page is cached then we can
>    try to re-write it to disk to fix the bad data, otherwise we treat
>    it like a writeback error and report it on the next
>    write/fsync/close operation done on that file.
> 
>    This gets rid of the mf_recover_controller altogether and allows
>    the interface to be used by any sort of block device for any sort
>    of bottom-up reporting of media/device failures.

Moving the recover function to the address_space ops looks like a
better idea.  But I think the error handling for page cache mappings is
already handled well in memory-failure, and memory-failure is also
reused to handle anonymous pages.  If we move the recover function to
the address_space ops, I think we also need to refactor the existing
handler for page cache mappings, which may affect the anonymous page
handling.  This part confuses me...
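
That said, if I understand the page cache side of your suggestion
correctly, the recover function might look roughly like this (an
untested sketch; the hook name and signature are just for
illustration):

  static int xfs_corrupt_range(struct address_space *mapping,
                               loff_t offset, size_t len)
  {
          struct page *page;

          page = find_get_page(mapping, offset >> PAGE_SHIFT);
          if (page && PageUptodate(page)) {
                  /*
                   * A good copy is cached: mark it dirty so that
                   * writeback rewrites the bad range on the media.
                   */
                  set_page_dirty(page);
          } else {
                  /*
                   * No cached copy: record a writeback error to be
                   * reported on the next write/fsync/close.
                   */
                  mapping_set_error(mapping, -EIO);
          }
          if (page)
                  put_page(page);
          return 0;
  }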


I rewrote the call trace:
memory_failure()
  * dax mapping case
  pgmap->ops->memory_failure()          =>
                                    pmem_pgmap_memory_failure()
   gendisk->fops->block_corrupt_range() =>
                                    - pmem_block_corrupt_range()
                                    - md_blk_block_corrupt_range()
    sb->s_ops->storage_corrupt_range()  =>
                                    xfs_fs_storage_corrupt_range()
     xfs_rmap_query_range()
      xfs_storage_lost_helper()
       mapping->a_ops->corrupt_range()  =>
                                    xfs_dax_aops.xfs_dax_corrupt_range
        memory_failure_dev_pagemap_kill_procs()

  * page cache mapping case
  mapping->a_ops->corrupt_range()       =>
                                    xfs_address_space_operations.xfs_xxx
   memory_failure_generic_kill_procs()
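
To make the dax case above concrete, I imagine the a_ops hook would be
something like this (again just a sketch; the names come from the
trace above and the signature is assumed):

  static int xfs_dax_corrupt_range(struct address_space *mapping,
                                   loff_t offset, size_t len)
  {
          /*
           * DAX mapping: there is no page cache copy to recover
           * from, so walk the VMAs that map this range and kill
           * the affected processes.
           */
          return memory_failure_dev_pagemap_kill_procs(mapping,
                                                       offset, len);
  }

  static const struct address_space_operations xfs_dax_aops = {
          /* ... existing dax a_ops ... */
          .corrupt_range  = xfs_dax_corrupt_range,
  };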

It's rough and not complete yet.  I hope for your comments.

-- 
Thanks,
Ruan Shiyang.

> 
> Cheers,
> 
> Dave.
> 


Thread overview: 30+ messages
2020-11-23  0:41 [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink Shiyang Ruan
2020-11-23  0:41 ` [RFC PATCH v2 1/6] fs: introduce ->storage_lost() for memory-failure Shiyang Ruan
2020-11-23  0:41 ` [RFC PATCH v2 2/6] blk: introduce ->block_lost() to handle memory-failure Shiyang Ruan
2020-11-23  0:41 ` [RFC PATCH v2 3/6] md: implement ->block_lost() for memory-failure Shiyang Ruan
2020-11-23  0:41 ` [RFC PATCH v2 4/6] pagemap: introduce ->memory_failure() Shiyang Ruan
2020-11-23  0:41 ` [RFC PATCH v2 5/6] mm, fsdax: refactor dax handler in memory-failure Shiyang Ruan
2020-11-23  0:41 ` [RFC PATCH v2 6/6] fsdax: remove useless (dis)associate functions Shiyang Ruan
2020-11-29 22:47 ` [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink Dave Chinner
2020-12-02  7:12   ` Ruan Shiyang [this message]
2020-12-06 22:55     ` Dave Chinner
2020-12-14 20:58 ` Jane Chu
2020-12-15 11:58   ` Ruan Shiyang
2020-12-15 19:05     ` Jane Chu
2020-12-15 23:10       ` Dave Chinner
2020-12-16  2:46         ` Darrick J. Wong
