linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
From: "ruansy.fnst@fujitsu.com" <ruansy.fnst@fujitsu.com>
To: "ruansy.fnst@fujitsu.com" <ruansy.fnst@fujitsu.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	device-mapper development <dm-devel@redhat.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	david <david@fromorbit.com>, Christoph Hellwig <hch@lst.de>,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>,
	Goldwyn Rodrigues <rgoldwyn@suse.de>,
	"qi.fuli@fujitsu.com" <qi.fuli@fujitsu.com>,
	"y-goto@fujitsu.com" <y-goto@fujitsu.com>
Subject: RE: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
Date: Fri, 19 Mar 2021 02:17:49 +0000	[thread overview]
Message-ID: <OSBPR01MB29208779955B49F84D857F80F4689@OSBPR01MB2920.jpnprd01.prod.outlook.com> (raw)
In-Reply-To: <OSBPR01MB2920E46CBE4816CDF711E004F46F9@OSBPR01MB2920.jpnprd01.prod.outlook.com>



> -----Original Message-----
> From: ruansy.fnst@fujitsu.com <ruansy.fnst@fujitsu.com>
> Subject: RE: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
> > > > > >
> > > > > > After the conversation with Dave I don't see the point of this.
> > > > > > If there is a memory_failure() on a page, why not just call
> > > > > > memory_failure()? That already knows how to find the inode and
> > > > > > the filesystem can be notified from there.
> > > > >
> > > > > We want memory_failure() supports reflinked files.  In this
> > > > > case, we are not able to track multiple files from a page(this
> > > > > broken
> > > > > page) because
> > > > > page->mapping,page->index can only track one file.  Thus, I
> > > > > page->introduce this
> > > > > ->memory_failure() implemented in pmem driver, to call
> > > > > ->->corrupted_range()
> > > > > upper level to upper level, and finally find out files who are
> > > > > using(mmapping) this page.
> > > > >
> > > >
> > > > I know the motivation, but this implementation seems backwards.
> > > > It's already the case that memory_failure() looks up the
> > > > address_space associated with a mapping. From there I would expect
> > > > a new 'struct address_space_operations' op to let the fs handle
> > > > the case when there are multiple address_spaces associated with a given
> file.
> > > >
> > >
> > > Let me think about it.  In this way, we
> > >     1. associate file mapping with dax page in dax page fault;
> >
> > I think this needs to be a new type of association that proxies the
> > representation of the reflink across all involved address_spaces.
> >
> > >     2. iterate files reflinked to notify `kill processes signal` by the
> > >           new address_space_operation;
> > >     3. re-associate to another reflinked file mapping when unmmaping
> > >         (rmap qeury in filesystem to get the another file).
> >
> > Perhaps the proxy object is reference counted per-ref-link. It seems
> > error prone to keep changing the association of the pfn while the reflink is
> in-tact.
> Hi, Dan
> 
> I think my early rfc patchset was implemented in this way:
>  - Create a per-page 'dax-rmap tree' to store each reflinked file's (mapping,
> offset) when causing dax page fault.
>  - Mount this tree on page->zone_device_data which is not used in fsdax, so
> that we can iterate reflinked file mappings in memory_failure() easily.
> In my understanding, the dax-rmap tree is the proxy object you mentioned.  If
> so, I have to say, this method was rejected. Because this will cause huge
> overhead in some case that every dax page have one dax-rmap tree.
> 

Hi, Dan

How do you think about this?  I am still confused.  Could you give me some advice?


--
Thanks,
Ruan Shiyang.

> 
> --
> Thanks,
> Ruan Shiyang.
> >
> > > It did not handle those dax pages are not in use, because their
> > > ->mapping are not associated to any file.  I didn't think it through
> > > until reading your conversation.  Here is my understanding: this
> > > case should be handled by badblock mechanism in pmem driver.  This
> > > badblock mechanism will call
> > > ->corrupted_range() to tell filesystem to repaire the data if possible.
> >
> > There are 2 types of notifications. There are badblocks discovered by
> > the driver (see notify_pmem()) and there are memory_failures()
> > signalled by the CPU machine-check handler, or the platform BIOS. In
> > the case of badblocks that needs to be information considered by the
> > fs block allocator to avoid / try-to-repair badblocks on allocate, and
> > to allow listing damaged files that need repair. The memory_failure()
> > notification needs immediate handling to tear down mappings to that
> > pfn and signal processes that have consumed it with
> > SIGBUS-action-required. Processes that have the poison mapped, but have not
> consumed it receive SIGBUS-action-optional.
> >
> > > So, we split it into two parts.  And dax device and block device
> > > won't be
> > mixed
> > > up again.   Is my understanding right?
> >
> > Right, it's only the filesystem that knows that the block_device and
> > the dax_device alias data at the same logical offset. The requirements
> > for sector error handling and page error handling are separate like
> > block_device_operations and dax_operations.
> >
> > > But the solution above is to solve the hwpoison on one or couple
> > > pages, which happens rarely(I think).  Do the 'pmem remove'
> > > operation
> > cause hwpoison too?
> > > Call memory_failure() so many times?  I havn't understood this yet.
> >
> > I'm working on a patch here to call memory_failure() on a wide range
> > for the surprise remove of a dax_device while a filesystem might be
> > mounted. It won't be efficient, but there is no other way to notify
> > the kernel that it needs to immediately stop referencing a page.
> _______________________________________________
> Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an
> email to linux-nvdimm-leave@lists.01.org
> 

_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

  reply	other threads:[~2021-03-19  2:18 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 10:55 [PATCH v3 00/11] fsdax: introduce fs query to support reflink Shiyang Ruan
2021-02-08 10:55 ` [PATCH v3 01/11] pagemap: Introduce ->memory_failure() Shiyang Ruan
2021-02-10 13:20   ` Christoph Hellwig
2021-03-06 20:36   ` Dan Williams
2021-03-08  3:38     ` ruansy.fnst
2021-03-08  5:23       ` Dan Williams
2021-03-08 11:34         ` ruansy.fnst
2021-03-08 18:01           ` Dan Williams
2021-03-12 10:18             ` ruansy.fnst
2021-03-19  2:17               ` ruansy.fnst [this message]
2021-03-24  2:19                 ` Dan Williams
2021-03-24  7:47                   ` Christoph Hellwig
2021-03-24 16:37                     ` Dan Williams
2021-03-24 17:39                       ` Christoph Hellwig
2021-03-24 18:00                         ` Dan Williams
2021-02-08 10:55 ` [PATCH v3 02/11] blk: Introduce ->corrupted_range() for block device Shiyang Ruan
2021-02-10 13:21   ` Christoph Hellwig
2021-03-04 22:42     ` Darrick J. Wong
2021-03-05  6:10       ` Christoph Hellwig
2021-02-08 10:55 ` [PATCH v3 03/11] fs: Introduce ->corrupted_range() for superblock Shiyang Ruan
2021-02-08 10:55 ` [PATCH v3 04/11] block_dev: Introduce bd_corrupted_range() for block device Shiyang Ruan
2021-02-08 10:55 ` [PATCH v3 05/11] mm, fsdax: Refactor memory-failure handler for dax mapping Shiyang Ruan
2021-02-10 13:33   ` Christoph Hellwig
2021-02-17  2:56     ` Ruan Shiyang
2021-02-18  8:32       ` Christoph Hellwig
2021-02-18  8:59         ` Ruan Shiyang
2021-03-16  3:21   ` zhong jiang
2021-03-17  3:46     ` ruansy.fnst
2021-02-08 10:55 ` [PATCH v3 06/11] mm, pmem: Implement ->memory_failure() in pmem driver Shiyang Ruan
2021-02-10 13:41   ` Christoph Hellwig
2021-02-08 10:55 ` [PATCH v3 07/11] pmem: Implement ->corrupted_range() for " Shiyang Ruan
2021-02-08 10:55 ` [PATCH v3 08/11] dm: Introduce ->rmap() to find bdev offset Shiyang Ruan
2021-02-08 10:55 ` [PATCH v3 09/11] md: Implement ->corrupted_range() Shiyang Ruan
2021-02-08 10:55 ` [PATCH v3 10/11] xfs: Implement ->corrupted_range() for XFS Shiyang Ruan
2021-02-10 13:44   ` Christoph Hellwig
2021-02-08 10:55 ` [PATCH v3 11/11] fs/dax: Remove useless functions Shiyang Ruan
2021-02-10 13:09   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=OSBPR01MB29208779955B49F84D857F80F4689@OSBPR01MB2920.jpnprd01.prod.outlook.com \
    --to=ruansy.fnst@fujitsu.com \
    --cc=agk@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dm-devel@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=qi.fuli@fujitsu.com \
    --cc=rgoldwyn@suse.de \
    --cc=snitzer@redhat.com \
    --cc=y-goto@fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).