From: "ruansy.fnst@fujitsu.com" <ruansy.fnst@fujitsu.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-xfs <linux-xfs@vger.kernel.org>,
linux-nvdimm <linux-nvdimm@lists.01.org>,
Linux MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
device-mapper development <dm-devel@redhat.com>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
david <david@fromorbit.com>, Christoph Hellwig <hch@lst.de>,
Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@redhat.com>,
Goldwyn Rodrigues <rgoldwyn@suse.de>,
"qi.fuli@fujitsu.com" <qi.fuli@fujitsu.com>,
"y-goto@fujitsu.com" <y-goto@fujitsu.com>
Subject: RE: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
Date: Fri, 12 Mar 2021 10:18:58 +0000
Message-ID: <OSBPR01MB2920E46CBE4816CDF711E004F46F9@OSBPR01MB2920.jpnprd01.prod.outlook.com>
In-Reply-To: <CAPcyv4gn_AvT6BA7g4jLKRFODSpt7_ORowVd3KgyWxyaFG0k9g@mail.gmail.com>
> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Subject: Re: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
>
> On Mon, Mar 8, 2021 at 3:34 AM ruansy.fnst@fujitsu.com
> <ruansy.fnst@fujitsu.com> wrote:
> > > > > > 1 file changed, 8 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> > > > > > index 79c49e7f5c30..0bcf2b1e20bd 100644
> > > > > > --- a/include/linux/memremap.h
> > > > > > +++ b/include/linux/memremap.h
> > > > > > @@ -87,6 +87,14 @@ struct dev_pagemap_ops {
> > > > > > * the page back to a CPU accessible page.
> > > > > > */
> > > > > > vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf);
> > > > > > +
> > > > > > + /*
> > > > > > + * Handle the memory failure happens on one page. Notify the processes
> > > > > > + * who are using this page, and try to recover the data on this page
> > > > > > + * if necessary.
> > > > > > + */
> > > > > > + int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
> > > > > > + int flags);
> > > > > > };
> > > > >
> > > > > After the conversation with Dave I don't see the point of this.
> > > > > If there is a memory_failure() on a page, why not just call
> > > > > memory_failure()? That already knows how to find the inode and
> > > > > the filesystem can be notified from there.
> > > >
> > > > We want memory_failure() to support reflinked files. In this case,
> > > > we are not able to track multiple files from a page (this broken
> > > > page) because page->mapping and page->index can only track one file.
> > > > Thus, I introduce this ->memory_failure(), implemented in the pmem
> > > > driver, to call ->corrupted_range() from upper level to upper level,
> > > > and finally find the files that are using (mmapping) this page.
> > > >
> > >
> > > I know the motivation, but this implementation seems backwards. It's
> > > already the case that memory_failure() looks up the address_space
> > > associated with a mapping. From there I would expect a new 'struct
> > > address_space_operations' op to let the fs handle the case when
> > > there are multiple address_spaces associated with a given file.
> > >
> >
> > Let me think about it. In this way, we
> > 1. associate file mapping with dax page in dax page fault;
>
> I think this needs to be a new type of association that proxies the representation
> of the reflink across all involved address_spaces.
>
> > 2. iterate over the reflinked files and send the `kill processes` signal
> > via the new address_space_operation;
> > 3. re-associate with another reflinked file mapping when unmapping
> > (rmap query in the filesystem to get the other file).
>
> Perhaps the proxy object is reference counted per-reflink. It seems error prone
> to keep changing the association of the pfn while the reflink is intact.
Hi, Dan
I think my early RFC patchset was implemented this way:
- Create a per-page 'dax-rmap tree' that stores each reflinked file's (mapping, offset) when a dax page fault occurs.
- Attach this tree to page->zone_device_data, which is not used in fsdax, so that we can easily iterate over the reflinked file mappings in memory_failure().
In my understanding, the dax-rmap tree is the proxy object you mentioned. If so, I have to say that this method was rejected, because it causes huge overhead in cases where every dax page has its own dax-rmap tree.
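To make the idea concrete, here is a rough userspace model of the per-page reverse map, not the code from the RFC. Every name in it (struct file_mapping, struct dax_page, dax_rmap_add() and so on) is hypothetical, and a plain linked list stands in for the tree. It only shows the shape of the data: one container per dax page, filled at fault time and walked on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct file_mapping {
	const char *name;		/* stands in for struct address_space */
};

struct dax_rmap_entry {
	struct file_mapping *mapping;
	unsigned long offset;		/* page offset within that file */
	struct dax_rmap_entry *next;
};

struct dax_page {
	struct dax_rmap_entry *rmap;	/* stands in for page->zone_device_data */
};

/* Fault path: record that (mapping, offset) now shares this page. */
static int dax_rmap_add(struct dax_page *page, struct file_mapping *mapping,
			unsigned long offset)
{
	struct dax_rmap_entry *e;

	assert(page != NULL && mapping != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->mapping = mapping;
	e->offset = offset;
	e->next = page->rmap;
	page->rmap = e;
	return 0;
}

typedef void (*rmap_visit_fn)(struct file_mapping *mapping,
			      unsigned long offset, void *ctx);

/* Failure path: visit every file sharing the page; returns how many. */
static size_t dax_rmap_for_each(const struct dax_page *page,
				rmap_visit_fn fn, void *ctx)
{
	const struct dax_rmap_entry *e;
	size_t n = 0;

	assert(page != NULL);
	for (e = page->rmap; e; e = e->next) {
		if (fn)
			fn(e->mapping, e->offset, ctx);
		n++;
	}
	return n;
}

/* Release all entries attached to the page. */
static void dax_rmap_free(struct dax_page *page)
{
	assert(page != NULL);
	while (page->rmap) {
		struct dax_rmap_entry *e = page->rmap;

		page->rmap = e->next;
		free(e);
	}
}
```

Even in this toy form the cost is visible: each faulted page carries its own allocation per sharing file, which is the overhead that got the approach rejected.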
--
Thanks,
Ruan Shiyang.
>
> > It did not handle dax pages that are not in use, because their
> > ->mapping is not associated with any file. I didn't think it through
> > until reading your conversation. Here is my understanding: this case
> > should be handled by the badblock mechanism in the pmem driver. This
> > badblock mechanism will call ->corrupted_range() to tell the filesystem
> > to repair the data if possible.
>
> There are 2 types of notifications. There are badblocks discovered by the driver
> (see notify_pmem()) and there are memory_failures() signalled by the CPU
> machine-check handler, or the platform BIOS. In the case of badblocks that
> needs to be information considered by the fs block allocator to avoid /
> try-to-repair badblocks on allocate, and to allow listing damaged files that need
> repair. The memory_failure() notification needs immediate handling to tear
> down mappings to that pfn and signal processes that have consumed it with
> SIGBUS-action-required. Processes that have the poison mapped, but have not
> consumed it receive SIGBUS-action-optional.
>
> > So, we split it into two parts. And dax device and block device won't be
> > mixed up again. Is my understanding right?
>
> Right, it's only the filesystem that knows that the block_device and the
> dax_device alias data at the same logical offset. The requirements for sector
> error handling and page error handling are separate like
> block_device_operations and dax_operations.
>
> > But the solution above handles hwpoison on one or a couple of pages,
> > which happens rarely (I think). Does the 'pmem remove' operation cause
> > hwpoison too? Would memory_failure() be called that many times? I
> > haven't understood this yet.
>
> I'm working on a patch here to call memory_failure() on a wide range for the
> surprise remove of a dax_device while a filesystem might be mounted. It won't
> be efficient, but there is no other way to notify the kernel that it needs to
> immediately stop referencing a page.
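For illustration only, the wide-range notification described above could be modelled in userspace roughly as below. This is not the patch being referred to: fail_pfn_range() and the callback type are hypothetical, and the callback merely stands in for a per-page failure handler. The point is that the cost is one call per pfn in the range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-page failure handler. */
typedef int (*pfn_failure_fn)(unsigned long pfn, int flags, void *ctx);

/*
 * Report every pfn in [start_pfn, start_pfn + nr_pages) to the handler.
 * Returns the number of pfns for which the handler reported an error.
 */
static size_t fail_pfn_range(unsigned long start_pfn, size_t nr_pages,
			     int flags, pfn_failure_fn fn, void *ctx)
{
	size_t errors = 0;
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < nr_pages; i++) {
		if (fn(start_pfn + i, flags, ctx) != 0)
			errors++;
	}
	return errors;
}

/* Example handler: just counts how many pfns were reported. */
static int count_pfn(unsigned long pfn, int flags, void *ctx)
{
	(void)pfn;
	(void)flags;
	++*(size_t *)ctx;
	return 0;
}
```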