linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>, linux-nvdimm <linux-nvdimm@lists.01.org>,
	Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Linux MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Luck, Tony" <tony.luck@intel.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: Re: [PATCH 05/11] filesystem-dax: set page->index
Date: Wed, 30 May 2018 10:13:56 +0200	[thread overview]
Message-ID: <20180530081356.mohu6fx22fzd7fxb@quack2.suse.cz> (raw)
In-Reply-To: <CAPcyv4gUM7Br3XONOVkNCg-mvR5U8QLq+OOc54cLpP61LXhJXA@mail.gmail.com>

On Tue 29-05-18 18:38:41, Dan Williams wrote:
> On Wed, May 23, 2018 at 1:40 AM, Jan Kara <jack@suse.cz> wrote:
> > On Tue 22-05-18 07:39:57, Dan Williams wrote:
> >> In support of enabling memory_failure() handling for filesystem-dax
> >> mappings, set ->index to the pgoff of the page. The rmap implementation
> >> requires ->index to bound the search through the vma interval tree. The
> >> index is set and cleared at dax_associate_entry() and
> >> dax_disassociate_entry() time respectively.
> >>
> >> Cc: Jan Kara <jack@suse.cz>
> >> Cc: Christoph Hellwig <hch@lst.de>
> >> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> >> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> ---
> >>  fs/dax.c |   11 ++++++++---
> >>  1 file changed, 8 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/dax.c b/fs/dax.c
> >> index aaec72ded1b6..2e4682cd7c69 100644
> >> --- a/fs/dax.c
> >> +++ b/fs/dax.c
> >> @@ -319,18 +319,22 @@ static unsigned long dax_radix_end_pfn(void *entry)
> >>       for (pfn = dax_radix_pfn(entry); \
> >>                       pfn < dax_radix_end_pfn(entry); pfn++)
> >>
> >> -static void dax_associate_entry(void *entry, struct address_space *mapping)
> >> +static void dax_associate_entry(void *entry, struct address_space *mapping,
> >> +             struct vm_area_struct *vma, unsigned long address)
> >>  {
> >> -     unsigned long pfn;
> >> +     unsigned long size = dax_entry_size(entry), pfn, index;
> >> +     int i = 0;
> >>
> >>       if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
> >>               return;
> >>
> >> +     index = linear_page_index(vma, address & ~(size - 1));
> >>       for_each_mapped_pfn(entry, pfn) {
> >>               struct page *page = pfn_to_page(pfn);
> >>
> >>               WARN_ON_ONCE(page->mapping);
> >>               page->mapping = mapping;
> >> +             page->index = index + i++;
> >>       }
> >>  }
> >
> > Hum, this just made me think: How is this going to work with XFS reflink?
> > In fact is not the page->mapping association already broken by XFS reflink?
> > Because with reflink we can have two or more mappings pointing to the same
> > physical blocks (i.e., pages in DAX case)...
> 
> Good question. I assume we are ok in the non-DAX reflink case because
> rmap of failing / poison pages is only relative to the specific page
> cache page for a given inode in the reflink. However, DAX would seem
> to break this because we only get one shared 'struct page' for all
> possible mappings of the physical file block. I think this means for
> iterating over the rmap of "where is this page mapped" would require
> iterating over the other "sibling" inodes that know about the given
> physical file block.
> 
> As far as I can see reflink+dax would require teaching kernel code
> paths that ->mapping may not be a singular relationship. Something
> along the line's of what Jerome was presenting at LSF to create a
> special value to indicate, "call back into the filesystem (or the page
> owner)" to perform this operation.
> 
> In the meantime the kernel crashes when userspace accesses poisoned
> pmem via DAX. I assume that reworking rmap for the dax+reflink case
> should not block dax poison handling? Yell if you disagree.

The thing is, up until get_user_pages() vs truncate series ("fs, dax: use
page->mapping to warn if truncate collides with a busy page" in
particular), DAX was perfectly fine with reflinks since we never needed
page->mapping. Now this series adds even page->index dependency which makes
life for rmap with reflinks even harder. So if nothing else we should at
least make sure reflinked filesystems cannot be mounted with dax mount
option for now and seriously start looking into how to implement rmap with
reflinked files for DAX because this noticeably reduces its usefulness.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2018-05-30  8:13 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-22 14:39 [PATCH 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
2018-05-22 14:39 ` [PATCH 01/11] device-dax: convert to vmf_insert_mixed and vm_fault_t Dan Williams
2018-05-22 14:39 ` [PATCH 02/11] device-dax: cleanup vm_fault de-reference chains Dan Williams
2018-05-22 14:39 ` [PATCH 03/11] device-dax: enable page_mapping() Dan Williams
2018-05-23  9:03   ` Jan Kara
2018-05-30 19:54   ` kbuild test robot
2018-05-22 14:39 ` [PATCH 04/11] device-dax: set page->index Dan Williams
2018-05-22 14:39 ` [PATCH 05/11] filesystem-dax: " Dan Williams
2018-05-23  8:40   ` Jan Kara
2018-05-30  1:38     ` Dan Williams
2018-05-30  8:13       ` Jan Kara [this message]
2018-05-30 23:21         ` Dan Williams
2018-05-31 10:08           ` Jan Kara
2018-05-31 21:49             ` Dan Williams
2018-05-22 14:40 ` [PATCH 06/11] filesystem-dax: perform __dax_invalidate_mapping_entry() under the page lock Dan Williams
2018-05-23  9:35   ` Jan Kara
2018-05-23 13:50     ` Dan Williams
2018-05-22 14:40 ` [PATCH 07/11] mm, madvise_inject_error: fix page count leak Dan Williams
2018-05-23  4:19   ` Naoya Horiguchi
2018-05-24 20:55     ` Dan Williams
2018-05-22 14:40 ` [PATCH 08/11] x86, memory_failure: introduce {set, clear}_mce_nospec() Dan Williams
2018-05-22 14:40 ` [PATCH 09/11] mm, memory_failure: pass page size to kill_proc() Dan Williams
2018-05-23  6:41   ` Naoya Horiguchi
2018-05-22 14:40 ` [PATCH 10/11] mm, memory_failure: teach memory_failure() about dev_pagemap pages Dan Williams
2018-05-23  6:48   ` Naoya Horiguchi
2018-05-22 14:40 ` [PATCH 11/11] libnvdimm, pmem: restore page attributes when clearing errors Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180530081356.mohu6fx22fzd7fxb@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).