nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Christoph Hellwig <hch@lst.de>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Alex Sierra <alex.sierra@amd.com>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	Linux MM <linux-mm@kvack.org>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>
Subject: Re: can we finally kill off CONFIG_FS_DAX_LIMITED
Date: Mon, 18 Oct 2021 20:30:45 -0300	[thread overview]
Message-ID: <20211018233045.GQ2744544@nvidia.com> (raw)
In-Reply-To: <5ca908e3-b4ad-dfef-d75f-75073d4165f7@oracle.com>

On Fri, Oct 15, 2021 at 01:22:41AM +0100, Joao Martins wrote:

> dev_pagemap_mapping_shift() does a lookup to figure out
> which order is the page table entry represents. is_zone_device_page()
> is already used to gate usage of dev_pagemap_mapping_shift(). I think
> this might be an artifact of the same issue as 3) in which PMDs/PUDs
> are represented with base pages and hence you can't do what the rest
> of the world does with:

This code is looks broken as written.

vma_address() relies on certain properties that I maybe DAX (maybe
even only FSDAX?) sets on its ZONE_DEVICE pages, and
dev_pagemap_mapping_shift() does not handle the -EFAULT return. It
will crash if a memory failure hits any other kind of ZONE_DEVICE
area.

I'm not sure the comment is correct anyhow:

		/*
		 * Unmap the largest mapping to avoid breaking up
		 * device-dax mappings which are constant size. The
		 * actual size of the mapping being torn down is
		 * communicated in siginfo, see kill_proc()
		 */
		unmap_mapping_range(page->mapping, start, size, 0);

Beacuse for non PageAnon unmap_mapping_range() does either
zap_huge_pud(), __split_huge_pmd(), or zap_huge_pmd().

Despite it's name __split_huge_pmd() does not actually split, it will
call __split_huge_pmd_locked:

	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
		goto out;
	__split_huge_pmd_locked(vma, pmd, range.start, freeze);

Which does
	if (!vma_is_anonymous(vma)) {
		old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);

Which is a zap, not split.

So I wonder if there is a reason to use anything other than 4k here
for DAX?

> 	tk->size_shift = page_shift(compound_head(p));
> 
> ... as page_shift() would just return PAGE_SHIFT (as compound_order() is 0).

And what would be so wrong with memory failure doing this as a 4k
page?

Jason

  reply	other threads:[~2021-10-18 23:30 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-20  5:43 can we finally kill off CONFIG_FS_DAX_LIMITED Christoph Hellwig
2021-08-20 15:41 ` Dan Williams
2021-08-20 17:42   ` Dan Williams
2021-08-20 19:03     ` Gerald Schaefer
2021-08-24 14:17     ` Joao Martins
2021-08-23 14:05 ` Gerald Schaefer
2021-08-23 19:47   ` Gerald Schaefer
2021-08-23 20:21     ` Dan Williams
2021-08-24 14:09       ` Joao Martins
2021-08-24 14:53         ` Dan Williams
2021-08-24 18:24           ` Gerald Schaefer
2021-08-24 18:44             ` Dan Williams
2021-10-14 23:04               ` Jason Gunthorpe
2021-10-15  0:22                 ` Joao Martins
2021-10-18 23:30                   ` Jason Gunthorpe [this message]
2021-10-19  4:26                     ` Dan Williams
2021-10-19 14:20                       ` Jason Gunthorpe
2021-10-19 15:20                         ` Joao Martins
2021-10-19 15:38                         ` Felix Kuehling
2021-10-19 17:38                         ` Dan Williams
2021-10-19 17:54                           ` Jason Gunthorpe
2021-08-24  6:49   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211018233045.GQ2744544@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=alex.sierra@amd.com \
    --cc=apopple@nvidia.com \
    --cc=borntraeger@de.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hch@lst.de \
    --cc=joao.m.martins@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=rcampbell@nvidia.com \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).