From: Jason Gunthorpe <jgg@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Joao Martins <joao.m.martins@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Alex Sierra <alex.sierra@amd.com>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	Linux MM <linux-mm@kvack.org>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>
Subject: Re: can we finally kill off CONFIG_FS_DAX_LIMITED
Date: Thu, 14 Oct 2021 20:04:39 -0300
Message-ID: <20211014230439.GA3592864@nvidia.com>
In-Reply-To: <CAPcyv4iFeVDVPn6uc=aKsyUvkiu3-fK-N16iJVZQ3N8oT00hWA@mail.gmail.com>

On Tue, Aug 24, 2021 at 11:44:20AM -0700, Dan Williams wrote:

> Yes, that's along the lines of what I'm thinking. I.e don't expect
> pte_devmap() to be there in the slow path, and use the vma to check
> for DAX.

I think we should delete pte_devmap completely from gup.c.

It is doing a few things that are better done in more general ways:

1) Doing the get_dev_pagemap() stuff which should be entirely deleted
   from gup.c in favour of proper use of struct page references.

2) Denying FOLL_LONGTERM

   Once GUP has grabbed the page we can call is_zone_device_page() on
   the struct page. If true we can check page->pgmap and read some
   DENY_FOLL_LONGTERM flag from there.

3) Different refcounts for pud/pmd pages

   Ideally DAX cases would not do this (i.e. Joao is fixing device-dax)
   but in the interim we can just loop over the PUD/PMD in all
   cases. Looping is safe for THP AFAIK. I described how this can work
   here:

   https://lore.kernel.org/all/20211013174140.GJ2744544@nvidia.com/
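
Item 2 above can be sketched roughly as follows. This is a userspace
model, not kernel code: struct page and struct dev_pagemap are cut down
to the fields the check needs, PGMAP_DENY_FOLL_LONGTERM is a
hypothetical flag name (the mail only says "some DENY_FOLL_LONGTERM
flag"), the FOLL_LONGTERM value is a placeholder, and
gup_page_allows_flags() is an invented helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: hypothetical pgmap flag, it does not exist in the kernel. */
#define PGMAP_DENY_FOLL_LONGTERM (1u << 0)
/* Assumption: placeholder bit for this model, not the kernel's value. */
#define FOLL_LONGTERM            (1u << 1)

/* Userspace stand-ins for the kernel types, reduced to what is used here. */
struct dev_pagemap {
	unsigned int flags;
};

struct page {
	bool zone_device;          /* models "page lives in ZONE_DEVICE" */
	struct dev_pagemap *pgmap; /* only meaningful for ZONE_DEVICE pages */
};

static bool is_zone_device_page(const struct page *page)
{
	return page->zone_device;
}

/*
 * Meant to run after GUP already holds a reference on @page, so no PTE
 * flag is consulted. Returns true if the pin may be kept, false if the
 * caller should drop the reference and fail the request.
 */
static bool gup_page_allows_flags(const struct page *page,
				  unsigned int gup_flags)
{
	assert(page != NULL);

	if (!(gup_flags & FOLL_LONGTERM))
		return true;
	if (!is_zone_device_page(page))
		return true;
	return !(page->pgmap->flags & PGMAP_DENY_FOLL_LONGTERM);
}
```

The point of the shape is that the policy lives in the pgmap and is
reached through the struct page, so the decision does not depend on
pte_devmap() being set in the page table entry.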

After that there are only two remaining uses:

4) The pud/pmd_devmap() in vm_normal_page() should just go
   away. ZONE_DEVICE memory with struct pages SHOULD be a normal
   page. This also means dropping pte_special too.

5) dev_pagemap_mapping_shift() - I don't know what this does
   but why not use the is_zone_device_page() approach from 2?

In this way ZONE_DEVICE pages can be fully normal pages with no
requirements on PTE flags.

Where have I gone wrong? :)

pud/pmd_devmap() looks a little more involved to remove, but I wonder
if we can change logic like this:

	if (pmd_trans_huge(*vmf->pmd) || pmd_devmap(*vmf->pmd)) {

into

	if (pmd_is_page(*pmd))

? And rely on struct page based stuff as above to discern THP vs devmap?
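
A rough userspace sketch of that idea, under stated assumptions:
pmd_is_page() is the proposed helper and does not exist today, the
pmd_t here is a toy structure rather than a real page table entry, and
classify_pmd() and enum pmd_kind are invented names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct page, reduced to the zone check. */
struct page {
	bool zone_device;
};

/*
 * Toy PMD entry: a leaf entry maps a struct page directly, a non-leaf
 * entry points at a PTE table and has no leaf_page.
 */
typedef struct {
	bool present;
	struct page *leaf_page; /* NULL when the entry points to a PTE table */
} pmd_t;

enum pmd_kind { PMD_NOT_PAGE, PMD_THP, PMD_DEVMAP };

static bool is_zone_device_page(const struct page *page)
{
	assert(page != NULL);
	return page->zone_device;
}

/* Assumption: models the proposed pmd_is_page(), true for any leaf PMD. */
static bool pmd_is_page(pmd_t pmd)
{
	return pmd.present && pmd.leaf_page != NULL;
}

/*
 * One page-table test covers both cases; THP versus devmap is decided
 * from the struct page, not from a devmap bit in the entry.
 */
static enum pmd_kind classify_pmd(pmd_t pmd)
{
	if (!pmd_is_page(pmd))
		return PMD_NOT_PAGE;
	return is_zone_device_page(pmd.leaf_page) ? PMD_DEVMAP : PMD_THP;
}
```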

Thanks,
Jason
