Re: can we finally kill off CONFIG_FS_DAX_LIMITED

* Re: can we finally kill off CONFIG_FS_DAX_LIMITED
       [not found]             ` <CAPcyv4iFeVDVPn6uc=aKsyUvkiu3-fK-N16iJVZQ3N8oT00hWA@mail.gmail.com>
@ 2021-10-14 23:04               ` Jason Gunthorpe
  2021-10-15  0:22                 ` Joao Martins
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2021-10-14 23:04 UTC (permalink / raw)
  To: Dan Williams
  Cc: Gerald Schaefer, Joao Martins, Christoph Hellwig, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Linux NVDIMM, linux-s390,
	Matthew Wilcox, Alex Sierra, Kuehling, Felix, Linux MM,
	Ralph Campbell, Alistair Popple, Vishal Verma, Dave Jiang

On Tue, Aug 24, 2021 at 11:44:20AM -0700, Dan Williams wrote:

> Yes, that's along the lines of what I'm thinking. I.e don't expect
> pte_devmap() to be there in the slow path, and use the vma to check
> for DAX.

I think we should delete pte_devmap completely from gup.c.

It is doing a few things that are better done in more general ways:

1) Doing the get_dev_pagemap() stuff which should be entirely deleted
   from gup.c in favour of proper use of struct page references.

2) Denying FOLL_LONGTERM
   Once GUP has grabbed the page we can call is_zone_device_page() on
   the struct page. If true we can check page->pgmap and read some
   DENY_FOLL_LONGTERM flag from there

3) Different refcounts for pud/pmd pages

   Ideally DAX cases would not do this (ie Joao is fixing device-dax)
   but in the interm we can just loop over the PUD/PMD in all
   cases. Looping is safe for THP AFAIK. I described how this can work
   here:

   https://lore.kernel.org/all/20211013174140.GJ2744544@nvidia.com/

After that there are only two remaining uses:

4) The pud/pmd_devmap() in vm_normal_page() should just go
   away. ZONE_DEVICE memory with struct pages SHOULD be a normal
   page. This also means dropping pte_special too.

5) dev_pagemap_mapping_shift() - I don't know what this does
   but why not use the is_zone_device_page() approach from 2?

In this way ZONE_DEVICE pages can be fully normal pages with no
requirements on PTE flags.

Where have I gone wrong? :)

pud/pmd_devmap() looks a little more involved to remove, but I wonder
if we can change logic like this:

	if (pmd_trans_huge(*vmf->pmd) || pmd_devmap(*vmf->pmd)) {

Into

  if (pmd_is_page(*pmd))

? And rely on struct page based stuff as above to discern THP vs devmap?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread