nvdimm.lists.linux.dev archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	linux-s390 <linux-s390@vger.kernel.org>
Subject: Re: can we finally kill off CONFIG_FS_DAX_LIMITED
Date: Tue, 24 Aug 2021 11:44:20 -0700
Message-ID: <CAPcyv4iFeVDVPn6uc=aKsyUvkiu3-fK-N16iJVZQ3N8oT00hWA@mail.gmail.com>
In-Reply-To: <20210824202449.19d524b5@thinkpad>

On Tue, Aug 24, 2021 at 11:25 AM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> On Tue, 24 Aug 2021 07:53:22 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > On Tue, Aug 24, 2021 at 7:10 AM Joao Martins <joao.m.martins@oracle.com> wrote:
> > >
> > >
> > >
> > > On 8/23/21 9:21 PM, Dan Williams wrote:
> > > > On Mon, Aug 23, 2021 at 12:47 PM Gerald Schaefer
> > > > <gerald.schaefer@linux.ibm.com> wrote:
> > > >>
> > > >> On Mon, 23 Aug 2021 16:05:46 +0200
> > > >> Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:
> > > >>
> > > >>> On Fri, 20 Aug 2021 07:43:40 +0200
> > > >>> Christoph Hellwig <hch@lst.de> wrote:
> > > >>>
> > > >>>> Hi all,
> > > >>>>
> > > >>>> looking at the recent ZONE_DEVICE related changes we still have a
> > > >>>> horrible maze of different code paths.  I already suggested to
> > > >>>> depend on ARCH_HAS_PTE_SPECIAL for ZONE_DEVICE there, which all modern
> > > >>
> > > >> Oh, we do have PTE_SPECIAL, actually that took away the last free bit
> > > >> in the pte. So, if there is a chance that ZONE_DEVICE would depend
> > > >> on PTE_SPECIAL instead of PTE_DEVMAP, we might be back in the game
> > > >> and get rid of that CONFIG_FS_DAX_LIMITED.
> > > >
> > > > So PTE_DEVMAP is primarily there to coordinate the
> > > > get_user_pages_fast() path, and even there it's usage can be
> > > > eliminated in favor of PTE_SPECIAL. I started that effort [1], but
> > > > need to rebase on new notify_failure infrastructure coming from Ruan
> > > > [2]. So I think you are not in the critical path until I can get the
> > > > PTE_DEVMAP requirement out of your way.
> > > >
> > >
> > > Isn't the implicit case that PTE_SPECIAL means that you
> > > aren't supposed to get a struct page back? The gup path bails out on
> > > pte_special() case. And in the fact in this thread that you quote:
> > >
> > > > [1]: https://lore.kernel.org/r/161604050866.1463742.7759521510383551055.stgit@dwillia2-desk3.amr.corp.intel.com
> > >
> > > (...) we were speaking about[1.1] using that same special bit to block
> > > longterm gup for fs-dax (while allowing it device-dax which does support it).
> > >
> > > [1.1] https://lore.kernel.org/nvdimm/a8c41028-c7f5-9b93-4721-b8ddcf2427da@oracle.com/
> > >
> > > Or maybe that's what you mean for this particular case of FS_DAX_LIMITED. Most _special*()
> > > cases in mm match _devmap*() as far I've experimented in the past with PMD/PUD and dax
> > > (prior to [1.1]).
> > >
> > > I am just wondering would you differentiate the case where you have metadata for the
> > > !FS_DAX_LIMITED case in {gup,gup_fast} path in light of removing PTE_DEVMAP. I would have
> > > thought of checking that a pgmap exists for the pfn (without grabbing a ref to it).
> >
> > So I should clarify, I'm not proposing removing PTE_DEVMAP, I'm
> > proposing relaxing its need for architectures that can not afford the
> > PTE bit. Those architectures would miss out on get_user_pages_fast()
> > for devmap pages. Then, once PTE_SPECIAL kicks get_user_pages() to the
> > slow path, get_dev_pagemap() is used to detect devmap pages.
>
> Thanks, I was also a bit confused, but I think I got it now. Does that mean
> that you also plan to relax the pte_devmap(pte) check in follow_page_pte(),
> before calling get_dev_pagemap() in the slow path? So that it could also be
> called for pte_special(), maybe with additional vma_is_dax() check. And then
> rely on get_dev_pagemap() finding the pages for those "very special" PTEs that
> actually would have struct pages (at least for s390 DCSS with DAX)?

Yes, that's along the lines of what I'm thinking. I.e., don't expect
pte_devmap() to be there in the slow path, and use the vma to check
for DAX.
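
For readers following the thread, here is a rough userspace model of the
decision being discussed. It is only a sketch, not kernel code: the
struct and function names (model_pte, model_vma, model_gup_fast_can_handle,
model_slow_path_lookup) are hypothetical stand-ins, and the has_pgmap
flag merely represents "get_dev_pagemap() would find a pgmap for this
pfn". It assumes the fast path bails on special PTEs and the slow path
then accepts a special PTE in a DAX vma when a pgmap exists.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's real types. */
struct model_pte {
	bool special;   /* models pte_special() */
	bool devmap;    /* models pte_devmap(); always false without the bit */
	bool has_pgmap; /* models "get_dev_pagemap() finds a pgmap" */
};

struct model_vma {
	bool is_dax;    /* models vma_is_dax() */
};

enum lookup_result {
	LOOKUP_NO_PAGE,     /* no struct page may be returned */
	LOOKUP_NORMAL_PAGE, /* ordinary page */
	LOOKUP_DEVMAP_PAGE, /* ZONE_DEVICE page backed by a pgmap */
};

/* Fast path: a special PTE is never handled here, so the caller
 * falls back to the slow path. */
static bool model_gup_fast_can_handle(const struct model_pte *pte)
{
	return !pte->special;
}

/* Slow path as proposed: do not require the devmap bit; for a
 * special PTE, use the vma to check for DAX and then rely on the
 * pgmap lookup to decide whether a struct page exists. */
static enum lookup_result
model_slow_path_lookup(const struct model_pte *pte,
		       const struct model_vma *vma)
{
	if (pte->devmap)
		return pte->has_pgmap ? LOOKUP_DEVMAP_PAGE : LOOKUP_NO_PAGE;

	if (pte->special) {
		if (vma->is_dax && pte->has_pgmap)
			return LOOKUP_DEVMAP_PAGE;
		return LOOKUP_NO_PAGE;
	}

	return LOOKUP_NORMAL_PAGE;
}
```

In this model, an architecture that cannot afford the devmap bit would
mark its DAX mappings special only: the fast path declines them, and the
slow path still reaches the page through the vma check plus pgmap lookup.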
