All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Ralph Campbell <rcampbell@nvidia.com>
Cc: Linux MM <linux-mm@kvack.org>,
	kvm-ppc@vger.kernel.org, nouveau@lists.freedesktop.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ira Weiny <ira.weiny@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Jerome Glisse <jglisse@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Christoph Hellwig <hch@lst.de>, Jason Gunthorpe <jgg@nvidia.com>,
	Bharata B Rao <bharata@linux.ibm.com>, Zi Yan <ziy@nvidia.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH v3 0/2] mm: remove extra ZONE_DEVICE struct page refcount
Date: Mon, 5 Oct 2020 11:24:07 -0700	[thread overview]
Message-ID: <CAPcyv4gu=So5PgQU9LezhW4vUQt+paaUr1T6CAvQYjh0XzkkgQ@mail.gmail.com> (raw)
In-Reply-To: <20201001181715.17416-1-rcampbell@nvidia.com>

On Thu, Oct 1, 2020 at 11:17 AM Ralph Campbell <rcampbell@nvidia.com> wrote:
>
> This is still an RFC because after looking at the pmem/dax code some
> more, I realized that the ZONE_DEVICE struct pages are being inserted
> into the process' page tables with vmf_insert_mixed() and a zero
> refcount on the ZONE_DEVICE struct page. This is sort of OK because
> insert_pfn() increments the reference count on the pgmap which is what
> prevents memunmap_pages() from freeing the struct pages and it doesn't
> check for a non-zero struct page reference count.
> But, any calls to get_page() will hit the VM_BUG_ON_PAGE() that
> checks for a reference count == 0.
>
> // mmap() an ext4 file that is mounted -o dax.
> ext4_dax_fault()
>   ext4_dax_huge_fault()
>     dax_iomap_fault(&ext4_iomap_ops)
>       dax_iomap_pte_fault()
>         ops->iomap_begin() // ext4_iomap_begin()
>           ext4_map_blocks()
>           ext4_set_iomap()
>         dax_iomap_pfn()
>         dax_insert_entry()
>         vmf_insert_mixed(pfn)
>           __vm_insert_mixed()
>             if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) &&
>                 !pfn_t_devmap(pfn) && pfn_t_valid(pfn))
>               insert_page()
>                 get_page(page) // XXX would trigger VM_BUG_ON_PAGE()
>                 page_add_file_rmap()
>                 set_pte_at()
>             else
>               insert_pfn()
>                 pte_mkdevmap()
>                 set_pte_at()
>
> Should pmem set the page reference count to one before inserting the
> pfn into the page tables (and decrement when removing devmap PTEs)?
> What about MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_PCI_P2PDMA use cases?
> Where should they icrement/decrement the page reference count?
> I don't know enough about how these are used to really know what to
> do at this point. If people want me to continue to work on this series,
> I will need some guidance.

fs/dax could take the reference when inserting, but that would mean
that ext4 and xfs would need to go back to checking for 1 to be page
idle. I think that's ok because the filesystem is actually not
checking for page-idle it's checking for "get_user_pages()" idle.

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: Ralph Campbell <rcampbell@nvidia.com>
Cc: Linux MM <linux-mm@kvack.org>,
	kvm-ppc@vger.kernel.org, nouveau@lists.freedesktop.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Ira Weiny <ira.weiny@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Jerome Glisse <jglisse@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Christoph Hellwig <hch@lst.de>, Jason Gunthorpe <jgg@nvidia.com>,
	Bharata B Rao <bharata@linux.ibm.com>, Zi Yan <ziy@nvidia.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH v3 0/2] mm: remove extra ZONE_DEVICE struct page refcount
Date: Mon, 05 Oct 2020 18:24:07 +0000	[thread overview]
Message-ID: <CAPcyv4gu=So5PgQU9LezhW4vUQt+paaUr1T6CAvQYjh0XzkkgQ@mail.gmail.com> (raw)
In-Reply-To: <20201001181715.17416-1-rcampbell@nvidia.com>

On Thu, Oct 1, 2020 at 11:17 AM Ralph Campbell <rcampbell@nvidia.com> wrote:
>
> This is still an RFC because after looking at the pmem/dax code some
> more, I realized that the ZONE_DEVICE struct pages are being inserted
> into the process' page tables with vmf_insert_mixed() and a zero
> refcount on the ZONE_DEVICE struct page. This is sort of OK because
> insert_pfn() increments the reference count on the pgmap which is what
> prevents memunmap_pages() from freeing the struct pages and it doesn't
> check for a non-zero struct page reference count.
> But, any calls to get_page() will hit the VM_BUG_ON_PAGE() that
> checks for a reference count = 0.
>
> // mmap() an ext4 file that is mounted -o dax.
> ext4_dax_fault()
>   ext4_dax_huge_fault()
>     dax_iomap_fault(&ext4_iomap_ops)
>       dax_iomap_pte_fault()
>         ops->iomap_begin() // ext4_iomap_begin()
>           ext4_map_blocks()
>           ext4_set_iomap()
>         dax_iomap_pfn()
>         dax_insert_entry()
>         vmf_insert_mixed(pfn)
>           __vm_insert_mixed()
>             if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) &&
>                 !pfn_t_devmap(pfn) && pfn_t_valid(pfn))
>               insert_page()
>                 get_page(page) // XXX would trigger VM_BUG_ON_PAGE()
>                 page_add_file_rmap()
>                 set_pte_at()
>             else
>               insert_pfn()
>                 pte_mkdevmap()
>                 set_pte_at()
>
> Should pmem set the page reference count to one before inserting the
> pfn into the page tables (and decrement when removing devmap PTEs)?
> What about MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_PCI_P2PDMA use cases?
> Where should they icrement/decrement the page reference count?
> I don't know enough about how these are used to really know what to
> do at this point. If people want me to continue to work on this series,
> I will need some guidance.

fs/dax could take the reference when inserting, but that would mean
that ext4 and xfs would need to go back to checking for 1 to be page
idle. I think that's ok because the filesystem is actually not
checking for page-idle it's checking for "get_user_pages()" idle.

  parent reply	other threads:[~2020-10-05 18:24 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-01 18:17 [RFC PATCH v3 0/2] mm: remove extra ZONE_DEVICE struct page refcount Ralph Campbell
2020-10-01 18:17 ` Ralph Campbell
2020-10-01 18:17 ` Ralph Campbell
2020-10-01 18:17 ` [RFC PATCH v3 1/2] ext4/xfs: add page refcount helper Ralph Campbell
2020-10-01 18:17   ` Ralph Campbell
2020-10-01 18:17   ` Ralph Campbell
2020-10-02  5:56   ` Christoph Hellwig
2020-10-02  5:56     ` Christoph Hellwig
2020-10-05 16:29     ` Ralph Campbell
2020-10-05 16:29       ` Ralph Campbell
2020-10-05 16:29       ` Ralph Campbell
2020-10-01 18:17 ` [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount Ralph Campbell
2020-10-01 18:17   ` Ralph Campbell
2020-10-01 18:17   ` Ralph Campbell
2020-10-02  5:59   ` Christoph Hellwig
2020-10-02  5:59     ` Christoph Hellwig
2020-10-05 16:40     ` Ralph Campbell
2020-10-05 16:40       ` Ralph Campbell
2020-10-05 16:40       ` Ralph Campbell
2020-10-08  5:17   ` Ram Pai
2020-10-08  5:17     ` Ram Pai
2020-10-08 16:45     ` Ralph Campbell
2020-10-08 16:45       ` Ralph Campbell
2020-10-08 16:45       ` Ralph Campbell
2020-10-05 16:56 ` [RFC PATCH v3 0/2] " Matthew Wilcox
2020-10-05 16:56   ` Matthew Wilcox
2020-10-05 18:24 ` Dan Williams [this message]
2020-10-05 18:24   ` Dan Williams
2020-10-05 18:24   ` Dan Williams
2021-05-13 13:15 ` Matthew Wilcox
2021-05-13 13:15   ` Matthew Wilcox
2021-05-13 13:15   ` [Nouveau] " Matthew Wilcox
2021-05-13 17:35   ` Ralph Campbell
2021-05-13 17:35     ` Ralph Campbell
2021-05-13 17:35     ` [Nouveau] " Ralph Campbell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4gu=So5PgQU9LezhW4vUQt+paaUr1T6CAvQYjh0XzkkgQ@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=bharata@linux.ibm.com \
    --cc=bskeggs@redhat.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nouveau@lists.freedesktop.org \
    --cc=paulus@ozlabs.org \
    --cc=rcampbell@nvidia.com \
    --cc=willy@infradead.org \
    --cc=yang.shi@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.