From: Felix Kuehling <felix.kuehling@amd.com>
To: Ralph Campbell <rcampbell@nvidia.com>,
Alex Sierra <alex.sierra@amd.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, "Yang,
Philip" <Philip.Yang@amd.com>
Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
hch@lst.de, jgg@nvidia.com, jglisse@redhat.com
Subject: Re: [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount
Date: Mon, 28 Jun 2021 12:46:59 -0400 [thread overview]
Message-ID: <cecd2164-ebfd-382b-af73-992cdc4304b7@amd.com> (raw)
In-Reply-To: <7163dbb6-67b5-6eef-5772-500fd2107e5c@nvidia.com>
Am 2021-06-17 um 3:16 p.m. schrieb Ralph Campbell:
>
> On 6/17/21 8:16 AM, Alex Sierra wrote:
>> From: Ralph Campbell <rcampbell@nvidia.com>
>>
>> ZONE_DEVICE struct pages have an extra reference count that
>> complicates the
>> code for put_page() and several places in the kernel that need to
>> check the
>> reference count to see that a page is not being used (gup, compaction,
>> migration, etc.). Clean up the code so the reference count doesn't
>> need to
>> be treated specially for ZONE_DEVICE.
>>
>> v2:
>> AS: merged this patch in linux 5.11 version
>>
>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>> ---
>> arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +-
>> drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +-
>> fs/dax.c | 4 +-
>> include/linux/dax.h | 2 +-
>> include/linux/memremap.h | 7 +--
>> include/linux/mm.h | 44 -----------------
>> lib/test_hmm.c | 2 +-
>> mm/internal.h | 8 +++
>> mm/memremap.c | 68 +++++++-------------------
>> mm/migrate.c | 5 --
>> mm/page_alloc.c | 3 ++
>> mm/swap.c | 45 ++---------------
>> 12 files changed, 45 insertions(+), 147 deletions(-)
>>
> I think it is great that you are picking this up and trying to revive it.
>
> However, I have a number of concerns about how it affects existing
> ZONE_DEVICE
> MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_FS_DAX users and I don't see this
> addressing them. For example, dev_dax_probe() allocates
> MEMORY_DEVICE_GENERIC
> struct pages and then:
> dev_dax_fault()
> dev_dax_huge_fault()
> __dev_dax_pte_fault()
> vmf_insert_mixed()
> which just inserts the PFN into the CPU page tables without increasing
> the page
> refcount so it is zero (whereas it was one before). But using
> get_page() will
> trigger VM_BUG_ON_PAGE() if it is enabled. There isn't any current
> notion of
> free verses allocated for these struct pages. I suppose init_page_count()
> could be called on all the struct pages in dev_dax_probe() to fix that
> though.
Hi Ralph,
For DEVICE_GENERIC pages free_zone_device_page doesn't do anything. So
I'm not sure what the reference counting is good for in this case.
Alex is going to add free_zone_device_page support for DEVICE_GENERIC
pages (patch 8 of this series). However, even then, it only does
anything if there is an actual call to put_page. Where would that call
come from in the dev_dax driver case?
>
> I'm even less clear about how to fix MEMORY_DEVICE_FS_DAX. File
> systems have clear
> allocate and free states for backing storage but there are the
> complications with
> the page cache references, etc. to consider. The >1 to 1 reference
> count seems to
> be used to tell when a page is idle (no I/O, reclaim scanners) rather
> than free
> (not allocated to any file) but I'm not 100% sure about that since I
> don't really
> understand all the issues around why a file system needs to have a DAX
> mount option
> besides knowing that the storage block size has to be a multiple of
> the page size.
The only thing that happens in free_zone_device_page for FS_DAX pages is
wake_up_var(&page->_refcount). I guess, whoever is waiting for this
wake-up will need to be prepared to see a refcount 0 instead of 1 now. I
see these callers that would need to be updated:
./fs/ext4/inode.c: error = ___wait_var_event(&page->_refcount,
./fs/ext4/inode.c- atomic_read(&page->_refcount) == 1,
./fs/ext4/inode.c- TASK_INTERRUPTIBLE, 0, 0,
./fs/ext4/inode.c- ext4_wait_dax_page(ei));
--
./fs/fuse/dax.c: return ___wait_var_event(&page->_refcount,
./fs/fuse/dax.c- atomic_read(&page->_refcount) == 1,
TASK_INTERRUPTIBLE,
./fs/fuse/dax.c- 0, 0, fuse_wait_dax_page(inode));
--
./fs/xfs/xfs_file.c: return ___wait_var_event(&page->_refcount,
./fs/xfs/xfs_file.c- atomic_read(&page->_refcount) == 1,
TASK_INTERRUPTIBLE,
./fs/xfs/xfs_file.c- 0, 0, xfs_wait_dax_page(inode));
Regarding your page-cache comment, doesn't DAX mean that those file
pages are not in the page cache?
Regards,
Felix
next prev parent reply other threads:[~2021-06-28 16:47 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-17 15:16 [PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_* Alex Sierra
2021-06-17 15:16 ` [PATCH v3 1/8] ext4/xfs: add page refcount helper Alex Sierra
2021-06-17 16:11 ` Darrick J. Wong
2021-06-17 23:52 ` Dave Chinner
2021-06-19 16:14 ` Sierra Guiza, Alejandro (Alex)
2021-06-19 18:31 ` Alex Sierra
2021-06-17 15:16 ` [PATCH v3 2/8] mm: remove extra ZONE_DEVICE struct page refcount Alex Sierra
2021-06-17 19:16 ` Ralph Campbell
2021-06-28 16:46 ` Felix Kuehling [this message]
2021-06-30 0:23 ` Ralph Campbell
2021-06-17 15:17 ` [PATCH v3 3/8] kernel: resource: lookup_resource as exported symbol Alex Sierra
2021-06-17 15:17 ` [PATCH v3 4/8] drm/amdkfd: add SPM support for SVM Alex Sierra
2021-06-17 15:17 ` [PATCH v3 5/8] drm/amdkfd: generic type as sys mem on migration to ram Alex Sierra
2021-06-17 15:17 ` [PATCH v3 6/8] include/linux/mm.h: helpers to check zone device generic type Alex Sierra
2021-06-17 15:17 ` [PATCH v3 7/8] mm: add generic type support to migrate_vma helpers Alex Sierra
2021-06-17 15:17 ` [PATCH v3 8/8] mm: call pgmap->ops->page_free for DEVICE_GENERIC pages Alex Sierra
2021-06-17 15:56 ` [PATCH v3 0/8] Support DEVICE_GENERIC memory in migrate_vma_* Sierra Guiza, Alejandro (Alex)
2021-06-20 14:14 ` Theodore Ts'o
2021-06-23 21:49 ` Felix Kuehling
2021-06-24 5:30 ` Christoph Hellwig
2021-06-24 15:08 ` Felix Kuehling
2021-07-16 15:07 ` Theodore Y. Ts'o
2021-07-16 22:14 ` Felix Kuehling
2021-07-17 19:54 ` Sierra Guiza, Alejandro (Alex)
2021-07-23 22:46 ` Sierra Guiza, Alejandro (Alex)
2021-07-30 19:02 ` Felix Kuehling
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cecd2164-ebfd-382b-af73-992cdc4304b7@amd.com \
--to=felix.kuehling@amd.com \
--cc=Philip.Yang@amd.com \
--cc=akpm@linux-foundation.org \
--cc=alex.sierra@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=hch@lst.de \
--cc=jgg@nvidia.com \
--cc=jglisse@redhat.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=rcampbell@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).