linux-xfs.vger.kernel.org archive mirror
* [PATCH v4 00/13] Support DEVICE_GENERIC memory in migrate_vma_*
@ 2021-07-17 19:21 Alex Sierra
  2021-07-17 19:21 ` [PATCH v4 01/13] ext4/xfs: add page refcount helper Alex Sierra
                   ` (12 more replies)
  0 siblings, 13 replies; 23+ messages in thread
From: Alex Sierra @ 2021-07-17 19:21 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse

v1:
AMD is building a system architecture for the Frontier supercomputer with a
coherent interconnect between CPUs and GPUs. This hardware architecture allows
the CPUs to coherently access GPU device memory. We have hardware in our labs
and we are working with our partner HPE on the BIOS, firmware and software
for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver looks
it up with lookup_resource and registers it with devmap as MEMORY_DEVICE_GENERIC
using devm_memremap_pages.
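
As a rough illustration of the lookup-and-register flow described above, here
is a small userspace C model. It deliberately avoids kernel headers:
mock_resource, mock_pagemap, mock_lookup and mock_register_generic are
hypothetical stand-ins for struct resource, struct dev_pagemap,
lookup_resource and devm_memremap_pages, and the addresses in the table are
made up. It is a sketch of the idea, not the amdgpu code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
enum mock_memory_type { MOCK_DEVICE_PRIVATE = 1, MOCK_DEVICE_GENERIC };

struct mock_resource {
    uint64_t start;
    uint64_t end;            /* inclusive end address */
    const char *name;
};

struct mock_pagemap {
    enum mock_memory_type type;
    uint64_t range_start;
    uint64_t range_end;
};

/* Made-up address map with one special purpose memory (SPM) entry. */
static const struct mock_resource mock_map[] = {
    { 0x000100000ULL, 0x07fffffffULL, "System RAM" },
    { 0x100000000ULL, 0x1ffffffffULL, "SPM" },
};

/* Stand-in for the lookup step: match an entry by its start address. */
const struct mock_resource *mock_lookup(uint64_t start)
{
    for (size_t i = 0; i < sizeof(mock_map) / sizeof(mock_map[0]); i++)
        if (mock_map[i].start == start)
            return &mock_map[i];
    return NULL;
}

/*
 * Stand-in for the registration step: describe the looked-up range as
 * generic device memory. Returns 0 on success, -1 if no entry starts at
 * vram_base.
 */
int mock_register_generic(uint64_t vram_base, struct mock_pagemap *pgmap)
{
    const struct mock_resource *res = mock_lookup(vram_base);

    if (!res)
        return -1;
    pgmap->type = MOCK_DEVICE_GENERIC;
    pgmap->range_start = res->start;
    pgmap->range_end = res->end;
    return 0;
}
```

In this model, calling mock_register_generic(0x100000000ULL, &pgmap) fills
pgmap with the SPM range and the generic type, while an address that starts
no entry in the table is rejected.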

Now we're trying to migrate data to and from that memory using the migrate_vma_*
helpers so we can support page-based migration in our unified memory allocations,
while also supporting CPU access to those pages.

This patch series makes a few changes to make MEMORY_DEVICE_GENERIC pages behave
correctly in the migrate_vma_* helpers. We are looking for feedback about this
approach. If we're close, what's needed to make our patches acceptable upstream?
If we're not close, any suggestions how else to achieve what we are trying to do
(i.e. page migration and coherent CPU access to VRAM)?

This work is based on HMM and our SVM memory manager that was recently upstreamed
to Dave Airlie's drm-next branch
https://cgit.freedesktop.org/drm/drm/log/?h=drm-next
On top of that we did some rework of our VRAM management for migrations to remove
some incorrect assumptions, allow partially successful migrations and GPU memory
mappings that mix pages in VRAM and system memory.
https://lore.kernel.org/dri-devel/20210527205606.2660-6-Felix.Kuehling@amd.com/T/#r996356015e295780eb50453e7dbd5d0d68b47cbc

v2:
This version merges Ralph Campbell's "[RFC PATCH v3 0/2]
mm: remove extra ZONE_DEVICE struct page refcount" patch series. Our changes
to support the device generic type in the migrate_vma_* helpers are applied
on top of that series.
This has been tested on systems whose device memory is coherently
accessible by the CPU.

It also addresses the following feedback on v1:
- Isolate the kernel/resource.c modification in one patch, based
on Christoph's feedback.
- Add helpers to check for the generic and private types, to avoid
duplicated long lines.
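
To illustrate why such helpers shorten call sites, here is a minimal userspace
C sketch. The sketch_* types and function names are hypothetical and are not
the helpers added by this series; they only model the idea of wrapping the
per-type checks in small predicates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel definitions. */
enum sketch_memory_type {
    SKETCH_DEVICE_PRIVATE = 1,
    SKETCH_DEVICE_GENERIC,
};

struct sketch_pagemap {
    enum sketch_memory_type type;
};

struct sketch_page {
    /* NULL models an ordinary system memory page. */
    const struct sketch_pagemap *pgmap;
};

/* True only for pages backed by private device memory. */
bool sketch_is_device_private(const struct sketch_page *page)
{
    return page->pgmap && page->pgmap->type == SKETCH_DEVICE_PRIVATE;
}

/* True only for pages backed by generic device memory. */
bool sketch_is_device_generic(const struct sketch_page *page)
{
    return page->pgmap && page->pgmap->type == SKETCH_DEVICE_GENERIC;
}

/* One short call replaces a repeated two-part condition at call sites. */
bool sketch_is_device_private_or_generic(const struct sketch_page *page)
{
    return sketch_is_device_private(page) || sketch_is_device_generic(page);
}
```

A caller can then write a single predicate call instead of repeating the
pointer check and both type comparisons on one long line.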

v3:
- Include cover letter from v1.
- Rename the dax_layout_is_idle_page function to dax_page_unused in patch
"ext4/xfs: add page refcount helper".

v4:
- Add support for the zone device generic type in lib/test_hmm and
tools/testing/selftests/vm/hmm-tests.
- Add missing page refcount helper to fuse/dax.c. This was included in
one of Ralph Campbell's patches.

Patches 1-2 are rebased versions of Ralph Campbell's ZONE_DEVICE page
refcounting patches.

Patches 4-5 are for context to show how we are looking up the SPM 
memory and registering it with devmap.

Patches 3 and 6-8 are the changes we are trying to upstream or rework to
make them acceptable upstream.

Patches 9-13 add ZONE_DEVICE Generic type support into the hmm test.

Alex Sierra (11):
  kernel: resource: lookup_resource as exported symbol
  drm/amdkfd: add SPM support for SVM
  drm/amdkfd: generic type as sys mem on migration to ram
  include/linux/mm.h: helpers to check zone device generic type
  mm: add generic type support to migrate_vma helpers
  mm: call pgmap->ops->page_free for DEVICE_GENERIC pages
  lib: test_hmm add ioctl to get zone device type
  lib: test_hmm add module param for zone device type
  lib: add support for device generic type in test_hmm
  tools: update hmm-test to support device generic type
  tools: update test_hmm script to support SP config

Ralph Campbell (2):
  ext4/xfs: add page refcount helper
  mm: remove extra ZONE_DEVICE struct page refcount

 arch/powerpc/kvm/book3s_hv_uvmem.c       |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  20 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c   |   2 +-
 fs/dax.c                                 |   8 +-
 fs/ext4/inode.c                          |   5 +-
 fs/fuse/dax.c                            |   4 +-
 fs/xfs/xfs_file.c                        |   4 +-
 include/linux/dax.h                      |  10 +
 include/linux/memremap.h                 |   7 +-
 include/linux/mm.h                       |  52 +----
 kernel/resource.c                        |   2 +-
 lib/test_hmm.c                           | 230 +++++++++++++++--------
 lib/test_hmm_uapi.h                      |  16 ++
 mm/internal.h                            |   8 +
 mm/memremap.c                            |  69 ++-----
 mm/migrate.c                             |  25 +--
 mm/page_alloc.c                          |   3 +
 mm/swap.c                                |  45 +----
 tools/testing/selftests/vm/hmm-tests.c   | 142 ++++++++++++--
 tools/testing/selftests/vm/test_hmm.sh   |  20 +-
 20 files changed, 402 insertions(+), 272 deletions(-)

-- 
2.32.0



end of thread, other threads:[~2021-07-30 19:11 UTC | newest]

Thread overview: 23+ messages
2021-07-17 19:21 [PATCH v4 00/13] Support DEVICE_GENERIC memory in migrate_vma_* Alex Sierra
2021-07-17 19:21 ` [PATCH v4 01/13] ext4/xfs: add page refcount helper Alex Sierra
2021-07-17 19:21 ` [PATCH v4 02/13] mm: remove extra ZONE_DEVICE struct page refcount Alex Sierra
2021-07-17 19:21 ` [PATCH v4 03/13] kernel: resource: lookup_resource as exported symbol Alex Sierra
2021-07-19  9:16   ` Christoph Hellwig
2021-07-17 19:21 ` [PATCH v4 04/13] drm/amdkfd: add SPM support for SVM Alex Sierra
2021-07-17 19:21 ` [PATCH v4 05/13] drm/amdkfd: generic type as sys mem on migration to ram Alex Sierra
2021-07-19  9:17   ` Christoph Hellwig
2021-07-17 19:21 ` [PATCH v4 06/13] include/linux/mm.h: helpers to check zone device generic type Alex Sierra
2021-07-19 20:47   ` Zeng, Oak
2021-07-17 19:21 ` [PATCH v4 07/13] mm: add generic type support to migrate_vma helpers Alex Sierra
2021-07-17 19:21 ` [PATCH v4 08/13] mm: call pgmap->ops->page_free for DEVICE_GENERIC pages Alex Sierra
2021-07-19  9:18   ` Christoph Hellwig
2021-07-17 19:21 ` [PATCH v4 09/13] lib: test_hmm add ioctl to get zone device type Alex Sierra
2021-07-17 19:21 ` [PATCH v4 10/13] lib: test_hmm add module param for " Alex Sierra
2021-07-22 12:23   ` Jason Gunthorpe
2021-07-22 16:59     ` Sierra Guiza, Alejandro (Alex)
2021-07-22 17:26       ` Jason Gunthorpe
2021-07-28 23:45         ` Sierra Guiza, Alejandro (Alex)
2021-07-30 19:11           ` Felix Kuehling
2021-07-17 19:21 ` [PATCH v4 11/13] lib: add support for device generic type in test_hmm Alex Sierra
2021-07-17 19:21 ` [PATCH v4 12/13] tools: update hmm-test to support device generic type Alex Sierra
2021-07-17 19:21 ` [PATCH v4 13/13] tools: update test_hmm script to support SP config Alex Sierra
