From: "Hellstrom, Thomas" <thomas.hellstrom@intel.com>
To: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Zeng,  Oak" <oak.zeng@intel.com>
Cc: "Brost, Matthew" <matthew.brost@intel.com>,
	"Welty, Brian" <brian.welty@intel.com>,
	"airlied@gmail.com" <airlied@gmail.com>,
	"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
Subject: Re: [PATCH 4/5] drm/xe: Helper to populate a userptr or hmmptr
Date: Mon, 18 Mar 2024 11:53:54 +0000	[thread overview]
Message-ID: <8931f427c533ecbc3db202b54364572e86a2d1fc.camel@intel.com> (raw)
In-Reply-To: <20240314033553.1379444-5-oak.zeng@intel.com>

Hi, Oak.


On Wed, 2024-03-13 at 23:35 -0400, Oak Zeng wrote:
> Add a helper function xe_hmm_populate_range to populate
> a userptr or hmmptr range. This function calls hmm_range_fault
> to read CPU page tables and populate all pfns/pages of this
> virtual address range.
> 
> If the populated page is a system memory page, dma-mapping is
> performed to get a dma-address which can be used later for the GPU
> to access the pages.
> 
> If the populated page is a device private page, we calculate the
> dpa (device physical address) of the page.
> 
> The dma-address or dpa is then saved in the userptr's sg table. This
> is preparation work to replace the get_user_pages_fast code in the
> userptr code path. The helper function will also be used to populate
> hmmptr later.
> 
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> Cc: Brian Welty <brian.welty@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile |   3 +-
>  drivers/gpu/drm/xe/xe_hmm.c | 213 ++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_hmm.h |  12 ++
>  3 files changed, 227 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
>  create mode 100644 drivers/gpu/drm/xe/xe_hmm.h

I mostly agree with Matt's review comments on this patch. Some
additional comments below.

> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 840467080e59..29dcbc938b01 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -143,7 +143,8 @@ xe-y += xe_bb.o \
>  	xe_wait_user_fence.o \
>  	xe_wa.o \
>  	xe_wopcm.o \
> -	xe_svm_devmem.o
> +	xe_svm_devmem.o \
> +	xe_hmm.o
>  
>  # graphics hardware monitoring (HWMON) support
>  xe-$(CONFIG_HWMON) += xe_hwmon.o
> diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
> new file mode 100644
> index 000000000000..c45c2447d386
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_hmm.c
> @@ -0,0 +1,213 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#include <linux/mmu_notifier.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/memremap.h>
> +#include <linux/swap.h>
> +#include <linux/mm.h>
> +#include "xe_hmm.h"
> +#include "xe_vm.h"
> +
> +/**
> + * mark_range_accessed() - mark a range as accessed, so the core mm
> + * has this information for memory eviction or writeback to
> + * hard disk
> + *
> + * @range: the range to mark
> + * @write: if we write to this range, we mark pages in this range
> + * as dirty
> + */
> +static void mark_range_accessed(struct hmm_range *range, bool write)

Some of the static function names aren't unique enough to avoid a
clash with a future core function of the same name. Please consider
using an xe_ prefix in such cases. It will also make backtraces
easier to follow.
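
For example (just an untested sketch of the renamed prototypes; the
exact names are only a suggestion, the bodies stay the same):

	static void xe_mark_range_accessed(struct hmm_range *range, bool write);
	static int xe_build_sg(struct xe_device *xe, struct hmm_range *range,
			       struct sg_table *st, bool write);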


> +{
> +	struct page *page;
> +	u64 i, npages;
> +
> +	npages = ((range->end - 1) >> PAGE_SHIFT) - (range->start >> PAGE_SHIFT) + 1;
> +	for (i = 0; i < npages; i++) {
> +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> +		if (write) {
> +			lock_page(page);
> +			set_page_dirty(page);
> +			unlock_page(page);

Could be using set_page_dirty_lock() here.
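
I.e., assuming there is no other reason to hold the page lock across
the call, something like this (untested sketch):

	if (write)
		set_page_dirty_lock(page);
	mark_page_accessed(page);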

/Thomas


> +		}
> +		mark_page_accessed(page);
> +	}
> +}
> +
> +/**
> + * build_sg() - build a scatter gather table for all the physical
> + * pages/pfn in a hmm_range. The dma-address is saved in the sg table
> + * and will be used to program the GPU page table later.
> + *
> + * @xe: the xe device who will access the dma-address in the sg table
> + * @range: the hmm range that we build the sg table from. range->hmm_pfns[]
> + * has the pfn numbers of the pages that back up this hmm address range.
> + * @st: pointer to the sg table.
> + * @write: whether we write to this range. This decides the dma map
> + * direction for system pages. If write we map it bi-directional;
> + * otherwise DMA_TO_DEVICE
> + *
> + * All the contiguous pfns will be collapsed into one entry in
> + * the scatter gather table. This is for the convenience of
> + * later operations to bind the address range to the GPU page table.
> + *
> + * The dma_address in the sg table will later be used by the GPU to
> + * access memory. So if the memory is system memory, we need to
> + * do a dma-mapping so it can be accessed by the GPU/DMA. If the memory
> + * is GPU local memory (of the GPU who is going to access it),
> + * we need the gpu dpa (device physical address), and there is no need
> + * for dma-mapping.
> + *
> + * FIXME: dma-mapping for a peer gpu device to access a remote gpu's
> + * memory. Add this when you support p2p.
> + *
> + * This function allocates the storage of the sg table. It is the
> + * caller's responsibility to free it by calling sg_free_table.
> + *
> + * Returns 0 if successful; -ENOMEM if it fails to allocate memory
> + */
> +static int build_sg(struct xe_device *xe, struct hmm_range *range,
> +			     struct sg_table *st, bool write)
> +{
> +	struct device *dev = xe->drm.dev;
> +	struct scatterlist *sg;
> +	u64 i, npages;
> +
> +	sg = NULL;
> +	st->nents = 0;
> +	npages = ((range->end - 1) >> PAGE_SHIFT) - (range->start >> PAGE_SHIFT) + 1;
> +
> +	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
> +		return -ENOMEM;
> +
> +	for (i = 0; i < npages; i++) {
> +		struct page *page;
> +		unsigned long addr;
> +		struct xe_mem_region *mr;
> +
> +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> +		if (is_device_private_page(page)) {
> +			mr = page_to_mem_region(page);
> +			addr = vram_pfn_to_dpa(mr, range->hmm_pfns[i]);
> +		} else {
> +			addr = dma_map_page(dev, page, 0, PAGE_SIZE,
> +					write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE);
> +		}
> +
> +		if (sg && (addr == (sg_dma_address(sg) + sg->length))) {
> +			sg->length += PAGE_SIZE;
> +			sg_dma_len(sg) += PAGE_SIZE;
> +			continue;
> +		}
> +
> +		sg = sg ? sg_next(sg) : st->sgl;
> +		sg_dma_address(sg) = addr;
> +		sg_dma_len(sg) = PAGE_SIZE;
> +		sg->length = PAGE_SIZE;
> +		st->nents++;
> +	}
> +
> +	sg_mark_end(sg);
> +	return 0;
> +}
> +
> +/**
> + * xe_hmm_populate_range() - Populate physical pages of a virtual
> + * address range
> + *
> + * @vma: vma has information of the range to populate. Only vmas
> + * of userptr and hmmptr type can be populated.
> + * @hmm_range: pointer to hmm_range struct. hmm_range->hmm_pfns
> + * will hold the populated pfns.
> + * @write: populate pages with write permission
> + *
> + * This function populates the physical pages of a virtual
> + * address range. The populated physical pages are saved in the
> + * userptr's sg table. It is similar to get_user_pages but calls
> + * hmm_range_fault.
> + *
> + * This function also reads the mmu notifier sequence number
> + * (mmu_interval_read_begin), for the purpose of later
> + * comparison (through mmu_interval_read_retry).
> + *
> + * This must be called with the mmap read or write lock held.
> + *
> + * This function allocates the storage of the userptr sg table.
> + * It is the caller's responsibility to free it by calling
> + * sg_free_table.
> + *
> + * Returns: 0 for success; a negative error code on failure
> + */
> +int xe_hmm_populate_range(struct xe_vma *vma, struct hmm_range *hmm_range,
> +						bool write)
> +{
> +	unsigned long timeout =
> +		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> +	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
> +	struct xe_userptr_vma *userptr_vma;
> +	struct xe_userptr *userptr;
> +	u64 start = vma->gpuva.va.addr;
> +	u64 end = start + vma->gpuva.va.range;
> +	struct xe_vm *vm = xe_vma_vm(vma);
> +	u64 npages;
> +	int ret;
> +
> +	userptr_vma = to_userptr_vma(vma);
> +	userptr = &userptr_vma->userptr;
> +	mmap_assert_locked(userptr->notifier.mm);
> +
> +	npages = ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
> +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
> +	if (unlikely(!pfns))
> +		return -ENOMEM;
> +
> +	if (write)
> +		flags |= HMM_PFN_REQ_WRITE;
> +
> +	memset64((u64 *)pfns, (u64)flags, npages);
> +	hmm_range->hmm_pfns = pfns;
> +	hmm_range->notifier_seq = mmu_interval_read_begin(&userptr->notifier);
> +	hmm_range->notifier = &userptr->notifier;
> +	hmm_range->start = start;
> +	hmm_range->end = end;
> +	hmm_range->pfn_flags_mask = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE;
> +	/**
> +	 * FIXME:
> +	 * Setting dev_private_owner can prevent hmm_range_fault from
> +	 * faulting in the device private pages owned by the caller. See
> +	 * function hmm_vma_handle_pte. In the multiple GPU case, this
> +	 * should be set to the device owner of the best migration
> +	 * destination. E.g., if device0/vm0 has a page fault, but we
> +	 * have determined that the best placement of the fault address
> +	 * is on device1, we should set the below to device1 instead of
> +	 * device0.
> +	 */
> +	hmm_range->dev_private_owner = vm->xe->drm.dev;
> +
> +	while (true) {
> +		ret = hmm_range_fault(hmm_range);
> +		if (time_after(jiffies, timeout))
> +			break;
> +
> +		if (ret == -EBUSY)
> +			continue;
> +		break;
> +	}
> +
> +	if (ret)
> +		goto free_pfns;
> +
> +	ret = build_sg(vm->xe, hmm_range, &userptr->sgt, write);
> +	if (ret)
> +		goto free_pfns;
> +
> +	mark_range_accessed(hmm_range, write);
> +	userptr->sg = &userptr->sgt;
> +	userptr->notifier_seq = hmm_range->notifier_seq;
> +
> +free_pfns:
> +	kvfree(pfns);
> +	return ret;
> +}
> +
> diff --git a/drivers/gpu/drm/xe/xe_hmm.h b/drivers/gpu/drm/xe/xe_hmm.h
> new file mode 100644
> index 000000000000..960f3f6d36ae
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_hmm.h
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#include <linux/types.h>
> +#include <linux/hmm.h>
> +#include "xe_vm_types.h"
> +#include "xe_svm.h"
> +
> +int xe_hmm_populate_range(struct xe_vma *vma, struct hmm_range *hmm_range,
> +						bool write);

