[PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers
@ 2023-08-09 12:17 Boris Brezillon
  2023-08-09 12:17 ` [PATCH 1/2] " Boris Brezillon
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Boris Brezillon @ 2023-08-09 12:17 UTC (permalink / raw)
  To: Joerg Roedel, iommu, Will Deacon, Robin Murphy, linux-arm-kernel
  Cc: Rob Clark, Boris Brezillon

Hello,

This patchset is an attempt at making page table allocation
customizable. This is useful to some GPU drivers for various reasons:

- speed-up upcoming page table allocations by managing a pool of free
  pages
- batch page table allocation instead of allocating one page at a time
- pre-reserve pages for page tables needed for map/unmap operations and
  return the unused page tables to some pool

The first and last reasons are particularly important for GPU drivers
wanting to implement asynchronous VM_BIND. Asynchronous VM_BIND requires
that any page table needed for a map/unmap operation to succeed be
allocated at VM_BIND job creation time. At the time of the job creation,
we don't know what the VM will look like when we get to execute the
map/unmap, and can't guess how many page tables we will need. Because
of that, we have to over-provision page tables for the worst case
scenario (page table tree is empty), which means we will allocate/free
a lot. Having pool a pool of free pages is crucial if we want to
speed-up VM_BIND requests.

A real example of how such custom allocators can be used is available
here[1]. v2 of the Panthor driver is approaching submission, and I
figured I'd try to upstream the dependencies separately, which is
why I submit this series now, even though the user of this new API
will come afterwards. If you'd prefer to have those patches submitted
along with the Panthor driver, let me know.

This approach has been discussed with Robin, and is hopefully not too
far from what he had in mind.

Regards,

Boris

[1]https://gitlab.freedesktop.org/panfrost/linux/-/blob/panthor/drivers/gpu/drm/panthor/panthor_mmu.c#L441

Boris Brezillon (2):
  iommu: Allow passing custom allocators to pgtable drivers
  iommu: Extend LPAE page table format to support custom allocators

 drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
 drivers/iommu/io-pgtable.c     | 31 +++++++++++++++++++++
 include/linux/io-pgtable.h     | 21 ++++++++++++++
 3 files changed, 86 insertions(+), 16 deletions(-)

-- 
2.41.0

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] iommu: Allow passing custom allocators to pgtable drivers
  2023-08-09 12:17 [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Boris Brezillon
@ 2023-08-09 12:17 ` Boris Brezillon
  2023-08-09 12:17 ` [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators Boris Brezillon
  2023-09-20 13:12 ` [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Steven Price
  2 siblings, 0 replies; 13+ messages in thread
From: Boris Brezillon @ 2023-08-09 12:17 UTC (permalink / raw)
  To: Joerg Roedel, iommu, Will Deacon, Robin Murphy, linux-arm-kernel
  Cc: Rob Clark, Boris Brezillon

This will be useful for GPU drivers who want to keep page tables in a
pool so they can:

- keep freed page tables in a free pool and speed-up upcoming page
  table allocations
- batch page table allocation instead of allocating one page at a time
- pre-reserve pages for page tables needed for map/unmap operations,
  to ensure map/unmap operations don't try to allocate memory in paths
  they're allowed to block or fail

We will extend the Arm LPAE format to support custom allocators in a
separate commit.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/iommu/io-pgtable.c | 19 +++++++++++++++++++
 include/linux/io-pgtable.h | 21 +++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index b843fcd365d2..f4caf630638a 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -34,6 +34,22 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
 #endif
 };
 
+static int check_custom_allocator(enum io_pgtable_fmt fmt,
+				  struct io_pgtable_cfg *cfg)
+{
+	/* When passing a custom allocator, both the alloc and free
+	 * functions should be provided.
+	 */
+	if ((cfg->alloc != NULL) != (cfg->free != NULL))
+		return -EINVAL;
+
+	/* No custom allocator, no need to check the format. */
+	if (!cfg->alloc)
+		return 0;
+
+	return -EINVAL;
+}
+
 struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
 					    struct io_pgtable_cfg *cfg,
 					    void *cookie)
@@ -44,6 +60,9 @@ struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
 	if (fmt >= IO_PGTABLE_NUM_FMTS)
 		return NULL;
 
+	if (check_custom_allocator(fmt, cfg))
+		return NULL;
+
 	fns = io_pgtable_init_table[fmt];
 	if (!fns)
 		return NULL;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 1b7a44b35616..80d05dcb0f4c 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -100,6 +100,27 @@ struct io_pgtable_cfg {
 	const struct iommu_flush_ops	*tlb;
 	struct device			*iommu_dev;
 
+	/**
+	 * @alloc: Custom page allocator.
+	 *
+	 * Optional hook used to allocate page tables. If this function is NULL,
+	 * @free must be NULL too.
+	 *
+	 * Not all formats support custom page allocators. Before considering
+	 * passing a non-NULL value, make sure the chosen page format supports
+	 * this feature.
+	 */
+	void *(*alloc)(void *cookie, size_t size, gfp_t gfp);
+
+	/**
+	 * @free: Custom page de-allocator.
+	 *
+	 * Optional hook used to free page tables allocated with the @alloc
+	 * hook. Must be non-NULL if @alloc is not NULL, must be NULL
+	 * otherwise.
+	 */
+	void (*free)(void *cookie, void *pages, size_t size);
+
 	/* Low-level data specific to the table format */
 	union {
 		struct {
-- 
2.41.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators
  2023-08-09 12:17 [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Boris Brezillon
  2023-08-09 12:17 ` [PATCH 1/2] " Boris Brezillon
@ 2023-08-09 12:17 ` Boris Brezillon
  2023-08-09 14:47   ` Will Deacon
  2023-09-20 16:42   ` Robin Murphy
  2023-09-20 13:12 ` [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Steven Price
  2 siblings, 2 replies; 13+ messages in thread
From: Boris Brezillon @ 2023-08-09 12:17 UTC (permalink / raw)
  To: Joerg Roedel, iommu, Will Deacon, Robin Murphy, linux-arm-kernel
  Cc: Rob Clark, Boris Brezillon

We need that in order to implement the VM_BIND ioctl in the GPU driver
targeting new Mali GPUs.

VM_BIND is about executing MMU map/unmap requests asynchronously,
possibly after waiting for external dependencies encoded as dma_fences.
We intend to use the drm_sched framework to automate the dependency
tracking and VM job dequeuing logic, but this comes with its own set
of constraints, one of them being the fact we are not allowed to
allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
sort of deadlocks:

- VM_BIND map job needs to allocate a page table to map some memory
  to the VM. No memory available, so kswapd is kicked
- GPU driver shrinker backend ends up waiting on the fence attached to
  the VM map job or any other job fence depending on this VM operation.

With custom allocators, we will be able to pre-reserve enough pages to
guarantee the map/unmap operations we queued will take place without
going through the system allocator. But we can also optimize
allocation/reservation by not free-ing pages immediately, so any
upcoming page table allocation requests can be serviced by some free
page table pool kept at the driver level.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
 drivers/iommu/io-pgtable.c     | 12 ++++++++
 2 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 72dcdd468cf3..c5c04f0106f3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages)
 }
 
 static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
-				    struct io_pgtable_cfg *cfg)
+				    struct io_pgtable_cfg *cfg,
+				    void *cookie)
 {
 	struct device *dev = cfg->iommu_dev;
 	int order = get_order(size);
-	struct page *p;
 	dma_addr_t dma;
 	void *pages;
 
 	VM_BUG_ON((gfp & __GFP_HIGHMEM));
-	p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
-	if (!p)
+
+	if (cfg->alloc) {
+		pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO);
+	} else {
+		struct page *p;
+
+		p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
+		pages = p ? page_address(p) : NULL;
+	}
+
+	if (!pages)
 		return NULL;
 
-	pages = page_address(p);
 	if (!cfg->coherent_walk) {
 		dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, dma))
@@ -220,18 +228,28 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 out_unmap:
 	dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
 	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
+
 out_free:
-	__free_pages(p, order);
+	if (cfg->free)
+		cfg->free(cookie, pages, size);
+	else
+		free_pages((unsigned long)pages, order);
+
 	return NULL;
 }
 
 static void __arm_lpae_free_pages(void *pages, size_t size,
-				  struct io_pgtable_cfg *cfg)
+				  struct io_pgtable_cfg *cfg,
+				  void *cookie)
 {
 	if (!cfg->coherent_walk)
 		dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages),
 				 size, DMA_TO_DEVICE);
-	free_pages((unsigned long)pages, get_order(size));
+
+	if (cfg->free)
+		cfg->free(cookie, pages, size);
+	else
+		free_pages((unsigned long)pages, get_order(size));
 }
 
 static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
@@ -373,13 +391,13 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	/* Grab a pointer to the next level */
 	pte = READ_ONCE(*ptep);
 	if (!pte) {
-		cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg);
+		cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg, data->iop.cookie);
 		if (!cptep)
 			return -ENOMEM;
 
 		pte = arm_lpae_install_table(cptep, ptep, 0, data);
 		if (pte)
-			__arm_lpae_free_pages(cptep, tblsz, cfg);
+			__arm_lpae_free_pages(cptep, tblsz, cfg, data->iop.cookie);
 	} else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) {
 		__arm_lpae_sync_pte(ptep, 1, cfg);
 	}
@@ -524,7 +542,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
 	}
 
-	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
+	__arm_lpae_free_pages(start, table_size, &data->iop.cfg, data->iop.cookie);
 }
 
 static void arm_lpae_free_pgtable(struct io_pgtable *iop)
@@ -552,7 +570,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
 		return 0;
 
-	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg, data->iop.cookie);
 	if (!tablep)
 		return 0; /* Bytes unmapped */
 
@@ -575,7 +593,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 
 	pte = arm_lpae_install_table(tablep, ptep, blk_pte, data);
 	if (pte != blk_pte) {
-		__arm_lpae_free_pages(tablep, tablesz, cfg);
+		__arm_lpae_free_pages(tablep, tablesz, cfg, data->iop.cookie);
 		/*
 		 * We may race against someone unmapping another part of this
 		 * block, but anything else is invalid. We can't misinterpret
@@ -882,7 +900,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 
 	/* Looking good; allocate a pgd */
 	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
-					   GFP_KERNEL, cfg);
+					   GFP_KERNEL, cfg, cookie);
 	if (!data->pgd)
 		goto out_free_data;
 
@@ -984,7 +1002,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 
 	/* Allocate pgd pages */
 	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
-					   GFP_KERNEL, cfg);
+					   GFP_KERNEL, cfg, cookie);
 	if (!data->pgd)
 		goto out_free_data;
 
@@ -1059,7 +1077,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
 
 	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
-					   cfg);
+					   cfg, cookie);
 	if (!data->pgd)
 		goto out_free_data;
 
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index f4caf630638a..e273c18ae22b 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -47,6 +47,18 @@ static int check_custom_allocator(enum io_pgtable_fmt fmt,
 	if (!cfg->alloc)
 		return 0;
 
+	switch (fmt) {
+	case ARM_32_LPAE_S1:
+	case ARM_32_LPAE_S2:
+	case ARM_64_LPAE_S1:
+	case ARM_64_LPAE_S2:
+	case ARM_MALI_LPAE:
+		return 0;
+
+	default:
+		break;
+	}
+
 	return -EINVAL;
 }
 
-- 
2.41.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators
  2023-08-09 12:17 ` [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators Boris Brezillon
@ 2023-08-09 14:47   ` Will Deacon
  2023-08-09 15:10     ` Boris Brezillon
  2023-09-20 16:42   ` Robin Murphy
  1 sibling, 1 reply; 13+ messages in thread
From: Will Deacon @ 2023-08-09 14:47 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Joerg Roedel, iommu, Robin Murphy, linux-arm-kernel, Rob Clark

On Wed, Aug 09, 2023 at 02:17:44PM +0200, Boris Brezillon wrote:
> We need that in order to implement the VM_BIND ioctl in the GPU driver
> targeting new Mali GPUs.
> 
> VM_BIND is about executing MMU map/unmap requests asynchronously,
> possibly after waiting for external dependencies encoded as dma_fences.
> We intend to use the drm_sched framework to automate the dependency
> tracking and VM job dequeuing logic, but this comes with its own set
> of constraints, one of them being the fact we are not allowed to
> allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
> sort of deadlocks:
> 
> - VM_BIND map job needs to allocate a page table to map some memory
>   to the VM. No memory available, so kswapd is kicked
> - GPU driver shrinker backend ends up waiting on the fence attached to
>   the VM map job or any other job fence depending on this VM operation.
> 
> With custom allocators, we will be able to pre-reserve enough pages to
> guarantee the map/unmap operations we queued will take place without
> going through the system allocator. But we can also optimize
> allocation/reservation by not free-ing pages immediately, so any
> upcoming page table allocation requests can be serviced by some free
> page table pool kept at the driver level.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
>  drivers/iommu/io-pgtable.c     | 12 ++++++++
>  2 files changed, 46 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 72dcdd468cf3..c5c04f0106f3 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages)
>  }
>  
>  static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> -				    struct io_pgtable_cfg *cfg)
> +				    struct io_pgtable_cfg *cfg,
> +				    void *cookie)
>  {
>  	struct device *dev = cfg->iommu_dev;
>  	int order = get_order(size);
> -	struct page *p;
>  	dma_addr_t dma;
>  	void *pages;
>  
>  	VM_BUG_ON((gfp & __GFP_HIGHMEM));
> -	p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> -	if (!p)
> +
> +	if (cfg->alloc) {
> +		pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO);
> +	} else {
> +		struct page *p;
> +
> +		p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> +		pages = p ? page_address(p) : NULL;

Hmm, so the reason we pass the order is because the pgd may have
additional alignment requirements but that's not passed back to your new
allocation hook. Does it just return naturally aligned allocations?

If you don't care about the pgd (since it's not something that's going
to be freed and reallocated during the lifetime of the page-table), then
perhaps it would be cleaner to pass in a 'struct kmem_cache' for the
non-pgd pages when configuring the page-table, rather than override the
allocation function wholesale? I think that would also mean that all the
lowmem assumptions in the code (e.g. use of virt_to_phys()) would be a
little less fragile.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators
  2023-08-09 14:47   ` Will Deacon
@ 2023-08-09 15:10     ` Boris Brezillon
  2023-08-28 12:50       ` Boris Brezillon
  0 siblings, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2023-08-09 15:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Joerg Roedel, iommu, Robin Murphy, linux-arm-kernel, Rob Clark

On Wed, 9 Aug 2023 15:47:21 +0100
Will Deacon <will@kernel.org> wrote:

> On Wed, Aug 09, 2023 at 02:17:44PM +0200, Boris Brezillon wrote:
> > We need that in order to implement the VM_BIND ioctl in the GPU driver
> > targeting new Mali GPUs.
> > 
> > VM_BIND is about executing MMU map/unmap requests asynchronously,
> > possibly after waiting for external dependencies encoded as dma_fences.
> > We intend to use the drm_sched framework to automate the dependency
> > tracking and VM job dequeuing logic, but this comes with its own set
> > of constraints, one of them being the fact we are not allowed to
> > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
> > sort of deadlocks:
> > 
> > - VM_BIND map job needs to allocate a page table to map some memory
> >   to the VM. No memory available, so kswapd is kicked
> > - GPU driver shrinker backend ends up waiting on the fence attached to
> >   the VM map job or any other job fence depending on this VM operation.
> > 
> > With custom allocators, we will be able to pre-reserve enough pages to
> > guarantee the map/unmap operations we queued will take place without
> > going through the system allocator. But we can also optimize
> > allocation/reservation by not free-ing pages immediately, so any
> > upcoming page table allocation requests can be serviced by some free
> > page table pool kept at the driver level.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >  drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
> >  drivers/iommu/io-pgtable.c     | 12 ++++++++
> >  2 files changed, 46 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index 72dcdd468cf3..c5c04f0106f3 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages)
> >  }
> >  
> >  static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> > -				    struct io_pgtable_cfg *cfg)
> > +				    struct io_pgtable_cfg *cfg,
> > +				    void *cookie)
> >  {
> >  	struct device *dev = cfg->iommu_dev;
> >  	int order = get_order(size);
> > -	struct page *p;
> >  	dma_addr_t dma;
> >  	void *pages;
> >  
> >  	VM_BUG_ON((gfp & __GFP_HIGHMEM));
> > -	p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> > -	if (!p)
> > +
> > +	if (cfg->alloc) {
> > +		pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO);
> > +	} else {
> > +		struct page *p;
> > +
> > +		p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> > +		pages = p ? page_address(p) : NULL;  
> 
> Hmm, so the reason we pass the order is because the pgd may have
> additional alignment requirements but that's not passed back to your new
> allocation hook. Does it just return naturally aligned allocations?

So, the assumption was that the custom allocator should be aware of
these alignment constraints. Like, if size > min_granule_sz, then order
equal get_order(size).

Right now, I reject any size that's not SZ_4K, because, given the
VA-space and the pgtable config, I know I'll always end up with 4
levels and the pgd will fit in a 4k page. But if we ever decide we want
to support 64k granule or a VA space that's smaller, we'll adjust the
custom allocator accordingly.

I'm not sure we discussed this specific aspect with Robin, but his
point was that the user passing a custom allocator should be aware of
the page table format and all the allocation constraints that come with
it.

> 
> If you don't care about the pgd (since it's not something that's going
> to be freed and reallocated during the lifetime of the page-table), then
> perhaps it would be cleaner to pass in a 'struct kmem_cache' for the
> non-pgd pages when configuring the page-table, rather than override the
> allocation function wholesale?

I'd be fine with that, but I wasn't sure every one would be okay to use
a kmem_cache as the page caching mechanism. I know Rob was intending to
use a custom allocator with the MSM MMU to provide the same sort of
caching for the Adreno GPU driver, so it might be good to have his
opinion on that.

> I think that would also mean that all the
> lowmem assumptions in the code (e.g. use of virt_to_phys()) would be a
> little less fragile.

Well, that would only help if a custom kmem_cache is used, unless you
want to generalize the kmem_cache approach and force it to all
io-pgtable-arm users, in which case I probably don't need to pass a
custom kmem_cache in the first place (mine has nothing special).

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators
  2023-08-09 15:10     ` Boris Brezillon
@ 2023-08-28 12:50       ` Boris Brezillon
  0 siblings, 0 replies; 13+ messages in thread
From: Boris Brezillon @ 2023-08-28 12:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: Joerg Roedel, iommu, Robin Murphy, linux-arm-kernel, Rob Clark

On Wed, 9 Aug 2023 17:10:17 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Wed, 9 Aug 2023 15:47:21 +0100
> Will Deacon <will@kernel.org> wrote:
> 
> > On Wed, Aug 09, 2023 at 02:17:44PM +0200, Boris Brezillon wrote:  
> > > We need that in order to implement the VM_BIND ioctl in the GPU driver
> > > targeting new Mali GPUs.
> > > 
> > > VM_BIND is about executing MMU map/unmap requests asynchronously,
> > > possibly after waiting for external dependencies encoded as dma_fences.
> > > We intend to use the drm_sched framework to automate the dependency
> > > tracking and VM job dequeuing logic, but this comes with its own set
> > > of constraints, one of them being the fact we are not allowed to
> > > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
> > > sort of deadlocks:
> > > 
> > > - VM_BIND map job needs to allocate a page table to map some memory
> > >   to the VM. No memory available, so kswapd is kicked
> > > - GPU driver shrinker backend ends up waiting on the fence attached to
> > >   the VM map job or any other job fence depending on this VM operation.
> > > 
> > > With custom allocators, we will be able to pre-reserve enough pages to
> > > guarantee the map/unmap operations we queued will take place without
> > > going through the system allocator. But we can also optimize
> > > allocation/reservation by not free-ing pages immediately, so any
> > > upcoming page table allocation requests can be serviced by some free
> > > page table pool kept at the driver level.
> > > 
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > ---
> > >  drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
> > >  drivers/iommu/io-pgtable.c     | 12 ++++++++
> > >  2 files changed, 46 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > > index 72dcdd468cf3..c5c04f0106f3 100644
> > > --- a/drivers/iommu/io-pgtable-arm.c
> > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages)
> > >  }
> > >  
> > >  static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> > > -				    struct io_pgtable_cfg *cfg)
> > > +				    struct io_pgtable_cfg *cfg,
> > > +				    void *cookie)
> > >  {
> > >  	struct device *dev = cfg->iommu_dev;
> > >  	int order = get_order(size);
> > > -	struct page *p;
> > >  	dma_addr_t dma;
> > >  	void *pages;
> > >  
> > >  	VM_BUG_ON((gfp & __GFP_HIGHMEM));
> > > -	p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> > > -	if (!p)
> > > +
> > > +	if (cfg->alloc) {
> > > +		pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO);
> > > +	} else {
> > > +		struct page *p;
> > > +
> > > +		p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> > > +		pages = p ? page_address(p) : NULL;    
> > 
> > Hmm, so the reason we pass the order is because the pgd may have
> > additional alignment requirements but that's not passed back to your new
> > allocation hook. Does it just return naturally aligned allocations?  
> 
> So, the assumption was that the custom allocator should be aware of
> these alignment constraints. Like, if size > min_granule_sz, then order
> equal get_order(size).
> 
> Right now, I reject any size that's not SZ_4K, because, given the
> VA-space and the pgtable config, I know I'll always end up with 4
> levels and the pgd will fit in a 4k page. But if we ever decide we want
> to support 64k granule or a VA space that's smaller, we'll adjust the
> custom allocator accordingly.
> 
> I'm not sure we discussed this specific aspect with Robin, but his
> point was that the user passing a custom allocator should be aware of
> the page table format and all the allocation constraints that come with
> it.
> 
> > 
> > If you don't care about the pgd (since it's not something that's going
> > to be freed and reallocated during the lifetime of the page-table), then
> > perhaps it would be cleaner to pass in a 'struct kmem_cache' for the
> > non-pgd pages when configuring the page-table, rather than override the
> > allocation function wholesale?  
> 
> I'd be fine with that, but I wasn't sure every one would be okay to use
> a kmem_cache as the page caching mechanism.

My bad, I still need the custom allocator hooks so I can pre-allocate
pages and make sure no allocation happens in the page table update
path (dma-signaling path where no blocking allocations are allowed). I
might be wrong, but I don't think kmem_cache provides such a
reservation mechanism.

> I know Rob was intending to
> use a custom allocator with the MSM MMU to provide the same sort of
> caching for the Adreno GPU driver, so it might be good to have his
> opinion on that.
> 
> > I think that would also mean that all the
> > lowmem assumptions in the code (e.g. use of virt_to_phys()) would be a
> > little less fragile.  
> 
> Well, that would only help if a custom kmem_cache is used, unless you
> want to generalize the kmem_cache approach and force it to all
> io-pgtable-arm users, in which case I probably don't need to pass a
> custom kmem_cache in the first place (mine has nothing special).


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers
  2023-08-09 12:17 [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Boris Brezillon
  2023-08-09 12:17 ` [PATCH 1/2] " Boris Brezillon
  2023-08-09 12:17 ` [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators Boris Brezillon
@ 2023-09-20 13:12 ` Steven Price
  2023-10-23 21:02   ` Rob Clark
  2 siblings, 1 reply; 13+ messages in thread
From: Steven Price @ 2023-09-20 13:12 UTC (permalink / raw)
  To: Boris Brezillon, Joerg Roedel, iommu, Will Deacon, Robin Murphy,
	linux-arm-kernel
  Cc: Rob Clark

On 09/08/2023 13:17, Boris Brezillon wrote:
> Hello,
> 
> This patchset is an attempt at making page table allocation
> customizable. This is useful to some GPU drivers for various reasons:
> 
> - speed-up upcoming page table allocations by managing a pool of free
>   pages
> - batch page table allocation instead of allocating one page at a time
> - pre-reserve pages for page tables needed for map/unmap operations and
>   return the unused page tables to some pool
> 
> The first and last reasons are particularly important for GPU drivers
> wanting to implement asynchronous VM_BIND. Asynchronous VM_BIND requires
> that any page table needed for a map/unmap operation to succeed be
> allocated at VM_BIND job creation time. At the time of the job creation,
> we don't know what the VM will look like when we get to execute the
> map/unmap, and can't guess how many page tables we will need. Because
> of that, we have to over-provision page tables for the worst case
> scenario (page table tree is empty), which means we will allocate/free
> a lot. Having pool a pool of free pages is crucial if we want to
> speed-up VM_BIND requests.
> 
> A real example of how such custom allocators can be used is available
> here[1]. v2 of the Panthor driver is approaching submission, and I
> figured I'd try to upstream the dependencies separately, which is
> why I submit this series now, even though the user of this new API
> will come afterwards. If you'd prefer to have those patches submitted
> along with the Panthor driver, let me know.
> 
> This approach has been discussed with Robin, and is hopefully not too
> far from what he had in mind.

The alternative would be to embed a cache of pages into the IOMMU
framework, however kmem_cache sadly doesn't seem to support the
'reserve' of pages concept that we need. mempools could be a solution
but the mempool would need to be created by the IOMMU framework as the
alloc/free functions are specified when creating the pool. So it would
be a much bigger change (to drivers/iommu).

So, given that so far it's just Panthor this seems like the right
approach for now - when/if other drivers want the same functionality
then it might make sense to revisit the idea of doing the caching within
the IOMMU framework.

Robin: Does this approach sound sensible?

FWIW:

Reviewed-by: Steven Price <steven.price@arm.com>

Steve

> Regards,
> 
> Boris
> 
> [1]https://gitlab.freedesktop.org/panfrost/linux/-/blob/panthor/drivers/gpu/drm/panthor/panthor_mmu.c#L441
> 
> Boris Brezillon (2):
>   iommu: Allow passing custom allocators to pgtable drivers
>   iommu: Extend LPAE page table format to support custom allocators
> 
>  drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
>  drivers/iommu/io-pgtable.c     | 31 +++++++++++++++++++++
>  include/linux/io-pgtable.h     | 21 ++++++++++++++
>  3 files changed, 86 insertions(+), 16 deletions(-)
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators
  2023-08-09 12:17 ` [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators Boris Brezillon
  2023-08-09 14:47   ` Will Deacon
@ 2023-09-20 16:42   ` Robin Murphy
  2023-11-10  9:52     ` Boris Brezillon
  1 sibling, 1 reply; 13+ messages in thread
From: Robin Murphy @ 2023-09-20 16:42 UTC (permalink / raw)
  To: Boris Brezillon, Joerg Roedel, iommu, Will Deacon, linux-arm-kernel
  Cc: Rob Clark

On 09/08/2023 1:17 pm, Boris Brezillon wrote:
> We need that in order to implement the VM_BIND ioctl in the GPU driver
> targeting new Mali GPUs.
> 
> VM_BIND is about executing MMU map/unmap requests asynchronously,
> possibly after waiting for external dependencies encoded as dma_fences.
> We intend to use the drm_sched framework to automate the dependency
> tracking and VM job dequeuing logic, but this comes with its own set
> of constraints, one of them being the fact we are not allowed to
> allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
> sort of deadlocks:
> 
> - VM_BIND map job needs to allocate a page table to map some memory
>    to the VM. No memory available, so kswapd is kicked
> - GPU driver shrinker backend ends up waiting on the fence attached to
>    the VM map job or any other job fence depending on this VM operation.
> 
> With custom allocators, we will be able to pre-reserve enough pages to
> guarantee the map/unmap operations we queued will take place without
> going through the system allocator. But we can also optimize
> allocation/reservation by not free-ing pages immediately, so any
> upcoming page table allocation requests can be serviced by some free
> page table pool kept at the driver level.

We should bear in mind it's also potentially valuable for other aspects 
of GPU and similar use-cases, like fine-grained memory accounting and 
resource limiting. That's a significant factor in this approach vs. 
internal caching schemes that could only solve the specific reclaim concern.

> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>   drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
>   drivers/iommu/io-pgtable.c     | 12 ++++++++
>   2 files changed, 46 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 72dcdd468cf3..c5c04f0106f3 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages)
>   }
>   
>   static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
> -				    struct io_pgtable_cfg *cfg)
> +				    struct io_pgtable_cfg *cfg,
> +				    void *cookie)
>   {
>   	struct device *dev = cfg->iommu_dev;
>   	int order = get_order(size);
> -	struct page *p;
>   	dma_addr_t dma;
>   	void *pages;
>   
>   	VM_BUG_ON((gfp & __GFP_HIGHMEM));
> -	p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> -	if (!p)
> +
> +	if (cfg->alloc) {
> +		pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO);

I think it should be a fundamental requirement of the interface that an 
allocator always returns zeroed pages, since there's no obvious use-case 
for it ever not doing so. Ideally I'd like to not pass a gfp value at 
all, given the intent that a custom allocator will typically be 
something *other* than alloc_pages() by necessity, but it's conceivable 
that contextual flags like GFP_ATOMIC and __GFP_NOWARN could still be 
meaningful, so ultimately it doesn't really seem worthwhile trying to 
re-encode them differently just for the sake of it.

> +	} else {
> +		struct page *p;
> +
> +		p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
> +		pages = p ? page_address(p) : NULL;
> +	}
> +
> +	if (!pages)
>   		return NULL;
>   
> -	pages = page_address(p);
>   	if (!cfg->coherent_walk) {
>   		dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
>   		if (dma_mapping_error(dev, dma))
> @@ -220,18 +228,28 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
>   out_unmap:
>   	dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
>   	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
> +
>   out_free:
> -	__free_pages(p, order);
> +	if (cfg->free)
> +		cfg->free(cookie, pages, size);
> +	else
> +		free_pages((unsigned long)pages, order);
> +
>   	return NULL;
>   }
>   
>   static void __arm_lpae_free_pages(void *pages, size_t size,
> -				  struct io_pgtable_cfg *cfg)
> +				  struct io_pgtable_cfg *cfg,
> +				  void *cookie)
>   {
>   	if (!cfg->coherent_walk)
>   		dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages),
>   				 size, DMA_TO_DEVICE);
> -	free_pages((unsigned long)pages, get_order(size));
> +
> +	if (cfg->free)
> +		cfg->free(cookie, pages, size);
> +	else
> +		free_pages((unsigned long)pages, get_order(size));
>   }
>   
>   static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
> @@ -373,13 +391,13 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
>   	/* Grab a pointer to the next level */
>   	pte = READ_ONCE(*ptep);
>   	if (!pte) {
> -		cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg);
> +		cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg, data->iop.cookie);
>   		if (!cptep)
>   			return -ENOMEM;
>   
>   		pte = arm_lpae_install_table(cptep, ptep, 0, data);
>   		if (pte)
> -			__arm_lpae_free_pages(cptep, tblsz, cfg);
> +			__arm_lpae_free_pages(cptep, tblsz, cfg, data->iop.cookie);
>   	} else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) {
>   		__arm_lpae_sync_pte(ptep, 1, cfg);
>   	}
> @@ -524,7 +542,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
>   		__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
>   	}
>   
> -	__arm_lpae_free_pages(start, table_size, &data->iop.cfg);
> +	__arm_lpae_free_pages(start, table_size, &data->iop.cfg, data->iop.cookie);
>   }
>   
>   static void arm_lpae_free_pgtable(struct io_pgtable *iop)
> @@ -552,7 +570,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>   	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
>   		return 0;
>   
> -	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
> +	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg, data->iop.cookie);
>   	if (!tablep)
>   		return 0; /* Bytes unmapped */
>   
> @@ -575,7 +593,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>   
>   	pte = arm_lpae_install_table(tablep, ptep, blk_pte, data);
>   	if (pte != blk_pte) {
> -		__arm_lpae_free_pages(tablep, tablesz, cfg);
> +		__arm_lpae_free_pages(tablep, tablesz, cfg, data->iop.cookie);
>   		/*
>   		 * We may race against someone unmapping another part of this
>   		 * block, but anything else is invalid. We can't misinterpret
> @@ -882,7 +900,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   
>   	/* Looking good; allocate a pgd */
>   	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
> -					   GFP_KERNEL, cfg);
> +					   GFP_KERNEL, cfg, cookie);
>   	if (!data->pgd)
>   		goto out_free_data;
>   
> @@ -984,7 +1002,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>   
>   	/* Allocate pgd pages */
>   	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
> -					   GFP_KERNEL, cfg);
> +					   GFP_KERNEL, cfg, cookie);
>   	if (!data->pgd)
>   		goto out_free_data;
>   
> @@ -1059,7 +1077,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
>   		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
>   
>   	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
> -					   cfg);
> +					   cfg, cookie);
>   	if (!data->pgd)
>   		goto out_free_data;
>   
> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index f4caf630638a..e273c18ae22b 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -47,6 +47,18 @@ static int check_custom_allocator(enum io_pgtable_fmt fmt,
>   	if (!cfg->alloc)
>   		return 0;
>   
> +	switch (fmt) {
> +	case ARM_32_LPAE_S1:
> +	case ARM_32_LPAE_S2:
> +	case ARM_64_LPAE_S1:
> +	case ARM_64_LPAE_S2:
> +	case ARM_MALI_LPAE:
> +		return 0;

I remain not entirely convinced by the value of this, but could it at 
least be done in a more scalable manner like some kind of flag provided 
by the format itself?

Thanks,
Robin.


> +
> +	default:
> +		break;
> +	}
> +
>   	return -EINVAL;
>   }
>   

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers
  2023-09-20 13:12 ` [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Steven Price
@ 2023-10-23 21:02   ` Rob Clark
  2023-11-07 11:52     ` Gaurav Kohli
  0 siblings, 1 reply; 13+ messages in thread
From: Rob Clark @ 2023-10-23 21:02 UTC (permalink / raw)
  To: Steven Price
  Cc: Boris Brezillon, Joerg Roedel, iommu, Will Deacon, Robin Murphy,
	linux-arm-kernel

On Wed, Sep 20, 2023 at 6:12 AM Steven Price <steven.price@arm.com> wrote:
>
> On 09/08/2023 13:17, Boris Brezillon wrote:
> > Hello,
> >
> > This patchset is an attempt at making page table allocation
> > customizable. This is useful to some GPU drivers for various reasons:
> >
> > - speed-up upcoming page table allocations by managing a pool of free
> >   pages
> > - batch page table allocation instead of allocating one page at a time
> > - pre-reserve pages for page tables needed for map/unmap operations and
> >   return the unused page tables to some pool
> >
> > The first and last reasons are particularly important for GPU drivers
> > wanting to implement asynchronous VM_BIND. Asynchronous VM_BIND requires
> > that any page table needed for a map/unmap operation to succeed be
> > allocated at VM_BIND job creation time. At the time of the job creation,
> > we don't know what the VM will look like when we get to execute the
> > map/unmap, and can't guess how many page tables we will need. Because
> > of that, we have to over-provision page tables for the worst case
> > scenario (page table tree is empty), which means we will allocate/free
> > a lot. Having pool a pool of free pages is crucial if we want to
> > speed-up VM_BIND requests.
> >
> > A real example of how such custom allocators can be used is available
> > here[1]. v2 of the Panthor driver is approaching submission, and I
> > figured I'd try to upstream the dependencies separately, which is
> > why I submit this series now, even though the user of this new API
> > will come afterwards. If you'd prefer to have those patches submitted
> > along with the Panthor driver, let me know.
> >
> > This approach has been discussed with Robin, and is hopefully not too
> > far from what he had in mind.
>
> The alternative would be to embed a cache of pages into the IOMMU
> framework, however kmem_cache sadly doesn't seem to support the
> 'reserve' of pages concept that we need. mempools could be a solution
> but the mempool would need to be created by the IOMMU framework as the
> alloc/free functions are specified when creating the pool. So it would
> be a much bigger change (to drivers/iommu).
>
> So, given that so far it's just Panthor this seems like the right
> approach for now - when/if other drivers want the same functionality
> then it might make sense to revisit the idea of doing the caching within
> the IOMMU framework.

I have some plans to use this as well for drm/msm.. but the reasons
and requirements are basically the same as for panthor.  I think I
prefer the custom allocator approach, rather than tying this to IOMMU
framework.  (But ofc custom allocators, I guess, does not prevent the
iommu driver from doing it's own caching.)

BR,
-R

> Robin: Does this approach sound sensible?
>
> FWIW:
>
> Reviewed-by: Steven Price <steven.price@arm.com>
>
> Steve
>
> > Regards,
> >
> > Boris
> >
> > [1]https://gitlab.freedesktop.org/panfrost/linux/-/blob/panthor/drivers/gpu/drm/panthor/panthor_mmu.c#L441
> >
> > Boris Brezillon (2):
> >   iommu: Allow passing custom allocators to pgtable drivers
> >   iommu: Extend LPAE page table format to support custom allocators
> >
> >  drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
> >  drivers/iommu/io-pgtable.c     | 31 +++++++++++++++++++++
> >  include/linux/io-pgtable.h     | 21 ++++++++++++++
> >  3 files changed, 86 insertions(+), 16 deletions(-)
> >
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers
  2023-10-23 21:02   ` Rob Clark
@ 2023-11-07 11:52     ` Gaurav Kohli
  2023-11-07 12:01       ` Gaurav Kohli
  2023-11-10  9:47       ` Boris Brezillon
  0 siblings, 2 replies; 13+ messages in thread
From: Gaurav Kohli @ 2023-11-07 11:52 UTC (permalink / raw)
  To: Rob Clark, Steven Price
  Cc: Boris Brezillon, Joerg Roedel, iommu, Will Deacon, Robin Murphy,
	linux-arm-kernel, linux-arm-msm



On 10/24/2023 2:32 AM, Rob Clark wrote:
> On Wed, Sep 20, 2023 at 6:12 AM Steven Price <steven.price@arm.com> wrote:
>>
>> On 09/08/2023 13:17, Boris Brezillon wrote:
>>> Hello,
>>>
>>> This patchset is an attempt at making page table allocation
>>> customizable. This is useful to some GPU drivers for various reasons:
>>>
>>> - speed-up upcoming page table allocations by managing a pool of free
>>>    pages
>>> - batch page table allocation instead of allocating one page at a time
>>> - pre-reserve pages for page tables needed for map/unmap operations and
>>>    return the unused page tables to some pool
>>>
>>> The first and last reasons are particularly important for GPU drivers
>>> wanting to implement asynchronous VM_BIND. Asynchronous VM_BIND requires
>>> that any page table needed for a map/unmap operation to succeed be
>>> allocated at VM_BIND job creation time. At the time of the job creation,
>>> we don't know what the VM will look like when we get to execute the
>>> map/unmap, and can't guess how many page tables we will need. Because
>>> of that, we have to over-provision page tables for the worst case
>>> scenario (page table tree is empty), which means we will allocate/free
>>> a lot. Having pool a pool of free pages is crucial if we want to
>>> speed-up VM_BIND requests.
>>>
>>> A real example of how such custom allocators can be used is available
>>> here[1]. v2 of the Panthor driver is approaching submission, and I
>>> figured I'd try to upstream the dependencies separately, which is
>>> why I submit this series now, even though the user of this new API
>>> will come afterwards. If you'd prefer to have those patches submitted
>>> along with the Panthor driver, let me know.
>>>
>>> This approach has been discussed with Robin, and is hopefully not too
>>> far from what he had in mind.
>>
>> The alternative would be to embed a cache of pages into the IOMMU
>> framework, however kmem_cache sadly doesn't seem to support the
>> 'reserve' of pages concept that we need. mempools could be a solution
>> but the mempool would need to be created by the IOMMU framework as the
>> alloc/free functions are specified when creating the pool. So it would
>> be a much bigger change (to drivers/iommu).
>>
>> So, given that so far it's just Panthor this seems like the right
>> approach for now - when/if other drivers want the same functionality
>> then it might make sense to revisit the idea of doing the caching within
>> the IOMMU framework.
> 
> I have some plans to use this as well for drm/msm.. but the reasons
> and requirements are basically the same as for panthor.  I think I
> prefer the custom allocator approach, rather than tying this to IOMMU
> framework.  (But ofc custom allocators, I guess, does not prevent the
> iommu driver from doing it's own caching.)
> 
> BR,
> -R
> 

We have also posted one RFC patch series which is based on this current 
patches by Boris and helping us to define our custom alloc and free 
pgtable call. For our side usecase we have a requirement to create 
pgtable from HLOS and then share it to different entity(VMID) and 
basically that also requires few smc calls and for that we need
custom alloc/free callbacks.

https://lore.kernel.org/all/20231101071144.16309-1-quic_gkohli@quicinc.com/


So custom allocator and free ops is helping for us also. Is there any 
plan to merge these patches from Boris.




>> Robin: Does this approach sound sensible?
>>
>> FWIW:
>>
>> Reviewed-by: Steven Price <steven.price@arm.com>
>>
>> Steve
>>
>>> Regards,
>>>
>>> Boris
>>>
>>> [1]https://gitlab.freedesktop.org/panfrost/linux/-/blob/panthor/drivers/gpu/drm/panthor/panthor_mmu.c#L441
>>>
>>> Boris Brezillon (2):
>>>    iommu: Allow passing custom allocators to pgtable drivers
>>>    iommu: Extend LPAE page table format to support custom allocators
>>>
>>>   drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
>>>   drivers/iommu/io-pgtable.c     | 31 +++++++++++++++++++++
>>>   include/linux/io-pgtable.h     | 21 ++++++++++++++
>>>   3 files changed, 86 insertions(+), 16 deletions(-)
>>>
>>
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers
  2023-11-07 11:52     ` Gaurav Kohli
@ 2023-11-07 12:01       ` Gaurav Kohli
  2023-11-10  9:47       ` Boris Brezillon
  1 sibling, 0 replies; 13+ messages in thread
From: Gaurav Kohli @ 2023-11-07 12:01 UTC (permalink / raw)
  To: Rob Clark, Steven Price, Will Deacon
  Cc: Boris Brezillon, Joerg Roedel, iommu, Will Deacon, Robin Murphy,
	linux-arm-kernel, linux-arm-msm



On 11/7/2023 5:22 PM, Gaurav Kohli wrote:
> 
> 
> On 10/24/2023 2:32 AM, Rob Clark wrote:
>> On Wed, Sep 20, 2023 at 6:12 AM Steven Price <steven.price@arm.com> 
>> wrote:
>>>
>>> On 09/08/2023 13:17, Boris Brezillon wrote:
>>>> Hello,
>>>>
>>>> This patchset is an attempt at making page table allocation
>>>> customizable. This is useful to some GPU drivers for various reasons:
>>>>
>>>> - speed-up upcoming page table allocations by managing a pool of free
>>>>    pages
>>>> - batch page table allocation instead of allocating one page at a time
>>>> - pre-reserve pages for page tables needed for map/unmap operations and
>>>>    return the unused page tables to some pool
>>>>
>>>> The first and last reasons are particularly important for GPU drivers
>>>> wanting to implement asynchronous VM_BIND. Asynchronous VM_BIND 
>>>> requires
>>>> that any page table needed for a map/unmap operation to succeed be
>>>> allocated at VM_BIND job creation time. At the time of the job 
>>>> creation,
>>>> we don't know what the VM will look like when we get to execute the
>>>> map/unmap, and can't guess how many page tables we will need. Because
>>>> of that, we have to over-provision page tables for the worst case
>>>> scenario (page table tree is empty), which means we will allocate/free
>>>> a lot. Having pool a pool of free pages is crucial if we want to
>>>> speed-up VM_BIND requests.
>>>>
>>>> A real example of how such custom allocators can be used is available
>>>> here[1]. v2 of the Panthor driver is approaching submission, and I
>>>> figured I'd try to upstream the dependencies separately, which is
>>>> why I submit this series now, even though the user of this new API
>>>> will come afterwards. If you'd prefer to have those patches submitted
>>>> along with the Panthor driver, let me know.
>>>>
>>>> This approach has been discussed with Robin, and is hopefully not too
>>>> far from what he had in mind.
>>>
>>> The alternative would be to embed a cache of pages into the IOMMU
>>> framework, however kmem_cache sadly doesn't seem to support the
>>> 'reserve' of pages concept that we need. mempools could be a solution
>>> but the mempool would need to be created by the IOMMU framework as the
>>> alloc/free functions are specified when creating the pool. So it would
>>> be a much bigger change (to drivers/iommu).
>>>
>>> So, given that so far it's just Panthor this seems like the right
>>> approach for now - when/if other drivers want the same functionality
>>> then it might make sense to revisit the idea of doing the caching within
>>> the IOMMU framework.
>>
>> I have some plans to use this as well for drm/msm.. but the reasons
>> and requirements are basically the same as for panthor.  I think I
>> prefer the custom allocator approach, rather than tying this to IOMMU
>> framework.  (But ofc custom allocators, I guess, does not prevent the
>> iommu driver from doing it's own caching.)
>>
>> BR,
>> -R
>>
> 
> We have also posted one RFC patch series which is based on this current 
> patches by Boris and helping us to define our custom alloc and free 
> pgtable call. For our side usecase we have a requirement to create 
> pgtable from HLOS and then share it to different entity(VMID) and 
> basically that also requires few smc calls and for that we need
> custom alloc/free callbacks.
> 
> https://lore.kernel.org/all/20231101071144.16309-1-quic_gkohli@quicinc.com/
> 
> 
> So custom allocator and free ops is helping for us also. Is there any 
> plan to merge these patches from Boris.
> 
> 
> 
> 

Please use below tested by tag, if you are planning to merge patches by 
Boris:

Tested-by: Gaurav Kohli <quic_gkohli@quicinc.com>

>>> Robin: Does this approach sound sensible?
>>>
>>> FWIW:
>>>
>>> Reviewed-by: Steven Price <steven.price@arm.com>
>>>
>>> Steve
>>>
>>>> Regards,
>>>>
>>>> Boris
>>>>
>>>> [1]https://gitlab.freedesktop.org/panfrost/linux/-/blob/panthor/drivers/gpu/drm/panthor/panthor_mmu.c#L441
>>>>
>>>> Boris Brezillon (2):
>>>>    iommu: Allow passing custom allocators to pgtable drivers
>>>>    iommu: Extend LPAE page table format to support custom allocators
>>>>
>>>>   drivers/iommu/io-pgtable-arm.c | 50 
>>>> +++++++++++++++++++++++-----------
>>>>   drivers/iommu/io-pgtable.c     | 31 +++++++++++++++++++++
>>>>   include/linux/io-pgtable.h     | 21 ++++++++++++++
>>>>   3 files changed, 86 insertions(+), 16 deletions(-)
>>>>
>>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers
  2023-11-07 11:52     ` Gaurav Kohli
  2023-11-07 12:01       ` Gaurav Kohli
@ 2023-11-10  9:47       ` Boris Brezillon
  1 sibling, 0 replies; 13+ messages in thread
From: Boris Brezillon @ 2023-11-10  9:47 UTC (permalink / raw)
  To: Gaurav Kohli
  Cc: Rob Clark, Steven Price, Joerg Roedel, iommu, Will Deacon,
	Robin Murphy, linux-arm-kernel, linux-arm-msm

Hi Gaurav,

On Tue, 7 Nov 2023 17:22:39 +0530
Gaurav Kohli <quic_gkohli@quicinc.com> wrote:

> On 10/24/2023 2:32 AM, Rob Clark wrote:
> > On Wed, Sep 20, 2023 at 6:12 AM Steven Price <steven.price@arm.com> wrote:  
> >>
> >> On 09/08/2023 13:17, Boris Brezillon wrote:  
> >>> Hello,
> >>>
> >>> This patchset is an attempt at making page table allocation
> >>> customizable. This is useful to some GPU drivers for various reasons:
> >>>
> >>> - speed-up upcoming page table allocations by managing a pool of free
> >>>    pages
> >>> - batch page table allocation instead of allocating one page at a time
> >>> - pre-reserve pages for page tables needed for map/unmap operations and
> >>>    return the unused page tables to some pool
> >>>
> >>> The first and last reasons are particularly important for GPU drivers
> >>> wanting to implement asynchronous VM_BIND. Asynchronous VM_BIND requires
> >>> that any page table needed for a map/unmap operation to succeed be
> >>> allocated at VM_BIND job creation time. At the time of the job creation,
> >>> we don't know what the VM will look like when we get to execute the
> >>> map/unmap, and can't guess how many page tables we will need. Because
> >>> of that, we have to over-provision page tables for the worst case
> >>> scenario (page table tree is empty), which means we will allocate/free
> >>> a lot. Having pool a pool of free pages is crucial if we want to
> >>> speed-up VM_BIND requests.
> >>>
> >>> A real example of how such custom allocators can be used is available
> >>> here[1]. v2 of the Panthor driver is approaching submission, and I
> >>> figured I'd try to upstream the dependencies separately, which is
> >>> why I submit this series now, even though the user of this new API
> >>> will come afterwards. If you'd prefer to have those patches submitted
> >>> along with the Panthor driver, let me know.
> >>>
> >>> This approach has been discussed with Robin, and is hopefully not too
> >>> far from what he had in mind.  
> >>
> >> The alternative would be to embed a cache of pages into the IOMMU
> >> framework, however kmem_cache sadly doesn't seem to support the
> >> 'reserve' of pages concept that we need. mempools could be a solution
> >> but the mempool would need to be created by the IOMMU framework as the
> >> alloc/free functions are specified when creating the pool. So it would
> >> be a much bigger change (to drivers/iommu).
> >>
> >> So, given that so far it's just Panthor this seems like the right
> >> approach for now - when/if other drivers want the same functionality
> >> then it might make sense to revisit the idea of doing the caching within
> >> the IOMMU framework.  
> > 
> > I have some plans to use this as well for drm/msm.. but the reasons
> > and requirements are basically the same as for panthor.  I think I
> > prefer the custom allocator approach, rather than tying this to IOMMU
> > framework.  (But ofc custom allocators, I guess, does not prevent the
> > iommu driver from doing it's own caching.)
> > 
> > BR,
> > -R
> >   
> 
> We have also posted one RFC patch series which is based on this current 
> patches by Boris and helping us to define our custom alloc and free 
> pgtable call. For our side usecase we have a requirement to create 
> pgtable from HLOS and then share it to different entity(VMID) and 
> basically that also requires few smc calls and for that we need
> custom alloc/free callbacks.
> 
> https://lore.kernel.org/all/20231101071144.16309-1-quic_gkohli@quicinc.com/
> 
> 
> So custom allocator and free ops is helping for us also. Is there any 
> plan to merge these patches from Boris.

Sorry for the late reply. I just sent a v2, but I forgot to add your
Tested-by :-/. Feel free to add it back.

Regards,

Boris

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators
  2023-09-20 16:42   ` Robin Murphy
@ 2023-11-10  9:52     ` Boris Brezillon
  0 siblings, 0 replies; 13+ messages in thread
From: Boris Brezillon @ 2023-11-10  9:52 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, iommu, Will Deacon, linux-arm-kernel, Rob Clark

On Wed, 20 Sep 2023 17:42:01 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> On 09/08/2023 1:17 pm, Boris Brezillon wrote:
> > We need that in order to implement the VM_BIND ioctl in the GPU driver
> > targeting new Mali GPUs.
> > 
> > VM_BIND is about executing MMU map/unmap requests asynchronously,
> > possibly after waiting for external dependencies encoded as dma_fences.
> > We intend to use the drm_sched framework to automate the dependency
> > tracking and VM job dequeuing logic, but this comes with its own set
> > of constraints, one of them being the fact we are not allowed to
> > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
> > sort of deadlocks:
> > 
> > - VM_BIND map job needs to allocate a page table to map some memory
> >    to the VM. No memory available, so kswapd is kicked
> > - GPU driver shrinker backend ends up waiting on the fence attached to
> >    the VM map job or any other job fence depending on this VM operation.
> > 
> > With custom allocators, we will be able to pre-reserve enough pages to
> > guarantee the map/unmap operations we queued will take place without
> > going through the system allocator. But we can also optimize
> > allocation/reservation by not free-ing pages immediately, so any
> > upcoming page table allocation requests can be serviced by some free
> > page table pool kept at the driver level.  
> 
> We should bear in mind it's also potentially valuable for other aspects 
> of GPU and similar use-cases, like fine-grained memory accounting and 
> resource limiting. That's a significant factor in this approach vs. 
> internal caching schemes that could only solve the specific reclaim concern.

I mentioned these other cases in v2. Let me know if that's not detailed
enough.

> > diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> > index f4caf630638a..e273c18ae22b 100644
> > --- a/drivers/iommu/io-pgtable.c
> > +++ b/drivers/iommu/io-pgtable.c
> > @@ -47,6 +47,18 @@ static int check_custom_allocator(enum io_pgtable_fmt fmt,
> >   	if (!cfg->alloc)
> >   		return 0;
> >   
> > +	switch (fmt) {
> > +	case ARM_32_LPAE_S1:
> > +	case ARM_32_LPAE_S2:
> > +	case ARM_64_LPAE_S1:
> > +	case ARM_64_LPAE_S2:
> > +	case ARM_MALI_LPAE:
> > +		return 0;  
> 
> I remain not entirely convinced by the value of this, but could it at 
> least be done in a more scalable manner like some kind of flag provided 
> by the format itself?

I added a caps flag to io_pgtable_init_fns in v2. Feels a bit weird to
add a field that's not a function pointer in a struct that's prefixed
with _fns (which I guess stands for _functions), but oh well.

Regards,

Boris

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-11-10  9:53 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-09 12:17 [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Boris Brezillon
2023-08-09 12:17 ` [PATCH 1/2] " Boris Brezillon
2023-08-09 12:17 ` [PATCH 2/2] iommu: Extend LPAE page table format to support custom allocators Boris Brezillon
2023-08-09 14:47   ` Will Deacon
2023-08-09 15:10     ` Boris Brezillon
2023-08-28 12:50       ` Boris Brezillon
2023-09-20 16:42   ` Robin Murphy
2023-11-10  9:52     ` Boris Brezillon
2023-09-20 13:12 ` [PATCH 0/2] iommu: Allow passing custom allocators to pgtable drivers Steven Price
2023-10-23 21:02   ` Rob Clark
2023-11-07 11:52     ` Gaurav Kohli
2023-11-07 12:01       ` Gaurav Kohli
2023-11-10  9:47       ` Boris Brezillon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).