From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>, Christoph Hellwig <hch@lst.de>
Cc: <dri-devel@lists.freedesktop.org>,
	<nouveau@lists.freedesktop.org>, <linux-cxl@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>, <nvdimm@lists.linux.dev>,
	<linux-pci@vger.kernel.org>
Subject: RE: [PATCH v2 16/28] resource: Introduce alloc_free_mem_region()
Date: Thu, 21 Jul 2022 09:10:49 -0700	[thread overview]
Message-ID: <62d97a89d66a1_17f3e82949e@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <165784333333.1758207.13703329337805274043.stgit@dwillia2-xfh.jf.intel.com>

[ add dri-devel and nouveau ]

Dan Williams wrote:
> The core of devm_request_free_mem_region() is a helper that searches for
> free space in iomem_resource and performs __request_region_locked() on
> the result of that search. The policy choices of the implementation
> conform to what CONFIG_DEVICE_PRIVATE users want, which is memory that is
> immediately marked busy, and a preference to search for the first-fit
> free range in descending order from the top of the physical address
> space.
> 
> CXL has a need for a similar allocator, but with the following tweaks:
> 
> 1/ Search for free space in ascending order
> 
> 2/ Search for free space relative to a given CXL window
> 
> 3/ 'insert' rather than 'request' the new resource, given that downstream
>    drivers from the CXL Region driver (like the pmem or dax drivers) are
>    responsible for request_mem_region() when they activate the memory
>    range.
> 
> Rework __request_free_mem_region() into get_free_mem_region() which
> takes a set of GFR_* (Get Free Region) flags to control the allocation
> policy (ascending vs descending), and "busy" policy (insert_resource()
> vs request_region()).
> 
> As part of consolidating the legacy GFR_REQUEST_REGION case with the new
> default of just inserting a new resource into the free space, some minor
> cleanups are included, such as no longer checking for NULL before calling
> devres_free() (which does its own check).
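
For anyone reviewing this without the rest of the series in front of them,
here is a minimal sketch of how the two entry points are intended to be
used after this change. The two functions are the ones touched by this
patch; the device, the CXL window resource, and the sizes are hypothetical
placeholders:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/ioport.h>
#include <linux/sizes.h>

/* @cxl_window stands in for a CXL root decoder window resource */
static int example_claim_space(struct device *dev,
			       struct resource *cxl_window)
{
	struct resource *res;

	/*
	 * New CXL-style path: ascending, first-fit search under
	 * @cxl_window; the result is inserted into the resource tree but
	 * not marked busy (a downstream driver calls request_mem_region()
	 * later when it activates the range).
	 */
	res = alloc_free_mem_region(cxl_window, SZ_256M, SZ_256M, "region0");
	if (IS_ERR(res))
		return PTR_ERR(res);

	/*
	 * Existing DEVICE_PRIVATE path: descending search from the top of
	 * the physical address space, range immediately marked busy.
	 */
	res = devm_request_free_mem_region(dev, &iomem_resource, SZ_16M);
	if (IS_ERR(res))
		return PTR_ERR(res);

	return 0;
}
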
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Link: https://lore.kernel.org/linux-cxl/20220420143406.GY2120790@nvidia.com/
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Christoph Hellwig <hch@lst.de>

Jason, Christoph, and anyone else who depends on CONFIG_DEVICE_PRIVATE,

Just a heads up that with Jonathan's review I am going to proceed with
pushing this change to linux-next. Please holler if
CONFIG_DEVICE_PRIVATE starts misbehaving, or if you have other feedback,
and I will get it fixed up.
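
For DEVICE_PRIVATE users, a quick way to read "nothing should change": the
existing entry points still hand back a busy, exclusively-claimed range. A
hypothetical smoke test (names and size are illustrative only, not part of
this patch) would look like:

#include <linux/bug.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/ioport.h>
#include <linux/sizes.h>

static int device_private_smoke_test(struct device *dev)
{
	struct resource *res;

	res = devm_request_free_mem_region(dev, &iomem_resource, SZ_16M);
	if (IS_ERR(res))
		return PTR_ERR(res);

	/* Same as before the rework: the range comes back marked busy */
	WARN_ON(!(res->flags & IORESOURCE_BUSY));
	return 0;
}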

> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/ioport.h |    2 +
>  kernel/resource.c      |  178 +++++++++++++++++++++++++++++++++++++++---------
>  mm/Kconfig             |    5 +
>  3 files changed, 150 insertions(+), 35 deletions(-)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 79d1ad6d6275..616b683563a9 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -330,6 +330,8 @@ struct resource *devm_request_free_mem_region(struct device *dev,
>  		struct resource *base, unsigned long size);
>  struct resource *request_free_mem_region(struct resource *base,
>  		unsigned long size, const char *name);
> +struct resource *alloc_free_mem_region(struct resource *base,
> +		unsigned long size, unsigned long align, const char *name);
>  
>  static inline void irqresource_disabled(struct resource *res, u32 irq)
>  {
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 53a534db350e..4c5e80b92f2f 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -489,8 +489,9 @@ int __weak page_is_ram(unsigned long pfn)
>  }
>  EXPORT_SYMBOL_GPL(page_is_ram);
>  
> -static int __region_intersects(resource_size_t start, size_t size,
> -			unsigned long flags, unsigned long desc)
> +static int __region_intersects(struct resource *parent, resource_size_t start,
> +			       size_t size, unsigned long flags,
> +			       unsigned long desc)
>  {
>  	struct resource res;
>  	int type = 0; int other = 0;
> @@ -499,7 +500,7 @@ static int __region_intersects(resource_size_t start, size_t size,
>  	res.start = start;
>  	res.end = start + size - 1;
>  
> -	for (p = iomem_resource.child; p ; p = p->sibling) {
> +	for (p = parent->child; p ; p = p->sibling) {
>  		bool is_type = (((p->flags & flags) == flags) &&
>  				((desc == IORES_DESC_NONE) ||
>  				 (desc == p->desc)));
> @@ -543,7 +544,7 @@ int region_intersects(resource_size_t start, size_t size, unsigned long flags,
>  	int ret;
>  
>  	read_lock(&resource_lock);
> -	ret = __region_intersects(start, size, flags, desc);
> +	ret = __region_intersects(&iomem_resource, start, size, flags, desc);
>  	read_unlock(&resource_lock);
>  
>  	return ret;
> @@ -1780,62 +1781,139 @@ void resource_list_free(struct list_head *head)
>  }
>  EXPORT_SYMBOL(resource_list_free);
>  
> -#ifdef CONFIG_DEVICE_PRIVATE
> -static struct resource *__request_free_mem_region(struct device *dev,
> -		struct resource *base, unsigned long size, const char *name)
> +#ifdef CONFIG_GET_FREE_REGION
> +#define GFR_DESCENDING		(1UL << 0)
> +#define GFR_REQUEST_REGION	(1UL << 1)
> +#define GFR_DEFAULT_ALIGN (1UL << PA_SECTION_SHIFT)
> +
> +static resource_size_t gfr_start(struct resource *base, resource_size_t size,
> +				 resource_size_t align, unsigned long flags)
> +{
> +	if (flags & GFR_DESCENDING) {
> +		resource_size_t end;
> +
> +		end = min_t(resource_size_t, base->end,
> +			    (1ULL << MAX_PHYSMEM_BITS) - 1);
> +		return end - size + 1;
> +	}
> +
> +	return ALIGN(base->start, align);
> +}
> +
> +static bool gfr_continue(struct resource *base, resource_size_t addr,
> +			 resource_size_t size, unsigned long flags)
> +{
> +	if (flags & GFR_DESCENDING)
> +		return addr > size && addr >= base->start;
> +	/*
> +	 * In the ascend case be careful that the last increment by
> +	 * @size did not wrap 0.
> +	 */
> +	return addr > addr - size &&
> +	       addr <= min_t(resource_size_t, base->end,
> +			     (1ULL << MAX_PHYSMEM_BITS) - 1);
> +}
> +
> +static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
> +				unsigned long flags)
> +{
> +	if (flags & GFR_DESCENDING)
> +		return addr - size;
> +	return addr + size;
> +}
> +
> +static void remove_free_mem_region(void *_res)
> +{
> +	struct resource *res = _res;
> +
> +	if (res->parent)
> +		remove_resource(res);
> +	free_resource(res);
> +}
> +
> +static struct resource *
> +get_free_mem_region(struct device *dev, struct resource *base,
> +		    resource_size_t size, const unsigned long align,
> +		    const char *name, const unsigned long desc,
> +		    const unsigned long flags)
>  {
> -	resource_size_t end, addr;
> +	resource_size_t addr;
>  	struct resource *res;
>  	struct region_devres *dr = NULL;
>  
> -	size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
> -	end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
> -	addr = end - size + 1UL;
> +	size = ALIGN(size, align);
>  
>  	res = alloc_resource(GFP_KERNEL);
>  	if (!res)
>  		return ERR_PTR(-ENOMEM);
>  
> -	if (dev) {
> +	if (dev && (flags & GFR_REQUEST_REGION)) {
>  		dr = devres_alloc(devm_region_release,
>  				sizeof(struct region_devres), GFP_KERNEL);
>  		if (!dr) {
>  			free_resource(res);
>  			return ERR_PTR(-ENOMEM);
>  		}
> +	} else if (dev) {
> +		if (devm_add_action_or_reset(dev, remove_free_mem_region, res))
> +			return ERR_PTR(-ENOMEM);
>  	}
>  
>  	write_lock(&resource_lock);
> -	for (; addr > size && addr >= base->start; addr -= size) {
> -		if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> -				REGION_DISJOINT)
> +	for (addr = gfr_start(base, size, align, flags);
> +	     gfr_continue(base, addr, size, flags);
> +	     addr = gfr_next(addr, size, flags)) {
> +		if (__region_intersects(base, addr, size, 0, IORES_DESC_NONE) !=
> +		    REGION_DISJOINT)
>  			continue;
>  
> -		if (__request_region_locked(res, &iomem_resource, addr, size,
> -						name, 0))
> -			break;
> +		if (flags & GFR_REQUEST_REGION) {
> +			if (__request_region_locked(res, &iomem_resource, addr,
> +						    size, name, 0))
> +				break;
>  
> -		if (dev) {
> -			dr->parent = &iomem_resource;
> -			dr->start = addr;
> -			dr->n = size;
> -			devres_add(dev, dr);
> -		}
> +			if (dev) {
> +				dr->parent = &iomem_resource;
> +				dr->start = addr;
> +				dr->n = size;
> +				devres_add(dev, dr);
> +			}
>  
> -		res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> -		write_unlock(&resource_lock);
> +			res->desc = desc;
> +			write_unlock(&resource_lock);
> +
> +
> +			/*
> +			 * A driver is claiming this region so revoke any
> +			 * mappings.
> +			 */
> +			revoke_iomem(res);
> +		} else {
> +			res->start = addr;
> +			res->end = addr + size - 1;
> +			res->name = name;
> +			res->desc = desc;
> +			res->flags = IORESOURCE_MEM;
> +
> +			/*
> +			 * Only succeed if the resource hosts an exclusive
> +			 * range after the insert
> +			 */
> +			if (__insert_resource(base, res) || res->child)
> +				break;
> +
> +			write_unlock(&resource_lock);
> +		}
>  
> -		/*
> -		 * A driver is claiming this region so revoke any mappings.
> -		 */
> -		revoke_iomem(res);
>  		return res;
>  	}
>  	write_unlock(&resource_lock);
>  
> -	free_resource(res);
> -	if (dr)
> +	if (flags & GFR_REQUEST_REGION) {
> +		free_resource(res);
>  		devres_free(dr);
> +	} else if (dev)
> +		devm_release_action(dev, remove_free_mem_region, res);
>  
>  	return ERR_PTR(-ERANGE);
>  }
> @@ -1854,18 +1932,48 @@ static struct resource *__request_free_mem_region(struct device *dev,
>  struct resource *devm_request_free_mem_region(struct device *dev,
>  		struct resource *base, unsigned long size)
>  {
> -	return __request_free_mem_region(dev, base, size, dev_name(dev));
> +	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
> +
> +	return get_free_mem_region(dev, base, size, GFR_DEFAULT_ALIGN,
> +				   dev_name(dev),
> +				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
>  }
>  EXPORT_SYMBOL_GPL(devm_request_free_mem_region);
>  
>  struct resource *request_free_mem_region(struct resource *base,
>  		unsigned long size, const char *name)
>  {
> -	return __request_free_mem_region(NULL, base, size, name);
> +	unsigned long flags = GFR_DESCENDING | GFR_REQUEST_REGION;
> +
> +	return get_free_mem_region(NULL, base, size, GFR_DEFAULT_ALIGN, name,
> +				   IORES_DESC_DEVICE_PRIVATE_MEMORY, flags);
>  }
>  EXPORT_SYMBOL_GPL(request_free_mem_region);
>  
> -#endif /* CONFIG_DEVICE_PRIVATE */
> +/**
> + * alloc_free_mem_region - find a free region relative to @base
> + * @base: resource that will parent the new resource
> + * @size: size in bytes of memory to allocate from @base
> + * @align: alignment requirements for the allocation
> + * @name: resource name
> + *
> + * Buses like CXL, that can dynamically instantiate new memory regions,
> + * need a method to allocate physical address space for those regions.
> + * Allocate and insert a new resource to cover a free, unclaimed by a
> + * descendant of @base, range in the span of @base.
> + */
> +struct resource *alloc_free_mem_region(struct resource *base,
> +				       unsigned long size, unsigned long align,
> +				       const char *name)
> +{
> +	/* Default of ascending direction and insert resource */
> +	unsigned long flags = 0;
> +
> +	return get_free_mem_region(NULL, base, size, align, name,
> +				   IORES_DESC_NONE, flags);
> +}
> +EXPORT_SYMBOL_NS_GPL(alloc_free_mem_region, CXL);
> +#endif /* CONFIG_GET_FREE_REGION */
>  
>  static int __init strict_iomem(char *str)
>  {
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 169e64192e48..a5b4fee2e3fd 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -994,9 +994,14 @@ config HMM_MIRROR
>  	bool
>  	depends on MMU
>  
> +config GET_FREE_REGION
> +	depends on SPARSEMEM
> +	bool
> +
>  config DEVICE_PRIVATE
>  	bool "Unaddressable device memory (GPU memory, ...)"
>  	depends on ZONE_DEVICE
> +	select GET_FREE_REGION
>  
>  	help
>  	  Allows creation of struct pages to represent unaddressable device
> 



Thread overview: 67+ messages
2022-07-15  0:00 [PATCH v2 00/28] CXL PMEM Region Provisioning Dan Williams
2022-07-15  0:00 ` [PATCH v2 01/28] Documentation/cxl: Use a double line break between entries Dan Williams
2022-07-20 13:26   ` Jonathan Cameron
2022-07-15  0:00 ` [PATCH v2 02/28] cxl/core: Define a 'struct cxl_switch_decoder' Dan Williams
2022-07-15  2:57   ` kernel test robot
2022-07-20 15:39   ` Jonathan Cameron
2022-07-15  0:00 ` [PATCH v2 03/28] cxl/acpi: Track CXL resources in iomem_resource Dan Williams
2022-07-15  5:23   ` Greg Kroah-Hartman
2022-07-20 16:03   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 04/28] cxl/core: Define a 'struct cxl_root_decoder' Dan Williams
2022-07-20 16:07   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 05/28] cxl/core: Define a 'struct cxl_endpoint_decoder' Dan Williams
2022-07-20 16:11   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 06/28] cxl/hdm: Enumerate allocated DPA Dan Williams
2022-07-20 16:40   ` Jonathan Cameron
2022-07-21 15:29     ` Dan Williams
2022-07-15  0:01 ` [PATCH v2 07/28] cxl/hdm: Add 'mode' attribute to decoder objects Dan Williams
2022-07-15  0:01 ` [PATCH v2 08/28] cxl/hdm: Track next decoder to allocate Dan Williams
2022-07-20 16:45   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 09/28] cxl/hdm: Add support for allocating DPA to an endpoint decoder Dan Williams
2022-07-20 16:51   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 10/28] cxl/port: Record dport in endpoint references Dan Williams
2022-07-20 16:53   ` Jonathan Cameron
2022-07-15  0:01 ` [PATCH v2 11/28] cxl/port: Record parent dport when adding ports Dan Williams
2022-07-15  0:01 ` [PATCH v2 12/28] cxl/port: Move 'cxl_ep' references to an xarray per port Dan Williams
2022-07-15  0:01 ` [PATCH v2 13/28] cxl/port: Move dport tracking to an xarray Dan Williams
2022-07-20 16:56   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 14/28] cxl/hdm: Add sysfs attributes for interleave ways + granularity Dan Williams
2022-07-20 16:58   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 15/28] cxl/mem: Enumerate port targets before adding endpoints Dan Williams
2022-07-15  0:02 ` [PATCH v2 16/28] resource: Introduce alloc_free_mem_region() Dan Williams
2022-07-20 17:00   ` Jonathan Cameron
2022-07-21 16:10   ` Dan Williams [this message]
2022-07-21 16:10     ` Dan Williams
2022-07-21 16:10     ` [Nouveau] " Dan Williams
2022-09-06 13:25   ` Rogerio Alves
2022-07-15  0:02 ` [PATCH v2 17/28] cxl/region: Add region creation support Dan Williams
2022-07-20 17:16   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 18/28] cxl/region: Add a 'uuid' attribute Dan Williams
2022-07-20 17:18   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 19/28] cxl/region: Add interleave geometry attributes Dan Williams
2022-07-15  0:02 ` [PATCH v2 20/28] cxl/region: Allocate HPA capacity to regions Dan Williams
2022-07-20 17:20   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 21/28] cxl/region: Enable the assignment of endpoint decoders " Dan Williams
2022-07-15  3:28   ` kernel test robot
2022-07-20 17:26   ` Jonathan Cameron
2022-07-20 19:05     ` Dan Williams
2022-07-15  0:02 ` [PATCH v2 22/28] cxl/acpi: Add a host-bridge index lookup mechanism Dan Williams
2022-07-15  0:02 ` [PATCH v2 23/28] cxl/region: Attach endpoint decoders Dan Williams
2022-07-20 17:29   ` Jonathan Cameron
2022-07-15  0:02 ` [PATCH v2 24/28] cxl/region: Program target lists Dan Williams
2022-07-20 17:41   ` Jonathan Cameron
2022-07-21 16:56     ` Dan Williams
2022-07-15  0:03 ` [PATCH v2 25/28] cxl/hdm: Commit decoder state to hardware Dan Williams
2022-07-20 17:44   ` Jonathan Cameron
2022-07-15  0:03 ` [PATCH v2 26/28] cxl/region: Add region driver boiler plate Dan Williams
2022-07-15  0:03 ` [PATCH v2 27/28] cxl/pmem: Fix offline_nvdimm_bus() to offline by bridge Dan Williams
2022-07-20 17:46   ` Jonathan Cameron
2022-07-15  0:03 ` [PATCH v2 28/28] cxl/region: Introduce cxl_pmem_region objects Dan Williams
2022-07-20 18:05   ` Jonathan Cameron
2022-07-20 18:12 ` [PATCH v2 00/28] CXL PMEM Region Provisioning Jonathan Cameron
2022-07-21 18:34   ` Dan Williams
2022-07-21 14:59 ` Jonathan Cameron
2022-07-21 16:29   ` Dan Williams
2022-07-21 17:22     ` Jonathan Cameron
2022-07-16 19:55 [PATCH v2 21/28] cxl/region: Enable the assignment of endpoint decoders to regions kernel test robot
2022-07-18 11:32 ` Dan Carpenter
