linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Support memory hot-delete to boot memory
@ 2013-04-02 16:17 Toshi Kani
  2013-04-02 16:17 ` [PATCH 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Toshi Kani @ 2013-04-02 16:17 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, linuxram, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu, Toshi Kani

Memory hot-delete a memory range present at boot causes an
error message in __release_region(), such as:

 Trying to free nonexistent resource <0000000070000000-0000000077ffffff>

Hot-delete operation still continues since __release_region() is 
a void function, but the target memory range is not freed from
iomem_resource as the result.  This also leads a failure in a 
subsequent hot-add operation to the same memory range since the
address range is still in-use in iomem_resource.

This problem happens because the granularity of memory resource ranges
may be different between boot and hot-delete.  During bootup,
iomem_resource is set up from the boot descriptor table, such as EFI
Memory Table and e820.  Each resource entry usually covers the whole
contiguous memory range.  Hot-delete request, on the other hand, may
target to a particular range of memory resource, and its size can be
much smaller than the whole contiguous memory.  Since the existing
release interfaces like __release_region() require a requested region
to be exactly matched to a resource entry, they do not allow a partial
resource to be released.

This patchset introduces release_mem_region_adjustable() for memory
hot-delete operations, which allows releasing a partial memory range
and adjusts remaining resource accordingly.  This patchset makes no
changes to the existing interfaces since their restriction is still
valid for I/O resources.

---
Toshi Kani (3):
 resource: Add __adjust_resource() for internal use
 resource: Add release_mem_region_adjustable()
 mm: Change __remove_pages() to call release_mem_region_adjustable()

---
 include/linux/ioport.h |   2 +
 kernel/resource.c      | 122 +++++++++++++++++++++++++++++++++++++++++++------
 mm/memory_hotplug.c    |  11 ++++-
 3 files changed, 120 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] resource: Add __adjust_resource() for internal use
  2013-04-02 16:17 [PATCH 0/3] Support memory hot-delete to boot memory Toshi Kani
@ 2013-04-02 16:17 ` Toshi Kani
  2013-04-03  1:11   ` Yasuaki Ishimatsu
  2013-04-02 16:17 ` [PATCH 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
  2013-04-02 16:17 ` [PATCH 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
  2 siblings, 1 reply; 16+ messages in thread
From: Toshi Kani @ 2013-04-02 16:17 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, linuxram, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu, Toshi Kani

Added __adjust_resource(), which is called by adjust_resource()
internally after the resource_lock is held.  There is no interface
change to adjust_resource().  This change allows other functions
to call __adjust_resource() internally while the resource_lock is
held.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 kernel/resource.c |   35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 73f35d4..ae246f9 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -706,24 +706,13 @@ void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
 	write_unlock(&resource_lock);
 }
 
-/**
- * adjust_resource - modify a resource's start and size
- * @res: resource to modify
- * @start: new start value
- * @size: new size
- *
- * Given an existing resource, change its start and size to match the
- * arguments.  Returns 0 on success, -EBUSY if it can't fit.
- * Existing children of the resource are assumed to be immutable.
- */
-int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size)
+static int __adjust_resource(struct resource *res, resource_size_t start,
+				resource_size_t size)
 {
 	struct resource *tmp, *parent = res->parent;
 	resource_size_t end = start + size - 1;
 	int result = -EBUSY;
 
-	write_lock(&resource_lock);
-
 	if (!parent)
 		goto skip;
 
@@ -751,6 +740,26 @@ skip:
 	result = 0;
 
  out:
+	return result;
+}
+
+/**
+ * adjust_resource - modify a resource's start and size
+ * @res: resource to modify
+ * @start: new start value
+ * @size: new size
+ *
+ * Given an existing resource, change its start and size to match the
+ * arguments.  Returns 0 on success, -EBUSY if it can't fit.
+ * Existing children of the resource are assumed to be immutable.
+ */
+int adjust_resource(struct resource *res, resource_size_t start,
+			resource_size_t size)
+{
+	int result;
+
+	write_lock(&resource_lock);
+	result = __adjust_resource(res, start, size);
 	write_unlock(&resource_lock);
 	return result;
 }

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-02 16:17 [PATCH 0/3] Support memory hot-delete to boot memory Toshi Kani
  2013-04-02 16:17 ` [PATCH 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
@ 2013-04-02 16:17 ` Toshi Kani
  2013-04-03  1:26   ` Yasuaki Ishimatsu
                     ` (2 more replies)
  2013-04-02 16:17 ` [PATCH 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
  2 siblings, 3 replies; 16+ messages in thread
From: Toshi Kani @ 2013-04-02 16:17 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, linuxram, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu, Toshi Kani

Added release_mem_region_adjustable(), which releases a requested
region from a currently busy memory resource.  This interface
adjusts the matched memory resource accordingly if the requested
region does not match exactly but still fits into.

This new interface is intended for memory hot-delete.  During
bootup, memory resources are inserted from the boot descriptor
table, such as EFI Memory Table and e820.  Each memory resource
entry usually covers the whole contigous memory range.  Memory
hot-delete request, on the other hand, may target to a particular
range of memory resource, and its size can be much smaller than
the whole contiguous memory.  Since the existing release interfaces
like __release_region() require a requested region to be exactly
matched to a resource entry, they do not allow a partial resource
to be released.

There is no change to the existing interfaces since their restriction
is valid for I/O resources.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 include/linux/ioport.h |    2 +
 kernel/resource.c      |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 85ac9b9b..0fe1a82 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
 extern int __check_region(struct resource *, resource_size_t, resource_size_t);
 extern void __release_region(struct resource *, resource_size_t,
 				resource_size_t);
+extern int release_mem_region_adjustable(struct resource *, resource_size_t,
+				resource_size_t);
 
 static inline int __deprecated check_region(resource_size_t s,
 						resource_size_t n)
diff --git a/kernel/resource.c b/kernel/resource.c
index ae246f9..789f160 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1021,6 +1021,93 @@ void __release_region(struct resource *parent, resource_size_t start,
 }
 EXPORT_SYMBOL(__release_region);
 
+/**
+ * release_mem_region_adjustable - release a previously reserved memory region
+ * @parent: parent resource descriptor
+ * @start: resource start address
+ * @size: resource region size
+ *
+ * The requested region is released from a currently busy memory resource.
+ * It adjusts the matched busy memory resource accordingly if the requested
+ * region does not match exactly but still fits into.  Existing children of
+ * the busy memory resource must be immutable in this request.
+ *
+ * Note, when the busy memory resource gets split into two entries, the code
+ * assumes that all children remain in the lower address entry for simplicity.
+ * Enhance this logic when necessary.
+ */
+int release_mem_region_adjustable(struct resource *parent,
+			resource_size_t start, resource_size_t size)
+{
+	struct resource **p;
+	struct resource *res, *new;
+	resource_size_t end;
+	int ret = 0;
+
+	p = &parent->child;
+	end = start + size - 1;
+
+	write_lock(&resource_lock);
+
+	while ((res = *p)) {
+		if (res->start > start || res->end < end) {
+			p = &res->sibling;
+			continue;
+		}
+
+		if (!(res->flags & IORESOURCE_MEM)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		if (!(res->flags & IORESOURCE_BUSY)) {
+			p = &res->child;
+			continue;
+		}
+
+		if (res->start == start && res->end == end) {
+			/* free the whole entry */
+			*p = res->sibling;
+			kfree(res);
+		} else if (res->start == start && res->end != end) {
+			/* adjust the start */
+			ret = __adjust_resource(res, end+1,
+						res->end - end);
+		} else if (res->start != start && res->end == end) {
+			/* adjust the end */
+			ret = __adjust_resource(res, res->start,
+						start - res->start);
+		} else {
+			/* split into two entries */
+			new = kzalloc(sizeof(struct resource), GFP_KERNEL);
+			if (!new) {
+				ret = -ENOMEM;
+				break;
+			}
+			new->name = res->name;
+			new->start = end + 1;
+			new->end = res->end;
+			new->flags = res->flags;
+			new->parent = res->parent;
+			new->sibling = res->sibling;
+			new->child = NULL;
+
+			ret = __adjust_resource(res, res->start,
+						start - res->start);
+			if (ret) {
+				kfree(new);
+				break;
+			}
+			res->sibling = new;
+		}
+
+		break;
+	}
+
+	write_unlock(&resource_lock);
+	return ret;
+}
+
 /*
  * Managed region resource
  */

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()
  2013-04-02 16:17 [PATCH 0/3] Support memory hot-delete to boot memory Toshi Kani
  2013-04-02 16:17 ` [PATCH 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
  2013-04-02 16:17 ` [PATCH 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
@ 2013-04-02 16:17 ` Toshi Kani
  2013-04-03  1:28   ` Yasuaki Ishimatsu
  2 siblings, 1 reply; 16+ messages in thread
From: Toshi Kani @ 2013-04-02 16:17 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, linuxram, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu, Toshi Kani

Changed __remove_pages() to call release_mem_region_adjustable().
This allows a requested memory range to be released from
the iomem_resource table even if it does not match exactly to
an resource entry but still fits into.  The resource entries
initialized at bootup usually cover the whole contiguous
memory ranges and may not necessarily match with the size of
memory hot-delete requests.

If release_mem_region_adjustable() failed, __remove_pages() logs
an error message and continues to proceed as it was the case
with release_mem_region().  release_mem_region(), which is defined
to __release_region(), logs an error message and returns no error
since a void function.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 mm/memory_hotplug.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 57decb2..c916582 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -705,8 +705,10 @@ EXPORT_SYMBOL_GPL(__add_pages);
 int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 		 unsigned long nr_pages)
 {
-	unsigned long i, ret = 0;
+	unsigned long i;
 	int sections_to_remove;
+	resource_size_t start, size;
+	int ret = 0;
 
 	/*
 	 * We can only remove entire sections
@@ -714,7 +716,12 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 	BUG_ON(nr_pages % PAGES_PER_SECTION);
 
-	release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+	start = phys_start_pfn << PAGE_SHIFT;
+	size = nr_pages * PAGE_SIZE;
+	ret = release_mem_region_adjustable(&iomem_resource, start, size);
+	if (ret)
+		pr_warn("Unable to release resource <%016llx-%016llx> (%d)\n",
+				start, start + size - 1, ret);
 
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] resource: Add __adjust_resource() for internal use
  2013-04-02 16:17 ` [PATCH 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
@ 2013-04-03  1:11   ` Yasuaki Ishimatsu
  0 siblings, 0 replies; 16+ messages in thread
From: Yasuaki Ishimatsu @ 2013-04-03  1:11 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, linuxram, tmac, wency, tangchen, jiang.liu

Hi Toshi,

2013/04/03 1:17, Toshi Kani wrote:
> Added __adjust_resource(), which is called by adjust_resource()
> internally after the resource_lock is held.  There is no interface
> change to adjust_resource().  This change allows other functions
> to call __adjust_resource() internally while the resource_lock is
> held.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>

The patch looks good.
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu

> ---
>   kernel/resource.c |   35 ++++++++++++++++++++++-------------
>   1 file changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 73f35d4..ae246f9 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -706,24 +706,13 @@ void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
>   	write_unlock(&resource_lock);
>   }
>   
> -/**
> - * adjust_resource - modify a resource's start and size
> - * @res: resource to modify
> - * @start: new start value
> - * @size: new size
> - *
> - * Given an existing resource, change its start and size to match the
> - * arguments.  Returns 0 on success, -EBUSY if it can't fit.
> - * Existing children of the resource are assumed to be immutable.
> - */
> -int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size)
> +static int __adjust_resource(struct resource *res, resource_size_t start,
> +				resource_size_t size)
>   {
>   	struct resource *tmp, *parent = res->parent;
>   	resource_size_t end = start + size - 1;
>   	int result = -EBUSY;
>   
> -	write_lock(&resource_lock);
> -
>   	if (!parent)
>   		goto skip;
>   
> @@ -751,6 +740,26 @@ skip:
>   	result = 0;
>   
>    out:
> +	return result;
> +}
> +
> +/**
> + * adjust_resource - modify a resource's start and size
> + * @res: resource to modify
> + * @start: new start value
> + * @size: new size
> + *
> + * Given an existing resource, change its start and size to match the
> + * arguments.  Returns 0 on success, -EBUSY if it can't fit.
> + * Existing children of the resource are assumed to be immutable.
> + */
> +int adjust_resource(struct resource *res, resource_size_t start,
> +			resource_size_t size)
> +{
> +	int result;
> +
> +	write_lock(&resource_lock);
> +	result = __adjust_resource(res, start, size);
>   	write_unlock(&resource_lock);
>   	return result;
>   }
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-02 16:17 ` [PATCH 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
@ 2013-04-03  1:26   ` Yasuaki Ishimatsu
  2013-04-03 19:46     ` Toshi Kani
  2013-04-03  5:37   ` Ram Pai
  2013-04-03  7:37   ` Gu Zheng
  2 siblings, 1 reply; 16+ messages in thread
From: Yasuaki Ishimatsu @ 2013-04-03  1:26 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, linuxram, tmac, wency, tangchen, jiang.liu

2013/04/03 1:17, Toshi Kani wrote:
> Added release_mem_region_adjustable(), which releases a requested
> region from a currently busy memory resource.  This interface
> adjusts the matched memory resource accordingly if the requested
> region does not match exactly but still fits into.
> 
> This new interface is intended for memory hot-delete.  During
> bootup, memory resources are inserted from the boot descriptor
> table, such as EFI Memory Table and e820.  Each memory resource
> entry usually covers the whole contigous memory range.  Memory
> hot-delete request, on the other hand, may target to a particular
> range of memory resource, and its size can be much smaller than
> the whole contiguous memory.  Since the existing release interfaces
> like __release_region() require a requested region to be exactly
> matched to a resource entry, they do not allow a partial resource
> to be released.
> 
> There is no change to the existing interfaces since their restriction
> is valid for I/O resources.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>

The patch looks mostly good. One nitpick below.

> ---
>   include/linux/ioport.h |    2 +
>   kernel/resource.c      |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 89 insertions(+)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 85ac9b9b..0fe1a82 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
>   extern int __check_region(struct resource *, resource_size_t, resource_size_t);
>   extern void __release_region(struct resource *, resource_size_t,
>   				resource_size_t);
> +extern int release_mem_region_adjustable(struct resource *, resource_size_t,
> +				resource_size_t);
>   
>   static inline int __deprecated check_region(resource_size_t s,
>   						resource_size_t n)
> diff --git a/kernel/resource.c b/kernel/resource.c
> index ae246f9..789f160 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1021,6 +1021,93 @@ void __release_region(struct resource *parent, resource_size_t start,
>   }
>   EXPORT_SYMBOL(__release_region);
>   
> +/**
> + * release_mem_region_adjustable - release a previously reserved memory region
> + * @parent: parent resource descriptor
> + * @start: resource start address
> + * @size: resource region size
> + *
> + * The requested region is released from a currently busy memory resource.
> + * It adjusts the matched busy memory resource accordingly if the requested
> + * region does not match exactly but still fits into.  Existing children of
> + * the busy memory resource must be immutable in this request.
> + *
> + * Note, when the busy memory resource gets split into two entries, the code
> + * assumes that all children remain in the lower address entry for simplicity.
> + * Enhance this logic when necessary.
> + */
> +int release_mem_region_adjustable(struct resource *parent,
> +			resource_size_t start, resource_size_t size)
> +{
> +	struct resource **p;
> +	struct resource *res, *new;
> +	resource_size_t end;
> +	int ret = 0;
> +
> +	p = &parent->child;
> +	end = start + size - 1;
> +
> +	write_lock(&resource_lock);
> +
> +	while ((res = *p)) {
> +		if (res->start > start || res->end < end) {
> +			p = &res->sibling;
> +			continue;
> +		}
> +
> +		if (!(res->flags & IORESOURCE_MEM)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (!(res->flags & IORESOURCE_BUSY)) {
> +			p = &res->child;
> +			continue;
> +		}
> +
> +		if (res->start == start && res->end == end) {
> +			/* free the whole entry */
> +			*p = res->sibling;
> +			kfree(res);
> +		} else if (res->start == start && res->end != end) {
> +			/* adjust the start */
> +			ret = __adjust_resource(res, end+1,
                                                     end + 1,

Thanks,
Yasuaki Ishimatsu

> +						res->end - end);
> +		} else if (res->start != start && res->end == end) {
> +			/* adjust the end */
> +			ret = __adjust_resource(res, res->start,
> +						start - res->start);
> +		} else {
> +			/* split into two entries */
> +			new = kzalloc(sizeof(struct resource), GFP_KERNEL);
> +			if (!new) {
> +				ret = -ENOMEM;
> +				break;
> +			}
> +			new->name = res->name;
> +			new->start = end + 1;
> +			new->end = res->end;
> +			new->flags = res->flags;
> +			new->parent = res->parent;
> +			new->sibling = res->sibling;
> +			new->child = NULL;
> +
> +			ret = __adjust_resource(res, res->start,
> +						start - res->start);
> +			if (ret) {
> +				kfree(new);
> +				break;
> +			}
> +			res->sibling = new;
> +		}
> +
> +		break;
> +	}
> +
> +	write_unlock(&resource_lock);
> +	return ret;
> +}
> +
>   /*
>    * Managed region resource
>    */
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable()
  2013-04-02 16:17 ` [PATCH 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
@ 2013-04-03  1:28   ` Yasuaki Ishimatsu
  0 siblings, 0 replies; 16+ messages in thread
From: Yasuaki Ishimatsu @ 2013-04-03  1:28 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, linuxram, tmac, wency, tangchen, jiang.liu

Hi Toshi,

2013/04/03 1:17, Toshi Kani wrote:
> Changed __remove_pages() to call release_mem_region_adjustable().
> This allows a requested memory range to be released from
> the iomem_resource table even if it does not match exactly to
> an resource entry but still fits into.  The resource entries
> initialized at bootup usually cover the whole contiguous
> memory ranges and may not necessarily match with the size of
> memory hot-delete requests.
> 
> If release_mem_region_adjustable() failed, __remove_pages() logs
> an error message and continues to proceed as it was the case
> with release_mem_region().  release_mem_region(), which is defined
> to __release_region(), logs an error message and returns no error
> since a void function.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>

The patch looks good.
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu

> ---
>   mm/memory_hotplug.c |   11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 57decb2..c916582 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -705,8 +705,10 @@ EXPORT_SYMBOL_GPL(__add_pages);
>   int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>   		 unsigned long nr_pages)
>   {
> -	unsigned long i, ret = 0;
> +	unsigned long i;
>   	int sections_to_remove;
> +	resource_size_t start, size;
> +	int ret = 0;
>   
>   	/*
>   	 * We can only remove entire sections
> @@ -714,7 +716,12 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>   	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
>   	BUG_ON(nr_pages % PAGES_PER_SECTION);
>   
> -	release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
> +	start = phys_start_pfn << PAGE_SHIFT;
> +	size = nr_pages * PAGE_SIZE;
> +	ret = release_mem_region_adjustable(&iomem_resource, start, size);
> +	if (ret)
> +		pr_warn("Unable to release resource <%016llx-%016llx> (%d)\n",
> +				start, start + size - 1, ret);
>   
>   	sections_to_remove = nr_pages / PAGES_PER_SECTION;
>   	for (i = 0; i < sections_to_remove; i++) {
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-02 16:17 ` [PATCH 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
  2013-04-03  1:26   ` Yasuaki Ishimatsu
@ 2013-04-03  5:37   ` Ram Pai
  2013-04-03 19:55     ` Toshi Kani
  2013-04-03  7:37   ` Gu Zheng
  2 siblings, 1 reply; 16+ messages in thread
From: Ram Pai @ 2013-04-03  5:37 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu

On Tue, Apr 02, 2013 at 10:17:29AM -0600, Toshi Kani wrote:
> Added release_mem_region_adjustable(), which releases a requested
> region from a currently busy memory resource.  This interface
> adjusts the matched memory resource accordingly if the requested
> region does not match exactly but still fits into.
> 
> This new interface is intended for memory hot-delete.  During
> bootup, memory resources are inserted from the boot descriptor
> table, such as EFI Memory Table and e820.  Each memory resource
> entry usually covers the whole contigous memory range.  Memory
> hot-delete request, on the other hand, may target to a particular
> range of memory resource, and its size can be much smaller than
> the whole contiguous memory.  Since the existing release interfaces
> like __release_region() require a requested region to be exactly
> matched to a resource entry, they do not allow a partial resource
> to be released.
> 
> There is no change to the existing interfaces since their restriction
> is valid for I/O resources.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  include/linux/ioport.h |    2 +
>  kernel/resource.c      |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 89 insertions(+)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 85ac9b9b..0fe1a82 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
>  extern int __check_region(struct resource *, resource_size_t, resource_size_t);
>  extern void __release_region(struct resource *, resource_size_t,
>  				resource_size_t);
> +extern int release_mem_region_adjustable(struct resource *, resource_size_t,
> +				resource_size_t);
> 
>  static inline int __deprecated check_region(resource_size_t s,
>  						resource_size_t n)
> diff --git a/kernel/resource.c b/kernel/resource.c
> index ae246f9..789f160 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1021,6 +1021,93 @@ void __release_region(struct resource *parent, resource_size_t start,
>  }
>  EXPORT_SYMBOL(__release_region);
> 
> +/**
> + * release_mem_region_adjustable - release a previously reserved memory region
> + * @parent: parent resource descriptor
> + * @start: resource start address
> + * @size: resource region size
> + *
> + * The requested region is released from a currently busy memory resource.
> + * It adjusts the matched busy memory resource accordingly if the requested
> + * region does not match exactly but still fits into.  Existing children of
> + * the busy memory resource must be immutable in this request.
> + *
> + * Note, when the busy memory resource gets split into two entries, the code
> + * assumes that all children remain in the lower address entry for simplicity.
> + * Enhance this logic when necessary.
> + */
> +int release_mem_region_adjustable(struct resource *parent,
> +			resource_size_t start, resource_size_t size)
> +{
> +	struct resource **p;
> +	struct resource *res, *new;
> +	resource_size_t end;
> +	int ret = 0;
> +
> +	p = &parent->child;
> +	end = start + size - 1;
> +
> +	write_lock(&resource_lock);
> +
> +	while ((res = *p)) {
> +		if (res->start > start || res->end < end) {

This check looks sub-optimal; possbily wrong, to me.  if the res->start
is greater than 'start', then obviously its sibling's start will
also be greater than 'start'. So it will loop through all the
resources unnecesarily.  you might want something like

		if (start >= res->end) {
		
> +			p = &res->sibling;
> +			continue;
> +		}
> +
> +		if (!(res->flags & IORESOURCE_MEM)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (!(res->flags & IORESOURCE_BUSY)) {
> +			p = &res->child;
> +			continue;
> +		}
> +
> +		if (res->start == start && res->end == end) {
> +			/* free the whole entry */
> +			*p = res->sibling;
> +			kfree(res);

This is incomplete. the prev resource's sibling should now point to
this resource's sibling. The parent's child has to be updated if
this resource is the first child resource. no?

> +		} else if (res->start == start && res->end != end) {
> +			/* adjust the start */
> +			ret = __adjust_resource(res, end+1,
> +						res->end - end);
> +		} else if (res->start != start && res->end == end) {
> +			/* adjust the end */
> +			ret = __adjust_resource(res, res->start,
> +						start - res->start);
> +		} else {
> +			/* split into two entries */
> +			new = kzalloc(sizeof(struct resource), GFP_KERNEL);
> +			if (!new) {
> +				ret = -ENOMEM;
> +				break;
> +			}
> +			new->name = res->name;
> +			new->start = end + 1;
> +			new->end = res->end;
> +			new->flags = res->flags;
> +			new->parent = res->parent;
> +			new->sibling = res->sibling;
> +			new->child = NULL;
> +
> +			ret = __adjust_resource(res, res->start,
> +						start - res->start);
> +			if (ret) {
> +				kfree(new);
> +				break;
> +			}
> +			res->sibling = new;
> +		}
> +
> +		break;
> +	}
> +
> +	write_unlock(&resource_lock);
> +	return ret;
> +}
> +
>  /*
>   * Managed region resource
>   */

-- 
Ram Pai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-02 16:17 ` [PATCH 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
  2013-04-03  1:26   ` Yasuaki Ishimatsu
  2013-04-03  5:37   ` Ram Pai
@ 2013-04-03  7:37   ` Gu Zheng
  2013-04-03 19:58     ` Toshi Kani
  2 siblings, 1 reply; 16+ messages in thread
From: Gu Zheng @ 2013-04-03  7:37 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, linuxram, tmac, isimatu.yasuaki,
	wency, tangchen, jiang.liu

On 04/03/2013 12:17 AM, Toshi Kani wrote:

> Added release_mem_region_adjustable(), which releases a requested
> region from a currently busy memory resource.  This interface
> adjusts the matched memory resource accordingly if the requested
> region does not match exactly but still fits into.
> 
> This new interface is intended for memory hot-delete.  During
> bootup, memory resources are inserted from the boot descriptor
> table, such as EFI Memory Table and e820.  Each memory resource
> entry usually covers the whole contigous memory range.  Memory
> hot-delete request, on the other hand, may target to a particular
> range of memory resource, and its size can be much smaller than
> the whole contiguous memory.  Since the existing release interfaces
> like __release_region() require a requested region to be exactly
> matched to a resource entry, they do not allow a partial resource
> to be released.
> 
> There is no change to the existing interfaces since their restriction
> is valid for I/O resources.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  include/linux/ioport.h |    2 +
>  kernel/resource.c      |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 89 insertions(+)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 85ac9b9b..0fe1a82 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
>  extern int __check_region(struct resource *, resource_size_t, resource_size_t);
>  extern void __release_region(struct resource *, resource_size_t,
>  				resource_size_t);
> +extern int release_mem_region_adjustable(struct resource *, resource_size_t,
> +				resource_size_t);
>  
>  static inline int __deprecated check_region(resource_size_t s,
>  						resource_size_t n)
> diff --git a/kernel/resource.c b/kernel/resource.c
> index ae246f9..789f160 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1021,6 +1021,93 @@ void __release_region(struct resource *parent, resource_size_t start,
>  }
>  EXPORT_SYMBOL(__release_region);
>  
> +/**
> + * release_mem_region_adjustable - release a previously reserved memory region
> + * @parent: parent resource descriptor
> + * @start: resource start address
> + * @size: resource region size
> + *
> + * The requested region is released from a currently busy memory resource.
> + * It adjusts the matched busy memory resource accordingly if the requested
> + * region does not match exactly but still fits into.  Existing children of
> + * the busy memory resource must be immutable in this request.
> + *
> + * Note, when the busy memory resource gets split into two entries, the code
> + * assumes that all children remain in the lower address entry for simplicity.
> + * Enhance this logic when necessary.
> + */
> +int release_mem_region_adjustable(struct resource *parent,
> +			resource_size_t start, resource_size_t size)
> +{
> +	struct resource **p;
> +	struct resource *res, *new;
> +	resource_size_t end;
> +	int ret = 0;
> +
> +	p = &parent->child;
> +	end = start + size - 1;
> +
> +	write_lock(&resource_lock);
> +
> +	while ((res = *p)) {
> +		if (res->start > start || res->end < end) {
> +			p = &res->sibling;
> +			continue;
> +		}
> +
> +		if (!(res->flags & IORESOURCE_MEM)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (!(res->flags & IORESOURCE_BUSY)) {
> +			p = &res->child;
> +			continue;
> +		}
> +
> +		if (res->start == start && res->end == end) {
> +			/* free the whole entry */
> +			*p = res->sibling;
> +			kfree(res);
> +		} else if (res->start == start && res->end != end) {
> +			/* adjust the start */
> +			ret = __adjust_resource(res, end+1,
> +						res->end - end);
> +		} else if (res->start != start && res->end == end) {
> +			/* adjust the end */
> +			ret = __adjust_resource(res, res->start,
> +						start - res->start);
> +		} else {
> +			/* split into two entries */
> +			new = kzalloc(sizeof(struct resource), GFP_KERNEL);
> +			if (!new) {
> +				ret = -ENOMEM;
> +				break;
> +			}
> +			new->name = res->name;
> +			new->start = end + 1;
> +			new->end = res->end;
> +			new->flags = res->flags;
> +			new->parent = res->parent;
> +			new->sibling = res->sibling;
> +			new->child = NULL;
> +
> +			ret = __adjust_resource(res, res->start,
> +						start - res->start);
> +			if (ret) {
> +				kfree(new);
> +				break;
> +			}
> +			res->sibling = new;
> +		}
> +
> +		break;
> +	}
> +
> +	write_unlock(&resource_lock);
> +	return ret;
> +}
> +

Hi Toshi,
  What about the following small changes? Maybe it can make the code more rigorous~

Thanks,
Gu

int release_mem_region_adjustable(struct resource *parent,
                        resource_size_t start, resource_size_t size)
{
        struct resource **p;
        struct resource *res, *new;
        resource_size_t end;
        int ret = 0;

        end = start + size - 1;
        if ((start < parent->start) || (end > parent->end))
                return -EINVAL;

        p = &parent->child;

        write_lock(&resource_lock);

        while (res = *p) {
                if (res->start <= start && res->end >= end) {
                        if (!(res->flags & IORESOURCE_MEM)) {
                                ret = -EINVAL;
                                break;
                        }  

                        if (!(res->flags & IORESOURCE_BUSY)) {
                                p = &res->child;
                                continue;
                        }   

                        if (res->start == start && res->end == end) {
                                /* free the whole entry */
                                *p = res->sibling;
                                kfree(res);
                        } else if (res->start == start && res->end != end) {
                                /* adjust the start */
                                ret = __adjust_resource(res, end+1,
                                                res->end - end);
                        } else if (res->start != start && res->end == end) {
                                /* adjust the end */
                                ret = __adjust_resource(res, res->start,
                                                start - res->start);
                        } else {
                                /* split into two entries */
                                new = kzalloc(sizeof(struct resource), GFP_KERNEL);
                                if (!new) {
                                        ret = -ENOMEM;
                                        break;
                                }   
                                new->name = res->name;
                                new->start = end + 1;
                                new->end = res->end;
                                new->flags = res->flags;
                                new->parent = res->parent;
                                new->sibling = res->sibling;
                                new->child = NULL;

                                ret = __adjust_resource(res, res->start,
                                                start - res->start);
                                if (ret) {
                                        kfree(new);
                                        break;
                                }   
                                res->sibling = new;
                        }   
                        break;
                }   
                p = &res->sibling;
        }   

        write_unlock(&resource_lock);
        return ret;
}

>  /*
>   * Managed region resource
>   */
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-03  1:26   ` Yasuaki Ishimatsu
@ 2013-04-03 19:46     ` Toshi Kani
  0 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2013-04-03 19:46 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: akpm, linux-mm, linux-kernel, linuxram, tmac, wency, tangchen, jiang.liu

On Wed, 2013-04-03 at 10:26 +0900, Yasuaki Ishimatsu wrote:
> 2013/04/03 1:17, Toshi Kani wrote:
> > Added release_mem_region_adjustable(), which releases a requested
> > region from a currently busy memory resource.  This interface
> > adjusts the matched memory resource accordingly if the requested
> > region does not match exactly but still fits into.
> > 
> > This new interface is intended for memory hot-delete.  During
> > bootup, memory resources are inserted from the boot descriptor
> > table, such as EFI Memory Table and e820.  Each memory resource
> > entry usually covers the whole contigous memory range.  Memory
> > hot-delete request, on the other hand, may target to a particular
> > range of memory resource, and its size can be much smaller than
> > the whole contiguous memory.  Since the existing release interfaces
> > like __release_region() require a requested region to be exactly
> > matched to a resource entry, they do not allow a partial resource
> > to be released.
> > 
> > There is no change to the existing interfaces since their restriction
> > is valid for I/O resources.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> 
> The patch looks mostly good. One nitpick below.

Thanks for reviewing all 3 patches!

> > ---
> >   include/linux/ioport.h |    2 +
> >   kernel/resource.c      |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 89 insertions(+)
> > 
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index 85ac9b9b..0fe1a82 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
> >   extern int __check_region(struct resource *, resource_size_t, resource_size_t);
> >   extern void __release_region(struct resource *, resource_size_t,
> >   				resource_size_t);
> > +extern int release_mem_region_adjustable(struct resource *, resource_size_t,
> > +				resource_size_t);
> >   
> >   static inline int __deprecated check_region(resource_size_t s,
> >   						resource_size_t n)
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index ae246f9..789f160 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -1021,6 +1021,93 @@ void __release_region(struct resource *parent, resource_size_t start,
> >   }
> >   EXPORT_SYMBOL(__release_region);
> >   
> > +/**
> > + * release_mem_region_adjustable - release a previously reserved memory region
> > + * @parent: parent resource descriptor
> > + * @start: resource start address
> > + * @size: resource region size
> > + *
> > + * The requested region is released from a currently busy memory resource.
> > + * It adjusts the matched busy memory resource accordingly if the requested
> > + * region does not match exactly but still fits into.  Existing children of
> > + * the busy memory resource must be immutable in this request.
> > + *
> > + * Note, when the busy memory resource gets split into two entries, the code
> > + * assumes that all children remain in the lower address entry for simplicity.
> > + * Enhance this logic when necessary.
> > + */
> > +int release_mem_region_adjustable(struct resource *parent,
> > +			resource_size_t start, resource_size_t size)
> > +{
> > +	struct resource **p;
> > +	struct resource *res, *new;
> > +	resource_size_t end;
> > +	int ret = 0;
> > +
> > +	p = &parent->child;
> > +	end = start + size - 1;
> > +
> > +	write_lock(&resource_lock);
> > +
> > +	while ((res = *p)) {
> > +		if (res->start > start || res->end < end) {
> > +			p = &res->sibling;
> > +			continue;
> > +		}
> > +
> > +		if (!(res->flags & IORESOURCE_MEM)) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (!(res->flags & IORESOURCE_BUSY)) {
> > +			p = &res->child;
> > +			continue;
> > +		}
> > +
> > +		if (res->start == start && res->end == end) {
> > +			/* free the whole entry */
> > +			*p = res->sibling;
> > +			kfree(res);
> > +		} else if (res->start == start && res->end != end) {
> > +			/* adjust the start */
> > +			ret = __adjust_resource(res, end+1,
>                                                      end + 1,

Yes, I will update as you suggested. 

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-03  5:37   ` Ram Pai
@ 2013-04-03 19:55     ` Toshi Kani
  2013-04-04  6:48       ` Ram Pai
  0 siblings, 1 reply; 16+ messages in thread
From: Toshi Kani @ 2013-04-03 19:55 UTC (permalink / raw)
  To: Ram Pai
  Cc: akpm, linux-mm, linux-kernel, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu

On Wed, 2013-04-03 at 13:37 +0800, Ram Pai wrote:
> On Tue, Apr 02, 2013 at 10:17:29AM -0600, Toshi Kani wrote:
> > Added release_mem_region_adjustable(), which releases a requested
> > region from a currently busy memory resource.  This interface
> > adjusts the matched memory resource accordingly if the requested
> > region does not match exactly but still fits into.
> > 
> > This new interface is intended for memory hot-delete.  During
> > bootup, memory resources are inserted from the boot descriptor
> > table, such as EFI Memory Table and e820.  Each memory resource
> > entry usually covers the whole contigous memory range.  Memory
> > hot-delete request, on the other hand, may target to a particular
> > range of memory resource, and its size can be much smaller than
> > the whole contiguous memory.  Since the existing release interfaces
> > like __release_region() require a requested region to be exactly
> > matched to a resource entry, they do not allow a partial resource
> > to be released.
> > 
> > There is no change to the existing interfaces since their restriction
> > is valid for I/O resources.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  include/linux/ioport.h |    2 +
> >  kernel/resource.c      |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 89 insertions(+)
> > 
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index 85ac9b9b..0fe1a82 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
> >  extern int __check_region(struct resource *, resource_size_t, resource_size_t);
> >  extern void __release_region(struct resource *, resource_size_t,
> >  				resource_size_t);
> > +extern int release_mem_region_adjustable(struct resource *, resource_size_t,
> > +				resource_size_t);
> > 
> >  static inline int __deprecated check_region(resource_size_t s,
> >  						resource_size_t n)
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index ae246f9..789f160 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -1021,6 +1021,93 @@ void __release_region(struct resource *parent, resource_size_t start,
> >  }
> >  EXPORT_SYMBOL(__release_region);
> > 
> > +/**
> > + * release_mem_region_adjustable - release a previously reserved memory region
> > + * @parent: parent resource descriptor
> > + * @start: resource start address
> > + * @size: resource region size
> > + *
> > + * The requested region is released from a currently busy memory resource.
> > + * It adjusts the matched busy memory resource accordingly if the requested
> > + * region does not match exactly but still fits into.  Existing children of
> > + * the busy memory resource must be immutable in this request.
> > + *
> > + * Note, when the busy memory resource gets split into two entries, the code
> > + * assumes that all children remain in the lower address entry for simplicity.
> > + * Enhance this logic when necessary.
> > + */
> > +int release_mem_region_adjustable(struct resource *parent,
> > +			resource_size_t start, resource_size_t size)
> > +{
> > +	struct resource **p;
> > +	struct resource *res, *new;
> > +	resource_size_t end;
> > +	int ret = 0;
> > +
> > +	p = &parent->child;
> > +	end = start + size - 1;
> > +
> > +	write_lock(&resource_lock);
> > +
> > +	while ((res = *p)) {
> > +		if (res->start > start || res->end < end) {
> 
> This check looks sub-optimal; possbily wrong, to me.  if the res->start
> is greater than 'start', then obviously its sibling's start will
> also be greater than 'start'. So it will loop through all the
> resources unnecesarily.

I think this check is necessary to check if the requested range fits
into a resource.  It needs to check both sides to verify this.  I will
add some comment on this check.

>   you might want something like
> 
> 		if (start >= res->end) {

I agree that this list is sorted, so we can optimize an error case (i.e.
no matching entry is found) with an additional check.  I will add the
following check at the beginning of the while loop.  

                if (res->start >= end)
                        break;

I also realized that the function returns 0 when no matching entry is
found.  I will change it to return -EINVAL as well.  

> 		
> > +			p = &res->sibling;
> > +			continue;
> > +		}
> > +
> > +		if (!(res->flags & IORESOURCE_MEM)) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (!(res->flags & IORESOURCE_BUSY)) {
> > +			p = &res->child;
> > +			continue;
> > +		}
> > +
> > +		if (res->start == start && res->end == end) {
> > +			/* free the whole entry */
> > +			*p = res->sibling;
> > +			kfree(res);
> 
> This is incomplete. the prev resource's sibling should now point to
> this resource's sibling. The parent's child has to be updated if
> this resource is the first child resource. no?

If this resource is the first child, *p is set to &parent->child.  So,
it will update the parents' child.

Thanks!
-Toshi


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-03  7:37   ` Gu Zheng
@ 2013-04-03 19:58     ` Toshi Kani
  0 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2013-04-03 19:58 UTC (permalink / raw)
  To: Gu Zheng
  Cc: akpm, linux-mm, linux-kernel, linuxram, tmac, isimatu.yasuaki,
	wency, tangchen, jiang.liu

On Wed, 2013-04-03 at 15:37 +0800, Gu Zheng wrote:
> On 04/03/2013 12:17 AM, Toshi Kani wrote:
> 
> > Added release_mem_region_adjustable(), which releases a requested
> > region from a currently busy memory resource.  This interface
> > adjusts the matched memory resource accordingly if the requested
> > region does not match exactly but still fits into.
> > 
> > This new interface is intended for memory hot-delete.  During
> > bootup, memory resources are inserted from the boot descriptor
> > table, such as EFI Memory Table and e820.  Each memory resource
> > entry usually covers the whole contigous memory range.  Memory
> > hot-delete request, on the other hand, may target to a particular
> > range of memory resource, and its size can be much smaller than
> > the whole contiguous memory.  Since the existing release interfaces
> > like __release_region() require a requested region to be exactly
> > matched to a resource entry, they do not allow a partial resource
> > to be released.
> > 
> > There is no change to the existing interfaces since their restriction
> > is valid for I/O resources.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  include/linux/ioport.h |    2 +
> >  kernel/resource.c      |   87 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 89 insertions(+)
> > 
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index 85ac9b9b..0fe1a82 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
> >  extern int __check_region(struct resource *, resource_size_t, resource_size_t);
> >  extern void __release_region(struct resource *, resource_size_t,
> >  				resource_size_t);
> > +extern int release_mem_region_adjustable(struct resource *, resource_size_t,
> > +				resource_size_t);
> >  
> >  static inline int __deprecated check_region(resource_size_t s,
> >  						resource_size_t n)
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index ae246f9..789f160 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -1021,6 +1021,93 @@ void __release_region(struct resource *parent, resource_size_t start,
> >  }
> >  EXPORT_SYMBOL(__release_region);
> >  
> > +/**
> > + * release_mem_region_adjustable - release a previously reserved memory region
> > + * @parent: parent resource descriptor
> > + * @start: resource start address
> > + * @size: resource region size
> > + *
> > + * The requested region is released from a currently busy memory resource.
> > + * It adjusts the matched busy memory resource accordingly if the requested
> > + * region does not match exactly but still fits into.  Existing children of
> > + * the busy memory resource must be immutable in this request.
> > + *
> > + * Note, when the busy memory resource gets split into two entries, the code
> > + * assumes that all children remain in the lower address entry for simplicity.
> > + * Enhance this logic when necessary.
> > + */
> > +int release_mem_region_adjustable(struct resource *parent,
> > +			resource_size_t start, resource_size_t size)
> > +{
> > +	struct resource **p;
> > +	struct resource *res, *new;
> > +	resource_size_t end;
> > +	int ret = 0;
> > +
> > +	p = &parent->child;
> > +	end = start + size - 1;
> > +
> > +	write_lock(&resource_lock);
> > +
> > +	while ((res = *p)) {
> > +		if (res->start > start || res->end < end) {
> > +			p = &res->sibling;
> > +			continue;
> > +		}
> > +
> > +		if (!(res->flags & IORESOURCE_MEM)) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (!(res->flags & IORESOURCE_BUSY)) {
> > +			p = &res->child;
> > +			continue;
> > +		}
> > +
> > +		if (res->start == start && res->end == end) {
> > +			/* free the whole entry */
> > +			*p = res->sibling;
> > +			kfree(res);
> > +		} else if (res->start == start && res->end != end) {
> > +			/* adjust the start */
> > +			ret = __adjust_resource(res, end+1,
> > +						res->end - end);
> > +		} else if (res->start != start && res->end == end) {
> > +			/* adjust the end */
> > +			ret = __adjust_resource(res, res->start,
> > +						start - res->start);
> > +		} else {
> > +			/* split into two entries */
> > +			new = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > +			if (!new) {
> > +				ret = -ENOMEM;
> > +				break;
> > +			}
> > +			new->name = res->name;
> > +			new->start = end + 1;
> > +			new->end = res->end;
> > +			new->flags = res->flags;
> > +			new->parent = res->parent;
> > +			new->sibling = res->sibling;
> > +			new->child = NULL;
> > +
> > +			ret = __adjust_resource(res, res->start,
> > +						start - res->start);
> > +			if (ret) {
> > +				kfree(new);
> > +				break;
> > +			}
> > +			res->sibling = new;
> > +		}
> > +
> > +		break;
> > +	}
> > +
> > +	write_unlock(&resource_lock);
> > +	return ret;
> > +}
> > +
> 
> Hi Toshi,
>   What about the following small changes? Maybe it can make the code more rigorous~
> 
> Thanks,
> Gu
> 
> int release_mem_region_adjustable(struct resource *parent,
>                         resource_size_t start, resource_size_t size)
> {
>         struct resource **p;
>         struct resource *res, *new;
>         resource_size_t end;
>         int ret = 0;
> 
>         end = start + size - 1;
>         if ((start < parent->start) || (end > parent->end))
>                 return -EINVAL;

Sure, I will add this check.

> 
>         p = &parent->child;
> 
>         write_lock(&resource_lock);
> 
>         while (res = *p) {

gcc actually complains if I do not put parentheses "((res = *p))" here.

>                 if (res->start <= start && res->end >= end) {

Yes, this way works the same as well.  But, I'd prefer to avoid a big
if-statement in a loop in order to keep indent small.  I will add
comments to make sure that the code is easy to follow.

Thanks!
-Toshi

>                         if (!(res->flags & IORESOURCE_MEM)) {
>                                 ret = -EINVAL;
>                                 break;
>                         }  
> 
>                         if (!(res->flags & IORESOURCE_BUSY)) {
>                                 p = &res->child;
>                                 continue;
>                         }   
> 
>                         if (res->start == start && res->end == end) {
>                                 /* free the whole entry */
>                                 *p = res->sibling;
>                                 kfree(res);
>                         } else if (res->start == start && res->end != end) {
>                                 /* adjust the start */
>                                 ret = __adjust_resource(res, end+1,
>                                                 res->end - end);
>                         } else if (res->start != start && res->end == end) {
>                                 /* adjust the end */
>                                 ret = __adjust_resource(res, res->start,
>                                                 start - res->start);
>                         } else {
>                                 /* split into two entries */
>                                 new = kzalloc(sizeof(struct resource), GFP_KERNEL);
>                                 if (!new) {
>                                         ret = -ENOMEM;
>                                         break;
>                                 }   
>                                 new->name = res->name;
>                                 new->start = end + 1;
>                                 new->end = res->end;
>                                 new->flags = res->flags;
>                                 new->parent = res->parent;
>                                 new->sibling = res->sibling;
>                                 new->child = NULL;
> 
>                                 ret = __adjust_resource(res, res->start,
>                                                 start - res->start);
>                                 if (ret) {
>                                         kfree(new);
>                                         break;
>                                 }   
>                                 res->sibling = new;
>                         }   
>                         break;
>                 }   
>                 p = &res->sibling;
>         }   
> 
>         write_unlock(&resource_lock);
>         return ret;
> }
> 
> >  /*
> >   * Managed region resource
> >   */
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-03 19:55     ` Toshi Kani
@ 2013-04-04  6:48       ` Ram Pai
  2013-04-04 14:07         ` Toshi Kani
  0 siblings, 1 reply; 16+ messages in thread
From: Ram Pai @ 2013-04-04  6:48 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu

On Wed, Apr 03, 2013 at 01:55:05PM -0600, Toshi Kani wrote:
> On Wed, 2013-04-03 at 13:37 +0800, Ram Pai wrote:
> > On Tue, Apr 02, 2013 at 10:17:29AM -0600, Toshi Kani wrote:
> > > +	while ((res = *p)) {

...snip...

> > > +		if (res->start > start || res->end < end) {
> > 
> > This check looks sub-optimal; possbily wrong, to me.  if the res->start
> > is greater than 'start', then obviously its sibling's start will
> > also be greater than 'start'. So it will loop through all the
> > resources unnecesarily.
> 
> I think this check is necessary to check if the requested range fits
> into a resource.  It needs to check both sides to verify this.  I will
> add some comment on this check.
> 
> >   you might want something like
> > 
> > 		if (start >= res->end) {
> 
> I agree that this list is sorted, so we can optimize an error case (i.e.
> no matching entry is found) with an additional check.  I will add the
> following check at the beginning of the while loop.  
> 
>                 if (res->start >= end)
>                         break;
> 
> I also realized that the function returns 0 when no matching entry is
> found.  I will change it to return -EINVAL as well.  

ok. this will take care of it.

> 
> > 		
> > > +			p = &res->sibling;
> > > +			continue;
> > > +		}
> > > +
> > > +		if (!(res->flags & IORESOURCE_MEM)) {
> > > +			ret = -EINVAL;
> > > +			break;
> > > +		}
> > > +
> > > +		if (!(res->flags & IORESOURCE_BUSY)) {
> > > +			p = &res->child;
> > > +			continue;
> > > +		}
> > > +
> > > +		if (res->start == start && res->end == end) {
> > > +			/* free the whole entry */
> > > +			*p = res->sibling;
> > > +			kfree(res);
> > 
> > This is incomplete. the prev resource's sibling should now point to
> > this resource's sibling. The parent's child has to be updated if
> > this resource is the first child resource. no?
> 
> If this resource is the first child, *p is set to &parent->child.  So,
> it will update the parents' child.

But if the resource is not the parent's first child? will it update the
previous siblings ->sibling ?

-- 
Ram Pai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-04  6:48       ` Ram Pai
@ 2013-04-04 14:07         ` Toshi Kani
  2013-04-07  4:01           ` Ram Pai
  0 siblings, 1 reply; 16+ messages in thread
From: Toshi Kani @ 2013-04-04 14:07 UTC (permalink / raw)
  To: Ram Pai
  Cc: akpm, linux-mm, linux-kernel, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu

On Thu, 2013-04-04 at 14:48 +0800, Ram Pai wrote:
> On Wed, Apr 03, 2013 at 01:55:05PM -0600, Toshi Kani wrote:
> > On Wed, 2013-04-03 at 13:37 +0800, Ram Pai wrote:
> > > On Tue, Apr 02, 2013 at 10:17:29AM -0600, Toshi Kani wrote:
> > > > +	while ((res = *p)) {
> 
> ...snip...
> 
> > > > +		if (res->start > start || res->end < end) {
> > > 
> > > This check looks sub-optimal; possbily wrong, to me.  if the res->start
> > > is greater than 'start', then obviously its sibling's start will
> > > also be greater than 'start'. So it will loop through all the
> > > resources unnecesarily.
> > 
> > I think this check is necessary to check if the requested range fits
> > into a resource.  It needs to check both sides to verify this.  I will
> > add some comment on this check.
> > 
> > >   you might want something like
> > > 
> > > 		if (start >= res->end) {
> > 
> > I agree that this list is sorted, so we can optimize an error case (i.e.
> > no matching entry is found) with an additional check.  I will add the
> > following check at the beginning of the while loop.  
> > 
> >                 if (res->start >= end)
> >                         break;
> > 
> > I also realized that the function returns 0 when no matching entry is
> > found.  I will change it to return -EINVAL as well.  
> 
> ok. this will take care of it.
> 
> > 
> > > 		
> > > > +			p = &res->sibling;
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		if (!(res->flags & IORESOURCE_MEM)) {
> > > > +			ret = -EINVAL;
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		if (!(res->flags & IORESOURCE_BUSY)) {
> > > > +			p = &res->child;
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		if (res->start == start && res->end == end) {
> > > > +			/* free the whole entry */
> > > > +			*p = res->sibling;
> > > > +			kfree(res);
> > > 
> > > This is incomplete. the prev resource's sibling should now point to
> > > this resource's sibling. The parent's child has to be updated if
> > > this resource is the first child resource. no?
> > 
> > If this resource is the first child, *p is set to &parent->child.  So,
> > it will update the parents' child.
> 
> But if the resource is not the parent's first child? will it update the
> previous siblings ->sibling ?

Yes.  When it continues in the while loop, p is set to &res->sibling.
So, it will update the previous sibling's ->sibling.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-04 14:07         ` Toshi Kani
@ 2013-04-07  4:01           ` Ram Pai
  2013-04-08 14:24             ` Toshi Kani
  0 siblings, 1 reply; 16+ messages in thread
From: Ram Pai @ 2013-04-07  4:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, linux-mm, linux-kernel, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu

On Thu, Apr 04, 2013 at 08:07:44AM -0600, Toshi Kani wrote:
> On Thu, 2013-04-04 at 14:48 +0800, Ram Pai wrote:
> > On Wed, Apr 03, 2013 at 01:55:05PM -0600, Toshi Kani wrote:
> > > On Wed, 2013-04-03 at 13:37 +0800, Ram Pai wrote:
> > > > On Tue, Apr 02, 2013 at 10:17:29AM -0600, Toshi Kani wrote:
> > > > > +	while ((res = *p)) {
> > 
> > ...snip...
> > 
> > > > > +		if (res->start > start || res->end < end) {
> > > > 
> > > > This check looks sub-optimal; possbily wrong, to me.  if the res->start
> > > > is greater than 'start', then obviously its sibling's start will
> > > > also be greater than 'start'. So it will loop through all the
> > > > resources unnecesarily.
> > > 
> > > I think this check is necessary to check if the requested range fits
> > > into a resource.  It needs to check both sides to verify this.  I will
> > > add some comment on this check.
> > > 
> > > >   you might want something like
> > > > 
> > > > 		if (start >= res->end) {
> > > 
> > > I agree that this list is sorted, so we can optimize an error case (i.e.
> > > no matching entry is found) with an additional check.  I will add the
> > > following check at the beginning of the while loop.  
> > > 
> > >                 if (res->start >= end)
> > >                         break;
> > > 
> > > I also realized that the function returns 0 when no matching entry is
> > > found.  I will change it to return -EINVAL as well.  
> > 
> > ok. this will take care of it.
> > 
> > > 
> > > > 		
> > > > > +			p = &res->sibling;
> > > > > +			continue;
> > > > > +		}
> > > > > +
> > > > > +		if (!(res->flags & IORESOURCE_MEM)) {
> > > > > +			ret = -EINVAL;
> > > > > +			break;
> > > > > +		}
> > > > > +
> > > > > +		if (!(res->flags & IORESOURCE_BUSY)) {
> > > > > +			p = &res->child;
> > > > > +			continue;
> > > > > +		}
> > > > > +
> > > > > +		if (res->start == start && res->end == end) {
> > > > > +			/* free the whole entry */
> > > > > +			*p = res->sibling;
> > > > > +			kfree(res);
> > > > 
> > > > This is incomplete. the prev resource's sibling should now point to
> > > > this resource's sibling. The parent's child has to be updated if
> > > > this resource is the first child resource. no?
> > > 
> > > If this resource is the first child, *p is set to &parent->child.  So,
> > > it will update the parents' child.
> > 
> > But if the resource is not the parent's first child? will it update the
> > previous siblings ->sibling ?
> 
> Yes.  When it continues in the while loop, p is set to &res->sibling.
> So, it will update the previous sibling's ->sibling.

You are right. It does update the pointers correctly. I mis-read the
code.
RP


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()
  2013-04-07  4:01           ` Ram Pai
@ 2013-04-08 14:24             ` Toshi Kani
  0 siblings, 0 replies; 16+ messages in thread
From: Toshi Kani @ 2013-04-08 14:24 UTC (permalink / raw)
  To: Ram Pai
  Cc: akpm, linux-mm, linux-kernel, tmac, isimatu.yasuaki, wency,
	tangchen, jiang.liu

On Sun, 2013-04-07 at 12:01 +0800, Ram Pai wrote:
> On Thu, Apr 04, 2013 at 08:07:44AM -0600, Toshi Kani wrote:
> > On Thu, 2013-04-04 at 14:48 +0800, Ram Pai wrote:
> > > On Wed, Apr 03, 2013 at 01:55:05PM -0600, Toshi Kani wrote:
> > > > On Wed, 2013-04-03 at 13:37 +0800, Ram Pai wrote:
> > > > > On Tue, Apr 02, 2013 at 10:17:29AM -0600, Toshi Kani wrote:
> > > > > > +	while ((res = *p)) {
> > > 
> > > ...snip...
> > > 
> > > > > > +		if (res->start > start || res->end < end) {
> > > > > 
> > > > > This check looks sub-optimal; possbily wrong, to me.  if the res->start
> > > > > is greater than 'start', then obviously its sibling's start will
> > > > > also be greater than 'start'. So it will loop through all the
> > > > > resources unnecesarily.
> > > > 
> > > > I think this check is necessary to check if the requested range fits
> > > > into a resource.  It needs to check both sides to verify this.  I will
> > > > add some comment on this check.
> > > > 
> > > > >   you might want something like
> > > > > 
> > > > > 		if (start >= res->end) {
> > > > 
> > > > I agree that this list is sorted, so we can optimize an error case (i.e.
> > > > no matching entry is found) with an additional check.  I will add the
> > > > following check at the beginning of the while loop.  
> > > > 
> > > >                 if (res->start >= end)
> > > >                         break;
> > > > 
> > > > I also realized that the function returns 0 when no matching entry is
> > > > found.  I will change it to return -EINVAL as well.  
> > > 
> > > ok. this will take care of it.
> > > 
> > > > 
> > > > > 		
> > > > > > +			p = &res->sibling;
> > > > > > +			continue;
> > > > > > +		}
> > > > > > +
> > > > > > +		if (!(res->flags & IORESOURCE_MEM)) {
> > > > > > +			ret = -EINVAL;
> > > > > > +			break;
> > > > > > +		}
> > > > > > +
> > > > > > +		if (!(res->flags & IORESOURCE_BUSY)) {
> > > > > > +			p = &res->child;
> > > > > > +			continue;
> > > > > > +		}
> > > > > > +
> > > > > > +		if (res->start == start && res->end == end) {
> > > > > > +			/* free the whole entry */
> > > > > > +			*p = res->sibling;
> > > > > > +			kfree(res);
> > > > > 
> > > > > This is incomplete. the prev resource's sibling should now point to
> > > > > this resource's sibling. The parent's child has to be updated if
> > > > > this resource is the first child resource. no?
> > > > 
> > > > If this resource is the first child, *p is set to &parent->child.  So,
> > > > it will update the parents' child.
> > > 
> > > But if the resource is not the parent's first child? will it update the
> > > previous siblings ->sibling ?
> > 
> > Yes.  When it continues in the while loop, p is set to &res->sibling.
> > So, it will update the previous sibling's ->sibling.
> 
> You are right. It does update the pointers correctly. I mis-read the
> code.

No problem.  Thanks for reviewing it!
-Toshi


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-04-08 14:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-02 16:17 [PATCH 0/3] Support memory hot-delete to boot memory Toshi Kani
2013-04-02 16:17 ` [PATCH 1/3] resource: Add __adjust_resource() for internal use Toshi Kani
2013-04-03  1:11   ` Yasuaki Ishimatsu
2013-04-02 16:17 ` [PATCH 2/3] resource: Add release_mem_region_adjustable() Toshi Kani
2013-04-03  1:26   ` Yasuaki Ishimatsu
2013-04-03 19:46     ` Toshi Kani
2013-04-03  5:37   ` Ram Pai
2013-04-03 19:55     ` Toshi Kani
2013-04-04  6:48       ` Ram Pai
2013-04-04 14:07         ` Toshi Kani
2013-04-07  4:01           ` Ram Pai
2013-04-08 14:24             ` Toshi Kani
2013-04-03  7:37   ` Gu Zheng
2013-04-03 19:58     ` Toshi Kani
2013-04-02 16:17 ` [PATCH 3/3] mm: Change __remove_pages() to call release_mem_region_adjustable() Toshi Kani
2013-04-03  1:28   ` Yasuaki Ishimatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).