* [RFCv2] mm: page allocation for less fragmentation
@ 2015-03-25  2:39 ` Gioh Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Gioh Kim @ 2015-03-25  2:39 UTC (permalink / raw)
  To: akpm, mgorman, riel, hannes, rientjes, vdavydov, iamjoonsoo.kim
  Cc: linux-mm, linux-kernel, gunho.lee, Gioh Kim

My driver allocates more than 40MB of pages at a time via alloc_page() and
maps them into virtual address space. In total it uses 300~400MB of pages.

If I run a heavy load test for a few days on a system with 1GB of memory, I cannot allocate even order-3 pages
because of external fragmentation.

I thought I needed an anti-fragmentation solution for my driver,
but there is no allocation function that takes fragmentation into account.
Compaction does not help because it works only on movable pages, not unmovable pages.

This patch proposes an allocation function that allocates pages only from within the same pageblock.

I tested this patch as follows:

1. The driver allocates about 400MB, and then I run "cat /proc/pagetypeinfo;cat /proc/buddyinfo":

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone   Normal, type    Unmovable   3864    728    394    216    129     47     18      9      1      0      0
Node    0, zone   Normal, type  Reclaimable    902     96     68     17      3      0      1      0      0      0      0
Node    0, zone   Normal, type      Movable   5146    663    178     91     43     16      4      0      0      0      0
Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
Node 0, zone   Normal          135            3          124            2            0            0
Node 0, zone   Normal   9880   1489    647    332    177     64     24     10      1      1      1

2. The driver frees all pages and allocates them again with alloc_pages_compact().
This acts as a kind of compaction for the driver.
The following is the result of "cat /proc/pagetypeinfo;cat /proc/buddyinfo":

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone   Normal, type    Unmovable      8      5      1    432    272     91     37     11      1      0      0
Node    0, zone   Normal, type  Reclaimable    901     96     68     17      3      0      1      0      0      0      0
Node    0, zone   Normal, type      Movable   4790    776    192     91     43     16      4      0      0      0      0
Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
Node 0, zone   Normal          135            3          124            2            0            0
Node 0, zone   Normal   5693    877    266    544    320    108     43     12      1      1      1


I found that fragmentation decreased.

This patch is based on 3.16. It does not change any existing code, so it can be applied to any version.

Changelog since v1:
- change the page-order argument to a page count

Signed-off-by: Gioh Kim <gioh.kim@lge.com>
---
 mm/page_alloc.c |  167 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 86c9a72..e269030 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6646,3 +6646,170 @@ void dump_page(struct page *page, const char *reason)
 	dump_page_badflags(page, reason, 0);
 }
 EXPORT_SYMBOL(dump_page);
+
+static unsigned long alloc_freepages_block(unsigned long start_pfn,
+					   unsigned long end_pfn,
+					   int count,
+					   struct list_head *freelist)
+{
+	int total_alloc = 0;
+	struct page *cursor, *valid_page = NULL;
+
+	cursor = pfn_to_page(start_pfn);
+
+	/* Isolate free pages. */
+	for (; start_pfn < end_pfn; start_pfn++, cursor++) {
+		int alloc, i;
+		struct page *page = cursor;
+
+		if (!pfn_valid_within(start_pfn))
+			continue;
+
+		if (!valid_page)
+			valid_page = page;
+		if (!PageBuddy(page))
+			continue;
+
+		if (!PageBuddy(page))
+			continue;
+
+		/* allocate only low-order pages */
+		if (page_order(page) >= 3) {
+			start_pfn += (1 << page_order(page)) - 1;
+			cursor += (1 << page_order(page)) - 1;
+			continue;
+		}
+
+		/* Found a free pages, break it into order-0 pages */
+		alloc = split_free_page(page);
+
+		total_alloc += alloc;
+		for (i = 0; i < alloc; i++) {
+			list_add(&page->lru, freelist);
+			page++;
+		}
+
+		if (total_alloc >= count)
+			break;
+
+		if (alloc) {
+			start_pfn += alloc - 1;
+			cursor += alloc - 1;
+			continue;
+		}
+	}
+
+	return total_alloc;
+}
+
+static int rmqueue_compact(struct zone *zone, int nr_request,
+			   int migratetype, struct list_head *freepages)
+{
+	unsigned int current_order;
+	struct free_area *area;
+	struct page *page;
+	unsigned long block_start_pfn;	/* start of current pageblock */
+	unsigned long block_end_pfn;	/* end of current pageblock */
+	int total_alloc = 0;
+	unsigned long flags;
+	struct page *next;
+	int to_free = 0;
+	int nr_remain = nr_request;
+	int loop_count = 0;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	/* Find a page of the appropriate size in the preferred list */
+	current_order = 0;
+	page = NULL;
+	while (current_order <= pageblock_order) {
+		int alloc;
+
+		/* search all possible pages in each list? */
+		if (loop_count > (zone->managed_pages / (1 << current_order)))
+			goto next_order;
+		loop_count++;
+
+		area = &(zone->free_area[current_order]);
+
+		if (list_empty(&area->free_list[migratetype]))
+			goto next_order;
+
+		page = list_entry(area->free_list[migratetype].next,
+				  struct page, lru);
+
+		/*
+		 * check migratetype of pageblock again,
+		 * some pages can be set as different migratetype
+		 * by rmqueue_fallback
+		 */
+		if (get_pageblock_migratetype(page) != migratetype)
+			continue;
+
+		block_start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
+		block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
+				    zone_end_pfn(zone));
+
+		alloc = alloc_freepages_block(block_start_pfn,
+						 block_end_pfn,
+						 nr_remain,
+						 freepages);
+
+		total_alloc += alloc;
+		nr_remain -= alloc;
+
+		/*
+		 * alloc == 0: free buddy block is found but it is too big
+		 * or free buddy block is not valid page.
+		 * Try next order.
+		*/
+		if (alloc == 0)
+			goto next_order;
+
+		if (nr_remain <= 0)
+			break;
+
+next_order:
+		current_order++;
+		loop_count = 0;
+	}
+	__mod_zone_page_state(zone, NR_ALLOC_BATCH, -total_alloc);
+	__count_zone_vm_events(PGALLOC, zone, total_alloc);
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	list_for_each_entry_safe(page, next, freepages, lru) {
+		if (to_free >= nr_request) {
+			list_del(&page->lru);
+			atomic_dec(&page->_count);
+			__free_pages_ok(page, 0);
+		}
+		to_free++;
+	}
+
+	list_for_each_entry(page, freepages, lru) {
+		arch_alloc_page(page, 0);
+		kernel_map_pages(page, 1, 1);
+	}
+	return total_alloc < nr_request ? total_alloc : nr_request;
+}
+
+int alloc_pages_compact(gfp_t gfp_mask, int nr_request,
+			struct list_head *freepages)
+{
+	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
+	struct zone *preferred_zone;
+	struct zoneref *preferred_zoneref;
+
+	preferred_zoneref = first_zones_zonelist(node_zonelist(numa_node_id(),
+							       gfp_mask),
+						 high_zoneidx,
+						 &cpuset_current_mems_allowed,
+						 &preferred_zone);
+	if (!preferred_zone)
+		return 0;
+
+	return rmqueue_compact(preferred_zone, nr_request,
+			       allocflags_to_migratetype(gfp_mask), freepages);
+}
+EXPORT_SYMBOL(alloc_pages_compact);
-- 
1.7.9.5


* Re: [RFCv2] mm: page allocation for less fragmentation
  2015-03-25  2:39 ` Gioh Kim
@ 2015-03-25 10:56   ` Mel Gorman
  -1 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2015-03-25 10:56 UTC (permalink / raw)
  To: Gioh Kim
  Cc: akpm, riel, hannes, rientjes, vdavydov, iamjoonsoo.kim, linux-mm,
	linux-kernel, gunho.lee

On Wed, Mar 25, 2015 at 11:39:15AM +0900, Gioh Kim wrote:
> My driver allocates more than 40MB pages via alloc_page() at a time and
> maps them at virtual address. Totally it uses 300~400MB pages.
> 
> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
> because-of the external fragmentation.
> 
> I thought I needed a anti-fragmentation solution for my driver.
> But there is no allocation function that considers fragmentation.
> The compaction is not helpful because it is only for movable pages, not unmovable pages.
> 
> This patch proposes a allocation function allocates only pages in the same pageblock.
> 

Is this not what CMA is for? Or creating a MOVABLE zone?

-- 
Mel Gorman
SUSE Labs

* Re: [RFCv2] mm: page allocation for less fragmentation
  2015-03-25 10:56   ` Mel Gorman
@ 2015-03-25 21:16     ` Gioh Kim
  -1 siblings, 0 replies; 16+ messages in thread
From: Gioh Kim @ 2015-03-25 21:16 UTC (permalink / raw)
  To: Mel Gorman
  Cc: akpm, riel, hannes, rientjes, vdavydov, iamjoonsoo.kim, linux-mm,
	linux-kernel, gunho.lee



On 2015-03-25 at 7:56 PM, Mel Gorman wrote:
> On Wed, Mar 25, 2015 at 11:39:15AM +0900, Gioh Kim wrote:
>> My driver allocates more than 40MB pages via alloc_page() at a time and
>> maps them at virtual address. Totally it uses 300~400MB pages.
>>
>> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
>> because-of the external fragmentation.
>>
>> I thought I needed a anti-fragmentation solution for my driver.
>> But there is no allocation function that considers fragmentation.
>> The compaction is not helpful because it is only for movable pages, not unmovable pages.
>>
>> This patch proposes a allocation function allocates only pages in the same pageblock.
>>
>
> Is this not what CMA is for? Or creating a MOVABLE zone?

It's not related to CMA or the MOVABLE zone.
It's for compaction and anti-fragmentation in any zone.


>

* Re: [RFCv2] mm: page allocation for less fragmentation
  2015-03-25  2:39 ` Gioh Kim
@ 2015-03-25 22:16   ` Vlastimil Babka
  -1 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2015-03-25 22:16 UTC (permalink / raw)
  To: Gioh Kim, akpm, mgorman, riel, hannes, rientjes, vdavydov,
	iamjoonsoo.kim
  Cc: linux-mm, linux-kernel, gunho.lee

On 25.3.2015 3:39, Gioh Kim wrote:
> My driver allocates more than 40MB pages via alloc_page() at a time and
> maps them at virtual address. Totally it uses 300~400MB pages.
> 
> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
> because-of the external fragmentation.
> 
> I thought I needed a anti-fragmentation solution for my driver.
> But there is no allocation function that considers fragmentation.
> The compaction is not helpful because it is only for movable pages, not unmovable pages.
> 
> This patch proposes a allocation function allocates only pages in the same pageblock.
> 
> I tested this patch like following:
> 
> 1. When the driver allocates about 400MB and do "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
> 
> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
> Node    0, zone   Normal, type    Unmovable   3864    728    394    216    129     47     18      9      1      0      0
> Node    0, zone   Normal, type  Reclaimable    902     96     68     17      3      0      1      0      0      0      0
> Node    0, zone   Normal, type      Movable   5146    663    178     91     43     16      4      0      0      0      0
> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
> 
> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
> Node 0, zone   Normal          135            3          124            2            0            0
> Node 0, zone   Normal   9880   1489    647    332    177     64     24     10      1      1      1
> 
> 2. The driver frees all pages and allocates pages again with alloc_pages_compact.

This is not a good test setup. You shouldn't switch the allocation types during
a single system boot. You should compare results from a boot where common
allocation is used and from a boot where your new allocation is used.

> This is a kind of compaction of the driver.
> Following is the result of "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
> 
> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
> Node    0, zone   Normal, type    Unmovable      8      5      1    432    272     91     37     11      1      0      0
> Node    0, zone   Normal, type  Reclaimable    901     96     68     17      3      0      1      0      0      0      0
> Node    0, zone   Normal, type      Movable   4790    776    192     91     43     16      4      0      0      0      0
> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
> 
> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
> Node 0, zone   Normal          135            3          124            2            0            0
> Node 0, zone   Normal   5693    877    266    544    320    108     43     12      1      1      1

The number of unmovable pageblocks didn't change here. The stats for free
unmovable pages do look better for higher orders than in the first listing
above, but even the common allocation logic would give you that result, if you
allocated your 400 MB using (many) order-0 allocations (since you apparently
don't care about physically contiguous memory). That would also prefer order-0
free pages before splitting higher orders. So this doesn't demonstrate benefits
of the alloc_pages_compact() approach I'm afraid. The results suggest that the
system was in a worse state when the first allocation happened, and meanwhile
some pages were freed, creating the large numbers of order-0 unmovable free
pages. Or maybe the system got fragmented in the first allocation because your
driver tries to allocate the memory with high-order allocations before falling
back to lower orders? That would probably defeat the natural anti-fragmentation
of the buddy system.

So a proper test could be based on this:

> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
> because-of the external fragmentation.

With this patch, is the situation quantifiably better? Can you post the
pagetype/buddyinfo for system boot where all driver allocations use the common
allocator, and system boot with the patch? That should be comparable if the
workload is the same for both boots.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFCv2] mm: page allocation for less fragmentation
@ 2015-03-25 22:16   ` Vlastimil Babka
  0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2015-03-25 22:16 UTC (permalink / raw)
  To: Gioh Kim, akpm, mgorman, riel, hannes, rientjes, vdavydov,
	iamjoonsoo.kim
  Cc: linux-mm, linux-kernel, gunho.lee

On 25.3.2015 3:39, Gioh Kim wrote:
> My driver allocates more than 40MB pages via alloc_page() at a time and
> maps them at virtual address. Totally it uses 300~400MB pages.
> 
> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
> because-of the external fragmentation.
> 
> I thought I needed a anti-fragmentation solution for my driver.
> But there is no allocation function that considers fragmentation.
> The compaction is not helpful because it is only for movable pages, not unmovable pages.
> 
> This patch proposes a allocation function allocates only pages in the same pageblock.
> 
> I tested this patch like following:
> 
> 1. When the driver allocates about 400MB and do "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
> 
> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
> Node    0, zone   Normal, type    Unmovable   3864    728    394    216    129     47     18      9      1      0      0
> Node    0, zone   Normal, type  Reclaimable    902     96     68     17      3      0      1      0      0      0      0
> Node    0, zone   Normal, type      Movable   5146    663    178     91     43     16      4      0      0      0      0
> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
> 
> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
> Node 0, zone   Normal          135            3          124            2            0            0
> Node 0, zone   Normal   9880   1489    647    332    177     64     24     10      1      1      1
> 
> 2. The driver frees all pages and allocates pages again with alloc_pages_compact.

This is not a good test setup. You shouldn't switch the allocation types during
single system boot. You should compare results from a boot where common
allocation is used and from a boot where your new allocation is used.

> This is a kind of compaction of the driver.
> Following is the result of "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
> 
> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
> Node    0, zone   Normal, type    Unmovable      8      5      1    432    272     91     37     11      1      0      0
> Node    0, zone   Normal, type  Reclaimable    901     96     68     17      3      0      1      0      0      0      0
> Node    0, zone   Normal, type      Movable   4790    776    192     91     43     16      4      0      0      0      0
> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
> 
> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
> Node 0, zone   Normal          135            3          124            2            0            0
> Node 0, zone   Normal   5693    877    266    544    320    108     43     12      1      1      1

The number of unmovable pageblocks didn't change here. The stats for free
unmovable pages do look better for higher orders than in the first listing
above, but even the common allocation logic would give you that result, if you
allocated your 400 MB using (many) order-0 allocations (since you apparently
don't care about physically contiguous memory). That would also prefer order-0
free pages before splitting higher orders. So this doesn't demonstrate benefits
of the alloc_pages_compact() approach I'm afraid. The results suggest that the
system was in a worse state when the first allocation happened, and meanwhile
some pages were freed, creating the large numbers of order-0 unmovable free
pages. Or maybe the system got fragmented in the first allocation because your
driver tries to allocate the memory with high-order allocations before falling
back to lower orders? That would probably defeat the natural anti-fragmentation
of the buddy system.
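
To make the comparison of the two listings concrete, here is a minimal Python sketch (not part of the patch) that converts the two Unmovable rows into page counts. It assumes that the k-th count column is the number of free blocks of 2^k pages; the function names are made up for illustration.

```python
# Unmovable rows copied from the two /proc/pagetypeinfo listings in this thread.
FIRST = ("Node    0, zone   Normal, type    Unmovable   3864    728    394"
         "    216    129     47     18      9      1      0      0")
SECOND = ("Node    0, zone   Normal, type    Unmovable      8      5      1"
          "    432    272     91     37     11      1      0      0")


def free_counts(row):
    """Return the per-order free block counts that follow 'type <name>'."""
    fields = row.split()
    start = fields.index("type") + 2  # skip the 'type' keyword and its name
    return [int(f) for f in fields[start:]]


def summarize(row, min_order=3):
    """Return (total free pages, free pages held in blocks >= min_order)."""
    # A free block of order k covers 2**k pages (assumption stated above).
    pages = [count << order for order, count in enumerate(free_counts(row))]
    return sum(pages), sum(pages[min_order:])


if __name__ == "__main__":
    for label, row in (("first listing", FIRST), ("second listing", SECOND)):
        total, high = summarize(row)
        print(f"{label}: {total} free pages, {high} in order>=3 blocks")
```

For these two rows the total number of free unmovable pages comes out nearly the same; what differs is how much of it sits in blocks of order 3 or higher. That by itself would not distinguish the two allocators.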

So a proper test could be based on this:

> If I run a heavy load test for a few days in 1GB memory system, I cannot
> allocate even order=3 pages
> because-of the external fragmentation.

With this patch, is the situation quantifiably better? Can you post the
pagetypeinfo/buddyinfo for a system boot where all driver allocations use the
common allocator, and for a system boot with the patch? That should be
comparable if the workload is the same for both boots.
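
One way to do such a comparison is to save /proc/buddyinfo from each boot and look at the per-order difference. The following is a rough Python sketch under that assumption; the helper names are hypothetical, and the two sample lines are simply the buddyinfo lines already quoted in this thread, used as input data, not results from two separate boots.

```python
# Sample input only: the two /proc/buddyinfo lines quoted in this thread.
FIRST_LISTING = ("Node 0, zone   Normal   9880   1489    647    332    177"
                 "     64     24     10      1      1      1")
SECOND_LISTING = ("Node 0, zone   Normal   5693    877    266    544    320"
                  "    108     43     12      1      1      1")


def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, one entry per order."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Node <n>, zone <name> <count order 0> <count order 1> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        result[(node, fields[3])] = [int(f) for f in fields[4:]]
    return result


def order_deltas(baseline, candidate):
    """Per-order change in free block counts (candidate minus baseline)."""
    deltas = {}
    for key, base in baseline.items():
        if key in candidate:
            deltas[key] = [c - b for b, c in zip(base, candidate[key])]
    return deltas


if __name__ == "__main__":
    deltas = order_deltas(parse_buddyinfo(FIRST_LISTING),
                          parse_buddyinfo(SECOND_LISTING))
    for (node, zone), row in deltas.items():
        for order, delta in enumerate(row):
            print(f"node {node} zone {zone} order {order}: {delta:+d}")
```

In a real comparison the two inputs would be snapshots taken at the same point of the same workload, one from each boot.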

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFCv2] mm: page allocation for less fragmentation
  2015-03-25 22:16   ` Vlastimil Babka
@ 2015-03-25 23:25     ` Gioh Kim
  -1 siblings, 0 replies; 16+ messages in thread
From: Gioh Kim @ 2015-03-25 23:25 UTC (permalink / raw)
  To: Vlastimil Babka, akpm, mgorman, riel, hannes, rientjes, vdavydov,
	iamjoonsoo.kim
  Cc: linux-mm, linux-kernel, gunho.lee



On 2015-03-26 at 7:16 AM, Vlastimil Babka wrote:
> On 25.3.2015 3:39, Gioh Kim wrote:
>> My driver allocates more than 40MB pages via alloc_page() at a time and
>> maps them at virtual address. Totally it uses 300~400MB pages.
>>
>> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
>> because-of the external fragmentation.
>>
>> I thought I needed a anti-fragmentation solution for my driver.
>> But there is no allocation function that considers fragmentation.
>> The compaction is not helpful because it is only for movable pages, not unmovable pages.
>>
>> This patch proposes a allocation function allocates only pages in the same pageblock.
>>
>> I tested this patch like following:
>>
>> 1. When the driver allocates about 400MB and do "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>>
>> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
>> Node    0, zone   Normal, type    Unmovable   3864    728    394    216    129     47     18      9      1      0      0
>> Node    0, zone   Normal, type  Reclaimable    902     96     68     17      3      0      1      0      0      0      0
>> Node    0, zone   Normal, type      Movable   5146    663    178     91     43     16      4      0      0      0      0
>> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
>> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
>> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
>>
>> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
>> Node 0, zone   Normal          135            3          124            2            0            0
>> Node 0, zone   Normal   9880   1489    647    332    177     64     24     10      1      1      1
>>
>> 2. The driver frees all pages and allocates pages again with alloc_pages_compact.
>
> This is not a good test setup. You shouldn't switch the allocation types during
> a single system boot. You should compare results from a boot where common
> allocation is used and from a boot where your new allocation is used.

The new allocator is slower, so I don't think it can replace the current allocator.
I don't aim to change the general allocator.
The main purpose of the new allocator is to serve as a special-purpose allocator when the system has too much fragmentation.
If a driver consumes a lot of memory and generates fragmentation, it can use the new allocator instead at that time.
I want to provide a kind of compaction for drivers that allocate unmovable pages.

That is why I tested it that way.
I first generated fragmentation and then called the new allocator.
I wanted to check whether the fragmentation was caused by my driver
and whether the driver's pages could be compacted.
I thought the pages were compacted.

If I freed the pages and called the common allocator again,
it could decrease fragmentation a little (not as much as the new allocator).
But there would be no page compaction, and fragmentation would soon increase again.


>
>> This is a kind of compaction of the driver.
>> Following is the result of "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>>
>> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
>> Node    0, zone   Normal, type    Unmovable      8      5      1    432    272     91     37     11      1      0      0
>> Node    0, zone   Normal, type  Reclaimable    901     96     68     17      3      0      1      0      0      0      0
>> Node    0, zone   Normal, type      Movable   4790    776    192     91     43     16      4      0      0      0      0
>> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
>> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
>> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
>>
>> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
>> Node 0, zone   Normal          135            3          124            2            0            0
>> Node 0, zone   Normal   5693    877    266    544    320    108     43     12      1      1      1
>
> The number of unmovable pageblocks didn't change here. The stats for free
> unmovable pages do look better for higher orders than in the first listing
> above, but even the common allocation logic would give you that result, if you
> allocated your 400 MB using (many) order-0 allocations (since you apparently
> don't care about physically contiguous memory). That would also prefer order-0
> free pages before splitting higher orders. So this doesn't demonstrate benefits
> of the alloc_pages_compact() approach I'm afraid. The results suggest that the
> system was in a worse state when the first allocation happened, and meanwhile
> some pages were freed, creating the large numbers of order-0 unmovable free
> pages. Or maybe the system got fragmented in the first allocation because your
> driver tries to allocate the memory with high-order allocations before falling
> back to lower orders? That would probably defeat the natural anti-fragmentation
> of the buddy system.

My driver allocates pages only with alloc_page(), not with high-order alloc_pages().

Yes, if I freed the pages and called alloc_page() again, it could decrease fragmentation at that time.
But there would be no compaction, and fragmentation would soon increase again,
because the allocated pages were scattered all over the system.

The new allocator compacts pages. I believe it can decrease fragmentation for a long time.

>
> So a proper test could be based on this:
>
>> If I run a heavy load test for a few days in 1GB memory system, I cannot
>> allocate even order=3 pages
>> because-of the external fragmentation.
>
> With this patch, is the situation quantifiably better? Can you post the
> pagetypeinfo/buddyinfo for a system boot where all driver allocations use the
> common allocator, and for a system boot with the patch? That should be
> comparable if the workload is the same for both boots.
>

OK, I will. It can be a good test.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFCv2] mm: page allocation for less fragmentation
  2015-03-25 21:16     ` Gioh Kim
@ 2015-03-26 10:28       ` Mel Gorman
  -1 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2015-03-26 10:28 UTC (permalink / raw)
  To: Gioh Kim
  Cc: akpm, riel, hannes, rientjes, vdavydov, iamjoonsoo.kim, linux-mm,
	linux-kernel, gunho.lee

On Thu, Mar 26, 2015 at 06:16:22AM +0900, Gioh Kim wrote:
> 
> 
> On 2015-03-25 at 7:56 PM, Mel Gorman wrote:
> >On Wed, Mar 25, 2015 at 11:39:15AM +0900, Gioh Kim wrote:
> >>My driver allocates more than 40MB pages via alloc_page() at a time and
> >>maps them at virtual address. Totally it uses 300~400MB pages.
> >>
> >>If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
> >>because-of the external fragmentation.
> >>
> >>I thought I needed a anti-fragmentation solution for my driver.
> >>But there is no allocation function that considers fragmentation.
> >>The compaction is not helpful because it is only for movable pages, not unmovable pages.
> >>
> >>This patch proposes a allocation function allocates only pages in the same pageblock.
> >>
> >
> >Is this not what CMA is for? Or creating a MOVABLE zone?
> 
> It's not related to CMA and MOVABLE zone.
> It's for compaction and anti-fragmentation for any zone.
> 

Create a CMA area, allow your driver to use it using alloc_contig_range.
As it is, this is creating another contiguous range allocation function
with no in-kernel users.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFCv2] mm: page allocation for less fragmentation
  2015-03-26 10:28       ` Mel Gorman
@ 2015-03-27  0:51         ` Gioh Kim
  -1 siblings, 0 replies; 16+ messages in thread
From: Gioh Kim @ 2015-03-27  0:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: akpm, riel, hannes, rientjes, vdavydov, iamjoonsoo.kim, linux-mm,
	linux-kernel, gunho.lee



On 2015-03-26 at 7:28 PM, Mel Gorman wrote:
> On Thu, Mar 26, 2015 at 06:16:22AM +0900, Gioh Kim wrote:
>>
>>
>> On 2015-03-25 at 7:56 PM, Mel Gorman wrote:
>>> On Wed, Mar 25, 2015 at 11:39:15AM +0900, Gioh Kim wrote:
>>>> My driver allocates more than 40MB pages via alloc_page() at a time and
>>>> maps them at virtual address. Totally it uses 300~400MB pages.
>>>>
>>>> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
>>>> because-of the external fragmentation.
>>>>
>>>> I thought I needed a anti-fragmentation solution for my driver.
>>>> But there is no allocation function that considers fragmentation.
>>>> The compaction is not helpful because it is only for movable pages, not unmovable pages.
>>>>
>>>> This patch proposes a allocation function allocates only pages in the same pageblock.
>>>>
>>>
>>> Is this not what CMA is for? Or creating a MOVABLE zone?
>>
>> It's not related to CMA and MOVABLE zone.
>> It's for compaction and anti-fragmentation for any zone.
>>
>
> Create a CMA area, allow your driver to use it using alloc_contig_range.
> As it is, this is creating another contiguous range allocation function
> with no in-kernel users.
>

I'm sorry, but I cannot follow your point.
I don't think this is contiguous range allocation.
And CMA is not suitable for my driver, because the driver needs fast allocation.

I could move pages into a CMA area if I needed high-order pages.
But the pages are of the unmovable type, so they would pin the CMA area.

Please let me explain my problem again.
I've been suffering for years from fragmentation caused by unmovable pages.
Many of them come from graphics drivers such as the GPU and the encoder/decoder.
Current kernel compaction is not sufficient in this situation.
The graphics memory of the embedded systems I work on, such as TVs and phones, keeps growing.
For instance, my platform has 1GB of memory, and 300MB~400MB of it is consumed by graphics processing.
There are two reasons:
1. The CPU and GPU share memory.
2. Screen size (resolution) keeps growing, so icon and UX images are also getting bigger.

Therefore I don't need contiguous pages; I need page allocation for unmovable pages that causes less fragmentation.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFCv2] mm: page allocation for less fragmentation
  2015-03-25 23:25     ` Gioh Kim
@ 2015-04-01 12:05       ` Vlastimil Babka
  -1 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2015-04-01 12:05 UTC (permalink / raw)
  To: Gioh Kim, akpm, mgorman, riel, hannes, rientjes, vdavydov,
	iamjoonsoo.kim
  Cc: linux-mm, linux-kernel, gunho.lee

On 03/26/2015 12:25 AM, Gioh Kim wrote:
>
>
> On 2015-03-26 at 7:16 AM, Vlastimil Babka wrote:
>> On 25.3.2015 3:39, Gioh Kim wrote:
>>> My driver allocates more than 40MB pages via alloc_page() at a time and
>>> maps them at virtual address. Totally it uses 300~400MB pages.
>>>
>>> If I run a heavy load test for a few days in 1GB memory system, I cannot allocate even order=3 pages
>>> because-of the external fragmentation.
>>>
>>> I thought I needed a anti-fragmentation solution for my driver.
>>> But there is no allocation function that considers fragmentation.
>>> The compaction is not helpful because it is only for movable pages, not unmovable pages.
>>>
>>> This patch proposes a allocation function allocates only pages in the same pageblock.
>>>
>>> I tested this patch like following:
>>>
>>> 1. When the driver allocates about 400MB and do "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>>>
>>> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
>>> Node    0, zone   Normal, type    Unmovable   3864    728    394    216    129     47     18      9      1      0      0
>>> Node    0, zone   Normal, type  Reclaimable    902     96     68     17      3      0      1      0      0      0      0
>>> Node    0, zone   Normal, type      Movable   5146    663    178     91     43     16      4      0      0      0      0
>>> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
>>> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
>>> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
>>>
>>> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
>>> Node 0, zone   Normal          135            3          124            2            0            0
>>> Node 0, zone   Normal   9880   1489    647    332    177     64     24     10      1      1      1
>>>
>>> 2. The driver frees all pages and allocates pages again with alloc_pages_compact.
>>
>> This is not a good test setup. You shouldn't switch the allocation types during
>> a single system boot. You should compare results from a boot where common
>> allocation is used and from a boot where your new allocation is used.
>
> The new allocator is slower, so I don't think it can replace the current allocator.
> I don't aim to change the general allocator.

I'm not saying you should replace the current allocator for everything. Use it
just for your driver, that's fine. But when you perform/simulate your
driver allocation, use either the general allocator or the new
allocator; don't change from one to the other during a single boot.

> The main purpose of the new allocator is to serve as a special-purpose allocator when the system has too much fragmentation.
> If a driver consumes a lot of memory and generates fragmentation, it can use the new allocator instead at that time.
> I want to provide a kind of compaction for drivers that allocate unmovable pages.
>
> That is why I tested it that way.
> I first generated fragmentation and then called the new allocator.
> I wanted to check whether the fragmentation was caused by my driver
> and whether the driver's pages could be compacted.
> I thought the pages were compacted.
>
> If I freed the pages and called the common allocator again,
> it could decrease fragmentation a little (not as much as the new allocator).
> But there would be no page compaction, and fragmentation would soon increase again.

Yes, we need data comparing the common and new allocators in the same scenario.
Presumably that's what you have in the v3 submission.

>
>
>>
>>> This is a kind of compaction of the driver.
>>> Following is the result of "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>>>
>>> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
>>> Node    0, zone   Normal, type    Unmovable      8      5      1    432    272     91     37     11      1      0      0
>>> Node    0, zone   Normal, type  Reclaimable    901     96     68     17      3      0      1      0      0      0      0
>>> Node    0, zone   Normal, type      Movable   4790    776    192     91     43     16      4      0      0      0      0
>>> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
>>> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
>>> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
>>>
>>> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
>>> Node 0, zone   Normal          135            3          124            2            0            0
>>> Node 0, zone   Normal   5693    877    266    544    320    108     43     12      1      1      1
>>
>> The number of unmovable pageblocks didn't change here. The stats for free
>> unmovable pages do look better for higher orders than in the first listing
>> above, but even the common allocation logic would give you that result, if you
>> allocated your 400 MB using (many) order-0 allocations (since you apparently
>> don't care about physically contiguous memory). That would also prefer order-0
>> free pages before splitting higher orders. So this doesn't demonstrate benefits
>> of the alloc_pages_compact() approach I'm afraid. The results suggest that the
>> system was in a worse state when the first allocation happened, and meanwhile
>> some pages were freed, creating the large numbers of order-0 unmovable free
>> pages. Or maybe the system got fragmented in the first allocation because your
>> driver tries to allocate the memory with high-order allocations before falling
>> back to lower orders? That would probably defeat the natural anti-fragmentation
>> of the buddy system.
>
> My driver allocates pages only with alloc_page(), not with high-order alloc_pages().
>
> Yes, if I freed the pages and called alloc_page() again, it could decrease fragmentation at that time.
> But there would be no compaction, and fragmentation would soon increase again,
> because the allocated pages were scattered all over the system.
>
> The new allocator compacts pages. I believe it can decrease fragmentation for a long time.

If that's what v3 shows, ok. Let me check.

>>
>> So a proper test could be based on this:
>>
>>> If I run a heavy load test for a few days in 1GB memory system, I cannot
>>> allocate even order=3 pages
>>> because-of the external fragmentation.
>>
>> With this patch, is the situation quantifiably better? Can you post the
>> pagetypeinfo/buddyinfo for a system boot where all driver allocations use the
>> common allocator, and for a system boot with the patch? That should be
>> comparable if the workload is the same for both boots.
>>
>
> OK, I will. It can be a good test.
>
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFCv2] mm: page allocation for less fragmentation
@ 2015-04-01 12:05       ` Vlastimil Babka
  0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2015-04-01 12:05 UTC (permalink / raw)
  To: Gioh Kim, akpm, mgorman, riel, hannes, rientjes, vdavydov,
	iamjoonsoo.kim
  Cc: linux-mm, linux-kernel, gunho.lee

On 03/26/2015 12:25 AM, Gioh Kim wrote:
>
>
> On 2015-03-26 at 7:16 AM, Vlastimil Babka wrote:
>> On 25.3.2015 3:39, Gioh Kim wrote:
>>> My driver allocates more than 40MB of pages via alloc_page() at a time and
>>> maps them to virtual addresses. In total it uses 300~400MB of pages.
>>>
>>> If I run a heavy load test for a few days on a system with 1GB of memory, I cannot allocate even order=3 pages
>>> because of the external fragmentation.
>>>
>>> I thought I needed an anti-fragmentation solution for my driver.
>>> But there is no allocation function that considers fragmentation.
>>> Compaction is not helpful because it works only on movable pages, not unmovable pages.
>>>
>>> This patch proposes an allocation function that allocates only pages in the same pageblock.
>>>
>>> I tested this patch as follows:
>>>
>>> 1. When the driver allocates about 400MB, the output of "cat /proc/pagetypeinfo;cat /proc/buddyinfo" is:
>>>
>>> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
>>> Node    0, zone   Normal, type    Unmovable   3864    728    394    216    129     47     18      9      1      0      0
>>> Node    0, zone   Normal, type  Reclaimable    902     96     68     17      3      0      1      0      0      0      0
>>> Node    0, zone   Normal, type      Movable   5146    663    178     91     43     16      4      0      0      0      0
>>> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
>>> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
>>> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
>>>
>>> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
>>> Node 0, zone   Normal          135            3          124            2            0            0
>>> Node 0, zone   Normal   9880   1489    647    332    177     64     24     10      1      1      1
>>>
>>> 2. The driver frees all pages and allocates pages again with alloc_pages_compact.
>>
>> This is not a good test setup. You shouldn't switch the allocation types during
>> a single system boot. You should compare results from a boot where common
>> allocation is used and from a boot where your new allocation is used.
>
> The new allocator is slower, so I don't think it can replace the current allocator.
> I don't aim to change the general allocator.

I'm not saying you should replace the current allocator for everything. Use it
just for your driver, that's fine. But when you perform/simulate your
driver allocation, use either the general allocator or the new
allocator; don't change from one to the other during a single boot.

> The main purpose of the new allocator is to serve as a specific allocator when the system has too much fragmentation.
> If some driver consumes much memory and generates fragmentation, it can use the new allocator instead at that time.
> I want to make a kind of compaction for drivers that allocate unmovable pages.
>
> Therefore I tested it like that.
> I first generated fragmentation and then called the new allocator.
> I wanted to check whether the fragmentation was caused by my driver
> and whether the pages of the driver could be compacted.
> I thought the pages were compacted.
>
> If I freed the pages and called the common allocator again,
> it could decrease fragmentation a little (not as much as the new allocator).
> But there was no page compaction and fragmentation would soon increase.

Yes, we need data comparing common/new allocator in the same scenario. 
Presumably that's what you have in v3 submission.
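
As a rough illustration of how such a comparison could be quantified, here is a
minimal Python sketch. It takes the two /proc/buddyinfo lines quoted in this
thread and reports how many free pages sit in blocks of order 3 or higher. The
helper names (parse_buddyinfo_line, free_pages, high_order_share) and the
order>=3 threshold are assumptions made for this example; none of this is part
of the patch.

```python
# Illustrative sketch only: helper names are hypothetical, not part of the patch.

def parse_buddyinfo_line(line):
    """Return the per-order free block counts from one /proc/buddyinfo line."""
    # Line format: "Node 0, zone   Normal   9880   1489 ..."
    # The counts start right after the zone name that follows "zone".
    fields = line.split()
    zone_idx = fields.index("zone")
    return [int(x) for x in fields[zone_idx + 2:]]


def free_pages(counts, min_order=0):
    """Count free base pages held in blocks of at least min_order."""
    # A free block of order k contains 2**k base pages.
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)


def high_order_share(counts, min_order=3):
    """Fraction of all free pages that sit in blocks of at least min_order."""
    total = free_pages(counts)
    return free_pages(counts, min_order) / total if total else 0.0


# The two buddyinfo lines quoted in this thread (first and second listing).
BEFORE = "Node 0, zone   Normal   9880   1489    647    332    177     64     24     10      1      1      1"
AFTER = "Node 0, zone   Normal   5693    877    266    544    320    108     43     12      1      1      1"

if __name__ == "__main__":
    for label, line in (("first listing", BEFORE), ("second listing", AFTER)):
        counts = parse_buddyinfo_line(line)
        print(f"{label}: {free_pages(counts)} free pages, "
              f"{high_order_share(counts):.1%} in order>=3 blocks")
```

Both listings here come from the same boot, so the numbers only show the
mechanics. For the comparison requested above, the same calculation would be
run on snapshots taken from two separate boots under the same workload.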

>
>
>>
>>> This is a kind of compaction of the driver.
>>> Following is the result of "cat /proc/pagetypeinfo;cat /proc/buddyinfo"
>>>
>>> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
>>> Node    0, zone   Normal, type    Unmovable      8      5      1    432    272     91     37     11      1      0      0
>>> Node    0, zone   Normal, type  Reclaimable    901     96     68     17      3      0      1      0      0      0      0
>>> Node    0, zone   Normal, type      Movable   4790    776    192     91     43     16      4      0      0      0      0
>>> Node    0, zone   Normal, type      Reserve      1      4      6      6      2      1      1      1      0      1      1
>>> Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
>>> Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
>>>
>>> Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
>>> Node 0, zone   Normal          135            3          124            2            0            0
>>> Node 0, zone   Normal   5693    877    266    544    320    108     43     12      1      1      1
>>
>> The number of unmovable pageblocks didn't change here. The stats for free
>> unmovable pages do look better for higher orders than in the first listing
>> above, but even the common allocation logic would give you that result, if you
>> allocated your 400 MB using (many) order-0 allocations (since you apparently
>> don't care about physically contiguous memory). That would also prefer order-0
>> free pages before splitting higher orders. So this doesn't demonstrate benefits
>> of the alloc_pages_compact() approach I'm afraid. The results suggest that the
>> system was in a worse state when the first allocation happened, and meanwhile
>> some pages were freed, creating the large numbers of order-0 unmovable free
>> pages. Or maybe the system got fragmented in the first allocation because your
>> driver tries to allocate the memory with high-order allocations before falling
>> back to lower orders? That would probably defeat the natural anti-fragmentation
>> of the buddy system.
>
> My driver allocates pages only with alloc_page, not with high-order alloc_pages.
>
> Yes, if I freed the pages and called alloc_page again, it could decrease fragmentation at that time.
> But there was no compaction and fragmentation would soon increase,
> because the allocated pages were scattered all over the system.
>
> The new allocator compacts pages. I believe it can decrease fragmentation for a long time.

If that's what v3 shows, ok. Let me check.

>>
>> So a proper test could be based on this:
>>
>>> If I run a heavy load test for a few days in 1GB memory system, I cannot
>>> allocate even order=3 pages
>>> because of the external fragmentation.
>>
>> With this patch, is the situation quantifiably better? Can you post the
>> pagetype/buddyinfo for system boot where all driver allocations use the common
>> allocator, and system boot with the patch? That should be comparable if the
>> workload is the same for both boots.
>>
>
> OK, I will. It can be a good test.
>

end of thread, other threads:[~2015-04-01 12:05 UTC | newest]

Thread overview: 16+ messages
2015-03-25  2:39 [RFCv2] mm: page allocation for less fragmentation Gioh Kim
2015-03-25  2:39 ` Gioh Kim
2015-03-25 10:56 ` Mel Gorman
2015-03-25 10:56   ` Mel Gorman
2015-03-25 21:16   ` Gioh Kim
2015-03-25 21:16     ` Gioh Kim
2015-03-26 10:28     ` Mel Gorman
2015-03-26 10:28       ` Mel Gorman
2015-03-27  0:51       ` Gioh Kim
2015-03-27  0:51         ` Gioh Kim
2015-03-25 22:16 ` Vlastimil Babka
2015-03-25 22:16   ` Vlastimil Babka
2015-03-25 23:25   ` Gioh Kim
2015-03-25 23:25     ` Gioh Kim
2015-04-01 12:05     ` Vlastimil Babka
2015-04-01 12:05       ` Vlastimil Babka
