linux-mm.kvack.org archive mirror
* [PATCH 0/3] mm: Add cache coloring mechanism
@ 2017-08-22 12:45 Łukasz Daniluk
  2017-08-22 12:45 ` [PATCH 1/3] mm: move free_list selection to dedicated functions Łukasz Daniluk
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Łukasz Daniluk @ 2017-08-22 12:45 UTC (permalink / raw)
  To: linux-mm; +Cc: dave.hansen, lukasz.anaczkowski, Łukasz Daniluk

This patch series adds a cache coloring mechanism that works alongside
the buddy allocator. The solution is opt-in and disabled by default; it
interferes minimally with the default allocation paths, touching them
only through a few added if statements.

Why are such patches needed? Large caches with low associativity
(direct-mapped or 2-way associative) benefit from the solution the
most - it allows near constant performance throughout the lifetime of a
system, even as ongoing allocations and deallocations reorder the buddy
lists.

On a KNL system, the STREAM benchmark with a problem size that makes
its internal arrays 16GB yields 336GB/s of bandwidth after a fresh
boot. With the cache coloring patches applied and enabled, this
performance stays near constant (at most a 1.5% drop observed), despite
running the benchmark multiple times with varying sizes over the course
of days. Without these patches, however, the bandwidth of such
allocations drops to 117GB/s - an irrecoverable performance penalty of
over 65%. Workloads that exceed the configured cache size see less
randomized allocations with cache coloring enabled, but the effect of
cache reuse disappears at roughly the same allocation size anyway.

The solution is divided into three patches. The first patch is
preparatory: it provides an interface for retrieving (information
about) the free lists contained in a particular free_area structure.
The second patch (a parallel structure keeping a separate list_head for
each cache color in a given context) shows the general shape of the
solution and works as it stands; however, it has serious performance
implications with larger caches due to the linear search for the next
color to use during allocations. The third patch (sorting the per-color
list_heads using RB trees) improves computational performance by
replacing the linear search for the next color with a search in an RB
tree, at the cost of additional memory. A standalone sketch of the
color computation underlying the series is shown below.
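
This sketch is illustration only (userspace code, illustrative 16GB
cache size); it mirrors the page_cache_color() computation introduced
in patch 2: a block's color is the slice of the cache its physical
address maps to, and the allocator keeps one free_list per color so
that successive allocations rotate through the cache instead of piling
into a single slice.

#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* color = (physical address % cache size) / block size */
static unsigned long color_of(unsigned long long phys, int order,
			      unsigned long long cache_size)
{
	unsigned long long block = SKETCH_PAGE_SIZE << order;

	return (unsigned long)((phys % cache_size) / block);
}

int main(void)
{
	unsigned long long cache = 16ULL << 30;	/* illustrative 16GB cache */
	int order = 10;				/* 4MB blocks */

	/* 16GB / 4MB = 4096 distinct colors at this order */
	printf("colors: %llu\n", cache / (SKETCH_PAGE_SIZE << order));

	/* blocks exactly one cache size apart land on the same color */
	printf("color(4GB)  = %lu\n", color_of(4ULL << 30, order, cache));
	printf("color(20GB) = %lu\n", color_of(20ULL << 30, order, cache));
	return 0;
}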

Łukasz Daniluk (3):
  mm: move free_list selection to dedicated functions
  mm: Add page colored allocation path
  mm: Add helper rbtree to search for next cache color

 Documentation/admin-guide/kernel-parameters.txt |   8 +
 include/linux/mmzone.h                          |  12 +-
 mm/compaction.c                                 |   4 +-
 mm/page_alloc.c                                 | 381 ++++++++++++++++++++++--
 mm/vmstat.c                                     |  10 +-
 5 files changed, 383 insertions(+), 32 deletions(-)

-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH 1/3] mm: move free_list selection to dedicated functions
  2017-08-22 12:45 [PATCH 0/3] mm: Add cache coloring mechanism Łukasz Daniluk
@ 2017-08-22 12:45 ` Łukasz Daniluk
  2017-08-22 12:45 ` [PATCH 2/3] mm: Add page colored allocation path Łukasz Daniluk
  2017-08-22 12:45 ` [PATCH 3/3] mm: Add helper rbtree to search for next cache color Łukasz Daniluk
  2 siblings, 0 replies; 4+ messages in thread
From: Łukasz Daniluk @ 2017-08-22 12:45 UTC (permalink / raw)
  To: linux-mm; +Cc: dave.hansen, lukasz.anaczkowski, Łukasz Daniluk

Currently, free_list selection from a particular free_area relies on
the assumption that there is exactly one free_list per order per
migratetype. This patch is preparation for the page coloring solution,
which uses multiple free_lists.

Signed-off-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
---
 include/linux/mmzone.h |  4 ++++
 mm/compaction.c        |  4 ++--
 mm/page_alloc.c        | 61 +++++++++++++++++++++++++++++++++++++-------------
 mm/vmstat.c            | 10 +++------
 4 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fc14b8b3f6ce..04128890a684 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -910,6 +910,10 @@ extern struct pglist_data contig_page_data;
 
 #endif /* !CONFIG_NEED_MULTIPLE_NODES */
 
+extern bool area_empty(struct free_area *area, int order, int migratetype);
+extern unsigned long area_free_count(struct free_area *area, int order,
+					int migratetype);
+
 extern struct pglist_data *first_online_pgdat(void);
 extern struct pglist_data *next_online_pgdat(struct pglist_data *pgdat);
 extern struct zone *next_zone(struct zone *zone);
diff --git a/mm/compaction.c b/mm/compaction.c
index fb548e4c7bd4..8f6274c0a04b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1334,13 +1334,13 @@ static enum compact_result __compact_finished(struct zone *zone,
 		bool can_steal;
 
 		/* Job done if page is free of the right migratetype */
-		if (!list_empty(&area->free_list[migratetype]))
+		if (!area_empty(area, order, migratetype))
 			return COMPACT_SUCCESS;
 
 #ifdef CONFIG_CMA
 		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
 		if (migratetype == MIGRATE_MOVABLE &&
-			!list_empty(&area->free_list[MIGRATE_CMA]))
+			!area_empty(area, order, MIGRATE_CMA))
 			return COMPACT_SUCCESS;
 #endif
 		/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 471b0526b876..3f7b074fbfdb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -288,6 +288,29 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
+struct list_head *area_get_free_list(struct free_area *area, int order,
+				int migratetype)
+{
+	return &area->free_list[migratetype];
+}
+
+bool area_empty(struct free_area *area, int order, int migratetype)
+{
+	return list_empty(&area->free_list[migratetype]);
+}
+
+unsigned long area_free_count(struct free_area *area, int order,
+				int migratetype)
+{
+	unsigned long count = 0;
+	struct list_head *lh;
+
+	list_for_each(lh, &area->free_list[migratetype])
+		++count;
+
+	return count;
+}
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
@@ -887,12 +910,14 @@ static inline void __free_one_page(struct page *page,
 		if (pfn_valid_within(buddy_pfn) &&
 		    page_is_buddy(higher_page, higher_buddy, order + 1)) {
 			list_add_tail(&page->lru,
-				&zone->free_area[order].free_list[migratetype]);
+				area_get_free_list(&zone->free_area[order],
+						order, migratetype));
 			goto out;
 		}
 	}
 
-	list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
+	list_add(&page->lru, area_get_free_list(&zone->free_area[order], order,
+						migratetype));
 out:
 	zone->free_area[order].nr_free++;
 }
@@ -1660,7 +1685,8 @@ static inline void expand(struct zone *zone, struct page *page,
 		if (set_page_guard(zone, &page[size], high, migratetype))
 			continue;
 
-		list_add(&page[size].lru, &area->free_list[migratetype]);
+		list_add(&page[size].lru, area_get_free_list(area, high,
+								migratetype));
 		area->nr_free++;
 		set_page_order(&page[size], high);
 	}
@@ -1802,8 +1828,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 	/* Find a page of the appropriate size in the preferred list */
 	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
 		area = &(zone->free_area[current_order]);
-		page = list_first_entry_or_null(&area->free_list[migratetype],
-							struct page, lru);
+		page = list_first_entry_or_null(
+			area_get_free_list(area, current_order, migratetype),
+			struct page, lru);
 		if (!page)
 			continue;
 		list_del(&page->lru);
@@ -1897,7 +1924,8 @@ static int move_freepages(struct zone *zone,
 
 		order = page_order(page);
 		list_move(&page->lru,
-			  &zone->free_area[order].free_list[migratetype]);
+			area_get_free_list(&zone->free_area[order], order,
+					migratetype));
 		page += 1 << order;
 		pages_moved += 1 << order;
 	}
@@ -2046,7 +2074,8 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 
 single_page:
 	area = &zone->free_area[current_order];
-	list_move(&page->lru, &area->free_list[start_type]);
+	list_move(&page->lru, area_get_free_list(area, current_order,
+						start_type));
 }
 
 /*
@@ -2070,7 +2099,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
 		if (fallback_mt == MIGRATE_TYPES)
 			break;
 
-		if (list_empty(&area->free_list[fallback_mt]))
+		if (area_empty(area, order, fallback_mt))
 			continue;
 
 		if (can_steal_fallback(order, migratetype))
@@ -2158,7 +2187,8 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 			struct free_area *area = &(zone->free_area[order]);
 
 			page = list_first_entry_or_null(
-					&area->free_list[MIGRATE_HIGHATOMIC],
+					area_get_free_list(area, order,
+							MIGRATE_HIGHATOMIC),
 					struct page, lru);
 			if (!page)
 				continue;
@@ -2272,8 +2302,8 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	VM_BUG_ON(current_order == MAX_ORDER);
 
 do_steal:
-	page = list_first_entry(&area->free_list[fallback_mt],
-							struct page, lru);
+	page = list_first_entry(area_get_free_list(area, current_order,
+					fallback_mt), struct page, lru);
 
 	steal_suitable_fallback(zone, page, start_migratetype, can_steal);
 
@@ -2562,7 +2592,8 @@ void mark_free_pages(struct zone *zone)
 
 	for_each_migratetype_order(order, t) {
 		list_for_each_entry(page,
-				&zone->free_area[order].free_list[t], lru) {
+				area_get_free_list(&zone->free_area[order],
+						order, t), lru) {
 			unsigned long i;
 
 			pfn = page_to_pfn(page);
@@ -2983,13 +3014,13 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			return true;
 
 		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
-			if (!list_empty(&area->free_list[mt]))
+			if (!area_empty(area, o, mt))
 				return true;
 		}
 
 #ifdef CONFIG_CMA
 		if ((alloc_flags & ALLOC_CMA) &&
-		    !list_empty(&area->free_list[MIGRATE_CMA])) {
+		    !area_empty(area, o, MIGRATE_CMA)) {
 			return true;
 		}
 #endif
@@ -4788,7 +4819,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 			types[order] = 0;
 			for (type = 0; type < MIGRATE_TYPES; type++) {
-				if (!list_empty(&area->free_list[type]))
+				if (!area_empty(area, order, type))
 					types[order] |= 1 << type;
 			}
 		}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 9a4441bbeef2..1ff86656cb2e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1183,14 +1183,10 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
 					zone->name,
 					migratetype_names[mtype]);
 		for (order = 0; order < MAX_ORDER; ++order) {
-			unsigned long freecount = 0;
-			struct free_area *area;
-			struct list_head *curr;
+			struct free_area *area = &(zone->free_area[order]);
+			unsigned long freecount =
+				area_free_count(area, order, mtype);
 
-			area = &(zone->free_area[order]);
-
-			list_for_each(curr, &area->free_list[mtype])
-				freecount++;
 			seq_printf(m, "%6lu ", freecount);
 		}
 		seq_putc(m, '\n');
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH 2/3] mm: Add page colored allocation path
  2017-08-22 12:45 [PATCH 0/3] mm: Add cache coloring mechanism Łukasz Daniluk
  2017-08-22 12:45 ` [PATCH 1/3] mm: move free_list selection to dedicated functions Łukasz Daniluk
@ 2017-08-22 12:45 ` Łukasz Daniluk
  2017-08-22 12:45 ` [PATCH 3/3] mm: Add helper rbtree to search for next cache color Łukasz Daniluk
  2 siblings, 0 replies; 4+ messages in thread
From: Łukasz Daniluk @ 2017-08-22 12:45 UTC (permalink / raw)
  To: linux-mm; +Cc: dave.hansen, lukasz.anaczkowski, Łukasz Daniluk

Create an alternative path for requesting a free_list from the buddy
system that takes recent allocations or the physical page offset into
account.

This solution carries the cost of additional, early memory allocations
done by the kernel; the cost depends on the cache size and on the
minimum order to be colored. An example boot command line is shown
below.
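
For illustration only (the values are examples, not recommendations):
coloring for a 16GB cache, restricted to order-10 blocks, would be
enabled with a boot command line such as

	cache_color_size=16G cache_color_min_order=10

cache_color_size is parsed with memparse(), so the usual K/M/G suffixes
are accepted; leaving it unset keeps the feature disabled.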

Signed-off-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   8 +
 include/linux/mmzone.h                          |   5 +-
 mm/page_alloc.c                                 | 253 +++++++++++++++++++++---
 3 files changed, 243 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 372cc66bba23..4648f1cc6665 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -455,6 +455,14 @@
 			possible to determine what the correct size should be.
 			This option provides an override for these situations.
 
+	cache_color_size=
+			[KNL] Set cache size for purposes of cache coloring
+			mechanism in buddy allocator.
+
+	cache_color_min_order=
+			[KNL] Set minimal order for which page coloring
+			mechanism will be enabled in buddy allocator.
+
 	ca_keys=	[KEYS] This parameter identifies a specific key(s) on
 			the system trusted keyring to be used for certificate
 			trust validation.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 04128890a684..d10a5421b18b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -25,7 +25,8 @@
 #else
 #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
 #endif
-#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
+#define ORDER_NR_PAGES(order) (1 << (order))
+#define MAX_ORDER_NR_PAGES ORDER_NR_PAGES(MAX_ORDER - 1)
 
 /*
  * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
@@ -94,6 +95,8 @@ extern int page_group_by_mobility_disabled;
 
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
+	struct list_head	*colored_free_list[MIGRATE_TYPES];
+	unsigned long		next_color[MIGRATE_TYPES];
 	unsigned long		nr_free;
 };
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f7b074fbfdb..3718b49032c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -288,14 +288,129 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
-struct list_head *area_get_free_list(struct free_area *area, int order,
-				int migratetype)
+int cache_color_min_order __read_mostly = MAX_ORDER - 1;
+
+
+/*
+ * cache_size - size of cache (in bytes)
+ */
+unsigned long long cache_size __read_mostly;
+
+/*
+ * Returns true if cache color allocations are enabled
+ */
+static inline bool cache_color_used(int order)
+{
+	return cache_size && order >= cache_color_min_order;
+}
+
+/*
+ * Returns number of cache colors in specified order
+ */
+static inline unsigned long cache_colors(int order)
+{
+	unsigned long colors = cache_size / (PAGE_SIZE * ORDER_NR_PAGES(order));
+
+	return colors ? colors : 1;
+}
+
+/*
+ * Returns cache color for page of specified order
+ */
+static inline unsigned long page_cache_color(struct page *page, int order)
+{
+	return (page_to_phys(page) % cache_size)
+		/ (PAGE_SIZE * ORDER_NR_PAGES(order));
+}
+
+/*
+ * Returns color of first non-empty free_list for purpose of cache color
+ * allocations in specified range (start until end-1).
+ */
+static inline unsigned long cache_color_find_nonempty(struct free_area *area,
+			int migratetype, unsigned long start, unsigned long end)
 {
+	unsigned long color;
+
+	for (color = start; color < end; ++color)
+		if (!list_empty(&area->colored_free_list[migratetype][color]))
+			break;
+
+	return color;
+}
+
+/*
+ * Returns free_list for cache color allocation purposes.
+ * In case page is passed, it is assumed that it will be added to free_list, so
+ * return value is based on color of the page.
+ * Otherwise it searches for next color with free pages available.
+ */
+static inline struct list_head *cache_color_area_get_free_list(
+		struct free_area *area, struct page *page, int order,
+		int migratetype)
+{
+	unsigned long color;
+	unsigned long current_color = area->next_color[migratetype];
+
+	if (page) {
+		color = page_cache_color(page, order);
+		return &area->colored_free_list[migratetype][color];
+	}
+
+	color = cache_color_find_nonempty(area, migratetype, current_color,
+					cache_colors(order));
+
+
+	if (color == cache_colors(order))
+		color = cache_color_find_nonempty(area, migratetype, 0,
+						current_color);
+
+	current_color = color + 1;
+	if (current_color >= cache_colors(order))
+		current_color = 0;
+
+	area->next_color[migratetype] = current_color;
+
+	return &area->colored_free_list[migratetype][color];
+}
+
+static inline bool cache_color_area_empty(struct free_area *area, int order,
+					int migratetype)
+{
+	return cache_color_find_nonempty(area, migratetype, 0,
+				cache_colors(order)) == cache_colors(order);
+}
+
+
+static inline unsigned long cache_color_area_free_count(struct free_area *area,
+						int order, int migratetype)
+{
+	unsigned long count = 0;
+	unsigned long color;
+	struct list_head *lh;
+
+	for (color = 0; color < cache_colors(order); ++color)
+		list_for_each(lh, &area->colored_free_list[migratetype][color])
+			++count;
+
+	return count;
+}
+
+struct list_head *area_get_free_list(struct free_area *area, struct page *page,
+					int order, int migratetype)
+{
+	if (cache_color_used(order))
+		return cache_color_area_get_free_list(area, page, order,
+							migratetype);
+
 	return &area->free_list[migratetype];
 }
 
 bool area_empty(struct free_area *area, int order, int migratetype)
 {
+	if (cache_color_used(order))
+		return cache_color_area_empty(area, order, migratetype);
+
 	return list_empty(&area->free_list[migratetype]);
 }
 
@@ -305,12 +420,67 @@ unsigned long area_free_count(struct free_area *area, int order,
 	unsigned long count = 0;
 	struct list_head *lh;
 
+	if (cache_color_used(order))
+		return cache_color_area_free_count(area, order, migratetype);
+
 	list_for_each(lh, &area->free_list[migratetype])
 		++count;
 
 	return count;
 }
 
+/*
+ * Returns pointer to allocated space that will host the array of free_lists
+ * used for cache color allocations.
+ */
+static __ref void *cache_color_alloc_lists(struct zone *zone, int order)
+{
+	const size_t size = sizeof(struct list_head) * cache_colors(order);
+
+	return alloc_bootmem_pages_node(zone->zone_pgdat, size);
+}
+
+static __init int setup_cache_color_size(char *_cache_size)
+{
+	char *retptr;
+
+	cache_size = memparse(_cache_size, &retptr);
+
+	if (retptr == _cache_size) {
+		pr_warn("Invalid cache size requested");
+		cache_size = 0;
+	}
+
+	if (cache_size == 0)
+		pr_info("Cache color pages functionality disabled");
+	else
+		pr_info("Cache size set to %llu bytes", cache_size);
+
+	return 0;
+}
+early_param("cache_color_size", setup_cache_color_size);
+
+static __init int setup_cache_color_min_order(char *_cache_color_min_order)
+{
+	int order;
+
+	if (kstrtoint(_cache_color_min_order, 10, &order) < 0) {
+		pr_warn("Invalid cache color min order requested");
+		order = -1;
+	}
+
+	if (order < 0 || order >= MAX_ORDER) {
+		pr_info("Cache color pages functionality disabled");
+		cache_color_min_order = MAX_ORDER;
+	} else {
+		pr_info("Cache min order set to %d", order);
+		cache_color_min_order = order;
+	}
+
+	return 0;
+}
+early_param("cache_color_min_order", setup_cache_color_min_order);
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
@@ -911,13 +1081,13 @@ static inline void __free_one_page(struct page *page,
 		    page_is_buddy(higher_page, higher_buddy, order + 1)) {
 			list_add_tail(&page->lru,
 				area_get_free_list(&zone->free_area[order],
-						order, migratetype));
+						page, order, migratetype));
 			goto out;
 		}
 	}
 
-	list_add(&page->lru, area_get_free_list(&zone->free_area[order], order,
-						migratetype));
+	list_add(&page->lru, area_get_free_list(&zone->free_area[order], page,
+						order, migratetype));
 out:
 	zone->free_area[order].nr_free++;
 }
@@ -1685,8 +1855,8 @@ static inline void expand(struct zone *zone, struct page *page,
 		if (set_page_guard(zone, &page[size], high, migratetype))
 			continue;
 
-		list_add(&page[size].lru, area_get_free_list(area, high,
-								migratetype));
+		list_add(&page[size].lru, area_get_free_list(area, &page[size],
+							high, migratetype));
 		area->nr_free++;
 		set_page_order(&page[size], high);
 	}
@@ -1829,7 +1999,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
 		area = &(zone->free_area[current_order]);
 		page = list_first_entry_or_null(
-			area_get_free_list(area, current_order, migratetype),
+			area_get_free_list(area, NULL, current_order,
+					migratetype),
 			struct page, lru);
 		if (!page)
 			continue;
@@ -1924,8 +2095,8 @@ static int move_freepages(struct zone *zone,
 
 		order = page_order(page);
 		list_move(&page->lru,
-			area_get_free_list(&zone->free_area[order], order,
-					migratetype));
+			  area_get_free_list(&zone->free_area[order], page,
+						order, migratetype));
 		page += 1 << order;
 		pages_moved += 1 << order;
 	}
@@ -2074,7 +2245,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 
 single_page:
 	area = &zone->free_area[current_order];
-	list_move(&page->lru, area_get_free_list(area, current_order,
+	list_move(&page->lru, area_get_free_list(area, page, current_order,
 						start_type));
 }
 
@@ -2187,7 +2358,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 			struct free_area *area = &(zone->free_area[order]);
 
 			page = list_first_entry_or_null(
-					area_get_free_list(area, order,
+					area_get_free_list(area, NULL, order,
 							MIGRATE_HIGHATOMIC),
 					struct page, lru);
 			if (!page)
@@ -2302,7 +2473,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	VM_BUG_ON(current_order == MAX_ORDER);
 
 do_steal:
-	page = list_first_entry(area_get_free_list(area, current_order,
+	page = list_first_entry(area_get_free_list(area, NULL, current_order,
 					fallback_mt), struct page, lru);
 
 	steal_suitable_fallback(zone, page, start_migratetype, can_steal);
@@ -2566,6 +2737,16 @@ void drain_all_pages(struct zone *zone)
 
 #ifdef CONFIG_HIBERNATION
 
+static inline void mark_free_page(struct page *page, int order)
+{
+	unsigned long i, pfn;
+
+	pfn = page_to_pfn(page);
+
+	for (i = 0; i < (1UL << order); ++i)
+		swsusp_set_page_free(pfn_to_page(pfn + i));
+}
+
 void mark_free_pages(struct zone *zone)
 {
 	unsigned long pfn, max_zone_pfn;
@@ -2591,14 +2772,20 @@ void mark_free_pages(struct zone *zone)
 		}
 
 	for_each_migratetype_order(order, t) {
-		list_for_each_entry(page,
-				area_get_free_list(&zone->free_area[order],
-						order, t), lru) {
-			unsigned long i;
+		struct free_area *area = &zone->free_area[order];
+
+		if (cache_color_used(order)) {
+			unsigned long color;
+			struct list_head *colored_lists =
+					area->colored_free_list[t];
 
-			pfn = page_to_pfn(page);
-			for (i = 0; i < (1UL << order); i++)
-				swsusp_set_page_free(pfn_to_page(pfn + i));
+			for (color = 0; color < cache_colors(order); ++color)
+				list_for_each_entry(page,
+						&colored_lists[color], lru)
+					mark_free_page(page, order);
+		} else {
+			list_for_each_entry(page, &area->free_list[t], lru)
+				mark_free_page(page, order);
 		}
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -5490,12 +5677,34 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	}
 }
 
+static void __meminit zone_init_cache_color(struct zone *zone, int order,
+						int migratetype)
+{
+	unsigned long c;
+	struct free_area *area = &zone->free_area[order];
+
+	if (!cache_color_used(order))
+		return;
+
+	area->next_color[migratetype] = 0;
+	area->colored_free_list[migratetype] =
+		cache_color_alloc_lists(zone, order);
+
+	for (c = 0; c < cache_colors(order); ++c)
+		INIT_LIST_HEAD(&area->colored_free_list[migratetype][c]);
+}
+
 static void __meminit zone_init_free_lists(struct zone *zone)
 {
 	unsigned int order, t;
 	for_each_migratetype_order(order, t) {
-		INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
-		zone->free_area[order].nr_free = 0;
+		struct free_area *area = &zone->free_area[order];
+
+		INIT_LIST_HEAD(&area->free_list[t]);
+		area->nr_free = 0;
+
+		if (cache_color_used(order))
+			zone_init_cache_color(zone, order, t);
 	}
 }
 
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* [PATCH 3/3] mm: Add helper rbtree to search for next cache color
  2017-08-22 12:45 [PATCH 0/3] mm: Add cache coloring mechanism Łukasz Daniluk
  2017-08-22 12:45 ` [PATCH 1/3] mm: move free_list selection to dedicated functions Łukasz Daniluk
  2017-08-22 12:45 ` [PATCH 2/3] mm: Add page colored allocation path Łukasz Daniluk
@ 2017-08-22 12:45 ` Łukasz Daniluk
  2 siblings, 0 replies; 4+ messages in thread
From: Łukasz Daniluk @ 2017-08-22 12:45 UTC (permalink / raw)
  To: linux-mm; +Cc: dave.hansen, lukasz.anaczkowski, Łukasz Daniluk

Before this patch, the search for the next available cache color was
done linearly. Such a search is problematic if the occupied colors are
sparse, or if there are no free pages of any color at all.

This patch addresses the problem by arranging the free_lists that take
part in cache-aware allocations in an RB tree.

For the solution to work, space for the free_lists and RB tree nodes
has to be allocated. The required space is 5 pointers per color
(struct rb_node plus struct list_head).

The cost of the solution with the RB tree helpers can be estimated as
5 pointers * MIGRATE_TYPES * cache_colors(order), per zone per node,
for each order that is to be colored. For example, a 16GB cache with
only order 10 enabled requires 160KB per zone, migratetype and node,
whereas enabling the feature down to order 9 bumps this up to 480KB per
zone and migratetype. A worked example of this arithmetic is shown
below.
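
As a sanity check on those numbers, the arithmetic can be reproduced
with the short userspace sketch below (it assumes 64-bit pointers, i.e.
40 bytes per struct cache_color, and the illustrative 16GB cache from
the cover letter):

#include <stdio.h>

int main(void)
{
	unsigned long long cache = 16ULL << 30;	/* 16GB cache */
	unsigned long long page = 4096;
	unsigned long long per_color = 5 * 8;	/* rb_node + list_head */
	int order;

	/* cost per (zone, migratetype, node) for each colored order */
	for (order = 10; order >= 9; --order) {
		unsigned long long colors = cache / (page << order);

		printf("order %2d: %5llu colors -> %3llu KB\n",
		       order, colors, colors * per_color / 1024);
	}

	/*
	 * Output: order 10 alone costs 160KB; with min order 9 both
	 * orders 9 and 10 are colored, i.e. 160KB + 320KB = 480KB.
	 */
	return 0;
}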

Signed-off-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
---
 include/linux/mmzone.h |   5 +-
 mm/page_alloc.c        | 155 ++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 130 insertions(+), 30 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d10a5421b18b..cda726854078 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -6,6 +6,7 @@
 
 #include <linux/spinlock.h>
 #include <linux/list.h>
+#include <linux/rbtree.h>
 #include <linux/wait.h>
 #include <linux/bitops.h>
 #include <linux/cache.h>
@@ -93,9 +94,11 @@ extern int page_group_by_mobility_disabled;
 	get_pfnblock_flags_mask(page, page_to_pfn(page),		\
 			PB_migrate_end, MIGRATETYPE_MASK)
 
+struct cache_color;
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
-	struct list_head	*colored_free_list[MIGRATE_TYPES];
+	struct rb_root		cache_colored_free_lists[MIGRATE_TYPES];
+	struct cache_color	*cache_colors[MIGRATE_TYPES];
 	unsigned long		next_color[MIGRATE_TYPES];
 	unsigned long		nr_free;
 };
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3718b49032c2..da1431c4703c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -290,6 +290,10 @@ int page_group_by_mobility_disabled __read_mostly;
 
 int cache_color_min_order __read_mostly = MAX_ORDER - 1;
 
+struct cache_color {
+	struct list_head	free_list;
+	struct rb_node		rb_tree_node;
+};
 
 /*
  * cache_size - size of cache (in bytes)
@@ -323,18 +327,102 @@ static inline unsigned long page_cache_color(struct page *page, int order)
 		/ (PAGE_SIZE * ORDER_NR_PAGES(order));
 }
 
+static inline unsigned long get_cache_color(struct cache_color *cc,
+				struct free_area *area, int migratetype)
+{
+	return cc - area->cache_colors[migratetype];
+}
+
+/*
+ * Returns pointer to cache_color structure that has first available color
+ * after color passed as argument, or NULL, if no such structure was found.
+ */
+static inline struct cache_color *cache_color_area_find_next(
+					struct free_area *area,
+					int migratetype, unsigned long color)
+{
+	struct rb_root *root = &area->cache_colored_free_lists[migratetype];
+	struct rb_node *node = root->rb_node;
+	struct cache_color *ret = NULL;
+
+	while (node) {
+		struct cache_color *cc =
+			rb_entry(node, struct cache_color, rb_tree_node);
+		unsigned long cc_color =
+			get_cache_color(cc, area, migratetype);
+
+		if (cc_color < color) {
+			node = node->rb_right;
+		} else if (cc_color > color) {
+			ret = cc;
+			node = node->rb_left;
+		} else {
+			return cc;
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * Inserts cache_color structure into RB tree associated with migratetype in
+ * area, in case the color was absent from the tree.
+ */
+static inline void cache_color_area_insert(struct free_area *area,
+					int migratetype, struct cache_color *cc)
+{
+	struct rb_root *root = &area->cache_colored_free_lists[migratetype];
+	struct rb_node **new = &(root->rb_node), *parent = NULL;
+	unsigned long cc_color = get_cache_color(cc, area, migratetype);
+
+	while (*new) {
+		struct cache_color *this =
+			rb_entry(*new, struct cache_color, rb_tree_node);
+		unsigned long this_color =
+			get_cache_color(this, area, migratetype);
+
+		parent = *new;
+		if (this_color < cc_color)
+			new = &((*new)->rb_right);
+		else if (this_color > cc_color)
+			new = &((*new)->rb_left);
+		else
+			return;
+	}
+
+	rb_link_node(&cc->rb_tree_node, parent, new);
+	rb_insert_color(&cc->rb_tree_node, root);
+}
+
 /*
  * Returns color of first non-empty free_list for purpose of cache color
  * allocations in specified range (start until end-1).
  */
-static inline unsigned long cache_color_find_nonempty(struct free_area *area,
-			int migratetype, unsigned long start, unsigned long end)
-{
-	unsigned long color;
-
-	for (color = start; color < end; ++color)
-		if (!list_empty(&area->colored_free_list[migratetype][color]))
-			break;
+static inline unsigned long cache_color_find_nonempty(
+		struct free_area *area, int migratetype,
+		unsigned long start, unsigned long end, bool delete_empty)
+{
+	unsigned long color = start;
+	struct cache_color *cc;
+
+	while (color < end) {
+		struct rb_root *root =
+			&area->cache_colored_free_lists[migratetype];
+		unsigned long cc_color;
+
+		cc = cache_color_area_find_next(area, migratetype, color);
+		if (!cc)
+			return end;
+
+		cc_color = get_cache_color(cc, area, migratetype);
+		if (list_empty(&cc->free_list)) {
+			if (delete_empty)
+				rb_erase(&cc->rb_tree_node, root);
+			color = cc_color + 1;
+		} else {
+			return cc_color;
+		}
+	}
 
 	return color;
 }
@@ -353,17 +441,23 @@ static inline struct list_head *cache_color_area_get_free_list(
 	unsigned long current_color = area->next_color[migratetype];
 
 	if (page) {
+		struct cache_color *cc;
+
 		color = page_cache_color(page, order);
-		return &area->colored_free_list[migratetype][color];
+		cc = &area->cache_colors[migratetype][color];
+
+		if (list_empty(&cc->free_list))
+			cache_color_area_insert(area, migratetype, cc);
+
+		return &cc->free_list;
 	}
 
 	color = cache_color_find_nonempty(area, migratetype, current_color,
-					cache_colors(order));
-
+					cache_colors(order), true);
 
 	if (color == cache_colors(order))
 		color = cache_color_find_nonempty(area, migratetype, 0,
-						current_color);
+						current_color, true);
 
 	current_color = color + 1;
 	if (current_color >= cache_colors(order))
@@ -371,14 +465,14 @@ static inline struct list_head *cache_color_area_get_free_list(
 
 	area->next_color[migratetype] = current_color;
 
-	return &area->colored_free_list[migratetype][color];
+	return &area->cache_colors[migratetype][color].free_list;
 }
 
 static inline bool cache_color_area_empty(struct free_area *area, int order,
 					int migratetype)
 {
 	return cache_color_find_nonempty(area, migratetype, 0,
-				cache_colors(order)) == cache_colors(order);
+			cache_colors(order), false) == cache_colors(order);
 }
 
 
@@ -386,12 +480,17 @@ static inline unsigned long cache_color_area_free_count(struct free_area *area,
 						int order, int migratetype)
 {
 	unsigned long count = 0;
-	unsigned long color;
 	struct list_head *lh;
+	struct rb_node *node;
 
-	for (color = 0; color < cache_colors(order); ++color)
-		list_for_each(lh, &area->colored_free_list[migratetype][color])
+	for (node = rb_first(&area->cache_colored_free_lists[migratetype]);
+			node; node = rb_next(node)) {
+		struct cache_color *cc =
+			rb_entry(node, struct cache_color, rb_tree_node);
+
+		list_for_each(lh, &cc->free_list)
 			++count;
+	}
 
 	return count;
 }
@@ -435,7 +534,7 @@ unsigned long area_free_count(struct free_area *area, int order,
  */
 static __ref void *cache_color_alloc_lists(struct zone *zone, int order)
 {
-	const size_t size = sizeof(struct list_head) * cache_colors(order);
+	const size_t size = sizeof(struct cache_color) * cache_colors(order);
 
 	return alloc_bootmem_pages_node(zone->zone_pgdat, size);
 }
@@ -2776,12 +2875,11 @@ void mark_free_pages(struct zone *zone)
 
 		if (cache_color_used(order)) {
 			unsigned long color;
-			struct list_head *colored_lists =
-					area->colored_free_list[t];
+			struct cache_color *ccs = area->cache_colors[t];
 
 			for (color = 0; color < cache_colors(order); ++color)
 				list_for_each_entry(page,
-						&colored_lists[color], lru)
+						&ccs[color].free_list, lru)
 					mark_free_page(page, order);
 		} else {
 			list_for_each_entry(page, &area->free_list[t], lru)
@@ -5680,18 +5778,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 static void __meminit zone_init_cache_color(struct zone *zone, int order,
 						int migratetype)
 {
-	unsigned long c;
+	unsigned long color;
 	struct free_area *area = &zone->free_area[order];
-
-	if (!cache_color_used(order))
-		return;
+	struct cache_color **ccs = &area->cache_colors[migratetype];
 
 	area->next_color[migratetype] = 0;
-	area->colored_free_list[migratetype] =
-		cache_color_alloc_lists(zone, order);
+	area->cache_colored_free_lists[migratetype] = RB_ROOT;
 
-	for (c = 0; c < cache_colors(order); ++c)
-		INIT_LIST_HEAD(&area->colored_free_list[migratetype][c]);
+	*ccs = cache_color_alloc_lists(zone, order);
+
+	for (color = 0; color < cache_colors(order); ++color)
+		INIT_LIST_HEAD(&(*ccs)[color].free_list);
 }
 
 static void __meminit zone_init_free_lists(struct zone *zone)
-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


Thread overview: 4+ messages
2017-08-22 12:45 [PATCH 0/3] mm: Add cache coloring mechanism Łukasz Daniluk
2017-08-22 12:45 ` [PATCH 1/3] mm: move free_list selection to dedicated functions Łukasz Daniluk
2017-08-22 12:45 ` [PATCH 2/3] mm: Add page colored allocation path Łukasz Daniluk
2017-08-22 12:45 ` [PATCH 3/3] mm: Add helper rbtree to search for next cache color Łukasz Daniluk
